Frontiers in Statistical Quality Control 8
Hans-Joachim Lenz Peter-Theodor Wilrich Editors
Frontiers in Statistical Quality Control 8 With 92 Figures and 93 Tables
Physica-Verlag A Springer Company
Professor Dr. Hans-Joachim Lenz
[email protected] Professor Dr. Peter-Theodor Wilrich
[email protected] Freie Universität Berlin, Institut für Statistik und Ökonometrie, Garystraße 21, 14195 Berlin, Germany
ISBN-10 3-7908-1686-8 Physica-Verlag Heidelberg New York
ISBN-13 978-3-7908-1686-0 Physica-Verlag Heidelberg New York
Cataloging-in-Publication Data applied for. Library of Congress Control Number: 2006921315. This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Physica-Verlag. Violations are liable for prosecution under the German Copyright Law. Physica is a part of Springer Science+Business Media, springer.com. © Physica-Verlag Heidelberg 2006. Printed in Germany. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Soft-Cover-Design: Erich Kirchner, Heidelberg. SPIN 11611004
88/3153-5 4 3 2 1 0 - Printed on acid-free and non-aging paper
Editorial
The VIIIth International Workshop on "Intelligent Statistical Quality Control" took place in Warsaw, Poland, and was hosted by Professor Dr. Olgierd Hryniewicz, Systems Research Institute of the Polish Academy of Sciences and Warsaw School of Information Technology, Warsaw, Poland. The workshop itself was jointly organized by Professor Dr. O. Hryniewicz, Professor Dr. H.-J. Lenz, Professor Dr. P.-T. Wilrich, Dr. P. Grzegorzewski, Edyta Mrowka and Maciej Romaniuk. The workshop papers integrated in this volume are divided into three main parts: Part 1: "General Aspects of SQC Methodology", Part 2: "On-line Control" with subchapters "Sampling Plans", "Control Charts" and "Monitoring", and Part 3: "Off-line Control" including Data Analysis, Calibration and Experimental Design. In Part 1 "General Aspects of SQC Methodology" von Collani and Palcat analyze "How Some ISO-Standards Complicate Quality Improvement". They compare the aims of ISO standards for QC with the aims of continuous quality improvement. Because the two sets of aims turn out to be hardly compatible, different QC procedures are proposed. In Part 2 "On-line Control" there are fifteen papers. It starts with two papers on "Sampling Plans". Hryniewicz considers "Optimal Two-Stage Sequential Sampling Plans by Attributes". Acceptance sampling by attributes requires large samples when the fraction of nonconforming items in sampled lots or processes is very low. Wald's sequential sampling plans were designed to meet this situation. Hryniewicz proposes restricted, curtailed sequential sampling plans for attributes. The plans fulfil pre-specified statistical requirements for risks while requiring minimal sampling effort. Palcat reviews three-class sampling plans useful for legal metrology studies in his paper entitled "Three-Class Sampling Plans - A Review with Applications". He reviews the key features of three-class sampling plan theory and discusses some applications where such plans would be effective for QC. The author examines applications which are specific to the field of legal metrology. He closes with case studies where isolated lots are common and currently used methods are problematic. Control charting has been an integral part of on-line control, and there is evidence that this will continue. Therefore almost one half of the papers focus on "Control Charts". Bodnar and Schmid in "CUSUM Control Schemes for Multivariate Time Series" extend multivariate CUSUM charts to VARMA processes with Gaussian noise superimposed. They consider both modified control charts and residual charts. In an extensive Monte Carlo study they compare them with the multivariate EWMA chart (Kramer and Schmid 1997). Knoth pays special attention to the correct design goal when control charts are run. His paper is entitled "The Art of Evaluating Monitoring Schemes - How to Measure the Performance of Control Charts". The author calls for caution when using the "minimal out-of-control ARL" as a design criterion of monitoring schemes and advocates the "minimal steady-state ARL", arguing from the features of the steady-state delay distribution. Morais and Pacheco present some striking examples of joint (μ, σ)-schemes in "Misleading Signals in Joint Schemes for μ and σ". They show that misleading signals occur often enough that they should alert the quality staff on the shop floor and concern practitioners. Mrowka and Grzegorzewski contribute to a new design of control
charts with a paper on "The FrCchet Control Charts". They suggest the FrCchet distance for simultaneously monitoring of process level and spread. Their new chart behaves comparable to classic control charts if changes either in process level or in process spread only are observed. However, it is much better than a combined (7- $-chart if simultaneous disturbances of the process level and spread happen. In their paper entitled "Reconsidering Control Charts in Japan" Nishinn, Kuzuya and Ishii study the role of causality and its relation to goals as target functions of control charting. Machine capability improvements due to advanced production technology have resulted in variance reduction within subgroups. They note that part of the variance between subgroups can be included into the variance due to chances. In a case study they show that a measurement characteristic specified by a related Standard is not necessarily appropriate for the control characteristic. Pokropp, Seidel, Begun, Heidenreich and Sever monitor police activities in "Control Charts for the Number of Children Injured in Traffic Accidents". They specify a generalised linear model (GLM) with Poisson counts. Parameter estimation is based on data, which represents the daily number of injuries. Seasonal effects are considered. Control limits are computed by Monte-Carlo simulation of the underlying mixing distributions in order to detect deviations from the police target values for various periods of interest. Reynolds jr. and Stoumbos take a look at process deviations and follow up the rational subgroup concept in "A New Perspective on the Fundamental Concept of Rational Subgroups". Control charts are usually based on a sampling interval of fixed length. They investigate the question whether it is better or not to use sample sizes n = 1 or n > I and to select either concentrated or dispersed sampling. A tandem chart to control p and o is investigated. They conclude that the best overall performance is obtained by taking samples of n = l and using an EWMA or CUSUM chart combination. The Shewhart chart combination with the best overall performance is based on n > 1. Saniga, McWilliams, Dnvis and Lucas investigate "Economic Advantages of CUSUM Control Charts for Variables". Their view on an economic CUSUM design is more general than the scope of earlier publications on this topic. ARLs are calculated using the Luceno and Puig-Pey (2002) algorithm in combination with a Nelder Mead search procedure. The policy decision of choosing a CUSLJM chart or a Shewhart chart is addressed. Suzuki, Harada and Ojima present a study on "Choice of Control Interval for Controlling Assembly Processes". Time series models are used for effective process control of specific assembly processes, especially, if the number of products is high. Influential factors like the control interval or the dead time of the assembly process are considered. Yasui, Ojima and Suzuki in "Generalisation of the Run Rules for the Shewhart Control Charts" extend Shewart's 3sigma rule -and propose two new rules based on sequences of observations. The performance of such modifications is evaluated under several out-of-control scenarios. Part 2 closes with three papers on "Monitoring". Andersson in her contribution to "Robust On-Line-Turning Point Detection. The Influence of Turning Point Characteristics" is interested in turning point problems of cyclical processes. 
She develops and evaluates a methodology for on-line detection of turning points in production processes by using an approximate ML estimation technique combined with a nonparametric approach. Iwersen and Melgaard in "Specification Setting for Drugs in the Pharmaceutical Industry" discuss the practical implications of setting and maintaining specifications for drugs in the pharmaceutical industry. These include statistical process control limits, release limits, shelf life limits and in-use limits. The challenge is to make the limits consistent and practical. The approach involves normal linear mixed models and the Arrhenius model, a kinetic model which describes, for example, the temperature dependence of drug degradation. In "Monitoring a Sequencing Batch Reactor for the Treatment of Wastewater by a Combination of Multivariate Statistical Process Control and a Classification
Technique" Ruiz, Colomer amd Melendez combine multivariate SPC and a specially tailored classification technique in order to monitor a wastewater treatment plant. Part 3 "Off-line Control" includes five papers. Gbb discusses in "Data Mining and Statistical Control - A Review and Some Links" statistical quality control and its relation to very large (Terabytes) databases of operational databases sampled from industrial processes. He strongly advocates for adoption of techniques for handling and exploring large data sets, i.e. OLTP databases and (OLAP) data warehouses in industry. He reviews the links between data mining techniques and statistical quality control and sketches ways of reconciling these disciplines. Grzegorzewski and Mrdwka consider the calibration problem in which the corresponding loss function is no more piecewise constant as in Ladany (2001). In their paper on "Optimal Process Calibration under Nonsymmetric Loss Function" they consider the problem of how to set up a manufacturing process in order to make it capable. They propose an optimal calibration method for such loss functions. The suggested calibration procedure depends on the process capability index C p. Ojima, Yasui, Feng, Suzuki and Hararin are concerned with "The Probability of the Occurrence of Negative Estimates in the Variance Components Estimation by Nested Precision Experiments". They apply a canonical form of generalised staggered nested designs, and the probability of the occurrence of negative LS estimates of variance components is evaluated. Some practical hints are derived for the necessary number of laboratories involved in such problems. Koyama in "Statistical Methods Applied to a Semiconductor Manufacturing Process" uses a L16(2") orthogonal design and presents a semi-conductor factory scenario where new types of semiconductors are to be manufactured very shortly after the design. The lack of time causes small data sets as well as a lot of missing values. Finally, Vining and Kowalski in "An Overview of Composite Designs Run as Split-Plots" firstly summarise the results of Vining, Kowalski, and Montgomery (2004) and Vining, Parker, and Kowalski (2004). The authors secondly illustrate how to modify standard central composite designs and composite designs based on Plackett-Burman designs to accommodate the split-plot structure. The paper concludes with a walk through a fully worked-out example. The impact of any workshop is mainly shaped by the quality of papers, which are presented at the meeting, revised later and finally submitted. We would like to express our deep gratitude to the following members of the scientific programme committee, who did an excellent job with respect to the recruiting of invited speakers as well as refereeing all the submitted papers: Mr David Baillie, United Kingdom Prof. Elart von Collani, Germany Prof. Olgierd Hryniewicz, Poland Prof. Hans-J. Lenz, Germany Prof. Yoshikazu Ojima, Japan Prof. Poul Thyregod, Denmark Prof. Peter-Th. Wilrich, Germany Prof. William H. Woodall, U.S.A. We would like to close with our cordial thanks to Mrs. Angelika Wnuk, Institute of Production, Information Systems and Operations Research, Free University Berlin, who assisted us to clean up and to integrate WINWORD papers.
We gratefully acknowledge financial support from the Department of Economics, Institute of Statistics and Econometrics, and Institute of Production, Information Systems and Operations Research of the Free University of Berlin, Germany, which made it possible to bring this volume to press. Moreover, we again thank Physica-Verlag, Heidelberg, for its continuing efficient collaboration. On behalf of all participants, the editors would like to thank Professor Dr. Olgierd Hryniewicz and his staff for their superb hospitality, the perfect organisation, and the stimulating scientific atmosphere. We are happy and proud to announce that the International Workshop on Intelligent Statistical Quality Control will be continued in 2007. Berlin, November 2005
Hans-J. Lenz, Peter-Th. Wilrich
Contents

PART 1: GENERAL ASPECTS OF SQC METHODOLOGY

How Some ISO Standards Complicate Quality Improvement
E. von Collani, F. A. Palcat .......................................... 3

PART 2: ON-LINE CONTROL

2.1 Sampling Plans

Optimal Two-Stage Sequential Sampling Plans by Attributes
O. Hryniewicz ......................................................... 21

Three-Class Sampling Plans: A Review with Applications
F. A. Palcat .......................................................... 34

2.2 Control Charts

CUSUM Control Schemes for Multivariate Time Series
O. Bodnar, W. Schmid .................................................. 55

The Art of Evaluating Monitoring Schemes - How to Measure the Performance of Control Charts?
S. Knoth .............................................................. 74

Misleading Signals in Joint Schemes for μ and σ
M. C. Morais, A. Pacheco ............................................. 100

The Fréchet Control Charts
E. Mrowka, P. Grzegorzewski .......................................... 123

Reconsidering Control Charts in Japan
K. Nishina, K. Kuzuya, N. Ishii ...................................... 136

Control Charts for the Number of Children Injured in Traffic Accidents
F. Pokropp, W. Seidel, A. Begun, M. Heidenreich, K. Sever ............ 151

A New Perspective on the Fundamental Concept of Rational Subgroups
M. R. Reynolds, Jr., Z. G. Stoumbos .................................. 172

Economic Advantages of CUSUM Control Charts for Variables
E. M. Saniga, T. P. McWilliams, D. J. Davis, J. M. Lucas ............. 185

Choice of Control Interval for Controlling Assembly Processes
T. Suzuki, T. Harada, Y. Ojima ....................................... 199

Generalization of the Run Rules for the Shewhart Control Charts
S. Yasui, Y. Ojima, T. Suzuki ........................................ 207

2.3 Monitoring

Robust On-Line Turning Point Detection. The Influence of Turning Point Characteristics
E. Andersson ......................................................... 223

Specification Setting for Drugs in the Pharmaceutical Industry
J. Iwersen, H. Melgaard .............................................. 249

Monitoring a Sequencing Batch Reactor for the Treatment of Wastewater by a Combination of Multivariate Statistical Process Control and a Classification Technique
M. Ruiz, J. Colomer, J. Melendez ..................................... 263

PART 3: OFF-LINE CONTROL

Data Mining and Statistical Control - A Review and Some Links
R. Göb ............................................................... 285

Optimal Process Calibration under Nonsymmetric Loss Function
P. Grzegorzewski, E. Mrowka .......................................... 309

The Probability of the Occurrence of Negative Estimates in the Variance Components Estimation by Nested Precision Experiments
Y. Ojima, S. Yasui, Feng L., T. Suzuki, T. Harada .................... 322

Statistical Methods Applied to a Semiconductor Manufacturing Process
T. Koyama ............................................................ 332

An Overview of Composite Designs Run as Split-Plots
G. Vining, S. Kowalski ............................................... 342
Author Index

Andersson, Eva, Dr., Göteborg University, Statistical Research Unit, PO Box 660, SE-405 30 Göteborg, Sweden e-mail:
[email protected] Begun, Alexander, Dipl.-Math., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany e-mail:
[email protected] Bodnar, Olha, Dr., Europe University Viadrina, Department of Statistics, Postfach 1786, D-15207 Frankfurt (Oder), Germany e-mail:
[email protected] Collani, Elart von, Prof. Dr., Universität Würzburg, Volkswirtschaftliches Institut, Sanderring 2, D-97070 Würzburg, Germany e-mail:
[email protected] Colomer, Joan, Prof. Dr., University of Girona, Department of Electronics, Computer Science and Automatic Control, Campus Montilivi, Building PIV, C.P. 17071 Girona, Spain e-mail:
[email protected] Davis, Darwin J., Ph.D., Prof., Department of Business Administration, College of Business and Economics, University of Delaware, 204 MBNA America Building, Newark, DE 19716, U.S.A. e-mail:
[email protected] Feng, Ling, Dr., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan e-mail:
[email protected] Göb, Rainer, Prof. Dr., Universität Würzburg, Institute for Applied Mathematics and Statistics, Sanderring 2, D-97070 Würzburg, Germany e-mail:
[email protected] Grzegorzewski, Przemyslaw, Ph.D., Polish Academy of Sciences, Systems Research Institute, Newelska 6, 01-447 Warsaw, Poland and Warsaw University of Technology, Faculty of Mathematics and Information Sciences, Plac Politechniki 1, 00-661 Warsaw, Poland e-mail:
[email protected]
Harada, Taku, Ph.D., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan e-mail:
[email protected] Heidenreich, Melanie, Dipl.-Math., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany e-mail:
[email protected] Hryniewicz, Olgierd, Prof. Dr., Systems Research Institute of the Polish Academy of Sciences and Warsaw School of Information Technology, Newelska 6, 01-447 Warsaw, Poland e-mail:
[email protected] Ishii, Naru, Nagoya Institute of Technology, Department of Civil Engineering and Systems Management, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan e-mail:
[email protected] Iwersen, Jørgen, Dr., Novo Nordisk A/S, Novo Alle, DK-2880 Bagsvaerd, Denmark e-mail:
[email protected] Knoth, Sven, Dr., Advanced Mask Technology Center, Postfach 110161, D-01330 Dresden, Germany e-mail:
[email protected] Kowalski, Scott M., Dr., Technical Trainer, Minitab, Inc., State College, PA 16801, U.S.A. Koyama, Takeshi, Prof. Dr., Tokushima Bunri University, Faculty of Engineering, Sanuki City, 769-2101, Japan e-mail:
[email protected] Kuzuya, Kazuyoshi, SQC Consultant, Ohaza-Makihara, Nukata-cho, Aichi 444-3624, Japan e-mail:
[email protected] Lucas, James M., Dr., J. M. Lucas and Associates, 5120 New Kent Road, Wilmington, DE 19808, U.S.A. e-mail:
[email protected] McWilliams, Thomas P., Ph.D., Prof., Drexel University, Department of Decision Sciences, Philadelphia, PA 19104, U.S.A. e-mail:
[email protected]
Melendez, Joaquim, Prof. Dr., University of Girona, Department of Electronics, Computer Science and Automatic Control, Campus Montilivi, Building PIV, C.P. 17071 Girona, Spain e-mail:
[email protected] Melgaard, Henrik, Dr., Novo Nordisk A/S, Novo Alle, DK-2880 Bagsvaerd, Denmark e-mail:
[email protected] Morais, Manuel C., Technical University of Lisbon, Department of Mathematics and Centre for Mathematics and its Applications, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal e-mail:
[email protected] Mrowka, Edyta, M.Sc., Polish Academy of Sciences, Systems Research Institute, Newelska 6, 01-447 Warsaw, Poland e-mail:
[email protected] Nishina, Ken, Prof. Dr., Nagoya Institute of Technology, Department of Techno-Business Administration, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan e-mail:
[email protected] Ojima, Yoshikazu, Prof. Dr., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan e-mail:
[email protected] Pacheco, Antonio, Technical University of Lisbon, Department of Mathematics and Centre for Mathematics and its Applications, Instituto Superior Tecnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal e-mail:
[email protected] Palcat, Frank, Measurement Canada, Ottawa, Ontario, K1A 0C9, Canada e-mail:
[email protected] Pokropp, Fritz, Prof. Dr., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany e-mail:
[email protected] Reynolds Jr., Marion R., Prof. Dr., Virginia Polytechnic Institute and State University, Department of Statistics, Blacksburg, VA 24061-0439, U.S.A. e-mail:
[email protected] Ruiz, Magda, Prof., University of Girona, Department of Electronics, Computer Science and Automatic Control, Campus Montilivi, Building PIV, C.P. 17071 Girona, Spain e-mail:
[email protected]
Saniga, Erwin M., Prof. Dr., University of Delaware, Department of Business Administration, Newark, DE 19716, U.S.A. e-mail:
[email protected] Schmid, Wolfgang, Prof. Dr., Europe University Viadrina, Department of Statistics, Postfach 1786, D-15207 Frankfurt (Oder), Germany e-mail:
[email protected] Seidel, Wilfried, Prof. Dr., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany e-mail:
[email protected] Sever, Krunoslav, Dipl.-Math., Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Institut für Statistik und Quantitative Ökonomik, Holstenhofweg 85, D-22043 Hamburg, Germany e-mail:
[email protected] Stoumbos, Zachary G., Prof. Dr., Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8054, U.S.A. e-mail:
[email protected] Suzuki, Tomomichi, Ph.D., Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan e-mail:
[email protected] Vining, G. Geoffrey, Prof. Dr., Virginia Tech, Department of Statistics, Blacksburg, VA 24061, U.S.A. e-mail:
[email protected] Yasui, Seiichi, Science University of Tokyo, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, Japan e-mail:
[email protected]
Part 1 General Aspects of SQC Methodology
How Some ISO Standards Complicate Quality Improvement

Elart von Collani¹ and Frank A. Palcat²

¹ University of Würzburg, Sanderring 2, D-97070 Würzburg, Germany,
[email protected]
² Measurement Canada, Standards Building, Holland Avenue, Ottawa, Canada, palcat.frank@ic.gc.ca

Summary. The practice of industrial quality control is often defined by ISO standards, which are considered to represent the state of the art in the relevant technology and science. At the same time, industrial enterprises make great efforts to develop and implement strategies for continuous quality improvement in all parts of their organizations, focussing on reducing waste and producing better quality at lower costs in order to stay in business in a globally-competitive marketplace. In the first part of this paper, the aims of some relevant ISO standards for controlling quality are compared with the aims of a strategy for continuous quality improvement. As it turns out, the two aims are hardly compatible, and using ISO standards for controlling quality may constitute a major barrier to quality improvement. The second part outlines procedures for quality control which support quality improvement and, therefore, are more appropriate in modern industrial environments than the existing ISO standards, which are essentially still based on the thinking surrounding the needs of the US armed forces during World War II and the first wave of progress in quality control over 60 years ago.
1 International Organization for Standardization

ISO stands for the International Organization for Standardization, a body that has released more than 14000 standards in all areas of life. The ISO internet homepage provides the following information:

"If there were no standards, we would soon notice. Standards make an enormous contribution to most aspects of our lives - although very often, that contribution is invisible. ISO (International Organization for Standardization) is the world's largest developer of standards. Although ISO's principal activity is
the development of technical standards, ISO standards also have important economic and social repercussions. ISO standards make a positive difference, not just to engineers and manufacturers for whom they solve basic problems in production and distribution, but to society as a whole. ISO standards contribute to making the development, manufacturing and supply of products and services more efficient, safer and cleaner. ISO standards are technical agreements which provide the framework for compatible technology worldwide."

Accordingly, without standards, and in particular ISO standards, the lives of individuals and societies throughout the world would be at a minimum very cumbersome. This is an undeniable fact and everybody should acknowledge the value of international standardization and, in particular, ISO's very important contribution in this regard. However, this does not mean that, among the many thousands of different standards in existence, all of them achieve their objectives.

1.1 ISO Standards
At present, ISO standards are divided into 40 different sections, starting with "Generalities, Terminology, Standardization, Documentation" and ending with "Domestic and commercial equipment, Entertainment, Sports." This paper deals with the Section 07 heading "Mathematics, Natural Sciences," and more specifically with Subsection 07.020, representing mathematics and entitled "Mathematics: Application of statistical methods in quality assurance." Natural sciences are represented by subsections for "Physics, Chemistry," for "Astronomy, Geodesy, Geography," for "Geology, Meteorology, Hydrology," for "Biology, Botany, Zoology" and for "Microbiology." The fact that standards dealing with statistical quality assurance are listed under the caption "mathematics" is quite strange, as no mathematician would agree that quality assurance is a special branch of mathematics. This leads to the question as to which meaning of "statistics" is assumed by the relevant parts of the ISO standards. Unfortunately, this question is not addressed in the three voluminous standards entitled "Statistics - Vocabulary and symbols." However, there is a short note in the 2003 Business Plan of the technical committee responsible for Subsection 07.020, i.e., ISO TC 69 "Applications of statistical methods." There it is said:

"Statistics is the science that develops probabilistic models, based on collection of data, that are used to optimize decisions under uncertainty and to forecast the impact of selected decisions. Thus, statistics contributes to all phases of the product life cycle such as market needs
assessment during conception, optimal specification development under cost and quality constraints during development, process control and optimization in delivery, and customer satisfaction assessment."

Without intending to be captious, this definition of statistics falls short of being factual. Probabilistic models are developed in probability theory, and this development is not at all based on data or the collection of data, but rather on mathematical principles. Moreover, it is unclear how statistics optimizes any decision, and looking at textbooks of statistics reveals that in a majority of them there are no statistical methods for forecasting. Generally, statistics is called a methodology, as stated earlier in the same Business Plan: "Statistical methodology coming from the mathematical branch of probability theory ...," which again is not correct. Statistics came into being earlier than and independently of probability theory. One might be inclined to ignore this looseness in the handling of concepts and definitions; however, as we will see, it is symptomatic of the underlying problem and has immediate consequences regarding the developed International Standards.

1.2 General Principles
The development of a standard is supposed to follow some general principles that are stated in ISO/IEC Directives, Part 2: "Rules for the structure and drafting of International Standards." Section 4 contains the general principles in 4.1 and 4.2:

4.1 Objective
The objective of documents published by ISO and IEC³ is to define clear and unambiguous provisions in order to facilitate international trade and communication. To achieve this objective, the document shall
- be as complete as necessary within the limits specified by its scope,
- be consistent, clear and accurate,
- take full account of the state of the art (see 3.11)⁴,
- provide a framework for future technological development, and
- be comprehensible to qualified persons who have not participated in its preparation.

4.2 Performance approach
Whenever possible, requirements shall be expressed in terms of performance rather than design or descriptive characteristics. This approach leaves maximum freedom to technical development. Primarily those characteristics shall be included that are suitable for worldwide (universal) acceptance. Where necessary, owing to differences in legislation, climate, environment, economies, social conditions, trade patterns, etc., several opinions may be indicated.

³ International Electrotechnical Commission
⁴ ISO/IEC Directives, Part 2: "State of the art: developed state of technical capability at a given time as regards products, processes and services, based on the relevant consolidated findings of science, technology and experience."

In the following, the objectives of some standards frequently used in industrial quality control are analyzed and, moreover, the conformity of the standards with the above stated general principles is examined.
2 AQL-Based Sampling Plan Standards

The most popular types of lot-by-lot acceptance sampling plans are the so-called AQL-based sampling plan standards (hereafter referred to as AQL-plans). There are two different types. The first type is classified as "inspection by attributes" and is currently represented by:
- ISO 2859-0:1995 Sampling procedures for inspection by attributes - Part 0: Introduction to the ISO 2859 attribute sampling system.
- ISO 2859-1:1999 Sampling procedures for inspection by attributes - Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection.

The second type is classified as "inspection by variables" and is currently represented by:

- ISO 3951:1989 Sampling procedures and charts for inspection by variables for percent nonconforming.
2.1 History of AQL-Plans

Banks [3] gives a historical overview of the evolution of quality control from ancient times, including future trends. He gives the following account regarding the introduction of AQL-plans during World War II:
" World War I1 caused a rapid expansion in industry connected with the war effort. The attempts t o meet the large demands for war material resulted in increased employment of unskilled personnel in the manufacturing industries. Inevitably, the quality of goods fell. Something had to be done to halt this degradation. Quality Control experienced a new impetus and came of age. The military played a significant role in this maturation. In 1942, the Army's Office of the Chief Ordnance came out with the "Standard Inspection Procedures," whose development was largely due to G. Edwards, H. Dodge, and G. Gause. Romig and Torrey also provided assistance in this effort. These procedures also contained sampling tables, which were based on an acceptable quality level (AQL)."
During the 1950s, these sampling procedures were further developed and resulted finally in the well-known MIL-STD-105D, which is the forerunner of ISO 2859. In 1957, MIL-STD-414 was released and is essentially identical in principle to the current ISO 3951. Thus, it can be stated that the AQL-plans were basically developed for the US armed forces and their development was largely concluded in the 1950s. Both of these military standards had the dual objective of protecting the soldiers from poor quality while providing them with sufficient material for their missions. Thus, methods were developed in order to reject product of such poor quality that a delivery to the troops would result in an incalculable danger. In addition, the methods were intended to impose the pressure of lot rejection on suppliers in order to motivate them to improve their production processes if inferior quality was being provided. This situation led to the development of the concept known as Acceptable Quality Level (AQL) and the associated AQL-plans (or schemes). In Duncan [13] the following description is given:

"The focal point of MIL-STD-105D is the Acceptable Quality Level or AQL. In applying the standard it is expected that in a conference (at a high level) between a supplier and a military agency it will be made clear to the supplier what the agency considers to be an acceptable quality level for a given product characteristic. It is expected that the supplier will be submitting for inspection a series of lots of this product, and it is the purpose of the sampling procedure of MIL-STD-105D to so constrain the supplier that he will produce product of AQL quality. This is done not only through the acceptance and rejection of a particular sampling plan but by providing for a shift to another, tighter sampling plan whenever there is evidence that the contractor's product has deteriorated from the agreed upon AQL target."

In practice, once a supplier had succeeded in achieving a production process producing continuously at the AQL, the goal was reached and no incentive for any further improvement was usually considered necessary.

2.2 Definition of AQL
The notion that the "AQL" would be the main point of negotiation between the military and the supplier was ambiguous from the beginning. Originally, the abbreviation stood for "acceptable quality level," however, in [24] this later changed to "acceptance quality limit5." Some of the various definitions given for AQL over time are listed, starting with the original definition given at the August 1942 Ordnance Control training conference. '3.4.6.15 acceptance quality limit AQL
worst tolerable product quality level
1942 Ordnance Control training conference [12]: "Acceptable Quality Level (AQL) - the maximum percent defective which can be considered satisfactory as a process average; that is, it is the poorest quality which a facility can be permitted continually to present for acceptance."

Dictionary of Statistical Terms (1960) [16]: "The proportion of effective units in a batch which is regarded as desirable by the consumer of the batch; the complement of the proportion of defectives which he is willing to tolerate."

Quality Control Handbook (1962) [15]: "Closely related to classifying of characteristics is the setting of the tolerable per cent defective for each of the defect classes. For vendor inspection this is commonly expressed as an acceptable quality level (AQL). Whereas classification indicates the relative seriousness of function of each characteristic class, the AQL specifies quantitatively the percentage of nonconformance which will be acceptable to the buyer in a 'mass' (or quantity) of product. In effect, the buyer is agreeing ahead of time that he cannot expect perfect product and that he will consider the purchase contract fulfilled if the degree of nonconformance in a lot is no worse than the specified level."

American National Standard (1978) [1]: "The maximum percentage or proportion of variant units in a lot or batch that, for the purpose of acceptance sampling, can be considered satisfactory as a process average."

American National Standard (1981) [2]: "The AQL is the maximum percent nonconforming (or the maximum number of nonconformities per 100 units) that, for purpose of sampling inspection, can be considered satisfactory as a process average."

ISO 2859-1 (1989) [19]: "Acceptable quality level (AQL): When a continuous series of lots is considered, the quality level which for the purpose of sampling inspection is the limit of a satisfactory process average. The AQL is a parameter of the sampling scheme and should not be confused with the process average which describes the operating level of the manufacturing process. It is expected that the process average will be less than or equal to the AQL to avoid excessive rejections under this system."

Encyclopedia of Statistical Sciences (1981) [17]: "This is usually defined as the maximum percent defective (or the maximum number of defects per 100 units) that can be considered satisfactory for a process average."

Schilling (1982) [18]: "Note on the meaning of AQL. When a consumer designates some specific value of AQL for a certain defect or group of defects, he indicates to the supplier that his (the consumer's) acceptance sampling plan will accept the great majority of the lots or batches that the supplier submits, provided the process average level of percent defective (or defects per hundred units) in these lots or batches be no greater than the designated value of AQL. Thus, the AQL is a designated value of percent defective (or defects per hundred units) that the consumer indicates will be accepted most of the time by the acceptance sampling procedure to be used. The sampling plans provided herein are so arranged that the probability of acceptance at the designated AQL value depends upon the sample size, being generally higher for large samples than for small ones, for a given AQL. The AQL alone does not describe the protection to the consumer for individual lots or batches but more directly relates to what might be expected from a series of lots or batches, provided the steps indicated in this publication are taken. It is necessary to refer to the operating characteristic curve of the plan to determine what protection the consumer will have." It also contains the following limitation: "Limitation. The designation of an AQL shall not imply that the supplier has the right to supply knowingly any defective unit of product."

In particular, Schilling's longish explanation demonstrates that the concept is not at all clear or unambiguous. AQL mixes the process quality level with the proportion nonconforming of a stream of lots on the one hand, and of a single lot on the other. Moreover, it relates all three of them to a set of sampling plans determined within the framework of a sampling scheme. The ambiguity of the concept was well known, e.g., Duncan [14] notes "Intense differences of opinion as to the desirability of certain sampling schemes sometimes arise because the various parties are not in agreement as to the meaning of the AQL." It would appear that the concept of an AQL could only be born in a military atmosphere characterized by conflicting interests and an inclination towards obscure solutions. The fact that civilian organizations for standardization have uncritically adopted the statistical quality concepts developed for the US Military reflects the decisive role the US Military had gained in this area and the unclear position of "statistics" within science and technology.

2.3 Objectives of AQL-Plans
According to ISO 2859-1:1989, an AQL-plan standard is not "intended for estimating lot quality or for segregating lots", but in ISO 3951:1989 the objective is described as follows:

"The object of the method laid down in this International Standard is to ensure that lots of an acceptable quality have a high probability of acceptance and that the probability of not accepting inferior lots is as high as possible. In common with ISO 2859, the percentage of nonconforming products in the lots is used to define the quality of these lots and of the production process in question."

The formulation "lots of an acceptable quality have a high probability of acceptance" is neither clear nor accurate and reflects the obscure performance
of the sampling plans. In fact it is not possible to state any consistent performance criteria for the sampling plans contained in the International Standard, as each single sampling plan performs differently. The claim "the probability of not accepting inferior lots is as high as possible" incorrectly suggests an optimisation procedure. The claim is irrelevant and illustrates that there is no "full account of the state of the art." Of course, it makes sense to use the percentage of nonconforming products in a given lot to define its quality. One could also use this percentage to make inferences regarding the quality of the production process in question, but it makes no sense to use the percentage to define the quality of the production process. Moreover, the assumed close relationship between the percentage of nonconforming product in the lot and the production process restricts the application of these International Standards to situations where the lots are made up of consecutively produced items from a production process. Bearing in mind the original situation during World War II, the primary aim of AQL-plans was to reject all those lots that, if delivered, would endanger one's own soldiers. The detection of poor quality is not made directly by measuring it, but indirectly by showing in some way that the claim of acceptable quality (quantified by the agreed-to AQL) is wrong. The second, more implicit aim is to have the producer improve the process in question until process quality has reached the agreed-upon AQL value. As mentioned earlier, there are two types of AQL-plans. The second type is given by "ISO 3951:1989 Sampling procedures and charts for inspection by variables for percent nonconforming." As early as 1990, it had already been shown [4] that these standards cannot in principle give the same protection as the corresponding attribute plans of ISO 2859. These findings were discussed, for example, during the 4th Workshop in Baton Rouge [5] and subsequently investigated by Göb [9, 10, 11] and other researchers. Nevertheless, the ISO 3951 standard has not been withdrawn and is still being recommended for use by national and international bodies throughout the world.
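The protection provided by any attributes sampling plan is summarized by its operating characteristic (OC) curve, i.e. the probability of accepting a lot as a function of its fraction nonconforming. As a purely illustrative aid (not part of the original paper), the following minimal Python sketch computes such a curve for a generic single sampling plan under the usual binomial model; the plan parameters n and c are assumptions chosen for illustration and are not taken from ISO 2859-1.

from math import comb

def oc_probability(p, n, c):
    """Probability of accepting a lot under a single sampling plan by attributes
    (sample size n, acceptance number c), assuming the number of nonconforming
    items in the sample is binomially distributed."""
    return sum(comb(n, d) * p**d * (1.0 - p)**(n - d) for d in range(c + 1))

if __name__ == "__main__":
    n, c = 125, 3  # illustrative plan, not a tabulated ISO 2859-1 plan
    for p in (0.005, 0.01, 0.02, 0.04, 0.08):
        print(f"fraction nonconforming {p:.3f}: P(accept) = {oc_probability(p, n, c):.3f}")

Such a curve makes explicit what the verbal AQL definitions above leave vague: the protection depends on the whole plan (n, c), not on the AQL value alone.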
3 Control Chart Standards

Besides the AQL-plan standards, the next most widely applied International Standards for quality control are those for control charts and, in particular, for the so-called Shewhart charts contained in ISO 8258. In the introduction of ISO 8258 [21], we may read:

"The traditional approach to manufacturing is to depend on production to make the product and on quality control to inspect the final product to screen out items not meeting specifications.
The object of statistical process control is to serve to establish and maintain a process at an acceptable and stable level so as to ensure conformity of products and services to specified requirements. The major statistical tool to do this is the control chart, which is a graphical method for presenting and comparing information based on a sequence of samples representing the current state of a process against limits established after consideration of inherent process variability."

It appears that the standard is based on a rather out-dated process perception with respect to producing quality on the one hand and the relevant quality aims on the other. Fortunately, the practice of "inspecting quality into product" was abandoned long ago. Thus, it has not become a tradition that is handed down through generations and still affects the present.

3.1 History of Control Charts
In 1924, Western Electric's Bell Telephone Laboratories established its Inspection Engineering Department. One of its key members was Walter A. Shewhart, who proposed the first control chart for monitoring process quality that same year. Shewhart distinguished between chance causes and assignable causes of variation, where the latter can be assigned to a special disturbance and the former cannot. Based on this classification of variation, he searched for a method that would be able to distinguish between the two types of variation and could thus detect the occurrence of assignable causes, while tolerating the chance causes. Removing an assignable cause leads to an increase in profit, in contrast to an attempt to remove chance causes, which is not economical. During the 1950s, W. Edwards Deming brought the concept of control charts to Japan, and the tremendous success of the transformed Japanese industry constituted the final breakthrough for control charting and, more generally, for statistical process control.

3.2 Objectives of Control Charts
The objective of a control chart is described in ISO 8258 as follows:

"The object of statistical process control is to serve to establish and maintain a process at an acceptable and stable level so as to ensure conformity of products and services to specified requirements. The major statistical tool used to do this is the control chart, which is a graphical method of presenting and comparing information based on a sequence of samples representing the current state of a process
against limits established after consideration of inherent process variability. The control chart method helps first to evaluate whether or not a process has attained, or continues in, a state of statistical control at the proper specified level, and then to obtain and maintain control and a high degree of uniformity in important process or service characteristics by keeping a continuous record of the quality of the product while production is in progress. The use of a control chart and its careful analysis leads to a better understanding and improvement of the process."

Thus, the general aim of control charts is to establish and maintain a satisfactory process state. This is done in a manner similar to that of AQL-plans. Control charts do not prove that an assignable cause has occurred; rather, the claim that no assignable cause has occurred is shown to be false. As an important consequence of this indirect procedure, no substantial statement can be made about the probability of detecting an assignable cause.
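To make the mechanics behind this discussion concrete, here is a minimal sketch (not part of the original paper) of a classical Shewhart chart for subgroup means with 3-sigma limits. It assumes independent, normally distributed observations and known in-control parameters mu0 and sigma0; all numerical values are illustrative assumptions only.

import math
import random

def shewhart_xbar_limits(mu0, sigma0, n, k=3.0):
    """Centre line and k-sigma control limits for means of subgroups of size n,
    assuming the in-control mean mu0 and standard deviation sigma0 are known."""
    half_width = k * sigma0 / math.sqrt(n)
    return mu0 - half_width, mu0, mu0 + half_width

def signals(subgroup_means, lcl, ucl):
    """Indices of subgroup means outside the limits, i.e. points interpreted
    as evidence against the claim that only chance causes are acting."""
    return [i for i, m in enumerate(subgroup_means) if m < lcl or m > ucl]

if __name__ == "__main__":
    random.seed(1)
    mu0, sigma0, n = 10.0, 0.2, 5  # illustrative in-control parameters
    lcl, cl, ucl = shewhart_xbar_limits(mu0, sigma0, n)
    # 20 in-control subgroups followed by 5 subgroups with a shifted mean
    # (an assignable cause in Shewhart's terminology).
    means = [sum(random.gauss(mu0, sigma0) for _ in range(n)) / n for _ in range(20)]
    means += [sum(random.gauss(mu0 + 0.4, sigma0) for _ in range(n)) / n for _ in range(5)]
    print(f"LCL = {lcl:.3f}, CL = {cl:.3f}, UCL = {ucl:.3f}")
    print("out-of-control signals at subgroups:", signals(means, lcl, ucl))

A point outside the limits does not prove that an assignable cause occurred; it merely renders the claim that only chance causes are at work implausible, which is exactly the indirect reasoning described above.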
4 Modern Industrial Processes

Since the 1940s and 1950s many things have changed in industry. Processes are better designed, controlled, and monitored, and automation is extremely advanced. Modern production processes have almost nothing in common with those manufacturing processes and conditions that gave rise to the development of AQL-based sampling plans and control charts. Nowadays, any important quality characteristic is continuously monitored and automatically controlled. Consequently, the produced quality when compared with the specifications is in general near perfect.

4.1 Role of AQL-Plans
As a matter of fact, none of the reasons that led to the development and application of AQL-plans by the US Department of Defense still exist. Nevertheless, the International Standards mentioned above, which are essentially identical to the Military Standards of the 1950s, are recommended and applied throughout the world, even though they constitute costly activities that are deprived of appropriate meaning. One could think that applying AQL-plans may be nonsensical but nevertheless harmless. However, any pro forma activity performed continuously will be inherently dissatisfying, will damage motivation, and will be counterproductive with respect to the true aims and requirements of modern industrial production processes. Clearly, in cases where near perfect quality is produced, it makes no sense to control to an acceptable quality level. However, this does not mean that
one can completely cease with product control activities. There are at least two reasons for continuing with product control:

- Legal liability regulations.
- Monitoring the actual quality level.

The first reason is clear and requires special methods that meet the legal requirements. The second reason will be discussed below. But in any case, the AQL-sampling schemes as offered by the ISO are not appropriate for either purpose, as AQL-sampling plans are neither adapted to meeting the legal requirements nor do they constitute measurement procedures for monitoring the actual quality level.

4.2 Role of Control Charts
In times where demand far exceeds supply and the number of suppliers is small, the primary focus of an enterprise is on product quantity. In situations where the consumer is as powerful as the US Military, the issue of an acceptable quality level (AQL) becomes important and certain quality requirements can be imposed on suppliers and their production processes. In an era of global competition, where supply far exceeds demand and suppliers are operating in a worldwide market, quantity and acceptable quality level lose their importance as measures of being successful in business. Success is determined by the ability to produce better product for less cost compared to a company's worldwide competitors. Reaching this objective means efforts must be increased to further improve processes and products in order to maintain a competitive edge. One consequence of this situation is the fact that only huge globally-operating companies may be able to afford the operational expenditures for research, development, and continuous improvement to achieve this objective. In this modern environment, production processes not only look completely different from those of the World War II era, but their requirements are fundamentally different, focussing on continuous improvement rather than simply achieving and maintaining an acceptable quality level. And in this new environment, any enterprise that intends to "rest on its laurels" knows that it will soon be out of business. Any strategy of continuous process improvement must consist of many co-ordinated activities, which, of course, include monitoring and controlling. There are many objectives of monitoring and controlling. The two most important are the following:

- Revealing room for improvement.
- Verifying improvements made.

Clearly, neither of these aims can be achieved by using control charts. Therefore, as in the case of using AQL-plans, performing insignificant activities is not harmless but counterproductive and should be abandoned.
5 New Generation of Standards

The existing International Standards for quality control cannot contribute towards a strategy of continuous quality improvement because they have been developed and designed for completely different and even conflicting objectives. Therefore, a question arises as to which International Standards are needed to cope with the present situation in modern industries. As already mentioned, there is a need for International Standards for acceptance control in order to meet the requirements set by legal liability regulations. These standards have no impact on continuous process improvement and, therefore, will not be discussed in this paper. Of course, more interesting from the stochastic point of view are the methods supporting continuous process improvement. Modern process requirements and recent scientific advancements necessitate and enable completely new standards for different areas of application. As a general guideline, it should be emphasized that the new generation of stochastic standards should follow the "performance approach" criterion stated as a general development principle but neglected by the existing ISO statistical standards, which specify design details but leave little room for technological and scientific developments. Hence, they do not constitute a "framework for future development," but rather hinder further developments. This is a consequence of the ISO statistical standards still closely following the designs of their military forerunners.

5.1 Stochastic Modelling
Because any appropriate method for monitoring or control must be based on a comprehensive stochastic model, International Standards based on stochastic modelling should be developed. The required models must be able to incorporate any available knowledge and express existing ignorance. Existing standards are generally not based on models that sufficiently reflect the existing uncertainty in a given situation. The majority of statistical standards assumes the normal approximation without a thorough justification. Consequently, it remains unknown whether or not the results are useful.
What is needed is a schematic procedure that starts with the collection of available knowledge and leads in a stepwise manner to a mathematical model describing the existing uncertainty while accounting for the available knowledge. The relevant principles for stochastic modelling, which should not be confused with modelling in probability theory, are derived and listed in some detail in [7]. The main difference between stochastic modelling and probabilistic modelling is the fact that the former closely follows the actual situation as given, while the latter proceeds according to mathematical principles, in particular, using limiting processes.

5.2 Stochastic Monitoring

Stochastic monitoring from the viewpoint of a continuous improvement strategy consists of continuously measuring the actual quality level for detecting room for improvement on the one hand and for directly verifying implemented improvements on the other. Therefore, measurement procedures are needed which enable variations in process quality level as related to environmental variations to be documented. The question as to how to measure process quality depends on the comprehensive process model as derived and need not necessarily consist of measuring the probability of nonconformance. Moreover, a process improvement need not necessarily lead to a decrease in the probability of nonconformance, but it may enhance the dependence structure of relevant characteristics. A method to construct the necessary measurement procedures based on the derived stochastic model is described in [8].

5.3 Stochastic Verification
Finally, the suitability of a stochastic model on the one hand and of implemented improvements on the other must be verified. To this end, appropriate prediction procedures are necessary, as the usefulness of any mathematical model can only be verified by comparing predictions based on the model with the actually occurring events. Again, the scientific principles for developing appropriate International Standards for prediction procedures to be used for model verification can be found in [8].
6 Conclusions

The relevant International Standards aim at quality assurance and are essentially sorting methods aimed at rejecting the hypothesis that quality is satisfactory. They are based on the concept of in-control and out-of-control states for the production process and can be looked upon as a means for preserving the in-control state or for maintaining an acceptable quality level. Almost all of the relevant standards descend from some earlier military standard and, consequently, the underlying thinking is associated with that earlier time period. The military standards were characterized by application instructions being specified in detail, while their actual performance characteristics remained obscure. This is contrary to ISO's general guidelines for standards development, which expect requirements to be performance-based rather than prescriptive. As a consequence, many of the statistical standards cannot be sufficiently well adapted to given situations. However, the main weakness of existing International Standards in the field of statistics is the fact that their objectives are obsolete. New requirements arising from the dramatic technological and communications revolution on the one hand and progressive globalisation on the other have completely changed the needs in industry. Therefore, a new generation of stochastic International Standards is needed to meet the demands resulting from the efforts of implementing continuous quality improvement strategies. In the development of this new generation of stochastic standards the opportunity should also be taken to change the ISO standards section from "mathematics" to "stochastics⁶" and develop a completely new, clear, and unambiguous terminology.
References

1. American National Standard (1978) Sampling Procedures and Tables for Inspection by Attributes. American Society for Quality Control, Wisconsin.
2. American National Standard (1981) Terms, Symbols and Definitions for Acceptance Sampling. American Society for Quality Control, Wisconsin.
3. Banks, J (1989) Principles of Quality Control. John Wiley & Sons, New York.
4. v. Collani, E (1991) A Note on Acceptance Sampling by Variables. Metrika 38: 19-36.
5. v. Collani, E (1992) The Pitfall of Variables Sampling. In: Lenz H-J, Wetherill GB, Wilrich P-Th (eds) Frontiers in Statistical Quality Control 4, Physica-Verlag, Heidelberg, 91-99.
he term "stochastics" originates from Greek, it was introduced by Jakob Bernoulli in his masterpiece "Ars conjectandi," which appeared 1713, and its meaning is "Science of Prediction"
6. v. Collani E, Dräger K (2001) Binomial Distribution Handbook for Scientists and Engineers. Birkhäuser, Boston.
7. v. Collani, E (2004) Theoretical Stochastics. In: v. Collani E (ed) Defining the Science of Stochastics. Heldermann Verlag, Lemgo, 147-174.
8. v. Collani, E (2004) Empirical Stochastics. In: v. Collani E (ed) Defining the Science of Stochastics. Heldermann Verlag, Lemgo, 175-213.
9. Göb, R (1996) An Elementary Model of Statistical Lot Inspection and its Application to Sampling by Variables. Metrika 44: 135-163.
10. Göb, R (1996) Test of Significance for the Mean of a Finite Lot. Metrika 44: 223-238.
11. Göb, R (2001) Methodological Foundation of Statistical Lot Inspection. In: Lenz H-J, Wilrich P-Th (eds) Frontiers in Statistical Quality Control 6, Physica-Verlag, Heidelberg, 3-24.
12. Dodge, HF (1969) Notes on the Evolution of Acceptance Sampling Plans, Part II. Journal of Quality Technology 1: 155-162.
13. Duncan, AJ (1965) Quality Control and Industrial Statistics. 3rd ed., Richard D. Irwin, Homewood.
14. Duncan, AJ (1972) Quality Standards. Journal of Quality Technology 4: 102-109.
15. Juran, JM (1962) Quality Control Handbook. 2nd ed. McGraw-Hill, New York.
16. Kendall MG, Buckland WR (1960) A Dictionary of Statistical Terms. 2nd ed., Oliver and Boyd, Edinburgh.
17. Kotz S, Johnson NL (1981) Encyclopedia of Statistical Sciences, Vol. 1. John Wiley & Sons, New York.
18. Schilling, EG (1982) Acceptance Sampling in Quality Control. Marcel Dekker, New York.
19. ISO 2859-1 (1989) Sampling procedures for inspection by attributes - Part 1. International Organization for Standardization, Geneva.
20. ISO 3951 (1989) Sampling procedures and charts for inspection by variables for percent nonconforming. International Organization for Standardization, Geneva.
21. ISO 8258 (1991) Shewhart control charts. International Organization for Standardization, Geneva.
22. ISO TC 69 Application of statistical methods (2003) Business Plan 2003. International Organization for Standardization, Geneva.
23. Draft International Standard ISO/DIS 3534-1 (2003) Statistics - Vocabulary and symbols - Part 1: Probability and General Statistical Terms. International Organization for Standardization, Geneva.
24. Draft International Standard ISO/DIS 3534-2 (2003) Statistics - Vocabulary and symbols - Part 2: Applied statistics. International Organization for Standardization, Geneva.
Part 2 On-line Control
2.1 Sampling Plans
Optimal Two-Stage Sequential Sampling Plans by Attributes

Olgierd Hryniewicz

Systems Research Institute, Newelska 6, 01-447 Warsaw, Poland
hryniewi@ibspan.waw.pl
Summary. Acceptance sampling plans have been widely used in statistical quality control for several decades. However, when nearly perfect quality is needed, their practicability is questioned by practitioners because of the required large sample sizes. Moreover, the majority of well-known sampling plans allow nonconforming items in a sample, and this contradicts the generally accepted "zero defect" paradigm. Sequential sampling plans, introduced by Wald [7], assure the lowest possible sample size. Thus, they are applicable especially for sampling products of high quality. Unfortunately, their design is rather complicated. In the paper we propose a simple, and easy to design, special case of sequential sampling plans by attributes, named CSeq-1 sampling plans, having acceptance numbers not greater than one. We analyze the properties of these plans, and compare them to the properties of other widely used sampling procedures.
1 Introduction

Procedures of acceptance sampling for attributes have been successfully used for nearly eighty years. Sampling plans (single, double, multiple, and sequential) have been proposed by prominent statisticians, and published in many papers, textbooks and international standards. During the last ten years, however, their usage has been criticized by practitioners, who have pointed out their many deficiencies. First of all, for quality levels that are considered appropriate for contemporary production processes, the sampling plans published in international standards require excessively many items to be sampled. Secondly, the majority of well-known sampling plans allow nonconforming items in a sample. This is in conflict with the zero-defect philosophy which has become a paradigm for quality managers and engineers. Practitioners prefer to use sampling plans with a low sample size and with the acceptance number set to zero. Moreover, they require, if asked, very good protection against lots of bad quality. It has to be stated clearly, however, that the
fulfillment of all these requirements is, except for a few cases, practically impossible. Therefore, there is a need for simple sampling procedures that are characterized by relatively low average sample sizes and that do not frighten practitioners by the magnitude of their acceptance numbers. In many cases the maximum acceptance number is equal to one. One nonconforming item in a sample may be accepted in all cases where perfect quality is not attainable due to economic or physical limitations, but high quality, expressed as a low fraction nonconforming, is required. It is a well-known fact that sequential sampling plans are characterized by the lowest possible average sample size among all sampling plans that fulfill certain statistical requirements. Sequential sampling plans were introduced by Abraham Wald [7, 8] during the Second World War. Similar results were independently obtained by Bartky [4] and Barnard [3] at approximately the same time. Speaking in terms of mathematical statistics, sequential sampling plans can be regarded as sequential probability ratio tests (SPRT) for testing a simple hypothesis θ = θ_0 against an alternative θ = θ_1 > θ_0, where θ is the parameter of a probability distribution f(x, θ) that describes the observed random variable X. In the case of sampling by attributes, when the random variable of interest is described by the binomial distribution, the sequential sampling plan is used to test the null hypothesis p = p_1 against the alternative hypothesis p = p_2, where p is the parameter of the binomial distribution (the probability of the occurrence of the event of interest in a single trial). Under sequential sampling, consecutive items are drawn from a given population (process or lot) one at a time, until a decision is made whether to accept the null hypothesis or to reject it. In the context of quality control, the sampled items are judged either as conforming or nonconforming to certain requirements. For the sake of simplicity we will use this terminology, knowing that in a different context the meaning of the result of sampling an item might be different. Let d(n) denote the cumulative number of nonconforming items found in n sampled items. According to Wald's original proposal, we accept the null hypothesis (in quality control this means that we accept the sampled lot or process) if the plot of d(n) crosses from above to below the acceptance line

A(n) = -h_A + g n.   (1)

When the plot of d(n) crosses from below to above the rejection line

R(n) = h_R + g n,   (2)

the null hypothesis is rejected (so we reject the sampled lot or process). Otherwise, the next item is taken, and the procedure continues until an acceptance or rejection decision is made. The sampling plan parameters h_A, h_R and g are called, respectively, the acceptance intercept, the rejection intercept, and the slope of the decision lines.
Let L(p) be the probability of acceptance (of the null hypothesis) when the value of the parameter of the considered binomial distribution is equal to p. We usually assume that the sampling plan has to fulfill the following two requirements:

L(p_1) ≥ 1 - α,   (3)

L(p_2) ≤ β.   (4)

In statistical quality control this means that we require a high acceptance probability (≥ 1 - α) for a product of good quality (with a fraction nonconforming equal to p_1), and a low acceptance probability (≤ β) for a product of bad quality (with a fraction nonconforming equal to p_2). The probabilities α and β are usually called the producer's and consumer's risks, respectively. When the requirements (3) and (4) have the form of equalities, it has been proven that the sequential sampling procedure is optimal with respect to the expected sample size [8]. This means that a sequential sampling plan is characterized by the minimal average sample size among all possible sampling plans that fulfill these requirements. It has to be noted that requirements (3) and (4) in the form of equalities may not be fulfilled in the case of sampling by attributes when we sample from a finite population without replacement. In the case of sampling from an infinite population (or sampling with replacement) Wald [7] found an approximation for the OC function of the sequential sampling plan, and showed that its parameters may be calculated from the following formulae:

g = ln[(1 - p_1)/(1 - p_2)] / ln[p_2 (1 - p_1) / (p_1 (1 - p_2))],   (5)

h_A = ln[(1 - α)/β] / ln[p_2 (1 - p_1) / (p_1 (1 - p_2))],   (6)

h_R = ln[(1 - β)/α] / ln[p_2 (1 - p_1) / (p_1 (1 - p_2))].   (7)
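A minimal Python sketch of these calculations may be helpful; it is an illustration of ours (function names are arbitrary, not from the paper) that evaluates the standard Wald formulae (5)-(7) and applies the decision lines (1) and (2) to a cumulative count.

```python
import math

def wald_plan(p1, p2, alpha, beta):
    """Slope and intercepts of Wald's sequential sampling plan for attributes."""
    denom = math.log((p2 * (1 - p1)) / (p1 * (1 - p2)))
    g = math.log((1 - p1) / (1 - p2)) / denom    # slope of both decision lines
    h_a = math.log((1 - alpha) / beta) / denom   # acceptance intercept
    h_r = math.log((1 - beta) / alpha) / denom   # rejection intercept
    return g, h_a, h_r

def decide(d, n, g, h_a, h_r):
    """Sequential decision after n items with d nonconforming: accept, reject, or continue."""
    if d <= -h_a + g * n:
        return "accept"
    if d >= h_r + g * n:
        return "reject"
    return "continue"

if __name__ == "__main__":
    g, h_a, h_r = wald_plan(p1=0.01, p2=0.025, alpha=0.05, beta=0.10)
    print(f"g = {g:.5f}, h_A = {h_a:.3f}, h_R = {h_r:.3f}")
    print(decide(d=0, n=100, g=g, h_a=h_a, h_r=h_r))  # zero nonconforming after 100 items
```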
For many years Wald's sequential sampling plans for attributes have been offered to practitioners as the procedures with the smallest average sample size. In practice, however, they have not been used in their original noncurtailed version. Despite the theoretical fact that the sequential sampling plan has the smallest average sample size, it may occur that the decision of acceptance or non-acceptance is made at a very late stage of sampling, i.e. after sampling a large and unknown number of items. Such a situation
may occur when the quality of a lot (measured as a fraction nonconforming) is close to g. Practitioners do not like such situations, and want to know in advance the largest possible cumulative sample size. In order to meet these requirements a maximum cumulative sample size is introduced. When the cumulative sample size reaches this curtailment value without a decision having been made, the lot is accepted if the cumulative number of nonconforming items found so far is not greater than a given acceptance number at curtailment. The analysis of the statistical properties of curtailed sequential sampling plans requires the application of numerical computations. Such a methodology was proposed in the paper by Aroian [1]. Woodall and Reynolds [9] proposed a general model for the analysis of curtailed sampling plans using a discrete Markov chain representation. They also proposed an approximate method for finding the optimal curtailment. However, the implementation of their theoretical results is rather difficult for practitioners, as it requires special statistical skills and specialized software (for the calculation of eigenvectors of matrices). The work on the practical implementation of curtailed sampling plans for attributes was initiated by Baillie [2], who examined the statistical characteristics of curtailed sequential sampling plans from the international standard ISO 8422 [6] with parameters calculated according to Wald's formulae. In this paper Baillie noticed that the actual risks of the sampling plans proposed in this standard are substantially different from the nominal ones (5% for α and 10% for β). This result has important practical consequences, as the plans from this standard are practically the only sequential sampling plans for attributes that are used in practice. Moreover, Hryniewicz [5] has shown that curtailed sequential sampling plans for attributes always have a smaller average sample size than non-curtailed ones. These and other similar results have prompted a group of researchers from the ISO/TC 69 committee to seek curtailed sequential sampling plans for attributes that fulfill (3) and (4) and are optimal in a certain sense. During the work on a new version of the ISO 8422 International Standard on sequential sampling plans for attributes, many numerical experiments have revealed that some optimal curtailed sampling plans have the acceptance number at the curtailment equal to one. We believe that such sampling plans may be acceptable for practitioners who are looking for cost-efficient procedures with decision risks under control, and a low number of accepted nonconforming items in a sample. In the second section of the paper we introduce such a simple acceptance sampling procedure, which in fact is a special case of the well-known curtailed sequential sampling plan described above. For this procedure we present formulae for the calculation of its statistical properties. These formulae are used in the third section of the paper, where we propose algorithms for the design of sampling plans that fulfill certain practical requirements. Theoretical results are illustrated with some numerical examples. In the fourth section of
the paper we compare the newly introduced sampling plans with Wald's sequential sampling plans given in the international standard ISO 8422 [6], and with curtailed single sampling plans having the acceptance number equal to zero.
2 Curtailed sequential sampling plans with acceptance
number not greater than one (CSeq-1)

Let us introduce the proposed sampling plan by the description of its operation, which is typical for all sequential sampling plans by attributes. Sample items are drawn at random and inspected one by one, and the cumulative count (the total number of nonconforming items or nonconformities) d(n) is recorded. If, at a given stage, the cumulative count fulfills the acceptability criterion, i.e. it is not greater than a certain acceptance number, then the inspection is terminated, and the inspected lot or process is accepted. If, on the other hand, the cumulative count is equal to a rejection number, then the inspection is terminated, and the inspected lot or process is rejected. If neither of these decisions can be made, then an additional item is sampled and inspected. Let us assume that the inspected lot or process can be accepted only in two cases: either when the cumulative count for a certain clearance cumulative sample size n_0 is equal to zero, or when the cumulative count for the curtailment value n_c, such that n_c > n_0, is equal to one. In the case of n_c > n_0 let the rejection number at the curtailment be set to two, and assume that it is also equal to two for all cumulative sample sizes. When n_c = n_0, the rejection number is set to one for all cumulative sample sizes. Hence, the proposed sampling plan (CSeq-1) is defined by two parameters (n_0, n_c), n_0 ≤ n_c. Note, however, that in the case of n_c = n_0 the CSeq-1 sampling plan is the same as the curtailed single sampling plan with acceptance number equal to zero.

Let us assume that the probability of drawing a nonconforming item is constant for all inspected items, and equal to p. This means that in the case of sampling without replacement, as is usually done in practice, we assume infinite lots (sampling from a process) or sufficiently large lots. In such a case the number of nonconforming items in the sample is distributed according to the binomial distribution. Thus, the probability of acceptance when i items have been inspected is given by

P_A(n_0) = (1 - p)^{n_0},   P_A(n_c) = n_0 p (1 - p)^{n_c - 1},   P_A(i) = 0 otherwise.   (8)

The probability of rejection is given by

P_R(i) = (i - 1) p^2 (1 - p)^{i - 2} for 2 ≤ i ≤ n_0,   P_R(i) = n_0 p^2 (1 - p)^{i - 2} for n_0 < i ≤ n_c,   P_R(i) = 0 otherwise.   (9)

Let n_1 = n_c - n_0. Then the OC function of the CSeq-1 sampling plan is given by a simple formula

L(p) = (1 - p)^{n_0} + n_0 p (1 - p)^{n_c - 1}.   (10)

If n_1 = 0, then L(p) = (1 - p)^{n_0}. Another important characteristic of a sampling plan, the average sample size, is given by the following expression

n̄(p; n_0, n_c) = Σ_{i=1}^{n_c} i [P_A(i) + P_R(i)]
             = n_0 (1 - p)^{n_0} + n_c n_0 p (1 - p)^{n_c - 1} + Σ_{i=2}^{n_0} i (i - 1) p^2 (1 - p)^{i - 2} + Σ_{i=n_0+1}^{n_c} i n_0 p^2 (1 - p)^{i - 2}.   (11)

Both formulae (10) and (11) will be used for the design of the CSeq-1 sampling plan.
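A short Python sketch (ours, with illustrative function names) evaluates (10) and the summation form of (11) directly from the description of the plan's operation; the example call uses the plan (247, 500), which appears in Table 1 for p_2 = 0.01 with β = 0.1.

```python
def cseq1_oc(p, n0, nc):
    """OC function (10): accept with zero nonconforming at n0, or with exactly one at nc."""
    if nc == n0:
        return (1 - p) ** n0
    return (1 - p) ** n0 + n0 * p * (1 - p) ** (nc - 1)

def cseq1_asn(p, n0, nc):
    """Average sample size (11), obtained by summing over the possible stopping stages."""
    asn = n0 * (1 - p) ** n0                      # acceptance at the clearance point n0
    if nc == n0:                                  # curtailed single sampling with c = 0
        return asn + sum(i * p * (1 - p) ** (i - 1) for i in range(1, n0 + 1))
    asn += nc * n0 * p * (1 - p) ** (nc - 1)      # acceptance at the curtailment point nc
    for i in range(2, n0 + 1):                    # rejection: second nonconforming item at stage i <= n0
        asn += i * (i - 1) * p * p * (1 - p) ** (i - 2)
    for i in range(n0 + 1, nc + 1):               # rejection: one nonconforming among the first n0, second at i
        asn += i * n0 * p * p * (1 - p) ** (i - 2)
    return asn

if __name__ == "__main__":
    print(round(cseq1_oc(0.01, 247, 500), 4))     # should not exceed beta = 0.1
    print(round(cseq1_asn(0.01, 247, 500), 1))
```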
3 Design of CSeq-1 sampling plans

Let us assume that the CSeq-1 sampling plan has to fulfill the two typical requirements given by (3) and (4). Thus, for given p_1 and p_2 such that p_1 < p_2, it can be seen from (10) that the parameters n_0 and n_c = n_0 + n_1 have to be chosen in such a way that the following two inequalities hold:

(1 - p_1)^{n_0} [1 + n_0 p_1 (1 - p_1)^{n_1 - 1}] ≥ 1 - α,  n_1 > 0,   (12)

and

(1 - p_2)^{n_0} [1 + n_0 p_2 (1 - p_2)^{n_1 - 1}] ≤ β,  n_1 > 0.   (13)

If n_1 = 0, then we have (1 - p_1)^{n_0} ≥ 1 - α and (1 - p_2)^{n_0} ≤ β. From (10) we can see that for n_1 ≥ 1 the following inequality holds:

(1 - p)^{n_0} [1 + n_0 p (1 - p)^{n_1 - 1}] ≤ (1 - p)^{n_0} (1 + n_0 p).   (14)

Hence, we can find from (12) the following necessary condition for the value of the clearance parameter n_0:

(1 - p_1)^{n_0} (1 + n_0 p_1) ≥ 1 - α.   (15)

On the other hand, we can see from (13) that the following condition has also to be fulfilled:

(1 - p_2)^{n_0} ≤ β.   (16)

Both inequalities (15) and (16) define, respectively, the upper and the lower limits for possible values of n_0. Now, let us find the limits for possible values of n_1. For a given value of n_0, we can find from (13) that

n_1 ≥ 1 + ln{[β (1 - p_2)^{-n_0} - 1] / (n_0 p_2)} / ln(1 - p_2),   (17)

and from (12) we can find that

n_1 ≤ 1 + ln{[(1 - α) (1 - p_1)^{-n_0} - 1] / (n_0 p_1)} / ln(1 - p_1).   (18)

Thus, inequalities (15)-(18) define the joint ranges of admissible values of the parameters n_0 and n_c. If, for given α, β, p_1, and p_2, these inequalities do not hold, then a CSeq-1 sampling plan which fulfills requirements (12) and (13) does not exist. In particular, when there is no n_1 that fulfills both (17) and (18), the required discrimination rate between p_1 and p_2 is too high. In such a case it is necessary to apply sequential sampling plans with the maximal acceptance number (at curtailment) greater than one.
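The limits (15)-(18) are easy to evaluate numerically; the following Python helper is an illustration of ours, and the example values p_1 = 0.0002, p_2 = 0.0025, α = 0.05, β = 0.1 are the requirements quoted later in the paper for the comparison in Table 4.

```python
import math

def n0_range(p1, p2, alpha, beta):
    """Admissible clearance values n0: the lower limit comes from (16), the upper from (15)."""
    n0_min = math.ceil(math.log(beta) / math.log(1 - p2))      # (1 - p2)**n0 <= beta
    if (1 - p1) ** n0_min * (1 + n0_min * p1) < 1 - alpha:
        return None                                            # (15) already violated
    n0_max = n0_min
    while (1 - p1) ** (n0_max + 1) * (1 + (n0_max + 1) * p1) >= 1 - alpha:
        n0_max += 1
    return n0_min, n0_max

def n1_bounds(n0, p1, p2, alpha, beta):
    """Lower bound (17) and upper bound (18) for n1 = nc - n0, for a given n0."""
    lo_arg = beta * (1 - p2) ** (-n0) - 1
    if lo_arg <= 0:
        return None                                            # (13) cannot be met with a finite n1
    n1_lo = 1 + math.log(lo_arg / (n0 * p2)) / math.log(1 - p2)
    hi_arg = (1 - alpha) * (1 - p1) ** (-n0) - 1
    if hi_arg <= 0:
        n1_hi = math.inf                                       # (12) is satisfied for every n1
    else:
        n1_hi = 1 + math.log(hi_arg / (n0 * p1)) / math.log(1 - p1)
    return n1_lo, n1_hi

if __name__ == "__main__":
    print(n0_range(0.0002, 0.0025, 0.05, 0.10))
    print(n1_bounds(968, 0.0002, 0.0025, 0.05, 0.10))
```

For n_0 = 968 the lower bound (17) rounds up to n_1 = 1176, i.e. n_c = 2144, which is consistent with the CSeq-1 plan quoted in Section 5.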
4 Design of optimal CSeq-1 sampling plans

Sequential sampling plans are used in situations when it is necessary to minimize sampling costs. Therefore, from among all plans fulfilling conditions (12) and (13), we should choose as optimal the one that is characterized by the lowest average sample size. The average sample size of the CSeq-1 sampling plan, defined by (11), is a complex function of the fraction nonconforming p. Therefore, it is not possible to propose a simple algorithm for its optimization, even for a simple objective function. In this paper we consider three objective functions:

- minimization of sup_p n̄(p; n_0, n_c),
- minimization of (n̄(p_1; n_0, n_c) + n̄(p_2; n_0, n_c))/2, and
- minimization of (n̄(0; n_0, n_c) + n̄(p_1; n_0, n_c))/2.

The first of these objective functions is traditionally used for the optimization of sampling plans with requirements set on the producer's and consumer's risks. The second function can be recommended when the actual quality of inspected lots or processes is usually better than p_1. The third function describes the sampling costs in the worst possible case. It may be recommended when the fraction nonconforming varies, e.g. when we inspect large lots delivered by different suppliers. Optimal CSeq-1 sampling plans can be found using numerical procedures. A significant simplification of the computations can be attained with the help of the following lemma:

Lemma 1. For a fixed value of n_0 the average sample size n̄(p; n_0, n_c) is an increasing function of n_c.

The proof is given in the appendix. Thus, for any of the considered (or similar) objective functions the optimal value of n_1 = n_c - n_0 is equal to the smallest integer not smaller than the right-hand side of the inequality (17). Notice that this value is a function of p_2, and is not a function of p_1. Hence, if the objective function does not depend directly on p_1, the optimal plan depends exclusively on p_2. The value of p_1 only has an impact on the range of admissible values of n_0. To illustrate the theoretical results let us find optimal CSeq-1 sampling plans for a subset of p_1 and p_2 values taken from the international standard ISO 8422. The risks are the same as in ISO 8422, i.e. α = 0.05 and β = 0.1. The results presented in Table 1 confirm our previous remark that the parameters of the optimal CSeq-1 plan for the considered objective function do not depend on the value of p_1. Another interesting, and somewhat unexpected, feature is the abrupt and significant change between the sampling plans when the discrimination rate becomes too high for the application of the curtailed single sampling plan with acceptance number equal to zero. This feature indicates a fundamental difference between these sampling plans. In the general case of the optimal curtailed Wald's sampling plans this difference is not so visible. The results presented in Table 2 confirm our claim that, in the case of the objective function that explicitly depends on p_1, the parameters of the optimal CSeq-1 sampling plan depend both on p_1 and p_2. It is worth noting that larger curtailment numbers are required for higher discrimination ratios. The results of the optimization for the case considered in Table 3 are similar to those presented previously. We may add the additional remark that in the considered cases the dependence of the optimal clearance number n_0 on the values of p_1 and p_2 is not so simple. In the case of a fixed p_1 the clearance number n_0 decreases with the decreasing discrimination rate. However, for a fixed value of p_2 the type of dependence is reversed. The interpretation of this phenomenon needs further investigation.
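As a rough illustration of such a numerical procedure, the sketch below combines Lemma 1 with the bounds (15)-(18) for the first objective function. It is our own heuristic: the supremum over p is approximated on a finite grid, and the helpers n0_range, n1_bounds and cseq1_asn are those from the earlier sketches.

```python
import math
# Reuses n0_range, n1_bounds and cseq1_asn from the sketches above.

def optimal_cseq1_sup_asn(p1, p2, alpha, beta, grid_points=60):
    """Brute-force search for the CSeq-1 plan minimizing an approximation of sup_p ASN."""
    rng = n0_range(p1, p2, alpha, beta)
    if rng is None:
        return None
    # heuristic grid 0 < p <= 3*p2 used to approximate the supremum
    p_grid = [i * 3 * p2 / grid_points for i in range(1, grid_points + 1)]
    best = None
    for n0 in range(rng[0], rng[1] + 1):
        bounds = n1_bounds(n0, p1, p2, alpha, beta)
        if bounds is None:
            continue
        n1 = max(1, math.ceil(bounds[0]))     # Lemma 1: take the smallest admissible n1
        if n1 > bounds[1]:
            continue                          # n1 = 0 (curtailed single sampling) must be checked separately
        nc = n0 + n1
        worst = max(cseq1_asn(p, n0, nc) for p in p_grid)
        if best is None or worst < best[0]:
            best = (worst, n0, nc)
    return best
```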
Table 1. Optimal CSeq-1 sampling plans (n_0, n_c) that minimize sup_p n̄(p; n_0, n_c)

  p_2 = 0.01     (230,230)  (247,500)  (247,500)  (247,500)  (247,500)
  p_2 = 0.0125   (184,184)  (184,184)  (197,401)  (197,401)  (197,401)
  p_2 = 0.016    (143,143)  (143,143)  (143,143)  (153,316)  (153,316)
  p_2 = 0.02     (114,114)  (114,114)  (114,114)  (114,114)  (124,244)
  p_2 = 0.025    (91,91)    (91,91)    (91,91)    (91,91)    (91,91)

  NA - CSeq-1 plan not available (discrimination ratio too high).
  (n_0, n_0) - curtailed single sampling plan.
5 CSeq-1 sampling plans vs. curtailed Wald's sequential sampling plans and curtailed single sampling plans

The CSeq-1 sampling system described in the preceding sections is a special case of Wald's curtailed sequential sampling system defined by formulae (1) to (7). By a simple analysis of the CSeq-1 acceptance and rejection regions, we can easily show that the following relations hold:

It is easy to see that for Wald's original plans the value of h_R may be smaller than the left-hand side of (20). The practical consequence of this is the following: it is possible to reject the lot when we observe only one nonconforming item. This gives an additional degree of freedom in the design of the sampling plans. Thus, it is possible to find Wald's curtailed sampling plans which are more effective (smaller average sample sizes) than the optimal CSeq-1 sampling plan.
Table 2. Optimal CSeq-1 sampling plans (n_0, n_c) that minimize (n̄(p_1; n_0, n_c) + n̄(p_2; n_0, n_c))/2

  p_2 = 0.0125   (184,184)  (184,184)  (186,515)  (187,493)  (188,476)
  p_2 = 0.016    (143,143)  (143,143)  (143,143)  (145,403)  (146,381)
  p_2 = 0.02     (114,114)  (114,114)  (114,114)  (114,114)  (116,316)
  p_2 = 0.025    (91,91)    (91,91)    (91,91)    (91,91)    (91,91)

  NA - CSeq-1 plan not available (discrimination ratio too high).
  (n_0, n_0) - curtailed single sampling plan.
The illustration of this fact is given in Table 4, where we compare the Wald's sequential sampling plan taken from the draft of the new version of the ISO 8422 international standard (h_A = 0.883, h_R = 0.991, g = 0.000903, curtailment at 2083 items) with the optimal CSeq-1 sampling plan (n_0 = 968, n_c = 2144), both designed to fulfill the following requirements: PRQ = 0.02%, CRQ = 0.25%, α = 0.05, β = 0.1. Thus, CSeq-1 sampling plans are, in general, inferior in comparison to the optimal Wald's plans. However, their design is much simpler, and requires significantly smaller computational effort. Moreover, if one accepts one nonconforming item in the sample ("benefit of the doubt"), then the CSeq-1 plan seems more "logical" to practitioners. Now, let us consider the comparison between the CSeq-1 sampling plan and the curtailed single sampling plan with clearance number (maximal sample size) n_0 and acceptance number equal to zero. The probability of acceptance for this sampling plan is given by a simple formula

L(p) = (1 - p)^{n_0}.   (21)
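The CSeq-1 figures used in this comparison can be recomputed from formulae (10) and (11); the short script below is an illustration of ours, with the plan (968, 2144) and the PRQ/CRQ values taken from the text.

```python
n0, nc = 968, 2144            # CSeq-1 plan quoted above
prq, crq = 0.0002, 0.0025     # PRQ = 0.02%, CRQ = 0.25%

def oc(p):
    """OC function (10) of the CSeq-1 plan."""
    return (1 - p) ** n0 + n0 * p * (1 - p) ** (nc - 1)

def asn(p):
    """Average sample size (11), summed over the possible stopping stages."""
    total = n0 * (1 - p) ** n0 + nc * n0 * p * (1 - p) ** (nc - 1)
    for i in range(2, n0 + 1):                    # rejection before the clearance point
        total += i * (i - 1) * p * p * (1 - p) ** (i - 2)
    for i in range(n0 + 1, nc + 1):               # rejection between n0 and nc
        total += i * n0 * p * p * (1 - p) ** (i - 2)
    return total

print("P_A(CRQ)                :", round(oc(crq), 4))
print("(ASN(PRQ)+ASN(CRQ)) / 2 :", round((asn(prq) + asn(crq)) / 2, 2))
```

The printed values can be compared directly with the CSeq-1 column of Table 4.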
Table 3. Optimal CSeq-1 sampling plans (n_0, n_c) that minimize (n̄(0; n_0, n_c) + n̄(p_1; n_0, n_c))/2

  p_2 \ p_1     0.0002       0.00025   0.000315   0.0004   0.0005
  0.002         NA           NA        NA         NA       NA
  0.0025        (968,2144)   NA        NA         NA       NA

  NA - CSeq-1 plan not available (discrimination ratio too high).
  (n_0, n_0) - curtailed single sampling plan.
Table 4. Comparison of Wald's and CSeq-1 sampling plans

  Parameter                    Wald's sequential plan   CSeq-1 plan
  P_A(CRQ)                     0.0997                   0.10
  (n̄(PRQ) + n̄(CRQ))/2          922.7                    927.14

It is easy to show that the requirements (3) and (4) are fulfilled by the curtailed single sampling plan if condition (22) holds.
From the analysis of (3), (4), and (22) we can find that the parameter n_0 has to fulfill the following relation:

ln β / ln(1 - p_2) ≤ n_0 ≤ ln(1 - α) / ln(1 - p_1).   (23)

It is easy to show that the average sample size for the curtailed single sampling plan is an increasing function of n_0. Thus, the optimal (assuring the minimum average sample size) value of n_0 is given by the following formula:

n_0 = ⌈ln β / ln(1 - p_2)⌉.   (24)
Let us analyze the conditions (17) and (18). From this analysis it follows immediately that the parameter n_0 of the CSeq-1 sampling plan has to fulfill two inequalities:

Hence, the clearance number of the optimal curtailed single sampling plan is not greater than the clearance number n_0 of the optimal CSeq-1 sampling plan (29). If (29) holds, then for any sequence of sampling results the decisions under the rules of the CSeq-1 sampling plan are made no earlier than the decisions under the rules of the curtailed single sampling plan. Therefore, if a curtailed single sampling plan that fulfills (3) and (4) exists, then it is always more effective than any CSeq-1 sampling plan that also fulfills these requirements. This result is not unexpected if we consider the curtailed single sampling plan with acceptance number equal to zero as the special case of the CSeq-1 sampling plan with n_c = ∞.
6 Appendix

Proof of Lemma 1. Let Δ_{n_c} = n̄(p; n_0, n_c + 1) - n̄(p; n_0, n_c). From (11) we have

Δ_{n_c} = n_0 (1 - p)^{n_0} + (n_c + 1) n_0 p (1 - p)^{n_c} + Σ_{i=2}^{n_0} i (i - 1) p^2 (1 - p)^{i-2} + Σ_{i=n_0+1}^{n_c+1} i n_0 p^2 (1 - p)^{i-2}
        - n_0 (1 - p)^{n_0} - n_c n_0 p (1 - p)^{n_c - 1} - Σ_{i=2}^{n_0} i (i - 1) p^2 (1 - p)^{i-2} - Σ_{i=n_0+1}^{n_c} i n_0 p^2 (1 - p)^{i-2}
        = n_0 p (1 - p)^{n_c - 1} [(n_c + 1)(1 - p) - n_c] + (n_c + 1) n_0 p^2 (1 - p)^{n_c - 1}
        = n_0 p (1 - p)^{n_c - 1} > 0,

and this completes the proof.
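A simple Monte Carlo check of Lemma 1 can be run by simulating the CSeq-1 procedure itself, so the check does not rely on formula (11); the sketch below is ours, and the plan values used are arbitrary illustrations.

```python
import random

def simulate_cseq1_asn(p, n0, nc, reps=50_000, seed=1):
    """Monte Carlo estimate of the CSeq-1 average sample size by running the procedure."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        d = 0
        for i in range(1, nc + 1):
            if rng.random() < p:
                d += 1
            if d == 2:                 # rejection number is 2 (case nc > n0)
                break
            if i == n0 and d == 0:     # acceptance with zero nonconforming at the clearance point
                break
        total += i                     # reaching nc with d <= 1 means acceptance at curtailment
    return total / reps

p, n0 = 0.01, 100
print([round(simulate_cseq1_asn(p, n0, nc), 1) for nc in (120, 160, 200)])  # should be increasing
```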
References

1. Aroian, L.A. (1976): "Applications of the direct method in sequential analysis". Technometrics, 18, 301-306.
2. Baillie, D.H. (1994): "Sequential sampling plans for inspection by attributes with near-nominal risks". Proc. of the Asia Pacific Quality Control Organisation Conference.
3. Barnard, G.A. (1946): "Sequential tests in industrial statistics". Journ. Roy. Stat. Soc. Supplement, 8, 1-21.
4. Bartky, W. (1943): "Multiple sampling with constant probability". Annals of Math. Stat., 14, 363-377.
5. Hryniewicz, O. (1996): "Optimal sequential sampling plans". In: Proc. of the 4th Wuerzburg-Umea Conf. in Statistics, E. von Collani, R. Goeb, G. Kiesmueller (Eds.), Wuerzburg, 209-221.
6. ISO 8422: 1991. Sequential sampling plans for inspection by attributes.
7. Wald, A. (1945): "Sequential tests of statistical hypotheses". Annals of Math. Stat., 16, 117-186.
8. Wald, A. (1947): "Sequential Analysis". J. Wiley, New York.
9. Woodall, W.H., Reynolds Jr., M.R. (1983): "A discrete Markov chain representation of the sequential probability ratio test". Commun. Statist. - Sequential Analysis, 2(1), 27-44.
Three-Class Sampling Plans: A Review with Applications

Frank A. Palcat

Measurement Canada, Ottawa, Ontario, K1A 0C9, Canada
[email protected]
Summary. Acceptance sampling plans have been widely used in statistical quality control
for several decades. The vast majority of the relevant research and development over this period has focused on two-class sampling plans that involve classifying product characteristics as either conforming or nonconforming with respect to specified acceptance requirements. In recent years, some developmental work has occurred with respect to three-class sampling plans that additionally involve classifying product characteristics as marginally conforming with respect to requirements. This work remains relatively unknown among practitioners and applications seem to be presently limited to control of undesirable microbiological presence in food. This paper presents a review of the key developmental contributions to three-class sampling plan theory and discusses some applications where such plans would provide a more effective means of quality control. Applications specific to the field of legal metrology, including case-studies where isolated lots are common and currently-used methods are problematic, are particularly examined.
1 Introduction

Acceptance sampling plans have been widely used in statistical quality control for several decades. The vast majority of the relevant research and development over this period has focused on two-class sampling plans that involve classifying product characteristics as either conforming or nonconforming with respect to specified acceptance requirements. Since the 1940s and 1950s, several national and international standards have been formulated using this dichotomous basis for product quality, including the popular ISO 2859 and ISO 3951 series for sampling by attributes and sampling by variables respectively. In recent years, some practitioners and researchers have expressed interest in sampling plans for evaluating products that may be categorized into three (or more) quality classes. For example, under such sampling plans, a product with three quality states may be classified as conforming, marginally-conforming, or nonconforming, where the original conforming category is split into two parts, although other methods for defining and labeling the resulting classes are possible. However, developmental work with respect to multiple-class sampling plans has
been extremely minimal in comparison to that associated with their two-class counterparts and, outside the field of food safety inspection, such sampling plans are generally unknown in application and not even mentioned in popular statistical quality control texts. The present paper attempts to rectify the problem regarding a lack of awareness of such sampling plans among statistical quality control professionals. Section 2 provides definitions of symbols used throughout the paper. Section 3 provides a review of the key developmental contributions to three-class sampling plan theory over the past three decades. In section 4, a few extensions to the contributions in section 3 are presented along with some comments of a practical nature. Applications specific to the field of legal metrology, including case-studies where isolated lots are common and currently-used methods are problematic, are examined in section 5. Section 6 concludes the paper with some closing remarks.
2 Symbols

The symbols in this section are used throughout the paper. Wherever equations or symbols from the reviewed papers are cited in this one, they have been translated in accordance with this list.

c_1    maximum allowable number of marginally-conforming items in a sample
c_12   maximum allowable number for the sum of the marginally-conforming and nonconforming items in a sample
c_2    maximum allowable number of nonconforming items in a sample
D_0    number of conforming items in a lot (D_0 = N - D_1 - D_2)
D_1    number of marginally-conforming items in a lot
D_2    number of nonconforming items in a lot
d_0    number of conforming items in a sample (d_0 = n - d_1 - d_2)
d_1    number of marginally-conforming items in a sample
d_12   sum of the marginally-conforming and nonconforming items in a sample (d_12 = d_1 + d_2)
d_2    number of nonconforming items in a sample
k_12   acceptability constant associated with the sum of the marginally-conforming and nonconforming items
k_2    acceptability constant associated with the nonconforming items in the sample
L_1    lower specification limit separating marginally-conforming from conforming items
L_2    lower specification limit separating nonconforming from marginally-conforming items
N      lot size
n      sample size
P_a    probability of acceptance
p_0    lot or process proportion conforming (p_0 = 1 - p_1 - p_2)
p_1    lot or process proportion marginally conforming
p_12   lot or process proportion marginally conforming plus nonconforming (p_12 = p_1 + p_2)
p_2    lot or process proportion nonconforming
s      sample standard deviation (n - 1 degrees of freedom)
T      noncentral t variable
t      narrow limit compression factor
U_1    upper specification limit separating conforming from marginally-conforming items
U_2    upper specification limit separating marginally-conforming from nonconforming items
V_c    critical value for the quality value function (QVF) approach
v      value assigned to a marginally-conforming item (0 < v < 1)
x̄      sample mean
z_x    the x-th fractile of the standardized normal distribution
δ      noncentrality parameter of a noncentral t distribution
σ      lot or process standard deviation
3 Review of Key Contributions

3.1 Sampling by Attributes

The subject of three-class acceptance sampling plans appears to have been first introduced in the statistical quality control literature in 1973. Motivated by applications in the health protection and food safety fields, Bray et al. [4] laid down the basic theory for three-class acceptance sampling by attributes using the trinomial probability distribution model. In general, such a sampling plan is specified by a sample size, a critical value representing the maximum allowable number for the sum of the marginally-conforming and nonconforming items in the sample, and a critical value representing the maximum allowable number of nonconforming items in the sample (n, c_12, c_2). A random sample of n items is inspected and the numbers of marginally-conforming items (d_1) and nonconforming items (d_2) in the sample are counted. If both of the following inequalities are satisfied:

d_12 ≤ c_12 and d_2 ≤ c_2 (where d_12 = d_1 + d_2)   (1)

then the lot is accepted; otherwise, it is rejected. The probability mass function for the trinomial distribution is:

P(d_1, d_2) = [n! / (d_1! d_2! (n - d_1 - d_2)!)] p_1^{d_1} p_2^{d_2} (1 - p_1 - p_2)^{n - d_1 - d_2}   (2)

and the function for the sampling plan's probability of acceptance is:

P_a = Σ_{d_2=0}^{c_2} Σ_{d_1=0}^{c_12 - d_2} P(d_1, d_2).   (3)

Bray et al.'s primary interest was in the subset of such sampling plans where c_2 is fixed at 0 and c_1 is the maximum allowable number of marginally-conforming items in the sample. In this subset of plans, a lot is rejected if d_2 > 0 or d_1 > c_1 and accepted otherwise. The simplified function for the sampling plan's probability of acceptance is:

P_a = Σ_{d_1=0}^{c_1} [n! / (d_1! (n - d_1)!)] p_1^{d_1} (1 - p_1 - p_2)^{n - d_1}.   (4)
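A small Python sketch (ours) makes (2)-(3) concrete: it evaluates the trinomial acceptance probability for an arbitrary (n, c_12, c_2) plan; the numerical values in the example call are illustrative only.

```python
from math import comb

def trinomial_pmf(d1, d2, n, p1, p2):
    """P(d1 marginally-conforming, d2 nonconforming) under the trinomial model (2)."""
    p0 = 1 - p1 - p2
    return (comb(n, d1) * comb(n - d1, d2)
            * p1 ** d1 * p2 ** d2 * p0 ** (n - d1 - d2))

def pa_three_class(n, c12, c2, p1, p2):
    """Acceptance probability of the (n, c12, c2) plan: accept iff d1 + d2 <= c12 and d2 <= c2."""
    return sum(trinomial_pmf(d1, d2, n, p1, p2)
               for d2 in range(0, c2 + 1)
               for d1 in range(0, c12 - d2 + 1))

# e.g. a c2 = 0 plan of the kind emphasized by Bray et al. (values illustrative)
print(round(pa_three_class(n=20, c12=3, c2=0, p1=0.10, p2=0.01), 4))
```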
Several different "c2 = 0" sampling plans are tabulated in the paper for various combinations of pi, along with contour representations of the operating characteristics for two example sampling plans. Sampling plans based on Bray et al.'s paper [and in particular equation (4)] were adopted for food safety purposes by such organizations as the International Commission on Microbiological Specifications for Foods (ICMSF) and the Codex Alimentarius Commission. References include ICMSF [I 81, which was originally published in 1974, and chapter 7 of Ahmed [I]. It should be noted that these sampling plans are one-sided, based on two upper specification limits conventionally represented by the symbols m and M, whereas in this paper, these two specification limits are defined as U , and U2respectively. 3.2 Sampling by Attributes - Curtailed Inspection
Shah and Phatak [26, 27] extended the basic work of Bray et al. [4] regarding three-class attributes sampling plans, focusing on the practice of curtailed inspection. In [26], the authors first explored the case of semi-curtailed sampling, where inspection would cease as soon as the number of nonconforming items or of nonconforming plus marginally-conforming items discovered in the sample was sufficient to cause lot rejection. This was followed by consideration of the case of fully-curtailed inspection where, in addition to ceasing inspection due to early lot rejection, sampling inspection would cease as soon as lot acceptance was evident. The authors provide equations for the average sample number (ASN) as well as the maximum likelihood estimators (MLE) of the proportions marginally-conforming and nonconforming in the lot and their associated asymptotic variances under these forms of curtailed inspection. The relationship between the percent saving in inspection and the efficiency of the estimators is also provided. In their second paper [27], the authors extended their research to address three-class attribute sampling plans in a multiple sampling context, with particular focus on double sampling. Expressions for the ASN, the MLEs of the proportions marginally-conforming and nonconforming after a specified number of lots have been inspected, and the relations between the estimators' asymptotic variances and the ASN are obtained under uncurtailed, semi-curtailed, and fully-curtailed inspection in the double sampling context, then generalized to multiple sampling. As the full details of Shah and Phatak's two papers are beyond the intended scope of this one, the interested reader is advised to consult their work.

3.3 Sampling by Attributes - Indexed by AQL
In a series of three papers, Clements [7, 8, 9] set out to create a system of sampling plans comparable to those in such standards as MIL-STD-105D [30] and ISO 2859-1:1999 [19]. As his papers were largely of an evolving nature, this section will focus primarily on Clements [9]. His approach differs slightly from that of Bray et al. [4] in that he specifies a sampling plan in terms of a sample size, a critical value for the maximum allowable number of marginally-conforming items in the sample, and a critical value for the maximum allowable number of nonconforming items in the sample, i.e., (n, c_1, c_2). A random sample of n items is inspected and the numbers of marginally-conforming items (d_1) and nonconforming items (d_2) in the sample are counted. If both of the following inequalities are satisfied:

d_1 ≤ c_1 and d_2 ≤ c_2   (5)

then the lot is accepted; otherwise, it is rejected. In this case, the function for the sampling plan's probability of acceptance is simply the cumulative distribution function of the trinomial distribution:

P_a = Σ_{d_1=0}^{c_1} Σ_{d_2=0}^{c_2} P(d_1, d_2).   (6)
Clements [9] used Table II-A of [30] as a launching pad. This table provides code letters, sample sizes, and associated acceptance and rejection numbers based on various acceptance quality limits (AQL) under normal inspection. As much as possible, Clements limited himself to using the preferred sample sizes and AQL values to construct sets of possible trinomial-based alternatives to the binomial-based plans in the table. He pointed out that the system he developed worked best for AQL values less than 4.0% and code letters L and above. For code letters from F to K, adjustments to sample sizes were necessary. However, he was able to succeed in using a single set of acceptance numbers along each diagonal in his version of the master table. His method was to use the binomial-based plan's AQL value as the process nonconforming value p_2 and then choose AQL values that were odd numbers of steps higher than p_2 for the process marginally-conforming values p_1. For each higher p_1 value selected, the sample size corresponding to the next lower code letter was used. Values of c_1 and c_2 were then determined to provide approximately the same probability of acceptance as the two-class sampling plan for the new AQL pair (p_1, p_2). For example, for an AQL of 1.0% and code letter L, the two-class plan (n, c_2) is (200, 5). With p_2 fixed at 1.0%, p_1 is allowed to assume the values 1.5%, 4.0%, and 10% with n assuming the values 125, 80, and 50, respectively, and then appropriate values of c_1 and c_2 are determined as illustrated in Table 1 below.

Table 1: Trinomial Sampling Plans Matching Code Letter L and AQL = 1.0%

In addition to determining trinomial sampling plans for the various code letters and AQL values, Clements also provided tables with their probabilities of acceptance at the AQL values of p_1 and p_2, the values of the sampling plans' limiting quality levels (LQL) under the assumption of a normal distribution with known standard deviation, and maximum limiting quality levels (MQL) in a distribution-free sense. Clements also proposed using a narrow-limit technique (see [25]) for establishing the value of the specification limit separating conforming from marginally-conforming items when the standard deviation is known and the distribution of the quality characteristic is normal. For a one-sided sampling plan with upper specification limits, the additional specification limit is determined as follows:

Samples of this additional information are included in Table 1 for the example discussed earlier. Clements also includes approximate formulae for determining generic trinomial sampling plans based on the assumption of quality characteristics distributed according to a normal distribution with known standard deviation, as well as graphical operating characteristic curves and contours for example sampling plans.
3.4 Sampling by Variables
At the suggestion of O.B. Allen (University of Guelph), Brown [5] investigated extending the three-class attributes sampling plan approach to a sampling-by-variables framework. (Newcombe and Allen [22] later published an abbreviated version of this thesis.) The primary focus of this work was the application where the variable of interest is distributed according to a normal distribution with unknown mean and standard deviation and the marginally-conforming and nonconforming specification limits are one-sided. However, the thesis [5] does include a chapter dealing with techniques for applying the method under situations of non-normality. Somewhat paralleling the methodology of Bray et al. [4], Brown approached the problem by applying two two-class sampling plans simultaneously: one for the proportion nonconforming and one for the combined proportion marginally-conforming and nonconforming. Such a sampling plan is specified by a sample size, an acceptability constant with respect to the sum of the proportions marginally-conforming and nonconforming, and another acceptability constant with respect to the proportion nonconforming, i.e., (n, k_12, k_2). A random sample of n items is inspected and the sample mean (x̄) and standard deviation (s) are calculated. If both of the following inequalities are satisfied:

x̄ + k_12 s ≤ U_1 and x̄ + k_2 s ≤ U_2   (8)

then the lot is accepted; otherwise, it is rejected. The probability of acceptance of a lot of quality (p_1, p_2) involves the bivariate noncentral t distribution as indicated below:

Pr(x̄ + k_12 s ≤ U_1 and x̄ + k_2 s ≤ U_2 | p_1, p_2) = Pr(T_1 ≤ -k_12 √n and T_2 ≤ -k_2 √n | δ_1, δ_2)   (9)

where T_i has a noncentral t distribution with n - 1 degrees of freedom and noncentrality parameter δ_i = -√n z_{p_i}, for i = 1, 2. The special case of the bivariate noncentral t distribution needed to solve this probability was originally given by Owen [23]. Brown gives details of both the exact and an approximate method in her thesis [5]. Newcombe and Allen [22] also give the necessary details.
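Because the exact evaluation of (9) requires the bivariate noncentral t distribution, a simple Monte Carlo estimate is often the quickest cross-check. The sketch below is ours: it simulates the acceptance rule (8) for a standard normal lot model in the one-sided upper-limit case, and the plan constants in the example are illustrative values, not taken from the thesis.

```python
import random
from statistics import NormalDist, fmean, stdev

def mc_accept_prob(n, k12, k2, p12, p2, reps=20_000, seed=7):
    """Monte Carlo estimate of Pr(xbar + k12*s <= U1 and xbar + k2*s <= U2).

    The lot is modelled as N(0, 1); U1 and U2 are placed so that the proportions
    above them equal p12 and p2 respectively.
    """
    nd = NormalDist()
    u1, u2 = nd.inv_cdf(1 - p12), nd.inv_cdf(1 - p2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = fmean(x)
        s = stdev(x)                      # n - 1 degrees of freedom, as in the paper
        if xbar + k12 * s <= u1 and xbar + k2 * s <= u2:
            hits += 1
    return hits / reps

print(mc_accept_prob(n=50, k12=1.20, k2=1.80, p12=0.10, p2=0.01))
```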
His method was to use the binomial-based plan's AQL value as the process nonconforming value p2 and then choose AQL values that were odd numbers of steps higher than p2 for the process marginally-conforming values pi. For each higher p, value selected, the sample size corresponding to the next lower code letter was used. Values of cl and c2 were then determined to provide approximately the same probability of acceptance as the two-class sampling plan for the new AQL pair ($1,p d . For example, for an AQL of 1 .O% and code letter L, the two-class plan (n, c2) is (200, 5). With p2 fixed at 1.0%, pl is allowed to assume the values 1.5%, 4.0%, and 10% with n assuming the values 125, 80, and 50, respectively, and then appropriate values of c l and c2 are determined as illustrated in Table 1 below. Table 1: Trinomial Sampling Plans Matching Code Letter L and AQL = 1.0%
In addition to determining trinomial sampling plans for the various code letters and AQL values, Clements also provided tables with their probabilities of acceptance at the AQL values o f p , and p2, and the values of the sampling plans' limiting quality levels (LQL), under the assumption of a normal distribution with known standard deviation, and maximum limiting quality levels (MQL), in a distribution-free sense. Clements also proposed using a narrow hmit technique (see [25]) for establishing the value of the specification limit separating conforming from marginally-conforming items when the standard deviation is known and the distribution of the quality characteristic is normal. For a one-sided sampling plan with upper specification limits, the additional specification limit is determined as follows:
Samples of this additional information are included in Table 1 for the example earlier discussed. Clements also includes approximate formulae for determining generic trinomial sampling plans based on the assumption of quality characteristics distributed according to a normal distribution with known standard deviation, as well as graphical operating characteristic curves and contours for example sampling plans.
three-class variables sampling plans with their attributes counterparts. The thesis includes APL programming code to aid in developing three-class sampling plans.

3.5 Sampling by Attributes - Multiple Classifications of Lots
Bebbington et al. [3] studied the practices used in New Zealand with respect to the procurement inspection and grading of one of the country's leading export varieties of apples. At the time of their review, the current practice involved a sampling-by-variables approach. The method was found to suffer from a number of deficiencies, including using a sample mean as an indicator of individual apple quality as well as ignoring the variability and non-normality of the measured variables. Furthermore, the approach was determined to be biased in the producer's favour. The authors concluded that a more appropriate sampling plan solution for grading this produce would involve classification by attributes. The practice was to grade lots of apples into one of four grades (A, B, C, or R), and criteria existed to grade individual apples according to these same grades. Lacking, however, was a standard in terms of quality levels and fraction nonconforming. To address this deficiency, the authors used simulation methods for establishing critical values for grading the individual lots while maintaining approximately the same expectations with respect to the different grades from the past history of the various growers' lines under the sampling-by-variables plan. As four classes of acceptability (or grades) were involved, the applicable probability models identified were the quadrivariate hypergeometric distribution (for isolated lots) and the quadrinomial distribution (for a continuing series of lots). The authors developed their formulae for the four classification probability functions based on the quadrinomial distribution and dominance logic with respect to individual sample item classifications, then applied the rules in the order from R to A (rather than from A to R) to reduce producer-oriented bias. The authors conclude the paper with several recommendations to enhance the implementation of the insightful solution they developed. The approach used by the paper's authors is interesting in that it takes a rather obscure grading approach and translates it into terms that are more transparent and amenable to critical evaluation by purchasers and consumers. Their approach is no doubt applicable in principle to many practical applications beyond the grading of produce.

3.6 Sampling by Attributes - Quality Value Functions
Cassady and Nachlas [6] introduced an interesting, flexible variation on the work of Bray et al. [4], using what they refer to as quality value functions (QVF). They reviewed two valuation schemes. The first (QVF_1) assigned a value of 0 to conforming items, a variable value v between 0 and 1 to marginally-conforming items, and a value of 1 to nonconforming items. The second scheme (QVF_2) assigned a value of 1 to conforming items, a variable value greater than 1 to marginally-conforming items, and another, greater yet, variable value to nonconforming items. The authors reported that their experiments and analyses did not reveal any significant advantage of one scheme over the other, so this section will focus only on QVF_1. Their method involves specifying a sample size and a single critical value (n, V_c). A random sample of n items is inspected and the numbers of marginally-conforming items (d_1) and nonconforming items (d_2) in the sample are counted. If the following inequality is satisfied:

v d_1 + d_2 ≤ V_c   (10)

then the lot is accepted; otherwise, it is rejected. The function for the sampling plan's probability of acceptance (for valid values of V_c) is:

P_a = Σ_{(d_1, d_2): v d_1 + d_2 ≤ V_c} P(d_1, d_2).   (11)

To aid in the implementation of such sampling plans, the authors provide approximate formulae for calculating n and V_c to give the desired probabilities of acceptance, given p_i vectors for both acceptable and rejectable quality and specified producer's and consumer's risks. They also suggest considering several different values of v and verifying the actual resulting risks against those specified before deciding upon a sampling plan.
4 Extensions and Comments

4.1 Sampling by Attributes - Finite Lots
Although mentioned in passing in the papers reviewed in section 3, the equations necessary to implement three-class sampling plans in common finite-lot situations were not fully provided. This section provides them for purposes of completeness. The probability mass function for the three-class or trivariate hypergeometric distribution is as follows:

P(d_1, d_2) = [C(D_1, d_1) C(D_2, d_2) C(N - D_1 - D_2, n - d_1 - d_2)] / C(N, n).   (12)

The function for such a sampling plan's probability of acceptance is:

P_a = Σ_{d_2=0}^{c_2} Σ_{d_1=0}^{c_12 - d_2} P(d_1, d_2).   (13)

For the subset of such sampling plans where c_2 is fixed at 0, the simplified function for the sampling plan's probability of acceptance is:

P_a = Σ_{d_1=0}^{c_1} [C(D_1, d_1) C(N - D_1 - D_2, n - d_1)] / C(N, n).   (14)

Finally, the function for a finite-lot sampling plan's probability of acceptance under the QVF_1 approach is:

P_a = Σ_{(d_1, d_2): v d_1 + d_2 ≤ V_c} P(d_1, d_2).   (15)
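The finite-lot expressions can be evaluated with exact integer arithmetic; the sketch below (ours) implements (12) and (13). The lot split in the example call is an illustrative assumption.

```python
from math import comb

def hyper3_pmf(d1, d2, n, N, D1, D2):
    """Trivariate hypergeometric probability of (d1, d2) in a sample of n from the lot."""
    D0 = N - D1 - D2
    d0 = n - d1 - d2
    if d0 < 0 or d0 > D0 or d1 > D1 or d2 > D2:
        return 0.0
    return comb(D1, d1) * comb(D2, d2) * comb(D0, d0) / comb(N, n)

def pa_finite(n, c12, c2, N, D1, D2):
    """Acceptance probability of the (n, c12, c2) plan for an isolated lot of size N."""
    return sum(hyper3_pmf(d1, d2, n, N, D1, D2)
               for d2 in range(0, c2 + 1)
               for d1 in range(0, c12 - d2 + 1))

# e.g. an isolated lot of 200 items with 10% marginally conforming or worse
# (illustrative split: D1 = 18, D2 = 2), inspected with a zero-acceptance plan
print(round(pa_finite(n=27, c12=0, c2=0, N=200, D1=18, D2=2), 4))
```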
4.2 Sampling by Variables
Brown [5] and Newcombe and Allen [22] limited their work to the unknown mean and standard deviation, one-sided scenario, which applies directly to two upper specification limits (U_1, U_2) or to two lower specification limits (L_1, L_2). However, their work may be extended to the two-sided scenario where combined control of the proportions outside both the upper and lower specification limits is required. Brown had suggested an approximate method that is frequently used for two-class sampling plans in the "further research" section of her thesis [5]. Once the values of k_12 and k_2 are determined for the one-sided, unknown mean and standard deviation case, these values may be converted into critical values for the estimates of p_12 and p_2 respectively using the "M method" described in, for example, Schilling [25, Ch. 10]. Uniform minimum variance unbiased estimates (UMVUE) of the values of p_12 and p_2 can then be computed from the sample statistics, with the estimates due to the upper specification limits being added to their lower specification limit counterparts and then compared to the previously determined critical values to decide lot acceptance. As pointed out in Brown [5], the method is approximate, as the actual probability of acceptance of such a lot also depends on how much of the proportion of interest is outside each specification limit; the approximation is good, however, because the operating characteristic band is very narrow (see Baillie [2]).

4.3 Comments
In order to implement a three-class sampling plan, it is first necessary to be able to define three mutually exclusive classes of product quality. Where the quality characteristic of interest may only be classified into one of two states, this is not an option. However, where the quality characteristic is measurable, a potentially useful approach that may be employed to create a third class of acceptability is narrow-limit gauging (also referred to as compressed-limit or reduced-limit gauging) as discussed by Schilling [25]. The method was originally created to permit a form of sampling-by-attributes inspection to be used as an alternative to sampling-by-variables inspection with approximately equivalent control. (As already mentioned in 3.3, Clements used this method in [9].) Although the method is traditionally associated with characteristics distributed according to a normal distribution with known standard deviation, it would seem reasonable to alter these conditions in some applications. In particular, with respect to sampling inspection applications where consumer protection is emphasized, a normal distribution centred at the desired lot target mean could be used with a maximum standard deviation calculated to provide the required proportion nonconforming p_2 beyond the upper or lower specification limit (or both). Once the mean and standard deviation are established for the model normal distribution, narrow limit(s) can be readily determined such that the total lot proportion outside these new limits (p_12) becomes associated with the usual low probability of acceptance assigned to p_2 alone. Values of n, c_12, and c_2 may then be determined to complete the sampling plan. It should also be noted that model distributions other than the normal distribution could be justified in such narrow limit determinations. More generally, sampling plans are usually designed with consideration to both the quality level that should be accepted with a specified high probability (producer's risk quality or PRQ) and the quality level that should be accepted with a specified low probability (consumer's risk quality or CRQ). In the case of three-class sampling plan design, values for both the proportion marginally-conforming and the proportion nonconforming need to be specified as a vector or pair (p_1, p_2) for both the PRQ and CRQ scenarios. As an example, the PRQ may be specified as 95% probability of acceptance for quality (0.10, 0.01), with the CRQ specified as 10% probability of acceptance for quality (0.30, 0.04). The establishment of these quality levels is a matter for the producer and consumer to agree upon and ideally should involve consideration of relevant cost information. Determining a sampling plan to provide the desired performance will usually require a computer program that steps through various combinations of n, c_12, and c_2 until the required set of values is found, as sketched below. However, it should be noted that Cassady and Nachlas [6] and Clements [9] provided more efficient algorithms in this regard for their particular versions of these sampling plans. In addition, judging by its title, a paper by Singh et al. [28] may provide useful information in this regard, but unfortunately a copy could not be obtained in time to include in this review. As a closing observation, the method employed by Clements [7, 8, 9] could raise questions among practitioners. His method specified independent acceptance criteria for the marginally-conforming items (d_1) and nonconforming items (d_2) in the sample. Consider any case where c_2 > 0 and, for example, the sampling plan (n, c_1, c_2) = (20, 3, 1). If (d_1, d_2) = (4, 0), the lot would be rejected. However, if one of the marginally-conforming items were classified at a worse level (i.e., nonconforming) instead, the lot would be accepted. This is a questionable feature of this particular approach.
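A brute-force version of such a stepping-through search is sketched here (our own illustration): it uses the trinomial acceptance probability of Section 3.1, the example PRQ/CRQ vectors given above, and assumed 5%/10% risks.

```python
from math import comb

def pa(n, c12, c2, p1, p2):
    """Trinomial acceptance probability: accept iff d1 + d2 <= c12 and d2 <= c2."""
    p0 = 1 - p1 - p2
    return sum(comb(n, d1) * comb(n - d1, d2) * p1 ** d1 * p2 ** d2 * p0 ** (n - d1 - d2)
               for d2 in range(0, c2 + 1)
               for d1 in range(0, c12 - d2 + 1))

def find_plan(prq, crq, alpha=0.05, beta=0.10, n_max=200):
    """First (smallest-n) plan (n, c12, c2) meeting the PRQ/CRQ risk requirements, if any."""
    for n in range(1, n_max + 1):
        for c12 in range(0, n + 1):
            for c2 in range(0, c12 + 1):
                if (pa(n, c12, c2, *prq) >= 1 - alpha and
                        pa(n, c12, c2, *crq) <= beta):
                    return n, c12, c2
    return None

# the PRQ/CRQ pairs used as an example in the text
print(find_plan(prq=(0.10, 0.01), crq=(0.30, 0.04)))
```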
5 Case Studies - Legal Metrology Applications

5.1 Introduction
In Canada, acceptance sampling has been used in legal metrology applications for nearly four decades. One of its principal uses has been in the quality control of utility meters that measure electricity or natural gas supplied to consumers. By law, such meters must be inspected for conformance to specification requirements prior to use and be periodically inspected while in use. With few exceptions, due to the numerous utility companies in the country and their varied practices, the meters exist in the form of isolated lots for inspection purposes. The proportion of nonconforming meters in a lot has traditionally defined lot quality for utility meter sampling inspection purposes. Another principal application of acceptance sampling has been in the quality control of the net contents of packaged products sold in the marketplace. Such products include those sold on the basis of such measures as weight, volume, length, and area. In this particular application, products are also usually inspected on an isolated-lot basis for regulatory purposes. However, lot quality is usually measured on the basis of two criteria for such products: the proportion of nonconforming packages in the lot and the lot mean quantity. This section reviews Canadian quality control practices in these two areas of application, highlighting some of the deficiencies and issues. Three-class sampling plans are proposed as a possible solution to some of these deficiencies and issues.

5.2 Utility Meters

5.2.1 Control of In-service Quality
In-service utility meters have been inspected by sampling since the mid-1960s, when sampling inspection was first introduced on an experimental basis to monitor their in-service quality. From the outset, sampling by variables has been used for this purpose. The sampling plans were designed using the "combined control of double specification limits" method, with acceptability constants initially derived to provide 99% confidence that 99% of the items would be within specification limits (the confidence level would later be reduced to 95%). The range of nominal sample sizes used was 25(25)100(50)300, with the possibility of choosing a second sample equal to the size of the first. On the surface, the heart of the approach was in line with methods described in the statistical literature. However, in application, the approach was accompanied by arbitrary rules and practices that impaired its ability to ensure any uniformity in the quality control of these meters. Some of the more obvious deficiencies included:
- meters with extreme performance results were permitted to be designated as "outliers" and systematically excluded, based on a fixed percentage of the sample size;
- tests to determine whether the distribution of the meters' performance results satisfied the normality assumption were never performed;
- lot homogeneity criteria permitted mixtures, thus reducing the possibility of the normality assumption holding;
- a significant percentage of the sample was permitted to be designated as missing or damaged in transit without objective evidence or proper redress;
- univariate sampling plans were being applied to a product with correlated multivariate characteristics;
- individual meter performance results were averaged when only two characteristics were involved but kept separate for analytical purposes when three or more characteristics were involved;
- the provision for selecting a second sample was not based on proper double sampling methodology;
- the effects of measurement uncertainty were ignored.

In addition to these design and implementation-related deficiencies, which could have been corrected with policy and specification revisions, a very fundamental problem existed with the sampling-by-variables approach itself. As mentioned in the introduction, this particular application involves finite (isolated) lots. Beginning in 1990, research by von Collani [11, 12, 13], followed shortly afterwards by Hryniewicz [17] and Göb [14, 16], demonstrated that sampling by variables was not an appropriate means to control the quality of finite lots. Collani's [11] remark that when using a variables sampling plan for control of the lot proportion nonconforming "you never know your risks, unless you know completely the lot in question" (in which case sampling itself would be unnecessary) appropriately summarized the state of affairs, not only with the theoretical underpinnings of the method's use in general practice, but also regarding the effects of the deficiencies in implementation described above. Fortunately, the important contribution of these researchers is finally finding its way into sampling inspection standards and associated informative material, which should help prevent future misapplication. As stated by Göb [16], "attributes sampling always provides the best (most powerful) test for lot proportion nonconforming." Consequently, and especially in a legal metrology setting, a sampling-by-attributes plan is the appropriate method, not a sampling-by-variables plan, when attempting to control the lot proportion nonconforming. However, unless the lot size is very large, the proportion of the lot that must be inspected to achieve a criterion such as 95% confidence that 99% of the lot is within specification limits can be considerable. For example, in the case of an infinite lot size, the minimum necessary sample size and acceptance number (n, c_2) pair is (299, 0). Even for a finite lot size of 200, these minimum values are still considerable at (155, 0).
A possible attributes-based sampling method for this application that could reduce sampling costs is a three-class plan. Using the principles of the narrow limit gauging technique (see Schilling [25]), values for L1 and U1 can be determined such that control is focused primarily on the proportion of the lot within these tighter limits. Continuing with the example of a finite lot size of 200 and setting c2 = 0, if L1 and U1 were established such that p12 = 0.10, sampling plans (n, c1) such as (27, 0), (42, 1), and (55, 2) will maintain at least 95% confidence that 90% of the lot is within the narrow specification limits. Using the normal distribution as the narrow-limit model with 1% outside the original two-sided specification limits L2 and U2 (0.5% per side), the narrow limits L1 and U1 would be determined such that 10% of the lot would be outside these limits (5% per side). As the specification limits are symmetric around a target of 0% error (i.e., U2 = -L2), the narrow limits would be set at U1 = -L1 = (z_0.95/z_0.995) U2, approximately 0.64 U2, where z_p denotes the p-quantile of the standard normal distribution.
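As a quick numerical check of the plans quoted above (a sketch, not part of the original paper), the acceptance probabilities can be evaluated with the hypergeometric distribution for a lot of N = 200 that contains exactly 10% of items outside the narrow limits; the plans (27, 0), (42, 1), and (55, 2) are taken from the text, and the ratio U1/U2 under the normal model is recomputed as well.

```python
# Sketch: verify that the narrow-limit plans (n, c1) from the text keep the
# probability of accepting a lot with 10% outside the narrow limits below 5%.
# Assumes a lot of N = 200 with D = 20 such items (hypergeometric sampling).
from scipy.stats import hypergeom, norm

N, D = 200, 20                       # lot size, items outside the narrow limits
plans = [(27, 0), (42, 1), (55, 2)]  # (sample size n, acceptance number c1)

for n, c1 in plans:
    p_accept = hypergeom.cdf(c1, N, D, n)   # P(at most c1 such items in the sample)
    print(f"plan ({n}, {c1}): P(accept | 10% outside narrow limits) = {p_accept:.3f}")

# Narrow limits under the normal model of the text: 0.5% per side outside U2/L2,
# 5% per side outside U1/L1, symmetric about zero error.
ratio = norm.ppf(0.95) / norm.ppf(0.995)
print(f"U1 = {ratio:.3f} * U2")
```

All three plans give an acceptance probability just below 5%, which is consistent with the stated 95% confidence that 90% of the lot lies within the narrow limits.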
A drawback of this method is that, once the sample size is reduced, the risk of accepting lots with an unacceptable quantity of nonconforming units is increased, so this fact should be taken into consideration by studying a tabulation or operating characteristic contour of the sampling plan for various values of p1 and p2. A possible means to mitigate this effect in the case where c2 > 0 may be to reduce the values of the outer specification limits (using the same method described above) to the point where c2 would become 0 for the sample size being used and the desired probability of acceptance for the proportion outside these limits (in effect, this introduces a fourth class).

5.2.2 Control of Initial Quality
In the early 1970s, following the introduction of sampling inspection for the control of in-service quality, acceptance sampling was introduced for the control of the initial quality of new and reserviced utility meters prior to their installation and use. Again, a sampling-by-variables approach was adopted. However, in this case, MIL-STD-414 [29] was selected and sampling plans were based on an AQL of 1.0%. As in the case of the control of in-service meter quality, the use of these sampling plans was beset with problems. Some of these deficiencies included:
- lots were most frequently not produced in a manner that satisfied the required continuing-series assumption.
- tests to determine whether the distribution of the meters' performance results satisfied the normality assumption were never performed.
- the required switching rules were not applied and no possibility for discontinuation of sampling inspection existed.
- rejected lots could be resubmitted without evidence of corrective action.
- univariate sampling plans were being applied to a product with correlated multivariate characteristics.
- attribute-type qualitative characteristics were being evaluated using inappropriate sample sizes.
- the effects of measurement uncertainty were ignored.

To address the more serious problems above, ISO 2859-2 [20] is an obvious solution, as it is designed for isolated lots and does not assume that quality characteristics are normally distributed. The existing AQL of 1.0% would be translated to a limiting quality (LQ) of 3.15% per the standard's recommendation. As a result, sample sizes will necessarily be larger than those previously used. A possible means to reduce the required sample size would be to apply the narrow-limit approach described in 5.2.1 above. The new specification limits designating the marginally-conforming class could be defined using one of the other tabulated LQ values (e.g., LQ = 8.0%) in the isolated-lot sampling standard. The sample size and acceptance number associated with this LQ and a c2 value of 0 would complete the sampling plan. In addition, double or multiple sampling plans could be devised to further reduce sample sizes on average in good-quality scenarios.
5.3 Packaged Products

Sampling plans for the net contents of packaged products (also referred to as prepackaged commodities) are set out in the Canadian Weights and Measures Regulations. These sampling plans involve both attributes and variables criteria. An attributes sampling plan is used to control the proportion beyond a lower specification limit, and a hypothesis test is used to determine whether the lot mean is acceptably close to the declared quantity for the packages. In addition, a requirement is specified that no more than one unit may be below twice the lower specification limit, in effect creating a three-class attributes plan with two classes of nonconforming product. Unlike utility meters, packaged products are not inspected before entering the marketplace, and when they are inspected, the focus is on the available lot rather than the process that produced it. The objective of these sampling plans is not explicitly given, but it is apparent upon some analysis of the given details that the plans are very loosely based on ISO 2859-1 [19]. Evidently, two sampling plans (n, c2) based on an AQL of 2.5%, namely (32, 2) and (125, 7), were extracted from the standard and then several additional sampling plans were constructed around them (the range of sample sizes is 10(1)32(32)125 with a range of acceptance numbers of 0(1)7). The criterion regarding the sample mean is based on a t-test designed to accept the null hypothesis 99.5% of the time. Considering the general design of the sampling scheme and how the sampling plans are used in practice, several deficiencies are apparent:
- the wrong sampling standard is being applied, as lots are inspected in isolation from the production process.
- an incorrect hypothesis test for the mean is being used, as the lots are finite.
- tests to determine whether the distribution of the packages' net quantities satisfies the normality assumption of the t-test are not being performed.
- arbitrary sample sizes and acceptance numbers are being used rather than a coherent system of sampling plans.
- the effects of measurement uncertainty are being ignored.
- the sampling scheme design is overly biased towards the producer and lacks uniformity in consumer protection.

Most of these problems could be eliminated by using a more appropriate sampling plan standard such as ISO 2859-2 [20]. If an AQL of 2.5% could be accepted in the "continuing series of lots" sense, an LQ of 8.0% would be appropriate according to the isolated-lot sampling standard. In addition, Göb [15, 16], for example, has given the proper test of significance for the mean of a finite lot under the normality assumption, and this test has been known for several decades. Despite readily available solutions, even OIML's draft recommendation [21] on this subject suffers from the same problems mentioned above, with the exception that only three individual sampling plans have been proposed. A possible alternative that would eliminate the need to make any distributional assumptions would be to use a three-class sampling plan approach where the marginally-conforming class would consist of packages with net quantities less than the declared quantity but greater than or equal to the lower specification limit. Since the t-test for the lot mean has an analogous distribution-free alternative in the well-known sign test (see, for example, Randles [24]), this hypothesis test can be applied to determine the acceptability of the distribution's location or central tendency. As the test simply involves counting the number of packages in the sample with actual quantities less than the declared quantity, it can be readily integrated within the sampling plan design. An example of a possible three-class sampling plan (n, c12, c2) would be (50, 30, 1), based on a lot size from 501 to 1200 per [20]. The value c12 was determined using the binomial distribution plan (50, 30) to provide approximately a 95% probability of acceptance when the proportion of the lot above the declared quantity is 0.50. Such an approach has the added advantages of being readily applicable to net contents that are variable within a homogeneity classification as well as being transparent to the user. It should also be noted that if the criterion regarding a package exceeding twice the lower specification limit were maintained, technically a four-class sampling plan would result.
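The sign-test reading of c12 = 30 can be verified directly; the short sketch below (not part of the original paper) evaluates the binomial plan (50, 30) at a lot proportion of 0.50 below the declared quantity.

```python
# Sketch: acceptance probability of the binomial plan (50, 30) when half of the
# lot is below the declared quantity, i.e. the sign-test interpretation of c12 = 30.
from scipy.stats import binom

n, c12 = 50, 30
p_below = 0.50                      # proportion of packages below the declared quantity
p_accept = binom.cdf(c12, n, p_below)
print(f"P(at most {c12} of {n} packages below declared quantity) = {p_accept:.3f}")
# Prints roughly 0.94, i.e. approximately the 95% acceptance probability cited.
```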
6 Conclusion

This paper has reviewed key contributions to three-class sampling plan development, offered some minor extensions to this work, and considered some applications that could potentially benefit from their implementation. Three-class sampling plans provide viable alternatives to the much more commonly used two-class sampling plans, offering greater ability to discriminate between lots of acceptable and unacceptable quality as well as offering economy in
inspection for similar control of quality. Their addition to the set of tools currently available to quality practitioners should be beneficial in quality control and improvement activities. With the recent emphasis on quality concepts such as "continuous improvement" and "six sigma" (where quality levels in parts per million are being sought), three-class sampling plans may find many useful applications. The development of generic international standards for three-class sampling plans would aid in making this tool more generally available to potential users. In this regard, the work of Clements [7, 8, 9] could serve as a good foundation for creating a three-class counterpart to ISO 2859-1 [19] and ISO 2859-2 [20]. As a by-product of this review, another area that could benefit from more research, and possibly from the exposure of international standardization, is recommended methodologies for multiple classification or grading of lots as a result of sampling inspection. Bebbington et al. [3] observed that current sampling inspection standards do not address this reality. Von Collani also brought to my attention some of his work [10] in this regard. These authors' research could serve as a foundation for developing a flexible methodology for the multiple classification of lots. Finally, and more generally, there are many areas in both government and industry where standardized statistical quality control methods (such as ISO standards) are applied outside their intended scope or modified in uninformed ways by users. In many cases, more appropriate solutions already exist or could be readily developed upon analysis of the control activity's objective and setting. However, unless formal reviews are conducted, opportunities for improvement may not be identified and ineffective practices may go uncorrected for years. In the case of regulatory applications in particular, interested researchers could assist by initiating such reviews as research topics. This would contribute not only to the advancement of science but also to more efficient rules and regulations and, hopefully, reduced costs to society.
Acknowledgements

I would like to thank the following individuals for their support and contributions to the development of this paper: Mr. David Baillie (Chair, ISO TC69/SC5) for inviting and supporting the topic of this paper; Prof. Dr. Elart von Collani (Universität Würzburg) for supplying reference material as well as general support for the topic of this paper; Dr. Pat Newcombe-Welch (University of Waterloo) for kindly providing me with her M.Sc. thesis [5]; and Dr. K. Govindaraju (Massey University) for kindly providing me with his paper [3] as well as identifying additional references [27, 28]. I would also like to thank Mr. Baillie, Dr. von Collani, Dr. Govindaraju, Dr. Olgierd Hryniewicz (Polish Academy of Sciences), Dr. Alvin Rainosek (University of South Alabama), and Mr. Henry Telfser (Measurement Canada) for providing helpful feedback on an earlier draft of this paper. Finally, I would like to thank an anonymous referee for identifying several editorial improvements and generally adding value to this final version of the paper.
References

1. Ahmed, F.E., Ed. (1991). Seafood Safety. National Academy Press, Washington.
2. Baillie, D.H. (1987). Multivariate acceptance sampling. In: Frontiers in Statistical Quality Control 3, Lenz, H.J. et al., Eds., Physica-Verlag, Heidelberg, 83-115.
3. Bebbington, M., Govindaraju, K., DeSilva, N., and Volz, R. (2000). Acceptance sampling and practical issues in procurement inspection of apples. ASA Proceedings of the Section on Quality and Productivity, 104-109.
4. Bray, D.F., Lyon, D.A., and Burr, I.W. (1973). Three Class Attributes Plans in Acceptance Sampling. Technometrics, Vol. 15, No. 3, 575-585.
5. Brown, P.A. (1984). A Three-Class Procedure for Acceptance Sampling by Variables. Unpublished M.Sc. Thesis, University of Guelph, Department of Mathematics and Statistics, Guelph, Ontario, Canada.
6. Cassady, C.R. and Nachlas, J.A. (2003). Evaluating and Implementing 3-Level Acceptance Sampling Plans. Quality Engineering, Vol. 15, No. 3, 361-369.
7. Clements, J.A. (1979). Three-Class Attribute Sampling Plans. ASQC Technical Conference Transactions - Houston, 264-269.
8. Clements, J.A. (1980). Three-Class Sampling Plans - Continued. ASQC Technical Conference Transactions - Atlanta, 475-482.
9. Clements, J.A. (1983). Trinomial Sampling Plans to Match MIL-STD-105D. ASQC Quality Congress Transactions - Boston, 256-264.
10. v. Collani, E. and Schmidt, R. (1988). Economic Attribute Sampling in the Case of Three Possible Decisions. Technical Report of the Würzburg Research Group on Quality Control, No. 13, September.
11. v. Collani, E. (1990). ANSI/ASQC Z1.4 versus ANSI/ASQC Z1.9. Economic Quality Control, Vol. 5, No. 2, 60-64.
12. v. Collani, E. (1991). A Note on Acceptance Sampling for Variables. Metrika, Vol. 38, 19-36.
13. v. Collani, E. (1992). The Pitfall of Acceptance Sampling by Variables. In: Frontiers in Statistical Quality Control 4, Lenz, H.J. et al., Eds., Physica-Verlag, Heidelberg, 91-99.
14. Göb, R. (1996). An Elementary Model of Statistical Lot Inspection and its Application to Sampling by Variables. Metrika, Vol. 44, 135-163.
15. Göb, R. (1996). Tests of Significance for the Mean of a Finite Lot. Metrika, Vol. 44, 223-238.
16. Göb, R. (2001). Methodological Foundations of Statistical Lot Inspection. In: Frontiers in Statistical Quality Control 6, Lenz, H.J. and Wilrich, P.-Th., Eds., Physica-Verlag, Heidelberg, 3-24.
17. Hryniewicz, O. (1991). A Note on E. v. Collani's Paper "ANSI/ASQC Z1.4 versus ANSI/ASQC Z1.9". Economic Quality Control, Vol. 6, No. 1, 16-18.
18. ICMSF (1986). Sampling for microbiological analysis: principles and specific applications, 2nd ed. Microorganisms in Foods, book 2. University of Toronto Press, Toronto.
19. International Organization for Standardization (1999). ISO 2859-1:1999. Sampling procedures for inspection by attributes - Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection. ISO, Geneva.
20. International Organization for Standardization (1985). ISO 2859-2:1985. Sampling procedures for inspection by attributes - Part 2: Sampling plans indexed by limiting quality (LQ) for isolated lot inspection. ISO, Geneva.
21. International Organization of Legal Metrology (2002). OIML R 87 Net Quantity of Product in Prepackages, 3rd Committee Draft Revision. OIML, Paris.
22. Newcombe, P.A. and Allen, O.B. (1988). A Three-Class Procedure for Acceptance Sampling by Variables. Technometrics, Vol. 30, No. 4, 415-421.
23. Owen, D.B. (1965). A Special Case of the Bivariate Non-Central t-Distribution. Biometrika, Vol. 52, 437-446.
24. Randles, R.H. (2001). On Neutral Responses (Zeros) in the Sign Test and Ties in the Wilcoxon-Mann-Whitney Test. The American Statistician, Vol. 55, No. 2, 96-101.
25. Schilling, E.G. (1982). Acceptance Sampling in Quality Control. Marcel Dekker, New York.
26. Shah, D.K. and Phatak, A.G. (1977). The Maximum Likelihood Estimation Under Curtailed Three Class Attributes Plans. Technometrics, Vol. 19, No. 2, 159-166.
27. Shah, D.K. and Phatak, A.G. (1978). The maximum likelihood estimation under multiple three class attributes plan - curtailed as well as uncurtailed. Metron, 36, 99-118.
28. Singh, H.R., Sankar, G., and Chatterjee, T.K. (1991). Procedures and tables for construction and selection of three class attributes sampling plans. IAPQR Transactions, Journal of the Indian Association for Productivity, Quality & Reliability, 16, 19-22.
29. U.S. Department of Defense (1957). MIL-STD-414: Military Standard: Sampling Procedures and Tables for Inspection by Variables for Percent Defective. U.S. Government Printing Office, Washington.
30. U.S. Department of Defense (1963). MIL-STD-105D: Military Standard: Sampling Procedures and Tables for Inspection by Attributes. U.S. Government Printing Office, Washington.
Part 2 On-line Control
2.2 Control Charts
CUSUM Control Schemes for Multivariate Time Series

Olha Bodnar and Wolfgang Schmid

Department of Statistics, European University, PO Box 1786, 15207 Frankfurt (Oder), Germany
obodnar@euv-frankfurt-o.de
schmid@euv-frankfurt-o.de
Summary. Up to now only a few papers (e.g., Theodossiou (1993), Kramer and Schmid (1997)) dealt with the problem of detecting shifts in the mean vector of a multivariate time series. Here we generalize several well-known CUSUM charts for independent multivariate normal variables (Crosier (1988), Pignatiello and Runger (1990), Ngai and Zhang (2001)) to stationary Gaussian processes. We consider both modified control charts and residual charts. It is analyzed under which conditions the average run lengths of the charts do not depend on the covariance matrix of the white noise process. In an extensive Monte Carlo study these schemes are compared with the multivariate EWMA chart of Kramer and Schmid (1997). The underlying target process is a vector autoregressive moving average process of order (1,l). For measuring the performance of a control chart the average run length is used.
1 Introduction

There are many cases in which an analyst is interested in the surveillance of several characteristic quantities. If, for instance, we consider a portfolio of stocks, then the investor wants to have a hint about possible changes in the return behavior of the stocks. By adjusting his portfolio at an early stage he avoids losing money in a bear market and he is able to increase his profit in a bull market. The second example is taken from environmental statistics. In order to measure the air pollution a lot of characteristics are monitored. The early detection of an increase in the pollution helps to reduce any health risks. Such problems can be found in many areas of economics, engineering, natural sciences, and medicine. Of course it is always possible to reduce the surveillance problem to a univariate one by considering each individual process characteristic separately. But then all information about the dependence structure of the quantities is lost. This approach will not lead to a powerful algorithm. For that reason it is necessary to jointly monitor all process characteristics.

Most of the literature about the surveillance of multivariate observations is based on the assumption of independent and normally distributed samples. Moreover, it mainly deals with the detection of mean shifts. The first control chart for multivariate data was proposed by Hotelling (1947) (see also Alt (1985), Alt and Smith (1988)). It is known as the multivariate Shewhart control chart (T²- or χ²-chart) and it is designed for detecting shifts in the mean vector of independent multivariate normally distributed random vectors. This control scheme is based on the Mahalanobis distance between the vector of observations and the target mean vector of the process. The interpretation of the out-of-control signals is discussed in detail by Murphy (1987), Mason et al. (1995), Runger et al. (1996), and Timm (1996). An extension of the exponentially weighted moving average (EWMA) chart of Roberts (1959) to multivariate observations was given by Lowry et al. (1992). Each component is weighted by its own smoothing parameter. Instead of one smoothing parameter the multivariate EWMA chart (MEWMA) works with a smoothing matrix. The distance between the multivariate EWMA recursion and its target value is measured by the Mahalanobis distance. The underlying observations are assumed to be independent and normally distributed. The generalization of the cumulative sum (CUSUM) chart of Page (1954) has turned out to be more difficult. Several proposals have been made in the literature (e.g., Woodall and Ncube (1985), Healy (1987), Crosier (1988), Pignatiello and Runger (1990), Hawkins (1991, 1993)). More recently, Ngai and Zhang (2001) proposed a projected pursuit CUSUM (PPCUSUM) control chart, which is an extension of the multivariate CUSUM procedure proposed by Healy (1987).

In many applications the underlying data-generating process possesses a more complicated structure. Alwan (1989) analyzed many data sets and showed that the independence assumption is frequently not fulfilled. In the last 15 years a great number of publications in statistical process control have treated the topic of control charts for time series. A natural procedure is to determine the design of the control scheme in the same way as for independent samples but with respect to the time series structure. This is the main idea behind the modified control charts (e.g., Vasilopoulos and Stamboulis (1978), Schmid (1995, 1997a,b)). The second procedure is to transform the original data to independent ones. This leads to the residual charts (e.g., Alwan and Roberts (1988), Montgomery and Mastrangelo (1991), Lu and Reynolds (1999)).

Up to now there are only a few papers dealing with the surveillance of multivariate time series. Using the sequential probability ratio method of Wald (1947), Theodossiou (1993) derived a multivariate CUSUM control chart for vector autoregressive moving average (VARMA) processes. It is based on knowledge of the size and the direction of the shift. It has been shown that this chart reacts very quickly to deviations from the chosen direction. Runger
(1996) considered a model which is related to the method of principal component analysis. Based on a principal component decomposition a control statistic is presented. An extension of the MEWMA chart to time series was given by Kramer and Schmid (1997). They proposed modified and residual control charts for the detection of shifts within correlated components. In a recent paper Śliwa and Schmid (2005) proposed several control charts for the surveillance of the correlation structure between the components of a multivariate time series. However, in this paper we shall focus on mean shifts.

The remainder of the paper is organized as follows. In the next section (Section 2) a model for the description of the out-of-control situation is introduced. In Section 3 modified multivariate CUSUM control charts are derived. The control charts of Crosier (1988), Pignatiello and Runger (1990), and Ngai and Zhang (2001) are extended to a multivariate stationary time series. In Section 3.5 an invariance property of these control schemes is considered. Several CUSUM control schemes for residuals are introduced in Section 4. In Section 5 the results of a simulation study are presented. Based on the average run length (ARL) all control charts are compared with each other. Moreover, we include the MEWMA scheme of Kramer and Schmid (1997). Section 6 contains concluding remarks.
2 Model

One of the main aims of statistical process control is to monitor whether the observed process (the actual process) coincides with a target process. In engineering the target process is equal to the process which fulfills the quality requirements. In economics it is obtained by fitting a model to a previous data set. In the following {Y_t} stands for the target process. Let {Y_t} be a p-dimensional time series and let Y_t = (Y_{t1}, ..., Y_{tp})'. In the rest of the paper it is always assumed that {Y_t} is a (weakly) stationary process with mean vector μ_0 = (μ_{01}, ..., μ_{0p})' and cross-covariance matrix

Γ(h) = E((Y_{t+h} - μ_0)(Y_t - μ_0)') = [E((Y_{t+h,i} - μ_{0i})(Y_{t,j} - μ_{0j}))]_{i,j=1}^p = [γ_{ij}(h)]_{i,j=1}^p

at lag h. In practice it is necessary to estimate the parameters of the target process. We do not want to discuss the influence of parameter estimation on the power of a control scheme. This was done in the literature by, e.g., Kramer and Schmid (2000). Thus, we assume that the process mean vector μ_0 and the cross-covariance matrices Γ(h), h ≥ 0, are known.

Suppose that the data x_1, x_2, ... are sequentially taken. The data are assumed to be a realization of the observed process {X_t}. Here our aim is to monitor the mean behavior of the observed process. The observed process may deviate from the target process by a mean change. It is assumed in the following that

X_t = Y_t + a 1_{{q, q+1, ...}}(t),    (1)

where a = (a_1 √γ_11(0), ..., a_p √γ_pp(0))' ∈ ℝ^p. The symbol 1_A(t) denotes the indicator function of the set A at point t. The values a_1, ..., a_p ∈ ℝ and q ∈ ℕ are unknown quantities. If a ≠ 0 we say that a change point at time q is present. a describes the size and the direction of the change. In case of no change the target process coincides with the observed process. We say that the observed process is in control. Else, it is said to be out of control. The in-control mean vector μ_0 is called the target value. Note that X_t = Y_t for t ≤ 0, i.e. both processes are the same up to time point 0. For determining the control design only the target process is of relevance. It is assumed that {Y_t} is a p-dimensional stationary Gaussian process. The relationship between the observed and the target process has to be clarified in order to obtain statements about the performance of the control charts (Section 5). This is done by applying model (1). In the following E_{a,q} denotes the expectation taken with respect to this model. E_0 means that no shift is present. By analogy the notations Cov_{a,q}, Cov_0, etc. are used.
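To make the change-point model (1) concrete, the sketch below (not from the paper) simulates a two-dimensional in-control Gaussian VAR(1) process, a simple special case of the VARMA(1,1) target process used later, and adds a standardized shift a from time q onwards; all parameter values are illustrative.

```python
# Sketch of model (1): X_t = Y_t + a * 1_{t >= q}, with {Y_t} a stationary
# two-dimensional Gaussian VAR(1) process (a special case of VARMA(1,1)).
import numpy as np

rng = np.random.default_rng(1)
p, T, q = 2, 300, 151                 # dimension, series length, change point
Phi = np.diag([0.3, 0.4])             # illustrative AR coefficient matrix
Sigma = np.array([[1.0, 0.5],         # illustrative white-noise covariance
                  [0.5, 1.0]])

# Stationary lag-0 covariance of a VAR(1): Gamma(0) = Phi Gamma(0) Phi' + Sigma,
# solved here by vectorization.
G0 = np.linalg.solve(np.eye(p * p) - np.kron(Phi, Phi), Sigma.ravel()).reshape(p, p)

# The shift a is expressed in units of the process standard deviations, as in (1).
a_std = np.array([1.0, 0.5])
a = a_std * np.sqrt(np.diag(G0))

Y = np.zeros((T + 1, p))              # Y_0 = 0 for simplicity; a burn-in would give a stationary start
eps = rng.multivariate_normal(np.zeros(p), Sigma, size=T + 1)
for t in range(1, T + 1):
    Y[t] = Phi @ Y[t - 1] + eps[t]

X = Y[1:].copy()
X[q - 1:] += a                        # observations X_t, shifted from time q onwards
```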
3 Modified Multivariate CUSUM Charts

There are two different ways of constructing control charts for multivariate time series. One can distinguish between modified schemes and residual charts. Modified control charts are based on the direct analysis of the observed process. The control design is determined in the same way as in the independent case, however, taking into account the time series structure of the underlying process. The main idea behind the residual schemes is to transform the original observations to independent ones in the in-control state. Then the multivariate CUSUM control charts can be applied to the transformed series.

For independent multivariate observations several different types of CUSUM control charts were obtained by, e.g., Crosier (1988), Pignatiello and Runger (1990), and Ngai and Zhang (2001). Next we want to extend these schemes to time series. The main idea of our approach is to standardize X_t by Γ(0)^{-1/2}(X_t - μ_0) instead of Σ^{-1/2}(X_t - μ_0), as is done for independent samples with covariance matrix Σ. If the sequence of random variables {Y_t} is independent and identically distributed then the modified control schemes reduce to those for independent variables. Thus the modified charts introduced in this section are extensions of the well-known schemes.

3.1 Modified MC1

Pignatiello and Runger (1990) proposed two types of multivariate CUSUM charts, MC1 and MC2. The MC2 approach will be discussed in Section 3.2. Let S_{l,m} = ∑_{i=l+1}^m (X_i - μ_0) for m > l ≥ 0. The control chart is constructed by defining the MC1 statistic as

MC1_t = max{0, ||S_{t-n_t,t}||_Σ - k n_t}.

For an arbitrary vector x and a positive definite matrix C the norm ||x||_C is defined as ||x||_C = (x'C^{-1}x)^{1/2}. n_t is equal to the number of observations taken since the last restart of the CUSUM chart, i.e.

n_t = n_{t-1} + 1 if MC1_{t-1} > 0, and n_t = 1 otherwise,    (2)

for t ≥ 1 with MC1_0 = 0. If MC1_t exceeds an upper control limit then the process is considered to be out of control. For independent samples the ARL of the control chart is directionally invariant, i.e. it depends on μ and Σ only through the non-centrality parameter λ = ||μ - μ_0||_Σ. Because the MC1 chart is not derived from the sequential probability ratio test of Wald (1947), like the univariate CUSUM chart of Page (1954), no statement about the choice of k can be made. Nevertheless, Pignatiello and Runger (1990) proposed to apply the same strategy as in the univariate case, i.e. to choose k = ||μ_1 - μ_0||_Σ / 2, where μ_1 is a specified off-target state. For that reason it seems to be possible to improve the performance of the MC1 chart at selected off-target conditions by taking alternative choices of k into account.

Next we want to extend the idea of Pignatiello and Runger (1990) to time series. If {Y_t} is a p-dimensional stationary process with mean vector μ_0 and cross-covariance matrix Γ(0) at lag 0, then the norm of the vector S_{t-n_{m,t},t} is calculated with respect to the matrix Γ(0). Consequently the control chart for time series gives a signal as soon as, for t ≥ 1,

MC1_{m,t} = max{0, ||S_{t-n_{m,t},t}||_{Γ(0)} - k n_{m,t}} > h_1,

where h_1 is a given constant. Here the index m is used to indicate that a modified chart is present. n_{m,t} is defined by analogy to (2) but with respect to MC1_{m,t-1}. Thus, ||S_{t-n_{m,t},t}||_{Γ(0)} is again a distance of Mahalanobis type. The constant h_1 is chosen to achieve a given in-control average run length ξ.
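As a minimal sketch (not from the paper), the modified MC1 recursion can be implemented as below; Γ(0), μ_0, the reference value k, and the control limit h_1 are assumed to be supplied by the user.

```python
# Sketch: modified MC1 statistic run over observations X_t, using the norm
# ||x||_C = sqrt(x' C^{-1} x) with C = Gamma(0), as described in Section 3.1.
import numpy as np

def run_modified_mc1(X, mu0, Gamma0, k, h1):
    """Return the first time index (1-based) at which MC1_{m,t} > h1, or None."""
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    Ginv = np.linalg.inv(Gamma0)
    mc1, n = 0.0, 0
    cumsum = np.zeros_like(mu0)
    for t, x in enumerate(X, start=1):
        if mc1 > 0:                    # continue the current CUSUM segment
            n += 1
            cumsum += x - mu0
        else:                          # restart: n_t = 1, sum over the current observation only
            n = 1
            cumsum = x - mu0
        norm = np.sqrt(cumsum @ Ginv @ cumsum)
        mc1 = max(0.0, norm - k * n)
        if mc1 > h1:
            return t
    return None
```

In practice h_1 would be calibrated by simulation so that the in-control ARL attains the prescribed value ξ (see Section 5).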
3.2 Modified MC2

Pignatiello and Runger (1990) introduced a further control chart. It makes use of the square of the distance between the t-th observation and its target value μ_0, i.e. D_t² = (X_t - μ_0)'Σ^{-1}(X_t - μ_0). For independent and normally distributed samples D_t² has a χ²-distribution with p degrees of freedom and non-centrality parameter λ = ||μ - μ_0||²_Σ. The chart is based on the statistic

MC2_t = max{0, MC2_{t-1} + D_t² - p - k}    (4)

with MC2_0 = 0. A recommendation about the choice of k follows from the derivation of the univariate CUSUM chart from Wald's sequential probability ratio test. Because E(D_t² - p) = λ, it is recommended to take k equal to λ/2 in the MC2 chart. The difference between MC2 and the COT scheme proposed by Crosier (1988) is that the latter scheme is based on D_t instead of D_t².

We derive a modified MC2 control chart by taking into account the covariance matrix of the time series. Thus we make use of

D_{m,t}² = (X_t - μ_0)'Γ(0)^{-1}(X_t - μ_0).

For stationary Gaussian processes D_{m,t}² is again χ²-distributed with p degrees of freedom and non-centrality parameter ||μ - μ_0||²_{Γ(0)}. The CUSUM recursion is the same as in (4). A signal is given at time t if MC2_{m,t} exceeds a given control limit.
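Relative to MC1, the modified MC2 scheme only changes the recursion; a short sketch follows (not from the paper, with the reference value k and the control limit h assumed given).

```python
# Sketch: modified MC2 recursion MC2_t = max(0, MC2_{t-1} + D_t^2 - p - k),
# with D_t^2 the Mahalanobis distance of X_t from mu0 with respect to Gamma(0).
import numpy as np

def run_modified_mc2(X, mu0, Gamma0, k, h):
    """Return the first time index (1-based) at which MC2_{m,t} > h, or None."""
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    Ginv = np.linalg.inv(Gamma0)
    p = len(mu0)
    mc2 = 0.0
    for t, x in enumerate(X, start=1):
        d2 = (x - mu0) @ Ginv @ (x - mu0)
        mc2 = max(0.0, mc2 + d2 - p - k)
        if mc2 > h:
            return t
    return None
```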
3.3 Modified Vector-Valued CUSUM

Crosier (1988) proposed two multivariate control charts for independent samples. The first one, based on the shrinking method, generalizes the univariate proposal of Crosier (1986) to the multivariate situation by replacing the scalar quantities in the univariate CUSUM recursion by vectors. This would lead to S_t = max{0, S_{t-1} + (X_t - μ_0) - k}. However, there are two problems with this expression: first, how the maximum over vectors has to be understood, and second, how k has to be chosen. k must have the same direction as S_{t-1} + (X_t - μ_0). Its length is equal to k = (k'Σ^{-1}k)^{1/2}. The vector S_{t-1} + (X_t - μ_0) is shrunk toward 0 by the vector k (S_{t-1} + (X_t - μ_0))/C_t, where C_t is the length of S_{t-1} + (X_t - μ_0), i.e.

C_t = ||S_{t-1} + (X_t - μ_0)||_Σ.

The constant k > 0 plays the role of a one-dimensional reference value. Hence the multivariate CUSUM recursion is defined in the following way:

S_t = 0 if C_t ≤ k,
S_t = (S_{t-1} + X_t - μ_0)(1 - k/C_t) if C_t > k,    (5)

for t ≥ 1 with S_0 = 0. The scheme gives an out-of-control signal as soon as the length of the vector S_t, i.e.

MCUSUM_t = ||S_t||_Σ = max{0, C_t - k},

exceeds a preselected control limit. Crosier (1988) proved that the distribution of the run length depends on the mean vector μ and the covariance matrix Σ only through the non-centrality parameter λ = ||μ - μ_0||_Σ, i.e. it is directionally invariant. Concluding, the idea behind his MCUSUM scheme is first to update the vector of cumulative sums, then to shrink it toward 0, and finally to use the length of the updated and shrunken CUSUM to test whether or not the process is out of control. The choice of the reference value k = λ/2 is recommended to detect any shift in the mean vector.

The extension of Crosier's MCUSUM control chart to time series is obtained in a similar way as for the MC1 and MC2 schemes. At time t we calculate C_t with respect to the covariance matrix Γ(0), i.e.

C_{m,t} = ||S_{m,t-1} + (X_t - μ_0)||_{Γ(0)}.

Then the cumulative vector of sums S_{m,t} is calculated with respect to the new quantity C_{m,t}. This means that

S_{m,t} = 0 if C_{m,t} ≤ k, and S_{m,t} = (S_{m,t-1} + X_t - μ_0)(1 - k/C_{m,t}) if C_{m,t} > k,    (6)

for t ≥ 1 with S_{m,0} = 0. Finally the control scheme signals at time t if

MCUSUM_{m,t} = max{0, C_{m,t} - k} > h_3.    (7)
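The modified MCUSUM recursion (5)-(7) translates directly into code; the sketch below (not from the paper) again assumes Γ(0), μ_0, k, and h_3 are given.

```python
# Sketch: Crosier-type modified MCUSUM for a multivariate time series.
# S_{m,t} is shrunk toward 0 by the reference value k; a signal is given when
# max(0, C_{m,t} - k) exceeds the control limit h3.
import numpy as np

def run_modified_mcusum(X, mu0, Gamma0, k, h3):
    """Return the first time index (1-based) with MCUSUM_{m,t} > h3, or None."""
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    Ginv = np.linalg.inv(Gamma0)
    S = np.zeros_like(mu0)
    for t, x in enumerate(X, start=1):
        v = S + (x - mu0)
        C = np.sqrt(v @ Ginv @ v)      # C_{m,t} = ||S_{m,t-1} + (X_t - mu0)||_{Gamma(0)}
        if C <= k:
            S = np.zeros_like(mu0)
        else:
            S = v * (1.0 - k / C)
        if max(0.0, C - k) > h3:
            return t
    return None
```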
3.4 Modified Projected Pursuit CUSUM

Another extension of the CUSUM chart is based on the idea of projection pursuit. It was proposed by Ngai and Zhang (2001) for independent vectors. Here it will be denoted as PPCUSUM. For a given direction a_0 with ||a_0||_2 = 1 (Euclidean norm) we define the CUSUM statistic

C_0^{a_0} = 0,  C_t^{a_0} = max{0, C_{t-1}^{a_0} + a_0'Σ^{-1/2}(X_t - μ_0) - k},  t ≥ 1.    (8)

Pollak (1985) and Moustakides (1986) showed that the univariate CUSUM chart possesses certain optimality properties. If the direction of the shift were known in the multivariate context then the CUSUM chart based on the projected observations a'X_t would reflect this desirable behavior. The problem consists in the fact that the direction is unknown and therefore the statistic cannot be directly applied. Ngai and Zhang (2001) proposed to solve this problem by estimating a_0 by â_0 and approximating C_t^{a_0} by C_t^{â_0}. Here â_0 is the value at which C_t^a attains its maximum on the unit circle, i.e. C_t^{â_0} = max_{||a||_2=1} C_t^a. They proved that max_{||a||_2=1} C_t^a = PPCUSUM_t with

PPCUSUM_t = max{0, ||S_{t-1,t}||_Σ - k, ||S_{t-2,t}||_Σ - 2k, ..., ||S_{0,t}||_Σ - tk}    (9)

for t ≥ 1, with S_{t-j,t} as in Section 3.1. The control scheme gives a signal as soon as PPCUSUM_t is sufficiently large. If the process is concluded to be out of control at time t_0 then there is t_1 < t_0 such that

||S_{t_1,t_0}||_Σ - (t_0 - t_1) k = max_{||a||_2=1} C_{t_0}^a > h.

The direction of the shift is estimated by the maximizing direction â_0 = Σ^{-1/2} S_{t_1,t_0} / ||S_{t_1,t_0}||_Σ.

Because the process should be normalized with respect to the covariance matrix of the time series, Σ has to be replaced by Γ(0) in (8), and thus the norm in (9) is taken with respect to Γ(0), i.e. ||·||_{Γ(0)} is always used. Hence the modified control chart gives an alarm if

PPCUSUM_{m,t} = max{0, ||S_{t-1,t}||_{Γ(0)} - k, ||S_{t-2,t}||_{Γ(0)} - 2k, ..., ||S_{0,t}||_{Γ(0)} - tk} > h_4,

where h_4 is a given constant. The constant h_4 is again chosen to achieve the fixed in-control average run length.
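A direct, if naive, implementation of the modified PPCUSUM alarm rule is sketched below (not from the paper); it recomputes the partial sums S_{t-j,t} at every step, which is adequate for moderate series lengths, and assumes Γ(0), μ_0, k, and h_4 are given.

```python
# Sketch: modified PPCUSUM statistic, i.e. the maximum over j of
# ||S_{t-j,t}||_{Gamma(0)} - j*k, compared with the control limit h4.
import numpy as np

def run_modified_ppcusum(X, mu0, Gamma0, k, h4):
    """Return the first time index (1-based) with PPCUSUM_{m,t} > h4, or None."""
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    Ginv = np.linalg.inv(Gamma0)
    devs = []                                   # deviations X_i - mu0, i = 1, ..., t
    for t, x in enumerate(X, start=1):
        devs.append(x - mu0)
        S = np.zeros_like(mu0)
        stat = 0.0
        for j in range(1, t + 1):               # S_{t-j,t} = sum of the last j deviations
            S = S + devs[t - j]
            stat = max(stat, np.sqrt(S @ Ginv @ S) - j * k)
        if stat > h4:
            return t
    return None
```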
3.5 Invariance Property of the Modified MC1, MC2, MCUSUM, and PPCUSUM Chart

The control schemes derived in Sections 3.1-3.4, i.e., the modified MC1 and MC2 schemes, the modified MCUSUM chart, and the modified PPCUSUM method, are rather general and can be applied to a large family of stochastic processes. On the other side, it is useful to know properties of the control charts which one can use to simplify the calculation of the control design. One of the most important properties of these control charts in the independent case is the invariance of the in-control average run length with respect to Σ. However, it is rather difficult to establish such a property for time series data. That is why in this section we restrict ourselves to the case of a stationary VARMA(1,1) process {Y_t}, i.e., {Y_t} is a stationary solution of the stochastic difference equation

Y_t - μ_0 = Φ(Y_{t-1} - μ_0) + ε_t - Θ ε_{t-1}.

Assuming that det(I - Φz) has no zeros on and inside the unit circle there exists a unique stationary solution. This model has turned out to be of great importance in many practical applications.

Theorem 1. Let Y_t be a stationary p-dimensional VARMA(1,1) process. Let Φ be a matrix such that det(I - Φz) has no zeros on and inside the unit circle. Suppose that {ε_t} are independent and normally distributed with zero mean vector and positive definite covariance matrix Σ = [σ_ij]_{i,j=1}^p. Then the distributions of the run lengths of the modified MC1, modified MC2, modified MCUSUM, and modified PPCUSUM control charts do not depend on Σ in the in-control state if ΦΣ and ΘΣ are symmetric matrices.

The proof of the Theorem is given in the appendix. Note that for a symmetric matrix Φ it only holds that ΦΣ is symmetric for all positive definite matrices Σ if Φ = φI for some φ ∈ (-1, 1). Thus the charts are in general not directionally invariant.
4 Multivariate CUSUM Residual Charts

In the recent literature there are many papers about residual charts. In most of these papers the mean behavior of a time series is monitored (e.g., Alwan and Roberts (1988), Kramer and Schmid (1997), Lu and Reynolds (1999)). The idea behind the residual charts is to transform the original data to new variables which are independent. Next the CUSUM charts for independent values presented above are applied to the transformed quantities, the residuals.

The process residuals are determined as the difference between the observation at time t and a predictor for this value. For calculating the predictor of X_t it is frequently assumed that the whole history of the process up to -∞ is known. But in practical applications such an assumption is not realistic. Following Kramer and Schmid (1997) we want to predict X_t based on the random vectors X_1, ..., X_{t-1}. This is done under the assumption that the process is in control. In other words, X̂_t = f(X_1, ..., X_{t-1}), where f(y_1, ..., y_{t-1}) is the best linear predictor of Y_t given that Y_1 = y_1, ..., Y_{t-1} = y_{t-1}. In the comparison study of Section 5 we assume that the target process {Y_t} is a stationary VARMA(1,1) process. Then from Brockwell and Davis (1991, ch. 11) it follows that the best linear predictor is given by

X̂_1 = 0,  X̂_t = Φ X_{t-1} - Θ_t (X_{t-1} - X̂_{t-1}) for t ≥ 2,

where the matrix Θ_t is computed recursively by the innovations algorithm together with

V_1 = Γ(0),  V_t = Σ + Θ Σ Θ' - Θ_t V_{t-1} Θ_t' for t ≥ 2.

When the process {Y_t} is invertible it follows that Θ_t → Θ and V_t → Σ as t → ∞. A recursive procedure for the calculation of X̂_t and V_t = Var_0(X_t - X̂_t) for an arbitrary stationary process can be found in Brockwell and Davis (1991, ch. 11). The normalized residuals are given by η_t = V_t^{-1/2}(X_t - X̂_t) for t ≥ 1. Now it holds that E_0(η_t) = 0, Cov_0(η_t) = I and E_0(η_t η_s') = 0 for t ≠ s. Assuming that {X_t} is a Gaussian process it follows that in the in-control state the variables {η_t} are independent and normally distributed. Therefore one can apply the usual control charts for independent observations to the residuals {η_t}. It has to be noted that in the case of a shift in the mean vector the normalized residuals are no longer identically distributed, although they are still normally distributed. The reason for this behavior lies in the starting problem (see Kramer and Schmid (1997)). Moreover, under the out-of-control situation analyzed in this paper, the normalized residuals are still independent. Many analysts prefer to work with the limits of Θ_t and V_t. Using the asymptotic values the corresponding residuals are not independent. This is a further drawback of this procedure.
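For intuition, the sketch below computes normalized residuals in the simplest special case Θ = 0 (a VAR(1) target process with μ_0 = 0), where the one-step predictor is Φ X_{t-1} and V_t = Σ for t ≥ 2; this is an illustration only, not the paper's general VARMA(1,1) recursion.

```python
# Sketch: normalized residuals eta_t = V_t^{-1/2} (X_t - Xhat_t) for the special
# case of a VAR(1) target process (Theta = 0, mu0 = 0), where Xhat_1 = 0,
# V_1 = Gamma(0), and Xhat_t = Phi X_{t-1}, V_t = Sigma for t >= 2.
import numpy as np
from scipy.linalg import sqrtm

def var1_residuals(X, Phi, Sigma, Gamma0):
    X = np.asarray(X, dtype=float)
    res = np.empty_like(X)
    S_inv_half = np.linalg.inv(sqrtm(Sigma).real)
    G_inv_half = np.linalg.inv(sqrtm(Gamma0).real)
    res[0] = G_inv_half @ X[0]                  # Xhat_1 = 0, V_1 = Gamma(0)
    for t in range(1, len(X)):
        res[t] = S_inv_half @ (X[t] - Phi @ X[t - 1])
    return res
```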
4.1 MC1 for Residuals

In Section 3.1 the modified MC1 control chart was introduced. In this section we use the same control procedure, but instead of the original observations it is applied to the normalized residuals. Hence, we obtain the control statistic

MC1_{r,t} = max{0, ||η_{t-n_{r,t}+1} + ... + η_t|| - k n_{r,t}},

where the norm is the Euclidean norm, since in the in-control state Cov_0(η_t) = I. The number of observations since the last restart of the control chart, n_{r,t}, is defined by analogy to (2). The control chart gives a signal as soon as the statistic MC1_{r,t} exceeds the control limit. The limit is determined such that the in-control ARL of the chart is equal to a fixed value ξ. In practice this equation has to be solved by simulations.
4.2 MC2 for Residuals

The MC2 scheme for residuals is based on the quantity D_{r,t}² = η_t'η_t. The CUSUM recursion is given by

MC2_{r,t} = max{0, MC2_{r,t-1} + D_{r,t}² - p - k}

with MC2_{r,0} = 0. The MC2 control chart for residuals signals a change if the statistic MC2_{r,t} exceeds the preselected control limit.
4.3 MCUSUM for Residuals

By analogy to Section 3.3 we consider C_{r,t} = ((S_{r,t-1} + η_t)'(S_{r,t-1} + η_t))^{1/2} with S_{r,t} defined as in (5). Then

MCUSUM_{r,t} = max{0, C_{r,t} - k}.

A large value of MCUSUM_{r,t} is a hint that a change has happened. The control limit is determined as described above. It is obtained within an extensive Monte Carlo study.
4.4 PPCUSUM for Residuals

The PPCUSUM control statistic for residuals is defined as

PPCUSUM_{r,t} = max{0, ||S^η_{t-1,t}|| - k, ||S^η_{t-2,t}|| - 2k, ..., ||S^η_{0,t}|| - tk},

where S^η_{l,t} = η_{l+1} + ... + η_t and ||·|| denotes the Euclidean norm. As usual the control limit is chosen such that the in-control ARL is equal to a fixed value ξ. Because no explicit formula is available for the run length distribution, the ARL is calculated via simulations.
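All control limits in the paper are determined by Monte Carlo simulation so that the in-control ARL equals a prescribed value (200 in Section 5). The sketch below (not from the paper) shows one generic way to do this by bisection on the limit h; the functions simulate_in_control() and run_chart() are assumed to be user-supplied, e.g. the sketches given earlier.

```python
# Sketch: calibrate a control limit h so that the estimated in-control ARL of a
# given chart equals a target value (e.g. 200), using simple bisection.
# simulate_in_control(rng) returns one long in-control series; run_chart(X, h)
# returns the signal time (1-based) or None.
import numpy as np

def estimate_arl(run_chart, simulate_in_control, h, reps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    run_lengths = []
    for _ in range(reps):
        X = simulate_in_control(rng)
        t = run_chart(X, h)
        # Censor at the series length; the series should be much longer than the target ARL.
        run_lengths.append(t if t is not None else len(X))
    return float(np.mean(run_lengths))

def calibrate_limit(run_chart, simulate_in_control, target_arl=200.0,
                    h_lo=0.1, h_hi=50.0, tol=1.0, reps=10_000):
    # The in-control ARL is increasing in h, so bisection applies.
    while True:
        h = 0.5 * (h_lo + h_hi)
        arl = estimate_arl(run_chart, simulate_in_control, h, reps)
        if abs(arl - target_arl) < tol or (h_hi - h_lo) < 1e-3:
            return h, arl
        if arl < target_arl:
            h_lo = h
        else:
            h_hi = h
```

The paper applies the regula falsi rather than plain bisection, but the principle of iterating on h until the estimated in-control ARL matches the target is the same.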
5 A Comparison of the Multivariate Control Charts

In this section we want to compare the introduced control charts with the multivariate EWMA charts of Kramer and Schmid (1997).

5.1 Structure of the Monte Carlo Study

The in-control process is taken to be a two-dimensional stationary VARMA(1,1) process with mean vector μ_0 = 0 and normally distributed white noise {ε_t} as defined in Theorem 1. It is assumed that the conditions of Theorem 1 hold. In order to assess a control scheme it is necessary to fix the out-of-control situation. Here we make use of the model (1) with q = 1. Hence,

X_t = Y_t + a 1_{{1,2,...}}(t)

with a = (a_1 √γ_11(0), a_2 √γ_22(0))'. Because the distribution of ε_t is symmetric it follows that the distribution of (X_1, ..., X_k) for a shift of size a is the same as the distribution of (-X_1, ..., -X_k) for a shift of size -a, i.e. it holds that P_a((X_1, ..., X_k) ∈ B) = P_{-a}((X_1, ..., X_k) ∈ -B) for all Borel sets B ⊂ ℝ^{2k}. Because of this symmetry we can assume without any restriction in our analysis that a_2 ≥ 0. The usual problem in dealing with multivariate time series is the huge number of parameters. Therefore we assume that Φ and Θ are diagonal matrices. Here we present the results for one selected configuration of Φ and Θ.
As a measure of the performance of a control chart the average run length (ARL) is applied. All charts were calibrated to have the same in-control ARL, here 200. Because no explicit formulas for the in-control and the out-of-control ARLs are available, a Monte Carlo study is used to estimate these quantities. Our estimators of the in-control ARLs are based on 10^5 simulated independent realizations of the process. The control limits of all charts were determined by applying the regula falsi to the estimated ARLs. While Table 1 gives the control limits of the modified charts for various values of the reference value k and the smoothing parameter r, Table 2 contains the control limits of the residual schemes for the same values of k and r. For the modified charts the control limits depend on the process parameters. Because in the in-control state the residuals are independent and identically distributed, the control limits of the residual charts are the same as for independent samples. Furthermore they do not depend on the process parameters. In the out-of-control state the residuals are again independent but no longer identically distributed. Consequently they are not directionally invariant. However, the influence of the
first observations is small and thus the charts behave nearly as if they were directionally invariant. For the CUSUM schemes the control limits decrease as k increases. Conversely, for the MEWMA charts the control limits increase as the parameter r increases. Finally, in almost all cases the control limits of the modified control charts are larger than the corresponding limits of the residual charts.

Table 1. Control limits of the modified MEWMA, MC1, MC2, MCUSUM, and PPCUSUM charts for the two-dimensional VARMA(1,1) process of Section 5.1 (in-control ARL = 200).
Table 2. Control limits of the MEWMA, MC1, MC2, MCUSUM, and PPCUSUM residual charts for the two-dimensional VARMA(1,1) process of Section 5.1 (in-control ARL = 200).
In order to study the out-of-control behavior of the proposed control charts we take various reference values k into account. For all CUSUM charts k is chosen as an element of the set {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1}. For the multivariate EWMA charts the smoothing matrix is taken as a diagonal matrix with equal diagonal elements r, with r ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.
5.2 Behavior in the Out-of-Control State

For the determination of the out-of-control ARLs we made use of 10^6 independent realizations of the underlying process. In Tables 3 to 6 the out-of-control ARLs of all control charts within our study are presented. The corresponding values of r and k at which the smallest out-of-control ARLs are attained are given in brackets. These values should be taken to detect the specific shift in the mean vector of the target process. For a given shift the ARL of the best chart is printed in boldface.
Table 3. Minimal out-of-control ARLs of the modified MEWMA, MC1, MC2, MCUSUM, and PPCUSUM control charts for different values of the parameters r and k for the two-dimensional VARMA(1,1) process of Section 5.1 (Part 1).
The results of the modified charts can be found in Tables 3 and 4. In our study the modified MC1 control chart outperforms the other modified schemes. In almost all cases it provides the smallest out-of-control ARL. In second place the modified MCUSUM chart can be ranked. It is clearly worse than the MC1 approach, but it seems to be slightly better than the other competitors. The modified PPCUSUM method and the modified multivariate EWMA chart follow on the next places. They provide similar results; sometimes modppcusum is better, sometimes modmewma. The modified MC2 scheme behaves considerably worse. For nearly all shifts under consideration it has a larger out-of-control ARL than the other charts.

In Tables 5 and 6 the out-of-control ARLs of the residual charts are shown. In all cases the best performance is obtained by the MC1 control chart. In second place we find the MCUSUM control scheme applied to the residuals. The results of the PPCUSUM and MEWMA control charts are again quite similar. While for small and large shifts a better performance is reached
Table 4. Minimal out-of-control ARLs of the modified MEWMA, MC1, MC2, MCUSUM, and PPCUSUM control charts for different values of the parameters r and k for the two-dimensional VARMA(1,1) process of Section 5.1 (Part 2).
by the PPCUSUM approach, the residual MEWMA scheme outperforms the latter for moderate values of the shift. Finally, the MC2 chart shows a much worse performance. An advantage of the MEWMA residual scheme is its robustness with respect to the choice of the smoothing parameter. In most cases the smallest out-of-control ARL was obtained for r lying between 0.1 and 0.3, while the optimal values for the modified charts are larger. Such a behavior was already described in Kramer and Schmid (1997). The other charts were more sensitive with respect to the choice of the smoothing parameter. For small shifts a_1 and a_2 the best smoothing parameter is small as well, while for large changes the smoothing parameter should be taken large. This is the same behavior as for the univariate EWMA chart for independent samples.

The comparison between the modified and the residual charts leads to interesting results. If the ARL of a modified chart is compared with its residual counterpart then for a fixed shift both types of control schemes have a similar performance, except for the MC2 approach. For the MC1, MCUSUM, and PPCUSUM approaches the modified schemes behave better than the residual charts if a_2 is small; otherwise the residual approach is better. However, this behavior might change for other parameter constellations (cf. Kramer and Schmid
Table 5. Minimal out-of-control ARLs of the residual MEWMA, MC1, MC2, MCUSUM, and PPCUSUM control charts for different values of the parameters r and k for the two-dimensional VARMA(1,1) process of Section 5.1 (Part 1).
(1997)). Note that in the univariate case modified schemes lead to a smaller out-of-control ARL if the process has a positive correlation structure, while for a negative correlation residual charts should be preferred (cf. Knoth and Schmid (2004)). In the multivariate case the situation is more difficult. If we consider a VAR(1) process then E(Γ(0)^{-1/2} X_t) = Γ(0)^{-1/2} a. If all elements of Γ(0)^{-1/2} are positive then the shift is overweighted. For the residual chart we get that lim_{t→∞} E(V_t^{-1/2}(X_t - X̂_t)) = Σ^{-1/2}(I - Φ)a. Here, for positive elements of Σ^{-1/2}, the shift is overweighted (downweighted) if all components of Φ are negative (positive). Because a VARMA process depends on many parameters a comparison between the modified and the residual charts turns out to be difficult. However, in all of our simulations the MC1 chart provides very good results. For that reason we recommend to apply either the modified or the residual MC1 scheme. For reasons of simplicity we have taken the smoothing values of the MEWMA chart all equal. This chart has much more flexibility and improvements can be expected if different values are chosen.
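To illustrate the overweighting/downweighting effect for a VAR(1) process mentioned above, the sketch below (not from the paper; parameter values are illustrative) compares the standardized shift seen by the modified chart, Γ(0)^{-1/2} a, with the limiting standardized shift seen by the residual chart, Σ^{-1/2}(I - Φ) a.

```python
# Sketch: standardized shift seen by the modified chart, Gamma(0)^{-1/2} a,
# versus the limiting shift seen by the residual chart, Sigma^{-1/2} (I - Phi) a,
# for a diagonal two-dimensional VAR(1) process (illustrative parameter values).
import numpy as np
from scipy.linalg import sqrtm

Phi = np.diag([0.3, 0.3])
Sigma = np.eye(2)
a = np.array([1.0, 1.0])

# For a VAR(1): Gamma(0) = Phi Gamma(0) Phi' + Sigma (solved by vectorization).
G0 = np.linalg.solve(np.eye(4) - np.kron(Phi, Phi), Sigma.ravel()).reshape(2, 2)

shift_modified = np.linalg.inv(sqrtm(G0).real) @ a
shift_residual = np.linalg.inv(sqrtm(Sigma).real) @ (np.eye(2) - Phi) @ a
print("modified:", shift_modified)   # larger components here: positive Phi downweights
print("residual:", shift_residual)   # the residual shift, as noted in the text
```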
Table 6. Minimal out-of-control ARLs of the residual EWMA, MC1, MC2,
MCUSUM, and PPCUSUM control charts for different values of the parameters
r and k for the two-dimensional VARMA(1,l) process of Section 5.1 - Part 2
1.5 1.25 1.0 0.75 0.25 0.0 0.5 88.66 (0.1) 83.56 (0.1) 36.30'10.1) 18.55 (0.1) 11.74 (0.1) 8.18 (0.2) 5.95 (0.2) 69.00 (0.1) 65.26 (0.1) 31.77 (0.2)17.41 (0.3) 10.90 (0.4) 7.46 (0.5) 5.39 (0.7) 0.25 160.84 (0.1)156.72 (0.1)99.21 (0.2) 52.41 (0.2) 28.51 (0.4) 16.55 (0.4) 10.30 (0.7) 71.65 (0.1) 68.00 (0.1) 33.54 (0.2) 18.34 (0.4) 11.53 (0.5) 7.91 (0.7) 5.73 (0.8) 80.67 (0.2) 76.31 (0.2) 36.41 (0.2) 19.80 (0.4) 12.20 (0.5) 8.24 (0.6) 5.92 (0.7) 33.61 (0.1) 41.96 (0.1) 31.03 (0.1) 18.97 (0.1) 12.55 (0.1) 8.90 (0.2) 6.47 (0.2) 29.54 (0.2) 36.00 (0.2) 27.72 (0.2)17.83 (0.3)11.68 (0.4)8.08 (0.5) 5.89 (0.6) 54.26(0.2) 31.37(0.3) 18.61(0.4)11.72(0.6) 93.65(0.1) 109.98(0.1)87.27(0.1) 0.5 31.19 (0.3) 37.67 (0.2) 29.15 (0.3) 18.73 (0.4) 12.34 (0.5) 8.59 (0.6) 6.23 (0.7) 34.22 (0.3) 41.05 (0.2) 31.76 (0.3) 20.26 (0.9) 13.13 (0.5) 8.98 (0.6) 6.44 (0.7) 17.56 (0.1) 21.73 (0.1) 21.01 (0.1) 16.39 (0.1) 12.08 (0.1) 8.98 (0.2) 6.67 (0.2) 16.33 (0.3)19.94 (0.3) 19.31 (0.3)15.35 (0.3)11.22 0.4) 8.16 (0.5) 6.09 (0.6) 0.75 50.42 (0.2) 63.60 (0.2) 61.07 (0.2) 45.62 (0.2) 30.03 (0.3) 19.07 (0.4)12.38 (0.5) 17.27 (0.4) 21.13 (0.3) 20.54 (0.3) 16.16 (0.4) 11.82 (0.5) 8.69 (0.6) 6.46 (0.7) 18.74 (0.4) 22.80 (0.3) 22.13 (0.3) 17.42 (0.4) 12.58 (0.5) 9.08 (0.6) 6.70 (0.7) 11.23 (0.1) 13.27 (0.1) 13.96 (0.1) 12.74 (0.1) 10.63 (0.1) 8.45 (0.2) 6.54 (0.2) 10.28(0.4) 12.26(0.4) 13.02(0.4) 11.83(0.4) 9.75(0.4) 7.65(0.5) 5.96(0.6) 27.84 (0.3) 35.50 (0.3) 37.98 (0.3) 33.28 (0.3) 25.06 (0.3) 17.45 (0.4)12.09 (0.6) 1.0 10.95 (0.5) 13.08 (0.5) 13.84 (0.4) 12.58 (0.5) 10.36 (0.5) 8.15 (0.6) 6.32 (0.7) 11.57(0.5) 13.95(0.4) 14.77(0.4) 13.39(0.4) 10.90(0.5) 8.48(0.6) 6.52(0.7) 7.78(0.2) 9.20(0.2) g9.90(0.1)9.79(0.1) 8.80(0.2) 7.38(0.2) 6.12(0.2) 7.00 (0.6) 8.20 (0.5) 8.91 (0.5) 8.77 (0.5) 7.89 (0.5) 6.72 (0.6) 5.54 (0.7) 1.25 16.36 (0.4) 20.47 (0.4) 22.96 (0.4) 22.34 (0.4) 19.04 (0.4) 14.79 (0.5)10.95 (0.6) 7.46 (0.7) 8.77 (0.6) 9.55 (0.5) 9.41 (0.6) 8.47 (0.6) 7.15 (0.7) 5.86 (0.8) 7.79 (0.6) 9.20 (0.6) 10.04 (0.5) 9.89 (0.5) 8.83 (0.6) 7.42 (0.6) 6.08 (0.7) 5.66 (0.2) 6.47 (0.2) 7.09 (0.2) 7.24(0.2) 6.94 (0.2) 6.27 (0.2) 5.48 (0.2) 5.03(0.7) 5.78(0.6) 6.35(0.6) 6.52(0.6) 6.24(0.6) 5.64(0.7) 4.88(0.7) 10.08 (0.7') 12.48 (0.5) 14.32 (0.5) 14.73 (0.5) 13.65 (0.5) 11.57 (0.5) 9.23 (0.7) 1.5 5.37(0.8) 6.20(0.7) 6.78(0.7) 6.94(0.7) 6.66(0.7) 55.9(0.8) 5.18(0.8) 5.56(0.8) 6.42(0.7) 7.06(0.6) 7.28(0.7) 6.94(0.6) 6.22(0.7) 5.36(0.8)
.al\az
much more flexibility and improvements can be expected if different values are chosen. In Woodall and Mahmoud (2004) the so-called inertia behavior of control schemes is studied. The chart statistic can take values such that changes in the parameters are more difficult to detect. To measure this effect they introduce a new measure, the signal resistance. In their paper they compare several multivariate control charts for independent samples based on this performance criterion. It is shown that, e.g., the PPCUSUM chart has a much better worstcase signal resistance perforrriance than the MCl chart. In our analysis we have not considered the inertia problem. Further research in this direction is necessary.
6 Conclusions Due to the fast development of the computer technology in the last decade it is nowadays possible t o analyze multivariate data sets in a n on-line way. Although such problems arise in various scientific disciplines it is quite surprising that only a: few papers dealt with this topic. There seems to be a
lack of statistical methods for the simultaneous analysis of several correlated processes. In this paper we introduce several CUSUM control charts for detecting a shift in the mean vector of a stationary multivariate time series. We consider modified schemes and residual charts. Due to the large amount of parameters we focus on a VARMA(1,l) process. Sufficient conditions are derived under which the in-control ARL does not depend on the covariance matrix of the white noise process (Theorem 1). However, in general, the out-of-control ARL depends on all parameter matrices of the VARMA model. These charts are compared with the MEWMA chart of Kramer and Schmid (1997). In most cases the best performance is displayed by the MC1 control chart. Our paper can be considered as a first step in this direction. It is necessary to compare these charts for further parameter constellations. Especially, it is desirable to have a rule of thumb whether residual or modified charts should be applied. In our comparison study we restricted ourselves to the two-dimensional case. Because many data sets in practice are highly dimensional it is of great interest to study the behavior of these control schemes for higher dimensions.
7 Appendix In this section the proof of Theorem 1 is given.
We give the proof for the modified MC1 control chart. For the other modified schemes the results can be verified in the same way. Without loss of generality it is assumed that po = 0. We observe that in the in-control state (MC11, ..,MC1,) is a function of (Yi Y j ) ' r - l ( 0 ) (Y, Y,) for l < i < j < n . According to Khatri et. a1 (1977) the joint characteristic function of the quadratic forms (Y, Y,)'r-'(0) (Y, Y,) for 1 i j n is given by
+
+
+
+
< < <
where A = diag(I'-' (O), ... , (o)), T = diag(tl I,, ...,t,(,+l)p I,), H = Var((Y: +Y:,Y: +Y;, ...,Yh-l + Y ~ , Y ~ + Y ~=) '(Cov(Y,, ) +Y,,, Y,, Y , , ) ) I ~ , ~ < , ~Since ~~,~ C ~= , ,X~a', and , ~O ~X . = I= O', it follows that @ h X = C W and +h O X = C O ' a' h . From r ( l ) = \El+' C Qi, where 90 = I and Q1 = a'-'(@ - O ) (e.g., Reinsel (1997, p.36)) we obtain that
+
+
Thus
and where Q ( @ ,O , il ,jl ,i2,jz)and Q(@, 0 ) are functions not depending on Then for each il, j l , i 2 , j2 = l , . . . , n
X.
does not depend on X. Consequently T A H does not depend on X as well. Because (MC11, ..,MC1,) is a measurable function of (Y; Yj)' I'-l(0) ( Y i Y,) t h e result follows.
+
+
References 1. Alt FB (1985) Multivariate quality control. In: Kotz S, Johnson NL (eds) Encyclopedia of Statistical Sciences. Wiley, New York 2. Alt FB, Smith ND (1988) Multivariate process control, In: Krishnaiah PR, Rao CR (eds) Handbook of Statistics. North-Holland, Amsterdam 3. Alwan LC (1989) Time Series Modelling for Statistical Process Control. Ph.D. Thcsis, Graduate School of Business, Chicago 4. Alwan LC, Roberts HV (1988) Time-series modelling for statistical process control. Journal of Business and Economic Statistics 6:87-95 5. Brockwell P J , Davis RA (1991) Time Series: Theory and Methods. Springer, New York 6. Crosier RB (1986) A new two-sided cumulative sum quality control scheme. Technometrics 28:187-194 7. Crosier RB (1988) Multivariate generalizations of cumulative sum quality control schemes. Technometrics 30:291-303 8. Hawkins DM (1991) Multivariate quality control based on regression-adjusted variables. Technometrics 33:61-75 9. Hawkins DM (1993) Regression adjustment for variables in multivariate quality control. Journal of Quality Technology 25:170-182 10. Healy J D (1987) A note on multivariate CUSUM procedure. Technometrics 29:409-412 11. Hotelling H (1947) Multivariate quality control - illustrated by the air testing of sample bombsights. In: Eisenhart MW, Hastay, Wallis WA (eds) Techniques of Statistical Analysis. McGraw-Hill, New York 12. Khatri CG, Krishnaiah PR, Sen PK (1977) A note on the joint distribution of correlated quadratic forms. Journal of Statistical Planning and Inference 1:299307 13. Knoth S, Schmid W (2004) Control charts for time series: A review. In: Lenz H-J, Wilrich P-Th (eds) Frontiers of Statistical Quality Control. Physica, Heidelberg 14. Kramer H, Schrnid W (1997) EWMA charts for multivariate time series. Sequential Analysis 16:131-154
15. Kramcr H, Schmid W (2000) Thc influence of parameter cstimation on the ARL of Shcwhart-type charts for timc scrics. Statistical Papers 41:173-196 16. Lowry CA, Woodall WH, Champ CW, Rigdon S E (1992) A multivariate exponcntia,lly weightcd moving averagc control chart. Tcchnometrics 34:46-53 17. Lu CW, Rcynolds MR (1999) EWMA control charts for monitoring the mean of autocorrelated processes. Journal of Quality and Technology 31:166-188 18. Mason RL, Tracy ND, Young J C (1995) Decomposition of T' for multivariate control charts interprctation. Journal of Quality Technology 27:99-108 19. Montgomery DC, Mastrangelo CM (1991) Some statistical process control methods for autocorrelated data. Journal of Quality Technology 23:179-204 20. Moustakidcs GV (1986) Optimal stopping times for detecting changes in distributions. Annals of Statistics 14:1379-1387 21. Murphy BJ (1987) Selecting out-of-control variables with T2 multivariate quality control procedures. The Statistician 36:571-583 22. Ngai H-M, Zhang J (2001) Multivariate cumulative sum control charts based on projection pursuit. Statistica Sinica 11:747-766 23. Pagc ES (1954) Continuous inspection schemes. Biometrika 41:lOO-114 24. Pignatiello J J J r , Rungcr G C (1990) Comparison of multivariate CUSUM charts. Journal of Quality Technology 22:173-186 25. Pollak M (1985) Optimal detection of a change in distribution. Annals of Statistics 13:206-227 26. Rcinsel G C (1997) Elcmcnts of Multivariate Time Series Analysis. Springer, New York 27. Roberts S W (1959) Control charts tests based on gcometric moving averages. Tcchnomctrics 1:239-250 28. Rungcr. G (1996) Multivariate statistical process control for autocorrelated proccsscs. 1ntcrnationa.l Journal of Production Research 34:1715-1724 29. Rungcr G , Alt FB, Montgomcry DC (1996) Contributors t o a multivariate statistical process control chart signal. Communications in Statistics - Theory and Mcthods 25:2203-2213 30. Schmid W (1995) O n the run lcngth of a Shewhart chart for correlated data. Sta.tistica1 Papcrs 36:111-130 31. Schmid W (1997a) On EWMA charts for time series, In: Lenz H-J, Wilrich P-Th (cds) F'rontiers of Statistical Quality Control. Physica, Heidelberg 32. Schmid W (1997b) CUSUM control schemes for Gaussian processes. Statistical Papcrs 38:191-217 33. Sliwa P, Schmid W (2005) Monitoring the cross-covariances of a multivariate time series. Metrika 6139-115 34. Theodossiou P T (1993) Predicting shifts in the mean of a multivariate time series proccss: an application in predicting business failures. Journal of the American Statistical Association 88:441-449 35. Timm NH (1996) Multivariate quality control using finite intersection tests. Journal of Quality Technology 28:173-186 36. Vasilopoulos AV, Stamboulis A P (1978) Modification of control chart limits in thc presence of data correlation. Journal of Quality Technology 20:20-30 37. Wald A (1947) Sequential Analysis. Wilcy, New York 38. Woodall WH, Mahmoud MA (2004) The inertial properties of quality control charts. Technometrics, forthcoming 39. Woodall WH, Ncube MM (1985) Multivariate CUSUM quality control procedures. Technometrics 27:285-292
The Art of Evaluating Monitoring Schemes How to Measure the Performance of Control Charts? Sven Knoth Advanced Mask Technology Center, Dresden, Germany, [email protected],http://wvw.amtc-dresden.com
1 Introduction Contrary to the errors of the first and second kind methodology of statistical inference, the evaluation of control charting is based on the concept of run length. This is similar to the sequential testing theory where the (random) sample number is assessed besides the above mentioned errors. In nonsequential statistics the sample number is only considered when some desired error probabilities have to be ensured. In control charting, an alarm signal declares the process to be out of control, which corresponds to the decision for the alternative in statistical testing. This corresponds to tests of power 1 that end always with the rejection of the null hypothesis. A blind alarm resembles the error of the first kind. It is more subtle, however, to define equivalently an error of the second kind. Recall that, e. g., the test of power 1 experiences no error of the second kind at all. Only by introducing a truncation time point, which leads to the concept of repeated significance tests, a non-trivial probability of the error of the second kind is obtained. Thus, by observing an control chart which operates on an out-of-control process, the error of the second kind repeatedly occurs at each time point before the alarm is signaled. Therefore, instead of searching for small error probabilities we ought to look for control charts with a small rate of false alarms and short detection delays (equals to the number of time points with "errors of the second kind"). And this is how the run length comes into the game. Roughly speaking, the run length is the number of observations taken until the control chart signals an alarm. Eventually, the most widespread performance measure, the Average Run Length (ARL), resembles the expected run length for a very specific process situation. That is, the process remains unchanged during the whole monitoring (charting) period or changes right at, or changed before the control chart startup. Remark that this assumption is similar to adopt to know the actual chart state while the change takes place.
Of course, a control chart with large ARL values for the out-of-control process and small ARL values for the disturbed one might be a good objective for chart tuning and choosing the right one among CUSUM, EWMA, and others. Therefore, a lot of papers were written aiming for minimizing the socalled out-of-control ARL (disturbed process) for a given lower bound of the in-control ARL (undisturbed process). The current paper starts with some notation dedicated to various run length measures and a historical sketch about control chart evaluation. Then, drawbacks of uncontrolled usage of the "minimal out-of-control ARL" criterion (the one-sided EWMA-S2 control chart as illustrating example) are demonstrated. Three control charts (CUSUM, EWMA, Shiryaev-Roberts) are considered to promote the so-called steady-state ARL as a well-balanced evaluation measure. To sum up, this paper is more an expedition through the jungle of control charting (or should I say monitoring, surveillance, change point detection, or any similar scheme) than a complete description of all species of schemes and evaluation measures.
2 Notational preliminaries Let {Xz),=1,2,...be a sequence of random variables, which could be some batch statistics as well. An appropriate model for dealing with control charting is the change point model Of course, it is the driving model of change point detection schemes, see Woodall & Montgomery (1999): ... area of control charting and SPC ... we include i n this area any statistical method designed t o detect changes in a process over time, and Hawkins, Qiu & Kang (2003) for a very recent attempt to link the change point model and SPC. Denote {F8(t,),=l,z,... the corresponding sequence of cumulative distribution functions of the above X,. Let
{6(i))) is a typical change point model. Index m is simThen ({Xi), {F0(%,), ply called change point. Usually, 6 stands for the mean or the variance of X,. Nowadays, more complex parameters are monitored too. The parameter model (1) reflects abrupt and persistent changes in 19. There are more general models which capture drifts, multiple change points, some specified dependency structure etc. In the sequel, the above model will serve as our main model. For a given change point m, let Pm(.) and Em(.) be the probability measure and the expectation related to the sequence {X,)z,l,z,,,,. In SPC literature the process {X,) is called in-control for i < m and out-of-control for i m. Remember that this rests upon the concept of statistical control due to Shewhart (1931). Furthermore, denote by L the run length, which is a stopping
>
time in the language of sequential statistics. In order to give a general idea of control chart, we make use of a chart statistic 2, as a measurable function of {xi}i=1,2, ...,n . Then
where [c;, c:] resembles the continuation region of the charting process. Usually, c; and c: do not depend on time n (except the original EWMA control chart). Additionally, one of these values could be a reflecting barrier. For special one-sided control charts the continuation region could be unbounded below or above. Denote by
the process history before the change point m. The notation c(.) refers to the minimal sigma algebra generated by the random variables under consideration. Note that the star version stand for all in-control sequences which do not lead to an alarm before time point m. Now, we are ready to define various run length measures of specific ARL types. 1. Zero-state ARL: L={
El (L) E,(L)
, out-of-control measure , in-control measure
(3)
Recall that the index (1 or a)corresponds to a change point at chart start up (or before) and in far future, respectively. This ARL type is the most popular one and mostly called shortly ARL. The main drawback is the very specific change point position. In other words, by using the zerostate ARL we are assuming that we know the condition of the control chart right at the change point. At least, it is a satisfying measure in the in-control case, i. e. when we are quantifying the mean time until a wrong signal. Eventually, the phrase zero points to the fact that the initial chart state is given by a certain natural state, which corresponds sometimes to the average state (EWMA charts, see next section) or to the worst-case state (CUSUM schemes, details in next section). Thus, the zero-state ARL might be expressed as Le(z) with actual parameter value 6 and starting value Zo = z. 2. Conditional ARL:
Contrary to the other concepts this one is introduced exclusively for our listing. It allows to create a link between different competing concepts which do not possess a straightforward linkage. Obviously, DT = E1(L). For m > 1, the quantity D& is a real random variable (F&Ll-mea~~rable, i. e. a function of the history XI,. .. , Xm-1 with L > m - 1).
Expected conditional ARL (expectation is taken over the in-control part):
Dm measures a kind of average detection delay. Again, Dl= E1(L). Taking the limit leads to the well-known Steady-state ARL (after a very long in-control period (m -+ oo) without a signal): 2) = lim Dm (6) m+oo Worst-case ARL (the worst case due to Lorden 1971): W = sup ess sup D k m
We can imagine W as the average detection delay of a monitoring scheme being in the least favorable state for detection of a change. Lorden (1971) proved (and later some others too, see next section) some optimality results. The notation above differs from the original one by Lorden, so that we want to sketch the prove of equivalence in the next lines.
= sup m
ess sup D;
, because Pm(L 1 m) > 0 for any finite m .
Note, that for all schemes under consideration in case of independent { X , ) the worst-case ARL, W, is simply the largest zero-state ARL value. That is, we start the monitoring scheme in its worst condition concerning detection speed and the change takes place right at the beginning. Pollak-Szegmund worst-case ARL (more similar to the steady-state ARL): WPS = sup Dm = sup E(DL) m
m
From (6) we see that Dm converges to V, so that either WpS (sup) and V (lim) coincide, or there are some finite many Dm only, that are larger than V. We would like to conjecture that WPS = max {c,2)). a-worst-case ARL - since the very small likelihood of being in the worst case position of both the EWMA control chart and the Shiryaev-Roberts scheme (introduced in next section) the following concept is presented: with
For usual control charts, and change point models as (1) the limiting distribution of D& exists (see Appendix), so that the above definition can be used. Furthermore, the following identities hold:
E
(m-m lim
lim Dm
m+oo
lim
m+oo
All identities are valid by definition except for the second equality. It can be proved by means of Lebesgue's Theorem on Dominated Convergence. For details see the Appendix. In the following we give a historical outline and a more detailed study on a favorite ARL. The steady-state ARL will be preferred, while the zero-state ARL serves as a suitable surrogate of the steady-state ARL in the case of an EWMA scheme.
3 Remarks concerning history of measuring control chart performance In the beginnings of control charting, similar performance measures as in statistical test theory with fixed sample statistics were used. That is, one considered the probabilities of the errors of the first and the second kind. For control charting, the error of the first kind corresponds to the event of a wrong signal (blind alarm), while a missing signal in case of a disturbed process generates the error of the second kind. In order to evaluate the chart performance, often a fixed time point during the observation process is assumed. The standard Shewhart chart, however, is equivalent to a sequence of tests, so that it is reasonable to extend the consideration to the required number of tests until one of these tests ends with the rejection of the null hypothesis. Aroian & Levene (1950) were the first who tried to assess control charts in a more appropriate way than the simple manner of the error probabilities. They presented the so-called average spacing number and average efficiency number as predecessors of the more famous (zero-state) ARL. Then, Page (195413) introduced the term Average Run Length (ARL) "as the average number of articles inspected between two successive occasions when rectifying action is taken. " So, the most popular performance measure of control charts was born. And it prevailed, despite such apocalyptical comments like in Barnard (1959) "If it were thought worthwile one could use methods analogous to these given by Page (1954) and estimate the average run length as a function of the departure from the target value. However, as I have already indicated, such computations could be regarded as having the function merely of avoiding unemployment amongst mathematicians. " Nevertheless, hundreds of papers were written about the computation of the ARL, and probably some more dozens will be written more. In the 1950s further measures were considered. Girshick & Rubin (1952) formulated a Bayesian framework and constructed a monitoring scheme which
was the first step on the way to a scheme called "Shiryaev-Roberts procedure" which is now very popular amongst mathematical statisticians. The initial scheme is based on a constant g which describes the probability of switching from the in-control state to the out-of-control state at each time point and the likelihood ratio of the competing densities. More precisely, they employ
The parameter g stands for the probability that the process switches from incontrol to out-of-control in the next time unit. The authors proved optimality for a certain threshold a given a complex loss function. More comprehensive contributions to the Bayesian approach are given by Shiryaev (1963a)/Shiryaev (1963b) and Shiryaev (1976). Shiryaev introduced a random change point M with geometric distribution.
Contrary to (1) Shiryaev distinguishes between the cases m = 0 and m = 1, where the former represents changes in the past and the latter a change right at the beginning of the monitoring process. Further, he exploited the following loss function. Recall, that L denotes the run length of the scheme.
P,,p(L < M ) denotes the false alarm probability and the second term quantifies the average delay until a valid alarm. The constant c quantifies the cost relation between false alarms and detection delay. Both values should be small for an appropriate monitoring scheme. Shiryaev proved, that a scheme based on the a-posteriori probability P ( M 5 nlX1, X2,. . . ,Xn) minimizes the above loss function. This scheme minimizes E,,p(L - MIL 2 M ) for given upper bound a of P,,p(L < M ) (or given maximal number of expected false alarms) as well. Note, that by means of P ( M = ml M 2 m) = p we see that the Girshick-Rubin scheme and the Shiryaev scheme coincide for the geometric prior. In Shiryaev (1963b) the author considered schemes from the viewpoint of a stationary regime (unfortunately for continuous time only). Shiryaev introduced a quantity called mean delay time which in fact looks similar to the steady-state ARL (there remain some differences). Moreover, for obtaining the optimal scheme he let X (the continuous time counterpart to the geometric prior is an exponential one with density X exp(-X t)) tend to zero which corresponds to Pollak (1985) and Roberts (1966). The latter author took the
Girshick-Rubin scheme and set more pragmatically g = 0, while Pollak proved that the original Shiryaev scheme converges to that procedure which is now known as Shiryaev-Roberts scheme. Shiryaev (2001) notes that the statistic $ = ($t)t20 is called the Shiryaeu-Roberts statistic (the discrete time $CI, is equivalent to Girshick-Rubin's 2, with g = 0). Thus, in the sequel we maintain " Shiryaev-Roberts scheme" as title and " GRSR" as abbreviation for this famous limiting scheme. The Bayes schemes became not very popular in SPC literature contrary to the more theoretic oriented papers, cf. F'ris6n & de Mar6 (1991), F'ris6n & Wessman (1998), Risen (2003), and Mei (2003) for a very recent and lively collection of papers concerned with this approach. Barnard (1959) combined estimation and monitoring, so that measures from estimation theory could be exploited. Bather (1963) addressed the problem of economic design of control charts and used a couple of economic constants for setting up control charts. Contrary to (1) he based his study on monitoring a non-constant parameter, which itself fluctuates around a target value. His optimal scheme resembles the EWMA scheme of Roberts (1959). There are, of course, more papers about economic design. The incorporation of economic parameters in the control chart design, however, leads to a more complicated framework, so that mostly the simple Shewhart chart was treated alone. Thus, the current paper abstains from the economic design. For more details we confer to Ho & Case (1994a) for an overview about literature in that field up to 1994, and to Ho & Case (199413) for economic design for an EWMA control chart. A more recent overview is given in Keats, del Castillo, von Collani & Saniga (1997). An important contribution to control charting performance measures is Roberts (1966). He compared five (finally eight due to combinations) control charts by using one measure. On the one hand, nearly all state of the art schemes were covered. On the other hand, the author constituted a kind of benchmark for evaluating control charts. Unfortunately, he mixed two different concepts. While he used results from earlier published papers (Ewan & Kemp 1960) for considering the CUSUM scheme, which provided zero-state ARL values, he evaluated all other schemes by simulating a quantity which is more similar to the steady-state ARL. More precisely, he determined D9 in (5), that is the so-called expected conditional ARL (see Section 2). Evidently, Roberts was the first and for some time the only one who employed a kind of steady-state performance measure. Note that in case of the Shewhart chart all ARL types are equal, because only the most recent data point (we assume independent data here) does influence the current decision about a change. Roberts' paper provides a good basis to discuss measuring control chart performance. Therefore, the control charts used are introduced in some detail right here and compared by Roberts' measures as well as by the measures of Section 2. Let us recall the setup of the main schemes and the computational methods used by Roberts. He analyzed the one-sided case, which is the usual approach of more mathematically working SPC researchers (the "change
point followers"), while the more applied ones are dealing more frequently with two-sided schemes. In the whole study it was assumed, that the data are independently normally distributed, i. e. X , r v N ( A ,1) with A E {0,0.25,. . . ,5). The stopping rules (times) are written again in sequential statistics manner and represent the run length. Shewhart
0
X
chart
-
Shewhart (1931):
Moving Average (MA) - Roberts (1966):
Exponentially Weighted Moving Average Chart (EWMA) -Roberts (1959):
Lfiuedfirnits= inf { n E N : Zn > c J-}
Lvarying limits= inf n E IN : Zn > c
(1 - (1 -
, x A/(2
-
A)
The fixed limits EWMA chart is the more popular one. Roberts (1966), however, included the varying limit chart in his analysis. Cumulative Sum scheme (CUSUM) - Page (1954a):
Shiryaev-Roberts (GRSR) - Girshick & Rubin (1952), Shiryaev (1963b), Roberts (1966), Pollak (1985), here written in the In() fashion like in Roberts (1966):
The schemes ought to be adjusted to give an in-control ARL (zero-state, expected conditional) of 740, which is equal to the in-control ARL of the onesided X chart with 3a-limit. Moreover, the target out-of-control shift for the more complex charts is A = 1. In Table 1 the actual design parameters and computational methods used by Roberts (1966) for deriving the ARLs are collected. While it is not easy t o understand the way how Roberts (1966) "translated" results of previous papers into suitable ones for his comparison, the "translated" values are fortunately not very far from the true values.
Table 1. One-sided control charts considered in Roberts (1966)
chart design type of ARL all in one Shewhart c=3 MA n = 8, c = 2.79 'D EWMA A = 0.25, c = 2.87 'D CUSUM k = 0.47, h = 5 El(,) ( L ) GRSR
k
= 0.5, g = 390
'D9
computational method exact ( E m(L) = 741.43) Monte Carlo, 25 000 repetitions Monte Carlo, 25 000 repetitions "translation" of two-sided results of Ewan & Kemp (1960) Monte Carlo, 50000 repetitions
In Table 2, the original results of Roberts' and recent results are tabulated to demonstrate the effects of Roberts' mixture of performance measures. In order to obtain accurate numbers for our recent results, a Monte Carlo study with lo8 repetitions was performed, and a density recursive estimation with quadrature was applied except for MA as described in Knoth (2003). Both Table 2. Different ARL types for one-sided schemes given in Roberts (1966), original and recent results
chart
0 0.5 Shewhart 741 161 740 40 737 39 736 39 740 40 686 39 EWMA 689 39 685 38 740 34 CUSUM 719718 32 724 34 740 32 GRSR 690asa 30 697 33
A
type of ARL 1.0 1.5 2.0 2.5 3.0 44 15 6.3 3.2 2.0 la11 in one (exact) 10.2 6.0 4.6 3.8 3.3 Roberts 10.0 5.8 4.4 3.7 3.1 'Dg/V 8.9 4.2 2.6 1.9 1.5 L 10.1 5.1 3.5 2.7 2.2 Roberts 9.8 5.0 3.4 2.6 2.1 'Dg/D 2.6 2.2 10.0 5.1 3.4 L, fixed limits 9.2 4.4 2.7 2.0 1.5 C,varying limits 10.0 5.8 4.3 3.5 2.9 Roberts 9.29.1 5.1 3.6 2.8 2 . 4 ~ . ~ v9/'D 9.9 5.6 3.9 3.1 2.6 C 9.2 5.2 3.7 2.9 2.4 Roberts 9.18.9 5.25.13.73.6 2.92.8 2.4 'D9/'D 10.4 6.2 4.4 3.5 2.9 C
methods provide the same values. Therefore we take these results as the true ones. Note, that the standard error of the Monte Carlo study is about or less than (ARL value)/lo4. Besides, we applied the Gaui3-Legendre Nystrom method for solving the ARL integral equation (and the left eigenfunction
integral equation as well) except for EWMA with varying limits, where this method is not applicable. These results support the previous. In Table 2, the results originally reported by Roberts and the corresponding values of the newer study are boldly written. Furthermore, if the D values (the limit) differ from the Dg ("very" finite) ones, then it is indicated by an index which presents the D value. Roberts chose m = 9 as sufficiently large for approximating the case m = oo (m >> 1 might look more reasonable), which is supported by the more recent results for 2). For the EWMA control chart we cannot observe differences in the "non zero-state" ARL values (D9/D) in considering the fixed and the varying control limits scheme, at least for the chosen accuracy. Note that for all schemes not really the limit D is neither simulated nor derived by the density recursion based approach. But Dloo is taken as substitute, because the sequence { D m ) , = 1,2,. . . is nearly constant for considerably small m already. The completely integral equation oriented approach exploits implicitely that behavior as well (the dominant eigenvalue is computed by means of the power method) and confirms all these approximation results again. Roberts obtained good accuracy for the out-of-control results, while the incontrol values are evidently smaller (one positive exception only - MA) than the true values. The most importing issue of the above table consists in the fact that in case of the CUSUM scheme the worst-case ARL (E1(L) = W = WpS for standard CUSUM) was taken, while all other schemes were evaluated by taking an average measure (D9 z Dloo z D). Now, we return to the historical roots. Lorden (1971) introduced five years after Roberts (1966) a very pessimistic performance measure, the quantity W ( W L ~see ~ Section ~ ~ ~ 2). , He proved that the CUSUM control chart is asymptotically optimal in this sense. Later, Moustakides (1986) and Ritov (1990) extended this result to the finite case. Since for the classical CUSUM control chart the worst case and the usual zero state coincide (as for the simple Shewhart chart), the ARL could be established as a dominating measure. This property is not valid for the EWMA control chart, that was not very popular in those times. After Hunter (1986) and Lucas & Saccucci (1990) it regained attention. Afterall, it was very common to compare control charts in terms of their ARL. Remember that Woodall & Maragah (1990) in the discussion of Lucas & Saccucci (1990) described the inertia problem of EWMA control charts which causes bad worst-case behavior. In order to get some more insights into the inertia problem of the EWMA control chart we want to use the a-worst-case ARL, i. e. W,, which accounts for the probability of being in worst-case condition. We look on the (upper) EWMA example given by Roberts (1966). Remember that for deriving the ARL values, all our quadrature based approaches demand a bounded continuation region (the more popular Markov chain approach as well need it), so which seems that we truncate the region for our study at z, = -6 Thus, the to be reasonable for a (upper) threshold value of 2.87 J-.
+
EWMA statistic follows 2, = max {z,, (1 - A)ZnP1 AX,). There are no substantial differences between the truncated scheme and the primary scheme validated by the Monte Carlo results. For instance, for A = 1Monte Carlo simulation yields 10.0205 with standard error 0.0007, while quadrature provides 10.0193 in case of computation of the zero-state ARL, El (L) . Analogously in case of the steady-state ARL, D,the values are 9.8468 (standard error 0.0007) and 9.8473, respectively. Therefore we are allowed to use the truncated scheme for a worst case analysis. In Table 3, the corresponding values are tabulated. The worst-case ARL W is based on the zero-state ARL with zo = z,, i. e. the EWMA statistic 2, has to bridge the broadest gap to signal. The likelihood of 2, is small being Table 3. Worst-case ARLs, W e , for the upper EWMA chart of Roberts (1966)
with
CY E
chart
{0.05,0.01,0.001)
I
A 0 0.5 1.0 1.5 2.0 2.5 3.0
GRSR 1697 33 10.4 6.2 4.4 3.5 2.91
type of ARL
C/W
near by or just at the lower border (steady-state probability is smaller than lop9), so that in fact W is equal to W,lo-s. Recall that with (steady-state) probability 1 - a the average number of observations (or samples) from the change point m until signal is smaller than or equal to W,. Taking a = 0.5 (W.5, not given in Table 3), we observe the same values like for E1(L) for the given accuracy, i. e. the median-worst-case resembles the zero-state result (approximately). Regarding the numbers in Table 3, we see that the onesided EWMA control chart is inferior to CUSUM and GRSR in terms of the worst-case behavior even for "more likely worst cases". Pollak & Siegmund (1985), Pollak (1985), and Pollak (1985) considered a further worst-case measure in the 1980s. The quantity WpS (see again Section 2) takes only the worst change point position, but not the worst possible incontrol past as Lorden's measure does. Thus, it is not completely unexpected that a different scheme from CUSUM won the new race. The previous mentioned Shiryaev-Roberts rule with a "slight" modification is asymptotically optimal in terms of minimizing WPS. The modification consists in replacing the deterministic starting value zo by a random variable with a density which
characterizes the steady-state distribution of the chart statistic. This means that we start our control chart in an artificial steady-state. Mevorach & Pollak (1991) compared both the CUSUM and the Shiryaev-Roberts scheme by their steady-state behavior for exponentially distributed data. As evident Table 2 the steady-state results of both schemes are very similar, while the ShiryaevRoberts rule is slightly better. Mei (2003) mentioned that strict optimality for the W p Scriterion is still an open problem. For conservative users, however, the CUSUM chart exhibits that valuable worst-case behavior in terms of Lorden's measure. See in Table 2 that now CUSUM is slightly better (except A = 0.5, but the schemes were designed for A = 1) than GRSR (Shiryaev-Roberts). Recall that for CUSUM and GRSR schemes the zero state resemble the worst case. From Tables 2 and 3 we learn that in the one-sided framework CUSUM and GRSR are the dominating schemes. Only in terms of steady-state measures, EWMA behaves nearly as CUSUM and GRSR. The situation becomes better for EWMA in the two-sided case. In the famous EWMA paper by Lucas & Saccucci (1990) the two-sided case was examined. And the authors stated: "Our comparisons showed that the ARLs for the E W M A are usually smaller than the ARLs of CUSUM up to a value of the shift near the one that the scheme was designed to detect." This is a very common statement in control charting literature. Let us reconsider the results of their paper. In Table 4 the results of the comparison study are reorganized with some corrections (mostly regarding the worst-case ARL, where both authors did compute the wrong values for small A). The target mean change is A = 1. And in terms of W and 2)the control chart CUSUMl is the first choice for all considered shifts starting fiom 1. In the two-sided case, however, the differences between all these schemes are considerably small. For a more comprehensive study we refer to Section 5. Finishing our historical journey of performance measuring we want to cite some further ideas. In Dragalin (1988, 1994a) the optimality problem was considered if the out-of-control parameter is unknown. The resulting chart, a generalized CUSUM scheme, is based on the GLR (Generalized Likelihood Ratio) statistic. Some more recent results in that field are Gordon & Pollak (1997), Yakir (1998), and very recently the PhD thesis of Mei (2003). These results, however, stem from the "most theoretical stream" among statisticians working in change point detection. Thus, it needs some more time to understand what their results mean in practice (be aware of terms like "asymptotic"). The same is valid for results like Aue & Horvath (2004) and citations therein, where it is not clear whether the results could employed in SPC or not. The consideration of several papers by Lai (1973 - 2001) lead to further complications. He advocates the steady-state look on control charts, which will be more discussed in the next section. Finally, we mention Gan (1995), who took the median instead of the expectation of the run length. Margavio, Conerly, Woodall & Drake (1995) consider alarm rates, Tartakovsky (1995)
Table 4. Results from Lucas & Saccucci (1990), reordered and adjusted; the smallest ARL value among the four schemes is boldly typed; the second CUSUM scheme is added only here; the indexes indicates that the original result (equals to the index) is replaced; EWMAl - X = 0.133, c = 2.856, EWMA2 - X = 0.139, c = 2.866, CUSUMl - k = 0.5, h = 5, CUSUMz - k = 0.4, h = 6, E,(L) x 465 for all charts. chart
CUSUMl CUSUM2 EWMAl
1
.25
.5
139 120
38.0 33.6
114
32.6
A .75 1. 1.5 worst-case ARL W
17.0 10.4 5.75 16.5 10.6 6.19 steady-state ARL V 15.6 9.85g.84 5.62
2.
2.5
3.
4.01 4.41
3.11 2.57 3.46 2.88
3.98
3.13
2.61
zero-state ARL C
analyzes nonhomogeneous Gaussian processes, Poor (1998) exploites a n exponential instead of a linear penalty for the delay, and Morais & Pacheco (1998) and Morais (2002) employ stochastic orderings for the analysis of control charts. Trying a rksum6, we suggest t o roughly distinguish the worst-case and the steady-state approach of assessing detection power. Hence, the simple look a t the usual (zero-state) ARL becomes less and less appropriate. In the next section, a very surprising behavior of one-sided EWMA control charts will be demonstrated. Looking for minimal out-of-control (zero-state) ARLs leads in some cases to control charts with strange properties.
4 The dilemma of the minimal out-of-control ARL
criterion Already Lai (1995) criticized the uncontrolled usage of the (zero-state) ARL by saying this: The ARL constraint Ee,(T) 2 y stipulates a long expected duration to false alarm. However, a large mean of T does not necessarily
imply that the probability of having a false alarm before some specified time m is small. I n fact, it is easy to construct positive integer-value random variables T with a large mean y and also having a high probability that T = 1. And
then in FrisQn (2003) the "minimal out-of-control ARL criterion" was directly attacked. In the one-sided case, she proved that the optimal EWMA control chart for a given out-of-control shift has an arbitrary small X > 0. Up to now it seemed to be impossible to analyze EWMA control charts with such a small A. Novikov (1990) provides asymptotical ARL results for X -+ 0, but his approach could not be used here, because the alarm threshold is fixed and does not depend on A. This leads to E,(L) + cc for X -+ 0. Moreover, the larger E,(L) is the larger the optimal X (in terms of both C and 2)). While for CUSUM and GRSR the corresponding value does not depend on Em (L), it is generally difficult to analyse EWMA optimality in Novikov's framework. Thus, in Fris6n (2003) a Monte Carlo study was employed to illustrate this phenomenon. Furthermore, one-sided EWMA control charts are not usual for mean monitoring. Therefore, we illustrate the dilemma by considering an upper EWMA control chart for monitoring the normal variance. Before F'risQn (2003) it was common sense that an EWMA control chart design is chosen to ensure a given in-control ARL value and a minimal out-ofcontrol ARL. For instance, Mittag, Stemann & Tewes (1998) consider several control charts for monitoring a normal variance. Rational subgroups of size 5
Fig. 1. Like in Mittag et al. (1998): Relative ARL efficiency W(E) = ARLEWMA etc.(~)/ARL~hewhart S ( E ) (E = U/Q) of EWMA and CUSUM control charts for monitoring normal variance, E, (L) = 250.
are formed and the sample variance S2was computed. Based on S2,different control charts were considered. The whole comparison was summarized in a figure like Figure 1. The control charts under consideration are EWMA charts based on S2 and S with continuation regions [0,c:], an EWMA chart based on In S2with
reflecting barrier at l n a i = 1 (from Crowder & Hamilton 1992), a CUSUM scheme based on lnS2 (like in Chang & Gan 1995), and a CUSUM scheme based on S2 (not included in the original paper of Mittag et al. 1998). The ARLs were computed by means of the Nystrom method based on G a d Legendre quadrature for the ln S2 based charts, while piecewise collocation methods as in Knoth (2005a, 2004b) were employed for the S2 (and S) based charts. Mittag et al. (1998) utilized the Markov chain approximation which is less accurate. The target out-of-control standard deviation is 1.5 and the dominating (in terms of the zero-state ARL) EWMA-S2 control chart looks like:
Now, we do not want to discuss whether this is a fair comparison between the control charts by taking the zero-state ARL. On the contrary, we want to look for a more optimal EWMA-S2 chart in terms of the out-of-control ARL for E = 1.5. Mittag et al. (1998) took X E [0.05,1] which provides the smallest E1(L) for given E,(L) = 250. The lower border 0.05 was chosen because of numerical difficulties (and applicational reasons). What happens if we are able to consider X < 0.05? The answer is given by Figure 2.
Fig. 2. Out-of-control ARL E l ( L )vs. X for EWMA-S2control charts with in-control ARL E,(L) = 250, rational subgroups with size 5.
The dotted line marks the lower X limit used by Mittag et al. (1998). The = 0.000 042. Taking X = 0.000 041 smallest X that was used for Figure 2 is,iX,
would need a critical value c (to give E,(L) = oo) which is negative, so that does not belong to the transition region of the the starting value zo = chart. Thus, Xmi, seems to be a natural border for reasonable X's, because E,(Z,) = a; (for all n) should be a value of the inner transition region. The ARL values were computed by exploiting again the collocation approach and with increasing dimension for decreasing A. The smallest X was validated by a Monte-Carlo study with 10' repetitions which provided 250.103 (s.e. 0.091) and 1.3628 (s.e. 0.0000) for the collocation results E,(L) = 250 and E1(L) = 1.3630, respectively. First, it is an unpleasant behavior if the minimum is obtained at the (lower) border of the X domain. Second, the false alarm probability P,(L = 1) is about 0.4! Thus, we constructed an EWMA control chart with E,(L) = 250 and a high alarm probability at chart startup, whose easy construction Lai mentioned in his paper. Third, with decreasing X the inertia problem becomes more heavier, that is, the worst-case ARL increases. Hence, should we take the local instead of the global minimum? In Figure 3, where we take subsamples of size 2 (or use individual observations and known, fixed mean p o ) and again in-control ARL with E,(L) = 250, there is no local minimum. Everybody,
~02
Fig. 3. Out-of-control ARL El (L) vs. X for EWMA-S2 control charts with in-control
ARL E,(L) = 250, rational subgroups with size 2.
who wants to design an EWMA control chart in the usual way, will take the smallest X under consideration, which will lead to some possible strange side effects. However, the corresponding curve of the steady-state ARL, V ,for the EWMA-S2 control chart of Mittag et al. (1998), would lead to a global, inner minimum again, cf. Figure 4.
zero-state ARL .........
.. .-
;
steady-state ARL ...................................!.. ........ % :
Fig. 4. Out-of-control zero-state ARL E l ( L ) and steady-state ARL 2) vs. X for EwMA-s2 control charts with in-control ARL E,(L) = 250, rational subgroups with size 5 .
In Figure 4 we see that for not too small X both ARL types under consideration, i. e. E1(L)and V, nearly coincide. Thus, the zero-state ARL, E1(L), appears like a suitable performance measure as long it mimics the steady-state ARL, D. Note that, finally, the one-sided EWMA control chart could act as nontrivial case for Lai's anti-(zero-state) ARL example. Lai (1995) proclaimed the following:
I n practice, the system only fails after a very long in-control period and we expect many false alarms before the first correct alarm. It is therefore much more relevant to consider (a) the probability of no false alarm during a typical (steady state) segment of the base-line period and (b) the expected delay i n signaling a correct alarm, instead of the ARL which is the mean duration to the first alarm assuming a constant in-control or out-of-control value. Hence, the consideration of the steady-state behavior seems to be of current interest. The probability mentioned in (a) is directly linked to the incontrol V, because in steady state the run length is geometrically distributed. And for (b) we employ the out-of-control V. Eventually, as for similar considerations of Markov chain quasi-stationary behavior we ought to distinguish the steady-state concept described by conditioning on {L _> m ) with m -t oo, and a kind of cyclical steady-state where after each blind alarm the chart is restarted (see, e. g., Shiryaev 196313).
In the next section a more detailed study by exploiting the random variable D* provides a good comparison of all the competing ARL concepts for CUSUM, EWMA, and GRSR schemes.
5 The steady-state delay D* as framework for comparing control chart performance measures First of all, during the whole section we consider control charts monitoring a normal mean. The in-control mean is po = 0, while the target out-of-control mean is p1 = 1. The variance is set by g 2 = 1. The in-control ARL E,(L) is given by 500. Remember that the zero-state ARL could be written as L,(zo) with actual mean p and initializing value zo (for the chart statistic 2,). Then with the steady-state density of Z,, or more suitably let Z* the steady-state chart statistic which possesses the density function +(.) (see Appendix) we define the steady-state delay by D; = L,(Z*) . Due to the positiveness of density +(.) and using the Lebesgue's Dominated Convergence Theorem we conclude that (see again Appendix) W = ess sup D; D
,
= E(D;).
As conjectured before we state that WpS = max{D, L,(zO)). With D* (in the sequel we suppress the index p) we utilize a random variable, whose expectation resembles the steady-state ARL, D, its essential supremum is equal to the worst-case ARL, W, due to Lorden, and its range coincide with that of the zero-state ARL function. And for small m already we could observe that the probability laws of D& and D* are nearly the same. Thus it might be a good framework for dealing with the competing control chart measures. This analysis will be done in terms of the survival function of D*. First, we start with one-sided control charts. To deal with the one-sided EWMA problem we allow reflection at a pre-specified lower border z,. Moreover, we set X = 0.155 which minimizes the out-of-control D at p l = l for the chart without reflexion, cf. Section 4. Figure 5 demonstrates the effects of different values of z,. Increasing the reflection border improves the worst-case and the steady-state ARL. There are some turning points, however. Regarding the values of Table 5 we realize that the best W could be achieved around 0 and the best D between -1 and 0 (we take, finally, z, = -0.4). Setting z, = 0, that is reflection at the incontrol mean of the chart statistic (without reflection), is a usual approach for one-sided variance EWMA control charts (see Section 4). Therefore, we choose the EWMA charts with z, E (-6, -0.4,O) (z, = -6 should mimic the
Fig. 5. Survival function P(D* > 1) for one-sided EWMA charts (A = 0.155) with different reflection borders z,, E,(L) = 500, at out-of-control mean p = 1.
Table 5. Zero-state, steady-state, and worst-case ARL for one-sided EWMA charts (A = 0.155) with different reflection borders z,, X = 0.155, E,(L) = 500, at out-ofcontrol mean p = 1.
chart without any reflection) as candidates for the comparison with CUSUM and GRSR. Now, CUSUM and GRSR are adjusted (Ic = 0.5) to work perfectly for a shift of size 1. In Figure 6, all 5 control charts under consideration are compared in terms of the survival function P ( D * > 1). In Table 6 some numerical values complement the figure. In Figure 6, P ( D * > I ) curves looks very similar up t o 1 = 9. At 1 = 9.13 = WcusuM, PcusuM(D* > I ) jumps to 0 driven by the reflection barrier 0 of the CUSUM scheme. Similar, but weaker effects one can observe for the reflected EWMA charts (see as well Figure 5). The first EWMA curve that jumps, belongs to z, = 0, then the curve jumps with z, = -0.4, and for z, = -6 we do not see any jump ( P ( D * > 12) is about 0.004, while W = W,lo-, = 14.52). The latter EWMA chart exhibits the unhappy tail behavior: with (steadystate) probability of about 0.2 the delay is larger than the worst cases of the 4 competitors. Based on the zero-state ARL, though, this EWMA chart would
Fig. 6. Survival function P(D* > 1) for one-sided control charts, E,(L) out-of-control mean p = 1.
= 500, at
Table 6. Zero-state, steady-state, and worst-case ARL for one-sided control charts, E,(L) = 500, at out-of-control mean p = 1.
chart GRSR CUSUM
EWMA z, = -6
z, = -0.4
ZT = 0
be the best chart. Finally, the GRSR scheme (Shiryaev-Roberts) provides a smooth and beneficial shape with the smallest expectation V = E ( D * ) .Note that the CUSUM chart operates with a probability larger than 0.5 in worst case condition. Now, let us consider the two-sided case. For the EWMA control chart, things become easier. No reflection is needed and the worst-case problem lessens. We include two different values of A, one for the smallest out-of-control L: (A = 0.1336) and one for the smallest out-of-control 2) (A = 0.1971). The two-sided CUSUM scheme is formed as usually by two one-sided charts. While it is untroublesome to compute the zero-state ARLs (see Lucas & Crosier 1982), it is much more demanding for the steady-state ARL (and of course for dealing with D*). For the latter, a two-dimensional Markov chain (for the Markov chain approach see Brook & Evans 1972) is adapted. Then by means of that discrete model, 2) and P ( D * > 1) are approximated. In the same way the two-sided GRSR scheme is treated (with the exception that the zero-state ARL is approximated by the Markov chain method as well).
We remark that there are different ways of constructing a two-sided GRSR scheme. According t o Pollak & Siegmund (1985) we would take the average of two one-sided GRSR statistics (one upper, one lower scheme), and the combined scheme signals if that value crosses a threshold. We used the usual coupling as in the CUSUM case, that is, the first single scheme which signals, generates the alarm of the combined scheme. While the zero-state ARL values coincide (LPS = Lcoupling= 11.142), the steady-state values slightly differ (VpS = 9.644 # 9.630 = 'Dcoupling). The last chart is a modification of the CUSUM chart due to Crosier (1986), which results in a single chart. For the sake of simplicity we do not consider couplings of two one-sided EWMA charts. These control charts are compared in the same way as we did it for the 7 the survival function P(D*> 1) and in Table 7 one-sided ones. In Figure the ARL types under consideration are presented.
Fig. 7. Survival function P(D* > I ) for two-sided control charts, E,(L) out-of-control mean p = 1.
= 500,
at
First of all, the two-sided CUSUM and GRSR schemes behave similarly to their one-sided counterparts. Crosier's CUSUM modification surpasses both EWMA charts in all ARL types (see Table 7). The worst-case ARLs, W, of the EWMA charts, however, reach tenable size. But with probability of about 0.3 both EWMA delays are larger than the worst cases of GRSR and CUSUM. The main advantage of the EWMA charts (and Crosier's CUSUM) is the simpler setup and analysis. At last, what measure, W or V, turns out to be the appropriate one? We see from Figure 7 and partially from Figure 6 that W does not resemble a "representive" measure. For CUSUM only the worst case is the usual case.
Table 7. Zero-state, steady-state, and worst-case ARL for two-sided control charts, E,(L) = 500, at out-of-control mean p = 1. chart GRSR CUSUM
CrosierA = 0.1336 X = 0.1971 CUSUM EWMA
Looking at the shapes of the P ( D * > 1) curves of CUSUM and GRSR, it is difficult to decide which scheme is better. For 1 < WcusuM the GRSR curve lies below the CUSUM curve, and for 1 2 W c u s u ~vice versa. Taking the expectation of D* seems to be a reasonable answer for that problem. But, further questions could be asked. Should we consider also the second moment or quantiles of the run length? Both types were adressed in literature. However, optimal solutions are now even more difficult to find. Finally, I would like to take this opportunity to thank the referee for helpful comments and corrections.
6 Appendix

Properties of D*

Based on the filtrations given in (2) we rewrite the conditional ARL 𝒟_m for Markovian control charts (that is, the distribution of the current chart statistic depends only on the previous statistic) with transition kernel M(·,·). For instance, the EWMA transition kernel for monitoring a normal mean looks like M(z̃, z) = φ((z − (1 − λ)z̃)/λ)/λ with the normal density φ(·). Denote by Ω the continuation region [c_l*, c_u*].

The measure M is equivalent to the Lebesgue measure, with one exception for control charts with a reflecting barrier c_l* (or c_u*, analogously), where M({c_l*}) = 1 (see Woodall 1983). Madsen & Conn (1973) proved that for primitive kernel functions M(·,·) (fulfilled, e.g., for monitoring normal mean or variance), f_m(·) converges uniformly to ψ(·), the left normalized, positive eigenfunction of the kernel M(·,·). Let D* = L(Z*) with Z* ~ ψ(·). Then, by means of Lebesgue's Dominated Convergence Theorem, we can prove that

lim_{m→∞} E_∞(L − m + 1 | L ≥ m) = lim_{m→∞} 𝒟_m = lim_{m→∞} ∫_Ω f_{m−1}(z) L(z) dM(z) = ∫_Ω ψ(z) L(z) dM(z) = 𝒟,

and

P(D* > l) = ∫_Ω ψ(z) 1{L(z) > l} dM(z) = lim_{m→∞} ∫_Ω f_{m−1}(z) 1{L(z) > l} dM(z) = lim_{m→∞} P(D*_m > l).

For one-sided control charts, the ARL function L(z) is usually decreasing in z, so that by using L(z) > l ⇔ z < L^{-1}(l) the above survival function simplifies to (Ω = [c_l*, c_u*])

P(D* > l) = ∫_{c_l*}^{L^{-1}(l)} ψ(z) dM(z).
For two-sided control charts things become more complicated: CUSUM and GRSR need a two-dimensional analysis, while for EWMA the ARL function L(z) is no longer monotone.
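The quantities above can be approximated directly with the Brook & Evans discretization. The following compact sketch (ours) does this for a two-sided EWMA chart: it builds the in-control and out-of-control transition matrices, computes the conditional ARL vector L(z), the steady-state ARL 𝒟 = Σ ψ(z)L(z) and P(D* > l); the smoothing constant and the limit c are illustrative, not the calibrated values used in the paper.

```python
import numpy as np
from scipy.stats import norm

def ewma_markov(lam=0.1336, c=0.8, mu=0.0, N=101):
    """Brook & Evans (1972) discretization of a two-sided EWMA chart on [-c, c]."""
    w = 2 * c / N
    mid = -c + w * (np.arange(N) + 0.5)
    lo = (mid[None, :] - w / 2 - (1 - lam) * mid[:, None]) / lam
    hi = (mid[None, :] + w / 2 - (1 - lam) * mid[:, None]) / lam
    return norm.cdf(hi - mu) - norm.cdf(lo - mu)       # Q[i, j]

lam, c = 0.1336, 0.8                                   # illustrative design values only
Q0, Q1 = ewma_markov(lam, c, 0.0), ewma_markov(lam, c, 1.0)
N = Q0.shape[0]
L1 = np.linalg.solve(np.eye(N) - Q1, np.ones(N))       # conditional ARLs L(z) at mu = 1

# psi: normalized left eigenfunction of the in-control kernel (dominant left eigenvector)
vals, vecs = np.linalg.eig(Q0.T)
psi = np.real(vecs[:, np.argmax(np.real(vals))])
psi /= psi.sum()

print("zero-state ARL:", L1[N // 2], " steady-state ARL:", psi @ L1)
print("P(D* > 5):", psi @ (L1 > 5))
```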
References

Aroian, L. A. & Levene, H. (1950). The effectiveness of quality control charts, J. Amer. Statist. Assoc. 45: 520-529.
Aue, A. & Horvath, L. (2004). Delay time in sequential detection of change, Stat. Probab. Lett. 67: 221-231.
Barnard, G. A. (1959). Control charts and stochastic processes, J. R. Stat. Soc., Ser. B 21(2): 239-271.
Bather, J. A. (1963). Control charts and minimization of costs, J. R. Stat. Soc., Ser. B 25: 49-80.
Brook, D. & Evans, D. A. (1972). An approach to the probability distribution of CUSUM run length, Biometrika 59(3): 539-549.
Chang, T. C. & Gan, F. F. (1995). A cumulative sum control chart for monitoring process variance, Journal of Quality Technology 27(2): 109-119.
Crosier, R. B. (1986). A new two-sided cumulative quality control scheme, Technometrics 28(3): 187-194.
Crowder, S. V. & Hamilton, M. D. (1992). An EWMA for monitoring a process standard deviation, Journal of Quality Technology 24(1): 12-21.
Dragalin, V. (1988). Asimptoticheskie resheniya zadachi obnaruzheniya razladki pri neizvestnom parametre, Statisticheskie Problemy Upravleniya 83: 47-51.
Dragalin, V. (1994). Optimality of generalized CUSUM procedure in quickest detection problem, Proceedings of the Steklov Institute of Mathematics 202(4): 107-119.
Ewan, W. D. & Kemp, K. W. (1960). Sampling inspection of continuous processes with no autocorrelation between results, Biometrika 47: 363-380.
Frisén, M. (2003). Statistical surveillance. Optimality and methods, Int. Stat. Rev. 71(2): 403-434.
Frisén, M. & de Maré, J. (1991). Optimal surveillance, Biometrika 78(2): 271-280.
Frisén, M. & Wessman, P. (1998). Quality improvements by likelihood ratio methods for surveillance, in B. Abraham (ed.), Quality Improvement Through Statistical Methods. International Conference, Cochin, India, December 28-31, 1996, Statistics for Industry and Technology, Boston: Birkhäuser, pp. 187-193.
Gan, F. F. (1995). Joint monitoring of process mean and variance using exponentially weighted moving average control charts, Technometrics 37: 446-453.
Girshick, M. A. & Rubin, H. (1952). A Bayes approach to a quality control model, Ann. Math. Stat. 23: 114-125.
Gordon, L. & Pollak, M. (1997). Average run length to false alarm for surveillance schemes designed with partially specified pre-change distribution, Ann. Stat. 25(3): 1284-1310.
Hawkins, D. M., Qiu, P. & Kang, C. W. (2003). The changepoint model for Statistical Process Control, Journal of Quality Technology 35(4): 355-366.
Ho, C. & Case, K. E. (1994a). Economic design of control charts: a literature review for 1981-1991, Journal of Quality Technology 26: 39-53.
Ho, C. & Case, K. E. (1994b). The economically-based EWMA control chart, Int. J. Prod. Res. 32(9): 2179-2186.
Hunter, J. S. (1986). The exponentially weighted moving average, Journal of Quality Technology 18: 203-210.
Keats, J. B., del Castillo, E., von Collani, E. & Saniga, E. M. (1997). Economic modelling for statistical process control, Journal of Quality Technology 29(2): 144-147.
Knoth, S. (2003). EWMA schemes with non-homogeneous transition kernels, Sequential Analysis 22(3): 241-255.
Knoth, S. (2005a). Accurate ARL computation for EWMA-S² control charts, Statistics and Computing 15(4): 341-352.
Knoth, S. (2005b). Computation of the ARL for CUSUM-S² schemes, Computational Statistics & Data Analysis, in press.
Lai, T. L. (1973). Gaussian processes, moving averages and quick detection problems, Ann. Probab. 1(5): 825-837.
Lai, T. L. (1974). Control charts based on weighted sums, Ann. Stat. 2: 134-147.
Lai, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems, J. R. Stat. Soc., Ser. B 57(4): 613-658.
Lai, T. L. (1998). Information bounds and quick detection of parameter changes in stochastic systems, IEEE Transactions on Information Theory 44(7): 2917-2929.
Lai, T. L. (2001). Sequential analysis: Some classical problems and new challenges, Statistica Sinica 11: 303-408.
Lorden, G. (1971). Procedures for reacting to a change in distribution, Ann. Math. Stat. 42(6): 1897-1908.
Lucas, J. M. & Crosier, R. B. (1982). Fast initial response for CUSUM quality-control schemes: Give your CUSUM a head start, Technometrics 24(3): 199-205.
Lucas, J. M. & Saccucci, M. S. (1990). Exponentially weighted moving average control schemes: Properties and enhancements, Technometrics 32: 1-12.
Madsen, R. W. & Conn, P. S. (1973). Ergodic behavior for nonnegative kernels, Ann. Probab. 1: 995-1013.
Margavio, T. M., Conerly, M. D., Woodall, W. H. & Drake, L. G. (1995). Alarm rates for quality control charts, Stat. Probab. Lett. 24: 219-224.
Mei, Y. (2003). Asymptotically Optimal Methods for Sequential Change-Point Detection, Ph.D. dissertation, California Institute of Technology.
Mevorach, Y. & Pollak, M. (1991). A small sample size comparison of the Cusum and Shiryayev-Roberts approaches to changepoint detection, American Journal of Mathematical and Management Sciences 11: 277-298.
Mittag, H.-J., Stemann, D. & Tewes, B. (1998). EWMA-Karten zur Überwachung der Streuung von Qualitätsmerkmalen, Allgemeines Statistisches Archiv 82: 327-338.
Morais, M. C. (2002). Stochastic Ordering in the Performance Analysis of Quality Control Schemes, Ph.D. dissertation, Instituto Superior Técnico, Technical University of Lisbon.
Morais, M. C. & Pacheco, A. (1998). Two stochastic properties of one-sided exponentially weighted moving average control charts, Commun. Stat. Simula. Comput. 27(4): 937-952.
Moustakides, G. V. (1986). Optimal stopping times for detecting changes in distributions, Ann. Stat. 14(4): 1379-1387.
Novikov, A. (1990). On the first passage time of an autoregressive process over a level and an application to a "disorder" problem, Theor. Probability Appl. 35: 269-279.
Page, E. S. (1954a). Continuous inspection schemes, Biometrika 41: 100-115.
Page, E. S. (1954b). Control charts for the mean of a normal population, J. R. Stat. Soc., Ser. B 16: 131-135.
Pollak, M. (1985). Optimal detection of a change in distribution, Ann. Stat. 13: 206-227.
Pollak, M. & Siegmund, D. (1985). A diffusion process and its applications to detecting a change in the drift of Brownian motion, Biometrika 72(2): 267-280.
Poor, H. V. (1998). Quickest detection with exponential penalty for delay, Ann. Stat. 26(6): 2179-2205.
Ritov, Y. (1990). Decision theoretic optimality of the CUSUM procedure, Ann. Stat. 18(3): 1464-1469.
Roberts, S. W. (1959). Control-chart-tests based on geometric moving averages, Technometrics 1: 239-250.
Roberts, S. W. (1966). A comparison of some control chart procedures, Technometrics 8: 411-430.
Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product, D. Van Nostrand Company, Inc., Toronto.
Shiryaev, A. N. (1963a). Ob optimal'nykh metodakh v zadachakh skoreishego obnaruzheniya, Teoriya Veroyatnostei i ee Primeneniya 8(1): 26-51. In Russian.
Shiryaev, A. N. (1963b). On optimum methods in quickest detection problems, Theor. Probability Appl. 8: 22-46.
Shiryaev, A. N. (1976). Statistical Sequential Analysis. Optimal Stopping Rules. (Statisticheskij posledovatel'nyj analiz. Optimal'nye pravila ostanovki), Moskva: Nauka.
Shiryaev, A. N. (2001). Essentials of the arbitrage theory, Part III. http://www.ipam.ucla.edu/publications/fm2001/ashiryaev1.pdf.
Tartakovsky, A. G. (1995). Asymptotic properties of CUSUM and Shiryaev's procedures for detecting a change in a nonhomogeneous Gaussian process, Math. Methods Stat. 4(4): 389-404.
Woodall, W. H. (1983). The distribution of the run length of one-sided CUSUM procedures for continuous random variables, Technometrics 25: 295-301.
Woodall, W. H. & Maragah, H. D. (1990). Discussion of "Exponentially weighted moving average control schemes: Properties and enhancements", Technometrics 32: 17-18.
Woodall, W. H. & Montgomery, D. C. (1999). Research issues and ideas in statistical process control, Journal of Quality Technology 31(4): 376-386.
Yakir, B. (1998). On the average run length to false alarm in surveillance problems which possess an invariance structure, Ann. Stat. 26(3): 1198-1214.
Misleading Signals in Joint Schemes for μ and σ

Manuel Cabral Morais and António Pacheco

Instituto Superior Técnico, Departamento de Matemática/CEMAT, Av. Rovisco Pais, 1049-001 Lisboa, PORTUGAL
maj@math.ist.utl.pt, apacheco@math.ist.utl.pt
Summary. The joint monitoring of the process mean and variance can be achieved by running what is termed a joint scheme. The process is deemed out of control whenever a signal is observed on either individual chart of a joint scheme. Thus, the two following events are likely to happen: a signal is triggered by the chart for the mean although the mean is on-target and the standard deviation is off-target; or the mean is out of control and the variance is in control, yet a signal is given by the chart for the standard deviation. Signals such as these are called misleading signals (MS) and can possibly send the user of the joint scheme to try to diagnose and correct a nonexistent assignable cause. Hence the need to evaluate performance measures such as: the probability of a misleading signal (PMS); and the number of sampling periods before a misleading signal is given by the joint scheme, the run length to a misleading signal (RLMS). We present some striking and instructive examples that show that the occurrence of misleading signals should be a cause of concern in practice. We also establish stochastic monotonicity properties for the RLMS and monotone behaviors for the PMS, which have important implications in practice in the assessment of the performance of joint schemes for the mean and variance of process output.
1 Joint schemes for μ and σ

Control schemes are widely used as process monitoring tools to detect simultaneous changes in the process mean μ and in its standard deviation σ, which can indicate a deterioration in quality. The joint monitoring of these two parameters can be achieved by running what is termed a joint (or combined) scheme.
Table 1. Some individual and joint schemes for μ and σ.

Individual schemes for μ
  Acronym    Chart
  S − μ      X̄
  C − μ      CUSUM
  CS − μ     Combined CUSUM-Shewhart
  E − μ      EWMA
  ES − μ     Combined EWMA-Shewhart
  S+ − μ     Upper one-sided X̄
  C+ − μ     Upper one-sided CUSUM
  CS+ − μ    Combined upper one-sided CUSUM-Shewhart
  E+ − μ     Upper one-sided EWMA
  ES+ − μ    Combined upper one-sided EWMA-Shewhart

Individual schemes for σ
  Acronym    Chart
  S+ − σ     Upper one-sided S²
  C+ − σ     Upper one-sided CUSUM
  CS+ − σ    Combined upper one-sided CUSUM-Shewhart
  E+ − σ     Upper one-sided EWMA
  ES+ − σ    Combined upper one-sided EWMA-Shewhart

Joint schemes
  Joint scheme   Scheme for μ   Scheme for σ
  SS             S − μ          S+ − σ
  CC             C − μ          C+ − σ
  CCS            CS − μ         CS+ − σ
  EE             E − μ          E+ − σ
  CES            ES − μ         ES+ − σ
  SS+            S+ − μ         S+ − σ
  CC+            C+ − μ         C+ − σ
  CCS+           CS+ − μ        CS+ − σ
  EE+            E+ − μ         E+ − σ
  CES+           ES+ − μ        ES+ − σ
The most popular joint schemes are obtained by simultaneously running a control scheme for μ and another one for σ (see for instance Gan (1989, 1995)), such as the ones in Table 1. Primary interest is usually in detecting increases or decreases in the process mean; nevertheless, we consider both standard charts for μ and upper one-sided charts for μ. The former individual charts have acronyms S − μ, C − μ, CS − μ, E − μ and ES − μ, and the latter are denoted by S+ − μ, C+ − μ, CS+ − μ, E+ − μ and ES+ − μ; the description of these charts can be found in Table 1. Moreover, we only consider the problem of detecting inflations in the process standard deviation, because an increase in σ corresponds to a reduction in quality and, as put by Reynolds Jr. and Stoumbos (2001), in most processes an assignable cause that influences the standard deviation is more likely to result in an increase in σ. Thus, only upper one-sided charts for σ are considered in this paper. Their acronyms are S+ − σ, C+ − σ, CS+ − σ, E+ − σ and ES+ − σ, and their description can also be found in Table 1.
As a result we shall deal with the joint schemes denoted by SS, CC, CCS, EE, CES, SS+, CC+, CCS+, EE+ and CES+, as described in Table 1. These joint schemes involve the individual charts for μ and σ whose summary statistics and control limits can be found in Tables 11 and 12, respectively, in the Appendix.
2 Misleading signals

The process is deemed out of control whenever a signal is observed on either individual chart of the joint scheme. Thus, a signal from any of the individual charts could indicate a possible change in the process mean, in the process standard deviation or in both parameters. Moreover, the following two types of signals are likely to happen:

- a signal is triggered by the chart for μ although μ is on-target and σ is off-target;
- μ is out of control and σ is in control, yet a signal is given by the chart for σ.
These are some instances of what St. John and Bragg (1991) called "misleading signals" (MS). These authors identified the following types of misleading signals arising in joint schemes for μ and σ:

I. the process mean increases but the signal is given by the chart for σ, or the signal is observed on the negative side of the chart for μ;
II. μ shifts down but the signal is observed on the chart for σ, or the chart for μ gives a signal on the positive side;
III. an inflation of the process standard deviation occurs but the signal is given by the chart for μ.

Only Type III corresponds to what is called a "pure misleading signal" by Morais and Pacheco (2000), because it is associated with a change in the value of one of the two parameters that is followed by an out-of-control signal by the chart for the other parameter - it corresponds to misinterpreting a standard deviation change as a shift in the mean. However, there is a situation that also leads to a "pure misleading signal" and is related to both misleading signals of Types I and II:

IV. a shift occurs in μ but the out-of-control signal is observed on the chart for σ.

This is called a misleading signal of Type IV (although it is a sub-type of Types I or II) by Morais and Pacheco (2000), and it corresponds to misinterpreting a mean change as a shift in the process standard deviation. Let us remind the reader that diagnostic procedures that follow a signal can differ depending on whether the signal is given by the chart for the mean
or the chart for the standard deviation. Moreover, they can be influenced by whether the signal is given on the positive or the negative side of the chart for μ (that is, the observed value of the summary statistic is above the upper control limit or below the lower control limit, respectively). Therefore a misleading signal can possibly send the user of a joint scheme in the wrong direction in the attempt to diagnose and correct a nonexistent assignable cause (St. John and Bragg (1991)). Such misleading results suggest inappropriate corrective action, unnecessarily aggravating process variability and increasing production (inspection) costs. In addition, we strongly believe that no quality control operator or engineer with proper training would be so naive as to think that a signal from the scheme for the mean only indicates possible shifts in the mean. However, based on the independence between μ and the RL distributions of the schemes for σ, signals given by the scheme for σ are more likely to be associated with an eventual shift in this parameter. Nevertheless, the main question here is not whether there will be misleading signals but rather:

- the "probability of a misleading signal" (PMS); and
- the number of sampling periods before a misleading signal is given by a joint scheme, the "run length to a misleading signal" (RLMS).
St. John and Bragg (1991) believed that the phenomenon of misleading signals had not been previously reported. The fact that no such studies had been made is rather curious because misleading signals can arise in any joint scheme for multiple parameters (as, e.g., the multivariate CUSUM quality control schemes proposed by Woodall and Ncube (1985)) and also in any two-sided control scheme for a single parameter. But, in fact, as far as we have investigated, there are few references devoting attention to misleading signals, or even realizing that one is confronted with such signals.
- St. John and Bragg (1991). Figure 3 of this reference illustrates the frequency of misleading signals for various shifts in the process mean when a joint scheme of type (X̄, R) is used; the results were obtained considering 5000 simulated runs of subgroups of five observations from a normal process with mean μ and standard deviation σ.
- Yashchin (1985). In Figure 10 of Yashchin (1985) we can find three values of the probability of a signal being given by the upper one-sided chart for μ when there is a decrease in this parameter; but the author does not mention that these values refer in fact to the probability of a misleading signal for a two-sided control scheme for μ which comprises an upper one-sided chart and a lower one-sided chart for μ.
- Morais and Pacheco (2000). These authors provide formulae for the probability of misleading signals of Types III and IV for joint schemes for μ and σ. Based on those expressions these probabilities are evaluated for the joint schemes SS and EE. This paper also accounts for the comparison of these two joint schemes, not only in terms of conventional performance measures such as the ARL and RL percentage points, but also with regard to the probabilities of misleading signals.
- Morais and Pacheco (2001b). This paper introduces the notion of run length to a misleading signal and provides monotonicity properties for both the PMS and the RLMS of the joint EWMA scheme EE+.
- Reynolds Jr. and Stoumbos (2001). This paper refers to the joint monitoring of μ and σ using individual observations and also discusses the phenomenon of misleading signals (although not referred to as such). Its Table 3 provides simulation-based values not only of the probability of misleading signals of Types III and IV but also of the probability that correct signals (i.e., non-misleading signals) occur and of the probability of a simultaneous signal on both individual charts, when μ is on-target and σ is out of control, and when σ is in control and μ is off-target. The authors claim that these probabilities can provide guidelines in the diagnosis of the type of parameter shift(s) that have occurred.

Keeping all this in mind, this paper provides striking and instructive examples that alert the user to the phenomenon of (pure) misleading signals, namely when dealing with the ten joint schemes for μ and σ in Table 1, whose constituent individual charts are introduced in the same table and described more thoroughly in the Appendix. The monotonicity behaviors of PMSs and the stochastic monotonicity properties of RLMSs of some of these joint schemes are also addressed in this paper. Comparisons between the joint schemes are also carried out, based on PMSs and RLMSs. The numerical study that we conduct is designed with careful thought given to the selection of the individual chart parameters, to ensure common in-control ARLs for these charts and hence fair comparisons among the joint schemes. Based on this extensive study, we believe that the PMSs and the RLMSs of Types III and IV should also be taken into consideration as additional performance measures in the design of joint schemes for μ and σ, or of any joint scheme for more than one parameter.
3 Probability of a misleading signal (PMS)

Throughout the remainder of this paper μ0 and σ0 denote the in-control process mean and standard deviation (respectively). We shall also consider that the shift in μ is represented in terms of the nominal standard deviation of the sample mean, δ = √n (μ − μ0)/σ0, and that the inflation of the process standard deviation is measured by θ = σ/σ0, with: −∞ < δ < +∞ and θ ≥ 1, for the joint schemes SS, CC, EE, CCS and CES; and δ ≥ 0 and θ ≥ 1, for schemes SS+, CC+, EE+, CCS+ and CES+. In this setting misleading signals of Type III occur when δ = 0 and θ > 1. On the other hand, misleading signals of Type IV occur when: δ ≠ 0 and θ = 1, for schemes SS, CC, EE, CCS and CES; and δ > 0 and θ = 1, for schemes SS+, CC+, EE+, CCS+ and CES+. And since we are dealing with a normally distributed quality characteristic, the summary statistics of the two individual charts for μ and σ are independent, given (δ, θ). Thus, we can provide plain expressions for the probabilities of misleading signals of Types III and IV, denoted in general by PMS_III(θ) and PMS_IV(δ).

Lemma 1 - The expressions of the PMSs of Types III and IV for joint schemes involving individual schemes with independent summary statistics (such as the ten joint schemes in Table 1) are
(or δ > 0 when we are using upper one-sided schemes for μ), where RL_μ(δ, θ) and RL_σ(θ) represent the run lengths of the individual schemes for μ and σ, and P_X(x), F_X(x) and F̄_X(x) denote the probability, distribution and survival functions of the discrete random variable X. The exact expressions of the PMSs of the joint schemes SS and SS+ follow immediately by plugging the survival functions of the run lengths RL_μ and RL_σ into equations (2) and (3). The approximations to the PMSs of the remaining joint schemes are found by using the Markov approximations (Brook and Evans (1972)) to the survival functions of RL_μ and RL_σ and truncating the series (2) and (3). The approximate values of the PMSs converge to the true values due to the convergence in law of the approximate RLs involved in the definition of the PMSs.

Theorem 2 - The monotonicity properties (1)-(16) in Table 2 are valid for the (exact) PMSs of Types III and IV of the joint schemes in Table 1 based exclusively on upper one-sided individual charts: SS+, CC+, CCS+, EE+ and CES+.
The monotonicity properties of the PMSs of Types III and IV given in Theorem 2 are intuitive and have an analytical justification - most of them follow
Table 2. Monotonicity properties of the (exact) PMSs of Types III and IV.

Joint scheme   Type III (δ = 0, θ > 1)              Type IV (δ > 0, θ = 1)
SS+            (1) PMS_III(θ) ↓ with θ              (8) PMS_IV(δ) ↓ with δ
CC+, CCS+      (2) PMS_III(θ) ↑ with α              (9) PMS_IV(δ) ↓ with α
               (3) PMS_III(θ) ↓ with β              (10) PMS_IV(δ) ↑ with β
               (C1) PMS_III(θ)*                     (11) PMS_IV(δ) ↓ with δ
               (4) PMS_III(θ) ↓ with k_μ+           (12) PMS_IV(δ) ↑ with k_μ+
               (5) PMS_III(θ) ↑ with k_σ+           (13) PMS_IV(δ) ↓ with k_σ+
EE+, CES+      (6) PMS_III(θ) ↑ with α              (14) PMS_IV(δ) ↓ with α
               (7) PMS_III(θ) ↓ with β              (15) PMS_IV(δ) ↑ with β
               (C2) PMS_III(θ)*                     (16) PMS_IV(δ) ↓ with δ
* (C1, C2) Conjecture of no monotone behavior in terms of θ.
directly from expressions (1)-(4) of Lemma 1, and from the stochastic monotonicity properties of the RLs of the individual schemes for μ and of those for σ.
Take for instance the PMS of Type IV of the joint scheme SS+, PMS_IV,SS+(δ), as defined by Equation (3). It is a decreasing function of δ because the distribution of RL_{S+−σ}(1) does not depend on δ, while RL_{S+−μ}(δ, 1) stochastically decreases with δ and so does its survival function at any argument i. As a consequence, the joint scheme SS+ tends to trigger fewer misleading signals of Type IV as δ increases. As for the monotone behavior of the PMS of Type III of the joint scheme SS+, PMS_III,SS+(θ), it is a decreasing function of θ; this follows directly from its explicit expression, valid for θ > 1.
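The decreasing behavior of PMS_IV,SS+(δ) is also easy to see by simulation. The following sketch is ours: it estimates PMS_IV for an SS+-type joint scheme (upper one-sided X̄ and S² Shewhart charts) by Monte Carlo. The limits used here are illustrative one-sided ARL-500 limits, not the paper's exact design, and we count a simultaneous signal on both charts as non-misleading, a convention that may differ in detail from the authors'.

```python
import numpy as np
rng = np.random.default_rng(7)

def pms_iv_ss_plus(delta, n=5, xi_mu=2.878, xi_sigma=16.924, reps=5000):
    """Monte Carlo PMS of Type IV for an SS+-type joint scheme (illustrative limits)."""
    mis = 0
    for _ in range(reps):
        while True:
            x = rng.normal(delta / np.sqrt(n), 1.0, size=n)    # theta = 1, shift delta in the mean
            sig_mu = np.sqrt(n) * x.mean() > xi_mu             # signal on the chart for mu
            sig_sigma = (n - 1) * x.var(ddof=1) > xi_sigma     # signal on the chart for sigma
            if sig_mu or sig_sigma:
                mis += sig_sigma and not sig_mu                # misleading: sigma chart signals alone
                break
    return mis / reps

for d in (0.5, 1.0, 2.0):
    print(d, round(pms_iv_ss_plus(d), 3))   # estimates should decrease as delta grows
```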
Note that the properties of the PMS of the remaining joint schemes depend on parameters like: α and β, which refer to the head start given to the individual Markov-type scheme for μ and for σ (respectively); and k, which represents the reference value of a CUSUM scheme. The proof of the results concerning the Markov-type joint schemes can be found in Morais (2002, pp. 118-119) and follows from several facts:

- stochastically monotone matrices (Daley (1968)) arise naturally when we are dealing with upper one-sided Markov-type schemes;
- it is possible to rewrite the survival function of the RL of any of these schemes as an increasing functional of the associated Markov chain;
- we can establish a stochastic order relation, in the sense of Kalmykov (Kalmykov (1962)), between pairs of probability transition matrices, which allows us to compare the associated Markov chains, i.e., any pair of increasing functionals (Shaked and Shanthikumar (1994, p. 124)), thus establishing an inequality between the survival functions of two competing run lengths (the ordering notions used here are recalled below).
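For readers less familiar with these ordering notions, the following standard definitions (our wording, not the authors') may help fix ideas:

```latex
% Usual stochastic order between (run length) random variables:
\[
X \le_{\mathrm{st}} Y
  \quad\Longleftrightarrow\quad
  \bar F_X(t) = P(X > t) \;\le\; P(Y > t) = \bar F_Y(t)
  \quad \text{for all } t .
\]
% A transition matrix P on an ordered state space is stochastically monotone
% (Daley, 1968) when its rows increase in the st-order:
\[
i \le j
  \quad\Longrightarrow\quad
  \sum_{k \ge m} P(i,k) \;\le\; \sum_{k \ge m} P(j,k)
  \quad \text{for all } m .
\]
```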
The conjectures (C1) and (C2) included in the table of Theorem 2 concern the PMSs of Type III of the joint schemes CC+ (CCS+) and EE+ (CES+), respectively, and surely deserve a comment. PMS_III(θ) involves in its definition RL_μ(0, θ) and RL_σ(θ). If, on the one hand, this latter random variable stochastically decreases with θ (see Morais (2002, p. 169)), on the other hand we cannot tell what the stochastic monotone behavior of RL_μ(0, θ) in terms of θ is, according to Morais (2002, pp. 169 and 90) and Morais and Pacheco (2001a) for the case of scheme E+ − μ. Thus, establishing a monotonic behavior of PMS_III(θ) in terms of θ seems to be nontrivial. However, as we shall see, the numerical results in the next section not only illustrate Theorem 2 but also support conjectures (C1) and (C2). In fact, values of PMS_III(θ) seem to decrease and then increase with θ for the joint schemes CC+, CCS+, EE+ and CES+. The practical significance of this non-monotonous behavior is as follows: the joint scheme's tendency to misidentify a shift in σ can increase as the displacement in σ becomes more severe.
4 PMS: numerical illustrations

As an illustration, we provide values for the PMSs of Types III and IV of the ten joint schemes in Table 1 considering:

- sample size equal to n = 5;
- nominal values μ0 = 0 and σ0 = 1; and
- δ = 0.05, 0.10, 0.20, 0.30, 0.40, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 3.0 and θ = 1.01, 1.03, 1.05, 1.10, 1.20, 1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90, 2.00, 3.00,

which practically cover the same range that was found useful previously in Gan (1989). With the exception of the joint schemes SS and SS+, the values of these probabilities are approximate and based on the Markov approach, using 41 transient states and a fixed relative error in the truncation of the series in (2) and (3). The range of the decision intervals [LCL, UCL) of all the individual schemes for μ and for σ has been chosen in such a way that, when no head start has been adopted, these schemes are approximately matched in control and all the corresponding in-control ARLs are close to 500 samples (see Table 3). Take for instance the Shewhart-type schemes (see the check following Table 3):
Table 3. Parameters and in-control ARLs of individual and joint schemes for μ and σ.

Scheme for μ   Parameters                                           ARL_μ(0, 1)
S − μ          ξ_μ = 3.09023                                        500.000
C − μ          h_μ = 22.7610                                        500.001
CS − μ         ξ_μ = 3.29053, h_μ = 31.1810                         500.001
E − μ          γ_μ = 2.8891, λ_μ = 0.134                            499.988
ES − μ         ξ_μ = 3.29053, γ_μ = 3.0934, λ_μ = 0.134             499.999

Scheme for σ   Parameters                                           ARL_σ(1)
S+ − σ         ξ_σ+ = 16.9238                                       500.000
C+ − σ         h_σ+ = 3.5069, k_σ+ = 0.055                          499.993
CS+ − σ        ξ_σ+ = 18.4668, h_σ+ = 3.9897, k_σ+ = 0.055          500.002
E+ − σ         γ_σ+ = 1.2198, λ_σ+ = 0.043                          500.027
ES+ − σ        ξ_σ+ = 18.4668, γ_σ+ = 1.3510, λ_σ+ = 0.043          500.033

Joint scheme   ARL_joint(0, 1)   No. of iterations for ARL_joint(0, 1)
SS             250.250           --
CC             272.653           1882
CCS            267.309           1909
EE             253.318           2050
CES            252.003           2060
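A quick way to check the Shewhart-type entries of Table 3 (a check of ours, assuming i.i.d. normal data and samples of size n = 5, so that (n − 1)S²/σ0² follows a chi-square law with 4 degrees of freedom):

```python
from scipy.stats import norm, chi2

p_mu = 2 * norm.sf(3.09023)          # two-sided X-bar chart signal probability
p_sigma = chi2.sf(16.9238, df=4)     # upper one-sided S^2 chart signal probability

print(1 / p_mu)                                      # ~500, ARL of S - mu
print(1 / (2 * norm.sf(3.29053)))                    # ~1000, Shewhart constituent of a combined chart
print(1 / p_sigma)                                   # ~500, ARL of S+ - sigma
print(1 / (p_mu + p_sigma - p_mu * p_sigma))         # ~250.25, in-control ARL of the joint SS scheme
```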
It should also be added that, in the case of the individual combined CUSUM-Shewhart and EWMA-Shewhart schemes for μ and for σ, all the Shewhart-type constituent charts have an in-control ARL equal to 1000.
A few more remarks on the choice of the individual chart parameters ought to be made, namely that most of them were taken from the literature in order to optimize the detection of a shift in μ from the nominal value μ0 to μ0 + 1.0 × σ0/√n and a shift in σ from σ0 to 1.25 × σ0. "Optimality" here means that extensive numerical results suggest that such reference values and smoothing constants produce an individual chart for μ (σ) with the smallest possible out-of-control ARL, ARL_μ(1.0, 1.0) (ARL_σ(1.25)), for a fixed in-control ARL of 500 samples. The upper control limits of the individual charts are always searched in such a way that, when no head start has been adopted, they have in-control ARLs close to 500.
- C − μ: we adopted a null reference value because we are dealing with a standard CUSUM; a positive (negative) reference value would suggest that positive (negative) shifts are more likely to occur; also note that this standard individual chart is different from the two-sided scheme which makes use of an upper and a lower one-sided chart for μ.
- E − μ: the smoothing constant λ_μ = 0.134 was taken from Gan (1995) and agrees with Figure 4 of Crowder (1989).
- C+ − μ: the reference value k_μ+ = 0.5 is suggested by Gan (1991) for a two-sided chart for μ.
- E+ − μ: we adopt the same smoothing constant λ_μ+ = 0.134 as for E − μ.
- C+ − σ, E+ − σ: Gan (1995) suggests a reference value k_σ+ = 0.055 and a smoothing constant λ_σ+ = 0.043, respectively.
Finally, note that the individual charts CS − μ, ES − μ, CS+ − μ, ES+ − μ, CS+ − σ and ES+ − σ have the same reference values and smoothing constants as C − μ, E − μ, C+ − μ, E+ − μ, C+ − σ and E+ − σ. However, their control limits are larger than those of the corresponding "non-combined" individual schemes and are obtained taking into account the supplementary Shewhart control limits and the in-control matching of the ARLs.

We proceed to illustrate the monotonicity properties stated in the previous section with the joint scheme EE+. For that purpose some head starts (HS) have been given to this joint scheme: HS_μ = 0%, 50% and HS_σ = 0%, 50%. It is worth recalling that we did not consider HS_μ, HS_σ = −50% because both individual schemes are upper one-sided. The results in Table 4 not only show that PMSs of Types III and IV can be as high as 0.47 but also remind us of some of the monotonicity properties in Theorem 2, namely that:

- giving head starts to the individual chart for μ, E+ − μ, leads to an increase of the PMSs of Type III and a decrease of the PMSs of Type IV;
- adopting a head start for the individual chart for σ, E+ − σ, yields a decrease of the values of the PMS of Type III and an increase of the ones of Type IV; and
- underestimating the magnitude of the changes in μ results in an overestimation of the PMS values of Type IV.
The numerical results also suggest that underestimating the magnitude of the changes in σ also leads to an overestimation of the values of the PMSs of Type III (recall Conjecture (C1)), for most of the values we considered for θ. However, note that PMS_III(θ) changes its monotonous behavior for large values of θ, according to the values in bold in Table 4.

Table 4. PMSs of Types III and IV for the joint scheme EE+.
The second illustration accounts for the comparative assessment of the schemes SS, CC, CCS, EE and CES, and of the schemes SS+, CC+, CCS+, EE+ and CES+, with regard to the probabilities of misleading signals of Types III and IV. Tables 5 and 6 provide values of these probabilities for the former and the latter groups of five joint schemes, respectively. Also, no head starts have been given to any of the individual schemes whose approximate RL has a phase-type distribution. Values of PMS_IV(δ), for δ < 0, were omitted from Table 5 by virtue of the fact that the run length RL_μ(δ, 1) is identically distributed to RL_μ(−δ, 1) for symmetric values of HS_μ; hence PMS_IV(−δ) for HS_μ = −α × 100% equals PMS_IV(δ) for HS_μ = α × 100%, where α ∈ [0, 1). Tables 5 and 6 and Figures 1 and 2 show how the use of joint schemes based on CUSUM and EWMA summary statistics can offer substantial improvement with regard to the (non)emission of MSs of Type III: schemes SS and SS+ tend to produce this type of MS more frequently for a wide range of values of θ. We can add that the joint schemes EE and CES are outperformed by schemes CC and CCS (respectively) in terms of PMSs of Type III. However, the joint schemes EE+ and CES+ appear to offer a slightly better performance than CC+ and CCS+, having in general lower PMSs of Type III.
Table 5. PMSs of Types III and IV for the joint schemes SS, CC, CCS, EE and CES (standard case). Columns: θ and PMS_III(θ) for SS, CC, CCS, EE, CES; δ and PMS_IV(δ) for SS, CC, CCS, EE, CES.

Fig. 1. PMSs of Types III and IV for the joint schemes SS, CC, CCS, EE and CES (standard case).
The numerical results in Tables 5 and 6 and Figures 1 and 2 also suggest that the use of the joint schemes CCS, CES, CCS+ and CES+ instead of CC, EE, CC+ and EE+ (respectively) causes in general an increase in the PMSs of Type III. This is probably due to the fact that a joint combined scheme has a total of four constituent charts (instead of the usual two), thus twice as many sources of MSs. However, note that there are a few instances where combined joint schemes appear to trigger slightly fewer MSs of Type III than their "non-combined" counterparts, for moderate and large values of θ. Tables 5 and 6 and Figures 1 and 2 show that MSs of Type IV are more likely to happen in schemes SS and SS+ than while using the remaining joint schemes for μ and σ. All these latter joint schemes seem to have a similar behavior in terms of the frequency of PMSs of Type IV, in particular when the individual charts for μ are upper one-sided, as shown by Figures 1 and 2.
Table 6. PMSs of Types III and IV for the joint schemes SS+, CC+, EE+, CCS+ and CES+ (upper one-sided case).

Fig. 2. PMSs of Types III and IV for the joint schemes SS+, CC+, CCS+, EE+ and CES+ (upper one-sided case).
5 Run length to a misleading signal (RLMS)

Another performance measure that also springs to mind is the number of sampling periods until a misleading signal is given by the joint scheme, the run length to a misleading signal (RLMS) of Type III, RLMS_III(θ), and of Type IV, RLMS_IV(δ). RLMS_III(θ) and RLMS_IV(δ) are improper random variables with an atom at +∞, because the non-occurrence of a misleading signal is an event with non-zero probability.

The next lemma not only gives the survival functions of the RLMS of Types III and IV but also provides alternative expressions that prove to be useful in the investigation of the stochastic monotonicity properties of this performance measure.
Lemma 3 - Let RLMS_III(θ) and RLMS_IV(δ) denote the RLMSs of Types III and IV of any of the joint schemes in Table 1. Then

(or δ > 0 when upper one-sided schemes for μ are in use), for any positive integer m. Exact expressions for the survival functions of the RLMS of Types III and IV of the schemes SS and SS+ are once again obtained by plugging the survival functions of the run lengths RL_μ and RL_σ into equations (9) and (11). Additionally, if we consider the exact expressions of the PMSs of the joint schemes SS and SS+ we get
where F_{RL_joint(δ,θ)}(m) = 1 − F̄_{RL_μ(δ,θ)}(m) × F̄_{RL_σ(θ)}(m). These alternative expressions for the distribution function of the RLMSs are due to the fact that the constituent Shewhart charts of schemes SS and SS+ deal in any case with (time-)independent summary statistics. Approximations to these performance measures for the remaining joint schemes are obviously obtained by replacing the survival functions in equations (9) and (11) by the corresponding Markov approximations; equations (13) and (14) do not hold in this case because of the (time-)dependence structure of their summary statistics.

Theorem 4 - The stochastic monotonicity properties (1)-(16) in Table 7 hold for the (exact) RLMSs of Types III and IV of the joint schemes SS+, CC+, CCS+, EE+ and CES+.
Table 7. Stochastic monotonicity properties for the RLMSs of Types III and IV.

Joint scheme   Type III (δ = 0, θ > 1)                  Type IV (δ > 0, θ = 1)
SS+            (C3) RLMS_III(θ)*                        (7) RLMS_IV(δ) ↑st with δ
CC+, CCS+      (1) RLMS_III(θ) ↓st with α               (8) RLMS_IV(δ) ↑st with α
               (2) RLMS_III(θ) ↑st with β               (9) RLMS_IV(δ) ↓st with β
               (C4) RLMS_III(θ)*                        (10) RLMS_IV(δ) ↑st with δ
               (3) RLMS_III(θ) ↑st with k_μ+            (11) RLMS_IV(δ) ↓st with k_μ+
               (4) RLMS_III(θ) ↓st with k_σ+            (12) RLMS_IV(δ) ↑st with k_σ+
EE+, CES+      (5) RLMS_III(θ) ↓st with α               (14) RLMS_IV(δ) ↑st with α
               (6) RLMS_III(θ) ↑st with β               (15) RLMS_IV(δ) ↓st with β
               (C5) RLMS_III(θ)*                        (16) RLMS_IV(δ) ↑st with δ
* (C3, C4, C5) Without stochastic monotone behavior, in the usual sense, regarding θ.
The stochastic monotonicity properties described in Theorem 4 come as no surprise - they point in the opposite direction of the monotone behavior of the corresponding PMSs, except for Conjecture (C3), which refers to the joint scheme SS+, for which we proved that PMS_III(θ) is decreasing. These properties are ensured by the stochastic monotonicity properties of RL_μ(δ, θ) and RL_σ(θ) and equations (9)-(12). For example, the increasing behavior of the survival function of RLMS_IV,SS+(δ) follows from (11) and the fact that RL_μ(δ, 1) stochastically decreases with δ. The run length to a misleading signal of Type III of the joint schemes CC+, CCS+, EE+ and CES+, RLMS_III(θ), stochastically decreases with α. This conclusion can be immediately drawn from (10) because: the in-control run length of a Markov-type scheme for σ with a β × 100% head start, RL_σ^β(1), does not depend on the head start α; and the run length of a Markov-type scheme for μ in the presence of a shift in σ and with an α × 100% head start, RL_μ^α(0, θ), stochastically decreases with α, thus F_{RL_μ^α(0,θ)}(i) increases with α for any i. However, this result could not be drawn from (9) because P_{RL_μ^α(0,θ)}(i) is not an increasing function of α, although RL_μ^α(0, θ) stochastically decreases with α. Similarly, to prove that RLMS_IV(δ) stochastically decreases with β, we have to use (12) instead of (11). We ought to add that, based on the percentage points of RLMS_III(θ) numerically obtained in the next section, we conjecture that RLMS_III(θ) has no stochastic monotone behavior, in terms of θ, for any of the five upper one-sided joint schemes under investigation.
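Percentage points of the (improper) RLMS, such as those discussed in the next section, can also be approximated by simulation. The following sketch is ours, for an SS+-type joint scheme with illustrative (not the calibrated) limits, and it adopts the convention that the scheme stops at its first signal and that a simultaneous signal on both charts does not count as misleading.

```python
import numpy as np
rng = np.random.default_rng(11)

def rlms_iv_ss_plus(delta, n=5, xi_mu=2.878, xi_sigma=16.924, horizon=5000):
    """One realization of RLMS_IV: period of the first signal if that signal is (misleadingly)
    given by the chart for sigma alone; +inf otherwise."""
    for t in range(1, horizon + 1):
        x = rng.normal(delta / np.sqrt(n), 1.0, size=n)
        sig_mu = np.sqrt(n) * x.mean() > xi_mu
        sig_sigma = (n - 1) * x.var(ddof=1) > xi_sigma
        if sig_mu or sig_sigma:
            return t if (sig_sigma and not sig_mu) else np.inf
    return np.inf

runs = np.array([rlms_iv_ss_plus(0.3) for _ in range(5000)])
print("P(MS of Type IV) ~", np.isfinite(runs).mean())
print("5% percentage point of RLMS_IV ~", np.quantile(runs, 0.05))
```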
6 RLMS: numerical illustrations

The percentage points of the RLMS are crucial because this random variable has no expected value or any other moment; note also that any p × 100% percentage point, for p equal to or larger than the PMS, is equal to +∞, as illustrated by Tables 8-10.
The presentation of the numerical results concerning the RLMS follows closely the one in Section 4. Thus, we use the same constellation of parameters for the ten joint schemes, and we begin with an illustration of some stochastic monotonicity properties of the RLMS of the joint scheme EE+. Table 8 and the following ones include the 1%, 5%, 10%, 15% and 20% percentage points of RLMS_III(θ) and RLMS_IV(δ) for a smaller range of θ and δ values: θ = 1.01, 1.03, 1.05, 1.10 and δ = 0.05, 0.10, 0.20, 0.30.

Table 8. Percentage points of the RLMS of Type III and Type IV of the joint scheme EE+ (listed in order corresponding to the p × 100% = 1%, 5%, 10%, 15%, 20% percentage points, for each θ and each δ); percentage points of RL_joint(0, θ) and RL_joint(δ, 1) are in parentheses.
Also, in order to give the user an idea of how quickly a misleading signal of Type III and Type IV is triggered by a joint scheme, the corresponding percentage points of RL_joint(0, θ) and RL_joint(δ, 1) have been added in parentheses to Tables 8-10 and, thus, can be compared to the corresponding percentage points of RLMS_III(θ) and RLMS_IV(δ).
The results in Table 8 illustrate the findings concerning the stochastic monotonicity properties of the RLMSs of the scheme EE+. The emission of MSs of Type III is indeed speeded up by the adoption of a head start HS_μ; giving a head start to the individual chart for σ has exactly the opposite effect. Besides this, RLMS_IV(δ) stochastically increases with δ, which means that MSs of Type IV will tend to occur later as the increase in μ becomes more severe. However, the entries in bold in Table 8 show that a few percentage points of RLMS_III(θ) do not decrease with θ. Table 8 also gives the reader an idea of how soon misleading signals can occur. For instance, the probability of triggering a misleading signal of Type III within the first 59 samples is at least 0.10 when there is a shift of 1% in the process standard deviation and no head start has been adopted for scheme EE+.

Table 9. Percentage points of the RLMSs of Type III and Type IV (listed in order corresponding to the p × 100% = 1%, 5%, 10%, 15%, 20% percentage points, for each θ and each δ); percentage points of RL_joint(0, θ) and RL_joint(δ, 1) are in parentheses. Columns: θ and the RLMS_III(θ) percentage points for SS, CC, CCS, EE, CES; δ and the RLMS_IV(δ) percentage points for SS, CC, CCS, EE, CES.
Tables 9 and 10 allow us to assess and compare the performance of all the joint schemes in terms of the number of sampling periods until the emission of misleading signals. The examination of these two tables leads to overall conclusions similar to those referring to the PMSs. According to the percentage points in Table 9, the joint scheme SS tends to produce MSs of Type III considerably sooner than scheme CC, and the schemes EE, CCS and CES appear to trigger them later than scheme SS. This probably comes by virtue of the fact that PMS_III(θ) of scheme SS is larger than the corresponding PMSs of the remaining joint schemes, as mentioned in Section 4. Note, however, that the percentage points of RLMS_III(θ) for schemes SS+, CC+, CCS+, EE+ and CES+ differ much less than when standard individual charts for μ are in use, as is apparent in Table 10.

Table 10. Percentage points of the RLMSs of Type III and Type IV (listed in order corresponding to the p × 100% = 1%, 5%, 10%, 15%, 20% percentage points, for each θ and each δ); percentage points of RL_joint(0, θ) and RL_joint(δ, 1) are in parentheses.
The schemes CC and CCS tend to require more samples to trigger MSs of Type III than the joint schemes EE and CES, respectively. On the other hand, scheme CC+ appears to give such MSs almost as late as scheme EE+. The same seems to hold for schemes CCS+ and CES+. Supplementing the individual CUSUM charts for μ and σ of the joint CC scheme with Shewhart upper control limits seems to substantially speed up the emission of MSs of Type III. The same comments do not stand for schemes EE and CES, or CC+ and CCS+, or even EE+ and CES+. A brief remark on the percentage points of RLMS_III,SS+(θ): all the values suggest that this random variable stochastically decreases as the shift in the process standard deviation increases. However, recall that we proved that PMS_III,SS+(θ) decreases with θ. Thus, if we had considered larger values of θ, we would soon get percentage points equal to +∞ instead of smaller ones. In addition, note that Conjectures (C4) and (C5) concerning the remaining schemes CC+, CCS+, EE+ and CES+ are conveniently supported by the numerical results in bold in Table 10. As for the RLMSs of Type IV, it is interesting to notice that the joint schemes SS and SS+ do not show such a poor performance when compared to their
Markov-type counterparts CC, EE, CCS, CES and CC+, EE+, CCS+, CES+, respectively. Also, the joint schemes CC, EE, CCS and CES seem to have similar performance as far as RLMS_IV(δ) is concerned, as suggested earlier by the PMS_IV(δ) values in Table 5 and by Figure 1. The same appears to happen with schemes CC+, EE+, CCS+ and CES+. In addition, the use of combined schemes yields in general smaller values for the percentage points of RLMS_IV(δ); that is, the MSs of Type IV tend to be triggered sooner. It is worth mentioning that the RLMS is important to assess the performance (in terms of misleading signals) of schemes for μ and σ based on univariate summary statistics, such as the Shewhart-type scheme proposed by Chengalur et al. (1989), whose statistic is n⁻¹ Σ_{i=1}^n (X_{iN} − μ0)². This comes by virtue of the fact that in such cases we cannot define the PMS. The RLMSs of those schemes are particular cases of the run length of the joint scheme itself.
7 Final remarks

The numerical results obtained throughout this paper suggest that the schemes SS and SS+ compare unfavorably to the more sophisticated joint schemes CC, CC+, etc., in terms of MSs of both types, in most cases. Thus, the SS and SS+ schemes are far from being reliable in identifying which parameter has changed. This is the answer to St. John and Bragg's (1991) concluding question: (Misleading signals can be a serious problem for the user of joint charts.) Would alternatives (EWMA or CUSUM) perform better in this regard? Tables 5 and 6 give the distinct impression that joint schemes for μ and σ can be very sensitive to MSs of both types: the values of the PMSs are far from negligible, especially for small and moderate shifts in μ and σ; thus, misidentification of signals is likely to occur. The practical significance of all these results will depend on the amount of time and money that is spent in attempting to identify and correct nonexisting causes of variation in μ (σ), i.e., when an MS of Type III (IV) occurs. No monotonicity results have been stated for the PMSs and the RLMSs of the joint schemes CC, CCS, EE and CES. This is due to the fact that the constituent individual Markov-type schemes for μ are not associated with stochastically monotone matrices (as in the upper one-sided case), an absolutely crucial characteristic to prove the stochastic monotonicity results of the RLs and, thus, the monotonicity properties of the PMSs and the stochastic monotonicity properties of the RLMSs. Finally, we would like to mention that Reynolds Jr. and Stoumbos (2001) advocate that the probabilities of misleading signals are useful to diagnose which parameter(s) has (have) changed, and suggest the use of the pattern of the points beyond the control limits of the constituent charts in the identification of the parameter that has effectively changed. A plausible justification for this diagnostic aid stems from the fact that changes in μ and σ have different impacts on those patterns.

Appendix: Individual control schemes for μ and σ

Let (X_{1N}, ..., X_{nN}) denote the random sample of size n at the sampling period N (N ∈ ℕ). X̄_N = n⁻¹ Σ_{i=1}^n X_{iN}, S_N² = (n − 1)⁻¹ Σ_{i=1}^n (X_{iN} − X̄_N)² and Z_N = √n × (X̄_N − μ0)/σ0 represent the sample mean, the sample variance and the nominal standardized sample mean, respectively. According to this, we define the summary statistics and control limits of the individual schemes for μ and σ in Table 11 and Table 12 (respectively), preceded by the corresponding acronym. Some of the summary statistics of these individual control schemes are trivial (S − μ, S+ − μ, S+ − σ) or can be found, as presented here or with a slight variation, in Montgomery and Runger (1994, p. 875) (C − μ), Lucas and Saccucci (1990) (E − μ and ES − μ), Lucas and Crosier (1982) (C+ − μ), Yashchin (1985) (CS+ − μ), Crowder and Hamilton (1992) (E+ − σ), Gan (1995) (C+ − σ), or are natural extensions of existing ones (CS − μ, E+ − μ, ES+ − μ, CS+ − σ, ES+ − σ).

Table 11. Summary statistics and control limits of the individual schemes for μ.
Table 12. Summary statistics and control limits of the individual schemes for σ (for E+ − σ and ES+ − σ the summary statistic is W⁺_{σ,N} = max{ln(σ0²), (1 − λ_σ⁺) × W⁺_{σ,N−1} + λ_σ⁺ × ln(S_N²)}, N > 0).
Note that, with the exception of the charts S − μ, S+ − μ and S+ − σ, an initial value has to be considered for the summary statistic of the standard schemes for μ (C − μ, CS − μ, E − μ, ES − μ) and of the individual upper one-sided schemes for μ and σ (C+ − μ, CS+ − μ, E+ − μ, ES+ − μ, and C+ − σ, CS+ − σ, E+ − σ, ES+ − σ). Let (LCL + UCL)/2 + α × (UCL − LCL)/2 (respectively, LCL + α × (UCL − LCL)) be the initial value of the summary statistic of the standard schemes for μ (of the upper one-sided schemes for μ and σ), with α ∈ (−1, 1) (α ∈ [0, 1)). If α ∈ (−1, 0) ∪ (0, 1) (α ∈ (0, 1)), an α × 100% head start (HS) has been given to the standard schemes for μ (to the individual upper one-sided schemes for μ and σ); no head start has been adopted otherwise. The adoption of a head start may speed up the detection of shifts by the control scheme at start-up and also after a restart following a (possibly ineffective) control action (see Lucas (1982), Lucas and Crosier (1982), Yashchin (1985) and Lucas and Saccucci (1990)). The control limits of the individual charts for μ and σ are also given in Table 11 and Table 12, respectively. It is worth adding that the control limits of the individual EWMA charts for μ and σ are all specified in terms of the exact asymptotic standard deviation of W_{μ,N} and of W_{σ,N} = (1 − λ_σ) × W_{σ,N−1} + λ_σ × ln(S_N²) (see Lucas and Saccucci (1990) and Crowder and Hamilton (1992), and recall that these latter authors used an infinite series expansion to approximate the trigamma function ψ′(·)).
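The E+ − σ recursion and the head-start initialization described above are easy to put into code. The sketch below is ours; the upper control limit is a placeholder, not the calibrated limit implied by γ_σ⁺ in Table 3, and σ0 = 1 so that the reflecting barrier ln(σ0²) equals 0.

```python
import numpy as np

def eplus_sigma_path(samples, sigma0=1.0, lam=0.043, ucl=0.15, hs=0.0):
    """Path of the E+ - sigma summary statistic with an hs x 100% head start.
    ucl is an illustrative decision limit only."""
    lcl = np.log(sigma0 ** 2)                    # reflecting barrier of the upper one-sided chart
    w = lcl + hs * (ucl - lcl)                   # head-start initial value, hs in [0, 1)
    out = []
    for x in samples:
        w = max(lcl, (1 - lam) * w + lam * np.log(np.var(x, ddof=1)))
        out.append(w)
    return np.array(out)

rng = np.random.default_rng(5)
samples = rng.normal(0.0, 1.3, size=(40, 5))     # sigma inflated to 1.3 * sigma0
path = eplus_sigma_path(samples, hs=0.5)
print("first signal at sample", int(np.argmax(path > 0.15)) + 1 if (path > 0.15).any() else None)
```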
References

1. Brook D, Evans DA (1972) An approach to the probability distribution of CUSUM run length. Biometrika 59:539-549.
2. Chengalur IN, Arnold JC, Reynolds Jr. MR (1989) Variable sampling intervals for multiparameter Shewhart charts. Communications in Statistics - Theory and Methods 18:1769-1792.
3. Crowder SV (1989) Design of exponentially weighted moving average schemes. Journal of Quality Technology 21:155-162.
4. Crowder SV, Hamilton MD (1992) An EWMA for monitoring a process standard deviation. Journal of Quality Technology 24:12-21.
5. Daley DJ (1968) Stochastically monotone Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 10:305-317.
6. Gan FF (1989) Combined mean and variance control charts. ASQC Quality Congress Transactions - Toronto, 129-139.
7. Gan FF (1991) Computing the percentage points of the run length distribution of an exponentially weighted moving average control chart. Journal of Quality Technology 23:359-365.
8. Gan FF (1995) Joint monitoring of process mean and variance using exponentially weighted moving average control charts. Technometrics 37:446-453.
9. Kalmykov GI (1962) On the partial ordering of one-dimensional Markov processes. Theory of Probability and its Applications 7:456-459.
10. Lucas JM (1982) Combined Shewhart-CUSUM quality control schemes. Journal of Quality Technology 14:51-59.
11. Lucas JM, Crosier RB (1982) Fast initial response for CUSUM quality-control schemes: give your CUSUM a head start. Technometrics 24:199-205.
12. Lucas JM, Saccucci MS (1990) Exponentially weighted moving average control schemes: properties and enhancements. Technometrics 32:1-12.
13. Montgomery DC, Runger GC (1994) Applied Statistics and Probability for Engineers. John Wiley & Sons, New York.
14. Morais MC (2002) Stochastic Ordering in the Performance Analysis of Quality Control Schemes. PhD thesis, Department of Mathematics, Instituto Superior Técnico, Lisbon, Portugal.
15. Morais MC, Pacheco A (2000) On the performance of combined EWMA schemes for μ and σ: a Markovian approach. Communications in Statistics - Simulation and Computation 29:153-174.
16. Morais MC, Pacheco A (2001a) Some stochastic properties of upper one-sided X̄ and EWMA charts for μ in the presence of shifts in σ. Sequential Analysis 20(1/2):1-12.
17. Morais MC, Pacheco A (2001b) Misleading signals em esquemas combinados EWMA para μ e σ (Misleading signals in joint EWMA schemes for μ and σ). In Oliveira P, Athayde E (eds) Um Olhar sobre a Estatística: 334-348. Edições SPE. In Portuguese.
18. Reynolds Jr. MR, Stoumbos ZG (2001) Monitoring the process mean and variance using individual observations and variable sampling intervals. Journal of Quality Technology 33:181-205.
19. Shaked M, Shanthikumar JG (1994) Stochastic Orders and Their Applications. Academic Press, London.
20. St. John RC, Bragg DJ (1991) Joint X-bar & R charts under shift in mu or sigma. ASQC Quality Congress Transactions - Milwaukee, 547-550.
21. Woodall WH, Ncube MM (1985) Multivariate CUSUM quality-control procedures. Technometrics 27:285-292.
22. Yashchin E (1985) On the analysis and design of CUSUM-Shewhart control schemes. IBM Journal of Research and Development 29:377-391.
The Fréchet Control Charts

Edyta Mrówka and Przemysław Grzegorzewski

Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
mrowka@ibspan.waw.pl, pgrzeg@ibspan.waw.pl
Summary. A new control chart based on the Fréchet distance for monitoring simultaneously the process level and spread is suggested. Our chart reveals interesting statistical properties: its behavior is comparable with that of traditional control charts if changes in the process level only or changes in the process spread only are observed, but it is much better than the X̄ − S control chart under simultaneous disturbances in the process level and spread.
1 Introduction

Statistical process control (SPC) is a collection of methods for achieving continuous improvement in quality. This objective is accomplished by continuous monitoring of the process under study in order to quickly detect the occurrence of assignable causes and undertake the necessary corrective actions. One of the most popular SPC procedures is the Shewhart X̄ − S control chart for monitoring the process level and spread. The X̄ control chart contains three lines: a center line (CL) corresponding to the process level and two other horizontal lines, called the upper control limit (UCL) and the lower control limit (LCL), respectively. When applying this chart one draws samples of a fixed size n at specified time points, then computes the arithmetic mean of each sample and plots it as a point on the chart. As long as the points lie within the control limits the process is assumed to be in control. However, if a point plots outside the control limits (i.e. below the LCL or above the UCL) we are forced to assume that the process is no longer under control. One will immediately intervene in the process in order to find the causes of the disturbance and undertake corrective actions to eliminate them. The construction of the S control chart is similar; however, its center line corresponds to the process standard deviation and we plot on this chart points corresponding to the sample standard deviations computed for consecutive samples (see, e.g., Mittag, Rinne, 1993; Montgomery, 1991; Western Electric, 1956). In the present paper we suggest another approach for designing control charts, based on goodness-of-fit tests. Then we propose a particular realization of that idea, i.e. a new control chart based on the Fréchet distance for monitoring simultaneously the process level and spread. Our chart, called the Fréchet control chart, is very simple in use. Moreover, a simulation study
shows that the Fréchet control chart reveals quite interesting statistical properties: its behavior is comparable with that of traditional control charts if changes in the process level only or changes in the process spread only are observed, but it is much better than the X̄ − S control chart under simultaneous disturbances in the process level and spread. The paper is organized as follows: the idea of designing control charts using goodness-of-fit tests is proposed in Section 2. Section 3 is devoted to the Fréchet distance between distributions. Then, in Section 4 we show how to construct a goodness-of-fit test based on the Fréchet distance. In Section 5 we design the Fréchet control chart and in Section 6 we discuss some statistical properties of the suggested control chart.
2 Control charts and statistical tests

Suppose that the process under consideration is normally distributed. Let us first assume that we know the parameters of the process (i.e. its mean μ0 and standard deviation σ0) when the process is thought to be in control. In such a case the traditional Shewhart X̄ control chart is equivalent to the two-sided hypothesis testing problem for the mean, i.e. H: μ = μ0 against K: μ ≠ μ0. Similarly, the S control chart is equivalent to the two-sided hypothesis testing problem for the standard deviation, i.e. H: σ = σ0 against K: σ ≠ σ0. Therefore, the X̄ − S control chart is equivalent to the following testing problem

H: μ = μ0 & σ = σ0   (i.e. the process is in control)
K: ¬H                (i.e. the process is out of control).        (1)
In this case we will accept the null hypothesis if the sample mean and the sample standard deviation fall into the acceptance area designed by the appropriate confidence intervals. Assuming the significance level α = 0.0027 we get 99.73% confidence intervals for μ and σ, with borders corresponding to the limits of the control chart (see Figure 1).
Fig. 1. Acceptance area of the X̄-S control chart (center lines CLX̄, CLS and control limits LCLX̄, UCLX̄, LCLS, UCLS delimiting the rectangular acceptance region)
It is easily seen that the process is thought to be in control as long as the points (X̄i, Si) corresponding to successive samples fall into the following area

(X̄i, Si) ∈ [LCLX̄, UCLX̄] × [LCLS, UCLS].     (2)

It is quite obvious that points (X̄i, Si) located close to the central point (CLX̄, CLS) of this area are treated as evidence that the process is in control. Unfortunately, according to Figure 1 the process is also thought to be in control if both its mean and standard deviation are quite far from this central point and close to the "corners" of the area (2), i.e. (UCLX̄, UCLS), (LCLX̄, UCLS), (LCLX̄, LCLS) and (UCLX̄, LCLS). This unpleasant feature of the X̄-S control chart inclined us to design a statistical device that would behave in general similarly to the X̄-S control chart but would be more sensitive to simultaneous disturbances of the process level and spread. Let F denote the distribution of the process under study X, and let F0 be the distribution of the process X0 which is assumed to be in control. To check whether the process under study is in control one may also use a goodness-of-fit test. Then our problem could be expressed as the problem of testing the following hypotheses

H : F = F0    K : F ≠ F0.     (3)
Assuming that the test statistic T is a distance between X and X0, i.e.

T = d(X, X0),     (4)

we get a one-sided critical region of the form

{T > dα},     (5)

where dα is a critical value such that

P(T > dα | H) = α,     (6)

and α is the accepted significance level. If we apply such a goodness-of-fit test for designing a control chart, we obtain a new SPC tool with only one upper control limit, corresponding to the critical value of that test for the given significance level α. The way to operate such a new control chart is very simple and could be described as follows:
- draw a sample X1, ..., Xn, compute the value of the test statistic T(X1, ..., Xn) and plot it as a point on the control chart;
- as long as the points fall below the control limit the process is thought to be in control. A point above the control limit is interpreted as evidence that the process is out of control (see Figure 2).
Fig. 2. Control chart based on a goodness-of-fit test
Having in mind that our device should be simple to use, we are looking for a distance measure that enables us to construct a control chart fulfilling the following requirements:
- the control chart based on a goodness-of-fit test could be applied instead of the X̄-S control chart;
- it should be more sensitive than the X̄-S control chart under simultaneous disturbances of the process level and spread;
- it should behave comparably with the X̄ control chart under disturbances of the process level only;
- it should behave comparably with the S control chart under disturbances of the process spread only;
- it should be simple to use and should not require more observations and calculations than the X̄-S control chart.
Now the crucial point is to find an appropriate distance measure which could be used as a test statistic. Many distances between distributions are considered in the literature (see, e.g., Gibbs and Su, 2002; Sobczyk and Spencer, 1992). In the next section we propose how to design the desired control chart based on the Fréchet distance. It is also worth noting that there exist some other approaches for monitoring the mean and variance using a single statistic or circular boundaries, like Reynolds et al. (1981), Cox (1996), Chao and Cheng (1996) or He and Grigoryan (2006).
3 The Fréchet distance
Let X and Y denote two random variables with distribution functions F and G, respectively, having first and second moments. Then the Fréchet distance between these two distributions is defined by (Fréchet, 1957)

d_F(F, G) = d_F(X, Y) = √( min_{U,V} E|U − V|² ),     (7)

where the minimum is taken over all random variables U and V having distributions F and G, respectively. In the particular case when F and G belong to a family of distributions closed with respect to changes of location and scale, the Fréchet distance takes the simple form (see Dowson and Landau, 1982)

d_F(X, Y) = √( (μX − μY)² + (σX − σY)² ),     (8)

where μX, μY, σX, σY denote the means and standard deviations of the random variables X and Y, respectively. For univariate distributions the Fréchet distance can be graphically interpreted as the length of the hypotenuse of a right triangle with legs μX − μY and σX − σY (see Hadi and Nyquist, 1995), which is shown in Figure 3.
Fig. 3. Graphical interpretation of the Fréchet distance
It is worth noting that the family of normal distributions is closed with respect to changes of location and scale. Hence the Fréchet distance between two normal distributions with parameters μX, μY, σX, σY, respectively, is given by (8). It is known that the normal distribution is completely characterized by two parameters - its mean and standard deviation - and we find these very parameters in (8). Thus the Fréchet distance utilizes the whole information on normal random variables. Recalling that in SPC we traditionally assume normality of the process under study, the Fréchet distance seems to be an excellent candidate for the distance measure to be used for designing the control chart suggested in the previous section.
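For readers who want to experiment with the distance itself, the following minimal Python sketch evaluates the closed form (8) for two normal distributions; the function name and the example parameters are ours and are not part of the original paper.

```python
import math

def frechet_distance_normal(mu_x, sigma_x, mu_y, sigma_y):
    """Frechet distance between two normal distributions, using the
    location-scale closed form (8): d_F^2 = (mu_x - mu_y)^2 + (sigma_x - sigma_y)^2."""
    return math.sqrt((mu_x - mu_y) ** 2 + (sigma_x - sigma_y) ** 2)

# Example: in-control process N(0, 1) versus a shifted and inflated process N(1, 1.5)
print(frechet_distance_normal(0.0, 1.0, 1.0, 1.5))  # sqrt(1.0 + 0.25) = 1.118...
```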
4 Goodness-of-fit test based on the Fréchet distance
Suppose that the process in control X0 is normally distributed N(μ0, σ0). Moreover, let X1, ..., Xn denote a random sample from the process X which is also normally distributed N(μ, σ). To decide whether the process under study is in control we consider the following hypothesis testing problem

H : X ~ N(μ0, σ0)   (i.e. the process is in control)
K : ¬H              (i.e. the process is out of control)     (9)

To verify these hypotheses we construct a goodness-of-fit test based on the Fréchet distance between the distribution of the process in control and the empirical distribution of the process under study. Namely, we define the test statistic as

T(X1, ..., Xn) = d_F²(X, X0) = (X̄ − μ0)² + (SX − σ0)²,     (10)

where X̄ and SX denote the sample average and the sample standard deviation of X1, ..., Xn, respectively.
Then, assuming the significance level α, we accept the null hypothesis H, which is equivalent to the statement that the process under study is in control, if the test statistic (10) does not exceed the critical value dα (see (15) below). The acceptance area of the hypothesis H is given in Figure 4. It is seen immediately (compare Figure 4 and Figure 1) that samples whose average and standard deviation differ significantly from the values given in the null hypothesis lead to the rejection of H. In the context of statistical process control such a situation is interpreted as evidence that the process is out of control. In order to compute the critical values dα of the desired test let us assume that Z1, ..., Zn are realizations of a random variable Z from the standard normal distribution N(0, 1). Then the Fréchet distance (10) between the empirical distribution of the random variable Z and Z0 ~ N(0, 1) is given by

d_F²(Z, Z0) = Z̄² + (SZ − 1)²,     (14)
Fig. 4. The acceptance area of the goodness-of-fit test based on the Fréchet distance
where Z̄ and SZ denote the sample average and the sample standard deviation, respectively, computed from the sample Z1, ..., Zn. The critical value dα corresponding to the significance level α fulfills the following equation

P(d_F²(Z, Z0) > dα) = α.     (15)
Suppose that the null hypothesis H given by (9) holds, i.e. X1, ..., Xn ~ N(μ0, σ0). Then after the standardization Zi = (Xi − μ0)/σ0 we get Z1, ..., Zn ~ N(0, 1). Hence

Z̄ = (X̄ − μ0)/σ0     (16)

and

SZ = SX/σ0.     (17)

Substituting (16) and (17) into (14) we get

d_F²(Z, Z0) = [(X̄ − μ0)² + (SX − σ0)²]/σ0².

Therefore, by (10), our test statistic could be expressed in the following form

T(X1, ..., Xn) = σ0² d_F²(Z, Z0),

and finally we will reject H if T(X1, ..., Xn) > σ0² dα, where dα is obtained from (15).
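The critical value dα in (15) has no convenient closed form, but it can be approximated by simulating standardized samples. The sketch below is our own illustration of one way to do this and is not necessarily the procedure used to compute the factors in Table 1; it assumes the sample standard deviation uses the usual n − 1 divisor.

```python
import numpy as np

def frechet_critical_value(n, alpha=0.0027, n_sim=500_000, seed=1):
    """Monte Carlo approximation of d_alpha in (15):
    P( Zbar^2 + (S_Z - 1)^2 > d_alpha ) = alpha for Z_1, ..., Z_n ~ N(0, 1).
    Assumes S_Z uses the n-1 divisor; returns the empirical (1 - alpha) quantile."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sim, n))
    zbar = z.mean(axis=1)
    s_z = z.std(axis=1, ddof=1)
    d2 = zbar ** 2 + (s_z - 1.0) ** 2
    return np.quantile(d2, 1.0 - alpha)

# Rough G2-type factor for n = 5 (subject to simulation noise at alpha = 0.0027)
print(frechet_critical_value(5))
```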
5 The Fréchet control chart
As mentioned in Section 2, control charts based on goodness-of-fit tests have only one control limit. In the case of the control chart utilizing the Fréchet distance, called the Fréchet control chart, this control limit (the upper one) has the following form

UCL_F = σ0² dα,

where σ0² is the variance of the process in control, while dα is the critical value obtained from (15). Since the process variance σ0² is very often unknown, we estimate it by
σ̂² = (1/m) Σ_{i=1}^m S_i²,

which is the average of the sample variances S_i² obtained for the preliminary samples X_{i1}, ..., X_{in}, i = 1, ..., m, taken when the process is thought to be in control, i.e.

S_i² = (1/(n−1)) Σ_{j=1}^n (X_{ij} − X̄_i)²,   i = 1, ..., m.     (23)
Since the control limits of the traditional Shewhart control charts are 3-sigma limits, which correspond to the significance level α = 0.0027, we also construct the Fréchet control chart for this significance level. One should also remember that the critical values dα depend not only on α but on the sample size n as well. Adopting the convention used for the notation of the X̄-S or X̄-R control charts, we treat the critical value dα as a Shewhart coefficient that depends on the sample size and denote it by G2. Thus, finally, the control limit of the Fréchet control chart is given by

UCL_F = G2 σ̂²,

where the factors G2 for some sample sizes are tabulated in Table 1. Having designed the Fréchet control chart we may plot the points corresponding to the values T(X1, ..., Xn) given by (10) and obtained from the consecutive samples. Since the parameters μ0 and σ0 of the process in control are usually unknown, we estimate them by X̄̄ and S̄, where X̄̄, given by

X̄̄ = (1/m) Σ_{i=1}^m X̄_i,
is the arithmetic mean of the sample averages X̄_i obtained for the preliminary samples X_{i1}, ..., X_{in}, i = 1, ..., m,
Table 1. Factors G2 for constructing the Fréchet control chart.
while S̄, given by

S̄ = (1/m) Σ_{i=1}^m S_i,

is the average of the sample standard deviations S_i obtained for the preliminary samples taken when the process is thought to be in control (cf. (23)). Hence we plot on the chart points computed from the formula

d_i² = (X̄_i − X̄̄)² + (S_i − S̄)².
The algorithm given below shows briefly how to apply the Fréchet control chart in practice (a code sketch of these steps is given after the list):
1. Using preliminary samples, taken when the process is thought to be in control, compute X̄̄ and S̄.
2. Find the control limit UCL_F.
3. Take a sample X1, ..., Xn.
4. For the given sample compute X̄_i and S_i.
5. Compute d_i² = (X̄_i − X̄̄)² + (S_i − S̄)².
6. Plot the point d_i² on the control chart.
7. If d_i² < UCL_F then go back to step 3. Otherwise go to step 8.
8. Alarm showing that the process is probably out of control.
9. Investigation and corrective action is required to find and eliminate the assignable cause or causes responsible for this behavior.
10. Go back to step 1.
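A compact sketch of steps 1-7 follows. It is written in Python for illustration only; the G2 value passed to it is a placeholder rather than a value from Table 1, and the sample standard deviations are computed with the n − 1 divisor.

```python
import numpy as np

def frechet_chart_limits(prelim_samples, g2):
    """Phase-1 estimates and the control limit UCL_F = G2 * sigma^2_hat,
    where sigma^2_hat is the average of the preliminary sample variances.
    prelim_samples: (m, n) array of in-control samples."""
    xbar = prelim_samples.mean(axis=1)
    s = prelim_samples.std(axis=1, ddof=1)
    return xbar.mean(), s.mean(), g2 * (s ** 2).mean()

def frechet_statistic(sample, xbarbar, sbar):
    """Plotted statistic d_i^2 = (Xbar_i - Xbarbar)^2 + (S_i - Sbar)^2 (step 5)."""
    return (sample.mean() - xbarbar) ** 2 + (sample.std(ddof=1) - sbar) ** 2

# Hypothetical usage with m = 25 preliminary samples of size n = 5
rng = np.random.default_rng(0)
prelim = rng.normal(10.0, 2.0, size=(25, 5))
xbb, sb, ucl = frechet_chart_limits(prelim, g2=2.9)   # g2 = 2.9 is a placeholder value
new_sample = rng.normal(11.5, 3.0, size=5)            # level and spread both disturbed
print(frechet_statistic(new_sample, xbb, sb) > ucl)   # True would trigger the alarm (step 8)
```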
6 Statistical properties of the Fréchet control chart
In this section we briefly discuss some statistical properties of the Fréchet control chart and compare them with the properties of the traditional X̄-S control chart. We have performed a broad simulation study to check whether the requirements given in Section 2 are fulfilled. The operating-characteristic (OC) curves for the Fréchet control chart, the X̄ control chart and the S control chart under disturbances of the process level μ only are shown in Figure 5. The OC curves for these charts under disturbances of the process spread only are shown in Figure 6. Finally, in Figure 7 we show the OC curves for these three charts under simultaneous disturbances of the process level and spread.
Fig. 5. The OC curves (Fréchet, X̄ and S charts) under disturbances of μ for samples of size a) n = 5, b) n = 10, c) n = 15 and d) n = 20.
Fig. 6. The OC curves (Fréchet, X̄ and S charts) under disturbances of σ for samples of size a) n = 5, b) n = 10, c) n = 15 and d) n = 20.
Fig. 7. The OC curves (Fréchet, X̄ and S charts) under disturbances of μ and σ for samples of size a) n = 5, b) n = 10, c) n = 15 and d) n = 20.
It is seen that under disturbances of the process level the Fréchet control chart behaves comparably with the X̄ control chart, while under disturbances of the process spread only the Fréchet control chart behaves comparably with the S control chart. However, it is evident that under simultaneous disturbances of the process level and spread the Fréchet control chart behaves better than the traditional X̄ and S control charts.
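OC values of the kind shown in Figures 5-7 can be approximated by direct simulation. The sketch below is our own illustration and assumes known in-control parameters μ0 = 0, σ0 = 1 together with an externally supplied control limit (e.g. from the critical-value simulation sketched earlier); it estimates the probability that a single sample does not produce a signal.

```python
import numpy as np

def oc_frechet(mu, sigma, n, ucl_f, mu0=0.0, sigma0=1.0, n_sim=100_000, seed=2):
    """Monte Carlo OC value: probability that a sample of size n from N(mu, sigma)
    does NOT signal on the Frechet chart with in-control parameters (mu0, sigma0)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=(n_sim, n))
    d2 = (x.mean(axis=1) - mu0) ** 2 + (x.std(axis=1, ddof=1) - sigma0) ** 2
    return float(np.mean(d2 <= ucl_f))

# OC under a simultaneous disturbance (one-sigma shift in level, 50% larger spread);
# the control limit 2.9 is illustrative only.
print(oc_frechet(mu=1.0, sigma=1.5, n=5, ucl_f=2.9))
```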
7 Conclusions
We have proposed a new method for designing control charts based on goodness-of-fit tests. The construction of these charts utilizes distance measures between distributions. In the paper we have investigated the so-called Fréchet control chart, based on the Fréchet distance, for simultaneously monitoring the process level and spread. Our new control chart is very simple to use and reveals quite interesting statistical properties: its behavior is comparable with the traditional control charts if changes in the process level only or in the process spread only are observed, but it is much better than the X̄-S control chart under simultaneous disturbances in the process level and spread. However, the idea of designing control charts through goodness-of-fit tests is not restricted to the Fréchet distance. One may try to design control charts utilizing other distances between distributions as well. Moreover, the idea is not restricted to charts for variables. Thus our investigations in this direction will be continued in further papers.
References
1. Chao M.T., Cheng S.W. (1996), Semicircle Control Chart for Variables Data, Quality Engineering 8: 441-446.
2. Cox M.A.A. (1996), Implementing the Circle Technique Using Spreadsheets, Quality Engineering 9: 65-76.
3. Dowson D.C., Landau B.V. (1982), The Fréchet distance between multivariate normal distributions, J. Mult. Anal. 12: 450-455.
4. Gibbs A., Su F. (2002), On choosing and bounding probability metrics, CiteSeer, NEC Research Institute.
5. Fréchet M. (1957), Sur la distance de deux lois de probabilité, C. R. Acad. Sci. Paris 244: 689-692.
6. Hadi A.S., Nyquist H. (1995), Fréchet distance as a tool for diagnosing multivariate data, Proceedings of the Third Umea-Wuerzburg Conference in Statistics, Umea.
7. He D., Grigoryan A. (2006), Joint statistical design of double sampling X̄ and s charts, Eur. J. Oper. Res. 168: 122-142.
8. Mittag H.J., Rinne H. (1993), Statistical Methods of Quality Assurance, Chapman and Hall.
9. Montgomery D. (1991), Statistical Quality Control, John Wiley & Sons, Inc., New York.
10. Reynolds M.R., Jr., Ghosh B.K. (1981), Designing Control Charts for Means and Variances, ASQC Annual Congress Transactions, San Francisco, pp. 400-407.
11. Sobczyk K., Spencer B.F. (1992), Random fatigue: from data to theory, Academic Press.
12. Western Electric (1956), Statistical Quality Control Handbook, Western Electric Corporation, Indianapolis, Ind.
Reconsidering Control Charts in Japan
Ken Nishina¹, Kazuyoshi Kuzuya², Naru Ishii¹
¹ Nagoya Institute of Technology, Department of Techno-Business Administration, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan, [email protected], [email protected]
² SQC Consultant, Ohaza-Makihara, Nukata-cho, Aichi 444-3624, Japan, [email protected]
Summary. In this paper, the role of control charts is investigated by considering the practice of using Shewhart control charts from the following viewpoints:
1) Which rules should be used for detecting an assignable cause? 2) What should be considered as a chance cause? 3) What should be selected as control characteristics? Woodall (2000) indicated that it is very important to distinguish between the use of control charts in Phase 1 and Phase 2. Considering the different aims of control charts in Phase 1 and Phase 2, we suggest how to use the set of eight pattern tests specified in ISO 8258: Shewhart control charts. The improved machine performance achieved by advanced production engineering results in a reduction of the variability within a sub-group and a relatively high process capability. Despite this, alarms occur frequently when Shewhart control charts are used, and this is one of the reasons why control charts are not used in Japanese industry. For decreasing the false alarm rate, we propose to include a part of the variability between sub-groups into the variability due to chance causes. The complexity of the quality characteristics to be assured has grown with the complexity of products. On the other hand, the advances in measurement technology make it possible to monitor almost every characteristic. Based on a case study we show that the characteristic to be assured need not be the same as the characteristic to be monitored.
1 Introduction In Japan "must-be quality" has been regarded as important as "attractive quality" because especially during the last few years some quality issues regarding must-be quality have appeared in industry. Therefore, the time has come to reconsider the fundamental quality activities, although, Japan is proud to be looked upon as a Quality Nation. While control charts are considered to be one of the main methodologies to assure a good must-be quality, they disappeared in Japanese industry long ago. Lately, however, some Japanese companies have reconsidered SPC activities by using Shewhart control charts. About 80 years have passed since Dr. Shewhart proposed control charts, and since then the industrial environment has undergone vast changes. Production engineering and production systems have advanced remarkably and, hence, it is worthwhile to reconsider the use of control charts in the light of such changes. In this paper, the role of control charts in SPC activities is confirmed by considering the practical approach using Shewhart control charts from the following viewpoints: 1) Which rules should be used for detecting an assignable cause? 2) What should be considered as a chance cause? 3) What should be selected as control characteristics?
2 Which rules should be used for detecting an assignable cause? 2.1 Background
ISO 8258: Shewhart control charts (1991) specifies a set of eight pattern tests for assignable causes (see Fig. 1), which originate from the Western Electric Statistical Quality Control Handbook (1956). Nelson (1984) presented some notes on the tests. But ISO 8258 does not explain how to use them depending on the process stage and situation. At the ISO voting stage, Japan stood against the ISO draft (DIS), arguing that the set of eight pattern tests should not be specified in the formal text because such specific rules, once published, might be arbitrarily interpreted. In the Japanese Industrial Standard (JIS), the content of the revised JIS 9021 of 1999 is consistent with ISO 8258; however, JIS 9021 explains in a Remarks column that the set of eight pattern tests must not be regarded as specified rules but rather as a kind of guideline.
Some practitioners are concerned about the question which of the tests should be used, because in the case of many tests the probability of Type I error may become too large. On the other hand, if only the 3-sigma rule is applied, then the power of the test may be small. Before considering the right way to use the rules of control charts in process control, the statistical properties of control charts have to be analyzed. It is important to show how these tests affect the procedure of process control.
The chart is partitioned into six zones of equal width (one sigma) between the upper control limit (UCL) and the lower control limit (LCL), which are labeled A, B, C, C, B, A.
Test 1: One point beyond zone A
Test 2: Nine points in a row in zone C or beyond
Test 3: Six points in a row steadily increasing or decreasing
Test 4: Fourteen points in a row alternating up and down
Test 5: Two out of three points in a row in zone A or beyond
Test 6: Four out of five points in a row in zone B or beyond
Test 7: Fifteen points in a row in zone C above and below the central line
Test 8: Eight points in a row on both sides of the central line with none in zone C
Fig. 1: Tests for assignable causes (ISO 8258)
2.2 Role of control charts in each phase of process control
It is important to understand the procedure of process control by means of control charts. At the start (Phase 1 - see, for example, Woodall and Montgomery (1999)) the process has to be brought into a stable state with a high process capability. In Phase 1, the control charts are used without a given standard value for making a retrospective analysis. During the second phase (Phase 2) the process is monitored using control charts with a given standard value. The determination of the standard value is a key factor of Phase 2, as it is closely connected with the question what should be considered as a chance cause. This question is discussed in the next chapter. Woodall (2000) indicated that it is very important to distinguish between the use of a control chart in Phase 1 and Phase 2. In Phase 1 the control chart should be used as a tool for exploratory data analysis, because in this phase the process has to be brought into a stable state with high process capability. On the other hand, in Phase 2, control charting can be considered as repeated hypothesis testing aiming at preserving the process capability. Woodall's (2000) paper is accompanied by various discussion contributions, and some of them deal with Phase 1 and Phase 2. We agree with Woodall's opinion basically; however, if we distinguish the procedure of process control from the viewpoint of whether the control chart is used as a tool for exploratory analysis or for hypothesis testing, we think that Phase 2 should be divided into two phases, Phase 2-1 and Phase 2-2. Phase 2-1 is similar to Phase 2 of Woodall (2000); the control chart is used for repeated hypothesis testing and runs from the start of process monitoring until an out-of-control state is detected. Phase 2-2 is defined as the subsequent period of searching for the assignable causes. In Phase 2-2 the control chart is used as a tool for exploratory data analysis for identifying the type of assignable cause. The change pattern can provide valuable information about the assignable cause and, additionally, it is necessary to estimate the change-point.
2.3 How to use the rules in each phase
Starting from the discussion in Section 2.2, we next investigate the different use of the rules of control charts in Phase 1, Phase 2-1 and Phase 2-2. As mentioned in Section 2.2, in Phase 1 the control chart is a tool for exploratory data analysis aiming at achieving a stable state for the process and arriving at a sufficiently good process performance. Therefore, it is of utmost importance to pinpoint improvement factors quickly and, clearly, the set of eight tests in ISO 8258 is very helpful in doing so.
On the other hand, in Phase 2 the control chart may be regarded as repeated hypothesis testing and the focus lies upon the probability of Type I error. Phase 2 means that the process has entered mass production and the routine phase has started. Therefore, any alarm released by the control chart represents a serious matter and, hence, a very small probability of Type I error is required. Test 1 in ISO 8258 marks the original Shewhart control chart and is called the 3-sigma rule. Any supplementary rule applied in addition to the 3-sigma rule inevitably increases the probability of a Type I error. The 3-sigma rule is an omnibus test. If it is desirable to detect a relatively small shift and/or a trend in the process mean, then the use of a supplementary rule is helpful and the question arises which tests should be selected as supplementary rules. Champ and Woodall (1987) and Davis and Woodall (1988) have analyzed the performances of Test 2, Test 5 and Test 3. In this paper we investigate the performances of Test 2, Test 3, Test 5 and Test 6 as supplementary rules. The alarm rate at time t, α(t), is defined as the probability of an alarm at time t given no alarm prior to time t, i.e.

α(t) = P(T = t | T ≥ t),
where T stands for the (discrete) random time until the control chart triggers an out-of-control signal (see Margavio et al. (1995)). Fig. 2 shows the false alarm rates of Test 2, Test 3, Test 5, Test 6 and of Cusum charts, where each rule is used together with the 3-sigma rule. They are obtained on the assumption of standard normality. As mentioned earlier, in Phase 2 the probability of a Type I error is of vital importance because Phase 2 corresponds to routine mass production. Often, only the 3-sigma rule is applied, but if particular change patterns in a process shall be detected quickly, then supplementary rules are helpful. However, any too large increase of the probability of Type I error would be counterproductive. The calculations for Test 2, Test 5 and Test 6 in Fig. 2 are performed using a Markov chain approach (see Champ and Woodall (1987)) and those for Test 3 are done by means of a Monte Carlo simulation. To this end 60,000 simulation runs were performed using the Numerical Technologies Random Generator for Excel (NtRand) Version 2.01. From Fig. 2 it can be seen that Test 2 performs very badly at time t = 9, and Test 6 performs almost everywhere worse than the other rules. On the other hand, the false alarm rate of Test 3 is nearly equal to that of the 3-sigma rule. This suggests that Test 3 is closely associated with the 3-sigma rule. Davis and Woodall (1988) did not recommend Test 3, because Test 3 does not significantly improve the detection power for a trend in the process mean. It is well known that the other rules,
except Test 3, improve the performance (see, for example, Champ and Woodall (1987)). Fig. 2 also contains the false alarm rate of a Cusum chart with parameters 5.0 and 0.5 as a supplementary rule. It can be seen that the pattern of Test 5 is similar to that of the Cusum chart. The results shown in Fig. 2 suggest selecting Test 5, i.e. the "2 out of 3 rule" with 2-sigma warning limits, from the rules given in ISO 8258. If, for example, by computer-aided SPC, it is possible to implement Cusum charts, then we recommend them as a supplementary rule for Shewhart control charts with 3-sigma control limits. In Phase 2-2 the control charts aim at searching for and identifying assignable causes. Thus, the following information is helpful: 1) the pattern characterizing the process change: shift or trend; 2) the time of the process change. In Phase 2-2 a control chart should also be used as a tool for exploratory data analysis, and actually the rules in ISO 8258 except Test 7 are helpful for detecting the pattern of change. Additionally, Cusum charts may be used for estimating the change-point (see Nishina (1996)). The above discussions suggest distinguishing three phases in process control: Phase 1, Phase 2-1 and Phase 2-2. A control chart is used as a tool for exploratory data analysis in Phase 1 and Phase 2-2, and for repeated hypothesis testing in Phase 2-1.
Fig. 2: False alarm rates of Test 2, Test 3, Test 5, Test 6 and the Cusum chart, each used together with the 3-sigma rule (time t = 0, ..., 20)
3 What should be considered as a chance cause? 3.1 Background
One of the reasons for not using control charts in Japanese industry is the common understanding that false alarms are released too often by Shewhart control charts. As is generally known, control charts are used for visualizing the variability due to chance causes and for visualizing the change in variability due to the occurrence of an assignable cause. The problem is to identify the chance causes. Chance causes are related to the inherent variability exhibited by the "4M" (Machine, Material, Man, Method), which is uncontrollable from the technical, economic and organisational viewpoint. Dr. Shewhart proposed a suitable way to fix the amount of variability due to chance causes. The method is based on relatively small sub-groups, in which the 4M conditions are more or less equal, and identifies the variability due to chance causes with the within sub-group variability. A rational sub-group is defined as a subgroup within which variability is due only to random causes, and between which variability is due to special causes, selected to enable the detection of any special cause of variability among subgroups (ISO/FDIS 3534-2). However, in practice it is difficult to specify a rational sub-group. In practice, sub-groups are made up in terms of a given day, operator shift or a treatment lot, etc.
Consider X̄-R control charts. The variability within a sub-group is estimated using the mean range after confirming that the X̄-R control chart has not indicated an out-of-control state; then the estimated variability within a sub-group is regarded as the variability due to chance causes. It is a nearly standardized procedure. However, even in the case that the process capability is adequate, the control chart can have many points outside the control lines and, consequently, process engineers distrust control charting.
3.2 Investigations and countermeasures
Because of the unfavorable behavior of control charts experienced by the European Centre for TQM at Bradford University, Caulcutt (1995) pointed out that the above standardized procedure is not necessarily appropriate, as it results in X̄ charts which often release alarms although no cause of instability can be detected. Bissell (1992) suggested that X̄-R control charts be supplemented by a delta chart to monitor the medium-term variability. The delta chart would be used to plot the successive differences of the group means.
Kuzuya (2000) investigated 63 cases of control charts in the stage of routine mass production used for monitoring essential quality characteristics of processes exhibiting a capability index exceeding 1.33. He found that half of the X̄ charts triggered alarms, thus confirming Caulcutt's (1995) results. We propose a procedure in which the variability due to chance causes includes some of the variability between sub-groups. The essential point is that the process performance in the early-stage mass production is regarded as the standard value of the variability due to chance causes in routine mass production. We explain the procedure by illustrating a heat treatment process. Fig. 3 shows an ordinary X̄-R chart in the early-stage mass production, where the control characteristic is hardness. The R chart indicates the in-control state, but the X̄ chart has many out-of-control points. On the other hand, Fig. 4 shows another X̄-R chart, where the control limits of the X̄ chart are deduced from the overall process variability and not from the mean range R̄.
In this case, the sub-group is composed of a treatment lot; that is, the variability within a sub-group is the variability within a lot. The sub-group has a meaning from the viewpoints of both the physical aspect and quality assurance. Therefore, it is necessary to control the variability within a treatment lot using R charts. Fig. 4 indicates that the process is in control and, hence, the process may proceed from early-stage mass production to routine-stage mass production, where the control limits of the X̄-R chart given in Fig. 4 are used as a standard control level. This means that the random variability due to some allowable causes between sub-groups in the early-stage mass production is included as variability due to chance causes, for example, the variability within the material, which is allowed by a standard for material of the company.
The procedure outlined above leads us to suggest the following standard procedure of process control activities from pilot production to routine mass production:
Stage 1: Understanding and improving machine performance in pilot production. The machine performance index PM is computed from the tolerance W and the machine performance represented by the standard deviation sM. We decide whether or not to proceed to early-stage production with reference to the machine performance index PM.
Stage 2: Understanding and improving process performance in early-stage production. This stage corresponds to Phase 1. We decide whether or not to remove the early-stage production control system and proceed to routine mass production by means of the process performance index Pp, where the standard deviation sp represents the process performance. In case of a decision to proceed to the next stage, the standard deviation sp becomes the standard value for control during the routine mass production stage.
Stage 3: Monitoring the process in routine mass production. This stage corresponds to Phase 2. Any change of the 4M in the production system should lead to a restart in Stage 1. An improvement of the quality level leads to a restart in Stage 2. Improving the machine performance by advanced production engineering results in a reduction of the variability within a sub-group, and the chance causes then have to be reconsidered; note, however, that the variability within a sub-group is not necessarily to be considered as the variability due to chance causes.
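One plausible reading of the construction behind Fig. 4 is sketched below: the conventional X̄ limits use the mean range R̄, while the alternative limits use the overall standard deviation of all preliminary observations, so that allowable between-lot variation is absorbed into the chance-cause variability. The d2 constant, the lot structure and all numbers are illustrative assumptions, not data from the paper.

```python
import numpy as np

D2_N5 = 2.326  # standard control-chart constant d2 for subgroups of size n = 5

def xbar_limits_from_rbar(samples):
    """Conventional limits: Xbarbar +/- 3 * Rbar / (d2 * sqrt(n))."""
    n = samples.shape[1]
    rbar = (samples.max(axis=1) - samples.min(axis=1)).mean()
    half = 3.0 * rbar / (D2_N5 * np.sqrt(n))
    return samples.mean() - half, samples.mean() + half

def xbar_limits_from_overall(samples):
    """Limits deduced from the overall process variability (within plus allowable
    between-subgroup variation) - one plausible reading of the Fig. 4 procedure."""
    n = samples.shape[1]
    s_overall = samples.std(ddof=1)        # standard deviation over all observations
    half = 3.0 * s_overall / np.sqrt(n)
    return samples.mean() - half, samples.mean() + half

# Illustration: allowable lot-to-lot variation widens the overall-variability limits
rng = np.random.default_rng(4)
lot_means = rng.normal(100.0, 1.0, size=30)                  # 30 treatment lots
data = rng.normal(lot_means[:, None], 0.5, size=(30, 5))     # 5 observations per lot
print(xbar_limits_from_rbar(data))
print(xbar_limits_from_overall(data))
```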
Fig. 3: Ordinary X̄-R chart in the early-stage mass production
Fig. 4: X̄-R chart, where the control lines of the X̄ chart are taken from the overall process variability instead of the mean range R̄
4 Selection of control characteristics 4.1 A case study (see Isaki, Kuzuya and Nishina (2002))
Consider the following parts production process by a transfer machine, which is a turntable system in the horizontal axis direction, the x-coordinate direction. In this process, drilling holes, cutting sections, tapping screws and so on are performed at each station in turn. The subject of this section is drilling holes. The characteristic to be assured is the location of a hole. We specify a target location (x0, y0) for the center of the hole. The coordinates (x0, y0) are measured from some specified point. Similarly, the coordinates (x, y) denote the center of a produced hole. Then the position deviation for the hole location is defined as

D = 2 √((x − x0)² + (y − y0)²).     (4.1)

Its specification region is a circle with a diameter of 0.1.
Fig. 5 shows the histogram of the value D. The sample size is 50. The control chart indicates that the process is in control. In addition, the value of the process capability index Cp is obtained, and it indicates that there is a high level of process capability. However, after having left the production stage and proceeded to mass production, a nonconforming issue appeared, although the process capability had been highly acceptable. Fig. 6 shows the two-dimensional plot of the centers of the drilled holes, where the origin of the coordinates is the target location (x0, y0). Fig. 6 indicates that the variability sx in the x-coordinate is much larger than the variability sy in the y-coordinate, where sx and sy denote the standard deviations of the hole location in the x-coordinate and the y-coordinate, respectively.
Fig. 5: Histogram of the value D = 2√((x − x0)² + (y − y0)²) (Isaki et al. (2002))
Calculating the process capability for the two coordinates separately, Cpx and Cpy, we obtain Cpx = 0.94 and Cpy = 3.35. The process capability index of the x-coordinate is much lower than that of the y-coordinate. The essential point in the above case study is that the selection of the quality characteristic has been inappropriate. This is a very instructive case showing the significance of selecting the control characteristic.
Fig. 6: Two-dimensional plots of the center of the produced hole (Isaki et al. (2002))
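A minimal sketch of the per-coordinate capability calculation is given below. It assumes, purely for illustration, that the circular position tolerance (diameter 0.1) is read as ±0.05 on each axis and that the conventional definition Cp = tolerance width / (6s) is used; the simulated deviations are hypothetical and are not the data of Isaki et al. (2002).

```python
import numpy as np

def cp_per_coordinate(x_dev, y_dev, tol_width=0.1):
    """Per-coordinate capability indices Cp = tolerance width / (6 s), assuming the
    circular position tolerance is interpreted as +/- tol_width/2 on each axis."""
    cpx = tol_width / (6.0 * np.std(x_dev, ddof=1))
    cpy = tol_width / (6.0 * np.std(y_dev, ddof=1))
    return cpx, cpy

# Hypothetical hole-center deviations from the target location (x0, y0)
rng = np.random.default_rng(5)
x_dev = rng.normal(0.0, 0.018, size=50)   # large x variability (turntable stop position)
y_dev = rng.normal(0.0, 0.005, size=50)   # small y variability
print(cp_per_coordinate(x_dev, y_dev))    # x index far below the y index
```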
4.2 Characteristic to be assured and/or monitored
In the above case study the characteristic to be monitored is D, as given in Equation (4.1). But if D is selected as the control characteristic, the performance evaluation is misleading and process control results in failure. There are many sources of the location variability, for instance, the variability of the location of the jig, of the setting of the part and so on. However, the variability observed above is caused by the turntable mechanism of the transfer machine (see Fig. 7). The low process capability with respect to the x-coordinate is due to the variability of the turntable stopping position. The location variability can be decomposed into the variability in the x-coordinate and in the y-coordinate, with different sources within the mechanism of the production process.
Fig. 7: Turntable mechanism of the transfer machine
The lesson from this case study can be extended to more general situations, especially when geometric characteristics are involved, for example perpendicularity, deviation from circularity, etc. These characteristics can be decomposed into some more elementary characteristics corresponding to relevant process mechanisms. In the case study at hand the process capability study must be based on each of the deviations of the hole location along the x- and y-coordinates, respectively. It may be difficult to find the assignable cause when the control chart indicates an out-of-control state; however, any inappropriate selection of the control characteristic makes it even more difficult. The control characteristic must be selected in accordance with the relevant elements of the process. The complexity of the quality characteristics of modern production processes has grown with the complexity of the products, and they can be monitored only thanks to the great advances in measurement technology. However, the case study shows that a measurement characteristic specified by a related technical standard is not necessarily appropriate as a control characteristic.
5 Conclusive remarks
Control charts disappeared long ago from Japanese industries. Some of the key reasons have been addressed in this paper, referring to the rules for indicating assignable causes, to the question of what to consider as chance causes, and to the selection of the control characteristics. As mentioned in the Introduction, the role of control charts has recently been reconsidered in Japanese industry. Although control charts are generally looked upon as fundamental tools in SPC, their unquestioned use yields unwanted results and leads to distrust in the control chart concept itself. When talking about control charts in Japan, we mean Shewhart control charts. However, where computer-aided SPC is available, an integration of SPC and APC and the use of more sophisticated control charts, such as Cusum charts, could be beneficial. Additionally, Phase 1 and Phase 2 should be strictly distinguished in industry. However, the process industry has some features which make it difficult to use control charts compared with the parts industry; for example, the process mean is apt to drift owing to uncontrollable factors (see Box and Kramer (1992)). In such a situation, the main problem is to arrive at Phase 2.
References
1. ISO 8258 (1991): Shewhart control charts.
2. Western Electric (1956): Statistical Quality Control Handbook, American Telephone and Telegraph Company, Chicago, IL.
3. Nelson, L. S. (1984): "The Shewhart Control Chart - Tests for Special Causes", Journal of Quality Technology, Vol. 16, No. 4, 237-239.
4. Woodall, W. H. and Montgomery, D. C. (1999): "Research Issues and Ideas in Statistical Process Control", Journal of Quality Technology, Vol. 31, No. 4, 376-386.
5. Woodall, W. H. (2000): "Controversies and Contradictions in Statistical Process Control (with discussions)", Journal of Quality Technology, Vol. 32, No. 4, 341-378.
6. Champ, C. W. and Woodall, W. H. (1987): "Exact Results for Shewhart Control Charts With Supplementary Runs Rules", Technometrics, Vol. 29, No. 4, 393-399.
7. Davis, R. B. and Woodall, W. H. (1988): "Performance of the Control Chart Trend Rule Under Linear Shift", Journal of Quality Technology, Vol. 20, No. 4, 260-262.
8. Margavio, T. M., Conerly, M. D., Woodall, W. H. and Drake, L. G. (1995): "Alarm Rates for Quality Control Charts", Statistics & Probability Letters, Vol. 24, 219-224.
9. Nishina, K. (1996): "A study on estimating the change-point and the amount of shift using cusum charts", The Best on Quality, Chapter 14, edited by Hromi, J. D., ASQC.
10. Caulcutt, R. (1995): "The Rights and Wrongs of Control Charts", Applied Statistics, Vol. 44, No. 3, 279-288.
11. Kuzuya, K. (2000): "Control Charts are Renewed by Adaptation of New JIS", Proceedings of the 65th Conference of the Japanese Society of Quality Control, 72-75 (in Japanese).
12. Isaki, Y., Kuzuya, K. and Nishina, K. (2002): "A Review of Characteristic in Process Capability Study", Proceedings of the 32nd Annual Conference of the Japanese Society of Quality Control, 61-64 (in Japanese).
13. Box, G. and Kramer, T. (1992): "Statistical Process Monitoring and Feedback Adjustment - A Discussion", Technometrics, 251-267.
Control Charts for the Number of Children Injured in Traffic Accidents
Pokropp, F., Seidel, W., Begun, A., Heidenreich, M., Sever, K.
Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg, Holstenhofweg 85, D-22043 Hamburg, Germany, [email protected], [email protected]
1 Introduction and Summary
One of the major objectives of police work in Germany is to reduce the number Y of children injured in traffic accidents. For monitoring police activities, "target numbers" of injuries provided by police authorities were mainly used as control limits. In general, however, those limits do not reflect random variations of Y. Consequently they cause false alarms too often, or they do not give sufficient hints to unusual increases of injury numbers. Improved methods of monitoring the effectiveness of police measures are needed. (See Section 2.) To construct more reliable control limits a stochastic model for Y should be used. We aimed at developing two "versions" of control limits: (a) for detecting significant deviations from injury incidences in the past, (b) for detecting significant deviations from pre-given target values. (See Section 5.) We had data for about five years in various regions. Doing some data analysis we observed several predominant patterns of Y for all years and all regions: due to weather conditions (e.g.), typical differences showed up between seasons (e.g. winter and summer time); due to family and/or children's activities during holidays, holiday periods differed from school periods; weekends behaved differently from normal week days, and even particular weekdays showed up in the data. (See Section 3.) The structure of the patterns which we had recognized in the data recommended looking for explanatory variables which could represent "seasonal" effects, particularly weather conditions, and periods of high or low traffic activity, also of different intensities of children's participation or presence in traffic. Also weekday effects had to be taken care of. Since one of our major objectives was to predict injury numbers for several regions on the basis of estimated parameters from data in the past and values of explanatory variables in the future which cannot be controlled, we had to look for variables prevailing in time and region. Thus we decided to base a model for the number YT of injured children within a time period T on daily numbers of injured children, explained by the month of the day, the weekday and the holiday resp. "bridge day" character of the day. (See Subsections 4.1, 4.2.) Parameter estimation was done by Maximum Likelihood methods. To get some insight into the performance of the ML estimation for our model we computed the ML estimators on simulated populations. The parameters used for the simulation process were sufficiently well "reproduced" by the estimated parameters. Validation of the model was done heuristically as well as by well-established methods for Generalized Linear Models. Results were encouraging. (See Subsection 4.3.) The construction of control limits was done by simulation. Using the estimated parameters - possibly "corrected" by pre-given target values - a random sample from the estimated distribution of YT (the number of injured children during time period T) was drawn. Empirical quantiles from the sample were taken as control limits. Several examples of control limits are given and commented on. (See Section 5.) Finally the power of the control scheme is discussed in Section 6.
2 Applied Practices
A first step to judge observed numbers of injuries, and to discover trends or changes of patterns, is to look at average numbers in the past. This was done by police personnel engaged in monitoring injuries caused by traffic accidents, but without any reference to the stochastic character of the involved events. In very detailed overviews local police personnel can register locations which turn out to be crucial points of accidents. Further, they can identify areas of special population structures. Numbers of injuries can then be related to character of location, population structure etc. A typical way to proceed in practice is to agree for each particular local area, and sometimes also for special population groups (e.g. nationalities), on an annual absolute target number TN of injuries. For better comparison between different areas, the corresponding relative number of injuries RN = TN × 100000 / (size of population ≤ 15 years) is also frequently used. As these target numbers should reflect the goal of cutting down accident numbers, they are taken as fractions of corresponding averages of observed numbers from the last years. Usually the size of the fraction varies between different areas or population groups; it is a matter of political bargaining. To ensure continuous controlling within a year, TN and RN (for a specified region) were percentually "distributed" over the quarters and then uniformly broken down to monthly TN's and RN's. Percentages for the distribution are chosen to cope with "seasonal" effects. A real life example is:

quarter (month)     1 (1-3)      2 (4-6)      3 (7-9)      4 (10-12)
percent (percent)   17.3 (5.77)  30.9 (10.3)  31.6 (10.5)  20.2 (6.73)
TN and RN have always been understood to be upper limits for injury numbers, although they were taken as fractions of averages of observed numbers. This method can cause serious problems, particularly if injury numbers are small. This will be demonstrated by the following real life example for two "very small" regions A and B, where in Table 1 we list the accumulated target number of injuries and the accumulated observed number of injuries.

Table 1. Target = accumulated target number of injuries, Obs = accumulated observed number of injuries

region   size of population < 15 years   TN   RN
A        15 100                          54   357.50
B        17 700                          53   300.22

            Jan   Feb   Mar   Apr    May    Jun    Jul    Aug    Sept   Oct    Nov    Dec
A: Target   3.11  6.23  9.34  14.90  20.47  26.03  31.72  37.40  43.09  46.73  50.36  54
   Obs      10    13    18    22     23     27     36     39     50     52     53     54
B: Target   3.06  6.11  9.17  14.63  20.09  25.55  31.13  36.71  42.29  45.86  49.43  53
   Obs      1     4     6     12     20     25     30     39     49     57     60     61

In region A, the accumulated observed numbers (Obs) increase rapidly over the accumulated target numbers (Target) and later on slow down to end up with Obs = 54 = Target. Contrary to that we find a very slow growth of the accumulated observed numbers at the beginning of the year in region B, but extreme growth towards the year's end with Obs = 61 and Target = 53, and in this context we might well say: 61 >> 53. Naturally, we have to ask for an explanation of the peculiar difference between region A and region B. Can we find "structural" differences between A and B, or is it just stochastics which is "active"? There seem to be two major weak points in the described practitioner's procedures: (1) the lack of taking into account random effects in injury numbers, (2) the "distribution" of the annual TN on quarters and months without any reference to an idea of modelling the "seasonal" behaviour of injury numbers, possibly with local specifications. We propose to base controlling on a stochastic model for the distribution of injury numbers. Targets may be defined in terms of expected values of numbers of injuries and indirectly monitored by control limits derived from the quantiles of the distribution of Y. Seasonal patterns are explained by the model. The model further allows to specify targets individually for different time periods, even for an arbitrary collection of days, as long as the "values" of the explaining variables in the model are known for those days.
3 Some Data Analysis
We had daily data for the number of injured children from 1997 until 2001. For a better overview let us look at monthly numbers for only three years: 1998, 2000 and 2001. We do this for a "large" population of about 3 million children of age ≤ 15 (Figure 1) and a "small" population of about 40 thousand children (Figure 2).
Fig. 1. Injury Numbers in a Large Population of ≈ 3 Mio Children (monthly numbers for 1998, 2000 and 2001)
Look particularly at July and August in Figure 1. In 1998 and 2000, all of July and the first third of August were within the summer holiday period. Summer holidays in 2001 began on July 5 and ended on August 18. The number of injured children in July 2001 is higher than in July 1998 and 2000, and lower in August 2001 than in August 1998 and 2000. If we look at April we find a remarkably low number of injuries in 2001. A closer look shows that particularly at the end of April the number of injuries is low in 2001. This might partly be attributed to the fact that May 1 was a Tuesday, making Monday, April 30, a "bridge day" with less traffic activity than on a "normal" Monday. Similar patterns can be recognized in Figure 2. Additionally we may conclude that in different regions the "contribution" of a month (here e.g. September) can have a large variation over several years. If we look at the calendar we notice that September 1998 has only 8 weekend days (i.e. Saturdays or Sundays), whereas September 2000 and 2001 have 9 and 10 weekend days. A closer look at daily data often reveals the importance of the "distribution" of weekends, holidays or "bridge days" within a month for the number of injuries.
Fig. 2. Injury Numbers in a Small Size Population of ≈ 40 000 Children (monthly numbers for 1998, 2000 and 2001)
This leads us to look at the number of injured children per weekday. This is done for 1999 in Figure 3. A very similar picture shows up if we consider time period 1997 - 2001. Apparently, weekend-days Saturday and Sunday are "good" for children; Friday is the "worst" day, and Monday through Thursday are all similar.
4 The Model 4.1 Explanatory Variables
The structure of the patterns which we had recognized in the data recommended looking for explanatory variables which could represent "seasonal" effects, particularly weather conditions, and periods of high or low traffic activity, also of different intensities of children's participation or presence in traffic. Also daily effects - look at Figure 3 - had to be taken care of. (See also (3), the second version for log λt.) Since one of our major objectives was to predict injury numbers for several regions on the basis of estimated parameters from data in the past and values of explanatory variables in the future which cannot be controlled, we had to look for variables suitable for our purposes and prevailing in time and region. Most obvious was that the weekday played an essential role for the number of injuries. Periods of school holidays and of "bridge" days showed up, too.
Fig. 3. Number of Injuries per Weekday (Mo-Su) in 1999
(Different from a day in a period of school holidays, a bridge day is within a group of at most four days off school.) Since weather conditions can strongly influence traffic but cannot be foreseen a longer period ahead, we took months as indicators for weather and other conditions which influence traffic activities. Thus we decided to base our model on daily numbers of injured children, explained by the month of the day, the weekday and the holiday resp. bridge-day character of the day. The number of injured children in traffic accidents in a given time interval T is modelled as the aggregated daily number of injuries for the days in time interval T.
4.2 Model Assumptions
We introduce the following notation for day t:

Yt = number of injured children at day t,
Mm(t) = 1 if t belongs to month m, m = 1, ..., 12,
Ww(t) = 1 if t is weekday w, w = 1, ..., 7 (see also (3)),
H(t) = 1 if t is a day within "longer" school holidays ("longer" = more than 4 days),
B(t) = 1 if t is a "bridge day", i.e. a day within a short period (at most 4 days) of school vacancy.     (1)

The M's, W's, H and B are indicator variables with alternative value 0. For each day t we model Yt to be distributed as a Γ-mixture of Poisson-distributed variables. The mixing variable Z ("frailty") is introduced to take care of overdispersion. The Γ(a, b)-distribution is given by the Γ-density

f_{Γ(a,b)}(x) = (b^a / Γ(a)) x^{a−1} exp{−bx},  with expectation a/b and variance a/b².     (2)

NBM, a Negative Binomial Model. The daily model, i.e. the model for Yt, is given as a Γ-mixture of Poisson variables:     (3)

(a) Z ~ Γ(1/σ², 1/σ²); E(Z) = 1, Var(Z) = σ²;
(b) (Yt | Z = z) ~ Pois(z · λt);
(c) for Yt we then have a Negative Binomial Model (NBM): Yt ~ NB with E(Yt) = λt, Var(Yt) = λt(1 + σ²λt); alternative notation: Yt ~ NB(b, p) with b := 1/σ², p := 1/(1 + λtσ²);
(d) {Yt : t ∈ T} are independent for each set T of days;
(e) first version for log λt:
log λt = Σ_{m=1}^{12} βm Mm(t) + Σ_{w=1}^{6} δw Ww(t) + γH H(t) + γB B(t),
where Wednesday is day w = 7 (randomly chosen) and is sufficiently taken care of by Ww(t) = 0 for w ≤ 6; thus, δ7 is not in the model. Second version for log λt:
log λt = Σ_{m=1}^{12} βm Mm(t) + Σ_{w=1}^{3} δw Ww(t) + γH H(t) + γB B(t).
After grouping the weekdays with estimated δ's close to zero we obtained at most four weekday groups: the "Wednesday" group (containing also Monday, Tuesday and Thursday) with Ww = 0, Friday, Saturday and Sunday. Thus, at most three δ's are needed. This was done separately for each region under consideration. (Sometimes Saturday and Sunday build one group, and only two δ's are needed.)
(f) In short notation: log λt = x_t'β with β = (β1, ..., β12, δ1, δ2, δ3, γH, γB)' and x_t = [M1(t), ..., M12(t), W1(t), W2(t), W3(t), H(t), B(t)]'.

NBM for time periods T (e.g. months, quarters of years): YT = number of injured children during time period T,

YT = Σ_{t∈T} Yt  with the Yt according to (1), (3).     (4)

We also write YT = Σ_{t=1}^n Yt if T = {1, ..., n}, without loss of generality.
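The following Python sketch simulates daily counts from the Gamma-Poisson (negative binomial) model (3); the parameter values, the weekday coding and the placement of a holiday week are hypothetical choices made only to produce a runnable example.

```python
import numpy as np

def simulate_daily_counts(lam, sigma2, rng):
    """Draw Y_t from model (3): Z_t ~ Gamma(shape=1/sigma2, scale=sigma2)
    with E(Z)=1, Var(Z)=sigma2, then Y_t | Z_t ~ Poisson(Z_t * lam_t)."""
    z = rng.gamma(shape=1.0 / sigma2, scale=sigma2, size=np.shape(lam))
    return rng.poisson(z * np.asarray(lam))

# Hypothetical design for one 30-day month: log lam_t = x_t' beta
rng = np.random.default_rng(6)
beta_month, delta_fri, delta_sat, delta_sun, gamma_h = 1.2, 0.2, -0.4, -0.6, -0.3
weekday = np.arange(30) % 7                     # 0 = Monday, ..., 6 = Sunday (assumed coding)
log_lam = np.full(30, beta_month)
log_lam += np.where(weekday == 4, delta_fri, 0.0)
log_lam += np.where(weekday == 5, delta_sat, 0.0)
log_lam += np.where(weekday == 6, delta_sun, 0.0)
log_lam[14:21] += gamma_h                       # a one-week school holiday, for illustration
y = simulate_daily_counts(np.exp(log_lam), sigma2=0.05, rng=rng)
print(y.sum())                                  # simulated period total Y_T
```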
4.3 Estimation and Validation
To estimate the parameters in (3) with (1), the marginal likelihood function f_MaL, resp. L = log f_MaL, was maximized. This was done on the basis of all data available from about five years. With (2), (3) and (4) we have the marginal likelihood

f_MaL(β, σ²; y) = Π_t f_NB(yt; b = 1/σ², pt = 1/(1 + λtσ²)),  with log λt = x_t'β for all t,

where f_NB denotes the negative binomial probability function from (3)(c).
To check the performance of the marginal ML-Method for our model, we estimated parameters from artificial data for the days of about five years, which had been generated according to (3) with known given parameters. Results were encouraging : P-s , 6-s , 5-s and S2 turned out to be very close to the given parameter set of p-s , b-s, y-s and u 2 for the generation of the artificial data set. To zntuitively verify a reasonably good performance of model (4) we compared the known monthly numbers of injured children of a specified year with the corresponding estimated expected numbers, where the estimation was based on the parameter estimates from the data for the four other years; i.e. we made an "ex-post-prognosis" for already known monthly data of one year, but the data - daily injury numbers - might also well come from days later than the time period for which the injury numbers were to be "predicted". Again we judged the results to be sufficiently good to base the construction of control limits on model (4) with (3). A systematic way to look at the adequacy of a model is to apply deviance measures from t h e theory of Generalized Linear Models (GLMs). One way of modelling overdispersion (see LEE, NELDER(2000)) is to use a PoissonGamma-Model (NBM = Negative Binomial Model) - as we did in (3), (4). We compared the performance of our NBM with performances of a PoissonGeneral-Linear Model (PGLM) and of a Quasi-Likelihood-Model (QLM) . PGLM : The PGLM has overdispersion u2 = 1 : Var(Y,) = 1 . E(Y,) . QLM : In a QLM we have Var(Yt) = u 2 .V(Yt) with some suitable variance function V and without specifying the distribution of Yt within the exponential family. (The lack of further model specification is somewhat unsatisfactory. We follow CHRISTENSEN'S comment on QLM: "Such extensions are widely accepted as providing valuable data analytic tools; however, many people have difficulty in understanding the theoretical basis for them." CHRISTENSEN (1990), p. 364 .) To compare NBM with PGLM and QLM we made model checking plots based on the deviance residuals for the three models. (Examples are given in A
A
Figures 4 and 5.) Because these models have different variance forms we used a variance checking plot, where the absolute values of studentized deviance residuals were plotted against the fitted values. The deviance D in a GLM compares the log-likelihood function 1(X;y) with ML-estimated 5 and the log-likelihood function for a saturated model, where we Lave as many parameters as observations and thus X = y : D(Y;A) = 2g2 (L(Y;Y) - l(X;Y)) = C:==l dt . Deviance Residuals are defined as rt = sig(yt - At)& . PIERCE, SCHAEFER (1986) show that the deviance residuals may be considered as (asymptotically) normal with mean zero and common variance, irrespective of the distribution postulated for the yt. For model checking plots the usual studentized deviance residuals r ; are used. Two model checking plots were made: 1. The r,*were plotted against the fitted values transformed to the constant information scale of the assumed distribution. In all plots the residuals seem to fluctuate randomly around zero. This indicates that there is no misspecification of the link function or the linear predictor. 2. The absolute values I rf I were plotted against the fitted values transformed to the constant information scale. These plots may be used to check the variance function. Here one can see that the I r,* I of the PGLM (Figure 5) reach much higher values than the I rf I of the NBM (Figure 4). That means that the residuals of PGLM have a greater variance than residuals of NBM. Further there is no systematic change of range (i.e. no increase or decrease of the variance) in the plot for the NBM. Plots for QLM looked very much like the plots for NBM. But in view of a certain lack of model structure - as mentioned above - the NBM should be preferred. A
5 Control Limits

5.1 Construction for Predicted Numbers of Injuries

The prediction Ŷ_T := Y_T(Pred) for Y_T (4) is clearly done by

Ŷ_T := Σ_{t∈T} λ̂_t with log(λ̂_t) := x_t'β̂ (x_t, β̂: see (3)(f)). (6)
For the construction of control limits for Y_T (4) we need the estimated distribution of Y_T. Since the analytic handling of that distribution is not obvious, simulation procedures were used: for each t ∈ T generate realisations y_{t,s} (s = 1, ..., 2000) from NB(b̂ := 1/σ̂², p̂_t := 1/[1 + λ̂_t σ̂²]) (see (3)(c)). The realisations y_{T,s} = Σ_{t∈T} y_{t,s}, s = 1, ..., 2000, offer the simulated α/2-quantile y_l and the simulated (1 − α/2)-quantile y_u as simulated lower and upper (1 − α)-CL for Y_T. (7)
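The simulation step in (7) - including the correction factor C_T introduced in (9) below - can be sketched in a few lines of Python; the function and argument names are ours, NumPy is assumed, and the Negative Binomial daily counts are drawn through the equivalent Poisson-Gamma mixture of (3)(c).

```python
import numpy as np

def simulated_control_limits(lam_hat, sigma2_hat, alpha=0.15, c_T=1.0,
                             n_sim=2000, seed=None):
    # Simulated (1 - alpha) control limits for the total number of injuries in a
    # period T, following (7); c_T is the correction factor of (9) (c_T = 1 gives
    # limits around the predicted number).
    rng = np.random.default_rng(seed)
    lam = c_T * np.asarray(lam_hat, dtype=float)        # (corrected) daily means
    gamma_means = rng.gamma(shape=1.0 / sigma2_hat,
                            scale=lam * sigma2_hat,
                            size=(n_sim, lam.size))     # Gamma mixing means
    totals = rng.poisson(gamma_means).sum(axis=1)       # simulated totals over T
    lower = np.quantile(totals, alpha / 2.0)
    upper = np.quantile(totals, 1.0 - alpha / 2.0)
    return lower, upper, lam.sum()                      # CLs and predicted/target total
```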
Fig. 4. |r*|-Plot for the NBM in a Large Population of ≈ 3 Mio Children (absolute studentized deviance residuals against scaled fitted values)

Fig. 5. |r*|-Plot for the PGLM in a Large Population of ≈ 3 Mio Children (absolute studentized deviance residuals against scaled fitted values)
Typical results with α = 0.15 and α = 0.01 for the months of 2003, the quarters of 2003 and the year 2003 are shown in Figures 7, 8 and 6 for a large population of about 3 million children, and in Figures 9, 10 and 11 for a small population of about 40 000 children.
Fig. 6. Control Limits for Injuries, Year 2003, in a Large Population of ≈ 3 Mio Children. (Exact Values are in the Appendix, Table 4)
Clearly, an observed number Y_T of injuries in time period T may well differ from the predicted number Ŷ_T. In Figure 12 we show a typical picture of Y_T in relation to the control limits which are constructed "around" Ŷ_T.

5.2 Construction for Given Target Numbers of Injuries
Target numbers (in general as upper limits) for the number of injuries during some time T are given by police authorities as percentage numbers q_T for the decrease of predicted (often derived from observed) numbers within some population. Percentages 1 − q_T are easily converted into given target numbers ȳ_T of injuries. For our purposes (and in view of our model) it is much more convenient to use ȳ_T rather than 1 − q_T. Assume that for time period T we have a pre-given target number

ȳ_T of injuries, (8)

for which control limits are wanted without any reference to what might happen in sub-periods of T. In view of (8) we then can proceed as follows: introduce a correction factor C_T with

(a) C_T := 1 − q_T ; (b) ȳ_T = C_T · Ŷ_T = Σ_{t∈T} λ̄_t with λ̄_t := C_T · λ̂_t (see (6)); proceed as in (7) with λ̂_t replaced by λ̄_t to obtain control limits for ȳ_T. (9)
Fig. 7. Control Limits for Injuries, Months 2003, in a Large Population of ≈ 3 Mio Children. (Exact Values are in the Appendix, Table 2)
Fig. 8. Control Limits for Injuries, Quarters 2003, in a Large Population of ≈ 3 Mio Children. (Exact Values are in the Appendix, Table 3)
Fig. 9. Control Limits for Injuries, Months 2003, in a Small Population of ≈ 40 000 Children. (Exact Values are in the Appendix, Table 5)
Fig. 10. Control Limits for Injuries, Quarters 2003, in a Small Population of ≈ 40 000 Children. (Exact Values are in the Appendix, Table 6)
Fig. 11. Control Limits for Injuries, Year 2003, in a Small Population of ≈ 40 000 Children. (Exact Values are in the Appendix, Table 7)
Fig. 12. Observed Number of Injuries and Control Limits in Year 2000 in a Medium Size Population of ≈ 500 000 Children
What can we do if control limits are wanted for a target number in time period T as well as for target numbers in sub-periods T(i) ⊂ T? Suppose that we have a disjoint decomposition:

(a) T = ∪_{i=1}^{k} T(i), T(i) ≠ ∅ for all i, T(i) ∩ T(j) = ∅ for i ≠ j;
(b) ȳ_T = pre-given target number in T (as in (8)); ȳ_{T(i)} = pre-given target number in T(i), i = 1, ..., k − 1;
(c) ȳ_{T(k)} cannot be pre-given (see (12) with (14)). (10)

We do have the predicted injury numbers in all time periods of interest according to (6):

Ŷ_T = Σ_{i=1}^{k} Ŷ_{T(i)} = Ŷ_{T(*)} + Ŷ_{T(k)} with Ŷ_{T(*)} := Σ_{i=1}^{k−1} Ŷ_{T(i)}. (11)

Let now, as in (9) and with (10),

C_{T(i)} := ȳ_{T(i)} / Ŷ_{T(i)}, i = 1, ..., k − 1, and ȳ_{T(*)} := Σ_{i=1}^{k−1} ȳ_{T(i)}. (12)

Define now the "correction factor" for T(k):

C_{T(k)} := (ȳ_T − ȳ_{T(*)}) / Ŷ_{T(k)}. (13)

We then obviously obtain (also with (11))

ȳ_{T(k)} := C_{T(k)} · Ŷ_{T(k)} = ȳ_T − ȳ_{T(*)}. (14)

Equation (13) makes sense if 0 < C_{T(k)}. Due to (11), (12), (13), (14) we have the corrected

λ̄_t = C_{T(i)} · λ̂_t for t ∈ T(i),

and the construction of control limits can be done as in (9)(b). This procedure opens a large variety of control limits according to (9). Note in particular that in (10)(a) the T(i) are not necessarily intervals. We thus might - for suitable special subsets of T, in view of Figure 3 e.g. for the subset of all Fridays! - construct control limits to pre-given target numbers of injuries.
6 Power of Control Limits

Suppose that the expected value of the number of injuries in some time period T differs from the target number ȳ_T by some factor a, i.e.

E(Y_T) = a · ȳ_T. (15)

How likely is it that our procedure will detect such a deviation? Let us assume that control limits y_l and y_u have been constructed for ȳ_T = Σ_{t∈T} λ̄_t according to (9). For a given factor a we assume that

Y_T = Σ_{t∈T} Y_t where now Y_t ~ NB(b, p_t(a)) with b = 1/σ̂², p_t(a) = 1/(1 + a · λ̄_t · σ̂²), (16)

and (15) holds. We are now interested in the power

g(a) := P(Y_T < y_l or Y_T > y_u) = 1 − (F_a(y_u) − F_a(y_l)), (17)

with a simulated distribution function F_a of Y_T under (16). We calculated the power g(a) (17) - for a range of several a near 1 - in a large population (≈ 3 million children) with 0.99-control limits for the year 2000 (see Figure 13) and with 0.85-control limits for the year 2000 (see Figure 14) and for May 2000 (see Figure 15); in a small population (≈ 40 thousand children) the power for the year 2000 (see Figure 16) and for May 2000 (see Figure 17) was calculated with 0.85-control limits.
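The power g(a) in (17) can be approximated by the same kind of simulation as in (7); the following sketch (our own naming, NumPy assumed, Poisson-Gamma mixture again used for the Negative Binomial draws of (16)) estimates the fraction of simulated totals falling outside the control limits.

```python
import numpy as np

def simulated_power(a, lam_bar, sigma2_hat, y_lower, y_upper,
                    n_sim=10000, seed=None):
    # Estimate g(a) of (17): the probability that Y_T falls outside the control
    # limits [y_lower, y_upper] when E(Y_T) = a * sum(lam_bar), cf. (15), (16).
    rng = np.random.default_rng(seed)
    mean = a * np.asarray(lam_bar, dtype=float)
    gamma_means = rng.gamma(shape=1.0 / sigma2_hat,
                            scale=mean * sigma2_hat,
                            size=(n_sim, mean.size))
    totals = rng.poisson(gamma_means).sum(axis=1)
    return np.mean((totals < y_lower) | (totals > y_upper))
```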
Fig. 13. Power in 2000, α = 0.01; Large Population: ≈ 3 Mio Children (power plotted against a = factor of deviation)
Let us at first look at the large population. In Figure 13 we see that with significance 0.01 an increase resp. decrease of injury numbers of about 5% (i.e. |a − 1| = 0.05) within the year 2000 will be detected with probability of around 0.60; this power becomes ≈ 0.95 if we have significance 0.15, as is seen in Figure 14. For a short period of time like a month - here May 2000 - Figure 15 shows a power of hardly more than 0.30 for |a − 1| = 0.05.
Fig. 14. Power in 2000, α = 0.15; Large Population: ≈ 3 Mio Children
Fig. 15. Power in May 2000, α = 0.15; Large Population: ≈ 3 Mio Children
Fig. 16. Power in 2000, α = 0.15; Small Population: ≈ 40 000 Children

Fig. 17. Power in May 2000, α = 0.15; Small Population: ≈ 40 000 Children
A larger increase resp. decrease of about 10%, however, would be detected with probability 0.70. In a small population, the probability of detecting a 5% increase resp. decrease within one year is less than 0.20, as we see in Figure 16; only large increases resp. decreases of ≈ 20% have a probability of about 0.70 to be detected. Even very large increases resp. decreases (say 20%) within one month will hardly be detected (power ≈ 0.20), as shown in Figure 17. Due to the character of our model the results on the power of control limits for injury numbers should not surprise. Control limits and their power are useful for sufficiently large populations in sufficiently large time periods, corresponding to the (in time and space) "global" character of our model. We might additionally take the hint that "local" events require more individual treatment.
7 Appendix
Table 2. Large Population of ≈ 3 Mio Children: CL for Injuries, Months 2003

Month   Lower 85%   Lower 99%
Jan     571         530
Feb     628         580
Mar     753         697
Apr     856         788
May     1242        1159
Jun     1187        1113
Jul     1119        1045
Aug     902         839
Sept    946         888
Oct     834         771
Nov     702         652
Dec     626         583

Table 3. Large Population of ≈ 3 Mio Children: CL for Injuries, Quarters 2003

Quarter         Lower 85%   Lower 99%
1. Jan - Mar    1999        1916
2. Apr - Jun    3365        3228
3. Jul - Sep    3037        2921
4. Oct - Dec    2219        2130
Table 4. Large Population of ≈ 3 Mio Children: CL for Injuries, Year 2003

Year    Upper +99.0%   Upper +85%   Pred    Lower 85%   Lower 99%
2003    11362          11155        10984   10791       10575

Table 5. Small Population of ≈ 40 000 Children: CL for Injuries, Months 2003

Month   Upper +99.0%   Upper +85%   Pred
Jan     14             9            6
Feb     13             9            6
Mar     18             13           10
Apr     19             14           10
May     25             18           14
Jun     26             19           14
Jul     31             23           18
Aug     17             12           9
Sept    25             19           15
Oct     17             13           9
Nov     22             16           12
Dec     19             14           10
Table 6. Small Population of ≈ 40 000 Children: CL for Injuries, Quarters 2003

Quarter         Lower 85%
1. Jan - Mar    17
2. Apr - Jun    31
3. Jul - Sep    34
4. Oct - Dec    26
Table 7. Small Population of x 40000 Children: CL for Injuries, Year 2003
Upper Year
2003
+99,0% 166
Lower
+85% 147
Pred
134
85% 121
99% 106
References

1. Christens, P.F. (2003): Statistical Modelling of Traffic Safety Development. PhD thesis, The Technical University of Denmark, IMM, Denmark.
2. Christensen, R. (1990): Log-Linear Models. Springer-Verlag, New York etc.
3. Dobson, A.J. (2002): An Introduction to Generalized Linear Models. Chapman and Hall.
4. Lee, Y., Nelder, J.A. (2000): Two Ways of Modelling Overdispersion in Non-Normal Data. Applied Statistics, Vol. 49, p. 591-598.
5. McCullagh, P., Nelder, J.A. (1989): Generalized Linear Models. Chapman and Hall.
6. Pierce, D.A., Schaefer, D.W. (1986): Residuals in Generalized Linear Models. JASA, Vol. 81, p. 977-986.
7. Toutenburg, H. (2003): Lineare Modelle. Physica-Verlag.
A New Perspective on the Fundamental Concept of Rational Subgroups

Marion R. Reynolds, Jr.¹ and Zachary G. Stoumbos²
¹ Virginia Polytechnic Institute & State University, Department of Statistics, Blacksburg, VA 24061-0439, USA, [email protected]
² Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8054, USA, [email protected]
Summary. When control charts are used to monitor processes to detect special causes, it is usually assumed that a special cause will produce a sustained shift in a process parameter that lasts until the shift is detected and the cause is removed. However, some special causes may produce a transient shift in a process parameter that lasts only for a short period of time. Control charts are usually based on samples of n ≥ 1 observations taken with a sampling interval of fixed length, say d. The rational-subgroups concept for process sampling implies that sampling should be done so that any change in the process will occur between samples and affect a complete sample, rather than occur while a sample is being taken, so that only part of the sample is affected by the process change. When using n > 1, the rational-subgroups concept seems to imply that it is best to take a concentrated sample at one time point at the end of the sampling interval d, so that any process change will occur between samples. However, if the duration of a transient shift is less than d, then it appears that it might be beneficial to disperse the sample over the interval d, to increase the chance of sampling while this transient shift is present. We investigate the question of whether it is better to use n > 1 and either concentrated or dispersed sampling, or to simply use n = 1. The objective of monitoring is assumed to be the detection of special causes that may produce either a sustained or transient shift in the process mean μ and/or process standard deviation σ. For fair comparisons, it is assumed that the sampling rate in terms of the number of observations per unit time is fixed, so that the ratio n/d is fixed. The best sampling strategy depends on the type of control chart being used, so Shewhart, exponentially weighted moving average (EWMA), and cumulative sum (CUSUM) charts are considered. For each type of control chart, a combination of two charts is investigated; one chart designed to monitor μ, and the other designed to monitor σ. The conclusion is that the best overall performance is obtained by taking samples of n = 1 observations and using an EWMA or CUSUM chart combination. The Shewhart-type chart combination with the best overall performance is based on n > 1, and the choice between concentrated and dispersed sampling for this control chart combination depends on the importance attached to detecting transient shifts of duration less than d.
1 Introduction
Consider the problem of using control charts to monitor the mean μ and standard deviation σ of a normal process variable, where the objective is to detect small as well as large changes in μ or σ. A change in μ or σ may correspond to a sustained shift that lasts until it is detected by a control chart and removed. Most evaluations of the performance of control charts consider sustained shifts. A change in μ or σ may also correspond to a transient shift that lasts for only a short period of time, even if not detected by a control chart. For example, a process may be affected by temporary changes in electrical voltage caused by problems external to the process being monitored. The general question being considered here is the question of how to sample from the process to detect sustained and transient shifts. The traditional approach to deciding how to sample for control charts is based on the rational-subgroups concept. This concept was introduced by Shewhart (1931) and is discussed in standard texts such as Hawkins and Olwell (1998), Ryan (2000), and Montgomery (2005). The basic idea of this concept is that sampling should be done so that it is likely that any process change will occur between samples and thus affect a complete sample, rather than occur while a sample is being taken, so that only part of the sample is affected by the process change. There are a number of well-known applications of the rational-subgroups concept. For example, if there is a change in process personnel every eight hours, then a sample should not overlap two of these eight-hour periods, because a process change may correspond to the change in personnel. Any sample containing observations from the two periods would contain observations from both before and after the change. As another example, consider a process that consists of multiple streams, where a process change can affect only one stream. For example, components may be manufactured simultaneously on five machines and a special cause may affect only one of the five machines. In this situation, a sample should not consist of one observation from each machine, because a change in only one machine would tend to be masked by the observations from the other unchanged machines. In this paper, we assume that when a process change occurs, it is just as likely to occur in one place as another. Thus, we assume that we can avoid situations such as sampling from two different work periods. In addition, we will not investigate the situation where only part of the sequence of observations is affected by the process change, as would occur, for example, when sampling from multiple streams and only one stream is affected by the change. Thus, we assume that when a process change occurs, it affects all observations obtained while this change is in effect. To more clearly explain the sampling issues that we will investigate, consider a process that is currently being monitored by taking samples of size n, where n = 4,
using a sampling interval of d, where d = 4 hours. Assume that the four observations in a sample are taken at essentially the same time, so that the time between observations can be neglected. For example, the sample might consist of four consecutive items produced. We call this concentrated sampling, because the four observations are concentrated at one time point. Concentrated sampling seems to correspond to the rational-subgroups concept, because any process change is likely to occur during the four-hour time interval between samples. Consider a transient shift that lasts for a duration of l hours. For example, if a change in the electrical voltage to the process lasts for only two hours, then l = 2 hours. If a concentrated sample is taken every four hours, then a transient shift of duration l = 2 hours may be completely missed. To detect transient shifts of short duration, it appears that it would be better to spread out the sampling over the four-hour interval, with one observation every hour (see Reynolds and Stoumbos (2004a, 2004b) and Montgomery (2005)). In general, a sample of n > 1 observations spread out over the time interval d, with a time interval of d/n between each observation, will be called dispersed sampling, because the observations are dispersed over the interval. The objective of this research work is to investigate several questions connected with concentrated versus dispersed sampling. In particular, for detecting sustained and/or transient shifts, we investigate the question of whether it is better to use concentrated sampling and take all observations at the same time, or instead use dispersed sampling and spread the observations out over the interval d. If dispersed sampling is used with n = 4 and d = 4, then observations are taken individually every hour. In this case, we have the option of plotting a point after each observation, so this would correspond to n = 1 and d = 1. So, if observations are taken individually every hour, we investigate the question of whether it is better to plot a point after each observation, corresponding to n = 1, or instead to defer plotting until four observations have been obtained and thus plot a point every four hours, corresponding to dispersed sampling.
Specifically, in this paper we consider the above questions for Shewhart and exponentially weighted moving average (EWMA) control charts and refer the reader to Reynolds and Stoumbos (2004a, 2004b) for related, detailed investigations on the cumulative sum (CUSUM) control charts. We will show that the answers to these questions depend on the type of control chart being used.
2 Monitoring the Process
The process variable of interest, X, is assumed to have a normal distribution with mean μ and standard deviation σ. Assume that all observations from the process are independent. Let μ₀ be the in-control or target value for μ, and let σ₀ be the in-control value for σ.
Suppose that samples of size n ≥ 1 are taken every d hours, and, without loss of generality, assume that the sampling rate is n/d = 1.0 observation per hour. So, this could correspond to one observation every hour, or four observations every four hours, or eight observations every eight hours, and so forth. We assume that the sampling cost depends only on the ratio n/d, so that all of these sampling patterns have the same cost. For monitoring μ and σ, it is customary to use two control charts in combination, one designed to detect changes in μ and the other to detect changes in σ. The traditional Shewhart control chart for monitoring μ is the X̄ chart based on plotting the sample means. Three-sigma control limits are usually used with this chart. For monitoring σ, consider the Shewhart S² chart based on the sample variance, with only an upper control limit determined by the chi-squared distribution. Here, we only consider the problem of detecting increases in σ, but the conclusions of the paper are similar if both increases and decreases are considered. When n > 1, we consider the performance of the Shewhart X̄ and S² charts when used in combination to detect shifts in μ or σ. When n = 1, the X̄ chart reduces to the X chart. In this case, the moving-range (MR) chart is frequently used to monitor σ, but recent research has shown that using the X chart alone is sufficient to monitor both μ and σ. Thus, when n = 1, we consider the X chart alone for monitoring μ and σ.
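For concreteness, the Shewhart combination just described could be set up as in the following sketch; the function name is ours, SciPy is assumed, and the false-alarm probabilities are generic placeholders rather than the calibrated values used later in the paper.

```python
from scipy.stats import chi2, norm

def shewhart_limits(n, mu0=0.0, sigma0=1.0, alpha_mean=0.0027, alpha_var=0.0027):
    # Three-sigma-style limits for the X-bar chart and an upper limit for S^2.
    z = norm.ppf(1.0 - alpha_mean / 2.0)
    xbar_lcl = mu0 - z * sigma0 / n ** 0.5
    xbar_ucl = mu0 + z * sigma0 / n ** 0.5
    # (n-1) S^2 / sigma0^2 is chi-squared with n-1 degrees of freedom in control
    s2_ucl = sigma0 ** 2 * chi2.ppf(1.0 - alpha_var, df=n - 1) / (n - 1)
    return (xbar_lcl, xbar_ucl), s2_ucl
```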
The EWMA control statistic for detecting changes in μ is

Y_t = (1 − λ)Y_{t−1} + λ X̄_t ,

where the starting value is Y₀ = μ₀, as it is usually taken to be in practice. A signal is generated if Y_t falls outside a lower or an upper control limit. The smoothing parameter λ (0 < λ ≤ 1) serves as a tuning parameter that can be used to make the chart sensitive to small or large parameter shifts. The EWMA control chart was originally proposed by Roberts (1959), and a number of papers in the last 20 years have considered the design and implementation of EWMA charts (see, for example, Crowder (1987), Lucas and Saccucci (1990), Yashchin (1993), Gan (1995), Morais and Pacheco (2000), Stoumbos and Sullivan (2002), Stoumbos, Reynolds, and Woodall (2003), and Stoumbos and Reynolds (2005)). Very effective control charts for detecting changes in σ can be based on squared deviations from target (see, for example, Domangue and Patch (1991), MacGregor and Harris (1993), and Shamma and Amin (1993)). The one-sided EWMA control statistic Z_t for detecting increases in σ is an EWMA of the squared deviations from the target μ₀, where the starting value is Z₀ = σ₀². A signal is given if Z_t exceeds an upper control limit. A two-sided EWMA chart for σ can be developed for detecting both increases and decreases in σ, but we consider only the problem of detecting increases in σ. Extensive investigations of the properties of the EWMA charts based on Y_t and Z_t when these two EWMA charts are used in combination to detect changes in μ or σ have been conducted by Stoumbos and Reynolds (2000) and Reynolds and Stoumbos (2001a, 2001b, 2004a, 2004b, 2005, 2006). Here, we will also consider the performance of the EWMA charts based on Y_t and Z_t when these two EWMA charts are used in combination to detect changes in μ or σ. The performance of this EWMA chart combination depends on the choice of the tuning parameter λ. When n = 4, we use λ = 0.2. When n = 1, we use λ = 1 − (1 − 0.2)^{1/4} = 0.05426, which ensures that the sum of the weights of four observations is 0.2, the same weight used for the case of n = 4 (for additional discussion, see Reynolds and Stoumbos (2004a, 2005)).
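A rough sketch of how the two EWMA statistics of this combination could be computed and checked against their limits is given below; the control-limit constants are placeholders (in the paper the limits are calibrated so that the joint in-control ATS is close to 1481.6 hours), and the variance statistic is implemented simply as an EWMA of averaged squared deviations from target, which is our simplified reading rather than the authors' exact formulation.

```python
import numpy as np

def ewma_chart_combination(samples, mu0=0.0, sigma0=1.0, lam=0.2,
                           c_mean=3.0, ucl_var=None):
    # samples: array of shape (num_samples, n) -- one row per plotted sample.
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[1]
    xbar = samples.mean(axis=1)
    sq_dev = ((samples - mu0) ** 2).mean(axis=1)   # averaged squared deviations
    y = mu0                                        # Y_0 = mu0
    z = sigma0 ** 2                                # Z_0 = sigma0^2
    half_width = c_mean * sigma0 * np.sqrt(lam / ((2 - lam) * n))
    if ucl_var is None:
        ucl_var = 1.5 * sigma0 ** 2                # crude placeholder upper limit
    for t, (xb, sd) in enumerate(zip(xbar, sq_dev), start=1):
        y = (1 - lam) * y + lam * xb               # EWMA for the mean
        z = (1 - lam) * z + lam * sd               # EWMA of squared deviations
        if abs(y - mu0) > half_width or z > ucl_var:
            return t                               # index of the signalling sample
    return None                                    # no signal
```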
3 Measures of Control Chart Performance
Define the average time to signal (ATS) to be the expected amount of time from the start of process monitoring until a signal is generated by a control chart. When the process is in control (μ = μ₀ and σ = σ₀), we want a large ATS, which corresponds to a low false-alarm rate. When there is a change in the process, we want the time from the change to the signal by the chart to be short. Define the steady-state ATS (SSATS) to be the expected time from the change to the signal. The SSATS is computed assuming that the control statistic is in steady state at the time that the process change occurs, and that the process change can occur at random between sampling points. Additional discussion of the SSATS can be found in Reynolds (1995), Stoumbos and Reynolds (1996, 1997, 2001), and Stoumbos, Mittenthal, and Runger (2001). Comparisons of control charts or combinations of control charts will be done for the case in which the in-control ATS is very close to 1481.6 hours. This is the in-control ATS value for a Shewhart X̄ chart based on samples of n = 4 observations every d = 4 hours and using the standard three-sigma limits. When two control charts are considered in combination, the control limits of each chart have been adjusted to give equal individual in-control ATS values (usually of about 2700) and a joint in-control ATS value very close to 1481.6. For all of the numerical results presented in the tables provided in this paper, we assume, without loss of generality, that μ₀ = 0 and σ₀ = 1.
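The benchmark figure of 1481.6 hours follows directly from the false-alarm rate of the three-sigma X̄ chart; a two-line check (SciPy assumed):

```python
from scipy.stats import norm

d, L = 4.0, 3.0
arl0 = 1.0 / (2.0 * norm.sf(L))   # in-control ARL of the three-sigma X-bar chart
print(d * arl0)                   # approximately 1481.6 hours
```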
4 Evaluations for Shewhart Control Charts
Table 1 gives SSATS values for sustained shifts in μ or σ and concentrated sampling, for the Shewhart X chart with (n, d) = (1, 1) and the Shewhart X̄ and S² chart combination with (n, d) = (1, 1), (2, 2), (4, 4), (8, 8), and (16, 16). From Table 1, we see that using n = 1 is good for detecting very large shifts, but performance is very bad for detecting small shifts. Using a large value of n is good for detecting small shifts, but performance is bad for detecting very large shifts.

Table 1. SSATS Values for Shewhart Control Charts and Sustained Shifts Using Concentrated Sampling
[Columns: X chart with n = 1, d = 1; combined X̄ & S² charts. Rows: in-control ATS; sustained shifts in μ of 0.5, 1.0, 2.0, 3.0, 5.0 and 10.0; sustained shifts in σ to 1.4, 1.8, 3.0, 5.0 and 10.0.]
Thus, the Shewhart control charts are very sensitive to the choice of n. A reasonable compromise for detecting both small and large shifts would be an intermediate value of n. For comparisons later with EWMA control charts, we will use n = 4 and d = 4 as a good compromise. This corresponds closely to the traditional practice of taking samples of around n = 4 observations (see, for example, Stoumbos et al. (2000) and references therein). Table 2 gives SSATS values for Shewhart control charts based on concentrated or dispersed sampling when there is a sustained shift in μ or σ. From Table 2, we see that concentrated sampling is always better than dispersed sampling for detecting sustained shifts. Thus, the rational-subgroups concept gives the correct answer in this case. Observations should be taken in groups so that a shift will occur between groups.
This makes intuitive sense. For a sustained shift, we want to wait until the end of the interval d to take a sample so that the sample will not include any in-control observations. Table 3 gives signal probabilities for Shewhart charts based on concentrated or dispersed sampling when there is a transient shift of duration l = 1 hour or l = 4 hours. The signal probabilities are the probability of a signal either while the transient shift is present or within 4 hours after the end of the transient shift. From Table 3 we see that for transient shifts of short duration (l = 1.0), the Shewhart X chart with n = 1 is best. But we have already established that n = 1 is not good for detecting sustained shifts. Thus, we should rule out using n = 1 unless transient shifts and large sustained shifts are the only process changes that are of concern. If n = 4 or 8 is being used and l = 1, then dispersed sampling is better than concentrated sampling (except for small shifts). Thus, we have a situation in which we want to use dispersed sampling, because using concentrated sampling when the transient shift is of short duration would mean that we might not sample when the shift is present. Note that the use of dispersed sampling violates the rational-subgroups concept because a shift will occur within a sample. If the transient shift is of longer duration (l = 4), then using n = 4 and concentrated sampling is best. So the rational-subgroups concept gives the correct answer for transient shifts of longer duration. When Shewhart control charts are being used, we see that the choice of n and the choice between dispersed or concentrated sampling depend on what we want to detect. If detecting large sustained shifts or transient shifts of short duration is the primary concern, then using n = 1 is reasonable. But if we want to have reasonable performance for detecting smaller shifts, then we need to use a larger value of n, such as n = 4. In this case, dispersed sampling is best for detecting transient shifts of short duration. In many applications, sustained shifts may be more common than transient shifts, and in this case, using concentrated sampling would be reasonable.
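As a rough illustration of this trade-off, the following Monte Carlo sketch estimates the signal probability for a transient mean shift under concentrated versus dispersed sampling. It is only a simplified stand-in for the paper's calculations: it uses the X̄ chart alone with generic three-sigma limits (not the calibrated chart combination), assumes the shift start time is uniform over a fixed horizon, and ignores false alarms occurring before the shift; function and parameter names are our own.

```python
import numpy as np

def signal_prob(shift, duration, dispersed, n=4, d=4.0, L=3.0,
                horizon=48.0, n_sim=20000, seed=1):
    # P(signal while the shift is present or within d hours after it ends).
    rng = np.random.default_rng(seed)
    plot_times = np.arange(d, horizon + d / 2, d)        # one plotted point every d hours
    if dispersed:
        offsets = np.arange(1, n + 1) * d / n - d        # observations spread over (pt-d, pt]
    else:
        offsets = np.zeros(n)                            # all observations at the plot time
    limit = L / np.sqrt(n)                               # three-sigma limit for the mean
    hits = 0
    for _ in range(n_sim):
        start = rng.uniform(0.0, horizon - duration - d)
        for pt in plot_times:
            obs_times = pt + offsets
            mean_shift = shift * ((obs_times >= start) & (obs_times < start + duration))
            xbar = (mean_shift + rng.standard_normal(n)).mean()
            if abs(xbar) > limit and start <= pt <= start + duration + d:
                hits += 1
                break
    return hits / n_sim
```

Calling, e.g., signal_prob(3.0, 1.0, dispersed=True) and signal_prob(3.0, 1.0, dispersed=False) gives a quick qualitative comparison of the two strategies for l = 1.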
Table 2. SSATS Values for Shewhart Control Charts and Sustained Shifts Using Concentrated or Dispersed Sampling

[Columns: X chart with n = 1, d = 1; combined X̄ & S² charts with n = 4, d = 4 and with n = 8, d = 8, each with concentrated and dispersed sampling. Rows: sustained shifts in μ of 0.5, 1.0, 2.0, 3.0, 5.0 and 10.0; sustained shifts in σ to 1.4, 1.8, 3.0, 5.0 and 10.0.]
Table 3. Signal Probabilities for Shewhart Control Charts and Transient Shifts Using Concentrated or Dispersed Sampling

[Columns: X chart; combined X̄ & S² charts, each with concentrated and dispersed sampling. Rows: duration l and size of the transient shift in μ.]
5 Evaluations for EWMA Control Charts

Table 4 gives SSATS values for sustained shifts in μ or σ for EWMA control charts with (n, d) = (1, 1) and (4, 4). For purposes of comparison, SSATS values are also given for the Shewhart X̄ and S² chart combination with (n, d) = (4, 4). When (n, d) = (4, 4), SSATS values are given for both concentrated and dispersed sampling. From Table 4, we see that if n = 4 in the EWMA control charts, then concentrated sampling is better than dispersed sampling for sustained shifts. This is the same conclusion reached for n = 4 and sustained shifts for the Shewhart control charts. Now, consider the choice of n in the EWMA charts. For detecting large shifts, it is much better to use n = 1 in the EWMA charts, but for smaller shifts, n = 4 and concentrated sampling is a little better. Overall, it seems to be better to use n = 1 in the EWMA charts. If the EWMA charts use the best overall value of n (n = 1) and the Shewhart charts use the best overall value of n (an intermediate value such as n = 4), then the EWMA charts are much better than the Shewhart charts for both small and large shifts. The Shewhart charts are slightly better for some intermediate shifts. Note that this contradicts the conventional wisdom that Shewhart charts are best for detecting large parameter shifts. Shewhart charts are best only in the case when n = 1 is used, but n = 1 should not be used with Shewhart charts unless detecting large shifts is the only objective.

Table 5 gives signal probabilities for EWMA and Shewhart control charts based on concentrated or dispersed sampling when there is a transient shift of duration l = 1 hour or l = 4 hours. From Table 5 we see that, if n = 4 is used in the EWMA charts, then dispersed sampling is much better than concentrated sampling for detecting large shifts of short duration (l = 1). For the case of n = 4 and l = 4, concentrated sampling is a little better than dispersed sampling. However, for the EWMA control charts, using n = 1 is better than using n = 4 in all cases of the transient shifts in Table 5. This corresponds to the value of n that is best for sustained shifts. Thus, for EWMA control charts, the best overall value of n does not depend on the type of shift. Using n = 1 is best for both sustained and transient shifts. If n = 1 is used in the EWMA charts, then the rational-subgroups concept applies in the sense that a shift must occur between samples, because the samples are of size n = 1.
Table 4. SSATS Values for Shewhart and EWMA Control Charts and Sustained Shifts Using Concentrated or Dispersed Sampling

[Columns: combined X̄ & S² charts with n = 4, d = 4 (concentrated and dispersed); combined EWMA charts (concentrated and dispersed). Rows: in-control ATS; sustained shifts in μ of 0.5, 1.0, 2.0, 3.0, 5.0 and 10.0; sustained shifts in σ to 1.4, 1.8, 3.0, 5.0 and 10.0.]
Table 5. Signal Probabilities for Shewhart and EWMA Control Charts and Transient Shifts Using Concentrated or Dispersed Sampling

[Columns: combined X̄ & S² charts and combined EWMA charts, each with concentrated and dispersed sampling. Rows: transient shifts of duration l = 1.0 and l = 4.0 with μ = 2.0, 3.0, 4.0, 5.0 and 7.0.]
6 Conclusions and Discussion

Shewhart control charts are very sensitive to the choice of n. The best overall performance is achieved when n > 1 (for example, n = 4 is a reasonable compromise). If a Shewhart chart uses n = 4, then concentrated sampling is better for sustained shifts, but dispersed sampling is better for transient shifts of short duration. Using n = 4 and dispersed sampling violates the rational-subgroups concept. For the EWMA control charts, it is best overall to use n = 1, so the issue of concentrated versus dispersed sampling does not arise. For EWMA charts with n = 1, there is no violation of the rational-subgroups concept. Our recommendation for monitoring the process mean and standard deviation is to take small frequent samples (n = 1 and d = 1) and use an EWMA chart for the mean in combination with an EWMA chart for the standard deviation based on squared deviations from target.

We have rigorously verified that the basic conclusions about EWMA control charts apply to CUSUM control charts (see Reynolds and Stoumbos (2004a, 2004b)). That is, CUSUM chart combinations based on sample means and squared deviations from target can be used in place of the respective EWMA chart combinations.

The rational-subgroups concept, as usually formulated, can be useful in some contexts, but not useful in other contexts. The rational-subgroups concept was originally formulated when Shewhart control charts were the only charts being used. EWMA and CUSUM control charts accumulate information over time, so what is being plotted is not just a function of the current process data. For example, an EWMA control statistic plotted after a process change will contain information from both the in-control and the out-of-control distribution, so there is no way to avoid violating the rational-subgroups concept that stipulates that what is plotted should be affected by just the out-of-control distribution. When n = 1 is used in EWMA or CUSUM control charts, we still have the issue of avoiding a sampling plan that samples from multiple streams. Perhaps the idea that a process change should affect a complete sample should be replaced with the idea that sampling should be done so that all observations taken during the time that the special cause is present are affected by this special cause.
References

1. Crowder, S. V. (1987), "A Simple Method for Studying Run-Length Distributions of Exponentially Weighted Moving Average Charts," Technometrics, 29, 401-407.
2. Domangue, R., and Patch, S. C. (1991), "Some Omnibus Exponentially Weighted Moving Average Statistical Process Monitoring Schemes," Technometrics, 33, 299-313.
3. Gan, F. F. (1995), "Joint Monitoring of Process Mean and Variance Using Exponentially Weighted Moving Average Control Charts," Technometrics, 37, 446-453.
4. Hawkins, D. M., and Olwell, D. H. (1998), Cumulative Sum Control Charts and Charting for Quality Improvement, New York: Springer-Verlag.
5. Lucas, J. M., and Saccucci, M. S. (1990), "Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements," Technometrics, 32, 1-12.
6. MacGregor, J. F., and Harris, T. J. (1993), "The Exponentially Weighted Moving Variance," Journal of Quality Technology, 25, 106-118.
7. Montgomery, D. C. (2005), Introduction to Statistical Quality Control, 5th Edition, New York: Wiley.
8. Morais, M. C., and Pacheco, A. (2000), "On the Performance of Combined EWMA Schemes for μ and σ: A Markovian Approach," Communications in Statistics Part B - Simulation and Computation, 29, 153-174.
9. Reynolds, M. R., Jr. (1995), "Evaluating Properties of Variable Sampling Interval Control Charts," Sequential Analysis, 14, 59-97.
10. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2001a), "Individuals Control Schemes for Monitoring the Mean and Variance of Processes Subject to Drifts," Stochastic Analysis and Applications, 19, 863-892.
11. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2001b), "Monitoring the Process Mean and Variance Using Individual Observations and Variable Sampling Intervals," Journal of Quality Technology, 33, 181-205.
12. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2004a), "Control Charts and the Efficient Allocation of Sampling Resources," Technometrics, 46, 200-214.
13. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2004b), "Should Observations be Grouped for Effective Process Monitoring?" Journal of Quality Technology, 36, 343-366.
14. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2005), "Should Exponentially Weighted Moving Average and Cumulative Sum Charts Be Used With Shewhart Limits?" Technometrics, 47, 409-424.
15. Reynolds, M. R., Jr., and Stoumbos, Z. G. (2006), "An Evaluation of an Adaptive EWMA Control Chart," preprint.
16. Roberts, S. W. (1959), "Control Charts Based on Geometric Moving Averages," Technometrics, 1, 239-250.
17. Ryan, T. P. (2000), Statistical Methods for Quality Improvement, 2nd Edition, New York: Wiley.
18. Shamma, S. E., and Amin, R. W. (1993), "An EWMA Quality Control Procedure for Jointly Monitoring the Mean and Variance," International Journal of Quality and Reliability Management, 10, 58-67.
19. Shewhart, W. A. (1931), Economic Control of Quality of Manufactured Product, New York: Van Nostrand.
20. Stoumbos, Z. G., Mittenthal, J., and Runger, G. C. (2001), "Steady-State-Optimal Adaptive Control Charts Based on Variable Sampling Intervals," Stochastic Analysis and Applications, 19, 1025-1057.
21. Stoumbos, Z. G., and Reynolds, M. R., Jr. (1996), "Control Charts Applying a General Sequential Test at Each Sampling Point," Sequential Analysis, 15, 159-183.
22. Stoumbos, Z. G., and Reynolds, M. R., Jr. (1997), "Control Charts Applying a Sequential Test at Fixed Sampling Intervals," Journal of Quality Technology, 29, 21-40.
23. Stoumbos, Z. G., and Reynolds, M. R., Jr. (2000), "Robustness to Non-normality and Autocorrelation of Individuals Control Charts," Journal of Statistical Computation and Simulation, 66, 145-187.
24. Stoumbos, Z. G., and Reynolds, M. R., Jr. (2001), "The SPRT Control Chart for the Process Mean with Samples Starting at Fixed Times," Nonlinear Analysis: Real World Applications, 2, 1-34.
25. Stoumbos, Z. G., and Reynolds, M. R., Jr. (2005), "Economic Statistical Design of Adaptive Control Schemes for Monitoring the Mean and Variance: An Application to Analyzers," Nonlinear Analysis: Real World Applications, 6, 817-844.
26. Stoumbos, Z. G., Reynolds, M. R., Jr., Ryan, T. P., and Woodall, W. H. (2000), "The State of Statistical Process Control as We Proceed into the 21st Century," Journal of the American Statistical Association, 95, 992-998.
27. Stoumbos, Z. G., Reynolds, M. R., Jr., and Woodall, W. H. (2003), "Control Chart Schemes for Monitoring the Mean and Variance of Processes Subject to Sustained Shifts and Drifts," in Handbook of Statistics: Statistics in Industry, 22, eds. C. R. Rao and R. Khattree, Amsterdam, Netherlands: Elsevier Science, 553-571.
28. Stoumbos, Z. G., and Sullivan, J. H. (2002), "Robustness to Non-Normality of the Multivariate EWMA Control Chart," Journal of Quality Technology, 34, 260-276.
29. Yashchin, E. (1993), "Statistical Control Schemes: Methods, Applications and Generalizations," International Statistical Review, 61, 41-66.
Economic Advantages of CUSUM Control Charts for Variables

Erwin M. Saniga¹, Thomas P. McWilliams², Darwin J. Davis³, and James M. Lucas⁴
¹ University of Delaware, Dept. of Business Administration, Newark, DE 19716, USA, [email protected]
² Drexel University, Department of Decision Sciences, Philadelphia, PA 19104, USA, [email protected]
³ University of Delaware, Dept. of Business Administration, College of Business and Economics, Newark, DE 19716, USA, [email protected]
⁴ J.M. Lucas and Associates, 5120 New Kent Road, Wilmington, DE 19808, USA, [email protected]
Summary. CUSUM charts are usually recommended for monitoring the quality of a stable process when the expected shift is small. Here, a number of authors have shown that the average run length (ARL) performance of the CUSUM chart is better than that of the standard Shewhart chart. In this paper we address this question from an economic perspective. Specifically, we consider the case where one is monitoring a stable process where the quality measurement is a variable and the underlying distribution is normal. We compare the economic performance of CUSUM and X̄ charts for a wide range of cost and system parameters in a large experiment using examples from the literature. We find that there are several situations in which CUSUM control charts have an economic advantage over X̄ charts. These situations are: 1. when there are high costs of false alarms and high costs of repairing a process; 2. when there are restrictions on sample size and sampling interval; 3. when there are several components of variance; and 4. when there are statistical constraints on ARL.
1 Introduction
Taylor (1968), Goel (1968), Chiu (1974) and von Collani (1987) have addressed the problem of economic design of CUSUM charts for controlling a process where quality is measured by variables. All used simple models due to Duncan (1956), although Taylor's model does not include sampling costs. Virtually all other work on CUSUM design for variables data has been on statistical design. Lucas (1976),
Woodall (1986a), Gan (1991), Hawkins (1992) and Prabhu, Runger and Montgomery (1997), among others, have addressed the problem. Goel's (1968) conclusions were that CUSUM charts perform better in terms of ARL than X̄ charts, but the cost advantages were small unless smaller sample sizes than optimal were used. He also finds that smaller sample sizes are required for CUSUM charts than for X̄ charts when they are designed to achieve the same in-control and out-of-control ARL's. Goel's small experimental study indicates that, within the range of the study, economically optimal CUSUMs behave much like an X̄ chart; i.e. the reference value k is relatively large and the decision interval h is small. Chiu's (1974) conclusions are similar. His small experiment of fifteen runs indicates that economic CUSUM designs are also characterized by small h values and by an out-of-control ARL of less than 1.25. The latter finding indicates that a small out-of-control ARL may be economically optimal as a constraint when designing CUSUM control charts. Unfortunately, Chiu's economic CUSUM designs can have very small in-control ARL's; some examples have an in-control ARL of 21.2. These small values of an in-control ARL can lead to a loss of management trust in the control procedure as well as unnecessary process adjustments, which can itself lead to an increase in process variability. Von Collani's (1987) conclusions are that Shewhart charts perform quite well in terms of costs when compared to more complex methods such as CUSUM charts, and he supports this conclusion with several examples. The problem of a small in-control ARL as well as other problems associated with economically designed control charts have been addressed by Woodall (1986b). In particular, Woodall (1986a) has criticized the economic design of a CUSUM chart by pointing out an example of Chiu's (1974) in which the out-of-control ARL's for a statistically designed CUSUM chart are better than the out-of-control ARL's for an economically designed CUSUM control chart, especially if a shift apart from the expected shift occurs. His argument is that the small increase in cost resulting from the use of the CUSUM chart "appears to be a small price to pay for the increased sensitivity of the procedure" (Woodall (1986a), p. 101). Nonetheless, as Chiu (1974, p. 420) argues, "the use of control charts is basically an economic problem", even though control charts are usually designed statistically, if a formal procedure of design is used at all. Moreover, one can easily achieve desired ARL's by placing statistical constraints on the economic model that ensure specific in-control and out-of-control ARL limits are achieved, a solution proposed by Saniga (1989). Interestingly, he has shown that tighter than desired statistical designs can be economically advantageous. In this paper we look at the problem of economic CUSUM design on a larger and broader scope than either Goel (1968), Chiu (1974) or von Collani (1987).
Our purpose is to investigate CUSUM design using a general economic model due to Lorenzen and Vance (1986). We solve the economic design problem for a wide range of input parameters using a Nelder-Mead (1965) search procedure; ARL's are calculated using the Luceno and Puig-Pey (2002) algorithm. Specifically, we address a number of issues concerning CUSUM chart design and, in addition, the policy decision of choosing a CUSUM chart versus an X̄ chart. These issues are: 1. In what regions of cost and other input parameters are there cost advantages to CUSUM charts versus X̄ charts? 2. What form do economically designed CUSUM charts take in terms of the reference value, decision interval and ARL's? Are the conclusions of Goel (1968) and Chiu (1974) accurate over a wide range of examples? 3. What part do restrictions on sample size and the sampling interval play in these decisions? 4. Does the presence of several components of variance affect the conclusions in 1 above? 5. What is the effect of statistical constraints on the economic decision when using a CUSUM chart versus an X̄ chart? In the next section we present the economic model we employ to answer the above questions and discuss the algorithm used to find economic designs. In section 3 we present results that are used to answer the questions posed above. In section 4 we address the variance component problem. Section 5 addresses the issue of statistical constraints. Finally, in section 6 we draw some brief conclusions.
2 The economic model
Lorenzen and Vance (1986) developed a general form of the popular Duncan (1956) model for the economic design of a control chart. This model contains all costs associated with using a control chart to maintain current control of a process, including costs of nonconformities, costs of sampling and inspection, costs of false alarms and the costs of locating and repairing the process. The Lorenzen and Vance model is more general in the sense that production can continue or be stopped during a search for the assignable cause and production can be either continued or stopped during repair. Otherwise, the standard assumptions in economic control chart design are made; these are that the time in control follows the negative exponential distribution, a single shift of known size can occur, and the other cost and system parameters are deterministic. The model is defined as follows, where C is the expected cost per hour:
C = {C₀/λ + C₁[−τ + nE + h(ARL₂) + δ₁T₁ + δ₂T₂] + sY/ARL₁ + W}
 ÷ {1/λ + (1 − δ₁)sT₀/ARL₁ − τ + nE + h(ARL₂) + T₁ + T₂}
 + {[(a + bn)/h] × [1/λ − τ + nE + h(ARL₂) + δ₁T₁ + δ₂T₂]}
 ÷ {1/λ + (1 − δ₁)sT₀/ARL₁ − τ + nE + h(ARL₂) + T₁ + T₂}. (1)
The terms are:
n = sample size.
h = hours between samples.
L = number of standard deviations from the center line to the control limits for the X̄ chart.
k = reference value for the CUSUM chart.
h = decision interval for the CUSUM chart.
g = intersample interval.
τ = [1 − (1 + λg)e^{−λg}] / [λ(1 − e^{−λg})].
s = e^{−λg} / (1 − e^{−λg}).
ARL₁ = average run length while in control.
ARL₂ = average run length while out of control.
λ = 1/(mean time process is in control).
δ = number of standard deviations slip when out of control.
E = time to sample and chart one item.
T₀ = expected search time when there is a false alarm.
T₁ = expected time to discover the assignable cause.
T₂ = expected time to repair the process.
δ₁ = 1 if production continues during searches, = 0 if production ceases during searches.
δ₂ = 1 if production continues during repair, = 0 if production ceases during repair.
C₀ = quality cost/hour while producing in control.
C₁ = quality cost/hour while producing out of control (> C₀).
Y = cost per false alarm.
W = cost to locate and repair the assignable cause.
a = fixed cost per sample.
b = cost per unit sampled.
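As a sketch of how the hourly cost might be computed, the following Python function translates expression (1) and the term list above directly into code; the argument names are ours, the sampling interval is written as h throughout, and the ARL's are inputs to be supplied by whatever chart is being designed (e.g. via the Luceno and Puig-Pey algorithm for the CUSUM chart).

```python
import math

def lorenzen_vance_cost(n, h, arl1, arl2, *, lam, E, T0, T1, T2,
                        d1, d2, C0, C1, Y, W, a, b):
    # Expected cost per hour, following expression (1) as given above.
    tau = (1.0 - (1.0 + lam * h) * math.exp(-lam * h)) / (lam * (1.0 - math.exp(-lam * h)))
    s = math.exp(-lam * h) / (1.0 - math.exp(-lam * h))
    cycle = 1.0 / lam + (1 - d1) * s * T0 / arl1 - tau + n * E + h * arl2 + T1 + T2
    quality = C0 / lam + C1 * (-tau + n * E + h * arl2 + d1 * T1 + d2 * T2)
    alarms_and_repair = s * Y / arl1 + W
    sampling = (a + b * n) / h * (1.0 / lam - tau + n * E + h * arl2 + d1 * T1 + d2 * T2)
    return (quality + alarms_and_repair + sampling) / cycle
```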
We calculate ARL's for CUSUM charts using an algorithm developed by Luceno and Puig-Pey (2002). In our study we use zero-state ARL's, following the argument by Taylor (1968), who supports this contention with results from a full factorial experiment.
The program used to find optimal X̄- and CUSUM-chart design parameters searches over all possible sample size values in a user-specified range. Given the sample size n, the Nelder-Mead simplex algorithm is used to find the optimal control chart parameters (L for the X̄ chart, k and h for the CUSUM chart). We initially experimented with two methods for determining the optimal time between samples g. The first was to include g as a parameter in the simplex search; the second was to use an approximation given by McWilliams (1994). The approximation was seen to be quite accurate for virtually all cases considered and its use led to a substantial reduction in computing time, so it is used to find an initial solution in the final version of the program. The few cases observed where the approximation procedure failed were cases where the optimal value of g was considerably larger than the mean time to the occurrence of an assignable cause, so if the initial approximate solution has this property then the program switches to the approach of including g in the simplex search. Finally, the approach of monitoring the process by simply searching for an assignable cause every g hours, without taking a sample, was always included as an option in the search for an optimal plan, as in some cases this is indeed the cost-minimizing strategy. The algorithm also incorporates the option of constraining the intersample interval g and/or adding constraints on in- and out-of-control average run lengths. This is done via the use of a penalty term added to the calculated cost function value indicating the extent to which a constraint or series of constraints is violated.
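The outer search just described can be sketched as follows; this is not the authors' program but a minimal stand-in: it reuses the lorenzen_vance_cost function sketched in Section 2, designs a one-sided CUSUM only, and replaces the Luceno and Puig-Pey ARL algorithm with Siegmund's well-known approximation so that the example is self-contained and runnable.

```python
import math
from scipy.optimize import minimize

def cusum_arl_siegmund(k, h, delta=0.0):
    # Stand-in for the Luceno and Puig-Pey (2002) algorithm:
    # Siegmund's approximation for a one-sided CUSUM of standardized sample means.
    D = delta - k
    b = h + 1.166
    if abs(D) < 1e-8:
        return b ** 2
    return (math.exp(-2.0 * D * b) + 2.0 * D * b - 1.0) / (2.0 * D ** 2)

def economic_cusum_design(delta, cost_kwargs, n_max=30):
    # Outer search over n; inner Nelder-Mead search over (g, k, h).
    best = None
    for n in range(1, n_max + 1):
        def cost(x):
            g, k, h = x
            if g <= 0.01 or k <= 0.01 or h <= 0.01:
                return 1e12                          # crude penalty for infeasible points
            arl1 = cusum_arl_siegmund(k, h, 0.0)
            arl2 = cusum_arl_siegmund(k, h, delta * math.sqrt(n))
            return lorenzen_vance_cost(n, g, arl1, arl2, **cost_kwargs)
        res = minimize(cost, x0=[1.0, 0.5 * delta * math.sqrt(n), 3.0],
                       method="Nelder-Mead")
        if best is None or res.fun < best[0]:
            best = (res.fun, n, res.x)
    return best     # (hourly cost, sample size, [g, k, h])
```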
3 Results
We found economic designs for CUSUM charts and for X̄ charts for 216 configurations of cost and system parameters based upon Chiu's (1974) first example. Some of these results are given in Table 1. In this experiment we set C₁ = 100, 200, 300, 500, 750, 1000, b = 0.1, 0.5, 1.0, λ = 0.01, 0.05, and δ = 0.5(0.5)3. All other parameters were the same as Chiu's (1974); note that here we set δ₁ = δ₂ = 0. For these examples the cost advantages of the economically designed CUSUM control chart were small when compared to the economically designed X̄ chart. The largest difference in cost was a cost increase of 0.151 percent from using an X̄ chart rather than a CUSUM chart. This example was one in which the expected shift was very small; i.e. δ = 0.50. For both charts the in-control ARL's were very small, 23.1 and 23.9 respectively for the X̄ and CUSUM chart, while the out-of-control ARL's were 1.17 and 1.18. Small in-control ARL's have caused several researchers, in particular Woodall (1986b), to question the validity of economic designs in general. The actual designs were n = 38, L = 2.02, and g = 2.89 for the X̄ chart and n = 37, k = 1.52, h = 0.53, and g = 2.83 for the CUSUM chart.
In this experiment the smallest difference in cost occurred, not unexpectedly, when δ = 3.0; this cost difference was 0.003 percent. The fourth example in Table 1 shows a case where the optimal policy is to search for an assignable cause every 4.06 hours without sampling. Thus, sampling may not always be the optimal policy. Our 216 examples show that in all cases economic CUSUM designs behave similarly to an X̄ design in that k is relatively large when compared to h, a finding also noted by Goel (1968).
Further experiments were done where we ran the same example as above except that we set Y = 5, 200 and W = 1, 10, 50 and δ = 1, 2, 3. We also tried the same example with δ₁ = 0, 1, δ₂ = 0, 1 and δ = 1, 2, 3. In these examples the maximum difference in cost between the X̄ chart and the CUSUM chart was 0.354 percent, which occurred when δ = 1.0 and the cost of a false alarm was relatively high; here Y = 200. For both charts the in-control average run lengths of the economically designed charts were about 760, a result due to the high cost of a false alarm. We also ran 72 experiments using Lorenzen and Vance's example problem where we varied (C₀, C₁) = (0, 835) and (114.2, 949), (Y, W) = (200, 200), (977, 977), (1500, 1500), a = 0, 10, 50, 200 and δ = 0.5, 0.86, 1.5. We present several results from this experiment in Table 2. In 16 of the 72 cases the X̄ design had more than a 1% cost disadvantage. In all but one of these 16 cases the (Y, W) combinations were at the two highest levels. The most extreme cost disadvantage of the X̄ chart occurred when Y = W = 1500 and, again expectedly, δ = 0.5. This design is shown in the last row of Table 2. The cost advantage of the CUSUM chart was 9.4%. Here the design was n = 20, g = 2.99 and L = 2.03. The respective in- and out-of-control ARL's were 23 and 1.7. For the CUSUM chart the design parameters were n = 1, g = 0.1, k = 0.25, and h = 10.38. ARL's for the CUSUM chart were 1256 and 38. Note here the substantial cost savings of 9.4% coupled with a design that is more satisfying in terms of in-control ARL, albeit not as satisfying in the small intersample interval of g = 0.10.
In some applications there may be restrictions on sample size. Similarly, there may be a simultaneous bound on the intersample interval. To investigate the effect of these bounds we returned again to the experiment where we used Chiu's (1974) first example and varied C₁, b, λ and δ as above. In this set of 216 runs, though, we placed a constraint on n and g of n ≤ 2 and g ≥ 1. The results from this experiment, several examples of which appear in Table 3, were that in 34 of the 216 runs there is at least an hourly savings of 1% from the use of a CUSUM chart. In 11 of these 34 runs there is at least a 3% cost savings. The most extreme savings of 10% occurred when δ = 1 rather than when δ = 0.5. (Here the design is to use n = 2, g = 1, k = 0.707 and h = 1.75. ARL's are 27.6 and 3.2, the in-control ARL making this design most likely not employable.)
4 The impact of variance components

In the examples we presented above we assumed X ~ N(μ, σ²) when the process is in control and X ~ N(μ + δσ, σ²) when the process is out of control. Now suppose we have several components of variance that cannot be eliminated (due to variations in materials, machines, workers, etc.). Here, if σ²_B is the between-sample variance and σ²_W is the within-sample variance, then X ~ N(μ, σ²_B + σ²_W). Then, the in-control distribution of X̄ is X̄ ~ N(μ, σ²_B + σ²_W/n₀), where n₀ is the sample size used to estimate these components of variance. The out-of-control distribution is X̄ ~ N(δ + μ, σ²_B + σ²_W/n₀), where δ = (σ²_B + σ²_W/n₀)^{1/2}. The standardized one-sigma shift, i.e. δ = 1, for both the X̄ and CUSUM chart is (σ²_B + σ²_W/n₀)^{1/2} / (σ²_B + σ²_W/n)^{1/2}.

We present some examples of economically designed control charts in Table 4 that were designed when there were variance components. The other parameters were obtained using Chiu's (1974) example 1. Several interesting conclusions can be drawn from these few runs. First, there are more substantial cost savings resulting from the use of CUSUM charts when there are variance components than when there is a single component of variance. This is due to the fact that an increase in n does not affect σ²_B, so the increased power of the CUSUM is relatively more advantageous here when compared to the situation examined in Table 1, where there is only the single component of variance. Note also the other advantages of the CUSUM in some of the examples. For example, consider the three examples in the second set of examples, where Y = W = 100.0. The cost advantage of the CUSUM ranges from 22 to 31%, and while the out-of-control ARL's are similar, the in-control ARL's of the CUSUM range from 531 to 723 when compared to the optimal X̄ designs, whose in-control ARL's range from 98 to 123. It is also interesting to note that the optimal economic CUSUM designs are more in line with typical CUSUM designs in the relationship of k to h, rather than the other economic designs we have found where the CUSUM performs much like an X̄ chart.
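To see numerically why a larger n buys relatively little when a between-sample variance component is present, the standardized shift above can be evaluated for a few sample sizes; the variance-component values used here are purely illustrative and are not those of Table 4.

```python
import numpy as np

def standardized_shift(n, n0=4, var_b=1.0, var_w=1.0):
    # Standardized one-sigma shift of the sample mean when the between-sample
    # variance var_b cannot be reduced by increasing n.
    return np.sqrt(var_b + var_w / n0) / np.sqrt(var_b + var_w / n)

for n in (1, 2, 4, 8, 16):
    print(n, round(float(standardized_shift(n)), 3))   # grows only slowly with n
```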
5 Economic statistical design of CUSUM and X̄ control charts
Saniga (1989) has shown that one can eliminate many of the problems with the applicability of economic designs by optimizing (1) above with the addition of constraints of the type:

ARL₀ᵢ ≥ ARLbound₀ᵢ , i = 1, 2, ..., m, and
ARLⱼ ≤ ARLboundⱼ , j = 1, 2, ..., q,
where the ARLbound's are bounds on the ARL's. (Note that there can be any number of constraints, represented here by m and q.) For example, one may wish to place a lower ARL bound on the in-control ARL and an upper bound on the out-of-control ARL at the level of the expected shift. Note that the design solution that satisfies these bounds is a statistical design, and note also that there can be many feasible statistical designs. One interesting aspect of designs that are found by optimizing an economic model coupled with statistical constraints is that the optimal design may be tighter than the statistical design that just meets the constraints. For example, an optimal design may have an in-control ARL that is larger than the target value of the statistical design, and the out-of-control ARL may be smaller than the target ARL.

We solved several economic statistical design problems for CUSUM and X̄ charts and present the results in Table 5. Once again, we explore these results in the context of a problem due to Lorenzen and Vance (1986). In this particular configuration of parameters the X̄ control chart has an economic disadvantage relative to the CUSUM chart of a little more than 2% (Case 1 in the Table). It is interesting to note that neither design has an in-control ARL that would be acceptable in practice; these are respectively 14 and 28. In cases 2 through 9 we placed constraints on the problem such that the in-control ARL is at least 150. We also placed constraints on a small shift of little importance (δ = 0.1) such that either chart would not signal. This constraint was ARL_{δ=0.1} ≥ 80-100 for the 8 cases with constraints. We also placed constraints on the ARL for a small shift of δ = 0.5 of ARL ≤ 1.5-4 for various cases. Note from cases 2-4 that there is a substantial savings of from 15-17% per hour gained from using the CUSUM chart instead of the X̄ chart. Note also that this advantage increases as one places a higher ARL constraint on the small shift of little importance. The advantage of the CUSUM decreases from 17% to 2% as the constraint on the out-of-control ARL for δ = 0.5 becomes more restrictive. Note from the designs themselves that the economic statistical designs are characterized by nearly 3σ control limits but large n for the X̄ chart, while the CUSUM designs can result in nearly a 100% reduction in sample size over the X̄ design in some cases (2, 3, 4). It is interesting to compare the average time to signal a shift, or ATS = g·ARL, for the economically optimal designs. In case 4 the ATS's for the three levels of δ = 0.0, 0.1, 0.5 are respectively 425, 196 and 5.6 for the X̄ chart and 292, 119 and 6.5 for the CUSUM chart. So the CUSUM chart has considerably worse ATS's but is cheaper. In case 5, on the other hand, the X̄ ATS's are 425, 196 and 5.6, while for the CUSUM chart the ATS's are 475, 168 and 4.9. Thus, here the X̄ chart is almost 13% more costly and has ATS's not as good as those of the CUSUM except for the very small shift.
Generally, this small experiment shows that the addition of statistical constraints on ARL can result in a cost advantage for the CUSUM chart.
6
Summary
In this study we discuss some situations in which there are economic advantages to using CUSUM charts over charts to monitor a stable process. These situations are:l. when there are high costs of false alarms and high costs to locate and repair the assignable cause of poor quality; 2. when there are restrictions on sample size and the intersample interval; 3. when there are several components of variance; and 4. when there are constraints on ARL. Some numerical results we present show the magnitude of the economic advantage to the CUSUM chart.
x
References 1. Chiu, W.K. (1974) The Economic Design of CUSUM Charts for Controlling Normal Means. Applied Statistics 23,420-433 2. Duncan, A.J. (1956) The Economic Design of X bar Charts Used to Maintain Current Control of a Process. Journal of the American Statistical Association 5 1,228-242 3. Gan, F.F. (1991) An Optimal Design of CUSUM Quality Control Charts. Journal of Quality Technology 23,279-286 4. Goel, A.L. (1968) A Comparative and Economic Investigation of X bar and Cumulative Sum Control Charts. Unpublished PhD dissertation, University of Wisconsin. 5. Hawkins, D.M. (1992) A Fast Accurate Approximation for Average Run Lengths of CUSUM Control Charts. Journal of Quality Technology 24,37-43 6. Lorenzen, T.J., Vance, L.C. (1986) The Economic Design of Control Charts: A Unified Approach. Technometrics 28,3-10. 7. Lucas, J.M. (1976) The Design and Use of V-Mask Control Schemes. Journal of Quality Technology 8, 1-12 8. Luceno, A., Puig-Pey, J. (2002) Computing the Run Length Probability Distribution for CUSUM Charts. Journal of Quality Technolqgy 34,209-215 9. McWilliams, T.P. (1994) Economic, Statistical, and Economic-Statistical Chart Designs. Journal of Quality Technology 26,227-238 lO.Nelder, J.A., Meade, R. (1965) A Simplex Method for Function Minimization. The Computer Journal 7 1 l.Prabhu, S.S., Runger, G.C., Montgomery, D.C. (1997) Selection of the Subgroup Size and Sampling Interval for a CUSUM Control Chart. I.I.E. Transactions 29,45 1-457
x-
12.Saniga, E.M. (1989) Economic Statistical Design of Control Charts with an Application to X bar and R Charts, Technometrics 3 1 , 3 13-320 13.Taylor, H.M. (1968) The Economic Design of Cumulative Sum Control Charts. Technometrics 10,479-488 14.Von Collani, E. (1987) Economic Process Control. Statistica Neerlandica 41, 89-97 15.Woodal1, W.H. (1986a) The Design of CUSUM Quality Control Charts. Journal of Quality Technology 18, 99-102 16.Woodall, W.H. (1986b) Weaknesses of the Economic Design of Control Charts (Letter to the Editor). Technometrics 28,408-410
Table 1 OPTIMAL CUSUM AND CONTROL CHART DESIGNS CHIU (1974) EXAMPLE 1
TABLE 2 OPTIMAL ECONOMIC CUSUM AND CONTROL CHART DESIGNS LORENZEN AND VANCE (1986) EXAMPLE
TABLE 3
x
OPTIMAL CUSUM AND CONTROL CHART DESIGNS CHUI (1974) EXAMPLE 1 n52,gzl
Table 4 OPTIMAL CUSUM ANDFCONTROL CHART DESIGNS WHEN THERE ARE TWO = 1) COMPONENTS OF VARIANCE (no = 4, dB
TABLE 5 OPTIMAL ECONOMIC-STATISTICAL DESIGNS FOR VARIOUS CONSTRAINTS
arameters
cusum parameters
*Binding Constraint Economic Design Inputs /2=0.02, 6 = 0 . 5 0 , E = t = t 1 = 0 . 0 8 3 3 , t t = 0 . 7 5 , C ~ = O , C =835.0, ~ Y = W=977.4,a= 10.0,b=4.22, & = 1, &=O
Choice of Control Interval for Controlling Assembly Processes Tomomichi Suzuki, Taku Harada, and Yoshikazu Ojima Tokyo University of Science, 2641 Yarnazaki, Noda, Chiba, 278-85 10, JAPAN, [email protected]
Many industrial products are produced by continuous processes. Time series analysis and control theory have been widely applied to such processes. There are also many industrial products that are produced by assembly processes. In those processes, time series analysis and control theory are usually not required. However, there exist assembly processes that do need time series analysis for effective process control. Problems are discussed when time-series analysis methodology is applied to specific assembly processes for effective process control, especially if the number of products is high. Influential factors like the control interval and dead time of the process under consideration are considered. Summary.
1 Introduction Time series analysis is applied in many fields and there are many authors such as Harvey (1993) and Akaike and Kitagawa (1994). In industrial fields, it is used mainly as a tool for system identification and controlling the process illustrated by Akaike and Nakagawa (1972). It is usually used in process industry, and not commonly used in assembly industry. The assembly processes themselves usually do not depend on time, but the cause of disturbances such as environmental factors depend on time. Therefore, appropriate use of time series analysis methods can lead to effective control of the relevant assembly process. To control a dynamic process effectively, there are two stages that should have been done. The first stage is the identification stage. Here the system model of the process is built, identified, and diagnosed. This is best done by thorough examination of the process and of a designed experiment for identification. The second stage is the controller design stage. Here the control equation is calculated based on the result of the first stage. From the viewpoint of time series analysis, the most important aspect of the assembly processes is that the number of the products is often very large. If each product is analyzed separately, the total size of the data for time series analysis will be extremely tedious, and the statistical model of the process will be very complicated and its validity will be in doubt. Hence the need to how to handle the
data in terms of time series analysis should be discussed. The proposed method considers such factors as the sampling interval, the control interval, and the dead time of the process. The proposal is presented through analysis of a precision instrument manufacturing process.
2 Controlling Assembly Processes 2.1 The Process
The process we have analyzed is the assembly process manufacturing components of a precision instrument originally presented by Ishii (1995). The main part of the component is the rotating part. The speed of the process is so high that it produces about fifty thousand components per day. The most important characteristic of the product is the number of rotation. The characteristic is measured automatically for each of the products. The number of rotation has a target value. Deviation from the target value can be compensated by adjusting the control variable which is the depth or position of the rotating part. The process is controlled by feedback control; that is adjusting the depth according to the measured value of characteristics. Because there is a pool for keeping the intermediate products, there is a delay between the input and the output of the process. 2.2 Practical Problems
When actually controlling the process, there are a number of difficulties to overcome. The process had been controlled by the control scheme developed heuristically by the operators. The control scheme considered the effects of the following factors. a) Accuracy of measuring the number of rotation that is the output characteristics of the product b) Speed of production which is very fast c) Process dead time that is the number of waiting parts in the pool whose size is usually below 100 The control scheme had been to adjust the control variable according to the average value for every 100 products. The adjustment of the control variable is automatically conducted. The optimum sampling interval for system identification is based on the sampling theorem cf. Goodwin and Sin (1984) and Astrom and Wittenmark (1995). To control this process, we need system identification and control. We need to obtain data, identify the process, formulate controller, and then apply the controller. This series of analyses are done for the same time interval. Since our aim is
to control the process, we investigate that time interval and call it 'control interval'. The most important factor for controlling this process is how to select the control interval since analyses for system identification and control simulation as well as daily operations will be performed based on the selected value of the control interval. The next chapter deals with the control interval and its effects.
3 Evaluating Effect of Control Interval 3.1 Obtaining the Noise Series We analyzed the data for four continuous days. The number of rotation is recorded for each of the product. The size of the data is approximately 280,000. After checking the outliers, 241,000 data are used for further analyses. The value of the control variable is not recorded, but it can be calculated by using the control scheme and the deviation from the target. The transfer function that is the relation between control variable and the process output was already identified by performing the experiment. We estimated the noise process. Since analyzing or controlling the process per each data is not practical at all, we calculated the average for every 10 data. As a result, the noise series of length 24,100 is obtained. The plot of the noise series is shown as Fig.1. From Fig. 1, we learn that the noise is varying gradually and waving largely.
Fig. 1. Plot of the Noise Series
3.2 Control Interval
We calculated the noise series that is made from the average value of each interval. The control intervals we investigated are shown in Table I. Table 1. Control Intervals Investigated
3.3 Identifying the Noise
Identifications of the obtained noise series are performed for each of the time intervals considered in chapter 3.3. We used ARIMA models discussed by Box et a1.[4]. Since identifications are performed for many series, we need to evaluate and compare the results. We used AIC as a goodness of fit measure. System identifications are performed for all the control intervals. After examining the results, the following two models are selected. Model 1: IMA(1,l) process Model 2: AR(2) process Since our aim is not focused only on determining the best fit model, but also on evaluating the effect of the control interval, we discuss two models. The values of AIC for those two models were actually close for most of the control intervals. 3.4 Effect of the Control Interval In this section, we discuss the effect of the control interval over MSE when MMSE controllers are used. 3.4.1 MMSE Controller
The model considered in this paper is described as equations (1) and (2). The system model is described as
where the noise N, is a linear stochastic process expressed as equation (2).
Y,: Process characteristic, Output Variable to be controlled. The objective of control is to keep this variable as close as possible to the target value, which is set to zero without losing generality. Each Y, is the average of all the latest measured characteristic for each of the time interval. X,: Manipulated Variable, Input This variable is adjusted in order to control the process output. N,: Process Noise ARIMA process with order p for AR, d for I, and q for MA, and its values considered as given. a,: White Noise The variance of the white noise is considered as given. $ Dead Time The lag between process input and the process output, which is considered as given. B: Backshift operator L2(B)ILI(B): Transfer Function This function expresses relation between input and output.
In equation (I), we consider the transfer function to be stable, when we write
L,(B)/L,(B) = v ( B )=v, +v,B+v2B2
+ - a -
, weassume f),l<m. j=O
When the process model is expressed in equations (1) and (2), then the MMSE controller is derived as shown in Box et al. (1994). It is given as equations (3) and (4).
where
',(')=
Y,+, + yl,+2'
+ ylf+$'
+
...
L , ( B ) = ~ + ~ , B ++.-.+I,Y,B~ ~~B~ As can be understood from equations (3) and (4), we assume both L4B) and L4(B) to be linear filters. When using the controllers given as (3) and (4), MSE of the process is given as
where 0*express the variance of the white noise.
3.4.2 MSEs of Model I
We applied the following procedure to calculate the MSE for each of the control interval for the Model 1 (IMA(1,l) process) described in chapter 3.3. Identify the transfer function L ~ ( B ) ~ ' ~ ~ ( B ) . Derive L3(B) and L4(B) using the noise model (Model 1 in this case) and the transfer function. Calculate the MSE (partial MSE) based on the identified model using equation (5) assuming that the model identified by procedures 1 and 2 are correct. Calculate the final MSE (total MSE) when the MMSE controller is used, that is the MSE considering all the products produced. Since the process itself does not depend on time, the transfer function is constant, so on& the determination of the dead time f is necessary. The MSE calculated by the procedure 3 can not be used directly to compare the effect of the control intervals because each data consist of different number of data. So we need the procedure 4 to consider the MSE of all the products. This is done by recalculating the MSE based on time interval of 10. We repeated the procedure for all the control intervals. The result is shown in Fig.2.
Fig. 2. MSEs of Model 1
From Fig.2, the following became clear. 1. Total MSE is at its best at time interval of 10. 2. The shorter the interval is, the smaller the MSE becomes. 3. Partial MSE is at its best at time interval of 100.
4. The curve of the partial MSE shows a decrease at time interval of 200. This is due to the fact that 200 is a multiple of 100 which is the process dead time. Therefore, the process should be controlled at time interval of 10 which is the shortest control interval considered in this case. 3.4.3 MSEs of Model 2 We calculated the MSE the same way as the previous section for the Model 2 (AR(2) process) described in chapter 3.3. The result is shown in Fig.3. From Fig.3, the following became clear. 1. 2. 3. 4.
Total MSE is at its best when time interval is 20. The curve of the total MSE is smooth. Partial MSE is at its best when time interval is 130. The curve of the partial MSE shows a decrease at time interval of 200. This also happened in the case for the model 1.
Therefore, the process should be controlled at time interval of 20. This is not the shortest control interval considered in this case.
Fig. 3. MSEs of Model 2
4 Conclusion We have clarified the practical problems when applying time series analysis to assembly processes. When production speed is high and all the process output are measured, then the choice of the control interval is the important problem. It is very important because daily operations and analysis for system identification will be performed based on this interval. When deciding the control interval, we have to consider the trade-off between adjusting frequently to adapt to process deviations and not adjusting too often to grasp the state of the process accurately.
References 1. Akaike, H., Kitagawa G (1994): Time Series Analysis in Practice I, Asakura
Publishing (in Japanese) 2. Akaike, H., Nakagawa (1972): Statistical Analysis and Control of Dynamic Processes, Saiensu publishing (in Japanese) 3. Astrom, K.J., Wittenmark B (1995): Adaptive Control, Addison-Wesley 4. Box, G.E.P., Jenkins, G.M., and Reinsel, G.C. (1994): Time Series Analysis (3rd edition), Prentice-Hall. 5. Goodwin, G.C., Sin, K.S. (1984): Adaptive Filtering Prediction and Control, Prentice-Hall. 6 . Harvey, A.C. (1993): Time Series Models (2nd edition), Harvester Wheatsheaf. 7. Ishii, D. (1995): Constructing feedback control system using auto checking machine, Proceedings of the 25thAnnual JSQC Congress, 67-70 (in Japanese)
Generalization of the Run Rules for the Shewhart Control Charts Seiichi Yasui, Yoshikazu Ojima and Tomomichi Suzuki Department of Industrial Administration, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba, 278-8510, JAPAN yasuiQia.noda.tus.ac.jp
Summary. It is well-known that the Shewhart control charts are useful to detect large shifts of a process mean, but it is insensitive for small shifts and/or other types of variation. We extend the Shewhart's three sigma rule and propose two new rules based on successive observations. One is that a signal occurs when m successive observations exceed kl sigma control limit. The other is that a signal occurs when m - 1 of m successive observations exceed kz sigma control limit. The original Shewhart control chart is included in the first generalized rule as m = 1. The performance of the proposed rules is evaluated under several out-of-control situations by both the average run length and the standard deviation of the run length. These rules are more powerful than Shewhart's three sigma rule at detecting moderate step shifts.
1 Introduction The Shewhart control charts with three sigma rule are widely used in industrial fields to monitor a process mean. When a single point exceeds the three sigma control limit, the process is found to be out-of-control. The Shewhart control charts are convenient for practitioners and they have high ability t o detect large shifts of the process mean. However, they are insensitive for small shifts and/or other types of process variation. The exponentially weighted moving average (EWMA) control chart, the cumulative sum (CUSUM) control chart and supplementary rules for Shewhart control charts are famous for high power of detecting small shifts. They are effective, because the current plot point includes information of past observations. However, it is not so easy to use these control charts in practical fields, and it is undesirable to use the supplementary rules in combination with the three sigma rule, because the in-control type I error is inflated. Champ and Woodall (1987) calculates the inflated in-control type I error. Therefore, we propose two kinds of the new rules extending the Shewhart three sigma rule, based on m successive observations. One is that the process is found t o be out-of-control when m successive observations exceed kl sigma
control limit. This type is natural generalization, since this is the same as the Shewhart three sigma rule in m = 1. We call this GRl(m). The other is that the process is found to be out-of-control when m - 1 of m successive observations exceed kz sigma control limit. This type is the variation of GRl(m). We call this GR2(m).The constants kl and k2 are determined so that the average run length (ARL) is 370.4 ,which is equal to the Shewhart's three sigma rule. Klein (2000) proposed two different Shewhart control charts with rules using successive points. The Klein's rules are similar to G R l ( m = 2) and GR2(m = 3 ) . In the first, either two successive points above an upper control limit or two successive points below a lower control limit are needed to obtain an out-of-control signal. In the second, the out-of-control signal is given if two of three successive points are above an upper control limit or two of three are below a lower control limit. These rules are the only special cases of our proposed rules. We can deal with the general m. Page (1955) proposed the Shewhart control charts with warning lines and action lines. The rule is that the action is taken if any point falls outside the action lines or if any k out of the last n points fall outside the warning lines. The rules without the action lines are the same as our rules, however, the Page's procedure can not obtain warning and action lines for any n, k. Consequently Page (1955) proposed the rules only for k = 2, any n or k = n. Chaing and Niu (1981) and Derman et al. (1981) argue over consecutive k-out-of-n : F system in reliability. Consecutive k-out-of-n : F system doesn't work, if consecutive k components fail. Hence, to obtain reliability for consecutive k-out-of-n : F system is equivalent to calculating the probability that consecutive k failures occur in the Bernoulli trials of n times. The sequence appearing in the Shewhart control chart needs to have the successive m run exceeding the control limit in the last subsequence. Therefore, it is impossible to apply them to our study. Kolev and Minkove (1997) obtains the distributions related to success runs of length k in a multi-state Markov chain. One of the obtained distribution is the distribution of the number of runs until the first occurrence of the kth consecutive success. This distribution can be applied to G R l ( m ) , however, not to GR2(m). Furthermore, its form is complicated. Our proposed rules are constructed by using an absorbing Markov chain without complicated calculation. In the next section, we obtain control limits for GRl (m), m = 2,. . . , 8 and GRz(m),m = 3,. . . ,8. The performance of the proposed rules is evaluated by both out-of-control ARL and out-of-control standard deviation of the run length (SDRL) under several out-of-control situations which will be described in Section 3.
2 Calculating Control Limits in the Generalized Run Rules Observations are expected to have an independent and identical normal distribution under an in-control process. We assume that the observations have mutually independent N ( p O ,u2). The coefficients kl and k2 can be calculated by an absorbing Markov chain. Let Xt = ( X t , . . . ,Xt+,-1) be the vector relating to observations with m elements at time t (t = 1, . . . ). The each element of the vector is a binary value with the condition 1
x t = {0
the observation exceeds the control limit at time t otherwise.
Then X t is a m dimensional Markov chain. Let Si be the i th state (i = 1, . . . , n) and then the state space is {S1, . . . , S,). The states are obtained by enumerating all the possible combinations of m binary values and combining equivalent states. For GR2(m) the absorbing state like (1,1,0) is removed. In m = 2 of GRl(m), the apparent total number of the states are four and these are S1= ( 1 , l ) , S2= (0, I ) , S3= (1,O) and S4= (0,O). However, the S3is equivalent to the S4because the value of Xt doesn't related to the state Xt+1 after transiting. Hence all the states are S1 = (1, I ) , S 2 = (0, I ) , S3 = (*,O) ,where the symbol "*" denotes the value 0 or 1. The S1 is an absorbing state and then an out-of-control signal has occurred. In m = 3 of GR2(m), S1= (0,1,1) and S2= (1,0,1) are absorbing states, and Sg= (0,0, I ) , S4 = (0,1,0), S5 = (1,0,0), S s = (0,0,0) are transient states. S5is equivalent to S 6 . Hence S5= (*, 0,O) is redefined by combining S5 and S s . In addition, it is possible that X t is So= ( 1 , l ) only a t t = 1. Hence, XI E {So,S 1 , . . . , S,) and Xt E { S l , . . . , S,) for t > 1 in GR2(m) rule. The Markov chain is homogeneous under an in-control process. Hence, the transition probability is stationary and the transition matrix doesn't depend on time. The each element of the transition matrix is the probability transiting from S, to S,. In m = 2 of GRl (m), the transition matrix P is
where p = Pr[Xt = l(the observed observation will exceed the control limit)]. The mean absorbing time is (I - Q)-I J, where Q is the matrix of probabilities above with the first row and column removed from P and J is the column vector with the elements of 1. The in-control ARL can be calculated based on the mean absorbing time. Let the absorbing time of a Markov chain starting from the initial transient state Si be ati. If the run length under the initial transient state Si is rli, ati
is equal to (rli - m). Hence the mean absorbing time E(ati) = E(rli) - m. E(rli) is the average run length under the initial transient state Si,and then the in-control ARL under the initial transient state Si is
The in-control ARL for GRl ( m ) is
If the initial state is the transient state, E[rlIX1 = Si]is the formula (2). If the initial state is the absorbing state, E[rlIX1 = Si]is m. The in-control ARL for GR2(m) is l {So,S l , . . . , S,)] ARL = E [ ~ l l X E n
=
C E[rEIX1 = Si]P r [ X l = Si].
(4)
i=O
If the initial state is the transient state, E[rlIX1 = Si]is the formula (2). If the initial state is the absorbing state except for So,E[rlIX1 = Si] is m. In X1 = So,E[rlIX1 = So]is m - 1. The formulae (3) and (4) are functions of p. The probability satisfying ARL = 370.4 that an individual observation exceeds the control limit is numerically calculated in the range [O,1]by the Newton-Raphson method. Hence we calculate kl and k2 as @-l(1- ( p / 2 ) ) ,where @ is the cumulative distribution function of the standard normal distribution. These results are shown in Tables 1 and 2. Table 1. Control limit coefficients kl in G R l ( m )
Table 2. Control limit coefficients kz in G R z ( m )
3 Evaluation There are several variations of the process mean in an out-of-control process by some causes. We consider the typical ones, i.e. step shift, trend, and between group variation. These are sustained shifts. When Yt is the observation in an out-of-control process, these are modeled as follows, respectively,
fi Yt Yt
--
+
N(p0 60, 0 2 ) ,6 : the amount of the shift N(p0 ( p t ) ~0 2, ) ,P : the proportionality constant N(po, (y2 l ) 0 2 ) ,y : the ratio of variation
+
+
We call these models the step shift situation, the trend situation and the between group variation, respectively. The model of between subgroup variation is & = po at etra N(0, y u 2 ) , & N ( 0 , u 2 ) . Therefore y is the ratio of between group variation and the variation of individual observation. The Shewhart control chart with the generalized rules is evaluated by the outof-control ARL and the out-of-control standard deviation of the run length (SDRL). The out-of-control ARL and SDRL are defined as ARL and SDRL until the variation of the process mean is detected, assuming that has already been occurred in starting the control chart.
+ +
-
-
3.1 T h e S t e p Shift Out-of-Control Situation
The out-of-control ARLs under the step shift situation, in GRl(m) are shown in Table 3. m = 1 is the Shewhart's three sigma rule. These ARLs are obtained by substituting p with pl in the transition matrix, where pl = Pr[Yt is the out of control]. The ARLs in S = [0.75,2.50] are minimalized in case of m = 2, and those in other range are minimalized in case of m = 1. The ARLs in S = [0.75,2.00]of m = 3 are less than those of m = 1. The ARLs for any 6 are increasing when m is increasing (m = 2, . . . '8). Hence m = 2 of GRl(m) is effective for the step shift in S = [0.75,2.50]. The SDRLs for GRl(m) and GR2(m) in the step shift situation can be mathematically calculated by an absorbing Markov Chain approach. The variance of the run length for GRl(m) is
where M = ( I - Q)-', z = M J and ~ C is T the row vector with the elements of Pr[X1 is the transient state S,].The variance of the run length for GRz(m) is x~(2M - 1 )-~( 7 ~ ~ 2 ) ' n O ( l2nTz), (6)
+
+
where no = Pr[X1 = So]. Hence, the SDRLs for G R l ( m ) and GR2(m) are the square root of the formula (5) and ( 6 ) , respectively. Table 4 shows the out-of-control SDRLs under the step shift situation in GRl (m). In 6 0.75, all the GRl(m) rule have the smaller standard deviation
>
than the Shewhart's three sigma rule, and the variation of the run length for m = 8 is the smallest. The out-of-control ARLs under the step shift situation in GR2(m) are shown in Table 5 . The first column m = 1 is the Shewhart's three sigma rule. These results are obtained by the same procedure as the case GRl(m). The ARLs of m = 3 in S = 0.25 and 0.50 are the smallest of ARLs for other m's. In the range 0.75 to 1.50, ARLs of m = 4 are the smallest. In the range 1.75 to 2.50, those of m = 2 are the smallest again and the Shewhart's three sigma rule ( m = 1) has the smallest ones in 6 2 2.75. The most ARLs in 6 = 1.0 to 2.0 for m 2 3 are smaller than that of m = 1. Hence Our proposed rules GRz(m), especially m = 3, are effective for the step shift of 6 = 1.0 to 2.0. The out-of-control SDRLs under the step shift situation in GR2(m) are shown in Table 6. The SDRLs for GR2(m),m = 3 , . . . , 8 are approximately the same, however, those are smaller than the SDRLs of the Shewhart's three sigma rule. Therefore, GR2(m = 3 or 4) rules can detect the step shift from small to large stably in aspect of both the average and the standard deviation of run length. Table 3. Out-of-control ARL under the step shift situation in GRl ( m )
3.2 The Trend Shift Out-of-Control Situation
Tables from 7 to 10 are the ARLs and SDRLs under the trend. The Markov chain isn't homogeneous under the trend, since the probability the observation is out of control at time t is a function on time t . In addition, it is non-linear because the observations have the normal distribution. Therefore it is difficult
Table 4. Out-of-control SDRL under the step shift situation in GRl(m)
Table 5. Out-of-control ARL under the step shift in GR2(m)
to calculate out-of-control ARLs and SDRLs by the absorbing Markov chain mathematically. We calculate them by Monte Carlo Method. The number of the replication is 10,000. The out-of-control ARLs for G R l ( m )are shown in Table 7 under the trend shift situation. The ARLs of m = 2 and 3 in the p's range 0.01 to 0.07 are slightly smaller than those of m = 1. In other words, the m = 2 and 3 are better at detecting the slowly increasing shift than the Shewhart's three sigma rule.
Table 6. Out-of-control SDRL under the step shift situation in GRZ(m)
The out-of-control SDRLs for GRl(m)are shown in Table 8 under the trend shift situation. There is hardly difference between GRl ( m )rules about the standard deviation of run length. On the other hand, SDRLs for GRz(m) shown in Table 10 are smaller than that for the Shewhart's three sigma rule in p 5 0.05. This result means that GR2(m)is stabler about detecting the out-of-control process, when the gradual trend shift occurs. Table 9 is the result of the out-of-control ARL for GR2(m)under the trend. The ARLs of m = 8 are similar to those of m = 1. The ARLs of m = 3 and m = 4 are the smallest of m's through all P's. Hence the generalized run rules GR2(m)have the higher performance than the Shewhart's three sigma rule. 3.3 The Between Group Variation Out-of-Control Situation
We show the performance of the generalized rules under the between group variation situation in from Table 11 to Table 14 . These results are calculated by Monte Carlo method with 10,000 replications. In Tables 11 and 13, the out-of-control ARLs of GRl(m)and GR2(m)are approximately equal to or larger than those of Shewhart's three sigma rule. In Tables 12 and 14, the SDRLs of GRl(m)and GR2(m)are also approximately equal to or larger than those of Shewhart's three sigma rule. Hence the generalized run rules are not good at detecting the between group variation very much.
Table 7. Out-of-control ARL under the trend situation in GRl(m) m kl
1 2 3 4 5 6 7 8 3.0000 1.9322 1.4514 1.1644 0.9704 0.8296 0.7224 0.6381
Table 8. Out-of-control SDRL under the trend shift situation in GRl ( m )
Table 9. Out-of-control ARL under the trend situation in GRZ(m)
Table 10. Out-of-control SDRL under the trend shift situation in GRz(m)
Table 11. Out-of-control ARL under the between group variation situation in
GRl (m)
Table 12. Out-of-control SDRL under the between group variation in GRl(m)
m k1
1 2 3 4 5 6 7 8 3.0000 1.9322 1.4514 1.1644 0.9704 0.8296 0.7224 0.6381
Table 13. Out-of-control ARL under the between group variation in GRz(m)
Table 14. Out-of-control SDRL under the between group variation in GRz(m)
4 Conclusions In the Shewhart control chart, an out-of-control signal occurs when the current observation exceeds three sigma control limit. We generalized this rule and proposed the new rules based on successive runs. False alarm rates of these rules are controlled to ARL = 370.4. The performance for our proposed rules is evaluated from the aspects of both the average and standard deviation of run length under the three out-ofcontrol situations. The ARLs and SDRLs for our rules ( G R l ( m )and G R 2 ( m ) ) are smaller than the Shewhart three sigma rules, when the amount of the step shift is moderate and the slope of trend is gradual. However, G R l ( m ) and G R z ( m )are not powerful for detecting large shifts, because of the affection of the lower limit m. The performances for Klein (2000) are better in step shift and trend shift situations than those for our rules. However our rules are better in the between subgroup variation situaiton. Klein' s two of two rule is that no out-of-control signal is given if one of two successive points is above an upper limit and the other is below a lower limit. It is also the same for the two of three rule. Thus foregoing results are obtained. Hence, some of generalized run rules are effective for small shifts and slowly increasing ( decreasing ) shifts. Especially effective are the rules that 2 of 3 successive observations exceed the 2.0698 sigma control limit and that 2 successive observations exceed the 1.9322 sigma control limit.
5 Acknowledgments The authors thank the referees for useful comments on an earlier draft.
References 1. Chaing, D, and Niu, S. C. (1981). "Reliability of consecutive-k-out-of-n : F System", IEEE. Trans. Reliability, Vol. R-30, No. 1, pp.87-89. 2. Champ, C. W. and Woodall, W. H. (1987). "Exact Results for Shewhart Control Charts With Supplementary Runs Rules", Technometrics, Vol. 29, No. 4, pp.393-399. 3. Derman, C., Lieberman, G. J. and Ross, S. M. (1981). "On the Consecutive-kof-n : F System", IEEE. Trans. Reliability, Vol. R-31, No. 1, pp.57-63. 4. Klein, M. (2000). "Two Alternatives to the Shewhart X Control Chart", Journal of Quality Technology, Vol. 32, No. 4, pp.427-431. 5. Kolev, N. and Minkova, L. (1997). "Discrete Distributions Related t o Success Runs of Length k in a Multi-State Markov Chain", Commun. Statist. Theory and Meth, Vol. 26(4), pp.1031-1049. 6. Page, E. S. (1955). "Control Charts with Warning Lines", Biometrika, 42, pp.243-257.
Part 2
On-line Control
2.3 Monitoring
Robust On-Line Turning Point Detection. The Influence of Turning Point Characteristics E. Andersson Goteborg University, Statistical Research Unit, PO Box 660, SE-405 30 Goteborg, Sweden [email protected] Summary: For cyclical processes, for example economic cycles or biological cycles, it is often of interest to detect the turning points, by methods for on-line detection. This is the case in prediction of the turning point time in the business cycle, by detection of a turn in monthly or quarterly leading economic indicators. Another application is natural family planning, where we want to detect the peak in the human menstrual cycle in order to predict the days of the most fertile phase. We make continual observation of the process with the goal of detecting the turning point as soon as possible. At each time, an alarm statistic and an alarm limit are used to make a decision as to whether the time series has reached a turning point. Thus we have repeated decisions. An optimal alarm system is based on the likelihood ratio method. The full likelihood ratio method is optimal. Here we use a maximum likelihood ratio method, which does not require any parametric assumptions about the cycle. The alarm limit is set to control the false alarms. The influence, on the maximum likelihood ratio method, of some turning point characteristics is evaluated (shape at the turn, symmetry and smoothness of curve). Results show that the smoothness has little effect, whereas a non-symmetric turn, where the postturn slope is steeper, is easier to detect. If a parametric method is used, then the alarm limit is set in accordance with the specified model and if the model is mis-specified then the false alarm property will be erroneous. By using the maximum likelihood ratio method, the false alarms are controlled at the nominal level. Another characieristic is the intensity, i.e, the frequency of the turns. From historical data we construct an empirical density for the turning point times. However, using this information only benefits the surveillance system if the time to the turn we want to detect, agrees with the empirical density. If, on the other hand, the turn we want to detect occurs "earlier than expected", then the time until detection is long. A method that does not use this prior information works well for all turning point times.
1 Introduction In many situations it is important to monitor a process in order to detect an important change in the underlying process. In statistical process control we often want to detect a change in the mean (or mean vector) or a change in the variance
(or covariance matrix), see Sepulveda and Nachlas (1997), Gob et al. (2001), Wu et al. (2002). In this paper we consider cyclical processes where it is of interest to detect the turning points. One example is turning point detection in leading economic indicators, in order to predict the turning point time of the general business cycle (Neftci (1982)). Another example is natural family planning, when the most fertile phase is preceded by a peak in a hormone (Royston (1991)). In public health it can also be of interest to monitor the influenza activity, in order to detect an outbreak or a decline as soon as possible (Baron (2002), Sonesson and Bock (2003)). In the financial market the aim is to maximize the wealth. An indicator is monitored with the aim of detecting the optimal time to trade a certain asset. Optimal times to trade are related to changes (regime shifts) in the stochastic properties of the indicator. Thus also here the aim is to detect the next turn (regime shift), see Bock et al. (2003), Dewachter (2001), Marsh (2000). The monitoring is made on-line and an alarm system is used to make a decision as to whether the change has occurred or not. The alarm system consists of an alarm statistic and an alarm limit and can be presented in a chart. The alarm statistic can be based on only the last observation (the Shewhart method, Shewhart (193 1)) or all the observations since the start of the surveillance, as is done in the CUSUM method (Page (1954), Woodall and Ncube (1985)) and the EWMA method (Roberts (1959), Morais and Pacheco (2000)). The alarm limit can be constant or time dependent. When the statistic crosses the limit there is an alarm, which can be false or motivated. The alarm limit is often set to control the false alarms, for example by a fixed average run length to first false alarm (ARLO). When evaluating a surveillance system it is important to consider the timeliness of the alarms, i.e. how soon after the change that a motivated alarm is called. Ideally we want an alarm system which has few false alarms and short delay for motivated alarms. It has been shown (Fris6n and de Mar6 (1991)) that optimal methods are based on the likelihood ratio between in-control and out-of control processes. Likelihood ratio methods are based on full knowledge of the process incontrol and out-of control, only the time of the change is unknown. When monitoring a cyclical process this means that the optimal method assumes knowledge of the parametric structure of the cycles. In a practical situation we estimate the parameters using previous data. Since the cycles often change over time, there is always a risk for mis-specification. Therefore we will use a nonparametric method, which is not based on any parametric assumptions. This nonparametric approach is compared to the optimal method. The effect of different data generating processes is investigated, as well as the effect of using prior information about the turning point time.
2 Statistical surveillance In the business cycle application, the process under surveillance, X, is a monthly (or quarterly) leading economic indicator. Each month (or quarter) the alarm system is used to decide whether there is enough evidence that the change (here a turn) has occurred yet. Thus we have repeated decisions and at each decision time we want to discriminate between the event that the process is in-control (denoted D) and the event that the process is out-of control process (denoted C). The following model for X at time s is used
where E is iid N(0; 0).At an unknown time z there is a turn in p. The assumptions in (1) can be too simple for many applications and extensions could be motivated by the features of the data or by theory. Model (I), however, is used here to emphasize the inferential issues. We want to control the risk of giving an alarm when there has been no turn. In hypothesis testing this is done by the size (a).In a situation with repeated decisions, if a should be limited (to e.g 0.05, see Chu et al. (1996)), then the alarm limit would have to increase with time. The drawback of this is that the time until the change is signaled can be long, for changes that occur late (see Pollak and Siegmund (1975), Bock (2004)). In much theoretical work on surveillance the false alarms are controlled by a fixed expected probability of a false alarm (see Shiryaev (1963), Frisen and de Mare (1991)). In quality control the ARLO is often fixed (the Average Run Length to the first alarm when the process is in-control). Hawkins (1992), Gan (1993) and Anderson (2002) suggest that the control instead be made by the MRL' (the median run length), which has easier interpretations for skewed distributions and much shorter computer time for calculations. The time of alarm, tA, is defined as the first time for which the alarm statistic crosses the alarm limit. Quick detection is important and one evaluation measure that reflects this is the conditional expected delay (see Fris6n (2003))
where T = time of turn. CED measures how long it takes before the system signals that a turn has occurred. Often the ARL' is reported (the average time to an alarm, given that the change occurred at the same time as the surveillance was started) and ARL' = CED(I)+l.
2.1 The likelihood ratio method
The surveillance method that is based on the likelihood ratio, f(x I C)/f(x I D), is optimal in the sense of a minimal expected delay for a given false alarm probability (Friskn (2003)). The events C and D are formulated according to the application. For example, when it is of interest to detect if the change (in p) has occurred at the current time, then C={T=S) and D={r>s), which implies p D : p ( l ) = p ( 2 ) = . . . = p (s ) = p 0 , s
.
p(1) = p(2) = ...= p(s-1) = p,, p(s) = p , ,
s2z.
where p(t) is the value of p at time t. If the observations are independent and C={z=s}, the likelihood ratio simplifies to
f (x(s>lp(s> = P, f lm = p , )
'
Thus for this situation it is optimal to use the Shewhart method where the alarm statistic is based only on the last observation, x(s). In this paper we are interested in detecting if there has been a change since the ) D={z>s). Then the likelihood start, i.e. C={zls}={{r=l}, {2=2},..., { ~ s } and ratio is based on all s partial likelihood ratios, i.e.
where x,={x(l), x(2), ..., x(s)} and wj=P(z=j)/P(r$s). We want to detect a turn in p (hereafter exemplified with a peak), thus we know that p is such that z > s (3a) ( 1 ) ( 2 )(
j
- l),(j-1 .
s),
z=j.(3b)
If the correct parametric structure of p is known (for example piecewise linear), then p, conditional on event D and C, can be specified
where
Po, PI and P2 are known and Cj denotes the event {z=j).
Given that the parametric model of p is known (for example.as above), the following surveillance method can be used (suggested by Shiryaev (1963), Roberts (1966)). The method is hereafter denoted the SR method and gives an alarm when
where c is a constant alarm limit. FrisBn and de Mare (1991) showed that using the SR method for surveillance is, under certain conditions, the same as using the posterior probability in a Hidden Markov Model approach (Hamilton (1989), Le Strat and Carrat (1999)). 2.2 The maximum likelihood ratio method
In practice p is seldom known but estimated from previous data. If the structure of p changes over time (as it often does, see Figure l), then an estimate based on previous data can result in a mis-specified p (further discussed in Section 3.3).
Fig 1: Swedish industrial production index, monthly data for the period January 1976 December 1993. Source: Statistics Sweden.
A surveillance method based on non-parametric estimation, without any parametric assumptions, was suggested by Frisen (1994). This method is based on the maximum likelihood ratio
max f(x,
Ic>
ID)
max f(xT
'
where the likelihoods are maximized under the order restrictions in (3). The aim is to detect a peak in p, from monotonically increasing to unimodal. Thus, we have
where pD and pC are unknown and estimated, unlike the likelihood ratio in (4) where pD and pC are known. The estimation is made using non-parametric regression under order restrictions, see Barlow et al. (1972) and FrisBn (1986). The solution technique is the pool adjacent violators algorithm, described in e.g. AD Robertson et al. (1988). For the monotonic case ( p ), also called isotonic regression, the aim is to estimate p(t)=E[X(t)] as a function of time, under order restrictions. This is a least square estimate where the sum of squares is minimized under the monotonicity restriction in (3a): (p(1)I p(2)< ...
zDminimizes
The unimodal case ( z q ) is described by Frisdn (1986) as consisting of an upphase where E[X(t)] is monotonically increasing with t, and a down-phase, where the regression is monotonically descreasing with t. Under assumption of a turn at time point j, we have a partition into monotonous phases and hence we can use the least square isotonic regression for each phase. This means that the sum of square should be minimized under the restriction in (3b). Just as ,hD, it is a least square estimate and for the model in (1) it is also the maximum likelihood estimate. Below, the isotonic and unimodal regressions are illustrated.
Fig. 2: The observations (unfilled circles) and the regression Cjoined filled squares), isotonic (IeR) and unimodal (right).
Following the suggestion by Frisen (1994), the estimates from isotonic and unimodal regression are used in a surveillance system, with the following alarm statistic
The method in (9, denoted MSR, was evaluated in Andersson (2002) and Andersson et al. (2005). The performance of the MSR method is illustrated in Figure 3. For the turn in the first period (the trough), the delay is 6 months. The delay is caused by the rounded turn. There are no false alarms, even though there are a few "outliers" (approximately at t=20). For the turn in the second period (the
peak), the delay is 7 months. It is especially noteworthy that there are no false alarms during this long period (96 months to the peak), since MRL' is set to 60 months.
Fig. 3: Swedish industrial production index (IPI), monthly data (Source: Statistics Sweden). Top: IPI for the period January 1991 - December 1997, and the MSR statistic for the same period (the alarm limit, for which MRL'=~O,is marked with a dashed line). Bottom: IPI for the period August 1993 - January 2003, and the MSR statistic for the same period.
3 Performance of the non-parametric method Different aspects of the performance of the MSR method are investigated in this paper. A comparison is made with an optimal method, which will act like the benchmark. The MSR method is also evaluated for different turning point characteristics and for different assumptions regarding the time of the turn (7).The advantages of the MSR method, which is based on a minimum amount of assumptions, are discussed.
3.1 Comparing the non-parametric method to an optimal method
The MSR method, (9,is compared to the SR method, see (4), which is optimal for situations where the intensity of a change is very small. In the SR method, the parametric structure of p is known, but the time of the turn, z, is unknown. The evaluation is made under assumption that there is a peak in p after 3 years. Observations are generated from model (1) where, without loss of generality, o = l . The vector p is modeled according to p(t) = Po + PR.t- 2 .DI.PR.(t-z+l), O,t
. The values = 0.26 1,trz and o=l are estimated from data on Swedish Industrial Production index, whereas is of no importance to the surveillance, since the data can easily be transformed by x(t)-x(0). Thus the SR method uses the correct model for p, as given above, whereas in the MSR method p is estimated, using order-restricted regression. As mentioned in Section 2, the alarm limit is set so that the system has a known false alarm property. Here, the alarm limits of SR and MSR are adjusted to yield a fixed value of the following false alarm probability, where
t
E (1,
2,
...I,
= 0.26, z = 36, Dl =
For the Shewhart method, when the observations are iid and normally distributed, the alarm limit can be determined analytically for a specified ARLO. But for the more complex alarm statistics of SR and MSR, the respective alarm limit is determined by simulations. The alarm limit c is determined in an iterative process, where different c-values yield different PFA-values. For two values c' and c" yielding PFA' and PFA" such that the difference (PFA'-PFA") is very small and PFA'<0.31
Fig. 4: The distribution hnctions for the false alarms, MSR (solid) and SR (dotted).
The performance of SR and MSR are compared by the conditional expected delay in (2), for the situation of a turn after 3 years (36 months). Table 1: Conditional expected delay, for a turn after 3 years
Method SR (parametric)
known
CED (months) 2.7
MSR (non-parametric)
estimated
4.1
u
CED determined by simulations, sd[CED]=O.O18 The CED is longer for the non-parametric MSR method. The alarm probability at different time points can be studied by the probability of successful detection, PSD, which measures the ability to detect the turn within a limited time interval, d (see FrisCn (1992)). In applications where the time for action is limited, for example natural family planning (Royston (1991)), PSD is a suitable measure for the success of the surveillance. The probability of successful detection is defined as
Fig. 5: The probability of successful detection, for MSR (solid) and SR (dotted).
The PSD for d={l, 2) is almostthe same for MSR and SR. The reason is that, just are still rather similar and therefore after the turn, the vectors pD and discrimination is difficult, even for SR. For detection within 3 up to 7 time units, SR is considerable better than MSR. Here the difference between pD and is clearer, thus benefiting SR, whereas for MSR the number of post-turn observations for estimating pCis still rather small, and therefore the estimated pDand are similar. 3.2 The effect of different turning point characteristics
An evaluation is made of how different turning point characteristics affect the detection ability of the MSR method. The MSR performance is evaluated for different shapes of the turn and for different assumptions regarding the changeprocess (the intensity). 3.2.1 The shape of the turn
The model in (I), with o=l,is used to simulate observations on X. Five different models are used for the vector p: the piecewise linear case in Section 3.1 (denoted Ref) and four other cases (A, B, C, D) with characteristics as presented below. p is symmetric, piecewise linear (Ref) p is symmetric, non-smooth (A) p is non-symmetric, non-smooth (B) p is symmetric, smooth (C) p is non-symmetric, smooth (D).
The y-vectors for these cases are presented in Figure 6 .
Fig. 6: Expected value for the turn. Right: Reference case (solid), Case A (dashed), case B (dotted). Left: Reference case (solid), case C (dotted), case D (dashed).
For case A the vector pA(t)is based on the seasonally adjusted Swedish Industrial Production index. The average slope of pA (both pre-peak and post-peak) is the same as for the reference case, i.e. for the pre-peak slope we have
.. where
a ( j ) = 0 , t.= 36. The p~ is symmetric around the turning point and has ,=I
a non-constant growth (modeled by a above), which is hereafter called nonsmoothness. ~ For case B and the pre-peak part of the vector ,ug,the same model as for , u is used, but the post-peak slope, t 2 t,is steeper
r+14
b 2 ( j ) = 0 , bl>O, r= 36.
where 1.r-l
For case C the vector k ( t ) is modeled using trigonometric functions, which results in a smooth curve with a rounded peak (i.e. the growth rate of ,u is decreasing continuously just before the peak time and increasing just after the peak time). The model used is
For the pre-peak part of the vector b,the same model as for is used, but the post-peak slope is steeper, by using a different parametric model than for the prepeak slope. The model used for h ( t ) is
The PFA is set to 0.3 1 for all investigated cases. The distribution functions of the false alarms are shown in Figure 7 and since the pre-peak slopes of case A and B are the same, and so are C and D, the figure only contains three functions.
Fig. 7: The distribution functions for the false alarm, for Reference (solid), A+B (dashed), C+D (dotted).
The non-smoothness of p,,, and p~ is reflected in the distribution function. For the four cases A, B, C and D the peak is rounded. As a result, the false alarm rate is increasing just before the peak. The performance of the MSR method for the different cases, is evaluated by the CED for a turn after 3 years (36 months). The CED is between 4 and 6 months for all cases, reflecting the fact that the nonparametric method is rather independent of the characteristics of the turn. However, there are some effects. A non-symmetric turn (case B and D) results in a lower CED than a symmetric turn (case A and C), since the alarm limit is set according to the pre-turn slope and variance. A rounded peak gives two opposite effects: i) The alarm statistic increases just before the turning point so that only a small increase in the alarm statistic at the next time point is needed to yield an alarm, with the result that CED decreases. ii) The peak is rounded also after the top, which affects the alarm statistic in the opposite direction, thus prolonging the CED. Generally, though, the CED is longer for a rounded peak (case A and C), compared to Ref.
Table 2: Conditional expected delay for MSR, for a turn after 3 years
Case Symmetric, piecewise linear (Ref)
CED (months) 4.1
Symmetric and non-smooth (A) Non-symmetric and non-smooth (B) Symmetric and smooth (C) Non-symmetric and smooth (D) CED determined by simulations, sd[CED]=0.018 A non-smooth p (case A and B) has a similar effect as an increased1variance so the alarm limits have to be higher in order to have the same PFA. The nonsmoothness does contribute a little in yielding a lower CED, but the effect is not large. For A and C the CED is rather similar, which indicates after adjusting the alarm limit to the larger variance, there is not much difference left. The difference in CED between B and D is larger, however, which is fhrther investigated in the PSD graphs below. At (almost) all measurement times the reference case and case B have the highest detection probability. Cases A and C (symmetric, with rounded top) are the most difficult to find. For the effect of non-smoothness, we compare case A to case C and case B to case D. The difference between the PSD curves for B and D is larger than between A and C, thus explaining that CED is the same for A and C, but different between B and D. The non-smoothness in B is modeled in such a way that the resulting variance is larger than it is in A. Thus, what seems to be a larger non-smoothness effect for B and D (compared to A and C) is really an effect of a larger variance.
- -D
.... C
Ref
d
Fig. 8: The probability of successful detection.
3.2.2 The intensity of the turns
The performance of a surveillance system does depend on the assumptions made regarding the distribution of the time of the turn (r). The likelihood ratio method is optimal for the correct distribution of r. For C={rls}, the s partial likelihood ratios of the full LR method are weighted together according to the distribution of r and the alarm limit is time dependent
where w,= P(z=j)/P(.rls) and k, = k.P(z>s)/P(zls), where k is a constant. The full LR method has the shortest expected delay for a given false alarm probability, if the correct distribution of T is used. If a geometric distribution is assumed for .r (Shiryaev (1963), Hamilton (1989)), then the intensity of a change, v,=P(.r=t I z2t), is constant. When v, tends to zero, Shiryaev (1963) and Roberts (1966) showed that the SR method is the limiting method. Thus, the SR method is optimal for situations where the intensity of a change is very small, but it works well even for intensities of 0.20 and 0.50, (Frisen and Wessman (1999), Andersson (2004)). By using an empirical density for T in the alarm system in (6), information from previous turning points is included in the alarm system. For the Swedish industrial production, quarterly data are available on the turning point times (National Institute of Economic Research (1992)). These data show that the probability is very high for a turn after 8-9 quarters (Figure 9). In our study the empirical density is approximated by the negative binomial probability mass function, and used on the method denoted MLRemp MLRemp(s)
  w_1 \frac{f(x_s \mid \mu = \mu^{C(1)})}{f(x_s \mid \mu = \mu^{D})}
  + w_2 \frac{f(x_s \mid \mu = \mu^{C(2)})}{f(x_s \mid \mu = \mu^{D})}
  + \cdots
  + w_s \frac{f(x_s \mid \mu = \mu^{C(s)})}{f(x_s \mid \mu = \mu^{D})} > k'_s ,
where the expected values μ^{C(j)} and μ^{D} are defined in Section 2.1 and w_j and k'_s are defined as in (6), where τ follows the negative binomial distribution in Figure 9.
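As an illustration of how the weights w_j and the time-dependent limit k_s in (6) can be obtained from a negative binomial distribution for τ, the following is a minimal Python sketch. The distribution parameters and the constant k are assumptions for illustration, not the values used in the paper.

```python
# Sketch (not the authors' code): weights w_j = P(tau=j)/P(tau<=s) and the
# time-dependent limit k_s = k*P(tau>s)/P(tau<=s) when tau is approximated
# by a negative binomial distribution. n_nb, p_nb and k are assumed values.
import numpy as np
from scipy.stats import nbinom

n_nb, p_nb = 10, 0.55   # hypothetical negative binomial parameters for tau (in quarters)
k = 2.0                 # hypothetical constant in the alarm limit

def lr_weights_and_limit(s):
    """Return w_1..w_s and k_s for decision time s (tau counted from 1)."""
    j = np.arange(1, s + 1)
    p_tau = nbinom.pmf(j - 1, n_nb, p_nb)   # P(tau = j), support shifted to 1, 2, ...
    p_le_s = p_tau.sum()                    # P(tau <= s)
    w = p_tau / p_le_s                      # weights of the partial likelihood ratios
    k_s = k * (1.0 - p_le_s) / p_le_s       # time-dependent alarm limit
    return w, k_s

w, k_s = lr_weights_and_limit(s=12)
print(w.round(3), round(k_s, 3))
```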
Fig. 9: Left: the empirical density of the time to the next turn (quarters) for the Swedish industrial production. Right: approximation of the empirical density by the probability mass function of the negative binomial distribution.
In the MLRemp method the highest weights are given to those partial LR components that represent a turn at 8-9 quarters. Thereby the detection power is concentrated to these time points, which historically are the most likely turning points. The MLRemp method is compared to the MSR method with constant alarm limit and equal weights. The MSR method can be regarded as having a non-informative density for τ and is denoted MSRnon hereafter. The false alarms are adjusted by setting the expected false alarm probability equal to 0.23. In the expected false alarm probability, the distribution of τ is included.
The performances of MSRnon and MLRemp are evaluated for a turn after 1 year, 2 years and 4 years.
Table 3: Conditional expected delay for different turning point times

  Method    τ                μ          CED, 1 year   2 years   4 years
  MLRemp    empirical        estimated  3.17          1.51      0.01
  MSRnon    non-informative  estimated  1.90          1.66      1.64
If the turn occurs early (1 year after the start of the surveillance) it takes a long time for MLRemp to detect it. The reason is the very low prior probability of an alarm at this early stage. If the turn occurs late, it is detected quickly by MLRemp. The predictive value measures the trust one can have in an alarm (Frisén (1992)), and it often depends on the time of the alarm.
Table 4: Predictive value for different alarm times
  Method    τ                μ          PV, 1 year   2 years   4 years
  MLRemp    empirical        estimated  0.47         0.84      0.48
  MSRnon    non-informative  estimated  0.27         0.79      0.97
For early alarms the PV is higher for the MLRemp method, as a result of early alarms being so rare. But for late alarms the MSRnon has higher PV. The relatively low PV for MLRemp at late alarms is connected to the short CED here: at late time points the alarm limit is low, as a result of the empirical prior. Therefore MLRemp gives alarms even if nothing has happened. If the turn occurs after 2 years (8 quarters), which is the same as in the empirical density, then MLRemp has shortest CED and highest PV.
3.3 Advantages of a minimum assumption method
The MSR method is not based on any assumptions regarding μ or τ. This is an advantage if the assumptions are not well founded.

3.3.1 Assumptions about μ
If the correct model of μ is known, then the LR (or SR) method is optimal (the results on CED in Section 3.1 illustrate the optimality of SR, from Frisén (2003)). An obvious problem in surveillance is that models for μ can only be based on data from previous cycles, since, naturally, complete data from the current cycle are not available at the start of the surveillance. There is always the risk that the current cycle is different from the previous ones, resulting in mis-specified models. For changes in expected value (μ), the parametric likelihood ratio methods are based on the deviations between x and the specified expected value, (x − m). If m ≠ μ, then the model is mis-specified either in the parameter values or in the parametric shape. Mis-specified values (Figure 10, left) strongly affect the parametric methods (LR and SR). If the alarm limit of the parametric method is not adjusted, then the false alarm property is much higher than the nominal. If the alarm limit is adjusted to the correct false alarm property, then we still have an increasing systematic difference between x and m, which leads to long delays for early turns and low predictive values for late alarms. In the MSR method, μ is estimated and therefore this method is unaffected by mis-specifications of the parameter values in m. The parametric shape can also be mis-specified (Figure 10, right). For the parametric methods, if the alarm limit is not adjusted, the false alarm property is higher than the nominal, due to the rounded top producing too many false alarms. If the alarm limit is adjusted, it must be set much higher. This would again lead to long delays for early turns. As we have seen, the shape of the turn has only a small effect on the MSR method. The nominal false alarm property is always valid (conservative), however.
Fig. 10: The true expected value μ (solid) and the specified m used in the parametric method (dotted). Left: the parametric values are mis-specified. Right: the parametric shape is mis-specified.
3.3.2 Assumptions about τ
Regarding the intensity (the distribution of τ), the advantage of using an empirical density for τ is that the detection power is concentrated to the time points where, historically, it is most likely that the turn occurs. However, the detection power at unlikely turning points is very poor. At early turns the delay is very long. An approach without any assumptions has high PSD and high PV at all time points.
4 Summary and discussion

The performance of a non-parametric on-line monitoring method for turning point detection in cyclical processes is evaluated. The evaluation is made using measures that reflect timeliness, such as the expected delay of an alarm and the probability to detect a turn within d time units. An advantage of the non-parametric method is that there is no risk of mis-specification of the parameters, since no information from the previous cycles is used in the estimation. However, this means that all information comes from the current data, and at the start of the surveillance or just after the turn there are very few observations on which to base the estimate. The performance of the non-parametric method is compared to the optimal method. For early detection, the non-parametric method has almost the same detection probability as the optimal method. The reason is that, just after the turn, both methods have a very limited amount of information for deciding that there has been a turn. Even for the optimal method, μ does not differ much from pre-turn to post-turn. The detection ability of the non-parametric method is investigated for different characteristics of μ. If the turn is non-symmetric in the sense that the post-turn slope is steeper than the pre-turn slope, then the turn is detected more quickly. If the turn is non-symmetric in the other sense, then the detection probability is lower. There is a small effect of a rounded turn: it takes longer to detect, because the observations at this plateau are very similar regardless of whether they come from the pre-turn phase or the post-turn phase, which makes it more difficult to discriminate between the phases. The non-smoothness has only a small effect. After adjusting the alarm limit for the higher variance (non-smoothness), there is a slightly higher detection probability for a non-smooth turn. Information from the empirical distribution of the time of the turn is included in the non-parametric surveillance system, and the performance at different turning point times is evaluated. Often the delay for an early turn is long, since there are few observations available from which to make the inference. The method that uses the information from the empirical τ-distribution has a short delay and a high predictive value when the actual turning point agrees with the empirical distribution. But when the turn occurs earlier than expected, the delay is long. Also, the predictive value for late alarms is low. A safe way is to use a method that does not use this prior information, since it performs well at all turning point times. A mis-specification of a parametric model in the surveillance system can lead to a false alarm property which is higher than the nominal. Even if the alarm limit is adjusted to yield the correct false alarms, the mis-specification can lead to bad properties, such as long delays for early turns. For the non-parametric method, the false alarms are really controlled at the nominal level; thus it is conservative in this sense.
Acknowledgements This work was supported by the Bank of Sweden Tercentenary Foundation and by the Swedish Council for Research in the Humanities and Social Sciences.
References

Andersson, E. (2002) Monitoring cyclical processes. A non-parametric approach. Journal of Applied Statistics, 29, 973-990.
Andersson, E. (2004) The impact of intensity in surveillance of cyclical processes. Communications in Statistics - Simulation and Computation, 33, 889-913.
Andersson, E., Bock, D. and Frisén, M. (2005) Statistical surveillance of cyclical processes with application to turns in business cycles. Accepted for publication in Journal of Forecasting.
Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972) Statistical Inference under Order Restrictions, Wiley, London.
Baron, M. (2002) Bayes and asymptotically pointwise optimal stopping rules for the detection of influenza epidemics. In Case Studies in Bayesian Statistics, Vol. 6 (Ed. Verdinelli, I.), Springer-Verlag, New York, pp. 153-163.
Bock, D. (2004) Aspects on the control of false alarms in statistical surveillance and the impact on the return of financial decision systems. Research Report 2004:2, Department of Statistics, Göteborg University, Sweden.
Bock, D., Andersson, E. and Frisén, M. (2003) The relation between statistical surveillance and certain decision rules in finance. Research Report 2003:4, Department of Statistics, Göteborg University, Sweden.
Chu, C. S. J., Stinchcombe, M. and White, H. (1996) Monitoring Structural Change. Econometrica, 64, 1045-1065.
Dewachter, H. (2001) Can Markov switching models replicate chartist profits in the foreign exchange market? Journal of International Money and Finance, 20, 25-41.
Frisén, M. (1986) Unimodal regression. The Statistician, 35, 479-485.
Frisén, M. (1992) Evaluations of Methods for Statistical Surveillance. Statistics in Medicine, 11, 1489-1502.
Frisén, M. (1994) Statistical Surveillance of Business Cycles. Research Report 1994:1 (Revised 2000), Department of Statistics, Göteborg University, Sweden.
Frisén, M. (2003) Statistical surveillance. Optimality and methods. International Statistical Review, 71, 403-434.
Frisén, M. and de Maré, J. (1991) Optimal Surveillance. Biometrika, 78, 271-280.
Frisén, M. and Wessman, P. (1999) Evaluations of likelihood ratio methods for surveillance. Differences and robustness. Communications in Statistics - Simulation and Computation, 28, 597-622.
Gan, F. (1993) An optimal design of EWMA control charts based on median run length. Journal of Statistical Computation and Simulation, 45, 169-184.
Göb, R., Del Castillo, E. and Ratz, M. (2001) Run length comparisons of Shewhart charts and most powerful test charts for the detection of trends and shifts. Communications in Statistics - Simulation and Computation, 30, 355-377.
Hamilton, J. D. (1989) A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57, 357-384.
Hawkins, D. M. (1992) A fast accurate approximation for average run lengths of CUSUM control charts. Journal of Quality Technology, 24, 37-43.
Le Strat, Y. and Carrat, F. (1999) Monitoring epidemiologic surveillance data using hidden Markov models. Statistics in Medicine, 18, 3463-3478.
Marsh, I. W. (2000) High-frequency Markov Switching Models in the Foreign Exchange Market. Journal of Forecasting, 19, 123-134.
Morais, M. C. and Pacheco, A. (2000) On the performance of combined EWMA schemes for mu and sigma: A Markovian approach. Communications in Statistics - Simulation and Computation, 29, 153-174.
National Institute of Economic Research (1992) Konjunkturläget Maj 1992. National Institute of Economic Research, Stockholm, Sweden.
Neftci, S. (1982) Optimal prediction of cyclical downturns. Journal of Economic Dynamics and Control, 4, 225-241.
Page, E. S. (1954) Continuous inspection schemes. Biometrika, 41, 100-115.
Pollak, M. and Siegmund, D. (1975) Approximations to the Expected Sample Size of Certain Sequential Tests. Annals of Statistics, 3, 1267-1282.
Roberts, S. W. (1959) Control Chart Tests Based on Geometric Moving Averages. Technometrics, 1, 239-250.
Roberts, S. W. (1966) A Comparison of some Control Chart Procedures. Technometrics, 8, 411-430.
Robertson, T., Wright, F. T. and Dykstra, R. L. (1988) Order Restricted Statistical Inference, John Wiley & Sons Ltd.
Royston, P. (1991) Identifying the fertile phase of the human menstrual cycle. Statistics in Medicine, 10, 221-240.
Sepulveda, A. and Nachlas, J. A. (1997) A simulation approach to multivariate quality control. Computers & Industrial Engineering, 33, 113-116.
Shewhart, W. A. (1931) Economic Control of Quality of Manufactured Product, MacMillan and Co., London.
Shiryaev, A. N. (1963) On optimum methods in quickest detection problems. Theory of Probability and its Applications, 8, 22-46.
Sonesson, C. and Bock, D. (2003) A review and discussion of prospective statistical surveillance in public health. Journal of the Royal Statistical Society A, 166, 5-21.
Woodall, W. H. and Ncube, M. M. (1985) Multivariate CUSUM Quality Control Procedures. Technometrics, 27, 285-292.
Wu, C., Zhao, Y. and Wang, Z. (2002) The median absolute deviations and their applications to Shewhart control charts. Communications in Statistics - Simulation and Computation, 31, 425-443.
Specification Setting for Drugs in the Pharmaceutical Industry

Jørgen Iwersen and Henrik Melgaard

Novo Nordisk A/S, Novo Allé, DK-2880 Bagsværd, Denmark [email protected]

Summary. In the pharmaceutical industry a drug must conform to certain limits throughout its shelf-life period. To ensure compliance in practice we need manufacturing processes to be robust and in control, measurement systems to be in control, and the measurements must be traceable. Storage conditions must be under control. In this paper we discuss the practical implications involved in setting and maintaining specifications for drugs in the pharmaceutical industry. These include statistical process control limits, release limits, shelf-life limits and in-use limits. The challenge here is to make this chain of limits consistent and at the same time practical for use. The scientific approach to establishing a chain of specifications involves normal linear mixed models and an Arrhenius model, a kinetic model describing e.g. the temperature dependence of drug degradation. These models are applied to data from stability studies as well as data from batch release.
1 Methodology

Specifications must be based on a scientifically sound rationale. This section discusses the influence of the factors involved in the setting of specifications, including producibility in terms of process capability, process performance, measurement uncertainty and stability. Correspondence between release and specification limits is established. These limits must be established for all relevant parameters. See refs 1-3. The statistical method described in this paper makes it probable that the true mean of a future batch will remain within the specified shelf-life limits throughout the shelf life of the batch. Thus the method fulfils the requirements given by ICH¹ guidelines, see ref. 3.
¹ International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. The final guidelines are adopted by the regulatory bodies of the European Union, Japan and the USA.
In connection with submission of either new specification limits or changes in specification limits to the authorities, documentation justifying proposed limits must include calculations described in the following before submission.
[Flow chart: capability, shelf life, confidence and precision enter a calculation based on relevant input using statistical methods; the committee then submits the specification limits to the authorities, leading to approval or non-approval.]
The above flow chart describes the process and some of the parameters involved when establishing documentation for correspondence between limits.
Definition of limits
[Diagram: the target value in the centre, surrounded by the alert limits, release limits, shelf-life limits and in-use limits, between the lower and upper limits.]
  Definition   Description
  LSL, USL     Lower and upper shelf-life limits. The product will remain within these limits throughout its shelf life. If the analytical results are on the border of the release limits, there is a given high probability, usually 95%, that the batch will remain within the shelf-life limits.
  LRL, URL     Lower and upper release limits. These limits are calculated in such a way that when the analytical results are within the limits, assurance is provided, at the specified confidence level, that the true value of the individual batch will remain within the shelf-life limits throughout the shelf-life period.
  LCL, UCL     Lower and upper alert limits. These limits are based on the production variability (and implicitly also the analytical variability) for the process when in statistical control.
  TARGET       Target for production.
Linking shelf-life limits and release limits

The linking of shelf-life limits and release limits is based on a modification of Allen's formula, see ref. 4. The following terms are included in the formula (other variance components may be included when necessary):

  Term: Expected degradation (systematic). The expected degradation equals slope × shelf life. In case of no degradation, this term is left out.
  Term: Uncertainty of expected degradation (random). The stability studies performed provide limited information, which must be taken into account. In case of no degradation, this term is left out.
  Term: Uncertainty of measurement (random). The result used to document compliance with the specification has an uncertainty of measurement. So even though the measurement result is within the limits, the true value of the batch might not be. By including this term, it is made probable that the true value of the batch will be within the shelf-life limits at any time during shelf life.
  Term: Batch-slope variation (random). Allen's formula as presented in ref. 4 assumes a common degradation for all batches. If this is not the case, an extra term must be added to take account of the variation of future batches.
The model describing the stability data is a normal linear mixed model, accounting for loss of activity or increase in degradation, a separate starting point for each batch, and random effects due to the uncertainty of the batch slopes and the uncertainty of measurement. The assumption of a common batch degradation rate must be evaluated by a statistical test. The uncertainty of measurement, σ²_A, can be determined from all assay results obtained from a control sample (a single stable lot which is tested whenever the assay is run). Such a determination would ideally include changes in reference materials, different equipment, operators, etc., and would therefore give a realistic reproducibility assay standard deviation. If such measurements are not available, the uncertainty of measurement may be determined from the validation of the analytical method or from the residual variation of the stability data itself. Where relevant, it should be examined whether the uncertainty of the reference material was included in the determination of the uncertainty of measurement; if not, an additional term should be added to take account of this uncertainty. In addition, an uncertainty budget may be a useful tool for evaluating the components of variation included in the uncertainty of measurement, σ²_A. This is specifically relevant when optimizing the analytical method, see ref. 6. The formula relating the external shelf-life limits and release limits in the case of common batch degradation is then given by:
where
  t_{0.95} = the 95% one-sided t-value with df degrees of freedom²,
  df = the degrees of freedom associated with the total variation, calculated by Satterthwaite's approximation, see ref. 4,
  σ²_A = the variance for the uncertainty of measurement,
  σ²_γ · t²_SL = the variance for the uncertainty of expected degradation,
  γ · t_SL = the expected degradation,
  t_SL = the shelf-life period.

If there is a variation in the degradation rate between batches, an extra term is included in the formula, where σ²_B is the variance of the slopes of the different batches.

² 95% one-sided t-values are used also in the case of two-sided limits.
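The displayed formula did not survive the reproduction above; the following LaTeX sketch reconstructs a plausible form from the terms just defined and from Allen's methodology (ref. 4). The placement of n (the number of release measurements) and the exact grouping of the terms are assumptions.

```latex
% Hedged reconstruction, not the paper's verbatim formula.
\[
  \mathrm{USL} - \mathrm{URL}
  \;=\; \gamma\, t_{SL}
  \;+\; t_{0.95,\,df}\,
        \sqrt{\, t_{SL}^{2}\,\sigma_{\gamma}^{2} \;+\; \frac{\sigma_{A}^{2}}{n} \,}
\]
% and, when the degradation rate varies between batches,
\[
  \mathrm{USL} - \mathrm{URL}
  \;=\; \gamma\, t_{SL}
  \;+\; t_{0.95,\,df}\,
        \sqrt{\, t_{SL}^{2}\,\sigma_{\gamma}^{2}
              \;+\; t_{SL}^{2}\,\sigma_{B}^{2}
              \;+\; \frac{\sigma_{A}^{2}}{n} \,}
\]
```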
These formulas calculate the difference between the internal release limit and the shelf-life limit and may be used either way, to calculate one of the limits based on the other limit as fixed. The formulas can also be used in an economic analysis. If the difference is too wide, for instance, the most contributing term can be identified and possibly improved. One possibility is to increase n, the number of measurements used for batch release. There will be only one release limit, which will take account of all the terms described above, possibly extended with more sets of storage conditions.

Calculation of alert limits

Control charts aid in the detection of unnatural patterns of variation in data resulting from repetitive processes and provide criteria for detecting lack of statistical control. The process is in statistical control when the variability results only from random causes. Once this acceptable level of variation is determined, any deviation from that level is assumed to be the result of assignable causes, which should be identified and eliminated or reduced. The most common and also simplest control chart is the Shewhart control chart. The alert limits on the control charts are at a distance of 3σ on each side of the central line, where σ is the population standard deviation of the statistic being plotted, in this case the batch average result.
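As a small illustration of the ±3σ alert limits just described, here is a minimal Python sketch; the data are made-up values, and in practice the limits would be computed from batch averages of the process while in statistical control.

```python
# Sketch of Shewhart-type alert limits around the centre line of batch averages.
import numpy as np

def alert_limits(batch_means):
    """Centre line and 3-sigma alert limits for the plotted batch averages."""
    centre = np.mean(batch_means)
    sigma = np.std(batch_means, ddof=1)   # sample estimate of the population sigma
    return centre - 3 * sigma, centre, centre + 3 * sigma

lcl, cl, ucl = alert_limits(np.array([99.8, 100.2, 100.1, 99.9, 100.3, 100.0]))
print(round(lcl, 2), round(cl, 2), round(ucl, 2))
```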
Process capability linking process variability and internal release limits

For a process in statistical control, two commonly used measures quantify the capability of the process to deliver the demanded quality. In this case the demanded quality, at the moment of release, is to be within the range of the release limits. The optimal control is obtained if the process is centred exactly in the middle of the specification interval; this is expressed through the process capability. The process capability is calculated as

  C_p = (URL - LRL) / (6σ),

where σ is the total process variation, including batch-to-batch variation as well as the analytical uncertainty. This process variation should ideally include long-term variation, i.e. variation of raw materials, active pharmaceutical ingredient, preparation, filling and analytical testing. When the target is not centred in the middle of the interval, it is useful also to provide an alternative index for process capability:

  C_pt = min{ (URL - target) / (3σ), (target - LRL) / (3σ) }.

However, usually it is not possible to remove all systematic variation or to adjust the process to be centred exactly on the target. In this case the process performance is calculated, which takes into account that the actual process may not be centred exactly on the target:

  C_pk = min{ (URL - μ) / (3σ), (μ - LRL) / (3σ) },

where μ is the actual process average. Usual company standards require that C_p, C_pt and C_pk should all be larger than 1.33, and the value should at least be larger than 1.0, in which case the alert limits and release limits coincide.
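A short Python sketch of the three indices defined above; the process average and standard deviation supplied in the example call are made-up values, not those of the case study.

```python
# Capability (C_p, C_pt) and performance (C_pk) indices as defined above.
def capability_indices(LRL, URL, target, mu, sigma):
    cp  = (URL - LRL) / (6 * sigma)
    cpt = min((URL - target) / (3 * sigma), (target - LRL) / (3 * sigma))
    cpk = min((URL - mu) / (3 * sigma), (mu - LRL) / (3 * sigma))
    return cp, cpt, cpk

# Example with the release limits used later in the case study and assumed mu, sigma.
print(capability_indices(LRL=97.9, URL=102.1, target=100.0, mu=100.3, sigma=0.3))
```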
2 Case Study

In practice, one will often face the problem of combining limits for bulk material (e.g. crystals) with limits for the final (liquid) presentation of the product. In this case study we demonstrate an example of a solution to this problem. We consider a drug product where the drug substance in crystalloid form is mixed with water and preservatives, yielding a filling batch of drug product in liquid form. The filling batch is then filled into cartridges with a unit size of 3 ml. This process involves a number of limits: 1) release limits and shelf-life limits or retest period for the drug substance; 2) release limits and shelf-life limits for the drug product. The limits are shown in the figure³ below.
³ The authors would like to thank Mrs. Lise Skotner, Novo Nordisk A/S, for the original of this figure.
[Figure: chain of limits for drug substance (DS) and drug product (DP). The DS process contribution and the uncertainty on degradation of the DS lead, via the preparation uncertainty, to the drug product release limits URL = 102.1% and LRL = 97.9% around the target of 100.0%; the DP process contribution and the intermediate precision plus the uncertainty on the slope of the DP then lead to the shelf-life limits, such as the LSL of the drug product.]
To determine appropriate release limits, all available stability data are collected and analyzed. Since measurement processes usually improve over time, a time window of 2 to 3 years is usually a good choice for these data, to make sure they are from the same measurement process, but it is always relevant to determine an appropriate time window. Data are analyzed by the following model:

  Y_ijk(t) = μ + α_i + b(α)_j(i) + ( γ + αγ_i + βγ(α)_j(i) ) · t + ε_ijk(t),

where
  Y_ijk(t): observation of response Y from batch j of product i at time t,
  μ: common intercept,
  α_i: deviation of the i'th product from the common intercept⁴,
  b(α)_j(i): deviation of the j'th batch of the i'th product from the common intercept,
  γ: common slope,
  αγ_i: deviation of the slope of the i'th product from the common slope,
  βγ(α)_j(i): random deviation of the slope of the j'th batch of the i'th product from the common slope, NID(0, σ²_{βγ,i}),
  ε_ijk(t): residuals, NID(0, σ²).

⁴ The factor product may be a combined effect of other factors, such as container, container size, potency, etc.
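As an illustration of how a model of this form could be fitted outside SAS (the paper itself uses Proc Mixed, as noted below), here is a hedged Python sketch with statsmodels; the synthetic data frame and its column names are assumptions made only for the example.

```python
# Sketch: stability model with product-specific fixed intercepts/slopes and
# batch-specific random intercept and slope, fitted by REML with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for product in ["A", "B"]:
    for batch in range(6):
        intercept = 100 + rng.normal(0, 0.3)       # batch-level starting point
        slope = -0.05 + rng.normal(0, 0.01)        # batch-level degradation rate
        for t in [0, 3, 6, 9, 12, 18, 24, 36]:     # months on stability
            rows.append({"product": product, "batch": f"{product}{batch}",
                         "t": t, "y": intercept + slope * t + rng.normal(0, 0.3)})
df = pd.DataFrame(rows)

# Random intercept and random slope in t per batch; fixed product effects.
fit = smf.mixedlm("y ~ C(product) * t", df, groups="batch", re_formula="~t").fit(reml=True)
print(fit.summary())
```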
The parameters in such a model may be estimated by Proc Mixed in SAS. The model is used both for analyzing batches of drug substance and batches of drug product. Storage conditions for drug substance are usually at low temperatures, so that the slope of the degradation parameter is practically zero, which is also the case here. The model above is the general model used for analyzing stability data for batches stored at a given temperature; the standard stability condition for the drug product is T = 5°C. Accelerated studies are also performed at T = 25°C and T = 37°C in order to assess the stability of the product parameters at different environmental conditions. A generally accepted model for predicting stability at various temperatures for a given product is the Arrhenius equation:

  log a(T) = a + b/T,

where T is the temperature in Kelvin, a(T) is the slope of the degradation parameter at a given temperature, and a and b are constants for a given product parameter.

[Figure: plot of the Arrhenius fit and the estimated slopes for the drug product against 1/T (in 1/Kelvin).]

This model can be used to assess process contributions, e.g. temporary storage at temperatures above the standard storage condition of T = 5°C.
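A minimal sketch of fitting the Arrhenius relation log a(T) = a + b/T by ordinary least squares on 1/T; the slope values used below are illustrative assumptions, not the estimates of the case study.

```python
# Arrhenius fit of degradation slopes versus reciprocal absolute temperature.
import numpy as np

T_celsius = np.array([5.0, 25.0, 37.0])
slopes    = np.array([0.02, 0.10, 0.30])     # hypothetical degradation slopes per month

T_kelvin = T_celsius + 273.15
b, a = np.polyfit(1.0 / T_kelvin, np.log(slopes), 1)   # log a(T) = a + b/T

def predicted_slope(T_c):
    """Predicted degradation slope at temperature T_c (deg C)."""
    return np.exp(a + b / (T_c + 273.15))

print(predicted_slope(15.0))   # e.g. for a temporary excursion to 15 deg C
```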
Initially, we find the release limits for the drug substance as mean plus/minus 3 standard deviations⁵. This standard deviation accounts for process and measurement uncertainty for the drug substance. From the drug substance, batches of drug product are prepared basically by mixing the drug substance with water and preservatives, yielding the liquid filling batch. Release limits for a batch of drug product are given by:

  URL = Target + 3·sqrt(σ²_DS + σ²_prep)
  LRL = Target - 3·sqrt(σ²_DS + σ²_prep)

where σ_DS is the drug substance standard deviation described above and σ_prep is the standard deviation associated with the preparation of drug product and measurement uncertainty. The calculation of shelf-life limits for drug product is done using the release limits for drug product as the starting point, then accounting for process contributions, for storage at standard conditions during the shelf-life period, and for measurement uncertainty. In this case we have:
  Temperature   Treatment
  Process contribution
  15°C          7.5 days
  25°C          7.5 days
  35°C          3 days
  Shelf-life degradation
  5°C           36 months
Shelf-life limits are calculated as

  LSL = LRL + Δ,

where Δ is the necessary difference between the shelf-life limit and the release limit to assure a 95% probability of being within the shelf-life limits at the end of shelf life⁵, and

  USL = URL + t_{0.95,df} · s.

⁵ In the equations we use ±3σ; in general this term may be replaced by a tolerance- or prediction-type interval.
In this case we estimate the parameters based on 44 batches of stability data at T = 5°C. We also have data from accelerated studies, 12 batches at T = 25°C and 12 batches at T = 37°C. We get the following estimates for T = 5°C, after reduction of the model:
Estimating similar parameters for T = 25°C and T = 37°C, the following Arrhenius fit is obtained:

[Arrhenius fit estimates; recovered value: -0.1028]
Based on these estimates we obtain the contributions for Δ:

[Table of contributions to Δ; recovered entries: Process, Assay -0.28]
With the internal release limits LRL = 97.9 and URL = 102.1 we get

  LSL = LRL + Δ = 97.9 - 2.9 = 95.0

and

  USL = URL + t_{0.95,df} · s = 102.1 + 1.645 · 0.75 = 103.3.
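For illustration, a small Python check of the arithmetic above, together with the release-limit formula from the previous section; the σ_DS and σ_prep values are hypothetical, not those of the study.

```python
# Reproduces the case-study arithmetic and illustrates
# URL/LRL = Target +/- 3*sqrt(sigma_DS^2 + sigma_prep^2).
import math

def release_limits(target, sigma_ds, sigma_prep):
    half_width = 3 * math.sqrt(sigma_ds**2 + sigma_prep**2)
    return target - half_width, target + half_width

print(release_limits(target=100.0, sigma_ds=0.5, sigma_prep=0.5))  # roughly (97.9, 102.1)

# Shelf-life limits of the case study
LRL, URL = 97.9, 102.1
delta = -2.9                      # total contribution for the lower limit
LSL = LRL + delta                 # 95.0
USL = URL + 1.645 * 0.75          # 103.3 (t_{0.95,df} approx 1.645 for large df)
print(LSL, round(USL, 1))
```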
The figure below shows a capability plot of the next 266 batches compared to the above limits. It shows a Cpk value of 2.043, which is greater than 1.33 and hence acceptable. To assess capability in the case where production does hit target on average, we calculate the C_pt measure in the following figure. Note that in this case C_pt is the same as calculating C_pk on the observations shifted 0.32 units to the left, as the bias is 0.32 units.

[Capability plot: Capability of Drug Product / Potency 100% LC, showing the data against the release limits together with N, Cpk and the A-D (A-squared) statistic and its p-value.]
The figure below shows a capability plot for the process centred around target.

[Capability plot: Capability of Drug Product / Potency 100% LC with the mean shifted to target, showing the data against the release limits together with N, Cpt and the A-D (A-squared) statistic.]
Since the release limits in this case are symmetric around the target, C_pt is equal to C_p at 2.412, which would be very acceptable. Hence a gain in capability will be achieved when production improves and hits the target on average.
3 Discussion

In the paper we have presented a methodology for setting specification limits for drug products. The setup described in this paper makes it probable that the true mean of a future batch will remain within the specified shelf-life limits throughout the shelf life of the batch. Thus the method fulfils the requirements given by the ICH guidelines⁶, see ref. 3. The methodology includes the use of capability analysis for justification of release limits and the use of statistical control charts for monitoring the production process as well as the analytical procedure of the QC laboratory. The methodology can either be used to suggest specification limits for a new drug product based on current properties of the drug, production and analytical method, or to set the requirements for the production and shelf life in terms of release limits for given specification limits proposed for medical or historical reasons. We believe the approach suggested in this paper has better properties than the one suggested in the ICH guidelines, see ref. 3 and ref. 8.
References

[1] ICH Topic Q6A, Specifications: Test Procedures and Acceptance Criteria for New Drug Substances and New Drug Products: Chemical Substances, July 1997.
[2] ICH Topic Q6B, Specifications: Test Procedures and Acceptance Criteria for Biotechnological/Biological Products, CPMP/ICH/365/96, March 1999.
[3] ICH Topic Q1A, Stability Testing of New Drugs and Products, CPMP/ICH/2736/99, March 2003.
[4] Allen, Poul V. et al.: Determination of Release Limits: A General Methodology. Pharm. Res., 8(9), p. 1210, 1991.
[5] ICH Topic Q1E, Evaluation of Stability Data, CPMP/ICH/420/02, March 2003.
[6] Guide to the Expression of Uncertainty in Measurement.
[7] Davis, Gregory C. et al.: Rational Approaches to Specifications Setting: An Opportunity for Harmonization. Pharm. Tech., September 1995.
[8] Wen-Jen Chen and Yi Tsong: Significance Levels for Stability Pooling Test: A Simulation Study. J. Biopharm. Stat., Vol. 13, No. 3, pp. 355-374, 2003.

⁶ Limits based on ICH suggested principles are typically based on only 3 stability batches, and testing for all batch-related effects is done using a significance level of α = 0.25 in a fixed effects model.
Monitoring a Sequencing Batch Reactor for the Treatment of Wastewater by a Combination of Multivariate Statistical Process Control and a Classification Technique

Magda Ruiz, Joan Colomer, and Joaquim Meléndez

University of Girona, Av. Lluís Santaló, Campus Montilivi, Building P-IV, CP 17071 Girona, Spain mlruizo,colomer,[email protected]
Summary. A combination of Multivariate Statistical Process Control (MSPC) and an automatic classification algorithm is applied to monitor a Waste Water Treatment Plant (WWTP). The goal of this work is to evaluate the capabilities of these techniques for assessing the actual state of a WWTP. The research was performed in a pilot WWTP operating with a Sequencing Batch Reactor (SBR). The results obtained refer to the dependence of process behavior on environmental conditions and the identification of specific abnormal operating conditions. It turned out that the combination of tools yields better classifications compared with those obtained by using methods based on Partial Least Squares.
1 Introduction

This work makes a contribution to monitoring waste water treatment plants (WWTP) operating with sequencing batch reactors (SBR). The work illustrates the use of statistical techniques to describe and analyze the plant's behavior, aiming at identifying situations which may lead to failures. The goal is to develop specifications for the plant's operation [18] including faulty or abnormal situations. Modern monitoring techniques enable the collection of a huge number of data, which, however, necessitates data reduction techniques, particularly in the case of manufacturing processes such as polymerization [12], industrial pharmaceutical processes [29], fabrication of flexible polyurethane foams [24], drying processes [21], polyester film processes [10], and WWTPs [4]. The problems are generated by uncertainty, which often is classified into random noise (common causes) and randomly occurring special causes. Standard control strategies aim at detecting and removing special causes, but have no impact on the common cause variation, which is inherent to the process. There is the same pattern of variation for the same operating conditions, implying that the mean and variance remain unchanged unless the operating conditions have changed. Thus, it becomes possible to distinguish different conditions and detect them automatically
by observing statistical characteristics and comparing them with certain thresholds [11]. The corresponding methodology is known as Statistical Process Control (SPC) and Multivariate SPC, where the latter makes extensive use of the covariance matrix representing the relations among process variables. This paper aims at illustrating a methodology for assessing the conditions of a WWTP when eliminating organic matter, nitrogen and phosphorus to guarantee the regulations for quality monitoring (directive 91/271/CEE [3]). This project refers to a Sequencing Batch Reactor (SBR) pilot plant, shown in Figure 1, where sludge is used for removing nitrogen and eliminating organic matter. The processes are nonlinear, time-varying and subject to significant disturbances such as equipment defects, atmospheric changes and variation in the composition and dependence of the relevant variables. The dependence is modelled by a correlation structure of the relevant variables.
Fig. 1. Real SBR pilot plant
Multivariate Statistical Process Control (MSPC) methods aim at detecting events that cause a significant change in the structure of correlation of the process variables [7]. An extension of the Multiway Principal Component Analysis (MPCA) has been applied to SBR processes [16][25] for obtaining a classification scheme for batch processes [20]. This paper is organized as follows: in Section 2 the operation of an SBR process is presented. Section 3 describes the PCA extensions for process monitoring and Section 4 presents the classification method. Section 5 contains a numerical example using data recorded from a pilot SBR plant. Finally, future work and conclusions are summarised in Section 6.
2 The SBR Pilot Plant

This paper is an outcome of the research project Development of a system of control and supervision applied to a Sequencing Batch Reactor (SBR) for the elimination of organic matter, nitrogen and phosphorus, DPI2002-04579-C02-01, funded by the Spanish Government. The involved partners are the Control, Engineering and Intelligent Systems (eXIT) group, the Laboratory of Chemical and Environmental Engineering (LEQUIA), both from the University of Girona (Spain), and the firm Inima Servicios del Medio Ambiente SA.
Fig. 2. Schematic overview of the SBR pilot plant
The SBR pilot plant (Figure 2) used in the research project has a reactor, which is composed of a square metal tank of 1 m³ with the capacity of processing a volume of 200 litres. The wastewater is obtained from a real plant sited in Cassà (Girona, Spain) by means of a peristaltic pump and stored in a buffer tank which feeds the reactor. The treatment carried out by the SBR pilot plant is oriented towards nitrogen removal. Remember that nitrogen and phosphorus are essential nutrients of life, but if they are overabundant in the water, algae develop quickly, die and decompose. Since decomposition is done under aerobic conditions (with oxygen), the oxygen level slumps and does not permit life [26]. Nitrogen removal in this SBR pilot plant is performed in two steps:
- Nitrification: ammonia is converted to nitrate by aerobic microorganisms.
- Denitrification: nitrate is converted to N2O or nitrogen gas under anoxic (without oxygen) conditions by anoxic microorganisms.
Figure 3 shows the alternation of the two phases in the operation cycle defined for the reactor. Each cycle is based on alternating anoxic and aerobic reactions, where filling only occurs during anoxic stages. The anoxic period is longer than the aerobic one, because of increasing denitrification. The total filling volume is 200 litres, divided into six feeding parts during an 8-hour cycle. Settling and draw take 1 hour and 0.46 hours, respectively [6]. Data gathered from the plant are organized in 8-hour batches containing 5760 samples (obtained every 5 seconds) per variable: pH, oxidation-reduction potential (ORP), dissolved oxygen (DO) and temperature. To monitor the essential variables, the SBR process is equipped with DO-temperature (OXIMAX-W COS 41), pH (CPF 81) and ORP (CPF 82) Endress+Hauser probes. Signals, filtered in the transmitter, are captured by a data acquisition card (PCI-6025E from National Instruments) [22].
Fig. 3. Fixed operational cycle applied in the pilot plant (filling and emptying phases)
The analysis of the profiles of the relevant variables allows one to interpret the process behavior. Figure 4 displays ORP and pH profiles by means of which unusual instances can be identified. One of them is the Nitrate Knee (in the ORP variable), which appears under anoxic conditions and denotes complete denitrification. Others are the Ammonia Valley (AV) and the Nitrate Apex (NA), which appear in the pH variable. AV indicates the end of nitrification (aerobic condition) while NA gives evidence of the end of denitrification under anoxic conditions [5] [19].
3 MSPC for Batch Processes

Modern processes have in general large volumes of historical data stored in databases, and their exploitation is of crucial importance for an improvement of operation. Typically, these data are generated by highly dependent variables.
Fig. 4. ORP and pH profiles, with the Ammonia Valley (pH), the Nitrate Knee (ORP) and the Nitrate Apex (pH) indicated
Therefore, it was decided to use for the reduction of dimensionality the Principal Component Analysis (PCA) together with Partial Least Squares (PLS) [8]. PCA and PLS are techniques that provide low-dimensional models useful for further analysis and efficient monitoring [13] [11]. The application of statistical process control to monitor multiple variables simultaneously using these reduction techniques is known as Multivariate Statistical Process Control (MSPC) [17] [10] [16] [9]. These techniques have been widely used for monitoring continuous processes. Only recently has the application of these methods been extended to cover batch processes, too. Some approaches proposed to deal with statistical batch process control are briefly introduced in the next subsection.

3.1 Multiway Principal Component Analysis (MPCA)

Consider a typical batch run in which j = 1, 2, ..., J variables are measured at k = 1, 2, ..., K time instants throughout the batch. Similar data will exist on a number of such batch runs i = 1, 2, ..., I. All the data are contained in the X (I x J x K) array illustrated in Figure 5, where the different batches are organized along the vertical direction, the measurement variables along the horizontal direction, and their time evolution takes the third dimension. Each horizontal slice through this array is a (J x K) data matrix representing the time histories or trajectories of all variables of a single batch (i). Each vertical slice is an (I x J) matrix representing the values of all the variables for all batches at a common time interval (k) [16] [27]. MPCA is equivalent to performing ordinary PCA on a large two-dimensional (2-D) matrix constructed by unfolding the three-way array. Six ways of unfolding the three-way data matrix X are possible, as indicated in Table 1 [29]. In this work the ways A and D are used. Undey and Cinar [25] used type A (Figure 6), while Nomikos and MacGregor used type D [16] (Figure 7), which is particularly meaningful, because by
Fig. 5. Arrangement of a three-way array
subtracting the mean of each column of the matrix X, the main nonlinear and dynamic components in the data are removed. A PCA performed on the mean-corrected data is therefore a study of the variation of the variables in time with respect to their mean trajectories [15].

Table 1. Types of unfolding a three-way data matrix, according to [28]

  Type   Structure of the unfolding matrix   Direction that remains unaltered
  A      IK x J                              variable
  B      JI x K                              time
  C      IJ x K                              time
  D      I x KJ                              batch
  E      I x JK                              batch
  F      J x IK                              variable
MPCA decomposes the three-way array X into a large two-dimensional matrix X, separating the data in an optimal way into two parts: the noise or residual part (E), which is small in the sense of least squares, and the systematic part (Σ_{r=1}^{R} t_r ⊗ P_r), which consists of a first factor (t) related to the batches and a second factor (P) related to the variables and their time variation [16]. MPCA is performed by means of the NIPALS algorithm resulting in the matrix X. It is the product of the score vectors t_r and the loading matrices P_r, plus a residual matrix E, that is minimized in the sense of least squares:
Fig. 6. Decomposition of X to 2-D (IK x J)

Fig. 7. Decomposition of X to 2-D (I x KJ)

  X = Σ_{r=1}^{R} t_r ⊗ P_r + E        (1)

  X = Σ_{r=1}^{R} t_r p_r' + E          (2)
where ⊗ denotes the Kronecker product (X = t ⊗ P means X(i, j, k) = t(i)·P(j, k)) and R denotes the number of principal components retained. Equation (1) is the 3-D decomposition, while Equation (2) displays the more common 2-D decomposition [15]. Abnormal behavior of a batch is generally identified by means of the Q-statistic or the D-statistic, which are compared with control limits determining whether the process is in control or not. These methods are based on the assumption (generally motivated with the central limit theorem) that the underlying process follows approximately a multivariate normal distribution with zero first moment vector. The Q-statistic is a measure of the lack of fit. For batch number i, Q_i is defined as

  Q_i = Σ_{j=1}^{J} Σ_{k=1}^{K} e²_{ijk} ,
where the e_{ijk} are the elements of E. Q_i indicates the distance between the actual values of the batch and the projected values onto the reduced space. The D-statistic, or Hotelling T² statistic, measures the degree of fit:

  D_i = t_i S^{-1} t_i' ,
where S is the estimated covariance matrix of the scores.
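A hedged numpy sketch of the MPCA computation just described: unfolding in the batch direction, mean-centring, extracting R components and forming the Q and T² statistics. The random array stands in for the real historical data, and the SVD is used in place of the NIPALS algorithm mentioned above.

```python
# Sketch of MPCA (type D unfolding, I x KJ) with Q and Hotelling T^2 per batch.
import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 179, 4, 392, 8            # dimensions as in the case study below
X3 = rng.normal(size=(I, J, K))        # stand-in for the historical batch data

X = X3.transpose(0, 2, 1).reshape(I, K * J)    # unfold to I x KJ
Xc = X - X.mean(axis=0)                        # remove the mean trajectories

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :R] * s[:R]                   # scores t_r for each batch
P = Vt[:R]                             # loadings
E = Xc - T @ P                         # residual matrix

Q  = np.sum(E**2, axis=1)                                 # lack of fit per batch
S  = np.cov(T, rowvar=False)                              # covariance of the scores
T2 = np.einsum("ir,rs,is->i", T, np.linalg.inv(S), T)     # Hotelling T^2 per batch
print(Q[:3].round(1), T2[:3].round(2))
```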
4 Classification Method

For classification, the Learning Algorithm for Multivariate Data Analysis (LAMDA) has been used [1]. This method takes advantage of hybrid logical connectives to perform a soft bounded classification. LAMDA is proposed as a classification technique to apply to the principal components selected for monitoring. The goal is to assess the actual situation according to profiles previously learned [1][14].
Fig. 8. Basic LAMDA recognition methodology
Input data is presented to LAMDA as a set of observations or individuals characterized by their descriptors or attributes and recorded as rows. The principal components obtained in the MPCA step are used as the input variables to be classified. Once the descriptors are loaded, every individual is processed individually according to the desired goal [1]:
1. To classify the individuals according to a known and fixed set of classes.
2. To learn and adapt from a previously given set of classes, which can be modified according to the new individuals.
3. To discover and learn representative partitions in the training set.
The basic assignment of an individual to a class follows the procedure represented by Figure 8. In this, MAD and GAD stand for Marginal Adequacy Degree (which takes into account only one attribute) and Global Adequacy Degree (obtained from the hybrid logical combination of the previously obtained MADs), respectively, of an individual to a given class. Equations (5) and (6) are used to calculate them. This classifying structure resembles that of a single-neuron ANN [1].
  MAD(d_i^{x_j} | ρ_{i/k}) = ρ_{i/k}^{d_i^{x_j}} (1 − ρ_{i/k})^{1 − d_i^{x_j}} ,      (5)

where d_i^{x_j} is descriptor i of the object x_j and ρ_{i/k} is the parameter ρ of descriptor i and class k, and

  GAD = β T(MAD) + (1 − β) S(MAD) .      (6)
Formalizing the description of LAMDA, it is possible to define an individual as a series of descriptor values d_1, ..., d_n such that each d_i takes values from the either finite or infinite set D_i. We will call the Cartesian product U = D_1 × D_2 × ... × D_n the universe or context. Thus, any object or individual is represented as a vector x = (x_1, ..., x_n) from U, such that each component x_i expresses the value of the descriptor d_i in the object x. The subset of U gathering all these vectors will be called the data base or population. To assign individuals to classes, the MAD step will be calculated for each individual, every class and each descriptor, and these partial results will be aggregated in order to get the GAD of an individual to a class. The simplest way to build this system would be by using probability distribution functions and aggregating them by the simple product, but that would force us to impose a series of hypotheses on the data distribution and independence which are too arbitrary. Finally, MAD and GAD have been used according to the definitions of equation (5) and equation (6), respectively [1]. The hybrid connective used for GAD is a combination of a t-norm and a t-conorm by means of the β parameter. β = 0 represents the intersection and β = 1 means the union. This parameter will, inversely, determine the exigency level of the classification, so it can be identified as a tolerance or exigency parameter.
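A minimal Python sketch of the LAMDA assignment step of equations (5) and (6), using min as the t-norm and max as the t-conorm; the class parameters ρ and the exigency parameter β are illustrative assumptions, not values learned from the SBR data.

```python
# Sketch of LAMDA: marginal adequacy degrees (fuzzy binomial) aggregated into
# a global adequacy degree; the individual is assigned to the class with the
# largest GAD.
import numpy as np

rho = np.array([[0.8, 0.2, 0.6],      # class 1: rho_{i/k} per descriptor (assumed)
                [0.3, 0.7, 0.4]])     # class 2
beta = 0.7                            # exigency / tolerance parameter (assumed)

def classify(x):
    """x: descriptors scaled to [0, 1]; returns (best class index, GAD values)."""
    mad = rho**x * (1.0 - rho)**(1.0 - x)                          # eq. (5)
    gad = beta * mad.min(axis=1) + (1 - beta) * mad.max(axis=1)    # eq. (6)
    return int(np.argmax(gad)), gad

print(classify(np.array([0.9, 0.1, 0.7])))
```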
5 Results and Discussion

5.1 Types of Batch Processes

The data obtained from the SBR process were analyzed by two different methods. The first one proposes a profile study of the variables [20]. The second analysis constitutes a preliminary MSPC. The first and second principal components are depicted in Figure 9. Abnormal behavior can be detected outside of the dotted line. Besides, small groups are formed inside of the limit. One of them is formed by batches with an excellent denitrification and nitrification process (solid line).
Fig. 9. Score plot for batches. Dashed line is the model
The combination of both studies allows us to identify five types of behavior in the SBR pilot plant, which are summarized in Table 2.
Table 2. Types of batch process

  Type of batch process          Effluent quality
  Electrical fault               Regular quality: complete nitrification, partial denitrification
  Variation in the composition   Regular quality: partial nitrification, partial denitrification
  Equipment defects              Bad quality: no nitrification and no denitrification
  Atmospheric changes            Normal quality: complete nitrification, partial denitrification
  Normal behaviour               Excellent/good/normal quality: complete nitrification and denitrification

(The original table also sketches the corresponding ORP and DO profiles.)
1. Electrical fault: corresponds to voltage sags, which are short-lasting reductions in rms voltage caused by short circuits, overloads and the starting of motors. Voltage sags cause problems in several types of equipment [2]; e.g. a voltage sag may lead to problems with the sensors. However, the moment the voltage sag disappears, the sensors retrieve their normal behavior.
2. Variation in the composition: the microorganisms or the influent composition can change, causing disturbances in the normal profiles of the variables [20].
3. Equipment defects: the SBR pilot plant is exposed to season changes with rises and falls of temperature. These factors, added to day-to-day wear, cause deterioration of the sensors, the acquisition card, the computer, among others, which leads to missing data or no registration of the process.
4. Atmospheric changes: during the wet season, the relative amounts of organic matter and nitrogen are smaller.
5. Normal behavior: this is characterized by the removal of nitrogen and the elimination of organic matter. According to the nitrogen removal, the quality was classified as excellent, good or normal.
Using the above classification scheme, it is possible to quantify the number of batches in each class. In Table 3 the classification of all batches of the SBR process is given. There are 60 (equivalent to 33.5%) batches with abnormal behavior, divided into the four abnormal classes given in Table 2. Normal behavior was observed in 66.5% of all cases. The normal behavior exhibits a higher nitrogen removal than legally required.
  Type of batch process            Quantity   %
  Atmospheric changes              17         9.50
  Equipment defects                 8         4.47
  Variations in the composition    33        18.44
  Electrical fault                  2         1.12
  Normal behavior                 119        66.48
5.2 Application of MSPC

MPCA

Each batch is treated for 8 hours, and every 5 seconds the relevant variables are monitored, yielding a total of 5760 samples. However, due to computational limitations only a sample of 392 data sets from each batch can be used. For comparison, the selected data (392 samples) represent sufficiently well the totality of the 5760 data samples. The corresponding profiles are compared for both samplings in Figure 10, which shows a satisfactory agreement. Finally, the MPCA algorithm was applied to the three-way data array X, with dimensions 179 x 4 x 392, where K = 392 is the number of time instants throughout the batch (samples), J = 4 is the number of measured variables and I = 179 is the number of historical batches.
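A small sketch of the sub-sampling step just mentioned (5760 samples per variable reduced to 392); the uniform thinning used here is an assumption about how the selection could be done.

```python
# Sketch: reduce one 8-hour batch (5760 x 4: pH, ORP, DO, temperature) to 392 points.
import numpy as np

def subsample_batch(batch, n_keep=392):
    """batch: array of shape (5760, 4); returns an (n_keep, 4) array."""
    idx = np.linspace(0, batch.shape[0] - 1, n_keep).round().astype(int)
    return batch[idx]

batch = np.random.default_rng(2).normal(size=(5760, 4))
print(subsample_batch(batch).shape)   # (392, 4)
```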
MPCA: Batch direction

The three-way array X is unfolded in the batch direction (I x KJ) and 8 principal components are retained, giving a final score matrix of dimension (179 x 8). This model explains 92.79% of the total variability (see Table 4).
Table 4. Principal component extraction
Fig. 10. Comparison between the 5760 and 392 samples per variable to determine the validity of the selection
Examining the process data in the reduced projection space, the first and second principal components are plotted (see Figure 9). One group of batches falls outside of the normal range. These batches correspond to variations in the composition. Abnormal batch behavior can be identified by means of Q and T² control charts. Figure 11 shows the Q and T² charts for all process batches. The Q chart flags several batches due to several kinds of abnormal behavior. The T² chart shows two batches falling far outside the limits, both due to an electrical fault (EF). In Table 5 the batches outside the limits are summarized for both control charts. The Q chart detects only one third of the total number of batches with abnormal behavior and produces, furthermore, 8 false alarms. The T² chart identifies only 20 batches with abnormal behavior and produces no false alarms. Only 39 cases of the 60 cases
Fig. 11. Q and T² charts with 92.79% confidence limits
of abnormal behavior are detected by the two charts, and 9 cases of abnormal behavior are detected simultaneously by both charts.
  Type of batch process          Q chart: number (%)   T² chart: number (%)
  Atmospheric changes            9 (5.03)               4 (2.23)
  Equipment defects              0 (0.00)               6 (3.35)
  Variation in the composition  11 (6.15)               8 (4.47)
  Electrical fault               0 (0.00)               2 (1.12)
  Normal behavior                8 (4.47)               0 (0.00)

Table 5. Results from Q and T²
MPCA: Variable direction

The three-way array X is unfolded in the variable direction (IK x J), giving a data matrix of dimension (70168 x 4), and 3 principal components are retained. The three principal components explain 95.18% of the total variability. In Figure 12 a projection on the plane of the first and second components is displayed.
Fig. 12. Weights of the variables and the model graphic with two principal components in the variable direction
The batches can be separated into three different sections. Each section corresponds to a different phase of the process. The first section corresponds to the month when the pilot plant was tested. The second section corresponds to the spring season. Finally, the third section corresponds to the summer season. In this analysis, temperature contributes only marginally; therefore, it is omitted and a new model is built using only 3 variables. Figure 13 shows the model without temperature and Figure 12 shows the model with temperature.
Fig. 13. Weights of the variables and the model graphic with two principal components in the variable direction, without temperature
Conclusions on MSPC

Combining MPCA with analytical methods it is possible to develop classification schemes for the batches. The model with MPCA in the batch direction improves the knowledge about the process, while the model developed in the variable direction allows detecting the relationship between process behavior and environment (test phase, spring and summer seasons). The investigation shows that MPCA is capable of detecting abnormal behavior in the SBR process by projecting the data into a lower-dimensional space. Applying the classification tool to the principal components will allow identifying and grouping similar situations according to match criteria.
5.3 Classification for Situation Assessment

MPLS was used for reducing the dimensionality and maximizing the relation between the matrix X (I x JK) and the predicted matrix Y [23] (179 x 5), where 179 is the number of historical batches and 5 is the number of identified types of batch processes. However, the model does not describe the process, because the matrix Y was created from the results obtained in the preliminary MSPC analysis (depicted in Table 2). Thus, MPCA and a classification tool are used for situation assessment.
5.4 MPCA Classification

The principal components obtained in the MPCA step are the input to the classification tool. The dimension of this matrix is 8 x 179 (Table 6); it is called X. LAMDA automatically classifies the data into eleven classes, as Figure 14 shows.
Table 6. Descriptors used for the batches
Figure 15 compares the classes and the types of the batch process. According to these results, it is possible to identify classes with equipment defects, electrical faults, atmospheric changes and variation in the composition. Classes 1,9 and 10 correspond to normal behavior. Class 6 is associated to atmospheric changes. Classes 3 and 11
Fig. 14. LAMDA frame with two classes of batches identified
Fig. 15. Composition of the classes according to type of batch
represent variations in the composition, while classes 7 and 8 include electrical faults. Finally, classes 2, 4 and 5 include different types of batches. Table 7 shows the number and percentage of batches in each class. The predominant class is class 1, which contains 48.04% of the total historical data; this class represents normal behavior.
Table 7. Numerical composition of each class according to the results of the classification method
In a similar form, Table 8 shows the number of batches in every class and, in addition, the composition of each one, in which some batches are wrongly classified. Nevertheless, the classes can be identified (Table 9) with a name, e.g., class 1 is normal behavior.
Class 5, for instance, is called abnormal behavior due to atmospheric changes and equipment defects.
Table 8. Composition by class
(Class labels: abnormal behavior, atmospheric changes, electrical fault, normal behavior, equipment defects)
Table 9. Classification by type of batch process
The relationship between the classes and the principal components is shown in Table 10. The relationship of the 8th component with the classes does not change, indicating that X can be computed using only seven descriptors. Consequently, only 7 principal components are used in the multiblock MPCA analysis. In this case, the total variance explained is 90.54%.
Table 10. Composition of the classes according to principal component
6 Conclusions and Future Work
Multivariate Statistical Process Control is an effective means for detecting abnormal behavior in SBR processes by projecting the data into a lower-dimensional space that characterizes the state of the process. The type of the batch process and the classes are identified by a classification tool (LAMDA). MSPC in conjunction with the classification tool establishes the relationships between batch behavior and type of batch process. Splitting the data into meaningful groups makes it possible to better identify the batches with abnormal behavior and to obtain a characterization of the types of batch process: normal behavior, atmospheric changes, etc. In order to improve the results and to process the data faster, it would be desirable to have a method which combines the dimensionality reduction and the nonlinear
classification. Furthermore, in the future the SBR pilot plant will no longer operate with a fixed stage length at each cycle. This complicates the use of multiway strategies, because it leads to records of different length (duration), which has to be taken into account.
7 Acknowledgement
This work is part of the research project Development of a system of control and supervision applied to a Sequencing Batch Reactor (SBR) for the elimination of organic matter, nitrogen and phosphorus, DPI2002-04579-C02-01, supported by the Spanish Government and the FEDER funds. The authors also appreciate the valuable contributions made by the LEQUIA team: Jesus Colprim, Sebastià Puig, Lluís Corominas and Maria Dolors Balaguer.
References
1. J. Aguilar-Martin and R. Lopez. The process of classification and learning the meaning of linguistic descriptors of concepts. Approximate Reasoning in Decision Analysis, pages 165-175, 1982.
2. Math H. J. Bollen. Understanding Power Quality Problems. ISBN 0-7803-1713-7, Power Engineering, 2000.
3. CEE. Council Directive 91/271/CEE of 21 May 1991 concerning urban waste water treatment (Directiva 91/271/CEE del Consejo sobre el tratamiento de las aguas residuales urbanas). Diario Oficial no. L 135, 30/05/1991, pp. 0040-0052, May 1991.
4. Jesus Flores-Cerrillo and John F. MacGregor. Multivariate monitoring of batch processes using batch-to-batch information. AIChE Journal, 50(6):1219-1228, 2004.
5. Chao H. Chang and Oliver J. Hao. Sequencing batch reactor system for nutrient removal: ORP and pH profiles. Journal of Chemical Technology and Biotechnology, 67:27-38, Sep 1996.
6. Ll. Corominas, M. Rubio, S. Puig, M. Vives, M. Balaguer, J. Colomer, and J. Colprim, editors. On-line Optimisation of Step-Feed Operation of an Urban Wastewater Nitrogen Removal SBR by On-Line OUR Determination and ORP Analysis. 6th Specialist Conference on Small Water and Wastewater Systems (Australia), Feb 2004.
7. Alberto Ferrer, editor. Control Estadístico MegaVariante para los Procesos del Siglo XXI. 27 Congreso Nacional de Estadística e Investigación Operativa (Spain), 2003.
8. Theodora Kourti. Process analysis and abnormal situation detection: From theory to practice. IEEE Control Systems Magazine, 22(5):10-25, Oct 2002.
9. Dae Sung Lee and Peter A. Vanrolleghem. Adaptive consensus principal component analysis for on-line batch process monitoring. Technical report, Fund for Scientific Research - Flanders (F.W.O.) and the Ghent University Research Fund, Coupure Links 653, B-9000 Gent, Belgium, 2003.
10. Dae Sung Lee and Peter A. Vanrolleghem. Monitoring of a sequencing batch reactor using adaptive multiblock principal component analysis. Biotechnology and Bioengineering, 82(4):489-497, May 2003.
11. Barry Lennox. Multivariate statistical process control. Technical report, Control Technology Centre Ltd, School of Engineering, University of Manchester, and Dept. of Chemical and Process Engineering, University of Newcastle-upon-Tyne, UK, 2003.
12. J.A. Lopes, J.C. Menezes, J.A. Westerhuis, and A.K. Smilde. Multiblock PLS analysis of an industrial pharmaceutical process. Biotechnol Bioeng, (80):419-427, 2002.
13. John F. MacGregor, editor. Multivariate Statistical Approaches to Fault Detection and Isolation. 5th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, 2003.
14. K. Moore. Using neural nets to analyse qualitative data. Marketing Research, 7(1):35-39, 1995.
15. Paul Nomikos and John MacGregor. Multivariate SPC charts for monitoring batch processes. Technometrics, 37(1):41-59, Feb 1995.
16. Paul Nomikos and John F. MacGregor. Monitoring batch processes using multiway principal component analysis. AIChE Journal, 40(8):1361-1375, Aug 1994.
17. Paul Nomikos and John F. MacGregor. Multi-way partial least squares in monitoring batch processes. First International Chemometrics InterNet Conference, 1994.
18. Aras Norvilas, Eric Tatara, Antoine Negiz, Jeffrey DeCicco, and Ali Cinar, editors. Monitoring and fault diagnosis of a polymerization reactor by interfacing knowledge-based and multivariate SPM tools, number 0-7803-453. American Control Conference, 1998.
19. Y.Z. Peng, J.F. Gao, S.Y. Wang, and M.H. Sui. Use of pH and ORP as fuzzy control parameters of denitrification in the SBR process. Water Science and Technology, 46(4-5):131-137, 2002.
20. S. Puig, M.T. Vives, Ll. Corominas, M.D. Balaguer, and J. Colprim, editors. Wastewater nitrogen removal in SBRs, applying a step-feed strategy: From lab-scale to pilot plant operation. 3rd IWA Specialised Conference on Sequencing Batch Reactor, Australia, Feb 2004.
21. S. Joe Qin, Sergio Valle, and Michael J. Piovoso. On unifying multiblock analysis with application to decentralized process monitoring. Journal of Chemometrics, (15):715-742, 2001.
22. Montse Rubio, Joan Colomer, Magda Ruiz, Jesus Colprim, and Joaquim Melendez. Qualitative trends for situation assessment in SBR wastewater treatment process. Technical report, Workshop Besai'04, Valencia, Spain, Aug 2004.
23. Evan L. Russell, Leo H. Chiang, and Richard D. Braatz. Data-driven Techniques for Fault Detection and Diagnosis in Chemical Processes. Advances in Industrial Control. ISBN 1-85233-258-1, London, 2000.
24. García-Muñoz S., Kourti T., MacGregor J., Mateos A., and Murphy G. Troubleshooting of an industrial batch process using multivariate methods. Ind. Eng. Chem. Res., 42:3592-3601, 2003.
25. Cenk Undey and Ali Cinar. Statistical monitoring of multistage, multiphase batch processes. IEEE Control Systems Magazine, 22(5):40-52, Oct 2002.
26. M.T. Vives, M.D. Balaguer, R. García, and J. Colprim. Study of the operational conditions for organic matter and nitrogen removal in a sequencing batch reactor. Technical report, University of Girona, 2001.
27. Johan A. Westerhuis, Theodora Kourti, and John F. MacGregor. Analysis of multiblock and hierarchical PCA and PLS models. Journal of Chemometrics, 12:301-321, 1998.
28. Johan A. Westerhuis, Theodora Kourti, and John F. MacGregor. Comparing alternative approaches for multivariate statistical analysis of batch process data. Journal of Chemometrics, 13:397-413, 1999.
29. Manuel Zarzo and Alberto Ferrer. Batch process diagnosis: PLS with variable selection versus block-wise PCR. Chemometrics and Intelligent Laboratory Systems, 73:15-27, Jun 2004.
Part 3 Off-line Control
Data Mining and Statistical Control - A Review and Some Links
Rainer Göb
Institut für Angewandte Mathematik und Statistik, Universität Würzburg, Sanderring 2, D-97070 Würzburg, Germany, goeb@mathematik.uni-wuerzburg.de
Summary. Due to the potential of modern data processing, industrial companies collect large amounts of business and engineering data. Interest in analysing these data has led to a strong demand for data mining techniques and for corresponding software packages. Although data analysis is the common interest of statistics and data mining, the relationship between the two fields has remained unclear, in practice as well as in methodology. The present paper reviews links between data mining and statistical control in two instances: database management, particularly data warehousing, and temporal pattern analysis.
1 Introduction
The statistical community has paid little attention to the enormous and still rapidly continuing innovation in computer technology and electronic data processing. In particular, there has been little interest in the practical aspects of data acquisition, data storage, data handling, and data processing. Many statisticians use modern computing facilities as a tool. However, mathematical statistics is not substantially related to the areas of computing or electronic data processing. Progress in these areas is due to disciplines like computer science, engineering, robotics, artificial intelligence, and management science. Statistical methodology grew in the 19th and 20th century before the mass distribution of electronic computing devices. At this time, mathematical techniques were the only effective tools for condensing and processing information contained in data. Necessarily, statistics was closely related to mathematics. However, this relation emerged from its original motivation and became a purpose in itself. Statistics was finally considered as applied mathematics. Instead of data analysis, mathematical methodology became the guideline for statistical research interests. Statistical methodology was cut off from innovation in data processing. Tukey's Exploratory Data Analysis is to be considered as an exception.
Recently, the mathematical turn in statistical methodology has been discussed and criticized by many authors, e.g., Bendell et al. (1999) or Hand (1999). Already in 1962, Tukey propagated data analysis instead of mathematical methodology as the central paradigm of statistics: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise and more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data. Sound data handling is an indispensable prerequisite for sound data analysis. Hence, in Tukey's (1962) sense, statistics should be considered as the discipline of handling and analyzing data from variation phenomena. This includes two basic database related fields:
- Database technology: Tools and methods for collecting, storing, customizing, and administrating data.
- Exploration and knowledge discovery in databases: Tools and methods for identifying, validating and administrating meaningful patterns from databases. In particular, this includes the methods widely known as data mining or knowledge discovery from databases (KDD).
These topics have successively been occupied by new data related disciplines like computer science, artificial intelligence, and computational intelligence. According to a conjecture of Friedman (1997, 2001), such disciplines might have become a part of statistics if the latter had defined its field in a broad sense by the data analysis paradigm. From a general point of view, several authors have tried to clarify the relations, similarities and differences of statistics and database related fields, in particular data mining, see Gale et al. (1993), Friedman (1997, 2001), Hand (1999), Mannila (2000) for instance. The present paper considers the relations between database related fields and statistical control methods, particularly statistical quality control (SQC).
2 Data Mining as a Tool in Control Charting: An Example
Quality control practitioners use databases and KDD techniques to support or to enhance classical instruments of statistical quality control. The following example illustrates some typical features and problems of so-called data mining applications in quality control. Thornhill et al. (1999) report on implementing control charts for the melt index (MI) of high density polyethylene (HDPE) at BP Chemicals. HDPE is a material for various production techniques, e.g., blow molding, film extrusion, injection molding, profile extrusion, sheet extrusion, and pipe manufacturing. Each application requires a specific MI grade. For each MI grade, a database archive of production history is searched for in-control manufacturing episodes. In-control episodes are identified according to the expertise of plant managers: the plant manager assesses the percentage p of nonconforming operation among total operation time. Then the p percent worst production episodes are discarded from the data archive and the remaining episodes are considered as a reference set of in-control operation. Parameters (standard deviation, quantiles) of the in-control distribution of the MI grade are estimated from the reference set. Two methods of constructing control limits for Shewhart X charts are considered: i) 3-sigma limits; ii) lower and upper quantiles of the in-control distribution. The aim is to achieve the well-known in-control ARL of approximately 370 of the 3-sigma Shewhart X chart. An adequately chosen quantile chart meets the objective whereas the 3-sigma chart fails and produces an excess of false alarms. The contribution of Thornhill et al. (1999) has two interesting features in respect of data mining and of statistical methodology. The data mining aspect consists of estimating the parameters of an in-control distribution from a very large database. Specific KDD methods are not used. Data mining is used as a catchword to allude to the high volume of data investigated. This is a very common view of data mining among practitioners. From a methodological point of view, the authors see the problem with 3-sigma charts in the fact that the distribution is not Gaussian. However, from theory and from numerical studies it is well known that the 3-sigma rule is very robust against deviations from normality, see Wheeler (1995). Judging from the data graph displayed and from general experience with chemical processes, the problem may result rather from autocorrelation. This topic has been extensively investigated in the literature, see Faltin et al. (1997) for many references. Many approaches use time series analysis models like autoregressive or moving average ones. The choice and implementation of such models is difficult for practitioners with little skill in stochastic theory. For a stationary autocorrelated process, considering the quantiles of a stationary in-control distribution is a natural idea. Estimation by empirical quantiles is free from distribution assumptions and works considerably well under autocorrelation. This type of probably not optimal, but easily understood approach should receive more attention in statistical research so as to provide simple tools for data mining based applications. The availability of large databases and of software encourages less experienced practitioners to perform data analysis. Sophisticated approaches are either not used or they are often misinterpreted.
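The contrast between the two limit constructions can be sketched in a few lines of Python; the reference data, the gamma distribution and the quantile levels below are illustrative assumptions, not the values used by Thornhill et al. (1999).

# Sketch of the two charting options discussed above: 3-sigma limits versus
# empirical quantiles of the in-control reference set (distribution free).
import numpy as np

rng = np.random.default_rng(2)
reference = rng.gamma(shape=5.0, scale=1.0, size=20000)  # assumed in-control MI history

# Option i): 3-sigma limits estimated from the reference set.
mu, sigma = reference.mean(), reference.std(ddof=1)
limits_sigma = (mu - 3 * sigma, mu + 3 * sigma)

# Option ii): empirical quantiles targeting roughly 1/370 false alarms per
# point, i.e. about 0.00135 in each tail.
limits_quantile = tuple(np.quantile(reference, [0.00135, 0.99865]))

def out_of_control(x, limits):
    lower, upper = limits
    return x < lower or x > upper

print(limits_sigma, limits_quantile)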
3 Problems of Data Handling in Modern Industry
Adequate data management is an essential prerequisite for sound data analysis, in particular for data mining approaches as described in paragraph 2. In a modern environment, the problem is not the restriction of data storage capacities. The problem is rather in the amount and in the structure of data storage. Gigabytes of data are collected, created, processed and stored in different departments of an organization (R&D, engineering, manufacturing, purchasing, customer service) and in various distributed and autonomous systems which correspond to heterogeneous interests. The business core is an ERP (enterprise resource planning) system. ERP systems are based on business transactions. A variety of further systems with specific purposes are implemented in different parts of an enterprise: MRP (manufacturing resource planning) systems; MES (manufacturing execution systems); SCADA (supervisory control and data acquisition) systems for gathering and analyzing real time data in plant or equipment monitoring and control; SPC systems; LIMS (laboratory information management systems) designed specifically for the analytical laboratory, including R&D labs, in-process testing labs, and quality assurance (QA) labs. From the point of view of data analysis, the above data handling scheme exhibits three problems: 1) design of databases inadequate for data analysis; 2) distribution of data; 3) poor quality of data. The design of databases often conflicts with the interests of data analysis. In business and manufacturing environments, database schemes are motivated by operational interests, e.g., inventory control, resource planning, logistics, accounting, online process maintenance. The underlying database technology is addressed as online transaction processing (OLTP). OLTP designs support high frequency real time transactions of a simple manner, usually adding or retrieving a single record. Data are organized according to the record-by-record transaction paradigm, e.g., incoming or outgoing inventory, sales, bookings. Entity types focused on by data analysis, like customer, employee, product, process history, are not directly reflected by the database structure. In particular, diachronic data series are fragmented. The information is in the system, but it is hidden. For instance, it is quite laborious and time consuming to extract from a standard ERP system the transaction history of customers with a given characteristic, e.g., holders of a loyalty card. For purposes of data analysis, data warehouse and online analytical processing (OLAP) designs have been suggested, see Inmon (1996), and below. Essential principles of OLAP designs are: analysis orientation, focus on entity types of analysis (customer, employee, product, process), suitable data aggregation, and support of historic data views. Distribution of data means the distribution over various and possibly disconnected bases and systems, different in purpose, logical structure, data format, and organizational position. In particular, the interface of standard statistical software packages (e.g., Statistica, Minitab) and ERP systems is undefined.
For big ERP systems like SAP or Baan, specific data analysis packages like SAS Enterprise Miner are on the market. However, a large majority of companies, in particular SMEs, run various small ERP systems without customized tools or interfaces for data analysis. Poor data quality means outliers, inaccuracy, inconsistency, or incompleteness of data. All these phenomena are serious problems for data analysis in industry. Data are stored abundantly. Data quality assurance does not keep pace. The following phenomena occur: erroneous input, transmission failure, incompatible data formats, duplicate records, erroneous interpretation of data, loss of substantial information due to inadequate documentation and improper aggregation, and obsolescence due to outdating. The Global Data Management Survey by PricewaterhouseCoopers (2002) reports the opinions of information officers at 600 companies across the US, Australia, and the UK. Only one in three traditional companies (excluding e-business) claimed to be confident in the quality of their own data. 75% of respondents experienced significant problems as a result of defective data. Brauer (2001) states that in many organizations 15 to 20% of data within databases may be erroneous or otherwise unusable. The Data Warehousing Institute (TDWI) estimates that data quality problems cost US businesses more than 600 billion dollars each year. In a consulting project for a retailing company the author investigated bookings in the ERP system, particularly the discounts allowed to customers. A considerable percentage of discount bookings were inconsistent, misleading or false. The effect of such errors on statistical data analysis is tremendous. Due to the phenomena of inadequate database design, data distribution, and poor data quality, data analysis is time consuming and restricted in scope. In consulting practice, many statisticians may have experienced the following course of events, see Klenz & Fulenwider (1999) for a similar exposition: 1) Meet business or engineering executives and discuss the problem. 2) Put up a plan of data analysis. 3) The business or engineering executives contact an IT manager to request the data. 4) The IT staff extracts the data from databases. 5) Analyze the data. 6) For any additional analysis or updating: go back to step 1 or 3. The entire complex of database handling (design, logical structuring, interfacing, maintenance, administration) has widely been ignored by the statistical community, with only a few notable exceptions such as Lenz (1987). This attitude is a serious obstacle of growing importance for successful application of statistics in modern industry. Deficiencies are manifest in two respects: i) Methodology: statistics does not contribute to rational structures and architectures for data administration. ii) Education: training in database skills usually is not a part of academic statistical programs. Statistics offers neither methods nor skills for creative database handling. Thus, statistical analysis in practice is often reduced to isolated preformatted data sets on the micro level.
4 Local and Global SQC
In the situation described in paragraph 3, SQC is restricted to local or micro control, and cannot extend to global or macro control. Local control applies to isolated units, like testing laboratories, manufacturing lines, sales departments, service departments. In these units, immediately available data are analyzed on the spot by classical tools like ANOVA or control charts. However, considering macro phenomena is characteristic of modern industrial environments: i) Products move through multistage processes. ii) Complex products are assembled from components which are manufactured in different plants at different locations. iii) The same part or product is manufactured at different locations. iv) Supply chains link vendors and processors or resellers. v) Life cycle analysis considers products from manufacture over sales to service. Global SQC faces two challenges. 1) Methodological: integrative models which decompose global process behaviour into the behaviour of subunits and reflect the relations among subunits. 2) Data handling: tools to integrate and exploit data from various sources on the micro level. The methodology of global SQC needs to be investigated. Among existing models, all those approaches can contribute which are able to gather and to convey information from micro processes into macro processes or from one process stage to another. In particular, these may be Bayesian and economic models. Recent important contributions to the methodology of SPC in multistage manufacturing are due to Fong & Lawless (1998), Lawless et al. (1999), and Agrawal et al. (1999). The appropriate data handling tool for a global approach to quality control is the quality data warehouse. Inmon (1996) defines the concept of a data warehouse:
A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decisions. By these design principles, a data warehouse is a database which avoids the problems described in paragraph 3. 1) Subject-orientation counteracts the transactional design principle. Subject-orientation is the adequate design principle for a database to suit purposes of data analysis. Data are centered around business subjects like customer, vendor, material, product etc. 2) Integration is a principle to counteract the problem of data distribution. Data from multiple sources, bases, and locations are brought together for purposes of data analysis. 3) Nonvolatility characterizes the manner of updating the database. In transactional systems, updating occurs frequently and means adding new data records. In a data warehouse, updating means creating a new systematically ordered, structured and cleansed snapshot of data. This type of updating is usually infrequent, e.g., once per week or per month only. Nonvolatility contributes substantially to increasing data quality. In the updating step, data are cleansed, formatted, structured, ordered and aggregated according to the
needs of data analysis. 4) Time-variancy emphasizes the importance of time as an index of data. Data or data collections should be time stamped and the database should enable a historic view of the data. Hence data warehouses tend to have a much longer time horizon than transactional bases. The focus on time-variancy is essential for data analysis. Details of data warehouse architecture and examples are provided by Inmon (1996), Kimball (1996), Silverston et al. (1997). A quality data warehouse is a data warehouse containing data related to quality control and quality management. A typical example of a quality data warehouse as described by Klenz and Fulenwider (1999) may contain process measurements from an online manufacturing monitoring SPC system, production scheduling data from an MRP system, materials data from a supply chain system, production execution data from an MES, quality assurance data from an LIMS, customer complaint data and warranty claim data. Further examples of quality data warehouses are described by Rutledge (2000), Ji et al. (2001). In principle, data warehouses are off-line instruments for decision making and decision support. The use of data warehouses for online operation and decision making essentially depends on suitable query tools and computing resources. Data warehouses in operational environments are discussed by McClellan (1997).
5 Data Warehouses and Statistical Macro Analysis
Data warehouses and statistics are related in two respects: 1) Data warehouses can contain statistical data aggregates like sample statistics. 2) Data warehouses support statistical data analysis; in particular, they enable data analysis and modelling on the macro level. As defined in paragraph 4, above, macro models express the relations among different variation phenomena which are traditionally considered in an isolated view only. Macro analysis can be attached to the data warehouse in a decision support system (DSS). Data warehouse applications in engineering already use statistical macro analysis on a heuristic basis. Statistics can contribute substantially to this field by providing models and methodology. An example of data warehousing with a corresponding DSS from computer disk drive manufacturing at IBM is reported by Rutledge (2000). The IBM Storage Technology Division operates facilities for fabricating, assembling and testing the components of IBM data storage products, in particular wafer, disk, head gimbal assembly, head stack assembly, and hard disk drive factories. Data from all plants and assembly steps are collected in a data warehouse with weekly updating. A DSS is linked to the data warehouse with the primary purpose of yield analysis, where yield is defined as the expected proportion of
nonconforming final product. The DSS allows studying the effect of tightening component parameters on product yield. The model underlying this study is based on heuristics and on a neural network approach. This is a typical example where a more subtle stochastic analysis by a structural components or variance transmission model may be helpful.
6 The Analysis of Temporal Patterns
The analysis of temporal patterns is motivated by problems of telecommunication network monitoring in paragraphs 7 and 8. Paragraphs 9 - 11 describe KDD approaches to temporal pattern analysis. Paragraph 12 discusses contributions of stochastic methodology to temporal pattern analysis. Paragraph 13 considers the use of temporal pattern analysis in industrial control problems.
7 Telecommunication Network Monitoring
Telecommunication networks are growing rapidly in size and complexity, often consisting of several thousands of components. Most network components are semi-intelligent, i.e., they record critical events (malfunction, failure or breakdown) and send alarm messages to a network operation center where the messages are stored and analyzed with the objective of preventing or removing network failures. As an example, Fröhlich et al. (1997) describe the alarm network and 11 typical alarm messages in the GSM (Global System for Mobile Communications) mobile network, which is the international, pan-European operating standard for the current generation of digital cellular mobile communications. Weiss (2001) describes alarm messages generated by devices in a 4ESS switch in the AT&T telecommunication network. (The 4ESS, number 4 electronic switching system, is a tandem traffic switching system which routes traffic to and from end offices in the telephone network.) The analysis of alarm messages faces five problems. 1. The quantity, clumsiness and concurrency of incoming messages. Modern telecommunication networks may produce thousands of alarms per day. A large quantity of alarm messages from different sources may rush in quickly. 2. Layered system architecture: a critical event in one component can be propagated through the network (cascade effect) and cause a storm of alarms. For instance, when a WAN (wide area network) router is idle, each device behind the router loses connection and starts sending alarm messages. 3. Alarm messages indicate malfunction, but they do not provide a diagnosis of the assignable cause of malfunction, neither in type nor in location. Causes may be network internal or external, in the environment or in users.
Example: for microwave systems, weather conditions are a main external failure cause. Fog, rain, or snow can lead to a connection breakdown. 4. Incompleteness, faultiness or loss of alarms. 5. Serious and less important alarms have to be distinguished. Some critical events are temporary and transient and require no intervention. Others persist and correspond to serious malfunctions or failures which require interventions (tracking, identification, localization, repair). Example: bit slippage in a T1 link. Bit slippage means the loss of a bit, caused by variations in the respective clock rates of the transmitting and receiving devices. Bit slippage often clears quickly. Stochastic problems in alarm analysis result from indeterminate, incomplete and distorted information. Hence network monitoring is a proper subject of statistics in the sense of paragraph 1, and, in particular, a proper topic of SPC. However, investigations on network monitoring seem to be left to computer science and engineering.
8 Event Series Analysis in Network Monitoring
The administration and interpretation of alarm messages are the essential tasks of network monitoring. From a methodological point of view, two topics are of major importance: alarm correlation and prediction. Alarm correlation attempts to increase the transparency of the alarm system, see Jakobson & Weissman (1993). Alarms are collected in groups, and the groups are interpreted as single alarms of higher order which are associated with a diagnosis and with intervention rules, e.g., failure search, troubleshooting. Grouping follows patterns: a sequence of alarms satisfying a specified pattern is mapped onto a more meaningful higher order alarm. Grouping has to balance two guiding interests:
- Reduction: The number of monitored alarms should be reduced as far as possible to simplify monitoring.
- Information: Alarms of higher order should provide relevant information about the network. Groups should be designed so as to support and simplify failure detection. In the ideal case, a group is equivalent to a diagnosis of a certain failure in the network.
Groups may be identified by analyzing the inferential structure of failures and alarms in the network. In simple cases, failures and alarm messages are arranged in deterministic causal dependency hierarchies. In these cases, balancing the above requirements is straightforward.
(A T1 is a synchronous bit-oriented transmission facility operating at the DS1 digital signaling rate, i.e., 1.544 Mbps (million bits per second).)
In general, however, there is no causal hierarchy among failures and alarms. The exact model analysis of the network structure may end up in an involved stochastic graph, see Fabre et al. (1998). Most approaches to alarm correlation do not attempt to describe the network structure completely in an explicit model. They rely on heuristic or experience based grouping and rather consider the administration of alarm groups in rule based systems, knowledge bases or management platforms, e.g., see Brugnoni et al. (1993), Jakobson & Weissman (1993), Moller & Tretter (1995). Many so-called model based approaches contain simplifying assumptions and rely on prior information which has to be guessed or is obtained from experience, see Bouloutas et al. (1994) or Fröhlich et al. (1997). The empirical analysis of temporal patterns or episodes is vital for alarm correlation, in particular the analysis of patterns which are frequent over time. Successful interpretation of frequent episodes is most promising for simplifying network monitoring and for reducing expenses in time, labour, and cost. In principle, the identification of frequent event episodes is a proper task of statistics. Historical network alarms are available in large databases. However, customary statistics provides no tools for identifying frequent patterns in large databases. This task is typical for KDD, in the present case for temporal data mining, i.e., data mining in time series. A KDD approach to finding frequent event episodes is described in paragraph 10, below. The second important topic for the theory of network monitoring is prediction. Failures and breakdowns of critical components should be avoided by diagnosing previous symptoms of such events in the network and taking adequate preventive measures. The natural previous symptoms are specified constellations of alarm messages from the network. Hence prediction, like alarm correlation, is based on alarm patterns or episodes. If the target event is relatively frequent, predictions can be constructed from frequent episodes. However, interesting target events like major or catastrophic network failures are rare. Weiss (2001) considers the problem of predicting extremely rare catastrophic failures of 4ESS switches in the AT&T telecommunication network. For the prediction of such events the analysis of frequent episodes may be useless. A KDD approach to the prediction of rare events is described in paragraph 11, below.
9 Episodes and Patterns in Time Series of Events
The subsequent definition of a pattern or episode of events unifies the approaches of Mannila et al. (1997) and of Weiss & Hirsh (1998). Consider a finite set $A$ of categorical events. The event $0 \in A$ is the null event (nothing happens). Interest is in the events in $A_0 = A \setminus \{0\}$. The latter are interpreted as the alarms in the network monitoring context. They may also
be interpreted as events in financial markets or as events in a manufacturing system. In our view, an event is a category which may occur in time. Let $(E_t)_{t=0,1,\dots}$ be a time series of occurring events $E_t \in A$. An episode $\alpha$ can be represented as a directed acyclic weighted graph where the nodes (vertices) correspond to events in $A_0$ and where the direction of the arrows (edges) expresses the time order. The absence of an arrow between two nodes (events) $A_i$, $A_j$ means that no time order is prescribed for the occurrence of $A_i$, $A_j$. A graph component $A_i \rightarrow A_j$ means: $A_i$ precedes $A_j$, i.e., $E_s = A_i$, $E_t = A_j$ for some $s < t$. The weights of the arrows range over $\{*, \cdot\}$. A graph component $A_i \xrightarrow{\,*\,} A_j$ means that arbitrary events from $A_0$ can occur at times $s+1, \dots, t-1$. A graph component $A_i \xrightarrow{\,\cdot\,} A_j$ means immediate precedence, i.e., none of the events from $A_0$ can occur at times $s+1, \dots, t-1$, i.e., $E_{s+1} = \dots = E_{t-1} = 0$. An episode $\beta$ is a (proper) subepisode of $\alpha$, denoted by $\beta \preceq \alpha$ ($\beta \prec \alpha$), if $\beta$ is represented by a (proper) subgraph of the graph expressing $\alpha$.
Fig. 1. Episodes of events.

Some important constellations are illustrated by Figure 1. In the serial episodes on the left-hand side the events occur successively in time order, i.e., $E_s = A_3$, $E_t = A_2$ where $s < t$. In the parallel episode $\alpha_3$ in the center both events occur without requirements on time order, i.e., $E_s = A_1$, $E_t = A_3$ for some $s, t$. The episode $\alpha_4$ on the right-hand side is neither serial nor parallel. $\alpha_4$ contains two proper serial subepisodes $\beta_1 \prec \alpha_4$, $\beta_2 \prec \alpha_4$, where $\beta_1$ consists of the events $A_1, A_2$ and $\beta_2$ consists of the events $A_3, A_2$ in time order. Episodes can also be described as temporal event patterns in the extension of a formalism suggested by Weiss & Hirsh (1998). The notation uses the primitives "*", "$\vee$", "|", "$\cdot$" with the following rules: 1. Each event from $A$ is a pattern. 2. If $\phi$ is a pattern, $*\phi$ (respectively $\phi*$) is the pattern where an arbitrary number of events from $A$ precedes (respectively follows) the pattern $\phi$. 3. If $\phi, \psi$ are patterns, $\phi \cdot \psi$ is the pattern where $\psi$ immediately follows $\phi$. 4. If $\phi, \psi$ are patterns, $\phi \,|\, \psi$ is the pattern where $\phi$ and $\psi$ occur immediately one after the other, in the succession $\phi, \psi$ or in the order $\psi, \phi$.
5. If $\phi, \psi$ are patterns, $\phi \vee \psi$ is the exclusive disjunction pattern where either $\phi$ or $\psi$ occurs.
The primitives have the precedence order "*", "$\vee$", "|", "$\cdot$". For instance, the episode $\alpha_4$ from the right-hand side of Figure 1 can be expressed as the pattern $*A_1* \vee *A_3* \cdot A_2$. The above definitions of episodes or temporal event patterns are sufficient for many purposes. Yet more refined temporal structures can be expressed by temporal logic, see Padmanabhan & Tuzhilin (1996) or Drusinsky & Shing (2003).
10 A KDD Approach for Finding Frequent Episodes in Time Series
Mannila et al. (1997) present an approach for finding and analyzing frequent episodes in time series of events. Episodes are described by means of graphs as exposed in paragraph 9, above. Immediate and intermediate succession of events are not distinguished, i.e., the weights $*, \cdot$ of the arrows in the graph are obsolete. Let $(E_t)_{t=0,\dots,T}$ be a time series of events $E_t \in A$. For convenience's sake, let $E_t = 0$ be the null event for times $t < 0$ or $t > T$. Focus is on episodes occurring in specified time windows. A window of width $win$ is a subsequence $E_t, \dots, E_{t+win-1}$ of $win$ successively observed events. For instance, $E_{100} = A_4$, $E_{101} = 0$, $E_{102} = A_3$, $E_{103} = A_1$, $E_{104} = A_2$ is a window of length 5 starting at time $t = 100$. Successive windows $E_t, \dots, E_{t+win-1}$ and $E_{t+1}, \dots, E_{t+win}$ of given width $win$ overlap in the $win - 1$ observations $E_{t+1}, \dots, E_{t+win-1}$. For given width $win$, only the $T + win$ windows between $E_{-win+1}, \dots, E_0$ and $E_T, \dots, E_{T+win-1}$ are interesting. The remaining windows consist of the null event 0 only. An episode $\alpha$ of events $A_{i_1}, \dots, A_{i_k}$ occurs in a window $E_t, \dots, E_{t+win-1}$ if there are $k$ pairwise different times $t \le t_1, \dots, t_k \le t + win - 1$, in the time order prescribed by the episode, with $E_{t_1} = A_{i_1}, \dots, E_{t_k} = A_{i_k}$. For instance, the window $E_{100} = A_4$, $E_{101} = 0$, $E_{102} = A_3$, $E_{103} = A_1$, $E_{104} = A_2$ contains all episodes from Figure 1. If a window contains an episode $\alpha$, it also contains all subepisodes $\beta \preceq \alpha$.
Consider windows of a given size $win$. Two tasks are interesting.

1. Find episodes $\alpha$ which occur frequently in windows of size $win$, i.e., with high values of the relative frequency

$$r_{win}(\alpha) = \frac{\#\{\text{windows of size } win \text{ containing } \alpha\}}{\#\{\text{windows of size } win\}} = \frac{\#\{\text{windows of size } win \text{ containing } \alpha\}}{T + win} \qquad (1)$$

2. For frequent episodes $\alpha$ in windows of size $win$, find subepisodes $\beta \preceq \alpha$ such that $\alpha$ is frequently linked to $\beta$ in windows of size $win$, i.e., with high values of the conditional relative frequency

$$r_{win}(\alpha \mid \beta) = \frac{\#\{\text{windows of size } win \text{ containing } \alpha\}}{\#\{\text{windows of size } win \text{ containing } \beta\}} = \frac{r_{win}(\alpha)}{r_{win}(\beta)} \qquad (2)$$

These pairs $\beta \preceq \alpha$ constitute the basis for prediction rules of the type less specific pattern $\Rightarrow$ more specific pattern. If a pair $\beta \preceq \alpha$ has a high conditional relative frequency, the more specific episode $\alpha$ is predicted from the occurrence of the less specific episode $\beta$. The quantity (2) is called the (empirical) confidence of the prediction rule $\beta \preceq \alpha$. Mannila et al. (1997) provide algorithms to accomplish the above tasks. Lower bounds $min_r$ for the relative frequency of episodes and $cmin_r$ for the conditional relative frequency of episodes in windows of fixed size $win$ are prescribed. The algorithms detect all episodes $\alpha$ with $r_{win}(\alpha) \ge min_r$, and among these all prediction rules $\beta \preceq \alpha$ with $r_{win}(\alpha \mid \beta) \ge cmin_r$.
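A minimal Python sketch of the windowed frequencies (1) and (2) for parallel episodes (no time-order constraint) may make the definitions concrete; the event series, the episodes and the window width are invented, and the brute-force enumeration of windows is not the algorithm of Mannila et al. (1997).

# Hedged sketch: relative frequency r_win(alpha) and rule confidence for
# parallel episodes, computed by enumerating all T + win windows.
def windows(events, win):
    # Pad with the null event 0 so that exactly T + win windows are produced.
    padded = [0] * (win - 1) + list(events) + [0] * (win - 1)
    return [padded[s:s + win] for s in range(len(events) + win - 1)]

def contains_parallel(window, episode):
    # A parallel episode occurs if every required event appears at a
    # distinct position of the window (no order prescribed).
    remaining = list(window)
    for ev in episode:
        if ev in remaining:
            remaining.remove(ev)
        else:
            return False
    return True

def r_win(events, episode, win):
    ws = windows(events, win)
    return sum(contains_parallel(w, episode) for w in ws) / len(ws)

events = ["A4", 0, "A3", "A1", "A2", 0, "A3", "A2"]
alpha, beta = ("A1", "A2", "A3"), ("A1", "A2")
confidence = r_win(events, alpha, 5) / r_win(events, beta, 5)  # rule beta -> alpha
print(round(r_win(events, alpha, 5), 3), round(confidence, 3))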
11 A KDD Approach to the Prediction of Rare Events
Mannila et al. (1997) search for prediction rules $\beta \preceq \alpha$ among frequent episodes $\alpha$. This approach fails to establish rules for predicting rare events. The latter topic is investigated by Weiss (1999, 2001) and Weiss & Hirsh (1998). Their investigations originate in the problem of predicting failures of devices in 4ESS switches of the AT&T telecommunication network. We describe the general features of the approach to event prediction. Let $(E_t)_{t=0,1,\dots}$ be a time series of events from $A$. Interest is in predicting the occurrence of target events from a subset $\mathcal{Z} \subseteq A$, e.g., equipment failures or breakdowns. The events in the complement $D = A \setminus \mathcal{Z}$ are the diagnostic events, e.g., alarm messages. For instance, the essential information of messages generated by devices in a 4ESS switch is given by the tuple (device id, severity, diagnostic code), where severity adopts the levels warning, minor, major. The target events are messages with code level "failure". The modes of predicting a target event $Z \in \mathcal{Z}$ are determined by two components and a rule: 1) a predictor set $\Phi_Z$ of diagnostic patterns or episodes from $D$; 2) the occurrence window width $win_Z$. The prediction rule is: at time $t$, the target event $Z$ is predicted if one of the patterns $\phi \in \Phi_Z$ occurs in the time window $E_{t-win_Z+1}, \dots, E_t$ of length $win_Z$. To be useful for system operating purposes, a prediction made at time $t$ has to refer to sufficiently distant reference times $s > t$, e.g., to allow for preventive action on the system. The necessary distance is prescribed by the lead time length or warning time length $w$, i.e., any prediction made at time $t$ refers to times $s \ge t + w$. If the reference times are too far away from the prediction time $t$, the prediction is meaningless. The latter requirement is reflected by prescribing the monitoring period length $m > w$, where any prediction made at time $t$ refers to times $s \le t + m$. Hence a prediction of a target event $Z$ at prediction time $t$ asserts that $Z$ will occur at some time $t + w \le s \le t + m$, i.e., that there will be some time $t + w \le s \le t + m$ with $E_s = Z$. Hence the meaning of "correct prediction" ("hit") and "false prediction" is obvious.
The empirical evaluation of the prediction strategy reflects two opposing requirements which have to be balanced by the strategy. The first requirement is diversity in prediction attempts, i.e., the strategy should not concentrate on the most promising target events where successful prediction is most easily achieved. The set $\mathcal{Z}$ of target events should be covered as far as possible by correct predictions. This amounts to requiring a high value of the hit rate or recall

$$\text{recall} = \frac{\#\{\text{target events correctly predicted}\}}{\#\{\text{target events occurring}\}} \qquad (3)$$

The second requirement is precision. The number of correct predictions should be large in comparison with the number of false predictions. This amounts to requiring a high value of the precision

$$\text{precision} = \frac{\#\{\text{correct predictions}\}}{\#\{\text{predictions}\}} \qquad (4)$$

Both quantities recall and precision are simultaneously accounted for by the weighted harmonic mean

$$F_\tau = \frac{1}{\frac{1}{\tau+1}\left(\frac{\tau}{\text{recall}} + \frac{1}{\text{precision}}\right)} = \frac{(\tau+1)\cdot\text{recall}\cdot\text{precision}}{\text{recall} + \tau\cdot\text{precision}} \qquad (5)$$
where a weight $\tau > 0$ is attached to recall. $F_\tau$ is known as the F-measure in the theory of information retrieval, see van Rijsbergen (1979). The measures recall, precision, and $F_\tau$ can be used to evaluate the entire prediction strategy or single patterns. For further diversity considerations a distance measure $d(\phi, \psi)$ between two prediction patterns is established as the ratio of the number of target events occurring where $\phi$ and $\psi$ differ in prediction, divided by the total number of target events occurring. From these pairwise distances a quantity $N(\phi)$ is derived which measures the similarity of a prediction pattern $\phi$ to the entire set of prediction patterns.
From a learning event sequence $(E_t)_{t=0,\dots,T}$ the sets $\Phi_Z$ of prediction patterns are determined by a genetic algorithm. The algorithm is initialized with patterns consisting of single events. In each step, new patterns are created from existing ones by crossover and mutation operators, and worse patterns are discarded in favour of better ones, where the patterns are evaluated by the quotient $F_\tau(\phi)/N(\phi)$ of the F-measure and the similarity measure $N(\phi)$. For details, see Weiss (1999) and Weiss & Hirsh (1998).
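For concreteness, the measures (3)-(5) can be computed from a toy prediction log as in the hedged Python sketch below; the log format and numbers are invented and the sketch is not the evaluation code of Weiss & Hirsh (1998).

# Hedged sketch of recall, precision and the weighted F-measure (5).
def f_measure(recall, precision, tau=1.0):
    # Weighted harmonic mean as in (5); tau > 0 puts extra weight on recall.
    return (tau + 1) * recall * precision / (recall + tau * precision)

# Each toy entry: (prediction issued at time t?, target event in [t+w, t+m]?)
log = [(True, True), (True, False), (False, True), (True, True), (False, False)]

hits = sum(pred and occurred for pred, occurred in log)
predictions = sum(pred for pred, _ in log)
targets = sum(occurred for _, occurred in log)

recall = hits / targets
precision = hits / predictions
print(recall, precision, f_measure(recall, precision, tau=2.0))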
12 Relations between Temporal Pattern Analysis and Stochastic Time Series Analysis
Paragraphs 10 and 11 describe two instances of KDD approaches to the analysis of patterns in time series of events, one concentrating on frequent patterns, the other concerned with the prediction of rare events. In recent years, the subject has received growing interest in research, see for instance the bibliography by Roddick & Spiliopoulou (1999). Time series analysis is a classical subject of stochastics. Over the last two decades, the analysis of categorical time series has received considerable attention in the academic literature. Several models were suggested. Raftery (1985) introduced the mixture transition distribution (MTD) model for p-th order Markov chains. For methods of empirical inference under MTD models see Berchtold & Raftery (1999). Jacobs & Lewis (1978a, 1978b, 1978c) introduced DAR (discrete autoregressive) and DARMA (discrete autoregressive moving average) processes. Jacobs & Lewis (1983) generalize DARMA to NDARMA(p, q) (discrete autoregressive moving average) models. Several authors discuss regression models, see Fahrmeir & Kaufmann (1987), Green & Silverman (1994), Fahrmeir & Tutz (2001). Hidden Markov (HM) models are popular in speech recognition, see Rabiner (1989) or MacDonald & Zucchini (1997). Measures of categorical dispersion and association are discussed by Liebetrau (1990) or Agresti (1990). In applications, stochastic modelling of categorical time series suffers in two related respects: communication and simplicity. Stochastic categorical time series models are poorly communicated to practitioners. They are discussed rather in the academic stochastic literature and are known only in specialized communities. There is no easily accessible literature for categorical prediction problems as it exists for cardinal time series analysis. Typically, Weiss & Hirsh (1998) refer to the monograph by Brockwell & Davis (1996) as their source on time series prediction, they state that "these statistical techniques are not applicable to the event prediction problem", and they conclude that learning by data mining should be used to solve their problem. In theoretical substance and in the way of presentation, stochastic categorical time series models are not accessible to practitioners. For the analysis of cardinal temporal data, a set of convenient and flexible concepts and methods is widely acknowledged and used, e.g., serial correlation, ARMA models,
ARIMA models, state space models, Kalman filtering. A similar standard toolbox does not exist for the case of categorical temporal data. Since stochastic models are insufficiently communicated and not suitably tailored for application, the practice of categorical time series analysis has widely become a subject of the KDD community. The advantages of KDD are sparseness, simplicity, immediacy, flexibility, potential of customization, and adaptivity. A difficult problem like event analysis or prediction in a telecommunication network can be tackled immediately without preliminary modelling of the specific structure of the underlying phenomenon. Assumptions are few and elementary, e.g., assuming some probability to be constant over a time period. In particular, narrowing assumptions on probability distributions are avoided. Hence the method applies to a great variety of situations. Provided that new data are at hand, a KDD approach easily adapts to system dynamics, e.g., reconfiguration or expansion of a network. The drawbacks of KDD approaches are poverty in explanation and structural analysis, opaqueness and the heuristic character of methods, inability to account for prior incomplete knowledge of the phenomenon, and inability to distinguish between substantial and random effects. The source of the event series remains a black box, and the laws of the underlying phenomenon (network, system) remain unknown. However, this kind of black box modelling governs large parts of cardinal time series analysis, too, e.g., consider the Box-Jenkins modelling approach. Mannila et al. (1997) mention a marked point process as a stochastic structure for categorical time series. Alas, sufficiently simple and customized tools for the analysis of such a structure are not on the market. A more promising approach are hidden Markov (HM) models. These models are clearly and transparently structured, and serve well to explain the relation between observed event patterns and the stipulated states of an underlying system. However, HM models are difficult in theory; implementation requires expert knowledge in stochastics, time, and thorough study of the subject matter. HM models do not serve for rapid customized solutions. Adaptation to reconfigurations in the system may be difficult. Whereas KDD approaches are already widely used in network monitoring, the potential of HM modelling is still under study. Several contributions resulted from the MAGDA (Modélisation et Apprentissage pour une Gestion Distribuée des Alarmes) project run by France Telecom. Fabre et al. (1998) describe the network as a so-called partially stochastic Petri net which contains random and deterministic variables. The hidden states of the net are subject to an HM model. An extension of the Viterbi algorithm is used to calculate the most likely state history from the observed alarms, see also Aghasaryan et al. (1998), Benveniste et al. (2001, 2003). Practical experiences with these approaches are not reported. For many applications, immediacy, flexibility, and adaptivity are predominant requirements. In these cases, a KDD approach is preferable. It is an interesting strategy for statisticians to refine and clarify KDD approaches
by stochastic reasoning. An important topic is the definition of standard objective functions to measure the performance of prediction policies, see Hong & Weiss (2004). Measures mentioned in paragraph 11, like recall, precision, and the F-measure, are common in the information retrieval and language processing communities, see Yang & Liu (1999). They are generally used as empirical measures without an underlying stochastic model. It is necessary that such measures are formulated and investigated in terms of a stochastic model. In particular, the relations to stochastic measures of prediction quality should be investigated.
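As a minimal stochastic counterpart, deliberately simpler than the MTD, DARMA or hidden Markov models cited above, the following sketch estimates a first-order Markov chain from a categorical event series and uses it for one-step-ahead prediction; the event series is invented.

# Hedged sketch: first-order Markov chain for a categorical event series,
# estimated by transition counts with add-one (Laplace) smoothing.
import numpy as np

events = ["O", "O", "A", "O", "B", "A", "O", "O", "A", "B"]  # toy series
states = sorted(set(events))
index = {s: i for i, s in enumerate(states)}

counts = np.ones((len(states), len(states)))
for prev, nxt in zip(events[:-1], events[1:]):
    counts[index[prev], index[nxt]] += 1
P = counts / counts.sum(axis=1, keepdims=True)   # estimated transition matrix

# One-step-ahead prediction: the most probable successor of the last event.
last = index[events[-1]]
print("predicted next event:", states[int(P[last].argmax())])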
13 Temporal Pattern Analysis and Statistical Process Control
Paragraph 8 introduced the topic of temporal pattern analysis from the point of view of telecommunication network monitoring. The topic is of similar importance for monitoring and control of industrial processes or systems.
13.1 Complex Monitoring and Control Tasks. Temporal pattern analysis is an important tool in monitoring and control of complex systems. Methods for detecting and interpreting patterns of events (status messages, alarms, adjustments etc.) are important for obtaining and processing information from the system. For instance, Milne et al. (1994) develop a feedback monitoring and maintenance policy for industrial gas turbines, which have a quite sophisticated structure. An important part of the policy is a tool ("chronicle model") for representing temporal episodes of system events like external actions, internal processes, status messages, alarms, and for associating episodes with maintenance actions. An online recognition system detects episodes from the chronicle model during system operation.
13.2 Event Pattern Analysis and Control Charting. Temporal pattern analysis can help to extract more information from control charts. Kusiak (2000) interprets a control chart as a simple clustering mechanism which divides the sample space into two regions: an alarm region (event A) and a no-alarm region (event O). Thus control charting produces episodes of the type O · ... · O · A. The information contained in the corresponding specific sequence $T_1, T_2, \dots$ of sample statistics is not used. However, this information may be useful for two purposes: 1) for the specific purpose of the control chart to detect predefined out-of-control situations rapidly, e.g., a shift in the mean or in the variance; 2) for the purpose of learning about the process. A well-known approach to achieve purpose 1) in the classical two-sided Shewhart X chart is the addition of warning limits and runs rules. In addition to the action limits (lower and upper 3-sigma limits), the warning limits define further regions in the sample space (range of X), usually bounded by the target, the positive and negative 1-sigma limits, and the positive and negative 2-sigma limits. Thus the range of X is divided into 8 regions which correspond to 8 events. The runs rules prescribe an alarm if the temporal succession of the occurrences of these events corresponds to certain patterns (episodes). A popular set of three rules was originally used at Western Electric Company (1956). In various situations, charts with runs rules perform better than simple Shewhart charts, see Page (1955), Roberts (1958), Bissel (1978), Wheeler (1983), Champ & Woodall (1987, 1990), Göb et al. (2001). In specific situations, further sets of rules may be useful. Methods for finding and administrating such rules can be adopted from alarm pattern correlation in network monitoring. Control charting intends to detect rare events. It should be investigated whether methods for predicting rare events as exemplified in paragraph 11 can be helpful to improve upon control chart performance. A formal view considers a control chart as an iterated statistical test of significance of a parametric hypothesis, e.g., on the mean or variance of a distribution. Though helpful for mathematical analysis, this view unnecessarily restricts the potential of control charting, see the discussion by Woodall (2000). A control chart can convey a lot of additional information on the process. Experienced operators study patterns on the chart to detect unwelcome variation and to learn about unknown factors or sources of variation. As a rule of thumb, Wheeler (1995), page 139, recommends seeking an explanation for any time pattern that repeats itself eight times in succession. Patterns are also used to track assignable causes after an alarm. Such intuitive reasoning can be improved by tools for detecting temporal patterns and for administrating associated rules. Patterns in control charts have been investigated with standard pattern recognition methods, e.g., with neural networks, see Hwarng & Hubele (1991) or Smith (1992). However, customary classification methods are not designed for temporal patterns, and they focus on previously defined pattern types like trend or cycle. The application of more recent methods designed specifically for temporal patterns seems more promising. Investigation of temporal patterns may also be used to tune a so-called pre-control chart. See Bhote (1988) for details on pre-control.
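A hedged Python sketch of zone-based runs rules on standardized sample statistics is given below; it follows the commonly quoted Western Electric zone tests, and thresholds or rule details may differ from the original 1956 handbook.

# Hedged sketch: check classical zone rules on standardized statistics
# z_t (target 0, unit standard deviation). Returns (index, rule) pairs.
import numpy as np

def western_electric_signals(z):
    z = np.asarray(z, dtype=float)
    signals = []
    for t in range(len(z)):
        last3 = z[max(0, t - 2):t + 1]
        last5 = z[max(0, t - 4):t + 1]
        last8 = z[max(0, t - 7):t + 1]
        if abs(z[t]) > 3:
            signals.append((t, "one point beyond the 3-sigma action limits"))
        elif len(last3) == 3 and max(np.sum(last3 > 2), np.sum(last3 < -2)) >= 2:
            signals.append((t, "two of three points beyond a 2-sigma warning limit"))
        elif len(last5) == 5 and max(np.sum(last5 > 1), np.sum(last5 < -1)) >= 4:
            signals.append((t, "four of five points beyond a 1-sigma limit"))
        elif len(last8) == 8 and (np.all(last8 > 0) or np.all(last8 < 0)):
            signals.append((t, "eight successive points on one side of the target"))
    return signals

rng = np.random.default_rng(3)
print(western_electric_signals(rng.normal(size=50)))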
13.3 Event Pattern Analysis in Automation. Automation often aims to replace the manual operations of skilled operators by automata. The analysis of successive patterns of operations can support this process. An approach of this type is described by Heierman & Cook (2003).
13.4 Event Pattern Analysis and Process Monitoring. Classical online monitoring (control charting) methodology considers the process as a black box and ignores information about explanatory factors. This
approach was adequate in a time when online information from the process and its environment was difficult to obtain and process. Nowadays, a lot of information (occurrences of events) is available from measuring equipment, machines and automata in the production line and from sources in the environment of the process. Practitioners criticize the black box approach of classical monitoring tools which do not exploit further information. In a customary view, the investigation of process factors is completely delegated to off-line SPC in an experimental preproduction phase which implements techniques like quality function deployment (QFD), failure mode and effect analysis (FMEA), and design of experiments (DOE). However, designed experiments may produce designed results. The real operating conditions may be different and may introduce unexpectedly varying factors, e.g., production speed, material properties, operators' dispositions, computer problems. Event pattern analysis can help to learn from events in the process and its environment. In particular, a global SPC approach, see paragraph 4, can use event pattern analysis in the macro analysis of processes.
13.5 Event Management
Paragraph 8 motivates event management by problems of telecommunication network monitoring. Similar and related problems occur on several levels of industrial organizations, particularly in manufacturing and in supply chains. A manufacturing system or a supply chain can be considered as a network of nodes corresponding to departments, assembly lines, automata, computers, measuring devices, conveyances, freight routes, logistic nodes. Malfunctions or failures in a single component can propagate through the system. In a modern environment, the nodes permanently collect and communicate information in the form of status messages, signals, alarms, reports, e.g., on machine status, machine adjustments, machine malfunction or failure, control chart signals, throughput, quality deviations (product, material), environment parameters (temperature, humidity), shortages of materials, shortages of manpower, changes in operators, orders overdue, freight status, transit delays. These events have to be processed, categorized, and analyzed, so as to take suitable action on the level of management, organization, engineering, operation. This is the task of manufacturing event management (MEM) or supply chain event management (SCEM). Similar problems occur in customer relationship event management (CREM). In industry, interest in event management is growing rapidly. Lots of event management software packages are on the market, e.g., ALTA Power, Aspen Operations Manager, SAP Event Management, Visiprise Event Manager, Movex, Categoric Software, Matrikon ProcessGuard. Yet, the topic has received little attention in the academic literature. As in telecom network
monitoring, four aspects have to be distinguished: 1) tools for notifying events; 2) tools for the intelligent administration of events, event patterns and interpretations thereof, and rules; 3) event correlation, i.e., determining and interpreting meaningful patterns of events; 4) inference from event patterns (prediction). Aspect 1) is essentially a matter of information transmission and reporting. Aspect 2) requires the technology of databases, rule based systems, and knowledge bases. Aspects 3) and 4) include the topic of analyzing time series of events. Practitioners criticize, see Bartholomew (2002), that event management packages heavily concentrate on aspect 1) and, occasionally, aspect 2), but fall short completely in aspects 3) and 4). Statisticians should become aware of the opportunities for their discipline in the area of event management. Event management is an excellent framework for statistical control methodology on a macro level, see paragraph 4. The framework is described by Ming et al. (2002) in their architecture for anticipative event management and intelligent self-recovery consisting of four components: i) event monitoring and filtration, ii) intelligent event manager, iii) intelligent self-recovery engine, iv) supporting databases. Stochastic problems are inherent in components i), ii), iii). In particular, methods for analyzing categorical time series are required.
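As a toy illustration of aspect 3), the following sketch counts how often a given serial episode of event types occurs within a sliding time window; the event-log format and the event names are assumptions made for the example, and the counting scheme is a simplification of the frequent-episode framework of Mannila et al. (1997).

# A toy sketch of serial-episode counting in an event log (assumed log format).
def count_serial_episode(log, episode, window):
    """log: list of (timestamp, event_type) pairs sorted by time."""
    count = 0
    for i in range(len(log)):
        t0, pos = log[i][0], 0
        for t, e in log[i:]:
            if t - t0 > window:                 # episode must fit into the time window
                break
            if e == episode[pos]:
                pos += 1
                if pos == len(episode):         # complete occurrence found
                    count += 1
                    break
    return count

log = [(1, "machine_stop"), (3, "temperature_high"), (4, "control_chart_signal"),
       (9, "machine_stop"), (12, "control_chart_signal")]
print(count_serial_episode(log, ("machine_stop", "control_chart_signal"), window=5))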
14 Conclusion
We have reviewed the relevance of database technology and KDD for process monitoring and control. In particular, we have studied data warehousing and methods for temporal pattern analysis. Both fields can contribute to enhance classical statistical control methodology, and to integrate statistical data analysis into higher order decision processes in industrial organizations.
References
1. Aghasaryan, A., Fabre, E., Benveniste, A., Boubour, R., Jard, C. (1998) "A Hybrid Stochastic Petri Net Approach to Fault Diagnosis in Large Distributed Systems". In: Mathematical Theory of Networks and Systems (MTNS), edited by A. Beghi, L. Finesso, and G. Picci, Il Poligrafo, Padova, Italy, pp. 921-924.
2. Agrawal, R., Lawless, J. F., and Mackay, R. J. (1999) "Analysis of Variation Transmission in Manufacturing Processes - Part II". Journal of Quality Technology, Vol. 31, No. 2, pp. 143-154.
3. Agresti, A. (1990) Categorical Data Analysis. John Wiley and Sons Inc., New York.
4. Bartholomew, D. (2002) "Event Management: Hype or Hope?" Industry Week, May 2002.
5. Bendell, A., Disney, J., and McCollin, C. (1999) "The Future Role of Statistics in Quality Engineering and Management". The Statistician, 48, Part 3, pp. 299-326.
6. Benveniste, A., Le Gland, F., Fabre, E., and Haar, S. (2001) "Distributed Hidden Markov Models". Pages 211-220 in: Optimal Control and PDE's - Innovations and Applications. In honor of Alain Bensoussan on the occasion of his 60th birthday. Edited by J.-L. Menaldi, E. Rofman, and A. Sulem. IOS Press, Amsterdam.
7. Benveniste, A., Fabre, E., and Haar, S. (2003) "Markov Nets: Probabilistic Models for Distributed and Concurrent Systems". IEEE Transactions on Automatic Control, 48, 11, pp. 1936-1950.
8. Berchtold, A., and Raftery, A. (1999) The mixture transition distribution (MTD) model for high-order Markov chains and non-Gaussian time series. Technical Report 360, Department of Statistics, University of Washington.
9. Bhote, K. R. (1988) World Class Quality: Design of Experiments Made Easier, More Cost Effective than SPC. American Management Association, New York.
10. Bouloutas, A. T., Calo, S., and Finkel, A. (1994) "Alarm Correlation and Fault Identification in Communication Networks". IEEE Transactions on Communications, Vol. 42, No. 2/3/4, pp. 523-533.
11. Brauer, B. (2001) "Data Quality - Spinning Straw Into Gold". Paper 117 in: Proceedings of the 26th SAS Users Group International Conference, SAS Institute Inc.
12. Brockwell, P. J., and Davis, R. A. (1996) Introduction to Time Series and Forecasting. Springer-Verlag, New York.
13. Brugnoni, S., Bruno, G., Manione, R., Montariolo, E., Paschetta, E., Sisto, L. (1993) "An Expert System for Real Time Fault Diagnosis of the Italian Telecommunications Network". In: Proceedings of the IFIP TC6/WG 6.6 Third International Symposium on Integrated Network Management, pp. 617-628, Elsevier/North-Holland.
14. Davison, B., and Hirsh, H. (1998) "Probabilistic Online Action Prediction". In: Proceedings of the AAAI Spring Symposium on Intelligent Environments.
15. Drusinsky, D., and Shing, M.-T. (2003) "Monitoring Temporal Logic Specifications Combined with Time Series Constraints". Journal of Universal Computer Science, vol. 9, no. 11, pp. 1261-1276.
16. Fabre, E., Aghasaryan, A., Benveniste, A., Boubour, R., and Jard, C. (1998) "Fault Detection and Diagnosis in Distributed Systems: An Approach by Partially Stochastic Petri Nets". Discrete Event Dynamic Systems 8, 2 (Special issue on Hybrid Systems), pp. 203-231.
17. Faltin, F. W., Mastrangelo, C. M., Runger, G. C., and Ryan, T. P. (1997) "Considerations in the Monitoring of Autocorrelated and Independent Data". Journal of Quality Technology, Vol. 29, No. 2, pp. 131-133.
18. Fahrmeir, L., and Kaufmann, H. (1987) "Regression models for non-stationary categorical time series". Journal of Time Series Analysis, Vol. 8, No. 2, pp. 147-160.
19. Fahrmeir, L., and Tutz, G. (2001) Multivariate Statistical Modelling Based on Generalized Linear Models. Springer-Verlag, New York.
20. Fong, D. Y. T., and Lawless, J. F. (1998) "The Analysis of Process Variation Transmission with Multivariate Measurements". Statistica Sinica, 8, pp. 151-164.
21. Friedman, J. H. (1997) "Data Mining and Statistics: What's the Connection?". In: Proceedings of the 29th Symposium on the Interface, edited by D. Scott.
22. Friedman, J. H. (2001) "The Role of Statistics in the Data Revolution". International Statistical Review, 69, 5.
23. Fröhlich, P., Nejdl, W., Jobmann, K., and Wietgrefe, H. (1997) "Model-Based Alarm Correlation in Cellular Phone Networks". In: Proceedings of the Fifth International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
24. Gale, W. A., Hand, D. J., and Kelly, A. E. (1993) "Artificial Intelligence and Statistics". Pages 535-576 in: Handbook of Statistics 9: Computational Statistics, edited by C. R. Rao, North-Holland, Amsterdam.
25. Göb, R., Del Castillo, E., and Ratz, M. (2001) "Run Length Comparisons of Shewhart Charts and Most Powerful Test Charts for the Detection of Trends and Shifts". Communications in Statistics, Simulation and Computation, 30, 2, pp. 355-376.
26. Green, P. J., and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, London.
27. Hand, D. J. (1998) "Data Mining: Statistics and More?" The American Statistician, Vol. 52, No. 2, pp. 112-118.
28. Hand, D. J. (1999) "Statistics and Data Mining: Intersecting Disciplines". SIGKDD Explorations, Volume 1, Issue 1, pp. 16-19.
29. Heierman, E., and Cook, D. J. (2003) "Improving Home Automation by Discovering Regularly Occurring Device Usage Patterns". In: Proceedings of the International Conference on Data Mining, pp. 537-540.
30. Hong, S. J., and Weiss, S. (2004) "Advances in Predictive Models for Data Mining". Pattern Recognition Letters Journal. To appear.
31. Hwarng, H. B., and Hubele, N. F. (1991) "X-bar Chart Pattern Recognition Using Neural Networks". ASQC Quality Congress Transactions, pp. 884-889.
32. Inmon, W. H. (1996) Building the Data Warehouse. John Wiley and Sons Inc., New York.
33. Jacobs, P. A., and Lewis, P. A. W. (1978a) "Discrete time series generated by mixtures. I: Correlational and runs properties". Journal of the Royal Statistical Society B, Vol. 40, No. 1, pp. 94-105.
34. Jacobs, P. A., and Lewis, P. A. W. (1978b) "Discrete time series generated by mixtures. II: Asymptotic properties". Journal of the Royal Statistical Society B, Vol. 40, No. 2, pp. 222-228.
35. Jacobs, P. A., and Lewis, P. A. W. (1978c) "Discrete time series generated by mixtures. III: Autoregressive processes (DAR(p))". Naval Postgraduate School Technical Report NPS55-78-022.
36. Jacobs, P. A., and Lewis, P. A. W. (1983) "Stationary discrete autoregressive-moving average time series generated by mixtures". Journal of Time Series Analysis, Vol. 4, No. 1, pp. 19-36.
37. Jakobson, G., and Weissman, M. D. (1993) "Alarm Correlation". IEEE Network, 7(6), pp. 52-59.
38. Ji, X., Zhou, S., Cao, J., and Shao, J. (2001) "Data Warehousing Helps Enterprise Improve Quality Management". Paper 115 in: Proceedings of the 26th SAS Users Group International Conference, SAS Institute Inc.
39. Kanji, G. K., and Arif, O. H. (1999) "Quality Improvement by Quantile Approach". Bulletin of the International Statistical Institute, 52nd Session, Proceedings Tome LVIII.
40. Klenz, B. W., and Fulenwider, D. O. (1999) "The Quality Data Warehouse: Solving Problems for the Enterprise". Paper 142 in: Proceedings of the 24th SAS Users Group International Conference, SAS Institute Inc.
41. Kusiak, A. (2000) "Data Analysis: Models and Algorithms". Pages 1-9 in: Proceedings of the SPIE Conference on Intelligent Systems and Advanced Manufacturing, edited by P. E. Orban and G. K. Knopf, SPIE, Vol. 4191, Boston.
42. Lawless, J. F., Mackay, R. J., and Robinson, J. A. (1999) "Analysis of Variation Transmission in Manufacturing Processes - Part I". Journal of Quality Technology, Vol. 31, No. 2, pp. 131-142.
43. Lenz, H.-J. (1987) "Design and Implementation of a Sampling Inspection System for Incoming Batches Based on Relational Databases". Pages 116-127 in: Frontiers in Statistical Quality Control 3, edited by H.-J. Lenz, G. B. Wetherill, P.-Th. Wilrich. Physica-Verlag, Heidelberg.
44. Liebetrau, A. M. (1990) Measures of Association. Fifth Edition. Sage Publications, Newbury Park, London, New Delhi.
45. MacDonald, I. L., and Zucchini, W. (1997) Hidden Markov and other models for discrete-valued time series. Chapman and Hall, London.
46. Mannila, H. (2000) "Theoretical Frameworks for Data Mining". SIGKDD Explorations, Volume 1, Issue 2, pp. 30-32.
47. Mannila, H., Toivonen, H., and Verkamo, A. I. (1997) "Discovery of Frequent Episodes in Event Sequences". Data Mining and Knowledge Discovery 1(3), pp. 259-289.
48. McClellan, M. (1997) Applying Manufacturing Execution Systems. St. Lucie Press, Boca Raton.
49. Megan, L., and Cooper, D. J. (1992) "Neural Network Based Adaptive Control Via Temporal Pattern Recognition". Canadian Journal of Chemical Engineering, 70, p. 1208.
50. Milne, R., Nicol, C., Ghallab, M., Trave-Massuyes, L., Bousson, K., Dousson, C., Quevedo, J., Aguilar, J., and Guasch, A. (1994) "TIGER: Real-Time Situation Assessment of Dynamic Systems". Intelligent Systems Engineering, pp. 103-124.
51. Ming, L., Bing, Z. J., Zhi, Z. Y., and Hong, Z. D. (2002) "Anticipative Event Management and Intelligent Self-Recovery for Manufacturing". Technical Report AT/02/020/MET of the Singapore Institute of Manufacturing Technology.
52. Möller, M., and Tretter, S. (1995) "Event correlation in network management systems". In: Proceedings of the 15th International Switching Symposium, volume 2, Berlin.
53. Oates, T. (1999) "Identifying distinctive subsequences in multivariate time series by clustering". In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 322-326.
54. Padmanabhan, B., and Tuzhilin, A. (1996) "Pattern Discovery in Temporal Databases: A Temporal Logic Approach". Pages 351-354 in: Proceedings of the Second ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Portland.
55. PricewaterhouseCoopers (2002) Global Data Management Survey. PricewaterhouseCoopers.
56. Rabiner, L. R. (1989) "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". Proceedings of the IEEE, volume 77, number 2, pp. 257-286.
57. Raftery, A. E. (1985) "A Model for High-Order Markov Chains". Journal of the Royal Statistical Society, Series B, Vol. 47, No. 3, pp. 528-539.
58. Rijsbergen, C. J. van (1979) Information Retrieval. Butterworths, London.
59. Roddick, J. F., and Spiliopoulou, M. (1999) "A Bibliography of Temporal, Spatial and Spatio-Temporal Data Mining Research". SIGKDD Explorations, volume 1, issue 1, pp. 34-38.
60. Rutledge, R. A. (2000) "Data Warehousing for Manufacturing Yield Improvement". Paper 134 in: Proceedings of the 25th SAS Users Group International Conference, SAS Institute Inc.
61. Shafer, G. (1976) A Mathematical Theory of Evidence. Princeton University Press, Princeton.
62. Silverston, L., Inmon, W. H., and Graziano, K. (1997) The Data Model Resource Book: A Library of Logical Data Models and Data Warehouse Designs. John Wiley and Sons Inc., New York.
63. Smith, A. E. (1992) "Control Chart Representation and Analysis via Backpropagation Neural Networks". Pages 275-282 in: Proceedings of the 1992 International Fuzzy Systems and Intelligent Control Conference.
64. Thornhill, N. F., Atia, M. R., and Hutchison, R. J. (1999) "Experiences of Statistical Quality Control with BP Chemicals". International Journal of COMADEM, 2(4), pp. 5-10.
65. Tukey, J. W. (1962) "The Future of Data Analysis". The Annals of Mathematical Statistics, Vol. 33, pp. 1-67.
66. Weiss, G. M. (1999) "Timeweaver: A Genetic Algorithm for Identifying Predictive Patterns in Sequences of Events". Pages 719-725 in: Proceedings of the Genetic and Evolutionary Computation Conference, edited by W. Banzhaf, J. Daida, A. Eiben, M. Garzon, V. Honavar, M. Jakiela. Morgan Kaufmann, San Francisco.
67. Weiss, G. M. (2001) "Predicting Telecommunication Equipment Failures from Sequences of Network Alarms". In: Handbook of Knowledge Discovery and Data Mining, edited by W. Kloesgen and J. Zytkow, Oxford University Press.
68. Weiss, G. M., and Hirsh, H. (1998) "Learning to Predict Rare Events in Event Sequences". Pages 359-363 in: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, AAAI Press.
69. Wheeler, D. J. (1995) Advanced Topics in Statistical Process Control. SPC Press, Knoxville, Tennessee.
70. Woodall, W. H. (2000) "Controversies and Contradictions in Statistical Process Control". Journal of Quality Technology, Vol. 32, No. 4, pp. 341-350.
71. Western Electric Company (1956) Statistical Quality Control Handbook. American Telephone and Telegraph Company, Chicago.
72. Yang, Y., and Liu, X. (1999) "A Re-Examination of Text Categorization Methods". In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval.
Optimal Process Calibration under Nonsymmetric Loss Function
Przemysław Grzegorzewski(1,2) and Edyta Mrówka(1)
(1) Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland
(2) Faculty of Mathematics and Information Science, Warsaw University of Technology, Plac Politechniki 1, 00-661 Warsaw, Poland
e-mail: {pgrzeg, mrowka}@ibspan.waw.pl
Summary. The problem of process calibration for a nonsymmetric loss function is considered and an optimal calibration policy is suggested. The proposed calibration method might be used, e.g., when the losses caused by oversize and undersize are not equal.
1 Introduction
Statistical process control (SPC) is a collection of methods for achieving continuous improvement in quality. This objective is accomplished by continuous monitoring of the process under study in order to quickly detect the occurrence of assignable causes and undertake the necessary corrective actions. Although many SPC procedures have been elaborated, Shewhart control charts are still the most popular and widely used SPC tools. A typical Shewhart control chart contains three lines: a center line corresponding to the process level and two other horizontal lines, called the upper control limit and the lower control limit, respectively. When applying this chart one draws samples at specified time moments and then plots the sample results as points on the chart. As long as the points lie within the control limits the process is assumed to be in control. However, if a point plots outside the control limits we are forced to assume that the process is no longer under control. It should be emphasized that there is no connection between the control limits of the control chart and the specification limits of the process. The control limits are driven by the natural variability of the process, usually measured by the process standard deviation. On the other hand, the specification limits are determined externally. They may be set by management, the manufacturing engineers, by product developers and designers or by the customers. Of course, one should have knowledge of the inherent variability when setting specifications. But generally there is no mathematical or statistical relationship between the control limits and specification limits (see [5], [4]).
It is evident that quality improvement makes sense if and only if the final products meet requirements and expectations. Therefore, the process should both be in control and meet the specification limits. It is important to note that we can have a process in control (i.e. stable with small variability) but producing defective items because of low capability. In this paper we consider the problem of process calibration, i.e. how to set up a manufacturing process in order to make it capable. In Sec. 2 we introduce basic notation and recall the traditional method of calibration which corresponds to the symmetric loss function. However, quite often the lower and upper specification limits cannot be treated in the same way. For example, it may happen that although one of the specification limits has been exceeded we obtain a nonconforming item that could be improved or corrected, which requires, of course, some costs. But exceeding the other, critical specification limit would lead to a complete fault that cannot be corrected. In the latter case the whole material used for manufacturing this item would be wasted and the cost would be much higher. Ladany in his papers [2], [3] considered a situation where the losses generated by passing the specification limits are constant, however different for each specification limit (see Sec. 3). However, it seems that a more adequate loss function should not be piecewise constant but the loss corresponding to the noncritical fault should increase gradually. The optimal calibration method for such a loss function is considered in Sec. 4.
2 Process calibration with symmetrical loss function
Let X denote a quality characteristic under study of a product item (length, diameter, weight, thickness, pressure strength, etc.). An item is called nonconforming if the measurement x of X lies outside a specified closed interval [LSL, USL] given by a certain lower specification limit LSL and upper specification limit USL. This interval is sometimes called a tolerance interval. Moreover, we generally assume that X is normally distributed, i.e. X ~ N(μ, σ²). A common practice is to set up the mean of the variable at the middle distance between the specification limits, i.e.
\mu_0 = \frac{LSL + USL}{2}.   (1)

This approach is legitimate only if the incurred loss does not depend on which specification limit - the lower or the upper - is exceeded. It means that the loss function L(x) is symmetrical, e.g.

L(x) = \begin{cases} w & \text{if } x < LSL \\ 0 & \text{if } LSL \le x \le USL \\ w & \text{if } x > USL \end{cases}   (2)

Fig. 1. Symmetrical loss function (2).
where w > 0. The loss function (2) is shown in Fig. 1. As can be seen, this method of calibration is very simple and natural. However, as was mentioned above, very often the assumption of a symmetrical loss function is not appropriate in practice. Let us consider the two following examples:
Example 1. Suppose that the quality characteristic under study is the inner diameter of a hole. Undersized holes can be rebored at extra cost. But the reduction of an oversized hole would be either impossible or would require much higher repair costs.
Example 2. Suppose now that the quality characteristic under study is the outside diameter of a bar or the thickness of a milled item. Oversized items can be reground at additional cost. But an undersized item could be sold only for scrap.
These examples show that in the situations described above the symmetrical loss function is useless and a nonsymmetric loss function should rather be applied.
3 A simple nonsymmetric loss function
To find the optimal calibration policy for situations as described in the examples given above, let us consider the following loss function:

L(x) = \begin{cases} w & \text{if } x < LSL \\ 0 & \text{if } LSL \le x \le USL \\ z & \text{if } x > USL \end{cases}   (3)

Fig. 2. Nonsymmetric loss function (3).
where, without loss of generality, we can assume that z < w (see Fig. 2). The expected loss for function (3) is given by the following formula:
EL(x) = w P(X < LSL) + z P(X > USL).   (4)

If we assume that the random variable X is normally distributed, i.e. X ~ N(μ, σ²), then

EL(x) = w P\!\left(\frac{X-\mu}{\sigma} < \frac{LSL-\mu}{\sigma}\right) + z\left[1 - P\!\left(\frac{X-\mu}{\sigma} \le \frac{USL-\mu}{\sigma}\right)\right] = w\,\Phi\!\left(\frac{LSL-\mu}{\sigma}\right) + z\left[1 - \Phi\!\left(\frac{USL-\mu}{\sigma}\right)\right],   (5)
where Φ is the cumulative distribution function of the standard normal distribution N(0,1). Suppose that the standard deviation σ is known and we consider the expected loss as a function of the process mean μ, i.e. EL = EL(μ). More precisely, we are looking for the μ at which the expected loss attains its minimum. The minimum can be attained for μ such that

\frac{d\,EL(\mu)}{d\mu} = 0.   (6)

Thus we have

\frac{w}{\sigma}\,\phi\!\left(\frac{LSL-\mu}{\sigma}\right) = \frac{z}{\sigma}\,\phi\!\left(\frac{USL-\mu}{\sigma}\right),

where φ denotes the density of the standard normal distribution N(0,1). Hence

\frac{w}{z} = \frac{\phi\!\left(\frac{USL-\mu}{\sigma}\right)}{\phi\!\left(\frac{LSL-\mu}{\sigma}\right)} = \exp\!\left[-\frac{(USL-\mu)^2 - (LSL-\mu)^2}{2\sigma^2}\right].   (7)

Solving this equation we obtain

2\sigma^2 \ln\frac{w}{z} = (LSL-\mu)^2 - (USL-\mu)^2 = LSL^2 - USL^2 + 2\mu\,(USL - LSL),

which gives

\mu = \frac{LSL + USL}{2} + \frac{\sigma^2}{USL - LSL}\,\ln\frac{w}{z}.

By (1) we get immediately

\mu = \mu_0 + \frac{\sigma^2}{USL - LSL}\,\ln\frac{w}{z}.   (8)
It is easy to prove that function (5) has an extremum at the point (8) and that it is a minimum. This result corresponds to the discussion by Ladany in [2], [3]. It is worth noting that our formula (8) reduces to the classical result (1) if w = z. If the true variance σ² is not known, then we may estimate it by

S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2,
where X_1, ..., X_n is a random sample from the process under study. Or, even better, we may estimate the variance using m random samples X_{1j}, ..., X_{nj}, j = 1, ..., m, and then we get

S^2 = \frac{1}{m(n-1)}\sum_{j=1}^{m}\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^2,

where \bar{X}_j denotes the average obtained for the j-th sample. In this case our calibration policy leads to the following formula:

\hat{\mu} = \mu_0 + \frac{S^2}{USL - LSL}\,\ln\frac{w}{z}.
4 A general model
As was mentioned in Sec. 1, it seems that a more natural loss function should be given by the formula

L(x) = \begin{cases} w & \text{if } x < LSL \\ 0 & \text{if } LSL \le x < USL \\ z\,\dfrac{x - USL}{ULR - USL} & \text{if } USL \le x \le ULR \\ z & \text{if } x > ULR \end{cases}   (13)

where, without loss of generality, z < w and the so-called upper limit for rework ULR > USL (see Fig. 3).

Fig. 3. Nonsymmetric loss function (13).
Function (13) attributes a high constant loss w to the critical fault, and in the case of the noncritical fault it increases gradually on the interval [USL, ULR] until it reaches the level z. Going back to Example 2, such a shape of the loss function reflects nicely that the additional costs for regrinding the oversized item depend on how much it has been oversized: a small oversize results in small costs while a relatively big oversize causes high costs. Moreover, starting from some oversize the costs of the corrective action remain constant. In this situation the expected loss is given as follows:
EL(x) = w P(X \le LSL) + z P(X > ULR) + z \int_{USL}^{ULR} \frac{x - USL}{ULR - USL}\, f(x)\,dx,   (14)
where f(x) denotes the density of the normal distribution N(μ, σ²). After some transformations we get
EL(x) = w\,\Phi\!\left(\frac{LSL-\mu}{\sigma}\right) + z\left[1 - \Phi\!\left(\frac{ULR-\mu}{\sigma}\right)\right] + \frac{z(\mu - USL)}{ULR - USL}\left[\Phi\!\left(\frac{ULR-\mu}{\sigma}\right) - \Phi\!\left(\frac{USL-\mu}{\sigma}\right)\right] + \frac{z\,\sigma}{ULR - USL}\left[\phi\!\left(\frac{USL-\mu}{\sigma}\right) - \phi\!\left(\frac{ULR-\mu}{\sigma}\right)\right].   (15)
As in Sec. 3 we have to consider the expected loss as a function of the mean μ, i.e. EL = EL(μ), and to find its minimum. Thus we need the derivative of (15) with respect to μ. After some simplifications we get

\frac{d\,EL(\mu)}{d\mu} = -\frac{w}{\sigma}\,\phi\!\left(\frac{LSL-\mu}{\sigma}\right) + \frac{z}{ULR - USL}\left[\Phi\!\left(\frac{ULR-\mu}{\sigma}\right) - \Phi\!\left(\frac{USL-\mu}{\sigma}\right)\right].

Hence we have to solve

\frac{d\,EL(\mu)}{d\mu} = 0,

which leads to the following equation

\frac{w}{\sigma}\,\phi\!\left(\frac{LSL-\mu}{\sigma}\right) = \frac{z}{ULR - USL}\left[\Phi\!\left(\frac{ULR-\mu}{\sigma}\right) - \Phi\!\left(\frac{USL-\mu}{\sigma}\right)\right],

and which is equivalent to

\ln\frac{\sigma}{w} + \ln\frac{z}{ULR - USL} + \frac{(LSL-\mu)^2}{2\sigma^2} + \ln \int_{\frac{USL-\mu}{\sigma}}^{\frac{ULR-\mu}{\sigma}} \exp\!\left(-\frac{t^2}{2}\right) dt = 0.   (16)
Unfortunately, this equation depends not only on the unknown variable μ but also on 6 parameters: LSL, USL, ULR, w, z and σ. In order to simplify it at least a little, let us introduce the following notation:

a = \frac{USL - LSL}{\sigma},   (17)

b = \frac{ULR - USL}{\sigma},   (18)

K = \frac{w}{z}.   (19)

Therefore, the optimal calibration is given by

\mu = USL - A\sigma,   (20)

where the correction factor A is a solution of the following equation:

\frac{(A - a)^2}{2} + \ln \int_{A}^{A+b} \exp\!\left(-\frac{t^2}{2}\right) dt = \ln (K\,b).   (21)
Equation (21) is more acceptable than (16) because it depends only on 3 parameters: a, b and K. But it is still very difficult to solve it analytically. It could be solved numerically, but then a natural question immediately arises: how to present the result in a convenient way such that it could be recommended to practitioners? Three-dimensional tables containing the correction factors A are, of course, not the best solution! Moreover, it is still not clear which values of these parameters should be considered. Therefore, we need another reparametrization. But if we look carefully at (17) we easily find that the parameter a resembles one of the best known tools used in capability analysis - the so-called process capability index Cp (see [1], [5], [4])
C_p = \frac{USL - LSL}{6\sigma} = \frac{1}{6}\,a.   (22)
The process capability index is a measure of the ability of the process to manufacture products that meet specifications. If the quality characteristic under study is normally distributed and assuming the process mean is centered between the upper and lower specification limits, one can calculate the probability of meeting specifications

q = P(LSL < X < USL).   (23)
These probabilities for a few values of the process capability index Cp are given in Table 1.

Table 1. Probability of meeting specifications.

To illustrate the use of Table 1, notice that Cp = 1 implies
a fallout rate of 2700 parts per million, while Cp = 1.5 implies a fallout rate of 7 parts per million. For more information on process capability indices and process capability analysis we refer the reader to [1]. Moreover, to avoid the parameters a and b, let us henceforth consider the ratio

h = \frac{a}{b} = \frac{USL - LSL}{ULR - USL}.   (24)
Therefore, now the optimal calibration is given by (20), where the correction factor A is a solution of the following equation:

\frac{(A - 6C_p)^2}{2} + \ln \int_{A}^{A + 6C_p/h} \exp\!\left(-\frac{t^2}{2}\right) dt = \ln\!\left(\frac{6 K C_p}{h}\right).   (25)
Since the most common values of the process capability index applied in practice are 1, 1.33 and 1.5, we could also restrict our considerations to these values. Thus we can solve (25) numerically for each fixed Cp and then present the correction factor A in tables which depend on the two parameters h and K (see Tables 2, 3 and 4). Thus, finally, the calibration policy depends on three natural parameters: the process capability index Cp; h, which is the ratio of the length of the tolerance interval and the length of the interval on which the loss function (13) is increasing; and K, which is the ratio of the constant losses w and z. Hence, to sum up, our calibration method for the loss function (13), represented by the five parameters LSL, USL, ULR, w, z, might be described by the following algorithm:
- fix the desired value of the process capability index Cp,
- compute the ratio h = (USL - LSL)/(ULR - USL),
- compute the loss ratio K = w/z,
- find in a table the appropriate value of the correction factor A,
- set up the process mean at μ = USL - Aσ.
If the standard deviation σ is unknown, it should be estimated by a sample standard deviation σ̂ = S (see Sec. 3). In such a way, instead of (20) we have to use the following equation:

\hat{\mu} = USL - A\,S.   (26)
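For readers who prefer to compute the optimal mean directly instead of interpolating in Tables 2-4, the following Python sketch evaluates the expected loss (14) by combining closed-form tail terms with numerical integration of the ramp term and minimizes it over μ; the parameter values are illustrative assumptions, and the sketch is not the authors' implementation.

# A minimal sketch: minimize the expected loss (14) numerically over mu.
from scipy.stats import norm
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def expected_loss(mu, sigma, LSL, USL, ULR, w, z):
    # w*P(X < LSL) + z*P(X > ULR) + ramp loss on [USL, ULR], cf. (14)
    tail_low = w * norm.cdf((LSL - mu) / sigma)
    tail_high = z * norm.sf((ULR - mu) / sigma)
    ramp, _ = quad(lambda x: z * (x - USL) / (ULR - USL) * norm.pdf(x, mu, sigma),
                   USL, ULR)
    return tail_low + tail_high + ramp

# Illustrative values (assumed): Cp = 1, h = 2, K = w/z = 2.
sigma, LSL, USL, ULR, w, z = 1.0, 0.0, 6.0, 9.0, 2.0, 1.0
res = minimize_scalar(expected_loss, bounds=(LSL, USL),
                      args=(sigma, LSL, USL, ULR, w, z), method="bounded")
A = (USL - res.x) / sigma          # correction factor in mu = USL - A*sigma, cf. (20)
print(f"optimal mean = {res.x:.3f}, correction factor A = {A:.3f}")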
Remark. As is known, the process capability index Cp defined by (22) and listed in Table 1 with the probability (23) is only correct in the case that the process mean is centered between the upper and lower specification limits, i.e. μ = (LSL + USL)/2. However, in (26) we recommend a nonsymmetrical calibration. Thus one may think that there is some contradiction or confusion. Actually there is no contradiction here. All calculations and formulae hold even if we do not refer to the capability index and make use of the coefficient a given by (17) and called the standardized specification interval. However, we have decided to apply the Cp index, which is just a/6, because practitioners are accustomed to this coefficient and they have considerable intuition connected with this parameter. Therefore, the process capability index Cp is utilized rather as a convenient reference parameter for choosing the appropriate correction factor.
Table 2. The correction factor A for Cp = 1 (rows: h; columns: successive values of K).
h = 0.80:  2.318  2.308  2.298  2.289  2.281  2.273  2.265  2.258  2.251  2.244  2.238
h = 0.85:  2.328  2.318  2.308  2.299  2.290  2.282  2.275  2.267  2.260  2.254  2.247
h = 0.90:  2.337  2.327  2.317  2.308  2.299  2.291  2.284  2.276  2.269  2.263  2.256
h = 0.95:  2.345  2.335  2.326  2.317  2.308  2.300  2.292  2.285  2.278  2.271  2.265
h = 1.00:  2.353  2.343  2.334  2.325  2.316  2.308  2.300  2.293  2.286  2.279  2.273
Table 3. The correction factor A for Cp = 1.33 (rows: h; columns: successive values of K).
h = 0.90:  3.416  3.408  3.401  3.394  3.387  3.381  3.375  3.369  3.364  3.359  3.354
h = 0.95:  3.422  3.415  3.407  3.400  3.394  3.367  3.382  3.376  3.371  3.365  3.360
h = 1.00:  3.429  3.421  3.413  3.407  3.400  3.394  3.388  3.382  3.377  3.372  3.367
Table 4. The correction factor A for Cp = 1.5 (rows: h; columns: successive values of K).
h = 0.90:  3.963  3.956  3.949  3.943  3.937  3.932  3.926  3.921  3.917  3.912  3.908
h = 0.95:  3.969  3.962  3.955  3.949  3.943  3.938  3.932  3.927  3.922  3.918  3.913
h = 1.00:  3.974  3.967  3.961  3.955  3.949  3.943  3.938  3.933  3.928  3.923  3.919
5 Conclusion
We have considered the problem of process calibration with a nonsymmetric loss function, which should be used if, e.g., the losses caused by oversize and undersize are not equal. It is worth noting that the suggested calibration policy depends on such natural and well-known parameters as the process capability index Cp.
References
1. Kotz S., Johnson N.L., Process Capability Indices, Chapman and Hall, 1993.
2. Ladany S.P., Optimal joint set-up and calibration policy with unequal revenue from oversized and undersized items, Int. J. Operations and Production Management, 16 (1996), 67-88.
3. Ladany S.P., Optimal set-up of a manufacturing process with unequal revenue from oversized and undersized items, In: Frontiers in Statistical Quality Control, Lenz H.J., Wilrich P.Th. (Eds.), Springer, 2001, pp. 93-101.
4. Mittag H.J., Rinne H., Statistical Methods of Quality Assurance, Chapman & Hall, London, 1993.
5. Montgomery D.C., Introduction to Statistical Quality Control, Wiley, New York, 1991.
The Probability of the Occurrence of Negative Estimates in the Variance Components Estimation by Nested Precision Experiments
Yoshikazu Ojima, Seiichi Yasui, Feng Ling, Tomomichi Suzuki and Taku Harada
Tokyo University of Science, Department of Industrial Administration, 2641 Yamazaki, Noda, Chiba, 278-8510, JAPAN, [email protected]
Summary. Nested experiments are commonly used to estimate variance components, especially in precision experiments. The ANOVA (analysis of variance) estimators are expressed as linear combinations of the mean squares from the ANOVA. Negative estimates can occur, as the linear combinations usually include negative coefficients. The probability of the occurrence of negative estimates depends on the degrees of freedom of the mean squares and on the true values of the variance components themselves. Based on this probability, some practical recommendations concerning the number of laboratories can be derived for precision experiments.
1 Introduction
Precision is one of the most important characteristics for evaluating the performance of measurement methods. There are several precision measures, i.e. repeatability, intermediate precision measures, and reproducibility. Their importance and the ways of determining them are described in ISO 5725-3 (1994). Precision corresponds to measurement errors which are associated with measurement processes and environments. By introducing replications after a point in the measurement process, the measurement errors before and after that point can be evaluated separately. For example, replication of days (i.e. carrying out an experiment on several days) makes it possible to obtain an error component due to the difference between days, and replication of laboratories (i.e. participation of several laboratories) makes it possible to obtain an error component due to the difference between laboratories. From the statistical viewpoint, a measurement result y can be expressed as

y = \mu + \alpha + \beta + \gamma + \varepsilon,   (1)
where μ is a general mean or the true value, α is a random effect due to a laboratory,
β is a random effect due to a day, γ is a random effect due to an operator, and ε is a random effect due to replication under repeatability conditions. We usually assume that μ is an unknown constant, and that α, β, γ, ε are random variables with expectation 0 and variances σ_A², σ_B², σ_C², and σ_E², respectively. Nested experiments are commonly used to estimate the variance components (σ_A², σ_B², σ_C², and σ_E²), especially in precision experiments. The ANOVA (analysis of variance) estimators of variance components are widely used because they are unbiased and can be obtained without any distributional assumptions. The estimators are expressed as linear combinations of the mean squares from the ANOVA. The mean squares are chi-squared random variables under the normality assumption. Negative estimates can occur, as the linear combinations usually include negative coefficients. For the case of balanced nested experiments, the mean squares are mutually independent chi-squared random variables, and then the probability of the occurrence of negative estimates can be evaluated by the F distribution. For the case of the generalized staggered nested experiments, the mean squares are correlated chi-squared random variables. Applying the canonical forms of the generalized staggered nested designs, the probability of the occurrence of negative estimates is evaluated in this paper. The probability of the occurrence of negative estimates depends on the degrees of freedom of the mean squares and on the true values of the variance components themselves. Based on this probability and the precision of the estimators, some practical recommendations concerning the number of laboratories can be derived for precision experiments.
2 Three-stage nested experiments
2.1 Statistical model and ANOVA
Three-stage nested experiments include three random components; the statistical model of a measurement result can generally be expressed as

y_{ijk} = \mu + \alpha_i + \beta_{ij} + \varepsilon_{ijk},   (2)

where μ is a general mean, α_i is a random variable from N(0, σ_A²), the effect of a laboratory, β_{ij} is a random variable from N(0, σ_B²), the effect of a day, ε_{ijk} is a random variable from N(0, σ_E²), the measurement error, and k = 1, ..., r_{ij}; j = 1, ..., b_i; i = 1, ..., m. We use the symbols ρ_A and ρ_B to express the ratios σ_A²/σ_E² and σ_B²/σ_E². The sums of squares SS_A, SS_B, and SS_E are obtained as
SS_A = \sum_{i}\sum_{j}\sum_{k}\left(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2, \quad SS_B = \sum_{i}\sum_{j}\sum_{k}\left(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}\right)^2, \quad SS_E = \sum_{i}\sum_{j}\sum_{k}\left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2,   (3)

where the averages are defined as

\bar{y}_{ij\cdot} = \frac{1}{r_{ij}}\sum_{k} y_{ijk}, \quad \bar{y}_{i\cdot\cdot} = \frac{1}{\sum_{j} r_{ij}}\sum_{j}\sum_{k} y_{ijk}, \quad \bar{y}_{\cdot\cdot\cdot} = \frac{1}{\sum_{i}\sum_{j} r_{ij}}\sum_{i}\sum_{j}\sum_{k} y_{ijk}.

The degrees of freedom ν_A, ν_B, and ν_E are obtained as

\nu_A = m - 1, \quad \nu_B = \sum_{i} b_i - m, \quad \nu_E = \sum_{i}\sum_{j} r_{ij} - \sum_{i} b_i.   (4)
2.2 Balanced nested experiments
2.2.1 General
The number of replications is constant in the balanced nested experiments. Hence, we denote b = b_i (for all i), and r = r_{ij} (for all i and j). Table 1 shows the ANOVA table for the balanced nested experiments.

Table 1: The ANOVA table for the balanced nested experiments
Factor   Sum of Squares, SS   Degrees of Freedom, ν   Mean Square, MS   E(MS)
A        SS_A                 m - 1                   SS_A/ν_A          σ_E² + r σ_B² + rb σ_A²
B        SS_B                 m(b - 1)                SS_B/ν_B          σ_E² + r σ_B²
E        SS_E                 mb(r - 1)               SS_E/ν_E          σ_E²
From the ANOVA table, the ANOVA estimators are derived as

\hat{\sigma}_A^2 = \frac{MS_A - MS_B}{rb}, \quad \hat{\sigma}_B^2 = \frac{MS_B - MS_E}{r}, \quad \hat{\sigma}_E^2 = MS_E.   (5)

For the case of the balanced nested experiments, all sums of squares are mutually independent and chi-square distributed. The distribution of SS_A / E(MS_A) is a χ² distribution with ν_A degrees of freedom, and so on. From Eq. (5), the probabilities of the occurrence of negative estimates are

\Pr[\hat{\sigma}_A^2 < 0] = \Pr\!\left[F(\nu_A, \nu_B) < \frac{1 + r\rho_B}{1 + r\rho_B + rb\,\rho_A}\right],   (6)

\Pr[\hat{\sigma}_B^2 < 0] = \Pr\!\left[F(\nu_B, \nu_E) < \frac{1}{1 + r\rho_B}\right],   (7)
where F(ν_1, ν_2) is an F distributed random variable with (ν_1, ν_2) degrees of freedom.
2.2.2 Required number of laboratories for precision experiments
From the viewpoint of precision experiments, usual experiments are designed with r = 2 and b = 2. Applying r = b = 2, and ν_A = m - 1, ν_B = m, and ν_E = 2m, we have

\Pr[\hat{\sigma}_A^2 < 0] = \Pr\!\left[F(m-1, m) < \frac{1 + 2\rho_B}{1 + 2\rho_B + 4\rho_A}\right],   (8)

\Pr[\hat{\sigma}_B^2 < 0] = \Pr\!\left[F(m, 2m) < \frac{1}{1 + 2\rho_B}\right].   (9)
Table 2 shows the minimum number of laboratories (m) for given ρ_B = σ_B²/σ_E² and given p (= Pr[σ̂_B² < 0]). The calculations of the p-values for Eq. (9) are based on the programs in JSA (1972). Table 3 shows the minimum number of laboratories (m) for given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]).
Table 2: The minimum number of laboratories (m), on given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_B² < 0])
From Table 2, a large number of laboratories is required to estimate σ_B² positively in the case of smaller ρ_B.
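The minimum numbers of laboratories reported in Tables 2 and 3 can be recomputed from Eqs. (8) and (9) with any routine for the F distribution; the following sketch (using scipy rather than the JSA (1972) programs) shows one way to do this, with illustrative values of ρ_A, ρ_B and p.

# A sketch: minimum m from Eqs. (8) and (9) for the balanced design with r = b = 2.
from scipy.stats import f

def p_neg_sigma_b(m, rho_b):                 # Eq. (9)
    return f.cdf(1.0 / (1.0 + 2.0 * rho_b), m, 2 * m)

def p_neg_sigma_a(m, rho_a, rho_b):          # Eq. (8)
    q = (1.0 + 2.0 * rho_b) / (1.0 + 2.0 * rho_b + 4.0 * rho_a)
    return f.cdf(q, m - 1, m)

def min_m(prob, target, m_max=2000, **kw):
    """Smallest m with prob(m, **kw) <= target (None if not reached by m_max)."""
    return next((m for m in range(2, m_max + 1) if prob(m, **kw) <= target), None)

print(min_m(p_neg_sigma_b, 0.05, rho_b=0.5))          # illustrative values
print(min_m(p_neg_sigma_a, 0.05, rho_a=1.0, rho_b=1.0))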
Table 3 (a): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.01
Table 3 (b): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.02
Table 3 (c): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.05
Table 3 (d): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.10
Table 3 (e): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.20
On the contrary, Table 3 shows that a large number of laboratories is required to estimate σ_A² positively in the case of larger ρ_B. From the viewpoint of conducting precision experiments, the number of participating laboratories is usually less than 50, and the results from Tables 2 and 3 show that we may often find negative estimates of variance components.
2.3 Staggered nested experiments
Based on the structure of the staggered nested experiments, we have b_i = 2, r_{i1} = 2, and r_{i2} = 1 (for all i). Table 4 shows the ANOVA table for the staggered nested experiments.

Table 4: The ANOVA table for the staggered nested experiments
Factor   Sum of Squares, SS   Degrees of Freedom, ν   Mean Square, MS   E(MS)
A        SS_A                 m - 1                   SS_A/ν_A          σ_E² + (5/3) σ_B² + 3 σ_A²
B        SS_B                 m                       SS_B/ν_B          σ_E² + (4/3) σ_B²
E        SS_E                 m                       SS_E/ν_E          σ_E²

From the ANOVA table, the ANOVA estimators are derived as

\hat{\sigma}_A^2 = \frac{MS_A}{3} - \frac{5}{12}\,MS_B + \frac{1}{12}\,MS_E, \quad \hat{\sigma}_B^2 = \frac{3}{4}\left(MS_B - MS_E\right), \quad \hat{\sigma}_E^2 = MS_E.   (10)
For the case of the staggered nested experiments, all sums of squares are chi-square distributed; however, SS_A and SS_B are positively correlated. Concerning Pr[σ̂_B² < 0], SS_B and SS_E are mutually independent, and then the probability of negative estimates can be obtained similarly to the case of the balanced nested experiments:

\Pr[\hat{\sigma}_B^2 < 0] = \Pr\!\left[F(m, m) < \frac{1}{1 + \tfrac{4}{3}\rho_B}\right].   (11)

Table 5 shows the minimum number of laboratories (m) for given ρ_B = σ_B²/σ_E² and given p (= Pr[σ̂_B² < 0]), based on Eq. (11).
Table 5: The minimum number of laboratories (m), on given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_B² < 0])
From Table 5, a large number of laboratories is required to estimate σ_B² positively in the case of smaller ρ_B. The tendency is similar to the results of the balanced nested experiments. Ojima (1998) derived the canonical form of the staggered nested experiments. Using the canonical form, Pr[σ̂_A² < 0] can be evaluated. Table 6 shows the minimum number of laboratories (m) for given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]). The results of Table 6 have been obtained by Monte Carlo simulation. From Table 6, a large number of laboratories is required to estimate σ_A² positively in the case of larger ρ_B. The tendency is similar to the results of the balanced nested experiments.
Table 6 (a): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.01
Table 6 (b): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.02
Table 6 (c): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.05
Table 6 (d): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.10
Table 6 (e): The minimum number of laboratories (m), on given ρ_A = σ_A²/σ_E² and given ρ_B = σ_B²/σ_E², for given p (= Pr[σ̂_A² < 0]) = 0.20
For the staggered nested experiments, less data are obtained in a laboratory than for the balanced experiments. Therefore, the variances of MS_B and MS_E obtained by the staggered nested experiments are larger than those of the balanced experiments. Consequently, the required number of laboratories for the staggered experiments is almost twice that for the balanced experiments. The comparison is only based on the number of laboratories and not on the total number of data in an experiment. The number of data required in the staggered experiment is 3/4 of that in the balanced experiment with the same number of participating laboratories. Hence, it can be stated that the amount of information obtained from the staggered experiment is also 3/4 of that of the balanced experiment.
2.4 Practical consideration for precision experiments
As described in Eq. (1), the three-stage nested design includes three random components, i.e. α, β, ε. Applying the nested design to precision experiments, α is a random component due to a laboratory, β is a random effect due to a day, and ε is an error component under repeatability conditions. As σ_E² is the repeatability variance, the case of a large value of ρ_B = σ_B²/σ_E², e.g. ρ_B > 5, means that σ_B² (day-to-day variation) is relatively much larger than σ_E². In other words, the measurement process in the laboratory is not well controlled. The measurement process might not be standardized enough, or calibration methods may not be sufficient. On the contrary, a relatively small value of ρ_B means that the measurement process in the laboratories can be well controlled. However, there is a possibility that the repeatability value itself is too large in practice. If laboratories are well controlled, and the resulting situation is 0.2 < ρ_B < 0.5, it is not necessary to detect σ_B². For such a case, the occurrence of negative estimates of σ_B² is no longer a serious problem. To evaluate the value of ρ_A, we should also consider the relative magnitude of ρ_B. The case of small ρ_A with large ρ_B is not a natural situation, because small ρ_A means there is a small difference between laboratories, while at the same time large ρ_B means large day-to-day variation in each laboratory. It can be caused by very poor calibration, or sample deterioration. Those problems should be solved in the stage of standardization of measurement methods. Reviewing the several cases of ρ_A and ρ_B as above, it is enough to consider only the case of ρ_A > ρ_B, 0.5 < ρ_B < 2, and 1 < ρ_A < 2, for usual practical precision experiments. For balanced experiments the required number of participating laboratories should be 30 for the 1% level (probability of occurrence of negative estimates), 20 for the 5% level, and 10 for the 10% level, from Table 3. Similarly, for the staggered experiments the required number of participating laboratories should be 60 for the 1% level, 40 for the 5% level, and 20 for the 10% level, from Table 6.
3 Four-stage nested experiments
3.1 Statistical model and ANOVA
Four-stage nested experiments include four random components; the statistical model of a measurement result can generally be expressed as

y_{ijkl} = \mu + \alpha_i + \beta_{ij} + \gamma_{ijk} + \varepsilon_{ijkl},   (12)

where the symbols are defined similarly as in Eq. (2), however l = 1, ..., r_{ijk}; k = 1, ..., c_{ij}; j = 1, ..., b_i; i = 1, ..., m. The sums of squares SS_A, SS_B, SS_C, and SS_E are obtained similarly as in Eq. (3). The degrees of freedom ν_A, ν_B, ν_C, and ν_E are also obtained similarly as in Eq. (4).
3.2 Occurrence of negative estimates
The number of replications is constant in the balanced nested experiments. Hence, we denote b = b_i (for all i), c = c_{ij} (for all i and j), and r = r_{ijk} (for all i, j and k). The probability of the occurrence of negative estimates can be evaluated similarly to Section 2.2. Based on the structure of the staggered nested experiments, we have b_i = 2, c_{i1} = 2, c_{i2} = 1, r_{i11} = 2, r_{i12} = 1, and r_{i21} = 1 (for all i). The probability of the occurrence of negative estimates can be evaluated similarly to Section 2.3. Ojima (2000) proposed the generalized staggered nested experiments and derived their canonical form. Using the canonical form, the probability of the occurrence of negative estimates can be evaluated in a similar manner to that discussed in Section 2.3.
3.3 Practical considerations for precision experiments
If a precision experiment is conducted after a complete standardization of the measurement method, a three-stage nested experiment may be suitable. Four or more stage nested experiments may be useful to detect some large components of variation in the measurement process. The result of the experiment should be used to refine the process in order to achieve high precision. For such cases, the occurrence of negative estimates is not a serious problem because the most important aim of the experiment is to detect the largest variance component in the process.
4 Conclusion
The probability of the occurrence of negative estimates depends on the degrees of freedom of the mean squares and on the true values of the variance components themselves. Based on this probability and the precision of the estimators, some practical recommendations concerning the number of laboratories can be derived for precision experiments. For balanced experiments, the required number of participating laboratories should be 30 for the 1% level (probability of occurrence of negative estimates), 20 for the 5% level, and 10 for the 10% level. Similarly, for the staggered experiments the required number of participating laboratories should be 60 for the 1% level, 40 for the 5% level, and 20 for the 10% level.
References
1. ISO 5725-3: Accuracy (trueness and precision) of measurement methods and results - Part 3: Intermediate measures of the precision of a standard measurement method, ISO, 1994.
2. Yoshikazu Ojima (1998), General formulae for expectations, variances and covariances of the mean squares for staggered nested designs, Journal of Applied Statistics, Vol. 25, pp. 785-799.
3. Yoshikazu Ojima (2000), Generalized Staggered Nested Designs for Variance Components Estimation, Journal of Applied Statistics, Vol. 27, pp. 541-553.
4. JSA (1972), Statistical Tables and Formulas with Computer Applications, Japanese Standards Association, 1972.
Statistical Methods Applied to a Semiconductor Manufacturing Process
Takeshi Koyama
Tokushima Bunri University, Faculty of Engineering, Sanuki City, 769-2101, Japan
[email protected]
1 Introduction
Quality assurance of semiconductor devices requires the effective utilization of statistical methods. In order to put new products on the market at the earliest possible time, manufacturers are typically forced to apply advanced yet immature technologies. Under this difficult situation, the application of statistical methods designed to prevent post-shipment defects has become vital. The key factors of a quality assurance system, including reliability assurance, are the following: (1) quality design of new products and initial verification; (2) quality improvement in the manufacturing process; (3) customer support and quality information collection; and (4) study and development of fundamental technologies for quality assurance. In this paper, the author first gives an overview of the statistical methods applied to the semiconductor manufacturing process in Section 2. It is essential to improve the quality (yield, in general) from the beginning of the production of a newly developed product to the end of its life, as shown in Fig. 1. As the goal is to increase profit, the priority in the development of an optimized statistical method is the speed of the quality improvement. Therefore, the method that produces positive results in the shortest time is the one that is the most valuable. Next, Section 3 demonstrates an application example of the orthogonal array L16(2^15) used to remove some defects. Section 4 then presents some new proposals for the prevention of quality deterioration caused by materials that deteriorate over time.
Fig. 1: Trend of yield (yield plotted against month).
2 Overview of statistical methods
Presently available statistical methods [1]-[4] are classified as in Fig. 2, based on the possible quantity of available data and the unit cost for making samples, test, evaluation, analysis, etc. The top left area of the figure corresponds to the development and design phases. At this stage, the high cost may prevent us from obtaining a sufficient quantity of data. What must be prepared here are therefore specific circuits and structures that enable effective evaluation despite the limited quantity of samples. Moreover, the development of a certain kind of accelerated life test is essential for short evaluation times. The bottom right area, on the other hand, corresponds to the volume production phase. Here manufacturers can obtain a sufficient quantity of data at a cheaper evaluation cost. In the semiconductor industry, as shown in Fig. 1, the more rapidly the quality improvement is achieved, the more profitable the manufacturer is. Time is thus the most important factor when we detect defects and clear the problem.
Fig. 2: Overview of statistical methods (classified by the available quantity of data and the unit cost; the trial production phase and the volume production phase occupy opposite corners).
3 An experiment with L16(2^15)
It is presented how the application of the L16(2^15) orthogonal array [5] cleared the problem caused by a defect, namely cracking of the passivation layer of a semiconductor device. The results obtained from this experiment led to an amazingly successful improvement of the yield.
3.1 Design of experiment
Table 1 shows the six factors taken into the experiment, which were thought to be closely related to the defect, namely, cracking of the passivation layer. Levels 0 and 1 were determined as shown in Table 1. As it was very difficult to change the condition of the diffusion process for the factors F and D, trials 1-4, 5-8, 9-12, and 13-16 were grouped as 4 groups in the dotted lines in Table 2 and the groups were randomized (split-unit design [4]). The other factors except F and D were randomized within each group. As a result, the sequence of the experiments is as follows: trials 10 (A0 B1 C1 D1 F0 I1), 12, 9, 11, 14, 16, 13, 15, 4, 3, 1, 2, 6, 8, 7, 5, where A0 means level 0 of factor A, namely, 2.0 µm, and so on. Again, every 4 trials, for example trials 10, 12, 9, 11, were experimented under the same conditions regarding factors F and D. The assignment of the factors is shown in Table 2. Columns 1-7 are assigned as the primary group [4] and columns 8-15 as the secondary group [4]. The first-order error variation Se1 is given by columns 5-7 (denoted by e1), and the second-order error variation Se2 by columns 9, 11, and 13-15 (denoted by e2). Then,

Se_1 = S_5 + S_6 + S_7,   (1)

Se_2 = S_9 + S_{11} + S_{13} + S_{14} + S_{15},   (2)

because the variation of column i, S_i, is given by

S_i = \left(T_{(i)1} - T_{(i)0}\right)^2 / 16,   (3)

where T_{(i)1} and T_{(i)0} are the sums of the obtained data corresponding to levels 1 and 0 [4]. The factors were assigned to the columns as shown in Table 2. The factors F, D, F×D and A were assigned to the primary group, and the factors C, B, and I to the secondary group.
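The column variations S_i of Equation (3) are easily computed once the design matrix and the 16 responses are available; since Table 2 with the actual data is not reproduced here, the following sketch uses a generated L16(2^15) array (whose column ordering differs from the standard table) and hypothetical yield values purely to illustrate the computation.

# A sketch with hypothetical responses: column variations S_i of an L16(2^15) array.
import itertools
import numpy as np

runs = np.arange(16)
base = np.array([(runs >> b) & 1 for b in range(4)])        # 4 independent 2-level columns
cols = [np.bitwise_xor.reduce(base[list(s)], axis=0)        # all 15 XOR combinations
        for r in range(1, 5) for s in itertools.combinations(range(4), r)]
design = np.array(cols)                                     # 15 columns x 16 runs, levels 0/1

y = np.random.default_rng(1).normal(70, 10, 16)             # hypothetical yields (%)
S = np.array([(y[c == 1].sum() - y[c == 0].sum()) ** 2 / 16 for c in design])
print(np.round(S, 1))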
Table 1: Assignment of factors
Symbol   Factor                 Level 0              Level 1
A        SiO2 layer             2.0 µm               0.8 µm
B        pre-treatment          used three times     new
C        separation by boron    yes                  no
D        supporting boat        used                 new
F        diffusion tube         silica glass tube    polysilicon tube
I        handling tweezers      stainless steel      Teflon
The interaction [4] F×D between the factors F and D was assigned to column 3, because the effect of factor F differed according to the level of factor D.
3.2 Analysis of the results
The rightmost column of Table 2 shows the results. The datum of trial 10 was missing because the sample wafer broke. Now, let x be the missing value; then the error sum of squares Se is given by

Se = Se_1 + Se_2   (4)
with the values shown in the bottom two rows of Table 2, where Se1 and Se2 are given by Equations (1) and (2). The missing value is estimated as x = 77.6 from dSe/dx = 0, so as to minimize Se. The value (Se1/3)/(Se2/5) = 0.26 is smaller than the critical F-value F(3, 5; 0.05) = 5.41 [6], where 3, 5 and 0.05 are the degrees of freedom of Se1 and Se2 and the percentage point of the F-distribution, so Se1 is pooled into Se2 for the analysis of variance. The result of the analysis of variance is shown in Table 3. Factors D, D×F, F and A are significant, because the calculated F-ratios are larger than the critical F-values F(1, 7; 0.05) = 5.59 or F(1, 7; 0.01) = 12.2 [6], where 1 and 7 are the degrees of freedom and 0.05 and 0.01 are the percentage points of the F-distribution. In this analysis, the total degrees of freedom as well as the error degrees of freedom are reduced by one, because one missing value is estimated.
Table 3: Analysis of variance
Factor   Sum of square   Degree of freedom   Mean square   F-ratio
D        1975.8          1                   1975.8        8.5 *
F        3962.7          1                   3962.7        17.1 **
D×F      2155.3          1                   2155.3        9.3 *
A        4395.7          1                   4395.7        19.0 **
C        924.2           1                   924.2         4.0
B        6.3             1                   6.3           0.03
I        322.2           1                   322.1         1.4
error    1620.5          7                   231.5
sum      15362.7         14
Note: *: larger than F(1,7; 0.05) = 5.59, **: larger than F(1,7; 0.01) = 12.2
The action of replacing the silica glass tubes by polysilicon tubes, which was suggested as a result of this experiment, reduced the defect rate from about 17% to almost 0%.
4 Control of processing time for quality assurance
4.1 Quality influenced by throughput time
Materials such as photo-resist, etching solution, etc. are, in general, degraded as time passes. In order to assure quality, the staying time in processes should be limited. The quality of these materials changes exponentially, as shown in Equation (5):

q(t) = q_0 \exp(-\alpha t),   (5)

where q_0 and α are the initial quality and the coefficient of degradation, respectively. Let q_L be the allowable limit quality; then the allowable time should be less than

t_L = \frac{1}{\alpha}\,\ln\frac{q_0}{q_L}.   (6)

The total of the waiting time and the treatment time should be controlled within t_L.
4.2 Control of batch products
Fig. 3 shows a generalized process standing for, for example, the photolithography process using photo-resist, the etching process using etching solution, etc. in the semiconductor manufacturing factory. When a product to be processed arrives at the process, it joins the waiting queue, unless the process is empty. Although the arrival distribution or the working (treatment) distribution is, in general, expressed by the exponential distribution as well as the Erlang distribution, the gamma distribution

g(t; k, \alpha) = \frac{\alpha^k t^{k-1}}{\Gamma(k)}\,e^{-\alpha t}   (7)

is considered as the more general distribution in this paper. If k is an integer, it becomes the Erlang distribution. Furthermore, if k = 1, it becomes the exponential distribution. For the arrival distribution, k = k_a and α = α_a; for the working distribution, k = k_w and α = α_w.
Fig. 3: A generalized process (arrival → waiting queue (buffer) → process (work station) → departure)
The unit of time may be taken as $1/\alpha_a$. Although the staying time in processes has usually been analyzed by queuing theory [7][8], the condition ρ > 1, i.e., the mean working time exceeding the mean arrival interval, is not considered there, because the number of products in the queue diverges under this condition. Actual queues in the manufacturing process, however, show that periods with ρ > 1 do occur; this necessitates a transient analysis. Transient characteristics are therefore studied by the computer simulation outlined in Fig. 4. In this simulation, n, m, $\bar{t}_f(i)$, and $\bar{n}_w(k)$ are the number of trials, the number of batch products produced continuously, the mean finish time of the i-th product, and the k-th mean number of waiting products, respectively. Figures 5 (a) and (b) show the computed mean numbers of waiting products and Fig. 6 shows the computed mean finish time under the conditions $k_a = 5$, $\alpha_a = 2$, $k_w = 2$; ρ = 1.2, 1.4, 1.6 and ρ = 0.2, 0.4, 0.6; m = 90 and n = 50. The numbers of waiting products increase almost linearly when ρ ≥ 1.2; in contrast, they remain nearly constant when ρ ≤ 0.6. Fig. 6 shows the mean finish time up to a given number of products. From Fig. 6 we can obtain
$t_f = \beta x, \quad (8)$
where $t_f$, β, and x are, respectively, the finish time, $\beta = \bar{t}_f(m)/m$, and the number of batch products produced continuously. Table 4 shows the obtained β.

Table 4: Obtained β

ρ                 0.2     0.4     0.6     0.8     1.0     1.2     1.4     1.6
$\bar{t}_f(m)$    181.6   182     182.7   184.7   195.0   222.3   256.6   292.1

Conditions: $k_a = 5$, $\alpha_a = 2$, $k_w = 2$, m = 90 and n = 50.
Fig. 4: Flow chart of the computer simulation. Notation: n: number of trials; m: number of batch products; $p_{ua}(i)$, $p_{uw}(i)$: i-th uniform random variables for the arrival interval and the working time; $t_a(i)$: i-th arrival interval; $r_a(i)$: i-th arrival time; $t_w(i)$: i-th working time; $t_s(i)$: i-th start time; $t_f(i)$: i-th finish time; $t_{wt}(i)$: i-th waiting time; d(i): flag for the i-th wait; $n_w(k)$: k-th number of waiting products; $s_f(i, j)$: i-th sum of finish times in the j-th trial; $s_{nw}(k, j)$: k-th sum of the numbers of waiting products in the j-th trial; $\bar{t}_f(i)$: i-th mean finish time; $\bar{n}_w(k)$: k-th mean number of waiting products.

The simulation proceeds as follows. Input $k_a$, $\alpha_a$, $k_w$, $\alpha_w$, n, m. For each trial j = 1, ..., n and each batch product i = 1, ..., m, generate the uniform random variables $p_{ua}(i)$ and $p_{uw}(i)$ and transform them into an arrival interval $t_a(i) = g^{-1}(p_{ua}(i); k_a, \alpha_a)$ and a working time $t_w(i) = g^{-1}(p_{uw}(i); k_w, \alpha_w)$ with the inverse gamma function provided in Excel/VBA. Accumulate the arrival, start, waiting, and finish times, sort $t_a(1)$–$t_a(m)$ and $t_f(1)$–$t_f(m)$ in ascending order, and count the numbers of waiting products at k = 1, ..., 2m. Finally, average over the n trials and output $\bar{t}_f(i) = s_f(i, n)/n$ and $\bar{n}_w(k)$.
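The flow chart translates directly into a few lines of code. The sketch below is a minimal re-implementation in Python, using scipy's gamma percent-point function in place of the Excel/VBA inverse gamma function and treating α as a rate parameter (scale 1/α); the paper's exact parameterization and time unit may differ, so this is an illustration rather than the authors' original program.

# Minimal sketch of the Fig. 4 simulation: gamma-distributed arrival
# intervals and working times, a single work station, FIFO queue.
import numpy as np
from scipy.stats import gamma

def simulate_mean_finish_times(k_a, alpha_a, k_w, alpha_w, m=90, n=50, seed=1):
    """Return the mean finish time of each of the m batch products,
    averaged over n independent trials."""
    rng = np.random.default_rng(seed)
    finish_sum = np.zeros(m)
    for _ in range(n):
        # inverse-transform sampling, as in the flow chart
        t_a = gamma.ppf(rng.uniform(size=m), a=k_a, scale=1.0 / alpha_a)
        t_w = gamma.ppf(rng.uniform(size=m), a=k_w, scale=1.0 / alpha_w)
        r_a = np.cumsum(t_a)              # arrival times
        t_f = np.zeros(m)                 # finish times
        free_at = 0.0                     # time the work station becomes free
        for i in range(m):
            start = max(r_a[i], free_at)  # wait if the station is busy
            t_f[i] = start + t_w[i]
            free_at = t_f[i]
        finish_sum += t_f
    return finish_sum / n

# rho = mean working time / mean arrival interval = (k_w/alpha_w) / (k_a/alpha_a)
k_a, alpha_a, k_w = 5, 2.0, 2
for rho in (0.4, 0.8, 1.2, 1.6):
    alpha_w = (k_w / rho) * (alpha_a / k_a)   # choose alpha_w to obtain this rho
    tf_bar = simulate_mean_finish_times(k_a, alpha_a, k_w, alpha_w)
    print(f"rho={rho:3.1f}  mean finish time of last product: {tf_bar[-1]:7.1f}")

As in Figures 5 and 6, the finish time grows roughly linearly in the number of products, with a slope that increases sharply once ρ exceeds 1.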
Fig. 5: Mean number of waiting products versus time, n = 50; panels (a) and (b).
Fig. 6: Mean finish time versus number of batch products, n = 50.
From Equations (6) and (8), we obtain
$x_L = \frac{t_L}{\beta} = \frac{1}{a\beta}\,\ln\frac{q_0}{q_L}. \quad (9)$
Thus, in order to assure quality we should control the number of batch products manufactured in series so that it does not exceed $x_L$ given by Equation (9), which accounts for the deterioration of the material as well as for the number of waiting products.
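A minimal numerical illustration of the resulting control rule (Equations (5), (6), (8) and (9)); the quality parameters and the slope β below are purely illustrative values, not taken from the paper.

import math

# Illustrative parameters: initial quality q0, degradation coefficient a,
# allowable quality limit qL (Equations (5)-(6)), and slope beta = t_f(m)/m
# read off a simulation such as the one sketched above (Equation (8)).
q0, a, qL = 1.0, 0.002, 0.8
beta = 2.5

t_L = math.log(q0 / qL) / a        # allowable staying time, Equation (6)
x_L = t_L / beta                   # allowable number of batch products, Equation (9)
print(f"t_L = {t_L:.1f} time units  ->  at most {math.floor(x_L)} batch products in series")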
5 Summary
Statistical methods applied to the semiconductor manufacturing process have been reviewed. An example with an L16(2^15) orthogonal array is presented, including a split-unit design and the estimation of one missing value. The result of this experiment improved the yield far beyond expectation. Dynamic and statistical characteristics of the time consumed in the manufacturing process are studied by computer simulation, in particular under the condition that the working time is larger than the arrival time interval, with gamma-distributed random variables. The study should be useful for assuring quality in the manufacturing process by preventing quality deterioration over time.
References
[1] Wadsworth, H., K. S. Stephens and A. B. Godfrey: Modern Methods for Quality Control and Improvement, John Wiley & Sons, Inc., 1986.
[2] Montgomery, D. C.: Introduction to Statistical Quality Control, John Wiley & Sons, Inc., 2004.
[3] Beck, J. V. and K. J. Arnold: Parameter Estimation in Engineering and Science, John Wiley & Sons, Inc., 1977.
[4] Logothetis, N. and H. P. Wynn: Quality Through Design, Oxford Science Publications, 1989.
[5] ditto, p. 385.
[6] ditto, p. 419.
[7] Papadopoulos, H. T., C. Heavey and J. Browne: Queuing Theory in Manufacturing Systems Analysis and Design, Chapman & Hall, 1993.
[8] Gross, D.: Fundamentals of Queueing Theory, John Wiley & Sons, Inc., 2000.
An Overview of Composite Designs Run as Split-Plots
Geoff Vining¹ and Scott Kowalski²
¹ Virginia Tech, Department of Statistics, Blacksburg, VA 24061, U.S.A. [email protected]
² Minitab, Inc., State College, PA 16801, U.S.A.
Summary. Many industrial experiments involve factors that are hard-to-change as well as factors that are easy-to-change, which naturally leads to split-plot experiments. Unfortunately, the literature for second-order response surface designs traditionally assumes a completely randomized design. Vining, Kowalski, and Montgomery (2004) outline the general conditions for response surface experiments such that the ordinary least squares estimates of the model are equivalent to the generalized least squares estimates. Vining and Kowalski (2004) use this result to derive conditions for exact tests of most of the model parameters and Satterthwaite's procedure for the other parameters.
This paper summarizes the results of Vining, Kowalski, and Montgomery (2004) and Vining and Kowalski (2004). It illustrates how to modify standard central composite designs to accommodate the split-plot structure. It concludes with a fully analyzed example.
1 Introduction
Many industrial experiments involve situations where some factors have levels that are hard to change along with factors with levels that are easy to change. In such situations, the experimenter should restrict the randomization by fixing the levels of the hard-to-change factors and then running all combinations, or a fraction of all combinations, of the other factors. Such a strategy creates a split-plot design where the experimental unit for the hard-to-change factors is subdivided into experimental units for the easy-to-change factors. Often, we call the hard-to-change experimental units "whole plots" and the easy-to-change experimental units "subplots". This experimental approach uses two separate randomizations: one for the whole plots, and another for each subplot. These two randomizations create two separate error terms. This paper summarizes the results of two recent papers. Vining, Kowalski, and Montgomery (2004) illustrate how to modify standard second-order designs to accommodate a split-plot structure. They derive general conditions
under which the ordinary least squares (OLS) estimates of the model are equivalent to the generalized least squares (GLS) estimates. The required conditions are relatively easy to achieve for many common experimental situations. Vining and Kowalski (2004) restrict their attention to designs where the OLS and GLS estimates of the model are equivalent. They then derive the conditions under which exact tests for the model terms exist. They also illustrate how to use Satterthwaite's procedure for the other parameters.
2 Design Conditions under Which OLS and GLS Estimates Are Equivalent
Let $N = \sum_{i=1}^{m} n_i$ be the total number of runs in our design. If our design is balanced, each $n_i = n$ and $N = mn$. Consider our model in matrix form,
$y = X\beta + \delta + \varepsilon,$
where y is the N × 1 vector of responses, X is the N × p model matrix, β is the p × 1 vector of coefficients, δ is the N × 1 vector of random whole plot errors, and ε is the N × 1 vector of random subplot errors. We assume that δ and ε have means of 0 and that δ + ε has variance-covariance matrix
$\Sigma = \sigma_\delta^2\,\mathrm{diag}\!\left(1_1 1_1',\, \ldots,\, 1_m 1_m'\right) + \sigma_\varepsilon^2 I_N,$
where the length of each $1_i$ is $n_i$, the number of subplot runs within the ith whole plot. The naive ordinary least squares (OLS) estimate of β is $(X'X)^{-1}X'y$, and the generalized least squares (GLS) estimate is $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$. It is important to note that for split-plot experiments, the variance-covariance matrix of the OLS estimate is not $\sigma^2(X'X)^{-1}$; rather, it is
$(X'X)^{-1}X'\Sigma X (X'X)^{-1}.$
The variance-covariance matrix of the GLS estimate is $(X'\Sigma^{-1}X)^{-1}$. The GLS estimate is also BLUE (best linear unbiased). When the OLS and GLS estimates are equivalent, the variance-covariance matrix of the OLS estimates is $(X'\Sigma^{-1}X)^{-1}$,
just as in the GLS case. Vining, Kowalski, and Montgomery (2004) derive the general conditions which guarantee the equivalence of the OLS and the GLS coefficient estimates. Their proof assumes that the model contains the intercept term and that the design is balanced (each whole plot must have the same number of subplot runs). The proof is quite general: it makes no assumption about the order of the model to be fitted, only that the design used supports the intended model. All of the other conditions are on the designs for the easy-to-change factors within each whole plot. Let $[\,1_i \;\; W_i \;\; S_i\,]$ denote the rows of X corresponding to the ith whole plot, where $W_i$ is the ith whole plot design and $S_i$ is the ith subplot design. If
$1_i' S_i = c' \quad \text{for } i = 1, 2, \ldots, m, \qquad (1)$
that is, if the column totals of $S_i$ are the same for every whole plot, then the OLS and GLS estimates of the model are equivalent. One strategy that achieves Condition (1) is to use exactly the same design in the subplot factors within each whole plot. Letsinger, Myers, and Lentner (1996) called such a design a "crossed bi-randomized" design, and they provide a proof of the equivalence of OLS and GLS for this specific structure. Bisgaard (2000) would call such a design a "Cartesian product design." Often, such experiments are needlessly large. However, they do meet the conditions for the equivalence of the OLS and GLS coefficient estimates. It should not be a surprise that this strategy yields the equivalence result: split-plot experiments using categorical factors generally follow this strategy, and it is well known that OLS is an appropriate basis for estimating the model parameters. Condition (1) does not require that we use exactly the same subplot design within each whole plot. If the model in terms of the subplot factors is first-order or first-order plus interactions, then another strategy uses an orthogonal design in the subplot factors within each whole plot. One does not need to use the same orthogonal design within each whole plot. Bisgaard's (2000) partial confounding designs fall into this category, which explains why OLS is a perfectly acceptable basis for estimating the coefficients for his normal probability plots.
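The equivalence is easy to check numerically. The sketch below builds a small balanced split-plot design that uses the same 2² subplot design in every whole plot (so Condition (1) holds), forms Σ for arbitrary variance components, and verifies that the OLS and GLS estimates coincide for a random response; the factor layout and variance values are illustrative, not taken from the paper.

# Numerical check of the OLS = GLS equivalence for a toy split-plot design:
# one whole-plot factor w (4 whole plots), the same 2x2 factorial in the
# subplot factors s1, s2 inside every whole plot.
import numpy as np

rng = np.random.default_rng(0)

w_levels = np.array([-1.0, -1.0, 1.0, 1.0])        # whole-plot factor settings
sub = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)  # same S_i everywhere
m, n = len(w_levels), len(sub)                     # m whole plots, n subplots each

rows = []
for w in w_levels:
    for s1, s2 in sub:
        rows.append([1.0, w, s1, s2, w * s1, w * s2, s1 * s2])
X = np.array(rows)                                 # intercept, first-order terms, interactions

# Sigma = sigma_delta^2 * blockdiag(1 1') + sigma_eps^2 * I
sigma_delta2, sigma_eps2 = 2.0, 0.5                # arbitrary variance components
V = np.kron(np.eye(m), np.ones((n, n)))
Sigma = sigma_delta2 * V + sigma_eps2 * np.eye(m * n)

y = X @ rng.normal(size=X.shape[1]) + rng.multivariate_normal(np.zeros(m * n), Sigma)

Sigma_inv = np.linalg.inv(Sigma)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print("max |OLS - GLS| =", np.abs(b_ols - b_gls).max())   # numerically zero

Because the subplot columns sum to zero within every whole plot, the column space of X is invariant under Σ, which is exactly why the two estimators agree.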
Condition (1) is actually more restrictive than required. Let $S_i = [\,S_{bi} \;\; S_{Ri}\,]$, where the column totals of $S_{bi}$ equal the same constant k for i = 1, 2, ..., m, and $S_{Ri}$ is the "remainder" of $S_i$. If there exists a matrix G such that
$1_i 1_i' S_{Ri} = S_{Ri} G, \qquad (2)$
then the OLS and GLS estimates of the model are equivalent.
then the OLS and GLS estimates of the model are equivalent. I t is not a trivial exercise to find G t o achieve Condition (2); however, it does exist for many practical situations. The Vining, Kowalski, and Montgomery (2004) results are quite exciting for second-order response surface designs. Previous articles (Letsinger, Myers, and Lentner (1996), Draper and John (1998), and Trinca and Gilmour (2001) all assume that one could not achieve the equivalence of the OLS and GLS estimates of the model. They all focus on GLS estimates of the secondorder model and assume the need for asymptotic tests of the coefficients. The OLS - GLS equivalence for the estimated parameters makes the mathematics tractable for deriving exact tests (Vining and Kowalski 2004).
3 Second-Order Composite Designs That Achieve the OLS-GLS Equivalence
The central composite design (CCD) is the most commonly used second-order response surface design. It consists of three parts:
- a 2^k factorial or a Resolution V fraction of a 2^k factorial design,
- a series of "axial" runs, and
- a series of center runs.
Table 1 gives a standard three-factor CCD. The three factors are x1, x2, and x3. The first eight runs are a 2³ factorial, the next six runs lie on the axes formed by the three factors, hence the name axial runs, and the last five runs are all at the center value of the factors. Typical values for α are 1 and the fourth root of the number of factorial runs, which yields a rotatable CCD. A rotatable CCD is one where the prediction variances of any two points equidistant from the design center are the same.
The second-order model for this situation is
$y = \beta_0 + \sum_{i=1}^{3}\beta_i x_i + \sum_{i=1}^{3}\beta_{ii} x_i^2 + \sum_{i<j}\beta_{ij} x_i x_j + \varepsilon.$
The 2³ factorial portion of the design allows the estimation of the main effects and the two-factor interactions. The axial runs in conjunction with the center runs allow the estimation of the main effects and the pure quadratic terms. We first consider how to modify this CCD to achieve Condition (1). Let w1 be the single hard-to-change factor, and let s1 and s2 be the two easy-to-change factors. We can achieve Condition (1) if we choose the levels so that the s1² and s2² terms are actually whole plot effects. Table 2 gives such a design. Within each of whole plots 1 and 2 is a full 2² factorial experiment in the subplot factors. Taken together, whole plots 1 and 2 form a full factorial in the single whole plot factor. Whole plots 3-6 are the axial runs. This particular construction effectively makes the pure quadratic terms for the two subplot factors whole plot effects, which is the key to achieving Condition (1). Whole plots 7-9 are the center runs.
Table 1. A Standard Three-Factor Central Composite Design
 x1   x2   x3
 -1   -1   -1
  1   -1   -1
 -1    1   -1
  1    1   -1
 -1   -1    1
  1   -1    1
 -1    1    1
  1    1    1
 -α    0    0
  α    0    0
  0   -α    0
  0    α    0
  0    0   -α
  0    0    α
  0    0    0
  0    0    0
  0    0    0
  0    0    0
  0    0    0
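For reference, the sketch below generates this standard CCD programmatically; the rotatable value α = (number of factorial runs)^(1/4) mentioned above is the default, and the helper name is our own, not from the paper.

# Build a standard central composite design: 2^k factorial points,
# 2k axial points at distance alpha, and n_center center points.
import itertools
import numpy as np

def central_composite(k=3, n_center=5, alpha=None):
    if alpha is None:
        alpha = (2 ** k) ** 0.25          # rotatable choice: 4th root of factorial runs
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    axial = np.zeros((2 * k, k))
    for j in range(k):
        axial[2 * j, j] = -alpha
        axial[2 * j + 1, j] = alpha
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])

print(central_composite())                # 8 + 6 + 5 = 19 runs, matching Table 1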
The design given in Table 2 has several desirable characteristics. First, the OLS estimates of the model are equivalent to the GLS estimates. Second, Vining and Kowalski (2004) establish that all the tests on the parameters are exact. Unfortunately, this design requires two whole plots to execute the axial runs for the subplot factors. Please keep in mind that whole plots are expensive either in terms of time or money. As a result, it is always desirable to minimize the total number of whole plots, within reason. Third, this design produces "pure error" estimates of the variance components.
Table 2. A CCD with One Hard-to-Change and Two Easy-to-Change Factors that Achieves Condition (1)
Whole Plot   w1   s1   s2        Whole Plot   w1   s1   s2
error" estimates of the variance components. Vining and Kowalski (2004) establish lack of fit tests using these pure error estimates. We next consider a central composite design (CCD) with two hard-tochange factors ( w l , w2) and two easy-to-change factors ( s l , sz). Table 3 gives a design that attempts to minimize the total number of whole plots by running all of the subplot axial runs in a single whole plot. We need t o use both Conditions (1) and (2) to show that the OLS and GLS estimates of the model are equivalent for this design. Vining and Kowalski (2004) establish exact tests for all of the model parameters except the subplot pure quadratic terms.
Table 3. A CCD for Two Hard-to-Change and Two Easy-to-Change Factors
Whole Plot   w1   w2   s1   s2       y
 1           -1   -1   -1   -1    80.40
 ...
 7            0   -1    0    0    80.07
 ...

4 Exact Tests for the Model Parameters
Vining and Kowalski (2004) derive exact tests for the model terms when the design achieves the OLS-GLS equivalence. We need appropriate estimates of the two variance components. Consider the model
$y = M \mu_{wp} + S\gamma + \varepsilon^{*},$
where $\mu_{wp} = [\mu_1, \mu_2, \ldots, \mu_m]'$ is the m × 1 vector of means for the whole plots, S is the model matrix for the subplot terms, and γ is the vector of subplot coefficients. Let $X^{*} = [M \;\; S]$.
$SS_{res,s} = y'\left[I - X^{*}(X^{*\prime}X^{*})^{-}X^{*\prime}\right]y, \quad (3)$
where $(X^{*\prime}X^{*})^{-}$ is a generalized inverse of $X^{*\prime}X^{*}$. Vining and Kowalski (2004) assume normality and establish that $SS_{res,s}/\sigma_\varepsilon^2$ follows a $\chi^2$ distribution with $df_{res,s} = mn - \mathrm{rank}[X^{*}(X^{*\prime}X^{*})^{-}X^{*\prime}]$ degrees of freedom. Thus, an appropriate error term for testing the "purely" subplot effects is $MS_{res,s} = SS_{res,s}/df_{res,s}$.
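The subplot residual of Equation (3) is easy to compute once M and S are in hand. The sketch below does so for a toy split-plot layout; the whole-plot sizes and response values are placeholders, not the Table 3 data.

# Compute SS_res,s = y'[I - X*(X*'X*)^- X*']y and its degrees of freedom,
# where X* = [M S]: M is the whole-plot incidence matrix and S the subplot
# model matrix.
import numpy as np

def subplot_residual(M, S, y):
    X_star = np.hstack([M, S])
    P = X_star @ np.linalg.pinv(X_star.T @ X_star) @ X_star.T   # projection onto col(X*)
    resid = y - P @ y
    ss = float(resid @ resid)
    df = len(y) - np.linalg.matrix_rank(P)
    return ss, df

# toy layout: m = 3 whole plots of n = 4 runs each, two subplot factors
m, n = 3, 4
M = np.kron(np.eye(m), np.ones((n, 1)))                    # whole-plot incidence matrix
S = np.tile([[-1, -1], [1, -1], [-1, 1], [1, 1]], (m, 1))  # same 2x2 design per whole plot
y = np.random.default_rng(1).normal(size=m * n)

ss_res_s, df_res_s = subplot_residual(M, S, y)
print(f"SS_res,s = {ss_res_s:.4f} with {df_res_s} degrees of freedom")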
"Purely" subplot effects are those completely orthogonal t o all of the whole plot terms. Vining and Kowalski show that when the design achieves the OLS-GLS equivalence, the resulting tests are exact. For some designs, not all of the nominal subplot effects are completely orthogonal to the whole plot effects nor are they essentially whole plot effects. There are no exact tests for such terms. However, we can generate approximate tests based on Satterthwaite's procedure. Let X 1 be X with the column associated with the term of interest removed. T h e appropriate linear combination of the variance components is given by trace
[[x(x'x)-Ix'
-
x ~ ( x ~ x I ) -] ~ . x~]E
This linear combination then becomes the error term for testing. Satterthwaite's procedure allows the basis for determining the approximate degrees of freedom. Next, consider the model
where XL 1s the model matrix for all terms that are essentially whole plot effects for a t least one whole plot, P, is the vector of coefficients associated with these terms, X, is the model matrix for all the terms t h a t are strictly subplot effects, and y, is the vector of coefficients for these terms. A nominal subplot effect is a part of X; if there is a t least one whole plot where the term does not change value. Let S&be the model matrix of nominal subplot effects t h a t are part of XG and let W be the model matrix for the true whole plot effects; thus, XL = [W SG].Let S, be the matrix whose nonzero elements are the submatrics of SL that are essentially whole plot effects. Finally, let X, = [W S,]. Vining and Kowalski (2004) define S S r e s ,by ~
They again assume normality and establish that $SS_{res,w}$, suitably scaled, follows a $\chi^2$ distribution with $df_{res,w} = m - \mathrm{rank}[X_{w}(X_{w}'X_{w})^{-}X_{w}']$ degrees of freedom if the design meets the conditions for OLS-GLS equivalence. They further establish that $MS_{res,w} = SS_{res,w}/df_{res,w}$ is an appropriate error term for all whole plot effects.
5 Example
Consider the experiment summarized in Table 3. From Equation (3), we obtain that $SS_{res,s}$ is 2.117755 with 28 degrees of freedom; thus $MS_{res,s}$ is 0.07563. From Equation (4), we obtain that $SS_{res,w}$ is 27.02772 with 6 degrees of freedom; thus $MS_{res,w}$ is 4.65462. All of the model terms have exact tests except the two subplot pure quadratic terms. The appropriate linear combination of the variance components for these two tests is $0.704878\,MS_{res,s} + 0.195122\,MS_{res,w}$. Satterthwaite's procedure yields 6.82 approximate degrees of freedom. Table 4 summarizes the coefficient estimates using OLS, the estimated standard errors, and the test statistics.
Table 4. Estimated Coefficients Using OLS and Summary of Tests
Term         Estimated Coefficient   Standard Error        t        P
Intercept           74.9055              0.4968         150.77   0.0000
w1                   4.5579              0.4404          10.35   0.0000
w2                  -6.5592              0.4404         -14.89   0.0000
w1²                  1.7381              0.8077           2.15   0.0314
w2²                 -0.5407              0.8077          -0.67   0.5032
w1w2                 0.8431              0.5394           1.56   0.1180
s1                  -4.973               0.0648         -76.72   0.0000
s2                   4.0922              0.0648          63.13   0.0000
s1²                 -2.3864              0.5486          -4.35   0.0000
s2²                  2.5736              0.5486           4.69   0.0000
s1s2                -1.0394              0.0688         -15.11   0.0000
w1s1                 1.4356              0.0688          20.88   0.0000
w1s2                -1.4794              0.0688         -21.52   0.0000
s1w2                -1.0019              0.0688         -14.57   0.0000
s2w2                 1.9856              0.0688          28.81   0.0000
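The t statistics in Table 4 are simply the OLS estimates divided by their standard errors, and the approximate degrees of freedom for the subplot pure quadratic terms come from Satterthwaite's standard formula. The sketch below implements that formula; the pairing of the two weights with the two mean squares follows our reading of the example above, and small rounding in the quoted values means the result only approximately matches the 6.82 reported.

# Satterthwaite approximation: for an error term that is a linear combination
# sum_i a_i * MS_i, the approximate degrees of freedom are
#   df = (sum_i a_i*MS_i)^2 / sum_i [(a_i*MS_i)^2 / df_i]
def satterthwaite_df(weights, mean_squares, dfs):
    num = sum(a * ms for a, ms in zip(weights, mean_squares)) ** 2
    den = sum((a * ms) ** 2 / df for a, ms, df in zip(weights, mean_squares, dfs))
    return num / den

MS_res_w, df_w = 4.65462, 6      # whole-plot error term (Equation (4))
MS_res_s, df_s = 0.07563, 28     # subplot error term (Equation (3))

df_approx = satterthwaite_df([0.195122, 0.704878], [MS_res_w, MS_res_s], [df_w, df_s])
print(f"approximate df for the subplot pure quadratic tests: {df_approx:.2f}")

# t statistics in Table 4 are estimate / standard error, e.g. for s1:
print(f"t(s1) = {-4.973 / 0.0648:.2f}")   # about -76.7, matching the table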
6 Conclusions
An important question facing many industrial practitioners is how to conduct response surface experiments properly when faced with restrictions on the
randomization. This paper summarizes the conditions under which the OLS and GLS estimates of the coefficients are equivalent. Often, these conditions are easy to achieve. This paper also summarizes the appropriate tests when the OLS-GLS equivalence holds. For almost all terms, these tests are exact.
References
1. BISGAARD, S. (2000). "The Design and Analysis of 2^{k-p} × 2^{q-r} Split Plot Experiments". Journal of Quality Technology 32, pp. 39-56.
2. DRAPER, N. R. and JOHN, J. A. (1998). "Response Surface Designs where Levels of Some Factors Are Difficult to Change". Australian and New Zealand Journal of Statistics 40, pp. 487-495.
3. LETSINGER, J. D.; MYERS, R. H.; and LENTNER, M. (1996). "Response Surface Methods for Bi-Randomization Structures". Journal of Quality Technology 28, pp. 381-397.
4. TRINCA, L. A. and GILMOUR, S. G. (2001). "Multistratum Response Surface Designs". Technometrics 43, pp. 25-33.
5. VINING, G. G. and KOWALSKI, S. M. (2004). "Exact Inference for Response Surface Designs within a Split Plot Structure". Submitted for publication.
6. VINING, G. G.; KOWALSKI, S. M.; and MONTGOMERY, D. C. (2004). "Response Surface Designs within a Split-Plot Structure". Submitted for publication.