This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
j (j = l,2,---,M) if there exists an integer value / (j
+1
components are in state
j + ] or above,..., or at least kM components are in state M or above. On the other hand, Zuo et al. [10] proposed the definition of the generalized multi-state £-out-of-«: F system as follows. Definition.2 (Zuo et al [10])
out-of-«:F system. The condition in this definition can also be phrased as follows: at least k components are in states below j , and at least k
• K such that for any B e *}3 (E) , {0|£(<9)c B,B eq3(K)} e (xl,x2,-,Xn) = Cr(9e®:{i(e)<xl,{2(d)<x2,-,{„(8)<xn)
+1
components are in states
below j + l,..., and at least kM components are in states below M. And, the multi-state &-out-of-« systems will be particularly considered as a decreasing (increasing) multi-state k-ovk-of-n system when kt){=n)>k{ >k2 >--->kM (ka(=0)
110 system by similar manner of a increasing multi-state k-out-of-n: G system. Yamamoto et al. [11] proposed recursive formula for evaluation of system state distribution of a generalized (that is, including the decreasing, increasing and other non-monotonic) multi-state &-out-of-«: G system. 3. Theorem In this study, we provide a recursive algorithm for evaluating the system state distribution of a generalized multi-state k-out-oi-n: F system. This recursive algorithm is useful for any multi-state &-out-of-«: F system, including the decreasing, increasing and other non-monotonic multi-state fc-out-of-»: F system. Next, we define the following notations. For i = \,•••,«, j = \,2,---,M, let p.. be probability that component i is in statey. And, we denote by f^\i) the probability that the state of the multi-state A-out-of-j: F system is state j , and by FiJ)(i) that the probability the states of the multi-state A>out-of-z: F system is in below state j . Let x = (xl,x2,---,xM) be M-dimensional vector which is explained below in detail, where x, = 0,\,2,---,k,,kl,kl
(/ = 1,2,---,M), where
kt and k, are not numbers but symbolic characters. Furthermore, we define the event
SF(i,l;x,)
that the following
JC,=0,1,2,—, * „ * „ * , ( / = 1,2,—, M ) , i = l,—,n, 1)
four
conditions
apply,
for
; = 1,2,---,M.
x, components at state below / occur from component 1 to /' if x, =0,l,---,A; / -l,and z = l,2,---,«.
2) " k, components at state below / occur from component 1 to i," and "the state of component i is in some state below /" if x,=kr
and
/ = l,2,--,«. 3) "at least k, components at state below / occur from component 1 to / - 1 " if x: = kn and ;' = 2, •••,«; null event if xt = kt and i = 1. 4) "at least k, components at state below / occur from component 1 to i" if Xj = kj, and i =
\,--,n-
In the definition of 5 F 0',/;x ; ) , for i(i<0), we suppose that such hypothetical component / exists, and its states are always M +1 hypothetical component state. From this definition, FU){i) can be shown as follows,
Ill
Furthermore, for x, = 0,1,2,-••,*„*,,*, (/ = 1,2,---,M) and i = 0,l,---,n, Pr\XjF(U,x,)},
P(i;x):
if / = l , - , » , a n d
fl if x = (0,0,---,0), P(0;x) = \ [0 if x * (0,0,---,0), A(x): {/|x, =l,2,---,*,-l(/ = l,2,---,M)}; that is, {l\x,± 0, k„ k, (I = 1,2,• • •, j)}, for x, = 0,1,2,-•-,k„k, (I = 1,2,--,j) B(x):
{l\x,=0(l
=
., . A if J = 0 .
and j = l,2,---,M.
l,2,-,M)},
for x,=0,l,2, —,k„k! (1 = 1,2, —,j)
and j = l,2, — ,M.
b(x) : max{/ e B(x)} when B(x)^
{l\x,=k,
(l =
l,2,-,M)},
for *,= 0,1,2,-,*,,*, (/ = 1,2,-,./') a n d j = l,2,---,M. Cardinality of A(x)\jC(x)- For g = 1,2,••-,/«,
m:
/ g : Member of A(x) U C(x); that is, lg
e
^ ( x ) U C(x). We let /, < l2 < • • • < /,„
and suppose /0 is b(x) and /m+1 is M +1, for simple equations. £c
mm
:
gc
mm =
aI
"g
m i n llg g:l
€
C
(X)}
when C(x) * ^ and /« +1 when C(x) =
=2
from
^f(x) = {2,5},
x = (1,1,2,3,4) , we can get
B(x) = {l} , C(x) = {4}. Similarly, when b(x) = 0 , C(x) = >
and
gcmin=5
from
A(x) = {1,2,3,5}, B(x) = 0, C(x) = >, m = 4 and /, =l,/ 2 = 2 , / 3 = 3 , / 4 =5 . And, let Q(i;a,P)
p.( . Then probability / 0 ) ( / ) and
be ^
FU)(i)
can be given in Theorem 1. Theorem l)For j = l,2,--,M / and,
O )
and i = l,2,---,n, (0=Z-
E
£tP(i;xls--,xJ_l,xJ,~l,---,'kZ),
(1)
112
(2) where K, ={0,1,-,*, -1,*,} (l =
l,-,M).
2) For / = l,2,--,«,if x, = k,, ' ( l i x i > " ' >xi>~" >XM) 3)For i = l,2,-..,n
=
"(tlxis'">kl)---,xAf)
+ P(i\xl,- • • ,k,,- • • ,xM).
(J)
and x, = 0,l,-~,k„k, (/ = 1,2,---,M),
0
ifx^ >x, 2 (/*
(4)
P0;x) = !e_mi„-'
$V0'-l;yfex))Qg
otherwise,
where
a=20;/,,/g+1-i) and
(5)
y(g;x) = (>>,(g;x),j/ 2 (g;x),-,y u (g;x)), (0
ifleB(x),
x,-l yi(g;*)
\fl>lgandleA(x)[jC(\),
x,
if/g>/and/€^(x)UC(x),
k,
i f / e { / | x ; =k~i),
(6)
for / = 1,2, • • •, M . As the boundary condition, for / = 0, P(f\*)>-
fl 0
if x = (0,0, •••,()), ifx*(0,0,--,0).
(7)
Using theorem, the proposed algorithm consists of the following steps, for computing Fu\n) and fU)(n) for j = l,2,---,MSTEP 0 : (Setting initial value) Set /' = 0 and />(0;x) = from equation (7). STEP1 Set / = / + !.
1
ifx = (0,0,--,0),
0
ifx*(0,0,---,0),
113 Obtain P(i;\)s for all X such that xt does not take kt for all / (/ = 1,2,---,M), by equations (4), (5) and (6). Go to STEP 2. STEP 2 : Obtain R(i; x) s for all X such that x, takes k, (/ = 1,2,• - , M ) , by equations (3). Go to STEP 3.
for some /
STEP 3 : Go to STEP 1 if i
and Fu\n)
for j = l,2,---,M
by
4. Evaluation of our Algorithm We evaluate the orders of computation time and memory size for the proposed algorithm. First, we consider the order of computation time. For each /', in order to compute
P(i;\) s for all X such that x,
takes k,
for some / M
times. The
x, = 0,l,---,/fc;,/fc;
for all /
(/ = 1,2,--,M), we must use equation (3) a maximum of 2 number of P(i;x) 's for all
X such that
(/ = 1,2,-",A/) is TT_(& ; +2)- Therefore, the order of computing and Fu\i)
for j = \,2,—,M
is 0{n(2M+Y\^_]kl))
fU){ri)
by equations (1) and
(2). That is, the order of computation time is of exponential of M and polynomial of n. The maximum memory size required for computing P(i; \) is 2T\M (k, +3), because we need to have r~TM (&, +3) entries for z'-l and / at the same time. Therefore, the order of the required memory size is 0{\~\
k,) •
The required memory size is also of an exponential order of M, but does not depend on n. We performed a numerical experiment in order to compare the proposed algorithm with enumeration method. All the experiments were executed using a Pentium-M (1.3GHz) computer with 768MBytes of RAM, MS-Windows 2000, Visual C++.NET and C language programming. For the first numerical experiments, we consider the six-state k-ovX-oi-n: F system with kx = 6, k2 = 3, &3 = 4 , A:4=5 , k5=2 , k6 = l and « = 6,---,200 including non-i.i.d. components. The state distribution of each component is pj0 = 0.35, pn = 0.2, p,2=0A5,
/7,3 =0.12, /> j 4 =0.1, pi5=0.05,
pi&=0.03
when i is even;
and />„, = 0 . 3 , Pn =0.15, p / 2 = 0 . 1 4 , p H = 0 . 1 3 , / J M = 0 . 1 2 , ft5=0.11, /?j6 = 0.05 when i is odd. We calculated the system state distribution of a
114
six-state k-out-of-n: F system, and we compared computation time by using our proposed algorithm and an enumeration method, as shown in Table 1. The averages are the results from five trials for each n value. In Table 1, we omitted computation times by enumeration method when it took n > 30. Our proposed algorithm is effective than the enumeration method when n > 7. Clearly, the proposed algorithm is more efficient when the number of components n is large. From these results, we see that the proposed algorithm is very efficient for evaluating the system state distribution and enables the system state distribution in the case of large n values to be calculated. Table 1. System state distribution of a six-stateft-out-of-fl:Fsystem, and computation time n 6 7 8 9 10 20 30 50
/ » 0.001158 0.005524 0.017072 0.035752 0.068034 0.675101 0.956849 0.999800
/ » 0.434869 0.624209 0.774178 0.836153 0.863784 0.324708 0.043151 0.000200
/ » 0.062601 0.072761 0.064758 0.052402 0.035705 0.000185 0.000000 0.000000
/*>(„) / ' » 0.082791 0.081412 0.057693 0.038377 0.020115 0.000007 0.000000 0.000000
0.418474 0.216075 0.086296 0.037315 0.012363 0.000000 0.000000 0.000000
ioo i.ax)oooo.(xxxx)0o.rjrj(xx»o.(xxx)ooo.oorx)0o
/ < » 0.000107 0.000019 0.000002 0.000000 0.000000 0.000000 0.000000 0.000000
/*>(*) 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 0.000000
150 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 200 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
A | —
^ g
& )
Enu^tion
0.146 sec. 0.258 1.092 sec. 0.306 8.148 sec. 0.350 60.535 sec. 0.401 7.441 min. 0.449 0.924 about 7323 years 1.398 2.343 4.703 7.068 9.438 -
5. Conclusions In this study, we proposed recursive algorithm for the system state distribution of multi-state £-out-of-w: F system. We evaluated the proposed algorithm in terms of the orders of computation time and memory size requirements. Numerical experiments showed that the proposed algorithm is very effective for evaluating the system state distribution of the multi-state k-ouX-oi-n: F systems when n is large. In addition, we can provide the system state distribution of a generalized multi-state A-out-of-«: G system when ks 's are nearer to n, from definitions of a multi-state £-out-of-/7: G system and multi-state A>out-of-«: F system. From this, we can obtain the system state distribution of a generalized multi-state k-out-of-n system by using Yamamoto et al. [11] and the proposed algorithm in this study.
115 References 1. R. E. Barlow and F. Prochan, Statistical Theory of Reliability and Life Testing: Probability Models, Holt, Rinehart and Winston, New York (1975) 2. G. J. Chang, L. Cui and F. K. Hwang, Reliabilities of Consecutive-k Systems {Network Theory and Applications, Volume 4), Kluwer Academic Publishers, Dordrecht (2000) 3. M. T. Chao, J. C. Fu and M. V. Koutras, Survey of Reliability Studies of Consecutive-A-out-of-n: F Related System, IEEE Trans, on Reliab. 44(1), 120-127(1995) 4. W. Kuo and M. J. Zuo, Optimal Reliability Modeling: Principles and Applications, John Wiley & Sons, New Jersey (2002) 5. K. Kofowrocki, Reliability of Large Systems, Elsevier, United Kingdom (2004) 6. R. E. Barlow and A. S. Wu, Coherent systems with multi-state components, Math. Operations Research, 3 (4), 275-281 (1978) 7. E. El-Neweihi, F. Proschan and J. Sethuraman, Multi-state coherent system, J. Appl. Prob., 15, 675-688 (1978) 8. W. S. Griffith, Multistate reliability models, J. Appl. Prob. 17, 735-744 (1980) 9. J. Huang, M. J. Zuo and Y. H. Wu, Generalized multi-state A:-out-of-«: G systems, IEEE Trans, on Reliab. 49(1), 105-111 (2000) 10. M. J. Zuo, J. Huang and W. Kuo, Chapter 1: Multi-state k-owi-of-n Systems, Handbook of Reliab. Eng. (Ed. Pham, H.), SPRINGER, Berlin, 3-17(2003) 11. H. Yamamoto, T. Akiba and H. Nagatsuka, Efficient methods for the system state distribution of Multi-state k-out-of-n: G Systems (Submitted) (in Japanese)
DECISION SUPPORT SYSTEM FOR RELIABILITY ANALYSIS OF PROCESS PLANT SHUTDOWNS MASSIMO BERTOLINI Dipartimento
di Ingegneria Industrials Universita degli Studi di Parma, Viale delle Scienze, 181/A, 43100 Parma, Italy MAURIZIO BEVILACQUA
Dipartimento di Ingegneria delle Costruzioni Meccaniche, Nucleari, Aeronautiche e di Metallurgia, sede di Forli, Universita degli Studi di Bologna, Via Fontanelle 40, 47100 Forli, Italy This paper describes a Decision Support System (DSS) to assess the influence of planned and unplanned shutdowns and periodic maintenance activities on the overall reliability of complex industrial plants. The methodology here proposed allows to evaluate the effects of plant shutdown on item reliability. The analysis has been carried out on fifteen process plants of an Italian oil refinery for a three-year period from 2001 to 2003. The outcomes of the analysis show that the plant restart represents an important criticality factor for plant operation from a reliability point of view, highlighting an increase in item failure and the subsequent growth of corrective maintenance costs. The present paper describes the application of a methodology to adjust plant operation start up, whose results, applied to API oil refinery in Falconara Marittima (Ancona, Italy), show an improvement in plant reliability in the year 2003.
1. Introduction Reliability of equipment in process industries represents a very important issue to ensure correct plant operation from both a safety related and economic related point of view. The assessment of item failure frequencies is the core step for defining and implementing a risk-based approach to plant life management as discussed by Nilsson [1]. Jovanovic [2] in a recent paper discusses the best practices for risk-based inspection and life management of process plants in Europe, analyzing the evolution from the traditional approach to risk aware solutions based on the condition of components and related risk, and emphasizing the necessity of focusing the inspection activities on critical elements so as to improve the overall plant management. The development of risk-based management strategy for complex process plants is analyzed by Krishnasamy et al. [3]: the proposed methodology allows to estimate the risk caused by unexpected failures and their consequences. The maintenance strategy selection based on the risk approach, taking simultaneously into account economic and safety aspects, is also proposed by Kumar [4] as an effective tool to manage complex automated systems. 116
117 Goel et al. [5] propose a tool to help the management to define the equipment reliability at the design stage by balancing the associated costs with the impact on the design, maximizing the process reliability subject to budget constraints. The authors recognize how unplanned equipment shutdowns due to equipment failures can severely affect complex process plant availability. The role of equipment reliability becomes fundamental for all process plants, especially when the industry's activities can heavily impact on the safety of the environment and the people: the ability to forecast incorrect item operation can thus provide the management with relevant data to increase plant availability. Plant working conditions are strictly dependent on process variables: regular operation can provide relevant benefits from point of view of availability increase or consolidation. On the other hand, transient operation can compromise equipment reliability with heavy consequences on plant availability and running costs. This paper aims at evaluating the effects of plant shutdowns on equipment reliability of an oil refinery. The analysis has been carried out on the most important oil refinery production plants, for a three-year period from 2001 to 2003. The influence of planned and unplanned shutdowns will be evaluated for different processing units and critical items as a function of the operating conditions, finally analyzing the time of failure distribution after a plant shutdown. 2. Objective of the Work The analysis described in this paper aims at describing a formalized tool to analyze planned and unplanned shutdown effects on item reliability in the process plants of an oil refinery from the point of view of (i) failure count, and (ii) failure temporal distribution. The paper is structured in the following steps: (i) process plant shutdown identification, (ii) analysis of shutdown effects on item failure, and (iii) failure timing distribution. The process plant shutdown identification has been performed by analyzing the process feed mass flow rate. Our analysis focused in particular on the most relevant oil refinery process production plants like Topping, Vacuum, Visbreaking, Naphtha splitter, Unifining, Platforming, Desulphuration. Data were collected for each process plant over a three-year period from 2001 to 2003, gathering the input mass flow rate through the measures performed by the load flanges and P&I instruments such as flow controllers and indicators. Fig. 1 shows, as an example, the mass flow rate load profile obtained for Vacuum distillation unit during 2001.
118 Vacuum - year 2001
Figure l. Vacuum plant inlet flow rate load profile.
The oil refinery process plant stoppages have been classified in two different groups according to the decrease in input mass flow rate and the length of the stoppage. With the term "shutdown" we refer to a long stoppage with a resulting production blockage, while with the term "slowdown" we refer to either a period when the input mass flow rate decreases to 25% of its mean value or a temporary stoppage. The relevant data to draw the mass flow rate load profile for the process plants were obtained from the oil refinery database, questioned through the formulation of specific queries. The results of these queries were further filtered so as to eliminate the presence of dummy stops due to instrumentation errors or data transmission problems. Once the load profiles had been opportunely filtered, the stoppages were assigned their specific causes. Plant failure data were deduced analyzing the "work order", a document used in the oil refinery to request maintenance intervention. In particular, the corrective maintenance work orders were taken into consideration, relating to an item normal operation restoration following a failure. Owing to the high number of plant item types, the analysis was restricted to eight item classes, chosen according to the oil refinery management staffs opinion as those potentially mostly affected by unplanned stoppages consequences in terms of failure criticality. The following briefly described items have been considered. • •
mechanical joint (a flange coupling whose seal is guaranteed by a flat or spyrometallic gasket); area (with this term we refer to the request of maintenance interventions to regulate potentially dangerous process fluid loss from P&I components);
119 • • • •
•
•
controller (automatic device to control an item output signal according to the input signal); strain (piping element installed to protect instrumentation and equipment such as control valves); flame arrestor (safety component made of a metallic net to block backfire. Normally placed on burner fuel feeling pipes); indicator (general-purpose instrumentation to provide the output signal from a measurement system. Particularly used in oil refinery process plant for temperature, it levels flow rate pressure and differential pressure measures); pilot burner (acts as a trigger for the main burner of an industrial furnace and as a safety system for incidental main burner switch off, preventing the presence of dangerous gas backlog); seal (mechanical constraint system to prevent process fluid loss from a rotating machine, made up by a static element and a rotating one supplied with proper gaskets).
The work order data available have been sorted by operating period, processing unit and equipment, ananging them in a spreadsheet through the use of pivot tables. The generic pivot table for a processing unit reports the corrective maintenance work orders count for the critical equipment previously described, divided by operating period, as shown in Table 1. Topping - year 2001
Flow rate [t/day]
14000
12000
10000
8000
An/ vp A
ffr'VV SAJ^JHA w
ly i
IT
i
V
6000
4000 j
)
-u ! J1 *
Platform Ing shutdown
121
pressure
.' 180 \
» Flow Indicator ,failure ai ""
*
^C v e
X240 "s ^
| W l K t a |
mure
- •
300 s *
equipment electric feed failure
360
"»w P^sure shuldow
Figure 2. Topping plant inlet flow rate load profile.
Days
120 The work orders data were then compared with the process plants load profile taking into account only a mass flow rate decrease below 25% of its mean value (represented by the red dashed line in Figs. 1 and 2). Table 1. Topping plant failure pivot table for year 2001. |
Process unit Failure count Date (MA') Components 01/01 02/06 03/06 04/06 05/06 06/06 07/06 08/06 09/06 10/06 11/06 12/06 Grand total Mechanical joint 1 2 1 1 1 1 7 Area 2 1 1 5 1 5 31 Controller 1 6 1 9 2 1 3 3 1 7 Strainer 2 3 1 1 14 8 1 Indicator 1 3 3 Pilot burner 2 1 14 Seal 1 1 3 1 3 4 1 1 9 22 4 5 81 Grand total 3 7 11 3 9 6
As an example, it can be noted from Fig. 1 and Table 1 that the Topping process plant failure number increased from 3 to 22 just after the "Low pressure shutdown stoppage" that took place in September 2001. The increase in failure number after a planned or unplanned shutdown has been noticed for all the process plants examined, with the greatest increase of failures statistically placed not beyond 30 days after the plant restart. The planned or unplanned shutdown effects on system reliability will then be linked to the failure number growth following a plant stoppage, data obtained through the analysis of the work orders database. 3. Data Analysis and Result Discussion The information gathered using the analysis methodology described has been arranged in the form of histograms for the sake of clarity. Figs. 3, 4 and 5 report the frequency distribution of the failure number increment for all the oil refinery process plants analyzed. Fig. 3 shows that, although class zero of failure increment is quite numerous, plants stoppages considerably affect the corrective maintenance request. This fact can be motivated with the relevant critical items sensitiveness to the unsteady operating conditions taking place during a plant shutdown and restart. Figs.4 and 5 show that passing from 2001 to 2003 a decrease was observed in the failure number after a plant restart. The main reason for this behavior can be referred to the decrease in process plant stoppages from 2001 to 2003, and in particular a sensible fall in unplanned shutdowns, which mainly affect equipment reliability. Another reason that contributed to the improvement of
121
process item availability can be assigned to the outsourcing, starting from 2002, of the periodic turnaround to an external contractor skilled in global service maintenance management.
50 45 40 35 „ 30 O •S o Z
•I
!
25
20 15
10 5 0 10
12
14
20
16
22
24
Faflure inncrement - all years
Figure 3. Frequency distribution of failure increment from year 2001 to 2003.
•| i • I i
:
r-
i
I I 12
14
16
18
20
22
24
Fafcire inncrement - year 2001
Figure 4. Frequency distribution of failure increment for year 2001.
122 is | — .
,
.
,
1
,
.
,
.
,
.
,
,
10
12
14
16
18
20
22
24
r-,
16 14
12
I 10 S 8 6 4
;\
2 0
2
I j 4
6
8
Faiure iincrement. year 2003
Figure 5. Frequency distribution of failure increment for year 2003.
All the failure count due to the critical items are analyzed: mechanical joints and seals as the most sensible items to process plants stoppages. A relaxation of the undesired effects could be obtained by using hot connection procedures for flanged coupling and upgrading all the seal systems according to the American Petroleum Institute API 682 standard. Also the influence of process plants stoppages on instrumentation equipment is considered: a possible cause of the relevant impact on these items can be found in their not being adequately protected during planned shutdown, which can induce the necessity for further instrument calibration. A final analysis was carried out to investigate the failure time shift following a plant shutdown and restart. Such examination was performed for the low-pressure production process plants (the distillation residues processing units, which appeared to be the most affected by the plant stoppages) and for the primary distillation unit (the Topping plant, core of oil refinery operation). The work orders database allowed to obtain the distribution of corrective maintenance requests following a plant shutdown. As previously mentioned, the analysis has been restricted to a maximum time fence of 30 days from a plant restart, allowing to draw the failure time shift frequency distribution reported in Fig. 6.
123 Fafcire fere shift Exponentol Fitting read = 13 days
Figure 6. Frequency distribution of failure time shift.
From Fig. 6 it is possible to point out that the time to failure following a plant restart is characterized by a decreasing trend, which can be represented by an exponential distribution curve with a mean value of time to failure of about 13 days. 4. Conclusions The main conclusion that can be drawn from the results obtained is that a process plant stoppage, both planned and unplanned, represents a critical phase for equipment reliability. It can in fact be stated that a process shutdown is characterized by an increasing drift in corrective maintenance requests. The methodology described can act as Decision Support System for maintenance management. On the basis of the crossed data of plants input load and corrective maintenance requests it is possible to identify, from a statistical point of view, the most critical processing units and equipment, thus predict the occurrence of unexpected failures.
References 1. 2. 3. 4. 5.
F. Nilsson, Nuclear Engineering and Design 221, 293 (2003). A. Jovanovic, Nuclear Engineering and Design 226, 165 (2003). L. Krishnasamy, F. Khan, M. Haddara, Journal of Loss Prevention in the Process Industries 18, 69 (2005). U. Kumar, Journal of Mines, Metals and Fuels, Annual review, 343 (1998). H.D. Goel, J. Grievink, M.P.C. Weijnen, Computers and Chemical Engineering, 27, 1543 (2003).
EVALUATING ABILITY OF A BRANCH AND BOUND METHOD DESIGNED FOR SOLVING BI-OBJECTIVE NVP DESIGN PROBLEM TOSHIYUKI HIROSHIMA1, HIDEMIYAMACHI23, YASUHIRO TSUJIMURA2, HISASHIYAMAMOTO3, YASUSHI KAMBAYASHI2 1 Graduate School of Engineering, Nippon Institute of Technology 2 Department of Computer and Information Engineering, Nippon Institute of Technology, Miyashiro, Saitama, 345-8501, Japan 3 Faculty of System Design, Tokyo Metropolitan University Hino, Tokyo, 191-0065, Japant The N-version programming (NVP) is a programming approach for constructing fault tolerant software systems. In this paper, we formulate the NVP design problem as the multi-objective optimization problem that seeks Pareto solutions, and then we proposed a novel branch-and-bound method to find Pareto solutions for this problem within a practical computing time. To verify the efficiency of our branch-and-bound method, we compared computing time of this method with that of a complete enumeration method. It is observed that the branch-and-bound method significantly dominates the enumeration method. Further we analyze relations between the ability of this branch-arid-bound method and structural characteristics of NVP designs, and clarify what types of the NVP design problems the branch-and-bound method is most applicable.
1. Introduction The N-version programming (NVP) [1] is a programming approach for constructing fault tolerant software systems. This methodology is a kind of safe-by-redundancy techniques based on variety of designs. The NVP is such a software construction technique where N development teams design and implement A' functionally equivalent software independently. Those N versions of software perform the same task and return their results independently so that a decision mechanism can compare them and determine what it outputs by taking what the majority agrees. Ashrafi et al. [2] proposed a solution for NVP design problem. Ashrafi et al. formulated the NVP design problem as a single-objective 0-1 non-linear integer programming problem that maximizes the system reliability under cost constraints, and then proposed a solution that is based on the dynamic programming. 124
125 In order to find more reliable and more economic solution, we considered a bi-objective NVP optimal design problem. Yamachi et al. [3] further investigated algorithms that efficiently search the Pareto solutions for the biobjective NVP design problem that requires to maximize the reliability and to minimize the cost. The algorithm is based on a branch-and-bound method that uses the depth-first search (DFS). So far, our preliminary experiments have demonstrated that this algorithm found all the Pareto solutions. It is obvious that this algorithm shows its superiority when pruning occurs close to the root of the search tree. It is not known that what characteristics in NVP design problems affect the behaviors of the search algorithm. In this paper, therefore, we not only present the searching algorithm based on a branch-and-bound method, but also analyze the behaviors of the algorithm according to the characteristics of the NVP design problems. This analysis contributes improving further efficiency of our algorithm. 2. Formulation of the NVP Design Problem Figure 1 shows a NVP system. The same task is given to all the N versions of software, i.e. the same set of inputs is supplied to all the TV versions, and all the TV versions produce their own results. The results are sent to the voter that decides which one is the correct result by taking the majority or the one supported by a half or more than half, and outputs that result. In this paper, we employ the method that takes the result supported by a half or more than half of the versions, since this method is applicable for other cases.
Version 1
Task-
The majority of versions answer the same result. Correct *" output
Version 2 • Version N
Figure 1 Basic concept of die NVP.
> Subtask !.•
* Subtask 2* Task. Figure 2. Task and subtask^
* Subtask n*
126 A software system usually consists of several subtasks as shown in Figure 2. Therefore, NVP is applied for every subtask in a software system. In this section we re-formulate the NVP design problem based on the mathematical formulae proposed by Ashrafi et al [2] in order to describe our algorithm. We define the following notations. n: the number of subtasks that construct a given task /': subtask i, i = 1, 2,..., n TV,: the set of versions that can perform subtask i, i.e. {1, 2,..., |/Y,|} A/,: the set of versions that are selected to perform subtask /, i.e. Mt c Nt {M/, M2,..., Mj} : the set of versions that are selected to perform / subtasks that are subtask 1 trough subtask i \A\: the cardinality of a set A For eachy = 1, 2,... ,\Nt\ (i = 1, 2,..., n) Cjf the cost of version y to perform subtask / r-,j. the reliability of version y to perform subtask i We also define
=V
r /.max
/
and r
r .
ii
(|i = V v max V /
/
. /
r
, where / is an arbitrary
* it
number from 1 through n. For each Mt c TV, ( i = 1, 2,..., n ), R(M^) : the reliability of NVP, i.e. the probability that a half or more than half of versions produce the correct result that is constructed by all the elements in the set M{ that performs the subtask i C(Mf) : the cost of NVP that is constructed by all the elements in the set Mt that performs the subtask /',for each / = 1, 2,..., n. Rs(Mh M2, ... , Mi) : the reliability of perform / subtasks that are subtask 1 through subtask / (/' = 1, 2, ... , I), when the NVP, which performs the subtask /', consists of all the elements in the set Mi Cs{Mh M2,..., M/): the cost of performing / subtasks that are subtask 1 through subtask / (/' = 1,2,..., I), when the NVP, which performs the subtask /, consists of all the elements in the set M, that i s ^ C(M,) > w h e r e Rs (Mi) = R (Mi) and Cs (A/,) = C(M,). When considering a bi-objective optimization problem that maximizes f\(x) and minimizes/2M> a Pareto solution is defined as x that holds the inequalities fi(x) >Mx)orf2(x)
127 Furthermore, we assume the following conditions for the NVP design problem we discuss in this paper. They are based on the formulae given by Ashrafi et al. 1) Each version is only available for a certain subtask. 2) For each subtask, at least one or more versions must be selected. 3) Each version has its own cost and reliability that are evaluated beforehand. 4) Failure of each version occurs independently. Based on the notations and assumptions described above, we can formulate the bi-objective NVP design problem Par as follows: Bi-objective NVP design problem Par. Rs(Mh M2,..., M„) -> max CS(M,, M2,..., M„) -» min S L
-
\\Mt\>\
(i = l , 2 , - , » )
[M , c N,
(i = l,2, ••-,»)
We define that the set of Pareto solutions of the problem Par as SPM. We propose an efficient algorithm to search SPM. 3. Structure and Properties of NVP Design Problem 3.1. Bi-Objective and Single-Objective NVP Design Problems As a preparation for solving the bi-objective NVP design problem Par, in this section, we consider the following two problems both solve a NVP design problem for only subtask /'. One problem is a bi-objective NVP design problem PS(i) (Pareto solutions in Single task), and the other is a single-objective NVP design problem OS(i, c)(Optimal solutions in Single task) with constant cost c. First, we formulate the bi-objective NVP design problem PS(i) as follows: For each i, i= 1,2,... ,rt Problem PS(i):
R (M,) -> max C (Mj) —• min »-t-
||M,.|>1
[Mi e N,
128 We define SPS(i) (Set of Pareto solutions in Single task) as the set of Pareto solutions for the problem PS(i). Second, we formulate the single-objective NVP design problem with a constant cost c, OS(i, c) as follows: For each /', i = 1,2,... , n and c,c = 0,1,... , c,,max Problem OS(i,c):
R(M,)-naax. s±
[C(A/,) = c <|M,.|>1 M, c N,
We define SOS(i, c) (Set of Optimal solutions in Single task) as the optimal solutions for the problem OS(i, c). 3.2. Properties and Theorem In this section, we deduce properties and a theorem that contribute to construct the algorithm that solves the bi-objective NVP design problem. We start to describe the relationship between the problems PS{i) and Par. Property 1 If {M\, M2,..., M„} e SPM, for each /, i = 1,2,... ,n,M,& SPS(i). In order to obtain SPM, by Property 1, we can focus {M\, M2,..., M„) that consists of Mi 6 SPS(i) (i = 1, 2,..., ri). Taking advantages of Property 1, we need to find: 1) how to compute SPS(i), and 2) how to compute SPM from the set{{M,, M2,..., M„} \ Mi<ESPS(f) 0=1,2,...,«)}. We consider the relationship between problems PS(i) and OS(i,c), and state it as Property 2. This is useful for 1) to compute SPS(i). Property 2 For each /, / = 1,2,... ,n, SPS(i) c ( j SOS(i, c). c
i.max
In order to obtain SPS(i) from M SOS(i,c), we use the following Property 3. c=0
129 Property 3 For each i, i= 1, 2, ... , n, suppose C{Mi)>C{M'l). Then R{Mi)
130 rooted that node. Here, we store the highest reliability and the corresponding solutions found in the search tree in a array indexed by the cost, in the same way we did to obtain SOS(i, c). From this array, we can obtain SPM through one scan of the array as we did for SPS(i). As mentioned before, obtaining SPS(i) by using Property 2 requires obtaining SOS(i, c). In order to obtain SOS(i, c), we have to generate all the combinations of the versions that perform the subtask /, then to compute their costs and reliabilities. We use the recursive formula, based on [4], that is useful to compute the reliabilities. We can propose the depth-first search algorithm based on the theorem and properties described in this section.
(a) (b) (c) (b)
Table 1. Branch and Bound effect for Cost and Reliability Range Range | Number of SPS(i) | Branch and Bound Cost Reliability |spl sp2 sp3 sp4 sp5 sp61 sub2 sub3 sub4 sub5 total 5~10 0.80—0991 20 — 20 20 20 2010.5% 5.2% 26.7% 44.0% 76.4% 5 - 1 0 0.9S—0.99 25 25 25 25 25 25».0% 8.8% 30.6% 27.3% 66.7% 5 - 5 0 0.80—0.99 16 16 16 16 16 16».0% 2.2% 18.5% 37.4% 58.2% 5~50 0.9S~0.99| 19 19 19 19 19 19|o.O% 1.3% 8.6% 15.5% 25.4%
leaf pareto 0.10% 0.03% 0.20% 0.09%
160 215 128 159
(sec)
~ST 9.6 1.4 5.7
Table 2. Branch and Bound effect by order osSPS's Number Branch and Bound time Oder Number Of SPS(i) leaf pareto (sec) sub2 sub3 sub4 sub total 643 317 Increse ^^ 14 36 33 86 129 o.6*i lJ.5% #.5%- 36.6% 78.3% 5 2.8% 14.6% 28.9% 34.3% 80.7% 0.02% 648 68.9 Decrese 129 80 53 30 14
rfW
4. Numerical Experiments First, we have constructed the following data. Each subtask has six versions. We use the same set of version data for each subtask. In general, even when reliability modestly increases, the cost increases rapidly. Therefore, we set the cost dp and the reliability ry 0=1,2,... ,w)for our experiment as follows: 1) The range of reliability, 0.85 < r,j< 0.95, and the cost ctj =j. 2) For c,i, set rn = 0.85, and for c/iraax, set /-,>ax = 0.95. We set the relationship between cy and r-y as r _ a C>J , ^. and obtain a and b from the value set in 2). ' + ca Then, we have had the minimum and maximum costs, and the minimum and maximum reliabilities vary, and for each variation, we have observed the number of SPS(i), the number of stopped branches, Pareto solutions, and the computation time. Table 1(b) shows the case that both the costs and reliability reside in a narrow range produces the highest SPS, 25, and Table 1(c) shows the case that both the costs and reliability reside in a wide range produces the lowest SPS, 16. We can interpret these result that the costs and reliabilities are
131 distributed in a wide range, i.e. improving reliability requires much cost, the possibility of a solution be SPS becomes low. In the case (a), where the range of costs is narrow and that of reliabilities is wide, 76% of the branches are stopped, while in the case (d), where the range of costs is wide and that of reliabilities is narrow, only 25% of the branches are stopped. We can also observe many stopping of branching in the case (b) even though it produces many SPS. Next, we vary the number of versions. The numbers of versions in each subtask are 3, 5, 7, 9, 11 and 13, and the number of SPS's each case produces are 5, 14, 30, 53, 80 and 129, respectively. Then, we have observed how the number of stopped branches, Pareto solutions, and computation time vary in a case of computing in ascending order of SPS's and computing in descending order of SPS's. Table 2 shows the results. While whether computing in ascending order or in descending order does not affect the number of stopped branches, it affects the computation time. The reason why the ascending order case provides a half of the computation time as the descending order case may be the latter produces more nodes in the search tree. References 1. A. Avizienis and L. Chen, On the implementation of N-version programming for software fault-tolerance during program execution, Proceedings of Computer Software and Applications Conference, COMPSAC77, 149-155 (1977). 2. N. Ashrafi, O. Berman and M. Cutler, Optimal design of large softwaresystems using N-version programming, IEEE Transaction on Reliability 43(2), 344-350(1994). 3. H. Yamachi, H. Yamamoto, Y. Tsujimura, Y. Kambayashi, An Algorithm to Solve NVP Design Problem Based on Branch-and-Bound Method, Proceedings of Eighteenth Autumnal Reliability Symposium, 33-36 (2005). In Japanese. 4. A. M. Rushdi, Utilization of symmetric switching functions in the computation of A>out-of-w system reliability, Microelectronics and Reliability 26(5), 973-987 (1986).
OPTIMALITY OF K-OUT-OF-N SYSTEMS FOR CONDITION MONITORING M A I N T E N A N C E USING D E P E N D E N T INFORMATION
L. JIN, Y. HORIKOSHI A N D K. SUZUKI The University of Electro-Communications, Chofugaoka 1-5-1, Chofu-city, Tokyo 182-8585, Japan
This paper deals with a maintenance monitoring system subject to two types of contradictory failures, "false alarms" and "failure to alarm." It is shown that feout-of-n systems with multiple dependent monitors are preferable to other coherent systems under a milder condition than that of previous research. The condition is that the ratios of the symmetric components of the conditional probabilities for the observed matrix of the monitors given the true state of the system are the same besides the probability matrix being weak-MLR (weak-Multivariate monotone Likelihood Ratio). If the optimal procedure of a monitoring system is given by a fc-out-of-rt system, the amount of calculation time needed to identify an optimal decision can be greatly reduced. This enables an optimal procedure to be found quickly.
1. Introduction 1.1.
Background
Since the breakdowns that occur in huge, complex systems can have a great impact on society, it is necessary to improve the reliability of such systems. However, it is sometimes difficult to make improvements at system design stage for technical reasons. Condition monitoring maintenance, which can prevent breakdowns in advance, is thus attracting more and more attention in the field of reliability. Although condition monitoring maintenance plays an important role in preventing breakdowns, it suffers from two types of contradictory failures, "false alarms" and "failure to alarm." The problem is how to select the optimal procedure that can minimize the expected risk when systems are subject to both types of failures. 132
133 1.2. Previous
Research
Increasing the number of monitors is an effective way to reduce the occurrences of both types of failures; however, increasing the number of monitors makes it more difficult to make an optimal decision in a short period of time because of the greater number of possible decisions. Taking this into account, Murakami 5 investigated a system with multiple sensors that have independent observations and proposed a method for determining the optimal number of sensors. Phillips 6 studied a system with n independent and identically distributed components and proved that a fc-out-of-n system is preferable to any other coherent systems when the components are subject to both types of failures. Ansell and Bendell1 considered the case of dependent and identical components and obtained the same result as Phillips 6 . Murakami 5 and Phillips 6 assumed that the observations of monitors were independent. Ansell and Bendell investigated the case of dependent but identically distributed observations. They extended the earlier research to a general case and investigated the conditions of observational matrix for achieving optimality in fc-out-of-n systems. The components dealt with by Ansell and Bendell x were identical, while those in this research are not necessary to be identical. 2. 2.1.
Model Modeling
We model a system in which the true internal state cannot be directly observed. Let S denote the true state, taking a value of "0" or "1," where state "0" denotes that the system is in a good state, and state "1" denotes that it is in a failure state. It is assumed that the system is monitored incompletely by n (> 1) monitors that give information X = (XW,X ( 2 ), • • • , X ^ ) related to S probabilistically. Here, X^ is the output of the fc-th monitor, and it takes a value from {0,1, ••• , m*}. The relationship between the monitor outputs and the system's true state is described by observed conditional probability matrix T = {jix}, which has the form p _
/70(0,--.,0)
•••
70(*i,•••,*„)
•••
70(mi,-,m„)\
\7l(0,-,0)
•••
7l(zi,-,*n)
•••
7l(mi,".,m„)/ '
Here, ^ix is the probability that X = (X (1) , • • • , X ( n ) ) takes x = (x1,-when S = i, i.e. j i { x i , . . , X n ) = Pr(XW = n , • • • ,X™ = xn\S = i).
,Xn)
134
We consider two activities: "ao" means that the system does not give an alarm signal, and "ai" means that it does give an alarm signal. Since the true internal state is unobservable, the system takes one of the two actions based on the monitors' outputs. Let LQ be the loss due to an alarm signal when the system is in a good state and Li be that from failing to give a signal when the system is in a failure state. 2.2. Expected
Risk
Let d(x) = ak be a decision procedure when X = x, and u> (0 < w < 1) be a prior probability of S = 1, that is UJ = Pr(5 = 1). Obviously, Pr(5 = 0) = 1 — oj. The expected risk under decision rule d(-) with prior probability u) is given by B(w|d) = ( l - w )
Yl
L
<>7o*+w
x:d(x)=ai
where
V,
J2
Ll
^la"
W
x:d(x)=oo
means the summation of x that satisfies d(x) = ak • The
x:rf(a:)=afc
decision procedure that minimizes the expected risk is the optimal procedure. 2.3.
Definitions • Weak-multivariate monotone likelihood ratio Matrix T has a property of weak-multivariate monotone likelihood ratio (weak-MLR) 7 if lix 7ia;' > 0 7ja: TjV
for i < j,x -<x'.
(2)
This is denoted by T £ weak-MLR. Here, the partial order "-<" is defined as x -< x' if Xk < x'k for each k £ {1, • • • , n } . Coherent system A system of n components is coherent if 1) its structure function is increasing, and 2) each component is relevant, as described by Barlow and Proschan 2 . fc-out-of-n system An n-component system that functions if and only if at least k (k < n) of the n components function is called a "A;-out-of-n system." Fig. 1 shows an example of a monitoring system with n monitors. Each monitor has two kinds of outputs, "0" and " 1 . "
135 "0" indicates a normal state, and "1" indicates an abnormal state. As shown in Fig. 1, the monitoring system gives an alarm signal if at least "fc" monitors give outputs "1."
give an alarm signal
IT | system |
at least k (k < n) monitors give outputs "I"
Figure 1. fc-out-of-n system
3. Optimality of fc-out-of-n system Phillips 6 proved that a fc-out-of-n system is preferable to any other coherent systems in the case of independent and identical components. Ansell and Bendell 1 extended this result to dependent and identical components. This research investigates the optimality of a fc-out-of-n system when the components are both dependent and non-identical in the same framework as Phillips 6 and Ansell and Bendell 1. 3.1. n = 3 , X(fe> G { 0 , 1 } Since T has a property of weak-MLR, the optimal policy is given by a monotone procedure, which means the optimal action changes once at most in observation x. Therefore, we consider the coherent systems that are monotone and obtain nine decision procedures as shown in Table 1. In procedure di(x), the system gives an alarm (ai) if at least one monitor outputs " 1 . " Therefore, it is a l-out-of-3 system. Similarly, we can see that d2(x) and ds(x) correspond to 2-out-of-3 and 3-out-of-3 systems, respectively. In procedure d^x) (d'4(x), d'l(x)), the system gives an alarm (a,\) if the first (second, third) monitor or at least two monitors give outputs
136 Table 1.
Decision procedures for coherent systems being monotone (n — 3)
X
(0,0,0)
(0,0,1)
(0,1,0)
(1,0,0)
(0,1,1)
(1,0,1)
(1,1,0)
(1,1,1)
di(cc)
ao ao ao ao
ai ao
ai ao
ai ao
ai ai
ai ai
ai ai
ai ai
ao ao
ao ao ao ao ao
ao ai
ao ao ai ao ao ao ao
ao ai ao ao ao ao ao
ao ai ax
ao ai ai ai ai ao ai
ao ai ai ai ai ai
ai ai ai ai
ao
ai
d.2(x) d.3(x) d${x) d'4(x) d'{(x) ds(x) d'5(x)
d'i(x)
ao ao ao
ai ao ai ai
ai ai
"1." d5, d'5 and dg correspond to the other coherent systems that satisfy the condition of monotone procedure but does not satisfy that of fc-out-of-n systems. For simplicity, we use Bi(ui) instead of B(cj\di) in the following. From Eq. (1) and Table 1, the expected risks corresponding to the nine decision procedures are .Bi(w) = (1 - w)L 0 (l - 7o(o,o,o)) + wLi7 1(0)0 ,o),
(3)
B2(ui) = (1 - w)L0(7o(o,i,i) +7o(i,o,i) +7o(i,i,o) +7o(i,i,i)) B3(CJ)
+ wZi(71(0io,o) + 7i(o,o,i) + 7i(o,i,o) + 7i(i,o,o)),
(4)
= (1 -w)L 0 7o(i,i,i) + w L i ( l - 7 i ( i , i , i ) ) ,
(5)
B4(w) = (1 - w)Lo(7o(i,o,o) +7o(i,o,i) +7o(i,i,o) + 7o(i,i,i) +7o(o,i,i)) + wLi (7i(o,o,o) +7i(o,o,i) +7i(o,i,o)), Bv{u)
(6)
- (1 - w)L0(7o(o,i,o) +7o(o,i,i) +7o(i,o,i) +7o(i,i,o) +7o(i,i,i)) (7)
+ wLi(71(o,0,o) +7i(o,o,i) +7i(i,o,o)), i?4"(w) = (1 - w)L0(7o(o,o,i) +7o(o,i,i) + 7o(i,o,i) +7o(i,i,o) +7o(i,i,i)) + wLi(7i(0,o,o) + 7i(o,i,o) + 7i(i,o,o)), B5(OJ)
(8)
= (1 - w)£0(7o(i,o,i) +7o(i,i,o) +7o(i,i,i)) + wLi(7i(0)0,o) + 7i(o,o,i) + 7i(o,i,o) + 7i(o,i,i) + 7i(i,o,o)),
(9)
£ 5 '(w) = (1 -w)L 0 (7o(o,i,i) +7o(i,i,o) +7o(i,i,i)) + wLi(7i(0,o,o) + 7i(o,o,i) + 7i(o,i,o) + 7i(i,o,o) + 7i(i,o,i)),
(10)
B5»(u>) = (1 - w)L0(7o(o,i,i) +7o(i,o,i) +7o(i,i,i)) + wLi(71(0,o,o) + 7i(o,o,i) + 7i(o,i,o) + 7i(i,o,o) + 7i(i,i,o))-
(H)
For a fc-out-of-3 system to be optimal, the minimum value of Bi(w),
137 min{i?i(w)}, for any w must be Bi(w), B2(OJ) or B3(u>). Here, Bi(w) is the expected risk for a l-out-of-3 system* and B2(UJ) and B3(UJ) are the expected risks for 2-out-of-3 and 3-out-of-3 systems. Let wiBi,B) be the intersection of Bi{u) and BJ(LJ). A sufficient condition that min{i?j(w)} is given by Ui(w), B2(w) or B3(ui) for any u is given by the following inequalities: W(B 2 ,B 5 ) < U(B2,B3)
< ^(BLBS) <
U(B2,B'5)
< U(BUB3)
w
< U(B2,B3)
(B 2 > Bi') <
W
(B2,B 3 ) <
w
w
< U(Bi,Bi)
(Bi,B3) <
w
(B2,B4),
(12)
< W(£(2,B;),
(13)
(B2,Bi) <
(B2,Bi) <
W
W
(B 2 ,B4')-
(14)
Here we used the property of weak-MLR for T. From Eqs. (12), (13) and (14) we derive 7o(o,o,o) 7i(o,o,o)
> _
7o(o,o,i) _ 7o(o,i,o) _ 7o(i,o,o) 7i(o,o,i) 7i(o,i,o) 7i(i,o,o) 7o(o,i,i) _ 7o(i,o,i)_ 7o(i,i,o) ^^ 7o(i,i,i) _ 7i(o,i,i) 7i(i,o,i) 7i(i,i,o) ~ 7i(i,i,i)'
,„,
which is a sufficient condition for the optimality of a fc-out-of-3 system under the assumption T £ weak-MLR. Equation (15) means that the ratios of the symmetric components (see Appendix A) in matrix T are the same. 3.2.
n = 2 , XW G {0,1}
Using the same approach as above, we can derive a sufficient condition for a fc-out-of-2 system to be optimal under the assumption T £ weak-MLR. This condition is given by 7o(o,Q) 7i(o,o)
> _
7o(o,i) 7i(o,i)
=
7o(i,o) > 7o(i,i) 7i(i,o) ~~ 7i(i,i)
. ^
which also means that the ratios of the symmetric components in matrix T are the same. 3.3.
Numerical
Examples
Here we present two numerical examples of optimality for a system with three monitors. • r satisfying condition given by Eq. (15) We assume that matrix T given in Fig. 2 has a property of weak-MLR and satisfies Eq. (15). The symmetric components are
138 indicated by the solid and broken lines to clearly show the number of components that give output " 1 . " The expected risks for each procedure are given by Fig. 4. In this case, the minimum value is given by B3 , B2, and B\ with an increasing ui. Therefore, a fc-out-of-3 system is optimal when observed conditional probability matrix T satisfies the condition given by Eq. (15). T not satisfying condition given by Eq. (15) We assume that matrix T given in Fig. 3 has a property of weakMLR but does not satisfy Eq. (15). In this case, the minimum value is given by JB3, B 5 , B4, and Bi with an increasing u> (see Fig. 5). Therefore, a fc-out-of-3 system is not optimal for some values of u).
(O,O,OXO,O,I)(O,I,O)(O,I,I)(I,O,O)(I,O,I)(I,I,OXI,I,I)
A
(o.o.o )(O,O,I)(O,I,Q)(O.I,I)(I.O,Q)(I,O,I)(I,I,O)(I,I,I)
0.35 |0.15| CU2 [0.061 CL06 E i o ' l l i m ] 0.04 0.06 0.10 0 08 10.0910.04 J0.15|!0.18| 0.30
Figure 2.
Matrix T satisfying Eq. (15)
0.40 0.20
0.01
Figure 3.
0.04
Matrix T not satisfying Eq. (15)
V "'/
•]
/
0.10 10.10! 0.10 0.05||0.04| o.on 0.05 lo.ioi o.io 0.10iio.20JO.4oJ
^~~^^~\
I
tl.S-
1
v
/
/
y
B4 B,
a.*
0.2-
o.tiiH
u.ote
li.eua
u.oi
c.ou
II.SM
o.oie
Figure 4. fc-out-of-3 being optimal
f
0.03
0.D4
Figure 5. fc-out-of-3 not being optimal
4. Discussion 4.1. Relationship
with previous
researches
Murakami 5 and Phillips 6 assumed that the monitor observations were independent. Ansell and Bendell * extended the previous research to the
139 case in which the observations were dependent and identically distributed. That is, the observed conditional probability matrix had the same symmetric components. We studied the case of non-identical components with dependent observations and derived a sufficient condition for the optimality of a fc-out-of-3 system: the ratios of the symmetric components are same. This condition is milder than that derived by Ansell and Bendell : . 4.2.
Counter-example
In the research of Ansell and Bendell x , a fc-out-of-n system was shown to be optimal without the condition of T € weak-MLR. In this subsection, we present a counter-example of the previous research. Assume that the system has two states, 5 £ {0,1}, and it is monitored by two identical monitors which have two outputs. That is, X^ £ {0,1}, (i = 1,2). There are two actions, "ao" and "ai," for the system. Let LQ and L\ be 50 and 100 respectively, where L0 and L\ are the losses of "false alram" and "failure to alarm." We assume the conditional probability matrix of the monitors, (0,0) (0,1) (1,0) (1,1)
5 = 0 / 0 . 1 0.2 0.2 0.5\ S= 1 VO.l 0.15 0.15 0.6/ '
(17)
does not have a property of weak-MLR. We assume that we get a signal "I" if the monitor output is either (0,1) or (1,0). Similarly, we get a signal "II" if the output is (0,0), and a signal "III" if the output is (1,1). From Eq. (17), we obtain the observed conditional matrix, r", for the signals "I," "II" and "III" as (1,0) or (0,1) (0,0) (1,1)
r'=
5 =
° ( °-
2+ 0 2
-
5 = 1 V0.15 +0.15
°-
1
0.1
0-5\
I
II
III
/0.4
0.1
0.5\
0.6J ~ V0.3
0.1
(18)
0.6;'
which has a property of totally positive of order 2 (TP 2 ) 4 . From the result obtained in the research of Jin and Suzuki 3 , the optimal action changes once at most in the order of "I," "II" and "III." Fig.6 illustrates the optimal procedure when I " G TP 2 . The horizontal axis denotes w, and the vertical axis denotes signals, "I," "II" and "III." u>* is the boundary point of a prior probability w where the optimal action changes. In Fig. 6, if w falls on the darkly shadowed part of the horizontal axis, the optimal procedure based on signals "I," "II" and "III" is given as shown in Table 2.
140 w=.0
i
r
(0.0) !
He «j=f>C
•l-
1 1
0,1)
Figure 6.
",
.1 TV
optimal polity for/';-.0.60, 0.<>7) H . ' I W (Bin, (IE)}
Optimal Procedure when T' 6 TP2
Table 2. Optimal Procedure for u <E (0.60,0.67) based on signals "I," "II" and "III" signal monitor output optimal action
III
I (0,1) , ( 0 , 1 )
II (0,0)
(1-1)
do
ai
Table 3. Optimal Procedure for u £ (0.60, 0.67) based on the monitor output (X^ ,X^) monitor output optimal action
(0,0) a\
(0,1), (0,1) ao
(1,1) ai
Then, we obtain the optimal procedure based on the monitor output (X^,X^) as shown in Table 3, and it is not a fc-out-of-n system. Therefore, without the condition F € weak-MLR, the fc-out-of-n system may not be optimal. 5. Conclusion We investigated optimal maintenance monitoring with multiple monitors and observations that are dependent and not identical. A sufficient condition for a fc-out-of-n (n < 3) system to be preferable to any other coherent system was derived. This condition is given by that the ratios of the symmetric components in the observed conditional matrix T are same. Since the sufficient conditions for the cases of n = 2 and n — 3 can be written as Eqs. (15) and (16). under the assumption T € weak-MLR. The same condition is predicted for a general case of n monitors: i < jy
Ijx Ijx
x -< x n
Ifjx
1
i <3, Yl k=l
Xk =
n
5Z x'k fc=l
(19)
141 This condition gives a guiding principle for designing a monitor. For a system with monitors those meet the condition given by Eq. (19), the optimal procedure is given by a fc-out-of-n system. It reduces the number of decision procedures t h a t must be considered, resulting in faster identification of the optimal procedure.
Appendix A. Symmetric Components Suppose there are n monitors, of which the o u t p u t is 0 or 1. outputs, x — ( x i , • • • ,xn) and x' = (x[, • • • ,x'n), if
i > = X>; fc=l
For two
(A.I)
Jb=l
holds, then we can say t h a t x and x' are symmetric components. For example, for two monitors, output x can be either (0,0), (0,1), (1,0) or (1,1). Based on Eq. (A.I), (0,1) and (1,0) are symmetric components. Note t h a t "same symmetric components" means 7o(o,i) = 7o(i,o) > 7i(o,i) = 71(1,0)-
References 1. J. Ansell and A. Bendell: "On the Optimality of fc-out-of-n:G Systems," IEEE Transactions on Reliability, 3 1 , 206-210 (1982). 2. R. E. Barlow and F. Proschan: Statistical Theory of Reliability and Life Testing Probability Models, TO BEGIN WITH, MD (1981). 3. L. Jin, and K. Suzuki: "Necessary and Sufficient Condition for Optimality of Monotone Procedure in Condition Monitoring Maintenance with General Number of Actions," Journal of the Japanese Society for Quality Control, 35, 299-311 (2005). 4. S. Karlin and H. Rubin: "The Theory of Decision Procedure for Distributions with Monotone Likelihood Ratio," Annals of Mathematical Statistics, 27, 272299 (1956). 5. M. Murakami: "Choosing the Optimal Number of Sensors in Safety Monitoring Systems," Proceedings of the Institute of Statistical Mathematics, 37, 1-11 (1989) (in Japanese). 6. M. J. Phillips: "fc-out-of-n:G Systems Are Preferable," IEEE Transactions on Reliability, 29, 166-169 (1980). 7. W. Whitt: "Multivariate Monotone Likelihood Ratio and Uniform Conditional Stochastic Order," Journal of Applied Probability, 19, 695-701 (1982).
REDUNDANCY ALLOCATION PROBLEMS WITH MULTIPLE COMPONENT CHOICES USING SIMULATED ANNEALING ALGORITHM HO-GYUN KIM, CHANG-OK BAE Dept. of Industrial & Management Engineering, Dong-Eui University, 995 Busanjin-gu, Busan, 614-714, Korea
Eomgwangno,
Reliability has been considered as an important design measure in many industrial systems and system designers have made efforts to achieve more reliable system structure. Component reliability and redundancy allocation in each subsystem have mainly been used to improve the system reliability. In the recent highly developed industry, there are many kinds of available components which have different features (e.g. reliability, cost, weight, volume, etc.). In this paper, we consider the redundancy allocation problems with multiple component choices (RAP-CC) subject to some resource constraints. A simulated annealing (SA) algorithm is presented for the problem and several test problems are experimented to show its efficiency and effectiveness. It is found that the SA algorithm gives better solutions than other previous studies within a few seconds of the CPU-time.
1. Introduction Reliability has been considered as an important measure in various industrial systems. One of the most important problems in reliability design is the redundancy allocation problems (RAP) which determine the optimal number of redundant components for only one component employed in each subsystem. In a practical design, there are many kinds of component alternatives which have identical functions with different characteristics such as reliability, cost, volume, weight, etc.. We name the problem the RAP with multiple component choices (RAP-CC) which consider component choices among several alternatives when each subsystem is designed. Several researchers have studied on the RAP-CC. Fyffe et a/.1 used dynamic programming and Yokota et al.2 applied genetic algorithm (GA) to the problem. Considering component mixing, Nakagawa & Miyazaki3 used a surrogate constraints method and Hsieh4 dealt with the problem by using linear approximation method. Ramirez-Marquez et al.5 used the max-min approach as a surrogate which is the first time the component mixing has been addressed using integer programming. In recent years, metaheuristics have been selected and successfully applied to handle this problem. Coit & Smith6,7 used GA and Liang & Smith8 used ant 142
143 colony optimization (ACO) to search the optimal solution for the problem. Kulturel-Konak et al.9 developed a Tabu search algorithm and compared to integer programming and GA solutions. Kuo et a/.10 noted that simulated annealing (SA) has advantages for the application of the complex discrete optimization problems, but not many studies have been executed about the reliability design. There are few studies which used SA to solve the RAP and RAP-CC. Angus & Ames" and Ravi et al}2 applied SA to the RAP. In this paper, an SA algorithm is presented to search the optimal solution to the RAP-CC. Using the same example presented by Fyffe et al}, the algorithm will be evaluated to show its effectiveness. To improve the performance of the SA algorithm, the concept of the NESA (non-equilibrium SA) which can complete the inner loop of SA when several solutions can be obtained and the local search algorithm which can search a better solution are integrated. This paper is organized as follows. In Section 2, the RAP-CC is briefly explained; in Section 3, the concept of proposed SA algorithm and its parameters are described; in Section 4, several numerical examples are solved and discussed. Finally, conclusions and further studies are provided in Section 5. 2. Mathematical Model for the Problem In this paper, we consider a RAP-CC problem. There exist several choices of design alternatives in each subsystem that the system designer can employ in order to maximize the system reliability under cost and weight constraints. In the first place, some notations for the problem are defined as follows: n: the number of subsystems (or components) ki\ the number of component alternatives for the subsystem i qik: the failure rate of the subsystem i when the alternative k is used (= 1 - rik) cik: the cost of the subsystem i when the alternative k is used Wik: the weight of the subsystem i when the alternative k is used C[W]: the available system cost [weight] Rs: the system reliability The decision variablesxik (i = 1,2, . . . , « & k = 1,2, ..., k,) are specified as the number of the redundancy of the alternative k in subsystem i. The problem can be expressed in the following optimization problem.
Max
Rs=tl(l-q^q^-q^)
144
Y£cikxlk
s.t.
(1)
(2) xjk 6 nonnegative integer
(3)
The objective function of the problem is to maximize the system reliability, and Eq.'s (1) and (2) represent respectively the cost and weight constraints where we consider the value of C and W as integers. Eq. (3) defines the decision variables for the problem. 3. Simulated Annealing Algorithm SA, one of the metaheuristics, has been introduced by Kirkpatrick et al.u and Cerny14 as an alternative of the local search and it has been successfully applied to many combinatorial optimization problems. SA is an approach to search the global optimal solution that attempts to avoid entrapment in poor local optima by allowing an occasional uphill move to inferior solutions. This paper presents an SA algorithm to search an optimal solution of the problem. To apply the SA for the various combinatorial optimization problems, the solution representation and the energy function are to be determined and initial solution, initial temperature, cooling rate and stopping criterion are to be initialized. We set several parameters of the SA algorithm. Subsystem 1 0 Component type: 1
0
2
1
Subsystem 2 3
0
...
0
Subsyst t m « 0
2 3 *,=4 1 2 ... k2
0
1
0
1 2
Figure 1. A solution representation for the problem.
3.1. Solution Representation and Initial Solution The solution representation of the problem should contain the redundancy levels and the component choices of each subsystem. Figure 1 shows a solution representation for the problem which has m subsystems. Each subsystem constitutes several digits which are equal to the number of component types and each digit represents the parallel redundancy level. For example, the first
145 subsystem constitutes four types of component, and 2 components of the third type and 1 component of the fourth type are connected in parallel. The initial solution of the problem is initialized by a randomly generated solution, and evaluated by the energy function. The energy function E is the objective function of the problem and its value will be zero if it violates the constraint function. 3.2. Generation Method of a Feasible Neighborhood Solution Efficient generation methods of a feasible neighborhood solution are very important to improve the performance of the SA algorithm. We present a generation method composed of two phases. Firstly, a neighborhood solution is generated in Phase I until a feasible solution can be obtained. If Phase I continues to generate infeasible solutions five times, it goes to Phase II. 1) Phase I: In Phase I, two positions are randomly chosen and their elements are exchanged with each other. And then one additional position is randomly chosen and the element is changed by a random number. Generally, the method which chooses only one position is mainly used, but through the preliminary experiments we found that the convergence speed becomes rapidly decreased as the number of components increases. 2) Phase II: Phase II generates a feasible neighborhood solution with changing the position values which represent the redundancy levels. This method is called the local search algorithm. Let's consider for example a subsystem with 3 types of component alternatives and 3 components can be chosen as a redundancy for the subsystem. Suppose that the first infeasible solution at Phase I is S = {3, 0, 0}. Phase II will evaluate the following solutions: 5 = {2, 1,0}, 5 = {2,0, 1}, S= {1, 2, 0}, S = {1,1, 1}, S = {1, 0, 2}, S = {0, 3, 0}, 5 = {0, 2, 1}, S = {0, 1, 2}, S = {0, 0, 3}. Among all these solutions, a feasible solution which has a maximum objective value will be selected as a feasible neighborhood solution. 3.3. Evaluation for Acceptance of Neighborhood Solution Metropolis et a/.'5 presented the Metropolis criterion to avoid entrapment in poor local optima by allowing an occasional uphill move to inferior solutions. If the energy function value of the neighborhood solution is more than that of the current solution (EN > Ec), the neighborhood solution will replace the current solution. Then compare this neighborhood solution's energy function value with that of the best solution found thus far (EB). If EN > EB, then replace the best solution with EN. Otherwise, if EN < Ec then whether or not to accept the
146 neighborhood solution is determined by the acceptance probability P (A) = exp (-AE I 7), where AE = Ec ~ EN is referred to as the difference between the energy function values of the current solution and the neighborhood solution. 3.4. Initial Temperature The initial temperature of the SA algorithm should be set sufficiently high to accept all transitions during the initial phase of the SA algorithm. However, it is known that very high initial temperatures can cause longer computation time. Most studies which used SA have set up the initial temperature through preliminary experiments, so we determined the initial temperature to be T0 = 100. 3.5. Epoch Length The value of epoch length L represents the number of iterations made at each temperature level. The fixed epoch length which is simply determined as a constant by multiplying a constant y and the neighborhood size is usually used. This method is exploited to reach the equilibrium state for preserving the solution convergence, but it can also cause the epoch length to become too long. Cardoso et a/.16 presented the NESA to improve the convergence speed. The NESA can complete the inner loop of the SA when several solutions can be obtained. Ravi et al.12 successfully applied it to a RAP. In this study, we adopt the concept of the NESA, so the inner loop is terminated when the Q feasible solution is obtained in it. We set the value of L = 20/w and Q = 10m through preliminary experiments. 3.6. Cooling Schedule and Terminating Condition Cooling schedule is one of the generic parameters of SA to decrease the temperature adjusted by its cooling rate a. We used the method of the simple geometric schedule that the next temperature is calculated by Tc = a'Tc^ (C = 1, 2, ...). Generally, cooling rate a is determined between 0.5 and 0.99. The solution quality of the SA algorithm could be improved with slower convergence, when a larger value of a is used. If the new value of Tc is greater than or equal to the stopping value of TF = 1 (if 7c > 7 » then return to generating a feasible neighborhood solution. Otherwise, stop. We determined the value of « = 0.98 for the problem through preliminary experiments.
147
4. Numerical Experiments In this section, some numerical experiments are conducted with the example presented by Fyffe et a/.1 to evaluate the performance of the SA algorithm. A system has 14 subsystems in series and each has three or four component alternatives. The objective is to maximize the system reliability subject to the cost and weight constraints. We test the example ten times with the same condition to grasp the algorithm convergence. The SA algorithm is coded in C++ and the numerical experiments are executed on an IBM-PC compatible with a Pentium IV 3.0GHz. Through a numerical experiment with the example (C = 130, W= 170), the presented SA algorithm found a best solution of the example which can be represented [0030, 200, 0003, 003, 030, 0200, 200, 201, 0020, 021, 101, 4000, 020, 0020] as the solution representation of the SA algorithm. The system reliability that is the value of the objective function of the example is 0.970385. The used resources are C = 122 and W= 170, and the CPU-time to search the optimal solution of the example takes within 1 second. To evaluate the performance of the proposed approach, we experiment a set of test problems. As the several former researches, we transformed the example of Fyffe et al. with W = 165-175 while fixing C = 130. The experimental results for the set of problems are represented in Table 1 and are compared with former studies (N&M: Nakagawa & Miyazaki3, C&S: Coit & Smith7, Hsieh4). It is clear that the CPU-time for all problems is within 1 second. 5. Conclusion This paper considered the redundancy allocation problems with multiple component choices (RAP-CC) and proposed an SA algorithm to search the optimal solution of the problem. The SA algorithm includes the concept of NESA which terminates the inner loop of the algorithm when several solutions are obtained and a local search algorithm. Several numerical experiments were conducted to evaluate the performance of the SA algorithm, and we found that the SA algorithm can find the optimal solution with high convergence speed. Table 1. Numerical results for 10 test problems. N&M
C&S
Hsieh
The SA algorithm
Weight Rs
C
W
Rs
C
W
Rs
C
W
Rs
C
W
175
0.9744
121
174
0.97552
122
175
0.97350
122
175
0.97498
124
175
174
0.9744
121
174
0.97435
123
174
0.97233
120
174
0.97447
125
174
148 173
0,9723
122 173 0.97362
122 173 0.97053
172
0.9720
123
172 0.97266
120 172 0.96923
117
171
0.9700
119 170 0.97186
121
118
170
0.9700
119 170 0.97076
169
0.9675
121
168
0.9666
167
0.9656
166 165
119
173 0.97342
121
173
172 0.97271
124
172
171
0.97127
122
171
120 170 0.96678
119 170 0.97039
122
170
169 0.96992
120 169 0.96561
117
169 0.96930
121
169
120 168 0.96813
119 168 0.96415
118
168 0.96750
120
168
117 167 0.96634
118 167 0.96299
116
167 0.96634
118
167
0.9646
116 166 0.96504
116 166 0.96121
115
166 0.96487
119
166
0.9621
118 165 0.96371
117 165 0.95992
113
165 0.96371
117 165
171 0.96790
We tested the SA algorithm with a few examples. The experiments with more diverse examples and the comparisons with the solutions through other metaheuristics (TS, GA) can be executed. We remain them to be further studied. References 1. D. E. Fyffe, W. W. Hines and N. K. Lee, System reliability allocation and a computational algorithm, IEEE Transactions on Reliability R-17(2), 64-69 (1968). 2. T. Yokota, M. Gen and Y. X. Li, Genetic algorithm for nonlinear mixedinteger programming and its applications, Computers and Industrial Engineering, 30(4), 905-917 (1996). 3. Y. Nakagawa and S. Miyazaki, Surrogate constraints algorithm for reliability optimization problems with two constraints, IEEE Transactions on Reliability R-30(2), 175-180 (1981). 4. Y. C. Hsieh, A linear approximation for redundant reliability problems with multiple component choices, Computers and Industrial Engineering 44, 91103 (2002). 5. J. E. Ramirez-Marquez, D. W. Coit and A. Konak, Redundancy allocation for series-parallel systems using a max-min approach, HE Transactions, 36, 891-898(2004). 6. D. E. Coit and A. E. Smith, Reliability optimization of series-parallel systems using a genetic algorithm, IEEE Transactions on Reliability, 45(2), 254-260(1996). 7. D. E. Coit and A. E. Smith, Penalty guided genetic search for reliability design optimization, Computers and Industrial Engineering, 30(4), 895-904 (1996). 8. Y. C. Liang and A. E. Smith, An ant colony optimization algorithm for the reliability allocation problem (RAP), IEEE Transactions on Reliability, 53(3), 417-423(2004).
149 9. S. Kulturel-Konak, A. E. Smith and D.W. Coit, Efficiently solving the redundancy allocation problem using tabu search, HE Transactions, 35, 515526 (2003). 10. W. Kuo, V. R. Prasad, F. A. Tillman and C. L. Hwang, Optimal Reliability Design: Fundamentals and Applications, Cambridge University Press, Cambridge, (2001). 11. J. E. Angus and K. Ames, A simulated annealing algorithm for system cost minimization subject to reliability constraints, Communications in Statistics: Simulation and Computation, 26(2), 783-790 (1997). 12. V. Ravi, B. S. N. Muty and P. J. Reddy, Nonequilibrium simulated annealing-algorithm applied to reliability optimization of complex systems, IEEE Transactions on Reliability 46(2), 233-239 (1997). 13. S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, Optimization by simulated annealing, Science 220, 671-679 (1983). 14. V. Cerny, Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm, Journal of Optimization Theory and Applications 45, 41-51 (1985). 15. N. Metropolis, A. W. Rosenbluth and M. N. Rosenbluth, Equation of state calculations by fast computing machines, Journal of Chemical Physics, 21, 10-16(1953). 16. M. F. Cardoso, R. L. Salcedo and S. F. de Azevedo, Nonequilibrium simulated annealing: a faster approach to combinatorial minimization, Industrial Engineering Chemical Research 33, 1908-1918 (1994).
GLOBALLY SOLVING THE REDUNDANCY ALLOCATION PROBLEM FOR THE CASE OF SERIES-PARALLEL SYSTEMS JAE-HWAN KIM Mathematical Information Science, Korea Maritime Univ., Dong-Sam Busan, 606-791, Korea
International
JONG-SEOK KIM Trade and Economics, Korea Maritime Univ., Dong-Sam Busan, 606-791, Korea
Dong
Dong
This paper presents a global optimization method for solving the series-parallel redundancy allocation problems of determining the optimal number of redundant components in order to maximize the system reliability subject to multiple resource restrictions. Our problems are nonlinear integer programming problems that are in general hard to deal with. We transform them into binary integer programming problems and find global optimal solutions using GAMS. Computational results show that tabu search among heuristic methods is a powerful heuristic one. It is believed that our approach would be a very useful tool to evaluate the efficiency of heuristic methods for the moderate size of problems such as Fyffe et al.[Y\.
1. Introduction In this article we globally solve the Redundancy Allocation Problem(RAP) for the particular case of series- parallel systems. It is found that as long as the size of the problem is moderate, global optimal solutions can be always obtained. Solutions for the series-parallel RAP have been suggested by several authors. Fyffe et al. [1] originally set up the problem and suggested a solution algorithm utilizing a dynamic programming approach. Coit and Smith [13] extended Fyffe problem in such a way that the parallel system could be more flexible, and employed the genetic algorithm to obtain solutions. KulturelKonak et al. [16] developed a taboo search algorithm(TSRAP) and showed that TSRAP may produce better solutions in most cases of the test problems they set up than the previous method. Also, Kulturel-Konak et al. [17] proposed more efficient TS(Tabu Search) than TSRAP by exploiting TS memory. It should be noted, however, that the solution algorithm the above authors adopted are searching for local solutions only by its very nature. It is known that their solutions are not global optimal ones. 150
151 We develop an alternative algorithm which can overcome the shortcomings of the previous approaches. We tackle the same test problems as in Coit and Smith [13] and show that global solutions can be obtained for each and every test for this purpose in mind problem. We transform them into BIP(Binary Integer Programming) problems and find global optimal solutions using GAMS. The article proceeds as follows. In the following section the typical optimization problem of a series parallel system is presented. Section 3 shows how the problem can be transformed into a BIP problem and it also proves that the transformed problem can be globally solved using the usual computer software GAMS. In Section 4, we illustrate an example to show how our solution algorithm can be applied. We also provide global solutions for each of the test problems of Coit and Smith [13] and compare our solutions with GA [13], TSRAP [16], and TS [17]. The final section summarizes our results and suggests a future work in this field. 2. The Problem Formulations The reliability of a system can be increased by properly allocating redun-dancies to subsystems under various resource and technological constraints. Generally, the RAP is to determine the optimal number of redundant components in order to maximize the system reliability subject to multiple resource restrictions or system-level constraints on cost, weight, power, etc. The RAPs are useful designs largely assembled and manufactured to have very high reliability requirements. Most electronic systems are in this category. The problem is usually formulated as a nonlinear integer problem which is in general difficult to solve due to the considerable amount of computational effort required to find an exact optimal solution(Chem [11]). Therefore, various heuristic methods have been delveloped(Glover [7, 8, 9], Glover and Laguna [14], Kuo and Prasad [15]). The RAPs are mainly divided into the following 5 categories(Tillman [3]): • • • • •
Series system Parallel system Series-parallel System Parallel-series System Complex system
In this paper, we deal with the RAP of the third category which is called the series-parallel system. A series-parallel RAP, was originally formulated by
152
Fyffe et al.[\], and was solved by the exact methods which, in the such as dynamic programming(Fyfee et al. [1], Nakagawa and Miyazaki [5]), the integer programming(Ghare and Taylor [2], Bulfin and Liu [6], Misra and Sharma [10], Gen et al. [12]), and the mixed-integer and nonlinear programming(Tillman et al. [A]). Nakagawa and Miyazaki [5] developed 33 variations of the Fyffe problem, where the weight constraint varies its value from 159 to 191. They constrain the solution space so that only identical component type can be allowed for each subsystem. On the contrary, Coit and Smith [13] extended the problem in such a way that the solution allows a mixing of component types within a subsystem. They solved this problem using a genetic algorithm. We set up the following case of the series-parallel RAP as formulated by Coit and Smith [9]. m,
s
(P1)
Maximize R(x)=
fl
0
_
II ?? )
1=1
subject to
s
mi
;=i
j=\
y=l
,=i j=\
Y.Xy >1
for i = l,...,s,
xve{0,l,2,...,U}. where, R(x) =
system reliability depending on X .
X
( X n ,Xn,
=
... , X l n J | ,X2i
,X22,-.->
^2,m2 ' • ' • '
T s,m, > •
X
Xjj = quantity of the j th available component used in subsystem / . mi = number of available components for subsystem i. S = number of subsystems. C,W = system-level constraint limits for cost and weight. c u ' Wij' Qij = cost > we ig nt > a n d unreliability for the j th available component for subsystem i. U = maximum level number of Xy. Kulturel-Konak et al. ([16], [17]) developed TS algorithms for (PI) and showed that they may produce better solutions in most case of the test problems
153 than the previous method. However, the solutions obtained by the above methods do not guarantee global ones, and the exact methods corresponding to (PI) have not been developed yet. A primary purpose of this paper is to develop the solution method to find global optimal solutions to (PI). 3. The Solution Method The problem (PI) is a nonlinear integer problem that is in general hard to deal with. Ghare and Taylor [2] proved that it is possible to transform the series system RAP into a BIP problem. To solve (PI) easily, we can also show that our problem of series and parallel system can be transformed into a similar BIP one. Our problem (PI) is transformed into the following BIP problem: (P2) Maximize
;=l *,=o
kmi=0
subject to s
U
i=l *,=0 s
U
U
m,
k„t=0
j=\
U
m,
IZ-- ECCW**.* ;=1 *,=0
u
km.=0
where
-
kmi
*w,
(kit...,kmi)*o,
u
Z'-'Z-^a, yt,kl
j=\
*mj. = 1
^
i=
, * m , = 0 o r l for 1 = 1,2,..., J , rtJCi
kmi
l,2r~,s,(ki,-,km,)*0, (&lv..,£mj)*0,
= log(l-q% x ^ x - x q ^ ) ,
(*„...,*„,) * 0 . An advantage of the transformation is that it facilitates the problem solving. Moreover as long as the number of arguments is moderate, it makes it possible to find the global optimal solution. GAMS is alternatively useful software that can be used to solve the BIP problems. However, this approach has the shortcoming that the normal computer cannot find solutions if the number of arguments becomes bigger than we treat in this paper.
154
4. An Example and Results The (PI) applied to Fyffe et al.[l] problem(s =14, C=130, W =174) is as follows: 14
m,
J } ( 1 - J\qxj
Maximize R(x) =
)
14 m,
2£^x^l30, (=1 7=1 14 m,
subject to
,=i
y=i
J > , >1 for/ = 1,...,14, 7=1
xv e{0, 1,2,...,5}. The (P2) corresponding to the above problem can be represented as the following BIP problem: Maximize K[X)=
^0,0,0,1
X
^1,0,0,0,1 "*"
+
r
i4,5,5,5,5
X
^14,5,5,5,5
subject to c
u
x
^1,0,0,0,1 + 2 x c 1 4 x .y lj0>0)0j 2 +
5x(c l 4 1 +c 142 + c14>3 +c 14i4 )x^ 14|5AW £130, W|4 X ^,,0,0,0,1 + 2 X W, 4 X ylfifiA2
+
5x(w14il + wH2 + wMi3 + w l 4 4 ) x y u ^ w < 174, ^l.O.O.O.l
+
^14,0,0,0,1
y\,0,0,0,2
+
+
^14,0,0,0,1
>"l,0,0,0,l> ^ 1 , 0 , 0 , 0 , 2 >
"*" 3^1,5,5,5,5
+
+
=
*•>
-
^14,5,5,5,5
'^14,5,5,5,5
=
^
*•> OV
1'
SO
SO
SO
l/i
sO
to
o• u
sO
0\ W UJ LA
OO
so
t*J LA
so
VO NJ VO —
t>
~J -
LA
O W
L A
OO
N» W
W
tO
O so -0 — so tO «
ON
O
vO -O, O ^J ON O
SO -^a •— tO
^0 ^ UJ
so -J U> O tO s o U) CO tO
O o
IO
so ^ OJ O
sO
O SO
O so
o
IO vo
so
O O tO
—
p so OO
—
LA —
so OO
so OO
sO so 4*
so OO
p
o
— ^J Os
so
ON
_ CO
o
o
o
co
— —
so 00 Os
p o o
— Os
so 00 ON
O v o s o LA O N ON so 4^ CO to _ — to ON —
vO OO LA SO K> tO
v O S O s O s O S O ^ O s O s o s o s O s O S O S O S O s O S O s O V O S O s O S D ^ s o S O S O s o s o s O s D S D OO OO OO L A O N O N O N O N O N O N O S 00 so O — — tO tO LA LA ON ON SO _ t O U i L A O N O O s o O — 4^. LA tO O LA tO SO U> SO 4^ 0O — ON to SO ON N> tO tO K> -o to — — OO ' O O t O t O t O t O L A L A — CO tO ON —
o
OO
so CO LA OJ
O S LA U) ^J
p o o p
LA O LA
so OO
o
s O ^ o s o s O s O s o v o s o CO OO OO to UJ LA LA LA ON U) —
s O S O s o s O s O s o s CO O SOOO sCOO sCOo s o s OO O s o s 0o0 s OO vO O — — tO ro u> 4 * . -fc. 4*. ON so s o LA 00 SO tO — LA so o ^-1 tO 00 Os 4*. L A O -J OO ON
o p o p p o p p o p p
oo
o o o o o o o o o o o o o
o
L A L A L A L A O N O N O N O N O N O N O N (A OO vO ON O0 so ^4 O ON OO 4^. u> LA OO 4^ Ul to
vo
f^ *{
tO
so —i O
sO
t
?) V^
tO
o o o © v© VO VO vo CTN CT\ Ov C7\
vO
sO VO sO
p p p
so OO tO
p
NO LA ^ LO
p
sO LA OS sO
p
sO LA Ui Ov
p
o
v o 00 — vO
o
O
OO — —
—
v O s o s O v O s O s O s O s o s O s O s o s o s O v O s O s O ' O v O v O ^ O v O s o ^ O OO L A L A O N O N O N O N O N O N O N LA OO sO ON GO sO O — to U> -J OO *0 ^O I-O t>j — ^J Ov O — OO ON ON L*J L*> U> tO ON ON ON K>
O s O C O - O . O N L A 4 * U > t O
Q
•^j
H CO
> -a
70
CO
H
00
cr
#
CO
H
D.
>T3
»
>
C/l
H
o
B*
a >
U)
1
—'
>
o 3 •o oft lobi
156 where
1o,o,o,i = l o g ( l - q°u x q°n x q^ x q\A), and j > 1 0 > 0 0 , = 1
t o X | | — X|2
=
X j j = U,
Xj^ =
equals
1.
The above problem can be easily solved by GAMS in IBM RS/6000P mainframes and it takes 0.32 CPU second. We find the global solution of
R(\) X
l\
=0.974926, x I3 = 3,x21 = 2,x 34 = 3,x 43 = 3,x52 = 3, ~
X
73
=
L>XK\
=
^
> X91
~ ^
'
-"-10,2
=
*10,3
=
^
'
*U,1
=
x62 = 2, *U,3
=
^
'
x 1 2 , = 4 , x 13>2 = 2 , x 1 4 3 = 2 for this problem, and TSRAP[16] fails to obtain the same global one. Total number of binary variables required in this case are 9490 ((C/ + l) m | +.... + (C/ + 1)"* -s). The global solutions for 33 test problems (Ovaries from 191-159) are given in Table 1. Table 1 shows that GA [13] produces poor solutions in most cases of 33 test problems, and TSRAP [16] cannot find the global solutions in 7 cases of 33 test cases. TS [17], however, obtains the global optimal solutions for all 33 cases. It shows that TS [17] is more efficient for solving the seriesparallel RAP than TSRAP [16]. 5. Conclusions In this paper, we find that the RAP for seies and parallel system as formulated by Fyffe et al. [1] and Coit & Smith [13] can be solved in a global way using BIP transformation of the given optimization problem. We also compare our solution results with those of GA [13], TSRAP [16], and TS [17]. As a result, GA produces poor solutions in most cases of 33 test problems. We notice that TSRAP [16] cannot find the global solution in 7 of the 33 test cases, and TS [17] finds the global optimal solutions for all 33 cases. It shows that TS [ 17] is a powerful heuristic method for solving the series-parallel RAP. The approach we took has the strength of finding the global solution. Therefore, we expect that our approach would be a very useful tool to evaluate the efficiency of heuristic methods for the moderate size of problems such as Fyffe et al. [1] problem. It, however, has the shortcoming that the normal computer cannot find global solutions if the number of arguments becomes very bigger than we treat in this paper. Further research may be needed to overcome such a problem. Acknowledgements This research was conducted while Jae-Hwan Kim was a visiting schalar in the Mechanical and Industrial Engineering Department at University of Illinois at
157 Urbana Champaign. He would like to thank Prof. Sahinidis for valuable discussions during the research. References 1. D. E. Fyffe, W. W. Hines and N. K. Lee, IEEE Trans. On Rel.VI, 14 (1968). 2. M. Ghare and R. E. Taylor, Oper. Res. 17, 838 (1969). 3. F.A. Tillman, C.L. Hwang and W. Kuo, IEEE Trans. On Rel. 26, 148 (1977). 4. F.A. Tillman, C.L. Hwang and W. Kuo, IEEE Trans. On Rel. 26, 162 (1977). 5. Y. Nakagawa and S. Miyazaki, IEEE Trans. On Rel. 30, 175 (1981). 6. R.L. Bulfin and C.Y. Liu, IEEE Trans. On Rel. 34, 241 (1985). 7. F. Glover, Com. & Oper. Res., 13, 533 (1986). 8. F. Glover, ORSA Jour. Of Com. 1, 190 (1989). 9. F. Glover, ORSA Jour. Of Com. 2,4 (1990). 10. K.B. Misra and U. Sharma, IEEE Trans. On Rel. 40, 81 (1991). 11. M.S. Chern, Oper. Res. Letters, 11, 309 (1992). 12. M. Gen, K. Ida, Y. Tsujimura, and C.E. Kim, Com. & Indu. Eng. 24, 539 (1993). 13. D.W. Coit and A.E. Smith, IEEE Trans. On Rel. 45, 254 (1996). 14. F. Glover and M. Laguna, Tabu Search, Kluwer, London (1997). 15. W. Kuo and V.R. Prasad, IEEE Trans. On Rel. 49, 176 (2000). 16. S. Kulturel-Konak, A. E. Smith and D. W. Coit, HE Trans. 35, 515 (2003). 17 S. Kulturel-Konak, B. A. Norman, D. W. Coit and A. E. Smith, INFORMS Jour. On Computing.16, 241 (2004).
RELIABILITY ALLOCATION WITH OR WITHOUT COST CONSIDERATION GWO-LUH LEE Department of Industrial Management, Vanung University Chung-Li city, Tao-Yuan, Taiwan 320, R.O.C. In this paper, a new approach based on the effort minimization algorithm and disjoint products algorithm is derived. Different cost functions containing parameters can be applied to work with the concept of "choice factors". Even so, the cost factor may not take into account as a function if such data is not available. The choice factor is defined as the ratio of rate of increase of effort to reliability importance for each component. The derived effort minimization algorithm can also be applied to any system structure that has identical or non-identical components. The results show that this algorithm can be a simple and efficient tool for aiding reliability engineers during the design phase of a product1.
1. Introduction The process of assigning reliability requirements to individual components or subsystems to attain the desired system reliability is known as reliability allocation. Suppose that a system has not achieved a specified reliability target during the design phase. In order to meet this target, one should apportion resources to improve component reliability in a minimum total resource allocation. Usually we have to increase complexity of current systems and the necessity to consider multiple constraints such as cost, weight, volume, and component obstruction among others. The qualification of resources is regularly given in terms of effort functions which are as realistic and unrestrictive as feasible [1], However in this paper, we only consider the way to improve the reliabilities of the constituent components of a system without changing its structure. The cost is then the only resource or factor we have to deal with as a result of the unchanging system's structure. Several well-known reliability allocation methods, i.e., ARINC, AGREE and minimum effort method, were mentioned [1,2], but most of them require all the components to be improved, thus requiring a comparatively large total effort. However, in the minimum effort method, the reliability of only those components which have a relative low reliability value is increased. 158
159
The minimum effort method assumes the availability of a suitable effort function which should satisfy certain properties [1]. Several effort functions reported in the literature were studied to examine whether they satisfy all these requirements and it was declared that only the following function is suitable [2]: f
a-\n[(\-x)/(}-y)];0<x
[ 0
(1)
;x>y
However, there are still lots of cost functions which are suitable to satisfy all these requirements for the effort minimization method. Yalaoui et al. [3] considered the Truelove function and the Tillman function. Elegbede et al. [4] and Ravi et al. [5] defined the cost of each component as: (2)
C/(/0 = a,--exp 1-A
where pi is the reliability of the each component of a system, at and bt are positive constants. In addition, Mettas [6] proposed a general cost function of each component as: ', ip,\fl>PiMn»
i,min Pi.max ) = CXp ( l - ^ ) - * +_ ^ ri,max ri
(3)
where p, is the reliability of the each component of a system, OSs/i^ 1 is the feasibility of increasing a component's reliability, piimln is the initial (current) reliability value of the t component, and piimax is the maximum achievable reliability of the /* component. These functions in [3-6] are very different to the function (1). For that matter, the basic effort minimization algorithm may not work well on these cost functions, neither. In this paper, several mentioned cost functions are considered and two of them are studied. Since we only consider the way to improve the reliabilities of the constituent components of a system without changing its structure, the concept of choice factor [2] is still applied. The derived effort minimization algorithm can be applied to any system structure that has identical or nonidentical components. A computer program is developed to calculate a system reliability using disjoint products algorithm and to solve the reliability improvement problem at the same time.
160 2. The Basic Effort Minimization Algorithm For the system reliability evaluation, a disjoint products algorithm was developed to obtain a simplified reliability expression. For the determination of optimum reliability allocation and improvement in a general non-series-parallel system, the central idea of the method is to successively improve the components starting with a component having the lowest value of choice factor. The choice factor p{ [2] is defined as the ratio of rate of increase of effort to reliability importance for a component i:
A =w(o,/>,v # j / / ,
(4)
where G,(0,/>,) is the effort function and /, is the importance factor defined therein as: Ii=dR{p)ldpi
(5)
where R(p) is the reliability expression for the system. Following the steps for an optimal solution in [1], p\, pi,..., pn is derived first for the n components and ordered as pw^p^2)^---^P(ny Then, according to the central idea of the method, this policy would have been most optimal if the first "A" components will improve by forcing p^= p(2)=.. .= p(k)^ p(k+\)^... ^P(„y In this case, however, the equations being non-linear, it becomes default to force the equality in a numerical method. Therefore, an approximation optimal is asked if the first "k" components are improved only to meet the goal under/9(i)^p ( 2)^ . . . ^ yO(k)^P(k+i)^---^P(n), where I /fyy/fy) I is smaller than a given deviation and ij^k. Nevertheless, a new cost function being different from (1) may cause the basic minimization algorithm going wrong. So we need a modified algorithm to deal with the more complicated choice factor /?, and the formed nonlinear programming problem in an efficient way. 3. The Derived Algorithm As mentioned above, the cost is the only factor we have to deal with during the component reliability improving process. Our purpose is to gain the optimal allocation of cost to satisfy the goal requirement of system's reliability. A simple way to decide which component has to be improved is depend on the ratio of rate of increase of cost to reliability importance. That is we can make our "choice factor" as:
&=[&,/dpty[dR{p)idp?i
(6)
161 In this case, we may understand how much the system reliability will be improved and how much the cost is needed in order to increase any small unit of the component reliability/?,. Generally, equation (6) will be a non-linear function. Therefore most of the literature formulated complicated models first and their resolution approaches were mainly heuristic methods which did not guarantee to obtain the global solutions. However, in this paper, we apply an optimal solution policy based on the basic effort minimization method. The steps for an optimal solution can be summarized into two phases as follows: Phase 1— (i) Find all paths (from source to sink) of the network, (ii) Create all disjoint vectors of the network. (iii) Derive the reliability expression R(p) by disjoint products algorithm, (iv) Give the system reliability goal Rs, all components' initial reliabilities, and all necessary parameters and constants. Phase 2 (v) If system reliability goal is met, stop the process for the obtained optimal solution, (vi) Derive pu pi,..., p„ for n components and order them as p^^p^^... ^/?(„). (vii) Increase pm by a small value Ap. (viii) If all pi can not increase anymore then stop the process for there is no solution, otherwise go to step (v). This procedure is simple and efficient to be achieved because we may keep away the difficulty from the non-linear calculating. Each time only pm has to be increasing step by step with a small increments. The procedure will become more efficient if we follow the two stages: First, we may set a larger ^p (e.g. 0.01) to obtain the system's rough "optimal solution" and each "final/?," quickly if they exist. Secondly, based on "the valuable information", we can then set a smaller ^p (e.g. 0.0001) to solve for the optimal solution and those final/>,s again under required accuracy. 4. Numerical Experiment We proposed in this section several examples for the derived algorithm in different cost functions and systems mentioned in the above literature.
162
4.1. The Bridge Structure with Cost Function in Eq. (2) The well-known bridge structure is shown in Fig. 1 with the element reliabilities as p/,p2, •••, and/?j. Following Phase 1 of the derived algorithm, the reliability expression for this network is given by: R{p)=PiP2+piq2p}pAqiP3P<&Piq2q3P4Pi+qiP2P3q4P5
(7)
Source
Fig. 1: Reliability Block Diagram for the system of Section 4.1-4.3
Let the component reliabilities and their respective values of constants a, and bi are respectively as [2,5]: p,=0.70, p2=0.60, p3=0.60, p4=0.80, p5=0.50, and a,=l and frf=0.0003, for j=l,...,5. And the system reliability goal Rs\s 0.99. The numerical value of the system reliability for the present component reliabilities is 0.754. Suppose that the related cost function of the system is in the form of (2), the corresponding choice factors, p,$, can be generated as: pl = 0.0003Gxp[0.0003/ql]/[qf(p2
+ q2p,pA - p}pA + q2qzp4p5 - p2p3q4p5)] ( 8 )
p2 = 0.0003exp[0.0003/q2]l[q\(/?, -p,p}p4
-plqJpApi
+ qlP3q4P5)]
(9)
p3 = 0.0003exp[0.0003/qi]/[ql(plq2p4+qlp4-p1q2p4p5+q1p2q4ps)]
(10)
p4 =0.0003exp[0.0003/?4]/[942G?,92/?3 + qtp3 + p^q2qzp5 -qtP2PiP5)]
(H)
ps = 0.0003 exp[0.0003 / qs Vlqliptf&p,
(12)
+ qlp2Piq<)]
The global optimal solutions are finally calculated as: pt= p2= pj= P4=0.92>A9 and pf=Q.192. The minimum of the system cost is 5.01992 (which was reported 5.01993 in [5]). 4.2. The Bridge Structure with Cost Function in Eq. (3) With the same bridge structure shown in Fig. 1, we have the same system reliability expression as in (7). Let the component reliabilities have the same initial values as in section 4.1. and their respective values offh plmin and pimax are respectively as fj=0.9, /?,m,„=0.8 and pimm=0.999, for *'=1,...,5. And the system reliability goal Rs is still 0.99.
163 The derivatives of the cost function (3) with respect to individual component reliability, dci I dpt, are: dc, — L = exp dPi
Pi ~ Pi /w'.max
\, Jri
Pi.*** ~Pi
Pi ~ Pi.mm
(13)
(Pi.m^-Pi)
forz=l,...5. The corresponding new choice factors can be generated then. Although their expressions are complex and neglected here, fortunately it is easy to program the differential calculus in the Mathematica Software. We only have to give Eq. (3) and Eq. (7) for generating Eq. (6) directly, so the whole calculating procedures are very simple. The global optimal solutions are finally calculated as: /?;= pf= pj= p^=0.9344 and pj=0.8121. The minimum of the system cost is 5.93161. Compared the results with those in section 4.1., we may know that the deviations are caused by the different cost functions. 4.3. The Bridge Structure Without Cost Function Empirical forms for the cost function may be derived based on past data, or models may be fitted on cost data obtained from the development phase of the product. Nevertheless, we may have no such data to make a cost function or model. In this case, the derived algorithm in our paper still can do the optimal allocation about the reliability improvements. Now we may set c,(p,) as: ci(pl) = cpi
(14)
That is, the new choice factor p, will be: Pi=c/[dR(p)/dPi]
(15)
where c is a constant for all component i. This means that we only have to figure out the most influential components in a system. Those can increase the system's reliability more efficient then the others when we give the same (or equivalence) effort to them in the system. The whole solving procedure is the same as those in 4.1. and 4.2. The optimal policy for this example is thus to change pj from 0.6 to 0.6905 and p4 from 0.8 to 0.833 without changing the remaining components, i.e., 1, 2, and 5. 4.4. Another Complex System with Cost Function in Eq. (3) This problem is taken from Mettas [6]. The block diagram for this problem is
164 in Fig. 2. All components have the same initial reliability of 0.9 and the same feasibility value of 0.9. The system reliability goal of 0.9 is sought. Following Phase 1 of the derived algorithm, the reliability expression for this network is given by: R(p)= p,p2psP7 + Piq2PiPsP7 PlP2q3P4qsP6p7
+ pmq3P4P(P7 + PMPMsPcP? + PiP2P3qsPeP7 (16)
According to the information in [6], the initial component reliabilities pt and their respective values of / and /?,,„,„ are respectively as: p,=0.9, ^=0.9 and /?,m,„=0.8, for z-l,...,7. With the system reliability goal Rs= 0.99, the results for cases pimax=0.999 and/?,>!ai=0.96 for /=1,...,7 are calculated. The results with those reported by Mettas [6] are compared and summarized in Table 1. Although the results are quite similar, there are some unreasonable outcomes in the Mettas' report. For example, both R(p)s are less than the system goal 0.9. Also, at />,,mox=0.999 case, p$ and p6 cannot be increased to /7j=f>f=0.9002 because their Ps=prz 10.3472 . are still greater than Pi=pf=6.061. That is why the system cost 7.263251 in [6] is great than the one 7.263098 in our paper, even the system reliability of the former is less than the latter.
Source
Fig. 2: Reliability Block Diagram for the system of Section 4.4 Table 1: Summary table for the system of Fig. 2
Final p, Final p2 Final p3 Final p4 Final p5 Final pc Final py Final K(p) System Cost
Reported by Mettas f6] P,>„,=0.999 /W=0.96 0.9547 0.9515 0.9 0.9029 0.9 0.9144 0.9 0.9029 0.9002 0.9346 0.9002 0.9346 0.9547 0.9515 O.&'OTJ:'. 0.899998 7.263251 . . . 8 99y873
Results here ;W=0.999 /W=0.96 0.954722 0.95155 0.9 0.90275 0.9 0.9143 0.9 0.90275 0.9 0.93455 0.9 0.93455 0.954722 0.95155 0900000 0.900005 9.013302 7.263098
165 5. Conclusions In this paper, we use a modified minimum effort algorithm to allocate the reliability values in the system reliability improvement problem. A computer program based on the optimal allocation procedures and the disjoint product algorithm is developed on the Mathematica software. In the computer program, the only thing we need to do is inserting the data about (i) possible paths of the network, (ii) each component's reliability, (iii) the reliability of the final goal, and (iv) one suitable cost function with their relative parameters. Then, the system reliability allocation and improvement will be done in the minimum cost (or effort). However, we even don't need a cost function here if the relative parameters are in an unknown situation. The demonstrated cases show that the approach is very flexible. The derived algorithm can be applied to any system structure that has identical or non-identical components. The parameters of the proposed cost functions or even the cost functions themselves can be altered. With the step by step incremental approach, nonlinear functions are no longer problems during our solving procedures. The exact optimal allocation solutions can be obtained instead of those from heuristic methods. The results also show that this algorithm can be a simple and efficient tool for aiding reliability engineers during the design phase of a product. References 1.
2.
3.
4.
5.
6.
Dale, C.J. and Winterbottom, A., Optimal Allocation of Effort to Improve System Reliability, IEEE Transactions on Reliability R-35(2), 188-191 (1986). Aggarwal, K.K. and Guha, S., Reliability Allocation in a General System with Non-identical Components — a Practical Approach, Microelectronics and Reliability 33(8), 1089-1093 (1993). Yalaoui, A., Chu C , and Chatelet E., Reliability Allocation Problem in a Series-parallel System, Reliability Engineering and System Safety 90, 5561 (2005). Elegbede, A.O.C. and Chu C , Adjallah, K.H., and Yalaoui, F., Reliability Allocation through Cost Minimization, IEEE Transactions on Reliability 52(1), 106-111 (2003). Ravi, V., Murty, S.N., and Reddy, P.J., Nonequilibrium Simulated Annealing-algorithm Applied to Reliability Optimization of Complex Systems, IEEE Transactions on Reliability 46, 233-239 (1997). Mettas A., Reliability Allocation and Optimization for Complex Systems, Proceedings Annual Reliability and Maintainability Symposium, 216-221 (2000).
A MULTIOBJECTIVE GENETIC ALGORITHM FOR SOLVING RELIABILITY OPTIMIZATION PROBLEM OF A COMMUNICATION NETWORK SYSTEM M1NORU MUKUDA, YASUHIRO TSUJIMURA Computer and Information Engineering, Nippon Institute of Technology, Miyashiro , Saitama, 345-8501, Japan In this paper,first,we formulate a reliability optimization problem in which unavailability of a communication network system is minimized, and next, we propose a multiobjective Genetic Algorithm (mo-GA) for solving the reliability optimization problem. In order to obtain high efficiency of searching non-dominated solutions, we combine the Improved Saving Pareto solutions Strategy (ISPS) and the adoptive Local Search (LS) with the multiobjective Genetic Algorithm. Through some numerical experiments, we evaluate the reliability optimization problem of a communication network system, and show effectiveness of the multiobjective Genetic Algorithm.
1. Introduction Generally, a network-typed information communication system is designed in consideration of appropriate load with high system reliability and low system construction cost. Such systems are mainly designed by experiences of designers in many cases. In this paper, we first formulate the optimal design problem of the network system as a multiobjective optimization problem, and develop a multiobjective Genetic Algorithm (mo-GA) to solve the problem. The optimal design problem of a communication network system we treat here is formulated as a multiobjective optimization problem in which the system unavailability and the system construction cost are minimized, simultaneously. The multiobjective Genetic Algorithm proposed here for solving the reliability optimization problem of a communication network system is combined with the Improved Saving Pareto solutions Strategy (ISPS) and the adoptive Local Search (LS) in order to obtain high efficiency of searching non-dominated solutions as many as possible. A mo-GA has ability of searching for Pareto solutions, which naturally inherits the feature of multipoint search of GA. ISPS is basically same as the traditional Saving Pareto solution Strategy, however, selecting solution candidates in ISPS is deferent from SPS. ISPS stores up current Pareto solutions. 166
167
And, versatility of the solutions is maintained by evaluation using sharing. Simultaneously, because the number of solution candidates increases, execution time becomes large. Therefore, the candidates are discarded by using a certain rule. Through some numerical experiments, we evaluate the reliability optimization problem of a communication network system, and show effectiveness of the multiobjective Genetic Algorithm. 2. Network communication system and its reliability optimization model A network-topological communication system [1] approximately models a real fiber optic communication system (e.g. Fig.l). In our mathematical formulation, we use both system unavailability [2] and system construction cost as objective functions. In this paper, we consider the system (two-terminal) unavailability between the point A and B in Fig.l as a typical case. The unavailability is calculated according to logical structure of a fault tree for the system failure [4]. SubSys
SubSy s
SubSys
2
3
4
Point A
SubSys 5
SubSys 7
SubSys
SubSys IB
SubSys 8 SubSys
SubSys 6
9 SubSys
SubSys
11
10
-
1
SubSys 12
SubSys
SubSys
13
14
O
o
Fig.l An example of network scheme of communication system
The system unavailability t / ( m , g ) is using logical AND constituted from one to 4 item. Here, m and g are sets of the backup lines and number of the fibers for cable, respectively. The number of subsystems q is 15. C ( m , g ) is the system construction cost, and its details can be referred in [1][3]. The top event (7) of fault tree for system failure is logically expressed by the following structure function. T = 5, + (B]0 + (B,3 + Bu) • (S„ + (5, 2 • (B9 + (B6 + B8) (£ 4 +5 5+J B 6 +S 7 ))))) (1) • (B2 + fl3 + (fi7 + (B6 + Ba) • (B9 + (B{2 • (Bu + Bn + fl,4)))) Here, • (5 4 + Bs + (B6 • (fi8 + B9 + (Bn • (B7 + Bn + BXJ))))) obtained from the logical structure expressed by function (1).
168 V mathematical model of the network communication system is as min £/(m,g) = C/| + max{£/2,£/10} + max{£/3,£/l0} + max{£/2,£/11,t/l3} + max{t/2,t/11,t/14} + max {£/„,£/, ,£/,„} +max{C/5,[/7,t/10} + max{U6,U9,Ul0} + max{U3,Ull,Ul3} + max {LA,, Ux i, UH} + max {U6, Un, Un} + max{U6,Ul2,UH} + max {U6, U-,, Ui, Uu} + max {U4, C/8, U9, Ul0} + max{Ui,Ui,U9,Ul0} + max{U6,Ul0,Uu,Ul2} + max{U2,U9,Ut2,Uli} + max{U2,U9,Ul2,U[4} + max {C/3, U9, Ut2, £/,„} + max {U4, U1, Un, Un} + max {U5, U1, U,,, Un} + max {U6, U9, U{,, Un} + max{U4,U%,Un,Un} + max{Ui,Ui,Un,Un}
(2)
+ max{U4,U1,UluUX4} + max{Us,U1,Un,U]4} + max {U6, U9, £/,,, Uu} + max {U4, £/8, t/12, C/l4} + max{C/5,[/8,t/12,C/,4} + max{t/3,t/9,[/12,C/13} UjaUjfaj.gj) (j = = l,2,-,fl) l,2,-,q) *tf,0»„*,) (/ min
C(m,g) =
fjCj(mJ,gJ)
Cj (mJ,gj) = CTj (ntj ,gj) + CRj (rrij ,gj) + CFj {rrij ,gj) + C0J (mj,(3)g,) s.t.
UU J(m J(m j,g j,g j)<W j)<W j J m) < rrij < m"
(j = l,2,-,q) g) < g, < g[
Notations: Uj{mj,gj) and Uj : unavailability of subsystem/ C/rrij.g) : cost of subsystem/ Crj(mj,gj): cost of terminal in subsystem/ CRj{mj,gj) : cost of repeater subsystem/ C/ry(/W;,g;): cost of fiber subsystem/ Cojinipgj) '• cost of overhead subsystem7 Wj: limitation value (unavailability) of subsystem/ mj: backup fiber lines of subsystem/ gj: number of fibers for cable in subsystem,/ mj' and mj1: upper and lower range of backup lines g / and gjU : upper and lower range of fiber lines Uaj: demand line of unavailability of subsystem/
169 Uy : line of unavailability of subsystem./ Uw : unavailability of protect switch Uu: unavailability of waiting unit Uc: unavailability of fiber line Uf: unavailability of route, construction method, and hostile conditions Up : unavailability of power supply U,: unavailability of line system Nr: number of repeaters rij: demand fiber lines of subsystem y Umux : unavailability of send side card Udux : unavailability of recive side card Uu : unavailability of send card Urx: unavailability of recive card where Uj(mj,gJ) = Uaj(mJ,gJ)-UbJ(mj,gJ) i=\
(4)
m
j
u
* K -8 j) = I I ( ^ (mJ -Sj) + Uw+Uu)k
UZJ(mJ,gj) = U,+NrUr+(Nr+2)Us
u,=utt+u„+umux+udia Us=\-{\-Up){\-Uc)
Uc=\-f{{\-Uf) 3. Multiobjective Genetic Algorithm We propose the mo-GA for solving a multiobjective optimization problem. Local Search by an adaptive scheme employs the directive gene method. We employ Improved Saving Pareto solution Strategy as the selection operation. Evolution simulation of mo-GA materializes several thousands generations. 3.1. GA Implementation (1) Representation A gene assumes the number of backup lines in a subsystem and the number of core lines of a subsystem. A chromosome consists of 2q genes. v*=[mt,gt] mk =[mk],...,mkj,...,mkq]
(k = \,2,---,popSize) (j = \,2,...,q)
8* ~l8k\>---'8lg>--->8kqi
(5)
170 where popSize : size of a population k: index of chromosome in a population j : index of subsystem q: numbers of subsystems V k : chromosome of individual k mkj : number of backup lines of subsystem7 in individual k gk, : number of cores lines of subsystem^' in individual k (cores/cable) (2) Crossover We employ the uniform crossover operation. The crossover probability is experimentally set to 0.5. (3) Mutation The uniform mutation is adapted here. A gene takes integer value and it can be changed with a range of [ mj1, mf] and [ gjU, gf1]. 3.2. Improved Saving Pareto Solutions Strategy (ISPS) The Improved Saving Pareto solution Strategy consists of two stages. It saves the candidate Pareto solutions, and from the candidate Pareto solutions, elites are chosen to construct a chromosome population for the next generation. Fig. 2 depicts the flow diagram of one generation in the evolution process: Stage 1 is the first selection, i.e., updating candidate Pareto solutions shown in Fig.2-(2). Stage 2 is the second selection shown in Fig.2-(3). ISPS consists of the following two stages. population
(1) Genetic operations
offspring
Crossover Mutation and Local search (3) Selection
1 ~ -
Stage 2 V
(2) Update Candidate Pareto solutions Stage 1 1
'viutivii
- .
(4) Create Pareto solution set
"
Candidate Pareto solutions
Fig.2 Flow diagram of one generation
[Stage 11 Update candidate Pareto solutions (first selection) This updating stage is segmented into three phases: early phase, increment phas e, and convergence phase. Step 1: All the Pareto solution candidates' degree of rank is calculated by orderi ng individuals in ascending order of values of the evaluation function. Step 2: The number of the Pareto solution candidates performs three kinds of up
171
dating. Procedure: Update candidate Pareto solutions if sol < Jh then The rank = 1 individual is added to the candidate Pareto solutions. if sol >fh and sol < mh then The rank = 1 and 2 individuals are s aved. if sol > mh then The rank = 1 individual is saved. end where sol: Number of candidate Pareto solutions. sh : Number of the thresholds for increment decision(exp:300). mh : Number of the thresholds for convergence decision(exp:650). Here, the conventional SPS is only use the Pareto solutions at the current genera tion in Stage 1. [Stage 2] Selection (Elite selection from candidate Pareto solutions: second selection) The fitness value ek of individual k is calculated by equation (6). Here, evaliy k) is the evaluation function value of individual k calculated by using t he weighted sum method. Sk is a sharing degree of individual k. pcSize is the n umbers of candidate Pareto solutions. ek =
1-
evaliy,) —
(k = 1,2,.. .,pcSize)
(6)
3
k
Where
evaliy k) = w,U? + w2Cf _Uk(mk,gk)-UmiD
LTStd k
ctd k
T rmax
=
j rmin
Ck(mk,gk)-Cmm ^max
^-*min
W\ , w2: weights for objective functions JJ mm jymax, m m j m u m a n ( j m a x i m u m value of unavailability from the first generation. C m m , C m a x : minimum and maximum value of cost from the first generation Selection based on elitism is employed. It selects popSize individuals from ca ndidate Pareto solution set according to fitness value.
172 [Sharing degree] The calculation of the sharing degree sk is shown. pcSize
Sk = £ F ( r f ( x t , x , ) ) (k = 1,2,...,pcSize)
{ besides it where, d(xk,Xj)
(7)
is 0
: distance of the individual k andy , X t , x . : individual
D : sharing radius, pcSize : numbers of candidate Pareto solution k, j : index of individual 3.3. Adaptive Local Search (LS) The adoptive Local Search (LS) intensively searches for a neighborhood of a Pareto solution using adaptive scheme. The adaptive scheme uses the sharing degree, and operates LS by the variation rate of a solution. The Directivity Gene method [1] is adopted in LS. Next, we explain the procedure of LS with an adaptive scheme. Here, we set the sharing rate 20% experimentally. Step 1: For all individuals in the candidate Pareto solution set, calculate the sharing degree Sk . Step 2: Adopt LS according to the following role. Procedure: Adoptive Local Search pcSize
^ (I S =1 _*=! < o.2, Sk = \ . t h e n don't adopt LS, pcSize ' [O otherwise else Select one individual with Sk = l from the candidate set randomly as the neighborhood. This adaptive LS is used the Directive Gene method. end if
4. Numerical experiments To verify the propriety of the bi-objective reliability optimization model formula ted in section 2, and to demonstrate the efficiency of the mo-GA proposed in the previous section, we discuss results of some numerical experiments. Here, for a 11 of the numerical experiments, we used a PC with Pentium4 CPU, and its clock frequency is 3.0GHz.
173 4.1. Design Value of a Fiber-optic Communication System Tables 1 and 2 are design data of a test model depicted in Fig. 1. The design data is referred from Fiber Optic Communications Design Handbook [3]. Note that to only cable unit prices, we assigned original data. Table 1 Subsystem (FO-system) design data (1 of 2)
2
i
/
5
7
6
8
50 67
100 167
100 95
50
233
100 121
91
118
684
684
684
684
684
684
684
0.01
0.01
0.01
0.01
0.01
0.01
0.01
100
0,
131
r,
684
«-;
0.01
"J
4
3
3UU
50
Refer to the document [3] for UmuXt Uaux, Utt, U^ Ur UWi Up value, mj" ,mjU is
backup fiber lines of subsystem, lower and upper limits. Set to land 10. g / ,gjU is numbers of core (fiber) per cable in subsystem, lower and upper limits. Set to 8 and 64. Table2 Subsystem (FO-system) design data (2 of 2)
Vr
l.St
10 200 59 684
11 100 172 684
12 100 200 684
13 200 189 684
14 200 176 684
15, 100 57 684
It)
IMI|
0.01
0.01
0.01
0.01
0.01
0.01
/ ",
A
y
5n " 1
£>.,: Length of a communication line in subsystem (kilo meters) W/. Limitation value of subsystem j Uf. Unavailability of route, construction method, and hostile conditions nf. Demand fiber lines
4.2. Experimental Conditions and Results We verify the performance of the proposed mo-GA through the following two st eps: Experiment 1 evaluates the efficiency of the ISPS, i.e., it compares perform ance of ISPS with conventional SPS without LS. Experiment 2 evaluates the effi ciency of the adapted LS, i.e., it compares the proposed mo-GA with two anothe r mo-GAs: mo-GA without LS and mo-GA with uncontrolled LS without adapti ve scheme. [Result of Experiment 1\ In Experiment 1, in order to clarify the efficiency of the saving strategy, LS is n ot used. We employ Arch area rate [1] and Cover rate [1] as rate scaling for Pare to solutions. Ranges of Arch area rate and Cover rate are 0.0 - 1.0. Here, 1.0 is the best for both rates.
174
Mutation -MIL'
_ ..
03 0.6
SI'S
Table 3 Comparison of SPS and ISPS with non-T.S Arch aiea Co\ei No of Genujlions MilPmelo Kite 0 597022 0.797 3000 156
ISPS
0 615372
0.941
657
3000
24.466
SPS
0 631478
0.933
318
3000
18.925
ISPS
"638255
0.967
956
3000
85.160
CiA l\DC_...
Pioiessing tunc isct t 11.429
[Result of experiment 2] Experiment 2 evaluates the efficiency of the adaptive scheme and LS. Munition Kile
0.3
0.6
r. — \. ivn,. V,-L
Arch aiea ule
Covet rale
No ot I'dictu solutions
Generations
Processing lime (see)
non-1.S
0.615372
0.941
657
3000
38.959
ftill-LS
0.638901
0.969
866
3000
57.870
jdp-LS
0.624103
0.943
779
3000
43.251
iion-l.S
0.638255
0.967
956
3000
78.524
iiill-LS
0.650575
0.975
1293
3000
137.671
adp-LS
0.647151
0.970
1033
3000
85.160
non-LS: GA without LS full-LS: GA with uncontrolled LS adp-LS: GA with controlled LS by the adaptive scheme
4.3.
Discussions
Comparison by Arch area rate of Experiment 1 is shown in Table 3. ISPS is und erstood where search efficiency is very good as compared with the conventional SPS. In the case of mutation rate 0.3 and 0.6, ISPS has high Arch area rate and t he search ability can be superior. In addition, it is Pareto solution in which Arch area rate is high, and is near to an ideal solution. In SPS, the increase in a solutio n candidate is slow and the capability to search for various solutions is weak. Th e dependability of an initial individual is high, and although many solutions can be obtained like ISPS, there are few good solutions. ISPS has little dependence on an initial individual, is stabilized and can obtain various solutions. From this, it can be said to be a fundamental technique of mo-GA. The result of Experiment 2 is shown in Table 4. The technique, which controls LS by an adaptive scheme, has an effect in shortening searching time, aintaining the effect of LS. In Table 4, when the experimental result of a mutation rate 0.6 i s compared, adp-LS and full-LS have a high value of the rate of arch area comp ared with non-LS. Fig.3 shows the behavior of adoptive LS. The left vertical axis is the number of candidate Pareto solutions and the right vertical axis is the frequency
175
of activating LS. The horizontal axis is generation. LS dose not work in early generations. It works from middle generations. This behavior is the most typical feature of ISPS. 1800 1600
1200 1000
ijcyng;
m-m-m
candidate Pareto solutions —a—frequency of acti\«ting LS
1400
E^H) -
800 600
L
400 200 0
ITllllllIl>
0
500
1000
1500
2000
Z500
3000
3500
4000
4500
5000
generation Fig.3 Behavior of adaptive LS
5.
Conclusions
In this paper, hybrid mo-GA is proposed as a solving method of the multiobjecti ve reliability optimization design problem. By adapting the mo-GA to the multio bjective reliability optimization design problem, its effectiveness was verified b y means of a numerical experiment. The Improved Saving Pareto solution Strate gy (ISPS) can be said to be a basic functionality of the mo-GA, and contribute to acquisition of great effect in searching Pareto solutions. References 1. M. Mukuda, Study on Optimal Design for System Reliability by Multiobjective Genetic Algorithms, Doctorial Dissertation, Graduate School of IPS, WASEDA University, (2005). 2. W. Kuo, V. R. Prasad, F. Tillman and C. L. Hwang, Optimization Reliability Design: Fundamentals and Applications, Cambridge University Press, (2001). 3. R. J. Hoss, Fiber Optic Communications Design Handbook, Prentice Hall, Englewood Cliffs, New Jersey, (1990). 4. Japan Society for Fuzzy Theory edzt: Fuzzy and Soft Computing Handbook, p.514, Tokyo, Kyoritsu Shuppan Co., Ltd., (2000).
GENETIC ALGORITHM FOR SOLVING OPTIMAL COMPONENT ARRANGEMENT PROBLEM OF CIRCULAR CONSECUTIVE-tf-OUT-OF-AT: F SYSTEM KOJI SHINGYOCHI Department of Social Science and Computer Science, Jumonji University, Sugasawa Niiza-shi, Saitama 352-8510, Japan
2-1-28
HISASHI YAMAMOTO Department of System Design, Tokyo Metropolitan University, 6-6 Hino-shi, Tokyo 191-0065, Japan
Asahigaoka
YASUfflRO TSUJIMURA, YASUSHI KAMBAYASHI Department of Computer and Information Engineering, Nippon Institute of Technology, 4-1 Gakuendai, Miyashiro-machi, Minamisaitama-gun, Saitama 345-8501, Japan
A circular consecutive-£-out-of-«: F system consists of n components arranged along a circular path. This system fails if no less than k consecutive components fail. One of the most important problems for this system is to obtain the optimal component arrangement that maximizes the system reliability. In order to obtain the exact solution for this problem, one needs to calculate n! system reliabilities. As n becomes large, however, the amount of calculation would be intolerably large. In this paper, we propose two kinds of genetic algorithm to obtain the quasi optimal solution for this problem within a reasonable computing time. One employs Grefenstette's direct ordinal representation scheme. The other employs special ordinal representation scheme we have developed. The latter scheme eliminates arrangements with same system reliability produced by rotation and/or reversal of certain arrangements. In addition to that, we have improved the scheme to produce only arrangements that allocate components with low failure probabilities at every A-th position, because system reliabilities of such arrangements should be high. We compared their performance and demonstrated the advantage of the scheme we have developed through numerical experiments.
1. Introduction A circular consecutive-&-out-of-w: F system consists of n components arranged along a circular path. This system fails if no less than k consecutive components fail. Chang, Cui and Hwang have described many practical examples of such a system [1]. One of the most important problems for this system is to obtain the optimal component arrangement that maximizes the system reliability. In order to obtain the exact solution for this problem, we need to calculate n\ system reliabilities. As the number of components n increases, however, the computing time would 176
177 be intolerably long, and it prevents us from obtaining the solution within a reasonable computing time even by using high-performance computer. The Genetic Algorithm (GA) is one of the evolutionary computation methods that can provide the quasi optimal solution within a reasonable computing time. Therefore, we can consider GA as a powerful tool for solving combinatorial optimization problem such as we are facing now. In this paper, we propose two kinds of GAs for solving optimal arrangement problem of a circular consecutive-&-out-of-w: F system. The first GA employs Grefenstette's direct ordinal representation scheme. The second GA employs improved ordinal representation scheme we have developed. The scheme eliminates many arrangements produced by rotation and/or reversal of certain arrangements. In addition to that, this scheme produces only arrangements that allocate components with low failure probability at every A:-th position. As compared them in the results of the numerical experiments, it is observed that the second GA provides better quasi optimal arrangements than what the first one provides for the most cases. We set the following assumptions that: 1. 2. 3.
each component as well as the system takes only two states, either working or failure, the probability of occurrence of component failure is known and statistically independent, and the components can be re-arranged without causing any system troubles.
2. Optimal Arrangement Problem in Circular Consecutive-A-out-of-w: F System Let Z, be a binary variable that takes 1 if component allocated at the i-th position fails and otherwise takes 0. Then the reliability of a circular consecutive-A:-out-of-«: F system can be expressed as Pr{f){Za = \,i
+ k-\y}
(1)
where z, = Z,.„ for / = n, n+\,..., n+k-l. We define notations as follows: <7,: failure probability of component /, where ql>q2>--->qn, Q-Q = (gi,q2,-,qn), P: permutation of n different integers from 1 to n, where P = (P(l), P(2), ..., P(«)), S„: the set of all permutations Ps D„:D„= {(xux2, ...,x„)\ (xux2,...,x„)eR„, l>x, >x 2 >•••>*„ >0 }. For a given permutation P, system reliability can be determined if components are arranged according to P, i.e. P(i') is regarded as the number of component allocated at position i, when a failure probability vector Q is given.
178 Thus, in this paper, we call a permutation P an arrangement P. We define R(P,Q) as the reliability of a circular consecutive-£-out-of-«: F system, when Q is given and all components are arranged according to an arrangement P. Thus we can define the optimal arrangement problem in a consecutive-&out-of-«: F system as the problem that finding the optimal arrangement P*, where R(P*,Q) = max R(P,Q). (2) P e i'„
Generally, the optimal arrangement P* depends on component failure probabilities. It is known, however, that for some cases, the optimal arrangement P* does not depend on the values of component failure probabilities but on the ranks of component failure probabilities. Such an optimal arrangement is called as invariant optimal arrangement. Malon [4] defined invariant optimal arrangement as an arrangement P* such that R(P*,Q) = max i?(P,Q) , for V Q e D„. (3) P e Sn
Table 1 shows the invariant optimal arrangements of the circular consecutive-£-out-of-«: F system (Kuo and Zuo[3]). In this paper, we consider only the case of 2 < k < n-2, where no invariant optimal arrangement exist. Table 1. Invariant Optimal Arrangements for Circular Consecutive-A-out-of-n: F System. k=\ any arrangement k=2 (l,n-l,3,n-3,... ,n-2,2,ri) 2
3. Encoding and Decoding Procedures for Optimal Arrangement Problem In this section, we introduce Grefenstette's direct ordinal representation scheme, and propose improved ordinal representation scheme for quasi optimal arrangements of a circular consecutive-&-out-of-«: F system. 3.1. Grefenstette's Direct Ordinal Representation scheme Grefenstette's direct ordinal representation scheme was originally developed for solving the Traveling Salesman Problem (Grefenstette et al. [5]). The advantage of this scheme is that the candidates of solution correspond to the chromosome with one to one mapping and the classical crossover operator generates only the candidates of solutions without special process. For solving the Traveling Salesman Problem, this scheme employs two list structures namely Tour List and Free List. Free List has the cities we have yet to visit. Tour List has the positions of the cities we are going to visit in the Free List in the visiting order. If we consider Free List as the set of the component
179 numbers we have yet to allocate, we can regard Tour List as the chromosome G to which we encode arrangement P. Thus, this scheme can be easily applied to GA for solving the optimal arrangement problem of a circular consecutive-Aout-of-w: F system. We call the GA with Grefenstette's direct ordinal representation scheme GCGA (Grefenstette Circular Genetic Algorithm) in this paper. We describe the encoding and decoding procedures that use this scheme below. Encoding Procedure The arrangement P = (P(l), P(2),..., P(«)) is encoded into the chromosome G = (g\,g2,.~, gn) in the following steps. Step 1 Initialization of Free List C Set Free List C = {1,2,...,«}. Step 2 Encoding P(i) into g; for / = 1,2, ... , n Set the position of P(/) in Free List C to g„ and then, remove assigned numbers P(z') from Free List C. Decoding Procedure The chromosome G = (gi, g 2 ,..., g„) is decoded into the arrangement P = (P(l), P(2),..., P(«)) in the following steps. Step 1 Initialization of Free List C Set Free List C = {1,2,...,«}. Step 2 Decoding g, into P(z') for i = 1, 2,..., n Set the number of g r th position in Free List C into P(7), and then, remove assigned numbers P(z') from Free List C. 3.2. The Improved Ordinal Representation scheme Generally, for any arrangements in a circular consecutive-/c-out-of-«: F system, it is obvious that the following properties hold. Property 1 For /' = 2, 3 , . . . , n R((¥(\), P(2), P(3),..., P(w)),Q) = /?((P(i), P(i + 1),..., P(#0,P(1),.., P ( i -
D),Q). Property 2 K((P(1), P(2), P(3),..., P(n)),Q) = R((P(l), P(«), P(«-l),... ,P(3), P(2)),Q). Property 1 states that the reliability of an arrangement is equal to the arrangement produced by rotation, and Property 2 states that the reliability of an arrangement is equal to the arrangement produced by reversal. Taking
180 advantages of these properties, we can restrict the calculation of the system reliabilities of only the arrangements where P(l) = n and P(2) < P(«). In addition to that, arrangements that allocate components with low failure probabilities at every A-th position should have high system reliabilities, because system failure can not occur if components arranged at every k-th position work. Therefore, we propose the ordinal representation scheme that allocates components with low failure probabilities (components with large number) at every k-th position in both clockwise and counterclockwise direction from position 1. We call these k-th positions "key positions" in this paper. In denoting by h the number of key positions of a circular consecutive-Aout-of-«: F system, h = [(n-l)/k], where [] is the floor function. We denote by K = {ku k2,... , kh) the set of key positions where _ f l + (/ + l)-/t/2 (if/isodd) [l + n - i • k 12 (if / is even) for / = 1,2,..., h. Based on the idea we have just mentioned above, we have developed an improved arrangement scheme. In this scheme, components are allocated as follows: 1. 2. 3. 4.
set P(l) = n. This eliminates the redundant arrangements produced by rotation. allocate h components oi ah lowest failure probability components (except for componentri)to key positions. We set a = 2. allocate the rest components to the remaining positions except for positions 1, 2 and n. allocate the component with higher failure probability (component with smaller number) of the two rest components to position 2, and allocate the other component to position n. Then we can eliminate arrangements produced by reversal.
We call GA with the above scheme ICGA (Improved Circular Genetic Algorithm) in this paper. We describe the encoding and decoding procedures that use this scheme below. Encoding Procedure The arrangement P = (P(l), P(2),..., P(«)) is encoded into the chromosome G = ig\, gi, - , gn-3) in the following steps. Step 1 Initialization of Free List C Set Free List C = {n-2h, n-2h+\,..., n-\}. Step 2 Encoding V(k,) into g/ for / = 1, 2,..., h
181 Similar to Grefenstette's direct ordinal representation scheme, encode P(kj), the component numbers arranged at key positions, to g„ and then, remove assigned numbers P(^) from Free List C. Step 3 Revising Free List C Add {1, 2,..., n-2h-l} to Free List C. Step 4 Encoding into g/ for / = h+l, h+2,..., n-3 Similar to Grefenstette's direct ordinal representation scheme, encode the component numbers arranged at the positions except for 1, 2, n and key positions into g, with clockwise direction, and then, remove assigned numbers from Free List C Note that the two numbers left in Free List C after completion of this procedure, and the number n need not be encoded. Decoding Procedure The chromosome G = (gi, g 2 ,..., g„-3) is decoded into the arrangement P = (P(l), P(2),..., P(/?)) in the following steps. Step 1 Assign the number n to P(l) Step 2 Initialization of Free List C Set Free List C = {n-2h, n-2h+\,...,
n-1).
Step 3 Decoding g, into P(/t,) for i= 1,2, ... ,h Similar to Grefenstette's direct ordinal representation scheme, decode g; into P(kj), and then, remove assigned numbers P(&,) from Free List C. Step 4 Revising of Free List C Add {1, 2,..., H-2/2-1} to Free List C. Step 5 Decoding g, for i = h+l, h+2,..., n-1 Similar to Grefenstette's direct ordinal representation scheme, decode gt into the component numbers arranged at the positions except for 1, 2, n and key positions with clockwise direction, and then, remove assigned component numbers from Free List C. Step 6 Assign to P(2) and P(«) Assign smaller number left in Free List C to P(2), and the other number to P(«). 4. Numerical Experiments In this section, we compare the performance of GCGA and ICGA by numerical experiments. In both GAs, we use the system reliability, which we calculated with a recursive formula (Hwang [2]), as a measure of fitness. We set the number of chromosomes in one generation 20, and the number of the last generation 500.
182 4.1. Outline of Numerical Experiments For specifying the optimal arrangement problem, we define "the component rate of low failure probability" as the number of components with low failure probability divided by the number of entire components in the system. We denote this rate by r. For example, when n = 30 and the number of components with low failure probability is 6, the component rate of low failure probability r = 6/30 =0.2. According to the range of failure probabilities, we consider five kinds of problems and name each of them pattern 1 through 5 as shown in Table 2. Their failure probabilities are selected in the uniform random numbers in the given ranges. We prepare five problems for each combination of n = 20, 30, 40, 50, r = 0.0, 0.1,..., 1.0 and k = 3, 4, 5, 6, 7, 8 in each pattern, that is, 5 x 4 x 1 1 x 6 x 5 problems. Then we solve their problems ten times by using ICGA and GCGA with the best tuned parameter set selected by preliminary experiments. Table 2. Range of Failure Probabilities for Each Pattern. Pattern 1 Pattern 2 Pattern 3 Pattern 4 Pattern 5
Low 0.001 -0.100 0.100-0.199 0.200-0.299 0.300-0.399 0.400 - 0.499
High 0.900-0.999 0.800-0.899 0.700-0.799 0.600-0.699 0.500-0.599
4.2. Analysis of the Result of Numerical Experiments We have used a Pentium4(2.0GHz) computer with l.OGBytes of RAM to obtain the solution. The average of computing time that of using GCGA and ICGA for a same problem was less than one second, even in the case of «=50. We have compared the performances of ICGA and GCGA by the average of errors. We define the average of errors as the difference between the system reliability of each solution and the maximum system reliability obtained by using ICGA and GCGA for the same problem. Table 3 shows the average of errors of ICGA and GCGA and the difference (GCGA - ICGA) of them. Except for the case of r = 0.9, the differences of the average of errors are greater than 0, i.e. ICGA can obtain better solution than GCGA. The superiority of ICGA can be observed especially in the following special cases when 1. 2. 3. 4.
n is large, k is close to 5, r is close to 0.2, and, the difference between high component failure probabilities and low component failure probabilities are large.
183 Table 3. Comparison of the Average of Errors.
overall
k
rttern
r
a.
20 30 40 50 3 4 5 6 7 8 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1 2 3 4 5
GCGA 0.00513 0.00060 0.00327 0.00685 0.00980 0.00649 0.00699 0.01047 0.00327 0.00201 0.00153 0.00008 0.00218 0.01396 0.01023 0.01163 0.00934 0.00492 0.00235 0.00120 0.00043 0.00010 0.01039 0.00806 0.00477 0.00204 0.00038
ICGA 0.00133 0.00055 0.00103 0.00166 0.00207 0.00269 0.00215 0.00136 0.00079 0.00057 0.00039 0.00005 0.00046 0.00054 0.00156 0.00282 0.00345 0.00224 0.00183 0.00114 0.00047 0.00003 0.00141 0.00251 0.00179 0.00080 0.00012
GCGA-ICGA 0.00380 0.00005 0.00224 0.00519 0.00774 0.00380 0.00484 0.00911 0.00249 0.00144 0.00114 0.00003 0.00171 0.01342 0.00867 0.00881 0.00589 0.00268 0.00052 0.00006 -0.00003 0.00007 0.00898 0.00555 0.00298 0.00125 0.00026
We can conclude that ICGA has better performance because ICGA reduces the number of arrangements. The rate of the number of arrangements in GCGA to that in ICGA is given in equation (5). The cases of 1 and 2 mentioned above, i.e. n is large and k is close to five, make the rate in equation (5) large. From this observation, we can conclude that the search domain reduction causes performance improvement. the number of arrangemerts in GCGA n\ 2x«!x/z! the number of arrangemerts in ICGA (2h)U{n -h-\)\ (2h)\x(n - h -1)! hU2\ (5) 2xn!x[(tt-l)/fe]! ~ (2[(#I-1)/*])!X(#I-[(M-1)/*]-1)!
The differences of the average of errors reach the peak at nr - h = 1 as shown in Figure 1. In this case, the number of low failure probability components is equal to the number of positions, to which low failure probability components are allocated by ICGA, i.e. key positions and position 1. If nr- h is equal to 1, r approaches to 0.2 (the case 3) as k approaches to 5. In the cases of
184
3 and 4, the system reliabilities of the arrangements by using ICGA are higher than those of other arrangements. 0.050 0.040 0.030 0.020 0.010 -
•' • -o.ooo • -20
-10
0
10
20
30
40
50
nr - h Figure 1. Difference of the Average Errors for each nr-h.
5. Summary In this paper, we propose two kinds of GAs for solving optimal arrangement problem of a circular consecutive-fc-out-of-n: F system. The first GA, we call GCGA, employs Grefenstette's direct ordinal representation scheme. The second GA, we call ICGA, employs special ordinal representation scheme we have improved in the way that: 1. 2.
eliminate many arrangements with equal system reliability produced by rotation and/or reversal of certain arrangements, and, allocate components with low failure probabilities at every k-th position.
As comparing ICGA and GCGA by the results of the numerical experiments, we have observed that ICGA produces better quasi optimal arrangement for the systems than GCGA for the most cases. The cause of the superiority of ICGA performance can be considered that search domain reduction improves searching ability. We can also point out that ICGA considers only arrangements with relatively high system reliabilities. References 1. 2. 3. 4. 5.
G.J. Chang, L. Cui and F. Hwang, Kluwer Academic Publishers, Dordrecht, Network Theory and Applications 4, (2000). F. K. Hwang, IEEE Trans, on Reliability R31, 447 (1982). W. Kuo and M. J. Zuo, Optimal Reliability Modeling. Principles and Applications, John Wiley & Sons Inc., Hoboken, New Jersey, (2002). D. M. Malon, IEEE Trans, on Reliability R33, 414 (1984). J. J. Grefenstette, R. Gopal, B. Rosmaita and D. Van Gucht, Proc. of the 1st International Conference on Genetic Algorithm, 160 (1985).
REDUNDANCY OPTIMIZATION IN MULTI-LEVEL SYSTEM WITH SA ALGORITHM WON YOUNG YUN Department of Industrial Engineering, Pusan National University, 30 Geumjeong-Gu, Bitsan, 609-735, Korea
Changjeon-Dong,
IL HAN CHUNG Technical Research Institute, Rotem Company 462-18, Uiwang, Gyunggi-Do, 449-910,
Sam-Dong,
Korea
HO GYUN KIM Department of Industrial and management Engineering, Dong-Eui University, Gu, Busan, 614-714, Korea
Busanjin-
Single-level systems have been considered in redundancy allocation problems. While this method may be the best policy in some specific situations, it is not the best policy in general. In regards to reliability, it is most effective to duplicate the lowest level objects, because parallel-series systems are more reliable than series-parallel systems, but the redundancy cost can be higher than in modular redundancy. In this paper, redundancy is considered at all levels in a series system, and a mixed integer programming model is summarized. A SA algorithm is considered to solve the problem and some examples are studied.
1. Introduction Redundancy allocation is an efficient method for improving reliability. In the literature, a wide variety of problems have been formulated and a large number of techniques have been proposed. Many models have been used for various system structures such as series, network, k-out-of-n, and other systems. Dynamic programming [14], the Lagrange multiplier [11], the heuristic approach [1, 2, 7, 9, 10, 12, 13], and the genetic algorithm [4, 5, 6] have been used with these systems. For a list of recent papers, see the review paper by Kuo [8]. However, in most papers, the objects of redundancy allocation in most studies have been limited to single levels because of the well-known design 185
186 principle that redundancy at the component level is more effective than redundancy at the system level. This is true for some specific assumptions, but Boland and EL-Neweihi showed that this is not true for redundancy with nonidentical spare parts [3]. The purpose of this article is to maximize system reliability by allocating redundancy using the constraints on cost, volume, weight and other variables. Modular redundancy with identical spare parts is addressed by this article. Modular redundancy can be more effective than component redundancy because in modular systems, duplicating a module composed of several components can be easier, and require less time and skill, than duplicating each component. Therefore the lower the level the redundant item is and the more spare parts added, the higher the cost of redundancy may be. Yun and Kim [15] considered this problem and proposed a genetic algorithm. In this paper, we consider the same model as Yun and Kim [15] and propose a SA algorithm. The system illustrated in Fig. 1 is an example of system structure. The system has three sub-systems(A, B, C). Sub-system A and B has two components and sub-system C has a module and a component. Papers related to redundancy optimization problems consider only the lowest components to be candidates for redundancy. However in this paper, module, subsystems, and the system itself are candidates for redundancy.
J.
<*) 4> <*>
5*S><s^i*F<s> ©©
©^)®S)^© ©©
A. The existing problem
B. New Problem
El
Q
: Targets for redundancy in system structures
Fig. 1. System hierarchical structures in the existing problem and the new problem
Before discussing the proposed model further, some concepts will be defined. The term 'unit' is used as a common name for a system, subsystem, module, and component. From S to each component, there is a path and the set with all units on the path is defined as a path set. In Fig. 1, (S-A-Ai), (S-A-A2), (S-B-B,), (S-B-B2), (S-C-d-C,,), (S-C-Ci-Cu), and (S-C-C2) are path sets. We consider two units, Q and Z that are included in same path set. If Q. is in a level lower than Z, then Z is an ancestor unit of Q, and CI is an offspring unit of Z. If
187
Q is one level lower than Z, then Z is the parent unit of Q, and D is the child of Z. For example, in Fig.l, both S and A are ancestor units of unit A1; but only A is a parent unit of unit A). If the parent units of Q and Z are the same, the Z is a sibling unit of Q. If the levels of Q and Z are equal, then Z is a cousin unit of Q. In Fig.l, (A2, B,, B2, Q , C2) are cousin units and A2 is a sibling unit of A,. Notation j f : a set of ancestor units of unit/' Rs: system reliability R,: reliability of unit i br: amount of available resource r n : total number of units nr: number of resources X : number of redundancy allocated to unity y.: 0 or 1 indicator variable g \Xj ) : amount of resources r consumed at unit i. 2. Model Description This article considers series systems and parallel redundancy, and assumes failure to be statistically independent. We assume that in a path set, only a level can be selected for redundancy. This assumption means that all units in the path set(A,-A-S )of Fig. l.B are targets of redundancy, but only one unit out of Ai, A and S should be selected as a part of the system. Fig 2.B represents a structure that is beyond the scope of this research because both Ai and A, in the path set(A,-A-S) are used. r r
A
,
r-
B
_,
L-
B
J
A
i
-i i-
L) A , U
B
~]
B
J
A,
A. Possible structure
1 A
-
B. Impossible structure
Fig. 2. Reliability block diagrams of possible and impossible structures
Under these assumptions, the problem of allocating modular redundancy is formulated as follows:
188
Subject to £ " = 1 ytgri(x,.)
r = 1,2,••-,n r
(2)
^+Z,£{7/}^=1
(3)
yt = 0 or 1 for all i
(4)
All x > 1 and integer./' denotes the components in the lowest level The objective is to maximize system reliability. Two prime variables, x • and yj , are used, x . denotes the number assigned to unit j . y, indicates whether unit j is actually used or not. Consequently, X. x y. yields the number of unit j used in the system. Eq. (2) represents the constraints of available resources suc'h as cost, weight and volume, where all gr/(Xj) 's, are assumed to be linear with the exception of cost. C ( x ) = ex + X* is used as the cost function. That is, the cost function is the sum of the price of the unit, ex, and the additive cost, X* . As the number of units used for redundancy increases, the additive cost increases geometrically. Eq. (3) is a constraint for the assumption that only a unit can be used in a direct line. 2.1. SA Algorithm This paper presents an SA algorithm that searches for the optimal units to be made redundant. To apply the algorithm to the various combinatorial optimization problems, the solution representation and the energy function are determined, and the initial solution, initial temperature, cooling rate, and stopping criterion are initialized. 2.2. Solution Representation and Initial Solution The solution consists of the two variables ( x , ^ ) of all of the units to be made redundant. The initial solution of the problem is initialized by a randomly generated solution, and evaluated by the energy function. The energy function, E, is the objective function of the problem and if the constraint function is violated, then it is set to zero. 2.3.
Method for Generating a Feasible Neighborhood Solution
The method uses the following steps:
189 Step 1: Select one unit randomly. Step 2: If the value of variable y of the selected unit is 1, then select a random number of redundant units and go to step 4. Step 3: 3-1: Check the y variables of the ancestors of the selected unit. If one of them is 1, y=0 and x=0, and change the values of y variables of offspring units that are also cousin units of the selected unit to 1 and select x variables randomly. 3-2: Check the y variables of the offspring units of the selected unit. If one of them is 1, then change all x and y variables of all offspring units to 0. 3-3: Change the y variable of the selected unit to 1 and randomly select the x variable of the selected unit. Step 4: Check whether the new solution satisfies the cost constraint. If it is satisfied, stop, otherwise, go to step 1. 2.4. Evaluating the Neighborhood Solution If the value of the energy function of the neighborhood solution is greater than the current solution (EN > Ec), then the neighborhood solution will replace the current solution. If the value of the energy function of the neighborhood solution is greater than the best solution found so far (EN > EB), then replace the best solution with EN. Otherwise, if EN < Ec, then the decision to accept the neighborhood solution is determined by the acceptance probability P (A) = exp (-A£ / T), where AE = Ec - EN is referred to as the difference between the energy function values of the current solution and the neighborhood solution. 2.5. Initial Temperature The initial temperature of the SA should be set sufficiently high to accept all transitions during the initial phase. However, it is known that high initial temperatures can cause longer computation times. Most studies using the SA have set the initial temperature through preliminary experiments, the initial temperature, TQ, was set to 100 in this study. 2.6. Epoch Length The epoch length, L, represents the number of iterations made at each temperature level. A fixed epoch length is usually used. It is determined by multiplying a constant, y, and the neighborhood size. In this paper, we use the
190 concept of the NESA (Ravi et al. [13]), so that the inner loop is terminated when the Q feasible solution is obtained. The value of L was set to 300, and Q to 10«. 2.7. Cooling Schedule and Terminating Condition The cooling rate is a generic parameter of the SA to so that the temperature decreases at the rate a. A simple geometric schedule was used so that the next temperature was calculated by Tc = a~Tc-\ (C = 1, 2, ...). Generally, the cooling rate a was between 0.5 and 0.99. The quality of the solution of the SA improved with a slower convergence, when a larger a value is used. If the new value of Tc was greater than or equal to the stopping value of 7> = 1 (if Tc > TF) then a feasible neighborhood solution was sought. Otherwise, the process stopped. We used the values of a = 0.98 and L=300. 3. A Numerical Example A series system, whose data is shown in Table 1, is considered as an example. The cost function is C(x)=CX+Ax . Several cost limits are considered, optimal solutions are obtained and the results are shown in Table 2. Table 1. The input data Unit 1 (system) 11 12 13 111 112 113 121 122 131 132
Parent unit
Reliability
Price
Additive cost parameter
1 1 1 11 11 11 12 12 13 13
0.40029 0.72675 0.76500 0.72000 0.90000 0.95000 0.85000 0.90000 0.85000 0.90000 0.80000
72 26 19 21 5 6 5 6 7 8 7
2 2 3 2 3 4 4 4 4 3 4
4. Conclusion This considered all units at multi-levels to be candidates for redundancy optimization in series systems. The model is formulated as a mixed integer programming. A SA algorithm is considered and numerical examples are studied. In future studies, more than one unit in a path set line should be considered for redundancy. It would be interesting to apply the algorithm to various systems.
191 Table 2. Cost limit, total cost, and system reliability of the optimal solution Cost limit
Total cost
Reliability
Cost limit
Total cost
Reliability
150
149
0.805693
250
241
0.945659
160
158
0.830871
260
241
0.945659
170
169
0.851054
270
270
0.960942
180
175
0.866762
280
270
0.960942
190
186
0.887817
290
270
0.960942
200
202
0.913644
300
270
0.960942
210
202
0.913644
310
304
0.975487
220
215
0.927171
320
304
0.975487
230
228
0.931863
330
304
0.975487
240
228
0.931863
340
304
0.975487
Acknowledgments This work was supported by the Regional Research Centers Program (Research Center for Logistics Information Technology), granted by the Korean Ministry of Education & Human Resources Development. References 1. K. K. Aggawal, Redundancy optimization in general systems, IEEE Transactions on Reliability 25(5), 330-332 (1976). 2. K. K. Aggawal, J. S. Gupta and K. B. Misra, A new heuristic criterion for solving a redundancy optimization, IEEE Transactions on Reliability 24, 86-87(1975). 3. P. Boland and E. L. Neweihi, Component redundancy vs. system redundancy in the hazard rate ordering, IEEE Transactions on Reliability 44(4), 614-619 (1995). 4. D. W. Coit and A. E. Smith, Reliability optimization of series-parallel systems using a genetic algorithm, IEEE Transactions on Reliability 45(2), 254-266(1996). 5. D. W. Coit and A. E. Smith, Redundancy allocation to maximize a lower percentile of the system time to failure distribution, IEEE Transactions on Reliability 47(1), 79-87 (1998). 6. M. Gen and R. Cheng, Genetic algorithms and engineering design, John Wiley and Sons (1996).
192 7. K. Gopal, K. K. Aggarwal and J. S. Gupta, An improved algorithm for reliability optimization, IEEE Transactions on Reliability 27(5), 325-328 (1978). 8. W. Kuo, C. L. Hwang and F. A. Tillman, A note on heuristic methods in optimal system reliability, IEEE Transactions on Reliability, 27(5), 320324(1978). 9. W. Kuo, An annotated overview of system-reliability optimization, IEEE Transactions on Reliability, 49(2), 176-187(2000). 10. J. Li, A bound heuristic algorithm for solving reliability redundancy optimization, Microelectronics and Reliability, 36(3), 335-339(1996). 11. K. B. Misra, Reliability optimization of a series-parallel system, IEEE Transactions on Reliability, 21, 230-238(1972). 12. K. B. Misra, A simple approach for constrained redundancy optimization problems, IEEE Transactions on Reliability, 21(1), 30-34(1972). 13. V. Ravi, B. S. N. Muty and P. J. Reddy, Non-equilibrium simulated annealing algorithm applied to reliability optimization of complex systems, IEEE Trans, on Reliability, 46, 233-239(1997) 14. C. F. Woodhouse, Optimal redundancy allocation by dynamic programming, IEEE Transactions on Reliability, 21(1), 60-62(1972). 15. W. Y. Yun and J. W. Kim, Multi-level redundancy optimization in series systems, Computers & Industrial Engineering, 46, 337-346(2004).
PART III MAINTENANCE
This page is intentionally left blank
A NEAR OPTIMAL PREVENTIVE MAINTENANCE POLICY WITH THE EFFECT OF AGE REDUCTION FACTOR MINGCHIHCHEN1 Industrial Engineering and Management Department, Wufong, Taiwan, R.O.C.
Chaoyang
University,
Chaoyang
University,
CHUNG-YUAN CHENG Industrial Engineering and Management Department, Wufong, Taiwan, R.O.C.
There are many repair/replacement problems have been investigated during the past decades. A repairable system is usually subject to stochastic failure when it deteriorates with age. The overall objective of this research is to investigate the optimal maintenance policy by modeling the system deteriorating process as a Markov decision process. The optimal repair/replacement policy is proposed with incorporating the costs of operating cost, general repair, failure replacement and preventive replacement under the discounted cost criterion. The specific maintenance actions for a repairable system are whether to replace the system, to perform a general repair or to keep it operating. This paper is an extension of the model developed by Chen and Feldman (1997) in which an optimal policy is investigated. The major modifications of the standard replacement model in this research are the addition of the general repair with age reduction factor and the number of general repairs can be used more than once. Finally, the optimal parameters of the maintenance policy can be obtained by solving the n-stage problem from the backward recursive scheme over a set of finite horizons to approximate the optimal policy for the infinite planning horizons.
1. Introduction Almost all systems deteriorate with age and are subject to stochastic failure. System deterioration can cause higher operating costs and less competitive products, and thus the maintenance activity is an essential task for manufacturing systems. Considering a system with an increasing failure rate, replacement of a working system may be based on two reasons: (1) the increasingly expensive cost to maintain operation and (2) the avoidance of a break down cost. The key concept of repair is that to replace the whole system at failure may not be economical but only repair or replace the failed parts. In
f
Work is supported by grant of the Taiwan National Science Foundation under the Grant No. NSC 92-22I3-E-324-016.
195
196 general, a repair can return the failed system to the working state somewhere between 'good-as-new' (a perfect repair) and 'bad-as-old' (a minimal repair). Kijima [2] proposed that the state of the system just after the general repair can be described using the so-called virtual age which is smaller (younger) than the system's real age. In his paper, he developed two repair effect models in which the failure rate depends on the virtual age of the system. The first model in which post-repair virtual age was directly proportional to the virtual age right before the failure occurred is called type I repair process. Makis and Jardine [4] formulated this type I repair process as a semi-Markov decision process and demonstrated that an optimal stationary policy exists under some suitable conditions. Love et al. [3] also formulated the type I repair process as a discrete semi-Markov structure and proposed a numerical search procedure to find the optimal maintenance. Kijima's second model as called type II repair process permits a more general repair effect. In this paper, we will only incorporate this type II imperfect repair to our maintenance model. The Kijima's type II virtual aging process can be denoted as follows: /- = 0 x fn , where tn is the virtual age of a system at the «th failure, and /" denotes the post-repair virtual age of the system. Here 0 reflects the impact of the general repair (0 < # < 1). If # = 1, then the repair is a minimal repair. If 0 - 0, the repair is a perfect repair. However, these models mentioned above did not consider the operating cost and the preventive replacement as a maintenance option for an operating system. This paper is an extension of the model developed by Chen and Feldman [1] in which an optimal policy is investigated with an age-dependent operating cost. As with the standard model in the literature, the repair and replacement times are negligible. Under the expected discounted cost criterion, the optimal replacement policy is shown to be the similar form as the Chen and Feldman's optimal (t, T) policy. Unlike Chen and Feldman's model which assumed that minimal repair can be used only once, the number of the general repair can be conducted more than once and the general repair can partially reset the failure intensity of the system. With these modifications, the proposed model is more close to real world situations. 2. Problem Formulation and Analysis In this section, we define the repair/replacement problem and establish the mathematical framework as a discrete semi-Markov decision process. Consider a system subject to stochastic failure with a time-to-failure distribution given by F. At any instant of time, two maintenance options arise according to the state of the system: for a failed system, the decision maker must decide between a
197 general repair and failure replacement; for a working system, the decision maker must decide between a scheduled replacement or leaving the system operating. It is assumed that a failure is instantly detected and the time to repair or replacement is negligible. Four types of cost are considered:
cm,cs,C,
and cn(t) , for t > 0. The cost Cm refers to a minimal repair; cs refers to a scheduled replacement; Cf refers to a failure replacement; and C0 (t) is the cost rate for operating a system of age t. The minimization criterion is based on the total discounted cost, where a > 0 denoted the discount factor. The following five conditions are satisfied by the cost terms and the distribution F . ASSUMPTION 1. The five assumptions listed below are made throughout this paper: 1. The distribution function F has an increasing failure rate. 2. There exist two positive values, £ and 8 such that no scheduled replacement is considered for a system of age less than S and F(e) < 1. This assumption insures that there will never be an infinite number of repairs or replacements in a finite amount of time (see Ross [7]). 3.
The operation cost rate functions c 0 is nondecreasing and bounded over the support set of F .
The replacement problem can be described as a semi-Markov decision process with the state space S = {(X, h )| 0 2s x
198 the new condition so the decision process regenerates at each replacement. Then the decision process renews and starts with the state (0,0). In another way of interpreting the second component h, when/z= 2x k, for k=0, 1, 2, 3..., the system is working and has been repaired k times. While h = (2* k )-l, k = 1, 2, 3..., the system is in failure for the Ath time. The set of all possible actions is A = {0,1,2}, where a 6 A is defined as 0, leave alone a = < 1, minimal repair 2, replacement The cost function which is the same as Chen and Feldman [1] is given according to the specific maintenance action:
c((i,h),a) =
c0(i),
h = 2kmda
= 0,
c,,
h = 2k and a = 2,
cm,
h = 2k-landa = l,
cf,
h = 2k-\
and a = 2.
It should be noted that there are combinations of the state space and action space for which the cost is undefined; in such case, the action is not permitted. A stationary policy is a function that indicates which action to take for every possible state of the system. According to the truth that an optimal stationary policy exists in Makis and Jardine [4], a stationary policy has been selected for our semi-Markov decision process. The dynamics of the decision process are as follows: upon the system entering state (i,h) at time n, the action a e A is selected and a cost c(( i,h),a) is incurred. Then at time n+1 the state (/', h') is entered according to the transition probability Pa((i,h),(i',h')) . Since a stationary policy is selected, the following optimality equations are given for planning horizonn — 1,2,... and system age / = 1,2,3... vn(i, 2k) = min{c0(i) + ocr} v„_,(/" +1, Ik +1) + a (1 -r.)v„_,(/' + \,2k); cs + v„(0,0)} fork vn(i,2k - 1 ) = min{cm(i)
=0,1,2,...
(1)
+ vn (6 x /, 2k); cf + v„(0,0)} ,
for)fc= 1,2,3, ...
(2)
where v„(0,0) = co(0) + a r0 vn_l(l,l) +a(\-r0)v„_,(!,0) . Thus vn{i,h) denotes the cost or value associated with the selected stationary policy given that the decision process began with a system's initial state (i,h). In the optimality equation (!),>-= (F(i +1) - F(i))/(\ ~ F(i)) is the
199 conditional probability that a system will fail during the time interval (/, / +1) given that system was working at time i. In the optimality equation (2) the system has experienced its Ath failure and has been repaired immediately. Thus the transition of the state has moved from the failure state to the working state with the imperfect repair level 9 . The method of successive approximation which allows for considerable computational flexibility and simplicity will be used to obtain the optimal parameters for the maintenance policy. In the next section, some examples will be given and the numerical results can help to identify the structural properties of the optimal policy. 3. Numerical Approximation of the Optimum Once the optimality equations are defined, the n-stage problem can be solved by the backward recursive scheme over a set of finite planning horizons. The boundary condition, v0(i,h) = 0 for all (i,h) e 5" is assumed to calculate the numerical values of optimal equations. The numerical procedure is carried out by a program written in the C language and implemented on a Pentium IV computer. For consistency with Chen and Feldman [1], the following data were used: C, = $80,000 , Cf = $100,000 , Cm = $30,000 • f$1497.6 +2.4*//week 0
[$1560/week
/ > 26 week
/ < 26 week
The Weibull distribution with a mean of 188 weeks and a standard deviation of 64.7 weeks was used for the system life times. The tail of the lifetime distribution function is truncated to give a maximum age of the system (i.e., in our example 422 weeks). Observed that the time unit is week, therefore, an annual rate of return 15% yields a = 0.997. Although the number of general repair is not limited in the optimal equations (1) and (2), for both the computational efficiency and boundary condition requirement, the maximum number of repair upon the Ath need to be specified in our numerical examples below. The optimal control limits were shown in Figure 1-2 for the finite planning horizons up to thirty years, where T0, tx, T2, ... denote the control limits for the corresponding system failure history computed from the optimality equations (1) and (2) with the number of the general repair can be used only once. Observing from the n-stage problem in figure 1 that three control limits converge to two control limits in Chen and Feldman's model, the optimal policy for their Markov decision process has the same form of the (t,T) policy in Tahara and Nishida's model for the infinite planning horizon. However, when the imperfect repair has the effect of bring the state of the system to a younger
200
level prior to failure, figure 2 shows that the control limits TQ , T2 for replacement do not converge when 0 = 0.6. Figure 3 shows the optimal control limits for the maximum number of repair is 3 which have the similar repair control limits with Love et al. [3] except that our model provides preventive replacement control limits. Note in Table 1, a comparison of different values of repair effect and the maximum number of failure repaired for k = 1, 3, 5, ... is provided. The total expected discounted cost is approximated by using the nstage problem under the 30-year planning horizons. 4. Conclusion This paper extended the results which were obtained by Chen and Feldmam [1]. The optimal repair/replacement problem under the discounted cost criterion is formulated as a semi-Markov decision process, and numerically showed the structure of the optimal maintenance policy. The problems with age dependent cost structures seem to be practically more important. Hence, the examination of the optimality of repair/replacement policy under such cost structure is left for future research. Table 1. Optimal control limits and the total discounted cost for selected values of repair effect and the maximum number of failure repaired. 0 = 0.8 0 = 0.4 k=l
k=\
A=3
k=S
187
246
293
315
185
186
-
293
315
-
180
185
-
293
315
ti
-
-
185
-
-
315
t9
-
-
180
-
-
307
T
215
214
214
246
293
315
T
215
214
214
205
263
298
-
214
214
-
228
278
-
214
214
-
195
253
-
-
214
-
-
223
-
-
214
-
-
192
722,020
719,350
719,140
704,910
684,730
679,730
k= l
A=5
182
186
h
-
h
T6
v'(0,0)
201
*$
'a 1 2: a 2i O
U
§
•tl -TO
ry^oVwvswvywwwTOww
T2
5
10
15
20
30
25
Planning horizon n (in year) Figure 1. The optimal control limits for the n-stage problem under 0 = 1 and k = 1
Figure 2. The optimal control limits for the n-stage problem under 0 = 0.6 and k= 1
-•—TO ---tl -* T2 ^^t3 -*^T4 —15 ;
10
15
20,
25
30
-1—T6
Planning horizon n (in year) Figure 3. The optimal control limits for the n-stage problem under 0 = 0.6 and k = 3 .
202
Acknowledgments This research was supported by the National Science Council of the Republic of China, under the Grant No. NSC 92-2213-E-324-016. References 1.
2. 3.
4.
5. 6. 7. 8.
9.
10.
M. Chen and R.M. Feldman, Optimal replacement policies with minimal repair and age-dependent costs, European Journal of Operational Research 98, 75-84 (1997). M. Kijima, Some results for repairable systems with general repair," Journal of Applied Probability 26, 89-102 (1989). C.E. Love, Z.G. Zhang, M.A. Zitron and R. Guo, A discrete semi-Markov decision model to determine the optimal repair/replacement policy under general repairs, European Journal of Operational Research 125, 398-409 (2000). W. Makis and A.K.S. Jardine, A note on optimal replacement policy under general repair, European Journal of Operational Research 69, 7582(1993). T. Nakagawa, Modified, Discrete Replacement Models, IEEE Transactions on Reliability R-36, 243-245 (1987). H. Pham and H. Wang, Imperfect Maintenance, European Journal of Operational Research 94, 425-438 (1996). S. Ross, Applied Probability Models with Optimization Application, Holden-Day, San Diego, CA (1970). A. Tahara and T. Nishida, Optimal Replacement Policy for Minimal Repair Model, Journal of Operations and Research in Japan 18, 113-124 (1974). H. Wang and H. Pham, Optimal age-dependent preventive maintenance policies with imperfect maintenance, International Journal of Reliability, Quality and Safety Engineering's, 119-135 (1996). Z.G. Zhang and C.E. Love, A simple recursive Markov chain model to determine the optimal replacement policies under general repairs, Computers & Operations Research 27, 321-333 (2000).
THE OPTIMAL PERIODIC PREVENTIVE MAINTENANCE POLICY WITH RELIABILITY LIMIT FOR THE CASE OF DEGRADATION RATE REDUCTION" CHUN-YUAN CHENG Dept. of Industrial Engineering and Management Chaoyang University of Technology, Taiwan, ROC cycheng@cyut. edu. tw MINGCHIH CHEN Dept. of Industrial Engineering and Management Chaoyang University of Technology, Taiwan, ROC From the literature, it is known that preventive maintenance (PM) can reduce the deterioration of equipment to a younger level. Researchers usually focus on studying the reduction effect of age or failure rate when performing PM actions and developing the optimal PM policies. However, the PM actions, such as cleaning, adjustment, alignment, and lubrication work, may not always reduce equipment age or failure rate. Instead, it may only reduce the degradation rate of the equipment to a certain level. In addition, most of the existing optimal PM policies are developed with the consideration of the cost minimization only. Yet, as demonstrated in this paper, the equipment with optimal PM schedule will result in very low reliability. Hence, this paper is to develop an optimal PM policy by minimizing the cost rate with the consideration of the reliability limit for the case of degradation rate reduction after performing PM. The improvement factor method is used to measure the reduction effect of the degradation rate. The algorithm for searching the optimal solutions for this PM model is developed. Examples are also presented with discussions of parameter sensitivity and special cases.
1. Introduction It has been shown that performing preventive maintenance (PM) activities can slow the deterioration process of a repairable system and restore the system to a younger age or state, which is called the imperfect preventive maintenance. Malik [5] has proposed the improvement factor method to measure the restoration effect of the PM action for a deteriorating system. Nakagawa [6] has presented a model to describe the age reduction (or restoration) effects when the
This work is supported by the National Science Council of Taiwan under the project number NSC92-2213-E-324-026.
203
204
system is performed a PM activity. Chan and Shaw [2] have studied the modeling of the hazard rate restoration after the PM action. The PM policies have been developed for the deteriorating and repairable systems in the literature. Typically, these policies are to determine the optimal interval between PM activities and the number of PM before replacing the system by minimizing the expected average cost over a finite or infinite time span, e.g., Nakagawa [7], Lie and Chun [4], and Jayabalan and Chaudhuri [3]. Most of the PM models and policies shown in the literature assume the PM action can restore the system to a younger age or a smaller hazard rate. However, Canfield [1] has proposed a model by assuming that the PM activity can only relieve stress temporarily and hence slow the rate of system degradation while the hazard rate is still monotonically increased. Park et al. [8] has extended Canfield's model to determine the optimal PM policy. Park et al. [8] proposed a periodic PM model for the deteriorating systems using a constant unit of time to represent the restoration of system degradation rate. In the model, the optimal number of PM and the interval between each PM activity can be obtained by minimizing the expected cost per unit of time over an infinite time span. However, the optimal policy proposed by Park et al. can only guarantee minimal cost ratea without considering the system reliability. It can be seen in next section that the optimal PM policy proposed by Park et al. will result with very low reliability when the system is subject to be replaced. Based on the model provided by Park et al. (called Park's PM model), a periodic PM model over an infinite time span is proposed in this paper, of which a deteriorating system has to be replaced when it's reliability reaches the specified minimal limitation. The improvement factor method is used to measure the restoration effect on the degradation rate of a system after each PM activity. An algorithm is provided to obtain the optimal interval of each PM and the optimal number of PM before replacement by minimizing the cost rate with considering the reliability limit at system replacement. The policies obtained from the proposed model are compared with those from Park's model. Furthermore, parameter sensitivity for the proposed model is also studied. 2. The Preventive Maintenance Model To develop the proposed PM model with considering reliability limit at the time of system replacement, we use the examples shown in Park et al. [8] as the study cases of this paper. The resulted reliabilities at the time of replacing a system (i.e., r = N*ti) for the examples are calculated and shown in Table 1. It can be found from Table 1 that the reliabilities at the optimal replacement time (7) are very low in most cases when the objective is to minimize the cost rate. In fact.
" Cost rate is the expected cost per unit time over the study time span.
205
the maintenance practitioners in real world usually consider a reasonable and acceptable reliability level for keeping the system in operating status. Thus, the optimal solutions of finding the number of PM and the PM interval proposed by Park et al. [8] may not be suitable for the realistic problems unless the required acceptable reliability (Rmi„) is smaller than the reliability at the time of replacing a system (RT). 2.1. Notation h (/?/;) the time interval between PM's for the PM model without (with) considering reliability limit TV (NR) scheduled number of PM for a system which will be replaced at the TV* (NRth) PM for the model without (with) considering reliability limit T the time of preventive replacement where T=Nh (T=NR-hR) for the PM model without (with) considering reliability limit X(f) original hazard rate function (before 1st PM action) A,{t) hazard rate function in the ith PM period where X(l(t)= X(t) C,,„, cost of each scheduled periodic PM Cpr cost of preventive replacement Cmr cost of minimal repair at each failure occurred t] the improvement factor for measuring the restoration effect of PM where each PM reduces the degradation rate to that existing tjh time units previous to the PM interval Rmin acceptable reliability limit at the time of preventive replacement 2.2. The assumptions The assumptions for the proposed PM model are listed below which are also similar to those shown in Park's PM model. • The system is deteriorating over time with increasing failure rate (IFR), i.e., the failure process of the system is non-homogeneous Poisson process (NHPP). • The periodic PM activities with constant interval (hR) are performed over an infinite time span. • The periodic PM activities can restore the degradation rate of the system to a younger level using the improvement factor method to measure the restoration effect, while the hazard rate keeps monotone increase. • Minimal repair is performed when failure occurs between PM activities. • The system is replaced at the end of the NR'h PM interval with a required reliability limit (Rmin). • The times to perform PM, minimal repair, and replacement are negligible.
206 •
2.3.
The costs of PM (Cpm), minimal repair (Cmr), and replacement (Cpr) are assumed to be constant and Cpm and Cmr are not greater than Cpr. The Park's PM Model
Before introducing the proposed periodic PM model with considering reliability limit, we summarize the hazard rate function and the cost rate function of Park's PM model in Equations (1) and (2), respectively, by applying improvement factor rj where 0 < rj <. 1. We also derive the general form of the PM interval (h) and the equation of the reliability at time T for Park's PM model with Weibull failure case as shown in Eq. (3) and (4), respectively. The hazard rate function of each PM with degradation rate reduction X{l),
for i = 0, (i.e., 0 < t < h)
(1)
A
A (') = ' Z W + ( * - W " V)h)- A(*(l -n)h)}+ Mt - ?jih), for ih
l,2,...,N-l
The cost rate function
C(h,N) =
Nh
«-w-4$
(2)
The general form of the PM interval (A) for the Weibull case cmre-"(p-\)
for N = 1
(N-\)Cr„,+C,„
C,J-'\P-\)
i+/?XX
\k-kn+ny-'
(3)
for N > 1
The reliability at time T for the Weibull case expK(7>
for^*=l,
(4)
l + /? £ 2 (l + (* - ixi - i)Y" - (*d - n)Y exp
v +r f(i+,(i-,,)r-(,(i-;,)r
for A' >1
By using Eq.(4), the reliability at the time of preventive replacement for each example of Table 1 is calculated.
207 Table 1. The reliability at the time of replacing a system for the examples of Park et al.
5
7
9
11
13
15
20
30
40
50
60
70
90
100
0.9 1.0 0.1 0.5 0.01 0.05 0.7 0.3 N",h' 1, 1.1447 1,1.1447 1, 1.1447 1,1.1447 1,1.1447 1,1.1447 1,1.1447 1,1.1447 3.9311 C(N", ti) 3.9311 3.9311 3.9311 3.9311 3.9311 3.9311 3.9311 .0.2231 \Q;223t -0:223.1-;. :-&223J V' #.2231' • 4 2 2 8 ! ' 0.2231 JV*,/T 1, 1.3572 I, 1.3572 1, 1.3572 1, 1.3572 1,1.3572 1, 1.3572 1,1.3572 1,1.3572; 5.5261 5.5261 5.5261 5.5261 5.5261 CQf, ti) 5.5260 5.5261 5.5260 : 0.0821'. 0.0821- : 0.0821 i ;'-0:08'2i;- 0,0821' 0,0821- 0.0821 - 0.0821 Rm N",ti 1,1.5183 1,1.5183 1,1.5183 1,1.5183 1,1.5183 1,1.5183 2,0.9291 3,0.7469 C(N", ti) 6.9157 6.9157 6.9157 6.8619 6.6943 6.9157 6.9157 6.9157 P.0302 *0:0302-1 • 0.0302 • 0.0302- 0;0302'": • i},0302 0.0143 0.0067 ' N",ti 1, 1.6510 1, 1.6510 1,1.6510 1,1.6510 1, 1.6510 1,1.6510 3,0.7631 4,0.6745 7.8622 C(JV, ti) 8.1770 8.1770 8.1770 8.1770 7.5059 8.1770 8.1770 '0.0111" • : 0.0111 0.01I1 - 0 . 0 U L 0,0111 0,0025 • 0.0012 Ml) rf,ti 1, 1.7652 1, 1.7652 1,1.7652 1, 1.7652 1,1.7652 2,1.0194 3,0.8034 6,0.5661 C{N", ti) 9.3475 9.3475 9.3475 9.3475 8.7132 9.3475 9.1966 8.1706 4.09E-03 •4jQ9B-03 4.0911-03 4.09E-03 409B-03 1.93E-03 9J2J-04 9.61B-0S mi) N~,ti 1, 1.8663 1, 1.8663 1, 1.8663 1, 1.8663 1,1.8663 2,1.0711 4,0.6938 7,0.5396 CQf, ti) 10.4487 10.4487 10.4487 10.4487 10.4487 10.1531 9.4588 8.7361 1.50E-O3 l.SOB-OT UQE-03 l.'50Rtf3 1.50E-O3 7.16&04 1.60E-04 1.67E-05 N",ti 1,1.9574 1, 1.9574 1, 1.9574 1, 1.9574 2, 1.0827 3,0.8171 5, 0.6203 8,0.5175 C{tt, ti) 11.4946 11.4946 11.4946 11.4946 11.4296 11.0151 10.1569 9.2392 5.53E-04 5.53E-Q4. 5.53B-04 5i3E-04 1611-04 1.23E-04 2.75E-05 2.90E-06 Rm tf, ti 1,2.1544 1,2.1544 1,2.1544 1,2.1544 2,1.1826 3,0.887 6,0.588 12, 0.443 C(N", ti) 13.9248 13.9248 13.9248 13.9248 13.6354 12.9706 11.6844 10.3004 4.54E-05 4.54E-05 4:54£-05 4.54E-05. :2d4E-0S 1:013m • b(m~.06 L19E-08 Rm N", ti 1,2.4662 1,2.4662 1,2.4662 2,1.3042 3,0.9458 5,0.6622 8,0.5327 18,0.388 C(rf, ti) 18.2466 18.2466 18.2466 18.1147 17.4449 16.3104 14.2566 11.9361 M(l) 3.06E-0? 3.06B-07 3.06E-07 1.44E-07 6.83B-08. 1.52E-08 L61B-09- 8.88E-13 tf, ti 1,2.7144 1,2.7144 1,2.7144 2, 1.4297 4,0.8028 5,0.7185 10,0.488 25,0.345 C(N", ti) 22.1042 22.1042 22.1042 21.7699 20.7860 19.2059 16.4523 13.2155 2.Q6E-09 2.06E-09 2.06E-G9 9.74E40 2.17B-10 1.03E-10 2.41B-12- 3.14E-17 Rm N",ti 1,2.9240 1,2.9240 1,2.9240 3, 1.0516 4,0.8590 6,0.6589 11,0.481 32,0.317 C(JV\ ti) 25.6496 25.6496 25.6496 25.0933 23.7937 21.8164 18.4161 14.2862 1.39E-11. 1:391-11 1.39E-U 3.10E-12 1.46B-12. 3.27B-I3 7.68E-15 l.llE-21 Rm N~, ti 1,3.1072 1,3.1072 1,3.1072 3,1.1187 5,0.7444 7,0.6101 13,0.445 38,0.300 C(rf, ti) 28.9647 28.9647 28.9647 28.1580 26.5985 24.2338 20.2160 15.2166 9.36E-14 9.36E-14 9.36E-14 2.09E-14 4.66E-I5 104E-15 1.15E-17 8.31E-26 Rm Ar\ ti 1,3.2711 1,3.2711 1,3.2711 3, 1.1750 5, 0.7803 8,0.5694 14,0.438 45, 0.283 C(N~, ti) 32.0996 32.0996 32.0996 31.0639 29.2216 26.5070 21.8952 16.0452 Rm 6.31E-16. 6:311-16 6.31E-I6 1.4 IE-16 3.14B-i'7- 3.31B-18 3.68E-20 2J94E-30 ht,ti 1, 3.4200 1,3.4200 !2,1.7427 4,0.9367 6,0.6894 8,0.5921 15,0.430 52, 0.269 C(N", ti) 35.0882 35.0882 ' 35.0757 33.8273 31.7285 28.6591 23.4802 16.7961 Rm 4.25E-18, 425E-18 2.01E-18 4.48E-1? 9.99E-20 2.23E-20 1.17E-22 1.04E-34 N",ti 1, 3.5569 1,3.5569 2,1.8112 4,0.9723 6,0.7148 9,0.5536 16,0.422 58, 0.260 C(tf, ti) 37.9545 37.9545 37.8892 36.4460 34.1020 30.7093 24.9872 17.4853 2.86E-20 2.86E-20 1.35E-20 3.02E-21 6.7313-22: 7.10E-23. 
3.72E-25 7.77E-39 Rm JV*, ti 1,3.6840 1, 3.68401 2,1.8729 4, 1.0055 6,0.7384 10,0.521 18,0.396 65,0.250 a w " ti) 4 m i 6 ^ 40-71^3 4n finis 38 o-7?8 16 1596 ^2 6R34 2fi 42-75 181240 I 931.-22 1 93B-22 9 11F-23 2 0i["-23 4 541.-24 2 26i;-25 5 601.-28 2 751.-41 Rm
sm
o.onr"
208
2.4. The Proposed Periodic PM Model with Reliability Limit Suppose that the system has to be replaced when the system's reliability decreases to a certain level, say Rmjn. Let tNR be the age of which the system's reliability decreases to Rmin and the replacement is occurred in the NRlh PM. It means that R{tNR) = Rmi„. Thus, for the case of Weibull failure distribution with location parameter 0 and shape parameter /?, the periodic PM interval (hR) of this model is derived as function of NR and is shown as follows. 0(-^RmJv,
foTNR=l (5)
h.=
forA^>l
-In ft. 1
*-'i+/*E£
(k-kn + riT -(k-knr
.
•v,-i
(i-iri + V -(i-inY
Then, the optimal value of A^ (say NR ) can be determined such that NR=mmC(hR,NR),
NR =1,2,3...
(6)
Where C(hR, NR) has the same definition as that of Park's PM model and is shown in Eq. (7). (NR ~ 1)C,„, + Cpl. + Cm "£ f 0A A, (t)dt C{hR,NR) = ^5
(7)
The algorithm for finding the optimal solutions of both hR and A^ is developed as follows. 1. 2.
Let NR=\, obtain the value of hR by Eq.(5) and the value of C(hR, 1) by Eq.(7). Let Cmn = C(hR, NR).
3.
Let^^A^+l.
4. 5.
Obtain the value of hR by Eq.(5) and calculate C(hR, NR) by using Eq.(7). If C(hR, NR) < Cmi„, let NR*=NR, and hR*=hR,.return to Step 2; otherwise, stop.
3. Numerical Examples and Lemma In this paper, some numerical examples are performed by using the same conditions as those shown in Park et al. [8] where the failure distribution is assumed to be Weibull(/?= 3, 9= 1), Cpm = 1.5, and Cmr = 1. Table 2 shows the optimal solutions to the examples of Cpr = 3, 9, 15, 30, and 80 by the proposed
209 model with reliability limit of Rmi„ =0.1 and 0.4. It can be seen from Tables 1 and 2 that the optimal cost rate of each example obtained from the proposed model is higher than the corresponding result of Park's model. The symptom is reasonable since the required reliability limit is a trade-off of the cost. It can also be seen from the Tables that the N* or NR is more sensitive to higher values of Cpr and tj. In additions, examples with lower values of Cpr and T] are more possible to result with N* = 1 (or NR*= 1). It means that the optimal policies of these examples require no PM activity before performing a preventive replacement. Thus, it is interesting to study the conditions (or called the control limit) of no-PM policy. Lemma 3.1 For the case of Weibull failure distribution, the optimal solution to the proposed problem is NR' = 1 (i.e., no-PM policy) when the following condition is satisfied ^
Cpt-(lnRml„)C
>(2v''-l)
(8)
Where v is defined as V = l + /?[1-(1-7?)""'] + ( 2 - 7 7 ) " - ( 1 - 7 7 ) "
(9)
Proof: Define that hRl and h^ be the hR values obtained from Eq. (5) for NR = 1 and 2, respectively. For the case of NR = 1, it means that C(hRU 1) < C(hR2, 2).. Thus, by applying Eq.(7) to Eq.(10), we obtain Eq.(8).
(10) •
For the perfect PM case (i.e., TJ = 1), it can be seen from Eq. (8) that the system with reliability limit does not need any PM if c > [2(2+/?)"'"-lb -(\nR )C • 4. Conclusions A periodic PM model with reliability limit at the time of preventive replacement over an infinite time span is proposed for the deteriorating systems with degradation rate reduction effect after each PM activity. The improvement factor method is applied in the proposed PM model to measure the restoration effect on the degradation rate of a system after each PM activity. The optimal interval between each PM and the optimal number of PM before preventive replacement are determined by minimizing the cost rate function. An algorithm is provided to search the optimal solutions in this paper. Examples are presented and compared with those shown in Park's PM model. The sensitivity
210 of parameters to the optimal solutions is briefly discussed. Finally, the control limit for a special case of no-PM policy is developed in this paper. Table 2. The optimal solutions to the examples by applying the proposed model with reliability limit.
c
«»,,„
1
3
ti
cm
9
ti C(T) 15
ti C(T) 30
ti C(T) 80 N" •
ti C(T)
0.1 1 1.3205 4.016 1 1.3205 8.559 1 1.3205 13.103 1 1.3205 24.462 1 1.3205 62.327
0.1 0.3 0.5 0.7 1 1 1 1 1.3205 1.3205 1.3205 4.016 4.016 4.016 1 1 1 2 1.3205 1.3205 1.3205 8.559 8.559 8.559 1 1 2 1.3205 1.3205 0.7308 13.103 13.103 12.865 1 2 3 1.3205 0.7076 0.5187 24.462 23.887 22.687 3 4 6 0.4677 0.3769 0.2840 60.790 57.574 52.697
0.9 1.3205 4.016 0.7574 8.452 3 0.5546 12.203 6 0.3243 20.455 11 0.1992 44.416
0.1 1 0.9713 4.032 1 0.9713 10.210 1 0.9713 16.387 1 0.9713 31.831 1 0.9713 83.309
0.4 0.3 0.5 0.7 1 1 1 1 0.9713 0.9713 0.9713 4.032 4.032 4.032 1 1 1 1 0.9713 0.9713 0.9713 10.210 10.210 10.210 1 1 2 0.9713 0.9713 0.5375 16.387 16.387 16.201 1 2 3 0.9713 0.5204 0.3815 31.831 31.143 29.633 3 4 6 0.3440 0.2772 0.2089 81.303 77.024 70.538
0.9 0.9713 4.032 0.9713 10.210 3 0.4079 15.458 5 0.2751 26.836 11 0.1465 59.525
References 1. 2.
3.
4. 5. 6. 7. 8.
R. V. Canfield, Cost Optimization of Periodic Preventive Maintenance, IEEE Transactions on Reliability R-35, 78-81 (1986). J. K. Chan and L. Shaw, Modeling Repairable Systems with Failure Rates that Depend on Age and Maintenance, IEEE Transactions on Reliability 42,566-571 (1993). V. Jayabalan and D. Chaudhuri, Cost Optimization of Maintenance Scheduling for a System with Assured Reliability, IEEE Transactions on Reliability 41, 21-25 (1992). C. H. Lie and Y. H. Chun, An Algorithm for Preventive Maintenance Policy, IEEE Transactions on Reliability R-35, 71-75 (1986). M. A. K. Malik, Reliable Preventive Maintenance Scheduling. AIIE Transaction 11, 221-228 (1979). T. Nakagawa, Imperfect Preventive Maintenance, IEEE Transactions on Reliability R-28, 331-332 (1979). T. Nakagawa, Periodic and Sequential Preventive Maintenance Policies, Journal ofApplied Probability 23, 536-542 (1986). D. H. Park, G. M. Jung and J. K. Yum, Cost Minimization for Periodic Maintenance Policy of a System Subject to Slow Degradation, Reliability Engineering and System Safety 68, 105-112 (2000).
A N I N V E N T O R Y W I T H PARTIAL R E P L E N I S H M E N T S U B J E C T TO C O M P O U N D POISSON D E M A N D S
S. K. C H O I , K. E . LIM A N D E . Y. L E E Department of Statistics, Sookmyung Women's University, Seoul 140-742, Korea E-mail: [email protected]
We introduce the concept of partial replenishment to an inventory. The stock of the inventory is replenished either fully with probability p or partially with probability 1 — p by a deliveryman arriving at the inventory according to a Poisson process. The demands for stock to the inventory form a compound Poisson process. The stationary distribution of stock is derived and an optimization is studied.
1. Introduction In this paper, we generalize the well-known (s,S) policy for an inventory by introducing the concept of partial replenishment. The inventory with capacity S > 0 is replenished either fully or partially if the level of inventory is insufficient when a deliveryman arrives at the inventory. The deliveryman arrives at the inventory according to a Poisson process of rate A > 0. If the level of inventory is less than s(0 < s < S), when he/she arrives, he/she replenishes the inventory either fully (up to S) with probability p or partially(by a random amount) with probability 1 — p. The amounts of partial replenishments are assumed to be iid with distribution function G, and guaranteed to be no less than s to ensure the level of inventory to exceed s after a replenishment. The partial replenishment, however, can't be made over capacity S. Meanwhile, the demands for stock arrive at the inventory according to another Poisson process of rate v > 0. The amounts of demands are assumed to be iid with exponential distribution of rate (i > 0. When the amount of a demand is larger than the current level of stock, only the current amount of stock is supplied. The classical (s, S) policy for an inventory was introduced by Dvoretzky, Kiefer and Wolfowitz6 and, thereafter, studied by many authors. For examples, the condition and optimality were proved by Boylan3 and Veinott 14 . 211
212
Tijms 13 and Schal 11 extended the earlier analyses to dynamic inventories in infinite and finite horizons, respectively. Federgruen and Zipkin7 developed an efficient algorithm to compute the optimal (s,S) policy. Hordijk and van der Duyn Schouten9 and Sethi and Cheng 12 studied the optimality of the policy when the demand process is Markovian. In the case of Poisson arrival demands, Dirickx and Koevoets 5 and Hohjo and Teraoka8 studied the optimization of (s, S) policy and Beckmann and Srinivasan2 obtained the stationary distribution of inventory level. The present replenishment policy is a generalization of the (s, S) policy in the case of Poisson arrival demands. In section 2, we obtain the stationary distribution of the level of inventory. After assigning several management costs to the inventory, in section 3, the long-run average cost is derived. A numerical example to find the optimal value of the arrival rate of deliveryman is illustrated at the end of paper. 2. The Stationary Distribution of Inventory level Let X(t) be the level of inventory at time t > 0. A sample path of {X(t),t > 0} is shown in Figure 1. Let F(x,t) = Pr{X(t) < x} be the distribution function of X(t) and F(x) = \ixnt-,oc F(x,t) be the stationary distribution of X{t). To obtain F(x), we first decompose process {X(t), t > 0} into two processes. {Xi(t),t > 0} is formed by separating the parts where X(t) > s from the original process and combining them together. {X2(t),t > 0} is formed by connecting the rest of process {X(t),t > 0}. Note that {X(t), t > 0}, {X\(t),t > 0} and {X2(t),t > 0} are all regenerative processes. In {X(t),t > 0} and {Xi(t),t > 0}, the regeneration point is the moment where a replenishment occurs and in {X-2(t),t > 0} the moment where the level of inventory falls below s for the first time since the last replenishment. Let Fi(x) = lim^oc Pr{Xi{t) < x) and F2(x) = l i m t ^ Pr{X2(t) < x} be the stationary distributions of X%(t) and Xi(x), respectively. After we derive F\(x) and ^ ( a ; ) , we will combine them together to obtain F(x). To do this, we first calculate some interesting characteristics of the model.
213
A
)EM
Xi
Y1
x°H 0
X X TX" ,
)'( $ x x — x 6 x x x x Q x
t
E'-
$
T*
*
X : demand O : partial replenishment D : full replenishment Figure l. A sample path of {X(t), t > 0} Properties (i) Let X° be the level of inventory at the moment when it falls below s for the first time since the last replenishment, then the distribution function Ho(x) of X° is given by H0{x) = e-^s~x\
0<x<s.
(ii) Let X' be the level of inventory just before a replenishment occurs, then the distribution function Hi (x) of X' is given by 0 < x < s.
ffi(s)
(iii) Let X" be the level of inventory immediately after a replenishment occurs, then the distribution function H2(x) of X" is given by H2{x) = (l-p)G*H!
(x),
s<x<S.
where * denotes the Stieltjes convolution. Remark 2.1. Notice that the discrete probabilities exist in Ho, Hi and H2. That is, Pr{X° = 0) = e"" 8 , Pr{X' = 0) = e ~ ^ , and Pr{X" = S) = 1 - (1 - p)G * # i ( S ) . Now, we obtain Fi(x). Let T * " be the first passage time from X" to the states below x in {Xi(t),t > 0}, for s < x < S. Then, we have the following
214 Lemma 2.1: L e m m a 2 . 1 . E(T*")
= \ fx »{z -
x)dH2(z).
Proof. Consider {Xi(t),t > 0} in a period between two successive regeneration points. If we project this portion of {Xi(t),t > 0} onto y-axis, then the resulting process forms a Poisson process with interarrival time which is the amount of a demand, {N(t),t > 0} say. Hence, it can be shown that
where E" is an exponential random variable of rate v. Conditioning on X", we have E(T?")
= - f
E[N(z - x)\X" = z]dH2(z) S
V J-r
H{z
-x)dH2(z),
since N(t) is independent of E\ and X".
•
Recall that {Xi(t),t > 0} is a regenerative process, where a regeneration occurs at the point of a replenishment. Observe that T* is the period between two successive regeneration points. Hence, using the renewal reward theorem(Ross 10 ), we have F\(x)=
lim
Pr{X1(t)>x}
t—*-oc
E{Tf)
=
E(T?"V where E{T* ) and E(T* ) are given in Lemma 2.1. Now, we drive the stationary distribution F2(x) of {X2(t), t > 0}. Recall that {X2(t),t > 0} is a regenerative process and the regenerative point is the moment where the level of inventory falls below s. The time between two successive regeneration points is an exponential random variable of rate A due to the memoryless property. Let T* be the first passage time from X° to the states below x, then the renewal reward theorem gives, for 0 < x < s, Ezix) = lim Pr{X2(t)
> x}
t—>oc =
E[min(Tf,E*)} ETE1}
_
xo
AE[mm{lx
A
,h ) \ .
215
E[min{T*
, Ex)\ is given in the following Lemma 2.2: ,EX)\ = {[I + A e - ^ ~ * ) -
Lemma 2.2. E[min{Tf
^-^fe^1].
Proof. First, observe that Ex)
Pr{min(T^°,
= l - Pr{min{T^\Ex) = 1 - Pr{T*°
> t} xt
> t}e~ ,
0 < t < oo,
since Ex is independent of T* . Conditioning on X°, we have, N(X°-x)+l
Pr{T?°
>t}=
Pr{
£
EX > t}
i=l
_
^.^-'
X
fc!
(n + 1)! v
n=Ofc=0
'
Hence,
<™v)l-«
M**VT.*)^H.-EE^T"•' n=0 fc=0 Therefore, A
v
v
The foregoing gives ~PAx) = 1 + A e - M ( « - x ) _
A+£e-
Xtt(s-x)
\ +v
We are now ready to obtain the stationary distribution F(x) of the original process {X(t),t > 0}. Let T* denote the time between two successive replenishments in process {X(t),t > 0}, then T* = T*" + EX in distribution. T h e o r e m 2 . 1 . F(x) = 1 — -F^z) is given by (
F(x)
\nJ*(z-x)dH2(z) XnSfiz-xW^W+v* T
\fifss(z-x)dH2(z)+u
X>S \n[S(z-x)dHi(z)+v'
u
^
x
^ *
216
Proof. Suppose that we earn a reward at rate of one per unit time when processes {X(t),t > 0}, {Xi(t),t > 0} and {X2(t),t > 0} are greater than x(0 < x < S), then we have, by the renewal reward theorem, that E[reward during a cycle T*] F{x) = E[T*} E[reward during a cycle T* ] + E[reward during a cycle Ex] ___ E[T*"\ — E[T*}
E[reward during T*"] —— \r,fi E[TSX"}
X
~T~
E[EX\ T" E[T*}
Remark 2.2. The stationary expectation E(X) by F ( x ) (
=
r
X
E[reward during Ex] E[EX]
of {X(t),t
> 0} is given
AMJ/Jif(«-!t)rfgaWd»+^(l-e-'")-^^(l-e~^) \»Jas(z-x)dH2(z)+v
'
3. Optimization Let C\ denote the cost of ordering a unit of goods, Ci denote the cost per visit of the deliveryman, C3 denote the loss per unit time while the inventory being empty, and C4 denote the carrying cost of a unit of goods per a unit time. Let N be the number of visits of the deliveryman during a cycle. Note that the replenishment occurs only once during a cycle, the amount of which is X" — X'. Note also that during a cycle the period of the inventory being empty exists with probability po = Pr(X = 0) = e x +". The expected length of this period will be j if exists, due to the memoryless property of Poisson process. We assume that the inventory has operated for a long time and the level of inventory is now stationary. Then, by the renewal reward theorem, the long-run average cost per unit time is given by _ [E(X") - E(X')]d
\v[E{X") -8+^(1-
+ \E(T*)C2 + \P0C3 + E[f*' E{T*) ft _
ae-*fr)]Ci +
ve-^C?, 1- \Ci
Xn[E(X") - s] + v \*>SUl^ +
- x)dH2(z)dx+±(l E X
M ( ")
X(u)du}C4
- e~n
- ^ ( 1
-s\+v
- e-35)]C4
217
Here, we use facts that E(X') e~&),
E(X")
E(X)
= lim^oo
= /„* 1 - H^dx
= / / 1 - H2(x)dx £[J
oXt(")d"]
=
g
= s - **?{1 -
= S - (1 - p)J*G d
l/(T*y "]
due to
* Hx{x)dx
and
the renewal reward
theorem. The value of A which minimizes C(A) can be calculated numerically. Specially, when the amount of partial replenishment Y is exponentially distribution with density function g(y) = ^ e _ J ^ r - , for y > s, the values of A minimizing C(A) can be found as in Table 1, for p = 0.25,0.5 and 0.75, by using MATLAB. Table 1 shows that as p and s increase, the optimal value A* decreases, as we have expected. Meanwhile, as the demands for stock increase(as v increases and \i decreases), we can see that the optimal value A* increases. We also see that the larger value of m yields the smaller value of A*. Table 1.
M 1 0 2
1 10
2
1 20
2
Optimal values of A and corresponding costs.
p = 0.5 p = 0.25 A* C(A*) C(A») 1.0735 8.1119 0.8765 7.0369 2 5 8.7665 0.7860 15 0.8550 7.5071 1.1915 12.6828 1.0015 11.5440 4 5 15 0.9430 13.1755 0.8925 11.9389 0.8435 5.3133 0.6805 4.3815 2 5 6.1424 15 0.6720 0.6100 4.9253 1.0735 8.1119 0.8765 7.0368 4 5 15 0.8550 8.7665 0.7860 7.5071 5.6068 2 15 0.3295 6.2189 0.3415 25 0.3390 5.9391 0.3470 5.4694 4 15 0.4155 10.3659 0.4355 9.8207 25 0.4265 10.1359 0.4410 9.7084 2 15 0.2025 3.9189 0.2120 3.2915 25 0.2115 3.6211 0.2180 3.1424 4 15 0.3290 6.1436 0.3405 5.5393 25 0.3390 5.8666 0.3470 5.4024 3.7712 2 25 0.1650 3.8735 0.1670 35 0.1675 3.8067 0.1685 3.7281 7.5065 4 25 0.1880 7.5813 0.1915 35 0.1915 7.5379 0.1940 7.4785 1.8210 0.1020 2 25 0.1015 1.7139 35 0.1025 1.7482 0.1030 1.6668 3.7582 4 25 0.1675 3.8599 0.1685 35 0.1690 3.7929 0.1695 3.7150 Ci = 2 , C2 = 1.5, C3 = 20, C 4 = 0.5, V
m
A*
p = 0.75
A*
C(\")
0.7595 0.7280 0.8800 0.8410 0.5860 0.5620 0.7595 0.7280 0.3525 0.3555 0.4515 0.4535 0.2220 0.2250 0.3520 0.3555 0.1690 0.1700 0.1955 0.1960 0.1030 0.1030 0.1700 0.1710 S = 40
6.4737 6.6917 10.9299 11.1202 3.9046 4.1484 6.4737 6.6917 5.1158 5.0650 9.3795 9.3386 2.7844 2.7278 5.0525 5.0015 3.6726 3.6518 7.4343 7.4207 1.6104 1.5875 3.6601 3.6392
218 Acknowledgements This work was supported by grant No. ROl-2004-000-10284-0 from Basic Research Program of the Korea Science & Engineering Foundation and the SRC/ERC program of MOST/KOSEF(Rl 1-2000-073-00000).
References 1. L. A. Baxter and E. Y. Lee, A diffusion model for a system subject to continuous wear, Probab. Eng. Inform. Sci. 1, 405 (1987). 2. M. J. Beckmann and S. K. Srinivasan, An (s, S) inventory system with Poisson demands and exponential lead time, OR Spektrum 9(4), 213 (1987). 3. E. S. Boylan, Multiple (s, S) policies, Econometrica 32, 399 (1964). 4. P. H. Brill and J. M. Posner, Level crossings in point process applied to queues: single-server case, Oper. Res. 25, 662 (1977). 5. Y. M. I. Dirickx and D. Koevoets, A continuous review inventory model with compound Poisson demand process and stochastic lead time, Naval Res. Logist. Quart. 24(4), 577 (1997). 6. A. Dvoretzky, J. Kiefer and J. Wolfowitz, On the optimal character of the (s,S) policy in inventory theory, Econometrica 2 1 , 586 (1953). 7. A. Federgruen and P. Zipkin, An efficient algorithm for computing optimal (s,5) policies, Oper. Res. 32(6), 1268 (1984). 8. H. Hohjo and Y. Teraoka, The replenishment policy for an inventory system with Poisson arrival demands, Sci. Math. Jpn. 58(1), 33 (2003). 9. A. Hordijk and F. A. van der Duyn Schouten, Optimality of (s, S)-policies in continuous review inventory models SIAM J. Appl. Math. 46(5), 912 (1986). 10. S. M. Ross, Stochastic Processes, 2nd ed, Wiley (1996). 11. M. Schal, On the optimality of (s, 5)-policies in dynamic inventory models with finite horizon, SIAM J. Appl. Math. 30(3), 528 (1976). 12. S. P. Sethi and F. Cheng, Optimality of (s, S) policies in inventory models with Markovian demand, Oper. Res. 45(6), 931 (1997). 13. H. C. Tijms, The optimality of (s, S) inventory policies in the infinite period model, Statistica Neerlandica 25, 29 (1971). 14. A. F. Jr. Veinott, On the optimality of (s, S) inventory policies : New conditions and a new proof, SIAM J. Appl. Math. 14, 1067 (1966).
OPTIMAL (T, SO-POLICIES IN A DISCRETE-TIME O P P O R T U N I T Y - B A S E D AGE R E P L A C E M E N T : A N EMPIRICAL STUDY*
T . D O f f l t , N. K A I O * A N D S. O S A K l t t * Department of Information Engineering, Hiroshima University Higashi-Hiroshima 739-8527, Japan: [email protected] * Department of Economic Informatics, Hiroshima Shudo University Hiroshima 731-3195, Japan: [email protected] *' Department of Information and Telecommunication Engineering Nanzan University, Seto 489-0863, Japan: [email protected]
In this paper we consider the optimal (T, S)-policies in a discrete-time opportunitybased age replacement (DOAR), where S is a restricted duration of opportunities and T (> S +1) is a preventive age replacement time. Based on the expected cost per unit time in the steady-state as a criterion of optimality, we formulate totally 6 DOAR models and derive numerically the optimal (T, S)-policies minimizing the respective expected costs per unit time. A numerical example with real failure time data investigates the dependence of the optimal DOAR policies and their associated minimum expected costs on parameters of the discrete Weibull failure time distribution.
1. Introduction The opportunity-based replacement problems are regarded as the most plausible preventive replacement schemes in practice, since the preventive replacement at each opportunity may need less effort or cost than the scheduled preventive replacement. Dekker and Smeithink 1,2 , Dekker and Dijkstra 3 analyze the typical but rather general opportunistic preventive replacement models. Iskandar and Sandoh 4,5 extend the seminal opportunity-based age replacement (DOAR) by Dekker and Dijkstra 3 . It should be noted that the above references 1,2,3 ' 4,5 are related with only "This work is supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research; Grant No. (B) 16310116 (2004-2006), Grant No. (A) 16201035 (2004-2006), Grant No. (C) 16510128 (2004-2006), and the Nanzan University Pache Research Subsidy I-A-2 for 2006. The authors wish to thank Mr. T, Ishii, Hiroshima University, for his numerical assistance in this paper*
219
220
Figure 1.
Section Switch.
continuous-time models. Recently, Dohi et al.6 consider a discrete-time DOAR model with an application to the preventive maintenance program in the Japaneses power company, and reformulate Dekker and Dijkstra's model 3 . The underlying problem considered by Dohi et al.6 is a DOAR of section switches to control the amount of electric power, and to distribute it to other places (see Fig. 1). Since the section switch is usually equipped with telegraph pole, it can be inspected at an opportunity when the telegraph pole is removed for any construction or when the maintenance team inspects the telegraph pole by other reasons. The main reason why the section switch is not replaced during the relatively shorter period is that the switch is relatively reliable and its replacement cost is very expensive. In other words, since the used section switch with relatively younger age can be utilized again in the same place, the duration of opportunities can be restricted as a longer time period. On the other hand, if the section switch does not fail, the preventive replacement should be performed when the age of switch attains a pre-determined threshold level. Of course, if the failure occurs, then the electric current is off over an extensive area, and the much cost to recover the local power delivery system will be needed. Since in our empirical research project the failure time data of section switches are recorded as group data (the number of failures per year) and the preventive replacement time is commonly scheduled at the unit of year, it is appropriate to formulate the problem as a DOAR one. Actually, it is
221 shown in our questionnaire that such a treatment would be plausible and could be supported by practitioners. Dohi et a/.6 consider the optimal Tpolicies with a fixed S in 6 DOAR models, where 5 is a restricted duration of opportunities and T is a preventive age replacement time. More specifically, the same authors 6 analytically derive the optimal preventive replacement time T (> S+l) with a fixed S, which minimizes the expected cost per unit time in the steady-state. However, in general, it is not so easy for the power company to give the restricted duration S in advance. Our experience in the field research suggests that both decision variables (T, S) should be derived theoretically based on any optimality criterion. In this paper we consider the optimal (T, ^-policies in a DOAR, and extend the previous result 6 by introducing one more decision variable S. Similar to the past result 6 , we formulate totally 6 DOAR models and derive numerically the optimal (T, 5)-policies minimizing the respective expected costs per unit time. A numerical example with real failure time data investigates the dependence of the optimal DOAR policies and their associated minimum expected costs on parameters of the discrete Weibull failure time distribution. 2. Model Description Consider a DOAR model for a non-repairable single unit system in discrete time. Suppose that the time interval between opportunities for replacements, X, obeys the geometric distribution Pr{X = x} = gx{x) = p{\ - pY'1 (X = 1,2^-• ;0 < p < 1) with survivor function Pr{X > x} = (1 -p)*-1 = Gx{x - 1) and mean E[X] = 1/p (> 0), where in general <£(•) = 1 - <£(•). Then, the unit may be replaced at the first opportunity after elapsed time S (= 0,1,2, •••) even if it does not fail. The failure time of unit, Y, follows the common probability mass function P r { F _ = y) = fY(y) {y = 1,2,- ••) with survivor function P r { y > y} = FY(y - 1). Without any loss of generality, we assume that /y(0) = <7x(0) = 0. If the failure occurs before a pre-specified preventive replacement time T (= S + 1, S + 2, • • •), the corrective replacement may be executed. On the other hand, if the unit does not fail before the time T, the preventive replacement may be carried out at time T or the first opportunity after S time elapses. The cost components under consideration are the following; c\ (> 0): corrective replacement cost per failure, c 2 (> 0): cost for each preventive replacement,
222
C3 (> 0): cost for each opportunistic replacement. From the above notation, we make the following two different assumptions; Assumption ( A - l ) : c\ > C3 > C2, Assumption (A-2): c\ > c2 > C3. It would be valid to assume that the corrective replacement cost is most expensive, cj ^> c2, C3. The relationship between the preventive replacement cost c2 and the opportunistic replacement cost C3 will be ordered by taking account of the economic justification in applications. As pointed out in the previous paper 6 , the DOAR model above has to be treated carefully, so that at an arbitrary discrete point of time, the decision maker must make one decision among three options; failure (corrective) replacement F a , preventive replacement S c and opportunistic replacement O p . We introduce the following symbol to represent the priority relationship: Definition 2.1: The option P has a priority to the option Q if P >- Q. From Definition 2.1, if two options occur at the same time point, the option with higher priority will be selected. In our model setting, consequently, it is possible to consider totally 6 different priority models as follows. Model 1: S c y F a y O p Model 3: S c y O p y F a Model 5: F a y O p >- S c
Model 2: F a y S c y O p Model 4: O p y S c y F a Model 6: O p y F a y S c .
Based on the previous results by Dohi et al.6, the mean time length of one cycle Aj(T, S) for Model j (= 1, • • • ,6) are all same, that is, Ai(T, S) = A2(T, S) = A3(T, S) = A4(T, S) = A5(T, S) = A6(T, S), where S
T
Aj(T,S) = Yl^Y(k-l)+
Y,
fc=l
FY(k-l)Gx(k-S-l)
(1)
k=S+l
is independent of the priority rule. On the other hand, the total expected costs during one cycle Bj{T) for Model j (= 1, • • • ,6) are given by S
B1(T,S)=c1Y,fY(n) n=0
T-\
+ c1 J2
fY(n)Gx(n-l-S)+c2Fy(T-l)
n=S+l T-l
xGx(T-l-S)
+ c3 J2 FY{n)gx(n-S), n=S+l
(2)
223
B2(T,S)
= c1J2
fr(n) + c i
X ! fY(n)Gx(n
n=0
- 1 - S) +
c2FY(T)
n=S+l T-l
xGx(T-l-S)
+ c3 Y
FY(n)gx(n-S),
(3)
n=S+l T-l
B3(T,S)
= c1Y/fY(n)
+ c1 Y
n=0
fY(n)Gx(n-S)
+
c2FY(T-l)
n=S+l T-l
c G x ( T - l - 5 ) + c3 Y
FY(n-l)gx(n-S),
(4)
n=S+l T-l c
B4(T,S)=ClJ2fY^)
+ i
n=0
Y
/y(n)Gx(n-5) + c2Fy(T-l)
n=S+l T
x G x ( T - 5 ) + c3 $ ]
Fy(n-l)5x(n-S),
(5)
n=S+l T
B5(T,5)
= cx E / y ( n ) + Cl n=0
J2
fy(n)G*(n-l-S)
+
c2FY(T)
n=S+l T
xGx(T-S)
+ c3 Y
FY(n)gx(n-S)
(6)
n=S+l
and B6(T,S)
= ClJ2fY(n) n=0
+ Cl
Y,
fY(n)Gx(n-S)
+
c2FY(T)
n=S+l T
xGx(T-S)
+ c3 Y
FY(n-l)gx(n-S),
(7)
n=S+l
respectively. Then the expected costs per unit time in the steady-state Cj (T, S) for Model j (= 1,2, • • • ,6) are, from the discrete renewal reward argument, E total cost on [0, n] Cj(T,S)=
lim
B^SVA^S), (8)
and the problem is to find the optimal pair (T*,S*) which minimizes the expected cost Cj(T). Dohi et al.6 develop a computation algorithm to
224
calculate the optimal T* for a fixed S by examining carefully the mathematical properties of the functions qj(T) = [Aj(S)Bj(S + 1) — Aj(S + l)Bj(S)]/Fy(T). In our problem, an improved algorithm to update S is available when the failure time distribution is IFR (Increasing Failure Rate). Algorithm DA OR: For each j = 1, 2, • • • , 6 For each i = 1, 2, • • • , I If qj(Si + 1) < 0 and qj (oo) > 0 find the optimal Tt (St + 1 < Tt < oo) satisfying ^(Tj - 1) < 0 and qj(Ti) > 0 end If qj (oo) <0 Ti^oo end If9j(Si + l ) > 0 , Ti = Si + 1 end end end Find the optimal i minimizing Cj(Ti,Si) for i = 0 , 1 , . . . , / and Model j (= 1,2, • • • , 6), where I is a sufficiently large integer. It should be noted that the assumption (A-l) [(A-2)] is needed for Models 1, 2 and 3 [4, 5 and 6] for the above algorithm. Also, although this algorithm is based on an elementary idea to list up all possible 5s, it can be effectively improved by applying the classical branch and bound algorithm. 3. A n Empirical S t u d y The 112 failure time data used here are recorded in Hiroshima city, Japan, during past twenty five years. Figure 2 illustrates the relative frequency of the failure time data of section switches, measured by the unit of year. In a fashion similar to Dohi e.l, a,l.6, suppose that the failure time obeys the following discrete Weibull distribution 7 : P r { y = y} = {q)^-^
- (q)<<
(9)
225 relative frequency 9 S 7 6 5 4 3 2 1 0 1
3
5
7
9
11
13
Figure 2.
15 17 19 21 23 25 27 29 31
Failure Time Data.
where q G (0,1), /? (> 0) and y = 1,2, •••. Ali Khan ei a./.8 develop an intuitive but simple parameter estimation method as well as the moment method (MM) and the maximum likelihood method (MLM) for the discrete Weibull distribution. We apply both MM and MLM to estimate q and /?. Unfortunately, since the time data between opportunisies are not recorded, we subjectively give the parameter of geometric distribution as p = 0.05. This means that the first opportunity for replacement comes 20 years after in average. In MM, we estimate q = 0.9995 and /3 = 2.8547 from E[Y] = 13.40 [year] and Var[Y] = 24.36. On the other hand, we have q = 0.9995 and (3 = 2.8158 by means of MLM. Table 1 presents the optimal (T, 5) policies and their associated expected costs per unit time Cj(T,S) for Model j (= 1,2,••• ,6), where ci = 5 x 103 ($), c2 = 1 x 103 ($), c3 = 1.5 x 103 ($) and p = 0.05. Prom this result, it can be seen that the parameter estimation methods (MM and MLM) do not give a remarkable difference on the optimal policies, but may result the difference on the expected costs. When the parameter q is larger, it can be observed that the difference of priority rules depends on the optimal policies. For instance, when q = 0.9990, we have (T*,S*) = (6,5) for Models 1 and 3, but (T*,S*) = (5,4) for Models 2, 4, 5 and 6. As the result through this numerical example, it can be shown that the optimal preventive replacement time is given by T* = S* + 1, so that one needs to take one of maintenance options (F a , S c and O p ) at T*. However, if we relax the assumption such as T > S, only the optimal policy for Model 6 in Table 1 is given by (T*,S*) = (7,7), and the corresponding expected costs become C6{T*,S*) = 25.1867 ($) and C6(T*,S*) = 25.2364 ($) with
226 MM and MLM, respectively. This is a very interesting result, because the DOAR can not be always reduced t o a simple age replacement with T = S exept for Model 6 in this case. Table 1. Comparison of 6 DOAR Models.
(T*,S*)
Model Model Model Model Model Model
1 2 3 4 5 6
(7,6) (7,6) (7,6) (7,6) (7,6) (7,6)
MM Cj{T*,S*)
22.1360 25.1867 22.1360 21.4811 24.5700 25.3616
MLL (T*,S*)
(7,6) (7,6) (7,6) (7,6) (7,6) (7,6)
Cj{T*,S*)
22.2101 25.2364 22.2101 21.5555 24.6196 25.4124
References 1. R. Dekker and E. Smeithink, Opportunity-based block replacement, Euro. J. Opl. Res., 53, 46-63 (1991). 2. R. Dekker and E. Smeithink, Preventive maintenance at opportunities of restricted duration, Naval Res. Logist., 4 1 , 335-353 (1994). 3. R. Dekker and M. C. Dijkstra, Opportunity based age replacement: exponentially distributed times between opportunities, Naval Res. Logist., 39, 175-190 (1992). 4. B. P. Iskandar and H. Sandoh, An opportunity-based age replacement policy considering warranty, Int'l J. Reliab. Qual. Safe. Eng., 6, 229-236 (1999). 5. B. P. Iskandar and H. Sandoh, An extended opportunity-based age replacement policy, R.A.I.R.O. Ope. Res., 34, 145-154 (2000). 6. T. Dohi, N. Kaio and S. Osaki, Discrete-time opportunistic replacement policies and their application, Proc. Int'l Workshop on Recent Advances in Stochastic Operations Research, 18-25, Canmore, Canada, August 25 -26 (2005). 7. T. Nakagawa and S. Osaki, The discrete Weibull distribution, IEEE Trans. Reliab., R-24, 300-301 (1975). 8. M. S. Ali Khan, A. Khalique and A. M. Abousammoh, On estimating parameters in a discrete Weibull distribution, IEEE Trans. Reliab., R-38, 348-350 (1989).
A MAINTENANCE AND INVENTORY CONTROL IN A CONNECTED-(P,R)-OUT-OF-(M,N): F SYSTEM YOUNG JU HA, WON YOUNG YUN Department
of Industrial Engineering, Pusan National University, 30 Geumjeong-Gu, Busan, 609-735, Korea
Chang)eon-Dong,
This paper deals with a maintenance and inventory control problem in a linear connected(p,r)-out-of-(m, n) F lattice system. It is assumed that all components of the connected(p,r)-out-of-(m, n) F lattice system are identical and have two states: state 1 (operating) and O(failed). The purpose of this paper is to develop an optimization scheme that aims at minimizing the expected cost per unit time. We considered an age-based preventive maintenance and modified (s, S) inventory policy. To find the optimal maintenance interval and inventory level, a genetic algorithm is proposed. The expected cost per unit time is obtained by a Monte Carlo simulation. Sensitivity analysis to the different cost parameters is done by numerical examples.
1. Introduction A connected- (p, r) -out-of- (m, ri): F system is a system with multi-units in which components are arranged physically or logically like the elements of an (m,n) matrix. The system fails if and only if all components in any {p,r) sub-matrix fail. The connected- (p, r) -out-of- (m, ri) : F system has been introduced by Salvia [3] and can be applied to obtain the reliability of a sensor system, an X-ray diagnostic system, a pattern search system, a liquid crystal display system and a phased array radar system. These lattice systems have some common features. That is, the system consists of identical components and the number of components in the system is relatively large. For example, the phased array radar system loaded in an airplane consists of hundreds of identical radar units. Yamamoto and Miyakawa [4] introduced a recursive equation technique to obtain the exact reliability of a connected- (p, r) -out-of- (m,ri) : F system. Although this method is effective for relatively small systems, the computation time becomes much larger. Malinowski and Preuss [2] proposed computation methods using the upper bound and lower bound in the F lattice system in the case that all components are independent in failure. Yuge et al. [5][6] proposed the simulation method to compute the reliability and availability of the F lattice system. 227
228
Kabir and Al-Olayan [1] suggested a policy that has been advanced for joint optimization of age replacement and the provision of spare parts. It combines the age replacement policy with the (s ,S) type inventory policy, where s is the stock reorder point and S is the upper level. This policy is generally applicable to any operating situation having either a single component or a number of identical components. A simulation model has been developed to determine the optimal values of the decision variables by minimizing the total cost of replacement and inventory. In this paper, we propose a Monte Carlo simulation for the joint age based preventive maintenance policy and ( s , S ) inventory model in a connected(p, r) -out-of- (m, ri): F lattice system. And, we have determined the optimal preventive maintenance interval, reorder point and maximum inventory level using a genetic algorithm. 2. System model and control 2.1. Failure Occurrence and Maintenance Policy A connected- (p, r) -out-of- (m, ri) : F system fails when all parts of any (p, r) sub-matrix in an (m, ri) matrix with components are out of order. The status of the system and the part has only 2 states: work or failure. We assume that the failures of all parts of the system are independent and follow an identical exponential distribution. We use an age based preventive maintenance policy. Under this policy, we replace all the failed components when the system is failed (failure maintenance) or the system age reaches the specified time (preventive maintenance), T , whichever occurs first. After maintenance, the system age is reset to zero. We also consider inventory control problems for components and it sometimes happens that we do not have sufficient components for maintenance. Then we shut down the system and wait for the components to arrive. It is assumed that the maintenance time is negligible. 2.2. Spare Parts Inventory Control In this paper, a modified ( s , S) inventory policy is considered. It is assumed that the lead time of order is a random variable. We start to operate the system with S spare parts and check the amount of parts in inventory at every maintenance point. Various actions are taken as follows:
229
Case 1: The inventory level is greater than the number of the failed components, N. In this case, we maintain the system (failure or preventive maintenance) without time-delay and if the remaining components, R, is greater than the reorder point, s, we do nothing. Otherwise, we check whether we should wait for the delivery of an order. If we decide to wait for the components for the last order, we wait for the components. But there is also the waiting delivery, and then we order parts (the order quantity is S-R). Case 2: The inventory level is less than the number of the failed components, N. In this case, we shut down the system. If we wait for delivery of the order, we wait for the delivery. Otherwise, we order components (order quantity is S-R) and wait for the delivery of the order. Remark: In case 2, we can consider a special case that the ordered quantity is less than the number of the failed components. In this case, it is more efficient to order components, but it is not considered to simplify our model. Thus, there are 8 model parameters: 1 parameter ( / / ) for the component failure, 4 parameters (p,r,m,n) for system structure, 1 parameter (7' ) for maintenance and 2 parameters (s, S) for inventory control. The three parameters, (Tp,s,S), are system control parameter and decision variables. 3. System Optimal Design 3.1. Cost Model Notation CR C0 C, C2 C3 C4 Ch S S T
: expected cost per unit time : fixed cost at corrective maintenance : cost for replacing a unit : fixed cost at preventive maintenance : fixed cost that order has incurred : fixed cost that shortage has incurred : holding cost per unit time and component : reorder point : maximum inventory level : preventive maintenance interval
We strive to obtain the optimal maintenance and inventory control policy simultaneously and the optimization criterion as the expected cost per unit time. We consider four cost factors: maintenance costs (failure and preventive maintenance), holding cost, order cost, system shutdown cost as follows:
230
1.
Maintenance costs: The maintenance cost consists of a fixed cost ( C 0 for corrective maintenance or C2 for preventive maintenance) and a variable cost depending on the maintenance duration where the cost per unit time is
2.
Holding cost: When we hold the units, the holding cost depends on the number of units and duration where the cost per unit time for a unit is Ch.
3.
Shortage cost: When we try to maintain the system correctively or preventively, we need new units to replace the failed units in the system. But sometimes we do not have sufficient units in inventory, in which case we should shut down and wait for delivery of the units. At this time, the shortage cost occurs when failed components are suddenly increased or when the replacement is delayed because of the late arrival of the ordered quantity. System stopped cost: It occurs when the system is unavailable because of shortage during the lead time.
4.
The total cost is the sum of the above stated four cost factors. The expected cost per unit time is calculated by the value which equals the total cost divided by the simulation time. 3.2. Simulation Procedure We have developed a Monte Carlo simulation procedure to obtain the expected cost per unit time. The simulation procedure is given in Figure 1 and there are 5 different cases in the path of inventory level. 3.3. Genetic Algorithm We propose a genetic algorithm to find the optimal solutions. The fitness value is the expected cost per unit time which can be obtained during the proposed simulation procedure. Encoding/Decoding: A chromosome encodes real values of periodic replacement time T and system parameters with 11-digit strings in which each string has an integer value from zero to nine.
231 11 Bit
>
r Number of (0-9)
Figure 3. Chromosome
Selection: This operator selects chromosomes in the population for reproduction. The Roulette selection is used. Crossover: It occurs at one point. If the value of PM is selected, all bits between two points, chosen at random, are exchanged. Mutation: This operator randomly flips some of the bits in a chromosome. 4. Numerical Examples Simulation and GA are operated under the following conditions: - Simulation time: 10,000 hours - GA parameter: random selection (chromosome number, generation number) = (100, 50) (crossover rate, mutation rate) = (0.3, 0.5) Table 1 shows the values of the model and cost parameters and Table 2 shows the optimal solutions of 21 replications. The optimal s, S, and TR may be near 35, 94, and 16, respectively.
232 (
Start
)
] Generate the failure time tt
Cumulate the failure number, interval length, Count maintenance number
Simulation time=total interval + total system stopped time Renewal system
Calculate total cost
(
I
End
J
Figure 1. Simulation procedure
233 Table 1. Input variables in simulation
Co 100
simulation time : 10,000hour, chromosome number: 100, generation number : 50 m n P r Q
c2
c3
5
10
200
5
5
3
3
ch
SL
0.008
10
Table 2. Experiment results
number
S
s
T
P
CR
1
33
82
15.565
1.89978
2
39
88
15.313
1.95506
3
38
101
15.135
1.89684
4
34
101
16.879
1.92212
5
31
82
15.189
1.89503
6
35
95
17.175
1.92529
7
40
96
16.516
1.99231
8
38
94
15.591
1.91946
9
34
103
14.840
1.92760
10
38
96
17.418
1.95214
11
31
93
16.440
1.90118
12
34
103
15.115
1.89083
13
34
85
15.355
1.90843
14
34
91
15.414
1.92097
15
37
84
14.815
1.94525
16
31
95
15.894
1.92790
17
36
103
16.555
1.94853
18
36
73
17.344
1.95852
19
34
102
15.676
1.91586
20
34
106
17.062
1.93396
21
32
101
15.089
1.87719
Mean
34.90
94.00
15.923
1.92449
Standard deviation
2.66
8.84
0.872
0.02749
234
5. Conclusion This paper proposes a maintenance and inventory control in a connected(p,r) -out-of- (m, ri) : F lattice systems. We have considered age-based preventive maintenance and (s, S) inventory policy. We propose a simulation procedure and genetic algorithm to obtain the preventive maintenance interval and (s, S) to minimize the expected cost per unit time. A numerical example was also studied. The simulation procedure proposed in this paper is applicable for other F lattice systems in which the failure distribution of the components has an exponential distribution. Acknowledgments This work was supported by the Regional Research Centers Program (Research Center for Logistics Information Technology), granted by the Korean Ministry of Education & Human Resources Development References 1. A.B.M.Z. Kabir and A.S. Al-Olayan, A stocking policy for spar part provisioning under age based preventive replacement, European Journal of Operational Research 90, 171-181 (1996). 2. J. Malinowski and W. Preuss, Lower & upper bounds for the reliability of connected-(r,j)-out-of-(w,«): F lattice systems, IEEE Trans. Reliability 45(1), 156-160(1996). 3. A. A. Salvia and W. C. Lasher, 2-Dimensional consecutive-£-out-of-«: F models, IEEE Trans. Reliability 39(3), 333-336 (1990). 4. H. Yamamoto and M. Miyakawa, Reliability of a linear connected-(r,j)-outof-(m,n): F lattice systems, IEEE Trans. Reliability 44(2), 333-336 (1995). 5. T. Yuge, M. Dehare and S. Yanagi, Reliability and availability of a repairable lattice system, Trans, on IEICE E83-A(5), 782-787 (2000). 6. T. Yuge, M. Dehare and S. Yanagi, Reliability of a 2-dimensional consecutive k-out-of-n: F system with a restriction in the number of failed components, Trans, on IEICE E86-A(6), 1535-1540( 2003).
OPTIMAL M A I N T E N A N C E POLICY FOR A S Y S T E M W I T H D A M A G E REPAIR
K. I T O Mitsubishi
Technology Training Center, Technical Headquarters, Heavy Industries, LTD., l-ban-50, Daikouminami 1-chome, higashi-ku, Nagoya 461-0047, Japan T . N A K A G A W A , K. T E R A M O T O Department
of Marketing and Information Systems, Aichi Institute of Technology 1247 Yachigusa, Yagusa-cho, Toyota, Aichi, 470-0392, Japan E-mail: toshi-nakagawa@aitech. ac.jp
Aged fossil-fired power systems, which need the maintenance for their steady operations, are on the great increase in Japan. The preventive maintenance and/or repair of such systems are indispensable to prevent the serious trouble such as the emergency stop of operation. Because the cumulative fatigue damage of system parts remains, the condition of system after repair cannot return to brand-new. Such repair degradation of system has to be considered when the maintenance plan is established. In this paper, a system is repaired at prespecified schedule when the cumulative damage level is below a managerial level. When the cumulative damage level exceeds a certain critical level, the system fails and such critical level lowers at every repair. The expected cost per unit of time between maintenances is obtained, and the optimal maintenance policy is derived.
1. Introduction A number of aged fossil-fired power plants are increasing in Japan. For example, 33% of these plants are currently operated from 150,000 to 199,999 hours (from 17 to 23 years), and 26% of them are above 200,000 hours (23 years) 1. Although Japanese government relaxed regulations of electric power industry, most industries restrain from the investment for new plants and prefer to operate current plants efficiently because of the long-term recession in Japan. The deliberative maintenance plans are indispensable to operate these aged plants without serious trouble such as the emergency stop of operation. 235
236
The importance of maintenance for aged plants is much higher than that for new ones, because occurrence probabilities of severe troubles increase and new failure phenomena might appear according to the degradation of plants. Furthermore, actual lives of plant components are mostly different from predicted ones because they are affected by various kinds of factors such as material qualities and operational circumstances 2 . So, maintenance plans should be established considering occurrence probabilities of miscellaneous component troubles. The maintenance is classified into the preventive maintenance (PM) and the corrective maintenance (CM). Many authors have studied PM policies for systems because the CM cost at failure is much higher than the PM one and the consideration of effective PM is significant 3 ~~ 7 . The occurrence of failure is discussed by utilizing the cumulative damage model. Cumulative damage models, where a system suffers damage due to shocks when the total damage exceeds a failure level, generate a cumulative process 8 . Some aspects of damage models from the reliability viewpoint, were discussed by Esary, Marshall and Proschan 9 . The PM policies where a system is replaced before failure at time T 10 , at shock N n 12 , and at damage K 13 14 were considered. Nakagawa and Kijima 15 applied the periodic replacement with minimal repair at failure to a cumulative damage model and obtained optimal values T*, N* and K* which minimize the expected cost. A plant consists of a wide variety of mechanical parts such as power boiler, compressor, combustor, steam and gas turbines. Some parts suffer high temperature at operation and thermal damages are accumulated in these parts. PM is performed periodically before these damages cause serious failures. The condition of system after PM cannot return to the brand-new condition because some cumulative fatigue damage of system parts remains after PM 16 . In past PM studies and cumulative damage models, the condition of system after PM is supposed to be brand-new. In the actual plant maintenance, the remaining damage after PM should be considered. In this paper, a system is repaired at prespecified schedule when the cumulative damage level is below a managerial level and some damage remains after the repair. When the cumulative damage level exceeds a certain critical level, the system fails and the critical level loweres at every repair. The expected cost per unit of time between maintenances is considered and the optimal maintenance policy is derived.
237
2. Model 1 We consider the following maintenance policy (see fig.l) : 1) The system is operated continuously and shocks during operation occur at a non-homogeneous Poisson process. The probability that 17 the j'-th shock occurs during (0,*] is Hj(t) = {XtYe-xt/j\ . Thus, the probability that shocks occur j-times during (0,t] is Fj(t) = 2) The damage caused by each shock has an identical probability distribution G(x) = Pr{Yj < x} (j = 1,2, • • •) with finite mean, and each damage is additive. Then, the total damage Zj = X)i=i ^i to the j - t h shock where ZQ = 0 has a distribution Pr{Zj < x} = GU)(x) (j = 1,2, • • •), where $^>(x) (j = 1,2, • • •) denotes the j fold Stieltjes convolution of $(x) with itself. 3) The cumulative damage is below a managerial level k during (Ti,Ti+i](i = 0,1,2, •••) where To = 0. The system is repaired at time Tj+i and its cost is CQ. The system degrades at every repair. 4) When the cumulative damage level exceeds a failure level Ki (i = 0, l , - - - ) , the system fails and its maintenance cost is c ^ . Ki declines i.e., Ki < Kj_i < KQ because some cumulative fatigue damage of system parts remains at every repair. 5) When the cumulative damage level is between k and Ki, the system is overhauled and its cost is c\ (cfd > c\ > CQ, i = 0,1, • • •). In this model, the system is operated until its cumulative damage exceeds k and k has to satisfy l i m , - ^ Ki > k. The probability P^ that the system undergoes overhaul when the cumulative damage exceeds the managerial level k is Pkt = E [ G 0 ' _ 1 ) W - G&(k)] fT'+1 dFjit).
(1)
It is obvious that J2Zo -Pfc; = 1- The probability P^ that the system fails during (Tj,T i+1 ] when the cumulative damage level exceeds Ki, is p
Kt=J2
t1 -
G
( ^ - u)]dGU-V{u)
/
dFj(t).
(2)
238
K
i *» It*, k
Operation time Figure 1.
Schematic diagram of Model 1
Let E{U} denote the mean time to some maintenance. From (1) and (2), we have
D G(j-i) ( fc ) - G(j)(fc)i r + i tdFM
E
M=E
i=0 j = l
JTi
= I [ 1 + M(fc)], where M(k) = Y^=1 G{j)(k). some maintenance is
(3)
Further, the total expected cost E{C)
E{C) = ] T [(a + ico)(Pki - PKi) + (cKi + ico)PKi] •
to
(4)
i=0
Therefore, from (3) and (4), the expected cost rate Ci(k)/X is E ~ o E ~ i {(ci + ico)[GV-l\k) d(k)
- (*)] j j
+ 1
dF.it)
+i
_
+{cKi-cl)ti\l-G{Ki-u)}dG^\u)$ dFj{t)) • (5)
1 + M(k)
A
Suppose that G(x) = 1 - exp(-fix),i.e-, and M{x) — fix. oo
Gu){x)
=
Y^j(^)ie-'1X/i\
oo
E{C} = a - c0 + Y, E ^ »=0 j = 0
- ^ - i ) e " * + co]G^(fc)^(TO ,
(6)
239
where Ai = (cKi - ci)e ^Ki ,Ki = oo(i < 0). Therefore, from (3) and (6), the expected cost rate C\(k)/\ is Ci(fc) A
=
ci-co + ESoPi-^-iK^ + colE^o^W^m) l + fik '
U
Differentiating C\(k) with respect to k and putting it to zero, we have oo oo
E E { [(^< - ^-i) e M f c + co] (1 + M*)^'"1) (*) j=0
j=0
- [(Ai - i4i_i)e"fc + co(2 + Mfc)] G«>(A:)} ^-(Ti) =
Cl
- CQ , (8)
where G{j)(k) = 0(j < 0). Letting denote the left-hand side of (8) by Li(fc), we have oo
M O ) = ^2{[(Ai
- A<_i) + co](ATi - 1) - c 0 }e- A T i ,
(9)
oo oo
LiiKoo) =J212{
[(^* - ^ - i ) e " * ~ + co] (1 + /xK 0 0 )G ( j - 1 ) (^oo)
i=0 j=0
- [(Ai ~ Ai-Je"*-
+ co(2 + / x ^ ) ] G«> (#«>)} # ; ( W 0 )
Thus, if Li(0) < ci - co < Li(-Koo) or Li(Koo) < cx - c0 < Li(0) , then there exists the finite k* (0 < k* < Koo) which minimizes C\(k).
3. M o d e l 2 We consider the following maintenance policy which has the same assumptions except 5) of Model 1 (see fig.2) : 5)' When the cumulative damage level is between k and Ki: or the operation time exceeds time T n , whichever occurs first, the system is overhauled and its cost is c\ ( c ^ > c i > c o , i = 0,!.,•••). The probability that the cumulative damage is below the managerial level k until the time Tn is oo
i¥n = E G ( j ) (*w r »)3=0
ai)
240
K,
' i 1
! i t
i
\ ! i
. . ........^./. 7J
K»-i
O
.
j
..].•••
T,
TM
T^
T»
Operatwfttiine Figure 2. Schematic diagram of Model 2 Then, the mean time to some maintenance or to Tn is •try
1
T1-
OO
E
,+1
W = £ E^^^fc) - G^(fc)] / ,/T
i=0 j = l
tdFj(t) + TnPTn
*
(12) Next, the total expected cost ^ { C } to some maintenance or to Tn is E{C} = [ci + (n - l)co]P r „ n-l
n—1
+ J > i + ico)(Pki - P* 4 ) + £ ( < * < + tcoJP^ i=0
i=0
= ci - co + £ p * - X i - i ) ^ * + co] ^ G ^ ^ f c ) ^ ^ ) i=0
j=0 oo
-A-ie^EGW^)^(T„).
(13)
3=0
Therefore, from (12) and (13), the expected cost G2(n, k) is C\ - C o
C2(n,k)
+ ZtoU*
- ^i-i)e" f c + co] E ° l o GW)(fc)^m) E,~oGW)(*)/orB^W*
, (14)
241
which agrees with (7) as n is infinity. 4. Numerical illustration Ki and cki are defined as Kt = (K0 - Koo)exp(-
/c,(k) 2.09 X 10
/
2
0.92 X 10 2
67.0 50
F i g u r e 3.
93.6 100
150 Managerial damage level
Model 1 and conventional model expected costs
5. Conclusions We have considered the optimal PM policies for a system with damage repair. The system fails when the cumulative damage exceeds a certain critical level and the critical level loweres at every repair. The overhaul is performed when the cumulative damage is between a managerial level and the critical level. Two models are considered and expected costs of these
242 models are delivered. Optimal pocilies which minimize costs are discussed. Numerical illustrations are attached. References 1. K.Hisano, "Preventive Maintenance and Residual Life Evaluation Technique for Power Plant (I.Preventive Maintenance)" (in Japanese), The Thermal and Nuclear Power Vol.51, No.4, 2000, pp.491-517. 2. K.Hisano, "Preventive Maintenance and Residual Life Evaluation Technique for Power Plant (V.Review of Future Advances in Preventive Maintenance Technology)" (in Japanese), The Thermal and Nuclear Power Vol.52, No.3, 2001, pp.363-370. 3. R. E. Barlow and F. Proschan, Mathematical Theory of Reliability, John Wiley & Sons, New York, 1965. 4. T. Nakagawa, "Optimal policies when preventive maintenance is imperfect", IEEE Trans. Reliability, R-28, 1979, pp.331-332. 5. C.H.Lie and Y.H.Chun, "An algorithm for preventive maintenance policy", IEEE Trans. Reliability, R-35, 1986, pp.71-75. 6. T. Nakagawa, "A summary of imperfect preventive maintenance policies with minimal repair", RAIRO Operations Research, 14, 1980, pp.249-255. 7. T. Nakagawa, "Sequential imperfect preventive maintenance policies", IEEE Trans. Reliability, 37, 1989, pp.581-584. 8. D. R. Cox, Renewal Theory, Methuen, London, 1962. 9. J. D. Esary, A. W. Marshall and F. Proschan, "Shock models and wear processes", Annals of Probability, 1, 1973, pp.627-649. 10. H. M. Taylor, "Optimal replacement under additive damage and other failure models", Naval Res. Logist. Quart, 22, 1975, pp.1-18. 11. T. Nakagawa, "A summary of discrete replacement policies", European J. of Operational Research, 17, 1984, pp.382-392. 12. C.Qian, S.Nakamura and T. Nakagawa, "Replacement and minimal repair policies for a cumulative damage model with maintenance", Computers and Mathematics with Applications, 46, 2003, pp.1111-1118. 13. R. M. Feldman, "Optimal replacement with semi-Markov shock models", Journal of Applied Probability, 13, 1976, pp.108-117. 14. T. Nakagawa, "On a replacement problem of a cumulative damage model", Operational Research Quarterly, 27 1976, pp.895-900. 15. T. Nakagawa and M. Kijima, "Replacement policies for a cumulative damage model with minimal repair at failure", IEEE Trans. Reliability, 38, 1989, pp.581-584. 16. S.Kosugiyama, T.Takizuka, K.Kunitomi, X.Yan, S.Katanishi and S.Takada, "Basic Policy of Maintenance for the Power Conversion System of the Gas Turbine High Temperature Reactor 300 (GTHTR300)"(in Japanese), Journal of Nuclear Science and Technology, Vol.2, No.3 (2003) pp.105-117. 17. S. Osaki, Applied Stochastic Systems Modeling, Springer Verlag, Berlin (1992).
DISCRETE REPAIR-COST LIMIT R E P L A C E M E N T POLICIES *
K. I W A M O T O f , T . DOfflt A N D N . K A I O * ^ Department of Information Engineering, Hiroshima University Higashi-Hiroshima 739-8527, Japan: [email protected] Department of Economic Informatics, Hiroshima Shudo University Hiroshima 731-3195, Japan: [email protected]
This paper addresses statistical estimation problems of the optimal repair-cost limits minimizing the long-run average costs in discrete setting. Two repair-cost limit replacement models are considered with and without imperfect repair. We introduce the discrete total time on test (DTTT) concept and propose non-parametric estimators of the optimal repair-cost limits.
1. Introduction Since the design of maintenance policies is closely linked to overall economic goals in industry, the optimal control of repair program is quite important to establish an advanced economic maintenance schedule. The usual experience suggests that if the repair cost is relatively cheap, the failed unit should be repaired, otherwise, replaced by new one. This kind of maintenance operation is called the repair-cost limit replacement (RCLR) policy, and has been extensively studied by many authors. Kaio and Osaki1 consider a simple RCLR policy under the assumption that the repair cost is independent and identially distributed (i.i.d.) continuous random variable. Dohi et al.2 extend Kaio and Osaki's continuous RCLR model 1 by introducing a lead time delay, and propose a statistical estimation algorithm based on the i.i.d. sample of repair cost data. Subsequently, Dohi et al.3 take account of the possibility of imperfect repair 4 , and analyze the similar "This work is supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research; Grant No. (B) 16310116 (2004-2006), No. (C) 17510139 (2005-2008) and the Research Program 2006 under the Institute for Advanced Studies of the Hiroshima Shudo University, Japan.
243
244
but somewhat different RCLR model with imperfect repair to the earlier one 2 . On the other hand, it is worth noting that since the realization of repair cost in real world is given by an integer value, the optimal RCLR problem should be considered in discrete setting. Kaio and Osaki 5 reformulate their own paper 6 as a discrete repair-time limit replacement (RTLR) model. Recently, Dohi and Kaio 7 consider a discrete RTLR model with a lead time and develop a non-parametric estimator based on the i.i.d. sample of discrete repair time data. This idea can be directly applied to the discrete RCLR model. This paper addresses statistical estimation problems of the optimal repair-cost limits minimizing the long-run average costs in discrete setting. Two RCLR models are considered with and without imperfect repair 2 , 3 . We derive the analytical conditions to exist the optimal RCLR policies. Further, we introduce the discrete total time on test (DTTT) concept7 and propose non-parametric estimators of the optimal repair-cost limits. Based on a graphical idea 2 , 3 , we give simple estimators of the optimal RCLR policies for two models, as functions of the i.i.d. repair cost. In fact, from the analogy to the continuous RCLR models 2 , 3 , we can expect that the resulting estimators have strong consistency, i.e., the estimates of the optimal repair-cost limit asymptotically approach to the real but (unknown) optimal solutions as the number of repair cost data increases. 2. Model Description 2.1. Model 1: Perfect
Repair
Model
Consider a single-unit repairable system, where each spare is provided only by an order after a lead time L(> 0) and each failed unit can be repaired orreplaced by new one. The original unit begins operating at time 0. Once the unit fails, the repair operation starts immediately. The mean time to failure for each unit is given by m / ( > 0). If the repair operation completes within a prespecified cost v0 (= 0,1,2, •••), the unit is installed at the completion time of repair. Then the mean repair time is given m s ( > 0), provided that the repair cost does not exceed VQ. It is assumed that the repaired unit is presumed as good as new. However, if the repair cost exceeds the cost VQ, the failed unit is scrapped, and a new spare unit is ordered immediately at the scrap point and delivered after the lead time L. Then the mean repair time to reach the cost VQ, i.e., the mean time to retire the repair, is given by m u ( > 0) provided that the repair cost
245
exceeds vo- Suppose that the time required for replacement of a new unit is negligible. From this point of view, we respectively call v0 and this model the repair-cost limit replacement (RCLR) policy and Model 1 in this paper. The repair cost for each unit is a discrete random variable having the cumulative distribution function (c.d.f.) H(v) with probability mass function (p.m.f.) h{v) and finite mean mm (> 0). Without any loss of generality, the c.d.f. H(•) is assumed to have an inverse function, i.e. H~1(-). Under these model assumptions, define the time interval from the start of operation to the following start of operation as one cycle. The costs considered in this paper are the following; a cost per unit time kf(> 0) is incurred for the shortage and a cost c(> 0) is incurred for each order. Especially, we make the following assumptions: (A-l) ms > mu + L, (A-2) kjiris < kf(mu + L) + c. The assumptions (A-l) and (A-2) imply that the mean repair time ms and the corresponding expected cost when the repair cost does not exceed VQ are longer and smaller than the mean down time and its associated expected cost, respectively. These assumptions are needed to motivate the optimal decision of RCLR policy. 2.2. Model 2: Imperfect
Repair
Model
Next, we consider a somewhat different model from Model 1. When the unit fails, the repair operation starts immediately. If the repair operation completes within the repair-cost limit VQ, the unit is installed at the completion time. In this case, the mean time to complete the repair is given by ms (> 0). Since the repair is imperfect, the unit may fail again for a finite time span with probability one, and the mean failure time provided that the repair is completed is given by ma (> 0). On the other hand, if the repair cost reaches VQ, then the repair operation is retired at that point, and the failed unit is scrapped. Then, a new spare unit is ordered with an ordering cost c (> 0) immediately and is delivered after a lead time L (> 0). In this case, the mean repair time to reach VQ is given by mu (> 0), provided that the repair cost does exceed VQ. Further, the new unit also may fail again for a finite time span with probability one, where the mean failure time is given m; (> 0) in this case. For this model referred as Model 2, we define the time interval from the start of repair to the following start of repair as one cycle. For the cost component, we make the following different
246
assumptions: (B-l) ms + ma > mu + L + mh (B-2) kfms < kf(mu + L) + c. The above assumption (B-l) is reduced to (A-l) if ma = mi. 3. Analysis of Optimal RCLR Policies 3.1. Model 1 The total expected cost for one cycle and the mean time of one cycle are given by TQ-1
EI{VQ)
= Y^ H{v) + kf [m,H(y0) + (mu + L)H{v0)] + cH(v0),
(1)
OO
VQ
Ti{v0) = Y^Cmf+ms)h(v)+
^2
{ms+mu
+ L)h{v),
(2)
respectively, where H(-) = 1 — H(-). Then it may be appropriate to adopt the long-run average cost in the steady-state as a criterion of optimality. Prom the standard discrete renewal reward argument, we have
d(v0)
E the total cost on[0, n] = lim - i 1 = EiM/T^vo),
(3)
so that the problem is to find the optimal RCLR policy VQ which minimizes the long-run average cost CI(VQ). Taking the the difference of Eq.(3) and dividing it by H(VQ) yields 9i(v0) = | l + \kf{ms
- (mu + L)} - c e(v0 + l ) | | m / +
msH(v0) VQ-}
+(mu + L)H(v0)\
-\ms-
{mu + L)]e(v0 +
l)[Y/H(v)dv v=0
+kf msH(v0)
+ (mu + L)H(v0)
+ cl(j)0)|,
(4)
where e{v) = h(v)/H(v — 1) is the repair rate, which is equivalent to the hazard rate of the c.d.f. H(v). Theorem 3.1: (1) Suppose that the repair-cost distribution H(v) is strictly DHR, i.e., e{v) is a strictly decreasing function of v, under (A-l) and (A-2).
247
(i) If qi(0) < 0 and 91(00) > 0, there exists (at least one, at most two) optimal RCLR policy v0* (0 < v0* < 00) satisfying qi(vQ - 1) < 0 and <7I(VQ) > 0. Then, the corresponding minimum long-run average cost has to satisfy K i K ) < C i K ) < / f i K + i),
(5)
where 1 + {kf{ms -mu(Tna -mu-
L) - c}e(v) L)e(v)
(ii) If gi(0) > 0, the optimal RCLR policy is v0* = 0, i.e., it is always optimal to order a new spare unit just after the unit fails. Then, the corresponding minimum long-run average cost is given by Ci(0) = {kf(mu + L) + c}/{m,f + mu + L}. (hi) If
2
Since the total expected cost for one cycle and the mean time of one cycle are given by vp — 1
E2(v0) = J2 ^
+ kf{msH{vQ)
+ (mu + L)H{v0)} + cH(v0),
(7)
v=0 i>o
T2{v0) = ^2{ms
00
+ ma)h{v) + ] T
l>=0
(m u + L + mi)h(v),
(8)
v=v0 + l
respectively, the problem is to derive the optimal RCLR policy VQ minimizing the long-run average cost C2(v0) = E2(v0)/T2(v0). It is obvious that Model 2 is reduced to Model 1 when m / = ma = mj. Define the function: 2(^0) = \kf[msmi 92(
- (mu + L)ma] - c{ms + ma) \e(v0 + 1) - lms + ma vv 0-l1
-{mu
+ L + mi)j^e(v0
°~
+ l) ^
U(v) - H{v0)} + mu + L
•mi.
v=0
(9)
248 Theorem 3.2: (1) Suppose that the repair-cost distribution H(v) is strictly DHR under (B-l) and (B-2). (i) If 92(0) < 0 and 92(00) > 0, there exiss (at least one, at most two) optimal RCLR policy v0* (0 < v0* < 00) satisfying 92(^0 — 1) < 0 and 52(^0) > 0. Then, the corresponding minimum long-run average cost has to satisfy K2(v*0) < C2(v*0) < K2(v*0 + 1),
(10)
where K2{v)=l
+ {kf(ms-mu-L)-c}e(v) (ms + ma - mu - mi - L)e{v)
(ii) If 52(0) > 0, the optimal RCLR policy is vo* = 0, and the corresponding minimum long-run average cost is given by C2(0) = {kf(mu + L) + c}/{mu + mi + L}. (iii) If 92(0°) < 0, the optimal RCLR policy is VQ* —• 00, and the corresponding minimum long-run average cost is given by C2(oo) = {mm + kfms}/{ms + ma}. (2) Suppose that the repair-cost distribution is IHR under (B-l) and (B-2). If {kf(mu + L) + c}{ms+ma} < {mm + kfms}{mi+mu + L}, then v0* = 0, otherwise, VQ* —> 00. 4. Statistical Estimation Algorithm Following Dohi and Kaio 7 , the scaled discrete total time on test (DTTT) transform of the c.d.f. H(v) is defined by H-1(p)_
J]
H{v)dv,
0
(12)
where H~1(p) = min{v;H(v)>p}-l
(13)
is the p-fractile of the c.d.f. and where 00
mm = J21f(v).
(14)
249 From the definition above, it can be easily checked that H(v) is IHR (DHR) if and only if
0
(15)
where ^
=
mf+mu V
L
( > o)>
(16)
" *"U
S
=
+
cms - {kf{ms -mu-L)(ms - mu - L)mm mu + L + mi
a2 =
•
T
c}mf
( > 0),
(18)
ms + ma - mu - mi - L kj{msmi - (mu + L)ma} - c(ms + ma) (ms + ma - mu - mi - L)mm
a
From Theorem 4.1, it can be seen that the optimal RCLR policy VQ = H~1(p*) is determined by the optimal point p* so as to maximize the tangent slope from the point (—/%, —oij) to the curve (p, <j>{p)) = [0,1] x [0,1] on the two dimensional plane for Model j ( = 1 , 2 ) . Next, we propose a statistical method to estimate the optimal RCLR policies using the scaled DTTT-statistics for complete i.i.d. samples of repair cost. Suppose that the optimal repair-cost limit has to be estimated from an ordered complete sample 0 = v^ < V(i) < v^) < • • • < vtn) of repair costs from a discrete repair-cost distribution H(-), which is unknown. Then the scaled DTTT-statistics based on this sample are denned by ui = V'i/V'n, i = 0,1,2, ••• ,n, where i
^ ^ ( n - j
+ lXi/y)-!/(,-_!)),
i = l,2,--- ,n; Vo = 0.
(20)
i=i
By plotting the point (i/n, Uj), i = 0,1,2, • • • , n, and connecting them, we obtain a step function called the scaled DTTT-plot. Finally, we have the main result of this paper.
250 T h e o r e m 4 . 2 : Suppose t h a t an ordered complete sample 0 = f(o) < V(i) < v (2) < • • • < ^(n) of repair costs from a discrete repair-cost distribution H(-) which is unknown, is given. T h e n an estimate of t h e optimal R C L R policy is given by f(»*), where
-{'
$3 +Ui
\
mm — — - — }
/oi \
(21)
0
for Model j ( = 1, 2) and j3j is given in Eqs. (19) and (21) by replacing m m by E " = i "(j)A»5. C o n c l u d i n g R e m a r k s In this paper we have considered two discrete repair-cost limit replacement models with and without imperfect repair and have developed statistical estimation algorithms of the optimal repair-cost limits minimizing the longrun average cost in the steady state, from the complete sample of repair cost. Although we have omitted the detailed numerical results for brevity, it can be shown t h a t the resulting estimators have nice convergence properties as the number of repair cost d a t a monotonically increases.
References 1. N. Kaio and S. Osaki, Optimum repair limit policies with cost constraints, Microelectron. Reliab., 2 1 , 597-599 (1981). 2. T. Dohi, H. Koshimae, N. Kaio and S. Osaki, Geometrical interpretations of repair cost limit replacement policies, Int'l J. Relia. Qual. Safe. Eng., 4, 309-333 (1997). 3. T. Dohi, N. Kaio and S. Osaki, A graphical method to repair-cost limit replacement policies with imperfect repair, Math. Comput. Modelling, 3 1 , 99106 (2000). 4. D. G. Nguyen and D. N. P. Murthy, Optimal repair limit replacement policies with imperfect repair, J. Opel. Res. Soc, 32, 409-416 (1981). 5. N. Kaio and S. Osaki, A discrete-time repair limit policy, Adv. Manage. Sci., 1, 157-160 (1982). 6. N. Kaio and S. Osaki, Optimum repair limit policies with a time constraint, Int'l J. Sys. Sci., 13, 1345-1350 (1982). 7. T. Dohi and N. Kaio, Discrete repair-time limit replacement program, Proc. 5th Int'l Conf. Oper. Quant. Manage. (ICOQM-V), 2, 176-184, Seoul, Korea, October 25-27 (2004).
A P E R I O D I C OPTIMAL C H E C K P O I N T SEQUENCE U N D E R STEADY-STATE SYSTEM AVAILABILITY CRITERION *
K. I W A M O T O , T . M A R U O , H. O K A M U R A A N D T . D O H I Department of Information Engineering, Hiroshima 1-4-1 Kagamiyama, Higashi-Hiroshima 739-8527, E-mail: okamuQrel. hiroshima-u. ac.jp
University JAPAN
In this paper, we develop a computation algorithm for an aperiodic checkpoint placement problem with a preventive maintenance, which maximizes the steadystate system availability. The proposed algorithm is based on the usual dynamic programming, and provides an effective iterative scheme. In a numerical example, we investigate the dependence of model parameters on the optimal checkpoint sequence, and carry out the sensitivity analysis to examine the effects of failure parameter and the preventive maintenance time.
1. Introduction Checkpointing is one of the most significant fault tolerant techniques to establish the data diversity effect in file systems. The main purpose of checkpoint placement is to back up the data or process on a stable medium periodically or aperiodically. In our daily data processing and communications, the status of the process running on memory is preventively saved to secondary storage devices such as a hard disk at each checkpoint before system failures occur, because only the commonly used reactive techniques like rollback and rollforward recoveries or system reconfiguration have limitation to guarantee high dependability. Once checkpoints are placed, the process can be restarted from the latest checkpoint by loading the saved status, even though a system failure occurs, and the downtime caused by the failure will become relatively shorter. On the other hand, frequent checkpoints lead to loss of computation time and let checkpointing overhead increase. In other words, if one excessively places checkpoints, as a result, the system availability will extremely decrease. Hence, it is quite "This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Exploratory Research: Grant No. 15651076 (2003-2005).
251
252 important to place the checkpoints by taking account of both effects of checkpointing overhead and recovery overhead 1,2 . Young 3 obtains the optimal checkpoint interval approximately for restarting the computation after system failures. Chandy et al. * propose some performance models for database recovery, and calculate the optimal periodic checkpoint intervals which maximize the system availability or minimize the mean overhead during the normal operation. Grassi et al. 4 consider the optimal checkpoint policy for transaction-oriented systems with critical tasks. Vaidya 5 examines an impact of checkpoint latency on overhead ratio. Recently, Okamura et al. 6 proposes a non-parametric and periodic checkpoint-placement algorithm by applying reinforcement learning techniques and extend the Vaidya's model 5 . In this way, periodic checkpointing with equal time length between successive two checkpoints has been extensively studied in many researches. In fact, if the system failure time is exponentially distributed, this assumption can be validated under specific model assumptions. Apart from periodic checkpointing, some authors focus on aperiodic checkpoint placement problems which are natural extensions of the periodic checkpoint placement. Toueg and Babaoglu 7 develop a dynamic programming (DP) algorithm for a real-time system in the general case where the system failure time obeys the general probability distribution. Ziv and Bruck 8 discuss a sequential checkpointing strategy for a simple Markov model. Ling et al. 9 proposes an approximate solution for a sequential aperiodic checkpoint placement based on the variational calculus technique. Ozaki et al. 10 propose min-max checkpoint placement algorithms as well as exact computation algorithms to minimize the expected checkpointing cost till the completion of recovery operation after a system failure. However, it should be noted that the exact algorithms to derive the aperiodic checkpoint sequence strongly depends on the initial value, and involve unstable try-and-error methodology. In this paper, we develop a computation algorithm for an aperiodic checkpoint placement problem with preventive maintenance, which maximizes the steady-state system availability. The stochastic model under consideration is somewhat different from Ozaki et al. 10 , since the preventive maintenance is performed at a scheduled time and completion of rollback recovery in our model, where the preventive maintenance implies the rejuvenation of system 11 and can renew the system's failure mode to the initial state. It involves stopping the running process, cleaning the internal state and restarting the software system. Cleaning an external
253
state of software may be garbage collection, defragmentation, reinitializing data structures, memory deallocation, hardware reboot, etc. Noting that the checkpointing is a data diversity, it can be understood that the rejuvenation is a distinguished environment diversity technique for continuously running systems with long periods of time. The proposed algorithm in this paper is based on the usual DP 1 2 , and provides an effective iterative scheme. Here it is worth noting that the existing DP-based approach 12 for inspection-like policies gives an approximate solution by applying a discretization technique, but our method is based on an improved policy iteration algorithm and enables to derive the continuous optimal checkpoint sequence. Given a scheduled preventive (rejuvenation) time, the problem is to derive the optimal checkpoint sequence maximizing the steady-state system availability. In a numerical example, we investigate dependence of model parameters on optimal checkpoint sequences, and carry out the sensitivity analysis to examine effects of failure parameters and a scheduled preventive maintenance. Similar to the reference10, it is shown that the periodic checkpoint placement is not always optimal for the underlying availability maximization problem with preventive maintenance, even if system failures follow exponential distributions.
2. Model Description Suppose that a system operation starts at t — 0. It is assumed that the system undergoes a preventive maintenance at the time when the cumulative operation time reaches to time T (> 0). When the preventive maintenance is completed, the system's failure mode becomes as good as new. The system failure may occur according to the absolutely continuous probability distribution function F(t). Upon the system failure, a rollback procedure is immediately carried out to recovery the lost data or process, where the time required by the rollback operation is given by p(x) and x is the time interval between the latest checkpoint and the system failure point. More precisely, the system operation restarts again from the last checkpoint and the lost data or process is recovered to the state just before the system failure occurs. After the completion of recovery operation, the preventive maintenance is performed to initialize the system's failure mode and its time overhead is given by 7 (> 0). The checkpoints are aperiodically placed over the cumulative operation time period [0, J"). Let n = {£i,*2, •••} be the checkpointing schedule, where the expected overhead required by each checkpointing is /J,C (> 0).
254 checkpoint failure • maintenance
• failure time
0
v
h
iv "
Q/* , 711
£v-i
_
1 cycle i PW . recovery
Figure 1.
^
Checkpointing scheme.
At each checkpoint, only the data or process is sequentially saved on a secondary medium, but the system's failure mode does not change by checkpointing itself. Ozaki et al.10 consider a checkpoint placement problem with a finite planning horizon and call it the finite horizon problem. In fact, their model is quite similar to our model except for the renewal structure by preventive maintenance. Consequently, the checkpoint placement algorithm in this paper can be also regarded as an improved version proposed in the reference10. Figure 1 illustrates the checkpointing scheme considered in this paper.
3. Availability Analysis Let us consider the steady-state system availability as a criterion of optimally. Define the time period between successive preventive maintenance operations as one cycle. From the renewal reward theory, the steady-state system availability with the checkpoint schedule TZ is given by .
. .
ASS(TT)
=
,
hm
t—>oo
Efcumulative operation time in [0, t) -••-
t
E [cumulative operation time in one cycle] E[time length of one cycle]
(1)
Then the problem is to find the optimal checkpoint schedule TV* = {t\,t^,... } maximizing the steady-state system availability ASS(TV) under constraint 0 < t\ < t% < To cope with the optimal checkpoint schedule, we develop a DP algorithm under the assumption that the number of checkpoints during the cumulative operation time period [0, T) is fixed as N (> 1). During the time period between two successive checkpoints, [tj_i, U), the expected operation time A(ti\U-i) and the mean time length of one cycle T(ij|f j_i)
255 are given by A(ti\ti-1)=
fl Jo
* 'xdFWti-J
+ tti-ti-iWiU-U-dU-i),
(2)
T(U\U-i) = [ ' * \x + p{x) + 1}dF{x\ti-l) Jo + {U — ti-l + Hc}F(ti — ti-\\ti-i),
(3)
respectively, where t0 = 0, tN+1 = T and #(•) = 1 - >(•)• In Eq. (4), F{-\-) represents the conditional probability distribution denned by F(s\t) = l-F(t
+ s)/F(t).
(4)
At the scheduled preventive maintenance time T, the above expressions are rewritten as follows: A{T\tN)= T(T\tN)
f Jo = f Jo
" xdF(x\tN)
+ (T-tN)F{T-tN\tN),
(5)
N
{x + P(x) + -y}dF(x\tN) + {T-tN+
7)F(T
-
tN\tN), (6)
Prom the principle of optimality, we obtain the following optimality equations hi = maxw(ti\t*_vhi,hi+i),
i — 1,...
,N,
/i;v+i=w(T|r^,/ii,/ii),
where the function w(ti\ti-\,so,
(7)
Si) is formulated as
w(ii|£i_i,s 0 ,si) = A{U\U-i) - ^ T ( t i | t i _ i ) _ +s0F(ti - ti_i|ti_i) + S!F{U - ti_i|ti_i).
(8)
In the above equation, { indicates the maximum steady-state system availability and hi, i — 1 , . . . ,7V" + 1 are relative value functions. In brief, deriving the optimal checkpoint schedule is equivalent to finding t\,... , t*N which satisfy the optimality equations. In this paper, we apply the policy iteration algorithm which is an effective algorithm to solve the above type of functional equations. Since the decision variables i i , . . . , £JV a r e continuous variables, however, it should be noted that the change of one checkpoint affects directly the performance for the next period. Therefore, even though the usual policy iteration is applied, it does not function well as it is. In this paper, we develop an improved policy iteration algorithm to solve the optimality equations with continuous decision variables. The
256
basic idea is to treat optimization problems for two successive periods at the same time. Define the following function instead of the original w(-): t«(*i | t j _ l , / l l , t « ( t » + i | t i , fti, /lj+2)) •
(9)
Finally the DP-based algorithm to derive the optimal checkpoint sequence can be developed as follows. DP-Based Algorithm Step 1: Give initial values k := 0,
(10)
to := 0,
(11)
*<°>:={t<°>,...,#>}. k
(12)
)
{k)
Step 2: Compute h[ \... ,h^ +1,^ under the policy TT . Step 3: Solve the following optimization problems: tf+1):=
argmax for i = 0,1,...
$
+1)
w^tf^M^U^M^)), ,N-1,
(13)
:= lagmaK w(t\t^llt0MT\t,0,0)).
(14)
Step 4: For a l i i = 1 , . . . , N, if \t\k+1) - tf]\ < 5, stop the algorithm, where 5 is an error tolerance, otherwise, let k := k + 1 and go to Step 2. In Step 2 of the above algorithm, we have to calculate the relative value functions. From the original optimality equations (7) and (8), we can find that the relative value functions under a fixed policy 7T = {*i,... , tjv} must satisfy the following linear equation: Mx = b,
(15)
where -F(U - U-ilU-i) 1 [M\i,i = { Tfclti-i) 00 x = {h2,... b=(A(t1\t0),...
if i = j and ifi = j + l, if j = JV + l, cotherwise,
,hN,hN+1,£)T, ,A(tN\tN+1),A(T\tN)Y.
j^N+1, (16)
(17) (18)
257 Table 1. The maximum steady-state availabilities with N = 1 , . . . ,10.
system
N
Ass
N
Ass
1 2 3 4 5
0.78186 0.80556 0.80834 0.80249 0.79285
6 7 8 9 10
0.78169 0.77014 0.75880 0.74779 0.73710
[•]ij denotes the (i, j)-element of matrix and T represents transpose of vector. Without a loss of generality, we set h\ — 0 in the above algorithm.
4. A Numerical Example In this section, we carry out the sensitivity analysis on optimal checkpoint schedule and its corresponding steady-state system availability. Suppose that the system failure time distribution is given by the exponential distribution F(t) = 1 — exp(—£/??), where // (> 0) is the scale parameter. The other model parameters are assumed as /xc = 7 = 1, T = 30 and p(x) = x. Figure 2 shows the optimal checkpoint intervals, where the number of checkpoints is set as N = 10. Even in the case where the system failure time is exponentially distributed, it is proved that the optimal checkpoint interval with preventive maintenance at time T = 30 is not constant 3 . This result on the sequential checkpoint placement has not been known yet. In other words, the constant checkpoint can be validated in only the case of T —» 00 if the preventive maintenance is considered. We next investigate the dependence of number of checkpoints on the steady-state system availability. Table 1 presents the maximum system availabilities for varying N = 1 , . . . ,10 under the same parametric conditions as Figure 2 shows. From the result, it is seen that the optimal number of checkpoints is N = 3. Figure 3 depicts the checkpoint intervals when N = 2,3,4. From this result, it can be found that the optimal number of checkpoints, say N = 3, is a change point from increasing to decreasing of sequence. This result seems to be interesting because the increasing sequence of the checkpoints is possible for placement. Also, it is noted that the resulting checkpoint sequence with N = 3 is closed to the constant one but is not exactly constant. Actually, since the steady-state system availability is a unimodal function of N, the change point A^ = 3 is the globally optimal number of checkpoints.
258 1
"= 1 0
•
rt =20
0
1=3!)
_ 4
t
i
i r 1
CP
Figure 2. Optimal checkpoint intervals with TV = 10.
2
3
r-l
4
CP
Figure 3. Optimal checkpoint intervals with N = 2 , 3 , 4 .
References 1. K. M. Chandy, "A survey of analytic models of roll-back and recovery strategies," Computer, vol. 8, no. 5, pp. 40-47, 1975. 2. V. F. Nicola, Checkpointing and modeling of program execution time, pp. 167— 188. New York: John Wiley & Sons, 1995. 3. J. W. Young, "A first order approximation to the optimum checkpoint interval," Comm. of the ACM, vol. 17, no. 9, pp. 530-531, 1974. 4. V. Grassi, L. Donatiello, and S. Tucci, "On the optimal checkpointing of critical tasks and transaction-oriented systems," IEEE Trans, on Software Eng., vol. SE-18, no. 1, pp. 72-77, 1992. 5. N. H. Vaidya, "Impact of checkpoint latency on overhead ratio of a checkpointing scheme," IEEE Trans, on Computers, vol. C-46, no. 8, pp. 942-947, 1997. 6. H. Okamura, Y. Nishimura, and T. Dohi, "A dynamic checkpointing scheme based on reinforcement learning," in Proc. 2004 Pacific Rim Int'l Symp. on Dependable Computing, pp. 151-158, IEEE CS Press, 2004. 7. S. Toueg and O. Babaoglu, "On the optimum checkpoint selection problem," SIAM J. of Computing, vol. 13, no. 3, pp. 630-649, 1984. 8. A. Ziv and J. Bruck, "An on-line algorithm for checkpoint placement," IEEE Trans, on Computers, vol. C-46, no. 9, pp. 976-985, 1997. 9. Y. Ling, J. Mi, and X. Lin, "A variational calculus approach to optimal checkpoint placement," IEEE Trans, on Computers, vol. 50, no. 7, pp. 699707, 2001. 10. T. Ozaki, T. Dohi, H. Okamura, and N. Kaio, "Min-max checkpoint placement under incomplete information," IEEE Trans, on Dependability and Secure Comput., vol. (in press), 2005. 11. V. Castelli, R. E. Harper, P. Heidelberger, S. W. Hunter, K. S. Trivedi, K. V. Vaidyanathan, and W. Zeggert, "Proactive management of software aging," IBM J. Research & Development, vol. 45, pp. 311-332, 2001. 12. H. Luss, "An inspection policy model for production facilities," Management Science, vol. 29, no. 9, pp. 1102-1109, 1983.
OPTIMAL PERIODIC PREVENTIVE MAINTENANCE POLICIES BASED ON ARI, AND A R I ^ MODELS
DO HOON KIM Department of Applied Information Statistics, Kyonggi University, Yiui-Dong, YeongTong-Gu, Suwon 447-760, Korea
94-6,
JAE-HAK LIM Department of Accounting, Hanbat National University, Sanl6-l, DuckMyoung-Dong, Yuseong-Gu, Daejeon 305-719, Korea In this paper, we propose two periodic preventive maintenance (PM) policies based on ARI| and ARI„ repair models discussed in Doyen and Gaudoin(2004). In ARI^, model, a repair reduces the hazard rate of an amount proportional to the current hazard rate while a repair has effect on the relative wear since the last PM in ARI, model. In both PM policies, the system undergoes the minimal repair at each failure between the preventive maintenances. For each PM policy, we derive mathematical formulas to evaluate the expected cost rate per unit time. Assuming that the system is replaced by a new one at the Mh PM, the optimal values of N, which minimizes the expected cost rate, is solved. For the purpose of illustrating and comparing two proposed PM policies, a numerical example is given when the lifetime distribution of a system is Weibull distribution.
1. Introduction As most of industrial systems become more complex and multiple-function oriented, it is extremely important to avoid the catastrophic failure during actual operation as well as to slow down the degradation process of the system. One way of achieving these goals is to take the preventive maintenance (hereafter, PM) while the system is still in operation. Although more frequent PM's certainly would keep the manufacturing system less likely to fail during its operation, such PM policy inevitably requires a higher cost of maintaining the system. Since Barlow and Hunter(1960) propose two types of PM policies, many authors have addressed the problem of designing the optimal schedule for the PM by determining the length of time interval between PM's to minimize the average cost rate of the system. Different types of PM policies studied in many literatures are summarized in Pham and Wang(1996) and Wang(2002). 259
260
In most of the PM policies discussed earlier, each model assumes that the system undergoes PM at specified periodic times and is restored to as good as new after each PM. However, although the PM improves the system and slows down the degradation process, it is very unlikely that the PM restores the system to the one like new for practical systems in use. That introduces the concept of imperfect repair or imperfect PM model, which has been attracted by many researchers. Chan and Downs(1978) suggest criteria for imperfect preventive maintenance, in which PM is imperfect with probability p. Nakagawa(1979, 1980) proposes optimal PM policy when PM is imperfect. Similarly, Murthy and Nguyen(1981) study the optimal age replacement policy with imperfect preventive maintenance in which the system after PM is either fails instantaneously with a probability p or is like new with a probability 1 - p . Brown and Proschan(1983) introduce imperfect repair model in which a repair restores a failed system either to the state as good as new with probability p or to the state just prior to the failure with probability \-p . Doyen and Gaudoin(2004) propose two classes of imperfect repair models based on reduction of hazard rate or virtual age, which is arithmetic reduction of intensity(ARI) model or arithmetic reduction of age(ARA) model, respectively. In this paper, we propose two periodic PM policies, namely ARI\ PM policy and ARJX PM policy, based on ARI\ and ARI^ repair models which are discussed in Doyen and Gaudoin(2004) . For each PM policy, the expected cost rate per unit time is derived. We discuss the optimal number of PM's, which minimizes the expected cost rate per unit time and compare two PM policies numerically in the sense of the optimal number of PM's. Section 2 describes the periodic PM policies and its assumptions. In section 3, for each PM policy, the expression of the expected cost rate per unit time is obtained. Section 4 discusses the optimal number of PM's for two PM policies. In section 5, a numerical example is given. The following notations are adopted throughout this paper. Notation
m MO X
N P Cmr pm
hazard rate without PM hazard rate with PM period of PM number of PM's when the system is replaced probability which system is perfect repair, 0 < p < 1 cost of minimal repair at failure cost of PM
261 Cre C(x, N)
cost of replacement expected cost rate per unit time
2. Model and Assumptions We consider two periodic PM policies in association with imperfect repair models discussed by Doyen and Gaudoin(2004). Throughout this paper, we postulate the following assumptions. The system begins to operate at time t = 0 . 1.
The PM is done at periodic time kx (k -1,2, • • •, N; x>0), and is replaced by new one at the TV th PM. 2. In AR1\ PM policy, the hazard rate h(kx+) right after the k th PM is reduced to h(kx-)-p[h(kx-)-h((k-\)x-)] where h(kx-) is the hazard rate just prior to the k th PM, h((k - l)x-) is the hazard rate just prior to the (k - \) st PM and 0 < p < 1. When p = 0 , the system after PM is as bad as old one while when p = 1, the system after PM returns to the state right after the last PM. 3. In ARIX PM policy, the hazard rate h(kx+) right after the k th PM is reduced to h(kx-) - ph(kx-) . When p = 0 , the system after PM is as bad as old one while when p = 1, the system after PM is as good as new one. 4. The system undergoes only minimal repair at failures between PM's. 5. The repair and PM times are negligible. 6. h(t) is strictly increasing and convex function. 3. Proposed PM Policies and Expected Cost Rate The PM policies we propose in this paper are ARI\ PM policy and ARI^ PM policy. For both policies, the system is preventively maintained at periodic times kx and is replaced by a new system at the N th PM, where k = 1, 2, ..., N. It is assumed that the system undergoes only minimal repair at any failure between PM's and hence, the hazard rate remains unchanged by any of minimal repairs. In ARl^ PM policy, the PM reduces the hazard rate of an amount proportional to the current hazard rate, while the PM has effect on the relative wear since the last PM in ARI\ PM policy. It is also assumed that the wear-out speed after PM is the same as the one before the PM is conducted. More explicitly, the hazard rates of the proposed periodic PM policies are as follows. 1.
ARIX PM policy :
262
hBm{t)2.
h{t) h(t)-ph(kx)
0
(1)
ARIX PM policy :
\Kt) h
"»W
for k = l,2,-,N
0
\h{t)-p'Yu^-p)ih{{k-j)x) , OZpZl,
kx
(2)
hpm(0) = h(0).
To derive the formula to compute the expected cost rate, we use the wellknown fact that the number of minimal repairs between the k th PM and the (k +1) st PM follows a nonhomogeneous Poisson process(NHPP) with intensity function fja^x hpm{t)dt. Since the life cycle of the system is equal to Nx and the total cost of maintaining the system is obtained as the sum of costs for PM, minimal repair and replacement, the expected cost rate per unit time during the life cycle can be obtained as follows. 1.
ARI] PM policy : Ct(x,N):
J_ C
mr
H(Nx) - />*£ h(kx) \ + (N- \)Cpm + C,
(3)
~Nx~
2.
ARIX PM policy :
C2(x,N) =
1 Cmr H(Nx) - / « £ £ ( 1 - p)Jh((k - j)x) \ + {N- \)Cpm + C, (4) Nx *=o y.o
where HU) = f h(x) dx . Jo
4. Optimal Schedules for the Periodic PM Policies To design the optimal periodic PM policies, we need to find an optimal number yV of PM needed for replacing the system by a new one. The decision criterion to adopt is to minimize the expected cost rate during the life cycle of the system. 4.1.
ARIX PM Policy
Suppose that the period x is known. To find the optimal /V* , which minimizes C, (x, N), we form the following inequalities.
263
C, (x, N + l)>q (x, N) and C, (x, AT) < C, (x, JV -1)
(5)
For 0 < p < l , it can be easily shown that C,(x,N + l)> Cx{x,N) q(x,N)< q(x,N-\) imply mx,N)>Cre
Cpm
C
C and L,(;c,JV-l)<" 1V
"
"'~'
'
mr
C/
C
""
and
(6)
n„
respectively, where L{(x,N) = N{H((N + l)x)-H(Nx)}-H(Nx)-Npxh(Nx)
+
px£h(kx).
Lemma 1: If h(t) is convex and strictly increasing in t > 0, then L^ (x, N) is increasing function in N. Proof: Omitted. • Theorem 2: If h(t) is convex and strictly increasing in t > 0 , then there exists a finite and unique N' which satisfies (6) for any x > 0. Proof: It is shown from Lemma 1, that L^(x,N) is increasing in N. And we have L,O,AO = 2 , I J N
k=0
Kt)dt-pxh(Nx)\-\)
x2h'(Nx) 2
W)dt-pxh(kx)
I
x{h((k + l)x)-h(kx)}' 2 (7)
=
-[Nxh\Nx)-h{Nx)]
which becomes oo as A' —> oo . Thus there exists a finite and unique N* which satisfies (6) for any x > 0.1 4.2. ARIn PM Policy By utilizing the similar method in section 4.1, we have L2(x,N)>Cre
Cpm
and L2(x,N-\)
Cpm
(8)
264
where
L2(x,N) = N{H((N + \)x)-H(Nx)}-H(Nx)-Npx N-\k-\
+ pxY
f
(\-p)Jh((N-j)x)
Y(\-p)Jh((k-j)x).
k=0j=0
Lemma 3: Let £(A:)=J
h{t)dt - x^(1 - p)Jh{(k - j)x).
Suppose that
£(&) is increasing in k . If h{t) is strictly increasing in t > 0, then L2(x, N) is increasing function in N. Proof: Omitted. • Theorem 4: Let %(k) be the same as one in Lemma 3. Suppose that ^^ [^(A^) — ^(A:)] > —^
— as A r ->oo. Then there exists a finite and unique
N' which satisfies the equation (8). Proof: Omitted. • 5. Numerical Example This section presents numerical example to illustrate the optimal PM policies discussed in Section 4 and to compare between ARI\ and ARI^ PM policies. As the underlying life distribution of the system, we take a Weibull distribution with a scale parameter A and a shape parameter /?, of which the hazard rate is given as h(t) = j5tP~xtp~x for /3>0 and r>0 . For the hazard rate to be strictly increasing and convex which is required for the existence of unique solution, we consider the special case when p = 2.2 and X = 1 for t > 0. Table 1 presents the values of N for various combinations of Cre and p when we fix Cmr=l.O , Cpm=\.5 and x = 0.8 . As for Cre , we take Cre =2.0 to 3.5(0.5) so that the ratio (Cre -Cpm)/Cmr varies 0.5 to 2(0.5) and take p = 0.1 to 1.0(0.1). It is apparent from Table 1 that the value of N increases as the cost for replacement gets higher. And it is interesting to note that as the value of p, which represents the effect of PM, gets higher, N* increases and the expected cost rate decreases. In other words, the better the PM effect is, the greater the optimal number of PM's for replacement is. It is also shown from Table 1 that, for a fixed Cre, the optimal number of PM's under ARI\ PM policy is smaller than the optimal number of PM's under AR1X PM policy. It is quiet natural since the PM in ARI] PM policy has an effect on the
265
relative wear since the last PM while the PM has an effect on the global wear in ARIX PM policy. Table 1 Optimal number of PM's N'
and its expected cost rate C(x,N')
with
Cmr = lCpm = \.5 and x = 0.8 and /? = 2.2 c„ 2.0 P 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
A
C(x, N')
2.5 N'
;> 5
3.2651 3.2651 3.2651 3.2651 3.2651 3.2651 3.2651 3.2651 3.1878 3.0926
1 1 1 1 2 2 2 2 3 5
I 3
3.2651 3.2651 3.2651 3.2651 3.2651 3.2651 3.2651 3.2651 3.1878 3.0926
1 1 1 1 2 2 2 3 4 5
C(x, N')
3.5
3.0 N'
ARI, PM policy 3.8901 2 3.8901 2 3.8901 2 3.8901 2 3.8369 2 3.7527 2 3.6686 2 3.5844 3 3.4859 3 3.2582 7 ARI„ PM policy 3.8901 2 3.8901 2 3.8901 2 3.8901 2 3.8369 2 3.7527 2 3.6686 3 3.5811 4 3.4328 5 3.2582 7
CO, AO
N"
CO, Af')
4.4860 4.4019 4.3177 4.2336 4.1494 4.0652 3.9811 3.8972 3.6942 3.3625
2 2 2 2 2 2 3 3 4 9
4.7985 4.7144 4.6302 4.5461 4.4619 4.3777 4.2726 4.0876 3.8740 3.4400
4.4860 4.4019 4.3177 4.2336 4.1494 4.0652 3.9464 3.7783 3.5742 3.3625
2 2 2 2 2 3 4 5 7 9
4.7985 4.7144 4.6302 4.5461 4.4619 4.3229 4.1482 3.9206 3.6762 3.4400
References
4.
5.
R. E. Barlow and L. C. Hunter, Preventive Maintenance Policies, Operations Research 9, 90-100 (1960). M. Brown and F. Proschan, Imperfect Repair, J. of Applied Probability 20, 851-859(1983). P. K. W. Chan and T. Downs, Two Criteria for Preventive Maintenance, IEEE Trans. Reliability 35, 272-273 (1978). T. Nakagawa, Optimal Policy When Preventive Maintenance Is Imperfect, IEEE Trans. Reliability 28, 331-332 (1979). T. Nakagawa, A Summary of Imperfect Preventive Maintenance Policies with Minimal Repair, R.A.I.R.O. Operations Research 14, 249-255 (1980).
266
6. 7. 8.
9.
D. N. P. Murthy and D. G. Nguyen, Optimum Age-Policy with Imperfect Preventive Maintenance, IEEE Trans. Reliability 30, 80-81 (1981). H. Pham and H. Wang, Imperfect Maintenance, European Journal of Operational Research 94,425-438 (1996). L. Doyen and 0 . Gaudoin, Classes of Imperfect Repair Models Based on Reduction of Failure Intensity or Virtual Age, Reliability Engineering and System Safety 84, 45-56 (2004). H. Wang, A Survey of Maintenance Policies of Deteriorating System, European J. of Operational Research 139, 469-489 (2002).
A CONTINUOUS-WEAR LIMIT REPLACEMENT-POLICY WITH RANDOM THRESHOLD UNDER PERIODIC INSPECTIONS BAE JIN LEE Memory Division, SEC, Hwasung,
Korea
CHANG WOOK KANG Department of Mechanical and Information Management Hanyang University, Ansan, Korea
Engineering,
SEONG-JOON KIM, SUK JOO BAE Department of Industrial Engineering, Hanyang University, Seoul, Korea This study suggests a preventive maintenance model for the system which wears continuously in time with a random breakdown threshold under periodic inspections. When each item has significant individual variation to withstand shocks, or component failures are not fully dependent on a physical wear variable which can be measured, it is reasonable that the breakdown threshold is not constant and has a certain distribution. In this paper, the wear accumulated continuously in time is represented by the infinitesimal renewal process. The item is preventively replaced if the wear at periodic inspections exceeds a certain wear limit; on failure, it is replaced immediately. The optimal wear limit for preventive replacement which minimizes the long-run total expected cost per unit time is derived by using renewal theory.
1. Introduction Consider an item(system) which is subject to random failure by degradation. Since its failure during operation might be costly and/or dangerous, there is an incentive to attempt to replace the item preventively before failure. However, if the degradation is dependent on cumulative wear which can be measured, then it is more direct and appropriate to base the preventive replacement on its measurable status rather than on its age. When each item has significant individual variation to withstand shocks, or component failures are not fully dependent on a physical wear variable which can be measured, it is reasonable that the breakdown threshold is not constant 267
268 and has a certain distribution[2]. The item can fail at any wear level - of course, at different probability. In this paper, we consider the situations which can afford only periodic inspections. Car brake linings are typical examples which wear continuously but difficult to monitor continuously. Most of previous works in modeling wear are shock models in which the cumulative wear is a jump process[4, 5]. Few studied nonjumpy, continuous wear processes[3, 6]. Abdel-Hameed[l] related the life distribution properties of a device subject to wear occurring randomly in time as a continuous gamma process to the corresponding properties of the probability of surviving a specified amount of wear. We derive the optimal wear limit for preventive replacement which minimizes the long-run total expected cost per unit time by using renewal theory. 2.
Model
Consider an item that fails when it wears beyond a certain breakdown threshold. The breakdown threshold is a random variable because of heterogeneous quality of the item or imperfect information. The wear accumulates continuously in time starting zero; its increments are nonnegative, stationary, and s-independent. At periodic inspections, it is preventively replaced if the measured wear exceeds a wear limit; otherwise, it continues to be used. Inspection or replacement is instantaneous. On failure, the item is replaced immediately and new inspection cycles start. Notation Q CR CB
cost of an inspection replacement cost of item loss from breakdown during operation
X{t) G,(x) G,(x)
random wear accumulated in time interval [0, t], X(0) = 0 cdf of X(t), G0(x) = 1 for all x > 0, Gt(0) = 0 for all t > 0 =l-G,(x)
gt(x)
pdfof^(0
M ( x)
renewal function, M(x) = ^ G; ( x ) = ] T Gf ° ( X )
G,(,) ( x )
i-fold Stieltjes convolution of Gx ( x )
m(x)
renewal density, m{x) = - r - M(x) = 2 ^ &•(•*)
00
ax
00
M
269
Tx
random time taken for the wear to reach level x
Fx(t) B H(b)
cdfofT; breakdown wear threshold, r.v. cdfoftf
H(b)
=\-H{b)
r
wear limit for preventive replacement
Gt(r,b)
=
?r{X(i-l)
Figure 1 shows typical examples of wear process.
0
1
2
3
4
(a) The situation being B > r
0
1
2
3
(b) The situation being B < r
Figure 1. Typical example of the wear process X(t)
3. Optimal Replacement Wear Limit Consider one cycle from the beginning of the item operation to failure or preventive replacement. Without loss of generality, let the passage of time be measured in the time unit of an inspection interval; then the wear is inspected at t=l,2,3,-. 3.1. Total Mean Cost per Replacement Cycle C(r) The item is preventively replaced only when X(t) > r at discrete inspection points of time and the item has operated at that time. The expected cost when the item is preventively replaced is:
270 00
C, =^(i-C,+CR)
Pr{item is replaced at /th inspection}
where the probability that item is replaced at rth inspection is: Prfitem is replaced at z'th inspection} = JT Pr{X(i -\)
X(i) < b}dH{b)
= [{Gi{r,b)-G«\r)}dH{b)
(1)
r
Gx(b- x)dG^) (x)dH(b) - G,<0 (r) -H(r)
The expected cost when item breaks down between inspection points is: C2=^
{(i - \)C, +CB + CR) Pr{item breaks down between inspections (/' -1) and /}
where the probability that the item break down between inspections (/-/) and i is: Pr{item breaks down between inspections (/ -1) and i} =
^Pr{X(i-l)
< b,b < X(i)}dH(b)
(2)
x)}dGi'-l) (x)dH(b)
{\-G[{b-x)}dG\i-'\x)dH{b) Therefore, total mean cost per replacement cycle C ( r ) becomes: C(r) = Cl+C2 = CR + CB - (CB - C,) { [ G, (b)dH(b) + [ [ G, (b - x)dM(x)dH(b)\ 3.2. Mean Length of a Replacement Cycle T(r) From (1), the expected cycle length of preventive replacement before failure is: oo
Tt = Vj: • Pr{item is replaced at rth inspection}
= z «• • {r i G>(b'x)dG"~>} ww) -GI° w • ^w}
(3)
271
The expected cycle length when item breaks down between inspections is: T2 = ^£{itemns breakdown time between inspections (i-1) and i}
{(/ + F 5 r n ' -*> W »- w* ?"" (x)dH(b) = Z + {o i } + F (,)dG x)
r n - W >-
= 2 > r i^WGr\x)dH(b) i=\
+
'" w ^ w
- £ [° [ Ji Fb_x{t)dtdGr\x)dH{b) i=\
Z > [ f ^ a W ' " " (*)<»(*)-Z [ f J [ F ^ ( O ^ G , ( W ) W C / / / ( A ) j=i
I=I
Since F^(7) = \-Gt(x)
because X(t) > x if and only if Tx
T
: = Z' • f C{1" G > ( 6 - *> W"" W«^(*) /=l
~ Z f 1 1 ' G, (b - x)dtdGr} (x)dH(b) +
(4)
Z I fj[ G< (b - x)dtdG\>-» (x)dH(b) -H(r)-{1 + M(r)}
Therefore, mean cycle length T(r) becomes: T(r) = JT ^G,{b)dtdH{b) + J" JT
£G,(b-x)dtdM(x)dH(b) (5)
fl
G,(b-x)dtdM(x)dH(b)
3.3. Optimal Wear Limit From the renewal reward theorem[7], the long-run expected cost per unit time is:
CW = f i T(r) The optimal wear limit r* minimizes C(r). 3.4. Characteristics of C(r) l)Whenr = 0in(3)&(5)
(6)
272
C(0)
C(0) = ^!L = T(0)
Cs+Ca-(CB-Of GiiftdHQ) j * [ [G,(b)dtdH{b)
(7)
Physically, C ( 0 ) represents the long-run expected cost per unit time when the item is replaced at the first inspection, irrespective of the wear. 2) When r = <x> in (3) & (5) C(oo) C(oo) = ^ i - U
CR + CB+cS
;
r(00)
f l G ' ib)dtdH^
where I M(b)dH(b)
JL
M(b)dH(b)
+ r f 1G'(b - x)d'dM(t)dH(b)
(8)
denotes the expected total number of inspections before
breakdown. T(oo) represents the time until the item breaks down without any replacement at inspections. 3) Differentiate C(r) and T{r) with respect to r, C\r) = -{CB-Cl)m{r)^Gx{b-r)dH{b) T'(r) = m(r) f°° [' G, (b - r)dtdH(b) by Leibnitz's rule. F r o m C ' O ) and C(r\ ~V'
+ CBm{r)H{r) >0
T'(r), m{r)
Fir)}2
[-iCs-qtfGtf-
~fiG,(b-r)dtdH(b)C(r) m(r) •V(r) {T(r)}2 V(r) = {-(CB - C,) f G, (b - r)dH{b) + CBH(r)}T(r) Where '
G,(b-r)dtdH(b)C(r)
>0
273
Because of -— z ~ > 0 for all r > 0, optimal wear limit r* is the solution of {T(r)} Y ( r ) = 0.
1. Y(0) = -C* f" JG,(b)dtdH(b) which implies that
when Cfi > 0 when Q = 0
1^(0) = 0 since 2.
£ $G,(b)dtdH(b)>0
¥(00) = 0 _ _ Therefore, C ( r ) decreases at the outset and converges C(oo) as r increases, when CR ^ 0.
4. Numerical Example Among possible distributions, the gamma distribution has been used to describe continuous cumulative wear process with all nice properties of an infinitesimal renewal process[8]: nonnegative, stationary, and s-independent increments starting from zero level. In this example, the distribution of X(t) is a gamma distribution with shape and scale parameter, t and 2, respectively. The distribution of breakdown threshold B is a gamma distribution with shape and scale parameter, 2 and 0.5, respectively. Figure 2 shows behavior of C ( r ) as a function of replacement wear limit r with different value of CB when CR = 2 and Q = 0.1. In Figure 2, optimal wear limit r* at each cost structure is minimum point of r at each curve.
C« = 30
^~~-^~^~"
\
\r^ '^L V^z_-
^
x^_
"^•^^^r^r^^
w.
Cfl = 20 —
-
™
—
CB=i0 C„ = 5
C„=\ C„ = 0
Figure 2. r vs C ( r ) with different values of CB when CR=2
and Cj
=0.1
274
5. Conclusions We have considered a preventive maintenance model for the system which wears continuously in time with a random breakdown threshold under periodic inspections. The wear accumulated continuously in time is represented by the infinitesimal renewal process. The item is preventively replaced if the wear at periodic inspections exceeds a certain wear limit; on failure, it is replaced immediately. The optimal wear limit for preventive replacement which minimizes the long-run total expected cost per unit time is derived by using renewal theory. References 1. M. Abdel-Hameed, A Gamma Wear Process, IEEE Transactions on Reliability 24, 152-153 (1975). 2. Arjas, E. C. K. Hansen and P. Thyregod, Heterogeneous Part Quality as a Source of Reliability Improvement in Repairable Systems, Technometrics 33(1), 1-12(1991). 3. J. Esary, A. Marshall, F. Proschan, Shock Models and Wear Processes, Annals of Probability, 627-649 (1973). 4. R. Felderman, Optimal Replacement with Semi-Markov Shock Models, Journal of Applied Probability 13, 108-117(1976). 5. A. Mercer, Some Simple Wear-Dependent Renewal Processes, Journal of Royal Stat. Society B23, 368-376 (1961). 6. K. S. Park, Optimal Continuous-Wear Limit Replacement under Periodic Inspections, IEEE Transactions on Reliability 37(1), 97-102 (1988). 7. S. M. Ross, Stochastic Processes, John Wiley & Sons, (1983). 8. W. L. Smith, Renewal Theory and Its Ramifications, Journal of Royal Stat. Society B20, 243-342 (1957).
A M A I N T E N A N C E MODEL FOR M A N U F A C T U R I N G LEAD TIME IN A P R O D U C T I O N SYSTEM W I T H B M A P I N P U T A N D BILEVEL SETUP CONTROL *
H. W . L E E Dept.
of Systems Management Engineering, Sungkyunkwan Su Won, KOREA, Email: [email protected]
University
N. I. P A R K BcN Architecture Team, BcN Research Division, ETRI Daejon, KOREA, E-mail: [email protected]
In this paper, the manufacturing lead time in a production system with maintenance period, non-renewal BMAP (Batch Markovian Arrival process) input and bilevel threshold control is analyzed. The factorization principle is used to derive the distribution of the manufacturing lead time and the mean value. A numerical example is provided to see the effect of the non-renewal input on the system performance.
1. Introduction In many production systems, maintenance and setup operations are very costly. One way to reduce the setup cost per unit time is to delay the production until some number of raw materials accumulate. This will increase the length of the production cycle and thereby reduce the number of setups per unit time. This is the well-known TV-policy in queueing context. The TV-policy is useful especially when the setup cost is extremely high compared to the WIP holding cost. The TV-policy was first studied by Yadin and Naor [15]. For other works on TV-policy queues, see Takagi [14], Lee et al. [4] [5] [7] and Lee and Park [6], to list a few. In this paper, we consider a single-machine production system with the following specification: (1) (Maintenance period) Jobs are processed one at a time. The system *This work was supported by the SRC/ERC program of M O S T / K O S E F grant # R l l 2000-073-00000.
275
276
undergoes a maintenance period of random length V when there are no jobs to process. (2) (Bilevel threshold control) After the maintenance, if there are a or more jobs waiting, a setup of random length H is initiated (setup period). Otherwise, the system waits for the number to reach or exceed a (buildup period). After the setup, if the number is greater than or equal to N, the system immediately begins to process the jobs. If not, it waits until the number reaches or exceeds N (stand-by period). The processing times {Si, S2, • • • } are iid. Also, the setup times {Hi,H2, • • •} are iid. (3) (Non-renewal input) Jobs arrive according to the non-renewal BMAP (Batch Markovian Arrival Process) with parameter matrices {DQ,DI,D2, •. } where D = Yl^Lo-^n is the infinitesimal generator of the underlying Markov chain (UMC). The objective of this study is to find the distribution of the manufacturing lead time (MLT) as a function of the maintenance period parameters. The MLT of our model corresponds to the system sojourn time and the processing time corresponds to the service time in a queueing system. Also, the maintenance period can be seen as the vacation which the server takes after a busy period. An excellent treatment of BMAP/G/1 queue and computational algorithms can be found in Lucantoni et al. [10], Lucantoni [11], Lee et al. [7], Ramaswami [13], Neuts [12] and Latouche and Ramaswami [3]. The queue length process of the system under study in this paper was fully analyzed in Lee at al. [8]. Lee and Park [9] analyzed the waiting time of the BMAP/G/1 queue under simple AT-policy and vacations. 2. Preliminaries The probability vector 7r of the UMC can be obtained from irD = 0, 7te = 1 where D = Y^=o Dn, and e is the column vector of l's with appropriate dimension. Then, the group arrival rate Xg and customer arrival rate A can be obtained from Xg = —irD0e and A = it Y^n=\ nDne. An arbitrary group is of size k with probability 7^ = ^J?** „ = w , ke. Our analysis needs some information on the simple BMAP/G/1 queue under Af-policy. This system will be called the system-AT. For the system-Af, let us define D*k as the probability that the idle period process ever visits level k, Qn as the matrix probability of n customers at the busy period starting point and T*N(9) as the matrix Laplace-Stieltjes transform (LST) of the idle period with TN = - ^ T ^ ( ^ ) | f l = Q e . Then we have, from Lee et al. [8], D*0 = / , £ ) * =
277
Q» = (-D0)-1Dn
£ ( f o DU-Dor'D^,
ZfS1H-Do)-1DjQ^,
+
T*N(6) = (61 - D o ) " 1 { E t Y *>* [n_fc(«) - I]+ D - Do} and rN
2.1
T/ie factorization
property
and derivation
of
=
pidle(z)
Chang et al. [1] proved that for the BMAP/G/1 queue with vacations, the joint transform pproc(z,ff) of the queue length (i.e., number of jobs) and the remaining service time (i.e., processing time) at an arbitrary time in a busy period is related to the queue length GF Pidieiz) a ^ a n arbitrary time during the idle period by [A(z) - S*(9)I] [zl - A(z)} [91 + D(z)}-1 (2.1) Let us define P^ as the time-average probability that the system is in period-(i) under the condition that the system is idle, {i = maint,bu,su,sb},
Pjrocfr *) = (l-P)pidie(z)zD(z)
E r = o ^ P ; - i , H% = V. + T^^VjQtlj, Ha(z) = T,ZaHakzk V{z) + YZ:lo*bn{-D0)-lz"D{z), Ht = E U W - . . * ? EtaHtDt_l,K = V(G)H(G) + j:Vo *bnu(-D0)-1GnD(G)H(G) Zn=a *s*{-D0)^GnD{G).
= = +
The idle period is the sum of the periods and its mean length becomes N-l
n-l
E(I) = K E(V)I + Y, ^(-Do)'1 k—Q
h—OL
which leads to Pmaint = f g f , Pbu = " E * = " %l{Do)^', _ «Et~,' «i t (-flo)- 1 e P ^sb ~
n'i-Do)-1
+ E(H)I + J2
PSu = §§},
and
E(I)
T h e o r e m 2.1. (Lee et al. [8]) Let Pu)(z) be the vector PGF of the queue length at an arbitrary time during period-(i) under the condition that the system is idle. Then, Pidie(z) can be obtained from Pidle(Z)
where prnamt{z)
= Pmaint(Z)
+ Pbu(ZY+
= P,naint • KV+(Z),
pbu(z)
Psu(z)
+ Psb(Z),
= Pbu • RJ^^£}7D°\-L
(2-2)
,
278
PTO(*) = Psu-nHa{z)H+{z),
psb(z)
=^•
K K
|U$(.°j".'v ^
+
W =
3. Distribution of the manufacturing lead time The vector LST wA(9) of the waiting time of an "actual job" becomes where w*,A0) is the LST of the waiting time of an actual job under the condition that the job arrives during period-(i). Then, the vector LST w*MLT(0) of the manufacturing lead time of an arbitrary job can be obW tained from *MLT(0) = W*A(9)S*(0)(3-lb) (Derivation of Wproc
(Derivation of w^^ A{6)) Following Lee and Park [9] and skipping the detailed derivation, we get » L 0 y f D « i t T p)Pmaint • K {Ex + Eo +jE3 + E4 + E5 + E6} (3.3) £i = E E E »v{a,j,b,e)[s*{e)]a1i2is''^i~1 a-0 j=\ N-a-j-b
b=0
•> i = l
c—a—a—j—b N — a—j — b — c
J2
oo
Hl(0)T*N_a_j_b_c_k{0)+
k=N—a—j—
fc=0 1 a — a a —a—j
E2 = E E a=0 j=l oo
2
H*k(0) b-c+1
^(a,J,6,0)[5*((9)]o^[S*((?)]<-1
E 6-0
J2
^ i=l
T*a_a_j_b(0,c)H'(0),
£3 = E E E ' ^(a,J)M)[s,(o)r)t[s'wi1"1 a = 0 j = l b=a~a-j
+l
-* i - 1
N — a—j — b
£
oo
Ht(8)T'N_a_J_b_k{e)+
fe=0 a —1 a —a
+E E
H e
£
^)
k=N-a-j-b-\-\ oo
-,
3
E flvfojAflMsWjEP'o?)]'-1^).
o=0 j = l 6 = J V - a - j + l
^ »=1
£
Q-1
N-n-a
N-a—j
-.
E ^M^MMsWyE^w- 1
4=E E a=Oj=o~o+l
6=0
»=1
N — a—j — b
^
3
oo
H:(fl)n_a_i_t_fc(fl)+
F
E
jfc=0
*W
fc=JV-o-j'-6+l
iV-a
Q-1
£
5= E
E
a —1
«v(«.*M)[sw-$>*(0)]i_1tf*(0)
E
oo
oo
-.
j
+ E E E^ajA^w^E^wr 1 ^)a = 0 j = W - a + l 6=0
£6=EE f f ^ ^ ^ ^ w ^ E ^ w - 1 a = a j=l
n=i
6=0
g
H;(fl)n_a_J._6_fc((?)+
H
E
W)
fc=0 fc=JV-a-j-6+l N-lN-a
oo
+E E
1
E
a=« j =l
j
^^A^WI^E^W^Wn=i
b=N-a-j+l
See Lee and Park [9] for nv(a,j,b,y), T*a_k(9,c) and H%~k. (Derivation of wlu A{9)) In this case, we have n w bu,A (9) = (1 - p)Pbu J2 ^[S*(6)\
• {Fx + F2},
n=0 a-n
j-.
j
^EiEw*"
N — n—j
1
E n-n-,-(«,«)
N-n-j-a
E
oo
fl:(»)n-„-j-.-*(8)+
E
fc=0 fc=AT-n-j-a+l a —n j ~ .
EfE^'W" .7 = 1
»=13
1
oc
E
Q=7V-n-j'+l
n-„-,(^a)^w,
H
*^)
(3
280 N-n
F
3
D
*= E
-N-n-j
-T^EPW-
j=ct—n-\-l
1
E
i=l oo
Hwm-n-j-kW
fc=0 °° J
fc=AT-n-j + l
D<
j=N-n+l
° i=l
(Derivation of iu*„,^(0)) The waiting time of the job which arrives during the setup time depends on the number of jobs at the start of setup time and is given by
= (! - P)psu •K{J1 + J 2 } ,
(3.5)
where JV-l
J
J1= £HZ[S'(0)]n n=a
r N-n—1 N—n-a L
N-n-a—j
E
E
E
a=0
j=l
6=0
nUa,3,b,8)[S'{0)]°
^ 1= 1
oo
oo
oo
1
J
+ E E E ^(a. J, *, »)[s* (*)]°7 D ^ p - 1 i=l
a=0 j = l 6=0 oo n=N
oo
oo
a = 0 j = 16=0
^ »=1
(Derivation of w*sb A(6)) In the similar manner, we have N-l
w*sbA(o) = (i - p)Psb^£
Kb[s*(0)}ifch
n=ct
D-D(S*(6))
3
{
J=I
A[1-S*(0)]
t=i
(3.6)
Then, using (3.2)-(3.6) in (3.1a) yields the vector LST of the actual waiting time. The scalar LST of the actual waiting time can be obtained from *»A(9)
= wA(6)e.
(3.7)
Then, finally, the scalar LST of the manufacturing lead time becomes
281
W*MLT(O) = ™*A(0)S*(O).
(3.8)
4. Mean length of the manufacturing lead time Once the vector LST is found, we can use the standard procedure of the Lucantoni et al. [10] and Lucantoni [11] to derive the mean actual waiting time WA and the mean length WMLT of the manufacturing lead time, and we get WA
= M1 + M2 + M3 + M4,
WMLT
= WA + E(S)
(4.1)
where Ml
= ( 1 - p)Pmaint
'«
, a — 1 a — a a—a—j
N—a—j — b
* E E E 'Ma.j.fc) *• o = 0 j = l
£
6=0
N-a-j-b-c
X
HT-'-"
c=a—a—j-6 Q-1Q—aa-a-j
£ fffcTjV_a_,,-_b_c_fc +E E
fc=0 a —1 a—a N—a—j + E E E a = 0 j = l b=a-a— j+1 Q-1
N-a—1
+E
E
n
v{a,j,b)
N-a—j
N-a-j-b
E nv{a,j,b)
o=0j=o-o+l
E •f2^(a'J'b)T«-a-j-6
a=0 j = l 6=0 N—a—j — b ^ HfcTjV-a-j-b-fc k=0
] P iffeTjv_0_j_6_fe
6=0
fc=0
N-2N-a-lN-a-j
N-a-j-b
+ E E E flv(°»j.6) E H k^N-a-j-b-k >, 6=0 A:=0 -" JV-1 N-n-1 N-n-aN-n-a-j a M2 = {\-p)PsuKYJH n E E E ^H{a,j,b)rN-n-a-j-b a=a
j=l
n=a
a = 0 j = l 6=0
N-l
N-n .
n
+ (i- P )p TO 53#»'^;^i Tjv _ n _. ) n=a a —1 M s = (1 - p ) P | » ^ , n =0 n=0
E j = =a a- -nn++ l]
j = l ( ct—n . y-j
n
| ^ — *• j = l
"T1 E
H
N~n—j
kTN-n-j-k
fc=fl fc=0
N—n—j—a
^ " « a=a-n-j
"
_£_, fc=0
^k-TN-n-i-a-k
+ E —r-T<*-n-j f. -, = l1 j =
J
282
/
1
a-l
N-l
\
Mi = - - PmaintK + Au Yl *£* + P ^ K + P»» 5Z *™ \
n=0
ra=<*
a-l
N-l
+ Pbu Y,n$bnuE(S)e + P
E^l
+ rmaint 2
(D + e i r ) _ 1 0 ( 1 ) e J
•2
E { y )
{2)
T7E {S)D e + 2p(l - p)
n#*6£(S)e
+ PbuE{H) + Psb £
+ P
F(m + P Z(H>)
+ ^maint •&(*)
TZE(S)D{1)(D
+ ^ u ^ ^ 1
+ eir)" D A(l - p)
In the above equations, /2y(a,j,6) =
\E(S^_ + ^
(1)
_
p )
+
1 —
e
fiy(a,j,b,0)\g
and D^ =
5. Numerical experience: Poisson input vs B M A P input As a computational experience, we compare the average MLTs under batch Poisson arrivals and BMAP arrivals. This section is based on Lee at al. roi w n /-10.0 1.0 \ /8.0 0 \ „ /l.O 0 [8]. We use D0 = (^ QA _Q g j , ^ = ( Q „ 2 J . ^ = ( 0 0 . 2 T h e n , we get n = {-KUTT2) = ( f , f ) , Xg = TT £ ; ; = 1 £>„e = M> A = • 7 r ^ n = 1 n £ ) „ e = ^ . An arbitrary group is of size i with probability Mx/G/1 7 l = 2^s = 17 a n d 7 2 = ^ e = ^ F o r b o t h BMAP/G/1 and queues, we use the followings: (a, N) = (3,5), exponential maintenance and setup time with mean E(V) = E(S) = 1.0, and Eriang processing time with P^f ^ X(n-i)\— a n d the mean processing time E(S) = n//x. We consider seven cases of processing times: \i — 20,15,12,10,9,8,7 with the order fixed at 2 for all cases. This arrangement leads to seven cases of traffic intensities, ranging from p — \E(S) — 0.3286 to p = 0.9388. Figure 1 shows the average MLTs for two different input modes. BMAP case was computed from (4.1). The compound Poisson case can be computed as a special case of the BMAP case. It is observed that as p is getting larger, the relative difference between two average MLTs is getting larger. This simple numerical example tells us that in many real-world manufacturing systems, the naive Poisson assumption may lead to a severe underestimation of the average MLT.
283 Average MLT
Figure 1. Comparison of the average MLTs as p varies.
References 1. S.H. Chang, T. Takine, K.C. Chae and H.W. Lee, A unified queue length formula for B M A P / G / 1 queue with generalized vacations, Stochastic Models 18(3), 369-386 (2002). 2. S. Kasahara, T. Takine, Y. Takahashi and T. Hasegawa, M A P / G / 1 queues under N-policy with and without vacations, J. OR. Soc. Japan 39(2), 188212 (1996). 3. G. Latouche and V. Ramaswami, Introduction to Matrix Analytic Methods in Stochastic Modeling, ASA-SIAM series on Statistics and Applied Probability (1999). 4. H.W. Lee, S.S Lee, K.C. Chae, Operating characteristics of M /G/l queue with N-policy, Queueing Systems 15, 387-399 (1994). 5. H.W. Lee, S.S Lee, J.O. Park and K.C. Chae, Analysis of Mx/G/1 queue with iV-policy and multiple vacations, J. Appl. Probab. 31, June, 467-496 (1994). 6. H.W. Lee and J.O. Park, Optimal strategy in iV-policy system with early setup, J. Opera Res Soc 48, 306-313 (1997). 7. H.W. Lee, B.Y. Ahn and N.I. Park, Decompositions of the queue length distributions in the M A P / G / 1 queue under multiple and single vacations with iV-policy, Stochastic Models 17(2), 157-190 (2001). 8. H.W. Lee, N.I. Park and J. Jeon, Application of the factorization property to the analysis of production systems with a non-renewal input, bilevel threshold control and maintenance, Proceedings of the Fourth International Conference on Matrix-Analytic Methods in Stochastic Models, Matrix-Analytic Methods: Theory and Applications (eds. Guy Latouche and Peter Taylor, ISBN 981238-051-5), 219-236 (2002). 9. H.W. Lee and N.I. Park, Using factorization for waiting times in B M A P / G / 1 queues with N-policy and vacations, Stochastic Analysis and Applications
284 22(3), 755-773 (2004). 10. D.M. Lucantoni, K. Meier-Hellstern and M.F. Neuts, A single server queue with server vacations and a class of non-renewal arrival process, Adv. Appl. Probab. 22, 676-705 (1990). 11. D.M. Lucantoni, New results on the single server queue with a batch Markovian Arrival Process, Stochastic Models 7(1), 1-46 (1991). 12. M.F. Neuts, Structured stochastic matrices of M/G/l type and their applications, New York, Marcel Dekker (1989). 13. V. Ramaswami, Stable recursion for the steady state vector for Markov chains of M / G / l type, Stochastic Models 4, 183-188 (1988). 14. H. Takagi, Queueing Analysis: A Foundation of Performance Evaluation, Vol I, Vacation and Priority Systems, Part I, North-Holland (1991). 15. M. Yadin and R Naor, Queueing system with removable service station, Opns., Res. Quarterly 14, 393-405 (1963).
OPTIMAL INSPECTION POLICY W I T H ALIVE MESSAGE
S. M I Z U T A N I Institute
of Consumer Sciences and Human Kinjo Gakuin University 1723 Omori 2-chome, Moriyama-ku, Nagoya, Aichi, 463-8521, Japan E-mail: [email protected]
Life,
T. NAKAGAWA, T. NISHIMAKI Department
of Marketing and Information Systems, Aichi Institute of Technology 1247 Yachigusa, Yagusa-cho, Toyota, Aichi, 470-0392, Japan E-mail: [email protected]
This paper considers the inspection model when a main unit sends signals periodically to a checking unit for the detection of its failure. Such a signal is called an alive message. When the checking unit can not receive the signal until a specified time, it is concluded that the main unit has failed and is replaced. Next, we consider another case that when the checking unit can not receive the signal although the main unit does not fail, the main unit is not replaced. We obtain the expected costs and derive analytically optimal policies which minimize them. Numerical examples are finally given when the failure time is exponential.
1. Introduction We consider a system where a main unit sends signals periodically to a checking unit for the detection of its failure. When the checking unit can not receive the signal until a specified time, it is concluded that the main unit has failed. Such a signal is called an alive message, and is used to check the system such as microprocessor or server system l. However, there may exist a possibility that the checking unit can not receive a signal until the specified time although the main unit does not fail. Then it is incorrectly concluded the main unit has failed. The main reason is for the busy state of processing tasks, and the other is for the congestion of network between the main unit and the checking one. 285
286
Main unit Failure!
T
Time —
^y T
Checking system Figure 1.
v_y»x T
detection of failure
Unit and checking unit
In this paper, we consider the inspection model with alive message. If the failure of a main unit is detected correctly or incorrectly, then the main unit it is replaced and becomes like new. In a real world, if the failure is not catastrophic one then the main unit can recover without replacement 6 . Therefore, we consider another model where the main unit does not become like new in the case where the checking unit can not receive a signal when the main unit does not fail. We consider the optimal policy of the above signal checking model 4 : The signal from the main unit has a delay time until the checking unit receives it. That is, when the failure has occurred during the interval from sending the signal to being received by the checking unit, it is detected at the next signal (Fig. 1). We obtain the expected costs, and derive analytically the optimal policies which minimize them. That is, we derive the optimal interval time T*. Numerical examples are finally given when the failure time is exponential. 2. Modeling 2.1. Model
A
Consider a system where the main unit sends signals periodically to the checking unit for the detection of its failure. When its failure is detected by the checking unit, the main unit is replaced and becomes like new and starts to operate again 2 ' 3 . For this model, we make the following assumptions (Fig. 2): (1) The failure time of the main unit has a general distribution F(t) with finite mean 1/A, where survival function F(t) = 1 — F(t). (2) The main unit sends a signal to the checking unit at periodic times
287
ci
^
q
7
,^V
l
^
c
F a i 1
i
^
correct
1
J^'
1
IO^ ^ ^ correct
k^ ^ ^ ^ ^ ^r ^- r ^
k^ ^ V ^ 1 ^T-^ ^-T^ Coi
'
incorrect
Figure 2.
Process of the model A
j(T + T) — T (j = 1,2,...), and the checking unit receives it at time j(T + T) (r > 0). A delay time r is an upper interval time from sending a signal to being received by the checking unit, and is considered as a fixed value. (3) A cost coi is the cost for maintenance or replacement when the main unit fails, and c02 is the cost for maintenance which include a detailed check, recover or replacement when the checking unit can not receive the signal although the main unit does not fail, where coi > c 0 2-
(4) A cost ci is the cost for one check by sending a signal, and c 2 is the loss cost per unit of time for the time elapsed between a failure and its detection by the checking unit. Note that when a failure has occurred during (j(T + r) — T,J(T + T)], a failure is detected at time (j +1)(T + r ) and c 2 is incurred at (j + 1)(T + T), where we assume C2/A > Ci + c 0 i. (5) The signal can not be received by the checking unit with probability p and can be received with probability q until a specified time r when the main unit does not fail, where p + q — 1. We derive the probability that the checking unit receives a signal until a specified time when the main unit has failed. That is, the probability that the failure is detected correctly is given by (j + l)(T+T)-T
F{T)+ "£,
dF(t). Jj(T+T)-T
(1)
288
Furthermore, we derive the probability that the checking unit does not receive a signal until a specified time although the main unit does not fail. That is, the probability that the failure is detected incorrectly is given by
^pq^FUiT
+ ^-r).
(2)
3=1
Evidently, (1) + (2) = 1. Then, the total expected cost until the failure is detected correctly or incorrectly is rT
/ [c0i+c1+c2(T Jo 00
T-t)}dF(t)
/•(J + 1 ) ( T + T ) - T
+I>7 j= 1
+
{cdl + (j + l)ci+C2[(j + l)(r+T)-t]}dF(t)
Jj(T+T)-T
OO
+ H O ' c i + cm)PQi~1'FU(T + T) - r ) .
(3)
Similarly, the mean time until a failure is detected correctly or incorrectly is (J + 1 ) ( T + T ) - T
(T+r)
dF(t) + F{T) i(T+T)-T OO
1 + ]T^F(J(T + T)-T)
(4)
J=l
2.2. Mode/ £ We consider the model that the main unit is not replaced when the signal can not be received by the checking unit with probability p until a specified time r although the main unit does not fail (Fig. 3) Si It is assumed that the cost of the detection incorrectly is C3, and then the main unit is not replaced. The other assumptions are the same ones as those of model A. The expected number of the incorrect detection of a failure until the main unit has failed and is replaced is OO ""
.(j + l)(T+T)-T /•U-l-iM-'-t"
dF(t). •=
1
Jj(T+T)-T
(5)
289
incorrect \ c3
ci
Failure
correct
Figure 3. Process of Model B
T h e expected cost until the failure is detected correctly is / [coi+Cl+c2(T + T-i)]dF(i) Jo + E
/
{coi + (j + l)ci+c2[(j
+ l)(T + r)-t]+c3jP}dF(t).
(6)
JJ(T+T)-T
j = 1
T h e mean time until the failure is detected correctly is the same as the case ofq=l.
3. Optimal Policy We seek an optimal interval time T* which minimizes t h e expected cost per unit of time when the failure time has an exponential distribution. We consider one cycle from t h e beginning of unit operation t o t h e detection of a failure. Then, the expected cost per unit of time is given by the ratio of expected cost of one cycle w i t h m e a n t i m e of one cycle 5 .
3 . 1 . Model
A
We seek an optimal interval time T* which minimizes the expected cost per unit of time C i ( T ) for model A when F(t) = 1 — e~xt. Then, the expected cost per unit of time C\{T;T) is, form (3) and (4), r
m 1{
_ (e 2 /A - CQI - c1)[eXT+ g(l - e~ A r )] - [ c 2 / A - cm - P(CQ2 - C2T)\ ° (T + T)[e*r + g ( i _ e - A r ) ] 2
' _
°2
_ c 2 /A - CQI - ci T + \
+
c 2 / A - c 0 1 - p ( c 0 2 - C2T) ( T + r)[e^ + g(i_e-Ar)]-
Differentiating C\ (T) with respect t o T and putting it equal t o zero, we have [eXT + g(l - e - A r ) ] 2 e [l + X(T + r)] + q(l - e-x-) XT
_ c 2 /A - c01 - p{c02 - C2T) c2/X - c01 - a
. UJ
290 Letting denote the left-hand side of above equation by Za(T'), we have Ll(0)
- l + AT + g ( l - e - ^ ) ' Li(oo) = oo, 1{
'
A2e^(T + r ) [ e 2 ^ - g 2 ( l - e - ^ ) 2 ] {e*T[l + X(T + T)] + q(l - e - ^ ) } 2
>
Therefore, we have the optimal policy as follows: (1) If Li(0) < [c2/A - c0i - p(c02 - c 2 r)]/(c 2 /A - c0i - c±), then there exists a finite and unique T* (0 < T* < oo) which satisfies (8). (2) IfLi(O) > [c2/A - c o i - p ( c 0 2 - c 2 r ) ] / ( c 2 / A - c o i - c i ) , then T* = 0 . 3.2. Model
B
We seek optimal interval time T* which minimizes the expected cost per unit of time C2(T) for model B when F(t) = 1 — e~xt. Then, the expected cost per unit of time C 2 (JT) is rcn
,
(c2/A - cot - C l ) ( l - e-MT+r)) _ ( c i + (r + r)[l+e-^(l-e-Ar)]
C 2 ( r ) = C 2
Czp)e-xr
<9>
•
Differentiating C 2 (T) with respect to T and putting it equal to zero, then we have e AT +
(1
_
e-Ar}[l
_
X{T
+
T)]
A(T + r) + l + e - ^ ( l - e - ^ )
Cl +
C3p
c2/A - c0i - ci
^
U;
Letting denote the left-hand side of above equation by ^ ( T ) , we have _ l + (l-e-^)[l-Ar] (°) Ar + 2 - e - ^ > °' L 2 (oo) = oo, L2
A(T + r)[l - e - ^ ( l - e - ^ ) ] ( l - ^ + e AT ) 2 ^2UJ[A(T + r ) + l + e - ^ ( l - e - ^ ) ] Therefore, we have the optimal policy as follows: (1) If I/2(0) < (ci + c 3 p)/(c 2 /A - coi - ci) - e~AT, then there exists a finite and unique T* (0 < T* < oo) which satisfies (10). (2) If L 2 (0) > (ci + c 3 p)/(c 2 /A - co! - ci) - e" AT , then T* = 0.
291 4. Numerical Examples We compute numerically optimal intervals AT* x 103 which minimizes the total expected cost C\(T) and C^CT), when F(t) = 1 - e~xt. The costs are normalized to c\ as unit cost. Table 1 gives optimal interval AT* x 103 which minimizes the expected cost Ci(T) for c 2 /(Aci) = 103, 2 x 103, c 0 i/ci = 10, 20, 30, 40, and p = 0.00 ~ 0.10 when C02/C1 = 10 and AT X 103 = 1. This indicates that optimal AT* increase when p decrease and CQI/CI increase. Table 1. Optimal XT* x 10 3 which minimizes C i ( T ) when C02/C1 = 10 and AT X 10 3 = 1. C2/(Aci ) = 10 3 p 0.10 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00
10 19.22 22.97 26.23 29.14 31.80 34.27 36.58 38.76 40.83 42.81 44.70
20 19.27 23.06 26.33 29.27 31.95 34.43 36.76 38.96 41.04 43.03 44.94
30 19.32 23.14 26.45 29.40 32.10 34.60 36.95 39.16 41.25 43.26 45.18
c 2 /(Aci) = 2 x 10a coi / c i 40 10 19.37 19.19 23.23 20.67 26.56 22.05 29.54 23.36 32.26 24.60 34.78 25.79 37.13 26.93 39.36 28.02 41.47 29.08 43.49 30.10 45.42 31.09
20 19.21 20.70 22.09 23.41 24.66 25.85 26.99 28.09 29.15 30.18 31.17
30 19.24 20.73 22.13 23.45 24.71 25.91 27.06 28.16 29.23 30.26 31.26
40 19.27 20.77 22.17 23.50 24.76 25.97 27.12 28.23 29.30 30.34 31.34
Table 2 gives optimal interval AT* x 103 which minimizes the expected cost C2(T) for c 2 /(Aci) = 103, 2 x 103, c 0 1 /ci = 10, 20, 30, 40, and p = 0.00 ~ 0.10 when cs/cx = 5 and AT X 103 = 1. Similarly Table 2, indicates that optimal AT* increase when Cni/ci increase. Conversely AT* decrease when p decrease and give the same results when p — 0. 5. Conclusions This paper has considered optimal policies for the models where when a signal of periodic checking can not be received by a checking unit, a main unit has failed. However, there exists a possibility that the checking unit can not receive a signal until a specified time for some accidents although the main unit does not fail. We have considered two models: (1) When the checking unit can not receive the signal correctly or incorrectly, the main unit is replaced and becomes like new and starts to operate again.
292 Table 2. Optimal AT* x 10 3 which minimizes C\(T) 3 AT x 10 = 1. C2/(Aci) = 10 3
3 ca/(Ac!) = 2 x 10
coi / c i
p 10 0.10 0.09 0.08 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0.00
when C3/C1 = 5 and
55.15 54.19 53.22 52.22 51.21 50.18 49.13 48.06 46.97 45.85 44.70
20 55.44 54.48 53.50 52.50 51.48 50.45 49.39 48.31 47.21 46.09 44.94
30 55.74 54.77 53.78 52.78 51.76 50.72 49.66 48.57 47.47 46.34 45.18
40
10
56.04 55.06 54.07 53.06 52.04 50.99 49.92 48.83 47.72 46.59 45.42
38.39 37.72 37.04 36.35 35.64 34.92 34.19 33.44 32.67 31.89 31.09
20 38.49 37.82 37.14 36.44 35.73 35.01 34.28 33.53 32.76 31.97 31.17
30
40
38.59 37.92 37.23 36.54 35.83 35.10 34.37 33.61 32.85 32.06 31.25
38.70 38.02 37.33 36.63 35.92 35.20 34.46 33.70 32.93 32.14 31.34
(2) W h e n t h e checking unit can not receive the signal although the main unit does not fail, the main unit is not replaced. We might consider t h a t t h e time from sending a signal t o being received by t h e checking unit has also a probability distribution. Further, t h e time from sending t h e signal t o being received by t h e checking unit might b e almost instant, and if and only if t h e main unit is busy state t h e n we might assume the time has a probability distribution. These formulations and results in this paper would be applied t o other real systems such as digital circuits by suitable modifications. References 1. P. K. Lala, Self-Checking and Fault Tolerant Digital Design, Morgan Kaufmann Pub., San Francisco (2001). 2. R. E. Barlow and F. Proschan, Mathematical Theory of Reliability, John Wiley, New York (1965). 3. S. Osaki, Applied Stochastic System Modeling, Springer Verlag, Berlin (1992). 4. S. Mizutani, T. Nakagawa, "Optimal Inspection Policy with Signal Checking", Proceedings of International Workshop on Recent Advances in Stochastic Operations Research, Canada. 149-155, (2005). 5. S. M. Ross, Applied Probability Models with Optimization Applications, San Francisco, Holden-Day, (1970). 6. T. Nakagawa, K. Nishi, K. Yasui K "Optimum preventive maintenance policies for a computer system with restart". IEEE Trans Reliab, R - 3 3 , 272-276 (1984).
OPTIMAL CHECKPOINT INTERVALS FOR ERROR DETECTION BY MULTIPLE MODULAR REDUNDANCIES
K.ENICHIRO NARUSE, TOSHIO NAK.AGAWA Department of Industrial Engineering, Aichi Institute of 1247 Yachigusa Yakusa-cho Toyota, Aichi 470-0392,
Technology Japan
SAYORI MAEJI Institute of Consumer Sciences and Human Life, Kinjo Gakuin 1723 Oomori2 Moriyama Nagoya, Aichi 463-8521,
University
Japan
This paper considers multiple modular redundant systems as the recovery techniques of error detection and error masking on the finite process execution, and discusses analytically optimal checkpoint intervals. Introducing the overheads of comparison and decision by majority, an error occurrence rate and a native execution time of the process, we obtain the mean times to the completion of the processes for multiple modular systems, and derive optimal checkpoint intervals which minimize them. Further, we extend such checkpoint models to the case where the occurrence of error rate is not constant and increases with the number of checkpoints. The sequential checkpoint intervals for a double modular system are computed numerically.
1. Introduction In computer systems, some errors often occur due to noises, human errors, hardware faults, and so on. To attain the accuracy of the computing, it is important to detect and/or mask such errors by fault tolerant computing techniques [1,2]. This paper considers the redundant techniques of error detection and error masking on a finite process execution. Firstly, an error detection of the process can be made by two independent modules where they compare two results at suitable checkpoint times. If their results do not agree with each other, we go back to the newest checkpoint and make a retrial of the processes. Secondly, a majority redundant system with multiple modules is adopted as the technique of an error detection and the result is decided by its majority of modules. In this case, we determine numerically what a majority system is optimal. In such situations, if we compare results frequently, then the time required for rollback could decrease, however, the total overhead for comparisons at 293
294
checkpoints would increase. Thus, this is one kind of trade-off problems how to decide an optimal checkpoint interval. Several studies of deciding a checkpoint frequency have been discussed for the hardware redundancy above. Pradham and Vaidya [3] evaluated the performance and reliability of a duplex system with a spare processor. Ziv and Bruck [4, 5] analytically considered the checkpoint schemes with task duplication and evaluated the performance of schemes. Kim and Shim [6] derived the optimal instruction-retry period which minimizes the probability of the dynamic failure on the triple modular redundant controller. This paper firstly considers a double modular redundancy as redundant techniques of error detection and summarizes the results [6, 7]. Next, we consider a redundant system of a majority decision with (2«+l) modules as an error masking system, and compute the mean time to completion of the process and decide numerically what a majority system is optimal. Further, we extend the above checkpoint model with a double modular redundancy to the case where an error rate increases with the number of checkpoints. The sequential checkpoint intervals which are not constant are computed numerically. 2. Multiple Module System Suppose that S is a native execution time of the process which does not include the overheads of retries and checkpoint generations. Then, we divide S equally into N parts and create a checkpoint at planned times kT\k=l,2,—J^-l) where S=NT. To detect errors, we firstly provide two independent modules where they compare two results at periodic checkpoint times. If two results agree with each other, two processes are correct and go forward. However, if two results do not agree, it is judged that some errors have occurred. Then, we make a rollback operation to the newest checkpoint and a retry of the processes. The process completes when two processes are succeeded in all intervals above. Let us introduce a constant overhead C, for the comparison of two results. We neglect any failures of the system caused by common mode faults to make clear an error detection of the processes. Further, it is assumed that some errors of one process occur at constant rate X, i.e., the probability that any errors do not occur during (0,/] is given by e". Thus, the probability that two processes have no error during (0,/] is ~F\(T) = e~2Ar [8]. The mean time L ,(N) to completion of the process is the sum of the processing times and the overhead C, of comparison of two processes. From
295 the assumption that two processes are rolled back to the previous checkpoint when an error has been detected at a checkpoint, the mean execution time of the process for one checkpoint interval (0,7] is given by a renewal equation:
I 1 (l) = (7' + C 1 )e-^ +(T +
Q+Ll(lW-e-^),
(1)
and solving it, we have
Ll(\) = {T + Cl)e2Ar.
(2)
Thus, the mean time to completion of the process is
LX(N) s #1,(1) = N(T + Cx)ellT ={S + NCy*51"
(3)
We seek an optimal number N\ which minimizes L\(N) for a specified S. Evidently, Li(oo)=oo and
Ll(\) = (S + Cl)e 2AS
(4)
Thus, there exists a finite number JVj (1 < JV, < °o) . However, it would be difficult analytically to find Ni* which minimizes L{(N) in (3). Putting T = SIN in (3) and'rewriting it by the function T, we have
k{T) = S 1-
c,
,2/ir
(0 < T < S).
(5)
It is evident that 4(°) = x}r%L^ = °°> and 1,(5) is given by (4). Thus, there exists an optimal f,(0 < f, < 5) which minimizes £/7)in (5). Differentiating L j(T) with respect to Tand setting it equal to zero, we have
1
2X
(6)
Solving it with T,
1
2
1 +/LC,
(7)
Therefore, we have the following optimal interval number N,* [9]: If Tx < S , we put [ S I f[ ] = N where [x] denotes the greatest integer contained inx, and calculate L{(N) and L,(JV+1) from (3). If Lt{N)< Li(N+l)
296 then TV, =N and T* = S/N* , and conversely, if L\(N+\)< L^N) then 7V,*=7V+1. 2. If T]>S, i.e., we should make no checkpoint until t i m e S t h e n N\ =1, and the mean time is given in (4). Note that fx in (7) does not depend on S. Thus, if S is very large, is changed greatly or is unclear, then we may adopt Tt as an approximate checkpoint time. Further, the mean time for one checkpoint interval per this interval is
W
+ ff"-
W-^^
Thus, the optimal time which minimizes LX(T) also agrees with Tx in (7). Next, consider a redundant system of a majority decision with (2«+l) modules as an error masking system, i.e., («+l)-out-of-(2«+l) system («=1,2, - "). If more than («+l) results of (2«+l) modules agree, the system is correct. Then, the probability that the system is correct during (0,7] is
t=n+l
V
k
Thus, the mean time to completion of the process is
where C„+i is the overhead of a majority decision of (2«+l) modules. 3. Sequential Checkpoint Interval
T0
J
Tj
T2
T3
1
1—I
H s
o Error rate
TN.j TN
X\
X2
X3
XN
Figurel. Sequential checkpoint intervals and error rates
It has been assumed until now that the error rate k is constant and S is divided into an equal part. In general, error rates would be increasing with time, and so that, their intervals should be decreasing with their number. We assume for the
297 simplicity of the model that error rates are increasing with the number of checkpoints. We consider the checkpoint model where each checkpoint is placed at Tk(k=l,2,~-,N-l) where T0=0 and TN=S, and an error rate during (Tk.h Tk] is Xk which is increasing in k. Then, the probability thatjwo processes have no error during (Tk.{ Tk] for a double modular system is F{(Tk_{,Tk) — e~ * * . Thus, by the similar method of obtaining (2), the mean execution time during (Tk.K Fk] is Lx(k) = (Tk-Tk_x+cyx^-T^.
(11)
Therefore, the mean time to completion of the process is L(Tl,T2,-,TN)^fjLl(k)=fj(Tk-Tk_l+Ciy^^-'J^. k=\
(12)
k=\
We find optimal times Ty which minimizes L(T\,T2,'",TN). Differentiating L(T\,T2,"\TN), with respect to Ty and setting it equal to zero, we have
[ 1 + 2 ^ -TkA + C , ) ] ^ ( ™ =[l+24+I&+I ~Tk +Cl)Y^(T^) {k=\,2,---,N-\).
(13)
Putting that xk=Tk-Tk.\ and rewriting (13) as the function of xk, we have \
+ 2
K+\iXk
+\
+ C l ) _ 02(Atxt -Attlxl+1 )
l + 2Ak(xk + C,) It is easily noted thatX k+] x k+ \<X k x k , and hence, xk+]<xk since Xk+l<Xk. In particular, when Xk+]=Xk=X, xk+\=xk = T which corresponds to the constant checkpoint model described in Section 2. If Ak+] > Xk then xk+] < Xk . Let Q(xk+i) be the left-hand side of (14) for a fixed Xk . Then Q(xk+l) is strictly increasing from g(0)=
1+ 2
^
+.
C
!
e2Xt,t
(15)
to Q(xk)>0 . Thus, if Q(0) < 0 then an optimal JC*+1 (fj < X^+l <jc i ) to satisfy (14) exists uniquely, and if Q(0) > 0 then xk+x = S-Tk. Noting that T0 = 0 and TN = S , we have the following result: (1) When N — \ ,7j = S and the mean time L(S) is given in (4). (2) When N = 2, from (13), we have
298
[1 + 24(7] +C,)>2i'yi -[1 + 2 4 ( 5 - 7 ; +C,)>2^(i'-7l) =0.
(16)
Letting Q\{T{) be the left-hand side of (16), it is strictly increasing from Q(0)<0to Qx (S) = [l + 2 4 (S + C, )Y*S
- (l + 2 4 C , ) .
Hence, if QX(S) > 0 then 7;*(o < T* < s) to satisfy (16) exists uniquely, and if 0 , ( 5 ) < 0 then 7;* = 5 . (3) When YV = 3 , we obtain Tk \k = 1,2) which satisfy the simultaneous equations:
[1 + 2/i,(r, + c1)>2/1'7'' = [1 + 2/i2(r2 - r , + cl)y*AT>-T'), [1 + 24(7-, - r, + c 1 )]e"'^- r ' ) = [1 + 2^(5 - r2 + c,)]* 2 ^ 5 -^. (4) When TV = 4,5,••• , we obtain Tk(k = l,2,---,N-l)
similarly.
4. Numerical Examples We show numerical examples of optimal checkpoint intervals for a double module system when AS = 10~' . Table 1 presents AT{ in (7), optimal number N' , AT* and AL]T(N'1) for AC, =0.5,1.5,2,3,4,5,10,20,3o(xlO"3) . For example, when A=102 (1/sec), and C,=10~'(sec) and S = 10.0(sec), the optimal number is Nl = 5 , the optimal interval is 7;* =5/TV,* =2.0(sec), and the resulting mean time is Z,I(5)=10.929 (sec), which is longer about 9.3 percent than S.
ACt
Table 1. Optimal checkpoint number, interval and mean time for a double modular system when 15=10"'. AT, - x 10 x 10 3 A. f, x 10 2 ALt(N') x 10 2 *.' 1.556 10.650 0.5 6 1.67 2.187 10.929 5 2.00 1.0 2.665 1.5 4 11.143 2.50 3.064 11.331 3 3.33 2.0 3.726 11.715 3.0 3 3.33 4.277 11.936 4.0 2 5.00 4.756 2 12.157 5.00 5.0 6.589 2 13.435 10.0 5.00 9.050 1 14.657 10.00 20.0 10.839 15.878 30.0 1 10.00
2
Next, we consider the problem what a majority system is optimal. When the overhead of comparison of two processes is Ct, it is assumed that the overhead
299
C„+i of an (M+l)-out-of-(2n+l) system is given by Q J2"^
( n =n..r
T h i s is
to select and compare 2 from each of (2«+l) processes. Table 2 presents optimal number N*n+] and the resulting mean time XLn-,{N*n+ly\02 for n=\,2,3,4 when /lC,=0.1xlO"3, 0.5xl0"3. When AG=0.5xl0"3, the optimal checkpoint number is /V3*=2 and AL3(2)=10.37xl0"2 which is the smallest among these systems, that is, a 2-out-of-3 system is optimal. The mean times for «=1,2 are smaller than 10.65* 10"2 for a double system. Table 2. Optimal checkpoint number and mean time for (n+l)-out-of(2n+l) system when AC, = 0.5x10 \ /IC, =0.1x10"' and AS = 1 0 " ' . A C , = 0 .1 x 10 " 3
n
A',;., 3 1 1 1
1 2 3 4
AC,
- i z , . , (N ; _ , ) X IO 2
= 0.5x10 *L
*.'•.
10.12 * 10.18 10.23 10.36
-'
».i ( ^ . . . ) x 10
2 1 1 1
n
10.37 * 10.58 11.08 11.81
Finally, we compute sequential checkpoint intervals Tk(k=\,2,---N) for a double modular system. It is assumed that Xk=[l+0.l(k-l)]X (k=l,2,--), i.e., an error rate increases by 10% of an original rate X. Table 3 presents optimal sequential intervals Tk and the resulting mean times for iV=l,2,"-,9 when A5=10_l and ACi=10"3. In this case, the mean time is the smallest when N=5, i.e., the optimal checkpoint number is N =5 and the checkpoint should be places at 2.38, 4.53, 6.50, 8.32, 10.00 (sec) for/l=10"2(l/sec), and the mean time is a little longer than 10.929 in Table 1 because error rates are increasing with the number of checkpoints. Further, all values of xk= Tk-Tk.] are decreasing in k. Table 3. Optimal sequential checkpoint interval when A5=10"' and ^Ci=10' -
N A7-,xl0 2 A7>10 2 A7-5xl02 A7-4XIO2 17-5x1 o : A7;,xl0 :
1 10.00
2 5.24 10.00
3
4
5
6
7
8
9
3.65 6.97 10.00
2.85 5.44 7.81 10.00
2.38 4.53 6.50 8.32 10.00
2.05 3.91 5.62 7.19 8.65 10.00
1.83 3.48 4.99 6.39 7.68 8.88 10.00
1.65 3.15 4.52 5.78 6.65 8.03 9.18 10.00
1.52 2.89 4.15 5.31 6.38 7.37 8.31 9.18 10.00
11.079
11.010
11.009
11.042
11.095
11.160
/17- 7 X10 :
A7- 8 xl0 : XL(TlJ1:--,Ty) xlO 2
12.336 11.327
11.232
300
References 1. K. M Chandy and C. V. Ramamoorthy, Rollback and recovery strategies for computer programs, IEEE Transactions on Computers 21(6), 546-556 (1972). 2. T. Anderson and P. Lee, Principles and Practice, Prentice Hall, New Jersey, (1981). 3. D. K. Pradham and N. H. Vaidya, Roll-forward and rollback recovery: Performance-reliability trade-off, Proceeding of the 24nd International Symposium on Fault-Tolerant Computings, 186-195 (1994). 4. A. Ziv, and J, Bruck, Performance optimization of checkpointing schemes with task duplication, IEEE Transactions on Computers 46(12), 1381-1386 (1997). 5. A. Ziv, and J. Bruck, Analysis of checkpointing schemes with task duplication, IEEE Transactions on Computers 47(2), 222-227 (1998). 6. S. Nakagawa, S. Fukumoto and N. Ishii, Optimal checkpoint interval for redundant error detection and masking systems, Proceeding of the First Euro-Japanese Workshop on Stochastic Risk Modeling for Finance, Insurance, Production and Reliability II, (1998). 7. S. Nakagawa, S. Fukumoto and N. Ishii, Optimal checkpointing intervals of three error detection schemes by a double modular redundancy, Mathematical and Computing Modeling 38(11-13), 1357-1363 (2003). 8. S. Osaki, Applied Stochastic System Modeling, Springer-Verlag, Berlin, (1992). 9. T. Nakagawa, K. Yasui and H. Sandoh, Note on optimal partition problems in reliability models, J. of Quality in Maintenance Engineering, 10, 4, 282287 (2004).
OPTIMAL MAINTENANCE POLICY OF THE SYSTEM CONSIDERING EXTERNAL ENVIRONMENT PREVENTIVE MAINTENANCE J. H. PARK, Y. B. KIM, G. H. SHIN*, J. W. HONG*, C. H. LIE Industrial Engineering,
Seoul National University, 56-1 Shilim-Dong Seoul, 151-744, KOREA
Kwanak-Gu
An optimal maintenance policy for a system with periodic inspection and external environment maintcnance(EEM) is investigated. EEM, which is newly introduced concept from this study, is the action that removes external factors causing failure, for example, removing dust inside of electronic appliances. A policy is defined to be optimum if it minimize the cost function comprised of EEM cost, inspection cost, repair cost and system breakdown penalty cost. Inter-relationship between inspection period and EEM period is reflected in the formulation of cost function. Simulation analysis is performed using actual data for failure and EEM period of PDP in subway station for 14 months. As a result, Balance of system failure is observed to be reduced by incorporating EEM.
1. Introduction Occasionally, we clean dirt off equipments or control a temperature and a humidity to slow system-aging down. For the same purposes, we prevent system from excessive vibration. Works mentioned above are similar to prevent maintenance(PM) in the fact that both have improvement effect of maintenance, which looks like reducing failure rate. However PM is distinguished from works mentioned above by doing with systems directly, not environment like temperature, vibration. So we introduce this kind of works as External Environment Maintenance(EEM), That is, EEM means all kinds of works controlling factors that potentially cause failure. EEM has improvement effect of maintenance and influence another operation period such as inspection or replacement for total cost minimization, therefore previous studies about maintenance policy can be referred to EEM. But unfortunately there is no study considering EEM itself, though a lot of studies about optimal maintenance policy have been conducted.
* Knowledge Management Center, KMA Consultants, Inc., Seoul, KOREA * Dept. of Industrial Engineering, HanSung University, Seoul, KOREA
301
302
Generally, assumption and methodology of PM is available for considering EEM, but EEM has several different properties, for example, EEM can be operated continuously or in a group. Therefore different properties should be considered for accurate analysis. This paper studies an optimal maintenance for a system with EEM. Similarly to other optimal maintenance studies, A policy is defined to be optimum if it minimize the cost function comprised of EEM cost, inspection cost, repair cost and system breakdown penalty cost. This paper is organized as follows. Section 2 provides background information about system and some assumptions. Section 3 presents and develops cost function. Section 4 gives an application example to illustrate the procedure of this study. Notation CE, C,, CR, CB
cost for EEM/ Inspection/ Repair/ Breakdown penalty
TE, Tr
cycle of EEM/ Inspection
X, (t)
failure rate function after k th EEM
fk{t),
Fk(t)
p.d.f/c.d.f of failure related to Aft(f)
P[a, b] B(a, b)
Pr { failure occur during (a, b]} breakdown time during (a, b]
N(a, b)
# of failure during (a, b]
Cost[a, b] p r•
cost during (a, b] improvement factor by EEM the time that EEM is carried out.
2. Background and Assumptions A system with periodic inspection and external environment maintenance (EEM) is investigated. Inspection is carried out periodically (T r ) and checks whether the system is out or not. EEM is also carried out every TE time. EEM costs CE each maintenance and has improvement effect of maintenance, we use Malik's[2] proportional age reduction(PAR) model to consider improvement effect of maintenance. Hence improvement effect of maintenance by EEM is as figure 1.
303
0
P'T. Te Figure 1 proportional age reduction(PAR) mode regards improvement effect as time-back
T
0
2T
F.
F.
Figure 2 failure rate function with twice EEM and no failure till time t
Improvement effect of maintenance by EEM is equal to time-back at the rate of improvement factor( p ). Therefore if EEM is carried out twice without failure, failure rate function is as figure 2 and represented as Eq(l) [1],
\{t)=\dt-p-k-TE)
(1)
If a failure would occur between time 2TE and 3TE , failure rate function is
0
T ,.
IT ,
faulure
3 Tc
Figure 3 failure rate function with twice EEM and a failure occurs between time 2TE and 3TF
304
We investigate an optimal maintenance policy for a system under assumptions below. Assumption 1.
2. 3.
System failure can be detected only by inspection and if detected, it is replaced or repaired immediately and Replacement or repair time is assumed to be zero. If a system is replaced or repaired, the system is good as new.(GAN Model) A period of EEM is k times inspection's (TE = k-Tnk = 1,2,3,•••).
3. Cost Function Cost function is comprised of EEM cost, inspection cost, repair cost and system breakdown penalty cost. 3.1. EEM Cost EEM costs CE every TE and frequency of EEM in time interval (0,t] is [tITE]. Therefore EEM cost during (0,/] is C£-[t/TE]. 3.2. Inspection cost Inspection costs C ; every inspection circle T, , and number of inspection during time interval (0,;] is [tIT,]. Therefore inspection cost in time (0,/] is C,-[t/T,].-
3.3. Repair Cost If failure occur and it is detected by inspection, system is repaired immediately with cost CR. Therefore expected repair cost during (0,/] is CR -E[N(0,t)], where E[N(0,t)] is expected number of failure during (0,/]. E[N(0,t)] can be computed by integration of failure rate function during (Q,t] . However failure rate function is dependent on time when failure occurs and EEM is carried out during (0,/]. We are going to develop repair function more detail in section 3.5.1. 3.4. Breakdown Penalty Cost If failure occur, the system won't work until inspection detects the failure. During downtime, breakdown penalty cost occur with CB per unit time.
305
Expected breakdown penalty cost during (0, /] is Cfi • E[B(0, t)] . CB • E[B(0, t)] will be developed more detail in section 3.5.2. 3.5. Cost Function Development A policy is defined to be optimum if it minimize cost function comprised of EEM cost, inspection cost, repair cost and system breakdown penalty cost. Therefore total cost during (0,/] is as Eq(2).
Total Cost[Q,t] =CE
t
[TE\
+cr
t
UJ
+ CR-E[N(Q,t)] + CB-E[B(0,t)]
(2)
To find optimal maintenance policy, we should find TE and 7} to minimize Eq(2). For convenient computing, we assume TE is k times 7} (assumption 3) and let t be (/ + 1) • 7} . Total Cost[0,(l + \)T,] = CV
-cE-
+
C,
(/+i)7;
k-T,
+ CR • £liV(0, (/ +1)7})] + CB • E[B(0, (l +1)7} ] (3)
•C, -[(/ + 1)] + CS •£IJV(0,(/ + 1)7})] + CB -EiBiOXl + VT,)]
3.5.1. Developing Repair Cost To compute repair cost of Eq(3), expected number of failure during (0, (/ +1)7) ] is needed and can get by integration of failure rate function during (O.^ + l ) ^ ] . However failure rate function is changed by time when failure occurs and EEM is carried out, so we compute probability of every potential failure rate function case. The probability that failure occur during (IT, (I +1)7}] is as Eq(4) i
f[ff/,(/ + i)r ( ]=Xo i ['^^»n^ W H i , l l ((+i-i)r / i,/ t H i / l |('- i ^) Hence, expected number of failure during (/7},(/ +1)7}] is as Eq(5).
W
306
E [ NilT,, (I +1)7))] =]T D. 177), (/ +1)7) ] J ^ ^ / / t H . /t] (/)rf/ (5)
no-i'-'+'i) where D, =
;=0
P[i-H]-Y\(l-P[s,s
+ l]) i
P[i-l,i]
=1,2,3,-,I-\
i=l
Using Eq(5), we can compute expected number of failure during (0, (/ +1)7) ] . E[N(0, (/ +1)7))] = £ £ [ JVOT,, C/ +1)2} )M/T 7 , (/ +1)7) ] 7=0
(6)
+
)r
£ ZW/.o'+iF/]- f '"' 'VhW')* x^c/-02},a+i-07;] 3.5.2.
Developing Breakdown Penalty Cost
In case failure occurs at time f as figure 4, system down time is as Eq(7) Svstem down time
r
IT,
\
time
(/ + D7-,
failure Figure 4 system down time, in case failure occurs at time t between /7) and (/ + 1)7)
5(/7),(/ + l)7))=(/ + l)7)-?
(7)
Therefore, we can get expected breakdown time during (/7), (/ +1)7) ] as Eq(8).
£[5(/7),(/ + l)r / )]=2A[^.(/ + l ) ^ ] - r V ^ ( / + l)2/-0->[;/*H//t](0^ (8) /=o
A
,)T
'
307
Using Eq(8), expected breakdown time during (0,(/ + l)7}] is computed as Eq(9).
E [S(0 , (/ +1)7))] = YJ
E
^JTI
> 0 ' + W)]^T/'
0' + W ]
j=0
= Z Z ^ t ^ / ' O ' + DT-J- ^ ' ^ ' ( ( y + l)^ -t)-fulkHm(t)dt\
(9)
*P[jT„ (7 + 1)7)] By Eq(3),(6),(9), Total cost during (0, (/ +1)7} ] is as Eq(10) TotaICastof[0,(l + l)T,] ~/ + l" /c +C
+c
+C,-(/ + l)
« ' Z Z W ^ O + l)?}]- fy+.'"')7>^y/^H,7*](0^[ ^[M ^"')7} J
* - Z iD*L/T/,(y+i)?;]v=o l/=o x TO, C/+ 1)7}]
)
"°~' '
(10)
r~')T'(u+i)Tj-tyf[jimm{t)dt J
4. Application Example Actual advertising PDPs in 30 subway stations is investigated as a numerical example of finding optimal maintenance policy of EEM. Failure, repair and EEM (removing dust or filter cleaning) history data of about 120 advertising PDPs in 30 subway stations from August, 2003 to October, 2004 are gathered. 4.1. Failure Rate Function Estimation We assume weibull failure model so failure rate function follows power-law form as Eq(l 1). We estimated weibull parameters a, ft and improvement factor P •
308
' - p•Z
VH-
T
>
(11) aJ
When we gathered data, EEM was carried out at regular odd intervals, therefore a,(5 and p should be estimated together. We estimated parameters using MLE(Maximum Likelihood Estimation) by Gauss-Newton Method and got these values below. a=318.7110, p = 1.0057,/? = 0.3024 Test of goodness of fitness for above parameters showed that parameters 4.2. Finding Optimal Maintenance Policy Optimal TE and 7) that minimize Eq(ll) is determined by simulation. Actually it's not easy to consider two variables( TE and 7} ) at the same time, so we reduce considering variable by assumption 3 in section 2. Hence we should find just Tj and coefficient & minimizing Eq( 11). To find optimal 7) and k , we compute fk (t) and Fk (?) from Xk (t) and computing Eq(ll) using ^ ( r ) a n d fk(t)and Fk(t) for feasible case of Tl and k . The result of simulation for feasible case of 7} and k is as figure 5 HBcoaccoeooaooo
eccocoj
I4COQOCD50DQOCD
aacoQcoo4COQOOO D2COQCDO3CCQOCO I l.OCQOOO2COQCKD
m o-i.ooaooo
Figure 5 grape of cost for feasible case of 7} and Tg
(k-Tj)
309 We found optimal maintenance policy as T, =3 and TE =12(fc = 4) showing figure 5, and also above policy has cost reduction effect comparing to current policy( 7} = 1 and TE = 30). 5. Conclusion and Further Study We introduce the concept of EEM and assert the need of considering EEM as a maintenance policy. EEM has improvement effect of maintenance and influence another operation period such as inspection or replacement and effect to failure rate and break-down time, therefore EEM is worth to conduct research like PM that has been conducting. In this study, we develop some equations for considering effect of EEM as cost and show the procedure finding optimal policy using those equations in application example section. But EEM that investigated in this study is restrictive case, EEM has feature which distinguish from PM such as continuous maintenance or grouping maintenance. References 1. I. Shin, T. J. Lim and C. H. Lie, Estimating parameters of intensity function and maintenance effect for repairable unit, Reliability Engineering and system Safety 54, 1-10 (1996). 2. M. A. K. Malik, Reliable preventive maintenance scheduling, AIIE Trans. 11,221-228(1979). 3. S. H. Sheu, Y. B. Lin and G. L. Liao, Optimal policies with decreasing probability of imperfect maintenance, IEEE Transactions on reliability 54(2), (2005). 4. W. Li and H. Pham, An inspection-Maintenance model for systems with multiple competing processs, IEEE transactions on reliability 54(2), (2005). 5. M. Ohnishi, H. Kawai and H. Mine, An optimal inspection and replacement policy for a deteriorating system, Eur. J. Oper. Res. 147, (1986). 6. T. Nakagawa, Imperfect preventive maintenance, IEEE transactions on reliability 2$, (1979). 7. V. Jayabalan and D. Chaudhuri, Cost optimization of maintenance scheduling for a system with assured reliability, IEEE transactions on reliability 41,(1992)
APPROXIMATELY OPTIMAL TESTING POLICIES FOR SERIES SYSTEMS
MIN WANG Chaoyang University of Technology, 168 E. Gifong Road, Wufong, Taichung, Taiwan, R.O.C. E-mail: [email protected] CHI-MIN CHANG Chairman of R&M Committee, Chinese Society for Quality P.O. Box No. 90008-18, Lung-Tan 325, Tao-Yuan, Taiwan, R.O.C. E-mail: chimin48@mail. nctu. edu. tw This study provides a framework to build up an optimal testing policy for singleunit and series systems from a decision theoretical point of view. Simultaneous testing is recommended for series system. Moreover, optimal test interval of each unit in series system construct an almost optimal testing policy for series systems.
1. I n t r o d u c t i o n Most earlier research on preventive maintenance for standby systems, whose failures can be found only when they are tested or inspected, focused on maximizing system availability, without regard to c o s t 1 - 4 . Resent research consider b o t h risk and cost based on decision theoretical point of v i e w 5 - 8 . In this study, we derive approximately optimal surveillance testing policies balancing the loss due to system unavailability and maintenance cost for single-unit and series systems. In earlier research, all units in series systems are tested according t o the equal test interval, thus, the whole system can be considered as a single-unit system with the system failure rate equivalent t o the sum of the failure r a t e s of all units, and the optimal test interval was followed by the results of single-unit systems. However, under certain situations, equal test interval for all units m a y not be reasonable. This study reveals t h a t each unit in a series system should be tested according to individual's optimal test interval to maximize the plant benefit. 310
311 2. Assumptions and Multi-objective Function 2.1.
Assumptions
Our general assumptions are listed below. 1. Surveillance tests of each unit are performed periodically. 2. A unit is either operable or failed. There are no partially degraded states. Unit failure can occur either because of random failure during standby, or failure on demand. 3. The state of each unit can be determined precisely during a test. If the unit is found to be failed, the unit is fully repaired or replaced. 4. Each unit is as good as new immediately after its restoration. 5. Each unit is unavailable during testing and repair. 6. For multi-unit systems, units are assumed to fail independently of each other during both standby and operation. 7. For analytical convenience, simultaneous maintenance or testing of multiple units is not included in our model. 2.2. Multi-objective
Function
Unlike in most other research whose optimal test interval minimizes the average system unavailability, the objective function in our model considers both cost and unavailability, and expressed as a weighted sum of average unavailability and average maintenance cost as follows, min
M = CUU + C,
(1)
where U and C, respectively, denote the average unavailability and the average maintenance cost per time unit, (e.g., per hour), and Cu denotes the expected loss per time unit of unavailability (measured in dollar terms). In other words, Cu is the average cost that managers would expect to incur if the unit in question were unavailable for one time unit. Therefore, the objective function can be viewed as the expected cost (or loss) per time unit, due to both maintenance cost and unavailability. We want to find the optimal maintenance policy that minimizes this objective function. 3. Single-Unit Systems The maintenance schedule for a single-unit system is shown in Figure 1. Each cycle is divided into two periods—namely, the testing period and the standby period. Thus, the ith cycle begins at the ith test of the unit at Oi. After testing, the unit is restored at time Ai, and then returned to
312
Oi
Ai
Oi+i
U *4* Testing Period
Period:
Figure 1.
Time
d Standby Period Maintenance Schedule for a Single-Unit System.
standby status until the next test at Oj+i. The length of time between Oi and Oi+i is assumed to be constant for all i, and denoted by T. The test duration is denoted by r, which is expected downtime for testing and repair before restoration and assumed to be the same for all cycles. We will let t denote the time elapsed from the beginning of the current cycle, and let u(t) denote the time-dependent unavailability of the system at time t, defined as follows: U{ }
f1 \ (1 - p)F(t ~r)+p
t£ [0, T) (in testing period) t e [T, T) (in standby period).
{ }
where F(x) is the system failure probability by age x, and p is the system failure probability on demand. As stated earlier, the condition of standby systems can only be revealed by either testing or on demand. The average system unavailability for standby systems over a cycle is therefore given by U(T) = % {/oT"T ((I " P)FlP) +P\dx 3.1. The Optimal Test Interval Failure Rate
for Systems
+ T}
with
(3) Constant
Consider a system with a constant failure rate A during standby. Then the system failure probability by age x can be approximated by Xx. The optimal test interval can be approximated by [AT2+2T
TA =
\ - ^
+
{2CT +
2CF{p-(l-p)Xr}}\1/2
cjtr^p)
/
(4)
where CT is the cost per cycle for testing and restorative maintenance, and Cp is the additional cost incurred if repair or replacement is needed. Moreover, the test duration is likely to be small relative to the test interval (i.e., r « T ) generally, the optimal test interval, T\, as shown in Eq. (4), can be further approximated as follows: Tr
f|
+
2[C T + gH,>-AT))r
(5)
313 As we can notice, when the cost Cu of unavailability loss goes to infinity, TT goes to Tu = y/2rj\, which is the optimal test interval minimizing system unavailability, as discussed by Apostolakis and Chu 1 , and Vaurio 2 . Let us consider a case in which A = 10~ 6 /hr, p = 10~ 3 /demand, r = 1 hour, Cu = $l/hr, CT = $10/test, and CF = $100/repair. According to Eq. (5), the optimal test interval minimizing the average total cost for this case is about 4.7 x 103 hours which is about three times longer than the interval that minimizes the system unavailability (i.e., 1414 hours). Thus, our approach provides the decision maker insight into the influence of both maintenance cost and system unavailability on the optimal test interval.
4. Two-Unit Series Systems In this section, we discuss the optimal testing policies including the optimal testing strategy and test intervals for series systems. Most earlier research for series systems was limited to the case of exactly simultaneous or almost simultaneous testing, with all units tested according to the same test interval; e.g., Vaurio 2 - 3 . Here, we relax these assumptions to allow unequal test intervals, and also to determine whether simultaneous (or almost simultaneous) testing is in fact the optimal testing policy.
4.1. Equal Test
Intervals
The testing schedule for this case are shown in Figure 2. Without loss of generality, assume that the i t h cycle begins with the i t h test of unit 1 at time Oj. After testing, unit 1 is restored at time Ai, and stays in the standby state until the next test at Oi+\. Similarly, unit 2 is tested at B, and restored at d. The cycle ends just before the (i + l ) s t test of unit 1 at Oi+i. The time lag between the tests of units 1 and 2 is assumed to be constant for all cycle i, and denoted by Li2. The expected testing durations for units 1 and 2, respectively, are n and r^. The system unavailability was obtained by the time-dependent unit unavailabilities. The time-dependent unavailability of unit k is given by ( \\ Uk[Xk Wt J
- J (! ~ Pk)Fk[xk{t)] + Pk if xk(t) £ [0,T - Tk) ~ U ifxfc(t)G[r-rfc,T),
(6)
where Fk[xk(t)] and pk, respectively, denote the probability that unit k fails on standby at or before time t and the probability that unit k fails on demand. Then the average unavailability over a cycle is a function of the
314
•
T
•«
i
••
L\n
i
i
Oi Ai
i
Bi
i
d
,.
Oi+iTime
Figure 2. Maintenance Schedule for a Two-Unit System in Which Both Units Have the Same Surveillance Test Interval.
test interval T and testing strategy (represented by L12 in this case), and denoted as U(T,Li2)Since exactly simultaneous testing (with its possible cost advantages) is not considered in this model, the maintenance cost is assumed to be independent of the time lag L12 between testing of the two units. Under this assumption, the total expected cost for maintenance and possible failure of unit k during a single surveillance test interval is given by
Ck(T) = 1 [CTk + CFkuk(T - rk)\ .
(7)
Therefore, the objective function is 2
M(T, L12) = CUU(T, L12) + J ! Ck{T).
(8)
fc=i
Let us consider a system with two units, denoted as units 1 and 2, in series, with constant failure rates as Ai and A2, respectively. The objective function for this case has minimum when L12 = T\ 01T—T2. That is, almost simultaneous testing is shown to be the best choice for series systems. More specifically, unit with the higher product of the expected time to failure and the test duration (i.e., ^ t ) should be tested first. Thus, the unavailability of a two-unit series system can be approximated by that of a single-unit system with failure rate A = Ai + A2, demand failure probability p = pi +P2, and test duration r = TJ + T2. The optimal test interval for such a system using this approximation is given by qTT
El=1{CTk+CFk[pk-(l-Pk)XkTk}}\1/2
[2T
sr/ M/ = j
c^i^J)
T +
4.2. Unequal Test
/
•
(9)
Intervals
In this section, we consider a case in which two units can be tested according to different length of test intervals. By adopting the almost simultaneous testing (that is the best testing strategy for equal test interval case), and
315
Ll2
•Mil Oj BitiCij
T2
M*|
\^\
Bi^Cifi
Bi]TlCit„
\&\
.TlJ
Bt,Arpt,N 2 Oi+i
Time
Figure 3. Almost Simultaneous Testing for a Two-Unit System for the Case in Which ST I = STh = N2 • STI2
assume that unit 1 have the longer test interval, the testing schedule for this case is illustrated in Figure 3. The difference from the case of equal test interval is that there are -/V2 tests for unit 2, but only one test for unit 1 in each cycle. The first test of unit 2 in the ith cycle is at Bi,i. The time lag between the first tests of units 1 and 2, i.e., Li 2 , is equal to n by almost simultaneous testing assumption. After testing, unit 2 is restored at dt\ and stays in standby status until the next test at -Bj,2. The j t h test of unit 2 begins at Bij and ends at dj. Since there are N2 tests for Unit 2, then T = Ti = N2T2. For any given value of N2, the optimal length of each cycle , i.e., test interval of unit 1, is approximated as T = Ti =
(10)
where ^2 = C„[iV 2 (l-p 1 )A 1 + ( l - p 2 ) A 2 ] A0 = Cu J [JV2(1 - p!)X! + (1 - p2)X2} (n + N2r2f + l-(N2 - 1) [(1 -
P2 )A 2 r 1
- (1 - pi)Ai(JVf T2)(7i +
2
+ (1 - pi)A!JV|r22)]
AT2T2)
+ iV2(l - Pi - p 2 ) ( n +
N2T2)
2
+ 2j2Ni{CTj+CFj[Pj 4.3. Equal vs. Unequal Test
- (1 -p^XjTj}}.
Intervals
The previous sections presented a model to find the optimal test intervals for two units in series, tested according to the equal and unequal test intervals. In this section, we first presented sensitivity analyses to show the impact of using unequal instead of equal test intervals, then provided an method
316 (a) Objective Function Value
(b) Opt Test Interval (Equal vs. Unequal)
(c) Opt. Test Interval (Single vs. Optimal)
Figure 4. Equal Test Intervals vs. Unequal Test Intervals, as a Function of Failure Rate of Unit 2, A2 (for the case where Cu = $l/hr, Ai = 10_6/hr, rk = 1 hour, Pk = 10-3/demand, CTk = $10/test, and CFk - $100/repair, k = 1,2) to find the approximated ratio between the optimal test intervals of unit 1 to unit 2. Figure 4 shows the difference between the two testing policies as a function of the failure rate of unit 2, A2. Figure 4(a) shows the objective function values achieved by equal test interval (denoted by N = 1) and the optimal test interval (denoted by N = Nopt). This figure suggests that the difference in the objective function values achieved by the two testing policies will tend to be greater when A2 differs significantly from that of unit 1 (Ai), if all other parameters are equal. Figure 4(b) shows that, when Ai is fixed, the optimal test interval for equal test interval case (denoted by TJV=I) decreases as A2 increases. However, for unequal test interval case, only the optimal test interval of unit 2 (denoted by T2 t) decreases significantly with A2; that of unit 1 (denoted by T\opt) is not heavily affected by A2. This figure shows that when the test duration of unit 1 is significantly different from that of unit 2, then the use of unequal test intervals will yield a better result than the use of equal test intervals, provided that the other parameters are equal. Previously, the optimal ratio of the test interval of unit 1 to that of unit 2 (i.e., N2) was found numerically. Here, we provided an analytic approximation to find the optimal ratio. Let Tfcopt denote the optimal test interval of unit k using the model for unequal test intervals. We now compute the optimal test interval of each unit considered separately. Let Tfcsingie denote the optimal test interval of a single-unit system consisting only of unit k that can be obtained by Eq. (4). Figure 4(c) shows Tkopt and TfcSingle as a function of A2. Note that there is no significant difference between T2opt and T2Single. However, Ti opt appears to converge to Ti single when A2 is sufficiently larger than Ai. This
317 figure suggests t h a t when unequal test intervals are preferred, the optimal test intervals Tfcopt will often be close to the optimal test interval of the corresponding single-unit system, TfcSingle- Similar sensitivity analyses for other key parameters (such as pk and Crk) can also shown the similar results. Thus, the optimal ratio ./V2 can be approximated as Single 2
N2
Single
"• " ' i s i n g l e — " ' 2 s i n g 1 ( .
2
^ Single l f
.
iSinele
^Single
<
Single
where [y] denotes the integer closest to y.
5. C o n c l u s i o n This study briefly presented the optimal testing policies for series systems. T h e best choice of testing strategy for series system is (almost) simultaneous. The choice of testing strategy does not have significantly impact on system unavailability. When units in series are significantly different, units are not recommended t o be tested according t o the same test intervals. In such case, each unit in series systems are recommended t o be tested according t o one close t o its individual optimal test interval.
References 1. Apostolakis, G. and Chu, T. L., "The Unavailability of Systems under Periodic Test and Maintenance," Nuclear Technology 50, 5-15, (1980) 2. Vaurio, J. K., "Unavailability of Components with Inspection and Repair," Nuclear Engineering and Design 54, 309-324, (1979) 3. Vaurio, J. K., "Practical Availability Analysis of Standby Systems," Proceedings Annual Reliability and Maintainability Symposium, 125-131, 1982, IEEE, New York, NY, USA 4. Vaurio, J.K., "Unavailability Analysis of Periodically Tested Standby Component," IEEE Transactions on Reliability 44(3), 512-517, (1995) 5. Vaurio, J.K., "Optimization of Test and Maintenance Intervals Based on Risk and Cost," Reliability Engineering and System Safety, 49, 23-36, (1995) 6. Vant, J., "Maintenance Optimisation from a Decision Theoretical Point of View," Reliability Engineering and System Safety, 58, 119-126, (1997) 7. Vaurio, J.K., "Availability and Cost Function for Periodically Inspected Preventively Maintained Units," Reliability Engineering and System Safety, 63, 133-140, (1999) 8. Martorell, S., Sanchez, A., Carlos, S., and Serradell, V., "Comparing Effectiveness and Efficiency in Theoretical Specification and Maintenance Optimization," Reliability Engineering and System Safety, 77, 281-289, (2002)
OPTIMAL PREVENTIVE-MAINTENANCE POLICY FOR LEASE PRODUCTS UNDER A THRESHOLD VALUE ON FAILURE RATE RUEY H U E I Y E H , WEN LIANG CHANG Department of Industrial
Management,
National Taiwan University of Science and Technology, 43, Keelung Rd. Section 4, Taipei, Taiwan
This paper investigates the optimal threshold value on failure rate for leased products with a Weibull lifetime distribution. Within a lease period, any failure of the product is rectified by minimal repairs and a penalty may occur to the lessor when the time required to perform a minimal repair exceeds a reasonable time limit. To reduce product failures, additional preventive maintenance actions are carried out when the failure rate reaches a threshold value. Under this maintenance scheme, a mathematical model of the expected total cost is established and the optimal threshold value and the corresponding maintenance degrees are derived such that the expected total cost is minimized. The structural properties of the optimal policy are investigated in detail. Finally, numerical examples are provided to illustrate the features of the optimal policy.
1.
Introduction
Due to the increase in complexity of products and rapid advances in technological innovation, there is a trend to lease a product rather than own a product. For a leased product, performing maintenance actions usually requires some expensive equipments and special professional technicians, which is not economical for the lessee (the one leasing the product). Therefore, the maintenance of the product is usually specified in a lease contract to ensure that the product could fulfill its intended purpose [5]. In this paper, we propose a maintenance scheme, in which preventive maintenance actions are taken when the failure rate of the leased product reaches a certain threshold value since this maintenance scheme can be easily specified in a lease contract in practice. Maintenance actions usually can be classified into two major categories: (i) Corrective maintenance (CM) and (ii) Preventive maintenance (PM). CM actions are used to rectify a failed product back to its operational state, and PM actions are performed to improve the operational state of the product to avoid failures. For repairable products, various maintenance policies have been extensively discussed in the literature [1, 3, 6, 12]. In practice, minimal repair is the most commonly used CM action to restore a failed product [8,9]. Various issues associated with minimal repair can be found in [2, 8, 9, 11]. To reduce the number of failures and possible penalties within the lease period, PM actions are usually employed by the lessor. Many PM policies have 318
319 been proposed and studied under various situations such as finite or infinite horizon [2, 4], and perfect or imperfect maintenance [6, 7, 10]. Nakagawa [8] first proposes imperfect PM for repairable products in which the degree of PM is described by the reduction of age of the product. Since then, the age-reduction method is widely adopted in the research of imperfect maintenance policies. The other method to describe the degree of a preventive maintenance is the failure-rate reduction method. In 1993, Chan and Shaw [3] propose two methods of failure-rate reduction: (i) failure rate with fixed reduction; that is, the failure rate is reduced with the same quantity after each PM action; and (ii) failure rate with proportional reduction; in other words, the failure rate is reduced such that each jump down is proportional to the current failure rate. In this paper, we will use the fixed failure-rate-reduction method to describe the degree of PM and derive the optimal PM policy for a leased product, since this method can be clearly specified in a lease contract. The remainder of this paper is organized as follows. The mathematical model is developed in Section 2 for the case when the failure density is Weibull. The properties of the optimal PM policy are investigated in Section 3. Furthermore, the impact of providing preventive maintenance within a lease period is illustrated through numerical examples in Section 4. Finally, some conclusions are drawn in the last section. 2. Mathematical Formulation Consider that a new product with Weibull lifetime distribution is leased for a period of L. It is well-known that the probability density function of Weibull is given by f{t) = Xfi{Xt)P^e~M for t>0 where X>0 is the scale parameter and /? > 0 is the shape parameter. Since Weibull distributions can provide a versatile class of distributional forms by changing parameter values, it is one of the most commonly used distributions in reliability engineering. By the definition of a failure rate function, the failure rate function of the Weibull distribution is r(t) = Xfi(Xt)P~] and its inverse function is given by r-\t) = \tl(XPpy$W-^ . Note that both /•(/) and r~\t) increase in t if P > 1, and decrease in / if /? < 1. When /? = 1, r(t) = X is a constant but r~] (/) does not exist. In this paper, we focus on the case where the failure rate r{t) continuously increases in t and its inverse function exists (i.e. /? > 1) within the lease period. Within the lease period, any failure of the leased product is rectified by minimal repairs. Each minimal repair incurs a fixed repair cost Cr > 0 to the lessor and requires a random amount of repair time tr that follows a general
320
cumulative distribution function G(tr) . If the repair time exceeds a pre-specified time limit r, then there is a penalty CT to the lessor. After minimal repair, the product is operational but the failure rate of the product remains the same as that just before failure. To reduce the number of failures within the lease period, imperfect PM actions with degree 8 > 0 are carried out whenever the failure rate of the product reaches a threshold value 9. That is, the failure rate of the product is reduced by 8 < 6 after each PM. In general, the cost to perform an imperfect PM is a non-negative and non-decreasing function of the maintenance degree 8 . Let Cp (8) be the cost for performing a PM action with degree 8 > 0 . Then, we have Cp(8)>0 and C'p(8)>0 for all 8>0. In this paper, we consider the case where the PM cost is a linearly increasing function of the maintenance degree; that is, Cp (8) = a + b8 for any a > 0 and b > 0 . Assuming that the time required for performing an imperfect PM is negligible, the preventive maintenance scheme can be described in terms of the failure rate function as shown in Figure 1. In Figure 1, n and 0=T0
can be easily obtained. Observing Figure 1, we have
n - j [r(L)-6]/S | which is the smallest non-negative integer greater than or equal to [r(L)-0]/8 T0=0
and
, and 7] = r~x[0 + 0'-l)8]
for / = l,2,---,« , where
TH+l=L.
> t
T0=0
Tr r,
T„ L
Figure 1. The PM scheme under a threshold value
6.
When PM actions are performed at time epochs Tt for i = 1,2,- • •, n, the failure process of the product in the interval
(TnTj+i]
becomes a
321
non-homogeneous Poisson process (NHPP) with intensity function
r(t)-iS,
since failures are rectified by minimal repairs [8, 9]. Let R(t) = £ r(u) du be the cumulative failure rate up to time t. Then, we have R(t) = (At)13 for the Weibull case, and the expected total number of failures during the lease period L can be expressed by £JLo }J+I [r(t) - iS]dt = R(L) + 8(£,U T, - nL) . As a result, given any 0 and 8, the expected total cost within the lease period L can be easily obtained as follows
C(n,8,e) = [Cr+CTG(T)]
(XLf+8
6+(i-\)8 p-\
-nL
+ n[a+b8). (1)
. A'fi J
Note that although n = | [r(L)-6]/8~\ is a function of 6 and 8, we will relax this constraint in the following analysis and treat it as a decision variable. Hence, our objective here becomes to find an optimal PM policy («* ,8* ,0*) such that the expected total cost in Eq.(l) is minimized. 3. Optimal Policy It is well-known that the Weibull failure rate function is non-increasing in t when p < 1; that is, r(tx) > r(t2) for all 0 < tx
+CTG(T)](AL)I}
, which provides an upper bound for the
expected total cost. To investigate the properties of the optimal PM policy, we first take the partial derivative of Eq. (1) with respect to 6. For any fixed n > 0 and 8 , and we have
^
^ OV
= [Cr+CTG(r)]S^fi)^i^[0Hi-l)S^.
(2)
;=1 p - 1
Observing Eq. (2), the following theorem holds. Theorem 1. Given any n > 0 and 8 > 0 , the optimal threshold value 6* =8 when P > 1.
322
Theorem 1 shows that the optimal threshold value 6 and the PM degree 8 should be the same when J3 > 1. In other words, when the failure rate of the lease product reaches the threshold value 0* = 8 , a PM action should be performed to reduce the failure rate back to zero. Note that although the failure rate becomes zero after PM, the slope of the failure rate function is different from the slope at time t = 0 . Using the result of Theorem 1, the objective function in Eq. (1) can be rewritten as C(n,8) = C(n,8,6t) = {Cr+CrG{T)} (XLf+8
i8
p-\
H\tffiJ
-nL
(3)
+n(a+bS).
Furthermore, for any n > 0, taking the first partial derivative of Eq. (3) with respect to 8 , we have dC(n,8)
p[Cr+CTG(rj\
d8
-l
l
tf^S*^+n[b-{Cr+CTG(j))L].
(4)
Based on Eq. (4), the following theorem shows that there exists a unique optimal PM degree corresponding to any n > 0. Theorem 2. Given any n>0
and J3>\,
if b>[Cr+CTG(r)]L,
(n*,8*) = (0,0). Otherwise, there exists a unique 8(n) e (0,r(L)] C(n, 8(n))=
then such that
min C(«, 8). 0<S
Based on the result of Theorem 2, we know that the optimal PM degree is unique if the number of PM actions is pre-specified. Let A = {p - \)[(Cr + CTG(T))L - i p ' W ^ - D / p(Cr + CTG(T)) and S(n) = n I X"=i / l / ( ^ _ 1 ) . Now, setting Eq. (4) equal to 0, we have
8{n)
n(P-l)[(Cr
+
"/r^- 1 CTG(T))L-b](A''/3)
•[A-S(n)] p-\
(5)
P(Cr+CrG(r))±i^ The following lemma summarizes the properties of 8{n) for any n > 0.
323
Lemma 3. If P > 1, then S(ri) = [A • S(n)] ofn and bounded by (A I n)
and A
is a strictly decreasing function
.
Substituting the result of Eq. (5) into Eq. (3), the expected total cost becomes C(n,S(n)) = [Cr + CTG(r)](AL)fi +n j a - ^
T
- ^
8{n) \.
(6)
Using the result of Lemma 3, it is clear that S(n) converges to (A I «)^ _ 1 ; that is, lim 5{n) = 0. Therefore, the following theorem holds. Theorem 4. / / a>[(Cr +CTG(r))L-b]S(l)/J3,
then n = 0 . Otherwise, there
exists a finite upper boundfor the optimal number of PM actions. Theorem 4 shows that if the fixed cost a for PM actions is greater than [(Cr+Cr(^r-))Z,-6]<S(1)//? , then it is not worth performing preventive maintenance within the lease period. Otherwise, there exists a finite upper bound for the number of PM actions. 4. Numerical Examples For a Weibull lifetime distribution with a scale parameter X > 0 and a shape parameter ji > 0 . Suppose that each failure of the product incurs an expected cost [Cr +C r G(r)] = 100 and Cp(S) = 5 + bS . The improvement by performing the optimal PM is evaluated by the percentage reduction of the expected total cost, which is given by A = [C(n*,
324
3. When /? increases, the C(w*,<5*) and the number of PM actions increase. These results imply that PM actions should be considered when a new product is leased with a long period L>/4 or when the PM cost is low. Table 1. FRRM of the Weibull case (The Optimal policy (n ,S', 0' ) = (n*, S'))
[Cr+( CTG(T)] = 100 , a = 5 b = 0
,1=0.1
ifi.M)
I
C(0,0)
VS)
C{n*,8*)
3
16.4
(1,0.05)
11.9
(1.5,9.0) 10 100.0 (3,0.04)
34.8
20 282.8 (6,0.03)
61.3
3
9.0
9.0
(0,0)
6 = 50
b = 20
A
(„*,
27.4 (1,0.05)
C(n*,S*)
A
(„*,«?*)
C(n*,S*)
A
12.9
21.3 (1,0.04)
14.2
13.4
65.2 (3,0.04)
37.2
62.8 (3,0.04)
40.8
59.2
78.3 (6,0.03)
65.0
77.0 (6,0.03)
70.6
75.0
0
(0,0)
9.0
0
(0,0)
9
0
(2.0,8.8) 10 100.0 (4,0.04)
40.0
60.0 (3,0.05)
43.0
57.0 (3,0.05)
47.3
52.7
20 400.0 (8,0.04)
84.4
78.9 (8,0.04)
91.5
77.1 (8,0.04)
102
74.5
3
2.7
(0,0)
2.7
(3.0,8.9) 10
100
(4,0.06)
44.7
55.3 (4,0.05)
49.1
20
800
(13,0.08)
141.3
82.3 (13,0.08)
162.8
200
0
(0,0)
2.7
0
50.9 (3,0.06)
55.2
44.8
79.6 (13,0.08)
194.3
75.7
2.7
0
(0,0)
-beta 1.5 -beta 2.0 -beta 3.0
1 3 5 7 9 11 13 15 17 19 Lease period Figure 1. Lease period v.s. Optimal cost
Furthermore, using the inverse function r~l (t) = [t I(Afi/?)]1/(^_1), the time epochs for carrying out the optimal PM actions can be easily obtained. For example, when (b,/?,L) = (20,1.5,10), the optimal policy is («*,<5*)=(3,0.04) and the PM actions are performed at 7] « 0.71, T2 « 2.84 , and T3 « 6.4. When (b, 0, L) = (20,2,10), the optimal policy is (n , f) = (3,0.05) and the PM actions are performed at 7j ~ 2 , T2 « 4 , and T3 * 6.
325 Acknowledgments This research is supported in part by a grant (NSC 94-2213-E-011-036) from the National Science Council, Taiwan. References 1. 2. 3.
4.
5. 6. 7. 8.
9.
10. 11.
12.
R.E. Barlow and L.C. Hunter, Optimum Preventive Maintenance Policies, Operations Research 8, 90-100 (1960). P.J. Boland and F. Proschan, Periodic Replacement with Increasing Minimal Repair Costs at Failure, Operations Research 30, 1183-1189 (1982). J.K. Chan and L. Shaw, Modeling Repairable Systems with Failure Rates Dependent on Age and Maintenance, IEEE Transactions on Reliability 42, 566-570 (1993). Y.H. Chun, Optimum Number of Periodic Preventive Maintenance Operations under Warranty, Reliability Engineering and System Safety 37, 223-225 (1992). P. Desarand D. Purohit, Leasing and Selling: Optimal Marketing Strategies for a Durable Goods Firm, Management Science 44, 19-34 (1998). N. Jack and J.S. Dagpunar, An Optimal Imperfect Maintenance Policy over a Warranty Period, Microelectronics and reliability 34, 529-534 (1994). T. Nakagawa, Imperfect Preventive-Maintenance, Journal of the Operations Research Society of Japan 24, 213-227 (1979). T. Nakagawa, A Summary of Periodic Replacement with Minimal Repair at Failure, Journal of the Operations Research Society of Japan 24, 213-227 (1981). T. Nakagawa and M. Kowada, Analysis of a System with Minimal Repair and its Application to Replacement Policy, European Journal of Operational Research 12, 176-182 (1983). H. Pham and H. Wang, Imperfect Maintenance, European Journal of Operational Research 94, 425-438 (1996). S.H. Sheu, Periodic Replacement with Minimal Repair at Failure and General Random Repair Cost for a Multi-unit System, Microelectronics Reliability 31, 1019-1025 (1991). R.H. Yeh and H.C. Lo Optimal Preventive-maintenance Warranty Policy for Repairable Products, European Journal of Operational Research 134, 59-69 (2001).
This page is intentionally left blank
PART IV ADVANCED WARRANTY MODELING
This page is intentionally left blank
OPTIMUM PRODUCTION RUN LENGTH AND WARRANTY PERIOD SETTING FOR THE IMPERFECT PRODUCTION SYSTEM CHUNG-HO CHEN Department of Management and Information Technology, Southern Taiwan University of Technology, 1 Nan-Tai Street, Yung-Kang City, Tainan 710, Taiwan E-mail: [email protected] Traditional economic manufacturing quantity (EMQ) model addressed that the perfect production for product. However, there possibly exists the defective product in the manufacturing process. Hence, it is necessary to consider the production process state in the modified EMQ model. Chen and Chung presented the quality selection problem to the imperfect production system for obtaining the optimum production run length and target level. Ladany and Shore proposed the problem of determining optimal warranty period of product in relation to the manufacturer's lower specification limit. In this paper, we further integrate Chen and Chung's and Ladany and Shore's models for obtaining the optimum production run length and warranty period of product for the imperfect production system.
1. Introduction Traditional economic manufacturing quantity (EMQ) model assumes that the perfect product for the manufacturing process. Hence, the defective product of its model has been neglected. Previous researchers, e.g., Porteus [13] and Rosenblatt and Lee [19, 20], firstly proposed the imperfect quality of EMQ model. Subsequently, some works addressed the integrated model about production, inspection, maintenance, and quality, e.g., Lee and Rosenblatt [7, 8], Lee and Park [6], Liou et al. [10], Rahim [14], Tseng [21], Chen and Chung [1], Makis [11], Makis and Fung [12], Wright and Mehrez [24], Rahim and BenDaya [15], Kim and Hong [3], Wang and Sheu [23], Kim et al. [4], Roan et al. [18], Lin [9], Rahim and Tuffaha [17], and Rahim and Ohta [16]. In 1996, Chen and Chung presented the quality selection problem to imperfect production systems for obtaining the optimum production run length and target level. Rahim and Tuffaha [17] further proposed the modified Chen and Chung's [1] model with quality loss and sampling inspection. 329
330
Djamaludin et al. [2], Yeh and Lo [25], Yeh, et al. [26], and Wang [22] addressed a post-sale warranty cost in the imperfect production system. However, all of their models assume that the warranty period of product is known. In 2004, Ladany and Shore considered the problem of determining the optimal warranty period of product with lower specification limits. They assumed that sale-price per manufactured item increases linearly with the warranty period. In this paper, we further integrate Chen and Chung's [1] and Ladany and Shore's [5] models for obtaining the optimum production run length and warranty period of product for the imperfect production system. 2. Integrated Model There are some assumptions made in the integrated model of Chen and Chung [l]and Ladany and Shore [5]: 1.
2. 3. 4.
When the production cycle starts, the process is in control state. Once the shift has occurred, the process will remain in an out-of-control state until it is discovered by inspection and followed by some restoration work. Otherwise, the out-of-control state will continue until the end of the production run. The elapse time until the occurrence of the assignable cause assumed to be exponentially distributed with a mean of 1/ A . The life-time of product is assumed to be exponentially distribution with a mean of l//t] when the production process is in control state. The life-time of product is assumed to be exponentially distribution with a mean of \l A^ when the production process is in an out-of-control state, Al < A2.
5. 6.
The warranty period of product is determined by the lower specification limit of the life-time of the manufactured item. The sale-price per manufactured item increases linearly with the warranty period.
If the life-time of product is conformance, the revenue per item for manufacturer is rs . If the life-time of product is within the warranty period (below lower specification limit, LSL), the revenue per item for manufacturer is rLrs,0 < rL < 1 . If the life-time of product exceeds the upper specification limit (USL), the revenue per item for manufacturer is rurs,0< rL,
331
£ n ,., | t te-«,»-«-
g
) *
AT
(1)
vT
where A is the parameter of exponential distribution for the elapse time until the occurrence of the assignable cause; R is cost of restoring the process from an out-of-control state to the in-control state; v is the production rate in item per hour; g 0 is expected revenue per item for in-control process; gx is expected revenue per item for an out-of-control process; LSL r r
go= s L
USL
j/(*i)^i+>"s \f{xx)dxx-¥rsru LSL
0
= rs\rL +(1-rL)exp(-AlLSL) where xx~EXP(-Ax)
w
, f(xx)
jf(xx)dxx USL
+ (ru
-^expi-A^USL)]
is the probability density function of xx ,
rs = a + bLSL , a > 0, and b > 0 ; LSL
g\=rsrL
Jf(x2)dx2+r5 0
USL
oo
jf(x2)dx2
+rsrv
LSL
= rs\rL +(\-rL)exp(-A2LSL)
j/(x2)Jx2 USL
+ (ru
-1)expi-A.USL))
where x2 ~ EXP{-A{) , f(x2) is the probability density function of x2 , rs = a + bLSL , a > 0, and b > 0 . For the given parameters, we can adopt the multi-dimensional search method, e.g., pattern search method for obtaining the optimum production run length T and the optimum warranty period LSL . The optimal solution of integrated model depends on the several parameters. The influences of all of them need to be illustrated by adopting the sensitivity analysis. 3. Numerical Example and Sensitivity Analysis Suppose that the integrated model parameters are R =100, v = 500, /I = 0.05, A, =1/36, A,= 1/12, USL= 40 , rL = 0.05, rv = 0.95, a = 1, and b = 0.05. By solving Eq. (1), we obtain the optimum solution with T* = 3.90, LSL = 15.20, and ETP =1.0491. Table 1 lists the sensitivity analysis of some parameters. Figures 1-10 present the effect of parameters on the expected total profit, the warranty period, and the production run length. From Table 1 and Figures 1-10, we have the follow conclusions: (1) the parameters a and b have the major effect on the expected total profit; (2) the parameters Al ,
332
a, and b have the major effect on the warranty period; (3) All parameters have the moderate effect on the production run length. 4. Conclusions In this paper, we have presented an integrated model based on considering the maximum value of expected total profit per production cycle. The warranty period of product is determined by the LSL of the life-time of the manufactured item. The sale-price per manufactured item increases linearly with the warranty period. Further study should discuss the solution condition, address the quality cost of process control, or different warranty policy in the integrated model. References 1. S. L. Chen and K. J. Chung, Determining of the optimal production run and the most profitable process mean for a production process, International Journal of Production Research 34, 2051-2058 (1996). 2. V. Dajmaludim, D. N. P. Murthy and R. J. Wilson, Quality control through lot sizing for items sold with warranty, International Journal of Production Economics 33, 97-107 (1994). 3. C. H. Kim and Y. Hong, An optimal production length in deteriorating production processes, International Journal of Production Economics 58, 183-189(1999). 4. C. H. Kim, Y. Hong and S. Y. Chang, Optimal production run length and inspection schedules in a deteriorating production processes, HE Transactions 33, 421-426 (2000). 5. S. P. Ladany and H. Shore, Optimal warranty period when sale-price increases with the lower specification limit, In: H. J. Lenz and P. TH. Wilrich (Eds), Frontiers in Statistical Quality Control 7, Physica-Verlag, 335-345 (2004). 6. J. S. Lee and K. S. Park, Joint determination of production cycle and inspection intervals in a deteriorating production system, Journal of the operational Research society 42, 775-783 (1991). 7. H. L. Lee and M. J. Rosenblatt, Simultaneous determination of production cycle and inspection schedules in a production system, Management Science 33, 1125-1136(1987). 8. H. L. Lee and M. J. Rosenblatt, A production and maintenance planning model with restoration cost dependent on detection delay, HE Transactions 21,368-375(1989).
333
9.
10. 11. 12.
13. 14. 15.
16. 17.
18.
19. 20.
21. 22.
23.
C. Y. Lin, Optimization of maintenance, production and inspection strategies while considering preventative maintenance error, Journal of Information & Optimization Sciences 25, 543-55 (2004). M. J. Liou, S. T. Tseng and T. M. Lin, The effects of inspection errors to the imperfect EMQ Model, HE Transactions 26, 42-51 (1994). V. Makis, Optimal lot sizing and inspection policy for an EMQ model with imperfect inspections, Naval Research Logistics 45, 165-186 (1998). V. Makis and J. Fung, An EMQ model with inspections and random machine failures, Journal of the Operational Research Society 49, 66-76 (1998). E. L. Porteus, Optimal lot sizing, process quality improvement and set-up cost reduction, Operations Research 34, 137-144 (1986). M. A. Rahim, Joint determination of production quantity, inspection schedule, and control chart design, HE Transactions 26, 2-11 (1994). M. A. Rahim and M. Ben-Daya, A generalized economic model for joint determination of production run, inspection schedule and control chart design, International Journal of Production Research 36, 277-289 (1998). M. A. Rahim and H. Ohta, An integrated economic model for inventory and quality control problems, Engineering Optimization 37, 65-81 (2005). M. A. Rahim and F. Tuffaha, Integrated model for determining the optimal initial settings of the process mean and the optimal production run assuming quadratic loss functions, International Journal of Production Research 42, 3281-3300(2004). J. Roan, L. Gong and K. Tang, Joint determination of process mean, production run size and material order quantity for a container-filling process, International Journal of Production Economics 63, 303-317 (2000). M. J. Rosenblatt and H. L. Lee, Economic production cycles with imperfect production processes, HE Transactions 17, 48-54 (1986a). M. J. Rosenblatt and H. L. Lee, A comparative study of continuous and periodic inspection policies in deteriorating production systems, HE Transactions 18, 2-9 (1986b). S. T. Tseng, Optimal preventive maintenance policy for deteriorating production systems, HE Transactions, 28, 687-694 (1996). C. H. Wang, The impact of a free-repair warranty policy on EMQ model for imperfect production systems, Computers & Operations Research, 31, 2021-2035(2004). C. H. Wang and S. H. Sheu, Fast approach to the optimal production/PM policy, Computer and Mathematics with Applications 40, 1297-1314 (2000).
334
24. C. M. Wright and A. Mehrez, An overview of representative research of the relationships between quality and inventory, Omega 26, 29-47 (1998). 25. R. H. Yeh and H. C. Lo, Quality control products under free-repair warranty, International Journal of Operations and Quantitative Management 4, 265275(1998). 26. R. H. Yeh, W. T. Ho and S. T. Tseng, Optimal production length for products sold with warranty, European Journal of Operational Research, 120, 575-582 (2000). Table 1. Sensitivity analysis of parameters. V
300 400 500 600 700 R
50 100 150 200 250
A
T* 5.28 4.44 3.90 3.51 3.22
LSL* 14.07 14.77 15.20 15.51 15.75
ETP* 1.0199 1.0371 1.0491 1.0581 1.0652
r
LSL* 16.17 15.20 14.35 13.52 12.69
ETP* 1.0798 1.0491 1.0264 1.0080 0.9924
2.65 3.90 4.94 5.93 6.90
LSL*
ETP*
0.03 0.04 0.05 0.06 0.07
4.87 4.29 3.90 3.61 3.39
15.97 15.58 15.20 14.86 14.52
1.0726 1.0600 1.0491 1.0394 1.0305
h
T*
LSL*
ETP*
1/30 1/32 1/36 1/40 1/45
5.49 4.73 3.90 3.44 3.09
7.94 10.44 15.20 19.89 25.69
0.9937 1.0096 1.0491 1.0957 1.1606
A2
r
LSL*
ETP*
1/8
3.27 3.56 3.90 4.47 5.66
15.99 15.47 15.20 15.12 15.41
1.0309 1.0400 1.0491 1.0620 1.0815
1/10 1/12 1/15 1/20
r
335
Table 1. (Continued). a 0.5 1.0 1.1 1.2 1.3
3.82 3.90 4.06 4.38 5.44
r
LSL* 27.47 15.20 12.52 9.62 5.67
ETP* 0.7845 1.0491 1.1140 1.1841 1.2611
b 0.04 0.05 0.06 0.07 0.08
T* 5.74 3.90 3.42 3.14 2.94
LSL* 6.73 15.20 19.65 22.66 24.88
ETP* 0.9695 1.0491 1.1521 1.2652 1.3840
T*
LSL*
ETP*
4.00 3.94 3.90 3.85 3.81
13.74 14.45 15.20 15.99 16.84
1.0365 1.0426 1.0491 1.0560 1.0633
r
L 0.03 0.04 0.05 0.06 0.07
7"*
LSL'
ETP*
U 0.88 0.90 0.92 0.95 0.98
4.20 4.11 4.02 3.90 3.78
13.54 14.01 14.49 15.20 15.97
1.0128 1.0230 1.0333 1.0491 1.0652
USL 30 35 40 45 50
3.95 3.92 3.90 3.87 3.85
r
LSL* 14.80 15.02 15.20 15.37 15.51
ETP* 1.0403 1.0450 1.0491 1.0526 1.0557
r
336
•T
•LSL •ETP
oi
;£=£ 300
400
500
600
700
Figure 1. The effect of v.
50
100
150
200
250
Figure 2. The effect of J?.
0.03
0.04
0.05
0.06
0.07
Figure 3. The effect of X .
1.1
2.4
2.6
2.S
3.0
3.2
Figure 4. The effect of \ .
k
A'...
0.5
0.7
0.9
1.1
1.3
Figure 6. The effect of a.
(1.04
0.05
0.06
0.07
0.G3
Figure 7. The effect of b.
0.03
O.Ol
0.05
0.06
0.07
Figure 8. The effect of i-£
0.68
0.9
0.92
0.94
0.96
0.94
Figure 9. The effect of ry .
•\ 1 t
Figure 5. The effect of Aj .
Figure 10. The effect of USL.
OPTIMAL A G E - R E P L A C E M E N T POLICY FOR REPAIRABLE P R O D U C T S U N D E R R E N E W I N G FREE-REPLACEMENT W A R R A N T Y *
Y . H. C H I E N
f
Department of Statistics, National Taichung Institute 129, Sec. 3, San-min Rd., Taichung City, Taichung E-mail: [email protected]
of Technology, 404, Taiwan
J. A. C H E N Department of Business Administration, Kao-Yuan University, 1821, Chung-Shan Rd., Lu-Chu Hsiang, Kaohsiung County 821, Taiwan, E-mail: [email protected]
This paper investigates the effects of a Renewing Free-Replacement Warranty (RFRW) on the age replacement policy for a repairable product with a general failure model. In the general model, there are two types of failure when the product fails. One is type I failure (minor failure) which can be removed by a minimal repair; and the other is type II failure (catastrophic failure) which can be removed only by a replacement. After a minimal repair, the product is operational but the failure rate of the product remains unchanged. For both warranted and nonwarranted products, cost models are developed, and t h e corresponding optimal replacement ages are derived such that the long run expected cost rate is minimized.
1. Introduction An age replacement policy, where an operating system is replaced at time of failure or at age T, whichever comes first, is proposed by Barlow and Proschan l . Recently, Yeh et al. 7 analyzed the effects of RFRW on the age replacement policy for a non-repairable product with increasing failure rate. In this paper, a repairable product with a general failure model is "This research was supported by the National Science Council of Taiwan, under Grant No. NSC94-2213-E-025-005. tCorresponding author. T e l : +886-4-22196660; Fax:+886-4-22196331. E-mail address: [email protected] (Y.-H. Chien).
337
338
considered. In this general model, when the product fails, type I failure occurs with probability 1 — p and type II failure occurs with probability p (0 < p < 1). It is assumed that type I failure is a minor one, and thus can be removed by a minimal repair; whereas type II failure is a catastrophic one, and thus can be removed only by a replacement. Such models have been considered in the literature. See, for example, Beichelt and Fischer 2 , Nakagawa 4 , Sheu 6 and Cha 3 . Applied the age replacement policy to such a repairable product, it become a general age-replacement model with minimal repair that discussed in Sheu 5 , where the product is completely replaced at a certain age t (t > 0) or upon type II failure (catastrophic failure), whichever occurs first. Taking the RFRW policy into account, the mathematical formulation for such a general age replacement model is developed. For product with an increasing failure rate function, the effects of a RFRW on the age replacement policy are investigated analytically, and the fact that there exists a unique optimal replacement age such that the long run expected cost rate is minimized, is shown.
2. Mathematical Formulation The general age-replacement policy, in which minimal repair or replacement are takes place according to the following scheme. If the product fails before age t, it is either replaced by a new product (due to type II failure with probability p ) at a downtime cost Cj > 0 and a purchasing cost Gv > 0, or it undergoes minimal repair (due to type I failure with probability 1 — p) at a minimal repair cost C m > 0. Otherwise, the product is preventively replaced whenever it reaches age t. Because a preventive replacement is a planned PM action, only the purchasing cost Cp is incurred in this action. Under this model, the design variable is the age for preventive replacement t. For a repairable product purchased under the RFRW, if a failure occurs within the warranty period w, either a new product with the same warranty is offered, free of charge by the seller, to replace the failed one (type II failure); or a minimal repair is performed free of charge by the seller (type I failure). However, compare to the type I failure (minor failure) which can be instantly detected and repaired instantaneously by the seller, a type II failure is a catastrophic failure which cause the consumer experience inconvenience as well as handling, shortage and waiting, although any failures are free charge to consumer during warranty period. Therefore, we assume that a type II failure of the product during the warranty period still incurs
339
a downtime cost Cd to the buyer. 2.1.
Preliminaries
Denote by the r.v. X the lifetime of a new product and by F(x) the Cdf of X. Let assume that X has pdf f(x), its failure rate r(x) is then given by r(x) = f(x)/F~(x), where F~(x) = 1- F(x) is the Sf of X. Also denote by the r.v. Y the time to first type II failure of a new product. If G(y) is defined as the Cdf of Y and the Sf G(y) = 1 - G(y), then G(y) is given by G(y) = e x p { - fV p • r(u)du} = exp{-p • A(y)} = [F(y)]p. ./o 2.2. Cost model without
(1)
warranty
Without warranty, any two successive replacements of the product form a renewal cycle of the failure process. Hence, under a replacement age t > 0, the cycle time (denote by T0(t)) is
and the total cost incurred in a renewal cycle (denote by Co(t)) is r m = 0{l
>
( C d + Cp + Cm-(l-p)A(Y),it \Cp + Cm-(l-p)A(t),
Y
{6}
Thus, the long run expected cost rate can be obtained as follows. CRo{t)
- EPbW] =
2.3. Cost model with
] J ^
"
(4)
RFRW
The cost model should be established for two cases: t > w, and t < w. Case 1. t > w. There are three possible states of replacement for the repairable product. First, if the type II failure occurs within the warranty period (Y < w), a downtime cost Cd is incurred. Second, if the type II failure occurs after the warranty period, but before the preventive replacement (u> < Y < t), it incurs an additional purchasing cost Cv. Third, when the product reaches age t without any type II failure (i.e., Y > t), a preventive replacement is performed with cost Cp. While whenever a type I failure occurs, it is detected and repaired instantly by a minimal repair. The cost for minimal repair is only free charge during the warranty period. That is,
340
the user should responsible for all minimal repair costs after the warranty expires. According to the above scheme, the elapsed time, and the total cost in a renewal cycle (denoted by T\{t) and Ci(t), respectively) become Y, if Y < w, Ti(t) = { Y, if w < Y < t, t, if Y > t,
(5)
and ( Cd, if Y < w, Ci(t) =lcd + Cp + Cm-(lp)[A(y) - A(w)], if w
(6)
Therefore, the long run expected cost rate is _ E [ g i(*)] _ g P • G(w) + [Cd + CmC- - l)]G(t) - CmC- - l)G(w) CHlW ~ E[Tl(t)} ~ J*G(u)du (7) Case 2. t < w. All the replacements (planned or unplanned) as well as minimal repairs are performed within the warranty period, so all the maintenance cost is free charge to the consumer. That is, under this case, the user will only incurs either a failure replacement (unplanned replacement due to type II failure) with a downtime cost Cd, or a preventive replacement (planned replacement at age t) with a purchasing cost Cp. According to the above scheme, the elapsed time, and the total cost in a renewal cycle (denoted by T2(t) and C2(t), respectively) are as follows: if Y < t,
(8>
if Y > t, ™>-{rii— and
Then, the expected cost rate in this case becomes rn
E C
(,\ CR2{t)=
f ^)]
wm
= C -G(t) + C -G(t) —f 0 G { u )d u • p
d
(10)
Note that when the preventive replacement age t is equal to the warranty period w, then by (7) and (10) yields nv t\ n* (\ Cp-G(w) + Cd-G(w) CRi{w) = CR2(w) = -z . w
Jo G(u)du
(11)
341
3. Optimal Policies Differentiating CRo{t) in (4) with respect to t yields
dt
[JoG(u)du]*
(12)
where ip(t) = pr(t) J0 G(u)du — G(t), it plays an important role in determining the optimal age for replacement. Let fi denote the mean time to first type II failure of a new product; that is \i = J0 G{u)du. Then, for a product with an increasing failure rate function, the following Lemma holds. Lemma 1. If r(t) is an increasing failure rate (IFR) function, andG(0) = 0, then tp(t) is an increasing function of t with rimt_>o ip(t) = ip{0) = 0, and lim^oo ip(t) = ip(oo) = pr(oo)p, — 1. In this case, the optimal replacement age £Q f° r a product without warranty can be easily obtained by setting (12) equal to 0, i.e., T/"(*O) = Cp/[Cd + Cm(l/p — 1)], and the results are given in the following Theorem. Theorem 1. Given r(t) is an IFR function. If Cp/[Cd + Cm(l/p — 1)] < V;(oo) or equivalently, r(oo) > [Cp + Cd + Cm(l/p - l))/pjl[Cd + Cm(l/p 1)], then there exists a finite, and unique optimal replacement age ij for a product without warranty, and the resulting expected cost rate is Ci?o(io) = [Cd + Cm(l/p — l)]pr(ig)- Otherwise, t$ = oo and CRQ(OO) = [Cp + Cd + Cm(l/p - l)]/£. Under a RFRW, for t > w, the first derivative of (7) with respect to t is dCRl{t)
dt
\Cd+cmip - Dmm)
- °' g( g;g: ( ( i:;i° w ]
[SlG{u)du]>
Let t\ be the optimal value minimizing (7) in the interval [w, oo), then the following Lemma 2 can be obtained by (13). Lemma 2. Under a RFRW, given r(t) is an IFR function for a repairable product with warranty period w, the following results hold for t G [w, oo). (i) If [CpG(w) - Cm(l/p - l)G{w)\/[Cd + Cm(l/p - 1)] > V(oo), or equivalently r(oo) < {Cd + [Cp + Cm{l/p - \)}G{w)}/pfi[Cd + Crnil/p - 1)}, then t{ = oo, and C/?i(tJ) - {Cd + [Cp + Cm(l/p l)]G(w)}/fl.
342
(ii) If1>{w) < [CpG(w)-Cm(l/p -l)G(w)]/[Cd+Cm(l/P -1)] < V(oo), there exists a finite, unique optimal replacement age ij > w satisfying 1>(tl) = [CpG{w) - Cm(l/p - l)G(w)]/[Cd + Cm(l/p - 1)] and CRiW = [Cd + Cm(l/p - l)lpr(tl). (Hi) If [CpG{w) - Cm(l/p - l)G_(w)}/[Cd + Cm(l/p - 1)] < V H , then tl = w, and Cfli(tJ) = [CpG(w) + CdF(w)]/ f™ G[u)du. Next, for t < w, the first derivative of (10) with respect to t becomes dCR2(t)
dt
=
G(t)[(Cd-Cp)^(t)-Cp]
[f0G{u)du]>
Let ££ be the optimal value minimizing (10) in the interval [0, w), then the following Lemma can be obtained by (14). Lemma 3. Under a RFRW, given r(i) is an IFR function for a repairable product with warranty period w, the following results hold for t G [0, w). (i) When Cd < Cp, or when Cd > Cp and I/J(W) < Cp/(Cd — Cp), the optimal age for replacement is t% = w and (7^2(^2) = \Pv + (^d — Cp)G(w)]/f™G(u)du. (ii) WhenCd > Cp andip(w) > Cp/{Cd — Cp), then there exists a finite and unique replacement age t% < w satisfying ^(^2) = Cp/(Crf ~ Cp), and CR2(t*2) = (Cd - Cp)pr(t*2). In the previous discussion, the local optimal replacement age, for a warranted product under the constraint t > w or t < w, were derived. However, in practice, the replacement age should not be pre-specified to be in a certain interval. Therefore, there is a need to investigate the global optimal replacement age ££, without any constraint. Theorem 2. Under a RFRW, given that r(t) is an IFR function for a repairable product with warranty period w, the following results hold for t € [0, 00).
(i) For Cd < Cp, if 1>{W) < [CpG(w) - Cm(l/p - l)G{w)]/[Cd + Cm{\/p — 1)], then t^ = t\> w; otherwise, t^ = w. (ii) For Cd> Cp, (a) ifi,{w) < [CpG(w)-Cm(l/p -l)G(w)]/[Cd + Cm(l/p -1)}, then tl, = tl > w; (b) if [CpG(w) - Cm(l/p - l)G(w))/[Cd + Cm(l/p - 1)] < ip(w) < Cp/(Cd - Cp), then t*w = w; and
343
(c) ifi/>(w) > Cp/{Cd - Cp), then t*w = t*2 < w. Based on Theorem 2, the global optimal replacement age t*w for a warranted product can be easily obtained, and we can draw the following remarks. Remark 1. When the downtime cost Cd is smaller than the purchasing cost Cp, the optimal replacement age should be greater than the warranty period in order to take advantage of the warranty coverage. However, if the downtime cost Cd is relatively large, the product may be replaced before the warranty expires, to avoid product failures. Furthermore, the condition for a preventive replacement being performed within warranty period is ip{w) > Cp/{Cd - Cp) > 0. Remark 2. Because tp(t) is a monotonically increasing function of t, its inverse function i/'_1(") exists. Prom Theorem 2, we know that when the warranty period is w < ip~1([CpG(w) - Cm(l/p - l)G(w)]/[Cd + Cm(l/p — 1)]), the optimal replacement age is greater than the warranty period (i.e., Vw > w). Ii^-1([CpG{w)-Cm(l/p -l)G{w)]/[Cd+Cm_(l/p ~ l -1 1)]) < w < ip- (Cp/[Cd - Cp}) when Cd > Cp, or if w > V ([C P G(ti;) Cm(l/p - l)G(w)]/[Cd + Cm(l/p - 1)]) when Cd < Cp, then the optimal replacement age t^ is always equal to the warranty period w (i.e., t^ = w). Furthermore, when w > '4>~1{Cp/{Cd — Cp)) for the case of Cd > Cp, the optimal replacement age t*^ is less than the warranty period w (i.e., t^ < w). 4. Comparisons Corollary 1. Given any replacement age t, the expected cost rate for a repairable product without warranty is always greater than the expected cost rate for a repairable product with RFRW. Corollary 2. For a repairable product with an age replacement policy under RFRW with period w, the optimal replacement ages t^, and t^ which minimize the long run expected cost rate has the following properties: (i) When Cd < Cp, (a) i/1>(w) < [CpG(w)~Cm(l/p -l)G(w)]/[Cd+Cm(l/p -1)], then w < t*w < to; (b) if [CpG(w) - Cm(l/p - l)G(w)]/[Cd + Cm{l/p - 1)] < iP(w) < Cp/[Cd + C m ( l / p - 1)], then w = t*w < t*0; (c) ifi,(w) > Cp/[Cd + Cm{l/p - 1)], then f0 < t*w = w. (ii) When Cd>
Cp,
344 (a) if^w) < [CpG(w)-Cm(l/p -l)G(w)]/[Cd+Cm(l/p -1)], then w < t£, < to,' (b) if [CpG{w) - Cm(l/p - l)G{w)\/[Cd + Cm(l/p - 1)] < V>M < Cp/[Cd + Cm(l/p - 1)], then w = t*w< Q ; (c) if Cp/[Cd + Cm{l/p - 1)] < V M < Cp/{Cd - Cp), then to
5. C o n c l u s i o n It is worthy t o point out t h a t product warranty is an important factor in deriving an optimal P M policy. Practitioners should take various product warranties into account in making P M decisions.
References 1. R. E. Barlow and F. Proschan, Mathematical Theory of Reliability, Wiley, New York (1965). 2. F. Beichelt and K. Fischer, IEEE Trans. Rel., 29, 39 (1980). 3. J. H. Cha, JAP 38, 542 (2001). 4. T. Nakagawa, J. Oper. Res. Soc. Japan, 24, 325 (1981). 5. S. H. Sheu, M. R., 3 1 , 1009 (1991). 6. S. H. Sheu, EJOR, 108, 345 (1998). 7. R. H. Yeh, G. C. Chen and M. Y. Chen, IEEE Trans. Rel. 54, 92 (2005).
SPARE O R D E R I N G POLICY FOR R E P L A C E M E N T U N D E R T H E REBATE W A R R A N T Y *
Y. H. C H I E N * Department of Statistics, National Taichung Institute 129, Sec. 3, San-min Rd., Taichung City, Taichung E-mail: [email protected]
of Technology, 404, Taiwan
J. A. C H E N Department of Business Administration, Kao- Yuan University, 1821, Chung-Shan Rd., Lu-Chu Hsiang, Kaohsiung County 821, Taiwan, E-mail: [email protected]
This paper develops, from the customer's perspective, the optimal spare ordering policy for a non-repairable product with a limited-duration lifetime and under a rebate warranty. The spare unit for replacement is available only by order and the lead time for delivery follows a specified probability distribution. Through evaluation of gains due to the rebate and the costs due to ordering, shortage, and holding, we derive the expected cost per unit time and cost effectiveness in the long run and examine the optimal ordering time by minimizing or maximizing these cost expressions. We show that there exists a unique optimum solution under mild assumptions. Finally, we give some comments and conclusions.
1. Introduction Recent research on warranty policies has focused on preventive maintenance (PM). For example, Yeh and Lo 7 , Jung and Park 3 , Yeh et al. 8 , and Chien l. Most of these papers assumed that whenever a product fails during the warranty period or after the warranty expiries, a new one is immediately available for replacement. However, this might not be true in many situations. As a simple example, the distributor may run out of stock and need to order a replacement; clearly, this can incur substantial costs "This research was supported by the National Science Council of Taiwan, under Grant No. NSC94-2213-E-025-005. tCorresponding author. Tel.: +886-4-22196660; Fax:+886-4-22196331. E-mail address: [email protected] (Y.-H. Chien).
345
346
for customers if the product is necessary for ongoing business operations. Jhang 2 was the first to consider the lead time for product replacement under warranty. However, he assumed that the lead time for delivery is fixed. 2. Model description and assumptions Under a rebate policy, the customer is refunded a proportion of the sales price Cp if the product fails during the warranty period [0, W\. The refund amount, R{x), is a linear function of the failure time x (Nguyen and Murthy 4 ); that is, H[X)
~\0,
for x>W,
[i)
where 0 < k < 1 and 0 < a < 1. We assume that the product has a finite useful lifetime T at which point it is discarded without repair. If the product fails before time T, the spare unit for replacement is provided only by an order from the supplier, and the delivering time L is a random variable which follow a distribution function G(y). Specifically, assuming the original product is purchased at time 0 and fails before a specified time to (0 < to < T), an expedited order is executed immediately at the failure time instant, and the failed product is replaced by a new one as soon as it is delivered. On the other hand, if the product has not failed by to, a spare for replacement is regularly ordered in anticipation of failure at time to, and the original product is replaced when it fails. After failure and replacement, this process repeats. We use the following notational conventions. X lifetime of a product W warranty expiration date T useful lifetime limit of a product (0 < T < oo) /(•), F(-), F(-) pdf, Cdf and Sf of X _ r(-) failure rate function of X{r{-) = /(-)/F(-)) L random lead time for delivering a spare g(-), C(-), G(-) pdf, Cdf and Sf of L /i£ mean lead time (E[L] = JQ G(y)dy) Coe, Cor cost for an expedited, regular order Cp product sales price per unit Ch holding cost per unit time
347
shortage cost per unit time resulting from a failed product (Cs = Cai + Cs2, where Csi and Cs2 represent the out of pocket and opportunity costs respectively) the expected cost per unit time over an infinite time horizon cost effectiveness over an infinite time horizon
Ca
CR(t0) CE(to)
3. The cost per unit time minimization model Let X,* denote the length of the ith replacement cycle and R\ the operational cost over the renewal interval X* for i — 1,2,3, •••. Also, let Xi, X2, Xz, • • • be independent copies of X, the lifetime of a product. Prom the renewal reward theorem (see Ross 6 , p.52), the expected cost per unit time over an infinite time horizon is: E[R\]/E[Xl), will be developed under Case 1 t 0 < W and Case 2 t 0 > W. Case 1. to < W: X\ and R\ can be expressed as ' X-L+L,
to + L, Xi, Xl={Xu to + L, T, to + L,
if c 1-1,
if if if if if if
c c c c c c
1-2, 1-3, 1-4, 1-5, 1-6, 1-7,
(2)
and (Co (Co (Co
Rl =
(Co (Co (C0 (Co
+ + + + + + +
Cp) Cp) Cp) Cp) Cp)
+ (Csl + Cs2)L - Wp(l - a ^ - ) , + (Csi + C s2 )(t 0 + L~XX)kCv(l + Ch(X, - to - L) - Wp(l sfr), + Ch(X1-t0-L), + (Cai + Cs2)(t0 + LXx), Cp)+Ch(T-t0-L), Cp) + (Csi + Cs2)(t0 + L-T),
where conditions c 1-1 ~ c 1-7 are given as follows: ' c 1-1: Xl < t 0 , c 1-2: t 0 < Xi < W and Xi < t 0 + L, c 1-3: t 0 < Xx < W and t 0 + L < Xx (i.e., t0 + L<X1 c 1-4: W < Xj < T and t 0 + L < Xu c 1-5: W < Xi < T and Xi < t0 + L, c 1-6: t 0 + L < T < Xi , ^ c 1-7: T < Xi and T < t 0 + L.
W
if c 1-1, •),ifcl-2, if c 1-3, if c 1-4, ifc 1-5, if c 1-6, if c 1-7. (3)
< W),
348 Thus the expected cycle length is /•OO /•oo
/-to /-to
E[X'i)=
/ JO
rW
(x + y)dF(x)dG(y)
JO
/ />T
/
xdG(y)dF(x) /»00
(to +
JT-to rto
y)dG(y)dF(x)
JW JO
yOO
+ /
(t0 + Jx-t0
xdG{y)dF{x) +
JO
+ / / (to + y)dG(y)dF(x) JW Jx-to JT
/ Jt0
+ / J to
roo
+
pT—tQ
+ / / JT Jo
TdG(y)dF(x)
y)dG(y)dF(x) r-T—ta
= HL+
F(x)dx + / 10 JO Jo Jo and the expected cost per cycle is
F(t0 + x)G(x)dx,
(4)
rt,
E[Rl] = [ °l(Coe + Cp) + (Csl + Cs2)fiL - kCp(l - ^)}dF(x) w Jo
+ •••
= Cv{\ - k[l - (1 - a)F(W) - — J • F(x)dx}} +CoeF{t0) + Cor • f (t 0 ) + (Csi + Cs2)[^L - f F(x)G(x - t0)dx] Jt0
+Ch f
F(x)G(x - t0)dx.
(5)
J to Case 2. W < t0 < T: Here X±* and R\ can be expressed as ' Xx + L, Xi + L, to + L, X{ = • Xu T, t0 + L,
if c 2-1, if c 2-2, if c 2-3, if c 2-4, if c 2-5, if c 2-6,
(6)
and
HJ
(Coe (Coe (Cor (Cor (Cor (C o r
+ Cp) + (Csi + Cs2)L - kCP(l - ^ ) , if c 2-1, + Cp) + (Csl + Cs2)L, if c 2-2, + Cp) + (C,i + Cs2)(tQ +L- Xx), if c 2-3, + Cp) + Ch(X1-t0-L), if c 2-4, + Cp) + Ch(T-t0-L), if c 2-5, + Cp) + (Csi + Cs2)(tQ + L-T), if c 2-6,
(7)
349
where conditions c 2-1 ~ c 2-6 are given as follows: 'c c c c c c
2-1: 2-2: 2-3: 2-4: 2-5: 2-6:
Xi < W, W < X-i. < t0, t0 < Xi < T and t0 < Xi < T and T < Xi and t0 + T < X x and T <
X1 < 10 + L, t0 + L < Xx (i.e., t0 + L < Xx < T), L < T, t0 + L.
We can derive B[Xi] and E[Rj] using (6) and (7), and they are equal to (4) and (5), respectively. That is, it doesn't matter whether the ordering time to is within the warranty period or after the warranty expires; the expressions for -El-X^] and i?[i?i] are the same. Hence, the expected cost per unit time is CR(t0) = E[R{]/E[X^}, where 0 < t0 < T. In order to find the optimum tj = mint 0 CR(to) analytically, let D{t0) = E[Xl] and N{t0) = E[R*X}; then we have D'(t0) = F(t0)Di{t0) and iV'(to) - F(to)Ni(to), where D^to) = 1 - Jc, ° F(x\t0)g(x)dx, and Niito) = (Coe - Cor)r(t0) + (C s l + C s 2 )[l - JQT~toF(x\t0)g(x)dx} °h Jo~t0 F(x\t0)g(x)dx, andF(z|t 0 ) = 1 - F(x\t0) = F(t0 + x)/F(t0). We see that dCR(to)/dt0 = 0 if and only if C(t0) = £ ( ^ i ( * o ) [ ~ 4 - S i = {fiL+ Jo
F(x)dx + / Jo
= ^(*o)A(*o)[^(*o) - CR(t0)} F(t0 +
x)G(x)dx}
x {(C oe - Cor)r(t0) + (C s l + Cs2)[l - (
° F{x\t0)g(x)dx]
Jo
-Ch-
f Jo
°F{x\t0)g{x)dx}
-{C P {1 - fc[l - (1 - a)F{W) - - J
F(x)dx}}
+Coe • F{t0) + Cor • F{t0) + ( C i + Cs2) -\ni-
[
F(x)G(x - t0)dx]
Jt0 rT
+Ch • /
rT-to
F(x)G(x - t0)dx} x {1 - /
F{x\t0)g(x)dx}
= 0,
Jo
Jt0
(8) where (j>(t0) = Ni(t0)/Di(t0)
= N'(t0)/D'(t0)
is the marginal cost function
350 of this spare ordering policy. The main results of the optimal ordering time foi, which satisfies CR(ij$i) = Min 0 < to
351 per cycle is / 0 F(x)dx. We obtain the expected out of pocket cost per cycle by setting CS2 = 0 in (5). Hence, the cost effectiveness is CE(t0) =
^F(x)dx S — ~w= , (9) Cv{\ - k[l - (1 - a)F(W) - # f™ F(x)dx}} + Tr(to)
where IT (to) is given by CoeF(to)+CorF(t0)+Csl[iiL-
J F(x)G(x-t0)dx]+Ch
f
F(x)G(x-t0)dx.
•'to
Jto
(10) We see that d-!r(to)/dt0 = 0 if and only if #(£o) = 0, where 0(*o) = {Coe - Cor)r(t0) + Csl [1 - /
°
F{x\t0)g(x)dx)
Jo
-Ch-
f ° F(x\t0)g(x)dx. (11) Jo We summarize the main results concerning the optimal ordering time t^2 satisfying CE(tQ2) = Max.o
Then:
7 X (i) 7/r(0) > Chi" 7{xMx)%:%[^T ^ W t then the opUmal or_ dering time t^2 is 0; i.e., the customer should order a spare when the original product is purchased. (li) If r(0) < c ^ » T ^ ) ^ ) f - ^ H ^ W ^ ) ^ and r{w) > Ch.$-"
F^WM^-C^
#-*^.w
then
there
exists
an unique optimal ordering time t^2 (0 < £j$2 < W) satisfying ^(^02) = 0/ *-e-; ^*e customer should order a spare during the warranty period. Ch w w then there exists an unique optimal ordering time t^2 (W < t^2 < T) satisfying $(£02) = 0; i.e., the customer should order a spare after the warranty expires but before the product reaches the end of its useful life.
(Hi) if r(w) <
^~ ^m9i^-c^-rr ^\^^)M
R e m a r k 2. Theorem 2 implies that, under the cost effectiveness criterion, the optimal spare ordering time t^2 always precedes the useful life limit (i.e., t*Q2 < T).
352 5. C o m m e n t s a n d c o n c l u s i o n s Theorems 1 and 2 present t h e structural properties of the optimal spare ordering time t h a t minimize the former and maximize the latter. Through a numerical example, we can find t h a t the optimal spare ordering time t h a t minimizes per unit time costs depends directly on the warranty period. Some other results are also intuitive and match our expectations. Product warranty is an important factor in deriving an optimal spare ordering policy. Practitioners need t o be able t o weigh their value when planning and scheduling operations.
References 1. Y. H. Chien, International Journal of Systems Science 36, 361 (2005). 2. J. P. Jhang, International Journal of Systems Science 36, 423 (2005). 3. G. M. Jung and D. H. Park, Reliability Engineering and System Safety 82, 173 (2003). 4. D. G. Nguyen and D. N. P. Murthy, HE Transactions 14, 167 (1982). 5. Y. T. Park and K. S. Park, European Journal of Operational Research 23, 320 (1986). 6. S. M. Ross, Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970. 7. R. H. Yeh and H. C. Lo, European Journal of Operational Research 134, 59 (2001). 8. R. H. Yeh, G. C. Chen and M. Y. Chen, IEEE Transactions on Reliability 54, 92 (2005).
A REPAIR COST LIMIT POLICIES WITH FIXED WARRANTY PERIOD CHUNG HYUN CHOI Agency for Defense Development, P.O. Box 18, Chinhae, Kyungnam, 645-600, Korea E-Mail:[email protected] WON YOUNG YUN Department of Industrial Engineering, Pusan National University, Geumjeong-gu, Pusan, 609-735, Korea
Jangjeon-dong
Repairable products which fail within the warranty period are either repaired or replaced by the contractor. In a repair cost limit policy, the repair cost is estimated at each failure and if the cost is greater than a predetermined cost limit, the failed product is replaced by a new one, otherwise it is minimally repaired. In this paper, various repair cost limit shapes with a free warranty period are considered and the best cost limit shape is proposed.
1. Introduction A warranty is a contractual obligation between manufacturer(or contractor) and buyer(or consumer) for the sale of a product. The purpose of warranty is to establish liability in the event of a premature failure of an item or the inability of the item to perform its intended function. Often, manufacturers use warranty as a promotional tool to signal the quality of the product. This improves sales and revenue. It protects a manufacturer against product abuse and negligence to maintenance of the product by the buyer, and also buyer against product failure and unexpected repair costs. There are various types of warranty policies(refer Bliscke and Murthy [2]). Under a free repair/replacement warranty, the manufacturer repairs or replaces free of cost to the buyer within the warranty period. In the case of non-renewing warranty, warranty expires at original warranty period. For repairable products sold with free-replacement warranty, the manufacturer has option of repairing the failed item or replacing it with a new one. The optimal strategy is one that minimizes the expected cost of servicing the warranty over the warranty period. Nguyen and Murthy[7, 8] studied the case where a failed item is always repaired 353
354 and the optimal servicing policy where the decision to repair/replace is based on the age of the failed item. The decision to repair or replace a failed item may depend on the estimated repair cost. Hastings [5] proposed a repair limit replacement policy where repair costs are random. Murthy and Nguyen [6] studied an optimal repair cost limit policy for servicing warranty. They assumed that the cost limit is constant with time. In the repair limit replacement policy, when a system requires repair it is first inspected and the repair cost is estimated. If the estimated cost exceeds a certain amount, known as the "repair cost limit", the system is replaced. Park [9] presented an optimum minimal repair cost limit policy. He assumed that the failed system should be minimally repaired(or replaced by a new one), if the estimated repair cost is less(greater) than a constant cost limit. Park [10], Choi and Yun [4] presented a model for exponentially decreasing dynamic repair cost limit policy using a general failure distribution function. Choi [3] studied several repair cost limit types with a fixed warranty period under the repair limit replacement policy. He made not only experiment with three types of repair cost limit function (constant, decreasing and increasing), but also obtained approximately expected warranty cost by numerical method. This paper experiments with a general repair cost limit function by simulation method, and the best cost limit shape is proposed to analyze warranty cost from the manufacturer's point of view. 2. General Warranty Cost Model In a Free Replacement Warranty(FRW) policy, repaired or replaced items are warranted only for the remaining time of the original warranty period w . In the repair cost limit policy, when an item fails in the warranty period, it is inspected for estimation of the repair cost. If the cost of repair is high, then it is replaced by a new one. Otherwise, it is minimally repaired. The idea of minimal repair was introduced by Barlow and Hunter [1]. After minimal repair, the failure rate lit) does not change. If the time to carry out minimal repairs is negligible, failures can be modeled according to a non-homogeneous Poisson process (NHPP) with intensity fif). In this section, a generalized time-varying repair cost limit policy is presented. The result can be applied to all cases (i.e. the constant, decreasing, increasing and mixed type of cost limit). 2.1. Notations Type-112 failure
A failure which is removed by minimal repair/
355
Cs F(t),r(t),
F(t)
H{z), h(z), q(z)
Cdf, probability density function (Pdf) and hazard rate of repair cost.
G(t), g(t) jj. N,(w)/N2(w)
Sf and Pdf of the time to first renewal. Mean of repair cost. (i.e. \i = (T zh(z)dz) The number of type-1/2 failures for warranty period (0,«). Expected cost of minimal repair at age t.
c(t) P(t) M(t), m(t) U K(t) B(w),b(w) I (t; a ) £ Q(w;a)
W 2.2. 1.
2.
3.
replacement. Cost to the seller of replacing the product. Cumulative distribution function(Cdf), hazard rate and survival function(Sf) of the product failure.
Probability that repair cost is greater than the repair cost limit at time / . Renewal and density function associated with G(t) . Time to first type-2 failure. Total cost of minimal repairs incurred over (0, a>) • Total warranty cost with warranty period (0, co) and its expectation. Time-varying repair cost limit with parameter a at time t and constant repair cost limit. Manufacturer's expected cost with warranty period (0, a>) and servicing the warranty for an item sold with the time-varying repair cost limit, l{t\a). Length of the warranty period.
Assumptions The repair cost, Z is a random variable. Z is a non-negative random variable with a distribution function H(z). h(z)[^ H'(z)] is the density function associate with H(z) and q(z) = h{z)j H{z) is repair cost rate. For type-1 failure, only minimal repair is performed; that is, the failure rate of the item after repair is same as that just before failure. For type-2 failure, the item failed is replaced. Replacements and repairs take negligible times compared to the whole life of product and the warranty period and therefore treated as zero.
First, we obtain basic results for NHPP with two types of failures which are applied to obtain expected warranty cost under repair cost limit policies with free replacement warranty.
356
Theorem : Now, we consider a product of which failures follow a nonhomogeneous Poisson process {N(t), t > 0} with intensity r(t). The product has two types of failures and the probabilities of failure types are random variables depending on failure times. The probability of type 2 failure is P(t) and that of type-1 failure is P(t). Let Nl0(t) and N20(t) be the number of type 1 and 2 failures in an interval (0, t ) . Then processes Nl0 (t) and N20 (t) follow mutually independent NHPPs with respective intensities P(t)r(t) and P(/)r(0(Ross[ll]). 2.3.
General Warranty Cost Model under Repair Cost Limit
In this part, we obtain total expected warranty cost under general repair cost limit and free warranty period W . The expected minimal repair cost E[K(w)] and the expected number of failures replaced by new items in(0, w] can be obtained as follows(Choi[3]); E[K{w))]= [c{x)P(x)r{x)clx
(1)
CO
and E[N2 (w)] = M{w) = ] T G„ (w) =G(w) + | " M(w - x)dG(x) where M(w) is the renewal function associated with
(2)
G(x).
For total expected cost of repairs for warranty period, let5(w) be the total cost in time-interval (0, w] and its expectation be denoted by b(w). Then,
b(w) = E{B(w);U > w] + E[B(w);U < w] = G(w)E[K(w)]+ %E[K(t)]dG(t)+ _[b(w - t)dG(t). Put
then
^ M ^ J G C O P
b(w) = 0(w) + [b(w
- t)dG{t)
V
by Eq. (1). (3)
Thus, b(vty satisfies the following renewal type integral equation (Bliscke and Murthy [2]);
357
b(w) = #(w) + [ V O - t)dM (0 = %G(t)c(t)P(t)r(t)dt
+J"[
i[~'~G(z)c(zjP(z)r(z)dz]dM(t).
Using the integration by parts,
b(w) = _[[1 + M{w - t)]G(t)P(t)r(t)c(t)dt = [[\ + M(w-t)]G(t)r(t)[^''a)zdH(z)]dt.
(4)
The expected cost of servicing the warranty per unit item over the deterministic warranty period W is
Q(w;a
) = csM (w) + b(w)
(5)
where, M(w) is given by Eq. (2). Due to the nature of a renewal function which appears on both sides of the Eq. (2), it is difficult to obtain the expected cost numerically in Eq. (5). Hence, we experiment with a general repair cost limit function by simulation method. 3.
Repair Cost Limit Shapes and Simulation Results
In this section, various repair cost limit shapes with a fixed warranty period are explained and the warranty cost is also obtained by simulation method. 3.1.
Repair Cost Limit Functions
Many types of the time-varying repair cost limit function can be existed(/.e. exponential or linear decreasing/increasing shape, decreasing and then increasing shape, etc.). From the manufacturer point of view, Choi[3] had showed that a mixed function shape(decreasing and then increasing repair cost limit shape in Fig. 1) is better than the other shapes(just only constant, linear or exponential increasing/decreasing). In this paper, we make an experiment with a general type, mixed shape(Fig. 1) and U /Trapezoid shapes(Fig. 2). We can consider two types of repair cost limit function as in Fig. 1, one is linearly time-varying function;
^(t;a) = j A ( t ; a )
= Cs_(Cs
"a)Xt/d'
°-t
[£2(V,a) = [(cs - a ) x t + a x w - c s x d ] / ( w - d ) ,
(6) d
358
^
^
*Ni •V
1S*
"*J**»z N N/
•*-"
^»
4
/
Jz
/
?
<M
•
Fig. 1. Decreasing & increasing shape.
Fig. 2. U shape & Trapezoid shape.
and the other is an exponentially time varying function ; £(t;a) = -
,((;a) = c I xe" B , 2(t;a)
0
fi
= csx(l-e' '),
where,
(7)
d
Also, we can consider two kinds of repair cost limit function in Fig. 2. One is U shape time-varying function;
£(t;a)=4(cs-a)(t-w/2)2+a
0
(8)
and the other is a trapezoid shape time-varying function; £x(t;a) = cc-(cc-a)xtld, t{t;a) =U2(t;a) = £,
0
dx
£:>(t;a)=[(cc-a)xt+axw-ccxd2]/(w-d2),
d2
3.2. Simulation Procedures Step 1. Input cs, w,a(d), / ? ( / ? ) , I(X), andstart(Current time = 0). Step 2. Generate a failure and store the failure time(failure time^ Current time). Step 3. Generate a repair cost, compare the repairing cost with repair cost limit (at that time).
359
Step 4. Compute the repair and replace cost until the current time arrives at the end of the warranty period. Step 5. Repeat Step 2 ~ Step 4 until a given iteration number is satisfied. Step 6. Find £ , £ ( / ; « * ) minimizing Q(w;H) and Q(w;a). 3.3. Simulation Results Here, the Weibull failure distribution {fi-2, /l=l) and Weibull repair cost distribution (J3=\, ~k=0.04) are assumed. Table 1. presents simulation results for a warranty period (w = 1) with five types of repair cost limit functions and Table 2. shows the minimum warranty cost and sample variances. Table 1. Simulation Results ( c s = 100 , Weibull failure & repair cost distribution : /? = 2 , A = 1, /? = 1, A = 0 . 0 4 ) Constant Shape(Fig. 1) U Shape (Fig-3)
t
90
91
92
94
95
Simulation Results
24.5234
24.4665
24.3442
24.4662
24.4797
a
88
89
90
91
92
Simulation Results
24.4772
24.4317
24.2747
24.4192
24.4823
(0.4, 0.5)
(0.5, 0.5)
(0.55,0.5)
(0.6,0.5)
(0.7, 0.5)
24.4718
24.2518
24.2282
24.3806
24.3810
(83, 0.5)
(84, 0.5)
(85,0.5)
(86,0.5)
(87, 0.5)
24.4605
24.3586
24.0984
24.4215
24.4764
(a,d)
Exponentially Mixed Shape (Fig. 2)
Simulation Results
Linearly Mixed Shape (Fig. 2)
Simulation Results
Trapezoid Shape(Fig. 3)
(a,d)
(M>^) (87,0.2,0.8) (88, 0.2, 0.8) Simulation Results
24.6413
24.3391
(89, 0.2, 0.8) (90, 0.4, 0.6) (91,0.1,0.9) 24.0876
24.1198
24.2434
Table 2. Minimum Warranty Cost and Variances ^-\^
Shape
Result ^ \ ^ Minimum Warranty Cost Variance
Constant
u
l-.xp. Mixed
Linearly Mixed
Trapezoid
24.3442
24.2747
24.2282
24.0984
24.0876*
0.0099
0.0053
0.0305
0.0147
0.0146
360
4. Conclusions In this paper, we experiment with various shape of time-varying repair cost limit function in a free warranty period. Repair cost limit as high in early stage and low in late stage may be also effective in our simulations. As in Table 2, mixed type of repair cost limit function is more effective than constant shape and U shape functions. Especially, the trapezoid shape function is the most effective function in our simulation, but additional experiments are necessary for the accuracy and consistency in simulation. References 1.
R. E. Barlow and L. C. Hunter, Optimum Preventive Maintenance Policies, Operations Research 8,90-100 (1960). 2. W. R. Blischke and D. N. P. Murthy, Warranty Cost Analysis, Marcel Dekker, New York (1994). 3. C. H. Choi, Maintenance and Warranty Policies with Uncertain Life Cyle, Ph. D. Thesis, Pusan National University, Korea, (2001). 4. C. H. Choi and W. Y. Yun, A Note on Pseudodynamic Cost Limit Replacement Model, Int. Journal ofRel., Quality and Safety Engineering 5(3), 287-292 (1998). 5. N. A. J. Hastings, The Repair Limit Replacement Method, Journal of the Operational Research Society 20, 337-349 (1969). 6. D. N. P. Murthy and D. G. Nguyen, An Optimal Repair Cost Limit Policy for Servicing Warranty, Mathl. Computer Modeling 11, 595-599 (1988). 7. D. G. Nguyen and D. N. P. Murthy, An Optimal Policy for Servicing Warranty, Journal of the Operational Research Society 37(11), 1081-1088 (1986). 8. D. G. Nguyen and D. N. P. Murthy, Optimal Replace-repair Strategy for Servicing Products Sold with Warranty", European Journal of Operational Research, 39, 206-212, (1989). 9. K. S. Park, Cost Limit Replacement Policy under Minimal Repair, Microelectronics and Reliability 23(2), 347-349 (1983). 10. K. S. Park, Pseudodynamic Cost Limit Replacement Model under Minimal Repair, Microelectronics and Reliability 25, 573-579 (1985). 11. S. M. Ross, Stochastic Process, Wiley, New York (1996).
TWO-DIMENSIONAL WARRANTY: MINIMAL/COMPLETE REPAIR STRATEGY
S. CHUKOVA School of Mathematics, Statistics and Computer Science Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand E-mail: Stefanka. ChukovaQmcs. vuw. ac. nz Y. HAYAKAWA School of International Liberal Studies, Waseda University, 1-21-1 Nishi-Waseda, Shinjuku-ku, Tokyo 169-0051, Japan E-mail: [email protected] M.R. JOHNSTON School of Mathematics, Statistics and Computer Science Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand E-mail: Mark. Johnston@mcs. vuw. ac. nz For repairable products, the warrantor has options in choosing the type of repair performed to an item failed within the warranty period. We focus on a particular warranty repair strategy, related to the degree of the warranty repair, under nonrenewing, two-dimensional, free of charge to the consumer warranty policy. We consider a rectangular warranty region and divide it into four disjoint subregions. Each of these subregions has a preassigned degree of repair for a faulty item. Our main goal is to determine the subregions, so that the associated expected warranty servicing cost per item sold is minimised.
1. I n t r o d u c t i o n Rectified warranty claims are a cost, of doing business and a liability incurred by the manufacturer at the time of sale. T h a t ' s why studies of w a r r a n t y and, in particular, forecasting warranty expenses is of significant interest for the manufacturer. Also, warranty d a t a provide information about the durability of products in the field and, therefore, is of interest t o engineers. Furthermore, warranty coverage can be regarded as a p r o d u c t ' s attribute t h a t affects buying decisions. We consider a two-dimensional warranty with a rectangular w a r r a n t y 361
362
region and divide it into four disjoint subregions, so that each of these subregions has a preassigned degree of repair for a faulty item. Our main goal is to determine the subregions, so that the associated expected warranty servicing cost per item sold is minimised. We assume that during the warranty coverage, the warrantor assumes all expenses and the warranty is non-renewing, i.e., we study a two-dimensional non-renewing free repair warranty (NRFRW). A general treatment of warranty analysis is given in Blischke and Murthy l i 2 and Chukova, Dimitrov and Rykov 3 . For a recent literature review see Murthy and Djamaludin 6 . The evaluation of any warranty parameter of interest in modeling warranties, depends on the failure and repair processes and on the preventive warranty maintenance. The repairs can be classified according to the degree to which they restore the ability of the item to function (Pham and Wang 7 ). In this study, we focus only on complete repairs and minimal repairs. A complete repair resets the performance of the product so that upon restart the product operates as a new one. This type of repair is equivalent to a replacement of the faulty item by a new one, identical to the original. A minimal repair has no impact on the performance of the item. It brings the product from a 'down' to an 'up' state without affecting its performance. The outline of this paper is as follows. In section 2, the model formulation is given, along with the description of the process of failures and the warranty repair strategy. Section 3 provides analysis of the proposed model. Section 4 offers some conclusions.
2. Model formulation Consider a product sold under a two-dimensional warranty, i.e., a warranty which is described in terms of two measures. Usually, the two-dimensional warranty is associated with a region O c l 2 , where the warranty measures are well defined. For more on the different shapes of fi and related warranty policies, see Blischke and Murthy 2 . For convenience, the warranty measures of interest will be called age and usage. We focus on NRFRW with Q — KxL, i.e., the warranty expires either when the product's age, denoted by T, exceeds K, or the total usage of the product, denoted by X, exceeds L, whichever occurs first. For example, typical automotive warranties are limited by age as well as mileage and fi — 3 years x 36,000 miles.
363
2.1. Modeling
failures
We will assume that the repairs are instantaneous, i.e., the repair time is negligible with respect to the operating time of the product. Denote by T(t) the virtual age of the product at calendar time t, and denote by X(t) the virtual usage of the product at calendar time t. Assuming the warranty repairs are minimal or complete implies that T(t) < t, for any t G [0,K]. Also, we assume that a repair does not affect the customer's usage accumulation rate and model X(t) as a linear function of T(t) (Iskandar et al. 5 ), i.e., X(t) = RT(t),
(1)
where R is the usage accumulation rate. For a particular customer, R is assumed to be a constant over fl. The usage accumulation rate is assumed to be a positive random variable with known cumulative distribution function G(r) — P{R < r) and corresponding probability density function g(r). In view of equation (1), we model the failure/repair process of the product as a point process with an intensity function X(t\r), i.e., X(t\r) 5t — P(a product operating up to time t fails in [t,t + St) \ R = r). Related to X(t\r) cumulative failure rate function is A(t|r) = fQ X(x\r) dx and the cdf of the time to first failure of the product, say T^r, is i^ 1|r (*) = l - e x p \ - J
X(x\r)dx)
.
If all warranty repairs are complete repairs, conditional on R = r, the corresponding failure/repair process is a renewal process generated by FrUr(-), whereas under minimal warranty repairs, again conditional on R = r, the corresponding process is a non-homogeneous Poisson process. In order to identify the structure of the model, the intensity function \(t\r) has to be selected. Overall, X(t\r) has to be a positive increasing function of r, T(t) and X(t). If appropriate data are available, an approach for obtaining an empirical estimate of X(t\r) is given in Rigdon and Basu 8 . 2.2. Repair
strategy
In what follows we describe our warranty repair strategy. In Iskandar et al. 5 the warranty region is divided into three disjoint subregions fii, fl2, and 0 3 , such that, f^ U fi2 U H 3 = £1. They adopted the following repair strategy (S):
364 usage
i
usage I
L
L
,-'
y'
**
,'
-'''
r-
L3 I,
_,' , ' ,--
L, L,
a
/'''a>
^^^r
a
,
,
K age , *, Figure 1. Iskandar et al. r2 > r\ K
Q
a
,
°,
Kj
K,
4 K3
K
age
Figure 2. Warranty: case (A), r2 < r\.
(1) Any repair in £l\ is minimal, costing C2(2) The first repair in fl2 is complete, costing ci, and any further repair in O2 is minimal. (3) Any repair in ^3 is minimal. They analysed the case (see Figure 1)
h.
K2
r2 and — = n
(2)
and identified (S) by the triple <j> = (Ki,K2,r2). They computed the minimum expected warranty servicing cost EC(
L2 K2
L3 K3 = r 2
and
K
n.
(3)
365
The optimal repair strategy is the one which minimizes the expected warranty servicing cost per item sold. Thus, our main goal is to determine the subregions fli,U2,Q3 and fi 4 , so that the corresponding expected warranty servicing cost per item sold is minimised.
3. Analysing the model Firstly, we define = c2 A(a;1|r) + (A(x\r) - A(x3\r))
ECr(x1,x2,x3,x) + / Jxi
^(A^H-A^r))
I - + A(x2 - t\r) + (A(x - t\r) - A{x3 - t\r)) e -(A(a;3-t|r)-A(x2-t|r >• M
fTxilr(t)dt+ H (i + A(x-t\r)} Jxi
vM
>
hX2,xl]r{t)i
(4)
We will make use of this function while deriving the expected warranty servicing cost per item sold. 3.1. Solving for optimal
(S)
We further assume that there is no delay in reporting and exercising the warranty claims. The cost of a minimal repair is c2, whereas the cost of a complete repair is ci, with /j, = 9Z. Assuming that, the warranty region is predetermined, i.e., fl is known, there are four decision variables that identify the repair strategy (S), 7 = ( ^ 1 , ^ 2 , ^ 3 , ^ 2 ) , such that K\ < K2 < K3. These four variables determine uniquely the shape of the subregions fii, Q2, 0,$ and Q4. Thus, we need to select 7* = {K{,K2,K^,r2) so that 7* = argmin£'C(7). 7
We compute the expected warranty servicing cost per item sold, EC{^), by firstly conditioning on R = r and computing ECr(-y) = EC(j\r), and then removing the condition. Thus our next step is to show how to compute ECr(7). There are two possible orderings between r2 = jj£- = |p- = jp. and J"i = jfc. Thus we consider the following two cases: (A) r2 < rx and (B) r i < f'2 and compute corresponding EC(-y).
366
3.2.
Detailed
analysis
of case (A): r-z < T\ r case
We compute ECA{I) f° (A), see Figure 2, showing all the steps. Later we use analogy to express ECB(I)In order to compute ECA(I), we need to consider three cases for the possible values r of R, i.e., (1) r < r2, (2) r2 < r < ri and (3) r\ < r. Case (1). Suppose r < r2 (see Figure 2). Then ECr(j) is a sum of the expected costs EC?' (7), where EC^ (7) is the expected warranty servicing cost over the subregion $7,, i = 1,2,3,4. Next, we aim to evaluate each of the EC?i(-y),i = 1,2,3,4. Over the subregion fii, the warranty repair is minimal, thus EC^{1)
= c2K{K1\r).
(5)
In order to compute EC?2 {7), we need to consider, conditional on R = r, the distribution of the time, say Tji^r, to first failure after K\. Thus we have FTKiir(t)
= P(TKllr =
< t) = f
A(*|r)e- ( ^ W - A d M r ) )
^
1 _ e -(A(t|r)-A(ftT 1 |r))
where X(x\r) dx is the probability of failure at x and e I J is the probability that there are no failures within (K\,x). The corresponding VdfisfTKilr(t)=X(t\r)e-(A^-A^). In computing EC!?2 (7) we consider two possible cases: • TK^ > K2 with probability 1 _ FTKIIAK2)
= FTKilr(K2)
• TKl\r < K2 with probability
= e~ (A(*l->-A(*,|r,)
a n d
FTKI[T(K2)•
Thus EC^ij) = / ^ \
2( c i + C 2 A
^ 2 " * | r ) ) frKl„.(i) d< 0
^ 1 < r ^ i . < K2 otherwise
(6) In order to compute EC?3^), we need to consider, conditional on R = r, the distribution of the time, say TK3,KI\T^ to first failure after K2. It can
367
be shown that the corresponding pdf is 3-(A(t|r)-A(/d|r))
/rx a . K l l P (*) = A ( t | r ) e "
f
K2
X(t - u\r) e~ ( A ( t - k ) - A ( ^ - l o )
fTK^{u)
du>
JK
In computing EC^}3 (7) we consider two possible cases: • TKaiKl\r > K3 with probability FTK2K^{K3) and < K3 with probability FTK2
= /"
fc^^W*
= 1-
FTl<2,Kllr(K3).
Thus, corresponding expected warranty costs are £Cn3(7)
=
J J
* (ci + c2A(K3 - t|r)) /T K2 , Kl|r (t) dt
(
K 2 < 1 ^ , , . < K3
0
otherwise (7)
Similarly, for EC™* (7) we obtain c2 /
(A(K - t\r) - A(K3 - t\r)) hK„K^t)
dt
for K2 < TKaiKl\r < K3 rA2 EC^fr)
= < C2 /
(A(K - t\r) - A(K3 - t | r )) e -(A(*,-*|r)-A(X a -*|r)) fT
(t)
dt
for K-i. < TKx\r < K2 and TK2yKl]r > K3 c2 (A(K\r) - A(K3\r)) e - ( A ^ a W - A ^ l r ) ) for TKl\r > K3 (8) Finally, using (5), (7), and (8), and simplifying, we get EC^(7)=ECr(K1,K2,K3,K), where ECr{KltK2, K3, K) is given by (4). Case (2). Let r 2 < r < r\. In this case, conditional on R, the warranty over the subregions il\, $12 and fl3 will expire due to exceeding the usage limits Li: i = 1,2,3. Thus, the expenses over fli, i = 1,2,3 will be identified by Ti = ^-, not by Ki, i = 1,2,3. Hence, using a similar approach as in case (1) and taking into account the adjustment regarding T,'S, we obtain: EC^(7)
=
ECr(n,T2,T3,K).
368 Case (3). Finally, suppose n < r. Then, following a similar argument, denoting by Tj = ^f, i = 1,2,3 and r = £ we obtain: EC(3)(7) = £Cr(T1,r2,T3,T). At last, removing t h e condition R = r, we obtain ECA(l)=
P
ECi1\1)g{r)dr+
JO
P
£C< 2 >( 7 )g{r)dr
Jr2 /•CO
+ /
BC<3)(7)s(r)rfr.
1S t n e Similarly, the expected w a r r a n t y servicing cost ECB{I) same as EC A (7) with r\ and r 2 interchanged and ECr (7) modified to be equal to ECr(Ki,K2,K3,T).
4.
Conclusions
In this paper, we have extended the results in Iskandar et al. 5 on warranty repair strategy for repairable p r o d u c t s sold with two-dimensional warranty. O u r strategy is characterised by four p a r a m e t e r s , compared t o t h e threeparameter strategy in Iskandar et al. 5 . A comparison between our strategy and previously studied, more restrictive strategies will be reported during the workshop and provided in t h e full version of this paper. References 1. Blischke, W.R. and Murthy, D.N.P. Warranty Cost Analysis. Marcel Dekker, 1993. 2. Blischke, W.R. and Murthy, D.N.P. Product Warranty Handbook. Marcel Dekker, 1996. 3. Chukova, S., Dimitrov, B., and Rykov, V. Warranty analysis, a survey. Journal of Soviet Mathematics, 67(6):3486-3508, 1993. 4. Chukova, S. and Johnston, M. Two-dimensional warranty repair strategy based on minimal and complete repairs, accepted in Mathematical and Computer Modelling, 2006. 5. Iskandar, B.P., Murthy, D.N.P., and Jack, N. A new repair-replacement strategy for item sold with a two-dimensional warranty. Computers and Operations Research, 32:669-682, 2005. 6. Murthy, D.N.P. and Djamaludin, I. New product warranty: A literature review. International Journal of Production Economics, 79(2):236-260, 2002. 7. Pham, H. and Wang, H. Imperfect maintenance. European Journal of Operational Research, 94:425-438, 1996. 8. Rigdon, S.E. and Basu, A.P. Statistical Methods for the Reliability of Repairable Systems. John Wiley & Sons, 2000.
OPTIMAL PRODUCTION RUN-LENGTH A N D W A R R A N T Y P E R I O D FOR ITEMS SOLD W I T H R E B A T E COMBINATION WARRANTY*
B. C. GIRI AND T. CHAKRABORTY Jadavpur
Department of Mathematics, University, Kolkata - 700 032, E-mail: [email protected]
INDIA
This paper considers a decision model for a production system in which the demand of the product is influenced by the warranty period offered to the customer. The production process is not perfect; it may shift from an in-control state to an outof-control state at any random time where some non-conforming items may be produced. The proposed model is formulated under rebate combination warranty policy, assuming t h a t the process shift distribution is arbitrary and product defects are detectable only through time testing for a significant period of time. The expected pre-sale and post-sale costs per unit item is taken as the criterion for optimality. Some characteristics of the model are studied analytically. Optimal decisions are also derived in a numerical example.
1. Introduction Warranty plays an important role in the consumer behavior on selecting especially the new brand products in the market. Consumers may predict the quality of a product based on its warranty 1 . Warranty has also a significant promotional value to the manufacturer. Hence it can be considered as a marketing tool to differentiate from competitors 2,3 , since a satisfactory warranty policy enhance consumer's purchase willingness. Glickman and Berger4 perhaps were the first to consider demand as a function of market static in which the demand decreases exponentially with price and increases exponentially with warranty length. Recently, Lin and Shue 5 assumed warranty, price and cumulative sales dependent demand and showed that optimal decision policies can be characterized by simultaneously increasing or reducing both price and warranty length. *Work partially supported by Jadavpur University, Kolkata - 700 032 under Potential for Excellence Research Scheme (2005-2006).
369
370
In this paper, we consider a decision model for non-repairable products whose demand is influenced by the warranty period offered to the customer. The products are produced in a deteriorating production system and are sold under rebate combination warranty policy. Under this policy, the manufacturer agrees to provide full refund of the original purchase price up to a certain time from the time of initial purchase; any failure in the interval from that time to the end of the warranty period results in a pro-rata refund. Our objective is to find the optimal production time and the optimal length of the warranty period which minimize the expected cost per unit item. 2. Model Description 2.1.
Assumptions
We adopt the following assumptions to develop the proposed model: (A-l) The production process starts always in an 'in-control' state but it may shift to an 'out-of-control' state at any random time. (A-2) The time to process shift follows an arbitrary probability distribution with a non-decreasing failure (hazard) rate. (A-3) Inspection and preventive maintenance are carried out at the end of each production run. At the time of inspection, if the process is found in the 'out-of-control' state, then it is restored to the 'incontrol' state with negligible time. (A-4) Non-conforming items are (i) produced not only in 'out-of-control' state but also in 'in-control' state, (ii) operational when put into use but performance characteristics are inferior to those of conforming items 6 , (Hi) detected only through time-testing for a significant time period. (A-5) Produced items (both conforming and non-conforming) are released for sale with a rebate combination warranty policy. 2.2.
Notation
X fx(') Fx(-) Fx(-) D (> 0) P (> 0)
: random variable denoting the time to process shift '• probability density function of the time to process shift '• cumulative distribution function of the time to process shift • the survivor function : demand rate : production rate
371 T c s (> 0) cp{> 0) c/i(> 0) w{> 0) w\{< w) r(t) N
: production run time : setup cost : unit product price : inventory holding cost per unit product per unit time : total warranty period : full refund warranty period : rebate function : number of non-conforming items produced during each production run v(> 0) : inspection together with preventive maintenance cost R(-) : restoration cost which is function of detection delay 6\ : probability that a produced item in the 'in-control' state is non-conforming, 0 < 6\ < 1 #2(> #i) : probability that a produced item in the 'out-of-controP state is non-conforming hi (t) : hazard rate of a conforming item h.2(t) : hazard rate of anon-conforming item; /12(f) > > hi(t), 0 < t < 00
3. Model Formulation Suppose that for a single-unit, single-item deteriorating production system, the demand of the product is influenced by the warranty offered to the customer, i.e., D = D{w) = ki(w + k2f,
fci,Jfc2>0,
0
(1)
where k\ is a known constant of amplitude factor, k2 is a known constant of the time displacement that avoids the possibility of zero demand when w is zero and /3 is the displaced warranty length elasticity. For notational simplicity, we use D and D(w) interchangeably throughout the paper. Since the total number of items produced during a production run is PT, therefore, the inventory holding cost per item can be easily derived as ch{P-D)T 2D When the random variable X takes the value T, let the restoration cost, which is a linear function of detection delay, be R(T -
T)
= jfe (T - r)
372
where k (> 0) is a real constant. Then the expected restoration cost per unit item in a complete production cycle is
^ J
k(T-T)fX(T)dT.
Therefore, the total pre-sale cost per unit item is given by
Let N denote the number of non-conforming items produced in a production lot of size PT. Then, we have (61PT
+ 62P(T-T),
{
0
QxPT,
T>T.
The expected number of non-conforming items produced in each production cycle is, therefore,
E(N) = J {e1PT + 92P(T-T)}fx(r)dr /•oo
+ /
exPTfx(r)
JT
dr.
Simplifying, we get E(N) = 92PT - (02 -0i)P
f
FX(T)
dr.
(2)
Jo
It is easy to verify that E(N) increases as the production run length T increases. If q denote the fraction of non-conforming items produced in a production lot, then we have q = q(T) = E(N)/{PT)
= 92- ^Z^A
£
Fx(r)
dr.
(3)
Hence, the mean number of failures under warranty for items produced in a production lot of size PT is (1 - q)Ri + qR2 where Rx = /J" hi{t) dt and R2 = J^ h2 (t) dt, respectively. Let the rebate given to the customer during the warranty period w be •p,
( * ) = .cv -^*-, 7-t V
W — Wi '
0 < t < wi
:Vr"
Wl
1 _
<w . _
(4)
373
Then, the expected post-sale (warranty) cost per item is fWl fw (1 - q)cp / ln(t)dt + (1 - q)cp /
Jwi
JO
fWl fw / h2(t)dt + qc„ /
+qcp
h!(t)dt
W — W\
w -t
JWlw~
Jo
w -t
h2(t)dt.
w
(5)
\
Hence, the expected total cost (which includes the pre-sale cost and the post-sale cost) per item is given by C(TM
C
=c
p
+
^ ^ fWl
+(l-q)cp
C +
~^
±;£k(T-r)fx(r)dr
+
fw hi(t)dt + (1 - q)cp /
w -t
hx(t)
Jw! W — Wi
JO
fWl fw w — t xdt + qcp / h2(t)dt + qcp / h2{t)dt. (6) Jo Jwl w - wi Our objective is to determine the optimal production run length T* and the warranty period w* which minimize C(T,w), subject to the condition: S{w) = P-k1(w
+ k2f
>0.
(7)
As it is difficult to examine the existence and uniqueness of the optimal pair (T*,w*) analytically, we proceed to analyze the model when one of T and w is known. 3.1. Analysis
of the model when w is
known
Suppose that w is known in advance and it satisfies the constraint (7). Let Wi = Iw, 0 < I < 1. If C\(T) denotes the expected cost per unit item in this case, then we have C[(T) =
ch(P-D) 2D
Cj lr
^
cs + v PT2
h )
J>2
{£
Fx{r)dr
- TFx{T)}A{w)
(8)
where the prime denotes differentiation with respect to T and A(w)=
\h2(t)-h1(t)\dt+ J •^z1-\h2(t)-h1(t)\dt Jo L Jtw w-lwl > 0, since h2(t) > hi(t) V t > 0, by assumption.
i
(9)
374
Proposition 1. Suppose that the shift distribution Fx{-) is exponential. Then there exists a unique optimal production run length T* which minimizes Ci(T). Proof: Let us define g(T)=T2C[(T). Then g(T) has the same increasing property as C[(T). Since lim g(T) = — £s±v. < o and lim g(T) —> oo. So, there is at least one sign change of T—>oo
g(T) in [0,oo). Now, differentiating g(T) with respect to T, we get g'(T) = T[Ch{PD
D)
+ **jjll
+ (92 -
e1)A(w)f'x(T) > 0 ,
when -Fx(-) is exponential. This implies that g(T) is an increasing function of T. Hence C[(T) has exactly one sign change from (—)ve to (+)ve in [0, oo). Therefore, there exists a unique optimal production run length T* which minimizes C\{T). This completes the proof of the proposition.
Proposition 2. The optimal production run length decreases with the increase in wx or with the decrease in the pro-rata warranty period (w—wi). Proof: Let h(t) = h2(t) - hi(t) and h,l2 € [0,1] such that 0 < l2w < l\w < w. Then, from equation (9), after simple calculation, we have
A(i1W)-A(i2W)=fiw±-^-Ht)dt+ r
^-y^-fMt)*
Jl2w W(l-l2) Jhww{l-h){l-l2) > 0 for h > l2. (10) Let wn and u>i2 denote the full refund warranty periods corresponding to the values l\ and l2, respectively. Then l\ > l2 => wu > w\2. Now, from equations (8), (9) and (10), we have Cl(T|u;11)>Cl(r|«;12))
(11)
where C[{T \ wn) = C[(T, w = wn). If T*{wn) and T*{w12) represent the optimal production run lengths corresponding to C\{T \ wn) and C\{T \ w\2), respectively, then we have Cl(T*(ii;11)|«;12)
375
3.2.
Analysis
of the model when T is
known
In this case, if wi — Iw, 0 < / < 1 and C ^ M denote the expected presale and post-sale costs per unit item, then our objective is: (R) Minimize C2(w), subject to the constraint (7). Proposition 3. There exists a local minimum of the constrained optimization problem (R) provided that hi (t) and h2 (t) are increasing functions of t. Proof: The associated Lagrangian function L(w) is given by L(w) = C2(w) + XS(w) where A is the Lagrange multiplier. The Kuhn-Tucker necessary conditions for optimum are |^=0,
^ = 0 ,
XS{w) = 0, S(w)>0,
A>0.
The above Kuhn-Tucker conditions are also the sufficient conditions for the minimum of C
+
dh2t2dt dt
+
(1 - q)cP T w3(l-l)Jlw
d h ^ dt
k2f-2.
+
Clearly ^ > 0 and 0 the proposition. 4.
qcp r w3(l-l)Jlw
> 0 for 0 < /? < 1. This completes the proof of
Numerical Example
Let us suppose that FX(T)
= 1 - e - ( A r ) \ T > 0, A > 0, 7 > 1,
hi{t) = e {t/e)
h ~ "'' h^) = r~(t,2?'
and the parameter values involved in the model are: cp = 5, cs = 250, ch — 0.5, k = 5, ki = 100, k2 = 4, v = 10, Wl = 0.75w, A = 0.5, 7 = 2, 6>i = 0.05, B2 = 0.95, P = 500 (in appropriate units).
376 T a b l e 1. Impact of the parameter (3 on t h e optimal decisions
p 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
/T-I*
0.720772 0.787659 0.865054 0.955335 1.062060 1.190660 1.349920 1.555290 1.837450 2.270930
w* 0.000000 0.056156 0.101863 0.138392 0.166794 0.187957 0.202675 0.211724 0.216059 0.217512
d(T*,u,*) 6.44251 6.32028 6.20285 6.08996 5.98111 5.87558 5.77243 5.67032 5.56725 5.45965
Table 1 shows the optimal results when T and w are taken as decision variables. Note t h a t as 3 increases, b o t h T* and w* increase whereas C\(T*,w*) decreases. Usually the expected cost increases with the warranty period. Here, as 8 increases, the increase in post-sale (warranty) cost per unit item is less t h a n t h e decrease in pre-sale cost per unit item. As a result, the proposed warranty model provides a lower item cost t h a n t h e corresponding model without warranty.
References 1. W. Bulding and A. Kirmani, A consumer-side experimental examination of signaling theory: do consumers perceive warranties as signals of quality?, Journal of Consumer Research, 20, 111-123 (1983). 2. M. A. J. Menezes and I. S. Currim, An approach for determination of warranty length, International Journal of Research in Marketing, 9, 177-196 (1992). 3. V. Padmanabhan, Warranty policy and extended service contracts: theory and an application to automobiles, Marketing Science, 12, 230-248 (1993). 4. T. S. Glickman and P. D. Berger, Optimal price and protection period decisions for a product under warranty, Management Science, 22, 1381-1389 (1976). 5. P. C. Lin and L. Y. Sheu, Application of optimal control theory to product pricing and warranty with free replacement under the influence of basic lifetime distributions, Computers and Industrial Engineering, 48, 69-82 (2005). 6. I. Djamaluddin, D. N. P. Murthy and R. J. Wilson, Quality control through lot sizing for items sold with warranty, multi-objective model for warranty estimation, International Journal of Production Economics, 3 3 , 97-107 (1994).
ANALYSIS OF W A R R A N T Y DATA W I T H COVARIATES
MD. REZAUL KARIM AND KAZUYUKI SUZUKI Department of Systems Engineering The University of Electro-Communications Tokyo 182-8585, Japan. E-mail: [email protected]; [email protected]
The reliability characteristics of automobile components depend on factors or covariates such as the automobile operating environment (e.g. temperature, rainfall, humidity, etc.), usage conditions, manufacturing periods, types of automobiles which use the components, etc. In recent years, many automotive manufacturing companies utilize warranty database as a very rich source of field reliability d a t a that provide valuable information on such covariates for feedback to new product development systems on product performance in actual usage conditions. In warranty database, the information on those covariates are known for the components which fail within the warranty period and are unknown for the censored components. This article considers covariates associated with some reliability-related factors and presents a Weibull regression model for the lifetime of the component as a function of such covariates. The EM algorithm is applied to obtain the ML estimates of the parameters of the model because of incomplete information on covariates. An example based on real field data of automobile component is given to illustrate the use of the proposed method.
1. Introduction Automotive manufacturing companies analyze field reliability data to enhance the quality and reliability of their products and to improve customer satisfaction. In recent years, many manufacturers utilize warranty database as a prime source of field reliability data, which can be collected economically and efficiently through repair service networks. Warranty claim data is superior to laboratory test data in the sense that it contains information on actual environment in which the product is used. Therefore, a number of procedures have been developed for collecting and analyzing warranty claim data (e.g. [l]-[4]). In this paper, we discuss an approach for modeling the reliability of a specific system (unit) of automotive components based on field failure warranty data. The unit's lifetime depends on some explanatory variables or covariates such as the automobile operating envi377
378
ronment or used-region, types of automobiles which use the unit, and the types of failure modes. If a unit fails within the warranty period, the information on covariates can be known from warranty database; however, such information are unknown for the censored units. The principal aim of this paper is to present a Weibull regression model for the lifetime of the unit which depends upon a vector of categorical covariates and to assess the reliability of the unit as a function of those covariates. The ExpectationMaximization (EM) algorithm [5] by the method of weights proposed in [6] is used to estimate the parameters of the model. 2.
Modeling
Regression analysis of lifetimes involves specifications for the distribution / ( t | x , /3, a) of a lifetime variable, T, given a vector of explanatory variables or covariates, x = (x\, • • • ,xp)', upon which lifetime may depend. Let 9 — (13', a, 7')' be a (p + r + 1) x 1 vector of all parameters in the model, where /3 = (/?i, • • • ,/? p )' represents a vector of regression coefficients, a is a scale parameter, and 7 = (71, • • • , 7 r ) ' is the parameter vector associated with the distribution of covariates x, gr(x|7). The complete-data log-likelihood function based on n independent observations can be written as n
n
/ x , t ( 0 | x , t) = J2 Jx,t(0|Xi, U) = J2 fyxO9' 2=1 i=l
n CT Xi
l > **) + J2 * * ( 7 | X i ) , i=l
(1)
where lXtt(9\x.i,U) is the complete-data log-likelihood of 9 for the ith observation based on the joint distribution of (x,t); lt\x(0,cr|x;,£j) is the log-likelihood based on the conditional distribution of i|x; and Z X (7|XJ) is the contribution from the marginal distribution of x. Here we consider Weibull distribution for / ( t | x , /3, a) and multinomial distribution for g(x|7) to model field reliability data. 2.1.
The Weibull
Regression
Model
We assume the Weibull regression model in which the log-lifetime Y — log(T) follows a location-scale distribution with location parameter dependent on x, A*(x) = f3 x, and scale parameter a. Under this model, the density function of T given x can be written as f(t\x,P,cr)
= — exp at
Yiog(t)-M(x)\ _ exp /iogft)-Mxr
,i>0.(2)
For more detailed explanations of Weibull regression model, see [7]-[10]. In case of reliability data, especially in warranty claim data, the lifetime
379 is often censored. Conditional on covariates, for unit i, the log-likelihood function for (3 and a can be written as lt\x(fiMSi,^,U)
= 6i]og[f(U\xi,l3,a)]
+ (l-5i)\og[S(U\xi,0,
(3)
where S(U\xi,(3,a) = Pr(Ti > t|xj,/3,
of
Covariates
Suppose A denotes a set of all possible combinations of levels of covariate vectors for any individual and r denotes the number of elements in A. Let x(fe) denote the kth covariate vector in A, and rrik be the expected count in covariate class k, k = 1, • • • , r. Like [6], we assume that the covariates x = (xii"" • >xp)' a r e random variables that come from a discrete distribution with finite range parameterized by the vector 7 = (71, • • • , 7 r )', where "fk is the probability that an individual is of covariate type k, k = 1, • • • , r. Thus, the counts of individuals having each of the possible covariate assignments, mk, are multinomial(n, 7). The log-likelihood for 7 , Z x (7|x), can be expressed as n i=l
r—1
T—\
k ( 7 | x ) = Y2 / x (7|xi) oc ^
"ifc log(7fe) + (n-^2 fc=l
mk) log( 7 r ),
(4)
k=\
where (n — $ZI=i rrik) is the number of individuals belong to the last covariate-class r and 7 r = (1 — 5Zfe=i Ik)3. Parameter Estimation via the EM Algorithm Let Xi = (x 0 b Si j,x m i S ,i), where x0bs>i and x m i S i i denote the observed and missing components of Xj, respectively. Following [6] and [11], the E-step of the EM algorithm at the (s + l)st iteration can be written as n
Q(0\eW) = X > i ( 0 | 0 ( s ) ) = E[l
(OiU^Xilxobs^U^eM)
n
n
=E
E^^^^ix^^.^+E
E^( 0(s) )U7ixi), (5)
where j indexes all possible combinations of levels of covariates that subject i could be in and Wij(9^s') is the weights, which can be interpreted as the
380
posterior probabilities of the missing values [12] as w
ij(9iS))
= Pr[Xmis,i\Xobs,i,
U, Si, 0 W ]
f(ti,Si\-Xmis,iU)'Xobs,i,^{s),cr<-^)g(xmis,iU)^:x-obs,ih{S))
=
T,x
• -(i)
/gs {a
a)
:x
{a)
f(U,Si\y^is,iti)'Xobs4,0 \v( )9(^i8,iti)> -obs,i\'Y )
Assume that here we have created n^ new observations for each of the possible missing covariates for observation i, given the response U and the observed covariate x0&S)j. If N — Y17=i Ui denotes the total number of observations, then the double subscripted weights tUjj's can be replaced with a single subscripted weights, say, Vi, for i — 1, • • • ,N. In M-step, the first term of the expression of Q(0\0^s') given in Eq. (5) can be maximized by using any failure time regression program/package that allows weights for observations (e.g. Splus, R) or by applying NewtonRaphson iterative method to obtain 0 and <J( S + 1 ). The maximization s of the second term of Q(#|
381 the names of the unit, failure modes, and used-regions are not disclosed here to protect proprietary nature of the information. Our interest is to investigate how the age-based lifetime of the unit differs with respect to three categorical covariates: used region [Region (xi): Regionl=l, Region2=2, Region3=3, Region4=4], types of automobile which used the unit [Auto (x2): A u t o l = l , Auto2=2], and failure modes [Mode (x3): Model=1, Mode2=2, Mode3=3]. If a unit i fails within its warranty period, the age of the unit and the corresponding covariates Xj are known from warranty database. Prom warranty database the age-based number of censored units and their censoring times are also known, but the covariate values for censored units are unknown. We assume a Weibull model for age T, f(t, (5|x, /3, a), with location parameter dependent on covariates x, n(x) = /3'x= /? 0 +/?i#i+ #2^2 +/?3#3> and scale parameter a. The EM algorithm, discussed in the previous section, is applied to estimate the parameters and their asymptotic variances. There are r = 24 possible covariate classes for 3 covariates, therefore,
EM estimates for /3 and a for Weibull regression model
Parameter
Estimate
Std. Error
Z-value
p- value
A>
8.3664 -0.2583 -0.6265 0.7121 0.9375
0.3201 0.0559 0.1274 0.1420 0.0081
26.1361 -4.6201 -4.9184 5.0149 -7.7313 a
1.8565e-149 9.2429e-006 2.2278e-006 1.3801e-006 4.1830e-014
Pi P2
ft a Note:
a
95% Conf. Limits UCL LCL 7.7389 -0.3678 -0.8762 0.4338 0.9218
8.9938 -0.1487 -0.3768 0.9904 0.9535
Test statistic for H0: a = 1
In the above model, when x\ = X2 = £3 = 0, /?o becomes the estimate of /i where fi — \og(rj) and a gives the estimate of 1/m, where 77 and rh are the ML estimates of scale and shape parameters of two-parameter Weibull distribution without covariates. To make this comparison, we fit the two-parameter Weibull distribution without considering covariate ef-
382
fects by using the SPLIDA software [14]. This model gives fi = 7.6160 with approximate 95% confidence intervals [7.2041, 8.028] and a — 0.9383 with approximated 95% confidence intervals [0.8643, 1.0119]. These results of \i and a are very close respectively to the estimates of /?o and a given in Table 1, which roughly indicating that the ML estimates of the parameters of the model with covariates are acceptable. We apply a graphical method based on the examination of residuals to assess the adequacy of the distributional assumptions. For the assumed model, the standardized residuals or "censored Cox-Snell" residuals ([8], p. 443) is defined as log(tj) - /i(x)
1,2,-
€i = e x p
(8)
When ti is a censored observation, the corresponding residual is also censored and it can be estimated using the complete data residual summed over the missing data at the last iteration of the EM algorithm, like the estimation of the denominator of weights, Wij{9). Figure 1 shows the Weibull probability plots of the standardized residuals defined in (8). The plotted points are not particularly straight, but are roughly so. This suggests that the residual plot does not represent any serious departure from the Weibull distributional assumption in the model for the observed data.
.01 -
| —
.005 " .003 -
S
j S
0
* ^
.0005 -'
-g .0003 " £
••
.0002 -
• .* ••
.0001 .00005 " .00003 " .00002 -
•
.00001 " 0.0005
0.0010
0.0020
Standardized Residuals
Figure 1. Weibull probability plots of the standardized residuals
In many applications of Weibull distribution, interest centers on quantiles (or Bp life) rather than the distributional parameters. For example,
383
in the automobile industry of Japan, 5 1 0 is the most popular reliability index [15]. Table 2 shows the maximum likelihood estimates and normalapproximation 95% confidence intervals for 5 1 life, 5 5 life and 510 life at specified levels of covariates. Table 2. Estimates and confidence intervals for Bp life at specific conditions of covariates a
Covariates x X\
X2
1
1
1
3
2
2
Note:
a
Bp Life
Estimate
95% LCL
95%UCL
5 1 Life 5 5 Life 5 1 0 Life 5 1 Life 5 5 Life 5 1 0 Life
48.4629 223.3969 438.7016 31.4946 145.1791 285.0994
34.5702 158.8763 311.3918 24.6798 113.3928 222.1698
67.9386 314.1197 618.0610 40.1913 185.8759 365.8538
X3
Only 2 combinations of levels of x out of 24 are shown.
Table 2 shows that when covariates - used region, auto type and failure mode are fixed respectively as Regionl, Autol and Model, the ML estimate and 95% confidence intervals of 5 1 0 life are 438.7016 and [311.3918, 618.0610]. These estimates become 285.0994 and [222.1698, 365.8538] for covariates values - Region3, Auto2 and Mode2. Under the first combination of levels of covariates, the estimates imply that we are 95% confident that 10% of the units are expected to fail between the ages 311 and 618 (measurement unit is omitted). Industrial personnel who are responsible for reliability, safety and design decisions for the unit wanted to know weather a redesign would be needed to meet the design life specification for specified levels of the covariates. The results in Table 2 would be useful for this requirement.
5. Concluding Remarks To assess the reliability of components as a function of covariates has engendered considerable interest to both the manufacturers and buyers. To do this we proposed a Weibull regression model for the warranty claim data as a function of covariates. Because of missing covariates in warranty database for censored units, the EM algorithm is applied to estimate the parameters of the model and their confidence intervals. For the example component, we observed a strong evidence in favor of the dependency of lifetime on
384 three covariates. Standardized residual plot for checking model assumption suggested t h a t t h e Weibull distribution assumption in t h e model provides a reasonable fit t o t h e data. A simple comparison based on models with covariates and without covariates effects, indicated t h a t t h e ML estimates of t h e parameters of the regression model with missing covariates are acceptable. Estimates and confidence intervals for distribution quantiles at specific conditions of covariates are also presented. An important extension of future study is t o investigate an appropriate goodness-of-fit methodology for this problem. Also consideration of mileage accumulation r a t e as another variable in t h e model would b e useful in many applications. Finally, further investigation on a more general Weibull regression model in which b o t h t h e location and scale parameters depend on covariates will be worthwhile for the d a t a . Acknowledgment We are grateful for the support from the Japan Society for the Promotion of Science (JSPS) during this research. We are also grateful t o two anonymous referees for their comments and suggestions. References 1. J. F. Lawless, International Statistical Review, 66, 41 (1998). 2. K. Suzuki, M. R. Karim, and L. Wang, Handbook of Statistic: Advances in Reliability, Elsevier Science, Vol. 20, 585 (2001). 3. D. N. P. Murthy and I. Djamaludin, International Journal of Production Economics, 79, 231 (2002). 4. M. R. Karim and K. Suzuki, International Journal of Quality & Reliability Management, 22, 667 (2005). 5. A. P. Dempster, N. M. Laird and D. B. Rubin, Journal of the Royal Statistical Society B, 39, 1 (1977). 6. J. G. Ibrahim, Journal of the American Statistical Association, 85, 765 (1990). 7. R. L. Smith, Reliability Engineering and System Safety, 34, 55 (1991). 8. W. Q. Meeker and L. A. Escobar, Statistical Methods for Reliability Data, John Wiley & Sons, Inc., New York (1998). 9. J. D. Kalbfleisch and R. L. Prentice, The statistical analysis of failure time data, (2nd Ed.), John Wiley & Sons, Inc., New Jersey (2002). 10. J. F. Lawless, Statistical Models and Methods for Lifetime Data, (2nd Ed.), John Wiley & Sons, Inc., New Jersey (2003). 11. S. R. Lipsitz and J. G. Ibrahim, Biometrika, 83, 916 (1996). 12. S. R. Lipsitz and J. G. Ibrahim, Lifetime Data Analysis, 2, 5 (1996). 13. T. A. Louis, Journal of the Royal Statistical Society B, 44, 226 (1982). 14. W. Q. Meeker, SPLIDA (Splus Life Data Analysis), Version 6.7.4, http://www.public.iastate.edu/~splida, (2005). 15. K. Suzuki, Technometrics, 27, 263 (1985).
IMPERFECT REPAIR POLICIES UNDER TWO-DIMENSIONAL WARRANTY WON YOUNG YUN Department of Industrial Engineering, Pusan National University, 30 Geumjeong-Gu, Busan, 609-735, Korea
Changjeon-Dong,
KYUNG MIN KANG Quality Assurance part, LG Company, 642 Jinpyung-Dong, Goomi, Gyungsangbuk-Do, 730-030, Korea For repairable items, the manufacturer is required to rectify all item failures through minimal repair, replacement, and imperfect repair, should failure occur within the period specified in the warranty. In this paper, we look at a new warranty servicing strategy that considers imperfect repair with two-dimensional warranty where the failed item is imperfectly repaired when it fails for the first time in a specified region of the warranty and all other failures are repaired minimally. We derive the optimal values for these to minimize the total expected warranty servicing cost. We compare the results with other strategies reported in the literature.
1. Introduction A warranty that fully or partially compensates the consumer in the event of a failure is a contract offered by a producer to a consumer to repairs a faulty item. It requires the manufacturer to rectify all item failures through minimal repair, replacement, and imperfect repair, should failure occur within the period specified in the warranty. In the case of two-dimensional (2D) warranties, a warranty is characterized by two-dimensions, with one axis representing time or age and the other representing item usage. The costs incurred in warranty servicing depend on the warranty servicing strategy. The optimal servicing strategy of manufacturers involves choosing the appropriate corrective maintenance actions for failure in order to minimize the expected warranty servicing costs. Therefore, we suggest a servicing strategy with imperfect repair in the 2D plan. Some early works on warranties are Blischke [1], Blischke and Murthy [2, 12], Chukova et al. [6]. A general treatment of warranty analysis is given by Blischke and Murthy[3, 4]. An 385
386
extensive survey on 2D warranties with a useful list of references can be found in Jack, Murthy and Iskandar [10]. In this paper, we extend the 2D warranty servicing strategy proposed in Iskandar, Murthy and Jack [9] by considering imperfect repair. 2. Model Formulation We consider the free replacement policy that requires the manufacturer to rectify all failures occurring under warranty and assume that the product is repairable so that the rectification of a failed item can be achieved through repair, replacement or imperfect repair. Notation F(t): the failure time cumulative distribution function. fit): the failure time probability density function. r(t): the hazard rate function. A (t): the intensity function. C0 : Average cost of each replacement. Cm: Average cost of each minimal repair (Cm < C0) Q : Cost of imperfect repair (=Cm +( C0 - Cm)(p8 +(1- p) S2)). (This is an increasing function and Q(0,t) = Cm, Q(l,t) = C0). 8 : reduction rate of imperfect repair at t, a decision variable, 0 < 8 < 1 ,
x
387
Xc{i)=RTc{t). (1) where R represents the usage rate that varies from user to user and a random variable with distribution function, G(R). We model item failures by a point process with an intensity function and let A (t\r) 51 denote the probability that the current working unit at time t will fail in the small interval [t, t + 8 i] given that R = r. Conditional on R = r, failures occur according to a Poisson process with an intensity function A(t\r), t>0, modeled by the relationship
Ht\r) =
(2)
where cp(t,x) is a non-decreasing function in both t and x. Murthy and Wilson [13] consider the following form: A(t\r) = 9Q+e,r + 02TC (t) + 6,XC {t),
(3)
Let N{t\r) denote the number of failures over [0, t] and T\\r the time to first item failure, conditional on R = r. The distribution function of T\\r is given by F(t r) = \-exp{-[A(x\r)dx}.
(4)
If all failures under warranty are rectified through minimal repair (see Barlow and Proschan [1]), conditional on R = r, we have Tc(t) = t, Xc(t) = rt, and N(t\r) is a non-homogeneous Poisson process with intensity function A{t\r). 2.2. Imperfect Repair Strategy Two repair-replace strategies (1 and 2) are studied in Iskandar and Murthy [8]. In both strategies, the warranty region is divided into two sub-regions, Q \ and Q. 2. In Strategy 1, all failures occurring in Q i are rectified by replacement and any failure in Ci. 2 is rectified by minimal repair. Replacing all failed items occurring in Q i will result in decreasing the number of failures. In Strategy 2, all failures occurring in D. i are rectified by minimal repair and any failure in Q 2 is rectified by replacement. Repair-replace strategies (Strategy 3) are studied in Iskandar, Murthy, and Jack [9], The following strategy is therefore proposed. The warranty region is divided into three disjoint sets, £ ) , , Q 2, and Q. 3 (see Fig. 1). Under this strategy, all failures in Q , are minimally repaired, the first failure in Q 2 is rectified through replacement and subsequent failures in this region are rectified through minimal repair, and all failures in Q 3 are always minimally repaired
388 usage U
U2 Ux
Wi
"
W2
W
a
8e
Figure 1. repair-replacement regions
In this paper, the warranty region is divided into three disjoint sets, CI 1; Q 2, and Q 3, such that Q i U Q 2 U Q 3 =Q (see Fig. 1). Under this strategy, 1. 2. 3.
all failures in Q i are minimally repaired; the first failure in Q 2 is rectified through imperfect repair and subsequent failures in this region are rectified through minimal repair; and all failures in Q. 3 are always minimally repaired.
Therefore, it might be more economical to modify this strategy by imperfectly repairing only the first failure in Q 2, instead of replacing the first failed items in Q 2, since this action would be sufficient to decrease the failure rate of the item in this region. We need to split Region Q. 2 into two sub-regions. Imperfect repair of the first failure in Q 2 would not be economical if the age, t (or the usage, w) at the failure is very close to W (or U), as it is then very close to the expiry of the warranty. Let > ={WU W2, U\, U2} represent the set of parameters of the repairimperfect repair strategy, and EC( 0 ) the expected warranty servicing cost per item sold. The objective is to obtain the optimal ^ that minimizes EC(> ). We restrict our analysis to the case in which Regions Q., and Q i U Q 2 are similar in shape. Let r2 = U21 W2 = £/, / Wu so that the warranty servicing strategy is characterized by the three-parameter set ^ ={WU W2, 8 }. 3. Model Analysis We assume the following:
389 • • • •
All failures during the warranty period are repairable(minimally or imperfectly). Minimal repair brings the failed product to the state just before failure. Imperfect repair brings the failed product to the state with a reduced failure rate function. Repair times are negligible.
We obtain EC( > ) by using a conditioning argument. Let ECT( 0 ) denote the expected warranty cost conditional on R = r. Note that the subscript corresponds to conditioning on a given r. If we define r\ = UI W, we need to consider the two cases (1) r2 < r\ and (2) r2 > r\ separately. In this paper, we summarize the total expected cost for the first case. It is easy to obtain the cost function of the second case using the procedure given. We should consider several cases to obtain the total expected cost rate even in the first case. Case \r < r2\ In this case, the warranty ceases at time W. The expected warranty cost conditional on R = r, ECr( <j> ), is the sum of the expected conditional warranty costs over the three periods, which are [0, W{\, [Wu W2], and [W2, W\. Let T\{W\\r) denote the time at which the first failure occurs after W\ conditional on R = r. The distribution function of T^{W\\r) is given by .
F{t\r)-F{W.\r)
We derive the expression for ECT(
tx>W2.
For the first sub-case, no failure is over [W\, tx] and failures over the remaining period occur according to an NHPP with intensity function given by r(,k)=/eH 1
os«»;
X{t-5t\r),
p,
tx
This occurs with probability
fMr)dt=
/(
'lr),
dt.
As a result, the conditional expected cost is given by
(8)
390
ECr(W„W2,S\t,<W2) = Cm Jp Z(t\r)dt + Cm ^ [A(t - Stl \r)]dt + Ci
(g)
= Cm j p A(t\r)dt + Cm f [A,(t - Stx\r)]dt + Cm + (Co - Cm)(pS + (1 -
p)S2).
If we derive another conditional expected cost rate, the cost function is given by ECr{Wx,W2,S) = ECr(Wx,W2,S\tx <W2)fx(/,\r)dtl +
ECr(Wl,W2,S\t]>W2)[l-F](W2\r)]
= £' (Cm + (Co - Cm)(pS + (1 - p)S2) + Cm f X(t\r)dt + Cm | " [X(t - 8tx \r)]dt) * ^ ' J ^ i + (Cm f A(/|r)df + Cm £ X(t\r)dt) * [ ' / J ^
<*.
•
We summarize the results for other cases as follows: Case : |r 2 < r < rj] ECr(h,r2,8) = £ C r ( r „ r 2 , 4 , < T2)f,(t\r)dtx+ECr(h,r2,S\t,
> r2)[l-^(r2|r)]
= [ 2 (Cm + (Co - Cm)(pS + (1 - p) S2) + Cm f A(/|r)df + Cm f [*(/ - A,|r)]dif) '' , df •o •»! 1-F(ff,|r) + (Cm f' A(t\r)dt + Cm (A(t\r)dt) * 1 _ ,' . •o ' *J 1-F(r,|r) Case [r > rj]
(11)
391 ECr(jvT2,8) = £ C r ( r „ r 2 , 4 , < r2)fx(t\r)dt, +ECr(Ti,T2,J[tl > T2)[l-Fx(r2\r)] ^(Cm+(Co-Cm)(pS+(\-p)S2)
= +
Cm[X{t\r)dt+Cm\[X{t-a\r^t)*^^dt,
+(Cm f' X{t\r)dt+Cm f A(t\r)dt) * 1 ~ F W . •b ' *i l-F(rJr) (12) Finally, on removing the condition on R, the expected warranty cost is given by EC^)=[ECr(Wx,W2,5)g(r)dr +
h
[ECr{T„t2,5)g(r)dr+[ECr(h,T1,S)g{r)dr. *>
(13)
We can obtain the expected warranty cost numerically for given failure distributions. 4. Conclusions In this paper, we have reviewed repair-replace strategies for a repairable item sold with a two-dimensional warranty and proposed a new strategy with imperfect repair. The strategy is characterized by three parameters, and these can be selected to minimize the expected warranty cost. We proposed a procedure to obtain the total expected cost. A comparison of the new policy should be made with other strategies — 'always repair', 'always replace', 'strategy 1', 'strategy 2', and 'strategy 3' with numerical examples. Acknowledgments This work was supported by the Regional Research Centers Program (Research Center for Logistics Information Technology), granted by the Korean Ministry of Education & Human Resources Development References 1. W. R. Blischke, Mathematical models for analysis of warranty policies. Mathematical and Computer Modelling 7, 1-16.(1991).
392 2.
3. 4. 5. 6. 7.
8.
9.
10.
11. 12.
13.
W. R. Blischke, D. N. P. Murthy, Product warranty management—/: A taxonomy for warranty policies, Research Paper DS-90-47, School of Business Administration, USC, (1991). W. R. Blischke, D.N.P. Murthy, Warranty Cost Analysis, Marcel Dekker, New York, (1993). W. R. Blischke, D. N. P. Murthy, Product Warranty Handbook, Marcel Dekker, New York, (1996). M. Brown, F. Proshan, Imperfect repair. Journal of Applied Probability 20, 851-859.(1983) C. D. R. Chukova, S. B. Dimitrov, V. Rykov, Warranty analysis: A survey. Journal of Soviet Mathematics 67 (6), 3486-3508(1993). B. P. Iskandar, Modelling and analysis of two dimensional warranty policies, Unpublished PhD thesis, The University of Queensland, Brisbane, Australia, (1993). B. P. Iskandar, D. N. P. Murthy, Repair-Replace Strategies for TwoDimensional Warranty Policies, Mathematical and Computer Modelling 38, 1233-1241(2003). B. P. Isknadar, D. N. P. Murthy, N. Jack, A new repair-replace strategy for items sold with a two-dimensional warranty, Computers & Operations Research 32, 669-682(2005). N. Jack, D. N. P. Murthy, B. P. Iskandar, Comments on Maintenance policies with two-dimensional warranty, Reliability Engineering and System Safety 82, 105-109(2003). Moskowitz H, Y. H. Chun, A Poisson regression model for two attribute warranty policy, Naval Research Logistics 41, 355-376(1994). D. N. P. Murthy, W. R. Blischke, Product warranty management—///: A review of mathematical models, Research Paper DS-91-28, School of Business Administration, USC, (1991). D. N. P. Murthy, R. J. Wilson, Modelling two-dimensional failure free warranties, Proceedings of the Fifth Symposium on Applied Stochastic Models and Data Analysis, Granada, Spain, (1991).
PARTY SOFTWARE RELIABILITY
This page is intentionally left blank
BIVARIATE E X T E N S I O N OF SOFTWARE RELIABILITY MODELING W I T H N U M B E R OF T E S T CASES *
T . ISrfflt, T . F U J I W A R A * A N D T . D O H I t ' Department of Information Engineering, Hiroshima University Higashi-Hiroshima 739-8527, Japan: [email protected] Second Development Division II, Fujitsu Peripherals Limited, Inc, Japan Katoh-Gun, Hyogo 673-1447, Japan: [email protected]
In this paper we consider a software reliability model (SRM) depending on the number of test cases executed in software testing. The resulting SRM is based on a two-dimensional discrete non-homogeneous Poisson process (NHPP) and is considered as a bivariate extension of the usual NHPP-based SRM by taking account of two time scales; calendar time and number of test cases executed. We apply the Marshall and Olkin's bivariate geometric distribution and develop a twodimensional discrete geometric SRM. In a numerical example with real software fault data observed in a real development project, we investigate the goodnessof-fit for the proposed SRM and refer to an applicability to the actual software reliability assessment.
1. Introduction The reliable software plays a central role to develop the dependable and high assurance systems. Since the debugging cycle times of software are often reduced due to smaller release time requirements, the accurate estimation of software reliability tends to be more important day by day, especially, in the earlier testing phase. Software reliability models (SRMs) are used to measure the software reliability and to control quantitatively the software testing process 1 . Since the software reliability is defined as the probability that software errors caused by faults do not occur for a specified time period, the time evolution of error-occurrence (fault-detection) process in software testing should be modeled by any stochastic counting process. In fact, a huge number of SRMs have been extensively developed during the last three decades, to help us in estimating the number of initial fault contents "This work is supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research; Grant No. (B) 16310116 (2004-2006).
395
396 and understanding the effect of errors on software operation as well as in predicting the software reliability. In this paper, we focus on discrete SRMs (D-SRMs) based on the discrete non-homogeneous Poisson process (D-NHPP). First the D-SRMs are proposed by Yamada et al. 2 ' 3 as a simple alternative of the continuous-time SRM. Scholz4 proposes a multinomial D-SRM and discuss its parameter estimation method. Recently, Okamura et al. 5 unify all the NHPP-based D-SRMs and develop a unified parameter estimation algorithm based on the EM (Expectation-Maximization) principle. In fact we frequently encounter the situation in which the number of software faults experienced before should be a function of the numbers of test runs. Alternatively, in practice, it is not always easy to monitor the software test execution in continuous time. In other words, we count the number of software errors or faults experienced in the testing phase at the discrete calendar time like hour, day and week. Then, the D-SRMs will be useful to assess the reliability of software product. This paper is motivated to incorporate two time scales; calendar date and number of test cases, at the same time in assessing the software reliability. It is clear that the number of software test cases consumed during the testing phase strongly depends on the reliability of software product. On the other hand, as well recognized in the reliability engineering community, the reliability growth with respect to calendar time is a representative phenomenon for testing software products. Actually, one can often observe in the real testing phase the software fault data which consists of the calendar date, the number of test cases and the number of detected software faults. For such a case, these parameters should be utilized effectively to assess the software reliability with higher accuracy. In general the D-SRMs can be used for two cases; the number of detected software faults on each calendar date and the number of detected software faults on each test case. The SRM proposed in this paper is based on a two-dimensional D-NHPP and is considered as a discrete counting process with two time scales. We apply the Marshall and Olkin's bivariate geometric distribution 7 and develop a bivariate extension of the usual geometric SRM 2 ' 3 . In a numerical example with real software fault data observed in a real development project, we investigate the goodness-of-fit for the proposed SRM and refer to an applicability to the actual software reliability assessment.
397 2. Bivariate Software Fault Model 2.1. Model
Description
Suppose that: (A-l) The software fault detected in testing phase is instantly fixed and removed. (A-2) The number of initial fault contents in the program, N (> 0), is given by a non-negative discrete random variable having the probability mass function (p.m.f.) gn = Pr{N = n}. (A-3) Each software fault can be detected at calendar time T (= 1, 2, • • •) with the cumulative number of test cases S (= 1,2, •••), where (T, S) is the bivariate discrete random variable having the joint cumulative distribution function (c.d.f.) F(r,s) = Pr{T < T,S < s}. We call the above bivariate c.d.f. the fault-detection probability. From the assumptions (A-l)-(A-3), it can be seen that the total number of software faults detected by the bivariate time point (r, s) is given by the univariate random variable X{T,S) having the conditional p.m.f.: Pv{X{r,s)
= x\N
= n} = Q F l v f i l - F f v ) } " -
1
.
(1)
Since Eq.(l) is a simple binomial p.m.f., it is straightforward to see that E[X(r, a)\N
= n] = nF(r, s),
Var[X(r,s) | N = n] =
TIF(T,S){1
(2) -
(3)
F(T,S)}.
It should be noted in the bivariate c.d.f. that ^(00,00) = 1, F(0,s) = = F(0,0) = 0. Define the marginals by F(r,oo) = FT(T) and F(oo, s) = Fs(s). Then, we have F(T,0)
Pr{T >
T, S
> s} = 1 -
FT{T)
- Fs{s) + F{T,
S)
/ 1-
F{T, S).
(4)
Further, we make the following additional assumption: (A-2') The number of initial fault contents in the program, N (> 0), is the Poisson distributed random variable with the p.m.f. gn = tunexp(-uj)/n\, where UJ (> 0) is the mean number of initial fault contents. By replacing (A-2) by (A-2'), we have 00
Pr{X(r, s) = x) = J2 P r { * ( r , s) = x | N = n}g(n) n=0
398
=
i^)l! e x p { _^ ( r , s ) } ,
(5)
which is the D-NHPP with mean value function A(r, s) — ~E[X(T, S)] = Var[X(r,s)] = U>F(T,S). In the D-NHPP with two time scales, {T,S), if either T or S takes an extremely large value, i.e., if a number of test cases are consumed during an extremely short period or very a few test cases are tried for an extremely long period, then the fault-detection probability takes small value. Also, it is worth noting that F(T,S) = FT{T)FS(S) if T and S are statistically independent from each other. This implies that the cumulative number of test cases has any corelation with the length of software testing, and seems to be a quite reasonable assumption. In fact, we expect in modeling that the functions E[S | T = r] and E[T | S = s] are both increasing functions of r and s, respectively. 2.2. Special Case:
Geometric
SRM
The most simple but significant fault-detection probability is the bivariate geometric distribution. Following Marshall and Olkin 7 , consider a bivariate discrete random variable (X, Y) with Bernoulli marginals. Then it must be that (X, Y) has only four possible values (0,0), (1,0), (0,1), (1,1) with probabilities po, Pi, Vi, 1 ~Po> respectively, where Cov(X, Y) = po(l - Po) ~ P1P2
(6)
and the corelation coefficient is given by C
°V(*'Y) (7) V(l-P0+Pl)(P0-Pl)(l-P0+P2)(P0-P2) When p = 0, it is obvious that X and Y are statistically independent. For a sequence (X\, Y\), (X2, Y2), • •• of independent and identically distributed bivariate Bernoulli random variables, let T and S denote the number of 0's before the first 1 in the sequence X\, X2, • • •, and in the sequence Yj, Y2, • • •, respectively. Then (T, S) is said the bivariate geometric distributed random variable having the p.m.f.: X
;
\p5(l-po),
0
W
From Eq.(8), it can be seen that Pr{T>T,S>s}=pUp0-p2)s-T, Pr{T < T,S < s} = 1 +pT0-1(Po - p 2 ) s - r { l
(9) Pi(l
-P0+P2)} P0 - P 2
399 Table 1. Model 1 LLF AIC BIC RSS Model 2 (calendar date) LLF AIC BIC RSS Model 3 (number of test cases) LLF AIC BIC RSS
Goodness-of-fit test results
25% -2.4844 12.9688 10.5140 0.1999
50% -12.8818 33.7775 34.5524 1.6122
75% -22.2887 52.5775 55.1337 2.1889
90% -23.8827 55.7654 58.8557 1.9017
100% -25.1100 58.2201 61.7815 2.0168
25% -3.1815 10.3630 9.1353 0.0598
50% -13.5496 31.0991 31.4936 1.7164
75% -22.6226 49.2451 50.5233 2.3807
90% -24.5282 53.0564 54.6016 1.7268
100% -26.1120 56.2240 58.0048 2.0519
25% -5.9631 15.9262 14.6988 0.2251
50% -14.1616 32.3231 32.7176 3.3851
75% -32.9861 69.9722 71.2504 15.8660
90% -38.2754 80.5508 82.0960 14.8416
100% -39.7471 83.4943 85.2750 9.0009
jDl(l - P 0 + P 2 ) '
-Po^fl
PO - P2 > I \s-lf-, Pl(!-P0+P2)\ , 1fV . -P0-P2) \1 \(10) *• P0 - P2 > T h e above bivariate geometric distribution is first considered b y Hawkes 6 . More specifically, Hawkes 6 gets t h e following factorial moment generating function: Ef
T s
[ZlZ2i
] = 1~P°~(Zl
+z2)Cov(X,Y)-ziz2(piP2-p0Cov(X,Y))
U-(PO-Pl)*l)(l-(PO-P2)^)(l-P0*l*2)
'
[
'
By differentiating Eq. (11) it can be verified t h a t t h e corelation between T and S is given by p\/{l - po + p i ) ( l - po + P 2 ) / ( l - Po)- Directly from t h e p.m.f. in Eq. (8), we have t h e regression function:
_P0 Pi
/
P0 y P0-P2
(l-po)Cov(X,y) Pl(l-P0+Pl)(l-P0+P2)'
which is useful t o estimate t h e number of test cases in t h e future if it can not be known in advance. By substituting Eq.(10) into Eq.(5), the twodimensional D-SRM with mean value function A(r, s) = LJF(T,S)
can b e
derived analytically. Because this is also a simple N H P P , its analytical
400
treatment to derive the likelihood function etc. is almost similar to the one-dimensional case and is quite easy.
(a) Model I
Calendar date (b) Model 2
Number of tesl cases ( c ) Maic\
;
Figure 1. Behavior of mean value functions.
3. Real Data Analysis Suppose that 18 data sets (r^, s;, Xi) (i — 1,2, • • • , 18) which are observed in the real software development project are available, where T, (= i), s, and Xi denote the calendar date, the cumulative number of test cases consumed by i-th date and the cumulative number of software faults detected by i-th date, respectively. It is also assumed that three SRMs based on the D-NHPPs with mean value functions; UIF(T,S) (Model 1), UJFT{T) (Model 2) and u>Fs(s) (Model 3) are used for the data sets (r^s^Xi), (T^X*) and (si,Xi), respectively. We apply the maximum likelihood estimation to estimate the model parameters. Table 1 presents the goodness-of-fit results based on the log likelihood function (LLF), AIC (Akaike information criterion), BIC (Bayesian information criterion) and residual sum of square (RSS). Figure 1 depicts the behavior of the mean value functions for three SRMs with estimated model parameters from 100% observations (18 data sets). From the results in Table 1, Model 1 outperforms Model 2 and Model 3
401 Table 2. Model 1 (known) PLL PSE (regression) PLL PSE Model 2 (calendar) PLL PSE Model 3 (number of tese cases) PLL PSE
Predictive performance.
25% -45.8870 62.5045 25% -45.8870 62.5045
50% -12.4966 7.2398 50% -12.4966 7.2398
75% -3.2864 0.9332 75% -3.2864 0.9332
90% -1.2493 0.1002 90% -1.2493 0.1002
25% -57.3280 67.8175
50% -12.6227 2.6266
75% -3.7873 2.8189
90% -1.7498 1.9135
25% -85.7262 73.6092
50% -24.4849 19.6366
75% -4.5548 0.9074
90% -0.0223 1.9043
in terms of the maximum log likelihood functions. This is a quite reasonable result because Model 1 involves additional two parameters comparing with Model 2 and Model 3. Nevertheless, it is worth noting that the difference on AIC and BIC between Model 1 and Models 2 & 3 is at most 2, which is corresponding for just one extra parameter in AIC. This result tells us that Model 1 could explain the underlying software fault data well. For instance, when 100% data are observed, we obtain the maximum likelihood estimates as w = 25.1602, pi = 0.014986, p2 = 1 x lO" 6 and p0 = 0.95529 for Model 1. On the other hand, we have CJ = 49.1821 and px = 0.018439 for Model 2 and u) = 3.611 x 106 and p 2 = 2.783 x 10~9 for Model 3. Hence, the corelation between T and S in this data is closed to one, but the parameter p2 is quite insensitive to the model prediction. That is, our two-dimensional SRM can involve both special cases completely and can show the dependency of the number of test cases in software reliability assessment quantitatively. Next, we investigate the predictive performance of the resulting SRM. Once the model parameters are estimated, we predict the future behavior of the cumulative number of faults based on the mean value function. Table 2 presents the predictive model performance for three SRMs, where PLL and PSE denote the prediction log likelihood and the prediction squared error, respectively. In Model 1, 'known' and 'regression' correspond to the cases that the number of test cases is completely known and that it is unknown but can be estimated by the regression equation in Eq.(12). From Table 2, it is seen that Model 1 can provide the best or second best predictive
402 performance in each observation point with different criterion. In Fig. 2, we plot the software reliability as a function of calendar time. Compared Model 1 with Model 2, the former tends to give the more optimistic reliability estimation t h a n the latter.
.
1
j... " % 5 0.6
•
• Model 1
•
• Model 2
X~\^
J" l»
^ """-•
0.2 0
"\ \
•3 0.6
"**--
l0.4
1
1
• Model 1
•
-• Mode! 2
V\
'•^\^
0.2
~~»-7C^~—^^^^
d
2
•
1
2
3 4 Calendar Dale
(.'ale ndar Dale (b) 75% data point
(a) 90% data point
1
<
•
l
• Model t
• - - - - - • Mode! 2
f o ,
\
= U.b 2
v•x
*,
1 0.6
*.
s
s^»
lo.4
o,
|04
1 0.2
2
• Model 1
•
-• Model 2
• ••
* " * ' " • - •
3
"V 5
•
0.2 4
(c) 50% data point
6
8 Calendar Dale
0
2
4
6 (lj)
8
10
12 14 Calendar Dale
25% data point
Figure 2. Behavior of software reliability.
References 1. H. Pham, Software Reliability, Springer, Singapore (2000). 2. S. Yamada, S. Osaki and H. Narihisa, Software reliability growth modeling with number of test runs, Trans, of IECE of Japan, E-67, 79-83 (1984). 3. S. Yamada and S. Osaki, Discrete software reliability growth models, Applied Stochastic Models and Data Analysis, 1, 65-77 (1985). 4. F. W. Scholz, Software reliability modeling and analysis, IEEE Trans, on Software Eng., SE-12, 25-31 (1986) 5. H. Okamura, A. Murayama and T. Dohi, A unified parameter estimation algorithm for discrete software reliability models, Opsearch (in press). 6. A. G. Hawkes, A bivariate exponential distribution with applications to reliability, J. Roy. Statist. Soc. Ser. B, 34, 129-131 (1972). 7. A. W. Marshall, I. Olkin, A family of bivariate distributions generated by the bivariate Bernoulli distribution, J. Amer. Statist. Assoc, 80, 332-338 (1985).
N O N H O M O G E N E O U S POISSON PROCESSES B A S E D ON B E T A - M I X T U R E S IN SOFTWARE RELIABILITY MODELS
DAE KYUNG KIM Division of Mathematics and Statistical Informatics Chonbuk National University, Chonju 561-756, Korea E-mail: dkkimQ chonbuk. ac.kr IN-KWON YEO Department of Statistics Sookmyung women's University, Seoul 561-756, E-mail: [email protected]
Korea
DONG HO PARK Department of Information and Statistics Hallym University, Chunchon, 200-702, Korea E-mail: dhparkQhallym. ac.kr
This paper deals with the software reliability model based on a nonhomogeneous Poisson process. We introduce a new family of mean value functions which can be either NHPP-I or NHPP-II according to the choice of the distribution function. The proposed mean value function is motivated by the fact that a strictly monotone increasing function can be modelled by a distribution function and that an unknown distribution function can be also approximated by a mixture of beta distributions. Many existing mean value functions can be regarded as special cases of the proposed mean value functions. The maximum likelihood approach is used to estimate the parameters contained in the proposed model.
1. Introduction The software plays a critical role not only in the fields of sciences and businesses, but also in daily life where electronic devices such as cars, telephones and television sets are operated by executing the relevant software systems. Although the advanced techniques have enhanced the production of fault-free software system, a number of softwares which are required to operate with high reliability still need to undergo extensive testing and debugging procedures. While it was believed in general in the past years that 403
404
the system failure is mainly caused by the defective hardware or the poor maintenance of the system, the software failure has become a dominant cause of the system unavailability lately. Thus, the accurate assessment of software reliability becomes an important objective of software engineers when developing a high-quality software system. Many software reliability models have been proposed and studied in the literature to analyze the observed failure data during the testing phase and to assess the reliability of the developed software system. A common procedure for modelling the number of failures of a software is to fit an appropriate nonhomogeneous Poisson process (NHPP) with the available failure data. Let N(t) be the number of failures of the software system observed over the time interval (0,t\. We assume that N(t) is modelled by a nonhomogeneous Poisson process with mean value function m(t). The number of failures,N(t), is also specified by its intensity function A(£), which is the derivative of m(t) with respect to t. Then, the distribution of N(t) is given as p-m{t)
P{Nit)=n}==l
m(f\n
_pi_,
n= 0)i,2,....
Many different choices for m{t) and A(i) have been developed and studied in the literature, for instance, Duane (1964) process with A(£) = a/3i a _ 1 , Cox and Lewis (1966) with \(t) = exp(a + (3t), Goel and Okumoto (1979) with m(t) = 6(1- e-P*), Goel (1985) with m(t) =9(1e-^"), and Musa and Okumoto (1984) with A(£) = a/(t + a). These functions can be in general classified into the following two classes; • NHPP-I : lim.m(f) = c < oo, t—*oo
• NHPP-II: lim m(t) = oo. t—>oo
In this paper, a new type of mean value function which is based on mixtures of beta distribution functions is proposed. The motivation for the proposed mean value function is that a strictly monotone increasing function can be modelled by a distribution function. The mixture model approach models an unknown distribution function using a dense class of mixtures of standard distributions. The EM algorithm is introduced to obtain the maximum likelihood estimates of the parameters characterizing the proposed model. In addition, we also compare the proposed model with several existing NHPP models using several criteria by fitting a real testing data set reported by Pham and Zhang (1997) and Tohma et aL(1991).
405 2. Mean value functions based on beta mixtures For the purpose of searching for the mean value function that is increasing in time t, it is helpful to recall that modelling a strictly monotone function is equivalent to modelling a cumulative distribution function (CDF), for instance see Mallick and Gelfand(1994). As argued by Diaconis and Ylvisaker (1985), the discrete mixtures of beta densities provide a continuous dense class of models for densities on [0,1]. This implies that an unknown distribution function can be approximated by a mixture of beta distribution functions. Let B(u;c,d) be the incomplete beta function evaluated at u with parameters c and d, i.e., the beta CDF. Note that m(t) is a monotone increasing function on R+. Since B{u;c,d) provides various patterns of an increasing function on [0,1] by appropriate choice of c and d and a value on [0,1] can have an one-to-one mapping to a value on R+ by a quantile function, a rich class of mean value functions can be obtained by taking the form
m(t) = G-1 hf>,£{Fo(i);a,,/3,};/i
(1)
where L is the number of mixands and wj's are the mixing weights, wj > 0 and Yli=i w* = 1- The function F0 denotes a centering distribution function and it is assumed that FQ can be chosen so that the resultant m(t) is wellbehaved. Without any prior information about FQ, the uniform distribution on (0, T) is a naive choice for the time truncated model with a terminal time T. The function G~l is the inverse function of a continuous CDF G which is indexed by a parameter fj,. We can make m(t) either NHPP-I class or NHPP-II class according to the choice of G. Note that the distribution G whose support is bounded interval on R+ leads m(t) to belong to NHPP-I and G supported on the whole R+ leads to the NHPP-II class. Since the specification of u> = (u>i,.. .,u>L)T, a = (a%,. •. ,az,) T ', and (3 = (/?i,... ,/3L)T is too complicated to deal with, as was done by Mallick and Gelfand (1994) we set ai - al and ft = a(L + 1 - /). Then, the resultant m(t) can be written as m(t;d)
= G _ 1 J2wiB{F0(t);al,a{L
+
l-l)};n
(2)
i=i
where 6 = (u>, a, n)T. B {F0; al, a(L + 1 — 1)} has various shapes depending on the values of a and /. Although the large values of L may provide
406
more delicate and precise descriptions for m(t), L = 3,4, and 5 may be sufficient enough in practical situations. The mean value function (2) provides various and flexible shapes to fit a given data. However, it may not be easy to interpret its behaviors, mainly because m(t) consists of the mixture of beta CDFs. Note that we can make m(t) much simpler by setting, for a specific I, uii = 1 and wt, = 0 for all k ^= I. By doing so, we do not need to estimate w, but instead we only need to select the beta CDF which describes the behavior best out of L beta CDFs. The resulting mean value function for the selection of I becomes mi{t; 9) = G - 1 [B {F0{t); al, a{L + 1 - I)} ; /i],
(3)
and the parameters to be estimated reduce to (/x, a) 3. Estimation of Parameters Usually, modelling the software reliability growth models are based on using either interval domain data or time domain data. For the interval domain data, sometimes called grouped data or error count data, we have the cumulative number of failures of software n^ that have been observed during the time intervals (0,tj], i = 1,2, . . . , n and 0 < t\ < i 2 < • • • < tn. In this case, the log-likelihood function for the interval domain data can be written as n
U 0 ) = X)(«i - m-i)log{m(ti;0)
- m(ti-i;0)}
- m(tn;9),
(4)
t=i
where no = 0 and to = 0. For the time domain data, sometimes called interfailure data or failure times data, we obtain the times ti, i = 1,2,..., n, that denote ordered epochs of the observed n failure times, with possibly the terminal time T. Then, the corresponding log-likelihood function is written as JnW = 5 3 l o g { A ( t i ; e ) } - m ( t n ; f l ) ,
(5)
i=i
where A(i; 9) = dm(t; 9)/dt denotes the failure rate function or intensity function and m(tn; 9) is replaced by m(T; 9) for the time truncated model. Let b{-; a, /?) be the beta density function with shape parameters a and ft, and let g and /o be the density functions of G and FQ, respectively. Taking derivatives on both sides of L
G {m(t); fi} = ] T uiB{F0{t); al, i=i
a(L+l-l)},
407
we derive the intensity function
u* m _ ELi"iHFo(th^°(L
+ l ~ l)}fo(t)
Notice that, for the mean value function (2), subject to the parameter space, for I = 1,...,L, wj > 0 and J ] f = i w ' = * » w e should apply the method of Lagrange's undetermined multipliers to obtain the maximum likelihood estimators for 0 as follows.
Q(o,\) = in(o) +
\(jrwi-i\.
In this case, the maximization by the Newton-Raphson algorithm is prone to failure because of the constraints on wj's. For the mean value function (3), we compute the maximizer (fj,,cr) of (4) or (5) for each / and then select the maximum likelihood estimates out of L maximizers. This procedure is much easier to implement in practice. When G is taken as an uniform distribution, the EM algorithm may be applied to estimate the parameters. The EM algorithm formulated by Dempster et al. (1977) is a very useful tool to obtain the maximum likelihood estimate for the case where analysis of complete data is relatively simpler than that of incomplete data. We illustrate this procedure with the interval domain data. For a simple notation, let Xi = N(ti) — N(ti^i) be the number of failures over the time interval (tj_i,ti] and let Su = mi{ti\0) — mi(U-\;0). When G is uniform on (0,fj), m(t;6) = nY,t=iUiP> {FQ(t);al,a(L + 1 - / ) } = £)f = 1 wjmi(i;0) and one possible interpretation of m(t; 0) is that Xi comes from one of the L Poisson processes with each mean Su and probability OJ/, / = 1,2,... ,L. We define the complete data Zt = (Xi, Yi), where Yt denotes the indicator of where Xt is coming from. Then, the marginal probability of Yi becomes P(Yt = I) = LOI and the joint probability density function of Zi = (Xt,Yi) is, for X; = 0 , 1 , . . . , and j/i — 1,2,...,L, P(Xi = xu Yi = ^
= P(Y = Vi)P(Xi
= Xl\Yi = yi) = ioyi6
V ,6 yli ' ] .
Thus, the normalized log-likelihood function of 6 given z = (zi,..., obtained as n
ln(0;z)
L
= ^2YlI(Vi i = l (=1
= I) {~5u + Xi\og(SH) + \og(uji)} ,
zn) is
408
where I(yi = I) — 1 if yi = I and zero otherwise. With starting value 0^°\ the E-step computes the conditional expectation Q(o\x,0io))=E{ln(O;Z)\x,oW}
Q{O) = n
L
= J2J2 (~6ii + X i l o s(*»)+ l o s(^)} p {Y* = ^ > 0 ( o ) ) • 2=1
1=1
Applying the Bayes theorem, we obtain ,,(0) „ - 5 < 0 ) l ( 0 ) X i
uu ^->3 = l
3
3*
In the M-step, the conditional expectation Q{6) can be divided into the weighted log-likelihood functions ^™=i YA=I &U {—$U + xi l°g(^j)} and 127=1 Xw=i &u log(u;i), and the update of (fi, a) and u> is based on maximizing these functions separately. The updated values for the mixing weight becomes UJ\ = n _ 1 Yl"-i un- Then, we go back to the E-step with the updated value and iterate the same process until the convergence occurs. 4. D a t a Analyses and Model Comparisons The following four mean value functions are considered as testing functions for the purpose of comparisons. • Model 1: m(t) = a (l - e" 6 ') • Model 2: m(t) = a (l - e~bt) (l - ^) + aat
.Modd3:mW =
I T
|
F 5
{(l-e-
w
)(l-^)+a*}.
• Model 4:
Model 1 was discussed by Goel and Okumoto (1979), Model 2 by Yamada, Tokuno, and Osaki (1992), Model 3 by Pham and Nordmann (1997), and Model 4 by Zhang and Pham (1998). Mean squared error(MSE) and predictive-ratio risk(PRR) are used as the criteria to measure the performance of each model. For detailed discussions for those performance measures, see Pham and Chao (2003). These measures are defined as follows, respectively:
M S B - j ^ t w w - * M ) " .
^ ' t l ^ p
1
}
2
.
409 where m{t) stands for the estimated cumulative number of failures observed at time t and k represents the number of parameters in the models. For both criteria, the smaller the value is, the better the model fits. Pham and Zhang (1997) reported the data set which consists of 25 observations as the number of failures. In order to analyze this data set, it is assumed that the centering distribution F0 is uniform over the interval (0,28) and G is uniform over (0,/i). Letting L = 3 and using the EMalgorithm, we obtain that a> = (0.998,0.000,0.002), pi = 137.524, and a = 0.579. This implies that almost all parts of m(t; 0) can be explained by mi(t;0). The corresponding MSE and PRR are obtained as 8.587 and 0.027, respectively. Under the same circumstances, the maximum likelihood estimates for the mean value function (3) when I = 1 are fi — 137.426 and a = 0.578 which give the smaller MSE and PRR than those of mean value function given in (2). Table 1 shows that both MSE and PRR of the proposed model based on (3) and Models 1-4. It is shown that the proposed model has the smaller MSE and PRR than those of other NHPP models except the PRR of Model 3. The superiority of Model 3 in PRR seems to be due to the better fitting around the starting earlier times. Tohma et al.(1991) reported a set of real data that was recorded from testing the program designed for monitoring and real-time control. The program consists of about 200 modules and each module has, on average, 1000 lines of a high-level computer language. The recorded data are composed of 111 observations based on the number of failures. We consider the same settings as the above except the centering distribution Fo being uniform over (0,115). For numerical computations, I — 1 is selected and the corresponding maximum likelihood estimates for a and /x are computed as a = 1.415 and jl = 481.053, respectively. This implies that the corresponding beta density is skewed to the right and the mean value function is increasing fast in the early days and rather slowly in the latter terms. The
Model Model 1 Model 2 Model 3 Model 4 Proposed
Pham's MSE 38.663 15.535 9.681 26.041 7.887
Data PRR 0.647 0.045 0.023 0.083 0.027
Tohma's MSE 1008.461 732.875 322.875 309.356 283.584
Data PRR 5.115 4.301 1.574 1.670 1.340
410 maximum likelihood estimates for t h e competing models were presented in Zhang, P h a m , and Vu (1999). Table 1 shows t h a t the proposed model is superior t o other N H P P models a n d fits t h e given d a t a set b e t t e r . References 1. Cox, D. R. and Lewis, P. A. (1966), Statistical Analysis of Series of Events. Methuen. 2. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), Maximum likelihood from incomplete data via the EM algorithm (with discusstion). Journal of the Royal Statistical Society Series B 39, 1-38. 3. Diaconis, P. and Ylvisaker, D. (1985), Qauntifying prior opinion. In Baeysian Statistics 2, 133-156, North-Holland, Amsterdam. 4. Duane, J. T. (1964), Learning curve approach to reliability monitoring. IEEE Transaction on Aerospace 2, 563-566. 5. Goel, A,L.(1985), Software reliability models: Assumptions, limitations, and applicability, IEEE Transactions on Software Engineering 1 1 , 1411-1423. 6. Goel, A. L. and Okumoto, K. (1979), Time-dependent error-detection rate model for software and other performance measures, IEEE Transaction on Reliability 28, 206-211. 7. Mallick, B. K. and Gelfand, A. E. (1994), Generalized linear models with unknown link functions, Biometirka, 8 1 , 237-245. 8. Musa, J. D. and Okumoto, K. (1984), A logarithmic Poisson execution time model for software reliability measurement. In Processions of Seventh International Conference on Software Engineering, Orlando, 230-238. 9. Pham, H., and Chao, D. (2003), Predictive-ratio risk certerion for selecting software reliability models, ISSAT 9th Conference Proceedings of the International Conference on Reliability and Quality in Design, Hawaii. 10. Pham, H. and Nordmann, L. (1997), A generalized NHPP software reliability model, Proceedings of the Third ISSAT International Coference on Reliability and Quality in Design, 116-120. 11. Pham, H. and Zhang, X. (1997), Optimal release policies for a software cost model, Proceedings of Third the ISSAT International Conference, Reliability and Quality in Design 236-241. 12. Tohma, Y., Yamano, H., Obha, M., and Jacoby, R. (1991), The estimation of parameters of the hypergeometric distribution and its application to the software reliability growth model, IEEE Transactions on Software Engineering 17, 483-489. 13. Yamada, S., Tokuno, K., and Osaki, S. (1992), Imperfect debugging models with fault introduction rate for software reliablity assessment, International Journal of System Science 23, 2241-2252. 14. Zhang, X. and Pham, H. (1998), A software cost model with error removal times and risk costs, International Journal of Systems Science 29, 435-442. 15. Zhang, X., Pham, H., and Vu, M (1999), Comparison of NHPP software reliability models, Proceedings of Foutrh the ISSAT International Conference, Reliability and Quality in Design, 66-71, Seattle, Washington, USA.
SIMULATION METHODS FOR PARAMETER ESTIMATION OF INFLECTION S-SHAPED SOFTWARE RELIABILITY GROWTH MODEL HEE SOO KIM Department of Social Systems Engineering, Faculty of Engineering, Tottori University, Minami 4-101, Koyama, Tottori-shi, 680-8552, Japan DONG HO PARK Department of Information and Statistics, Hallym University, Chuncheon 200-702, Korea SHIGERU YAMADA Department of Social Systems Engineering, Faculty of Engineering, Tottori University, Minami 4-101, Koyama, Tottori-shi, 680-8552, Japan The inflection S-shaped software reliability growth model (SRGM) proposed by Ohba(1984) is one of the most commonly used SRGM. One purpose of this paper is to estimate the parameters of Ohba's SRGM by applying the Markov chain Monte Carlo techniques to carry out a Bayesian estimation procedures. This paper also considers the optimal software release problem with regard to the expected software cost under this model based on the Bayesian approach. The proposed methods are shown to be quite flexible in many situations and the statistical inference for unknown parameters of interests is readily obtained.
1. Introduction This paper adopts the Bayesian approach to estimate three parameters of the inflection S-shaped reliability growth model and to solve the optimal software release problem by assigning certain priors to the relevant parameters contained in the model. For such purpose, we define several new random variables which represent the reduction of faults in the system and the improvement of fault detection skills, etc. as the software system is upgraded. The prior distributions for such random parameters are assumed to be either Beta distribution or discrete Beta distribution, which is known to be quite flexible to represent the prior uncertainty. This paper consists of five sections. Section 2 gives detailed description of the inflection S-shaped SRGM suggested by Ohba (1984). Section 3 briefly describes the estimation procedures for the unknown parameters by both maximum likelihood method and Bayesian approach. For the Bayesian estimation, the Metropolis Hastings algorithm within Gibbs 411
412 sampling is used. In Section 4, we assume certain prior distributions for the unknown parameters to be considered and derive its corresponding conditional marginal posterior distribution for each parameter, which is necessary to apply the Bayesian method. In Section 5, the optimal software release problem is established based on the Ohba (1984)'s inflection S-shaped SRGM and its optimization with respect to the expected software cost is presented. 2. Inflection S-shaped SRGM Let M(t) be the cumulative number of faults detected until time t and let m{t) denote the mean value function of an NHPP describing the software fault detection process. The mean value function for the inflection S-shaped SRGM, which is proposed by Ohba (1984) is defined as ( l-e-*' ^ m(t) = N 0) For the inflection S-shaped SRGM, TV and (f> denote the s-expected number of initial faults latent in the program and the failure detection rate, respectively and y (0 < / < 1) is the inflection rate which indicates the ratio of the number of detectable faults to the total number of faults latent in the program. The intensity function for the inflection S-shaped SRGM is obtained as dm(t) = N<j>(\ + ((l-y)/r))e-^ (2) dt {\ + {{\-y)ly)e-*'f For more detailed discussions on the inflection S-shaped SRGM, see Ohba (1984). 3. Estimation of Parameters Let (zk,tk), k = \,---,n, be the observed n pairs of data, where zk denotes the number of failures observed up to time tk and thus zk+x > zk for tk+l > tk . If the number of failures occurred follows a Poisson distribution, the joint probability that the pairs of data (zk,tk), are observed can be expressed as Pr{Af(0) = 0,M(fO = Z|r--M((,,) = z,,} = fI { w ft ) ~"'ft-' : ) } '' V '
exp{-("•(<,)-"»((,_,))}.
(3)
Let£>, ={(z 0 =0,tQ =0),(z 1 ,/,),---,(z„,? n )}represent the observed n pairs of data. Then, the likelihood function for the inflection S-shaped SRGM can be written as
m m ) ^ n
1
1
i
1
^ ^ '
1
1+((1
(*,-*H)!
-^"-'J
expf-
N
^ ' \
1. W
\\ i+(0-r)/rK*-J
413 3.1. Maximum Likelihood Estimation The log- likelihood function for three parameters N,$ and y can be written as lnL = \nL{N,t,r\D,) = z„ lniV + ^ z - V i ) l "1 - I n > (z, -z y ,)
tt
''*'*<", ^
I1~\,\-H\
(5)
——.
l + ((l-y)//K"-
By taking the partial differentiations of In L with respect to each of three parameters N, (j> and y and setting them equal to zero, we obtain the following three log- likelihood equations. ainZ, = glnZ, _ ainZ, dN ~ 3$ ~ dy '
0-
(6)
The maximum likelihood estimates for Af,^ and / can be calculated by solving the equations, given in (6), simultaneously. Ohba(1984) presents the maximum likelihood solutions for N and (/). 3.2. Bayesian Estimation Given the data Dt , the joint posterior distribution of N,<j) and y can be expressed as
For the inflection S-shaped SRGM, the joint posterior distribution for N,> and y , given Dt has the following expression.
1-e^''
^•nk(a-r)/r>^''
l-e^'H
i+(a-r)/r>^''-
h-=,-i)
•exp
{ i+(0-r)/r>,-i".
where g(N) , g(0) and g(j) denote the prior distributions of N,> and y, respectively and these three parameters are assumed to be independent. To compute the Bayesian estimates for N,(/> and y, we adopt the Metropolis Hastings algorithm within Gibbs sampling which is a special case of a Markov chain Monte Carlo(MCMC) technique. We first generate a Markov chain, whose states are the parameters N,(j) and y, and whose steady-state (stationary) distributions are n(N\0,r,Dt) , K{(j>\N,r,D,) and 7[(r\N,^,Dt) , which are the posterior distributions of N,(/> and y, respectively. Using these three posterior densities, the random variates are generated sequentially for N, <j> and y following the steps stated below.
414
i ) Take an initial values of N ,0 and y, which are denoted by JV(O),0
ii) Generate a random draw Nm of N from ^(JVU ( 0 ) ,/ 0 ) ,£),). iii) Generate a random draw ^ (1) of (j) from 7r(AN0) ,y(0\D,). iv) Generate a random draw / ( 1 ) of / from n(y\Nm,$m,D,). v ) Repeat the steps (i i) - (iv) until the / ' iteration to obtain / random draws of N,(/> and / . Under very mild regularity conditions, this Markov chain converges to a station-ary distribution for large / and the vector (JV (/) ,^ (/) ,y (/) ) has a distribution that is approximately equal to n{N,
jr(^,y,A)«JV'--«xJ-
^l~C?-A
•
(8)
[ l + ((\-y)/y)e ""J In case the only information available for N is that the values of a parameter are bounded on (a,b), the uniform prior for N can be used. Based on this prior, the conditional marginal posterior distribution for N can be written as
415 Next, we suppose that only a prior mean can be elicited on TV, denote it by nN . Under this situation, an exponential prior, which has the property of maximizing the entropy under the constraint of prior mean only, can be used and the conditional marginal posterior distribution for N becomes 7i{NU,y,D,)x Nz- sxp\-
(
1
N +
^N
.
Ml-e-^)
\+((\-r)/y)e-"»
If not only the prior mean but also the prior standard deviation, denote it by aN, are available, then the prior information on the failure process is more detailed and so we may formalize the following gamma prior with shape and scale parameters CC and /? . g{N) = £-N"-x T{a)
exp(-/? N),
a = {jiN /aN f, 0 = MN 1*1 > 0 •
Under the gamma prior, the conditional marginal posterior distribution for N is proportional to n(N\^y,D,)
oc Nz"+a~] exp PN + ^ " ^ \ + {(l-y)/y)e-*'" V
Two other parameters, ^ and y, assume the values between 0 and 1 and thus, it is natural to consider a beta distribution, of which the uniform distribution on (0,1) is a special case, as its priors. The prior distribution for y with its hyper parameters a and p has the following probability density. g(y) =
y
(l-y)
, a >o,fi >o-
In the similar manner as for TV, the conditional marginal posterior distribution for y can be expressed as
^ww-'(i-y/-'exJ- *c-^> j-nj •-*"",
'-^'-' p'.
Similarly, the prior and conditional marginal posterior distribution for
in the above
expressions. By applying the Metropolis Hastings algorithm within Gibbs sampling, the Bayes estimates for ^ and / may be obtained numerically. 5. Optimal Software Release Policy Consider a situation where a new version of an existing software system has been developed and the old version has been already released to users and the
416 failure data was collected during the testing period for the old version. In this situation, it is desired to determine the optimal release time of the new version by using the information on the software failures which has been observed for the old version. To determine the optimal release time of the new version, we assume that the new version has the following mean value function. rn.it) = N' where A"= kNN
(0
\+
(9)
{{\-y')ly')e-^)
(1 < k, < -) and y<= ky
(1 < £ < - ) •
<j>
y
This function incorporates the improvement factors kN,k^ andk into the new model by assuming that the fault detection skill and ability of the software developer was improved as the experience has accumulated through the tests of the old version. Let ct ,c and cm denote the testing cost per unit time, the removal cost per fault during the testing period and the removal cost per fault after releasing the software, respectively. Then, the total expected software cost can be formulated as
c(T)=c,r+Cym„(7>c„,k(rt)-/w„(r)] (
\-e*'T
\
i+((i-/)//>-" T ) = c,T+{cr-cm)NkN
l-e
(
(10)
)
'" U+((i-/)//X (
-*k,T
i+W-rKVirW
l-e^
-*k,T
+cmNkN
l-e
lHQ-rKWK))!
J
where T and TL represent the software release time (that is, length of testing period) and the length of software life cycle, respectively, with TL>T • As a prior distribution for kN, we take a beta distribution with hyper parameters aN and fiN on (0,1). The prior distribution for kN is given by
g^N) = l{"N^\kN^\\-kNt-\
0
(11)
As for the prior distributions for the parameter k^ and ky, we consider the discretization of a beta density, which has been discussed in Soland (1969) and Juang and Anderson(2004). The prior distribution for k^ is given by P, = Pr(*, = *„,) = t"+**'2g2{u)du ,
(12)
417 where k4, = kfL + <J, (2/-1) / 2 and St={kfU-k^lm^ Here, v P _ 2,°-'
anc
for / = l,2 J -,m„ •
^ £2 (M) ^s a ^ e t a density defined as
The prior distribution for £ y , which is a discrete beta density on (1,1 / 7 ) , is defined as follows. Pj =Pr(kr=krj) where kyj=kyL+5r{2j-\)I2 Here, A
ft(v) =
= ti+*r'2g3(v)dv,
and
(14) for y = l,2,-,m r •
_ and g3(v) is a beta density defined as follows.
r(«r)roffr)
(A^-V)"'^- 1
*
r
^ r
'^
Given the priors (11), (12) and (14), b y taking the expectation on C{T) of (10) with respect to kN,k,
and k the Bayesian total expected software cost
can be expressed as CB(T) = V>,*, C(T> = c ' r + ( c r " O -
a
N
+
PN
e
J l=\ y=l
\-e-*k»T ,^^,7-
l k i+(0-r* r y)/(r* r y)>'
W, (16)
7JPy. «N+/V Li+W-r^VCr^py^J Differentiating the equation (16) with respect to T a n d setting it equal to zero, the following relation is derived. +C„
IE
f*M(i-y^)/(M„)+i)e~' ,,v 2
t + (0-/^)/(/A: J , ; )) e - }
,v
(17)
<*N + PN
^(cm-cr)
Let g(T) and C denote the left-hand side and the right-hand side of (17), respectively. Then, we have sig d " . — CB{T) = C - g ( T - ) , where g ( 0 ) > o . g(oo) = 0 , 0 < y < l and C* assumes the same sign as cm-cr
.
Due to the fact that ^ . ( r ) is concave for each / and j and the linear combination of concave functions is a concave function, g(T) is a concave
418 function. Thus, there exists a value T, denote it by T0 , such that g(T) is increasing for T
g ( o)
< C*, g(r 0 )> C* and C a (0) > C s (7;) ,
the optimal release time is T* =Th(B5) If g (0) < C', g(T0)> C' and CB(Q) < CB(Tb), the optimal release time is T* = 0. References 1. A. Gelman and D. B. Rubin, Statistical Science 7, 457 (1992). 2. A. L. Goel and K. Okumoto, IEEE Trans. Reliability 28, 206 (1979). 3. M. G. Juang and G. Anderson, European Journal of Operational Research 155,455(2004). 4. P. K. Kapur and R. B. Garg, Microelectron Reliab. 31, 39 (1991). 5. L. Kuo, J. C. Lee, Kiheon Choi and Tae Young Yang, IEEE Trans. Reliability 46,76(1997). 6. L. Kuo and T. Y. Yang, J. Amer. Statis. Assoc. 91, 763 (1996). 7. M. Ohba, IBM J. Research and Development 28,428 (1984). 8. R. M. Soland, IEEE Trans. Reliability 18, 181 (1969). 9. S. Yamada, M. Ohba, and S. Osaki, IEEE Trans. Reliability 33, 289 (1984).
A S T U D Y ON B O O T S T R A P C O N F I D E N C E INTERVALS OF S O F T W A R E RELIABILITY M E A S U R E S B A S E D ON A N I N C O M P L E T E G A M M A F U N C T I O N MODEL*
M. K I M U R A Department of Industrial & Systems Engineering, Faculty of Engineering, Hosei University 3-7-2, Kajino-cho, Koganei-shi, Tokyo, 184-8584, Japan E-mail: [email protected]
This study focuses on the generalization of several software reliability models and the derivation of confidence intervals of reliability assessment measures. First we propose an incomplete gamma function model, and discuss how to obtain the confidence intervals from a data set by using a bootstrap scheme. A two-parameter numerical differentiation method is applied to the data set to estimate the model parameters. We also show several numerical illustrations of software reliability assessment.
1. Introduction For over these three decades, a lot of software reliability growth models have been proposed in the literature of software reliability assessment issues 1,2 ' 3 . This fact shows that each model has some advantages for several data sets, however, the model is not always applicable to all kinds of data. Therefore we have a number of software reliability models so far. In order to overcome this complication on model selection, we discuss a method of generalizing several proposed models in this study. In particular, we deal with some growth curve models for software reliability assessment. These models have been known as the models which describe the time behavior of the cumulative number of detected software faults. We show that an exponential, delayed S-shaped, and Gompertz curve models can be included in an incomplete gamma function model, which is newly proposed in this study. * This work was partially supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (C), 18500066, 2006.
419
420
We also propose a method of parameters estimation by using a twoparameter numerical differentiation method. This method allows us to use a linear regression analysis for the data set. After discussing the modeling and the method of parameters estimation, we present how to obtain the confidence intervals of software reliability assessment measures by using a bootstrap method. 2. Generalization of Growth Curve Models In this study, we assume that a data set forms (ij, yt) (i = 1,2,... , n), where £; means the «-th testing time recorded and yi the cumulative number of detected (and removed) software faults up to ti. We also assume that 0 < 2/i < y2 < • • • < Vn is satisfied. One of our main concern is to predict the future behavior of (tj,yj) (tj is given, and j > n). In order to investigate an appropriate function to describe the time behavior of such data sets, we focus on the following three growth curves 1 ' 2 ' 3 . M{t) = m1(l-e-m2t)
(mi > 0 , m 2 > 0 ) , d2t
D(t) = di(l - (1 + d2t)e- ) p[ 93i]
G(t) = gigf -
(dt > 0, d2 > 0),
(31 > 0, 0 < g2 < 1, g3 > 0).
(1) (2) (3)
Equations (1) and (2) are often used as the mean value functions of the nonhomogeneous Poisson process models, which are widely known as software reliability assessment models 1 ' 2 ' 3 . They are called an exponential software reliability growth model (SRGM, for short) and delayed S-shaped SRGM, respectively. Equation (3) is the so-called Gompertz curve, which has been applied to describe various growth phenomena in many research areas, not only used as a software reliability growth curve. Based on these functions, we can find the following relations on them. ,
rdM(t).
log{
dt
,
,
} = log{mim 2 } - m2t,
l o g { ^ / i } = log{cM 2 } - d2t,
(4) (5)
\og{^-/G(t)} = log{ 53 log - } tot. (6) dt g2 These formulas suggest that if we directly obtain the left-hand side values of each model when t = U (i = 1,2,... ,ri) from the data set, we can use linear regression scheme4 to estimate the unknown parameters appeared in the right-hand side of the above equations. We describe it as follows.
421
2.1. Two-Parameter
Numerical
Differentiation
Method
In such a situation, the method of numerical differentiation by using n data pairs of (tj,2/$) is often used. For instance, the i-th value of Jf' in Eq. (4) can be given by
{
i(»»+i-»i
I Vi-Vi-i)
( i < i <
n
- i )
> ^ = i
(')
(t = n)
where t 0 = 0 and yo = 0. That is, we extract the central difference from the data set. The basic idea of taking the central difference can be seen in Yamada & Somaki 5 . In this study, moreover, in order to deal with Eqs. (4), (5), and (6) by unit formula, we introduce a non-negative, increasing, and differentiable function H(t), and its transformed value z(c,d,ti). By using them, we formulate the following relations.
log { ^ V ( W } U , = z{c,d,U) = l
(8)
In the above equations, we introduced two parameters c and d to obtain more applicability of the model. We call this transformation the twoparameter numerical differentiation method. Consequently, we see a linear relation between U and z(c,d,ti) as z(c,d,ti)
= A-Bti
+ ei (i = l , 2 , . . . , n ) ,
(9)
from Eqs. (4), (5), and (6), where A and B are the constant parameters, which can be estimated under the least squares rule. Also we assume that Cj is the z-th realization of i.i.d. random variable of which mean is zero.
2.2. Incomplete
Gamma
Function
Model
From Eqs. (8) and (9), we have a differential equation as follows. log{^-/H(ty/td} = A-Bt.
(10)
422
This equation describes the mean behavior of z(c, d, ti) when ti is given. Equation (10) can be solved with an arbitrary constant C\ as follows. (1 - c){Cx - -^T[d
+ 1, Bt}} ^
(if c / 1)
H(t) = I
(11) . [Ci expl-^Tid
+ 1, Bt]]
(if c = 1)
where r[m,x] is the incomplete gamma function defined as am-1e-'ds.
T[m,x] = /
(12)
JX
In Eq. (11), If c ^ 1, we choose the initial condition H(0) = 0 to decide C\ then //(£) can be rewritten as H(t)
£^*{T[d+1,0]-r[d+l,Bt}}
(13)
As a special case, setting (c = 0, d = 0) and (c = 0, d = 1) respectively yields H{t)
B
{l-e~»%
(14)
aA
H(t) = -^(1-(1
-m).
+ Bt)e
(15)
These equations correspond to M(t) and D(t) in Eqs. (1) and (2), respectively. Similarly, we can confirm that the function H(t) with (c = 1, d = 0) becomes the Gompertz function G(t) in Eq. (3). Thus we call H(t) derived by Eq. (11) the incomplete gamma function model, and propose it for the software reliability data analysis. 3. Method of Parameter Estimation This section discusses a method of parameter estimation. We here recall Eq. (9) as z(c,d,U) = A-Bti
+ ei (i = 1 , 2 , . . . , n ) .
By using the least squares estimation, the parameters A and B can be analytically obtained by 1 "
n
1
A = - V z(c,d,U) + B x - V t u i=\
2_/i=l(^i
(16)
i=l
—
(17) n Ei=l^»)
423
Note that these A and B are the functions of the unknown parameters c and d. Therefore the sum of squared residuals, S(c,d), is derived by n
"£e2
S(c,d) =
n
= E[«
z(c, d) + ((, - t )
t=l
E
" ^ ' J ^ * ' ' ' 12,(18) E?=i(*i-*)
where 1 " ;z(c,d) = - V z ( c , d , t i ) , n ^-^
1 " t = -V*i. n •'—'
(19)
Minimizing S(c, d) with respect to c and d, we obtain c and d numerically. A and JB can be also estimated via Eqs. (16) and (17) with c and d. However, since we assume e, does not obey a normal distribution, there is no guarantee that both A and B are the best estimators. 4. B o o t s t r a p Confidence Intervals The incomplete gamma function model shown by Eq. (11) can yield several software reliability assessment measures. In this study, we focus on two measures, i.e., the cumulative number of detected faults at time t, H(t), and the cumulative mean time between failures at t, MTBFC(£) = t/H(t). Both measures can be evaluated by performing the parameter estimation discussed in the previous section. Additionally, we discuss how to obtain their confidence intervals by the bootstrap method. 4.1. Standard
Errors
of Unknown
Parameters
In Eq. (9), we assumed that the mean of the distribution of e* has to be zero, but we do not assume that the distribution is a normal distribution. In general, after performing the transformation introduced by Eq. (8), we can not give any guarantee whether the error term £j has a normality. Hence we use bootstrap method to evaluate the standard errors of A and B, instead of using a common procedure based on a normal error term. The bootstrap method is one of the re-sampling methods for statistical evaluation 6 . We show its procedures as follows. S t e p 1 Estimate Ao = A and Bo = B by a linear regression scheme with the data set and Eqs. (16) and (17). This means the parameters
424
c and d are also estimated. Thus we have the values of z(c, d, U) (i = l , 2 , . . . , n ) . Step 2 Calculate the residual, w(ti), by w(ti) = z(c, d, ti) - (A0 - B0ti)
(i = 1,2,..., n).
Step 3 Set the total number of iteration K. Let k = 1 (k = 1,2,... ,K). Step 4 Generate Zfc(c, d, ti) by randomly choosing one value w(t*) from the set of w(ti) (i — 1,2,..., n). Therefore we have zk(c, d, ti) = w(U) + (A0 - B0ti)
(i = 1 , 2 , . . . , n).
Step 5 Estimate the parameters Ak and Bk by based on a new linear regression: zk(c, d, U) = Ak-
BkU (i = 1,2,..., n).
Step 6 Let k = k + 1 and go back to Step 4 if k < K. Step 7 We obtain K pairs of bootstrap estimates (Ak,Bk) i,2,:..,K).
{k —
Therefore we can obtain E[A] and E[B] ,Vai[A], Var[B], and Cov[,4,5] from the bootstrap method above 7 . a 4.2.
Confidence
Intervals
of Reliability
Measures
From the asymptotic normality of the pair of (A, B), it obeys a bivariate normal distribution as: ^~BN(
M I
E),
(20)
where /x is a mean vector and S is a variance-covariance matrix. In this study, we use the bootstrap estimates (Ak,Bk) {k = 1,2,..., A") for the evaluation. That is -
/ Var[i] ^Cov[i,B]
Cov[A,B}\ Var[£] J'
y
'
Hence we have the 95% confidence interval of H(t) in Eq. (11) by H(t) ± 1.96y/vax[H(t)\, a
(22)
E [ X ] , Var[X], and Cov[X, Y] are the mean of a random variable X, variance of X, and covariance of X and Y, respectively.
425
where Var[iJ(£)] can be derived by 8 Vax[H{t)] =
fdH(t)
V dA
d H ( t ) ^ ^ f ^ \ x S x dB
\rnip.) i(-i) db
(23)
By following the same manner, we can derive the confidence interval for MTBFc(t)5. N u m e r i c a l E x a m p l e s We analyze a sample data set (denoted as DS-1). DS-1 forms {ti,yi) (i = 1,2,..., 12), where ti is measured by calendar time (month) and yt is the cumulative number of detected faults up to time ti. First we estimated the model parameters. The results are: AQ = 9.00534, Bo = 0.212543, c = -0.46834, and d = 0.61352. Figure 1 shows the results of the linear regression by these parameters and DS1. By applying the bootstrap method with K — 30000, we obtained £ = (E[i] = 9.00499 E[B] = 0.212479)', and S is shown as -_(
')-e
Var[i] C o v [ i , B ] ^ _/0.00801322 -0.000963526\ 0.000963526 0.000148269/' VCov[i,B] Var[S]
(24)
We plot the 95% confidence regions of A and B in Fig. 2. The outer region is from the traditional standard error analysis assuming a normal error term, and the inner one is obtained by the bootstrap scheme. The bootstrap confidence region is smaller than that of the traditional method for DS-1. Figures 3 and 4 are the estimated cumulative number of detected faults at time t and MTBF c (i), respectively.
Figure 1. Estimated regression line and Figure 2. z(c,d,U) (i = 1,2, . . . , 1 2 ) . B.
95% confidence regions of A and
426
Figure 3. Estimated cumulative number of Figure 4. Estimated cumulative MTBF detected faults with the 95% confidence in- with the 95% confidence interval and terval. (U, U/yi) (i = 1, 2 . . . , 12). 6. C o n c l u d i n g R e m a r k s We have derived software reliability assessment measures with confidence intervals, and discussed the b o o t s t r a p linear regression method which is based on the two-parameter numerical differentiation method and incomplete gamma function model. T h e b o o t s t r a p linear regression scheme has an advantage which the method does not need a normality assumption for the error term. In the future study, we will consider the generalization of a logistic curve model. Since the method for initial parameters estimation shown in Section 3 is theoretically rough, we need to discuss more precisely the theory of the parameters estimation if the error t e r m has only weak assumptions.
References 1. M. Lyu, Handbook of Software Reliability Engineering, IEEE Computer Society Press, Los Alamitos (1995). 2. J. D. Musa, Software Reliability Engineering, McGraw-Hill (1999). 3. P. K. Kapur, R. B. Garg, and S. Kumar, Contributions to Hardware and Software Reliability, World Scientific (1999). 4. B. M. Ayyub and R. H. McCuen, Probability, Statistics, and Reliability for Engineers and Scientists (2nd Edition), Chapman & Hall/CRC (2003). 5. S. Yamada and H. Somaki, "Statistical methods for software testing-progress control based on software reliability growth models (in Japanese)," Trans, of the Japan Society for Industrial and Applied Mathematics, 6, 4, pp. 33-43 (1996). 6. J. E. Gentle, W. Hardle and , Y. Mori, Handbook of Computational Statistics: Concepts and Methods, Springer (2004). 7. C. F. J. Wu, "Jackknife, bootstrap and other resampling methods in regression analysis," 4 nn. of Statist, 14, 4, pp. 1261-1295 (1986). 8. J. D. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill (1987).
S O F T W A R E RELIABILITY M O D E L I N G B A S E D O N M I X E D POISSON D I S T R I B U T I O N S *
H. OKAMURA AND T. DOHI Department of Information Engineering, Graduate School of Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima 739-8527, JAPAN E-mail: {okamu, dohi}@rel.hiroshima-u.ac.jp
This paper considers a novel modeling framework of software reliability models (SRMs). The resulting SRMs based on the mixed Poisson distribution (MPDs) can involve completely, but are not always equivalent to the non-homogenous Poisson process (NHPP) based SRMs. More precisely, the proposed SRM is given by a mixture of NHPPs, and follows the mixed Poisson process. We develop a parameter estimation method for the MPD-based SRMs based on EM algorithm.
1. I n t r o d u c t i o n During the last three decades, the software reliability community has developed a huge number of software reliability models (SRMs) from various points of view. Specifically, the non-homogeneous Poisson process (NHPP) based SRMs play a central role to assess software reliability, the number of remaining faults, the optimal software release schedule, etc. *'2 Generally speaking, NHPP-based SRMs are stochastic counting processes to describe the number of detected faults in testing phase, and are fairly tractable statistically. The representive NHPP-based SRMs are Goel and Okumoto SRM 3 , Yamada, Ohba and Osaki SRM 4 and Goel SRM 5 . Since these SRMs are built on different debugging scenarios, it is possible to represent a variety of reliability growth phenomena observed in testing phase. The NHPP-based SRMs have gained popularity in software reliability engineering, but at the same time it is empirically known that the faultdetection processes do not always follow the NHPPs. This is related with the fault dependency or the fault correlation. That is, when a software *This research was partially supported by the ministry of education, science, sports and culture, grant-in-aid for exploratory research, grant no. 15651076 (2003-2005).
427
428
fault is newly detected, the detected fault induces other associated faults. In such a situation, the fault-detection process does not longer follow the NHPPs, and the NHPP-based reliability assessment tends to overestimate the software reliability optimistically. On the other hand, some authors 6 propose the Markovian SRMs as alternatives of the NHPP-based SRMs. However, since the number of detected faults usually is equal to the number of states in the Markov chain, the statistical treatment in the Markovian SRMs is more complex than the NHPP-based SRMs. In this paper, we propose a different modeling framework from both Markov SRMs and NHPP-based SRMs. The main idea of the paper is to apply a mixture of Poisson processes. In general, when multiple homogenous Poisson processes are stochastically mixed with some mixture ratios, the resulting process does not have the Poisson properties. We extend this concept to the NHPP-based SRMs, and develop a novel modeling framework of SRMs based on the mixed Poisson distribution (MPD). Hayakawa and Telfar 7 give an example in software reliability by describing an inhomogeneous variant of the mixed Poisson-type process, and generalize the Jelinski and Moranda SRM 6 . On the other hand, our approach leads to a generalization of the NHPP-based SRMs. The rest part of this paper is organized as follows. Section 2 introduces NHPP-based SRMs, and describes the MPD-based SRMs under the similar but somewhat different assumptions for NHPP-based SRMs. Section 3 provides an effective parameter estimation method for the MPD-based SRMs. In particular, we give a primal result of estimating parameters based on EM (ExpectationMaximization) principle 8 . In Section 4, we give a specific example to design the EM algorithm. Finally, the paper is concluded with remarks and future directions in Section 5.
2. Software Reliability Modeling 2.1. NHPP-Based
SRMs
The SRMs based on NHPPs are often described under the following assumptions 9 ' 10 : (i) The number of inherent faults in software is finite, and is given by a Poisson random variable. (ii) All the fault-detection times are mutually and independently distributed with an identical probabilistic law.
429
Suppose that the number of inherent faults obeys the Poisson distribution with mean u>, and each fault-detection time follows a general probability distribution function F(t), (t > 0). Then the probability mass function (p.m.f) of the number of software faults detected before time t is given by = 0,1,....
(1)
Equation (1) is the same as the p.m.f of NHPP. Substituting the wellknown probability distributions into F(t), we can represent most existing NHPP-based SRMs within this modeling framework. 2.2. MPD-Based
SRMs
Let us consider a modeling framework based on MPD. The MPD is an extension of Poisson distribution, and can be regarded as a mixture of a number of Poisson distributions with different rates. Let G(ui), (u> > 0) denote the probability distribution function, called the mixture ratio distribution. In general, the p.m.f of MPD can be given by f°°
bjn
/ —e-udG(u). (2) Jo nFor example, if the mixture ratio distribution is the Erlang distribution with shape parameter m and scale parameter /?, namely, Pr{NM = n}=
am
g(w;m,0)
rn — l
= ^
—e"^, (3) 1)! where
m+
**--H r)(^)W- « In this way, MPDs can represent various types of distributions depending on the mixture ratio. Table 2.2 presents the relationship between typical probability distributions of mixture ratio and the corresponding MPDs n . The main idea of this paper is to use the above MPDs instead of the Poisson distribution in Assumption (i). That is, we rewrite Assumption (i) by (i)' The number of inherent faults in software is finite, and is given by a mixed Poisson random variable.
430
Table 1. Examples of MPDs. MPDs Mixture Ratio Erlang (gamma) negative binomial Delaporte shifted gamma generalized inverse Gaussian Sichel inverse Gaussian inverse Gaussian-Poisson beta beta-Poisson uniform uniform-Poisson truncated normal truncated normal-Poisson lognormal lognormal-Poisson
Since the tail probability of MPD is generally larger than that of Poisson distribution, the variance of MPD becomes larger than the Poisson. This means that the software faults may not be uniformly detected over all the source codes of programs, but may be biased to specific modules and functions. Under the assumptions (i)' and (ii), the p.m.f of the number of detected faults are given by Pr{N(t) =n}=
[°°
{ujF(
Jo
)}n -* e-"FWdG(oj), n
-
n = 0,1,... .
(5)
For instance, when the mixture ratio distribution is given by the Erlang distribution with parameter (m,/?), the resulting p.m.f of MPD-based SRM is
This type of counting process is a generalized Polya-Lundberg process n . As see in this example, the fault-detection processes based on MPD-based SRMs are not NHPPs, but can involve them completely within the modeling framework, although we omit to show the examples mentioned in Table ?. 3. Parameter Estimation In this section, we discuss a parameter estimation for the MPD-based SRMs. The most commonly used method to estimate the model parameters is the maximum likelihood estimation. Define the software fault data set, D = (Xi,... ,XK < T), observed in testing phase, where X, denotes the i-th detection time, and where K and T indicate the total number of all the detected faults and the time length of testing, respectively. In
431 the NHPP-based SRMs, we usually consider the following log-likelihood function (LLF) of the observed fault data: K
LLF(0;£>) = Klogoj + ] T l o g / ( X i ; 0 ) - wF(T;0),
(7)
where /(•) and 0 are the probability density function and the set of its parameters, respectively. Then the maximum likelihood estimates (MLEs) are computed so as to maximize the LLF, namely, 0 := argmax LLF(0; D).
(8)
Generally speaking, it is not so easy to solve the above maximization problem, because LLFs are non-linear functions of 0 in almost all cases. Also, the parameters are often subjected to implicit constraints such as positive condition. Thus, the estimation procedure is complicated for the SRMs with a number of parameters, and is rather sensitive to the initial adjustment of parameters themselves when the Newton's method is applied. To simplify the estimation procedure, it is known that the EM (ExpectationMaximization) algorithms are useful to compute the MLEs for the existing NHPP-based SRMs 12 . Next we discuss the parameter estimation of MPD-based SRMs. Since the MPD-based SRM can be built as a mixture of NHPP-based SRMs, the LLF of MPD-based SRM is given by LLF(0 F , 0 M ; D) = V log f(Xt; 0F) + log / i=i Jo
wKe^F{T-'eF)dG{oj;
0M),
(9) where Bp and 0 M are the parameters of fault-detection time distribution and mixture ratio distribution, respectively. Note in Eq. (9) that a constant term is removed with respect to 6F and &G, because it does not affect the estimation result. As seen from this expression, the LLF of MPD-based SRM becomes more complex rather than that of NHPP-based SRM. In particular, since the second term in Eq. (9) involves an integration, it cannot be simplified any more. In other words, due to the second term in Eq. (9), it is extremely difficult to find the MLEs even if the Newton's method is used for optimization. This will motivate to apply the EM algorithm for computing the MLEs in the MPD-based SRMs. Consider the complete data set as Xi,... ,Xjv, where N is the total number of inherent faults in software. Given the parameter w of Poisson distribution, the MPD can be reduced into an NHPP. If we can observe the
432
parameter u> directly, the LLF for the complete data set, so-called complete LLF, is given by N
LLF(0 F , 6M) = N\ogw - w + J2 log/(X 4 ; 0F) + log 5 (u;; 0M).
(10)
»=i
The EM algorithm consists of two steps: E-step and M-step. The E-step is devoted to deriving the expected values of complete LLF under the incomplete data observed in the field. In the M-step, the parameters are updated by maximizing the expected complete LLF. That is, one step consisting of both E-step and M-step in the EM algorithm for MPD-based SRM is essentially equal to solving the following equations: JV
E
d
k £75ir ddF */(*^>
D
= 0,
(11)
i=l
and E
d logg(w; 6M) D dOM
(12)
= 0.
The initial guesses of 6F and &G are replaced by the solutions of the above equations, 6'F and d'M, in one step of the EM algorithm. It is worth noting that Eqs. (11) and (12) are similar to the direct maximizations of the faultdetection time distribution and the mixture ratio distribution, respectively. Hence, it can be seen that the M-step formulas are likely to be much simpler than the maximization of Eq. (9). On the other hand, in order to evaluate the expected values in Eqs. (11) and (12), we.give the following useful formulas: N
E
EM**)
D
^2h(Xi)
+ E[u\D]
h(u)f{u;9F)du
(13)
and
E[h(w)\D] =
/0°° h{s)sKe-sF^e^dG{s; K
sF T
/ 0 °° s e- ( ;<>F)dG(s;
0M) 0M)
(14) '
where h(-) is an arbitrary function. Using Eqs. (11)—(14), one step formula of the EM algorithm for MPD-based SRMs can be derived.
433 1: Determine the initial guesses for 6 and 0. 2: REPEAT 3: (E-step) Compute the following expected values by using the parameters 6 and 0: E[w\D] = (m + K)/(0 + e - " T ) , E[N\D] = K + E[w|D]e" 9 T , E [ E £ i Xi\D\
= E f e ! * i + E[OJ\D](T +
l/0)e'f>T.
4: (M-step) Compute the new parameters as follows. 6' :=
m/E[u\D],
0' : = E [ i V | £ > ] / E [ £ f = , * i | £ > ] , 5: Update the parameters as 6 := 6' and 0 := 0'. 6: UNTIL satisfying the termination condition. 7: R E T U R N the MLEs as 6 and 0. Figure 1. Pseudo-code of the estimation procedure for MPD-based SRM with exponential fault-detection time distribution and gamma-mixture distribution.
4. A n Illustration of MPD-based S R M In this section, we present a typical illustration on an MPD-based SRM. Suppose that the fault-detection time distribution and the mixture ratio distribution are given by the exponential distribution with parameter f3 and the gamma distribution with parameters m and 0, respectively, where: am. .m—1
f(t)=0e-<*,
fl(w)
= l-^-e-ft".
(15)
1 [m)
As mentioned before, the corresponding MPD-based SRM is a generalized Polya-Lundberg process:
where !?(•,•) is the standard beta function. Then the software reliability which is the probability that no software failure occurs in the time period [s, s + t), is given by ms)
={^{^e-0i)+e)
•
(17)
Next we consider the parameter estimation based on the EM algorithm. From the assumptions, we can directly use the results of Eqs. (11) through (14). Taking account of the MLEs of respective distributions, we have a pseudo-code of the EM algorithm in Figure 1, where the parameter m is assumed to be fixed.
434
5.
Conclusion
In this paper, we have proposed a modeling framework based on M P D . T h e resulting stochastic processes on the number of detected faults are not N H P P s any more, and provide t h e pessimistic assessment on software reliability, compared with N H P P - b a s e d SRMs. Also, we have developed an effective p a r a m e t e r estimation procedure for MPD-based SRMs. This approach has been based on the E M algorithm, and has been simplified rather t h a n t h e direct maximization of t h e L L F . In future, we will evaluate predictive performance of MPD-based SRMs with real software fault d a t a , and compare t h e N H P P - b a s e d SRMs and some Markovian SRMs. References 1. M. R. Lyu, ed., Handbook of Software Reliability Engineering. New York: McGraw-Hill, 1996. 2. J. D. Musa, A. Iannino, and K. Okumoto, Software Reliability, Measurement, Prediction, Application. New York: McGraw-Hill, 1987. 3. A. Goel and K. Okumoto, "Time-dependent error-detection rate model for software reliability and other performance measures," IEEE Trans, on Reliab., vol. R-28, pp. 206-211, 1979. 4. S. Yamada, M. Ohba, and S. Osaki, "S-shaped reliability growth modeling for software error detection," IEEE Trans, on Reliab., vol. R-32, pp. 475-478, 1983. 5. A. L. Goel, "Software reliability models: assumptions, limitations and applicability," IEEE Trans, on Software Eng., vol. SE-11, pp. 1411-1423, 1985. 6. Z. Jelinski and P. B. Moranda, Software reliability research, pp. 465-484. New York: Academic Press, 1972. 7. Y. Hayakawa and G. Telfar, "Mixed poisson-type processes with application in software reliability," Math. Comput. Model, vol. 31, pp. 151-156, 2000. 8. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the em algorithm," J. Roy. Statist. Soc. Ser. B, vol. 39, pp. 1-38, 1977. 9. N. Langberg and N. D. Singpurwalla, "Unification of some software reliability models," SIAM J. of Computing, vol. 6, pp. 781-790, 1985. 10. D. R. Miller, "Exponential order statistic models of software reliability growth," IEEE Trans, on Software Eng., vol. SE-12, pp. 12-24, 1986. 11. J. Grandell, Mixed Poisson Processes. London: Chapman & Hall, 1997. 12. H. Okamura, T. Watanabe, and T. Dohi, "An iterative scheme for maximum likelihood estimation in software reliability modeling," in Proc. of 14th Int. Sympo. Software Reliab. Eng., pp. 246-256, IEEE CS Press, 2003.
COVERAGE G R O W T H F U N C T I O N S FOR S O F T W A R E RELIABILITY MODELING
J.-Y. PARK Department
of Information Statistics, Gyeongsang National University, 900 Gazwa-dong Jinju, 660-701, Republic of Korea E-mail: [email protected] T. F U J I W A R A
Development
Dept. 2, Development Division, Fujitsu Peripherals 35 Saho, Katoh-shi, Hyogo, 673-144?, Japan E-mail: [email protected]
Limited,
Recently software reliability growth models incorporating coverage growth behavior have been developed and applied in practice, because it is beneficial in order that coverage growth behavior describes a fault detection phenomenon. Performance of such software reliability growth models depends on the kind of selected coverage growth function. This paper first reviews the coverage growth functions considered for software reliability modeling. Then their theoretical characteristics and empirical performance are investigated.
1. Introduction Many software reliability growth models (SRGMs) have been proposed and applied in practice. A new trend is to incorporate coverage information into SRGMs. As the testing proceeds and the coverage grows, the number of detected faults tends to increase since the faults are distributed over and located at the constructs. Therefore the coverage growth directly influences the fault detection process. The non-homogeneous Poisson process (NHPP) SRGMs proposed by Gokhale et al. 2 , Gokhale and Trivedi 3 , Grottke 4 , Malaiya et al. 5 , Pham and Zhang 8 , Piwowarski et al. 9 and Park et al. 6 embody a coverage growth function (CGF), which represents the coverage growth behavior during the testing. Such NHPP SRGMs are referred to as coverage-based NHPP SRGMs. Performance of a coverage-based NHPP SRGM depends on how closely its CGF represents the actual coverage growth behavior. In order for a coverage-based NHPP SRGM to be widely 435
436
applicable, its CGF should be able to represent the coverage growth produced by arbitrary testing profile. Otherwise, its application will be limited. It is therefore necessary to examine the ability of available CGFs to represent the coverage growth behavior. This paper first investigates theoretical characteristics of the available CGFs and then select reasonable ones among them in Section 2. Section 3 modifies the selected CGFs to reflect a practical observation on coverage growth. Performance of the modified CGFs is empirically evaluated and compared in Section 4. Conclusions are given in Section 5. 2. C G F s and Theoretical C o n s i d e r a t i o n Denote by M the set of all the constructs of a software system under testing. Let | • | be the cardinality of a set of constructs. As the testing progresses, the number of covered constructs increases. We denote the set of constructs covered up to testing time t by Mc(t). Then the coverage at testing time t is computed as C(t) = |M c (t)||M| _ 1 . Since the number of constructs executed by a test case is usually modeled as a random variable, C(t) is also a random variable. Thus the CGF is defined as the expected value of C(t), i.e., c(t) = E[C(t)}. The currently available CGFs are summarized in Table 1. An NHPP SRGM is characterized by its mean value function (MVF) Table 1.
Currently available CGFs.
CGF
parameter constraints 3t
References
c1(t) = l-e~l
0
[2], [10]
c2{t) = l-e-l3t''
0
[2]
0
[2]
0
[3], [9]
0
[5]
c3(t) = l-(l c 4 (t) =
T
^
+ /3t)e-^ T
c 5 (t) = 7 l n ( l + / 3 t ) 1
c 6 (i) = l _ ( l - / 3 7 t ) / T
0
[4]
c7(t) = a 0 - aie'0t
0 < a j < a 0 < 1,0 < (3
[11]
c8(t) = a 0 - ^ZTpt
0 < Q 1 <2a 0 ,0
[11]
C
0
[12]
0
[1]
»(t) = ~-^T
cioffl = 1 - / 0 °° e ~ g t - ( 1 X " " j £ / a ' a
d(3
437
m(t). A coverage-based NHPP SRGM contains a CGF within its MVF. MVFs of the coverage-based NHPP SRGMs proposed in the literature are represented by the following differential equations: dm(t) _
~dT
dc(t)
..
= a
[)
~dT>
dm
®=am*@; v ' dt
dt dm(t) dt
[a(t) m(t)]
~
(2) dc(t)
**— r^)'
^fi-M.-^,)]^.
(3)
(4)
where a is the initial fault content, a{t) is the fault content function, b is the fault detection rate and b(t) is the fault detection rate function. Eqs. (l)-(4) were respectively proposed by Piwowarski et al. 9 , Gokhale et al. 2 , Pham and Zhang 8 , and Yamamoto et al. 12 . As described earlier in this section, C(t) is also a stochastic process. However, the above coverage-based NHPP SRGMs simplify the fault detection process involving C(t) by assuming that coverage grows according to a certain deterministic CGF. Similar approach is used for incorporating the imperfect debugging into NHPP SRGMs. For example, the fault content function a(t) in Eq. (3) reflects the stochastic process caused by the imperfect debugging. Eqs. (l)-(4) can be regarded as frameworks for generating NHPP SRGMs. If we insert a CGF into Eqs. (l)-(4), different MVFs and consequently different NHPP SRGMs will be obtained. We attempt to select some reasonable CGFs from the existing ones. Theoretically desirable conditions for CGFs are as follows: (i) c(0) = 0, c(oo) = 1 and 0 < c(t) < 1 for all t > 0. (ii) c(t) is concave, (iii) c(t) produces C]_(t) as a special case. Condition (i) is trivial. We thus explain conditions (ii) and (iii). Coverage grows only when constructs in M — Mc(t) are executed. As the testing progresses, Mc{t) expands and M — Mc(t) shrinks. Therefore the coverage growth increment during [t,t + dt) is not, on average, likely to be greater than the coverage growth increment before t. This implies that dc(t)/dt is decreasing in testing time and that c(t) is concave. CGF ci(t) is the CGF analytically derived for the uniform testing, in which all the constructs in M are equally likely to be executed. A CGF for arbitrary testing is therefore
438
to produce c\{t) as a special case. CGFs cs(t) and ce(t) do not satisfy condition (i). Since cj(t) and cg(i) are proposed for multi-phase testing, they are not forced to meet condition (i). If we subject cj(t) and cg(i) to condition (i), c?(t) becomes identical to ci(i) and c8(t) is simplified to (l - e ^ ' ) 2 / (l + e" 2 ^*). CGFs c 3 (t) and c$(t) are not concave. CGF Ci{t) does not produce c\(t) as a special case. Thus Cj(i) for i = 2,9,10 satisfy all the three conditions and will be investigated further. 3. I n t r o d u c t i o n of M a x i m u m Achievable Coverage Generally 100% coverage can be rarely achieved because of the presence of infeasible constructs and constructs with extremely small execution probability. The subsume relation between coverage metrics also supports the necessity of an upper bound for some coverage metrics. Three CGFs for i = 2,9,10 in Table 1 are thus modified to meet this practical observation. Ci{t) = cmax a(t) for i = 2,9,10,
(5)
where cmax is the maximum achievable coverage. In fact, cg(i) is the CGF proposed by Yamamoto et al. 12 . 4. Empirical Performance Evaluation of CGFs Three data sets, DS1-DS3, are used for the empirical performance evaluation. DS1 is the data set collected by Gokhale and Mullen 1 . It consists of block coverage values measured at 20 different testing times by applying 10 different sequences of 735 test cases to SHARPE. And average of 10 sequences is also reported. The last data obeserved at the end of testing is only used for model validation and the other data are used for model fitting. This approach is also applied to DS2 and DS3. The least squares method is thus used for fitting because all the CGFs in Table 1 but c\o{t) have no distributional assumption. SSE and MSE are the natural performance criteria for the least squares method. We present MSE because MSE takes the number of parameters into account. Testing time usually has a wide range, whereas coverage takes values between 0 and 1. This makes the fitted CGFs so close to each other that sometimes they are indistinguishable in figures. It also makes the fitted CGFs look very close to the data. In order to amplify the difference between the fitted CGFs and the observed data, the logarithm of the testing time is plotted in figures. Figs. 1-2 show three CGFs fitted to average block coverage of 10 sequences and block coverage
439
log(t+1)
Figure 1.
Table 2. seq. no 1 2 3 4 5 6 7 8 9 10 avg.
last obs .930 .914 .919 .923 .918 .926 .914 .932 .924 .922 .921
C2(t), cg(t) and cio(t) fitted to average of 10 sequences of DS1.
Least square estimates and MSE of c.2(t) and cin(i) fitted to DS1. 02 (*) Cmax
.950 .925 .934 .931 .923 .928 .925 .945 .926 .924 .926
P .208 .147 .285 .253 .132 .123 .147 .216 .190 .152 .184
7 .443 .517 .405 .442 .558 .607 .517 .455 .526 .559 .502
MSE 9.047 4.213 7.979 19.493 6.362 9.190 4.213 5.831 16.189 5.810 1.161
last obs .928 .929 .930 .924 .933 .917 .929 .937 .942 .934 .938
Cmax
1.000 1.000 1.000 .963 .980 .928 1.000 1.000 .973 .961 .990
cic (*) (7 M 2.353 -3.322 2.080 -3.668 -2.909 2.638 2.199 -2.838 1.836 -3.576 -3.248 1.386 2.080 -3.668 2.319 -3.197 1.911 -3.073 1.743 -3.243 2.100 -3.290
MSE 7.781 3.697 6.048 14.683 8.586 6.763 3.638 10.201 17.747 8.198 1.303
of sequence 8. The last observation is also plotted in all the figures. It is evident that 62(f) and cio(f) perform significantly better than 69(f). (Even not shown in this paper, 69(f) does not work well for other sequences of DS1, DS2 and DS3.) Henceforth, only the results for 62(f) and 610(f) are presented. The block coverage value of the last observation at time f = 735 is 0.9351 for all the sequences. Table 2 presents the predicted value of the last observation, parameter estimates and MSE for DS1. (Actual MSE values are MSE values in Tables multiplied by 10~ 4 .) Both 62(f) and 610(f) show good fitness and prediction for DS1.
440
'''
^
fay
. B ^
52(t) •----
1
c,(0 c,„(t)
i
1
1
log(t+1)
Figure 2.
C2(t), cg(t) and cio(t) fitted to sequence 10 of DS1.
DS2, reported by Vouk 11 , is one of three data sets collected from a NASA supporting project implementing sensor management in an inertial navigating system. DS3 was collected by Pasquini et al. 7 from a configuration software for an array of antennas developed by European Space Agency. Both DS2 and DS3 contain values of block, branch, c-use and p-use coverages. The last observation of the 4 coverage metrics are respectively 0.960, 0.939, 0.923 and 0.846 at t = 792 for DS2 and 0.82, 0.70, 0.74 and 0.67 at t = 20000 for DS3. The estimation and prediction results are summarized in Tables 3-4 and Figs. 3-4. We again obtain the same conclusion about performance of C2(t) and Cio(i) as DS1. 5. Conclusions In this paper we compared the currently available CGFs theoretically and empirically. It has been found that C2(t) and cio(i) describe the coverage Table 3.
coverage block branch c-use p-use
last obs .981 .964 .963 .882
Least square estimates and MSE of C2W and cio(t) fitted to DS2.
C2
Crnax
1.000 1.000 1.000 1.000
/3 .744 .639 1.107 .431
(*) 7 .250 .247 .163 .240
MSE 11.239 7.104 4.081 1.145
last obs .965 .943 .953 .850
ClC
Cmax
1.000 1.000 1.000 1.000
M -0.379 -0.914 1.751 -2.458
M IT
3.618 3.832 5.252 4.445
MSE 11.599 11.767 4.959 3.258
441
log(H1)
Figure 3.
Table 4.
coverage block branch c-use p-use
last obs .800 .682 .730 .670
C2(t) and cio(t) fitted to the c-use coverage of DS2.
Least square estimates and MSE of C2(t) and cio(i) fitted to DS3.
C2
Cmax
.800 .682 .730 .670
P .603 .378 .484 .437
(*) 7 .356 .426 .412 .362
MSE 4.623 2.072 4.670 2.271
last obs .814 .702 .744 .699
C-max
.814 .702 .744 .699
cic (<)
MSE 4.653 2.737 4.971 3.014
growth behavior well. Therefore C2(t) and cio(t) should be considered primarily for t h e coverage-based N H P P SRGMs. N H P P SRGMs based on other C G F s are t o be applied with caution. Another important problem is whether frameworks for coverage-based N H P P SRGMs such as Eqs. (l)-(4) adequately represent t h e fault detection phenomenon. This problem needs t o b e investigated further.
References 1. S. S. Gokhale and R. E. Mullen, Prom Test Count to Code Coverage Using the Lognormal Failure Rate, Proc. 15th ISSRE, St. Malo, Bretagne (2004). 2. S. S. Gokhale, T. Philip, P. N. Marinos, and K. S. Trivedi, Unification of Finite Failure Non-Homogeneous Poisson Process Models Through Test Coverage, Proc. 7th IEEE Int. Sym. Soft. Rel. Eng., White Plains, New York, pp. 299-307 (1996).
---
r
"~ i
0
1
r2
c 2 (t) Sio(t)
r
i
I
I
I
3
4
5
6
7
log(t+1)
Figure 4. C2(i) and cio(i) fitted to the branch coverage of DS3. 3. S. S. Gokhale, and K. S. Trivedi, A Time/Structure Based Software Reliability Model, Ann. Soft. Eng. 8, pp. 85-121 (1999). 4. M. Grottke, A Vector Morkov Model for Structural Coverage Growth and the Number of Failure Occurrences, Proc. 13th ISSRE, Annapolis, Maryland, pp. 304-315 (2002). 5. Y. K. Malaiya, M. N. Li, J. M. Bieman, and R. Karcich, Software Reliability Growth and Test Coverage, IEEE Trans. Rel. 5 1 , pp. 420-426 (2002). 6. J.-Y. Park, Y.-S. Hwang, and T. Fujiwara, Integration of Imperfect Debugging in General Testing-Domain Dependent NHPP SRGM, Int. J. Rel., Quality and Safety Eng. 12, pp. 493-505 (2005). 7. A. Pasquini, A. N. Crespo, and P. Matrella, Sensitivity of Reliability-Growth Models to Operational Profile Errors vs Testing Accuracy, IEEE Trans. Rel. 45, pp. 531-540 (1996). 8. H. Pham, and X. Zhang, NHPP Software Reliability and Cost Models with Testing Coverage, Eur. J. Op. Res. 145, pp. 443-454 (2003). 9. P. Piwowarski, M. Ohba and J. Caruso, Coverage Measure Experience During Function Test, Proc. 15th IEEE Int. Conf. Soft. Eng., Baltimore, MD, pp. 287-301 (1993). 10. S. Sedigh-Ali, A. Ghafoor, and R. A. Paul, Temporal Modeling of Software Test Coverage, Proc. 26th Ann. Int. Com. Soft, and App. Conf, Orlando, Florida, pp. 823-828 (2002). 11. M. A. Vouk, Using Reliability Models During Testing with Nonoperational Profile, Proc. 2nd Bellcore/Purdue Workshop on Issues in Soft. Rel. Est., pp. 103-111 (1992). 12. T. Yamamoto, S. Inoue, and S. Yamada, A Software Reliability Growth Model with Testing-Coverage Maturity Process, Proc. 10th ISSAT Int. Conf, Las Vegas, NV, pp. 299-303 (2004).
ESTIMATING T H E OPTIMAL S O F T W A R E REJUVENATION SCHEDULE W I T H SMALL S A M P L E DATA*
K. R I N S A K A A N D T . D O H I Department of Information Engineering, Graduate School of Engineering, Hiroshima University, Higashi-Hiroshima, Japan E-mail: {rinsaka, dohi}@rel.hiroshima-u.ac.jp
In this paper, we consider the optimal software rejuvenation schedule which maximizes the steady-state system availability. We develop a statistical algorithm to improve the estimation accuracy in the situation where a small number of failure time d a t a is obtained. More precisely, based on the kernel density estimation, we estimate the underlying failure time distribution from the sample data. We propose the framework based on the kernel density estimation to estimate the optimal software rejuvenation schedule from small sample data. In simulation experiments, we show the improvement in the convergence speed to the real optimal solution in comparison with the conventional algorithm.
1. Introduction Present day applications impose stringent requirements in terms of software dependability since in many cases the consequences of software failure can lead to a huge economic loss or risk to human life. However, these requirements are very difficult to design for and guarantee, particularly in applications of non-trivial complexity. In recent years, considerable attention has been devoted to continuously running software systems whose performance characteristics are smoothly degrading in time. When software application executes continuously for long periods of time, some of the faults cause software to age due to the error conditions that accrue with time and/or load. This phenomenon is called software aging and can be observed in many real software systems 1,2 ' 3 . "The present research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B); Grant No. 16310116 (20042006) and Grant-in-Aid for Young Scientists (B); Grant No. 18710145 (2006-2007).
443
444
Common experience suggests that most software failures are transient in nature 3 . Since transient failures will disappear if the operation is retried later in slightly different context, it is difficult to characterize their root origin. Therefore, the residual faults have to be tolerated in the operational phase. A novel approach to handle transient software failures is called software rejuvenation which can be regarded as a preventive and proactive solution that is particularly useful for counteracting the phenomenon of software aging. It involves stopping the running software occasionally, cleaning its internal state and restarting it. Cleaning the internal state of software might involve garbage collection, flushing operating system kernel tables, reinitializing internal data structures, etc. An extreme, but well known example of rejuvenation which has been around as long as computers themselves is a hardware reboot. Huang et al.4 consider a continuous time Markov chain (CTMC) with four states, i.e., initial robust (clean), failure probable, rejuvenation and failure states. They evaluate both the unavailability and the operating cost in steady state under the random software rejuvenation schedule. Dohi et al.5'6 extend the result of Huang et al.4 and propose the software rejuvenation model based on a semi-Markov process. They propose non-parametric estimation algorithms based on the empirical distribution to obtain the optimal software rejuvenation schedule (OSRS) from the complete sample of failure time data. If a lot of sample of failure time data can be obtained, then with probability 1, the estimate of OSRS based on Dohi et al.'s5'6 algorithm asymptotically converges on the real optimal solution. Hence, the non-parametric estimation algorithms are useful to realize an adaptive control in practice. When the adaptive control is carried out, it is important to obtain more accurate solution in the situation where only fewer failure time data are obtained. The aim of this paper is to improve the estimation accuracy of the OSRS for the semi-Markov model proposed by Dohi et al.5'6. More precisely, we propose a new non-parametric estimation algorithm based on the kernel density estimation 7 ' 8,9,10 to obtain the OSRS which maximizes the steadystate system availability. In simulation experiments, we check the effect of improvement in terms of the convergence speed to the real optimal solution in comparison with the conventional algorithm.
445
2. Semi-Markov Model In this section, we introduce the software rejuvenation model proposed by Dohi et al.5'6 which is an extension of CTMC model by Huang et al.i. The model based on semi-Markov process has following four states: State State State State
0: 1: 2: 3:
highly robust state (normal operation state) failure probable state failure state software rejuvenation state.
Here, State 1 means that the memory leakage is over a threshold or the system lapses from the highly robust state into an unstable state. Let Z be the random time interval when the highly robust state changes to the failure probable state, having the common distribution function Pr{Z
Fr(t) = U(t - to) = { J
* ' " t0
(1)
[0 otherwise, where [/(•) is the unit step function. We call to (> 0) the software rejuvenation schedule in this paper. Hence, the underlying stochastic process
446
completion of 7* ][ ' \ repair / \
system failure Figure 1.
completion of rejuvenation
— rejuvenation
Semi-Markovian diagram.
is a semi-Markov process with four regeneration states. Note that under the assumption that the sojourn times in all states are exponentially distributed, this model is reduced to Huang et o/.'s CTMS model 4 . Applying the standard technique of semi-Markov processes5, the steady-state system availability becomes: A(to) — Pr{software system is operative in the steady state}
no + tfFfWdt
=
MO + liaFffo)
+ HcFffa)
+ /„*"
Ff(t)dt'
(2)
where in general F(-) = 1 — F(-). 3. The T T T Concept To derive the optima! software rejuvenation schedule (OSRS) on the graph, we define the scaled total time on test (TTT) transform 12 of the failure time distribution: 1
r - f /r_F1t•)_ (p)_
4>{ = T —- // V) = 4>{P) A A / J{ Jo
Ff(t)dt,
(3)
where Ff1(p)
= M{t0;Ff(t0)>p},
0
(4)
12
It is well known that Ff(t) is IFR (DFR) if and only if <j>(p) is concave (convex) o n p e [0,1]. Dohi et al.5 show the following result. Theorem 3.1. Obtaining the OSRS, t^, maximizing the steady-state system availability A(to) is equivalent to obtaining p* (0 < p* < 1) such as max O
where a = Ho/\f
and r\ = /z c / (fia — fj,c).
,z\ (5)
447
Theorem 3.1 can be obtained by transforming A(to) to a function of p by means of p = Ff(t). If the failure time distribution Ff(t) is known, then the OSRS can be obtained from Theorem 3.1 by tj = Ff1(P*)Here, p* (0 < p* < 1) is given by the x coordinate value p* for the point of the curve with the largest slope among the line pieces drawn from the point (-77,-a) e (-oo,0) x (-oo,0) to the curve (p,
»=1
where h (> 0) is the window width, called the smoothing parameter or bandwidth. The function K(-) is called the kernel function which satisfies the condition:
f K(t)dt = 1,
J tK{t)dt = 0,
f t2K(t)dt = r2 £ 0.
(7)
Usually, but not always, the function K(-) will be selected as a symmetric probability density function. For typical examples of the kernel function, the reader is referred to Silverman 10 . Now, in order to estimate the OSRS from the failure time data, we define the estimator of the scaled total time on test transform by 4>KDE(P)
J-f 1
F/"1(p)_
= j - I
Ff(t)dt,
(8)
Xfn Jo an where Xfn = J2]=\xj/n d Ff(t) = J0 ff(s)ds. The following theorem on the OSRS is obtained as the direct application of the result in Theorem 3.1. Theorem 4 . 1 . It is assumed that n complete data xi,X2,---,xn on the failure time are observed. A non-parametric estimate i^ of the OSRS maximizing the steady-state system availability is given by ijjj = FT1^*) satisfying the following maximization problem: max 0
>KDE(P) + p + T]
an
,
(9)
448
where an = /x 0 /A/„. When we utilize the kernel method, the problem of choosing the design parameter h is of crucial importance. Duin 13 and Habbema et al.14 consider choosing an ideal value of h satisfying the following the maximum likelihood criterion: 1
n
n
fc=i
= -J2^&fnk(xk), max:L(h) j h>o
' '
(10)
where
'-^-(^sjS'rrO-
(11)
5. Simulation Experiments Of our interest in this section is the investigation of asymptotic properties and convergence speed of estimators proposed in this paper. Suppose that the failure time obeys the Weibull distribution: Ff(t) = 1 - e-W
(12)
with the shape parameter 7 = 4.0 and the scale parameter 6 — 0.9. The other parameters are fixed as /io = 2.0, /j,a = 0.04 and \xc = 0.03. Through the TTT transform, the OSRS can be derived as t% = 0.5870. In the following, the Epanechnikov's kernel function15 K[t)
~\0
otherwise
[U)
is used to estimate the density function of failure time. Let us consider the estimation of an OSRS maximizing the steady-state system availability when the failure time data are already observed. It is assumed that the observed data consist of 20 pseudo random numbers generated from the Weibull distribution given in Eq.(12). For the 20 pseudo random numbers, we determine the window size as h* = 0.19502 by solving Eq.(10). In Fig.2, we present an estimation example of the OSRS maximizing the steady-state system availability based on the kernel density estimation from 20 data. The point providing the steepest slope among the line segments drawn from (-77, -a) = (-3.0, -2.4848) to the curve (p,
449 0KDEW
1.2 1
—i
1
1
1
Kernel density Empirical distribution real optimal
0.8 0.6 0.4 20
40 60 no.data
80
100
2.4848
Figure 2. Estimation of the OSRS based on the kernel density estimation.
Figure 3. Asymptotic behavior of estimate of the OSRS.
Next, let us study the asymptotic behavior of two non-parametric estimation algorithms, namely, the empirical distribution 5,6 and the kernel density estimation. Monte Carlo simulations are carried out with pseudo random numbers based on the Weibull distribution given in Eq.(12), in order to investigate the convergence toward the real optimal solution. Figure 3 reveals the asymptotic behavior of the OSRS. It is found that the results converge to the real optimal solution when the number of failure time data is close to 20. Figure 4 shows the mean square error of estimates of the OSRS which are obtained by carrying out the 100 Monte Carlo simulations. For small sample data, we can observe that the convergence speed of the OSRS estimated by the kernel density estimation is faster than by the empirical distribution. Form these results, we conclude that the statistical algorithm based on the kernel density estimation can be recommended to estimate the OSRS especially for the small sample problem. References 1. E. Adams, Optimizing preventive service of the software products, IBM J. Research and Development 28, 2-14 (1984). 2. V. Castelli, R. E. Harper, P. Heidelberger, S. W. Hunter, K. S. Trivedi, V. Vaidyanathan, and W. P. Zeggert, Proactive management of software aging, IBM J. Research & Development 45, 311-332 (2001). 3. J. Gray and D. P. Siewiorek, High-availability computer systems, IEEE Cornput. 9, 39-48 (1991). 4. Y. Huang, C. Kintala, N. Kolettis, and N. D. Funton, Software rejuvenation: analysis, module and applications, Proc. 25th IEEE Int'l Symp. Fault Tolerant Computing 381-390, IEEE Computer Society Press, Los Alamitos, CA
450 0.08
Kernel density —•-»••— x Empirical distribution -—><•—
0.06 w
0.04 0.02
0
0
10
20
30
no.data Figure 4. Mean square error of estimate of the OSRS. (1995). 5. T. Dohi, K. Goseva-Popstojanova, and K. S. Trivedi, Statistical nonparametric algorithms to estimate the optimal software rejuvenation schedule, Proc. 2000 Pacific Rim Int'l Symp. on Dependable Computing 77-84, IEEE Computer Society Press, Los Alamitos, CA (2000). 6. T. Dohi, K. Goseva-Popstojanova, and K. S. Trivedi, Analysis of software cost models with rejuvenation, Proc. 5th IEEE Int'l Symp. High Assurance Systems Engineering 25-34, IEEE Computer Society Press, Los Alamitos, CA (2000). 7. E. Parzen, On the estimation of a probability density function and the mode, Annals of Mathematical Statistics 33, 1065-1076 (1962). 8. M. Rosenblatt, Remarks on some nonparametric estimates of a density function, Annals of Mathematical Statistics 27, 832-837 (1956). 9. T. Cacoullos, Estimation of a multivariate density, Annals of the Institute of Statistical Mathematics 18, 178-189 (1966). 10. B. W. Silverman, Density Estimation for Statistics and Data Analysis Chapman and Hall, London (1986). 11. S. Garg, Y. Huang, C. Kintala, and K. S. Trivedi, Time and load based software rejuvenation: policy, evaluation and optimality, Proc. 1st FaultTolerant Symp. 22-25 (1995). 12. R. E. Barlow and R. Campo, Total time on test processes and applications to failure data, Reliability and Fault Tree Analysis (eds. by R.E. Barlow, J. Fussell and N. D. Singpurwalla), 451-481, SIAM, Philadelphia (1975). 13. R. P.W. Duin, On the choice of smoothing parameters for Parzen estimators of probability density functions, IEEE Trnas. Comput. C-25, 1175-1179 (1976). 14. J. D. F. Habbema, J. Hermans, and K. van der Broek, A stepwise discrimination program using density estimation, Proc. Computational Statistics (ed. by G. Bruckman), 100-110, Physica Verlag, Vienna (1974). 15. V. A. Epanechnikov, Nonparametric estimation of a multidimensional probability density, Theory of Probability and Its Applications 14, 153-158 (1969).
I N C O R P O R A T I N G D Y N A M I C S O F T W A R E METRICS DATA IN SOFTWARE RELIABILITY ASSESSMENT*
K. SHIBATA, K. RINSAKA AND T. DOHI Department of Information Engineering, Graduate School of Engineering, Hiroshima University, Higashi-Hiroshima, Japan E-mail: {kazuya, rinsaka, dohi}Qrel.hiroshima-u.ac.jp
The black-box approach based on stochastic software reliability models is a simple methodology with only software fault data in order to describe the temporal behavior of fault-detection processes, but fails to incorporate some significant metrics data observed in the testing process. In this paper we develop a proportional intensity-based software reliability models with time-dependent metrics, and propose a statistical framework to assess the software reliability with the timedependent covariate as well as the software fault data. The resulting model is similar to the usual discrete proportional hazard model, but possesses somewhat different covariate structure from it. We compare three metrics-based software reliability models with some typical non-homogeneous Poisson process models, which are the special cases of our models, and evaluate quantitatively the goodness-of-fit from the viewpoint of information criteria. As an important result, the accuracy on reliability assessment strongly depends on the kind of software metrics data used for analysis and can be improved by incorporating the time-dependent metrics d a t a in modeling.
1. Introduction During the last three decades, the stochastic models, called software reliability models (SRMs) that analyze and explain the software fault-detection phenomena, have been extensively developed in the literature 1 ' 2 . In fact, till now, over 200 SRMs have been proposed from various mathematical standpoints. The classical and the most important SRMs may be the nonhomogeneous Poisson process (NHPP) models that have gained popularity for describing the stochastic behavior of the number of software faults de*The present research was partially supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Sports, Science and Culture of Japan under Grant Nos. 15651076 and 16310116.
451
452
tected in the testing phase. In other words, the black-box approach based on the NHPP-based SRMs is a simple methodology with only software fault data in order to describe the temporal behavior of fault-detection processes, but fails to incorporate some significant metrics data observed in the testing process. The stochastic approach to incorporate the software metrics and/or environmental factors is taken by Ascher 3 ' 4 , Bendell 5 , Evanco and Lacovara6 and Evanco 7 . They utilize the proportional hazard model (PHM) or equivalently Cox regression model 8 , and formulate the software faultdetection time distribution by regarding the time-series metrics data as the covariate 9 . Pham 2 develops an enhanced proportional hazard SRM based on a continuous-time Markov chain and considers a dynamic version of PHM. However, if the cumulative effect of software development/test effort reported in Musa 10 is considered for analysis, the above modeling approach based on the PHMs loses their validation because the covariate to represent the development effort usually consists of 0-1 binary values. In this paper, we develop a discrete proportional intensity-based SRM with time-dependent metrics. The most different point from the existing PHM is that this model is a dynamic SRM to describe the software reliability growth phenomenon. That is, we introduce the Cox regression to incorporate the time-dependent test metrics instead of the linear and nonlinear regression with white noise or the logistic regression, and still utilize the stochastic counting process to describe the cumulative number of faults detected in the software testing (note that the regression models in Amasaki et al.11 are static models in time). In that sense, the discrete proportional intensity model (DPIM) proposed here would possess both applicability to the actual software reliability assessment from the similarity to the discrete NHPP-based SRMs and flexibility to incorporate the time-dependent metrics data observed in the testing phase. 2. Time-Based Modeling Approach Let {N(i); i = 0,1,2, • • • } denote the number of software faults detected by time i in the software testing and be a stochastic counting process satisfying: (i) N(0) = 0 (ii) {N(i); i = 0,1,2, •••} has independent increments, i.e. for any discrete point of time i\,i2,-" >*fc (0 < i\ < ii < ••• < ik), k random variables, N(ii),N(i2) — N(ii), • • • ,N(ik) — N(ik-i), are statistically independent from each other.
453 (iii) For arbitrary, is and ik (0 < is < its', s < k), Pr{N(ik)
- N(is) = n}=
where H(i) = E[N(i)] in i (= 0,1,2, •••). cess in discrete time, homogeneous Poisson tion (pmf):
{ # ( * * ) - # & ) } " e x p { - [ / f fe) - H(i.)]},
(1)
is said the mean value function and is non-decreasing Under the assumptions (i)-(iii), the counting pro{N(i); i = 0,1,2, • • • } , is called the discrete nonprocess (DNHPP) having the probability mass func-
Pr{N(i) = n\ N(0) = 0} = ^p^exp{-H(i)},
n = 0,1,2,-•-.
(2)
Yamada et al.12 consider a geometric DNHPP-based SRM with mean value function H(i) = a[l-(l-b)i),
(3)
where H(oo) = a (> 0) is the mean initial number of faults contained in software program and b (> 0) is the fault detection rate per fault per test run. Okamura et al.13 propose unified modeling approaches for D-SRMs, and develop many types of D-SRMs by assuming the discrete distribution functions as the software fault-detection time distribution. 3. Discrete Proportional Intensity Model 3.1. Model
Description
In this section, we develop a novel SRM under the discrete time testing circumstance to incorporate the multiple testing-effort parameters. Suppose that I (> 1) kinds of software metrics data x» = (xn, • •• ,xu) (i = 1, 2, • • •) are available at each testing time i (= 0,1,2, • • •). It is also assumed that each metrics x^ depends on the cumulative testing time i and can be regarded as a function of time, say, Xj. In statistics, this type of parameter is called the time-dependent covariate9 and has been studied extensively in the context of Cox PHM. Lawless14 introduces the intensity function with the 0-1 binary valued covariate to express a time-inhomogeneous counting process. Similar to the continuous-time proportional intensity model (PIM) proposed by Lawless14, we assume the following discrete-time intensity function: A I ( t ; 0 ) j 9 | x i ) = Ao(*;e)s(x i ;/3),
(4)
454
where /3 = (/3i, • • • ,fii)T and 0 is unknown parameter. In Eq.(4), the function \o(i;0) (> 0) is called the baseline intensity function and is a function of only time. On the other hand, the function 0) is called the covariate function and is a function of the software time-series metrics X; and the coefficient parameter /3 = (/?i, • • • , A ) T - Similar to the usual Cox's PHM, an appropriate choice of the covariate function would be given by the following exponential form: g{*i;0) = exp(xj/3).
(5)
Actually this form is well known to be convenient for analysis and to be rather flexible in order to express the covariate structure in many applications 8 ' 9 . Lawless14 also assumes the above exponential covariate structure and analyzes the real statistical data in medical applications. However, it is worth noting that the covariate considered by Lawless14 is the 0-1 binary value and does not deal with the cumulative value like test execution time (CPU hr), etc. In other words, a new modeling framework is needed for analysis of time-series metrics data. The simplest but reasonable model is to take account of an effect of the cumulative number of faults in an expression of the mean value function. Suppose: i
Hp(i; 9,/3) = J2 Mk; 9) exp (x*/3),
(6)
where H0(i) = J2]c=i^o(k;0). Note that when f3j = 0 for all j (= 1,2, • • • , /), the above DPIM can be reduced to the existing DNHPP-based SRM 12 - 13 . 3.2. Maximum
Likelihood
Estimation
Both the model parameters 6 and (3 can be estimated by a method of maximum likelihood. Suppose that n sets of fault-detection data (i,yi) (i = 0,1, 2, • • • , n) and I x n software metrics data x^ = (xn,- • • , xu) are observed at testing time interval (0, i], where, y, is the cumulative number of detected software faults and (xn, • • • ,xu) are / kinds of software metrics data consumed until time i. Under the above assumptions and the property of independent increment of the DNHPP, the likelihood function for DPIM with mean value function Hp{i) is given by L(6,(3) = Pr{iV(l) = yuN(2) = e x p [ - / / > ; 0, (3)] n
= y2, • • • ,N(n) ^
-
yn} ^
, (7)
455 where, (0, y0) = (0,0) and x0j = 0 (j = 1,2, • • • , I). Taking the logarithm of both sides of Eq.(7), we have n
LLF(9,/3) = Y,(Vi " i/i-i) HHP(i; 0, (3) - Hp(i - 1; 9, f3)}
(8)
2= 1
n
-Hp(n;0,P)-J2H{Vi-Vi-i)1\i=l
Note that the maximum likelihood estimates. (9,(3) can be obtained via the direct maximization of the logarithmic likelihood LLF(9,(3). 4. Numerical Examples In this section, we focus on a real data set collected in the actual software development project for the real time command and control system 10 . This data set observed during 14 weeks contains 38 faults count and three metrics data; test execution time (CPU hr), failure identification work (person hr) and computer time-failure identification (CPU hr). Using this data set, we perform the goodness-of-fit test of DPIM, and evaluate the predictive performance. 4 . 1 . Goodness-of-Fit
Test
For DPIM, we assume three kinds of discrete probability distribution function: geometric \o(i;a,b) = ab(l — 6) l _ 1 , negative binomial (order 2) \0(i;a,b) = aib2(l - 6) l _ 1 , discrete Weibull (order 2) X0(i;a,b) = a[6( ,_1 ) — bl ]. We calculate the maximum likelihood estimates (9,J3) = (a, b, $i,02,(3?) and derive the corresponding log likelihood (LLF), mean square error (MSE), Akaike's information criterion (AIC) and Bayesian information criterion (BIC), where A I C - - 2 L L F ( 0 , / 9 ) + 27r,
(9)
BIC = - 2 L L F ( £ , / 3 ) - ^ l n ^ .
(10)
In Eqs.(9) and (10) 7r and tp are the number of free parameters and the number of data used for analysis, respectively. In the analysis with this data set, e.g., DPIM with geometric baseline intensity and three metrics data is reduced to the simpler version with execution time data, because the coefficients become (3% = 0 and (3$ = 0. This implies that the best SRM is statistically independent of both failure
456 Table 1.
Goodness-of-fit results.
DPIM
LLF
MSE
AIC
BIC
Geometric Negative Binomial Discrete Weibull
-21.57 -20.07 -24.95
1.68 0.71 7.39
49.15 46.13 57.91
53.15 50.13 64.86
DNHPP
LLF
MSE
AIC
BIC
Geometric Negative Binomial Discrete Weibull
-29.38 -30.60 -37.29
3.22 7.52 21.64
62.76 65.20 78.59
64.03 66.48 79.87
identification work and computer time-failure identification, but strongly depends on the test execution time. Table 1 presents the goodness-of-fit results for DPIM and DNHPP, and Fig.l reveals the behavior of mean value functions of DPIM and DNHPP. It can be seen that DPIM with negative binomial baseline intensity with order 2 can provide the best performance in terms of all goodness-of-fit measures. This result tells us that DPIM based on negative binomial baseline intensity function fits to this data best, and that DPIM with covariate structure can catch the software fault detection process better than the existing SRMs under the same data condition. 4.2. Predictive
Performance
Next, we investigate the predictive ability of DPIM. On the observation point n' (1 < n' < n) when 50%, 75% or 90% of all data is available, we predict the future behavior of the cumulative number of faults. To assess the predictive ability, we apply the prediction logarithmic likelihood (PLL) and the prediction square error (PSE) as prediction performance measures, where n
PLL = Y,
(Vk - Vk-i) HHP(k; 0,0) - Hp(k - 1; 0,0)] - Hp(n; 0,0) n
+Hp{n;0,0)-
Y,
HiVk-Vk-iV],
(11)
k=n+l 1
n
PSE= n — 7n T £
-\2
\yk-Hp(k;0,0)
.
(12)
n — n k-n+l . —' L J It is obvious that the larger (smaller) PLL (PSE) indicates the better predictive performance.
457
observed data • unobserved data o / / DPIM(given, Negative binomial) / ' DPIM{predicted, Negative binomial) DNHPP(Negative binomial) 6
8
10
12
14
10
12
14
Figure 1. Behavior of mean value func- Figure 2. Behavior of Predicted mean tions (100%). value functions (90%). In order t o predict t h e future behavior of software fault-detection process, it should be noted t h a t estimates of test metrics are required, since the mean value function of D P I M depends on t h e covariate. In practice, two cases are possible t o consider. (i) All t h e testing effort parameters (data) are completely known and fixed in advance, so t h a t t h e software testing expenditures are given before testing, (ii) T h e testing effort p a r a m e t e r s experienced in future are random variables and can b e predicted. Especially, in t h e second case, we need t o introduce additional probability models on testing effort parameters. In this paper, we m a d e t h e simplest assumption and employed t h e linear regression m e t h o d to predict t h e future testing effort d a t a . Table 2 presents the prediction results from 50%, 75% and 90% observation points. Prom this table, it is seen t h a t D P I M can still outperform t h e existing SRMs in almost all cases, b u t can not do always in every case. Figure 2 illustrates the behavior of predicted mean value functions from 90% observation point. In the prediction of fut u r e debug p a t t e r n , t h e information on fixed testing effort expenditures is not necessarily needed, because t h e change of process m a y h a p p e n in future and its prediction is not incorporated in the prediction model.
Refere nces 1- J. D. Musa, A. Iannino, and K. Okumoto, Software Reliability Measurement, Prediction, Application, McGraw-Hill, New York (1987). 2. H. Pham, Software Reliability, Springer-Verlag, London (2000). 3. H. Ascher. Proportional hazards modelling of software failure data, Software
458 Table 2. PLL and PSE. 50% ( 7 / 1 4 ) PSE PLL
7 5 % (10/14) PLL PSE
90% (12/14) PLL PSE
D P I M (Geometric, given)
-408.23
68020.00
-7.31
11.64
-3.34
D P I M (Geometric, predicted)
-19.07
43.32
-12.51
27.37
-4.62
5.06
D P I M (Negative Binomial, given)
-54.74
2162.17
-5.87
8.74
-1.65
0.49
D P I M (Negative Binomial, predicted)
-30.15
69.18
-26.46
39.78
-10.42
8.56
D P I M (Discrete Weibull, given)
-33.04
38.66
-24.80
38.64
-8.35
8.30
D P I M (Discrete Weibull, predicted)
-86.96
94.06
-62.19
42.19
-13.67
8.87
D N H P P (Geometric)
-19.48
43.96
-8.91
16.30
-3.46
1.58
D N H P P (Negative Binomial)
-32.61
72.40
-12.93
28.26
-4.00
3.95
-105.99
96.16
-25.47
38.97
-5.40
6.47
D N H P P (Discrete Weibull)
4.
5.
6. 7.
8. 9. 10. 11.
12.
13.
14.
3.90
Reliability State of the Art Report (A. Bendell and P. Mellor, eds.), 229-263 (1986). H. Ascher, The use of regression techniques for matching reliability models to the real world, Software System Design Methods, NATO ASI Series (J. K. Skwirzynski, ed.), F 2 2 , 366-378 (1986). A. Bendell, The use of exploratory data analysis techniques for software reliability assessment and prediction, Software System Design Methods, NATO ASI Series (J. K. Skwirzynski, ed.), F 2 2 , 337-351 (1986). W. M. Evanco and R. Lacovara, A model-based framework for the integration of software metrics, Journal of Systems and Software, 26, 75-84 (1995). W. M. Evanco, Using a proportional hazards model to analyze software reliability, In Proc. 9th Int'l Conf. Software Technology & Engineering Practice, 134-141, IEEE CS Press (1999). D. R. Cox, Regression models and life-tables, Journal of the Royal Statistical Society, B-34, 187-220 (1972). S. Murphy and P. Sen, Time-dependent coefficients in a Cox type regression model, Stochastic Processes and Their Applications, 39, 153-180 (1991). J. D. Musa, Software Reliability Data, Technical Report, Data and Analysis Center for Software, Rome Air Development Center, New York (1979). S. Amasaki, T. Yoshitomi, O. Mizuno, Y. Takagi, and T. Kikuno, A new challenge for applying time series metrics data to software quality estimation, Software Quality Journal, 13, 177-193 (2005). S. Yamada, S. Osaki, and H. Narihisa, Software reliability growth modeling with number of test runs, Transactions of The IECE of Japan, E67, 79-83 (1984). H. Okamura, A. Murayama, and T. Dohi, EM algorithm for discrete software reliability models: a unified parameter estimation method, Proc. 8th IEEE Int'l Sympo. on High Assurance Systems Eng., 219-228 (2004). J. F. L. Lawless, Regression methods for Poisson process data, Journal of the American Statistical Association, 82, 808-815 (1987).
A U S E R - O R I E N T E D RELIABILITY A S S E S S M E N T M E T H O D FOR O P E N SOURCE S O F T W A R E
YOSHINOBU TAMURA Department of Computer Science, Faculty of Applied Information Science, Hiroshima Institute of Technology, Miyake 2-1-1, Saeki-ku, Hiroshima-shi, 731-5193 Japan E-mail: [email protected] SHIGERU YAMADA Department of Social Systems Engineering, Faculty of Engineering, Tottori University, Minami 4-101, Koyama, Tottori-shi, 680-8552 Japan E-mail: [email protected] Software development environment has been changing into new development paradigms such as concurrent distributed development environment and the socalled open source project by using network computing technologies. Especially, an OSS (open source software) system which serve as key components of critical infrastructures in the society are still ever-expanding now. In case of considering the effect of the debugging process on an entire system in the development of a method of reliability assessment for the OSS, it is necessary to grasp the deeply-intertwined factors, such as programming path, size of each component, skill of fault reporter, and so on. In order to consider the effect of each software component on the reliability of an entire system, we propose a new approach to user-oriented software reliability assessment by creating a fusion of neural network and software reliability growth model. In this paper, we show application examples of user-oriented software reliability assessment based on neural network and software reliability growth model for the OSS. Also, we analyze actual software fault count data to show numerical examples of software reliability assessment for the OSS.
1. Introduction An OSS (open source software) system is frequently applied as server use, instead of client use. Such OSS systems which serve as key components of critical infrastructures in the society are still ever-expanding now. The open source project contains special features so-called software composition that the geographically-dispersed several components are developed in all 459
460
parts of the world. In this paper, we focus on OSS developed by using network computing technologies. Software reliability growth models (SRGM's) 1 have been applied to assess the reliability for quality management and testing-progress control of software development. On the other hand, the effective dynamic testing management method for new distributed development paradigm as typified by the open source project has only a few presented. In case of considering the effect of the debugging process on an entire system in the development of a method of reliability assessment for the OSS, it is necessary to grasp the deeply-intertwined factors, such as programming path, size of each component, skill of fault reporter, and so on 2 ' 3 ' 4 . In this paper, we focus on OSS developed under the open source project. We discuss a useful user-oriented software reliability assessment method in open source project as a typical case of new distributed development paradigm. 2. Level of Importance for Each Component 2.1. Interaction
among Software
Components
In case of considering the effect of debugging process on an entire system in the development of a software reliability assessment method for open source development paradigm, it is necessary to grasp the deeply-intertwined factors, such as programming path, size of each component, skill of fault reporter, and so on. In this paper, we propose a reliability assessment method based on the neural network in terms of estimating the effect of each component on the entire system in a complex situation. Especially, we consider that our method based on neural network is useful for OSS users to assess the software reliability by using the only data sets in bug tracking system on the website. Also, we can apply the importance level of faults detected during operating of each component, the size of component, the skill of fault reporter and so on, to the input data of neural network. 2.2. Weight Parameter Neural Network
for Each Component
Based
on
The structure of the neural networks in this section is shown in Figure 1. In Figure 1, w}j(i — 1,2, • • • , J; j = 1,2, • • • , J ) are the connection weights from i-th unit on the sensory layer to j - t h unit on the association layer,
461
Figure 1.
The structure of 3-layered neural networks in this paper.
w^k{j = 1,2, • • • , J; k = 1,2, • • • , K) denote the connection weights from j th unit on the association layer to fc-th unit on the response layer. Moreover, Xi(i = 1,2, • • • , /) represent the normalized input values of i-th unit on the sensory layer, and yk(k = 1,2, ••• , K) are the output values. We apply the values of the fault level, operating system, fault repairer, fault reporter, and so on, to the input values Xi(i = 1,2, • • • ,1). In Figure 1, the input-output rules of each unit on each layer are given by (l)
(2) where a logistic activation function /(•) which is widely-known as a sigmoid function given by the following equation: 1
/(*) = 1 + e~0x'
(3)
where 9 is the gain of sigmoid function. We apply the multi-layered neural networks by back-propagation in order to learn the interaction among software components 5 . We define the error function in Eq. (2) by the following equation: E
1
K
dkf
(4)
fc=i
where dk{k = 1, 2, • • • , K) are the target input values for the output values. We apply the normalized values of the total number of software faults for each component to the target input values dk(k = 1,2, ••• ,K) for the
462
output values, i.e., we consider the estimation and prediction model so that the property of the interaction among software components accumulates on the connection weights of neural networks. By using the parameter y^ derived from above mentioned method, we can obtain the total weight parameter p, which represents the level of importance for each component by using the following equation: *-*
K
•
v-v
& * i=i
3. Reliability Assessment for Entire System 3.1. Reliability SRGM's
Assessment
Based on
Conventional
Many SRGM's have been used as the conventional methods to assess software reliability for quality management and testing-process control of software development. Among others, NHPP models have been discussed in many literatures since the NHPP models can be easily applied in the software development. In this section, we discuss NHPP models for analyzing software fault-detection count data. Considering stochastic characteristics associated with fault-detection procedures in the testing-phase, we treat {N(t),t > 0} as a nonnegative counting process where random variable N(t) means the cumulative number of faults detected up to testing-time t. The fault-detection process {N(t),t > 0} is described as follows1: Pv{N(t) = n} = iW¥Lexp[-H
(*)]
(n = 0 , l , 2 , - - . ) .
(6)
In Eq. (6), Pr{A} means the probability of event A, and H(t) is called a mean value function which represents the expected cumulative number of faults detected in the time interval (0, t]. 3.2. Extended
Logarithmic
Poisson
Execution
Time
Model
The operating environment of OSS has the characteristics of the susceptible to various operational environments. Therefore, it is different from the conventional software system developed under the identical organization. Then, the expected number of detected faults continue to increase from the effect of the interaction among various operational environments, i.e., the number of detected faults can not converge to a finite value.
463
As mentioned above, we apply the logarithmic Poisson execution time model based on the assumption that the number of detected faults tends to infinity. Thus, we consider the following structure of the mean value function /j,(t) because an NHPP model is characterized by its mean value function:
(0<6,
0 < A0, 0 < P < 1),
(7)
where Ao is the intensity of initial inherent failure, and 9 the reduction rate of the failure intensity rate per inherent fault. Moreover, we assume that the parameter P in Eq. (7) represents the following average in terms of the parameter yt estimated by the neural network: P = YA=I yi/ni where n represents the number of software components 2 ' 3,4 . We can give several expressions as software reliability assessment measures derived from the NHPP models given by Eq. (7). 4. Numerical Examples 4.1. Level of Importance
for Each
Component
a6
We focus on the Thunderbird which is the OSS system developed under an open source project. Thunderbird has been developing under Mozillab project. The fault-detection data used in this paper are collected in the bug tracking system on the website. Estimating the weight parameter in terms of the reliability by using the neural network, the input data sets for OSS are the importance level of faults detected for each component (Critical), OS (operating system), the fault repairer (Assigned to), and the fault reporter (Reporter). The estimated results of the weight parameter Pi(i = 1,2, ••• ,n) for Thunderbird based on the neural network in Section 2.2 are shown in Table 1. Prom Table 1, we can grasp the level of importance in terms of reliability for each component. 4.2. Reliability
Assessment
for Entire
System
On the presupposition that the weight parameters for each component are estimated by using the neural network, we show numerical examples for a
Thunderbird and the Thunderbird logo are registered trademarks of the Mozilla Foundation. Mozilla and the Mozilla logo are registered trademarks of the Mozilla Foundation.
464 Table 1. The estimated weight parameter in Thunderbird. Component Name Account Manager Address Book Build Config General Help Documentation Installer Mail Window Front E n d Message Compose Window Migration Preferences RSS
Figure 2.
Weight p a r a m e t e r 0.0537 0.0487 0.0091 0.2377 0.0054 0.0138 0.4347 0.0951 0.0266 0.0425 0.0327
The testing period for each component in the actual data.
reliability assessment of OSS. The estimated numbers of detected faults of Thunderbird in Eq. (7), fl(t) are shown in Figure 2. From Figure 2, we can find that the number of detected faults at the 19 months (Jan. 2005) is 3013. 4.3. Performance
Assessment
of Our
Model
The logarithmic Poisson execution time model based on the assumption that the number of detected faults tends to infinity, i.e., the intensity of initial inherent failure decline exponentially with the increase in the number of software fault detected during the testing phase. In this paper, we propose the extended logarithmic Poisson execution time model. Our model is formulated as Eq. (7) by using the parameter P in terms of the effect of the interaction among software component. Case 1: 9 - P > 0 The intensity of initial inherent failure declines exponentially with the increase in the cumulative number of software failure. Case 2: 6 - P < 0
465
i
::ir:j:::::.:|::::::|:::::::
P=0.209 Filled P=0.205 P=0.204
/
i
s 6
=
- ^
P^ «s^~,, "s--,^- :: :!£¥±~b -v.'.v.::.:.;.; 0
Figure 3.
>
" •
1,
1O0O 2000 CUMULATIVE NUMBER OF DETECTED FAULTS
30O0
Dependence of parameter P in the intensity of initial inherent failure.
Under the assumption that (6 — P) > —1/X0t, the intensity of initial inherent failure grows exponentially with the increase in the cumulative number of software failure. In case of considering the development environment of OSS, we consider that the practical method is desirable to be used the model based on the assumption that the number of detected faults tends to infinity rather than the model based on the assumption that the number of detected faults tends to finite. Because frequently-used and popular OSS's have a tendency that the software faults are continually reported. Therefore, we consider the conventional SRGM cannot comprehend the reliability growth curve of OSS. In this section, we show some behavior of our model if we change the parameter P which represents the effect rate of the interaction among software component. We illustrate five curves of the intensity of initial inherent failure if we change the parameter P in Figure 3. In Figure 3, we find that our extended model can not only cover conventional logarithmic Poisson execution time model but also can assess software reliability under open source projects. We consider that our model is useful as the method of flexible reliability assessment according to the active state of OSS's. 5. Conclusion In this paper, we have focused on the Thunderbird which is known as the OSS, and discussed the method of reliability assessment for the OSS developed under on an open source project. Especially, we have applied on neural network in order to consider the effect of each software component on the reliability of an entire system under such open source development
466 paradigm. By using the neural network, we have proposed t h e method of reliability assessment incorporating t h e interaction among software components. T h e neural network and N H P P model applied in this paper have simple structures. Therefore, we can easily apply our m e t h o d t o actual open source software by rote. In case of considering t h e effect of debugging process on an entire syst e m in t h e development of software reliability assessment m e t h o d s for open source projects, it is necessary t o grasp t h e deeply-intertwined factors. In this paper, we have shown t h a t our method can grasp such deeplyintertwined factors by using the neural network. Especially, we consider t h a t our m e t h o d based on neural network is useful for OSS user t o assess the software reliability by using the d a t a sets in bug tracking system on the website.
Acknowledgments This work was supported in p a r t by t h e Grant-in-Aid for Scientific Research (C), G r a n t No. 18510124 and Young Scientists (B), G r a n t No. 17700039 from t h e Ministry of Education, Culture, Sports, Science, and Technology of J a p a n .
References 1. S. Yamada, Software Reliability Models: Fundamentals and Applications (in Japanese), JUSE Press, Tokyo (1994). 2. Y. Tamura, S. Yamada and M. Kimura, Reliability assessment method based on logarithmic Poisson execution time model for open source project, Proc. of the Second IASTED International Multi-Conference on Automation, Control, and Information Technology, Novosibirsk, Russia, 54-59, Jun. (2005). 3. Y. Tamura and S. Yamada, Comparison of software reliability assessment methods for open source software, Proc. of the 11th IEEE International Conference on Parallel and Distributed Systems (ICPADS2005)-Volume II, Fukuoka, Japan, 488-492, Jul. (2005). 4. Y. Tamura and S. Yamada, Validation of an OSS reliability assessment method based on ANP and SRGM's, Proc. of the International Workshop on Recent Advances in Stochastic Operations Research, Canmore, Canada, 273-280, Aug. (2005). 5. E. D. Karnin, A simple procedure for pruning back-propagation trained neural networks, IEEE Trans. Neural Networks., 1, 239-242, Jun. (1990). 6. The Mozilla Thunderbird Mail Project, Thunderbird, http://www.mozilla.org/projects/thunderbird/
P E R F O R M A N C E ANALYSIS FOR S O F T W A R E S Y S T E M W I T H P R O C E S S I N G T I M E LIMIT B A S E D ON RELIABILITY G R O W T H MODEL
K. T O K U N O A N D S. Y A M A D A Department
of Social Systems Engineering, Faculty of Engineering Tottori University 4-101, Koyama, Tottori-shi, 680-8552, Japan E-mail: {toku, yamada} @sse.tottori-u.ac.jp
We propose the performance evaluation method for the multi-task system with software reliability growth process. The software fault-detection phenomenon in the dynamic environment is described by the Markovian software reliability model with imperfect debugging. We assume t h a t the cumulative number of tasks arriving at the system follows t h e homogeneous Poisson process. Then we can formulate the distribution of the number of tasks whose processes can be complete within a prespecified processing time limit with t h e infinite-server queueing model. Prom the model, several quantities for software performance measurement considering the real-time property can be derived. Finally, we present several numerical examples of t h e quantities to analyze t h e relationship between the software reliability characteristics and the system performance measurement.
1. Introduction The studies on performance evaluation methods for computing systems have much been discussed from the viewpoint of the hardware configuration. For example, Beaudry 1 has proposed the performance-related measures such as the computation availability and the mean computation between failures. Meyer2 has introduced the concept of performability. On the other hand, there exist few studies on the reliability-related performance evaluation from the viewpoint of software systems. Kimura et al. 3 ' 4 have discussed the evaluation methods of the real-time property for the N-version programming and the recovery block software systems; these are well-known as the methodologies of the fault-tolerant software systems. However, Kimura's studies have just applied the framework for analyzing from the aspect of the hardware configuration to the fault-tolerant software systems and have not included the characteristics peculiar to software systems such as the 467
468
reliability growth process. In this paper, we discuss the evaluation method of the real-time property for the software systems considering the reliability growth process; this is the different approach from Kimura's studies. The real-time property is defined as the attribute that the system can complete the task within the stipulated response time limit. 5 ' 6 We assume that the software system can process the plural tasks simultaneously. Then the software failure-occurrence phenomenon and the software reliability growth process in the dynamic environment are described by the Markovian software reliability model with imperfect debugging.7 The stochastic behavior of the number of tasks whose processes can be complete within the prespecified processing time limit is modeled with the infinite-server queueing model. 8 The organization of the paper is shown as follows. Section 2 defines the operating regulation of the system and states the software reliability growth model used in the paper. Section 3 analyzes the distribution of the number of tasks whose processes are complete within the processing time limit up to a given time point and derives several software performance measures from the model. Section 4 presents the numerical examples of the measures and examines the software performance analysis. In Section 5, we state the conclusion of the paper. 2. Model Description We make the following assumptions for system's task processing: AI-1. The number of tasks the system can process simultaneously is sufficiently large. AI-2. The process {N(t), t > 0} representing the number of tasks arriving at the system up to the time t follows the homogeneous Poisson process with the arrival rate 9. AI-3. The processing time of a task, Y, is distributed generally with the distribution function H(t) and each of the processing times is independent. AI-4. When the system causes a software failure before the processes of tasks are not complete or the processing times of tasks exceed the prespecified processing time limit T r , the corresponding tasks are canceled. Next, we make the following assumptions for describing the software reliability growth process:7
469
AII-1. The debugging activity is performed as soon as the software failure occurs. AII-2. The debugging activity for the fault having caused the corresponding software failure succeeds with the perfect debugging rate a (0 < a < 1), and fails with probability b(= 1 — a). One perfect debugging activity corrects and removes one fault from the system and improves software reliability. AII-3. When n faults have been corrected, the next software-failure timeinterval, Un, follows the exponential distribution with the hazard rate \n{9)\ this depends on the cumulative number of corrected faults and the arrival rate of the tasks. \n(6) is a non-increasing function of n. Let W(t) be a counting process whose state space is n = 0, 1, 2, ...; this represents the cumulative number of faults corrected up to time t. We should note that the number of corrected faults cannot be observed immediately and is not always same as that of software failures which can be easily observed. The probability mass function of W(t) is given by Pr{W(t) = n} = Gn(t) - Gn+1(t) = ^ g ,
(1)
where Gn(t) is the distribution function of the random variable Sn representing the time spent in correcting n faults, and given by Gn(t) = Pv{Sn < t} n-l
= J2 ^"(1 - e~aAi{e)t)
(t > 0; n = 1, 2,...; G0(t) = l(t))
i=0
(Al ^ 1
\
i
and gn(t) is the density function of Sn, i.e., gn(t) =
>,(2)
dGn(t)/dt.
3. Derivation of Software Performance Measures Figure 1 illustrates the configuration of the system's task processing. Let {Z(t\Tr),t > 0} be the random variable representing the cumulative number of tasks whose processes can be complete within the processing time
470 Processing Time Limit
[ # : process complete l o : process canceled
Processing Timer
Arrival Rate
»Time Software Failure Time Figure 1.
Configuration of task processing.
limit Tr out of the tasks arriving up to the time t. By conditioning with {N(t) = k}, we obtain the probability mass function of Z(t\Tr) as Pr{Z(t\Tr)
= j} = J2Pi{Z(t\Tr)
= j\N(t)
= k}e
fc=0
6t
{et)k k\ '
(3)
From Fig. 1, given that {W(£) = n } , the probability that the process of an arbitrary task is complete within the processing time limit Tr is given by Tr
f3n(Tr) = Pr{Y
Tr\W(t) = n} = / "" e-x"V*dH(y).
(4)
Jo
Furthermore, the arrival time of an arbitrary task out of ones arriving up to the time t is distributed uniformly over the time interval (0, t].s Therefore, the probability that the process of an arbitrary task having arrived up to the time t is complete within the processing time limit Tr is obtained as ,-t ° ° dx Tr) = / V Pr{l^(a;) = n) • Pr{Y < Un, Y < Tr\W(x) = n}
T
- t+ ^2 ^ „\ra\ a\n{9)
Pn lr
^ >-
(5)
471
Then from assumption AI-3, Pr{Z(t\Tr)=j\N(t)
= k} =
^k)\p(t\Tr)y[i-p(t\Tr)}k-j
(i = o,i,2,...,fc)
0
(6)
(j > k)
That is, given that {N(t) = k}, the number of tasks whose processes can be complete within the processing time limit Tr follows the binomial process with mean kp(t\Tr). Accordingly, from (3) the distribution of Z(t\Tr) is given by
Pr{Z(t\Tr) = *} = f; (*) WW
p(t)\k-ie-°^
-
et
= c-OtvMTr)[
P(t\Tr)}J
(?)
Equation (7) means that Z(t\Tr) follows the nonhomogeneous Poisson process with the mean value function 6tp(t\Tr). Based on the above analysis, we can obtain several measures for software performance evaluation considering the real-time property. The expected number of tasks completable out of the tasks arriving up to the time t and the instantaneous number of tasks completable at the time point t are given by A(t\Tr) HE E[Z(t\Tr)} = 6 f ; ^Mf3n(Tr),
(8)
respectively. Furthermore, the instantaneous task completion ratio is obtained as
MTr)s^mi/e=±^MTr),
(10)
which represents the ratio of the number of tasks completed within the processing time limit Tr to one arriving at the system per unit time at the time point t. As to p(t\Tr) in (5), we can give the following interpretations: PWlr)
E[N(t)}
•
U1)
That is, p(t\Tr) is the cumulative task completion ratio up to the time t.
472
4. Numerical Examples We present several numerical examples on software performance analysis based on the above measures. We refer to the form of the hazard rate Xn(6). We assume that the hazard rate depends on the number of corrected faults and the arrival rate of the tasks. Then it is appropriate that Xn(9) is a non-decreasing function of 8 satisfying \n{9) > An(0) for 9 > 0. Here we apply the following form to Xn(9): Xn(6) = c-9 + Xn
(c, A „ > 0 ) ,
(12)
where we call A„ the inherent hazard rate and c the influence factor of the arrival rate on the hazard rate. The classical hazard rate models in the software reliability models can be applied to the form of An. Here we apply the model of Moranda 9 to the inherent hazard rate in the numerical examples, i.e., Xn = Dvn (D > 0, 0 < v < 1). As to the distribution of the processing time, Y, we apply the gamma distribution of order 2, i.e., H(t) = 1 - (1 + at)e-at (t > 0; a > 0); this is unimodal and E[Y] = 2/a. Then (3n(Tr) is given by Pn{Tr)=
{xn(ff)
+ a) { W l + ( A „ ( 6 0 + ^ ] e - ^ > + ^ } . (13)
Figures 2 and 3 show the dependence of the instantaneous number of tasks completable, 7(i|T r ), in (9) and the instantaneous task completion ratio, fi(t\Tr), in (10) on the arrival rate, 6, respectively. Figure 2 tells us that j(t\Tr) becomes larger with the increasing 9. The larger 9 can lead the following two scenarios: (i) 7(t|T r ) becomes larger since the number of tasks arriving at the system per unit time increases, (ii) 7(t|T r ) becomes smaller since the possibility that the software failure will occur becomes stronger from the form of Xn{9). Figure 2 displays that the impact of the scenario (i) is greater than the scenario (ii) on 7(t|T r ). On the other hand, Fig. 3 shows the opposite tendency to 7(i|T r ), i.e., ^,{t\Tr) becomes larger with the decreasing 6. This reason is that /i(t|T r ) is the measure for per task arriving at the system, i.e., the scenario (ii) applies to the reason of this tendency. If we specify the objective of n(t\Tr), say no, then we can find the testing time t = t^ satisfying fi(t\Tr) = HQ. AS shown in Fig. 3, we can see that
473
Tr)
0.8 0.7
9=0.7 9=0.6
0.6 0.5
9=0.5 9=0.4 0=0.3
0.4 0.3 0.2
0
50
100
150
200
250
300
350
400
Time Figure 2. Dependence of y(t\Tr) 0.9, a = 0.9).
on 6 (Tr = 1.0, a = 4.0, c = 0.02, D = 0.02, v =
VWr)
0.904
0.896 0.894
0
50
Figure 3. Dependence of n{t\Tr) 0.9, a = 0.9).
100
150
200
250
300
350
400
on 6 (T r = 1.0, a - 4.0, c = 0.02, D = 0.02, v =
it takes longer time to satisfy the objective of n(t\Tr) with increase in the number of tasks to be processed per unit time. 5. Concluding Remarks In this paper, we have discussed the performance evaluation method for the software system with processing time limit, considering the software reliability growth process. Assuming that the cumulative number of the tasks arriving at the system up to a given time point follows the homogeneous Poisson process, we have analyzed the distribution of the number
474 of tasks whose processes can be complete w i t h t h e concept of t h e infiniteserver queueing model. From the model, we have derived several software performance measures considering t h e real-time property. We have also illustrated t h e several numerical examples of these measures t o show t h a t these measures are useful for software performance analysis. In particular, it has been meaningful t o correlate t h e real-time property evaluation with t h e software reliability characteristics.
Acknowledgments This work was supported in p a r t by Grants-in-Aid for Young Scientists (B) and Scientific Research (C) of t h e Ministry of Education, Culture, Sports, Science and Technology of J a p a n under Grant Nos. 16710114 and 15510129, respectively. References 1. M. D. Beaudry, Performance-related reliability measures for computing systems, IEEE Trans. Comput. C-27, 540 (1978). 2. J. F. Meyer, On evaluating the performability of degradable computing systems, IEEE Trans. Comput. C-29, 720 (1980). 3. M. Kimura and S. Yamada, Performance evaluation modeling for redundant real-time software systems, Trans. IEICE D-I J78-D-I, 708 (1995) (in Japanese). 4. M. Kimura, M. Yamamoto and S. Yamada, Performance evaluation modeling for fault-tolerant software systems with processing time limit, J. Rel. Eng. Assoc. Japan 20, 422 (1998) (in Japanese). 5. H. Ihara, A review of real time systems, J. Information Processing Society of Japan 35, 12 (1994) (in Japanese). 6. J. K. Muppala, S. P. Woolet and K. S. Trivedi, Real-time-systems performance in the presence of failures, Computer 24, 37 (1991). 7. K. Tokuno and S. Yamada, An imperfect debugging model with two types of hazard rates for software reliability measurement and assessment, Math. Computer Modelling 3 1 , 343 (2000). 8. S. M. Ross, Applied Probability Models with Optimization Applications, Holden-Day, San Francisco (1970). 9. P. B. Moranda, Event-altered rate models for general reliability analysis, IEEE Trans. Reliab. R-28, 376 (1979).
PART VI ACCELERATED TESTING AND FAILURE ANALYSIS
This page is intentionally left blank
PLANNING ACCELERATED TESTS - A REVIEW BONG-JIN YUM Department of Industrial Engineering, Korea Advanced Institute of Science and Technology,373-1 Gusung-dong, Yusong-gu, Taejon 305-701, Korea e-mail) [email protected] SANG-JUN PARK CS Center, Samsung Electronics Co., 416 Meatan 3-dong, Yeongtong-gu, Suwon, Gyeonggi-do 442-742, Korea HEONSANG LIM, MIN KIM Department of Industrial Engineering, Korea Advanced Institute of Science and Technology, 373-1 Gusung-dong, Yusong-gu, Taejon 305-701, Korea It has been a great challenge to reliability engineers to evaluate the reliability of their products within an affordable amount of time and effort. Accelerated tests (ATs) combined with censoring have been effectively used for this purpose. ATs are further classified into accelerated life tests (ALTs) and accelerated degradation tests (ADTs). In the former, failure times of test units are observed while, in the latter, their performance characteristics are measured over time. In this paper, the literature on planning ADTs is reviewed with respect to the test scenario, assumed degradation model, and analysis method employed. Finally, recommendations for future research directions are provided.
1. Introduction As today's products become more reliable, it is getting more difficult to estimate their failure time distributions or reliabilities within an affordable amount of time and effort. Accelerated tests (ATs) have been widely used in industry to overcome such difficulties. An AT employs higher-than-usual levels of stress variables to hasten failures or performance degradation, and thereby, to quickly obtain reliability-related information (e.g., the p-th quantile of the lifetime distribution) at the use condition. ATs are generally classified into accelerated life tests (ALTs) and accelerated degradation tests (ADTs). In an ALT, the failure times and/or censoring times of test units are observed at several accelerated conditions, and then analyzed using a specified ALT model to estimate the quantities of interest at the use condition. On the other hand, in an ADT, a reliability-related 477
478
performance characteristic degraded over time is measured at several accelerated conditions, and then analyzed using the specified ADT model. ADTs have an advantage over ALTs especially when few or no failures are expected due to the high reliability of units. Analyzing degradation data for such a case generally yields more accurate information on the reliability of units (Lu, 1992, Nelson, 1990). Whether an ALT or an ADT is employed, it must be properly designed for statistical efficiency. Nelson (1990), Meeker and Escobar (1993, 1998), and Nelson (2005a, b) provide a review of the literature on designing AT plans with an emphasis on ALT plans. This paper reviews the existing works on designing DT and ADT plans with respect to the test scenario, degradation model, and analysis method employed. 2. Test Scenario A test scenario can be characterized by: 1) whether the test is destructive or nondestructive; 2) whether or not acceleration is considered; 3) the number of stress variables used; and 4) the stress loading method (see Table 1). 2.1. Destructive/Nondestructive Nature of Test Table 1. Classification of Literature with respect to Test Scenario
Authors
Acceleration
Boulanger & Escorbar (1994) Park & Yum (1997) Park & Yum (1999) Yu& Tseng (1999) Yang & Yang (2002) Wu & Chang (2002) Yu (2002) Yu & Chiao (2002a) Yu & Chiao (2002b) Yu & Tseng (2002) Yu (2003a) Yu (2003b) Yu (2003c) Marseguerra et al. (2003) Park & Yum (2004) Tang et al. (2004) Yu & Tseng (2004) Li & Kececioglu (2004) Yu (2006)
Yes Yes Yes No Yes No No No Yes No Yes No No No Yes Yes No Yes Yes
M n ^ltrp55
Variables Single Single Single
Stress Loading Method
Destructive (D) or Nondestructive (ND)
Constant Constant Constant
ND D D ND ND ND ND ND ND ND ND ND ND ND D ND ND ND ND
-
-
Single
Constant
-
-
Single
Constant
-
-
Single
Constant
-
-
Single Single
Step Step
-
-
Single Single
Constant Constant
479 How many times the performance characteristic of a test unit can be measured depends on whether the test is destructive or not. In the case of destructive testing, the characteristic can be measured only once, and therefore, a simple constant rate model is usually used as a degradation model (see Tables 1 and 2). 2.2. Acceleration In an ADT, units are subjected to stress levels higher than the use condition to hasten the degradation of the performance characteristic, and then, the measurement data are analyzed to estimate various quantities of interest (e.g., the /7-th quantile of the lifetime) at the use condition. Several authors (see Table 1) considered a degradation test conducted at the use condition assuming that a noticeable amount of degradation can be observed within the test completion time. 2.3. Number of Stress Variables Stress variables (e.g., temperature, voltage, humidity, mechanical load, vibration, etc.) can be used for the purpose of accelerating degradation. As shown in Table 1, all of the previous works on designing ADT plans deal with a single stress variable. However, as products become more reliable, it becomes more difficult to obtain a sufficient amount of reliability-related information within a reasonable amount of time using only a single stress variable. Multiple-stress ADTs have been used to cope with such difficulties. Nevertheless, little work exists on designing ADT plans using multiple stress variables. In passing, it is worth noting that Escobar and Meeker (1995) developed statistically optimal and practical ALT plans with two stress variables assuming no interaction between them, and Park and Yum (1996) developed optimal ALT plans when two stresses are involved with possible interaction. 2.4. Stress Loading Method The stress loading method can be broadly classified into constant-, step-, and progressive-stress loading. Constant-stress loading is easier to carry out, but may need a longer test time at a low stress level. When step-stress loading is employed in an ADT, the reliability-related information can usually be obtained in a quicker and more economical way compared to constant-stress loading (Iuculano and Zanini, 1986). The cumulative exposure model (Nelson, 1990) is usually employed to analyze the data obtained from a step-stress test.
480
There exist numerous works on designing ALT plans not only under constant-stress loading but also under step- and progressive-stress loading. On the other hand, most of the previous works on designing ADT plans are concerned with the case where constant-stress loading is adopted (see Table 1). Exceptions include Park and Yum (2004) and Tang et al. (2004) in which ADT plans are developed under step-stress loading. Little work exists on designing ADT plans under progressive-stress loading. In an ALT, the progressive-stress loading method is commonly used for fatigue testing and reliability evaluation of some capacitors or integrated circuits, and therefore, the effectiveness of employing this type of stress loading should be investigated for ADTs. 3. Degradation Model Attributes of an ADT model include: 1) time dependence of the performance characteristic; 2) distribution of the performance characteristic; and 3) relation between the unknown parameters and stress variables (see Table 2). 3.1. Simple Constant-Rate Model The simple constant-rate model is one of the most widely used degradation models, and takes different forms depending on whether the test is destructive or not. Table 2. Classification of Literature with respect to Degradation Model. Time Dependence of Performance Characteristic
Distribution of Performance Characteristic
Relation between Unknown Parameter and Stress Variable
Boulanger & Escorbar(1994)
Mixed Effects
(Log) Normal
Park & Yum (1997)
Simple Constant Rate
(Log) Normal
Park & Yum (1999) Yu& Tseng (1999) Yang & Yang (2002) Wu & Chang (2002) Yu (2002) Yu & Chiao (2002a) Yu & Chiao (2002b) Yu & Tseng (2002) Yu (2003a) Yu (2003b) Yu (2003c) Marseguerraetal. (2003)
Simple Constant Rate Mixed Effects Simple Constant Rate Mixed Effects Mixed Effects Mixed Effects Mixed Effects Wiener Process Mixed Effects Wiener Process Wiener Process
(Log) Normal (Log) Normal (Log) Normal Exponential (Log) Normal (Log) Normal (Log) Normal Normal (Log) Normal Normal Normal
Linear Arrhenius, Power, Exponential Arrhenius, Power, Exponential
Mixed Effects
Exponential
Authors
Linear
Linear
-
Linear
-
481
Park & Yum (2004) Tang et al. (2004) Yu & Tseng (2004) Li & Kececioglu (2004) Yu (2006)
Simple Constant Rate Wiener Process Mixed Effects
(Log) Normal Normal Normal
Mixed Effects Mixed Effects
(Log) Normal Normal
Arrhenius, Power, Exponential Linear
Arrhenius Linear
3.1.1. The Case of Destructive Tests In a destructive test, the performance characteristic can be measured only once, and therefore, available models are rather limited. A typical constant-rate model can be described as: w(ylj{tj) = a-b{Si)t
+ eij(t)
where yjj (t) is the performance characteristic of the y'-th unit at time t under stress level Si, «(•) is a suitable transformation, a is an unknown constant, b(St) is the degradation rate under the stress level S,, and eu(t) is an error term. Most of the previous works assume a normal distribution for the error term. The degradation rate b(S,) typically has the following single stress dependencies (Nelson, 1990). - Arrhenius model: 6(5;) = 6, exp(-6 2 /S,) when S is absolute temperature - Power model: b(Si) = fyS^ when 5 is voltage - Exponential model: 6(5",) = A, e x p ^ S , ) when S is a weathering variable where bt and b2 are unknown constants. 3.1.2. The Case of Nondestructive Tests Little research has been done to date on designing nondestructive ADT plans using a simple constant-rate model except that Yang and Yang (2002) developed a plan assuming that the performance characteristic is a linear function of the stress level and time. Their degradation model is as follows. a>(yu(.t)) = aJ+bJt +
cJSl+ev(t)
2 £lJ(t)~N(0,(T c)
where aj, b} and c, are unknown constants depending on the y'-th unit, and ev{t) represents the measurement error.
482
3.2. Nonlinear Mixed-Effect Model The mixed-effect model is frequently employed in nondestructive tests (e.g., fatigue crack growth studies for metallic items), and can be described as follows.
a> U(')) = /('; «,/?,) + *,(0 *,M~;v(o,cr,2) where / ( • ) is the true value of the performance characteristic of they'-th unit under stress level S,, a is a vector of fixed-effect parameters, J3r is a vector of random-effect parameters for the y'-th unit under stress level St and £„ (t) represents the measurement error. The model in which only random-effect parameters are present is called the random effects model. In most cases, a and Ptj are univariate, and it is generally assumed that py follows a (log) normal distribution with the location parameter fip and the scale parameter a2 , although Yu and Tseng (2004) and Yu (2006) assume a reciprocal Weibull distribution for Ptj. In addition, the i-th location parameter (/Up ) is usually assumed as a linear function of possibly transformed stress variable as follows. ttfii=A
+ BSl
where A and B are unknown constants. 3.3. Stochastic Process Model In this model, the (transformed) performance characteristic is described as a stochastic process. For instance, Yu and Cheng (2002), Yu (2003b, c), and Tang, et al. (2004) considered a Wiener process, {^(^(O) U ^ Of > with drift A a n d diffusion constant S2. Let D be the failure level and 0 = a>(D). Then, it is well known that the lifetime at the z'-th condition follows an Inverse Gaussian distribution with parameters //,(=#/&,) and X{=62181). 4. Analysis Method 4.1. Estimation Method Test plans must be designed to provide accurate estimators of model parameters, and therefore, the adopted estimation method plays an important role. Either the maximum likelihood (MLE) or least squares estimation (LSE) method is used in most of the previous works. In some cases, however, both methods are employed in
483
different stages of estimation. For instance, Boulanger and Escobar (1994) adopted weighted LSE to design a stress plan and MLE to develop a time plan. 4.2. Optimization Criterion Table 3. Classification of Literature with respect to Analysis Method.
Boulanger & Escorbar(1994)
Estimation Methods WLSE /MLE
Optimization Criterion Variance / Determinant
Park & Yum (1997)
MLE
Variance
Park & Yum (1999)
MLE
Variance
Yu& Tseng (1999)
LSE
Variance
Yang & Yang (2002)
MLE
Variance
Wu & Chang (2002)
LSE
Variance
Yu (2002)
LSE
Cost
Yu & Chiao (2002a)
LSE
Cost
Yu & Chiao (2002b)
LSE
Confidence Interval
Yu & Tseng (2002)
MLE
Cost
Yu (2003a)
LSE
MSE
Yu (2003b)
MLE
Cost
Yu (2003c) Marseguerra et al. (2003)
MLE
Cost
LSE
Variance, Cost
Park & Yum (2004) Tang et al. (2004)
MLE MLE
Variance Cost
Yu & Tseng (2004)
LSE
MSE
Li & Kececioglu (2004)
LSE/MLE
MSE
Yu (2006)
LSE
MSE
Authors
Decision Variables Stress level, Allocation ratio, Measurement time, Sample size Stress level, Allocation ratio, Measurement time, Sample size Stress level, Allocation ratio, Measurement time, Sample size Inspection frequency, Termination time, Sample size Stress level, Allocation ratio, Critical value Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Inspection frequency, Termination time, Sample size Stress level, Allocation ratio, Stress changing time Sample size, Inspection number Inspection frequency, Termination time, Sample size Inspection frequency, Measurement time, Termination time, Sample size, Stress level, Allocation ratio Inspection frequency, Termination time, Sample size
Various optimization criteria have been proposed to design ADT plans (see Table 3). Most frequently used criteria include the (asymptotic) variance, mean
484
squared error (MSE), or confidence interval of the estimate of thep-th quantile or mean of the lifetime distribution at the use condition. The total experimental cost is the next frequently used criterion. In addition to these criteria, Boulanger and Escobar (1994) considered the determinant of the Fisher information matrix of the MLEs of unknown parameters to evaluate a measurement time plan, and Marseguerra et al. (2003) formulated a multi-objective problem, which is solved using the genetic algorithm. 4.3. Decision Variables In designing an ADT, decision variables (e.g., stress level, sample size, proportion of test items allocated to each stress level, inspection frequency, measurement time, termination time, etc.) are determined such that the above criteria are optimized. On the other hand, it is difficult in most cases to optimally determine all of the inter-related decision variables at the same time, and therefore, they are usually determined separately to obtain the stress and time plan. For instance, Boulanger and Escobar (1994) determined the stress levels and proportion of test items allocated to each stress level in the stress plan, while the measurement times are determined in the time plan. It is also worth noting that Yu and Tseng (1999), Wu and Chang (2002), Yu and Chiao (2002b), Yu (2003a), Yu and Tseng (2004), Li and Kececioglu (2004) and Yu (2006) determined inspection frequency, termination times, and sample size under the constraint that the total experimental cost does not exceed a predetermined budget. 5. Discussions This paper reviews the existing works on designing DT or ADT plans in terms of the test scenario, degradation model, and analysis method employed. We suggest that future research be directed towards the following: 1) development of ADT plans with two or more stress variables to achieve further economy of testing; 2) development of ADT plans under other stress loading methods than constant loading; 3) comparison of the relative performance of ADT plans under different stress loading methods; 4) sensitivity analysis of the plans with respect not only to the guesstimates of the unknown parameters, but also to the assumed models; 5) assessing small-sample properties of a plan and its comparison with the asymptotic results; and 6) development of computationally efficient techniques for determining optimal plans, especially when constraints are present.
485
References 1. 2. 3. 4.
G. Iuculano and A. Zanini, IEEE Transactions on Reliability 35, 409 (1986) W. Nelson, John Wiley & Sons, New York, (1990) C.J. Lu, Ph.D. thesis, Iowa State University, Dept. of Statistics, (1992) W. Q. Meeker and L.A. Escobar, International Statistical Review 61, 147 (1993) 5. M. Boulanger and L.A. Escobar, Technometrics 36, 260 (1994) 6. L.A. Escobar and W. Q. Meeker, Technometrics 37,411 (1995) 7. J. W. Park and B. J. Yum, Naval Research Logistics 43, 863 (1996) 8. J.I. Park and B.J. Yum, Engineering Optimization 28, 199 (1997) 9. W. Q. Meeker and L.A. Escobar, John Wiley & Sons, New York, (1998) 10. J.I. Park and B.J. Yum, Engineering Optimization 31, 301 (1999) 11. H.F. Yu and S.T. Tseng, Naval Research Logistics 46, 689 (1999) 12. S.J. Wu and C.T. Chang, Reliability Engineering and System Safety 76, 109 (2002) 13. G. Yang and K. Yang, IEEE Transactions on Reliability 51, 463 (2002) 14. H.F. Yu, Engineering Optimization 34, 579 (2002) 15. H.F. Yu and C.H. Chiao, IEEE Transactions on Reliability 51, 427 (2002a) 16. H.F. Yu and C.H. Chiao, Journal of the Chinese Institute of Industrial Engineers 19, 23 (2002b) 17. H.F. Yu and S.T. Tseng, Naval Research Logistics 49, 514 (2002) 18. M. Marseguerra, E. Zio and M. Cipollone, Reliability Engineering and System Safety 79, 87 (2003) 19. H.F. Yu, Quality and Reliability Engineering International 19, 197 (2003a) 20. H.F. Yu, Engineering Optimization 35, 313 (2003b) 21. H.F. Yu, The International Journal of Quality & Reliability management 20, 1084 (2003c) 22. Q. Li and D.B. Kececioglu, InternationalJournal of Materials and Product Technology 20, 73 (2004) 23. S.J. Park and B.J. Yum, Quality Technology & Quantitative Management 1, 105 (2004) 24. LC. Tang, GY. Yang and M. Xie, The Annual Reliability and Maintainability Symposium (2004), 287 (2004) 25. H.F. Yu and S.T. Tseng, Quality Technology & Quantitative Management 1, 47 (2004) 26. W. B. Nelson, IEEE Transactions on Reliability 54, 194 (2005a) 27. W. B. Nelson, IEEE Transactions on Reliability 54, 370 (2005b) 28. H.F. Yu, Journal of Statistical Planning and Inference, 136, 282 (2006)
FAILURE ANALYSES AND ACCELERATED DEGRADATION TEST FOR AC FAN MOTOR JOON-SIK JUNG Quality and Reliability Laboratory, Daewoo Electronics Corp. #412-2, Cheongcheon 2-Dong, Pupyong-Ku, Incheon, 403-858, Korea E-mail: jsjung2@dwe. co. kr. JIN-WOO KIM, JAE-KOOK LEE, HEE-JIN LEE, JI-SEOB MOON, JAE-CHUL SHIN Quality and Reliability Laboratory, Daewoo Electronics Corp. #412-2, Cheongcheon 2-Dong, Pupyong-Ku, Incheon, 403-858, Korea SANG-WON CHA, MYUNG-SOO KIM Reliability Innovation Center, The University ofSuwon San 2-2, Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743,
Korea
This paper presents failure analyses and an accelerated degradation test of AC fan motors for refrigerators. Several analyses such as destructive physical analysis, blade fracture analysis, FEM, and fan oscillation are made to identify root causes of failures for the field samples. Next, an accelerated degradation test is planned to determine the significant factors and to predict the lifetime for the predominant failure mode, locked-rotor by dinger between shaft and bearing. The amount of oil, temperature, voltage, length of shaft, and unbalance are considered as accelerating factors, and 16 test conditions are selected by design of experiment using orthogonal array. Data analysis shows that RPM is affected by voltage only, but is not degraded significantly in time. It is also shown that voltage and the amount of oil affects the change of oil amount, which decreases in time. The degradation characteristic of AC fan motor will be characterized and its accelerated degradation model will be made after more testing.
1. Introduction AC fan motor (shaded pole motor) is widely used in home appliances for making air-cooling and circulation [10]. There has been significant improvement for the quality and reliability of AC fan motor over the last decade. However, electronic industry goes through AC fan motor reliability problems. If its failure mechanisms are better understood, the quality and reliability of AC fan motor can hopefully be established under more severe conditions such as low or high temperature. Therefore, it is important to investigate AC fan motor failure mechanisms and improve its reliability. 486
487
Some authors [1, 2, 4, 6, 11] have studied the root causes and lifetime characteristics of AC fan motors. Little work has, however, been made for the accelerated degradation test (ADT). Moreover, there is no industry standard for life testing. As such, it is necessary to establish appropriate ADT to accelerate failure at their actual operating conditions. This paper is concerned with the ADT of the AC fan motor for refrigerators. Failure modes and failure mechanisms, and their root causes are investigated by failure analyses. An ADT is planned to investigate the potential factors affecting reliability and to predict the lifetime. The characteristics of AC fan motor and the accelerated degradation model were studied by analyzing test data. 2. Failure Analyses of AC Fan Motor 2.1. Failure Modes and Mechanisms Generally, there are five typical modes of AC fan motor failures: i) excessive vibration or noise, ii) hitting of the blade, iii) reduction in rotational speed, iv) locked rotor, and v) failure to start. In order to investigate these failure modes, 63 failed samples are collected from field. Table 1 summarizes the results, and shows that the locked rotor due to lack of lubricant and the noise because of eccentricity are dominant failure modes. The locked rotor depends on oil amount and oxidation of oil in selflubricating bearing. The oil in felt, which provides supplementary oil to expand lifetime, is gradually reduced in time and finally dried up. This is caused by cyclic loads due to compressor on/off, or by breakdown of lubricant film due to repeating unbalanced operation of the motor. The rate and degree of oxidation of oil depends on moisture. Lubricating oil could be changed into sludge under high humidity condition, which might cause the reduction in rotational speed and locked rotor at the worst case. On the other hand, the high sound pressure level noise can be caused by the eccentricity of fan due to the blade fracture and corrosive wear of bearing. Failure modes No. of sample Percentage
Table 1. Failure modes of the failed samples Locked-rotor & Reduction of RPM Noise 41 22 35% 65%
Sum 63 100%
488 2.2. Failure Analysis 2.2.1. Failure Analysis of Locked-Rotor ofAC Fan Motor Destructive physical analysis (DPA) was performed for the locked rotor. Consequently, the loss of oil in felt, foreign particles between bearing and shaft, and the oxidation of oil (or the corrosion of bearing) were found. By SEM & EDS analyses for the foreign particles, copper (Cu), tin (Sn), oxygen (O), and carbon (C) were detected. Since Cu and Sn are elements of bearing, C is that of oil, and O is attributed to the oxidation of oil or the corrosion of bearing, the locked rotor depends not only on a loss of oil but strongly on the moisture penetration. Therefore, we assumed that moisture condensed in the area between bearing and shaft might produce oil sludge due to oxidation. The local Joule heating at the bearing due to debris might cause the locked rotor and make the RPM be decreased.
(''.' ','V (L! (.J,1 'S! Figure 1. Optical photographs of a failed sample showing (a) assembly construction, (b) dried up felt, (c) wear products, (d) locked rotor, and (e) corroded bearing
(a) (b) Figure 2. SEM&EDS showing (a) oil sludge, (b) its composition
2.2.2. Blade fracture analysis ABS fan fractures are up to 23 % of all failed samples, which are located at the edge of blade or center part fixed by clip tightly. SEM examination of the fracture surface of blade center part shows two distinct types of fracture surfaces (Fig. 3(a) and 3(b)). The A region exhibits a relatively smooth fracture morphology characteristic of environmental stress cracking (ESC), while the B region exhibits a rougher fracture characteristic of mechanical fracture. It was deduced that ESC was caused by oil used for lubricating and absorbed moisture under actual using condition [3, 5, 7, 8].
489 As a result of gel permeation chromatography (GPC), there was no discrepancy in molecular weight and distribution of AS (acrylonitrile-styrene copolymer) part between unused one and 10 years-used sample (Table. 2). However, the result of Fourier transform infrared spectroscopy (FT-IR) analysis of ABS fractured site is as follows; it shows the reduction of the intensity at 966cm"1 due to degradation of CH2 rocking modes of polybutadiene, and an increase at 1760cm"1 due to carbonyl bonds (Fig. 3(c)). This loss of unsaturation seemed to produce a brittle surface layer, which can easily initiate cracks on the blade edge into the bulk material. Temperature-Humidity related environmental tests were run on the lubrication oil used for bearing. The impressive resulted shows the total acid number of the oil tested in high humidity condition increased highly after 1,000 test hours. This allows it to accelerate the formation of oil sludge, hindering bearing running clean and efficient and decreasing life of oil (Table 3).
Figure 3. Analysis of blade fracture, (a) Optical photographs (b) SEM image of surface morphology (c) FT-IR spectrum of the ABS blade (the upper: unused, the lower: fractured) Table 2. Molecular weights and poly dispersive index (PI) by GPC using mixture of MAc+0.05MLiBr as a solvent and calibrated with poly methyl methacrylate. Mn Mw No. Mn/MW(PI) 62379 142405 Unused blade (ABS) 2.3 10 years used blade 58448 142678 2.4 Table 3. Performance properties of environment tested oil (Polyester based synthetic lubricant), L o w temp.(-40 °C)
LoWtemp.(0
V)
High temp.( 150 t )
Before
High humidity (60 t , 9 5 % R H )
2O0h.
SOOh.
lOOOh.
200h.
500h.
lOOOh.
200h.
500h.
lOOOh.
200h.
5<Xlh.
lOOOh.
Lubricant's Viscosity
40TJ.CSI
33 29
31 31
30.62
31 14
31.1
31.08
30.37
31.1
31.67
31.99
30.65
30.07
28.16
( A S T M D445)
100 C , est
6 25
6.14
6.25
6.23
6.16
6.31
6.14
6.19
6.34
6.32
6.23
5 91
5 56
mgKOH/g
0.04
0.05
0.05
(1.06
0.05
0.06
0.08
0.05
0.04
0.51
0 05
0.19
51.57
mm
0.89
0,87
0.85
o.y
0.86
0.9
0.9
0.86
0.92
0.9
0.87
0.83
0.9
TAN ( A S T M D-664 974> 4-ballwoar ( A S T M 1)4172)
490
2.2.3. FEM A nalysis of fan The numerical analysis of ABS blades was performed with finite element method (FEM) using software 'ALGOR' to investigate the structural weak points and principal stress directions. As Figure 4(b) shows, the maximum principal stress is concentrated at the blade edge and the magnitude is about 17 MPa. While catastrophic failure will not occur at the blade edge, repeated vibration during rotation can cause fatigue failure. Figure 4(c) indicates the directions of principal stress, pointing outward from the center. This is in accordance with the failure analysis result; the crack grows perpendicular to the direction of principal stress. Density 1.1506g/cm3
Table 4. Material properties of ABS fan Modulus of Elasticity Poisson's ratio 2496 MPa 0.31
Tensile strength 32-45 MPa
(m
^•tgr
Y(a) (b) * (c) Figure 4. FEM analysis results; (a) mesh information, (b) maximum principal stress, and (c) principal stress directions. (Total number of nodes; 10683, elements; 19108, boundary condition, pressure of 0.1 MPa to the front of the fan and blade center)
2.2.4. Visualization of Fan Oscillation Using ESPI Synchronously with Laser Doppler Vibrometer (LDV) measurements of the mechanical surface behavior, the experiments with holographic interferometry were performed to visualize the displacement distribution and modes of the oscillation of the blade. By increasing the exciting frequency, the phase amplitude was clearly displayed until the third resonant frequency, 246Hz. Fig. 5(a) and 5(b) show fringe patterns of the Electronic Speckle Pattern Interferometer (ESPI) with Laser Trigger System (LTS 200) for the first and second modes, which have gradient of displacement in four blades respectively. In these fringe patterns, the mass of each blade was unbalanced, which says that the eccentric in rotation is caused not only because of an unbalanced of shaft, but also different mass or stiffness of each blade. The natural frequencies for the first, second and third natural frequencies compared to FEM predicted mode are shown in Table 5. The reason why
491 differences between the experimental and the FEM result is that natural frequency was reduced due to increased mass such as shaft or center part fixed by clip in experimentally data. Moreover, generally exciting frequencies are much higher than the resonant frequencies of the plane itself [9].
(a) (b) Figure 5. Blade vibrating with a frequency of (a) 133Hz / amplitude of 5.34 urn, (b) 210Hz / amplitude of 3.78 pm and pulse separation of 100 us Table 5. The natural frequencies for the first, second and third natural frequency 3 rd Natural frequency 2 nli Natural frequency 1" Natural frequency Mode 133Hz 210Hz 246Hz ESPI 168Hz 244Hz FEM 270Hz
3. Accelerated Degradation Test and Lifetime Model There are many factors affecting the performance and lifetime of AC fan motor. Among them, shaft length, oil amount, unbalance, voltage, and temperature are selected as critical factors affecting the locked-rotor. Table 6 shows the considered levels of factors. 16 testing conditions are determined by L16(215) orthogonal array, 6 samples are allocated at each condition, and their RPM and change of oil amount are measured at prescribed times. Level
1
Shaft length 70 mm
2
90 mm
Table 6. Levels of factors for A D T Oil amount Unbalance Voltage
Temperature
1.2 g
0
242 V
60 °C
15 g
0.05
264 V
70 °C
4. Analysis of Test Data ANOVA is performed for the RPM and change of oil amount data. Table 7 and 8, and Fig. 6 show that: i) RPM is affected by voltage and temperature, and is not affected by other factors, ii) The change of oil amount is affected by voltage and the amount of oil, and there is decreasing trend in time.
492
Table 8 also shows that the change of oil amount is significantly affected by oil amounts, voltage, and measurement time. So, we have made regression analysis, and the estimated regression equation is as follows: Y = -0.048 - 0.303X, + 0.00136X, - 0.00033X3, where Y is the change of oil amount, X1( X2 and X3 are oil amount, voltage and time, respectively. This equation can be used to determine the failure criterion of fan motor due to change of oil amount.
Source
DF
Table 7. ANOVA table for RPM SS MS
F
p-value
Temperature
385954
385954
2.35
0 125
Shaft length
47150
47150
0.29
0.592
Oil amount
33719
33719
0.21
0.650 0.709
Unbalance Voltage
22806
22806
0.14
4788127
4788127
29.19
0.000
0.37
0.994
Measurement Time
19
1147164
60377
Error
983
161234624
164023
Total
1007 Table 8. ANOVA table for change of oil amount F DF SS MS
Source
p-value
Temperature
0.0000
0.0000
0.00
1.000
Length
0.0516
0.0516
0.81
0.369
Oil amount
2.0814
2.0814
32.60
0.000
Unbalance
0.0059
0.0059
0.09
0.762
Voltage
0.2224
0.2224
3.48
0.062
18.93
0.000
Measurement Time
19
22.9686
1.2089
Error
983 1007
62.7686
0.0639
Total
Vi3S*|S.
T*f^»p»!ij« StwSatejt?!
kte£i.S8«*!xtOSS.i}«
1
Gtittmi'l
i&wttipwt
¥o8#g* Uvawvm^sai*
-eaj -f ....„.'
-§4* <
-
i
!
Ir,,
• V ~~3£" Figure 6. Main effects plot for RPM and change of oil amount i_
#
»+
*y~
;__V
«*W
493
5. Conclusion We have presented failure analyses and an ADT of AC fan motors for refrigerators to accelerate the failure due to locked-rotor by dinger between shaft and bearing. Through failure analyses, we have found the root causes of the locked-rotor and the blade fracture. By analyzing ADT data, we have determined significant factors affecting the performance of AC fan motors. However, RPM and the amount of oil have not degraded sufficiently for 1,500 hours testing. We expect that the degradation characteristic of AC fan motor will be characterized and its accelerated degradation the model will be made after more testing. Acknowledgments This work was supported by the Components and Materials Technology Development Program of Ministry of Commerce, Industry and Energy in 2005/2006, and is appreciative hereupon. References 1. A. H. Bonnett, Root Cause AC Motor Failure Analysis with a Focus on Shaft Failures, IEEE Trans, on Industry Application 36, 1435 (2000). 2. A. H. Bonnett, Cause and analysis of anti-friction bearing failures in AC induction motors, Conference Record of 1993 Pulp and Paper Industry Technical Conference, 36 (1993). 3. C. D. Bopp and O. Sisman, Radiation Stability of Plastics and Elastomers, Oak Ridge National Laboratory, Report 133 (1953). 4. E. L. Brancato, Estimation of Lifetime Expectancies of Motors, IEEE Electr. Insulation Magazine 8, 5 (1992). 5. M. Ezrin, Plastic Failure Guide, Cause and Prevention, Hanser (1996). 6. Vaag Thorsen O., Dalva M., A Survey of Faults on Induction Motors in Offshore Oil Industry, Petrochemical Industry, Gas Terminals, and Oil Refineries, IEEE Trans, on Industry Application 31, 1186 (1994). 7. J. Scheris, Compositional and Failure Analysis of Polymer, John Wiley & Sons (2000). 8. O. Sisman and C. D. Bopp, Physical Properties of Irradiated Plastics, Oak Ridge National Laboratory, Report 1373 (1953). 9. W. Steinchen, L.X. Yang and G. Kupfer, Digital shearography for nondestructive testing and vibration analysis, Society for Experimetnal Stress Analysis 21(4) 20 (1997) 10. C. G. Veinott and J. E. Martin, Fractional and Sub fractional Horsepower Electric Motors, 4th ed., McGraw-Hill (1986). 11. A. Wysocki and B. Geest, Bearing Failure: Causes and Cures, EC & M, 52 (1997).
AN ANALYSIS OF ACCELERATED PERFORMANCE DEGRADATION TEST ASSUMING THE ARRHENIUS STRESSRELATIONSHIP* TAE HYOUNG KANG, SANG WOOK CHUNG* Department of Industrial Engineering, Chonnam National University 300 Yongbong-dong, Buk-gu, Gwangju 500-757, Korea WON YOUNG YUN Department of Industrial Engineering, Pusan National 30 Jangjeon-dong, Geumjeong-gu, Busan 609-735,
University Korea
An analytical model is developed for accelerated performance degradation tests. The performance degradations at a specified exposure time of products are assumed to follow a normal population. We assume that the relationship between the location parameter of normal population and the exposure time is a linear function of the exposure time, that is, /j(t) = a + bt that the slope coefficient of the linear relationship has an Arrhenius dependence on temperature and that the scale parameter of the normal population is constant and independent of temperature or exposure time. The method of maximum likelihood estimation is used to estimate the parameters involved. A closed form expression of the likelihood function for the accelerated performance degradation data is derived and the Fisher information matrix is also derived for calculating the asymptotic variance of the lOOpth percentile of the lifetime distribution at use temperature.
1. Introduction Maintaining high reliability for the entire system requires that the individual system components have extremely high reliability, even after long periods of time. With short product development times, reliability tests, like life test, must be conducted with severe time constraints. Frequently, some life tests result in few or no failures. Thus it is difficult to assess reliability with traditional life tests that record only time to failure. One way of obtaining additional information about reliability of units is to accelerate the life by testing at higher levels of stress. This is called an accelerated life test (ALT). Nelson [17, 18] provided a current bibliography of 159 references on statistical plans for accelerated tests.
" This study was financially supported by Chonnam National University in the program, 2006.
494
495 Sometimes, for some highly reliable products, the difficulty of evaluating the reliability is encountered even in ALT experiments, because no failures are likely to occur in a reasonable amount of time. In such cases, some product's physical performance degradation measures, taken over time, contain information about product reliability and eventually lead to failures. Then one can define component failure in terms of a specified level of degradation and estimate the time-to-failure distribution from the degradation measures. Usually, in order to facilitate observing the performance degradation of shorten the degradation experiment under a normal use condition, it is practical to collect the degradation data at higher levels of stress and then carry out extrapolation in stress to estimate the reliability under normal use conditions. Such an experiment is called an accelerated degradation test (ADT). Information obtained from monitoring specimen degradation is much richer than failure times [16]. Lu, Meeker and Escobar [10] compared degradation analysis and traditional failure-time analysis. Meeker and Hamada [15] emphasized the ADT as a tool for promptly developing and evaluating high-reliability products. Nelson [19], chapter 11 and Meeker and Escobar [12] surveyed pertinent literature on the subject. Meeker and Escobar [13] gave an updated literature survey. Many papers have been written on reliability degradation modeling research & applications [1, 2, 11]. There are two types of degradation modeling being widely used [14, 20]; one is the degradation path curve approach of performance degradation versus time based on a known physics mechanism [5, 19], the other the graphical approach based on statistical probability distribution function [2, 4]. In this work, performance degradation model established by taken simultaneously on the degradation path curve and the graphical approach. Lu and Meeker [9] proposed a least-squares-based two-stage method for the inference on lifetime distributions using degradation data. Wu and Shao [23] considered direct ordinary, and weighted least squares procedures for degradation analysis. Robinson and Crowder [21] explored a Bayesian approach. Meeker and Escobar [13] proposed a maximum likelihood procedure. For various MLE-based inference procedures, the reader is referred to Lawless [7]. In this paper, the method of maximum likelihood estimation is used to estimate the parameters involved. A closed form expression of the likelihood function for the accelerated performance degradation data is derived and the Fisher information matrix is also derived for calculating the asymptotic variance of the 100/rth percentile of the lifetime distribution at use temperature. The rest of this paper is organized as follows. Section 2 describes useful models for degradation and derivation of the likelihood function. Section 3 presents the method to estimate parameters and the Fisher information matrix. Finally, Section 4 addresses the concluding remarks.
496 2. Models for Degradation and Derivation of the Likelihood Function 2.1. Model Assumptions The assumptions of the degradation model used in this article are as follows: 1.
2. 3.
For any stress level S (absolute temperature) and exposure time / , the distribution of performance degradation data, U(t,S) , of test units are assumed to be independent and to follow lognormal. Thus, the distribution of log performance Y = In U is normal. The scale parameter a of the log performance Y is a constant; that is, <x does not depend on exposure time and stress. The relationship between the location parameter /u and exposure time / and stress S is fi(t,S) = a + j3texp(-y/S).
This is called Arrhenius relationship. The parameters, a,ft,y, and a , are characteristic of the product and test method. In addition to the model assumptions, we have the following assumption about how the test is conducted: • • •
Performance degradation is measured on n test units. For the / th unit, the measurements at times t1,t2,---,tm are yn,yi2,---, yim, respectively. The degradation path is strictly increasing; 0 < yn < yi2 <•••< yim < +oo . Because of the natural ordering of performance degradation; 0<j>;l
2.2. Derivation of the Likelihood Function We present the proposed ADT models in which the performance degradation Y follows a normal distribution with a cumulative distribution function (cdf) Pr(7 < y) = ®{{y - v(t, S))/
is g(yn.ya.-.ylm)
=^ ^ — „
f
\f(yn)
i
y
a
)
\f{yii)dya yn
« - * ^ ^ \f(ym,-\)dyim-\ yi„,-\
0)
497
due to the ordering 0 < yn
+ 30
[/(rn^n
+00
|/0'i2)«!vi2
0
\f(ym„-])dy„m-\
y\ 1
.Pnni-l
Then the log likelihood function, the logarithm of / , is written as n
m
n
m-\
L = ln/ = ^ ^ l n / ( ^ ) - ^ ^ l n ( l - < I > ( Z y ) ) i=l 7=1
(2)
i=l 7=1
Note that the log likelihood I is a function of the yIJy s, i, and the parameters a,p,y,
and
3. The Parameter Estimation and Fisher Information Matrix 3.1. The Parameter Estimation Having a sample of performance degradation data yij,Shtj, MLE is used to estimate the model parameters. From Eq. (2), the model parameters can be estimated by solving the ML equations of az,/aa = 0,az,/5/? = 0,aL/5f = 0, and 8Llda = 0 , simultaneously. Numerical algorithms, such as Powell's [22] directional grid search method, Newton-Raphson's procedure [8], EM algorithms [6], etc., could also use to get the MLE. For comparing product designs with 100/rth percentile of the failure-time distribution, we define the failure-time T as (log) time that the degradation reaches a specified level yf . Then the distribution function of T based on MLE at use condition Sn is y/-a-ptexp(-y/S0) Fr(t) = Pr[T
a
=o
t-M,
where n,-a
7° exp(y/sQ)
a,=(<jexp(y/S0j)/f) and O(-) is the cdf of the standard normal distribution. 3.2. The Fisher Information Matrix The Fisher information matrix is obtained by taking expectations of the negative of the second partial derivatives of the log likelihood L with respect to the model parameters. The first order partial derivatives of the log likelihood function are
498
da
ilZZM-ZZl^bJf- fA ZZ(v-)-ZZ ;=i 7=1
dL
1
1=1 y=i V
^ ^ ( Z ,
dp~ cr
X ( /
i=l 7=1 ;=l V i=l
i=l
*
fitjexpi-y/S,)
ZZh
V'J
exp(-r/S,))-^^U-^-x^exper/S,)
/=t 7=1 ;=1 /=1
dL _ 1 cr
v
,=1 y=| V
,i=l 7=1
v-VI
V( Z y)
*(Z(/)
• /=1 ZZ T > j=l
S,
7=1 ^
W'
PtjWA-YlS,)
-tyZ,j)
Si
These four expressions are called the likelihood equations. The expectations of the negative of the second order partial derivatives are obtained by n
d2L
m
n
n
m-\
m-\
«z<jr
^ZZ»*ZZff-^*-ZZ
2
da
1=1 7=1
I
d2L
a
in
n
itt—\
^.
2
dp
1=1 7=1
^'7
2
{1-<XZ9))
1=1 7=1
-zz
F*(^jf
1=1 7=1
»=1 > 1
^x^-expC-r/S,))2
rf^x^exp^/S,))2
1 J •»/(/*, expfr/S,))1
a 2 /.
V.
=1 7=1
2
-- -fz,#z,) ^ /=1 ,/=l 1-WJ n m—\
4(ZU)2
ZZ
\-
1=1 7=1
^ ) "t
'=1 7=1
(^expC-z/S;.))2
3
(l"^))2j
% X
^exp(-r/S,.) rfy„ x cr x —i y,J s2
ZZK-'^^-'ZZf^j*
do-2
, N
7=1
'•=' 7=1
L*ju\ (i-ocz,))2 dad/} w
^{llT^^'^YTi^^^jAh-M-y'Si)) ,=i 7=1 w-1
^
j / T
\3
1=1 7=1
499
S;
dady
|-
+
yy [*
frZy?
Jtjexpj-r/Sj)]
d
/=1 7=1
/=1 y=l
-izn^-iziy^u. n
ffl-1
,-, r? 1,/rj
s2
;=l 7=1
-'
"
w-1
/=l 7=1
Z
5L 5/?5/
Zij^iZ^dyij x
-o
o- j / v s3
\
(yexpC-r/S,)
ptjex.tf.-y IS,)
* lj extf-y /St)
. ' = 1 >=' 2
^rZ,
2
^S) x <»^- , "" r ;'"'^«p(-ws,)-i2j;it^^
III;
=i ;=i
+
^expC-y/S,)
(=1 y=l
izr^-^-^^-szf^^-o—^ 2^/v
\2
where > = d<$>ldy. Thus the Fisher information matrix for the proposed model is obtained as follows E\-d2L/da2] F=
E\-d2L/dadp\ E\-d2Lldp2\
E-d2Lldady. E-B2Lldpdyi \-d2L/dy2]
-d2L/dada. d2L/dpda. -d2L/d}0a E[-d2L/8a2]
symetric For a particular test plan, the ML estimate of the asymptotic variance-covariance matrix is the inverse of F . That is, Var(d)
Cov(d,P) Cov(a,y) Cov(a,a) Var(P) Cov(J3,y) Cov(P,d) = F~'. Var(y) Cov(y,a) symmetric Var(a)
(3)
500
where F is the Fisher information matrix. Hence, the ML estimate y of the lOOpth percentile is yp=a + fltjexp(-r/si) + zl,a, here zp is the standard normal 1 OOpth percentile. The partial derivatives are 3?p
3yp
, ....^
—— = 1,—— ='/exp(-y/S,-),—— = da dp ' V " dy
0t j exp(-y / S,) dy Si
,—— = z„ . da "
Then, the corresponding asymptotic variance is v a r [ ^ , ] = [Itjexpi-r/Si),
,z p ]S[l,f / exp(-?'/S , / ),
Sj
J
-
,:p] S:
here £ is the estimate Eq. (3) of the asymptotic variance-covariance matrix. 4. Concluding remarks An analytical model described in this paper is developed for accelerated performance degradation tests. The performance degradation model established by taken simultaneously on the degradation path curve and the graphical approach (Arrhenius relationship and normal population). We have provided a closed form expression of the likelihood function for the proposed accelerated performance degradation model and the Fisher information matrix is also derived for calculating the asymptotic variance of the 100/rth percentile of the lifetime distribution at use temperature. References 1. S. J. Bae and P. H. Kvam, A Nonlinear Random-Coefficients Model for Degradation Testing, Technometrics 46, 460-469 (2004). 2. M. Boulanger and L. A. Escobar, Experimental design for a class of accelerated degradation tests, Technometrics 36, 260-272 (1994). 3. M. B. Carey and R. H. Koenig, Reliability assessment based on accelerated degradation: a case study, IEEE Transactions on Reliability 40, 499-506 (1991). 4. W. Huang and D. L. Dietrich, An Alternative Degradation Reliability Modeling Approach Using Maximum Likelihood Estimation, IEEE Transactions on Reliability 54(2), 310-317 (2005). 5. D. Kececioglu and J. Jack, The Arrhenius, Eyring, Inverse Power Law and Combination Models in Accelerated Life Testing, Reliability Engineering 8, 1-9(1984). 6. M. Laird, N. Lange and D. Stram, Maximum Likelihood Computations with Repeated Measures: Application of the EM Algorithm, J. Amer. Statist. Assoc. 82, 397, 97-105 (1987).
501 7. J. F. Lawless, Statistical models and methods for lifetime data, second ed., NY: John Wiley & Sons (2003). 8. M. J. Lindstrom and D. M. Bates, Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated Measures Data, Journal of the American Statistical Association 83, 1014-1022 (1988). 9. J. C. Lu and W. Q. Meeker, Using degradation measures to estimate a timeto-failure distribution, Technometrics 35(2), 161-174 (1993). 10. C. J. Lu, W. Q. Meeker and L. A. Escobar, A comparison of degradation and failure-time analysis methods of estimating a time-to-failure distribution, Statistica Sinica 6, 531-546 (1996). 11. J. C. Lu, J. H. Park and Q. Yang, Statistical Inference of a Time-to-Failure Distribution Derived From Linear Degradation Data, Technometrics 39(4), 391-400 (1997). 12. W. Q. Meeker and L. A. Escobar, A Review of Recent Research and Current Issues in Accelerated Testing, International Statistical Review, 61, 174-168 (1993). 13. W. Q. Meeker and L. A. Escobar, Statistical Methods for Reliability Data, NY: John Wiley & Sons (1998). 14. W. Q. Meeker, L. A. Escobar and J. C. Lu, Accelerated degradation tests: modeling and analysis, Technometrics 40(2), 89-99 (1998). 15. W. Q. Meeker and M. Hamada, Statistical Tools for the Rapid Development & Evaluation of High-Reliability Products", IEEE Transaction On Reliability 44(2), 187-198 (1995). 16. V. N. Nair, Discussion of "Estimation of Reliability in Field Performance Studies", by J.D. Kalbfleisch and J.F. Lawless, Technometrics 30, 379-383 (1988). 17. W. Nelson, A Bibliography of Accelerated Test Plans, IEEE Transactions on Reliability 54(2), 194-197 (2005). 18. W. Nelson, A Bibliography of Accelerated Test Plans: Part II-References, IEEE Transactions on Reliability 54(3), 370-373 (2005). 19. W. Nelson, Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, NY: John Wiley & Sons (1990). 20. W. Nelson, Analysis of performance degradation data from accelerated tests, IEEE Transactions on Reliability R-30(2), 149-155 (1981). 21. M. E. Robinson and M. J. Crowder, Bayesian methods for a growth-curve degradation model with repeated measures, Lifetime Data Analysis 6, 357374 (2000). 22. M. J. D. Powell, An Efficient Method for Finding the Minimum of a Function of Several Variables Without Calculating Derivatives, Computer Journal!, 155-162(1964). 23. S. J. Wu, and J. Shao, Reliability Analysis Using The Least Squares Method in Nonlinear Mixed-Effect Degradation Models, Statistica Sinica 9, 855-877 (1999).
OPTIMUM DESIGN OF ACCELERATED LIFE TESTS UNDER TWO FAILURE MODES C.M.KIM Six Sigma Division, Samsung Economic Research Institute, WFKukje Building Hangangro 2-ga, Yongsan-gu, Seoul, 140-702, Korea
191,
This paper considers the design of accelerated life tests when an extrinsic failure mode as well as intrinsic one exists. A mixture of two distributions is introduced to describe these failure modes. It is assumed that the lifetime distribution for each failure mode is Weibull. Minimizing the generalized asymptotic variance of maximum likelihood estimators of model parameters is used as an optimality criterion. The optimum test plans are presented for selected values of design parameters and the effects of errors in pre-estimates of the design parameters are investigated
1. Introduction Accelerated life tests(ALTs) are used to obtain information quickly on the lifetime distribution of materials or products. The test units are run at higher than usual levels of stress to induce early failures. The test data obtained at the accelerated stresses are analyzed in terms of a model, and then extrapolated to estimate the lifetime distribution at the design stress. The stress can be applied in various ways; constant, step, and progressive stress loading. Most common method in practice is the constant stress test in which the stress applied to each unit is constant throughout the test. Several authors have considered the problem of designing ALT plans. See, for instance, Nelson and Kielpinski [9], Meeter and Meeker [6], Tang et al. [12], and Pascual nd Montepiedra [10] for constant stress ALTs. All these works on the design of ALTs assume that the lifetime distribution of a test unit has only one failure mode. However, some electronic devices or other system components are subject to both intrinsic(wearout-related) and extrinsic (defect-related) failure modes. See, for instance, Mori et al. [7], Martin et al. [5] and Croes et al. [1]. Kim and Bai [4] considered the problem of estimating the lifetime distribution at use condition for constant stress ALTs when an extrinsic failure mode as well as intrinsic one exists. This paper considers the optimum design of constant stress ALTs under Type I censoring. The mixed distribution is introduced to represent intrinsic and 502
503
extrinsic failure modes, and stress levels and the proportion of test units at each stress are simultaneously determined. It is assumed that each failure time follows Weibull distribution, that a log-linear relation exists between each scale parameter and stress, and that the shape parameter is constant and independent of the stress. Minimizing the generalized asymptotic variance of the MLEs described in Escobar and Meeker [2] is used as the optimality criterion. The tables for optimum test plan are given, and the effects of errors in pre-estimates of design parameter are also investigated for selected value of parameters. 2. The Model 2.1. Assumptions 1) At any stress s y , the log-lifetime of a test unit follows a mixture of two smallest extreme value distributions with location and scale parameters, ftt and ok, k = 1 (intrinsic), 2(extrinsic). 2) /jjk is a linear function of a (possibly transformed) stress st; that is, 3) <Jk is constant and is independent of the stress. 4) The lifetimes of test units are independent and identically distributed. 2.2. Standardized Model Define the standardized stress Elj as p —
1
J
s -s
where sd and sh are design and highest stresses, respectively. The location parameters of log-lifetime distributions of test units at stress s • can be rewritten in terms of | y as ptjk = 0ok + p^
where pot = a0k + ausd and ftk = au (sh -sd).
We note that for sj = sd , £,d - 0 and //M = pok , and for sj = sh , £, = 1 and Mhk ~ Pok + P\k •
2.3. Test Procedure 1) At accelerated stress st, npi units are put on test at time 0 and run until a pre-specified time rj, j-\,---,h . 2) The failure times are observed continuously in time.
504 2.4. Lifetime Distribution From the assumption of the mixture of two distributions, the pdf of Yr , i = \,2,---,npj,
j = \,2,---,h,
is:
/U;©) = ^/U;©.)+^/J(j',;©2) where YtJ is log-lifetime of unit /' under stress s,, i = l,2,---,npj,
0) j-\,2,---,h.
The maximum likelihood estimates (MLEs) of the distribution parameters and mixing proportion can be obtained by expectation and maximization(EM) algorithm. See Kim and Bai[4] for details. 3. Optimum Test Plans For constant stress ALTs with a nonconstant shape parameter, Meeter and Meeker[6] found that test plans with three different levels of stress may be optimum, whereas it is optimum to use just two levels of stress when the shape parameter is constant(Nelson and Kielpinski[9]). Through numerical experiments with four or more levels of stress, we have found that two-stress or three stress plans are optimum when extrmsic failure mode as well as intrinsic failure mode exists. For the case where the product has two failure modes, tables useful for finding optimum test plans are constructed for selected combinations of design parameters. 3.1. Optimality Criterion Minimizing the generalized asymptotic variance of MLEs of model parameters is used as the optimally criterion. The asymptotic variance-covariance matrix of the MLEs 0 is the inverse matrix of l(®J. The generalized asymptotic variance of 0 is the determinant of the asymptotic variance-covariance matrix and is equal to the reciprocal of the determinant of l(®). That is, G4var(©) = r ' ( © ) =
(2)
2>,I,.(®)
7
* 2>A(©)
The mathematical derivations of G/lvar(©) can be found in Kim [3].
505 3.2. Design Parameters A test plan depends on the model parameters involved. Let pdi and pM be the probabilities that a unit following intrinsic failure mode will fail by censoring time rj at sd (£, = 0) and sh {£h = 1). That is, In/? -A>i
ft.=("
•A'-°l
a,
(3)
where O(-) is a standardized small extreme value cumulative distribution function. Similarly, pd2 and ph2 for extrinsic failure mode can be defined. Then Fisher information matrix and the asymptotic variance-covariance matrix can be written in terms of stress level ^ and proportion of test units at each stress level pj, j = \,•••,/i and design parameters w,, O " 1 ^ ^ ) , 0~'(p M ) and <jt, k = \,2. See Kim [3]. As a consequence, GL4var(©) can be rewritten in terms of nx, <&~'(pdk), r>
® {Phk) y &/, > 4j
an
d Pj , and the objective is to find £j and p* which
minimize G^var(©) for specified design parameters, j = \,---,h. 3.3. Optimization Method To find the optimum values of £y and p, , the Powell conjugate direction method(Powell [11]) for finding the minimum of a function without using derivatives is used. The generalized asymptotic variance may have several local minima. Optimum £* and /?* are obtained with several different initial values. 3.4. Tabulations For the case where there exist two failure modes, optimum test plans with three stress levels are tabulated since extensive numerical studies revealed that test plans with two or three levels of stress are optimum. Table 1 gives optimum ALT plans for all combinations of following design parameters. 1. «•, =0.7. 2. ^=0.0001, 0.001. 3. p u =0.25, 0.40, 0.60, 0.80, 0.90, 0.99. 4. cr, = 0.8 , cr2 = 0.5 . Table 1 are constructed with the constraints pm < prf2 and pM < ph2.
506
The information given in the tables are p), gj, y = 1,2 and GA', where GA' is the optimum «7G^var(©)/l09. To use an optimum ALT plan, one must have information about design parameters which are usually unknown. Therefore they have to be approximated from past experience, similar data, or a preliminary test. Table 1. Optimum ALT plans under two failure modes 7T,=0.7 Pa
0.0001
Pdl
0.001
Ph\
0.25
0.4
0.6
0.8
0.9
A*
P\
#>'
£
GV'
0.4
0.159
0.249
0.517
0.801
2876.
0.6
0.258
0.178
0.630
0.844
241.460
0.8
0.322
0.142
0.658
0.886
31.963
0.9
0.327
0.163
0.654
0.899
11.493
0.99
0.302
0.216
0.636
0.901
2.534
0.6
0.196
0.238
0.562
0.825
92.910
0.8
0.280
0.194
0.634
0.870
9.017
0.9
0.302
0.194
0.644
0.888
2.765
0.99
0.297
0.225
0.634
0.895
0.536
0.8
0.236
0.229
0.588
0.844
4.093
0.9
0.279
0.215
0.621
0.870
1.001
0.99
0.296
0.229
0.629
0.887
0.159
0.9
0.246
0.227
0.580
0.839
0.666
0.99
0.294
0.226
0.618
0.874
0.072
0.99
0.290
0.221
0.605
0.861
0.054
Phi
3.5. Example Consider an electronic part which has intrinsic and extrinsic failure modes. Suppose that the lifetime distribution for each failure mode is Weibull and the design temperature is 7^ = 180 °C and the accelerated high temperature is Tk = 260 °C. Assuming Arrhenius life-stress relationship, the transformed stress levels are sd =1000/(180 + 273.2) = 2.207 and sk =1.875 . For a pre-specified censoring time r\ , if pre-estimates of nx, pdi,pdl,pM, P^a\ a n ^ a% are 0.7,
507
0.0001, 0.001, 0.25, 0.6, 0.8 and 0.5, respectively, then the optimum test plans are obtained from Table 1 as p\= 0.258, /jj =0.178, £ = 0.630 and £ = 0.844. The optimum generalized asymptotic variance is GV' = 241.460 . The optimum low and middle stress levels are sj =£*X(J A -sd) + sd = 1.998 and s\ = 1.927 , respectively. Thus 7]* =1,000/1.998-273.2 = 227.3 °C and f2* = 245.7 °C and a life testing is performed as follows. np\ = 0.258n and wpj=0.178« units are randomly chosen and allocated to 227.3 °C and 245.7 °C, respectively, and the rest 0.564/; units are allocated to 260 °C. 4. Sensitivity Analysis To use an optimum test plan, one must have information about test parameters which are usually unknown. Therefore they have to be approximated from experience, similar data or preliminary tests. Incorrect choice of pre-estimates gives a non-optimum test plan and could result in less accurate estimates of the life distribution at design stress. In the case of ALT plan under single failure mode, the effect of incorrect choice of the pre-estimates of pd and ph was studied by Meeker[8]. Here we investigate the effects of errors in pre-estimates of /r, in terms of asymptotic generalized variance ratio GV'IGV', where GV is the generalized asymptotic variance of the MLEs when incorrect pre-estimate is used and GV' is the corresponding asymptotic generalized variance when true value is used. Table 2 gives GV'IGV' due to incorrect pre-estimate n\ of ny for given values of pdl = 0.0001 , pd2 =0.001 , pM=0.25 , ph2=Q.9 , and a, =0.8, and shows that if n\ is far from /r,, GV'IGV' is large and it is more pronounced in the case of a2 = 1.5 than a2 = 0.5 or <J2 = 2.0 . In the range of 0.5 < nx < 0.9 , test plans with two-stress and three-stress are optimum when a2 =2.0 and a2 =0.5, respectively. However, the optimum test plans when cr2=l.5 are two-stress ALTs if 7[, < 0.6 and three-stress ALTs otherwise. Since the variation of optimum stress levels and proportions allocated to each stress level gets larger when the number of optimum stress level changes, GV'IGV' of
508 Table2 GV'IGV' due to incorrect pre-estimate n\ of TT,
< n
\
0.5
0.6 0
1.000 0.5
0.6
0.7
0.8
0.9
1.002
0.7
0.8
0.9
1.007
1.016
1.035
1.005
1.020
1.035
1.059
1.003
1.010
1.022
1.040
1.002
1.008
1.022
2)
1.000
1.0003) 1.002
1.000
1.005
1.000
1.008
1.014
1.032
1.003
1.000
1.003
1.010
1.022
1.007
1.002
1.000
1.002
1.012
1.021
1.006
1.000
1.003
1.014
1.011
1.003
1.000
1.003
1.010
1.017
1.008
1.002
1.000
1.004
1.050
1.022
1.003
1.000
1.004
1.024
1.010
1.003
1.000
1.003
1.040
1.025
1.013
1.004
1.000
1.095
1.051
1.015
1.004
1.000
1.045
1.024
1.011
1.003
1.000
1) cr, = 0 . 5 ; 2) o- 2 =1.5;3)
5. Conclusion This paper proposed the optimum design of accelerated life tests under Type I censoring for products with two failure modes exist. It is assumed that the failure time distribution for each failure mode is Weibull with a scale parameter that is a log-linear function of the stress, and that the shape parameter is independent of stress. The optimum test stress levels and sample proportions allocated to stresses are determined numerically to minimize the generalized asymptotic variance of MLEs of model parameters. For a mixture of two Weibull distributions, tables useful for optimum constant stress ALT plans are constructed and found that test plans with two or three levels of stress are optimum when extrinsic failure mode as well as intrinsic failure mode exists. For selected combinations of design parameters, the effects of errors in preestimate of nx and o-, are studied in terms of variance ratio, and it is shown that the variance increase is large if incorrect pre-estimate is far from true value and
509 optimum ALT plans are more sensitive to incorrect choice of pre-estimate nx than those of <JX . References 1.
2. 3.
4.
5.
6. 7.
8. 9.
10. 11.
12.
K. Croes, W. De. Ceuninck, L. De Schepper and L. Tielemans, Bimodal Failure Behavior of Metal Film Resistors, Quality and Reliability Engineering International, 87-90 (1998). L. A. Escobar and W. Q. Meeker, Planning Accelerated Life Tests with Two or More Experimental Factors, Technometrics, 411-427 (1995). C. M. Kim, Inferences and Design of Accelerated Life Tests under Intrinsic and Extrinsic Failure Modes, Ph. D thesis, Korea Advanced Institute of Technology (2001). C. M. Kim and D. S. Bai, Analysis of Accelerated Life Test Data under Two Failure Modes, Int. J. of Reliability, Quality and Safety Eng., 111125 (2002). A. Martin, P. O'sullivan and A. Mathewson, Study of Unipolar Pulsed Ramp and Combined Ramped/Constant Voltage Stress on Mos Gate Oxides, Microelectron. Reliab., 1045-1051 (1997). C. A. Meeter and W. Q. Meeker, Optimum Accelerated Life Tests with a Nonconstant Scale Parameter, Technometrics, 71-83 (1994). S. Mori, N. Arai, Y. Kaneko and K. Yoshikawa, Polyoxide Thinning Limitation and Superior ONO Interpoly Dielectric of Nonvolatile Memory Devices, IEEE trans. Electron Devices, 270-276 (1991). W.Q. Meeker, A Comparison of Accelerated Life Test Plans for Weibull and Lognormal Life Distributions, Technometrics, 157-172 (1984). W. Nelson and T. J. Kielpinski, Theory for Optimum Censored Accelerated Life Tests for Normal and Lognormal Distributions, Technometrics, 105-114(1976). F. G. Pascual and G. Montepiedra, Model-Robust Test Plans with Applications in Accelerated Life Testing, Technometrics, 47-57 (2003). M.J.D. Powell, An Efficient Method for Finding the Minimum of a Function of Several Variables without Calculating Derivatives, Computer Journal, 155-162(1964). L. C. Tang, A. P. Tan and S. H. Ong, Planning Accelerated Life Tests with Three Constant stress Levels, Computers & Industrial Engineering, 439-446 (2002).
ANALYSES OF ACCELERATED LIFE TESTS FROM GENERAL LIMITED FAILURE POPULATION C.M.KIM Six Sigma Division, Samsung Economic Research Institute, WFKukje Building Hangangro 2-ga, Yongsan-gu, Seoul, 140-702, Korea
191,
J.H.SEO Office of Consulting Services, Small Business Corporation, 24-3Yeouido-dong, Yeongdeungpo-gu, Seoul, 150-718, Korea This paper proposes a method of estimating the lifetime distribution at use condition for constant stress accelerated life tests when an extrinsic failure mode as well as intrinsic one exists. General limited failure population model is introduced to describe these failure modes. It is assumed that the log lifetime of each failure mode follows a locationscale distribution and a linear relation exists between the location parameter and the stress. An estimation procedure using the expectation and maximization algorithm is proposed. Specific formulas for Weibull distribution are obtained and illustrative examples are given..
1. Introduction Accelerated life tests (ALTs) are used to obtain information on life distributions of products or parts quickly and economically. Test items are run at higher-thanusual levels of stress to induce early failures. Test data are then extrapolated to estimate the lifetime distribution at design stress in terms of a model to relate life to stress. The stress can be applied in various ways; the most common method is to test units at constant stress until all units fail or censoring time is reached. The analyses of ALTs usually assume that the lifetime distribution at each stress comes from a prespecified parametric family of distributions such as exponential, Weibull, lognormal etc; See, for instance, Nelson and Meeker [13] for Weibull distribution and Nelson and Kielpinski [12] for lognormal distribution. See Nelson [11] for detailed treatments of ALTs. Most of previous works assume that the lifetime distribution has only intrinsic failures due to wearout, etc. However, this assumption may not be appropriate in some populations of electronic devices or other system components since extrinsic failures caused by defects, etc. also exist; see, for example, Mori et al. [10], Prendergast et al. [14], Martin et al. [7], and Croes et al. [2]. 510
511
When more than one failure mode exist, a mixture of distributions has been widely used in describing the lifetimes of units. See, for instance, Mann et al. [6], Lawless [5], Moosa et al. [9] and Gerhold et al. [4]. Another model for representing the situation where two failure modes coexist is general limited failure population(GLFP) model. Chan and Meeker [1] proposed GLFP model in which the defective units will usually lead to an infant-mortality failure early in their lifetimes and the nondefective units will eventually fail from wearout. This paper proposes a method of estimating the lifetime distribution at use condition for constant stress ALTs when an extrinsic failure mode as well as intrinsic one exists. The GLFP model is used to describe the two failure modes. Assuming that the log lifetime of each failure mode follows a location-scale distribution and its location parameter is a linear function of stress, the maximum likelihood estimates (MLEs) of the distribution parameters and the proportion of extrinsic failure are obtained by expectation and maximization(EM) algorithm. Section 2 describes an ALT model with intrinsic and extrinsic failure modes. EM algorithm and estimators of the lifetime distribution are presented in Section 3. The following notations will be used in this paper. Notation h k Sj
Number of stress levels. Failure mode index; 1 (extrinsic), 2(intrinsic). j th stress level, j = 1,•••,/>.
n]
Number of test units at stress sn j = 1, • • •, h .
4j
Standardized stress, £,• = —
aM, axk Pat> P\k Hjk
Parameters of linear relation. Parameters of standardized linear relation. Location parameter at stress s,, j = l,---,h.
ak Ft (•)
Scale parameter. Location-scale cdf.
/((•)
Location-scale pdf.
n
Mixing proportions of extrinsic failure mode.
, j = l,---,h.
©
{*.©„e 2 }
Y0
Log-lifetime of unit i under stress s., / = l,2,--,w>, ,j = \,2,---,h .
512 2. The Model The GLFP model is an extension of the LFP model proposed by Meeker [8] which considers only infant mortality failures caused by manufacturing defects and does not allow for wearout failures. It combines components from the LFP model for infant mortality with a competing risk model for long-term wearout. Therefore, the GLFP model can be a good alternative to represent the lifetime of units with intrinsic and extrinsic failure modes. See Chan and Meeker [1] for more detailed treatment of GLFP model. 2.1.
Assumptions
1) At any stress s, the log-lifetime of a test unit follows the GLFP model with location and scale parameters, /Jk (s) andcr t , k = l(extrinsic), 2(intrinsic). 2) /uk is a linear function of a (possibly transformed) stress s ; that is, Mkis) = aok+ociks. 3) The scale parameter crk is constant and is independent of the stress. 4) The lifetimes of test units are independent and identically distributed. 5) The cause of failure is not observed. 2.2. Lifetime Distribution From the assumption of the GLFP model, the cdf and pdf of Ytj, /' = l,2,---,« 7 , y' = l,2,---,/j,are: F(^;0) = l-[l-^(y,;0,)]F2(^;02), (1) /(^;0) = ^(^;01)F2(^;02) + /2(^;02)[l-^(y,;0l)],
(2)
where F J (^;0 2 ) = 1-F 1 (^;0 2 ). 3. Estimation with EM Algorithm From assumption 5), the only information available is the time to failure. One reason justifying this assumption is that despite the fact that in some circumstances the cause of failure could be classified as intrinsic or as extrinsic, it would require costs and time to classify all the failures. Another justification is that in many cases it is technically difficult to find out the real source of failure. Thus the cause of failure can be regarded as a missing variable and the EM algorithm can be utilized. The EM algorithm can obtain iterative solutions to the maximum likelihood equations in a wide class of missing data problems. On each iteration of the EM
513 algorithm there are two steps; the expectation step (E-step) and the maximization step (M-step). In the E-step, log-likelihood including missing data is replaced by its conditional expectation given the observed data. In the M-step, MLEs of the parameters are computed which maximize the conditional expectation of the log-likelihood calculated in the expectation step. See Dempster et al.[3] for details. When all failure times are observed, the log-likelihood is logL = X S l o g / ( x J ; 0 ) '*""."'
_
(3)
= XZiog(^U;®,)^U;©2)+/2U;02)[i-^U;©1)]) j'\
1=]
Let Iin and IiJ2 {= 1 - I.Jt) be the indicator variables denoting whether unit i at stress j follows intrinsic or extrinsic failure mode, respectively. If these IiJt 's were observable, then the log-likelihood of a complete data set would become log£c = £ i ; / „ [ l o g * + tog/(;y,;el) + logF1(jV;ei)]
(4)
+ / S ! [log/ 2 U;e 2 ) + l o g [ l - ^ ( ^ ; 0 , ) ] ] Here, Iljk 's are the missing variables. E-step: As the log-likelihood of a complete data set is linear in lljk 's, the expectation step simply requires the calculation of the conditional expectation of Iijk given the observed data y:j. We have
for i = \,2,---,rij and j = l,2,---,h . The quantity r , ^ ; © ) = l-T2(yjj;@) is the posterior probability that the failure is extrinsic given ytJ . Thus the expectation of I!jk given ytJ on the p th iteration is
V>(^l^) = r *(v 0 ( ' _ , ) )
(6)
where &p~^ is the parameter set obtained on the ( p - l ) t h iteration. Thus the conditional expectation of log-likelihood is e(0;0("») = Xir,(>',;e(- l >)[log^ + log/i(^;0 1 ) + logF2(>,,;©2)] +^(^;©(""'))[iog/2U;02)+iog[i-^(^;0,)]]
(7)
M-step: At the M-step of the p th iteration, the intent is to maximize
514
g(©;©(''~')) with respect to 0 to produce a new estimate 0 W of 0 . The values ©M of 0 can be obtained by simultaneously solving maximum likelihood equations obtained from g(©;0 ( '" 1) ). As the iteration of the expectation and the maximization steps progresses, 0 converges to the stationary solution. If the likelihood function is unimodal, the stationary solution of the algorithm is the unique MLE(Wu [15]). Example If the lifetime T follows Weibull distribution with scale parameter A. and shape parameter 8, then the log lifetime Y = \ogT has an extreme value distribution with location parameter // = log2 and scale parameter a = \I6. We illustrate the estimation method in previous section with the ALT data generated from GLFP for two extreme value distributions with parameters; x = 0.2;
A, =16, fl,=-6, o-,=0.8; A, =12, #2 =~8, <72=0.5; 7, =13, 772=10; In this example, the two-stress ALT is considered. Given £, = 0.5 , £2 = 1.0, «, = 40 and «2 = 20, Table 1 contains the failure and censoring times in minutes under each stress level. Table 1. Failure times with censoring : Weibull case Low Stress 8.0609 •13.0000
High Stress
12.8621
10.4355
10.5633
11.8881
12.7182
3.512
3.0552
8.028
6.842
8.9702
•13.0000
13.373
11.1832
11.0701
9.771
9.4451
12.7088
•13.0000
•13.0000
12.9983
8.536
•10.0000
11.4267
•13.0000
12.3584
12.151
3.5593
9.3345
10.1345
12.3878
9.3292
10.1685
3.8946
1.9653
11.7658
•13.0000
12.6632
13.8884
5.3833
9.8196
12.0121
12.5858
8.3515
6.1584
8.0971
•10.0000
•13.0000
12.0902
•13.0000
12.5658
9.1819
•10.0000
12.6148
11.6873
7.3143
12.098
3.6689
9.0773
' • ' denotes censored observation
515 Initial step: Initial estimates TC{0) = 0.5, pf = 20, /?,(,0) = - 4 , o-,(o) = 0.9, /?<20) = 11, p\f = - 9 , and a^ = 0.6 are chosen. E-step: With the initial estimates, the rJx y ;© (0 ') can be computed for i = \,2,---,rij, j = \,2,---,h and k = 1,2. For example,
,(, li; e'-))=^||fe^) = o,379. M-step: The first partial derivatives of Q(©;©''"''1 for the extreme value distribution are given in the Appendix. With rJy^®^)
and r2(>V©(0')> ^ ( , ) ,
©j'1 and ©j' can be obtained by simultaneously solving maximum likelihood equations, using a numerical method such as Newton-Rapson algorithm. Computations are iterated until the differences between ( p - l ) t h and p th value of parameters are smaller than 10"5. The stationary solutions k = 0.1750, A, =15.9831, fiu =-6.3975,CT,=1.0017, PQ2 =12.4098 , pn = -8.8776 , a2 = 0.4496 , are obtained after 90 iterations and 0.93 seconds on a Pentium III PC with 500Mhz and the estimates of qth quantile of the lifetime distribution at use condition are 4o5 =11.7498, r01 =12.1841. Appendix When the log lifetime of each failure mode follows an extreme value distribution, the first partial derivatives of g(©;0 ( '" ,, )with respect to © are 1
,
(A.l)
i=\ .=1
ae(©;®(''"')) (A.2) +T.
M'-°) 1
-**>(*. U)).
, / = 1,2,
516 aof©;©'""")
\„
l
* -j i
r
,
-II^('.' t M H-'-4h(',H4)) (A.3)
-V J -5sfr<* !8M )-'W*»« ^^-ti;H'.(** M M*M*W) (
,)
(A4) (A.5)
1
^ ( > ' , ; ® ' - ) { - - ^ U ) + z1(>',)exp(z1(>-„))}] where 2k[y^) = ———
^ - ^ , a n d >{) and o(-) arepdf and cdf of standard
extreme value distribution, respectively. References 1. 2.
3. 4.
5. 6. 7.
8.
W. Chan and W. Q. Meeker, A Failure-Time Model for Infant-Mortality and Wearout Failure Modes, IEEE Trans, on Reliability, 377-387 (1999). K. Croes, W. De Ceuninck, L. De Schepper and L. Tielemans, Bimodal Failure Behavior of Metal Film Resistors, Qual. Reliab. Engng. Int., 87-90 (1998). A. P. Dempster, N. M. Laird and D. R. Rubin, Maximum Likelihood from Incomplete Data, J. of Royal Statistical Society Series B, 1-38 (1977).. J. Gerhold, M. Hubmann and E. Telser, Breakdown Probability and Size Effect in Liquid Helium, IEEE trans. Dielectrics and Electrical Insulation, 321-333(1998). J. F. Lawless, Statistical Models and Methods for Lifetime Data, John Wiley & Sons (1982). N. R. Mann, R. E. Schafer and N. D. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data, John Wiley & Sons (1974). A. Martin, P. O'sullivan and A, Mathewson, Study of Unipolar Pulsed Ramp and Combined Ramped/ Constant Voltage Stress on Mos Gate Oxides, Microelectron. Reliab., 1045-1051 (1997). W. Q. Meeker, Limited Failure Population Life Tests: Application to
517
9. 10.
11. 12.
13.
14.
15.
Integrated Circuit Reliability, Technometrics, 51-65 (1987). M. S. Moosa, K. F. Poole and M. L. Grams, EFSM: An Integrated Circuit Early Failure Simulator, Qual. Rehab. Engng. Int., 229-234 (1996). S. Mori, N. Arai, Y. Kaneko and K. Yoshikawa, IEEE trans. Electron Devices, "Polyoxide Thinning Limitation and Superior ONO Interpoly Dielectric of Nonvolatile Memory Devices," 270-276 (1991). W. Nelson, Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, John Wiley & Sons (1990). W. Nelson and T. J. Kielpinski, Theory for Optimum Censored Accelerated Life Tests for Normal and Lognormal Distributions, Technometrics, 105-114(1976). W. Nelson and W. Q. Meeker, Theory for Optimum Accelerated Censored Life Tests for Weibull and Extreme Value Distributions, Technometrics, 171-177(1978). J. G. Prendergast, E. Murphy and M. Stephenson, Predicting Gate Oxide Reliability from Statistical Process Control Nodes in Integrated Circuit Manufacturing - a Case Study, Qual. Reliab. Engng. Int., 267-277 (1997). C. Wu, On the Convergence Property of the EM Algorithm, Annals of Statistics, 95-103 (1983).
DEGRADATION CHARACTERISTICS OF POLYMERIC HUMIDITY SENSORS UNDER HIGH TEMPERATURE AND HUMIDITY SUNG-MIN KIM, DONG-HOON HWANG, JUNG-WON PARK, JUNG-KEOL HAM, Reliability Technology Center, Korea Testing Laboratory, 7'h Floor, Gyeonggi TechnoPark, 1271-11, Sa-1-dong, Sangnok-gu, Ansan-si, Gyeonggi-do, KOREA MYUNG-SOO KIM, GEUN-TAE OH Dept. of Industrial Engineering, The University ofSuwon, San 2-2, Wau-Ri, eup, Hwaseong-Si, Gyeonggi-Do, KOREA
Bongdam-
The degradation characteristics of polymeric humidity sensors under high temperature and humidity were investigated. The degradation tests were carried out at 60°C-90 relative humidity (RH), 85°C-85% RH and 110°C-85% RH. The test results show that the characteristic values of sensors at 60°C-90% RH and at 85°C-85% RH decreased over time, while those of sensors at 110°C-85% RH increased. From these test results, we could see that the degradation mechanism at 110°C-85% RH was different from that at other conditions. According to the failure analysis results, delamination at the interface between the polymer film and the Au electrode was the main reason for degradation at 60°C -90% RH and 85°C-85% RH.
1. Introduction The polymeric humidity sensor ('humidity sensor' below) is a device to measure relative humidity using a polymer film, as its electrical characteristics vary with the absorption and desorption of moisture. Much work on improving the reliability of humidity sensors has been done because of the rising interest in reliability of such devices in industry. With the development of polymer film materials with superior and stable absorbency by changing the monomeric chemical structure, it has become possible to manufacture polymer film with improved electrode binding, flexibility and the ability to withstand submergence in water[l~2]. However, in the field, the adhesive strength between the polymer film and the electrode pad decreases under continuous high humidity and the characteristics of the humidity sensors change because of delamination[3]. Therefore, the degradation characteristics of humidity sensors under high temperature and humidity are important factors in assessing their reliability in the field and were investigated in this research. 518
519 2. Technical Information for Humidity Sensors A humidity sensor's gold electrode is printed on an aluminum pad in a comblike pattern, with its leading edges soldered with Ag-Pd. The humidity sensors are manufactured after forming the polymer film through a process of leveling and curing in the polymeric liquid. A protective film is applied to prevent contamination from outside and to ensure stability at high humidity. The structure of a humidity sensor is shown in Figure 1 .[1—3] 5.08
Polymer film 10 Au electrode Over coat
-
Ag - Pd pad
Lead wire 0.5 13
254 mm Figure 1. Structure of the humidity sensor
Humidity sensors are characterized by standard characteristics, temperature characteristics, frequency characteristics, response characteristics, hysteresis and the ability to withstand submergence in water. Standard characteristics mean that the resistance value decreases exponentially as the relative humidity
520
increases. Standard resistance at 30% RH, 60% RH and 90% RH at 25°C is 1050 k£2, 31 kO and 1.8 kQ, respectively. Temperature characteristics refer to the fact that at the same relative humidity, the resistance varies with temperature; as a result, the humidity sensors must be calibrated for temperature. Frequency characteristics mean that the resistance varies with frequency. The humidity sensors must have rapid response characteristics to respond sensitively to changes in relative humidity and must have low hysteresis that is not affected by change of relative humidity. Because the polymer film is easily affected by water, the ability to withstand submergence in water is also an important factor. 3. Test Plans of Humidity Sensors Because the failure of humidity sensors is accelerated under high temperature and humidity, three test conditions were determined to investigate the degradation characteristics of humidity sensors, as shown in Table 1. Case 1 was based on the humidity sensor reliability standard RS C 005 5 [4], case 2 was based on IEC 60068-2-67[6] and JESD22-A101-B[7], and case 3 was based on IEC 60068-2-66[5] and JESD22-A110-B[8]. Twenty two specimens were tested under each test condition. Table 1. Test Conditions Case
Test conditions
1
60°C -90% RH
17.9
1000
2
85°C -85% RH
49.1
1000
3
110°C-85%RH
122
Humidity pressure(kPa)
Total test time, h
392
The standard characteristics of humidity sensors were measured at 25°C 60% RH and the humidity sensors were classed as 'failed' if the difference between the measured relative humidity and the initial relative humidity was greater than ± 5% RH in a test. The measurement system for measuring the standard characteristics was devised as in Figure 2 to minimize the error of measurement.
521
HbaHBy OraoMltaj ElHder Figure 2. Structure of the measurement system
4. Test Results The degradation tests were carried out at test conditions 1 and 2 for 1000 h and the characteristics of the specimens were measured at 0, 250 h, 500 h, 750 h and 1000 h. The degradation test was carried out at test condition 3 for 392 h, and the characteristics of the specimens were measured at 0, 24 h, 72 h, 252 h and 392 h. The relative humidity of humidity sensors at 60°C-90% RH for 1000 h changed as shown in Figure 3. The characteristics of the specimens in figure 3 show a downward trend although no failures occurred.
Tim e (h) Figure 3. Degradation characteristics at 60°C-90% RH
522
The relative humidity of humidity sensors at 85°C-85% RH for 1000 h show similar downward trend as for 60°C-90% RH, as shown in Figure 4. Under these conditions, 13 specimens failed.
i
or
400
600
Tim e (h)
Figure 4. Degradation characteristics at 85°C-85% RH
The relative humidity of humidity sensors at 110°C-85% RH for 392 h show different trends. The characteristics of the specimens have upward trends as shown in Figure 5. The variation of the characteristics of the specimens was within ±5% RH and there were no failures.
i
•2
-5
100
200
300
Time (h) Figure 5. Degradation characteristics at 85°C-110% RH
500
523
These results indicate that the degradation mechanism at 60°C-90% RH and 85°C-85% RH was different from that at 110°C-85% RH. According to these test results, 110°C-85% RH could not be used as an accelerated test condition. To investigate the degradation mechanism at 60°C-90% RH and 85°C85% RH, test specimens before and after test were inspected with a microscope. Figure 6 shows specimens before and after test.
(c) after test (85°C-85% RH) Figure 6. Delamination of the polymer film
524
As a result of comparing these three pictures, delamination between the polymer film and the Au electrode was found in specimens after test and the delamination at 85°C-85% RH was more serious than at 60°C-90% RH. We confirmed that the characteristics of the humidity sensors degraded because of the delamination between the polymer film and the Au electrode. 5. Conclusion Humidity sensors were tested under three different test conditions, 60°C-90% RH, 85°C-85% RH and 110°C-85% RH, to investigate their degradation characteristics. The characteristics of test specimens at 60°C-90% RH and 85°C-85% RH decreased over time, while those at 110°C-85% RH increased. From the failure analysis of the specimens after test at 60°C-90% RH and 85°C-85% RH, it was found that the degradation was caused by delamination between the polymer film and the Au electrode. These test results also show that 85°C-110% RH could not be used as an accelerated test condition. References 1. O. Y. Kim, M. S. Gong, J. of Korean Ind. & Eng. Chem. 9(4), 554-560 (1998). 2. C. W. Lee, M. S. Gong, J. Korean Ind. & Eng. Chem. 10(3), 461-466 (1999). 3. M. S. Gong, S. S. Lee, R. Y. Lee, J. of the Korean Institute of Electrical and Electronic Material Engineers 14(4), 302-308 (2001). 4. RS C 0055, Industry and Energy Agency for Technology and standards, Ministry of Commerce, (2004). 5. IEC 60068-2-66, International Electrotechnical Commission (1994). 6. IEC 60068-2-67, International Electrotechnical Commission (1995) 7. JESD22-A101-B, Electronic Industries Association, JEDEC Solid State Technology Association (1997). 8. JESD22-A110-B, Electronic Industries Association, JEDEC Solid State Technology Association (1999).
THE RELIABILITY ESTIMATION OF PIPELINE USING FAD, FORM AND SORM OUK SUB LEE School of Mechanical Engineering, 1NHA University, #253, Nam-Gu, Incheon, 402-751, Korea
Yonghyun-Dong,
DONG HYEOK KIM Department of Mechanical Engineering, INHA University, #253, Nam-Gu, Incheon, 402-751, Korea
Yonghyun-Dong,
The reliability estimation of pipelines is performed with help of the probabilistic method which includes the uncertainties in the load and resistance parameters of limit state function. The FORM (first order reliability method) and the SORM (second order reliability method) are carried out to estimate the failure probability of pipeline utilizing the FAD (failure assessment diagram). Furthermore, the MCS (Monte Carlo Simulation) is used to verify the results of the FORM and the SORM. It is noted that the failure probability increases with increase of the dent depth, gouge depth, operating pressure and outside radius, and decrease of wall thickness. And it is found that the FORM is useful and is an efficient method to estimate the failure probability for evaluating the reliability of the pipeline utilizing FAD. Furthermore the safety assessment technique for pipeline which utilize the FAD only and is deterministic method, is found to be more conservative than those using the probability theory and the FAD.
1.
Introduction
The pipeline industry for energy supply and delivery is being advanced with the rapid economical growth. Pipelines, like other structures in industries, are usually deteriorated according to varying boundary conditions. This natural deterioration in a metallic pipeline mainly occurs as a result of the damage caused by the surrounding environment. Therefore it is necessary to evaluate the reliability of pipeline and many researches on this subject have been progressed. One of varying methods for evaluating the level of reliability of structures with defects like crack is the failure assessment diagram (FAD) and crack driving force. In this paper, the FAD is utilized to evaluate the reliability of pipeline due to its simplicity into the practical application and its easiness of extension to varying mechanical elements [1-4]. The limit state for assessing the reliability of pipeline has been formulated using the FAD, and the failure probability is determined using the first order reliability method (FORM) and the second order reliability method (SORM). The reliability of pipeline is assessed using this failure probability and a case study has been done using the methodological 525
526 procedure developed in this study. The results obtained by using the FORM and the SORM are systematically analyzed to assess the reliability of the pipelines and compared with those estimated by using the Monte Carlo simulation (MCS). 2.
FAD (Failure Assessment Diagram)
The FAD expressed in Figure 1 is probably the most widely used methodology for elastic plastic fracture mechanics analysis of structural components. The state of a structure with a defect is expressed by a specific value of (Sr, Kr) on FAD. If this point is located inside region of failure assessment line (FAL), it is assessed that the defect can be allowable. However, it is assessed that the defect cannot be allowable, if this point is located outside of the FAL [2, 4, 5]. 1.2
|
0.8
« „ 0.6 "•0
s
1 °4 w S
3 0.2 0 0
0.2
0.4 0.6 0.8 Plastic Collapse, L r
1
1.2
Figure 1. Typical failure assessment Diagram (FAD)
Figure 2. Schematic of defects of buried pipeline with circumference crack
For the pipeline with a defect shown as in Figure 2, to utilize the FAD, the resistance for the plastic collapse, Sr and the brittle fracture, Kr must be calculated to evaluate the state of pipeline with a defect. These calculated values
527
are compared with the values determined by using Eq. (1) which is originated from the Dugdale model [2, 4, 5]. R Kf — Sr
3.
I ( w \\ -ln-jsec| — Sr
-1/2
U
(1)
Failure Probability
The failure probability is calculated using the FORM. The FORM is based on a first-order Taylor series approximation of the limit state function (LSF) which is defined as below[3,6-8].
Z = R-L
(2)
where R is the resistance and L is the load variable, respectively. Assuming that R and L are statistically independent normally distributed random variables, the variable z is also normally distributed. The failure probability (PF) is given as below. •da =
where o is the cumulative distribution function for a standard normal variable and p is the safety index or reliability index. Define, LSF and „ assume .initial value of the design point
i-
Compute mean and „ standard deviation of equivalent standard normal space
Compute Failure probability Yes • ^1*=-—^B^OOOf—--aCompute the new design pomt
+
4
Compute partial derivative at the design point
+
Compute partial derivative in, the standard normal space
Compute, the reliabihtv index
•
Compute new values at equ. standard normal space Figure 3. Processing of computing the reliability index
-+
The LSF for most real systems and cases are not linear but nonlinear. Rackwitz and Fiessler proposed a method to estimate the reliability index using the procedure as shown in Figure 3 for the nonlinear LSF. In this paper, we iterated the loop as shown in Figure 3 to determine a reliable reliability index until it converges to a desired value [3,6-8]. The SORM approach explored by
528
Fiessler uses various quadratic approximations. A simple closed form solution for a second order approximation was given by Breitung using the theory of asymptotic approximation as. n-\ -1/2
PFsoRM=
(4)
;=1
where Ki denotes the principal curvatures of the limit state at the minimum distance point and p is the reliability index calculated by using the FORM [7,8]. Unlike many engineering analytical results, the ones obtained by probabilistic methods are difficult to verify experimentally. However, the adequacy of the results out of the FORM and the SORM may be required to be verified somehow. We use the MCS technique to do this job performed by the steps shown in Figure 4 [3,6-8]. Compute failuiv probability Set up a conventional deterministic .analysis model Replace constants with probability distributions For variables Compute the deterministic result and count failure event and simulatiot on number respeclivley
Generate random numbers according, to probability attributions of Variables
Figure 4. Processing of computing the failure probability using Monte Carlo simulation
4.
A Case Study
The random variables listed in Table 1 have been utilized to calculate the failure probability using the FORM, the SORM and the MCS, and to assess the reliability of the pipeline with defect [2, 3, 5]. Variable
Table 1. Random variables and their parameters used in a case study Mean C.O.V Variable Mean C.O.V Variable
Mean
C.O.V
°>
445.9MPa
0.029
°u
593.4MPa
0.024
P
7MPa
0.01
R
457.2mm
0.016
E
207GPa
0.04
w
12.8mm
0.023
v0
112,300mJ
0.02
D
2mm
0.015
L
120mm
0.01
A
53.55mm2
0.03
a
2mm
0.015
cv
55,200mJ
0.025
a
0.5
-
b
0.495
-
C
529
5. Results and Discussion A pipeline containing the defect in the axial direction is analyzed by using the FORM, the SORM and the MCS incorporating with the FAD. The variation of failure probability corresponding to the change of varying parameters is shown in Figure 5 for the FORM, the SORM and the MCS. It is noted that the failure probability increases with the increasing of dent depth, gouge depth, operating pressure and outside radius and with the decreasing of wall thickness. And it is also recognized in Figure 5 that the failure probabilities obtained by the FORM, the SORM and the MCS are very similar for the variation of gouge depth and wall thickness and it seems to be slightly different for the variation of dent depth, operating pressure and outside radius. 0 014-
I
— FORM » -SORM - . - MCS
-FORM -SORM MCS
i
L
bilit
0012-
/I
1
. o o ooa • O ° " 0 006V 3 = 0 004-
u.
7
0 0020 000-J
4—«^=
(a) Dent Depth(mm)
f—, , ,
J?
{b)Gouge Depth(mm)
0 14-
- . - FORM • SORM — MCS
— FORM - • - SORM - » - MCS
1 ! 0 020 00-
(c) Operating Pressure (MPa)
(d) Pipe Outside Radius(mm)
0.4-
bility
— FORM -•-SORM •••••.- M C S
2
1lure
Failure
1 I
0 12-
£ o.i-
0.0-
—.—1—.—f—
^-—
«
i
r
i
1
(e) Pipe Wall Thickness(mm)
Figure 5. A relationship between the results of the FORM, the SORM and the MCS
530
Table 2 shows the mean percentile differences between the results obtained by using the FORM and the SORM, and the MCS. It is recognized from Table 2 that the mean percentile difference between results of the FORM and the MCS is smaller than those of the SORM and the MCS even it is not clearly shown in Figure 5. Since the MCS is performed with 108 of a total simulation number, the failure probability computed by the MCS may be assumed as the theoretical value. It is interesting to note that the failure probability estimated by the FORM is closer than that obtained by the SORM to the value from MCS. The Breitung's SORM method used in this paper, uses the theory of asymptotic approximation to derive the probability estimate. The asymptotic formula is known to be accurate for large values of reliability index, i.e, for small failure probability Table 2. Comparison of the mean percentile differences between the values obtained by the FORM, the SORM, and the MCS. * Dent Depth Gouge Depth Operating Pressure Outside Radius Wall Thickness
(%) FORM vs. MCS
7.99
9.83
8.82
7.01
4.52
SORM vs. MCS 81.79 10.44 8.63 22.95 19.56 * percentile error = {(result of MCS)-(result of FORM or SORM)}/ (result of MCS)} x 100
Gouge Depth FAD * L^Omm « L=12mm v L=150mm After Iteration o L=*0mm A L=120mm v L=150mm
Dent Depth FAD • L=90mm * L=12mm T L=150mm After Iteration o L=90mm --* L=120mm v L=150mm
Sr (b) Gouge Depth
^""""""^ •^
Operating Pressure FAD • L=90mm • *.y & L=12mm " r L=150mm After Iteration o L=90mm •••••• L = 1 2 0 m m
v
L=150mm
Sr (c) Operating Pressure
\
\ \ \ \
I
Outside Radius FAD • L=90mm * L=12mm v L=150mm After Iteration o L=90mm •->. L=120mm v L=150mm 0.2
(d) Pipe Outside Radius
531
Sr (e) Pipe Wall Thickness
Figure 6. The difference in FAD between before and after iteration to calculation of reliability index
Figure 6 shows the FAD, in which all assessment points are displayed for the variation of random variables before and after the iteration in order to estimate the reliability index. It is found in Figure 6 that most of the assessment points are located in the safety region of the FAD before the iteration. However most of the assessment points are moved on the FAL as the iterations go on, because the value of limit state function is converged to zero with iteration. This processes show the application of FAD with the FORM theory and the estimated failure probability seem to be appropriate. It is normally decided that the pipelines are safe, if the failure probability calculated by any theory of the probability is maintained below about 10"4 level. Using this criterion, it is found in Figures 5 that more than half of the assessment points are not in the safe region. On the other hand, most of the assessment points are safe as shown in Figure 6. Therefore, it is found that the assessment for the pipeline using the FAD give more conservative estimation than those obtained by using a theory of probability such as the FORM. 6.
Conclusion
In this paper, the FORM and the SORM are used to estimate the failure probability to evaluate the reliability of pipeline with axial direction defects. The MCS is used to verify the applicability of the FORM and the SORM to pipeline by comparing the failure probability. The effects of various random variables on the failure probability estimated by the MCS, the FORM and the SORM incorporated with the FAD are systematically investigated and the following results are obtained: 1. The FORM is useful and is found to be an efficient method to estimate the failure probability for evaluating the reliability of the pipeline incorporated with the FAD, because the results of the FORM and MCS are similar.
532
2. The failure probability obtained by using the FORM, the SORM and the MCS increases with increase of the dent depth, gouge depth, operating pressure and outside radius, and decrease of the wall thickness. 3. The assessment of safe for the pipeline using the FAD is found to be more conservative than those obtained by using the theory of probability. Acknowledgments This work was supported by the fund of POSCO named endowed chair professor, INHA University Research Grant (INHA-33084). The authors wish to thank all the members concerned. References 1. S. L. Kwak, J. B. Wang, Y. W. Park and J. S. Lee, J. ofKSME 28(3), 289295 (2004). 2. Y. J. Kim, D. J. Shim, N. S. Huh and Y. J. Kim, Eng. Frac. Meek 71, 173191 (2004). 3. O. S. Lee and D. H. Kim, Key Eng. Mat. 270, 1688-1693 (2004). 4. T. L. Anderson, Frac. Mech.; Fundamentals and Applications, CRC press (1995). 5. B. Limited, Probabilistic methods, HSE (2001). 6. M. Ahammed, Int. J. Pressure Vessels and piping 75(4), 321-329 (1998). 7. S. Mahadevan and A. Haldar, Reliability Assessment Using Stochastic FEA, John Wiley & Sons, (2000). 8. S. Mahadevan and A. Haldar, Probability, Reliability and Statistical Method in Engineering Design, John Wiley & Sons, (2000).
RELIABILITY EVALUATION OF HIGH PRECISION OIL COOLER* SEUNG WOO LEE 1 Intelligent Machine System Research Center, KIMM, 171 Jangdong YusungGu, Daejeon 305-343, Korea SEUNG WOO HAN Nano-Mechanical System Research Center, KIMM, 171 Jangdong YuSungGu, Daejeon 305-343, Korea HWA KI LEE Department of Industrial Engineering, INHA University, 253 NamGu, Incheon 402-751, Korea
Younghyundong
The reliability, that is long-term quality, requires a different approach from the previous emphasis on short-term concerns. The purpose of this paper is to present reliability evaluation of high precision oil cooler system. The oil cooler system in question is a cooling device that minimizes deformation by heat of driving devices. This system is used for machine tools and semiconductor equipment and so forth. We carry out reliability prediction based on failure rate database and conducted the reliability test to evaluate life of oil cooler using test-bed. The results of this study have shown the reliability in terms of the failure rate and MTBF for oil cooler system and its components and the distribution of failure mode. It is expected that presented results will help to increase the reliability of oil cooler system and will be applicable to the evaluation of the reliability for other machinery products.
1. Introduction The reliability has been used as the testing standard for performance quality demanded by the user [1]. This trend has an effect on the machine part and machine tools industry where the skill has been neglected, indicates the change of environment where the concept of reliability is adapted to design step of existing design/production based on simple producing/safety parameters. The uncertainty of reliability is one of the most significant reasons why products are disregarded in the market [2]. This work is supported by Korean National Research Program of MOST. * Work partially supported by program of Development of Advanced Machinery Part.
533
534
Because most domestic machinery companies conduct rough test without scientific data, user cannot trust the reliability of products. In the machinery system & structure of machine tools, the ability of each part is linked to every other so the system is dependent on each part's reliability, which in turn determines the whole system's reliability [2,4]. This study introduces an example of reliability evaluation about high precision oil cooler system used to ensure the accuracy of machinery and control thermal errors. 2. Function & failure mode of cooler system The latest machine tools incorporate high speed production and highly efficient production. These introduce heat generation due to the friction of spindle, highspeed transfer table, and high-load devices on machine tools. Such heat generation leads to irregular heat distribution in whole structure of machine tools, and is one of the factors that cause severe damage to precision as well as weakening a machine's average durability. In order to control these causes, heat generation parts are cooled using an oil cooler system. Securing of an oil cooler system's reliability is essential to improving the reliability of machine tools. A reliability evaluation of the oil cooler system has been carried out by conducting a reliability prediction on a 1,000 M capacity On/Off type oil cooling system, which is widely used in the thermal control of machine tools because of simple structure and ease of use, and by making reliability test-bed.
Oil Tank
Figure 1. Structure and composition of the oil cooler system
As shown Figure 1, it consists of 6 sub system and 62 parts, including a control part that performs On/Off motion for cooling action, an electric part that supply electricity to the oil cooler system, a cooling oil circulation part through which the cooling oil that cools the heat on various part flows, a refrigerant pipe
535 part where the refrigerant is compressed by compressor, and an evaporation part in which the cooling oil and refrigerant meet and exchange heat. Failure of the oil cooler system generally occurs in the control part which controls the temperature of cooling oil. In particular, failure frequently occurs in the relay because it is the On/Off type. A further failure situation occurs when cooling fails because of damage to the coupling that supplies the motor's power to the cooling oil pump [3, 4, 7], 3. An Objective Reliability of Oil Cooler To evaluate an objective reliability of the oil cooler system, oil cooler system's working time is analyzed in machine tools to which oil cooler was attached. Thus, it appears to complete an On/Off motion in 8 minutes - 5 minutes for 'On' status and 3 minutes 'Off status. The oil cooler system's durability which, as mean time between failures, is predicted as the life of relay when performing the On/Off motion of compressor since it is the target oil cooler system that uses the relay. The relay used is guaranteed for 100,000 operations, the average durability presented by the manufacturer. Furthermore, to protect control circuits for oil cooler system, the term of On/Off motion was limited to 1 minute, and when condition is assumed the objective durability is calculated as 13,333 hours. Durability is calculated to 16,667 hours at most On/Off motion under common operating condition. 4. The Reliability Prediction To predict the reliability of oil cooler system, the NPRD95 failure database for non-electric part (machinery part) reliability and the MIL-HDBK-217F N2 reliability prediction standard are used. These databases have an exponential distribution [8,9]. In case of proper parts are not found, the method of relative part comparing is used. After calculating the each part's failure rate using the database, a RBD (Reliability Block Diagram) with cooling oil and refrigerant flow as its center is composed [6]. According to the reliability prediction, the oil cooler system has a failure rate 73.28 failures/million hours and MTBF 13,645 hours. Figure 2 shows the reliability change of oil cooler system and subsystem over time. Among the subsystem, it seems that the reliability of control part declines faster than other system. The control part shows the fastest declining reliability even though it is consisted of just 4 relays in the reliability prediction. As such, control part requires a thorough technical check. Table 1 shows the results of the reliability
536 prediction of 6 subsystems. As shown in the Table, the failure rate of control part and cooling oil and refrigerant pipe part account for 84% of the entire failure rate; it is thought that the reliability of oil cooler system could be greatly improved by reviewing these subsystems.
M
V
fu
•— ODCooter
s
— — —
Heclrio CootngOK RefrigsrantF•ipo
— Evaporalkxi ••« Caver
M
8
1IU)
-TO
' 3fiS>
#& $&
sea
M
M
*»
ine mm K M
130D
1«9>
Figure 2. Reliability change of oil cooler system over time Table 1. Result of the reliability prediction for the oil cooler system Subsystem Control Part Cooling Oil Circulation Part Refrigerant Pipe Part Electric Part Evaporation Part Cover System
Failure rate (failures/million hours) 23.68 19.99 17.89 7.35 1.29 3.06 72.38
MTBF (hour) 42,222 50,005 55,867 136,003 774,954 326,947 13,645
5. Reliability Test-Bed and Reliability Test 5.1. Items of Reliability Test & Reliability Test-Bed The six test items used in the oil cooler's reliability test are determined by analyzing the A/S data of the manufacturer. The six items are shown in Table 2. The precise temperature sensor - used to test accuracy - is installed on the same position as oil cooler's temperature sensor. When testing the controller to access whether it works accurately or not, the signal that drives the compressor must be found. To find the breakage of coupling that link the motor and pump,
537
one body type sensor is attached to coupling. Also alarm lamp is used to detect electric overload. To determine any leakage of the refrigerant (R-22 gas) caused by a crack in the pipe's welded part, a manometer is installed at the entrance of the compressor to measure any change in pressure of the refrigerant. Finally, cooling capacity is evaluated by measuring the change of temperature using a flow meter installed in cooling oil pipe. Table 2. Reliability test items Items of reliability test
Evaluation method
Temperature Sensor Controller Breakage of coupling Motor & Pump Overload of Motor & Condenser Leakage of refrigerant(welded part) Flowmeter
Precise sensor for scanning temperature Driving signal of compressor Contact Sensor Alarm lamp Manometer Calculating Cooling capacity
Figure 3. Test-bed for the reliability test of oil cooler system
Figure 3 shows the test-bed for reliability test of the oil cooler system. The test-bed consists of a heater, temperature control, and test-bed control. Instead of increasing temperature the cooling oil's temperature by increasing the revolutions of the spindle, heater is used to increase the cooling oil's temperature. The heater facilitates the control of temperature and heat according to each test condition by means of control computer. Two types of test method are used; manual and automatic mode. In the manual mode, test is performed under certain temperature and heating condition, while in automatic mode various test conditions such as temperature and heating time are set. It is also possible to set the initial values of required factors, and acquired test data is saved in the control computer as a file. When cooling oil's temperature exceeds
538 60 °C, heater turn off automatically, while test-bed is developed for remote monitoring as a safety devices. 5.2. Test Condition and Results The representative operational mode of machine tools is classified into 3 steps. The conditions of reliability test of oil cooler are determined as shown in Table 3. Table 3. Conditions of the reliability test for the oil cooler system
Step
Test condition Time Heat (min) (kcal)
1
0
30
2
1000
120
3
1500-1800
901
Status • Operate under off heater • Check feature of control temperature • Shortest term of on/off under normal temperature • Warm up the machine tools • Adding heat continuously • Revolution of spindle continuous • Disable cutting force • Rapidly increasing temperature • Rapidly accelerating spindle • Enable cutting force
The first step requires the warming of machine tools, and involves the shortest On/Off term for the compressor of oil cooler system. Under such condition, the control of the cooling's temperature could be grasped. The heater is in the Off mode during this step. For the second step the spindle of machine tools revolve without machining and cutting force is not applied. During this step, heater provides 1,000 kcal/hour which is the oil cooler system's general cooling capacity. In last step, cutting force is applied, and then cooling oil's temperature is increased suddenly by accelerating spindle very rapidly. One complete test cycle (3 steps) takes 4 hours. The change in cooling oil's temperature for inflow and outflows is shown in Figure 4. The cooling oil's temperature declines because heater is turned off in first step as shown Figure 4. The second step involves revolving the spindle without load. Oil cooler system decreases cooling oil's temperature induced by revolving of the spindle. In third step, machining was introduced. Therefore the temperature of the cooling oil becomes higher. We analyze cooling capacity and the temperature-control following capacity by monitoring cooling oil's temperature change of inflow and out outflow. The largest cooling capacity is
539 about 1200 teal, and this corresponds to cooler's capacity, 1000 teal through analysis.
Inflow CNJHtOW
20
1« ' •
I SO
'
1 100
•
1 150
'
T 200
•
( ZKJ
Tiene{mm)
Figure 4. Change of cooling oil's temperature over time (each step)
The failure rate, MTBF, and fail distribution of oil cooler system is calculated by using test-data from developed reliability test bed we developed. The function and parameters of failure distribution are inferred using the data gathered from the acquired reliability test. The suitability of failure distribution, which showed whether the inferred function of failure distribution accurately represent the real data's distribution, is decided using the Kolmogorov-Smirnov test or x2 [10]. In reliability test, failure times of 12,312 and 11,113 hours are obtained. Both attribute the failure to the durability of relay in control part. The failure distribution is analyzed following Weibull distribution (parameter eta = 11,997.282, m = 23.418), and MTBF of 11,722 hours is calculated. Because of the lack of the reliability test, the inferred index of the oil cooler system's reliability is difficult to trust. Further reliability testing is necessary. 6. Summary In this study, a reliability evaluation of oil cooler system is used to control the thermal error of machine tools. By analyzing the A/S data and the results of the reliability prediction, it has been shown that the failure pattern of parts that show a high failure rate are very similar to each other. The failure of cooler system generally occurs in the control part that controls temperature by an On/Off motion. The control part has highest failure rate in the whole oil cooler system, although the relay is used to make it up in reliability prediction.
540
In the results of reliability prediction, analysis shows a higher failure rate in the sensor for air temperature than in the flow type sensor for the fluid temperature, and in case of trochoid one body type pump, the failure rate is evaluated by dissembling the parts as there is no corresponding type. The figure of 13,333 hours, the system's objective reliability derived from the relay's durability basis, and the MTBF of 13,645 hours derived from the reliability prediction, suggests that the term of the relay's durability could be the MTBF of oil cooler system. Through a reliability test using a developed reliability test-bed, failure mode is analyzed, distribution is predicted and MTBF 11,722 hours is calculated. However, because of the limitations of reliability time and test, only two reliability tests are undertaken; as such, a far more accurate analysis of failure distribution of oil cooler system through further reliability testing is required. A more accurate reliability evaluation will be enabled by applying the data of reliability test from the developed reliability test-bed for oil cooler system. Finally, the oil cooler system with high reliability can be developed based on reliability evaluation in design step. Acknowledgments This study was supported by the National Research Development Program (The development of advanced machinery part) of the Korean Ministry of Science and Technology (MOST). References 1. J.H. Saleh and K. Marais, Reliability Engineering and System Safety 91(6), 665-673(1999). 2. S.W. Lee, J.Y. Song and H. K. Lee, J. of Applied Reliability 3(1), 41-58 (2003). 3. S.W. Lee, H.Y. Park and et al. Autumn Conference and Annual Meeting on KSPE, 43-46 (2001). 4. B.S. Kim, S.H. Lee, J.Y. Song and S.W. Lee, J. of KSMTE 14(1), 15-23 (2005). 5. S.W. Lee and H.K. Lee, J. ofKISE 28(1), 49-54 (2005). 6. MOA Soft Inc., A guide book of reliability prediction, Kyowoosa (2002). 7. Reliability Analysis Center, Failure Mode/Mechanical Distribution Document, (1997). 8. Reliability Analysis Center, Nonelectric Part reliability Data document, (1995). 9. Department of Defense, MIL-HDBK-217F N2 document, (1995). 10. Korean Standard Association, Distribution and Statistics of Reliability, (1992).
DESIGN OF ACCELERATED LIFE TEST SAMPLING PLANS WITH A NONCONSTANT SHAPE PARAMETER J.H.SEO Office of Consulting Services, Small Business Corporation, 24-3Yeouido-dong, Yeongdeungpo-gu, Seoul, 150-718, Korea M.JUNG System Engineering Group, Production Engineering Research Institute, LE PERI, 19-1 Cheongho-ri, Jinwuy-myun, Pyungtaek-si, Kkyunggi-do, 451-713, Korea C.M.KIM Six Sigma Division, Samsung Economic Research Instutute, 10F Kukje Duilding 191, Hangangrq 2-ga, Yongsan-gu, Seoul, 140-702, Korea Design of accelerated life test sampling plans (ALTSPs) is considered for products with Weibull lifetime distribution. It is assumed that the Weibull scale and shape parameters are log linear functions of (possibly transformed) stress. Two types of ALTSPs are considered; time-censored and failure-censored. Optimum ALTSPs which satisfy the producer's and consumer's risk requirements and minimize the asymptotic variance of the test statistic for deciding the lot acceptability are obtained.
1. Introduction Acceptance sampling plans are commonly used to determine the acceptability of a product. When the life of a product is an important quality characteristic, lifetest sampling plans are commonly used to determine acceptability of a product. In the life testing, the fixed number of items are often tested simultaneously and the test continues for a fixed period of time (time-censoring or Type I censoring) or until some fixed number of items on test fail (failure-censoring or Type II censoring). Several authors have considered the design of life-test sampling plans for Weibull distribution when items are run at used condition stress; see, for instance, Fertig and Mann[3] and Schneider[7]. However, when one needs to test the acceptability of high reliability products, it is impractical to use such a life-test sampling plans since they need a very long test time. One way of overcoming such a difficulty is to introduce accelerated life tests (ALTs) in lifetest sampling plans. ALTs are used in many contexts to obtain information quickly on lifetime distribution of products. Bai et al.[2] obtained the optimal 541
542
failure-censored ALTSPs when two prescribed levels of stress higher than used condition with given degree of censoring at each stress are used. Bai et al.[l] obtained failure-censored ALTSPs with equal expected test times at high and low test stresses. When lifetime of a product follows a Weibull distribution, the previous researches on ALTSPs assumed that the scale parameter varies with stress but the shape parameter remains constant. However, this assumption that shape parameter does not depend on stress is inappropriate for many applications. The literatures on metal fatigue, electronics reliability, and reliability physics contain many such applications; see Glaser[4], Schwraz[8], and Hiergeist et al.[5]. Meeter and Meeker[6] discussed the statistical models and test plans for ALTs for location and scale distributions with nonconstant scale parameter. In this paper, we extend the previous works on ALTSPs to the case in which the shape parameter of a Weibull distribution depends on stress. Design of time-censored ALTSPs is given Section 2. Section 3 obtains optimum failure-censored ALTSPs. 2. Design of Time-Censored ALT Sampling Plans 2.1. Assumptions The following assumptions are made: 1. 2. 3.
The lifetime X of a test item follows Weibull distribution at stress si, with the scale parameter i, and shape parameter vf. G(x) = 1 - exp{-(4*)"'} , 0 < x < oo , A, > 0 , v, >.0 The scale parameter A, and shape parameter IA is a log linear function of a (possibly transformed) stress. In4. = K0 + KISJ ,
4.
(1)
In vi = K'0 + /c'lsi. Lifetimes of the test items are statistically independent.
(2.1)
(2.2)
2.2. Life-Test Procedure 1. The life test uses m stress levels (s, < • • • < s,„ = sH ). 2. niti items randomly chosen among n sample items are allocated to sj. 3. All test items are run simultaneously and the failure time of each item is observed until a prespecified censoring time /;.
543
2.3. Lot Acceptance Sampling Procedure Consider the lifetime of a product to which a one-sided lower specification limit L is assigned. Items that have a lifetime X < L are considered nonconforming. The distribution of X is assumed to be Weibull with unknown parameters. Instead of using the actual lifetime X , the log lifetime Y = \n(X) is used, which leads to SEV distributions, respectively, with location and scale parameters //0 = — In^, , <70 = l/v0 . The lower specification limit will be L' = \n(L) . The following lot acceptance sampling procedure will be considered. 1. 2. 3. 4.
n items are randomly selected from the lot and are tested according to the above test procedure. MLEs /}„ andCT0of the location parameter at used condition fi0 and scale parameter <J0 , respectively, are obtained from the ALT data. The value of T = jii0 - ko0 (3) is compared with V , and the lot is accepted if T > L' and rejected otherwise.
The sample size n and the acceptability constant k are to be determined so that lots with fraction nonconforming p < pa are accepted with a probability of at least \-a and lots with p>pp are rejected with a probability of at least
\-p. 2.4. Theoretical Derivations 2.4.1. ModelStandardization and Reparametrization The following reparametrization is convenient. Define the standardized stress £• =(s,.-,s 0 )/(s / ; -s 0 ). For the used condition stress £ 0 =0 and for the highest stress 4m =1 . Stress-life relationship models Eq.(2.1) and Eq.(2.2) at stress s: may be rewritten in terms of c, as /', = r0 + n4i, In o ; ^ ; + / , ' £ . where
y0 =-(«„ +KISO) ,
yl=-icl(sll-s0)
(4.1) (4.2) ,
y'0 = - « , + 4s 0 ) = lno-0 ,
and
y[ = \n(crll/a0) = \nd . Note that the location parameter of the log lifetime distribution at used condition stress is /J0=y0.
544
2.4.2. Asymptotic Variance of Test Statistic The asymptotic variance of test statistic Eq.(3) is Asvar(T) = Asvar(y0)-2k-
Ascov[y0,a0)+k2
• /tevar(c7 0 ).
(5)
Asvar(/0) , Ascov(y0,cr0) and Asvar(
and b = yja0
as
4t
C,=a-b4le . The Fisher information matrix F, for an observation at stress
Pi
('-£) HO
pfi
Pi
H&) 60-5)
Hi,)
pfi Hi,)
Svmmetric
(6)
-hrC(C.) 9.
where A(C,), S(C,),and C(C,) are *&) = >¥&), 5 (£.)= £"w-lnwe"Vw + (l-vP(C,))-C--exp(C)» C(C) = T(C)+ f w-(ln W )V'Ww+(l-T(C))<, 2 -exp^,), Let n/r, items be allocates to the each stress. The Fisher information matrix in
F is obtained as F = H^/T,.F, . Let H = ( « / C T „ ) F ' ' and /;;/ be (i,j) -element of
matrix H . Asvar(f0) , ^scov(f0,<x0) , and Asvar(cr0) are //,, , /;13 , and /?33 , respectively. Then the asymptotic variance can be expressed as 2
Asvai-(T) = ^(hu-2kh]i+k%2)
2
= ?±V2.
where V1 is a function of a , b, 9 , f., it. and k .
(7)
545
2.4.3. Operating Characteristics(OC) Curve The test statistic T is asymptotically normally distributed and the standardized variate k ^ V = fn {ro- °o-(ro-kcTo)} (T0-V is parameter-free and follows the asymptotically standard normal distribution. The OC curve is obtained by plotting L(p) against fraction nonconforming p , where L{p) = \-$> -yW~k) and up=(L'-y0)/cr0
(9)
is the quantile of the standard SEV distribution
corresponding to the fraction nonconforming p . The producer's and customer's risk requirements lead us to z, „un -z,u„ «_Js.t k-=±U±
(10)
(11) We can see that k' is determined by two points (pa,\-a)
and {pp,P) in the
OC curve and n depends on these two points and V . To minimize total sample size n , it would be reasonable to design the test plans so that V is minimized. 2.4.4. Optimum Design of Time-Censored ALTSPs The factor V1 depends on £ , ni, k , and design parameters 6, a and b. To obtain optimum design one must know the value of 0 , a and b , which is impossible. Many authors use pre-estimates of unknown parameters to overcome such difficulties and obtain optimum designs. Such pre-estimates can be approximated from past experiences, similar data, or preliminary test. Let p0, P„ be, respectively, the probabilities that an item will fail by ?j at used condition and at high stress level. Then the model parameters a and b can be expressed by 6, p0, and pH as fl = 4 ' - ' ( p 0 ) ,
b=
v-'(Po)~0'r-,{pll).
The optimum standardized stress level £," and sample proportion TT" allocated to it are the values tt and n. which minimize V1 for given 0 and
546
(a ,b) or (p0,p„) • We have chosen 0 and (p0,p„) as design parameters because they are more convenient for obtaining pre-estimators. The optimal values of £" and n' can be obtained by using a numerical search method such as Powell's conjugate direction method. The optimum total sample size n can be obtained by replacing V of Eq.(ll) with V". Table 1 shows optimum time-censored ALTSPs for selected combinations of two points of OC curve and (p0,pH) • Table 1. Optimum time-censored ALTSPs: (/>„,!-a) = (0.0006,0.95) , ( / y / ? ) = (0.04,0.10)
k = 5.0465 Po 0.000045
Pn 1 0.999 0.632
0.000335
1 0.999 0.632
(9=0.8
£
A
0.434 0.454 0.515 0.321 0.339 0.392
0.876 0.872 0.829 0.902 0.897 0.859
0=1.0
n 336 371 575 203 219 312
£
K\
0.420 0.443 0.516 0.311 0.330 0.394
0.858 0.852 0.800 0.891 0.885 0.840
n 240 266 425 158 171 247
In designing of ALT, the number of stress level is an important factor. Meeter and Meeker[6] shows that three subexperiements are optimum in a few situations. By the numerical studies, however, in ALTSPs, we have seen that the statistically optimum plan for ALTSPs reduces to a two-stress plan within reasonable range of k (k<&) and (p0,p„). 3. Design of Failure-Censored ALT Sampling Plans 3.1. Additional assumptions With the assumptions in Section 2, the following additional assumptions are considered. 1. 2.
The life test uses two predetermined stress levels, i.e., m = 2, low (s l ) and high (s2) stress level. The degree of censoring qt at a stress s, is prespecified.
3.2. Life-Test Procedure 1.
nit items randomly chosen among n sample items are allocated to sx and the rest are allocated to s2.
547
2.
All test items are run simultaneously and the failure time of each item is observed. At each stress level 5,., the test runs until ;; failures are observed, 7=1,2.
Lot acceptance sampling procedure is the same as that of Section 2. 3.2.1 Asymptotic Variance ofTtest Statistic The asymptotic variance of T and Fisher information matrix F is obtained by similar way in Section 2. Using qt, the elements of F, can be approximated as; 5(C() = !
vv-ln w-e "dw-q: •\nqi ln(-ln<7;),
C(£i) = \~ql. + f
w-(lnvv) •e~wdw-ql Tn<7;.-{in(-In9,.)} .
Asymptotic variance of T is Asvar(7') = -^--F 2 n and V2 is a function of £,., 9,, #, /c, and ;r . 5.2.2. Optimum Design of Failure Censored ALTSPs Optimum values of n and k can be obtained using the same procedure in Section 2. With the determined values of £,, qn 9 and k, K is computed so that the F2 is minimized. The following theorem gives optimum sample proportion n allocated to low stress level. [Theorem 1] For given f, , q;, 9 and k , the optimum value n* of n which minimizes the asymptotic variance of the test statistic subject to the restriction 0<7i<\
is v
A
where A and B are A = [ald1 -a2dt£2yk2
'
+ 2\b^d2pi -b2dxp2i;2}-k + ctd2p2 -c2dtc2p2,
B = d2(alk2 + 2blplk + c,p2 J and A, = A(C,), b, = 5 ( C ) , c, = C{£), d, = a[Ci - bf . Table 2 shows optimum failure-censored ALTSPs combinations of two points of OC curve and (i?,.^,,^).
for
selected
548 Table 2. Optimum failure-censored ALTSPs: (p{/,\ -a) = (0.0006,0.95) , (p,,,fi) = (0.04,0.10) k = 5.0465 9, 0.3
0.5
1i
(9=0.8 7i
0.919 0.792 0.696 0.915 0.781 0.682 0.928 0.813 0.724 0.924 0.803 0.710
(9=1.0 n
20 45 113 20 46 118 25 55 137 26 56 142
it
0.919 0.790 0.693 0.914 0.780 0.680 0.928 0.810 0.719 0.923 0.800 0.706
n
20 45 114 20 46 119 25 55 137 26 56 142
References 1.
2.
3. 4. 5.
6. 7. 8.
D. S. Bai, Y. R. Chun and J. G. Kim, Failure-Censored Accelerated Life Tests Sampling Plans for Weibull Distribution under Expected Test Time Constraint, Reliability Engineering and System Safety 50, 61-68 (1995) D. S. Bai, J. G. Kim and Y. R. Chun, Design of Failure-Censored Accelerated Life Tests Sampling Plans for Lognormal and Weibull Distribution, Engineering Optimization 21, 197-212 (1993). K. W. Fertig and N. R. Mann, Life-Test Sampling Plans for TwoParameter Weibull Populations, Technometrics 22, 165-177 (1980). R. E. Glaser, Estimation for a Weibull Accelerated Life Testing Model, Naval Research Logistics 31, 559-570 (1984). P. Hiergeist, A. Spitzer and S. Rohl, Lifetime of Oxide and Oxide-NitroOxide Dielectrics within Trench Capacitors for DRAM's, IEEE Transactions on Electron Devices 36, 913-919 (1989). C. A. Meeter and Jr. W. Q. Meeker, Optimum Accelerated Life Tests with a Nonconstant Scale Parameter, Technometrics 36, 71-83 (1994).. H. Schneider, Failure-Censored Variable-Sampling Plans for Lognormal and Weibull Distributions, Technometrics 31, 199-206 (1989). J. A. Schwarz, Effect of Temperature on the Variance of the Lognormal Distribution of Failure Times due to Electromigration Damage, Journal of Applied Physics 61,801 -803 (1987).
A STUDY ON THE LIFETIME PREDICTION OF THE RUBBER MATERIALS FOR REFRIGERATOR COMPONENT* CHANG-SU WOO Korea Institute of Machinery & Materials, 171, Jang-dong, Daejeon, 305-343, Korea
Yuseong-gu,
SUNG-SEEN CHOI Department of Applied Chemistry, Sejong University,98, Korea
Gwangjin-gu,
Seoul,
143-747,
This paper discusses the failure mechanism and material tests were carried out to predict the useful life of NBR and EPDM for compression motor, which is used in refrigerator component. The heat-ageing process leads not only to mechanical properties change but also to chemical structure change so called degradation. In order to investigate the heataging effects on the material properties, the accelerated tests were carried out. The stressstrain curves were plotted from the results of the tensile test for virgin and heat-aged rubber specimens. The rubber specimens were heat-aged in an oven at the temperature ranging from 70 °C to 100°C. Compression set results changes as the threshold are used for assessment of the useful life and time to threshold value were plotted against reciprocal of absolute temperature to give the Arrhenius plot. By using the compression set test, several useful life prediction equations for rubber material were proposed.
1. Introduction The interest of the fatigue life for rubber components was increasing according to the extension of warranty period [1]. A design of rubber components against fatigue failure is one of the critical issues to prevent the failures during the operation. Therefore, useful life prediction and evaluation are technologies to assure the safety and reliability of mechanical rubber components [2,3]. In this paper, the heat-aging effects on the material properties and useful life prediction of rubber materials for compression motor, which is used in refrigerator components as shown Fig. 1 was experimentally investigated. In order to investigate the heat-ageing effects on the material properties, the stressstrain curves were obtained from the results of tensile test. The rubber This work is supported by Reliability Design Technology Program of the.Ministry of Science and Technology, Korea.
549
550 specimens were heat-aged in an oven at the temperature ranging from 70°C to 100°C for a period ranging from 1 day to 180days. Compression set results changes as the threshold are used for assessment of the useful life and time to threshold value were plotted against reciprocal of absolute temperature to give the Arrhenius plot. By using the compression set test, several useful life prediction equations for rubber material were proposed.
Figure 1. Rubber mount for refrigerator
2. Failure Mechanism of Rubber Mount During its lifetime a rubber can be submitted to different types of degradation from different exposure conditions. The mechanism which produces degradation depend not only on the degradation agents present but also on the polymer and additives. The contribution of each factor is often complicated to evaluate but their classification is simply stated as shown Table 1. One of the important characteristics of rubber mount for compression motor, which is used in refrigerator component is their lifetime, they should be maintained enough mechanical and chemical properties for actual service conditions until their lifetime. However, it is difficult to estimate the lifetime, since their properties are sensitive to the manufacturing parameters and are also affected by typical use conditions including temperature, moisture, UV light, ozone, chemical attack and applied load type and so on [4]. It is necessary to get specimens having uniform properties and to find real service environments to predict the lifetime of rubber mount. Root cause failure analysis provided clues as to the elements of the design that need to modify for improved quality and long-term reliability [5,6].
551 Table 1 Ageing influence factors Type of ageing
Factors Temperature
Themo-oxidation, additive migration
UV light
Photo-oxidation
Ionising radiation
Radio-oxidation, crosslinking
Humidity
Hydrolysis
Fluids (gas, organic, vapours)
Chemical degradation, swelling
Mechanical stress, pressure
Fatigue, creep, stress relaxation, set
3. Material Properties Test for Rubber 3.1. Specimen and Test conditions The materials of rubber mount for refrigerator compression motor were a NBR and EPDM. Rubber can be considered as a hyper-elastic material showing highly elastic isotropic behavior with incompressibility. Hyper-elastic properties of rubber are characterized by the strain energy functions, which can be determined by the experimental stress-strain relationship [7]. The basic tests for strain states, namely, hardness and uni-axial and bi-axial tension test were performed to get the constitutive relation of the NBR and EPDM. The hardness of the specimen was measured using the International Rubber Hardness Degree(IRHD). Uni-axial and bi-axial tension test was loaded by a UTM at a speed of lOOmm/min, and the deflection was measured using a laser extensometer in Fig. 2. In the case of thermal ageing, the only way in which acceleration can sensibly be applied is by raising the temperature. Rubber specimens were heat-aged in an oven at the temperature ranging from 70 ° C to 100°C for a period from 1 to 180 days.
(a) Uni-axial tension
(b) equi-biaxial tension
Figure 2. Mechanical properties test of rubber materials
552
3.2. Result and discussion The loading curve and unloading curve are not coincident. Strain induced stress softening in rubber material called the Mullin's effect. Figure 3 are shown the results of the tensile strength and Mullin's effect for NBR and EPDM. When the stress-strain curve no longer changes significantly, the material may be considered to be stable for strain levels. The stress-strain curve stabilized after between 3 and 5 repetitions. After heat ageing, physical and chemical properties of rubber are likely to have changed. The monitoring of those changes requires the choice of suitable mechanical test such as tensile stress and strain measurement. Figures 4-5 are shown stress-strain curve of NBR and EPDM for heat-aging conditions. We know that the stiffness increases with increasing ageing temperature and days. Also, the properties change for NBR was higher more than EPDM.
(a) Tensile strength (b) NBR (c) EPDM Figure 3. Results of mechanical test and Mullin's effect of NBR and EPDM
111
(a)70°C
(b)85°C Figure 4. Stress-strain curve for NBR
^y.,
(a)70°C
(b)85°C Figure 5. Stress-strain curve for EPDM
(c) 100°C
.-/
(c) 100 °C
553 4. Lifetime prediction of Rubber Materials 4.1. Arrhenius Model The reaction rate of a chemical reaction normally increases with increasing temperature. By exposing test pieces to a series of elevated temperatures and measuring property change, the relation between the reaction rate of degradation mechanisms and temperature can, in principle, be deduced. Estimates can then be made by extrapolation of the degree of degradation after a given time at a given temperature [8,9]. The Arrhenius relation is the best known and most widely used of the two applications to the permanent effects of temperature. The reaction rate and temperature relationship can often be represented by the Arrhenius equation: -£
K(T)=A-eRT
(1)
or Lo&K(T))=B-(-jfi)
(2)
where K(T) is the reaction rate, A and B are constants, E is the activation energy, R is the gas constant and T is the absolute temperature. The reaction rate at any temperature is obtained from the change in the selected property with exposure time at that temperature. Although the Arrhenius relation is generally the first choice to apply to the effect of temperature, no general rule can be given for the measure of the reaction rate to be used with it. In the example shown in Fig. 6(a) the property parameter has been plotted against time at four temperatures, and the reaction rate taken as the time for the property to reach a given threshold value or end of life criterion. The log of the chosen measure of reaction rate is plotted against the reciprocal of absolute temperature, which should result in straight lines as illustrated in the Arrhenius plot in Fig. 6(b).
I i
(a) Change of property with time (b) Arrhenius plot Figure 6. Arrhenius model for lifetime prediction
554 4.2. Lifetime Prediction In order to lifetime prediction, we carried out the compression set with heataged in an oven at the temperature ranging from 70°C to 100°C for a period ranging from 1 day to 180days. The compression set was determined according to ISO 815 with a small test piece having a diameter of 29mm and thickness of 12.7mm. To carry out this test a simple compressive force is applied to small cylindrical disks, usually to a fixed degree of strain. Compression set is calculated from; Set (%) = -(rf° ~ (d0
-
dl)
(3)
x 100
d,)
Where d is the original thickness of the specimen, d { is the thickness in the compressed state and d 2 is the thickness after removal of the load. Not surprisingly, in all cases compression set increased with time of exposure and with increasing temperature. Compression set results presented graphically in Fig 7. Figures illustrate how the rate of change with time will vary for different materials and for different temperatures. Compression set results changes as the threshold are used for assessment of the useful life and time to threshold value (10%, 15%, 20%) were plotted against reciprocal of absolute temperature to give the Arrhenius plot. By using the compression set test, several useful life prediction equations for rubber material were proposed as shown table 2 and Figs 8-9. Table 3 are shown on the useful lifetime for NBR and EPDM. According to Arrhenius equation, useful lifetime of EPDM was longer more than NBR. Table 2. Arrhenius equations for rubber materials Material NBR
EPDM
Degradation 10%
Arrhenius equation l n ( 0 = - 1 1 . 2 1 +5351 /(273 + T)
15%
ln(/) = - 1 0 . 9 8 +5910 /(273 + T)
20%
ln(/) = - 9 . 1 7 +6213 /(273 + T)
10%
l n ( 0 = - 1 8 . 7 3 +8221 /(273 + T)
15%
ln(/) = - 1 8 . 6 2 + 8 9 2 1 /(273 + T)
20%
ln(/) = - 1 4 . 3 5 +8474 /(273
+T)
555
'«<•
(a) NBR (b) EPDM Figure 7. Change of properties for heat-ageing temperature
(a) 10%
(b) 15% Figure 8. Arrhenius plot of NBR material
(c)20%
(a) 10%
(b) 15% Figure 9. Arrhenius plot of EPDM material
(c) 20%
Table 3. Lifetime prediction on each temperature. Temperature (°C)
15 20 25 30
10% 1,587 1,156 850 632
NBR, Lifetime (hr) 15% 20% 13,916 243,510 9,805 168,514 6,989 118,065 5,039 83,696
10% 18,306 11,248 7,023 4,447
EPDM, Lifetime (hr) 15% 20% 3,518,730 232,117 2,129,723 136,899 1,310,925 82,125 50,111 819,948
556
5. Conclusion The material properties and fatigue life evaluation of rubber materials are very important in design procedure to assure the safety and reliability of the rubber components. In order to investigate the heat-aging effects on the material properties, the stress-strain curves were obtained from the results of tensile test. The aged specimens were heat-aged in an oven set at a temperature ranging from 70°C to 100°C for a period ranging from 1 to 180 days. We know that the material properties were a function of heat-aged period as well as temperatures. The stiffness increases with increasing ageing temperature and days. Also, properties change for NBR was higher more than EPDM. By using the compression set test, several useful life prediction equations for rubber material were proposed. Useful life estimation procedure employed in this study could be used approximately for the design of the rubber components at the early design stage. Acknowledgments This research has been supported by Reliability Design Technology Program of Ministry of Science and Technology, Korea. References 1. A.N Gent, Engineering with Rubber. Hanser Gardner (2001). 2. Frederick R. Eirich, Science and Technology of Rubber. Rubber Div. Of the American Chemical Society, (1978). 3. G.J. Lake, Fatigue and Fracture of Elastomers, Rubber Chemistry and Technology 68, 435-460 (1997). 4. Roger P. Brown, Rubber Product Failure. Rapra Review Reports 13(3), (2002). 5. R.P. Brown, M.J. Forrest and G. Soulagnet, Long-Term and Accelerated Ageing Tests on Rubbers, Rapra Review Reports 10(2), (2000). 6. R.P. Brown, T. Butler and S.W. Hawley, Ageing of Rubber - Accelerated Heat Ageing Test Results, Rapra technology Limited (2001) 7. A.K. Mai, S.J. Singh, Deformation of Elastic Solids, Prentice Hall PTR, (1990) 8. R.P. Brown, Practical Guide to the Assessment of the Useful Life of Rubbers, Rapra technology Limited, (2001). 9. Eberhard Meinecke, Frank Stuchal, Predictive Aging of Elastomers. 136th Meeting Rubber Div. ACS, Detroit, Fall (1989).
PART VII STATISTICAL ANALYSIS AND RELIABILITY MODELING
This page is intentionally left blank
M I X T U R E FAILURE RATES: O R D E R I N G A N D A S Y M P T O T I C THEORY
MAXIM FINKELSTEIN Department of Mathematical Statistics University of the Free State PO Box 339, 9300 Bloemfontein, Republic of South Africa and Max Planck Institute for Demographic Research Rostock, Germany (email: [email protected]) V E R O N I C A ESAULOVA Otto-von-Gurericke
University,
Magdeburg
We consider non-asymptotic and asymptotic properties of mixture failure rates in different settings. We show that the mixture failure rate is 'bent down', compared with the corresponding unconditional expectation of the baseline failure rate. We also consider a problem of mixture failure rate ordering for the ordered mixing distributions. Some results on asymptotic behavior of mixture failure rates are discussed The suggested lifetime model generalizes all three conventional survival models (proportional hazards, additive hazards and accelerated life) and creates possibility of deriving explicit asymptotic formulas. A special emphasis is given to the accelerated life model. It is shown that the mixture failure rate asymptotic behavior depends only on the behavior of a mixing distribution in the neighborhood of zero and not on the whole mixing distribution.
1. Introduction One can hardly find homogeneous populations in real life, although most of the studies on the failure rate modelling deal with a homogeneous case. Neglecting existing heterogeneity can lead to substantial errors in stochastic analysis in reliability, survival and risk analysis and other disciplines. Mixtures of distributions usually present an effective tool for modeling heterogeneity. It is well known that mixtures of decreasing failure rate (DFR) distributions are always DFR (Barlow and Proschan, 1975). On the other hand, mixtures of increasing failure rate distributions (IFR) can decrease at least in some intervals of time, which means that the IFR class 559
560
of distributions is not closed under the operation of mixing. While considering heterogeneous populations in different environments the problem of ordering mixture failure rates for stochastically ordered mixing random variables arises. We show that the natural ordering for mixing random variables in this case is the ordering in the sense of likelihood ratio. In Block et al (2003) it was proved that, if the failure rate of each subpopulation converges to a constant and this convergence is uniform, then the mixture failure rate converges to the failure rate of the strongest subpopulation: the weakest subpopulations are dying out first. This result generalizes a case of constant failure rates of populations considered by Clarotti and Spizzichino (1990) and also presents a further development of Block et al (1993) (see also Lynn and Singpurwalla, 1997; Gurland and Sethuraman, 1995). Although the lifetime model in these findings could be rather general, analytical restrictions, e.g., uniform convergence, are rather stringent. We suggest a class of distributions, which generalizes the proportional hazards, the additive hazards and the accelerated life models and prove a simple asymptotic result for the mixture failure rate for this class of lifetime distributions. It turns out that asymptotic behavior of mixture failure rates depends only on the behavior of the mixing distribution in the neighborhood of the left end point of its support and not on the whole mixing distribution. 2. Ordering of Mixture Failure Rates Let T > 0 be a lifetime random variable with the cumulative distribution function (Cdf) F(t) (F(t) = 1 - F(t)). Assume that F(t) is indexed by a random variable Z in the following sense: P(T < t\Z = z) = P(R < t\z) = F(t, z) and that the probability density function (pdf) f(t,z) exists. Then the corresponding failure rate X(t,z) is f(t,z)/F(t,z). Let Z be interpreted as a non-negative random variable with support in [a, b], a > 0, b < oo and the pdf TT(Z). Thus, a mixture Cdf is defined by /•OO
Fm(t)=
/ F(t,z)n(z)dz. Jo As the failure rate is a conditional characteristic, the mixture failure rate Am(£) should be defined in the following way (see, e.g., Finkelstein and Esaulova, 2001):
561
xm(t) = £i{!:z)^\iz
= fHt,zMz\t)dz,
J0 F(t,z)ir(z)dz
(i)
Ja
where the conditional pdf (on condition that T > t) is: 7r(z|t) = *(z\T >t)=
f ( *' Z )
n(z)
.
(2)
Therefore, this pdf defines a conditional random variable Z\t, Z|0 = Z. As a specific case consider the following multiplicative model X(t,z) = zX(t),
(3)
where, X(t) is a baseline failure rate. This setting defines the widely used in applications frailty model. Applying definition (1) gives: \m(t)
= / X(t,z)ir(z\t)dz
=
\{t)E[Z\t].
(4)
Ja
Assume now that we have different ordered mixing distributions. We want to establish ordering of the corresponding mixture failure rates. It will be shown that a natural ordering for our mixing model is the likelihood ratio one. A somewhat similar reasoning can be found in Block et al (1993) and Shaked and Spizzichino (2001). Let Z\ and Z^ be continuous nonnegative random variables with the same support and densities TTI(Z) and ^ ( z ) , respectively. Recall (Ross, 1996) that Zi is smaller than Z\ in the sense of likelihood ratio:
Zl >LR Z2j if i^2{z)l^\{z)
(5)
is a decreasing function.
T h e o r e m 2 . 1 . Let the family of failure rates X(t,z) in the mixing model (1) be ordered as X{t,zi) < X(t,z2),
zx < z2,
Vzi,z2e[a,b],t>0.
(6)
Then the family of random variables Z\t = Z\T > t is decreasing in t € [0, oo) in the sense of the likelihood ratio.
562
Proof. In accordance with definition (2): _F(t2,z)fbaF(t1,u)Tr(u)du
irjzlh) Li(Z,ti,t2)
= —, . , . — —
j —
(I)
F(t1,z)J F(t2,u)n(u)du
Therefore, monotonicity in z of L(z,ti,t2) F(t2,z) F(h,z)
•
b
exp s—
is defined by A(s, z)ds >,
which, due to ordering (6), is decreasing in z for all t2 > t\.
•
For the mixing model (l)-(2), consider two different mixing random variables Z\ and Z2 with probability density functions TTI(Z),TT2(Z), and cumulative distribution functions 111(2), 112(2), respectively. Assuming some type of stochastic ordering for Z\ and Z2, we intend to arrive at a simple ordering of the corresponding mixture failure rates. It can be seen using simple examples that the 'usual' stochastic ordering (stochastic dominance) is too weak for this purpose. It was shown in the previous section that the likelihood ratio ordering is a natural one for the family of random variables Z\t in our mixing model. Therefore, it seems reasonable to order Z\ and Z2 in this sense too. Let g{Z)MZ)
Mz)=
,
b
(8)
Ia9{z)iri{z)dz where g{z) is a decreasing function. Then Z\ is stochastically larger than Z2:
z1>stz2
(ni(s)
(9)
Theorem 2.2. Let relation (8), where g(z) is a decreasing function hold, which means that Z\ is larger than Z2 in the sense of the likelihood ratio ordering. Assume that ordering (6) holds. Then for W e [0,oo):
=
l/(M)»i(*)* b
J aF(t,z)1r1(z)dz
J?/(MW*)* f*F(t,z)w2(z)dz
s
(t)
(1Q)
563
Proof. Inequality (10) means that the mixture failure rate, which is obtained for the stochastically larger (in the likelihood ratio ordering sense) mixing distribution, is larger for Vt € [0, oo) that the one obtained for the stochastically smaller mixing distribution. We shall prove first that , ,x fz F(t,u)wi(u)du IIi(z|i) = J\ _ / F(t,u)iri(u)du
fz F(t,u)-K2(u)du , , , < Jab _ Tl2(z\t). J F(t,u)iT2(u)du
(11)
Indeed:
C^u)Sl^S)dudu
H F(t,u)Mu)du = Jba F(t,u)ir2(u)du Ja
v
'
v
!ba F(t,u) J(1
'
v
ni
rl"M
Ja
M
du
g(u)ir\{u)du
_ H g{u)F{t,u)ni{u)du
f*
la 9(u)F(t, U)-KI (u)du
F(t,u)iri(u)du
Ja F(t, u)7Ti (u)du
Finally: Ami(t)-Am2(t)= /
A(t,z)[7ri(i!|t)-7r2(z|t)]dz
Ja
= \(t,z)\a1(z\t)-u2{z\t)]\ba-
f x'z(t,z)[n1(z\t)-u2(z\t)}dz Ja
= f -Aipli(z|t) - U2(z\t)]dz > 0,
t > 0.
D
Ja
The proof of the following ordering theorem can be found in Finkelstein and Esaulova (2006b): Theorem 2.3. Let Z\ and Z 2 (E[Z2] = E[Zi]) be two mixing distributions in the multiplicative model (3)-(4)Then ordering of variances Var{Zx) > Var(Z2)
(12)
is a sufficient and necessary condition for ordering of mixture failure rates in the neighborhood of t = 0: A m i(t)
*e(0,e),
(13)
564
3. A S Y M P T O T I C BEHAVIOR OF M I X T U R E FAILURE RATES Asymptotic behavior of mixture failure rates was studied in Block et al (1993), Gurland and Sethuraman (1995), Lynn and Singpurwalla (1997), Block et al (2003) to name a few. The approach of this section is different: we study a new lifetime model and derive explicit asymptotic formulas for mixture failure rates generalizing different specific results obtained for proportional hazards and additive hazards models. This approach allows also to deal with the accelerated life model (ALM), which was not studied in the literature. We define now a class of distributions F(t,z) and study asymptotic behavior of the corresponding mixture failure rate \m(t). It is more convenient at the start to give this definition in terms of the cumulative failure rate A(t, z), rather than in terms of X(t, z): A(t,z) = A(z
(14)
Natural properties of the cumulative failure rate of the absolutely continuous distribution F(t, z) (for all z € [0, oo)) imply that the functions: A(s),(f>(t) and ip(t) are differentiable, the right hand side of (14) is nondecreasing in t and tends to infinity as t -> oo and A(z
(15)
are increasing functions of their arguments and A(0) = 0, although some generalizations (e.g., only for ultimately increasing functions) can be easily performed. Therefore, we will view 1 - e - ^ ^ , z ^ 0 in this paper as a lifetime Cdf. The failure rate, which corresponds to the cumulative failure rate A(t, z) is \{t,z)
= z<j>l(t)A'{zct>{t))+V{t),
(16)
where by A'(z(f>(t)) we, in fact, mean dA{z(j){t))/d{z(f>{t)). Relation (14) defines a rather broad class of survival models which can be used, e.g., for modelling an impact of environment on characteristics of survival. The widely used in reliability, survival analysis and risk analysis
565
proportional hazards, additive hazards, and accelerated life models, are the obvious specific cases of our relations (14) or (16): PH (multiplicative) Model: Let A(u) = u,
4>(t) = A(t),
ij>(t) = 0.
Then \{t, z) = z\(t),
A{t, z) = zA(t).
(17)
Accelerated Life Model: Let A(u) = A(u),
>(£) = i,
^ W = 0.
Then rtz
A(t,z)=
\{u)du = A(tz),
\{t,z)
= z\(tz).
(18)
Jo AH Model: Let A{u) = u,
(j)(t) = t,
tp(t) is increasing, ^(0) = 0.
Then \(t,z)
= z + ip'(t),
A{t,z)=zt
+ ip(t).
(19)
The functions X(t) and ij)'{t) play the role of baseline failure rates in equations (17), (18) and (19), respectively. Note that in all these models, the functions (f>(t) and A(s) are monotonically increasing. The next result derives an asymptotic formula for the mixture failure rate Xm(t) under rather mild assumptions. We use approach related to the ideology of generalized convolutions, e.g., Laplace and Fourie transforms. Theorem 3.1. Let the cumulative failure rate A(t,z) (14) and the mixing pdf ir(z) be defined as TT(Z)
=zan1(z),
be given by model
(20)
566
where a > — 1 and TTI(Z),TTI(Z) ^ 0 is a bounded in [0,oo) and continuous at z = 0 function. Assume also that 7r(t) is increasing to infinity: 4>{t) -¥• oo
as t ->• oo
(21)
and
f
/o Jo increasing. where A(s) is also ultimately Then Xm(t)-^(t)^(a
+ l)&.
(23)
By relation (23) we, as usual, mean asymptotic equivalence and write a(t) ~ b(t) as t -> oo, if r i m ^ o o f a ^ ) / ^ ) ] = 1. The proof of this theorem can be found in Finkelstein and Esaulova (2006a). Specific cases: In a conventional notation the baseline failure rate is usually denoted as Xo(t) (or Aj,(i))- Therefore the multiplicative model (3) reads
X(t,z) = z\0(t),
A0{t) = J
X0(u)dv
(24)
Jo
and the mixture failure rate is given by J^°
zXo(t)e-^Mz)dz
As A(u) = u, 4>(t) — Ao(t), ip(t) = 0 in this specific case, Theorem 4 is simplified to Corollary Assume that the mixing pdf n(z),zE[0,oo) TT(Z) - zam(z),
can be written as (26)
where a > — 1 and 7Ti(z) is bounded in [0,oo), continuous at z = 0 and TTi ( 0 ) 9 * 0 .
Then the mixture failure rate for the multiplicative model (24) has the following asymptotic behavior:
567
Xm{t)
„ (« + DAoW. Jo Mu)du
(27)
In a conventional notation the accelerated life model is written as: ptz
X(t,z) = zX0{tz),
A(t,z) = A0(tz)=
\0{u)du
(28)
Jo
Although the definition of the ALM is also very simple, the presence of a mixing parameter z in the arguments make analysis of the mixture failure rate more complex than in the multiplicative case. Therefore, as it was already mentioned, this model was not practically studied before. The mixture failure rate in this specific case is A
fn°° z\0(tz)e-Ao^TT(z)dz - W = ° r°o 'ltz) „ y /v
•
, N (29)
Asymptotic behavior of Xm{i) can be described as a specific case of Theorem 3.1 with A(s) = A 0 (s), (p{t) = t and ip(t) = 0: CorollEtry Assume that the mixing pdfTr(z),z S [0, oo) can be defined as w(z) = za7Ti(z), where a > 1, 7Ti(z) is continuous at z = 0 and bounded in [0,oo), T T I ( 0 ) # 0 . Let the baseline distribution with the cumulative failure rate Ao(t) have a moment of order a + 1. Then Xm(t) ~ ^ ~
(30)
as t —>• oo. The conditions of Corollary 2 are not that strong and are relatively natural. The most of the widely used lifetime distributions have all moments. Relation (30) is really surprising, as it does not depend on the baseline distribution, which seems striking at least at the first sight. It is also dramatically different from the multiplicative case (27). References 1. Barlow, R and Proschan, F. Statistical Theory of Reliability and Life Testing. Probability Models. Holt, Rinehart and Winston: New York (1975). 2. Block, H. W., Mi, J., and Savits, T. H. Burn-in and mixed populations. Journal of Applied probability. 30, 692-702 (1993).
568 3. Block, H.W., Li, Y., and Savits, T.H. Initial and final behavior of failure rate functions for mixtures and functions. Journal of Applied probability, 40, 721-740 (2003). 4. Clarotti, C.A., and Spizzichino, F. Bayes burn-in and decision procedures. Probability in Engineering and Informational Sciences, 4, 437-445 (1990). 5. Finkelstein, M.S., and Esaulova, V. Modeling a failure rate for the mixture of distribution functions. Probability in Engineering and Informational Sciences, 15, 383-400 (2001). 6. Finkelstein, M.S., and Esaulova, V. Asymptotic behavior of a general class of mixture failure rates. Adv. Appl. Prob. 38, 244-262 (2006a). 7. Finkelstein, M.S., and Esaulova, V. On mixture failure ordering. Communications in Statistics. Theory and Methods, 35, 11 (2006b). 8. Gurland, J. and Sethuraman, J. How pooling failure data may reverse increasing failure rate. J. Americ. Statist. Assoc. 90, 1416-1423 (1995). 9. Lynn, N. J., and Singpurwalla, N., D. Comment: "Burn-in" makes us feel good. Statistical Science. 12, 13-19 (1997). 10. Ross, S. Stochastic Processes. John Wiley: New York (1996). 11. Shaked, M and Spizzichino, F. Mixtures and monotonicity of failure rate func-tions. In: Advances in Reliability (N. Balakrishnan and C.R. Rao-eds), Elsevier: Amsterdam (2001).
BAYESIAN ESTIMATION OF HAZARD RATE OF A MIXTURE MODEL WITH CENSORED LIFETIMES SUN EUNG AHN, CHANG SOON PARK, HYUN MOOK KIM, WOO HYUN KIM Department of Industrial Engineering, Hanayang Ansan 426-791, Korea
University
This paper is intended to compare the hazard rate from the Bayesian approach with the hazard rate from the maximum likelihood estimate (MLE) method. The MLE of a parameter is appropriate as long as there are sufficient data. For various reasons, however, sufficient data may not be available, which may make the result of the MLE method unreliable. In order to resolve the problem, it is necessary to rely on judgment about unknown parameters. This is done by adopting the Bayesian approach. The hazard rate of a mixture model can be inferred from a method called Bayesian estimation. For eliciting a prior distribution which can be used in deriving a Bayesian estimate, a computerizedsimulation method is introduced. Finally, a numerical example is given to illustrate the potential benefits of the Bayesian approach.
1. Introduction The hazard rate of an engineering system is defined as the instantaneous failure rate at which a failure occurs during a certain infinitesimal time interval [ t, t + At ], given that a failure has not occurred prior to the beginning of the time t [8]. Unknown parameters in the hazard rate should be updated if there are observed data. It is common to estimate unknown parameters from the observed data by using the maximum likelihood estimate (MLE) method. Then the hazard rate is expressed in terms of such estimated parameters. This inference is reliable as long as there are sufficient data because the MLE method uses only the observed data. On the other hand, if few pieces of data are available, the MLE method may not produce a good result. For this reason, we need to allow unknown parameters to have their own probability distribution, called a prior distribution. We do this by adopting the Bayesian approach. We first introduce a computerized-simulation method for eliciting a prior distribution. According to this prior distribution, we infer the hazard rate in two ways. The first method is to derive the hazard rate from a predictive distribution which is a mixture of a likelihood distribution and a prior distribution. The second method is to estimate the hazard rate by calculating the expectation of the hazard rate of a likelihood distribution over parameters using their own 569
570 distribution. The resulting estimate is called the Bayesian estimate. It is wellknown that the Bayesian estimate is an optimal decision rule with respect to a squared error loss function (SELF) and is mathematically the posterior mean of the parameters obtained from the given data [6]. For the sake of simplification, the exponential and the gamma distributions are employed as a likelihood distribution and a prior distribution, respectively. 2. Elicitation of a Prior Distribution In order to elicit a prior distribution, two things should be considered: the functional form of the prior distribution and the parameters in the prior distribution called hyperparameters. In the Bayesian approach, a natural conjugate prior distribution has been generally recommended because its functional form is identical to the likelihood distribution and its posterior distribution also has the same functional form. Therefore, the functional form of a prior distribution can be determined by adopting the notion of conjugacy [4]. When no observed data are available and experts have subjective knowledge of observable data rather than knowledge of the parameters, hyperparameters can be determined from the following two facts: First, the observable data are related to the likelihood distribution and the hyperparameters are mathematically associated with the statistics of the likelihood distribution. Second, the statistics of the likelihood distribution can be calculated from the experts' subjective knowledge by using a computerized simulation called the bootstrap [1, 2, 3, 10]. To illustrate how to determine hyperparameters, an exponential distribution can be employed as a likelihood distribution. The exponential likelihood distribution of an outcome /(> 0) given parameter # ( e 0 ] , is L(0;t) = 0e-".
(])
For the parameter 9 of the exponential distribution, the natural conjugate prior distribution is the gamma distribution [2], which is of the form n(e\a,p)=~P^9"-'e-"",
(2)
r(a) where a and 0 are the hyperparameters characterizing the uncertainty of 8, and r(-) is a gamma function. It should be noted that the parameter 0 is the reciprocal of the expectation of the exponentially distributed random variable and the mean and vanance of the gamma distribution (2) are a/j3 and aj'fl2 respectively. Hence, the
571 hyperparameters a and /? can be determined as follows. First, generate x'(they'th artificial data point in the / th artificial data set) from the subjective distribution which follows an exponential likelihood distribution as determined by experts. Second, calculate the statistics T(X' )= nJY.".^xj which correspond to the reciprocals of the expectation of the exponential random variable. Third, calculate the statistics T(X) which are the sample mean and variance of T(X'), and finally equate T(x) to a//3 and a/j32 . For a numerical example, suppose that experts express their subjective knowledge in terms of the subjective distribution Fs as in Table 1. Table 1. The subjisctive distribution,•F, time period
010
1020
2035
3550
5070
7090
90-1 20
120160
160230
230300
probability of failure
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
Based on the subjective distribution Fs and the predetermined variables n = 10 and m = 100, the artificial data sets, X1 , X2 , ..., X100 of the size 10 are generated. When X1 = {14,87,60, 28,108,90,104, 27, 68,100} is the first data set, the reciprocal of the sample mean of X'is T ( X ' ) = 10/686* 1.4577 xlO" 2 . After calculating up to T ( X 1 0 0 ) , the sample mean and the sample variance of 2 5 T ( X ' ) , T ( X 2 ) , ..., T(X 1 0 0 ) turn out to be 1.1654xl0" and 1.9177x10 , respectively. Since the mean and variance of the gamma distribution (2) are aj P and aj fS1 respectively, two equations a/ft = 1.1654x10~2 and a//? 2 =1.9177x10"5 result in the hyperparameters a = l and /? = 608 . Depending on the meaning of the hyperparameters a and /? in the gamma distribution, the experts' subjective knowledge of the parameter 6 is translated into the fact that a (= 7) failures have occurred during time length /?(= 608). 3. Hazard rates from the Bayesian Approach The hazard rate h(t) is mathematically defined by
where S() is a survival function and /(•) is a probability density function for a system's lifetime. In the Bayesian approach, the hazard rate can be inferred by two methods. The first one is to derive the hazard rate, according to its definition, from a predictive disfribution. The predictive distribution results
572 from the mixture of a likelihood distribution with a prior distribution for the parameters of interest. The second one, which is called the Bayesian estimation, is to estimate the hazard rate by calculating the expectation of the hazard rate of a likelihood distribution over the parameters using their prior distribution. It is well-known that the Bayesian estimation is performed for a squared-error loss function (SELF) so that the resulting Bayesian estimate is simply the posterior mean of the parameters given data [6]. For various reasons such as preventive maintenance activity or carelessness of the observers, it frequently happens that the completely observed lifetimes of an engineering system can not be obtained. In this case, the lifetime t of a system is limited by a specified time T so that the exact form of the likelihood distribution is
V
\eeT
'
,ift>T.
If there are n observed lifetimes D = {t],t2,...,tn} composed of k(
£(0;0) = n !(
= <9"-*exp(H9£fJ.)xexp(-/fc07'))
(5)
where S/ ; is the summation of all the completely observed lifetimes. Bayes' formula [2] states that the posterior distribution is related to the prior and likelihood distributions according to *(0\D)= V
' '
m D M 0 )
(6)
f
JL(0; D)7t{6)d8 • e
Since the gamma distribution is a natural conjugate prior distribution for the exponential distribution, by putting (2) and (5) into (6), the posterior distribution of 9 given data D becomes the gamma distribution with updated hyperparameters a + n - k and /3 + I.t^+kT. That is,
n{6\D)= te + Z'j+wY*' ^ - * - e x p { _ ( / ? + 2:/ Yya + n-k)
+kT)g}.
The distribution (7) is used to calculate the hazard rates of our concern.
(7 )
573
3.1. Hazard Rate of the Predictive Distribution The predictive distribution for a future observation 7 given data D can be obtained by using the following mixture, f{i\D)=
^L{6;D,t)7T(0\D)d0.
(8)
From the fact that 7 and D are independent when the parameter 8 is given, L{6\D,t) is equivalent to L{8;t). That is, 7\d is exponentially distributed as in (1). Putting (7) into (8) results in
/-, \ / J f{t D)=(a + n-k)\
0 + Zt,+kT P *
[p + Ltj+kT
v
a+n-k /
+t
\
0 + Tt:+kT + t
(9)
Its corresponding survival function, S{t \ D), becomes S(t\D)~-
0 + I.tj+kT P + I.tj+kT + 7
(10)
According to the definition in (3), the hazard rate from the predictive distribution (9) becomes h(i\D) =
a + n0 + T.t -+kT + t
(11)
Note that the hazard rate (11) is a function of the unobserved future observation t and decreases in time 7. This coincides with the fact that a mixture of an exponential distribution has a decreasing hazard rate (DHR) [9, 11]. 3.2. Bayesian Estimate of the Hazard Rate The expectation of the hazard rate of the likelihood distribution L{6);7) given the observed data D is defined as the Bayesian estimate [6]. This means that E[ h(t | D, 9) ] is the Bayesian estimate. Since the hazard rate of an exponential distribution is its parameter 0, we take the expectation over 9 with the updated posterior distribution (7). Thus, we have the following updated Bayesian estimate of the hazard rate after observing D .
574
E[h(T\D,
9)]= E[E[h(7\ D, d)]\D] = E[0\D] _
a +n - k
~ P + Iti + kT '
(12)
Note that the Bayesian estimate of the hazard rate (12) is a constant hazard rate (CHR) because all variables in (12) are predetermined or observed. 4. Comparing the Resulting Hazard Rates with the MLE The MLE [6] of the hazard rate for our problem is obtained by differentiating (5) (or its logarithm) with respect to 6 and equating the differential to 0. It is h(t\D)
= 6=-^—
(13)
J
Likewise the Bayesian estimate of the hazard rate (12), the estimate (13) by the MLE method is also a CHR. Compared with the MLE method, the Bayesian approach has the following advantages. First, when there are sufficient data, the MLE of the hazard rate (13), which is based only on the collection of the observed data, will work fine. We, however, are often faced with the situation in which little data (or sometimes no data) are available. In this situation, the MLE method cannot guarantee a good result (or cannot be used). On the other hand, the Bayesian approach overcomes such a problem by using a prior distribution if it is determined properly. This is because any knowledge of parameters can be incorporated into a prior distribution. Second, in analyzing the unknown parameters, the MLE method provides only the point estimation. To obtain interval estimation or probability distribution of the unknown parameters, the MLE method requires another quantity which is generally called a pivot [7]. Reasonable options for the choice of a pivot quantity, however, are not necessarily available in any given problem. On the other hand, the Bayesian approach essentially considers the unknown parameters as random variables having their own probability distributions. Hence, the Bayesian approach provides more formal framework for the estimation of unknown parameters than the MLE method does. Third, based on routine probability theory, decisions based on the Bayesian approach are optimal if the model assumptions and prior specifications are accurate. In contrast, classical analyses only yield approximations to an optimal decision [5].
575
5. Numerical Example For a numerical example, the following is assumed: First, failures occur according to an exponential distribution with*? = 0.01. Second, lifetime data are censored whenever lifetimes are greater than 70 (that is, 7 = 70). Third, by calculating with the proposed method described in section 2, the gamma distribution with hyperparameters a = 5 and /? = 549 is obtained as a prior distribution for 6. By using a random number generator, 10 random data composed of 4 censored data and 6 completely observed data are generated. Those data are listed below and censored data are marked by an asterisk: 70*, 70*, 6 , 1 1 , 40 , 70*, 70*, 18, 34, 22. From the Bayesian approach using (7), the updated posterior distribution becomes the gamma distribution with the updated hyperparameters a = 11 and J3 = 960 . Then, based on (11) and (12), the hazard rate from the predictive distribution and the Bayesian estimate of the hazard rate are ll/(960+j) and 11/960 , respectively. From the MLE method using (13), the estimate of the hazard rate is 6/411. 0.016 0.014 0.012 o.oi 0.008 0.006 0.004 0.002 0
r :
' "
"
-~-^______ ~—i
'—————^__ " ' "
• -
—
_
I
1
1
1
1
1
1
1
1
1
>
0
50
100
150
200
250
300
350
400
450
500
MLE —— Predictive distribution
Bayesian estimate True value
Figure 1. Comparison of the hazard rates
Three hazard rates are plotted in Fig. 1, which shows that the Bayesian approach reflects the constant hazard rate (CHR) of the exponential distribution more effectively than the MLE method does. Fig. 2 shows the variations from the two estimates of CHR from the Bayesian estimate and the MLE method as the amount of data increases. Even when there is no available data, the Bayesian estimate can infer the hazard rate as shown in Fig. 2. This is due to the utilization of the prior distribution. It should be noted that the MLE method has no estimate until the third data point is available since the first two data points
576
are censored. From Fig. 1 and Fig. 2 it can be seen that the Bayesian approach produces more reliable results than the MLE method does. 0.02
•
0.015
•
•
0.01 -
.
'
i
2
3
-- - • '
^
'
• • • • * - '
'
0.005 0
•-- - MLE
4 A
5
6
7
8
Bayesian estimate
9 True value
Figure 2. Variations of the estimates of CHR
Acknowledgments This work has been supported by the research fund of Hanyang University (HY2003-S), Korea. References 1. A. C. Davison and D. V. Hinkley, Bootstrap Methods and their Application, Cambridge, Cambridge University Press, (1997). 2. A. Gelman, J. B. Carlin, H. S. Stern and D. B. Rubin, Bayesian Data Analysis, New York, Champman & Hall, (2000). 3. B. Efron and R. G. Tibshirani, An Introduction to the Bootstrap, Boca Raton, Champman & Hall, (1993). 4. D. F. Percy, European Journal of Operational Research, 139, 133 (2002). 5. D. F. Percy, K. A. H. Kobbacy and B. B. Fawzi, International Journal of Production Economics 51, 223 (1997). 6. H. F. Martz and R. A. Waller. Bayesian Reliability Analysis, New York, John Wiley & Sons Inc., (1982). 7. H. S. Migon and D. Gamerman, Statistical Inference: An Integrated Approach, LONDON, Arnold (1999). 8. L. M. Leemis, RELIABILITY: Probabilistic Models and Statistical Methods, New Jersey, Prentice-Hall (1995). 9. M. S. Finkelstein and V. Esaulova, Reliability Engineering and System Safety 71, 173(2001). 10. P. Hall, The Bootstrap and Edgeworth Expansion, New York, Springer (1992). 11. R. E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, New York, Holt, Rinehart & Winston (1975).
A N O T E O N P A R A M E T E R ESTIMATION FOR P H A S E - T Y P E D I S T R I B U T I O N IN C A N O N I C A L FORM *
H. GOTOH, H. OKAMURA AND T. DOHI Department of Information Engineering, Graduate School of Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima 739-8527, JAPAN E-mail: {okamu, dohi}@rel.hiroshima-u.ac.jp
A phase-type (PH) distribution is the probability distribution of a killing time for a finite-state Markov chain. In this paper, we consider an algorithm for fitting parameters of the PH distribution to sample d a t a sets. Especially, we focus on a specific subclass of P H distributions, which admits the minimal representations called canonical forms. The developed estimation procedure is based on the EM (Expectation Maximization) principle. The EM algorithm specified to the canonical forms needs less computational efforts than the EM algorithms for general PH distributions.
1. Introduction A phase-type (PH) distribution is the probability distribution of a killing time for a finite-state Markov chain, and can be represented as the time until an absorption in the Markov chain with some transient states. The commonly used probability distributions for state transitions are exponential distributions, and thus the PH distribution consists of both mixtures and convolutions of them; for example, n-Erlang distribution and hyperexponential distribution. One of the advantages of using PH distributions is to build some computationally tractable models. Thus it has been applied to a modeling of queueing system 1 ' 2 - 3 , insurance risk analysis 4 , renewal analysis 5 and reliability studies 6 . Although the PH distributions are quite attractive in terms of mathematical treatment, statistical estimation procedures for the PH distributions are not so easier than those for the other probability distributions. *This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Exploratory Research, Grant No. 15651076 (2003-2005).
577
578
The intractability in the statistical analysis is caused by the fact that PH distributions often involve a number of parameters. Since the PH distribution is a straightforward extension of the exponential distribution, the number of parameters tends to increase as it captures statistical features like non-exponential distributions such as Weibull distribution, log-normal distribution and pareto distribution. Therefore the parameter estimation problems of the PH distributions have been occasionally discussed in the viewpoint of computational efforts. Traditionally, there are two research aspects for fitting parameters of the PH distributions; moment matching and maximum likelihood (ML) estimation. In the moment matching, one focuses on several kinds of moments like mean, variance and associated higher moments, and determines the parameters such that analytical moments from the PH distribution match to the empirical moments. This is, nowadays, recognized as a conventional way to estimate parameters of the PH distributions, but is still capable of reducing computational efforts in practice. Johnson and Taffe7 attempt to determine algebraically unknown parameters of the PH distributions with 2 or 3 phases, based on the moment matching. On the other hand, Yoshihara et al.s apply the familiar linear programming techniques to the moment matching for a Markov-modulated Poisson process as an extension of the PH distribution. Thus most researchers and engineers are willing to use the moment-based procedures in actual estimation problems on the PH distributions. However, in the statistical points of view, the likelihoodbased methodologies are obviously superior to the moment-based ones. In the 1990s, the likelihood-based approaches have drastically been developed by enhancing numerical computation techniques for the maximization of likelihood or log-likelihood function. It is well known that the likelihood potentially includes more information on the underlying probability distributions than the associated partial moments; for example, Riska et al.9 prove that the moment-based approaches cannot capture the feature of long-tailed behavior. However, the likelihood functions of the PH distributions are generally presented by the forms including integral and convolution operations. Therefore most challenges in the likelihood-based approaches are to solve maximization problems for likelihood functions of the PH distributions. Asmussen et al.10 first describe an advantage of EM (Expectation-Maximization) algorithms 11,12 for estimating parameters of the PH distributions, and their scheme is the de facto standard for estimating parameters of the PH distribution family such as Markov-modulated Poisson process, Markovian arrival process and batch Markovian arrival
579 process. The novel idea is to address the maximization problem for likelihood functions by producing hidden variables in the underlying Markov chain. That is, the essential complexity on estimation problems of the PH distributions is concentrated to the computation of the hidden variables, and thus the computational efforts can be reduced to almost same amount of computing the likelihood function. In this paper, we consider an EM algorithm for estimating parameters of the PH distributions with specific structures, called canonical forms13. Especially we deal with the first canonical form, CFl, in the paper. The canonical form is one of the minimum representations for the PH distributions, namely, the number of parameters is the least among the other PH distributions. In general, the time complexity of EM algorithms strongly depends on the number of parameters. Perhaps the EM algorithm for CFl is the fastest procedure with the least computation efforts, as far as we know. This motivates us to discuss the EM algorithm for the CFl. 2. P H distribution 2.1.
Definition
PH distributions are based on the method of stages technique, generalized by Neuts 1 , and are related with the transient analysis of a continuoustime Markov chain (CTMC) with one absorbing state. The basic definition is essentially same as that for an absorbing time distribution of CTMC. Then the cumulative distribution function (c.d.f.) and probability density function (p.d.f.) of PH distributions can be written in the forms: F(t) = 1 - 7rexp(2Y)e
and
f{t) = 7rexp(T£)£,
(1)
respectively, where 7T, T and £ are a row vector of initial probabilities, a matrix of transition kernel of the underlying CTMC and a column vector of transition rates to the absorbing state, respectively. Letting e be a column vector whose element is 1, we have £ = -Te. For convenience of explanation, the states of the underlying CTMC and the absorption are often called phases and an event, respectively. 2.2. Classification
and Canonical
Forms
The PH distribution is one of the widest classes of probability distributions, and can approximate other types of probability distributions with arbitrary number of phases. However, as the number of phases increases, the PH distributions become less tractable with respect to statistical analysis. Thus
580
to address the problem, there exist several kinds of representative subclasses of PH distributions. The commonly used sub-classes are acyclic PH (APH) distributions, mixed hypoexponential (MHE) distributions, hypoexponential (HPO) or Erlang (ERL) distributions, hyperexponential (HPE) distributions and the exponential (EXP) distribution. They can be classified by the structures of phase transitions. The interrelationship among the above sub-classes is summarized as follows. EXP c (HPO, ERL, HPE) c MHE c APH c GPH, where the general PH (GPH) distributions are equivalently denned as PH distributions described in the previous section. In all the sub-classes, APH distributions are particularly significant in terms of mathematical tractability as well as wide application to other probability distributions. The APH distributions prohibit a transition to the past phases, i.e., the transition matrix becomes an upper triangular matrix. On the other hand, the APH distributions involve redundant parameters in the upper triangular matrix. For example, if the number of phases is m, all the eigenvalues of the transition matrix in APH distributions are obviously m negative real numbers. In this case, the APH distribution with m phases can be re-defined by using the products of 7r and the eigenvectors, and the eigenvalues of the transition matrix. Indeed, all the APH distributions with m phases can be reduced to the minimum representation by using 2m — 1 parameters. The APH distributions with minimum representation are said to have canonical forms. Cumani 13 proves that APH distributions can be represented only by three kinds of canonical forms; CFl, CF2 and CF3. This paper focuses only on CFl. CFl essentially consists of a mixture of HPO distributions, and the parameters of CFl are given by
/-Ai
O
Ax
\
—A2 A2 7T :
(7Tl 7T2 • • • 1Tm ) ,
T
(2) —
\ O
Am_i
Am_i
-\mJ
where 7^ > 0, Y^i 7Tj = 1 and 0 < X\ < < Am. Figure 1 shows the transition diagram of phases in CFl, where the double circle represents the absorbing state.
581 K\
7t2
TCm
0
(0—K?)—*' • —*©—*^0 Figure 1.
T h e phase structure of C F l .
3. Parameter Estimation for C F l 3.1. Direct Search
Method
Consider an independently sampled data T> = {X\,... Then the likelihood function (LF) is given by
,Xn} from C F l .
n
LF(TT, T\V) = H 7Z exp(TXi)t
(3)
where TV and T are represented in Eq. (2). The estimation problem is to find 7Ti,... ,7rm and A i , . . . , Am under the constrains, YliLi i t = 1 and Ai < • • • < A m . As is well known, the problem related with maximum likelihood estimates (MLEs) for the PH distributions is regarded as an optimization for a non-linear function under some constrains. Usually, such an optimization problem is too difficult to solve explicitly. In addition to the difficulty of solving the optimization problem, the constrains of C F l are more strict even than the GPH distributions, and thus the well-used numerical approaches such as Newton's method cannot be applied to estimating parameters of CFl. Therefore, Horvath and Telek14 develop a direct search method like Nelder-Mead method to derive the estimates in CFl. Although this method has a good property on global convergence, the estimates obtained by the direst search do not converge to the MLEs even if the number of iterations increases, i.e., this method does not provide the MLEs exactly. 3.2. EM
Algorithm
This paper develops an EM algorithm for fitting parameters of CFl. The estimation scheme based on the EM algorithm has been developed by Asmussen et al.10 originally, and they deal with parameter estimation of GPH distributions. It is worth noting that the original algorithm by Asmussen et al.10 is not always effective for a specific form in terms of computational efforts. While the Asmussen's scheme is the more costly algorithm, it is well known that the computational efforts can be reduced by focusing on
582
the specific form of sub-classes of PH distributions 15 . Thus we attempt to improve the EM algorithm for CF1. The basic idea of the EM algorithm by Asmussen et al.w is to recognize phase transitions and their sojourn times as hidden variables, i.e., unobservable data. Therefore, they define a sequence of random and hidden variables representing the phase transitions and their sojourn times, respectively and the algorithm is built on the hidden variables. Since the number of phase transitions cannot be observed in the GPH distributions, the length of the sequence of phase transitions has to be regarded as a random variable. On the other hand, in the case of CF1, the number of phase transitions is at most the number of phases, TO. This property leads to less computation cost of the estimation procedure in CF1. Let Bi, i — 1 , . . . , m, denote the total number of starting from phase i for all the events. Also, Zi is defined as the total sojourn time in phase i. Then the parameters of initial vector are given by Bi Ki = ^
p ,
m
1=1,...,
TO.
(4)
If there is no constraint for Ai . . . , AOT, the parameters can be estimated as \i = —±-=
,
i = l,...,m.
(5)
In fact, the parameters A i , . . . ,\m are restricted with Ai < • • • < A m , and therefore the estimates of A i , . . . , \m become more complex. Fortunately the parameters of A i , . . . , Am have the following convenient property: If we know groups of same transition rates; for example, a group consists of Aj< = • • • = \j,, then the parameters are given by \
-
-
\
-
\v-----\j.-——p—
^fc=»' ^j=i
B
i
lfC.
.
(6)
That is, the estimation problem of parameters A i , . . . , Am is to find the groups of same transition rates which maximize the likelihood function. Finally, in the EM algorithm for CF1, one step becomes the following procedure: Update the parameter of initial vector E[Bj\V]
^:~ ZZiVmvy
*- 1 '---. m -
(7)
Search the groups of same transition rates and update
_..._
ELEkW.
(8)
583 There remains the problems with computation of E[Bj|2>] and E[Zj|X>]. Consider the computation of E[Bi|Xjt] and E[Z,|Xfc] which mean the probability of starting phase i and the expected sojourn time of phase i during a single event Xk, respectively. Under the assumption that V is the independently sampled data, we have E[i?,|X>] = Sfc=iE[-Bi|-Xfc] and E[Zi\T>] = ^Zfc=i E[Z;|Xfc]. By using the previous notation, we can formulate E[Bi\Xk] = [exp(TXfc)£]i,
(9)
fe f [7rexp(ra;)]i[exp(T(Xfe-a;))C]idx, (10) Jo where [•]* denotes the i-element of vector. The above representations are general, so that they can be directly applied to the GPH distributions. Asmussen et al.10 use the above equations as the differential equations, and develop the EM algorithm based on the differential equations by RungeKutta method. Although their method is widely capable of fitting the GPH distribution, the computation cost is not so small. In this paper, we can use the Laplace transforms for the above equations because of the special structure of CF1. Let £ denote the operator of Laplace transform. Then we have
E[Zi\Xk]=
m
.
C(E[Bt\Xk]) = J ] — i - ,
i = l,...,m,
mZi\Xk}) = J 2 ^ U ^
(11)
< = !,...,»».
(12)
Generally speaking, the computation cost to derive the inverse Laplace transforms is higher than that of solving differential equations for one event. However in the EM algorithm, E[Bj|Xfc] and E[Zi|-Xk] must be computed for all the events, X\.... , XK- Although the inverse Laplace transform is costly in terms of computational efforts, once the inverse Laplace transforms are derived, they can be applied to the computation of the other events. Since there are a lot of observations in practice, the method based on the inverse Laplace transform, as a result, can lead to less computation effort in the EM algorithm than differential-equation-based approaches. References 1. M. F. Neuts, Matrix-Geometric Solutions in Stochastic Models. Baltimore: John Hopkins University Press, 1981.
584 2. B. Sengupta, "Markov processes whose steady-state distribution is matrixexponential with an application to the G I / G / 1 queue," Advances in Applied Probability, vol. 21, pp. 159-180, 1989. 3. S. Asmussen, "Phase-type representations in random walk and queueing problems," The Annals of Probability, vol. 20, no. 2, pp. 772-789, 1992. 4. S. Asmussen and T. Rolski, "Computational methods in risk theory: a matrix-algorithmic approach," Insurance: Mathematics and Economics, vol. 10, pp. 259-274, 1991. 5. L. Lipsky, Queueing Theory: A Linear Algebraic Approach. New York: MacMillan Publishing Company, 1992. 6. A. Bobbio, A. Cumani, A. Premoli, and O. Saracco, "Modelling and identication of non-exponential distributions by homogeneous Markov processes," in Proc. of the 6th Advances in Reliability Technology Symposium, pp. 373-392, 1980. 7. M. A. Johnson and M. R. Taffe, "Matching moments to phase distributions: mixtures of erlang distribution of common order," Stochastic Models, vol. 5, pp. 711-743, 1989. 8. T. Yoshihara, S. Kasahara, and Y. Takahashi, "Practical time-scale fitting of self-similar traffic with Markov-modulated Poisson process," Telecommunication Systems, vol. 17, no. 1/2, pp. 185-211, 2001. 9. A. Riska, V. Diev, and E. Smirni, "An EM-based technique for approximating long-tailed data sets with PH distributions," Perfomance Evaluation, vol. 55, pp. 147-164, 2004. 10. S. Asmussen, O. Nerman, and M. Olson, "Fitting phase-type distributions via the EM algorithm," Scandinavian Journal of Statistics, vol. 23, pp. 419441, 1996. 11. A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, vol. B-39, pp. 1-38, 1977. 12. C. F. J. Wu, "On the convergence properties of the EM algorithm," Annals of Statistics, vol. 11, pp. 95-103, 1983. 13. A. Cumani, "On the canonical representation of homogeneous Markov processes modelling failure-time distributions," Microelectronics and Reliability, vol. 22, pp. 583-602, 1982. 14. A. Horvath and M. Telek, "Phfit: A general phase-type fitting tool," in Proc. Computer Performance Evaluation / TOOLS, Lecture Notes in Computer Science, vol. 2324, pp. 82-91, 2002. 15. A. Thiimmler, P. Buchholz, and M. Telek, "A novel approach for fitting probability distributions to real trace data with the EM algorithm," in Proc. of Int'l Conf. on Dependable Systems and Networks, pp. 712-721, IEEE CS Press, 2005.
FUZZY VARIABLE RELIABILITY MODELING BASED ON CREDIBILITY THEORY RENKUAN GUO Department of Statistical Sciences, University of Cape Town Private Bag, Rondebosch 7700, Cape Town, South Africa
System reliability, as a quality index, is the capability to complete the specified functions accurately in mutually harmonious manner under the specified conditions within specified time period. The vague feature is intrinsic and inherent to the system reliability and inevitably engaging fuzzy mathematics. Fuzzy mathematics initiated by Zadeh (1965) facilitated a foundation dealing with vague phenomena in reliability modeling. However, the fuzzy mathematical foundation initiated by Zadeh (1965, 1978) is membership function and possibility measure based and widely used. However, possibility measure, which was originally expected to play the role of probability measure in probability theory but did not because possibility measure does not possess self-duality property as that in probability theory. To resolve this dilemma, Liu (2004) proposed an axiomatic foundation for modeling fuzzy phenomena, named as credibility theory. The credibility measure possesses self-duality property and is able to play the role of that in probability theory. In this paper, we will explore the concept of credibility measure, its axiomatic foundation, concept of fuzzy variable and its credibility distribution in the sense based on the credibility theoretical foundation. Furthermore, we propose the concept of credibility copula for characterizing the fuzzy dependence between fuzzy variables. Finally, we explore the credibility reliability evaluation based on the fuzzy load-strength concept.
1. Introduction Today's dazzling and rapid stride in technology have buried the traditional way of designing, manufacturing , operating and analyzing a single-part (or single-subsystem) and brought the 21 th century into a system engineering age. Today's design and analysis, due to the complexity of the system designing, manufacturing and operating, has to investigate the operating behavior of the complex system as a whole. In other words, we must study the operating mechanism underlying the whole system and the interactions between subsystems within the system and the environmental impacts on the whole system operating behavior as well, i.e., the interactions between the whole system studied and other systems, say, social systems, business systems, climate system and others. Inevitably, the reliability of complex system, a quality index of the operating behavior of the system, will be one of the focus points of quality and productivity improvement. 585
586
It should be fully aware that the concept of system reliability itself possesses a certain fuzzy characteristic. If we take the capability (i.e., reliability) of the movement of the operating system as an example, it can be reasonable to argue that when the operating system is carrying on its specified task for the specified time interval K 15 f 2 l t n e accurate characteristic of each movement's behavior in the operating system is not clear. More specifically, let us use a train station ticket-selling robot as an illustrative example. The right "hand" of the robot is required to move from point ax (ticket drawer) to point a2. All we know that each ticket is put at a point a2 within the table area under ticket window in front of customer but we do not know the accurate error between design-specified point a2 and the actual point d2 . If the movement error, denoted by e, of the robot's right hand is described by a probability density
R= j
(1)
As a matter of fact the movement error e is a sequence discrete sequence E = \ex, e2, • • •,en j . Define a membership
L{e) e\<e< e, 1 We)=' R(e) 0
e, < e < e2 e2<e<e\
(2)
otherwise
Then the system reliability can be calculated accordingly (3)
The fuzziness of complex system reliability analysis can be looked in the following ways: 1.
Time impacts on reliability of individual subsystems. Time fact affecting subsystem reliability can be analyzed from two angles: materials constituting of the subsystems are wearing our and downgrading according
587
to specifications in long term and the shape and strength of materials are changed associated with movements in short term. 2. Operating environment impacts on reliability of individual subsystem. Operation environment involves hard side, say, temperature, humidity, dust, light etc and soft side, say, work floor culture and in general company and local social culture environment. It is worth to stress here, the environmental factors interact with time factors and such relationships are difficult to evaluate and therefore a fuzzy issue appears here for consideration. 3. Human behavior impacts on reliability of individual subsystem. In today's globalization environment, it is fair to say none of the complex system will be designed, manufactured and used by one company or by the same country. More and more complex systems are international-made. Inevitably, the human factors affect system reliability directly and indirectly during the system design, manufacturing, shipping and the end-usage operating. Even making the focus narrow to system operating, it is obvious the human and system (machine) interaction is often too complicated to describe. This will be fuzzy uncertainty problem again. System reliability, denoted by Rs is dependent upon the system (machine) operating reliability, denoted by RM and human operating reliability, denoted by RH, which means the capability for the human (operator) finishes the specified job accurately within specified time. If using probability for expressing such capability, it is called operating reliability. A simplified description is treating human and machine as a series system and then the system reliability is simply Rs = RM x RH . However, more and more complex system with automatic control subsystems and human (operator) restricted overriding function can not use simple series or parallel system to model them. 4. System design impacts on reliability of individual system. Today's quality starts at design stage. The allocation of reliability to individual subsystem is not known completely. As a matter of fact, the system operating behavior is unknown in principle before the system being manufacturing and putting into functioning. Therefore, the investigation and analysis of the complex system reliability is enabling the system information from fuzzy state evolving into a less-fuzzy state. The methodology of the study is to analyzing the isomorphic characteristics of the systems falling in the same category and to find the key factors and the relationships between the constituting subsystems. According to the prior information from the
588 isomorphic systems the optimal design of the system reliability and the reliability allocation to individual subsystem. In summary, the fuzzy uncertainty problems appeared in complex system reliability analysis lie in shorting system structural clarity, shorting of the underlying mechanism of the interaction between subsystems and shorting of overall information of the system as a whole. Accordingly, the methodology to solve the fuzzy reliability of the complex system should be developed in terms of the basic concept of fuzzy mathematics. 2. Credibility Measure Based Fuzzy Theory Credibility theory is a branch of mathematics that studies the behavior of fuzzy phenomena. Zadeh (1965) defined a fuzzy set in terms of membership function which is a natural extension of indicator function of a Cantor set. Later Zadeh (1978) proposed the concept of possibility measure which was intended as a counterpart of that in probability theory. However, possibility measure does not possess self-dual property that is absolutely critical both in theoretical developments and applications. Liu (2004) proposed the concept of credibility measure with self-duality and established an axiomatic foundation of credibility theory for fuzzy mathematics. Let 0 be a nonempty set, and ^}(0) = 2 e the power set on 0 . Each element, say, A a® , v4e<}3(®) is called an event. A number denoted as Cr(^) , 0
4.
Cr{U(4}A0-5
= su
p[Cr{4}]
for
any
{4}
with
C r ( 4 ) < 0.5. Axiom 5. Let ®k be nonempty sets on which Crt (•),k = 1,2,-••,« satisfy the first four axioms respectively, and 0 = 0, x 0 2 x • • • x 0 n . Then
589
if
Cr{A} =
1if
sup
min Cr t {0 t j,
sup
min Cxk{0k} <0.5
sup sup
min C r ^ ^ } , min Cr t {6O>0.5
for each A €
(5)
Definition 2.5 (Liu [85]) The credibility distribution
Cr{de®\£(8)<x]
(6)
590
That is the credibility distribution Q>(x) is the accumulated credibility grade that the fuzzy variable £, takes a value less than or equal to a real-number x e R . Generally speaking, the credibility distribution G> is neither leftcontinuous nor right-continuous. Theorem 2.6 Let £ be a fuzzy variable on (0,
(7)
y>x
Definition 2.7 Let <& be the credibility distribution of the fuzzy variable ^ . Then function > :M -»[0,+co) of a fuzzy variable £, is called a credibility density function such that
^(x)^^(y)dy,
Vx.
(8)
Example 2.8 A three-parameter half-sine fuzzy variable ^{rl,r^,ri') has a membership function sin [a- (* - r, )/2 (/-2 - r,)]
ifr, < x < r 2
/*(*) = 1
if r, < x
0
otherwise
(9)
Then credibility distribution of a three-parameter half-sine fuzzy variable ^(r{,r„r3) is 0,
*(*) =
sin[^-(x-rl)/2(r2-r,)]/2,
if x < r. ifr, <x
1/2,
ifr2<x
if
(10)
JC> J
and its credibility density function is $<x\\KC0S[x(x-^)l2(ri-ri)\l\_A(ri-ri)\ 0
'^ri^x
(11)
591 In reliability modeling, the multivariate fuzzy variable and its joint distribution are required. Definition 2.9 Let ( £ P £ 2 > ' " > 0 be a fuzzy vector defined on (0,
(12)
And the joint credibility density function 0 :[—oo,+oo]" —>[0,+oo] of fuzzy vector (£i,£ 2 >""'>0 is the function such that <3>(xl,x2,---,x„)= | J-—oo —ac
fy(xl,x2,---,xn)dxldx1---dx„
(13)
—oo
Holds for all (xl,x2,---,xn)e[-<x>,+<x>]", where «t>(x,,x2,--,xn) is the joint credibility distribution for fuzzy vector (£p£2>"">£n) • 3. Credibility Copula and Survival Copula In probability theory, copula is the concept for describing the dependent structure between two random variables and obtained more and more attention in financial risk analysis. Parallel to that in probability theory, we will define copula concept for credibility measure based fuzzy variables and will investigate the applicability in fuzzy reliability analysis. Definition 3.1 Let H(xt,x2) be the joint credibility distribution of the fuzzy variable pair (X,,X,) and the continuous marginal credibility distributions be ^ ( J C , ) and F2(x2) respectively. Then for any w, xw2 E [ 0 , 1 ] X [ 0 , 1 ] ,
C(u],u2) = H(Fi-l(u]),F2-[(u2))
(14)
where the inverse of credibility distribution F ( ) is defined by F'x (u) = inf {x e K: F(x) > u, Vu e [0,l]}
(15)
Now let us investigate the joint survival function of credibility fuzzy lifetimes (X,,Jf 2 ) , //(x,,x 2 ) , and their marginal credibility survival
592 distributions are defined Fl (x,) = 1 - Fi (x,) respectively. Then H(xl,x2)
= l-Fl(xl)-F2(x2)
by
Ft (*,) = 1 - Fl (JC, )
and
+ H(xl,x2)
=Fi(xl) + F2(x2)-l
+ C(Fl(xi),F1(x2))
=Fl(x1) + Ft(x1)-l
+
(16)
c(l-F1(x1),l-F2(x2))
Definition 3.2 Let C(W,,M 2 ) be the credibility copula for credibility fuzzy variable pair (Xt,X2) with continuous marginal credibility survival distributions ^ ( * , ) and F2{x2~) respectively. Then C(w,,w2) = w, +u2 - 1 + C ( 1 - M , , 1 - W 2 )
(17)
is called the survival copula (reliability copula). And the joint credibility survival function is H(xl,x2)
= C(Fl(xl),F2(x2))
(18)
However, we need to point out that the Axiom 5 stated in Section 2 is necessary to be replaced by an alternative one, which is (V,•) -based. The (V,•) -based credibility theory is called non-classical credibility theory by Liu (2006) and with the (V, •) -based credibility measure, we can define more important concepts like conditional credibility distribution, conditional expectation and others, which will help to facilitate a full structural development as that in probability theory and copula theory. Axiom 5*: let 0 - 0 , X@2 X . . . x 0 n with &k [k = 1,2,••-,«) being nonempty sets where Crk satisfy the first four axioms and furthermore satisfies
ifl((2Cr,{3})Al) ifminCr^}<0.5 Cr{(0p02,..,£„)}=
(19)
minCr^} \
for each {0x,02,...,9n)
if minCrA{^}>0.5 \
6 0 . In this case, we denote C r ^ C r ^ C ^ ' - ' - ' C r ^ .
4. Generalized Fuzzy Load-Strength Analysis of Complex System The fuzzy reliability of a complex system can be virtually treated as the interacting state between two factors: factor resulting from the fuzzy load or
593 functional demand of the system and the factor rooting within the strength or the intrinsic capacity of the system. If the complex system is a structure, say, a bridge, a roof, etc, the terms load and strength directly reflect the physical property of the structure. In general context, load (functional demand) and strength (intrinsic capacity) terms will represent virtual state variables of the complex system. In the concept of reliability, if the system fuzzy load is less than system fuzzy strength the system is safe and reliable otherwise, the system is unsafe and unreliable. Let Yx,Y2,---,Yn and Zx,Z2,---,Zm be two groups of control variables and the strength Xx = g(Zx,Z2,---,Zm)
and the load X 2 = r {YX,Y2,-• • ,Yn). Todinov
(2005) classifies the domain (Xx,X2)
into three areas: safe region, where
F = {(x,,x 2 ): Xx -X2 < 0, (x,,x 2 ) e [x, min ,x, raax ]x[x 2 min, x2max ]}
(19)
Failure surface Xx - X2 = 0, and safe region, where S = {(x,, x2): X, - X2 > 0, (x,, x2) e [x, min, x, max ] x [x2 min, x2 max ]}
(20)
The fuzzy reliability is thus R=
( 21 )
J"i, ^^(xi'^dxAi
where ^(x,,x 2 ) is the joint credibility density function of load-strength (XX,X2)
with a joint credibility distribution 0(x,,x 2 ) such that Hx»x2)
d2 = -^r®(xl,X2)
(22)
We notice that the partition of the domain of [\ min ,*,, miK ]x[x 2min ,x 2max ] can be converted to [ a , m i n , u l - m ] x [ u 2 m n , u 2 m ] , where M
l,min
M
2.rain
=
n
I ^l.min j '
M
-F(' \ ^2 V ^ . m i n ) '
l.max
M
=
2,max
M ( X\.man ) -F( \ ^2 ^ l . m a x )
—
^
and the safe region S = {(«,,w2): F~[ (ux)-F2x (i/2) > o} and therefore, the fuzzy reliability of the system can be presented by credibility survival (reliability) copula C(W 1 ,H 2 ) = M 1 + M 2 - 1 + C ( 1 - W , , 1 - W 2 ) , V ( « , , W 2 ) E S
(24)
594 5. Concluding Remarks In this paper, we briefly discussed the fuzzy characteristics of complex system reliability. We reviewed the concepts of credibility based fuzzy variable, the distribution and density function proposed by Liu (2004). Then we propose a credibility copula concept for the description of the fuzzy dependent structure, which is important in the investigation on complex system. Based on these credibility measure based fuzzy variable developments, we investigate fuzzy reliability for complex system in terms of the fuzzy load-strength concept and thus present a general theoretical foundation for fuzzy reliability research. References 1. 2. 3. 4. 5. 6.
A. Kaufmann, Introduction to the theory of Fuzzy Subsets Vol. I, Academic Press, New York (1975). B. D. Liu, Uncertainty Theory - An Introduction to Its Axiomatic Foundations, Springer (2004). B.D. Liu, A survey of credibility theory, Fuzzy Optimization and Decision Making 5(4), (2006). M. Todinov, Reliability and Risk Models: Setting Reliability Requirements, John Wiley & Sons Ltd, New York (2005). L. A. Zadeh, Fuzzy Sets, Information and Control 8, 338-353 (1965). L. A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1,3-28 (1978).
SMALL SAMPLE ASYMPTOTIC DISTRIBUTION OF COSTRELATED RELIABILITY RISK MEASURE RENKUAN GUO Department of Statistical Sciences, University of Cape Town Private Bag, Rondebosch 7700, Cape Town, South Africa
System reliability, as a quality index, is the capability to complete the specified functions accurately in mutually harmonious manner under the specified conditions within a specified time period. We notice that high costs are sometimes associated with the occurrence of tiny probability and therefore the reliability index alone would not fully characterize the consequence of system breakdown. Todinov (2005) proposed the cost of failure as a measure for system reliability risk and explored related models. However, the sparse data availability extracted from the system may haunt the modeling exercises. In this paper, we will merge cost of failure idea and the small sample asymptotic idea together for the investigation on the asymptotic distributions for the total cost of failures due to system failures and associated losses.
1
Introduction
In engineering context, system reliability, as a quality index, is the capability to complete the specified functions accurately in mutually harmonious manner under the specified conditions within a specified time period. Reliability is typically described by the probability of the system surviving (functioning) up to time t. Therefore, the classical reliability index just involves the system functioning capability not how much costs involved in the system operation and inevitably unrealistic in decision making on production scheduling and maintenance planning. On the contrary, the cost-related reliability risk measure is better and more realistic for understanding system availability and managing system reliability. Todinov (2005) proposed the cost of failure as a measure for system risk and accordingly investigated some related models. Therefore the study of the cost-related risk measure of a functioning system is essentially linking to the problem of finding the distribution function of the total loss, which is in nature a weighted sum of losses due to system failures. Furthermore, today's industrial systems are more and more complicated and fast-changed. Then the sparse data availability extracted from the system's operations may haunt the modeling exercises. Therefore, the so-called "small 595
596
sample asymptotics" or "saddle point approximation" is the proper mathematical solution to address the approximate distribution of total cost under small sample size. 2
Asymptotic Distribution for a Compound Risk Process
Denote the cost of loss due to the /' failure by Yi ( / = 1,2, • • • , « ) . Note that Yt is not necessarily just the repair cost but also the losses from production interruption and even the delay deliver of products etc due to the failure. Obviously, Yt > 0 is a random quantity and can be expressed by Yi=SlLl,i
= l,2,--,n
(1)
y&^L^, i— 1,2,---,n are independent bivariate random vectors with failure indicator (0 with probability p [ 1 with probability 1 - p and the loss I, > 0 due to the i failure having a common distribution FL(x) = Pr[L,<x]
(3)
or equivalently the loss L. > 0 has probability generating function PL (9). Therefore, the total loss due to the system failures at time / is a general compound process {C, : t > 0} N(l)
N(,)
C,=Z^ = S ^ , i=i
(4)
/=i
where {7V,,i > 0} is a regular counting process. We notice that the cost-related risk measure C, itself forms a compound process {C, - -<>0} where the summation index is a random number N, at time t > 0 . Then we have to find out the cumulant generating function of a compound process. Assume that the counting process {N, : ? > 0 } is a nonhomogeneous Poisson process with intensity or rate function A(f) > 0 and denote the corresponding probability generating function by PN (0) . Furthermore, it is assumed that loss Yt follows a common distribution function FY (•). {Nt :t > 0} and loss Yi are assumed to be independent. Then in terms of conditioning we have ( ' Pc, (9) = PHi (PL (0)) = exp - JA(u)du(\-PL V o
^ (0))
(5)
J
The required function is cumulant generating function via moment generating function. The relationship between a moment generating function and the probability generating function is
597 mc,(0) = Pc,{ee)
(6)
because Px (0) = E[0*] = E[e^9)x
] = mx (In 0)
(7)
Therefore KCi(0) = \nmc,(0) = \nPc,(ee) = (pY(ee)-\)\A{u)du
(8)
0
=(my(0)-l)JA(u)du 0
By imposing an additional assumption that Pr[i9f = l] = p, i.e., the failure probability does not vary. Then for loss Y: ='dtLl , its moment generating function can be simply expressed by mr(g) = E[eJ'L'] = p + (l-p)mL(q). With the above assumptions, we will have KCi(0) = (l-p)(mL(0)-l)JA(u)du
(9)
0
Then i
Kc,(0) =
mL(0)(\-p)JA(u)du
Kci(0) =
: mL(0)(l-p)JA(u)du
(10)
o
According to Field and Ronchetti (1991), the asymptotic distribution of C, can be expressed by fc,(x) = gc.{x)(l + 0(n-1))
(11)
where
(
gc W=
'
Y /2
[ ^ R j "PMM*)-**))
d2)
where the saddlepoint a0 is obtained by solving the following nonlinear equation for any given (x,t), x e R+, t ~> 0 K'Q(a0) = x i.e., solving the following equation
(13)
598 mL(a0) =
*
(14)
(l-p)JA(u)ctu 0
Once the asymptotic distribution of the cost-related risk measure of sample size n at time t > 0, gc (x) is obtained, we are often interested in finding the probability the accumulated loss exceeding the given risk threshold level x, at time t > 0 Pr[C,> C,]=)gCi(u)du
(15)
c,
3
Sample Asymptotic Distribution for Cost-Related Risk
Obviously the asymptotic distribution obtained in section 2 comes with quite strong assumptions which will limit the applicability of it. In this section, we will relax some of the assumptions but also tight some, for example, 9i=\
fO
with probability
p.
(16)
[1 with probability 1-/7, and the loss L, > 0 due to the / failure has a distribution Fj,(*) = Pr [!,<*]
(17)
and mean /Lt, = £[C,]. Therefore, the total loss due to the complete system failure is C{n) = tlYl=£tSlLl (=1
(18)
;'=I
Again because of the key role playing in small sample asymptotics is the cumulants of a distribution and the cumulant generating function, which is related to the moment generating function. We need another look at it. Given a random variable X and a probability distribution function Fx (x), if there exists an a > 0 such that mx(q) = E[eqX], V #| < a , then mx is called the moment generating function (abbreviated as mgf)of random variable X , or equivalently, of the distribution Fx (x). The logarithm of a mgfis called a cumulant generating function of random variable X . Kx(0) = \nmx(0) = Y^J
(19)
;=i Jm
where K.. , j = l,2,--- are called the cumulants. If Y = ^^XlXi weighted sum of m independent random variables, then ' =l
is a
599 m
KY(0) =
(20)
YjKXi(Ai0)
Therefore, for the random quantities Yj (/' = 1,2,•••,«) (defined in Eq. 1), the corresponding cumulant generating function K,(0) =
\nmr,(0)^^
(21)
' J'-
The average cumulant generating function of sum J ^ " = , ^ is then denoted by K„(0) =
(22)
±£K,(0) n (=1
We notice that from Eq. (1), Yt have mixed distributions respectively, with an atom at zero. As n—>-oo, the distribution of £ ) " = 1 ^ can be approximated by a continuous one, denoted by
=O
»»6[-J,J]'
(23)
'
and dK„(0)
lim sup
d0
dK(0) d0
=0
(24)
and lim sup
d2Kn(0) dd2
d2K(0) d02
=0
(25)
If the above-stated regularity conditions hold and let 0O e[—6,5] be a point satisfying K{0o) =
[dK(0)/d0]g_eo=x/n
(26)
Then, fa(x) = gn{x) + 0(n-2)
(27)
where 8» (x) = i pyrnK
ex
P (nK
(9o)-0ox)
(0O)
holds uniformly for x — J ^ / J < (3n where (3 > 0 is a fixed constant.
(28)
600
4
Asymptotic Distribution in Case of Failure and Loss Being Independent
The simplest case we are interested is that the /'* failure mechanism is independent of the loss structure due to the /"" failure. Then by noting that the independence between i9, and Lt leads to K,(e) = lnE[e^]
= In[p, + ( l - p , ) m u (0)]
(29)
and
£(*)=-2>l>+0-/>/K (*)] "
(30)
1=1
Then, under the regularity conditions, Eq. (27) and (28) can be stated as
M*) = g;(x) + o(n-2)
(31)
and g'„ (x) = * pxnK„(d0)
exp (nK„ (0O) - 0ox)
(32)
There are two cases worth to investigate: (1) The loss amounts are essentially known and fixed quantities, i.e., L, are constants. Then,
K(1(0) = - L 2>[>+(l-flK 1 ']
(33)
And accordingly,
W'\t£i^
(34)
and 1
' ȣ[>+(i-*)^] 2
(35)
(2) The loss random variables Z,, are gamma distributed with parameter (a,, /3,), r (a,, /3 f ). The gamma density function is then f,(x;a„{Jf)
= -^-x*'-le-»*,x>0
(36)
and T(ai)=\ua-'ie-"du
(37)
601 and accordingly, the mgfs for L, will be ( l — # / # ) "'
(38) n
/=i
L
J
Note that dm, (0) _q,f
0^
de ~p\
p,
d2mi{6) d92
-(«,+!)
at(a,+\) r
Pf
+1
ex^
)
(39)
1
P.
Therefore, .(«, + i)
Pi { ^)=-inttp,+(\-p,){\-01
PJ [}<)-"•
(40)
and
R-IC)
ifp-(i-A>;(^+(i-/'-)2"';(^)'".(^)+[(i-p->-(^2
(39)
[p,+(i-P,){i-eiP,r1 Then the asymptotic distribution for the total loss C(«)(with sample size n) is approximated by g"n (x). 5
An Asymptotic Distribution for Time-Dependent Failure Cost-Related Loss
The assumption on indicators i9; (in Eq. (2)) for developing the asymptotic distribution of cost-related cost is in some sense a stationary one, i.e., a stead limiting distribution assumption. More realistically, the individual default indicator 3i is time dependent, i.e., 3t= S.(t) . Accordingly, the individual default probability is time-dependent Pl=E[S,{t)]
= p,{t)
(40)
This is will create a wide applicability of cost-related risk-measure. As a matter of fact, for any given component, say, the /'* component (subsystem), if it is a repairable one, then the failure probability
A = £[$(/)] = />,(') In case of Gamma distributed loss Lt, we will have
(41)
602
K„(e,t) = ±£ln\pl(t) + (l-pl(t))(l-0/fi)-*]
(42)
and
{\-p,{t))a,(x^-{a^
K„(e) = - t 1
'
^
\U?L1
(43)
nkPl{t)+{\-Pl(t))i\-eip,r
and p
F: ,e)Jf
-W-*(OH(0Wl-p<('))*
»i(*)*»(0)+[il-p>(Oh(*)] ( 4 4 )
[/*«+o-A(0)o-«/»rI Then the asymptotic distribution for the total loss C («) (with sample size n) is given by g* (x) in terms of Eq. (31). 6
Conclusions
In this paper, we develop the small sample asymptotic distributions for the costrelated risk measure under different scenarios. With these approximate distributions, we can calculate the threshold probabilities and also the asymptotic confidence interval for given level 100(1— a)% (Kolassa and Tanner, 1999). We believe that these theoretical results establish the foundation for reliability analysis and decision-making. We will perform numerical analysis in future research. References 1. 2.
3.
4.
5.
J. Beran and D. Ocker, Small Sample Asymptotics for Credit Risk Portfolio, Journal of Computational & Graphical Statistics 14 (2), 339-351 (2000). C. A. Field and E. Ronchetti, Small Sample Asymptotics. Institute of Mathematical Statistics, USA, Lecture Notes - Monograph Series, 13 (1990). C. A. Field and E. Ronchetti, An Overview of Small Sample Asymptotics. Directions in Robust Statistics and Diagnostics (Part I). Editors, W. Stahel and S. Weisberg, Pringer-Verlag (1991). J. E. Kolassa and M. A. Tanner, Small Sample Confidence Regions in Exponential Families, Journal of the International Biometric Society 55 (4), 1291-1294(1999). M. Todinov, Reliability and Risk Models: Setting Reliability Requirements, John Wiley & Sons Ltd, New York (2005).
ESTIMATION OF FAILURE INTENSITY AND MAINTENANCE EFFECT UNDER TWO DIFFERENT ENVIRONMENTS
JONG-WOON KIM Railroad System and Safety Research Department, Korea Railroad Research Institute, 360-1, Woram-dong, Uiwang-si, Gyeonggi-do, 437-757, Korea WON-YOUNG YUN Department of Industrial Engineering, Pusan National University, Geumjeong-gu, Busan, 609-735, Korea JUN-SEO PARK, JAE-HOON KIM Railroad System and Safety Research Department, Korea Railroad Research Institute, 360-1, Woram-dong, Uiwang-si, Gyeonggi-do, 437-757, Korea The maintenance effect is a peculiar factor of repairable systems. Malik (1979) and Brown, Mahoney & Sivazlian (1983) proposed general approaches for the maintenance effects, where each maintenance reduces the age of the unit with respect to the rate of occurrences of failures. An important problem in failure data analysis has been that all parts of data have not always been collected under similar conditions. We consider an estimation problem of repairable systems under two different environments and Malik's proportional age reduction model. Failure intensities depending on environmental conditions and maintenance effect are estimated by the method of maximum likelihood. Simulation results are presented to illustrate the accuracy and the properties of the proposed estimation method. 1.
Introduction
Maintenance effect is a characteristic factor to repairable systems. Conventional statistical analysis for failure times of repairable systems takes into account one of the following two extreme assumptions, namely, the state of the system after maintenance is either as "good as new" (GAN, perfect maintenance model) or as "bad as old" (BAO, minimal maintenance model). Under GAN assumption, the failure process follows the renewal process and under BAO, the failure process follows the non-homogeneous Poisson process. It is well known in practice that maintenance may not yield a functioning item which is as "good as new". On the other hand, the minimal maintenance assumption seems to be too pessimistic 603
604
in realistic maintenance strategies. From this it is seen that the imperfect maintenance is of great significance in practice. Some authors assumed that maintenance restores the system operating state to somewhere between GAN and BAO. Malik (1979) proposed the approach modeling the improvement effect of maintenance, where maintenance reduces a part of the age, the operating time elapsed from the previous maintenance, while maintenance reduces the total system age in Brown, Mahoney & Sivazlian (1983). Malik's proportional age reduction model is considered in this article. Another important problem in failure data analysis has been that all parts of the data have not always been collected under similar conditions. For example, we often encounter the situation where a piece of equipment may have been used in different environments or may have a different age or modification status. Such different environments obviously affect the equipment's inherent reliability characteristics. Therefore, it may be useful to take account of the environmental factors in equipment reliability modeling. Several models have been used to account for the influence of the environments on a system. In this paper, the Weibull lifetime distribution is considered and it is assumed that the scale parameter of the Weibull distribution is changed by environmental conditions. This assumption is based on the property that the covariate just changes the scale parameter of the Weibull baseline hazard rate; X is changed into Aesn in the proportional hazards model. Figure 1 shows the problems of repairable systems with effective corrective maintenance under two different environments. In Figure 1, the scale parameter in the environment 1 is different to that by the environment 2. We estimate one common shape parameter and two different scale parameters of the Weibull distribution and the maintenance effect. The likelihood function is constructed and the genetic algorithm is used to find a set of value maximizing the likelihood functions. Notation p : the age reduction factor (0 < pt < l) mi : the number of the units operated under therthenvironment (i = 1,2) ntj : the number of failures of the/th units under the Mi environment tj . k : the &th failure time of the/th units under thetihenvironment t, ., : the termination time of the/th units under therthenvironment
605 —•*
*
A.1,1
^ , 1
—X 2,1,1
• • •
x {
A,l,2 •
UA,
'1,^,2
',»!,«,„
H
•••
'2,1,2
'.
?
'.l.*
• Environment 1
1^.*
H
1
2.1."i,i
2,1.*
*• Environment 2
I J ^ , 1
^ , 2
2."i.n:,^
^2^,«
Figure 1. Failure and maintenance process under two different environments
2. Literature Review Malik (1979) proposed an approach for modeling the improvement effect of maintenance, where maintenance is assumed to reduce the operating time elapsed from the previous maintenance in proportion to its age. On the other hand, in BMS's approach (1983), it is assumed that maintenance reduces system age. The two models can be expressed by the virtual age concept suggested by Kijima(1989). Higgins and Tsokos (1981) studied an estimation problem of failure rate under minimal repair model using the quasi-Bayes method. Tsokos and Rao (1994) considered an estimation problem for failure intensity under the Powerlaw process. Coetzee (1997) proposed a method for parameter estimation and cost models of non-homogeneous Poisson process under minimal repair. Park and Pickering (1997) studied an estimation problem to estimate parameters of failure process with failure data of multi-systems. Whitager and Samaniego (1989) estimated the lifetime distribution under Brown-Proshan imperfect repair model (1983). It is assumed that the data pairs (7},Z,-) are given, where T, is a failure time and Z, is a Bernoulli variable that records the mode of repair (perfect or imperfect). Lim (1998) studied an estimation problem using the EM algorithm when masked data (Z, is unknown) are given under Brown-Proshan imperfect repair model (1983). Lim and Lie (2000) extended Lim's work (1998) and considered first-order dependency between two consecutive repair modes. Shin, Lim and Lie (1996) proposed a method for estimating maintenance effect and intensity function in Malik's model. Jack (1997) estimated lifetime parameters and the degree of age rejuvenation when a machine is minimally
606
repaired on failures and imperfect preventive maintenance is also carried out. Pulcini (2000) used the Bayesian approach to estimate overhaul effect and intensity function under minimal corrective maintenance and effective preventive maintenance. Baxter, Kijima and Tortorella (1996), and Calabria and Pulcini (1999) dealt with some properties of the stochastic point process for the analysis of repairable units. Baker (2001) focused on fitting models to failure data. Martorell, Sanchez and Seradell (1999) proposed a age-dependent reliability dependent model considering effects of maintenance and working conditions which was applied to nuclear power plant. 3. Likelihood Function We consider the 2-parameter Weibull distribution whose probability density function and the survival function are given by: /(/)=A/?/'-'exp(-Af')
(1)
R{t) = exp(-Xtfi)
(2)
The method of maximum likelihood is used to estimate parameters. The likelihood function is of the form:
^>m$$&&%&
(3)
The likelihood function and log-likelihood function for the Weibull distribution are: 2
m.
(=i
j=\
L(^,A2,Ap)=nn^v",; nta-p'ri*-!))"" (4)
exp
-4*. -ptjr-ith
+
-p'ueJ *>t(t* -p**Y
«,(iia, +in/?)+\iip-iMtIJk
-ptij(kA
i=i y=i
(5)
•4r-PfJ-^k
-P*«H)Y+lth -P*J
607
4. Experimental Results Simulations are carried out to investigate some properties of the estimation. The number of simulation runs is set to be 100. Failure times are generated for some specified sets of parameters and estimations are performed in the generated failure data. Mean and standard deviation (Std), are calculated to show the performance of the estimation. The genetic algorithm is used in these experiments to find the set of values maximizing the log-likelihood function. Table 1 is the result from the experiments which are performed under four combinations of the number of units and that of failures per unit while the total number of failures is set to be equal. Table 1 shows that the estimation precisions for A,, A2 and p are more sensitive to the number of units than the number of failures per unit. Table 2 shows that the degrees of the difference between two scale parameters (A,, X2) do not highly influence the estimation precision. Table 1. Effect of the number of units vs the number of failures per unit
4 =3 2 5 12 30
Mean 3.13 3.16 3.11 3.!0
30 12 5 2
Std 0.90 0.77 0.66 0.48
KMean 7.30 7.43 7.56 7.14
7 Std 1.46 1.31 1.29 1.15
P-= 2 Mean 2.08 2.09 2.08 2.04
Std 0.24 0.19 0.19 0.17
P = 0.5 Mean Std 0.48 0.17 0.51 0.14 0.49 0.14 0.49 0.15
Table 2. Effect of the difference between two scale parameters
(«- a0,n
X "i
K
2 3 4 5
9 7 6 5
K Mean 1.98 3.10 4.26 5.33
P-= 2
K Std 0.36 0.60 1.06 1.26
Mean 8.39 7.44 6.59 5.64
Std 0.83 1.19 1.43 1.39
Mean 2.00 2.03 2.05 2.13
Std 0.21 0.19 0.21 0.23
= 5) P = 0.5 Mean Std 0.46 0.13 0.50 0.13 0.50 0.15 0.51 0.14
5. Conclusion In this study, the estimation problem is dealt with for repairable systems which are operated under two different environments. Malik's model is considered for the effect of corrective maintenance while there is no preventive maintenance. It is assumed that time to the first failure follows the Weibull distribution and only its scale parameter is changed by environmental conditions.
608
The likelihood function is given for the parameters, A,, A2, p and p, and the genetic algorithm is used in the experiments to find the sets of values maximizing the likelihood function. The simulation experiments show that the estimation precisions for A,, A2 and p are more sensitive to the number of units than the number of failures per unit and the degrees of the difference between two scale parameters do not highly influence the estimation precision. References 1. R. D. Baker, Data-based Modeling of the Failure Rate of Repairable Equipment. Lifetime Data Analysis 7, 65-83 (2001). 2. L. A. Baxter, M. Kijima and M. Tortorella, A Point Process Model for the Reliability of Maintained System Subject to General Repair, Commun. Statist. - Stochastic Models 12, 37-65 (1996). 3. J. F. Brown, J. F. Mahoney and B.D. Sivazlian, Hysteresis Repair in Discounted Replacement Problems, HE Transactions 15, 156-165 (1983). 4. M. Brown, and F. Proschan, Imperfect Repair, Journal of Applied Probability 20, 851-859 (1983). 5. R. Calabria and G. Pulcini, Discontinuous Point Process for the Analysis of Repairable Units, International Journal of Reliability, Quality and Safety Engineering 6, 361-382 (1999). 6. J.L. Coetzee, The Role of NHPP Models in the Practical Analysis of Maintenance Failure Data, Reliability Engineering & System Safety 56, 161168(1997). 7. J. J. Higgins and C. P. Tsokos, A Quasi-Bayes Estimate of the Failure Intensity of a Reliability-Growth Model, IEEE Transactions on Reliability 30,471-475(1981). 8. N. Jack, Analyzing Event Data from a Repairable Machine Subject to Imperfect Preventive Maintenance, Quality and Reliability Engineering International 13, 183-186 (1997). 9. M. Kijima, Some Results for Repairable Systems with General Repair, Journal of Applied Probability 26, 89-102 (1989). 10. T. J. Lim, Estimating System Reliability with Fully Masked Data under Brown-Proschan Imperfect Repair Model, Reliability Engineering & System Safety 59, 277-289 (1998). 11. T. J. Lim and C.H. Lie, Analysis of System Reliability with Dependent Repair Modes, IEEE Transactions on Reliability 49, 153-162 (2000).
609 12. M. A. K. Malik, Reliable Preventive Maintenance Scheduling, AIIE Transactions, 11, 221-228 (1979). 13. S. Martorell, A. Sanchez and V. Serradell, Age-dependent Reliability Model Considering Effects of Maintenance and Working Conditions, Reliability Engineering & System Safety 64, 19-31 (1999). 14. W. J. Park and E. H. Pickering, Statistical Analysis of a Power-Law Model for Repair Data, IEEE Transactions on Reliability 46, 27-30 (1997). 15. G. Pulcini, On the Overhaul Effect for Repairable Mechanical Units : a Bayes Approach, Reliability Engineering & System Safety 70, 85-94 (2000). 16. I. Shin, T. J. Lim and C. H. Lie, Estimating Parameters of Intensity Function and Maintenance Effect for Repairable Unit, Reliability Engineering & System Safety 54, 1-10 (1996). 17. C. P. Tsokos and A. N. V. Rao, Estimation of Failure Intensity for the Weibull Process, Reliability Engineering & System Safety 45, 271-275 (1994). 18. L. R. Whitager and F. J. Samaniego, Estimating the Reliability of Systems Subject to Imperfect Repair, Journal of the American Statistical Association 84, 301-309 (1989).
CHARACTERIZING A NEGATIVE BINOMIAL PROCESS FOR A GAMMA DISTRIBUTED FAILURE RATE*
W. H. KIM, S. E. AHN, C. S. PARK Department of Industrial Engineering, Hanyang University, Ansan 426-791,
Korea
Used as a mixing distribution for a random Poisson parameter, the gamma distribution leads to a negative binomial process. This appears to be a useful model for failure data, particularly for data from a number of repairable systems all of which follow a Poisson process but with different intensities. The hyper-parameters of the gamma distribution have different meanings according to the sources of randomness in the Poisson failure parameter. Two such sources are failure time and failure rate. Random failure time and random failure rate are interpreted in the resulting negative binomial average failure in terms of the number of failures and the intensity of a failure, respectively.
1. Introduction The important role in the area of reliability and life testing of the Poisson distribution and the associated exponential distribution of inter-failure time is well known. It does not appear that the negative binomial model has received much attention as an alternative to the Poisson model in the area of reliability. It has been used in analyzing failure data as count data but to the authors' knowledge the distribution of failure times or inter-failure times of a negative binomial distribution has not been developed for use in the area of reliability. A failure model assumes that failure intensity follow a Poisson distribution with a constant and estimated means. However, the assumption of a constant failure intensity has been questioned recently, since failure is generally uncertain and may even vary with time. Here we assume that failure numbers are described by a Poisson distribution with the random parameter following a gamma distribution. This implies that a negative binomial distribution is obtained by mixing the mean of the Poisson distribution with a gamma distribution (Agrawal, 1996). For example, consider the following situation. A manufacturer supplies quantities (e.g. one truckload) of some product to a central warehouse. The failure rates of such production constitute a renewal process, while the failure intensities for the product arrive at the warehouse according to a Poisson process.
This work has been supported by the research fund of Hanyang University (HY-2004-S), Korea.
610
611 It turns out that the distribution of the number of failures with a failure rate is negative binomial if and only if the failure rate has a gamma distribution (Engel, 1980). The main reasons for mixing the mean of the Poisson distribution with the gamma distribution are: (i) the mathematical tractability resulting from the fact that the gamma distribution is the natural conjugate to the Poisson distribution (Percy, 2002), and (ii) the considerable mathematical flexibility for fitting different distribution patterns. Many authors have found the gamma distribution to be sufficiently versatile for practical application. Burgin (1975) pointed out that the gamma distribution provides a sufficient reference to deal with most of the problems likely to occur in the application of the gamma distribution to failure control. Apostolakis and Mosleh (1979) studied the model for the evaluation of probabilities of rare events by combining the available information with a gamma distribution. Characterizations of the gamma and negative binomial distributions can be found in the literature. Gerber (1991) showed that using the generalized gamma distribution as a mixing distribution for an unknown Poisson parameter leads to a generalized negative binomial distribution. His generalized gamma and negative binomial distributions contain the gamma and negative binomial distributions as special cases. The purpose of this paper is to give an interpretation of the negative binomial failure by considering the sources of variability in the unknown Poisson parameter. Such variability comes from the unknown failure rate and the unknown failure time interval. In Section 2, the negative binomial failure as a mixed Poisson distribution is explained by a Bayesian model, where the mixing distribution is the prior distribution of the unknown Poisson parameter and the failure time is gamma distributed. In Sections 3 and 4, interpretations of the negative binomial failures are given when the sources of variability in the Poisson random failure rate are the unknown failure rate per unit failure time interval and the unknown failure time interval, respectively. 2. The Mixed Poisson distribution The Poisson distribution arises naturally in the study of data in the form of counts. For a homogeneous Poisson process {N(t),t ^ 0} with rate A, the number of events in any interval of length / is Poisson distributed with mean At. That is, for all s,t>0 Vr{N(t + s)-N(s) = y}=e-;l'^-,
v = 0,l,...
(1)
612
Note that the expected value of N(t) is X t which explains why X is called the rate of the process. We denote the distribution (1) as Poisson (y\At). We will further assume that the parameter X t is a random variable with prior distribution p(Xt) . We consider the special case when the random parameter At is gamma distributed with the shape and scale parameters of a (a positive integer) and /?>0 respectively, which is of the form pVt) = £-{kt)a-xe-Wt\ T(a)
(2)
and is denoted as gamma$(\lambda t \mid \alpha, \beta)$. Since the gamma distribution is a conjugate family for the Poisson likelihood (Percy, 2002), and Bayes' theorem states that the posterior distribution is related to the prior and likelihood according to

$$p(\lambda t \mid y) = \frac{p(y \mid \lambda t)\, p(\lambda t)}{p(y)}, \qquad (3)$$
the posterior distribution of $\lambda t$ conditional on $y$ is gamma$(\lambda t \mid \alpha + y, \beta + 1)$. The unconditional marginal distribution $p(y)$ can be obtained according to

$$p(y) = \int p(y, \lambda t)\, d(\lambda t), \qquad (4)$$
where the integration is performed over the admissible range of $\lambda t$. That is, by marginalizing the joint distribution of $y$ and $\lambda t$, $p(y, \lambda t) = p(y \mid \lambda t)\, p(\lambda t)$, with respect to $\lambda t$, we can find the distribution $p(y)$:

$$p(y) = \int p(y \mid \lambda t)\, p(\lambda t)\, d(\lambda t) = \int_0^{\infty} \frac{(\lambda t)^y e^{-\lambda t}}{y!} \cdot \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, (\lambda t)^{\alpha - 1} e^{-\beta(\lambda t)}\, d(\lambda t) = \binom{\alpha + y - 1}{\alpha - 1} \left( \frac{\beta}{\beta + 1} \right)^{\alpha} \left( \frac{1}{\beta + 1} \right)^{y}. \qquad (5)$$
Equation (5) is the negative binomial distribution with parameters $\alpha$ and $\beta/(\beta + 1)$. We denote (5) as neg-bin$(y \mid \alpha, \beta)$. Note that the expected value of the random variable $\alpha + y$ in equation (5) is $\alpha$ times the mean
$(\beta + 1)/\beta$ of a geometric random variable, that is, $E[\alpha + y] = \alpha(\beta + 1)/\beta$. For constant $\alpha$, we have $E[y] = \alpha/\beta$. Equation (5) shows that the negative binomial distribution is a mixture of the Poisson distribution whose parameter follows the gamma distribution, which can be expressed as follows:

$$\text{neg-bin}(y \mid \alpha, \beta) = \int \text{Poisson}(y \mid \lambda t)\, \text{gamma}(\lambda t \mid \alpha, \beta)\, d(\lambda t). \qquad (6)$$
More precisely, when the occurrences of some events (e.g. purchasing orders for a product, or demands) follow Poisson$(y \mid \lambda t)$ and the random mean rate $\lambda t$ follows a gamma distribution, the distribution of the number of such events is neg-bin$(y \mid \alpha, \beta)$. Consider the random parameter $\lambda t$ in equation (1), which corresponds to the random number of failures occurring during the time interval $t$ when $\lambda$ is the failure rate per unit time interval. There are two ways to allow the parameter $\lambda t$ to be random. First, $\lambda t$ is the random number of failures occurring with a random failure rate per unit time interval during a fixed failure time interval $t$. Secondly, $\lambda t$ is the random number of failures occurring with a fixed failure rate per unit time interval during a random time interval $t$. By allowing one of the two variables $\lambda$ and $t$ to be random and the other to be fixed, we obtain different interpretations of the unconditional marginal distribution in equation (5).
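As a quick numerical check of the mixture identity (6), the following Python sketch (ours, not part of the paper; it assumes NumPy and SciPy are available) draws Poisson counts whose mean is itself gamma distributed and compares the empirical frequencies with the negative binomial pmf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta = 3.0, 2.0          # gamma shape and rate, as in equation (2)

# Mixed Poisson sample: lambda*t ~ gamma(alpha, rate beta), then y | lambda*t ~ Poisson
lam_t = rng.gamma(shape=alpha, scale=1.0 / beta, size=100_000)
y = rng.poisson(lam_t)

# Negative binomial with parameters alpha and beta/(beta+1), as in equation (5)
nb = stats.nbinom(alpha, beta / (beta + 1.0))

for k in range(6):
    print(k, (y == k).mean(), nb.pmf(k))   # empirical vs. theoretical; they should agree
```

The agreement between the two printed columns illustrates equation (6): mixing the Poisson mean with a gamma$(\alpha, \beta)$ distribution yields neg-bin$(y \mid \alpha, \beta)$.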
3. The Poisson Random Failure Rate with a Fixed Failure Time Interval

When the Poisson random failure rate $\lambda t$ is gamma$(\lambda t \mid \alpha, \beta)$ distributed and $t$ is a fixed time interval, by the change-of-variable method we can see that the random failure rate per unit time interval $\lambda$ follows gamma$(\lambda \mid \alpha, \beta t)$. From equation (3) and the same derivation as in (5), the posterior distribution of $\lambda$ becomes gamma$(\lambda \mid \alpha + y, \beta t + t)$ and the unconditional marginal distribution of $y$ becomes

$$p(y) = \binom{\alpha + y - 1}{\alpha - 1} \left( \frac{\beta t}{\beta t + t} \right)^{\alpha} \left( \frac{t}{\beta t + t} \right)^{y}. \qquad (7)$$
Note that equation (7) is equivalent to equation (5). Now we interpret the possible meanings of the parameters $\alpha$ and $\beta$. Because the prior mean failure rate $\lambda t$ in the Poisson$(y \mid \lambda t)$ likelihood of equation (1) satisfies $E(\lambda t) = \alpha/\beta$, or equivalently $E(\lambda) = \alpha/(\beta t)$, the shape parameter $\alpha$ can be interpreted as the total number of failures in the previous $\beta t$ failure time units. On the other hand, when we compare the form of the Poisson$(y \mid \lambda t)$ likelihood in equation (1),

$$p(y \mid \lambda t) \propto (\lambda t)^y e^{-\lambda t} \propto \lambda^y e^{-t\lambda}, \qquad (8)$$

with the form of gamma$(\lambda \mid \alpha, \beta t)$,

$$p(\lambda) \propto \lambda^{\alpha - 1} e^{-(\beta t)\lambda}, \qquad (9)$$
we can say that the prior distribution in equation (9) is equivalent to a total of $\alpha - 1$ failures in the previous $\beta t$ failure time units. However, choosing equation (9) as the prior for $\lambda$ in equation (8) is still intended for use with $\alpha/\beta t$ as the mean of the random $\lambda$ rather than $(\alpha - 1)/\beta t$. Note that if $\alpha/\beta t$ converges to a constant, then $\alpha/\beta t$ and $(\alpha - 1)/\beta t$ are approximately the same for large $\beta t$. In the posterior distribution of $\lambda$ given $y$, which is gamma$(\lambda \mid \alpha + y, \beta t + t)$, the parameter $\alpha + y$ is the combined number of failures, whereas $\beta t + t$ is the combined total of observed failure time units. The effect of the gamma$(\lambda \mid \alpha, \beta t)$ prior is to increase the observed number of failures $y$ by $\alpha$ and to increase the observed total failure time $t$ by $\beta t$. This gives a clear interpretation of the effect of the prior distribution in the analysis. The unconditional marginal distribution of $y$ is given by equation (7), with mean and variance $\alpha/\beta$ and $\alpha(\beta + 1)/\beta^2$, respectively. It is further noted that the future failure count $y$ has a negative binomial distribution as in equation (7), and its expected value is $\alpha/\beta$.

4. The Poisson Random Failure Rate with a Fixed Failure Rate per Unit Failure Time Interval

When the Poisson random failure rate $\lambda t$ is gamma$(\lambda t \mid \alpha, \beta)$ distributed and $\lambda$ is a fixed failure rate per unit time interval, by the change-of-variable method we can see that the random time interval $t$ follows gamma$(t \mid \alpha, \beta\lambda)$. By equation (3) and the same derivation as in equation (5), the posterior distribution of $t$ given the failure datum $y$ becomes gamma$(t \mid \alpha + y, \beta\lambda + \lambda)$ and the unconditional marginal distribution of $y$ becomes
$$p(y) = \binom{\alpha + y - 1}{\alpha - 1} \left( \frac{\beta\lambda}{\beta\lambda + \lambda} \right)^{\alpha} \left( \frac{\lambda}{\beta\lambda + \lambda} \right)^{y}. \qquad (10)$$

Note that equation (10) is equivalent to equation (5).
Consider a homogeneous Poisson process with a fixed unit-time failure rate $\lambda$, and let $N(t)$ be the total number of failures in the time interval $[0, t)$, where $t$ is a nonnegative random failure time. For the case $\lambda = 1$, Engel and Zijlstra (1980) showed that $N(t) = y$ has a negative binomial distribution with parameters $\alpha$ and $\beta/(\beta + 1)$ if and only if $t$ is gamma distributed with parameters $\alpha > 0$ and $\beta > 0$. This is equivalent to equation (6) with $\lambda = 1$. For a fixed $\lambda$, equation (6) becomes

$$\text{neg-bin}(y \mid \alpha, \beta\lambda) = \int \text{Poisson}(y \mid \lambda t)\, \text{gamma}(t \mid \alpha, \beta\lambda)\, dt. \qquad (11)$$
The negative binomial distribution in equation (11) can be interpreted by considering two independent Poisson processes with failure rates $\lambda$ and $\beta\lambda$. The Poisson likelihood in equation (11) represents the probability that exactly $y$ independent failures, each with exponential rate $\lambda$, occur during a certain failure time interval $[0, t)$. The gamma distribution in equation (11) plays the role of a weighting factor reflecting the effect of the length of $t$ in determining this probability; that is, gamma$(t \mid \alpha, \beta\lambda)$ generates the varying failure time interval $t$. We can therefore interpret the right-hand side of equation (11) as the probability that exactly $y$ independent failures with exponential rate $\lambda$ occur during $[0, t)$, averaged over all values of $t$ generated from gamma$(t \mid \alpha, \beta\lambda)$. Furthermore, such a $t$ is the sum of $\alpha$ independent and identically distributed exponential random variables, each with mean $1/(\beta\lambda)$. Therefore, the negative binomial distribution in equation (11) describes the event that exactly $y$ independent failures with rate $\lambda$ occur before the $\alpha$th independent failure with rate $\beta\lambda$ occurs. This interpretation rests on the fact that a homogeneous Poisson process, viewed from any point onward, is independent of all that has previously occurred and has the same distribution as the original process. In other words, the process has no memory, and hence exponential inter-arrival failure times are to be expected. The probability that exactly $y$ failures occur in one Poisson process with rate $\lambda$ before the $\alpha$th failure occurs in another independent Poisson process with rate $\beta\lambda$ can be derived as follows. Let $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ be two independent Poisson processes with respective failure rates $\lambda$ and $\beta\lambda$. Also, let $T_y^1$ denote the time epoch of the $y$th failure of the first process, and $T_\alpha^2$ the time epoch of the $\alpha$th failure of the second process. We seek $P\{T_y^1 < T_\alpha^2\}$.
Consider the special case $y = \alpha = 1$. Since $T_1^1$, the time of the first failure of the $N_1(t)$ process, and $T_1^2$, the time of the first failure of the $N_2(t)$ process, are both exponentially distributed random variables with respective means $1/\lambda$ and $1/(\beta\lambda)$, it follows that

$$P\{T_1^1 < T_1^2\} = \frac{\lambda}{\beta\lambda + \lambda} = \frac{1}{\beta + 1}. \qquad (12)$$
This is the probability that one exponential random variable is smaller than another. Let us now consider the probability that two failures (i.e. $y = 2$) occur in the $N_1(t)$ process before a single failure (i.e. $\alpha = 1$) occurs in the $N_2(t)$ process. By the memoryless property,

$$P\{T_2^1 < T_1^2\} = \left( \frac{\lambda}{\beta\lambda + \lambda} \right)^2.$$

This reasoning shows that each successive failure comes from the $N_1(t)$ process with probability $\lambda/(\beta\lambda + \lambda)$ or from the $N_2(t)$ process with probability $\beta\lambda/(\beta\lambda + \lambda)$, independently of all previous failures. In other words, the probability that the $N_1(t)$ process reaches $y$ before the $N_2(t)$ process reaches $\alpha$ is the product of two probabilities: (1) the probability of obtaining exactly $\alpha - 1$ failures from the $N_2(t)$ process in the first $\alpha + y - 1$ failures ($y$ failures from the $N_1(t)$ process and $\alpha - 1$ failures from the $N_2(t)$ process), and (2) the probability that the next failure, the $(\alpha + y)$th, comes from the $N_2(t)$ process. Thus, we have equation (10).
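The race argument above is easy to verify by simulation. The following Python sketch (our illustration, not from the paper) counts the number of rate-$\lambda$ failures that occur before the $\alpha$th failure of an independent rate-$\beta\lambda$ process, and compares the result with equation (10).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam, beta, alpha = 1.0, 2.0, 3      # rate lambda, multiplier beta, target count alpha

n_rep = 200_000
# Time of the alpha-th failure of the rate beta*lambda process: gamma(alpha, rate beta*lam)
t_alpha = rng.gamma(shape=alpha, scale=1.0 / (beta * lam), size=n_rep)
# Number of rate-lambda failures in [0, t_alpha): Poisson with mean lam * t_alpha
y = rng.poisson(lam * t_alpha)

nb = stats.nbinom(alpha, (beta * lam) / (beta * lam + lam))  # equation (10)
for k in range(6):
    print(k, (y == k).mean(), nb.pmf(k))   # simulated vs. theoretical probabilities
```

The simulated frequencies match the negative binomial pmf, consistent with both the mixture form (11) and the race interpretation leading to equation (10).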
5. Conclusion

The variances of the negative binomial distributions in equations (5), (7) and (10) are always greater than their corresponding means, in contrast to the Poisson distribution, whose variance always equals its mean. The negative binomial distribution is a two-parameter family that allows the mean and variance to be fitted separately, with variance at least as great as the mean. This is why the negative binomial can be used as a robust alternative to the Poisson distribution. In the limit as $\beta \to \infty$ with $\alpha/\beta$ held constant, the underlying gamma distribution approaches a spike and the negative binomial distribution approaches the Poisson distribution. The dominant consideration in selecting a prior model for the random parameter $\lambda t$ in the Poisson distribution of equation (1) is that the selected model represent the analyst's knowledge of and experience with $\lambda t$; that is, the prior distribution should reflect the analyst's prior belief about $\lambda t$. The flexibility present in the gamma distribution of equation (2) through the choices of the parameters $\alpha$ and $\beta$ allows the analyst to select the model that best expresses the current state of knowledge about $\lambda t$. It should be noted that $\alpha$ has the same meaning as the random variable $y$ of interest, i.e. a number of failures, while $\beta$ plays a role in determining the time interval (Section 3) or the failure rate (Section 4) that generates the values of the Poisson random parameter. The random failure time and the random failure rate interpret the resulting negative binomial failure in terms of the number of failures and the intensity of failure, respectively.
References

1. Agrawal, N., and Smith, S. A., Naval Research Logistics, 43, 839 (1996).
2. Apostolakis, G., and Mosleh, A., Nuclear Science and Engineering, 70, 135 (1979).
3. Engel, J., and Zijlstra, M., Journal of Applied Probability, 17, 1138 (1980).
4. Gerber, H. U., Insurance: Mathematics and Economics, 10(4), 303 (1991).
5. Percy, D. F., European Journal of Operational Research, 139(1), 133 (2002).
6. Burgin, T. A., Operational Research Quarterly, 26(3), 507 (1977).
ON NONPARAMETRIC TESTING EQUALITY OF RESIDUAL LIFE TIMES

JAE-HAK LIM
Department of Accounting, Hanbat National University, Taejeon 305-719, Korea

DONG HO PARK
Department of Information and Statistics, Hallym University, Chuncheon 200-702, Korea
A nonparametric procedure is proposed to test exponentiality against the hypothesis that one life distribution has greater residual life times than the other life distribution. Such a hypothesis turns out to be equivalent to the hypothesis that one failure rate is greater than the other, and so the proposed test works as a competitor to the more-IFR tests of Kochar (1979, 1981) and Cheng (1985). Our test statistic utilizes U-statistics theory to establish its asymptotic normality, and consequently a large-sample nonparametric test is proposed. The power of the proposed test is investigated by calculating the Pitman asymptotic relative efficiencies against several alternative hypotheses. A numerical example is presented to illustrate the proposed test.
1. Introduction

In reliability theory the concept of ageing plays a fundamental role in classifying life distributions. The classes of increasing failure rate (IFR) and increasing failure rate average (IFRA) distributions are based on the monotonicity pattern of the failure rate, and the classes of decreasing mean residual life (DMRL) and new better than used in expectation (NBUE) distributions are classified by the pattern of the mean residual life. The class of new better than used (NBU) distributions is defined by utilizing the stochastic ordering of the residual life length. The dual classes of DFR, DFRA, IMRL, NWUE and NWU distributions are defined by reversing the pattern of the failure rate, mean residual life or residual life length. The boundary distributions for all of the above classes are the exponential distributions. To test the null hypothesis that the distribution is exponential against the alternative that the distribution belongs to one of the above classes, many authors have proposed nonparametric tests in the literature. Proschan and Pyke (1967), Barlow and Proschan (1969) and Ahmad (1975, 2001) propose IFR tests. For the NBU and NBUE alternatives, there
exist several nonparametric tests by Hollander and Proschan (1972), Koul (1977), Hollander and Proschan (1975) and Ahmad (2001), among many others. Hollander and Proschan (1975) and Aly (1990) also propose classes of tests against the DMRL alternatives. Since Chikkagoudar and Schuster (1974) considered the problem of comparing two populations in terms of their failure rates, Kochar (1979, 1981) and Cheng (1985) have developed several nonparametric procedures for testing the null hypothesis that two life distributions are equal against the alternative that one failure rate dominates the other. Such tests have proved useful for comparing two used items with different underlying life distributions with regard to their degradation processes as they age. There are other tests that compare two life distributions with respect to their NBU-ness or IFRA-ness, by Hollander, Park and Proschan (1986) and Tiwari and Zalkikar (1991). Let $X \ge 0$ be a life length random variable with continuous survival function $\bar{F}(x) = P(X > x)$. Then the residual life time at age $t$, denoted by $X_t$, is a random variable with continuous survival function $\bar{F}_t(x) = \bar{F}(x + t)/\bar{F}(t)$, $x, t \ge 0$. In this paper we propose a new nonparametric procedure to test the equality of two life distributions against the alternative that one life distribution has a greater residual life length than the other. The following situation illustrates how our proposed test might prove useful. Suppose that there are two groups of patients suffering from a certain type of cancer, one group being treated by a certain medical treatment and the other being a placebo group. To check whether the treatment is effective in treating that particular type of cancer, the medical authority wishes to test the hypothesis that the residual life length of the treatment group is stochastically greater than that of the placebo group after the treatment group receives the treatment for a certain length of time, while the placebo group is not treated at all for the same length of time. If such a hypothesis is accepted, then the treatment is judged effective and the medical authority may conclude that the treatment prolongs the residual life of patients with that type of cancer. To the best of our knowledge, no other tests have been proposed for the stochastic ordering of two residual life lengths. However, it can be shown that the stochastic ordering of residual life lengths is equivalent to the ordering of failure rates. Thus, the efficiency of our proposed test can be evaluated by calculating the Pitman asymptotic relative efficiencies with respect to the more-IFR tests suggested by Kochar (1979, 1981) and Cheng (1985).
In Section 2, we derive the test statistic for detecting the greater residual life length property. Section 3 proposes a nonparametric test procedure by proving asymptotic normality of the proposed test statistic; the consistency of the test is also proved. Section 4 illustrates a numerical example.

2. Test Statistic

Let $X$ and $Y$ be nonnegative random lifetimes of two systems with continuous distribution functions $F(t)$ and $G(t)$, and let $X_t$ and $Y_t$ be the corresponding residual life times (RLT) at age $t$. In this section we develop a test statistic for testing the null hypothesis $H_0: F = G$ (the common distribution being unspecified) versus the alternative hypothesis

$$H_a: X_t \le_{\mathrm{st}} Y_t \ \text{ for all } t \ge 0, \text{ with strict inequality holding for some } t,$$

based on two complete random samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ taken from $F$ and $G$, respectively. We assume that $\underline{X} = (X_1, \ldots, X_m)$ and $\underline{Y} = (Y_1, \ldots, Y_n)$ are independent. $H_a$ states that the residual life time at age $t$ is stochastically greater when the underlying distribution is $G$ than when it is $F$. Note that under $H_a$, $\bar{F}(x + t)\bar{G}(t) \le \bar{G}(x + t)\bar{F}(t)$, and define

$$\Delta(F, G) = 2\int_0^{\infty}\!\!\int_0^{\infty} \{\bar{G}(x + t)\bar{F}(t) - \bar{F}(x + t)\bar{G}(t)\}\, dG(x + t)\, dF(t) = \int_0^{\infty} \bar{G}^2(t)\bar{F}(t)\, dF(t) - 2\int_0^{\infty}\!\!\int_t^{\infty} \bar{F}(u)\bar{G}(t)\, dG(u)\, dF(t).$$
Under $H_0$, $\Delta(F, G) = 0$, and under $H_a$, $\Delta(F, G) > 0$. Thus $\Delta(F, G)$ can be used as a measure of departure from $H_0$, and a larger value of $\Delta(F, G)$ indicates that $Y_t$ is stochastically greater than $X_t$. Let $F_m$ ($\bar{F}_m$) and $G_n$ ($\bar{G}_n$) denote the empirical (survival) functions formed by the random samples $\underline{X}$ and $\underline{Y}$, respectively. A nonparametric test statistic for testing $H_0$ versus $H_a$ can be formed by substituting $F_m$ and $G_n$ for $F$ and $G$ in $\Delta(F, G)$, as
$$\Delta_{m,n} = \Delta(F_m, G_n) = \int_0^{\infty} \bar{G}_n^2(t)\bar{F}_m(t)\, dF_m(t) - 2\int_0^{\infty}\!\!\int_0^{\infty} I(u > t)\bar{F}_m(u)\bar{G}_n(t)\, dG_n(u)\, dF_m(t) = \frac{1}{m^2 n^2} \sum_{i_1=1}^{m}\sum_{i_2=1}^{m}\sum_{j_1=1}^{n}\sum_{j_2=1}^{n} \phi(X_{i_1}, X_{i_2}, Y_{j_1}, Y_{j_2}),$$

where $I(a > b) = 1$ if $a > b$ and $= 0$ otherwise, and

$$\phi(X_1, X_2, Y_1, Y_2) = I(Y_1 > X_1)\, I(Y_2 > X_1)\, I(X_2 > X_1) - 2\, I(Y_1 > X_1)\, I(X_2 > Y_1)\, I(Y_2 > X_1).$$
Note that $\Delta_{m,n}$ is a U-statistic utilizing the symmetric kernel $\phi(X_1, X_2, Y_1, Y_2)$ for an estimable parameter. Significantly large values of $\Delta_{m,n}$ indicate that $Y$ has a stochastically greater residual life at age $t$ than $X$. Thus, our test rejects $H_0$ in favor of $H_a$ if $\Delta_{m,n}$ is too large. In the following section, we establish the asymptotic normality of $\Delta_{m,n}$ and propose the two-sample residual life test.

3. Two-sample RLT Test

The limiting distribution of $\Delta_{m,n}$ can be established by applying Hoeffding's (1948) U-statistics theory. To evaluate the asymptotic variance of $\Delta_{m,n}$, we need the following conditional expectations. Direct calculations yield

$$E\{\phi(X_1, X_2, Y_1, Y_2) \mid X_1\} = \bar{G}^2(X_1)\bar{F}(X_1) - 2\bar{G}(X_1)\int_{X_1}^{\infty} [\bar{G}(X_1) - \bar{G}(u)]\, dF(u),$$

$$E\{\phi(X_1, X_2, Y_1, Y_2) \mid X_2\} = \int_0^{X_2} \bar{G}^2(u)\, dF(u) - 2\int_0^{X_2} \bar{G}(u)[\bar{G}(u) - \bar{G}(X_2)]\, dF(u),$$

$$E\{\phi(X_1, X_2, Y_1, Y_2) \mid Y_1\} = \int_0^{Y_1} \bar{G}(u)\bar{F}(u)\, dF(u) - 2\bar{F}(Y_1)\int_0^{Y_1} \bar{G}(u)\, dF(u),$$

$$E\{\phi(X_1, X_2, Y_1, Y_2) \mid Y_2\} = \int_0^{Y_2} \bar{G}(u)\bar{F}(u)\, dF(u) - 2\left[ \int_0^{Y_2} F(u)\bar{F}(u)\, dG(u) + F(Y_2)\int_{Y_2}^{\infty} \bar{F}(u)\, dG(u) \right].$$

Thus, under $H_0: F = G$, the asymptotic null variance is obtained as

$$\sigma_1^2 = \mathrm{Var}\!\left[ \sum_{k=1}^{2} E\{\phi(X_1, X_2, Y_1, Y_2) \mid X_k\} \right] + \mathrm{Var}\!\left[ \sum_{k=1}^{2} E\{\phi(X_1, X_2, Y_1, Y_2) \mid Y_k\} \right] = \mathrm{Var}\!\left\{ \bar{F}(X_1) - \tfrac{2}{3}\bar{F}^3(X_1) \right\} + \mathrm{Var}\!\left\{ \bar{F}(Y_1) - \tfrac{2}{3}\bar{F}^3(Y_1) \right\} = \frac{2}{105} + \frac{2}{105} = \frac{4}{105}.$$
The asymptotic distribution of $\Delta_{m,n}$ can be readily obtained by direct application of Hoeffding's (1948) two-sample U-statistic theory (cf. Randles and Wolfe (1979)). The asymptotic null distribution of $\Delta_{m,n}$ is summarized as Theorem 3.1.

Theorem 3.1. Let $\lambda = \lim m/N$, where $N = m + n$ and $0 < \lambda < 1$. Then the limiting distribution of $N^{1/2}(\Delta_{m,n} - \Delta(F, G))$ is normal with mean 0 and finite variance as $N \to \infty$. Under $H_0$, the limiting distribution of $(105N)^{1/2}\Delta_{m,n}/2$ is standard normal.

By Theorem 3.1, the approximate $\alpha$-level test is to reject $H_0$ in favor of $H_a$ if

$$(105N)^{1/2}\Delta_{m,n}/2 > z_{\alpha},$$
where $z_{\alpha}$ is the upper $\alpha$-percentile point of the standard normal distribution. This is referred to as the two-sample RLT test. Assuming that $F$ and $G$ are continuous, $\Delta(F, G)$ is strictly greater than 0 under $H_a$, and hence the asymptotic normality of $\Delta_{m,n}$ ensures the consistency of the two-sample RLT test against the class of $(F, G)$ pairs for which $H_a$ holds. The asymptotic unbiasedness of the two-sample RLT test can be proved by showing that

$$\Pr\!\left( (105N)^{1/2}\Delta_{m,n}/2 > z_{\alpha} \right) \ge \alpha \quad \text{for sufficiently large } N,$$

whenever the alternative hypothesis holds true.
We first observe that

$$\Pr\!\left( (105N)^{1/2}\Delta_{m,n}/2 > z_{\alpha} \right) = \Pr\!\left( N^{1/2}(\Delta_{m,n} - \Delta(F, G)) > 2z_{\alpha}/\sqrt{105} - N^{1/2}\Delta(F, G) \right).$$
If $\Delta(F, G) = 0$, the probability equals $\alpha$. Under $H_a$, $\Delta(F, G) > 0$ and thus, by the asymptotic normality of $\Delta_{m,n}$, the result follows for sufficiently large $N$.

4. An Example

In this section, we illustrate the use of the two-sample RLT test on the data set from Proschan (1963). Table 1 shows the life lengths (in hours) of the air-conditioning systems of two different planes. Direct calculations yield

$$\Delta_{m,n} = \frac{1}{m^2 n^2} \sum_{i_1}\sum_{i_2}\sum_{j_1}\sum_{j_2} \phi(X_{i_1}, X_{i_2}, Y_{j_1}, Y_{j_2}) = -6.7209797 \times 10^{-3}$$

and

$$(105N)^{1/2}\Delta_{m,n}/2 = \sqrt{105 \times 45}\,(-0.006721)/2 = -0.2310.$$

Thus, the p-value is obtained as $P(Z > -0.2310) = 0.5913$, which strongly suggests accepting $H_0$. The test suggests the equality of the residual life lengths of the two underlying distributions, which agrees with Hollander, Park and Proschan's
(1986) and Lim, Kim and Park's (2004) test results regarding the more-NBU-ness.

Table 1. Life lengths (hours) of air-conditioning systems of planes 8045 and 7909

X (plane 8045): 230, 54, 209, 34, 152, 32, 134, 27, 102, 14, 67, 14, 66, 61, 59, 57

Y (plane 7909): 310, 208, 208, 186, 156, 130, 118, 79, 76, 70, 62, 61, 60, 59, 101, 90, 84, 56, 49, 44, 44, 29, 26, 25, 24, 23, 20, 14, 10
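For readers who wish to reproduce the example, the following Python sketch (ours, not part of the paper) evaluates the U-statistic $\Delta_{m,n}$ with the kernel $\phi$ of Section 2 and the standardized statistic of Theorem 3.1 on the Table 1 data.

```python
import numpy as np
from itertools import product

def kernel(x1, x2, y1, y2):
    # phi(X1, X2, Y1, Y2) from Section 2
    return (int(y1 > x1) * int(y2 > x1) * int(x2 > x1)
            - 2 * int(y1 > x1) * int(x2 > y1) * int(y2 > x1))

def rlt_statistic(x, y):
    m, n = len(x), len(y)
    total = sum(kernel(x1, x2, y1, y2)
                for x1, x2 in product(x, repeat=2)
                for y1, y2 in product(y, repeat=2))
    delta = total / (m**2 * n**2)
    N = m + n
    return delta, np.sqrt(105 * N) * delta / 2   # standardized per Theorem 3.1

x = [230, 54, 209, 34, 152, 32, 134, 27, 102, 14, 67, 14, 66, 61, 59, 57]
y = [310, 208, 208, 186, 156, 130, 118, 79, 76, 70, 62, 61, 60, 59, 101, 90,
     84, 56, 49, 44, 44, 29, 26, 25, 24, 23, 20, 14, 10]
print(rlt_statistic(x, y))   # compare with the paper's values (-0.00672, -0.2310)
```

Here $m = 16$ and $n = 29$, so $N = 45$, consistent with the factor $\sqrt{105 \times 45}$ used in the example above.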
References

1. I. A. Ahmad, A nonparametric test for the monotonicity of a failure rate function, Comm. Statist. 4, 967-974 (1975).
2. I. A. Ahmad, Moment inequalities for aging families of distributions with hypotheses testing applications, Journal of Statistical Planning and Inference 92, 121-132 (2001).
3. E. A. A. Aly, Tests for monotonicity properties of the mean residual life function, Scand. J. Statistics 17, 189-200 (1990).
4. R. E. Barlow and F. Proschan, A note on tests for monotone failure rate based on incomplete data, Ann. Math. Statist. 40, 595-600 (1969).
5. K. F. Cheng, Tests for the equality of failure rates, Biometrika 72, 211-215 (1985).
6. M. S. Chikkagoudar and J. S. Schuster, Comparison of failure rates using rank tests, J. Am. Statist. Assoc. 69, 411-413 (1974).
7. W. A. Hoeffding, A class of statistics with asymptotically normal distribution, Ann. Math. Statist. 19, 293-325 (1948).
8. M. Hollander and F. Proschan, Testing whether new is better than used, Ann. Math. Statist. 43, 1136-1146 (1972).
9. M. Hollander and F. Proschan, Tests for the mean residual life, Biometrika 62, 585-593 (1975).
10. M. Hollander, D. H. Park, and F. Proschan, Testing whether F is more NBU than is G, Microelectronics and Reliability 26, 39-44 (1986).
11. S. C. Kochar, Distribution-free comparison of two probability distributions with reference to their hazard rates, Biometrika 66, 437-441 (1979).
12. S. C. Kochar, A new distribution-free test for the equality of two failure rates, Biometrika 68, 423-426 (1981).
13. H. L. Koul, A test for new better than used, Comm. Statist. A—Theory Methods 6, 563-573 (1977).
14. J. H. Lim, D. K. Kim and D. H. Park, Testing equality of NBU-ness at a specified age, Advances and Applications in Statistics 4, 97-108 (2004).
15. F. Proschan and R. Pyke, Tests for monotone failure rate, Proc. Fifth Berkeley Symp. Math. Statist. Prob. 3, 293-312 (1967).
16. R. H. Randles and D. A. Wolfe, Introduction to the Theory of Nonparametric Statistics, John Wiley, New York (1979).
17. R. C. Tiwari and J. N. Zalkikar, A new measure of IFRA-ness and its application to two sample test, Statistics 22, 3, 419-430 (1991).
PARAMETER ESTIMATION OF THE SHAPE PARAMETER OF THE GAMMA DISTRIBUTION FREE FROM LOCATION AND SCALE INFORMATION
HIDEKI NAGATSUKA, HISASHI YAMAMOTO
Faculty of System Design, Tokyo Metropolitan University, 6-6 Asahigaoka, Hino-shi, Tokyo 191-0065, Japan
E-mail: [email protected], [email protected]

TOSHINARI KAMAKURA
Department of Science and Engineering, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
E-mail: kamakura@indsys.chuo-u.ac.jp
The gamma distribution, having location (threshold), scale and shape parameters, is used as a model for distributions of life spans, reaction times, and other types of non-symmetrical data. Inference for the three-parameter gamma distribution is known to be difficult because of nonregularity in maximum likelihood estimation, although numerous papers have appeared over the years. On the other hand, the methodology for inference for the two-parameter gamma distribution has been well established, and in practice it is usual to avoid fitting the three-parameter gamma distribution and to fit the two-parameter gamma distribution to data instead. In this article, we propose a new method of estimation of the shape parameter of the gamma distribution, based on a data transformation that is free of the location and scale parameters. The method is easily implemented with the aid of a table or graph. A simulation study shows that the proposed estimator performs better than the maximum likelihood estimator of the shape parameter of the two-parameter gamma distribution when a threshold is present, even when it is close to zero.

Key Words: Gamma distribution; Threshold parameter; Location and scale parameter free; Maximum likelihood estimator; Order statistics; Method of moments.
1. Introduction

The gamma distribution, having location (threshold), scale and shape parameters, is used as a model for distributions of life spans, reaction times, and other types of non-symmetrical data. A random variable $X$ has a gamma distribution if its probability density function (pdf) and cumulative
distribution function (cdf) are of the forms

$$g(x; \alpha, \beta, \gamma) = \frac{(x - \gamma)^{\alpha - 1}\, e^{-(x - \gamma)/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)}, \qquad \alpha > 0,\ \beta > 0,\ x > \gamma, \qquad (1)$$

$$G(x; \alpha, \beta, \gamma) = \int_{\gamma}^{x} g(z; \alpha, \beta, \gamma)\, dz, \qquad \alpha > 0,\ \beta > 0,\ x > \gamma. \qquad (2)$$
This distribution includes the $\chi^2$, Erlang and exponential distributions. It depends on three parameters $\alpha$, $\beta$ and $\gamma$. If $\gamma$ is equal to zero, the distribution is termed the two-parameter gamma distribution. The three-parameter gamma distribution presents a nonregular estimation problem when all three parameters are unknown. When $\alpha < 1$, the likelihood function tends to infinity as $\gamma$ approaches the smallest observation. When $\alpha > 1$, a local maximum exists; however, if $\alpha$ is near 1, unstable results arise even though it exceeds 1. Johnson et al. (1994) recommend not using the maximum likelihood estimators unless the maximum likelihood estimate of $\alpha$ is expected to be at least 2.5. It has been said that inference for the three-parameter gamma distribution is difficult, although numerous papers have appeared over the years, and fitting the three-parameter gamma model to data has been avoided in many cases. On the other hand, the two-parameter gamma distribution is used frequently in practice, and the methodology for inference for it has been established over the years; the details, especially maximum likelihood estimation and the properties of maximum likelihood estimators, are well described in Bowman and Shenton (1988). Johnson et al. (1994) mention that it is possible to assume $\gamma$ is zero when fitting the three-parameter gamma distribution in many cases. It is usual to avoid fitting the three-parameter gamma distribution and to fit the two-parameter gamma distribution to data in practice. In this article, we propose a new method of estimation of the shape parameter of the gamma distribution based on a data transformation free of the location and scale parameters. The method requires neither complicated calculation nor iteration, and is easily implemented with the aid of a table or graph. We focus on estimation of the shape parameter of the gamma distribution; the other parameters are easily estimated once the shape parameter is estimated. A simulation study shows that the proposed estimator performs better than the maximum likelihood estimator of the shape parameter of the two-parameter gamma distribution when a threshold is present, even when it is close to zero.
2. W-transformation

Let $X_i$, $i = 1, \ldots, n$ ($n \ge 3$), be independently distributed with common distribution function (2), and let $X_{(i)}$, $1 \le i \le n$, be the order statistic of order $i$. Now, we consider the following transformation:

$$W_{(i)} = \frac{X_{(i)} - X_{(1)}}{X_{(n)} - X_{(1)}}, \qquad i = 2, \ldots, n - 1. \qquad (3)$$
We refer to this transformation as the "W-transformation" (Nagatsuka and Kamakura). As shown below, the W-transformation statistics (3) are independent of the location $\gamma$ and scale $\beta$. For $i = 1, \ldots, n$, let $Y_{(i)}$ be the $i$th order statistic of random variables distributed identically with the standard distribution of (2), denoted by $F(x)$ ($= G(x; \alpha, 1, 0)$); then

$$W_{(i)} = \frac{X_{(i)} - X_{(1)}}{X_{(n)} - X_{(1)}} = \frac{\gamma + \beta Y_{(i)} - (\gamma + \beta Y_{(1)})}{\gamma + \beta Y_{(n)} - (\gamma + \beta Y_{(1)})} = \frac{Y_{(i)} - Y_{(1)}}{Y_{(n)} - Y_{(1)}}.$$
^(n) - Y{1
For the gamma distribution (2), we obtain the marginal distribution function of W(i) as follows:
= Pr(^)< W )=Prf^ W "^ (1) < w VA(n)-A(i) = p r ( y ( 1 ) < y w < « ; ( K ( n ) - K ( 1 ) ) + y(1)) /*oo
= / J0
n-2
/>oo
/
n(n - l)/(u)/(t/)
Ju
.fc=i-l
x {F{{l-w)u
+
wv)-F(u)}k + wv)}n-k-2
x {F(v) -F((l-w)u = /
n(n-
Jo Ju
n-2
E .fc=i-l
{ F ((1 - u/) F-\u)
dvdu
n-2 Jfc + w F~\v))
- «}' n-fe-2
dvdu,
(4)
where the function /(a;) is the pdf of standard distribution of (2), that is, f(x) = g(x;a, 1,0).
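Looking ahead to Section 3, the estimator proposed in this paper matches the sample mean of the $W_{(i)}$ to its theoretical mean, which depends only on $\alpha$. Because the exact integrals above are unwieldy, the following Python sketch (ours; a Monte Carlo stand-in for the paper's table or graph of $\mu_n(\alpha)$) tabulates the mean of $\bar{W}$ on a grid of $\alpha$ values and inverts it at the observed $\bar{W}$.

```python
import numpy as np

def mean_w(alpha, n, n_rep=20_000, rng=None):
    """Monte Carlo approximation of the mean of W-bar for gamma(alpha) samples of size n."""
    rng = rng or np.random.default_rng(3)    # fixed seed -> smooth curve over the grid
    y = np.sort(rng.gamma(alpha, size=(n_rep, n)), axis=1)
    w = (y[:, 1:-1] - y[:, :1]) / (y[:, -1:] - y[:, :1])
    return w.mean()

def estimate_alpha(x, grid=np.linspace(0.2, 15.0, 75)):
    """Estimate alpha by matching the observed W-bar to the tabulated curve."""
    x = np.sort(np.asarray(x, dtype=float))
    w_bar = ((x[1:-1] - x[0]) / (x[-1] - x[0])).mean()
    curve = np.array([mean_w(a, len(x)) for a in grid])
    return np.interp(w_bar, curve, grid)     # assumes the curve is monotone in alpha
```

Applied to data, the returned value plays the role of the paper's $\hat{\alpha}$ read from the table or graph of $\mu_n(\alpha)$.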
3. Estimation of α

3.1. The Mixture of the Marginal Distributions of $W_{(i)}$

We consider the mixture of the marginal distributions of $W_{(i)}$, $i = 2, \ldots, n-1$, as follows:

$$F^{(n)}(w) = \frac{1}{n-2} \sum_{i=2}^{n-1} F_{W_{(i)}}(w) = \int_0^{1}\!\!\int_u^{1} n(n-1)(v-u)^{n-3}\, \{F((1-w)F^{-1}(u) + wF^{-1}(v)) - u\}\, dv\, du. \qquad (5)$$

Then, the mean of the mixture distribution (5) is given by
$$\mu_n(\alpha) = \int_0^{\infty} \{1 - F^{(n)}(w)\}\, dw = 1 + \frac{1}{n-2} - n(n-1) \int_0^{1}\!\!\int_u^{1} \frac{(v-u)^{n-3}}{F^{-1}(v) - F^{-1}(u)} \int_{F^{-1}(u)}^{F^{-1}(v)} F(t)\, dt\, dv\, du. \qquad (6)$$
of a
The mean of the mixture distribution (5) is also expressed as -i
n—1
5 > [ W ( i ) ] = i5[Wl.
(7)
whereW =£-£,?=! WW. From Eqs. (6) and (7), we obtain the following equation: /i„(a) = E[W}.
(8)
From Eq. (8), we present an estimator a, of a, which satisfies the following equation: Vn(a) = W.
(9)
It is readily seen that the estimator a are distributed independently of /3 and 7.
630
4. Simulation Study In order to compare numerically the maximum likelihood estimator of the shape parameter of the two-parameter gamma distribution (MLE) and the proposed estimator (Prop.) under the condition that 7 is near to zero, we performed a simulation study. Without loss of generality, we fix (3 = 1. We let a = 0.5, and 1.5 (we consider two very different shapes of the distribution) and 7 = 0.0(0.1)0.2(0.2)1.0. Performance of the estimators will depend on the number of the sample size. We let n = 10,25,50 and 100. For each configuration, 10000 simulations were carried out. The random number generator is from IMSL. The samples were discarded when the procedures failed to find parameter estimates. Table 1 shows the rate of successful runs for each estimator of a. For a = 0.5 and n = 25,50 and 100, the rates of successful runs for the proposed estimator are equal to or larger than the corresponding maximum likelihood estimator. Otherwise, that for the maximum likelihood estimator are equal to or larger than the corresponding proposed estimator. Table 2 shows the bias and root mean squared error (RMSE) for each estimator of a. The results are summarized as follows: • As the sample size increases, the performances of both estimators improve. • A s a becomes larger, the performances of both estimators change for the worse. • As 7 becomes larger, the performance of the maximum likelihood estimator changes for the worse. • the performance of the proposed estimator doesn't vary according to values of 7. • For 7 = 0, the maximum likelihood estimator has less bias and RMSE than the corresponding proposed estimator. • For 7 > 0.1, the proposed estimator has less bias than the corresponding maximum likelihood estimator. • For a = 0.5 and 7 > 0.2, the proposed estimator has less RMSE than the corresponding maximum likelihood estimator. • For a = 1.5 and 7 > 0.8, the proposed estimator has less RMSE than the corresponding maximum likelihood estimator. These results suggest that the proposed estimator is generally, but not uniformly, better than the maximum likelihood estimator when the threshold is existent.
631 Table 1. 7 0.0
0.2
0.4
n 10 25 50 100 10 25 50 100 10 25 50 100
a= =0.5 Prop. MLE 1.0000 0.9825 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9794 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9963 0.9850 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
The rate of successful runs (/? = 1.0) a—=1.5 MLE Prop. 1.0000 0.8746 1.0000 0.9849 1.0000 0.9995 1.0000 1.0000 1.0000 0.8679 1.0000 0.9826 1.0000 0.9988 1.0000 1.0000 0.9996 0.8707 1.0000 0.9846 1.0000 0.9988 1.0000 0.9999
7 0.6
0.8
1.0
a= =0.5 MLE Prop. 0.9845 0.9820 1.0000 0.9999 1.0000 1.0000 1.0000 1.0000 0.9587 0.9826 1.0000 0.9992 1.0000 1.0000 1.0000 1.0000 0.9207 0.9842 1.0000 0.9966 0.9998 1.0000 1.0000 1.0000
a—1.5 MLE Prop. 0.9996 0.8730 1.0000 0.9828 0.9994 1.0000 1.0000 1.0000 0.9982 0.8737 0.9832 1.0000 0.9998 1.0000 1.0000 1.0000 0.8695 0.9968 0.9826 1.0000 0.9994 1.0000 1.0000 1.0000
Note: Based on 10000 random samples.
5. A n Example In order to illustrate the estimation procedure proposed in this paper, we fit the gamma distribution to the data (Davis?) using the proposed method. The data are hours to failure for 188 transmitting tubes. Greenwood and Durand ? attempted to fit an two-parameter gamma curve to the failure data and estimated a = 1.3538 and /3 = 23.79. The mean of the observed value of the W-transformation statistics (3) is 0.201631. Then the estimate of the shape parameter is given as the solution of Eq.(9). The estimate is also obtained easily from table or graph which show the values of /x„(o:). We calculate a = 1.31433. We also read a = 1.3 from Figure ?? (If greater accuracy is required, linear interpolation in table which shows the values of ^n(a) should be used). Then the maximum likelihood estimates of j3 and 7 of the gamma distribution (2) with a = 1.31433 are obtained as $ = 23.322 and 7 = 1.543. 6. Concluding remarks The new estimator of the shape parameter of the gamma distribution free from location and scale parameters was proposed. A simulation study showed that the proposed estimator performed better than the maximum likelihood estimator for two-parameter gamma distribution when the threshold is existent. The success rates of finding the parameter estimates were shown. It is seen that the failure rate for the proposed estimator is larger than the corresponding maximum likelihood estimator. As a future work, we will improve
632 Table 2. 7
n
0.0
10 25 50
0.1
100 10
25 50
0.2
100 10
25 50
0.4
100 10
25 50
0.6
100 10
25 50
0.8
100 10
25 50
1.0
100 10
25 50 100
Simulation Summary (/3 = 1.0)
a=0.5 MLE Prop. 1.035 1.291) 0.452 (2.077) 0.769 0.827) 0.106 ( 0.562) 0.696 0.721) 0.048 0.306) 0.666 0.678) 0.026 0.230) 1.035 1.291) 0.452 2.077) 0.769 0.827) 0.106 0.562) 0.696 0.721) 0.048 (0.306) 0.666 ' 0.678) 0.026 0.230) 1.908 2.454) 0.499 2.204) 1.368 1.476) 0.104 0.573) 1.236 [ 1.279) 0.044 0.302) 1.181 , 1.201) 0.027 0.230) 3.832 [ 4.919) 0.512 2.206) 2.723 [ 2.983) 0.105 0.616) 2.432 [ 2.531) 0.048 0.311) 2.311 ' 2.355) 0.024 0.224) 5.993 ( 7.519) 0.468 2.161) 4.307 [ 4.766) 0.102 0.550) 3.842 [ 4.028) 0.044 0.291) 3.630 ( 3.713) 0.031 0.230) 8.142 ( 9.890) 0.486 2.174) 6.176 ( 6.915) 0.106 ' 0.665) 5.464 ( 5.760) 0.043 ' 0.301) 5.108 ( 5.237) 0.027 0.226) 9.913 (11.612) 0.466 2.069) 8.226 [ 9.191) 0.098 0.503) 7.228 ( 7.645) 0.043 0.299) 6.805 ( 6.995) 0.027 ( 0.228)
a=1.5 MLE Prop. 1.005 ( 1.676) 1.210 ( 4.458) 0.564 0.782) 0.820 (3.012) 0.458 0.575) 0.415 1.743) 0.406 0.466) 0.187 0.870) 1.005 1.676) 1.210 4.458) 0.564 0.782) 0.820 (3.012) 0.458 0.575) 0.415 ( 1.743) 0.406 0.466) 0.187 0.870) 1.472 2.189) 1.238 ( 4.544) 0.944 1.149) 0.907 3.268) 0.803 0.897) 0.367 1.624) 0.747 0.795) 0.163 0.861) 2.359 3.100) 1.155 4.326) 1.683 1.883) 0.775 2.927) 1.512 1.605) 0.398 1.728) 1.429 1.474) 0.185 0.915) 3.376 4.242) 1.164 4.385) 2.512 2.750) 0.853 3.111) 2.261 2.365) 0.424 1.750) 2.164 2.213) 0.191 0.908) 4.474 5.479) 1.173 4.354) 3.361 ' 3.638) 0.936 3.315) 3.075 3.198) 0.395 1.680) 2.947 3.006) 0.208 0.948) 5.609 6.740) 1.182 4.360) 4.327 4.661) 0.858 3.120) 3.930 4.076) 0.418 1.767) 3.766 ( 3.836) 0.168 ' 0.871)
Note: Entries are the bias and (RMSE) for each estimator based on 10000 random samples.
this point. The proposed estimation procedure were illustrated with an example. We suggested to use table or graph which shows the values of j i n ( a ) - With the aid of table or graph, the estimates is easily obtained without complicated calculation and iterative method. The maximum likelihood estimator of a of two-parameter gamma distribution does not show the good performance when the threshold is existent. The almost estimators of a of three-parameter gamma distribution that have been proposed up to the present have some problems and require the simultaneous solution of nonlinear equations by iterative method. We recommend the use of the proposed estimator.
633 0.6
0.55 0.5
n=10 n=25 : n=50 :
\
n=200
!
!
I
i
]
:
!
0.45 0.4 0.35
S
0, 0.25 0.2 0.15 0.1 0.05
0
:
'•
X - ~ ~ — —
-
:
••"
""""
j^-::.''--- --''V7'V7
:
:
f//fy''r-' I
p-: :. I , , ,
0
,
1
2
3
4
I
5
6
:
I , , , i I , . , , I , i i i I ,
7
8
9
10
11
12
13
, I
14
15
a Figure 1. Graphs of p„(m) (n = 10, 25, 50, 100, 200)
References 1. Bowman, K. O. and Shenton, L. R., Properties of estimators for the gamma distribution, DefcA:er(1988). 2. Davis, D. J., An analysis of some failure data, J. Amer. Statist. Assoc. 47 113-150 (1952). 3. Greenwood, J. A. and Durand D., Aids for fitting the gamma distribution by maximum likelihood, Technometrics 2, 55-65 (1960). 4. Johnson, N. L. , Kotz, S. and Balakrishnan, N., Continuous Univariate Distributions, Vol.1, 2nd ed. , Wiley (1994). 5. Nagatsuka, H. and Kamakura, T., A new method of inference for Weibull shape parameter, J. Rliab. Eng. Assoc. Jap. 25, 583-597 (2003). 6. Nagatsuka, H. and Kamakura, T., Parameter estimation of the shape parameter of the Castillo-Hadi model, Commun. Statist. Theory Meth. 33, 15-27 (2004).
C O M P A R I S O N OF TWO I N F O R M A T I O N S T R U C T U R E S W I T H NOISE IN B A Y E S I A N DECISION ANALYSIS *
C. H. Q I A N A N D J. C H E N College of Management Science and Engineering, Nanjing University of Technology, 200 Zhongshan Road North, Nanjing, 210009, China E-mail: [email protected] T. NAKAGAWA Department
of Marketing and Information System, Aichi Institute of Technology, 1247 Yachigusa Yagusa, Toyota, Aichi,470-0392 Japan E-mail: [email protected]
In order to raise the efficiency of Bayesian decision analysis, the comparison analysis of added information structure with noise are considered by using the correlative coefficient of information structures. The entropy measures the indefiniteness of information system in information theory. It is possible to conduct the evaluation of information reliability, which through denning the decrease of object's indefiniteness by using of conditional entropy, and further to measure the volume of information content in the added information structures. An expression about information correlation and distance that matched to Bayesian decision analysis is derived by considering the standardization of information content. Such analysis demonstrates the correlative coefficient of information structures can be used to compare and appraise the information structures with noise.
1. Introduction Decision making has been widely studied by many authors. Refs. 1-3 present maximum expected utility theory forming a basis normative decision theory under uncertainty. A Bayesian decision maker (DM) has a priori probability distribution over the true state of nature. Upon receiving a signal that depends on the state realized by an information structure, the "This work is supported by NSFC(70471017) and Humanities & Social Sciences Research Foundation of MOE of China.
634
635
DM updates her priori belief.4 Blackwell 5~~6 compared different information structures, and many authors have done a series of significative work on this topic. 7 - 1 0 But there are several limitations in the application of information comparison. The well-known measure of uncertainty for a purely probabilistic system is the Shannon's entropy.11'12 Based on the entropy concept, several authors have proposed many different methods for modeling uncertainty. 13-16 Qiu 15 defined the concepts of sensitivity, value sensitivity and precision of information to compare different information. In this paper, we present an expression about information correlation and distance matched to Bayesian decision analysis by considering the standardization of information content in generalizing based on Refs. 14 -15. Such analysis demonstrates the correlative coefficient of information structures can be used to compare and appraise the information structures with noise.
2. Setup Consider decision maker under uncertainty. Let X = {xi, X2, • • • ,xn} be a finite set of states of nature, and let P(X) = {p{x\),p{x2), • • • ,p{xn)) be a priori probability distribution over X. It is assumed that P(X) assigns a positive probability to any state. That is, p(x) > 0 for every x G X. The set of actions available to the DM is denoted by A. The DM chooses an action from a € A and obtains a utility which depends on the realized state, x. This utility is denoted by u(a,x). We assume that the DM does not observe the true state of nature and takes an action that maximizes her expected utility. Then the expected utility, in the absence of information services, is given by V0 = Eu(a*,x)
= m&xy^ p(xi)u(a, Xi),
(1)
i
where a* is an optimal action. The DM uses information services in order to make a decision. We assume that an information structure is a pair (Y,P(Y\X)), where Y = {2/1.3/2, • • • , Vm} is the set of signals ( e.g., message, observation, or a family of mutually exclusive sets) and P(Y\X) is a collection of distribution on Y, one for each state. P(Y\X) is a stochastic matrix with n rows, the i-th row of P{Y\X) is the distribution over signals given the state x, and cell p(Vj\xi) of the i-th row is the conditional probability of receiving the signal yj 6 Y given that the state of nature is x*.
636
Before taking an action the DM from one of information services observes a signal y &Y which is correlated to the state of nature by P(Y\X). Upon receiving a signal yj, using Bayes rule, the DM updates her priori probability to the posteriori probability p(Xi|Wj,
~EiP(wWp(xi)' and then chooses actions so as to maximize her expected utility. Eu{a**\yj,x)
= max ^S2p(xi\yj)u{a,xi)1
( )
(3)
where a**\yj is an optimal action given the signal yj. The marginal probability p(yj) — J2iP(yj\xi)p(xi) i s occurrence probability for the signal yj, and we have P(Y) = P(X)P(Y\X).
(4)
Then, the expected utility is given by VY = Y^p(yj)Eu(a**\yj,x),
(5)
3
in the presence of the information service. Eu(a**\yj,x) > ^2iP(xi\yj)u(a*,Xi), we have ^2p(yj)Eu(a**\yj,x) 3
According to Eq.(3),
> ^p{yj)'Y^P{xi\yj)u(a*,xi) 3
= Eu(a*,x).
(6)
i
It shows that after accepting the signal from (Y,P(Y\X)), the expected utility will not reduce in average. In particular, if there exists Xi which satisfies p(xi\yj) = 1 for any yj, the mapping
(7)
given the state xt and the expected utility in the presence of the full information is given by Vx = ^2p(xi)u{a"*\xi,xi). i
For any P(Y\X),
it is evident that Vx > VY.
(8)
637
In general, there exists yj which satisfies p(xi\yj) < 1 for any a;;. It shows that the signal from (Y, P(Y\X)) is information with noise. P(Y\X) is the noise added on X, at the same time, we call Y is not better than X. Obviously, comparison of the qualities of information with noise is the same as of information structures. 3. Comparison of the Information Structures with Noise It is necessary to compare and choose the information before using. In general, the comparison of the information, indirectly by comparing the expected utility of the services, includes priori probability, the set of actions and the utility function. The utility gains V~x — Vo,Vy — Vo are called the value of full information and information with noise, respectively. But, we can not examine all the plans of action and the utility of each action in their states is difficult to evaluate in advance. These limit the comparison and appraising of the information structures. Furthermore, as to the provider of the information, the plans of action and their utility is indefinite which leads to no accordance of comparison. In this view, we should find another way to compare and appraise information. The well-known measure of uncertainty for a purely probabilistic system is the Shannon's entropy. It is possible to conduct the evaluation of information reliability formally by putting utility u(a,x) by — log(a;). Firstly, define the decrease of object's uncertainty by using of conditional entropy, then measure the volume of information content in the added information structures. The amount of information content is one of the criterions of information appraising. It will be more valuable after standardization. In this paper, an expression about information correlation and distance is derived by considering the standardization of information content, which can be used to compare and appraise the information structures. 3.1. Entropy
and Informational
Content
Let us represent the informational content of signals Y by the Shannon's entropy. 11 The uncertainty of states can be expressed by entropy
H{X) =-Y^pixjlogpixi),
(9)
where OlogO = 0. The entropy function attains its maximum where the uncertainty is maximal: at the distribution P(X). The uncertainty becomes
638
0 from H{X) when the DM obtains full information about the state realized. For the information structure with noise, upon receiving a signal yj from information structure (Y, P(Y\X)), the uncertainty of states becomes H
(X\yj) = ~ X]p(zi|2/j) logp(xi\yj),
(10)
i
the expected uncertainty is given by
H(X\Y) =-Y^piy^HWyj).
(11)
3
The conditional entropy H{X\Y) must be interpreted as the uncertainty about the state space X given the information provided by the structure (Y,P(Y\X)): It is evident that H(X\Y)
< H(X).
(12)
The equality holds iff (if and only if) X and Y are statistically independent, i.e., p{xi\yj) = p{xi) for any i,j. This equality denotes that the result of the observing of Y is helpless at reducing the uncertainty of X. Eq.(12) shows that after accepting the signal from (Y,P(Y\X)), the expected uncertainty of X will be partly eliminated in average. Obviously, H(X)—H(X\Y) is the decrease of uncertainty about the state space X given the information provided by the structure (Y,P(Y\X)). It must be interpreted as the volume of information content about state X in the added information structures (Y,P(Y\X)). In fact, we have 15 I(X, Y) = H{X) - H{X\Y)
= H(Y) - H{Y\X).
(13)
Definition 3.1. Mutual information content I(X,Y) defined by Eq.(13) is a measure of the statistical dependence between the two information X andy. For the full information, there exists x, which satisfies p(%i\yj) = 1 and H{X\yj) = 0 for any %, then H{X\Y) = 0, denoted X C Y. It shows that the information Y can eliminate the uncertainty of X completely. If H{X\Y) = H(Y\X) = 0, then call X = Y. Definition 3.2. Complemental information content D(X,Y) is a measure of the statistical distance between the two information X and Y, is defined by D{X,Y)
= H(X\Y)
+
H(Y\X).
(14)
639
It is evident that H(X) = I(X,Y)
+ H(X\Y),
(15)
and H(X,Y)
= I(X,Y)
+ D(X,Y),
(16)
where joint entropy H(X, Y) = - J^ij P(xi> Vj) ^°EP{xi-,Vj) denotes the uncertainty about state space 1 x 7 , and p(xi,yj) is the joint probability. 3.2. Information
Reliability
and
Comparison
Definition 3.3. If Y is an added information, we define reliability function of Y as px(Y) = I(X,Y)/H(X), which is a measure of the correlative coefficient between the two information X and Y. If Y and Z are different added information, and pxiY) > px{Z), then call Z is not better than Y. Let dx(Y)
= H(X\Y)/H(X),
we have the following properties:
(1) 0 < px(Y) < 1, 0 < dx(Y) < 1 and px(Y) + dx(Y) = 1. (2) px(Y) = 0 and dx{Y) = 1 iffX and Y are statistically independent. (3) px{Y) = 1 and dx(Y) = 0 iff X C Y. Blackwell compared different information structures. He defined two orders over the set of information structures: one in terms of decision problems and the other in purely probability terms. He showed the equivalence of these two orders. 5 - 6 Definition 3.4. (Blackwell) (Y,P(Y\X)) is more information than (Z,P(Z\X)) iffVy > Vz for all sets of terminal actions, all utility functions and all a priori probability. Theorem (Z,P(Z\X)) P(Z\X) =
3.1. (Blackwell) (Y,P(Y\X)) is more information than iff there exists a non-negative stochastic matrix B, such that P(Y\X)B.
The meaning of the Blackwell theorem is both simple and intuitively clear. P(Z\X) = P(Y\X)B, i.e., the one information is represented as a transformation {garbling) of the other. That is, P(Y) = P(X)P{Y\X), i.e., P{Y\X) is the noise added on X, from P(Z) = P(X)P{Z\X), we have P(Z) = P{Y)B, i.e., B = P{Z\Y) and B is the noise added on Y. But this condition is so hard that to any A, u{a,x) and P(X) will satisfy the excepted utility of Vy higher than which of Vz- In practice,
640
we can only appraise the information structure partly. We can't judge the information by the Blackwell theorem if an information structure doesn't satisfy P{Z\X) = P(Y\X)B, but we can judge by px(Y). In fact, we have: Theorem 3.2. If(Y, P(Y\X)) px(Y)>px(Z).
is more information than (Z, P{Z\X)),
Proof. Let P{Z\Y) = B. p(zk\xi) = T,jP(yj\xi)p(zk\yj), Then,
H(X\Z) =
then
From P{Z\X) = P(Y\X)P(Z\Y), i.e., we have p(xi\zk) = T,jP(yj\zk)p{xi\yj).
-Y^p{zk)Y^P(xi\zk)^0E,p{xi\zk) k
i
= ~ ^Zp(zk) k
^ E P ^ I ^ ^ i
1
* ! ^ ) )
j
log[^p(2/j|zfc)p(ar»|j/j)]j
Let f(x) = -a; log a;, then f(x) is convex, i.e., / ( £ V AjZj) > £V A»/(xi) for any A» > 0 and ^ i A» = 1. Then we have H(X\Z)
-^p(zk)Y^p(yj\zk)Ylp(Xi\y^logp(Xi\y^
> k
j
i
= - E Ep(z^J')p(^) E ^ I ^ J ) 'ogPfel!/;) = H(X\Y)j
k
i
Thus px(Y)>px(Z).
•
The theorem shows that pxiX) c a n D e used to compare and appraise the information structures under the defined condition by Blackwell. The correlative coefficient muse be high between more information and the state space, but the opposite is not true. Under the defined condition, when B = P(Z\Y) is known, we have: 14 Definition 3.5. If Y and Z are different added information and P{Z\X) = P(Y\X)P(Z\Y), we define complementarity and redundancy of Y and Z, respectively, as d(Y,Z) = [H(Y\Z) + H{Z\Y)]/H{Y,Z) and p(Y,Z) = I(Y,Z)/H(Y,Z). The complementarity of information is a measure of the availability of the less information Z. When H{Y) > 0, H(Z) > 0 H(W) > 0, we have the following properties: (1) 0 < p(Y, Z) < 1, 0 < d{Y, Z) < 1 and p(Y, Z) + d(Y, Z) = 1.
641 (2) p(Y, Z) = 0 and d(Y, Z) = 1 iff X and Y are statistically independent. (3) p(Y, Z) = 1 and d{Y, Z) = 0 iff X = Y. (4) d{Y, Z) + d{Z, W) > d(Y, W) for any W.
References 1. J. von Neumann and 0 . Morgenstm, Theory of Games and Economic Behaviour, Princeton University Press, New Jersey (1944) 2. R. Keeney and H. Raiffa, Decisions with Multiple Objectives: Preferences and Value Trade-offs, John Wiley & Sons, New York (1976) 3. S. French, Decision Theory: an Introduction to the Mathematics of Rationality, Ellis Horood, Chichester, UK (1986) 4. H. Raiffa and R.O. Schlaifer, Applied Statistical Decision Theory, MIT Press, Massachusetts (1968) 5. D. Blackwell, Comparison of experiments. Proc. Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 93102 (1951) 6. D. Blackwell, Equivalent Comparison of Experiment. Annals of Mathematical Statistics, 24, 265-272 (1953) 7. B.K. Agnieszka, Sufficiency in Blackwell's theorem. Mathematical Social Sciences, 46, 21-25 (2003) 8. B. Eckwert and I. Zilcha, Economic implications of better information in a dynamic framework. Economic Theory, 24, 561-581 (2004) 9. I. Gilboa and E. Lehrer, The value of information - an axiomatic approach. Journal of Mathematical Economics, 20, 443-459 (1991) 10. O. Gossner, Comparison of information structures. Games and Economic Behavior, 30, 44-63 (2000) 11. C.E. Shannon, A mathematical theory of communication. Bell System Technical Journal, 27, 379-423 (1948) 12. C.E. Shannon, Communication theory of secrecy systems. Bell System Technical Journal, 28, 656-715 (1949) 13. A.D. Luca and S. Termini, A definition of non-probabilistic entropy in the setting of fuzzy sets theory. Information and Control, 20, 301-312 (1972) 14. C. Rajski, A Metric Space of Discrete Probability Distributions. Information and Control, 4, 371-377 (1961) 15. W.H. Qiu and M. Yan, Improvement of information-decision analysis. Control and Decision, 12, 353-356 (1997) (in Chinese) 16. T. Nakagawa and K. Yasui, Note on reliability of a system complexity considering entropy. Journal of Quality in Maintenance Engineering, 9, 83-91 (2003)
This page is intentionally left blank
PART VIII STOCHASTIC MODELS
This page is intentionally left blank
GREY RELIABILITY ANALYSIS UNDER L1 NORM

RENKUAN GUO
Department of Statistical Sciences, University of Cape Town, Private Bag, Rondebosch 7700, Cape Town, South Africa

YANGHONG CUI
Department of Statistical Sciences, University of Cape Town, Private Bag, Rondebosch 7700, Cape Town, South Africa
Grey theory, initiated by Deng (1982), is a mathematical branch dealing with system dynamics under sparse data availability. Grey reliability analysis is thus advantageous because of its small sample size requirements; for example, the first-order one-variable grey differential equation needs as few as four data points. However, grey estimation of the state dynamic law uses the least-squares approach, i.e., parameter estimation under the L2 norm. Problems associated with L2-norm grey estimation concern the model accuracy specifications. The L2-norm grey modeling community often borrows model fitting criteria from statistical linear model analysis, for example using the mean sum of squared errors as a fitting criterion and even attaching probability bounds to it; these exercises are controversial. In numerical analysis and approximation theory, relative error is a standard approximation criterion, and the L2-norm grey modeling community also uses relative error as a model accuracy measure. In this paper, we propose an L1-norm based grey modeling approach and search for the grey parameters using the simplex technique of linear programming. We briefly discuss grey reliability analysis under L1-norm based grey state dynamics.
1. Introduction

In reliability engineering modeling, or more specifically in repairable system modeling, most reliability engineers use maximum likelihood theory for empirical analysis, which is in nature a large-sample asymptotic theory for (asymptotic) confidence intervals and hypothesis testing. Nevertheless, we should be aware of small-sample asymptotics developments in the statistical literature, for example Field and Ronchetti (1990, 1991), which facilitate approximations to the asymptotic distribution of a quantity of interest that models the true state of a system. This approximation still follows the probabilistic route: once the distribution of the system state is available, the information about the state is fully available. The real aim of modeling is actually to find the dynamic law of the system state. The probabilistic route is therefore only one of the possible choices; there are others, for example fuzzy thinking, rough sets thinking, grey thinking, or other logics rooted in modern approximation
646
theory. Grey thinking is an approximation methodology aimed at directly revealing the system state dynamics (without prior probabilistic distribution assumptions). However, grey estimation of the state dynamic law uses the least-squares approach, i.e., parameter estimation under the L2 norm. Problems associated with L2-norm grey estimation may arise from the model accuracy specifications. The L2-norm grey modeling community often borrows model fitting criteria from statistical linear model analysis, for example using the mean sum of squared errors as a fitting criterion and even attaching probability bounds to it; these exercises are controversial. In numerical analysis and approximation theory, relative error is a standard approximation criterion, and the L2-norm grey modeling community also uses relative error as a model accuracy measure. In this paper, we propose an L1-norm based grey modeling approach, search for the grey parameters using the simplex technique of linear programming, and finally explore the L1-norm based grey approximation of individual repair improvements.

2. A Review of the L2-Normed GM(1,1) Model

The most important grey differential equation model is the first-order one-variable (equal-spaced) grey differential equation model, abbreviated the GM(1,1) model, which is efficient and has high predictive capacity. Given a (positive) discrete data sequence $X^{(0)} = \{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\}$, equation (1) is called the GM(1,1) model with respect to the observation sequence $X^{(0)}$ (Liu et al., 2004):

$$x^{(0)}(k) + a\, z^{(1)}(k) = \beta, \qquad k = 2, 3, \ldots, n, \qquad (1)$$
where the background value z^' (#) are generated by MEAN operator, z(l) (*) = MEAN(* 0) (*)) = 0.5(*(1) (k) + x{,) (k-1))
(2)
and jr (k) are generated by AGO operator, x(1)(/t) = A G O ( X ( 0 ) ) t = t ^ ( 0 , ( 0 ^
= 1 2
' '-'"
(3)
Using least-square approach (L2-nprm), the parameters a and ft can be estimated and denoted by a and J3 respectively. The filtering-predictive equation takes the discrete version of solution (with GM(1,1) least-square estimated parameter-values)
647
x{t)(k) = (x{l)(l)-fi/a)exp(-a(k-l))
+
fi/a
(4)
which is the solution to the differential equation with estimated parameters a
and J3 dx{,)/dt + ax{l)=J3 It [X^f
is
noticed
that
the
(5)
observational
data
sequence
is
= f p ° ) ( l ) J f a , [ ^ ( 2 ) ] * , . . . , [ ^ 0 » ( « ) J i 1 . The system dynamics is
assumed to be described by a first order linear (constant coefficient inhomogeneous) differential equation (5), and the theoretical system dynamics is described by f dx{i)/dt + ax{l) =/? (6)
\[d^ydt]it=x^{i) Then solution to (6) is
*<'>(0=f*W(l)-^exp(-a(,-l)) + £
(7)
a
which is called the whitenization solution to the grey differential equation GM(1,1) defined by Equation (1). Because the integrated level data is not extracted from system directly, rather created in accordance with some criteria, for example, AGO in Equation (3) and MEAN in Equation (2) proposed by Deng (1985), a natural question arises here what is the appropriate composition of jz' 1 ' (k), k = 2,3,••-,«} such that the integrated observation data sequence \x^'{k),k = 1,2, ••-,«} can be formed and lead to an optimal statistical estimation with respect to parameter a , j3, and the initial condition y. Deng (1985) argued that the only candidate at the level x (t) should be AGO based {z{l](k), k = 2 , 3 , - , / j ) , wherez(]](k) = 0.5x(x(1)(jfc) + x{>)(k-1)), while others, for example, Fan et al (2003) argued that the weight factor , denoted by co , between *'''(&) and r ' { k - l ) should not be predetermined rather than determined by the optimization procedure. In other words, the candidate, z{l](k) = cox{,\k) + (l-co)xi,}(k-\),
a,e[0,1]
should be considered and thus the L2-estimation estimation will be:
(8)
648
and the function x'1' (z) is given by (12): Then, it is required to solve the following nonlinear equation system for the parameters a,fi, and co :
Z^l[zo)(0-,o»(,)] =0 •
| ^ [ 2 ( , ) 0 ) - A 0 ] =o
do)
Mathematically, it is obvious that the optimal choice of a> is not necessary to choose as 0.5. Let us examine an example. Given a discrete data sequence X = {2.874, 3.278, 3.337, 3.390, 3.679}. We perform the parameter searching for two cases: Model (1): modeling with equal-spaced GM(1,1) model and Model (2): modeling based on Equ. 13-15 in terms of genetic algorithm. (In both models, we use x1'' (l) = 2.874 as initial values although this is debatable.) Table 1. The weight CO impact in GM( 1,1) models under L2-norm. Model (1) (2)
CC -0.03720 -0.04870
p 3.06536 2.91586
CO 0.5 (Deng) 0.048645
Squared error 0.006929752 0.002342187
It is obvious that in Model (2) a genetic algorithm-searched weight co = 0.0486454, This indicates that w is not necessarily 0.5 claimed by Deng [11-14] but it is a data-dependent parameter. Also the Model (2) gives much small squared error at xr -level, i.e., •/'' is minimized in the Model (2). However, if we think the computation convenience of GM(1,1) model which can carry on Excel, we would still appreciate the superiority of Model (1)GM(1,1) proposed by Deng [11].
649 3. An Investigation on L r Normed GM(1,1) Model Li-normed optimization was used in statistics with long history, particularly, in robust statistics. If we re-write Eq. (I) in the form x{0)(k) = /3 + a(-z{])(k)),k
=
2,3,-,n
en)
Then the L r normed formulation (L r Model I) will be
satk,(*)-[/'+«H,(*))]
(12)
We put a> as parameter searching list as Eq. 13 defined. The constraints for the L i-optimization problem (18) should be -2/(n + l)ia<2/(n 0
+ l)
(13)
where n is the sample size under consideration. Then let us write '< =x{0){k)-\/3
+ a(-z{l){k))\
ifx<0) ( * ) - [ > + a (-z ( , ) (*))' >0
+ a (-z(1) (*))]
if x(0) ( * ) - [ / ? + a (-z (l) (*))' <0
(14) e;
= XW (k)-\/3
The L] problem becomes n
mmY(el+e'k) s.t.
(15)
- 2 / ( « + l ) < a < 2 / ( « + l) 0<
min V j r ' f f c )
*M(ft)-(/J + ap>(ft))) x(0](k)
(16)
650
The term x (0, (Jfc)-(/3+a(-z (l, (*)]j) /x{0](k) is just the relative error contributed by /'* fitting. Therefore, the object function in L r Model I is a weighted sum of the (absolute-valued) relative errors. If we drop the weight term x(0) (k), then we have the L]-Model II formulation
=l5%)(e;"er)
(n>
which will gives the parameters (a,/3,u;) minimizing the sum of the (absolutevalued) relative errors. Li, Yamaguchi and Nagai (2005) argued that there are certain illogical treatments on the initial value, ;c'(0) , the derivative (or difference) approximation, atr'(t)/dt, t = k — 0,1,• • •,n , and the background value z — x^'(t) in classical grey theory proposed by Deng [11]. They proposed an approach in terms of cubic Hermite spline function for resolving these three problems. The Hermite interpolation function is #
( * ) = y,.Ft (t) + ytF ( / ) + h [y ' , G ( / ) + y \ G ( r ) ]
(18)
where (F 0 (,)=(W 2 )(1 + 20,/=|(0=< 2 (3-20
|G0(/HI-O2,
G,{ty-t\\-t)
(
}
with — , ^j_i <X<xi
(20)
X is called an interpolation point and ht is called an interpolation interval. The Hermite cubic spline function satisfies the following restrictions lim H\ (X) = lim //, (x)
Hl{xQ) = Hl{xn) = 0 Let yt = x(l) (/) and h,. = 1, and y] — dx^ (i)/dt, then following the results derived in Li et al (2005)
Y' = A~XB
(22)
651 where
r = [y*>y\>—>y,\
A=
b' =
2
1
0
•• 0
0
1/2 0
2 1/2
1/2 2
••0
0
•• 0
0
0
0
0
2
1/2
0
0
0 - • 1
(23)
2
[b0,b„...,bnf
with t>0 = , 3
3{yi-y0) U-M-JVI). i = l,2,-,»-l
(24)
*. = 3 ( 3 ' . - ^ . , ) Notice that >>,' (/ = 0,1,2, • • •, n) are the values of the derivative function of the cubic Hermite spline function Hit)
at interpolation points, i.e., y\ = H\ (/')
and the interpolation points of yt (/' = 0,1,2,• • •, n) are setting to y, = j r ' (/') = H3 (i), i.e., yt = r 1 ' (/) is just the background value used in L r optimization for parameter (a,(3). The parameters is dropped because the background value z is no longer obtained by weighing the adjacent 1-AGO values j r ' {k — l) and jr1' (k) proposed by Deng (1985). Then the Hermite cubic spline function based L]-formulation of GM(1,1) model is (abbreviated as 3HSPL L,-Model)
™pEK-|>+a(->'/)J
(25)
4. Evaluation of Individual Repair Improvement with Li-Normed M(l,l) Model Guo began his efforts in applying grey theory to repair efforts modeling since 2004. The motivation of grey theory modeling lies on two aspects. On the one hands, it facilitates a structure for sparse data modeling. On the other hand, grey modeling aims to provide individual estimated effect for each repair or PM (not
652
in statistical sense) directly. This is different from probabilistic or fuzzy modeling where the estimates are in average or cut-level sense. The grey estimate itself still contains intrinsic uncertainty. Although the individual estimated effect of repair is imprecise and not unique (as a matter of facts, it is a whitenization of the grey interval number) the repair effect does not describe the underlying mechanism (system changes under repair or PM) in statistical or probabilistic sense. In practices, the grey individual repair effect estimate provides reliability engineers or management the information for decisionmaking on production and maintenance planning. The key step in grey approximation to the individual repair effect is the partition of system functioning time into a system intrinsic functioning time and the repair effect (measured by time). This partition is carried on by GM(1,1) model. In other words, once the /,, -estimate of (a,/3) is obtained, we can use
^(k) = (^(0)-p/a)mv(-a(k)) iw(k) =
+
p/a
i{'){k)-x{'\k-l)
for obtaining the intrinsic functioning time sequence JC(0) = {xm(\),xi0)(2),-,xl0)(n)} which in nature the filtered values of x{0] under L, -norm. This is essentially the same as that under L2 -norm. For more details about L2 -normed treatments, see Guo (2005a-2005c), Guo and Love (2005). 5. Concluding Remarks In this paper, we reviewed L2-Normed GM(1,1) model and then we develop a set of Li-Normed GM(1,1) models. The grey reliability analysis under L r Norm is briefly discussed and we will carry data analysis in future research. Acknowledgment The author deeply thank Professors Nagai, Yamaguchi and Dr Li for kindly clarifying his understandings on their paper [12] in our communication e-mails. References 1. 2.
J. L. Deng. Control Problems of Grey Systems, Systems and Control Letters 1(6), March (1982) J. L. Deng, Grey Systems (Social • Economical), The Publishing House of Defense Industry, Beijing, (1985) (in Chinese)
653
3. 4. 5. 6.
7.
8.
9. 10.
11.
12.
13.
14. 15.
J. L. Deng, Grey Control System, Huazhong University of Technology Press, Wuhan City, China (1993), (In Chinese) J.L. Deng, The Foundation of Grey Theory, Huazhong University of Technology Press, Wuhan City, China (2002a) (in Chinese) J.L. Deng, Grey Prediction and Decision, Huazhong University of Technology Press, Wuhan City, China (2002b) (in Chinese) C.A. Field and Ronchetti, E., Small Sample Asymptotics, Institute of Mathematical Statistics, Lecture Notes-Monographs Series 13, USA, (1990) C.A. Field and Ronchetti, E., An Overview of Small Sample Asymptotics, in: W. Stahel, S. Weisberg (eds.), Directions in Robust Statistics and Diagnostics, Part I, Springer-Verlag, New York (1991) X.H. Fan, Q. M. Miao, and H.M. Wang, The Improved Grey Prediction GM(1,1) Model and Its Application, Journal of Armored Force Engineering Institute 17(2), 21-23 (2003) (in Chinese) R. Guo, Repairable System Modeling Via Grey Differential Equations, Journal of Grey System 8(1), 69-91 (2005a) R. Guo, A Repairable System Modeling by Combining Grey System Theory with Interval-Valued Fuzzy Set Theory, International Journal of Reliability, Quality and Safety Engineering 12(3), 241-266 (2005b) R. Guo, and C. E. Love, Fuzzy Set-Valued and Grey Filtering Statistical Inferences on a System Operation Data, International Journal of Quality in Maintenance Engineering - Advanced reliability modeling 11(3), 267-278 (2005c) G.D. Li, D. Yamaguchi, and M. Nagai, " New Methods and Accuracy Improvement of GM According to Laplace Transform", Journal of Grey System 8(1), 13-26(2005). S.F. Liu, Y. G. Dang, and Z. G. Fang, Grey System Theory and Applications, The Publishing House of Science, Beijing, China (in Chinese) (2004) S.F. Liu and Y. Lin, Grey Information, Springer, London (2006) K.L. Wen, Grey Systems: Modeling and Prediction, Yang's Scientific Research Institute, Taiwan (2004)
STATISTICAL ANALYSIS OF INTENSITY FOR A QUEUEING SYSTEM JAU-CHUAN KE Department of Statistics National Taichung Institute of Technology,
Taiwan
YUNN-KUANG CHU Department of Statistics National Taichung Institute of Technology,
Taiwan
Traffic intensity is an important measure for assessing performance of a queueing system. In this paper, we propose a consistent and asymptotically normal estimator of intensity for a queueing system with distribution-free interarrival and service times. Using this estimator and its associated estimated variance, a 100(l-a)% asymptotical confidence interval of intensity is obtained. A numerical simulation study is conducted to demonstrate performance of the proposed estimator and its associated estimated variance applied to interval estimations of intensity for a queueing system.
1. Introduction Traffic intensity (intensity) plays an important role in queueing models. It is defined as the ratio P=~, (1) M where 1/ X represents mean interarrival time and 1/ /u denotes mean service time, p can be interpreted as expected number of arrivals per mean service time in the limit, an important parameter, also called utilization factor that measures the average use of the service facility (see Gross and Harris [1]). The traffic intensity is a dimensionless quantity, but in teletraffic theory it is often quoted in 'erlangs', in honor of the pioneering effort of A.K. Erlang. This paper is aimed to propose statistical inference for the intensity p of a queueing system with distribution-free interarrival and service times. In section 2, we prove that the natural estimator p of p is strongly consistent and asymptotically normal. Based on the asymptotical normality of p , we can construct a 100(l-a)% confidence interval for p. A comprehensive simulation study is conducted in Section 3 to 654
655 demonstrate performance of p applied to interval estimation. Simulation results are shown by appropriate curves for illustrating performance of p in interval estimations. 2. Nonparametric Statistical Inference of Intensity Let X and Y represent the interarrival and service times of a queueing system, respectively. Then the intensity of the system is defined by P - ^ , Mx
(2)
where fix and fiy denote the mean interarrival time and mean service time of the system, respectively. The definition (2) is equivalent to definition (1). 2.1. Estimating Intensity Assume that X\, X2, ..., X„ is a random sample of Xand Yx, Y2, ..., Y„ is a random sample of Y. We use ( Xb Yt) to represent interarrival time and service time for the rth customer of a queueing system. Define X and Y to be the sample means ofXs and Ys, respectively. According to the Strong Law of Large Numbers (see Roussas [2], p. 196), we know that X and Y are strongly consistent estimator of fix and fiy, respectively. Thus a strongly consistent estimator of intensity p is given by p = YIX.
(3)
In practical queueing systems, the true distributions of X and Y are seldom known, so the exact distribution of p cannot be derived. But under the assumption that X and Y are independent, the asymptotical distribution of p can be developed as the following procedures. Firstly, according to the Central Limit Theorem (see Hogg & Craig [3]), we have ^(X-MX)^^N(0,
(4)
and ^i(Y-Mr)—^>N(0,(Tr),
(5)
where <JX and oy are variances of X and ^respectively. Next note that
J7i{p p)
^"^x(r
~//y}~Jly(-x~Mx)] Mx*
(6)
656
Therefore by the Slutsky's theorem (see Hogg & Craig [3]), we get •yfc(p-p)—°-+N(0,
(7)
where a2 = (/4°"y + MWX)/ f*x • Now, set &2=(X2S2+Y2S2X)/X4, where S2X =-£(X,
-X)2 and S2 = -i(Yi-Y)2
n 1=1
(8) .
n 1=1
Then
+ zal2&lJn)
Consequently, an approximate 100(1-a )% confidence interval of p is {p-zano14n,p + zall614n). 3.
(10)
Simulation Study
We simulate the confidence interval p + za/2<J/yln as follows. The levels of (A, ju) considered in the simulation process are set to (0.1, 1), (0.5, 1), and (0.9, 1) so that the intensity p is low (p=0.1), moderate (p=0.5), and high (/)=0.9), respectively. For each level of (A,//) , random samples of interarrival times ( xx,Xj,...,xn ) and service times ( yuy2,...,yn ) are drawn from X and Y, respectively. Then the estimate p and its associated variance a~ In are computed. The 90% confidence interval of p is given as yO + z 005
657 The simulation is replicated JV=1000 times and we record the fraction of times that the 90% confidence interval contains the true intensity p, which is called the coverage percentage. This process is repeated for ra=10, 20, ..., 1000. Since the number of confidence intervals containing the true intensity p follows a binomial distribution with A/=1000 and/?=0.9, the 99% confidence interval for the coverage percentage itself is 0.9+ 2.576^0.9(1 -0.9)/1000 = 0.9 + 0.0244, or (0.876,0.924). The simulation results for the performance of p in terms of confidence interval are presented from Figure 1 to Figure 3. Under each simulated queueing system, the coverage percentages of 90% confidence interval corresponding the three kind of intensity (low, moderate, and high) are plotted versus n from 10 to 1000. All three curves almost appear in the 99% confidence interval (0.876, 0.924) and fluctuate along the nominated 90% level provided that n reaches large enough (n > 50). On the other hand in Figure 4 (p=0.5), evaluating the chance of the three curves inside the 99% confidence interval, it appears that ordering these curves by their relative performance on coverage fraction produces E 4 /Hf /1 > M/E4/l > E4/M/l. coverage percentage 0.93 0.9: 0.92 0.91 0.9
• Hi"! IMS ;:".?5ti ; I'a < w. uiits i ;>Vi\'" ^ 3 ii ! ? '•'.• ? i '; 3i;S;'.iFi*1S! HH% •] "\t\ ff.jl :ihS | i : T~'- f- :H: '- P. xi%i Viisn/'f-i?/;
0.89 t
f?T * W ill' \ I W W 'I ti
:
' i
\! i
i'"j \'i
"i \\
!
ill
i* if
0.88 0.87|& 0.87 =0.1
0.86
=0.5 P =0.9
0.85
M/k JX system _ 0.84 0.83 0
100
200
300
400
500 n
600
700
800
900
1000
Figure 1. Coverage fraction of 90% confidence interval for different values of n and p .
658 coverage percentage 0.94 0 924 0.92
kmm m* II
.A U
0.9
fW.'
0..87D
p-#); —
0.86
-
p =0.1 -0.5
-. p
0.84
= o g
Ei/M/1 system 0.82
J 100
0
I 200
I 300
I 400
I 500 n
L 600
700
800
900
1000
Figure 2. Coverage fraction of 90% confidence interval for different values of n and p . coverage percentage 0.96
I
|
0.94
!
"W pi-.
.......
4
1 , :
0.9
|
!
0.9: /\ 0.92
•r
=.
i.
•
V\/-.| •
'•
'-
.........
— . — --
...
0.86
0.84
rh ;
i
i
;
i"
I
;
P
=o.p
p -0.5
!
:
0.82
r
........... p _Q g E.VH90/1.svstBrrL _ 4
4
"'"
i 0
100
200
300
400
500 n
700
800
900
1000
Figure 3. Coverage fraction of 90% confidence interval for different values of n and p
659 coverage percentage 0.94 i r-
0.9: 0.92 i
0.9
,1,
i f xii '(
i 0.87^
0.86
M/E / I system E /M/1 system
0.84
——
0.82 0
J 100
l 200
l 300
L 400
500 n
600
700
E4/HP°/1 system
J 800
l_ 900
1000
Figure 4. Coverage fraction of 90% confidence interval for different values of n ( p = 0.5 )
4.
Conclusions
This paper provides the interval estimations for the intensity p of a queueing system with distribution-free interarrival and service times. We show that the natural estimator p is strongly consistent and asymptotically normal with approximate variance a11 n . References D. Gross and C. M. Harris, Fundamentals of Queueing Theory 3"' Ed., NewYork, John Wiley, (1998). G. G. Roussas, A course in Mathematical Statistics. 2nd edition, Academic Press, (1997). R.V. Hogg and A.T. Craig, Introduction to Mathematical Statistics, Prentice-Hall, Inc., (1995).
APPLICATION FOR MARKET IMPACT OF STOCK PRICE USING A CUMULATIVE DAMAGE MODEL
SYOUJI NAKAMURA, MIWAKO ARAFUKA Department of Human Life and Information, Kinjo Gakuin University, 1723 Omori 2-chome, Moriyama-ku, Nagoya 463-8521, Japan E-mail: snakam@kinjo-u. ac.jp;arafuka@kinjo-u. ac.jp TOSHIO NAKAGAWA Faculty of Management and Information Science, Aichi Institute of Technology, 1247 Yachigusa, Yagusa-cho, Toyota 470-0392, Japan E-mail: [email protected] HITOSHI KONDO Faculty of Economics, Nanzan University, 18 Yamazato-cho, Showa-ku, Nagoya 466-8673, Japan E-mail: [email protected] This paper considers the problem of maximizing an expected liquidation profit of holdings, when the market impact of stock price is caused by the holdings sell-off. The cumulative damage model is applied to the fluctuations of stock price. We derive and analytically discusse an optimal sell-off interval of holdings to maximaize the expected liquidation profit of holdings.
1. I n t r o d u c t i o n When we have to sell off security holdings in a short term on the market, we need to consider a liquidation policy which maximizes the total amount of security holdings in consideration of their influence for the market price. We consider the following two stochastic models of liquidation policies for the security holdings: The security holdings 5 are sold off by one time or by deviding them into n blocks, S/n is sold off. Then, the market price decreases along with the amount of disposition according to an impact function. In addition, the market price also decreases from the supply-demand relation in the market, as the accumulation of selling orders increases gradually. But, the influence degree of the price becomes lowered if the security 660
661 holdings are broken down into small blocks, however, the dealing cost gradually increases. Conversely, if the security holding are sold off by dividing roughly, the market price greatly falls, however, the dealing cost decreases. That is, the disposition lot and market price of the security holdings have a trade-off relation. Another assumption is that the stock prices rise by selling off the stocks. In addition, the market impact to which stock prices drop sharply is assumed when it reaches the threshold price. In general, it is not easy to formulate the stochastic model of market impact because it depends on various factors. A consensus of a liquidation policy for security holdings has not been obtained yet, although various approaches in academic and business fields have been made. In this paper, we consider the market impact when the security holdings are sold off on the market, and derive analytically an optimal liquidation policy which maximizes its total amount. 2. Assumption of Model The market is composed as follows: The market price changes only according to this assumption. The following notations are used: So: Nominal value of security holdings at time 0. E(n): Real value of security holdings at time n. Co: Constant dealing cost per transaction. A: Parameter of market impact function. H'. Parameter of price restoration function.
Tablel: Relation between amount of contract and transaction fee. Amount of contract (yen) 1,000,000 2,000,000 5,000,000 10,000,000 20,000,000 30,000,000
Transaction fee (yen) 12,495 21,420 47,145 82,845 140,595 198,345
662
3. Model 1 It is assumed that the security holdings SQ at time 0 have to be sold off in a certain limit time. As liquidation methods, the security holdings S are sold off by one time or S/n sold off by dividing into n every time. Then, the market price decreases along with the amount of disposition according to an impact function. After a certain time has passed, the security price recovers to the price before the previous time. In addition, the market price also decreases from the supply-demand relation in the market, as the accumulation of selling orders increases gradually. But, the influence degree of the price becomes lowered if the security holdings are broken down into small blocks, however, the dealing cost gradually increases. Conversely, if the security holding are sold off by dividing roughly, the market price greatly falls, however, the dealing cost decreases (Table 1). When the security holdings So/n are sold off on the market, the amount of liquidation decreases exponentially, and is given by {So/n)(l — e~Xn/s°). Further, letting CQ be a dealing cost of transaction, we evaluate the present values of all costs by using a discount rate a(0 < a < oo). In addition, we consider the restoration function for an increasing market price rate and the decreasing market price rate by the market impact. The amount of liquidation of transaction at time 0 is ^(l_e-A„/5
0 )
_
C o
.
(1)
n This liquidation is restored exponentially, and is given by Q\ = (So/n)(l — T e-Xn/So }ev . Then, the amount of liquidation of transaction at time T is •{Q1(l-e-V«i)_Cd}e-°T Xn s
(2)
lT
the amount of liqui-
{Q2(l-e-A/«*)-co}e-Q2r.
(3)
Further, letting denote Q2 = (5o/n)(l - e~ l °)e> , dation at time T% is
Repeating the above procedures, we have generally Qj+1 = Qj(l - e-x^)e»T
(j = 0 , . . . ,n - 2),
Qo^^.
(4)
n Hence, the amount of n times liquidation of transaction is Ei(n) = £ 3=0
{Qj(l - e " ^ ) - c 0 } e~^T,
(5)
663
Thus, substituting (4) into (5), the expected total amount of liquidation per unit of security holdings is
Zi(n)=^p3.1. Numerical
Example
(n = l,2,...).
(6)
of Model 1
In Table 1, we compute the optimal number n* numerically when SQ = 108, A = 150,000,000, a = 0.01 and a transaction cost is c0 = S0 x 0.002625 + 119595.0. For example, when // = 0.006,n* = 35 blocks and E3(n") = 93,289.357.
°
Tune
Figure 1.
Market impact for Model 1
4. Model 2 The stock is sold off for T period, and its prices is assumed to rise gradually. However, as the market impact, the price drops sharply when it reaches the threshold price. The cumulative damage model[4] is applied to this model by replacing the change of stock prices with damage. When the clearance of the stock is continuously exercised, the stock price Z(0) = ZQ at time 0 is assumed to rise to Z(t) = at + zo(a > 0). When the stock price reaches a threshold K, it drops sharply and becomes
664
Table 2 Total amount of liquidation of security holdings So = 100,000,000, A = 150,000,000 and a = 0.01. n* 33 34 35 36 38 39
M 0.0002 0.0004 0.0006 0.0008 0.0010 0.0012
£i (n*) (yen) 92,943,840 92,647,601 93,289,357 93,949,124 94,638,556 95,367,121
0 (Figure 2). In this case, a threshold K is a random variable with a distribution function K(x). The probability that the stock price drops sharply at time jT is denoted by Pr{jaT + ZQ > K} = K(jaT + ZQ), and the probability that the stock price drops sharply at time nT is n
j —1
J2K(JaT
+ zQ)l[K(iaT
j=i
+ zQ),
(7)
»=i
where K = 1 - K. Conversely, the probability that the stock prices does not drop sharply until time nT is
l[K(jaT + z0).
(8)
3=1
It is clearly shown that (7) + (8) = 1. Similarly, the mean time to the clearance of the stock is 3-1
n
Y^UT)K(jaT
n
+ z0) J} K(iaT + z0) + (nT) J J K(jaT + z0) j=0
j=\
n-l
j=l
j
l[K(iaT+z0) ]=0
(9)
.i=0
Then, the mean time until the stock price crash is »
3
l[K(iaT
+ zo)
(10)
3 =0 .t=0
The amout of stock is divided equally in n and is sold off by time nT(n = 1 , 2 , . . . ) . That is, the stock So/n at each jT(j = 1,2,... )are sold
665 off and stock price in jT is jaT + ZQ. In addition, when the stock price drops sharply, the amount of the clearance of the stocks is assumed to be 0. The expected liquidation of So at time nT is c
i
"•
P
J-1
+ z0) Y[ K(iaT + z0) Y,(iaT + zo)
C(n) = ^\Y/K(JaT n
_ 1
' 3= 1
U=l
i=l
n
+ 11 K(jaT + zo) ^{iaT
= — T^iJaT
+ z0)
+ zQ)T\R{iaT
+ z0)
(n = l , 2 , . . . ) .
(11)
Thus, the expected liquidation per unit of time for an infinite interval is, from (9) and (11), n
j
aT z
Y,u + o) n CW4
J =
'
n -i
R iaT
( +*>)
i ^
(n = l,2,...).
(12)
j=0 i=0
We seek optimal numbers n* that maximize C(n) and C{n). 4.1. Optimal policy for Model 2 We seek an optimal n* that maximizes C(n) in (11). If C(n) > C(n + 1) then -. n
*
n+1
- Y,(J*T + zo)Kj > — j J2UaT + **)Kh i,e., n
Y,{tiaT
+ z0)Kj - [(n + l)aT + z0}Kn+1}
> 0,
(13)
i=i
where Kj = Ili=oKjiiaT + zo) (j — 1,2,...). Letting L(n) denote the left-side of (13), L(n + 1) - L(n) = (n+ l){[(n + l)aT + z0]Kn+1 - [(n + 2)aT + zQ}Kn+2} =(n+l)Kn¥1{[(n+l)aT+z0}-[(n+2)aT+z0]Kl(n+2)aT+Zo]} =(n + l)Kn+1{K[{n
+ 2)aT + z0] - aT}.
(14)
666
It can be easily seen that K[(n + 2)aT + z0] is strictly increasing to 1 in n. Therefore, we have the following policy: Thus L(n) is also strictly increasing from L(l) or it increases after decreasing once for aT < 1. Therefore, then there exists a finite and unique n* which satisfies (13).
7
10
Figure 2.
4.2. Numerical
Example
13
16
19
22
25
28
31
34
Market impact for Model 2
of Model
2
In paticular, when K{x) = 1 - e~6x, Equation(13) is J2\UaT
+ z0)expl
-[{n+l)aT+z0}expl
\j(j + 1) -aT + zQ) -9
[n + l)(n + 2)
aT+{n+l)z0)
}}>_, (15)
For example, when S = 1, a = 5, z0 = 450, T = 1 and 6 = 4.0 x 10~ 7 in Table 3, the optimal sell-off interval is n* = 42, and the maiximam profit per unit time for an infinite is C(n*) = 13.27. Further, when 6 is large, the optimal sell-off interval is small, and the expected total amount of liquidation C(n*) is low. This reason is that if the probability of stock price drops sharply is large, the risk grows, and hence, the optimal sell-off interval is small.
667
Table 3: Total amount of liquidation of security holdings So = I,a = 5.0 and T = 1.
e 4 x 10-7 8 x 10 -7 12 x 10-7 16 x 10~7 20 x 10-7 24 x 10-7
n* C(n*) 42 13.27 27 19.26 18 27.64 14 34.82 11 43.64 10 47.75
5. Conclusions In analyzing the optimal liquidation behavior, we have made the trade-off between dealing cost and market impact, and considered the optimization problem: This paper has proposed the stochastic model in which the expected cost is quantitatively evaluated to liquidation. We have to formulate the market impact. The mechanism of the market impact is complex and the formulation is not easy. The market impact is different in a disposition of large security holding and a disposition of small of security holding. We have discussed analytically and numerically the optimal division frequency of security holdings which minimizes it.
References 1. Y.Hisata and Y.Yamai, Research Toward the Practical Application of Liquidity Risk Evaluation Methods, Monetary and Economic Studies, Vol.18, No.2, December (2000). 2. R.Oubil, Optimal Liquidation of Large security Holdings in Thin Markets, working paper, January (2002). 3. R.Jarrow and S.Turnbul, Derivative Securities. Thomson Learning Company, (1973). 4. T.Nakagawa, Maintenance Theory of Reliability. Springer, (2005). 5. D.R.Cox, Renewal Theory. John Wiley & Sons Inc, (1962). 6. D.Duffie and K.J.Singleton, Credit Risk. Princeton University Press, (2003).
SOME RESULTS O N A M A R K O V I A N D E T E R I O R A T I N G S Y S T E M W I T H MULTIPLE I M P E R F E C T R E P A I R
N. T A M U R A Department
of Electrical and Electronic Engineering, National Defense Academy, 1-10-20 Hashirimizu, Yokosuka, 239-8686, JAPAN E-mail: [email protected]
This paper considers a system which is inspected equally spaced points in time and whose deterioration follows a discrete time Markov chain with an absorbing state. After each inspection, one of the following actions can be taken: operation, imperfect repair m (1 < m < M) or replacement. When imperfect repair m is taken for the system which has been repaired n times in state i, it moves to state j with probability g^{n + 1). We study an optimal maintenance policy which minimizes the expected total discounted cost for unbounded horizon. It is shown that a generalized control limit policy is optimal under reasonable assumptions. We investigate structural properties of the optimal policy. Furthermore, numerical analysis is conducted to show that these properties could hold under weaker assumptions.
1. Introduction In general, system deteriorates due to usage or age. Since deterioration is not evitable, most system can not remain in a good operating condition and eventually fail without maintenance action. To retain the desirable operating condition, it is essential to determine when and how to perform maintenance actions. To analyze this problem mathematically, various maintenance policies for stochastically failing systems have been widely investigated in the literature. The papers by Cho 1 and Wang 2 are excellent reviews of the area. When we can assume that the operating condition of a system is classified into a finite state, the deterioration process of the system could be described as a Markov process because of the tractability of the resulting mathematical problems. This model is called as Markovian deteriorating system. Derman 3 studied a discrete time Markovian deteriorating system where replacement is the only maintenance action possible, and established 668
669
sufficient conditions on the transition probabilities and the cost functions under which the optimal maintenance policy has a control limit rule. In real situations, however, system could be repaired and it might not be as good as new like replacement. That is, system after completion of repair might be younger and occasionally, it might be worse than before repair because of faulty procedure, e.g., wrong adjustments and bad parts. So, various stochastic models for systems with imperfect repair have been suggested and studied by Lam 4 , Kijima 5 , and Kijima and Nakagawa 6 ' 7 . Pham and Wang 8 provided a survey of recent studies on imperfect repair of two-state systems Focused on Markovian deteriorating system with imperfect repair, Douer and Yechiali9 proposed the idea of a general-degree of repair which is the action from any state to any better state at any time inspection and showed that, under reasonable assumptions, control limit policy holds. Chiang 10 studied a continuous time Markovian deteriorating system where it is costly to identify state through inspection and proposed an algorithm to find an optimal inspection policy. But control limit policy is not derived because of the complexity of the model. These studies don't assume multiple repair actions. In the case of medical treatment for persons who develop cancer, however, there are some options such as surgical treatment, drug treatment and radiation therapy. Then the effect and cost of treatment depend upon the option which is selected for patients and their conditions. In maintenance problem, we can interpret medical treatment as imperfect repair. Hopp and Wu 11 studied a Markovian deteriorating system with multiple maintenance actions. But this study doesn't introduce the idea of imperfect repair. Therefore it is important to pay attention to multiple imperfect repair. We propose a discrete time Markovian deteriorating system for which one can select replacement or one of M kinds of imperfect repair as maintenance action. It is shown that a generalized control limit policy holds under some reasonable assumptions and several structural properties of an optimal maintenance policy which minimizes a total expected discounted cost are provided. The outline of the paper is as follows. In the next section, the stochastic model is described in detail. In section 3, we derive a total expected discounted cost for unbounded horizon. Section 4 investigates several properties of an optimal maintenance policy. In section 5, numerical results are provided. Finally, some conclusions are drawn in section 6.
670
2. Model Description We propose the following model. The function of a system deteriorates with time, and the grade of deterioration is classified as one of N + 1 discrete states, 0, • • •, iV -f-1, in the order of increasing deterioration. State 0 represents the process before any deterioration takes place, that is, it is an initial new state of the system, whereas state N represents a failure state. The intermediates states 1, •••, N — 1 are deterioration states. The system is inspected at equally spaced points in time. Then the true state of the system is certainly identified. These states are assumed to constitute a discrete time Markov chain with an absorbing state. After each observation, we can select one of the following action. (1) action 1: we continue to operate the system until the next time. (2) action 2: we repair the system and operate it for one period. (3) action 3: we replace the system with a new one and operate it for one period. When the system in state i is operated, it moves to state j with probability pij at the next time. We assume that there are M kinds of imperfect repair. So, if repair m is performed for the system which has been repaired n — 1 times in state i, then it moves to state j with probability q^{n) after completion of repair m. We call q^{n) as repair probability. When replacement is selected for the system in state i, it becomes new, that is, the system moves to state 0 without fail. The time to replace or repair the system is assumed to be negligible. For these probabilities, we impose the following assumptions. In this paper, the term "increasing" means "nondecreasing"and "decreasing" means "nonincreasing", respectively. Assumption 1. For any h, YLj=hPij *s increasing in i. Assumption 2. For any h, m, n, Ylj=h ^Tji12)
JS
increasing in i.
Assumption 1 means that as the system deteriorates, it is more likely to make a transition to worse states. Assumption 2 implies that as the system deteriorates, it is less likely to be repaired to better states. Also, we assume that these probabilities have the next property. Assumption 3 . For any h, m, n, Ylj=h \Pij ~~ ^2i=o Qii(n)Pij) ing in i.
JS
increas-
Assumption 3 indicates that as the system deteriorates, the system which is operated until the next time is more likely to move to worse states
671 in comparison with the system which is operated until the next time after completion of imperfect repair. Furthermore, we assume that g£?(n) has the following property. Assumption 4. For any i, h, m, ^2j=hQij(n)
«* increasing in n.
The above property means that as the system undergoes imperfect repair, it is more likely to be repaired to worse states. The cost structure of this system is as follows: expected operation cost for the system in state i Ui repair cost when repair m is conducted for the system in state i ' i replacement cost for the system in state i These costs satisfy the next properties. Assumption 5. For any m, Ui, r™ and Ci are increasing in i. Assumption 6. For any m, Ui - Ci,
(1)
«< ~ ( r r + I>J?(")«i J ,
(2)
r? ~ Ci
(3)
are increasing in i. Assumption 5 means that as the system deteriorates, it is more costly to operate, repair or replace the system. Assumption 6 states that as the system deteriorates, in Eqs.(l) and (2), the merit of repair or replacement becomes bigger than that of operation, and in Eq.(3), the merit of replacement becomes bigger than that of replacement. 3. Mathematical Formulation In this section, we derive a total expected discounted cost for unbounded horizon. Let V(i,n) be the total expected discounted cost for unbounded horizon when the system which has been repaired n times stays in state i and an optimal maintenance policy is employed. We denote a discount factor by /? (0 < j3 < 1). Let Ha{i,n) be the total discounted cost when the system which has been repaired n times stays in state i and action o is selected. We denote by D(i,n) an optimal action for the system which has been repaired n times in state i. Then D(i,n) = 2„, indicates that an optimal action is to operate the system after completion of repair m (m = 1, • • • , M). We write the action that one of M kinds of repair is taken before operation as D(i,n) = 2.
672
By using the theory of Markovian decision process, Hi(i,n)
is given by
N
H1(i,n) = ui + pYlpijV(j,n).
(4)
3=0
Similarly, we get H2m(i,n)
and H3(i,n) N
as follows. (
N
Him{i,n) = »? + 5 > 3 > + 1) } Ul + ^PijV^n i=o
(
+ 1)
(5)
i=o
N
Hz(i,n)
= Ci + wo +
P^2p0jV(j,n).
Then V(i,n) is obtained by the following recursive equation. V(i,n) = min Hi(i,n) mm H2m (i, n), H3 (i, n)
(6)
(7)
Km<M
4. Structural Properties of an Optimal Maintenance Policy We derive several properties of an optimal maintenance policy which minimizes a total expected discounted cost for unbounded horizon. First, we can show that V(i,n) have the following property. Theorem 4.1. V(i, n) is increasing in i for any n, and increasing in n for any i. From this result, theorem 4.2 is derived. Theorem 4.2. There exist the states in and in for any n such that 1 for 0 < h < tn, 2 for in < h < in,
{
(8)
3 for 'in < h < N, where 0 < in < in < N + 1. Next, we examine properties of optimal maintenance policy under the following assumptions. Assumption 7. For any m, (i) r™ — rj™+1 is decreasing in i (ii) r™ — r™ +1 is increasing in i Assumption 8. For any h,, m, n, 0) Y^=h(lTj(n) n
00 12j=h(lij( )
~ qTj+l{n)) ~ Qij
n
( ))
is
decreasing in i.
is
increasing in i.
Then the optimal maintenance policy has the following structure. Theorem 4.3. For i and i' such that i < i1,
673
(i) if assumption 7-(i) and 8-(i) hold, then m>m' when D(i,n) = 2 m and D(i',n) — 2m<. (ii) if assumption 7-(ii) and 8-(ii) hold, then m < m' when D(i,n) = 2TO and D(i',n) = 2m>. Theorem 4.3 states that the optimal maintenance policy can be characterized by M + 2 regions at most. In the above theorem, assumption 7-(i) and 8-(i) state that as the system deteriorates, the merit of repair m becomes bigger in comparison with that of repair m + 1. Therefore, theorem 4.3-(i) is intuitively valid because it is appropriate to select smaller m with deterioration. Theorem 4.3-(ii) has the same interpretation as theorem 4.3-(i). As far, we consider that the system might be repaired to worse states than state immediately before repair. Next, we focus on the situation that the system is certainly repaired to better states before repair without fail. Corollary 1. Assume that J2)=h9ij(n) and i
*s increasing in m for any
i,h,n
N
Y^Cjin) = i> J2 « « ( " ) = ° f°r j=0
an
y »' m . n -
j=i+l
And, if (i) r™ — r™ +1 is decreasing in i, (») E}=h(9ij ( n ) -
when D(i,n) = 2 m and D(i',n) = 2m< where
Next, the following assumption is imposed to investigate the properties of an optimal maintenance policy with respect to the number of repairs. Assumption 9. For any h„ m, i, (i) ^j=h{
- q%+1(n)) is decreasing in n.
(") YJj=h(l?j(n)
~ 9ij+1(n))
is
increasing in n.
Then theorem 4.4 is obtained. T h e o r e m 4.4. For n and n' such that n
(i) if assumption 9-(i) holds, then m < m' when D(i,n) = 2 m and D(i,n') = 2m,, (ii) if assumption 9-(ii) holds, then m > ml when D(i,n) = 2 m and D{i,n')=2m,, where 0 < i < N.
674
Similar interpretation as theorem 4.3 holds for the above property. Furthermore, we can obtain an upper bound of the optimal number of repairs. Theorem 4.5. / / assumption 9-(i) holds and there exists n such that Efcotfo/W > c0 + u0 - nr0, or if assumption 9-(ii) holds and there exists n such that J2i=o 1oi( ) > Co + u0 - r0, then n is an upper bound of the optimal number of repairs. By using theorem 4.5, we can find an optimal maintenance policy numerically. 5. Numerical Results In this section, numerical analysis is carried out to show that our theoretical results could hold under weaker assumptions when gt™(n) = q£ for all i, j , m, n and M ~ 2. We consider a 5-state Markovian deteriorating system and define that P = (Pij), Qm = (gg), u = {Ui), r m = ( r f ) , c = (a). Parameters used for numerical analysis is as follows. f0.2 0.0 P = 0.0 0.0
0.2\ / 0 . 9 0.04 0.03 0.4 0.8 0.1 0.04 0.5 , Q 1 = 0.7 0.1 0.1 0.6 0.6 0.15 0.08 V0.5 0.2 0.09 Vo.o o.o o.o o.o 1.0/
Qs
0.2 0.1 0.0 0.0
/0.88 0.77 0.66 0.55 Vo.44
0.2 0.2 0.2 0.0
0.2 0.3 0.3 0.4
0.05 0.11 0.11 0.16 0.2
0.03 0.04 0.1 0.08 0.1
0.02 0.035 0.06 0.07 0.1
0.02 \ 0.045 0.07 0.14 0.16 J
0.02 0.035 0.06 0.07 0.1
/? = 0.99
0.01 \ 0.025 0.04 0.1 0.11 J
(9)
(10)
u = (45.0,55.0,70.0,90.0,115.0), c = (60.0,65.0,70.0,75.0,80.0),(11) r 1 = (13.5,32.5,33.5,43.5,53.5), r 2 = (7.0,18.0,28.5,40.0,51.0), (12) The optimal maintenance policy can be summarized by table 1, where D(i,n) — D(i) and V(i,n) = V(i) for all i, n.
Table 1. i V{i) D(i)
Example of optimal maintenance policy
0 735.71 1
1 2 765.50 784.29 1 22
3 803.38 2]
4 815.71 3
675 From table 1, we find t h a t theorems 4.1, 4.2 and 4.3 hold. Then Eqs.(9) and (10) don't satisfy assumption 3. Also E q s . ( l l ) and (12) don't satisfy assumption 6. These results indicate t h a t the properties obtained by us could hold under weaker assumptions. 6.
Conclusions
We proposed a Markovian deteriorating system where one of M kinds of repair or replacement can be selected as maintenance action and investigated structural properties of an optimal maintenance policy which minimizes an expected total discounted cost for unbounded horizon. It is found t h a t generalized control limit policy holds under reasonable assumptions. Also, we derive some monotonic properties of an optimal maintenance policy under intuitively valid assumptions. These properties are useful to find an optimal maintenance policy numerically. Future work is to obtain the same results under much weaker assumptions. References 1. Cho, D.I., A survey of maintenance models for multi-unit systems, European Journal of Operational Research, 5 1 , 1-23 (1991). 2. Wang, H., A survey of maintenance policies of deteriorating systems, European Journal of Operational Research, 139, 469-489 (2002). 3. Derman, C , On Optimal replacement rules when changes of states are Markovian, In: Mathematical Optimization Techniques (R. Bellman, Ed.), The RAND Corporation, 201-210 (1963). 4. Lam, Y., Geometric processes and replacement problem, Acta Mathematicae Applicatae Sinica, 4, 366-377 (1988). 5. Kijima, M., Some results for repairable systems with general repair, Journal of Applied Probability, 26, 89-102 (1989). 6. Kijima, M. and Nakagawa, T., A cumulative damage shock model with imperfect preventive maintenance, Naval Research Logistics, 38, 145-156 (1991). 7. Kijima, M. and Nakagawa, T., Replacement policies of a shock model with imperfect preventive maintenance, European Journal of Operational Research, 57, 100-110 (1992). 8. Pham, H. and Wang, H., Imperfect maintenance, European Journal of Operational Research, 94, .425-438 (1996). 9. Douer, N. and Yechiali, U., Optimal repair and replacement in Markovian systems, Communications in Statistics -Stochastic Models-, 10, 253-270 (1994). 10. Chiang, J.H. and Yuan, J., Optimal maintenance policy for a Markovian system under periodic inspection, Reliability Engineering and System Safety, 7 1 , 165-172 (2001). 11. Hopp, W.J. and Wu, S.C., Machine maintenance with multiple maintenance actions, HE Transactions, 22, 226-233 (1990).
T R A N S I E N T ANALYSIS OF I N T E R N E T - W O R M P R O P A G A T I O N B A S E D O N SIMPLE BIRTH A N D D E A T H PROCESSES *
K. T A T E I S H I , H . O K A M U R A A N D T . D O H I Department of Information Engineering, Graduate School of Engineering, Hiroshima University, 1~4~1 Kagamiyama, Higashi-Hiroshima 739-8527, JAPAN E-mail: {okamu, dohi}@rel.hiroshima-u.ac.jp
In this paper, we analyze the transient behavior of Internet worms. We show t h a t a stochastic SIS (Susceptible-Infected-Susceptible) model to describe the Internetworm propagation can be approximated by a simple birth and death process. Specifically, when there are a huge number of hosts, the propagation of Internet worms can be modeled by the simple birth and death processes. Deriving the probability generating function of the number of infected hosts, we formulate the probability mass function explicitly, and define some dependability measures for evaluating the transient behavior of Internet worms.
1. Introduction The Internet plays one of the most important roles among all the information technologies established during the last two decades. Because of growth of the Internet, it is possible to communicate easily with a number of users through the Internet. On the other hand, the Internet causes several social problems such as the Internet worm and the cyber-terrorism. In particular, the Internet-worm propagation is one of the most severe problems, because the damage caused by the Internet worms is spreading day by day, and their activities are becoming more and more malicious. Recently, some researchers attempt to predict the propagation process of Internet worm by mathematical modeling 1'2'3^5. Most of them are based on the continuous-time Markov chain (CTMC) to describe the Internetworm propagation. In their modeling, a number of-states are required to "This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Exploratory Research, Grant No. 15651076 (2003-2005).
676
677
represent the behavior of the Internet worms, because the number of states in the CTMC directly corresponds to the number of hosts located in the computer network under consideration. Therefore when the propagation processes of the Internet worms are modeled by the CTMC, it requires a huge number of states. That is, the CTMC modeling is not so tractable to evaluate the realistic Internet-worm propagation and has a limitation. In this paper, we consider a stochastic model to evaluate the probabilistic behavior of Internet-worm propagation based on a simple birth and death process. This model can be regarded as an approximate model with the well-known stochastic SIS (Susceptible-Infected-Susceptible) model which is used to represent the propagation of Internet worms as well as physiological viruses. One of the advantages of using the approximate model is to derive the probability generating function (PGF) of the number of infected hosts. In other words, the explicit probability mass function (PMF) of the number of infected hosts can be derived from the algebraic approaches based on the PGF. The PMF enables us to perform the transient analysis of propagation of Internet worms. More precisely, we define some dependability measures for the propagation of Internet worms based on the PMF, and investigate the sensitivity of these measures in numerical experiments. 2. Stochastic SIS Model The propagation of Internet worms is often modeled by the stochastic SIS model. The stochastic SIS model is an extensive model from the deterministic SIS model, where both of them are used for behavioral analysis in the epidemiology 6 . Consider two possible states for one host computer; a susceptible state (state S) and an infected state (state 7). The state S means that the corresponding host computer is eventually infected with the Internet worm. Once the host is infected, the state of the host becomes I. When the elimination of Internet worm is successful, i.e., the worm is removed, the state of the host becomes S again. Let {X(t);t > 0} denote a stochastic process representing the number of infected hosts at time t with PMF: pfc(t) = Pr{X(i) = k}.
(1)
Assuming that the total number of hosts is M, we have the transition probability of the number of infected hosts for the time period [t, t + At) as follows. Pr{X(t + At)=j\X(t)
= i}
678
' Pi(l - i/M)At + o(At), fiiAt + o{At), 1 - {pi{\ - i/M) + ip,}At + o(At), k o(At),
j =i+ 1 j =i-l j =i otherwise,
( ) [
'
where (3 and /x are the pairwise infection rate and the removal rate, respectively. Also, o(At) is the higher order function of At, i.e., limAt->o At = 0. By taking the limit of the above equation, At —> 0, we obtain the following differential-difference equations: jtPo(t) = 6lPl(t),
(3)
^Pfc(t) = €k-iPh-i(t)+5k+iPk+i{t)-(£k+fa)Pk(t),
(4)
A=l,...,Af-l, ^TPM(*)
= iM-\PM-i{t)
- (£ M + 6M)pM{t),
(5)
where ft = f3k{\ — k/M) and 8k = p,k. Prom Eqs. (3)-(5), the stochastic SIS model is represented by state-dependent parameters £& and 5k- Roughly speaking, the resulting stochastic process from the above differentialdifference equations is reduced to the birth-and-death processes. The stochastic SIS model, several kinds of birth-and-death processes are used in the epidemiology; for example, the SIR (Susceptible-Infected-Removal) model, the SIRS (Susceptible-Infected-Removal-Susceptible) model, the Predator-Prey model, etc. On the other hand, based on the stochastic SIS model, we can define some dependability measures such as hazard probability and extinction probability in order to evaluate and predict the propagation of Internet worms 4 . However, as mentioned before, since the stochastic SIS model consists of a huge number of states, the computation of these dependability measures have to be made approximately. 3. Approximation of Internet-Worm Propagation 3.1. Approximation
by Simple
Birth
and Death
Process
In this section, we propose an approximate model to represent the transient behavior of Internet-worm propagation based on the stochastic SIS model. First we prove that the stochastic SIS model can be approximated by the simple birth and death process in the case where the number of states is large. Suppose that the number of hosts is extremely larger than the number of infected hosts, that is, we make the following assumption:
679
• Assumption: the number of infected hosts is sufficiently smaller than the number of hosts in overall the Internet. Under this assumption, the fraction of susceptible hosts over the whole hosts is considerably close to one, i.e., (M — k)/M —> 1 as the number of infected hosts k extremely decreases. Therefore, in the formulation of £&, replacing (3(1 — k/M) by a constant value A yields the following differential equations: jtPo(t) = SlPl(t),
(6)
-jTPfc(0 = €k-iPk-i{t)
+ 5k+iPk+i{t) - (& + 5k)pk(t),
(7)
k = 1 , . . . , oo, where £& = Afc and 8k = fik. This stochastic process is classified into one of the birth-and-death processes, and particularly, the process with simple rates £& and 8k is called a simple birth and death process. From the above assumption, this model describes the situation where the number of infected hosts at the early stage of influences of Internet worms is quite small.
3.2. Transient Process
Analysis
based on Simple Birth and
Death
Consider a stochastic process {X(t);t > 0} representing the number of infected hosts at time t as the simple birth and death process. Then the PMF of the number of infected hosts and the initial number of infected hosts are denoted by pn(t) = Pr{X(t) = n\X(0) = N} and X(0) = N, respectively. The state-dependent parameters of the simple birth and death process are given by £fc = Afc and 6k = /xfc, where fc is the number of infected hosts, A and /J, are called the propagation rate and the removal rate of Internet worm, respectively. The PGF for the probability mass function Pk(t) is defined as oo
P(z,t) = E[zx^} = J2pk(t)zk.
(8)
k=0
From Eqs. (6) and (7), we have
^ ^ . = \M(1-Z)+XZ(Z-1)]2^1
an d
P(z,0) = ztf.
(9)
680
The above partial differential equation can be solved algebraically7. That is, the PGF of X(t) is given by
Next consider the PMF explicitly from the above PGF. For example, Kyriakidis 8 attempts to derive an explicit form of a simple birth and death process with catastrophe based on a moment approach from the PGF. Chao and Zheng9 and Zheng and Chao 10 discuss the transient analysis based on the inverse Laplace transform for somewhat different processes from a original simple birth and death process. This paper applies the inverse Z-transform into deriving the PMF. Based on the binominal theorem, Eq. (10) can be straightforwardly rewritten by
N
t=0 r
^-(A-M)* -
fj, ^
Ae-(A-/x)t
_ ^
. N—i |
ne-(x-n)t - \ ~ \e-(.x-i*)t - \ J •, N-i
(11) Me-(*-(.)i_A'
5
.
Since there is one term including z in Eq. (11), it is enough to take the inverse Z-transform for this term. Note that the PGF of negative binomial distribution is equal to this term. Therefore, we have the following inverse Z-transform:
Ae-^-»)'-A
r
(N + n - i - 1)! J Ae-< A -^' - A \ nl(N-i-l)\ ue-(A-")*-A
(12)
Finally the PMF of the number of infected hosts based on the simple birth
681 and death process can be derived as M-Y*
Pn{t) - }_,
N\{N + n - i - 1)1
ilnl(N
_ i _ ^N
J Ae-(A-">* - A»
_ {y J Ae -(A- M )« _ A
t=0
ne-(\-i*)t Ae
- x ~ \e-(\-n)t
-(A-^_Ar
^e-(\-u,)t 3.3. Dependability
_ AJ (13)
- Af '
Measures
We define two dependability measures to evaluate the transient behavior of Internet-worm propagation: extinction probability and hazard probability. The extinction probability, Pg;(i; N), is defined as the probability that none of hosts is infected with the Internet worm at an arbitrary time t provided that the number of infected hosts is N at the initial time t = 0. From the PGF or the PMF of the number of the infected hosts, the extinction probability is given by
«•<*">=(£££:;) -
<">
On the other hand, we define the hazard probability, Pn(t;N,H), as the probability that the number of infected hosts exceeds H at an arbitrary time t provided that the number of infected hosts is ./V at the initial time t = 0. Then we have P„(t; N, H) = Vx{X(t) > H\X(0) = N} oo
H-l
Pk(t) = i - £ ?*(*). k=H
(is)
fc=0
where pk (t) is the PMF of the number of infected hosts, as Eq. (13). Both of measures, PE(*; JV) and Pn(t;N,H), can estimate quantitatively the force of infection of Internet worms. 4. Numerical Example This section gives a numerical example to calculate the hazard probability. Figure 1 illustrates the behavior of hazard probability that the number of infected hosts exceeds 2N, where N is the initial number of infected hosts.
682
Figure 1.
Transient behavior of PH (t; N, 20).
In this case, we set N = 10, A = 0.5 and /J. = 0.3. As Fig. 1 shows, the probability, PH (t; 10, 20), gradually increases as the time t increases. Similarly, as the initial number of hosts increases, the probability also increases. In both increases, the hazard probability eventually converges to one. In this example, the model parameters are set as A > fi, and the force of propagation is stronger than the removal. Other numerical experiments in some similar situations cause the phenomenon where the hazrd probability converges to one. On the other hand, in the case of A < fi, we often observe that the propagation is eventually terminated with depending the initial number of infected hosts. Thus when we focus on the quantity of p = A//x, it can be regarded as a simple measure on the propation risk of Internet worms, and the termination of propagation requires this measure to be less than one.
5. Conclusion In this paper, we have derived the transient probability of the Internetworm propagation based on a simple birth and death process. The simple birth and death process has been derived as an approximate model for the stochastic SIS model under the assumption on the huge number of hosts.
683 This case can b e regarded as t h e propagation of Internet worms a t t h e early stage of their influences. In t h e approximate model based on t h e simple b i r t h and d e a t h process, b o t h P G F and P M F of t h e number of infected hosts can b e explicitly derived by an inverse Z-transform approach. Moreover, two dependability measures; probabilities of extinction and hazard, are denned t o estimate t h e quantitative force of infection of Internet worms. In future, we will analyze the transient behavior of existing Internet worms based on t h e proposed model, and examine t h e prediction ability of propagation of Internet worms with real d a t a .
References 1. J. O. Kephart and S. R. White, Measuring and modeling computer virus prevalence, Proceedings of the 1993 IEEE Computer Society Symposium on Research in Security & Privacy, pp. 2-15, 1993. 2. J. C. Wierman and D. J. Marchette, Modeling computer virus prevalence with a susceptible-infected-susceptible model with reintroduction, Computational Statistics & Data Analysis, 45, pp. 3-23, 2004. 3. H. Okamura, H. Kobayashi and T. Dohi, Dependence of computer virus prevalence on network structure -Stochastic modeling approach, Proceedings of Asian International Workshop on Advanced Reliability Modeling, pp. 379386, 2004. 4. H. Okamura, H. Kobayashi and T. Dohi, Markovian modeling and analysis of Internet worm propagation, Proceedings of 16th International Symposium on Software Reliability Engineering, pp. 149-158, 2005. 5. L. J. S. Allen and A. M. Burgin, Comparison of deterministic and stochastic SIS and SIR models in discrete time, Mathematical Biosciences, 163, pp. 1-33, 2000. 6. G. H. Weiss and M. Dishon, On the asymptotic behavior of the stochastic and deterministic models of an epidemic, Mathematical Biosciences, 11, pp. 261-265, 1971. 7. L. J. S. Allen, An Introduction to Stochastic Processes with Applications to Biology, Peason Education, Inc., New Jersey, 2003. 8. E. G. Kyriakidis, Transient solution for a simple birth-death catastrophe process, Probability in the Engineering and Informational Sciences, 18, pp. 233-236. 2004. 9. X. Chao and Y. Zheng, Transient analysis of immigration birth-death processes with total catastrophes, Probability in the Engineering and Informational Sciences, 17, pp. 83-106, 2003. 10. X. Zheng, X. Chao and X. Ji, Transient analysis of linear birth-death processes with immigration and emigration, Probability in the Engineering and Informational Sciences, 18, pp. 141-159. 2004.
This page is intentionally left blank
PART IX STATISTICAL QUALITY CONTROL
This page is intentionally left blank
IMPROVED INSPECTION SCHEDULE FOR A BATCH MODE PRODUCTION SYSTEM * JIH-AN CHEN* Department of Business Administration, Kao-Yuan University, NO.1821, Jhongshan Lujhu Township, Kaohsiung County 82151, Taiwan
Rd.,
YU-HUNG CHIEN Department of Statistics, National Taichung Institute of Technology, 129, Sec 3, San-min Rd., Taichung 404, Taiwan In this paper, we introduce and deal an inspection schedule for a lot-sizing production system. The system of production process has a general deterioration distribution with increasing failure rate and non-self-announcing failures. Rather than develop a nonMarkovian shock model, we focus on a quantile-based reliability model. This research will also provide a strategy of inspection based on the economic production quantity and examples of Weibull shock models will be given to illustrate this inspection schedule.
1. Introduction Rapidly changing markets and the explosion of product variety has increased automation and the need for sophisticated production system. The problem of the determination of economic production quantity (EPQ) for production processes has been well studied in literature. The role of the condition of production system in controlling quality and quantity is well known and production system must be maintained in conforming conditions through adequate maintenance programs. There are many studies, such as Rosenblatt and Lee (1986), on the effects of deteriorating processes on economic manufacturing quantity. The classical economic manufacturing quantity model assumes that the output of the production system is defect-free. When developing EPQ models, consideration of controlling the quality of the product has generally not been taken into account. Optimal inspection policies for products with imperfect qualities were also investigated considerably. For further review of the EPQ topics with imperfect production processes, the reader This work is supported in part by the National Science Council of Taiwan under grant NSC 932218-E-244-003. Corresponding author. Tel.: +886-7-6077074; E-mail address: [email protected].
687
688
can refer to Sheu (2004). Suppose that failure of the production system can be detected only by inspection of product. And, inspection can not determine either the level of deterioration or the remaining life of the production system. Such is the case in many protective items such as circuit breakers and protective relays, as well as in spare or standby. If the production system is found to be failed by performing inspection, it is replaced immediately with an identical item and the production process need not to be broken down. Many studies, such as Rahim (1994), have effort on the production system which is repairable. Being motivated by this, proposed that the length of sampling intervals should be chosen in such a way that the integrated hazard over each interval should be equal. Yang and Klutke (2000) were also deal inspection policies for maintaining deteriorating equipment with non-self-announcing failures. Their inspection policies utilize information from the inspection/repair history as well as the system lifetime distribution to schedule future inspections. If inspection, replacement and downtime cost are available, they also derived the expression for long-term expected cost. Our objective is to present inspection policies and propose some solution procedures for simultaneous determination of the inspection intervals and the number of inspection in a production run. 2. Inspection Schedule 2.1. The Quantile-based Reliability in a Production System The most widely used but weak inspection schedule is perhaps constant interinspection. It is to schedule inspections periodically with the advantage of being simple to implement and relatively easy to analyze. Such a schedule has constant inter-inspection ht=h, for all /', and inspection ages {h,2h, •••}. The periodic inspection schedule tends to over-inspect at less likely failure times and under-inspect at more likely failure times. Thus, it employs no information about the remaining life that is inherent in the sequence of previous inspection times. Yang and Klutke (2000) proposed an inspection schedule QBI( cc0) with quantile-based reliability. Let the sequence of inspection ages be {x„x2,---} and we denote t as the time to failure. And, t can be expressed as a random variable with a continuous, strictly increasing distribution function F{t). The quantile-based reliability a0 is fixed and known. Then, the inspection ages are JC, = sup{x>0: P{t>x}>a0} and
(1)
689 xi = sup {x > JCM : P{ t > x 11 > xM }>«„}, for i = 2,3, • • •.
(2)
Thus, x,. = F-\a[),
for i = l , 2 , - .
(3)
If the distribution of the time to failure is increasing failure rate, then F{x + hi)/F{x) is non-increasing in the time to failure x for all ht>0. Thus, hM < hi, for / = 1,2, • • • . For lot-sizing production system, the cycle time of operation system L can be expressed as £ = JA,=F- , (O 0 "),
(4)
where m is the known number of inspection in the cycle time of operation system. We denote such a schedule as QBI. 2.2. The Balance of Risk under Fixed Economic Manufacturing Quantity The problem to determine the economic production quantity and the production cycle time for lot-sizing production system has been well studied in literature. For a Markovian shock model, a uniform inspection scheme provides a constant integrated hazard rate over each interval when the number of inspection m is determined. And such a schedule has constant inter-inspection time h = L/m and inspection ages {h,2h,---,mh} . Being motivated by this, Rahim (1994) extended the idea to non-Markovian shock models and proposed that the interinspection times h, = x. - x,_, should be chosen in such a way that the integrated hazard over each interval should be equal, i.e.: ['r{t)dt JO
= --\Lr{t)dt, yyt
(5)
JO
and \^r{t)dt=\\(t)dt,
i = \,-,m,
(6)
where I r{t)dt represents the expected failure number over interval [a, b) and x0 = 0. It is noteworthy that keeping a constant integrated hazard over each sampling interval is equivalent to stating that the probability of a shift in an interval, given no shift until its start, is a constant for all intervals. The deterioration of production systems is inherent to most manufacturing industries. If the distribution of the time to failure is increasing failure rate, the inter-
690
inspection times \,h2,•••,hm are also non-increasing, as expected. According to the lot-sizing production cycle time and the number of inspection m, we have the inspection ages {x1,x2,---,x„}: i
x, = '£ihJ,i = l,-,m.
(7)
7-1
We denote such a risk balance schedule of as RB. 2.3. The Hybrid Inspection Schedule For most manufacturing industries, the quantile-based reliability is a specific aim for standard operation process. And, it is an unknown but important objective to determine the optimal number of inspection and the inspection ages during each production run. When the cycle time is given and under the known quantile-based reliability, we have the initial inspection ages. And,
i-l
i»l
Thus, we could determine the least number of inspection m from (8) and the alternative inspection ages {y„y2,•••,)>„} can be generated by Eq.(6). I'' r(t)dt=
[*r{t)dt = —- [Lr{t)dt, i = l , - , m ,
J >,-i
Jo
(9)
yfi Jo
where y0 = 0. We denote such a hybrid schedule as HYB. 3. Discussion 3.1. Equivalence between QBI and RB While the production cycle time and the number of inspection are predetermined, we can use equations (5) and (6) to obtain the inspection ages {x1,x1,---,xm}. At the first inspection age X,, the reliability i?(x,) can be expressed as P{t > xx} = R(x,) = exp {- £' r(t) di).
(10)
It implies that x, = R-\exV{-\\(t) dt}) = F-\aa),
(11)
691 where a0 is the quantile-based reliability. And, at the z'th inspection age is x =^hj
,i = 2,---,m , we have P{t > x,} = R(Xi) = exp{-£r(t) dt) = e x p { - £ £ r(t) dt)}.
(12)
From Eq.(6): P{t > x,} = exp {-/ • £ r(t) dt) = (exp {- £ r # df})' = a't.
(13)
It implies that x = F " ' ( a j ) . Because that the production cycle time has been predetermined, the reliability is also determined at the same time under fixed number of inspection. On the other hand, if the quantile-based reliability has been predetermined, we have also the same sequence of inspection ages with schedule RB under fixed number of inspection. 3.2. The Domination of Hybrid Inspection Schedule We may make better decision to a given scenario with no loss of generality. Firstly, fixed lot-sizing batch mode of production and constant production rate could be anticipated from economic manufacturing quantity or the characteristic of batch mode in most industrial. Thus, we have the known production cycle time for each production run. In a complete production run, the least number of inspection, an unknown but important objective, may also be determined. Under schedule QBI, we have the inspection ages x, = F~'(a'0) with reliability: R(x:) = exp{- j'rft) dt} . Under schedule HYB, we have the inspection ages fromEq.(8)toEq.(10): V'r(t)dt = — • \Lr(t)dt< — - V"r(t)dt=
R(v)
expf-fVodf}
Thus, y, <x, . And, -±£1 = h R (*,) exp{-[r(t)dt}
V'r{t)dt.
„ = exp{\ r(t)dt} .Because of J> '
y, < x, , the ratio —^- = exp{ ?r(t) dt) > e° = 1, i.e. a = R(y,) > R(x.) = aa x h R(x,) R( i) We could improve the reliability by adopt schedule HYB 4. Example To demonstrate the applications of our proposed procedure under nonMarkovian shock models, a few special cases of Weibull failure mechanisms are
692 considered in this section. Recall that the Weibull distribution function with scale parameter A and shape parameter ji is given by F(t) = l-exp{-(AtY} for t > 0. The corresponding failure rate is r(t) = (A/3) • (At)"''. For table 1, we use a0 = 0.990 and A = 0.05, p = 2.00. The mean time between failure (MTBF) is 17.7245. The first inspection age is 2.005 with reliability 0.990. Because of linearly increasing failure rate, the inter-inspections are decreasing. Table 1. Inspection schedule for time to failure distributed as Weibull(0.05,2.00)
i xj
i
2
3
4
5
6
7
2.005
2.836
3.473
4.010
4.483
4.911
5.305
K
2.005
0.831
0.637
0.537
0.473
0.428
0.394
R(*,)
0.990
0.980
0.970
0.961
0.951
0.941
0.932
y,
1.890
2.673
3.273
3.780
4.226
4.629
5.000
k
1.890
0.783
0.601
0.506
0.446
0.403
0.371
R(y,)
0.991
0.982
0.974
0.965
0.956
0.948
0.939
In table 1, suppose the production cycle time is 5 and the quantile-based reliability should be larger than 0.99. The number of inspection 7 can be determined from Eq.(8) and the inspection ages can be generated by Eq.(9). Under HYB schedule, the inter-inspections are decreasing similar to QBI schedule and the inspection ages advance against schedule QBI resulting in higher reliability. 5. Conclusion In this paper, we introduce, modify and compare some inspection policies for deteriorating lot-sizing production system subject to a non-Markovian random shock. These policies are simple to schedule and easy to implement, thus making it more realistic and reflective of situations of the known production condition. Weibull shock models with increasing failure rates are also considered as seen in the simulation example. Also, the modified inspection schedule is used to improve the reliability of the whole production cycle. References 1. 2. 3. 4. 5.
M.A.Rahim, HE Tran. 26, 2(1994) M.J. Rosenblatt and H.L. Lee. HE Tran. 18,48 (1986) S.H. Sheu and J.A. Chen, USS. 35, 69 (2004) S.H. Sheu, J.A. Chen and Y.H. Chien, AIWARM. 451 (2004) Y. Yang and G.A. Klutke, Prob. Eng. Inf. Sci. 14,445 (2000)
EVALUATION OF MULTI-PROCESS CAPABILITY BY A FUZZY INFERENCE METHOD T. W. CHEN Department of Mechanical Engineering, National Chung Hsing Taichung, Taiwan ROC
University
T. C. WANG Department of Mechanical Engineering, National Chung Hsing Taichung, Taiwan ROC Process capability indices C , Cpt , C ,„ and C
University
fitting for nominal-the-best type
quality characteristics, are an effective tool to assess process capability since these indices can reflect a centering process capability and process yield adequately. The index c introduced by Greenwich and Jahr-Schaffrath (1995) provides additional and individual information concerning the process accuracy and the process precision. Although c is useful to evaluate process capability for a single product in common situation, c
cannot
be applied to evaluate the multi-process capability. Referring to Vannman and Deleryd's (C;,, Q,)-plot, a fuzzy inference method is proposed in our study to evaluate the multiprocess capability based on values of a confidence box calculated from sample data. This method takes the advantages of fuzzy systems such that a grade instead of sharp evaluation result can be obtained. An illustrated example of ball-point pens demonstrates that the presented method is effective for assessment of multi-process capability.
1. Introduction Process capability indices (PCIs) are effective tools for the assessment of process capability indeed since the formulae of PCIs are easy to understand and straightforward to apply. Greenwith and Jahr-Schaffrath [1] introduced a new index CPI„ which provides an uncontaminated separation between information concerning process accuracy and process precision. It has been widely used to provide numerical measures on whether a production is capable of producing items within the specification limits preset by the designer. The index Cpp can be defined as:
c
+
--tefT feT693
694
where ^i is the process mean, a the process standard deviation, d the half the length of specification interval = (USL - LSL)/2, USL is the upper specification limit, LSL the lower specification limit, and T the target value. Now, let the inaccuracy index Cdr and imprecision index Cdp be defined as:
C =JL
* f
c
*=7
(2) (3)
Obviously, one can recognize that Cpp = (3Cdr)2 + (3Cdp)2 • Cpp (including Cdr and Cdp) provides additional information concerning the process accuracy and the process precision. Index Cpp detects process inaccuracy and process imprecision by using indices Cdr and Cdp. Thus, Cpp is a deter choice for engineers measuring process potentials and performance. Although Cpp is useful to evaluate process capability for a single product in a common situation, Cpp cannot be applied to evaluate process capability for that of multi-process. Thus, we extend the applicability of the contour plot for processes with multiple characteristics. In addition, we also apply the method developed by Chen et al. [2] who introduced a process capability plot, called the MCPCA control chart, which is an adjustment of Vannman and Deleryd's (Cdr, Cdp)- plot [3] where Cdr = (f.i - T)ld and Cdp = old. Referring to (Cdn Qp)-plot, we propose a fuzzy inference method to assess the process capabilities of multiprocess. The concept of fuzzy sets was first proposed by Zadeh [4]. Now, fuzzy theorems have been applied in many fields such as automatic control, manufacturing system and decision-making [5-8] in industry. In this paper, a fuzzy inference method is proposed such that process capability can be assessed. This fuzzy inference evaluation will consider Cdr and Cdp to formulate new indices as input and obtain a result value as output. In addition, illustrated example and evaluation procedure will be presented for ease of applications. 2. Process Capability Confidence Intervals The index Cpp can directly be used to assess process capability when 100% inspection is applied. Instead of 100% inspection, acceptance sampling is most likely to be useful when the testing is destructive, or the testing cost is extremely high, etc. Generally, only estimated capability index Cpp by using a sample can be obtained in practice. Each process has the same product specification and target value. Thus, the natural estimator of Cppl can be written as the following:
695
(Xi-TY
C
+
D2
-±r,i=\,2,...,k
D2
(4)
where D = d/3, X, and Sf are the sample mean and sample variance of process i with sample size ni. The probability density function of Cppi (see Chen [9]) is:
ff (*)
(nD2/a1)x
(nD2
^ - T ^ ) ^a J l^o
f
fK{^-x-y)fr](y)
J 0
(5)
where x > 0, 7; is distributed as chi-square with (1 + 2/) degrees of freedom and "-(A/2)(A/2)J
pn.)-
r-
Furthermore, the mean value and variance about C
' \~-ppi ppi) 1
Var (c •) v-'ppi I
is:
CD,
2a" nD4
nDA
Let /} and a be the unbiased estimators of p and a, then we have /} = X , a = c 4 x S and c4 = yJ2/(n — l) • r [ « / 2 j / r [ ( « - l ) / 2 ] . The factor c4 is a function of the sample size n and c4 approaches to unity when the sample size is large enough. Under normal assumption, (H-1)[(C 4 (T)/
M~ta/4,("-l)
XC
a
„
4 X~f='^ '4~n +ta/4,(n-l)
\{n-\)xcl X\-au(n-\)
xa2
XC
4
X
(«-l)xe 4 2 x(T 2 Xai^n-\)
a
~r=
(X,,X„)
(Y,,YU)
(6a)
(6b)
where ta/A ,„_n is the upper quartile of t distribution with (« - 1) degrees of freedom; X\_an{n-\) and x\i\{n-X) are the upper percentile of chi-square distribution with (« - 1) degrees of freedom. The joint confidence intervals of ju and a are used to formulate a confidence region and applied to reveal the process capability.
696 Referring to Vannman and Deleryd's (Cdr, Q ; )-plot, we propose a new method to assess the process capabilities of multi-process. Since Cdr = {/u - T)ld and Cdp = a/d, thus the confidence region in the plot of Cdr -Cdp should be changed as: Upper-right coordinate: ((Xu - T) I d, Yu I d) = (X ru , Yru)
(7a)
Bottom-right coordinate: ((Xu - T) I d, Yu I d) = (X ru , Yr,)
(7b)
Upper -left coordinate: ((X, - T) I d, Yu I d) = (X rl , Yru)
(7c)
Bottom -left coordinate: ((X, - T) I d, Y, I d) = (X rl , Yrl)
(7d)
where Xu, X,, Yu and Yt are described in equation (6). And the maximum estimated index (C ) could be calculated as: v
pp /max
(CPP )max = max[ (3Xm f + (3Yru ) 2 , (3Xrl f + (3Kra ) 2 ]
(8)
To reduce the influence of sampling errors, we shall use values of mentioned confidence box described in equation (7) to afford a more reliable assessment. In the plot of Cdr -Cdp, there are two process capability values for PI and P2 as seen in Fig. 1, one can recognize that the process capability is adequate if the confidence box is inside of the line Cpp = 1 (process P2) and process capability is inadequate if the confidence box is outside of the line Cpp = 1 (process PI). The larger distance of this confidence box to the coordinate original point (0, 0), the worse process capability. Let the nearest distance from the coordinate original point to each confidence box be Rmin in Fig. 1 and the most far distance from the coordinate original point to each confidence box be Rmax, then Rminar\d Rmax can be defined referring to equation (7) as: Rmm = min (Jxl+Y?u, Rmax = max{]x2m
fx2ru+Y*, jx2+Yr2u,
^X^+Y2,)
+ Y2U,]x2ru + Y2 ,]x2rl + Y2U ,^X2+
(9) Y2,)
(10)
In our study, Rmi„ and Rmax are used to represent the process capability for each model and a reliable assessment is achieved by using the magnitudes of Rminand
3. Fuzzy Inference Method for Process Capability In this section, a fuzzy inference method is proposed to assess the process capability for multi-process. As stated in section 3, Rmin and Rmax are used to
697 assess the process capability. The larger values of Rminand Rmm for each model, the worse process capability. Let [ Rmmi, Rm3xi ] and [ RmmJ, RmaxJ ] be the nearest and the most far distances in the plot of C& -Cdp for the process / and j , respectively. Consider [Rminj, ^ m a x ,] and [Rminj-, ^max_,-] as set of two lines in same axis, then the comparison of two processes can be represented by statistical method. In order to distinguish the equal grade of capability (for process i and j) in different intersection, a method to incorporate the fuzzy inference with a process capability index is then proposed. An approximating rule-based reasoning approach is used for quantitative analysis. In our study, the process i is said to be superior to process^' when the value of inference result is positive. The larger of result value, the more capable of process i than that of process j . Oppositely, the negative value of result implies that the process / is inferior to process j . In other words, the process i is said to be completely better-quality than that of process j when the value of inference result is equal to 1; the process i is said to be completely same-quality to that of process j when the value of inference result is equal to 0 and the process / is said to be completely worse-quality than that of process j when the value of inference result is equal to -1. The result value of inference within {0, 1} or {-1, 0} is used to represent the different grade of capability. The h processes are tested two at a time, there are hC2 = h(h - l)/2 possible paired comparisons. Let the indices 8 and y be defined as R X_
— R
mini
max/
/ii\
max(RmiSKJ,Rmmj) R
—R
max/
/ =
: maX
min /
—•
., _.
(12)
(-Rmax/'^maxy)
Then the fuzzy inference systems are composed of two inputs and one output. Generally, the procedure of fuzzy analysis consists of four steps: definition of input/output fuzzy variables, fuzzy rules, fuzzy inference and defuzzification [6]. (1) Definition of input/output fuzzy variables. In our study, we adopt the triangular and trapezoid types as MFs for the sake of simplicity and easy to describe the asymmetric property. The triangular MF is specified by three parameters {a, b, c) which determine the three corners of triangle. Furthermore, the trapezoid MF is specified by four parameters {a,b,c,d}.
698 The universe of input variables is defined in {-1, 1} as shown in Fig. 2. Membership functions of input 8 are defined as N4 (negative), N3, N2, Nl, ZE(zero) and PO (positive), respectively. Also, input y are defined as NE (negative), ZE, P\ (positive), P2, P3 and P4, respectively. In addition, the output variables are composed of seven triangular MFs for representing Z3 (inferior), L2, LI, EQ, 51 (superior), 52 and 53 as shown in Fig. 3. (2) Fuzzy rules. Fuzzy rules are important to successful inference result [7]. A rule base represents the experience and knowledge of experts. The fuzzy rules are similar to the intuitional thinking of a human. A fuzzy inference system, composed of two inputs and one output, could employ this kind of fuzzy rule as If xl is An and x2 is Ai2 thenyis Bt (for/= 1 tow), In this study, the fuzzy inference system is applied to assess the process capability by using the confidence interval values of Rmin and Rmax (defined in equation 7). Thirty-three if-then rules are employed in our study. They are: Rule \:if{8
is PO) and(y
is PA) then (result is L3).
Rule 2: if(5
is PO) and(y
is P3) then (result is 13).
Rule 33: if {8 is N4) and(y
is NE) then (result is 53).
The tabulated fuzzy rules are listed in Table 1. The fuzzy rules indicated in Table \,if(8 is PO) and(y is ZE) or (y is NE) in addition to / / ( 8 is ZE) and (y is NE), are never happened since the definition of two input variables always exists y>8. Table 1. The tabulated fuzzy rules.
8 PO ZE N\ N2 Nl NA
PA
Pi
P2
P\
Li 12 12 L\ L\ EQ
Li L2 12 L\ EQ S\
LI 12 L\
Li LI
EQ
EQ S\ S\
Si 52 52
ZE
NE
EQ 51 52 52 53
Si 53 53 53
\
(3) Fuzzy inference. Fuzzy inference is an inference procedure to derive conclusion based on a set of if-then rules. In this paper, the Mamdani inference
699
>
US
\ "f
j! -0.6
N..-1 °
/'
P1
-0.4
K
V
" • . „ \
-0.2
Hi
0
0.2
0.4
0.6
0.8
Fig. 1 Multi-process capability analysis plot.
input membership functions
-0.4
-0.2
NE 0.8 0.6 0.4 0.2
0 0.1 Input 8
TO
PI
/
YYVVY A A A A
IvVVv^\
-0.1 0
0.2
0,4
Fig. 2. Membership functions of input variables. output membership functions
Fig. 3. Membership functions of output variable.
700
method [8] that employs the maximum-minimum product composition to operate fuzzy if-then rules is adopted. Let the rule be: if xx = Ax and x2 = A2 then y = B, then the result of inference can obtain a fuzzy set with MF of B- as jUB;(y) = max{min[//4. (*,),MA;2(*2)>HR,(*i>X2>>0j}>
(13)
where VR, (xi,x2,y)
= min[,«4i ,MAi2,MB, (>')] •
(4) Defuzzification. The fuzzy sets of B- are obtained by step (3), then the defuzzification is used to find a crisp value y e Y which represents the fuzzy sets. The frequently used defuzzification methods have: weight, area and height method in [6]. The weight defuzzification method is used in our study, and then we have \yB{y)dy y* = —r )B{y)dy
•
(14)
Y
Result of fuzzy inference, performed by a Matlab Logic Fuzzy Toolbox [10], is then used to represent the process capability for each model. 4. Procedure of Fuzzy Evaluation and Illustrated Example Below, an example is taken to state the proposed procedure in detail. To illustrate how the procedure may be applied to collected data, the following case on four manufacturing processes of ball-point pens is presented. In this case, four models of ball-point pens that are named as PEN1, PEN2, PEN3 and PEN4 models. The fitness of the cap to the body is considered as an important characteristic to avoid losing the cap of the ball-point pen or hard to pull the cap out; that is, the fitness cannot be too tight or too loosely. The critical measurement is how much strength is required to pull out the cap from the body of the ballpoint pen and the strength unit is measured in kilogram. The specification limits are set to 2.5±1.0, that is, the upper/lower specification limits are set to USL = 3.5, LSL = 1.5, and the target value is set to T= 2.5. Table 2 displays the concise information of these four models. The fuzzy evaluation procedure is stated as follows:
701 Step 1 :Determine the sample size n = 60 for all manufacturing processes, then the values of mean and standard deviation are calculated as indicated in Table 2. Also, the significant level is given 0.05. Step 2: Compute the values of c4 and a . Also, calculate four coordinates values of each confidence box described in equation (7). Step 3: Compute the nearest and the most far distances (Rmi„ and Rmax) in equations (9) and (10) for each process. Table 2: Process capability value for four processes.
Process
x,
s,
PEN1
2.52
0.202
0.2011
0.1720
0.2661
0.6373
PEN2
2.51
0.250
0.2489
0.2168
0.3252
0.9519
D
/Y
D
min /
A
PEN3 2.44 0.300 0.2987 0.2502
0.4054
1.4788 *
PEN4 2.59 0.197 0.1962 0.1662
0.2887
0.7499
max/
max /
Step 4: Compute the indices 8 and y for two-process pairs (there are six pairs) and thus to obtain the fuzzy evaluation results through the proposed fuzzy inference system. The above calculations are performed through developed Matlab program. As indicated in Table 3, one can recognize the model of PEN1 is the best one among these four processes since all values of inference result are positive for competing pairs (PEN1 to PEN/, for _/=2, 3 and 4). Furthermore, model of PEN3 is the worst process. Table 3: Fuzzy inference results. Pairs (;' toy)
1 ^min/ '
max/ J
PEN 1 to 2
[0.1720,0.2661]
PEN 1 to 3 PEN 1 to 4
I
min / ' ^ m a x / J
8
Y
Result
[0.2168,0.3252]
-0.4710 0.1515
+0.50
[0.1720,0.2661]
[0.2502, 0.4054]
-0.5756 0.0392
+0.87
[0.1720,0.2661]
[0.1662,0.2887]
-0.4040 0.3461
+0.16
PEN 2 to 3
[0.2168,0.3252]
[0.2502,0.4054]
-0.4651 0.1860
+0.42
PEN 2 to 4
[0.2168,0.3252]
[0.1662,0.2887]
-0.2208 0.4889
-0.35
PEN 3 to 4
[0.2502,0.4054]
[0.1662,0.2887]
-0.0948 0.5900
-0.65
5. Conclusion Process capability indices of Cpm and CPI, are proved that it can reflect the centering process capability and process yield adequately and is used to afford a
702
numerical measure on whether a production is capable of producing items within the specification limits. The index Cpp provides individual information concerning the process accuracy and process precision. Referring to Vannman and Deleryd's (Cdr, C
A NEW PROCESS IMPROVEMENT CAPABILITY INDEX OF CONSIDERING COST TO SELECT SUPPLIER
K. S. CHEN Department of Industrial Engineering
& Management,
National Chin-Yi Institute of Technology, Taichung, Taiwan, R.O.C. E-mail •'
[email protected]
S. L. YANG Institute of Production System Engineering and
Management,
National Chin-Yi Institute of Technology, Teaching, Taiwan, R.O.C.
J. M. HUANG, C. Y. HSIEH Department of Mechanical Engineering, National Chin-Yi Institute of Technology, Taichung, Taiwan, R.O.C. A lot of businesses bring process capability index into be measurement quality tool. Process capability index Cpm is now in widespread uses in industries because of process capability index Cpm amply react to loss of the process and yield of the process. In general, manufactures select suppliers just only use the index Cpm to evaluate the suppliers' process capability. However, coming the supply chain competitive times, manufacturers have to realize how to select suppliers and subcontractors to be key point work of the supply management. Therefore, for original suppliers, we matched up the process capability index Cpm and process improvement capability index CPIM to evaluated, measured the suppliers' process capability and reduced manufactures' improvement cost. When suppliers' process capability not enough, we can further make use of process improvement capability index CPIM to measure suppliers' process improvement capability, decrease internal company's process of improvement cost effectively, and improve product's quality and production to get up to the goal of business operation forever.
1.
Introduction
Quality is by no means a new concept in modern business James R. Evans and William M. Lindsay (2002). Now, a lot of businesses bring in process capability index to be measurement quality tool (Mats Deleryd, 1999). Measure the 703
704
process quality of products need to compare the process mean fl and the process standard deviation (7 at the same time. Due to the process of products result from each product hasn't been the same specifications. In fact, process capability index definition is based on above view. Process capability index is a function value for process distribution parameter value, and specifications limit. Kane (1986) proposed that process capability indices Cp and Cpk, however, process capability indices Cp and Cpk can't react to the process loss because that indices Cp and Cpk is based on yield definition. Therefore, Chan et al. (1988) propound the index Cpm which is an index can be responds the process loss sufficiently. Pearn et al. (1992) provided that take index Cpm measure process loss also can really react to the process capability and the process loss. Besides, Govaerts (1994) pointed out the relationship between index Cpm and the process yield is Yield%^20(3 Cpm)-1 when index Cpm value largely enough. It is clear that the index Cpm not only reaction a process loss but also reaction a process yield. Define to of the index Cpm as follows (1): c
USL-LSL
=
d
_
1
.
(1)
According to Phadke (1989) & Pearn & Chen (1997) pointed out that the process loss includes the following two factors: (A) one factor is the process specification not on target (precise of the process not enough) (B) the other one factor is the process variance extremely very big. These two factors will make the index Cpm to be changed. When these two factors loss small will have high index value Cpm. Conversely, when these two factors loss upper and then will have low index value Cpm (at this time mean that process capability is not enough). Obviously, index of Cpm is an excellence index to measure the process capability. However, Chen et al. (2005) pointed out the enterprises (manufacturers) need to realize that how to select and contract their suppliers and manufacturers and apply supply chain management to be an important task in the current competitive global business environment. The competitiveness of an enterprise can be increased by using strategic alliances to integrate supply chain parties and activities. Although manufacturers and subcontractors themselves could take index Cpm measured product's quality, and evaluated the process capability. However, when manufacturers want choose one of suppliers to continued collaboration to take the index of Cpm selected new suppliers and evaluated subcontractors, the index Cpm has no reaction for manufacturers who want to have a completely result of measured standard for process improvement capability of the primal suppliers. Hence, this paper will basis for the index Cpm model to consider the improvement cost and focus on this problem to provided a
705
process improvement capability index, so that the index Cpm can judge and evaluate the process capability of original suppliers with process improvement capability index CPiM. When the process capability Cpm for suppliers not enough, we will think over to improve precise or accuracy of the process capability first. And then we could use process improvement capability index CPiM to measure whether suppliers' process improvement capability superiority or not. Manufacturers by operating these two indices, process capability index Cpm and process improvement capability index CPIM to find out real potential, capable and continued collaboration's suppliers; enhance quality of the process, get more profit with suppliers, and advance the competition for whole supply chain. 2.
Concept of Cost to Set up a Process Improvement Capability Index Model
Above section I, process capability insufficiently (Cpm not arrive standard of quality's needed) includes the following two factors: one factor is the process precise not enough, the other factor is the process accuracy deficiency. Usually, low improvement cost originate from precise not enough, high improvement cost come from accuracy shortfall. The index Cpm can be measured the process capability but couldn't to be evaluate the process improvement capability. Assuming two factories, A. and B. factories produce the same products simultaneously. Where Tis target value, d- (USL-LSL)/2 and T± d is product's upper and lower specification limits. Hypothesizing CI is a cost unit to improve the process precise and C2 also is a cost unit to improve the process accuracy. Supposing supplier A. that the process average of product move did distance from target (The process mean fJ.) and the process standard deviation a is d/3, the index value Cpm will equal to 0.9. According to Chen et al. (2001) pointed that the process capability is not enough need to be improvement when process capability index value Cpm is smaller than 1. Based on definition of six sigma for American Motorola Corporation allows that the process shift dIA distance from not on target (six sigma tolerate the process shifts under 1.5 a ) . So that, the process of supplier A. shifts distance in the six sigma tolerances range. Thus, accuracy shortfall is a really reason for the process capability not enough. After analysis by quality team, maybe scarcity of materials in market is a main effect factor due to the origin short of materials led materials often needed to be changed, led to the process variance big due to difference bench come from difference materials sources and a lot of cost needed to input in improvement quality of product. For convenient to solve this problem, maybe increase purchase of cost, check product's quality from materials, or monitor and control product's quality. On the other hand, supplier B. shifts d/3 distance from target, standard deviation is did, at this time the index value Cpm as same as
706
equal to A. supplier. (Cpm = 0.9) It is very clear that the process precise over dIA from tolerance range, so that the process capability not enough is a really reason for the process precise. After quality team analysis, maybe deflective machine parameters led precise insufficiency. As general, solving this problem needed to input a quite low cost to improve product's quality, such as adjusts the machine parameters back. Comparing with A. and B. suppliers, supplier A. spends decuple cost to improve accuracy of the process then B. supplier. The result is that supplier B. increases a little cost to improve precise of the process supplier A. higher then improvement cost. According to above view of improvement cost, defined process improvement capability index as following (2): c
-
* 3VC,a 2 +C 2 /? 2
(2)
Based on above, process improvement capability index Cp]M focus divergent improvements cost on suppliers. For advantageous application of business, we simplify process improvement capability index CPIM (make r =C2/ C|) as below (3):
C
l 2
3-yja +r/32
0)
To carry on, we were individually count out process improvement capability value are 0.53 and 0.31 from A. and B. suppliers'. Although A. and B. suppliers' process capability all not enough (both of the index value Cpm are equal to 0.9). Thus, B. supplier's process improvement capability is higher then A. supplier's process improvement capability. Obviously, B. supplier will be a top priority to input a shade of improvement cost in for manufactures also according to the index CPIM to select suppliers and promote suppliers' process capability of products.
3.
Estimation of CPlM
On the assumption that characteristic of the process X is normally distributed with mean fx and variance a 2 . Let Xh_X„ be a random sample from normal distribution. Has the same estimation ways with the index Cpm. In this paper, we took the sample mean * = £>,/» to estimate at//and the sample standard deviation $=[£" (*,-]o2/(w-i)~|"2 to estimate atcr. The nature estimation of CPIM as following (4):
707
C„u =
. 1 , 3ja2+r/32
(4)
With a-S I d and ft = (X -T)l d . And then we derive the expected value, variance in (4) as following (6):
x
y £f y=o
( ^ / a » 2 ) > T ( ( H - l ) / 2 + y) y! r(«/2+y)
Var(CPIM)-CPIM2(l+^r)(^-) a
(nB2/a22)J j\ n-\
^
4.
j 2
n 2
2 7
^ «
1
n
2
2
(»/J 2 /a 2 2y r((#i-l)/2 + y)
yAi-^V}. «
j 2
2-
n + 2j-2
' " ' " f t —f. 2
v
n^77r
2F/(
1
y+
1
2' 2' (6>
J
Conclusion
Quality is by no means a new concept in modern business. It's a principal competition of company to find out many methods to reduce operation cost effectively, raise high quality product, and increase production. Manufacturers have to put more improvement cost to assist their suppliers control and supervise suppliers' products quality or purchase new machines. For original suppliers, we matched up the process capability index Cpm and process improvement capability index CPiM to evaluated, measured the suppliers' process capability and reduced manufactures' improvement cost. Lately, When the process capability Cpm for suppliers not enough, we will think over to improve precise or accuracy of the process capability first. And then we could use process improvement capability index CPIM to measure whether suppliers' process improvement capability superiority or not. Manufacturers by operating
708
these two indices, process capability index Cpm and process improvement capability index CPiM to find out real potential, capable and continued collaboration's suppliers; enhance quality of the process, get more profit with suppliers, and advance the competition for whole supply chain. References 1. 2.
3.
4. 5. 6. 7. 8.
9. 10.
L.K. Chan, S.W. Cheng and F.A. Spiring, A New Measure of Process Capability: Cpm, Journal ofQuality Technology 20, 162-175 (1988). K.S. Chen, M. L. Huang, and R. K. Li, Process capability analysis for an entire product, International Journal of Production Research 39(17), 4077-4087(2001). K.S. Chen, K.L. Chen and R.K. Li, Contract manufacturer selection by using the process incapability index CPP , International Journal of Advance & Manufacturer Technology 26, 686-692 (2005). M. Deleryd, A pragmatic view on process capability studies, International Journal of Production Economics 58, 319-330 (1999). B. Govaerts, Private Communication to Kotz, S (1994). R. E. James, M. L. William, The Management and Control of Quality, Thomson, 1-2 (2002). V.E. Kane, Process capability indices, Journal of Quality Technology, 18, 41-52(1986). W. L. Pearn, S. Kotz, N. L. Johnson, Distributional and inferential properties of process capability indices, Journal of Quality Technology 24, 216-231(1992). W.L. Pearn, K.S. Chen, Process Improvement Capability Analysis Based on Expected Process Loss, JCSA 35(2), 151-160 (1997). M.S. Phadke, Quality Engineering Using Robust Design, AT&T Bell Laboratories, New Jersey, (1989).
JOINT OPTIMIZATION OF PROCESS MEAN AND TOLERANCE LIMITS FOR MULTI-CLASS SCREENING
SUNG HOON HONG Department of Industrial & Information Systems Engineering, Chonbuk National University, Chonju, Chonbuk 561-756, Korea IK JUN CHOI Department of Industrial & Information Systems Engineering, Chonbuk National University, Chonju, Chonbuk 561-756, Korea MIN KOO LEE Department of Information and Statistics, Chungnam National University, 220 Gung-dong, Yuseong-gu, Daejeon 621-759, Korea HYUCK MOO KWON Department of Systems and Engineering, Pukyong National University, Yongdang-dong, San 100, Nam-gu, Pusan 608-709, Korea Most models reported in the literature treat the determination of process mean and tolerance limits as two separate research fields. In this paper, the problem of jointly determining the optimum process mean and tolerance limits for each market is considered in situations where there are several markets with different price/cost structures. A profit model is constructed which involves selling price, production cost, penalty cost, and inspection cost. A Taguchi's quadratic loss function is utilized for developing the economic model for determining the optimum process mean and tolerance limits. A numerical example is given.
1.
Introduction
Consider the problem of selecting the optimum mean value for a continuous production process. All items are inspected to determine whether its quality characteristic satisfies a predetermined lower specification limit. Conforming items are sold at a regular price, whereas all others are reprocessed or sold at a 709
710 discount. Typical quality characteristics under consideration are weights, volume, number and concentration. Items produced by a production process may deviate from the process mean because of variations in materials, labor and operation conditions. The process mean may be adjusted to a higher value in order to reduce the proportion of the nonconforming items. Using a higher process mean, however, may result in a higher production cost. Consequently, the decision of selecting a process mean should be based on the tradeoff among production cost, payoff of conforming items, and the costs incurred due to nonconforming items. This problem has been studied by several researchers. Hunter and Karma solved the problem of selecting the optimum process mean when the nonconforming items are sold at a reduced price [11]. Bisgaard et al. extended Hunter and Kartha's model to a situation where the nonconforming items are sold at a price proportional to the amount of ingredient used [4]. Golhar considered the problem of selecting the optimum process mean in a canning process [8]. Boucher and Jafari, and Al-Sultan discussed situations in which the items are subjected to lot-by-lot acceptance sampling rather than complete inspections [1] [5]. Arcelus and Rahim developed a model for jointly selecting optimum target means for both a variables and an attributes quality characteristics [1], and Chen and Chung considered an economic model for determining the most profitable target value and the optimum measuring precision level for a production process [6]. Tang and Lo [16] and Lee and Jang [13] determined the optimum process mean and the tolerance limits when a surrogate variable is used in inspection. Hong and Elsayed studied the effect of the measurement error on the determination of the optimum process mean for a filling process [9]. In all the previous work, the inspected items are classified into two quality grades only; conforming items are accepted and nonconforming items are rejected. But, it is common practice to grade the outgoing items on the basis of the quality and then sell them in different markets. This practice has been used for chemical materials and primary materials such as lumber, wheat, cotton, and butter (England and Leenders) [7]. Different grades of a product may be sold at different selling prices under different names or marketed in different chain stores or areas under the same brand name (Tang) [15]. Economic inspection procedures under similar situations are considered by several authors; Tang [15], Bai and Hong [3], Kim et al. [12], Lee and Jang [13], and Hong et al. [10]. In this paper, economic models are developed for jointly determining the optimum process mean and the tolerance limits for each market in situations where there are several markets with different price/cost structures. A Taguchi's quadratic
711 loss function is utilized for developing the expected profit function model. The loss caused by imperfect quality may include loss of goodwill, warranty, replacement cost, and handling cost. Classical concept in the field of optimum target value determination assumes that this loss is a constant when an item does not conform to product specifications and is zero otherwise. However, Taguchi [15] argued that this cost concept was incorrect. Instead, he suggested that a quadratic function of the deviation from the product target value could better measure the true loss. This function has received widespread attention and been used by several researchers. This paper is organized as follows: An economic models is formulated and the procedure for obtaining the optimum solution is derived in Section 2. An example is used to illustrate the solution procedure, and sensitivity analyses are performed in Section 3. 2. The Model Let 7 be a performance variable representing the quality characteristic of interest and r denote the target value of 7 . We assume that 7 is a "larger is better" variable and normally distributed with unknown mean value u and known variance <7 . Suppose that a product can be sold to several different markets. When products are sold to market / , the selling price is A,, and the item with y < r causes a loss of C, (y, r) = a,- (y - T) which is a quadratic function. a ( is a positive constant, and y is the observed value of 7 . This function was strongly advocated by Taguchi (1984) and has received widespread attention. If y > x , Ci (y, r ) = 0 . Now consider the case where At>Aj and Ci (y, T) < Cj (y, r) , or A, = A} and C, (y, r ) < C} {y, r) . It is easy to verify that market j is dominated by market i and thus a product should be sold to market / rather than market j . Therefore only the markets which are not dominated need to be considered. Assume that there are m markets which are not dominated. It is not profitable to ship the low quality products to an ordinary market because of the penalty cost C,- (y, r ) . Therefore, market m is considered as an alternative with one of following modes; sell the products at a discount, scrap the products, etc. Without loss of generality, it is assumed that A, > A} and C, (y, r) > Cj (y, r) for all i<j. The condition Cj (y, T) > Cj (y, r) is equivalent to the condition a, > ay.. Since a smaller index market / requires higher quality products than a lager index market j , an appropriate inspection procedure is as follows. 1.
Take measurement y for each incoming item.
712
2.
Let Sj , i = 1,2,..., m , be real numbers such that Sl>S2>...>Sm = -oo and 80 = oo . If St < y < 5t_x , i = 1,2,...,m, ship the item to market / . Note that if Sj = 8-_j, the item will not be shipped to market / .
The item is shipped to market i whenever Sj < y < St_i, i = 1,2,..., m . Therefore, the expected revenue for market i is
A^~lg(y)dy,
(i)
where giy) is the probability density function of Y which is a normal density function with mean jJ. and variance <J . The production cost per item is c0+cty which is proportional to the quantity y . c 0 and C\ are positive constants (Bisgaard et al. (1984)). The expected production cost per item thus becomes Ko(c 0 +cly)g(y)dy = C0 +cxfly . The expected penalty cost for market / caused by imperfect quality is $" Cj(y,T)g(y)dy.
(2)
Therefore, the expected profit per item is given by m
EP=-sy-c0-cxfiy+Y,
*
£'"ki(y)g(y)dy.
(3)
where sy is the inspection cost of Y and kt (y) = At - Ct (y, r) . The optimum values of (/J ,8],S2,---,8m) can be obtained by maximizing equation (3). We first determine the optimum tolerance limits St =8j (jUy) , i = l,2,...,m , for given fj.y and then determine /dy maximizing the expected profit. For a given value of jU , the expected profit is maximized by choosing the values of 8\,S2,---,Sm that maximize the fourth term in equation (3). An upper bound of the fourth term for given jU is J[ {maxkj(y)}g(y)dy. This value is clearly attained by shipping the item to market i whenever y e.lx•, , i = l,2,...,m , where It is the set of real numbers y satisfying the inequalities kj(y)> kj(y) for all j ^ i , simultaneously. Since k\ (y) — k: (y) for j >\ is a nondecreasing function of y , it is clear that 7| is given by the interval / , =[Si ,oo) where Sj , i = 1,2,...,m — 1 , are the smallest real numbers satisfying the inequalities kj (y) > k (y) for all / > /, simultaneously. Similarly, if / , is not empty, it is of the form I2= [S2 ,8^ ) . If I2 is empty, we let S2 = Sx . By the
713 same reasoning, if I{, i = 3,4,..., m — 1, is not empty, it is of the form / — \8t ,8',-_, ) . If Ij is empty, we let 8X = S t_x . Since kt (y) is a function of the parameters ( A { , at, T) , it is clear that the optimum values of (8^ ,S2 ,...,Sm ) also depend on the same parameters and do not depend on the value of fJ. . Inserting the optimum values of (8X ,82 ,...,8m ) into equation (3), we obtain EP=-sy
- c 0 -cxny
+ ^Jmaxki(y)}g(y)dy.
(4)
Setting the partial derivative of equation (4) with respect to fl following equation is obtained ^Jmaxki(y)}(y-My)g(y)dy
- cxa2
= 0.
to zero, the
(5)
It is difficult to find closed form expressions for the solution of equation (5). Numerical studies over a wide range of parameter values of (r, At, al, cr Cj) indicate that equation (5) has unique solution and it represents a maximum point. Search algorithms such as Newton-Rapson or bisection method can be used for finding the value of jUy . 3. A Numerical Example Consider a packing plant of cement industry. The plant consists of two processes; a filling process and an inspection process. Each cement bag processed by the filling machine is moved to the loading and dispatching phases on a conveyor belt. Inspection is performed by continuous weighting feeders (CWFs). A CWF measures the mA (milli ampere) X of the load cell of the cement bag, which is positively correlated with the weight Y of the cement bag. From theoretical considerations and past experience, it is known that the variance of Y, ay =(1.25%) , and that X for given Y = y is normally distributed with mean 4.0 + O.O&y and variance (0.05mA)2 . That is, X and Y are jointly normally distributed with, unknown means (JUX,JU ) , known variances ax =(0.112m/i) , ay =(l.25kg) , and correlation coefficient p = 0.894. The weight marked on each bag is 40kg, and it is the target value r. The cement bag can be sold to foreign, domestic, or discount markets. The low quality product is scraped because of the penalty cost. The selling price in the foreign market is higher than that of the domestic market. The cost caused by imperfect quality in the foreign market is also higher than that of the
714 domestic market, because of differences in the costs of identifying and handling a nonconforming item, labor cost, transportation cost, etc. We will consider the foreign market as market 1, the domestic market as market 2, the discount as market 3, and the scrap as market 4. The selling prices and the estimated penalty cost coefficients in dollars are as follows: Foreign market Domestic
market
Discount
Scrap Price (A,) Penalty cost coefficient ( a . )
40 10.5
39 6.5
24 0.75
0 0
The production cost in dollars is 6.0 + 0.6y which is proportional to the quantity y , and the inspection costs are sx =$0.2 and sy =$1.3 . Using these values, we obtain (/j.*,£,*,S2*,S$) = (41.74, 39.50, 38.38, 34.34). Hence the cement bags are sold to the foreign market if y > 39.50, to the domestic market if 38.38 < y < 39.50 , or to the discount market if 34.34 < y < 38.38 . If >- < 3 4 . 3 4 , they are scraped. In this case the expected profit per item is $7,333. The optimum values of /J and their expected profits are given in Table 1 for selected values of <Jy for 0.50 (0.25) 2.50. It shows that /u can be set significantly lower as ay decreases. These results agree with our intuition that // can be set r if
s;
s2*
V
Expected
°y 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25 2.50
40.56 40.95 41.35 41.74 42.13 42.51 42.88 43.24 43.60
39.5 39.5 39.5 39.5 39.5 39.5 39.5 39.5 39.5
38.38 38.38 38.38 38.38 38.38 38.38 38.38 38.38 38.38
34.34 34.34 34.34 34.34 34.34 34.34 34.34 34.34 34.34
8.229 7.937 7.636 7.333 7.034 6.738 6.447 6.158 5.873
Profit
4. Conclusions We have considered the problem of jointly determining the optimum process mean and tolerance limits for each market in situations where there are several markets with different price/cost structures. A Taguchi's quadratic loss function is utilized for developing the profit model. It is difficult to find closed form
715 expressions for the optimum process mean as well as to show analytically that the solution is optimum. Numerical analyses over a wide range of parameter values, however, indicate that the expected profit functions are indeed unimodal for the process mean. A numerical search such as Muller's method is used to find the optimum process mean. Extensive sensitivity analyses show that the optimum process mean and tolerance limits are very insensitive to the changes of cost parameters. Numerical results also show that the optimum process mean tends to increase and the expected profit tends to decrease as the process variation <J increases. Numerical studies are performed by using FORTRAN and IMSL (International Mathematical and Statistical Libraries) subroutines on a Pentium II PC. In most cases the results can be obtained within a few minutes. A possible area of further investigation would be the extension of the model to the cases where the parameters (cr , c , A , , A 2 ) are unknown. References 1.
2.
3. 4. 5.
6.
7. 8. 9.
K. S. Al-Sultan, An Algorithm for the Determination of the Optimal Target Values for Two Machines in Series with Quality Sampling Plans, International Journal of Production Research 12, (1994). F. J. Arcelus and M. A. Rahim, Simultaneous Economic Selection of a Variables and an Attribute Target Mean, Journal of Quality Technology 26, 125-133 (1994). D. S. Bai, and S. H. Hong, Economic Design of Sampling Plans with MultiDecision Alternatives, Naval Research Logistics 37, 905-918 (1990). S. Bisgaard, W. G. Hunter and L. Pallesen, Economic Selection of Quality of Manufactured Product, Technometrics 26, 9-18 (1984). T. O. Boucher and M. A. Jafari, The Optimum Target Value for Single Filling Operations with Quality Sampling Plans, Journal of Quality Technology 23, 44-47 (1991). S. L. Chen and K. J. Chung, Selection of the Optimal Precision Level and Target Value for a Production Process: the Lower-Specification-Limit Case, HE Transactions 28, 979-985 (1996). W. B. England and M. R. Leenders, Purchasing and Materials Management (5th Edition, Homewood, IL: Richard D. Irwin), (1975). D. Y., Golhar, Determination of the Best Mean Contents for a Canning Problem, Journal of Quality Technology 19, 82-84 (1987). S. H. Hong and E. A. Elsayed, Setting Optimum Mean for Processes with Normally Distributed Measurement Error, Journal of Quality Technology 31,338-344(1999).
716 10. S. H. Hong, E. A. Elsayed and M. K. Lee, Optimum Mean Value and Screening Limits for Production Processes with Multi-Class Screening, International Journal of Production Research 37, 155-163 (1999). 11. W. G. Hunter and C. P. Kartha, Determining the Most Profitable Target Value for a Production Process, Journal of Quality Technology 9, 176-181 (1977). 12. C. T. Kim, K. Tang and M. Peters, Design of a Two-Stage Procedure for Three-Class Screening, European Journal of Operational Research 79, 431-442(1994). 13. M. K. Lee and J. S. Jang, The Optimum Target Values for a Production Process with Three-class Screening, International Journal of Production Economics 49, 91-99 (1997). 14. G. Taguchi, Quality Evaluation for Quality Assurance, Romulus, MI: American Supplier Institute, (1984). 15. K. Tang, Design of Product Grading Procedures, Decision Sciences 21, 434-445 (1990). 16. K. Tang and J. Lo, Determination of the Process Mean When Inspection is Based on a Correlated Variable, HE Transactions 25, 66-72 (1993).
SEPARATE RESPONSE SURFACE MODELING FOR MULTIPLE RESPONSE OPTIMIZATION YOUNG JIN KIM Department of Systems Management and Engineering, Pukyong National Busan 608-739, Republic of Korea
University
In spite of the growing importance of multiple response optimization, there have been few research efforts in this area. This article proposes a versatile optimization model by employing the concept of quality loss function and response surface modeling. Taguchi (1986) proposes the use of quality loss function to measure a societal cost incurred by the customer on a monetary scale. Mostly applied to a single quality characteristic problem, however, the quality loss function may also be extended to multiple response systems by combining performances of individual quality characteristics into a single objective function. The overall performance of a multiple response problem is then evaluated by obtaining mean and variance responses for each quality characteristic, and covariance responses among quality characteristics.
1. Introduction Most manufacturing industries have been faced with the problem involving simultaneous optimization of several quality characteristics that could be considered as the basis for the product selection by customers. One of the major concerns in manufacturing design is to find settings of design parameters that result in a satisfactory combination of quality characteristics. This generally requires a trade-off or balancing between quality characteristics. Response surface methodology (RSM) has widely been used in design optimization problems. Especially, Myers and Carter (1973) proposed a dual response surface technique, which models process mean and variance as separate surfaces, to achieve a robust design for single quality characteristic. Several approaches using RSM have been proposed to optimize multiple response systems and include desirability function (Derringer and Suich 1980, 1994), utility function (Myers, Khuri, and Carter 1989), and quality loss function (Pignatiello 1993, Ames et al. 1997, and Vining 1998). Taguchi (1986) perceived that product quality is closely related to the manufacturing imperfection and quality loss is always incurred to the society if the quality characteristic(s) of a product deviates from the nominal target value(s). Quality loss function (QLF) is widely accepted since it provides a 717
718 monetary indication of product quality by relating the deviation of a product characteristic from its target value to the monetary loss. As discussed in Pignatiello (1993) and Ames et al. (1997), QLF can be extended to multiple response systems by combining performances of multiple quality characteristics into a single objective function. We will use the concept of QLF for multiple response optimization. Pignatiello (1993) proposed two different strategies for multiple response optimization: a direct strategy and a partitioned strategy. A direct strategy models and estimates QLF itself as a response of interest. In a partitioned strategy, the sum of variance and sum of squared bias in QLF are modeled separately. He proposed a two-step procedure in which the variance portion is first minimized and the squared bias term is then minimized with given minimum variance. He did not provide the reasoning for the preemptive optimization of the variance portion. Moreover, rather unrealistic assumption has been made by strictly partitioning design variables into three nonoverlapping components: design variables that affect process dispersion, that affect process mean, and that affect neither the mean nor the dispersion. Ames et al. (1997) proposed a multiple response optimization scheme that minimizes weighted sum of individual QLF's. Vining (1998) pointed out that their model ignores the correlation structure among the responses and proposed a squared error loss approach based on the distance function of Khuri and Conlon (1981). Even if addressed the issue of correlation among responses, this approach may have difficulty in separately analyzing process mean and variance-covariance structure since they are embodied in estimated functions of individual responses. On the other hand, individual responses can not be analyzed separately within the framework proposed by Pignatiello (1993) since all the responses are mixed up in the estimated functions. This problem may be overcome by employing a dual response surface approach since the process mean and variance for individual responses are modeled and estimated separately. This paper provides a multiple response optimization scheme by employing the concept of QLF and extends a dual response surface approach to multiple response systems. The proposed model is aimed at minimizing QLF for multiple responses. To do this, we need to know estimated functions for process means, variances, and covariances. Within the proposed framework, the mean and variance for individual responses are modeled separately through a dual response surface approach. Furthermore, covariances between individual
719 responses are also estimated using the data obtained through design of experiment. 2. Quality Loss Function for Multiple Responses Most manufacturing product has more than one quality characteristics and its quality is typically evaluated by the customer on the basis of multiple quality characteristics. Let rxl vector of quality characteristics denoted by y = [yi>y2>'">yr]' • Assume that y follows an /"-dimensional multivariate normal distribution with mean vector M = [juuju2,---,^r]' and variancecovariance matrix S and all yi,s(i = \,2,---,r) are functions of the same set of p input variables x = {xx,x2,---,xp) . Furthermore, the target values for individual quality characteristics are known and denoted by a vector T = [ r 1 , r 2 , - - , r r ] ' . With this, a quadratic loss function for multiple responses can be defined by (Pignatiello 1993) Z(y(x)) = ( y ( x ) - T ) ' K ( y ( x ) - T ) .
(1)
where AT is an r x r vector of positive loss coefficients and represent the losses incurred when y deviates from T. Note that if K is a diagonal matrix, it implies that there are no interactions between any responses and the quality loss function becomes just a sum of r single response quality loss functions, i.e.,
i(y(x)) = X*/0'l-(x)-r/)2. (=i
th
where k, is the z' diagonal element of K. And non-diagonal elements of K are related to the incremental losses only when pairs of quality characteristics are simultaneously off the target. It can easily be shown that the expected quality loss is given by E[L(y (x))] = (M - T)'K(M - T) + trace(KS),
(2)
or equivalently (Kapur and Cho 1996),
£Wy«)] = X 4 t t -r,)2 + ^ ] + Z I X h + ( ^ -^(Mj -TJ)], i=\
(3)
i=2 j=l
where ktj and atj are the (i,j) element of A" and S, respectively, and cr,2 is the t diagonal element of Z. If r = 2 , the expected quality loss is given by
720
E[L(y(x))] = kx[(Ml -T02+V?\+
k2\jt2
-r2)2+a22\ (4)
QLF is widely accepted since it provides a monetary indication of product quality by considering the deviation of a quality characteristic from its target value. For multiple response optimization, minimizing quality loss is a simple and persuasive approach to finding the best settings of the design variables. Another advantage of QLF approach is that it does not require much information on preferences while those kinds of information are indispensable in constructing desirability functions and utility functions. One of the basic features of the desirability and utility functions is the subjectivity on the part of the user in evaluating. Moreover, improperly assessed desirability functions or utility functions can lead to inaccurate results. 3. Model Development When the true response function is unknown, it could be approximated over a limited experimental region by a polynomial representation through response surface methodology. Consider the situation in which multiple responses of quality characteristics y = [yu v2,---,yr]' depend on the same set of variables x = (x],x2,---,xp). It has been very common to estimate each response as a quadratic response of the form. p
p
p
Aw^o+X/^+ZZ/v^ • (=1
(5)
1=1 _/=/
Kim and Lin (1998) pointed out that such a model works well when the variance of the response is relatively small and stable, but when the variance is not a constant, classical response surface methodology could be misleading. For the robust design of single quality characteristic, Vining and Myers (1990) proposed a dual response approach, which models process mean and variance as separate responses, to tackle such a problem. The concept of a dual response approach can be extended to the multiple quality characteristics. It is obvious that the biases, variances, and covariances of quality characteristics should be reduced to decrease quality loss. To estimate process means and variances for individual responses, a typical dual response approach can be appropriately employed. Let yukj- be the fh experimental value of kth quality characteristic at the wth design point. The sample mean and variance for each
721 quality characteristic can be used to get the estimated functions for process mean and variance. The sample mean and variance for tih quality characteristic at the M"1 design point are given by i J".
yUk=—Y.
m
i and
4 = — T X (y>*i - *«* >2 •
yUkj
7=1
(6)
7=1
where m is the number of replications at each design point. Equation (5) can be used to get the estimated functions for process mean and variance. Let jik (x) and ak (x) represent the fitted functions for the mean and the standard deviation of the kth quality characteristic, respectively. Assuming a second-order polynomial model for the response functions, the fitted functions on the basis of the sample mean and variance yield to p
p
X
p
h (*) = Po + X P' > Z Z PiJXiXJ P
°k W = n+X
+
P
r x
P
+
< < H S YVX'XJ •
(7)
Similarly, we can get the estimated functions of mean and standard deviation for every response. Thus, the biases and variances can be taken care of by employing a typical dual response surface approach. Now, we only need to know the estimated functions for covariances as well as means and variances to develop our QLF minimization model. To do this the concept of a dual response surface approach will also be extended to covariances just like mean and variance. It is well known that a sample covariance is an unbiased estimator of a covariance. Thus a sample covariance can be used to find estimated functions of covariance. The sample covariance between kth and f" quality characteristics at the wth design point can be expressed as 1 m uki = —7y(y U kj~y U k)(y u ij-y u i),
s
for k*i.
(8)
7=1
Assuming a second-order polynomial model for the response function again, an estimated function for covariance between A* and f" quality characteristics
722 P
P
a x
<*«(*) = «0 + X ' i 1=1
+
P
S 2 ,=i
a
Vx'xJ
(9)
j=t
Similarly, estimated functions for covariance between any pair of responses can be obtained. Introducing estimated functions of means, variances, and covariances in the expected quality loss, the optimization model can simply be described as E[L(y(x))] = ] T *, [(A to - r, ) 2 + af (x)] Minimize
'"
+
,
x +
ZZ^h( ) (AW-^)(^W-r7)] 1=2 7=1
where x e fi and Q represent the feasible region. Major advantages of the proposed model over the previous ones come from its modeling versatility. The model by Pignatiello (1993) uses an estimated function for expected quality loss itself. So, individual responses cannot be analyzed within the framework and one can get only an overall picture of the multiple response systems. Later, Vining (1998) modeled individual responses separately and discussed the variance-covariance structure of multiple response systems. However, the results may turn out to be misleading if the variances or covariances are not stable. In this regard, separate modeling of process mean, variance, and covariance seems to be attractive and makes it possible examining multiple response systems more rigorously and extensively. Furthermore, individual responses can be analyzed separately with the proposed model. For example, suppose keeping the mean of the first quality characteristic on its target should be pre-emphasized over the other criteria. Then, we just need to add the constraint Mi(x) = Ti t 0 t n e proposed model. However, other approaches cannot be applied to this kind of situations. Separate modeling process means, variances, and covariances may provide us with a great deal of opportunities to explore individual responses in multiple response systems, so the proposed model can be considered as a versatile approach compared with existing models. 4. Conclusions We have proposed a multiple response optimization model which minimizes QLF by extending the concept of a dual response surface approach to the multiple response systems. Individual process means, variances, and
723
covariances are modeled separately and estimated as in a dual response surface approach. Introducing estimated functions into the QLF, the optimization model is to minimize QLF. Separate modeling for each process parameter provides us with modeling versatility. As a concluding remark, we would like to point out that the weights representing relative importance of individual responses could be introduced to the proposed model. Let W be the importance weight matrix of individual responses. Then, the QLF can be written as
£(y (*)) = ( y « - T) W (
y
(x) - T).
Taking expectations on both sides yield to £[£(yO))] = (M - T ) ' K ^ ( M - T) + trace(KJfE). Introducing estimated functions, one can get the optimization model which considers relative importance of responses. How to get the weight matrix W is beyond the scope of this paper and needs to be examined further. References 1.
A.E. Ames, N. Mattucci, S. MacDonald, G. Szonyi and D.M. Hawkins, Journal of Quality Technology, 29, 339 (1997). 2. N. Artiles-Leon, Quality Engineering, 9, 213 (1996). 3. R.E. Chapman, Quality Engineering, 8, 31 (1995). 4. E. Del Castillo and D.C. Montgomery, Journal of Quality Technology, 25, 199(1993). 5. G. Derringer and R. Suich, Journal of Quality Technology, 12, 214 (1980). 6. K.C. Kapur and B.R. Cho, HE Transactions, 28, 237 (1996). 7. A.I. Khuri and M. Conlon, Technometrics, 23, 363 (1981). 8. K.J. Kim and D.K.J. Lin, Journal of Quality Technology, 30, 1 (1998). 9. D.K.J. Lin and W. Tu, Journal of Quality Technology, 27, 34 (1995). 10. J.J. Pignatiello, HE Transactions, 23, 5 (1993). 11. G.G. Vining, Journal of Quality Technology, 30, 309 (1998). 12. G.G. Vining and R.H. Myers, Journal of Quality Technology, 22, 38 (1990).
A GLOBAL CRITERION APPROACH TO MULTIPLE RESPONSE OPTIMIZATION YOUNG JIN KIM Department of Systems Management and Engineering, Pukyong National Busan 608-739, Republic of Korea
University
BYUNG RAE CHO Department of Industrial Engineering, Clemson Clemson, SC 29634, United States
University
Most manufacturing industries have been faced with the problem involving simultaneous optimization of several quality characteristics that may be considered the basis for the product selection by customers. Compared with the case of single quality characteristic, however, the design optimization of multiple quality characteristics has received little attention. This paper employs the concepts of the dual response surface technique and the MSE criterion to the optimization of multiple quality characteristics. The proposed model is aimed at simultaneously minimizing the MSE of individual quality characteristics. However, the optimal solution to one quality characteristic may result in poor performances in other quality characteristics. Thus, a tradeoff among quality characteristics is required and the design optimization of multiple quality characteristics may be viewed as a multiple objective programming problem. A global criterion approach (Lai and Hwang 1994) is employed to set up an optimization model for multiple quality characteristics.
1. Introduction Most manufacturing industries have been faced with the problem involving simultaneous optimization of several quality characteristics that could be considered as the basis for the product selection by customers. The design optimization with multiple quality characteristics is to find settings of design variables that result in a satisfactory combination of quality characteristics. This generally requires a tradeoff or balancing among quality characteristics. The response surface methodology (RSM), which is designed to find optimal settings of design variables to optimize the response (or a set of responses), has widely been used in design optimization problems. Especially, there have been many progresses in design optimization associated with single quality characteristic. Recently, a dual response surface technique, developed by Myers and Carter (1973), has received a lot of attention. Unlike the classical 724
725
RSM, the dual response surface approach models the process mean and variance as separate responses. Vining and Myers (1990) first applied the dual response approach to achieve robust design, and minimized variance subject to the constraint that process mean equals to the target value. Later, Lin and Tu (1995) pointed out that their approach may rule out better conditions by forcing process mean to a fixed value, and proposed a procedure based on the minimization of mean squared error (MSE). Considering that the quality of a product is closely related to both the bias and variance of the process (Taguchi 1986), the MSE approach is quite appealing since it incorporates the variance as well as bias. Besides, there have been several efforts to further develop the dual response surface approach including Castillo and Montgomery (1993), Copeland and Nelson (1996), Kim and Lin (1998), and Kim and Cho (1998). Compared with the case of single quality characteristic, the design optimization of multiple quality characteristics has received little attention. Furthermore, there are so many opportunities to extend the concept developed in the design optimization of single quality characteristic, such as a dual response surface approach, to the optimization of multiple quality characteristics. This paper extends the concepts of the dual response surface technique and the MSE criterion to the optimization of multiple quality characteristics. The proposed model is aimed at simultaneously minimizing the MSE of individual quality characteristics. However, the optimal solution to one quality characteristic may result in poor performances in other quality characteristics. Thus, a tradeoff or balancing among quality characteristics is required and the design optimization of multiple quality characteristics can be viewed as a multiple objective programming problem. A global criterion approach (Lai and Hwang 1994) is employed to set up an optimization model for multiple quality characteristics. The proposed model first identifies the individual optimal solutions, which minimize the MSE of each quality characteristic, and then provides the compromise solution that balances the generalized distances (Khuri and Cornell 1987) from the individual optimal solutions. The MSE's are incommensurable due to the differences in relative magnitudes of individual quality characteristics. A generalized distance removes the incommensurability. 2. A Dual Response Surface Approach and MSE Criterion The RSM is designed to find the optimal settings for a set of design variables that optimize the response. Typically, such a problem is focused on the mean value of the response, so the classical RSM works well when the variance of the response is stable. As pointed out in Lin and Tu (1995) and Kim and Lin (1998),
726
the classical RSM does not work well if the variance is not stable. Vining and Myers (1990) first made use of the dual response surface approach, developed by Myers and Carter (1973), to resolve this problem. The dual response surface approach models the process mean and standard deviation as separate responses as follows: p
p
p
i=\
i=\
j=i
p
p
p
y x +
<7« = n+Yu ' i HH 1=1
Y
'Jx'xj
(l
)
1=1 j=i
where //(x) and o-(x) represent the fitted functions for the mean and the standard deviation, respectively. Vining and Myers (1990) proposed the following model to achieve the robust design of nominal-the-best quality characteristic. Minimize
(2)
subject to x e Q . Considering that the quality of a product is closely related to the variance as well as the bias of the process (Taguchi 1986), the MSE approach is quite appealing since it incorporates both of them. Furthermore, the MSE criterion can easily be applied to the optimization of multiple quality characteristics as shown in the next section.
727
3. Model Development 3.1. Multiple Objective Programming Consider the situation in which the quality of a product is determined by n quality characteristics, Yl,Y2,---,Yn , that depend on the same set of input variables x = {xx,x2,---,xp). To estimate process mean and variance for each quality characteristic, a typical dual response surface approach can be employed. Let yukJ be the jth experimental value of kth quality characteristic at the uth design point. The sample mean and variance for each quality characteristic can be used to get the estimated functions for process mean and variance. The sample mean and variance for kth quality characteristic at the uth design point are given by i Jn
I
yUk =—Y,yUkj mr-i
_m
and s
lk = — { Y i y u k j -y»k)2 • OT-1 —
where m is the number of replications at each design point. Letting juk (x) and ak (x) denote the estimated functions of mean and standard deviation of kth quality characteristic, respectively, juk (x) and ak (x) can be found as in equation (1). Thus, one can get the estimated functions of mean and standard deviation and the optimization model based on MSE criterion can be constructed as in equation (2) for each quality characteristic. However, the optimal solution for one quality characteristic may not be the optimal to the other quality characteristics. Thus, the optimization of multiple quality characteristics can be written as a multiple objective programming problem Minimize MSE, (x) = (/}, (x) - r, )2 + a? (x)
Minimize MSE„ (x) = (//„(x) - r„ )2 + a2n (x) subject to x e Q , where MSE,(x) and r,- represent the MSE and target value of the quality characteristic Y{, respectively. The simplest way to solve this problem is to minimize the sum of the objective functions. However, keeping that the relative magnitudes of MSE's are different from each other in mind, this naive approach may yield to the solution close to the point which minimizes the MSE of the quality
728
characteristic with largest magnitude. For example, consider a situation where two quality characteristics jointly determine the quality of a product. Suppose the MSE of one quality characteristic ranges from 1 to 2 and that of the other from 100 to 200 within the feasible region. If the optimal solution for each quality characteristic is different from the other, the optimal solution that minimizes the sum of MSE's may be close to the point which minimizes the MSE of the quality characteristic with larger magnitude. Consequently, minimizing the sum of MSE's is not convincing due to the incommensurability among MSE's. 3.2. Proposed Optimization Model In the context of multiple objective programming, a global criterion approach provides a compromise solution, which balances the performances of individual objective functions, while removing the incommensurability among objective functions. We shall apply the global criterion approach to the optimization of multiple quality characteristics. One can get the optimal solution for each quality characteristic by solving optimization model as in equation (2). Let MSE* and MSE/ denote the MSE's of the quality characteristic Yt and K(for j*i) at the point that minimizes MSE,(x), respectively, then a payoff matrix can be developed as in Table 1. Further, define MSE," as the maximum of MSE/ for all j&i . The generalized distance of MSE,-(x), denoted by of, (x), can be written as |MSE,W-MSE-| [MSET-MSE*] As seen in equation (3), the generalized distance function removes the differences in relative magnitudes of individual objective functions by properly normalizing. A design optimization of multiple quality characteristics requires a simultaneous optimization. If a minimum operator is employed for incorporating multiple objectives, a design optimization problem can be written as Minimize 9 subjectto
dt{\)<6,
i = 1,2,---, n , and x e Q .
The above formulation, which is actually a min-max problem, balances the generalized distances of individual MSE's. The proposed model not only
729 balances the performance of each quality characteristic, but also removes the differences in relative magnitude of individual MSE's. Table 1. A payoff matrix.
Setting
MSE,(x)
MSE 2 (x)
MSE,(x)
x (,)
MSE*
MSE,2
MSE;
MSE"
x (2)
MSEJ
MSE j
MSE^
MSEj
x (;)
MSE)
MSE 2
MSE*
MSE,"
x(">
MSE),
MSE2,
MSE'„
MSE*„
..
MSE„(x)
X '' indicates the point at which MSE,(x) is minimized.
4. Conclusions This paper proposes an optimization model for multiple quality characteristics by employing the concept of a dual response surface approach and the MSE criterion. The process mean and variance for each quality characteristic are modeled separately. It is shown that the design optimization of multiple quality characteristics can be modeled as a multiple objective programming problem, in which individual objective functions are to minimize the MSE's of individual quality characteristics. Individual objectives are normalized by defining generalized distance. Thus, the proposed model balances the generalized distances of quality characteristics based on a max-min formulation. As a concluding remark, we would like to point out that the weights representing relative importance of each quality characteristic could be introduced to the proposed model. Let wi be the pre-specified constant which represents the relative importance of the quality characteristic Yt. Then, the optimization model can be written as
730
Minimize 9 subject to Wjdj (x)<0, y
;i
i = 1,2, • • •, n
w, =1 and x e Q .
References 1. A.E. Ames, N. Mattucci, S. MacDonald, G. Szonyi and D.M. Hawkins, Journal of Quality Technology, 29, 339 (1997). 2. N. Artiles-Leon, Quality Engineering, 9, 213 (1996). 3. R.E. Chapman, Quality Engineering, 8, 31 (1995). 4. E. Del Castillo and D.C. Montgomery, Journal of Quality Technology, 25, 199(1993). 5. G. Derringer and R. Suich, Journal of Quality Technology, 12, 214 (1980). 6. A.I. Khuri, and M. Conlon, (1981). Technometrics, 23, 363 - 375. 7. A.I. Khuri and J.A. Cornell, Response Surfaces: Designs and Analyses. Marcel Dekker, Inc., New York, NY, (1987). 8. K.J. Kim and D.KJ. Lin, Journal of Quality Technology, 30, 1 (1998). 9. Y.J. Lai and C.L. Hwang, Fuzzy Multiple Objective Decision Making: Methods and Applications. Springer-Verlag, New York, NY, (1994). 10. D.KJ. Lin and W. Tu, Journal of Quality Technology, 27, 34 (1995). 11. J.J. Pignatiello, HE Transactions, 23, 5 (1993). 12. G.G. Vining, Journal of Quality Technology, 30, 309 (1998). 13. G.G. Vining and R.H. Myers, Journal of Quality Technology, 22, 38(1990).
DETERMINATION OF MEAN VALUE FOR A PRODUCTION PROCESS WITH MULTIPLE PRODUCTS MIN K 0 0 LEE Department of Information and Statistics, Chungnam National University, 220 Gung-dong, Yuseong-gu, Daejeon 621-759, Korea SUNG HOON HONG Department of Industrial & Information Systems Engineering, Chonbuk National University, Chonju, Chonbuk 561-756, Korea HYUCK MOO KWON Department of Systems and Engineering, Pukyong National University, Yongdang-dong, San 100, Nam-gu, Pusan 608-709, Korea We consider the problem of determining the optimum target value of the process mean for a production process where multiple products are processed. Every outgoing item is inspected, and each item failing to meet the specification limits is scrapped. Assuming that the quality characteristics of the products are normally distributed with known variances and a common process mean, the common process mean is obtained by maximizing the expected profit which includes selling prices, costs of production and inspection, and losses due to the scraps. A method of finding the optimum common process mean is presented and an illustrative example from electronic device production process is given.
1. Introduction The problem of selecting the most profitable mean value is considered for a continuous process. All products are inspected to determine whether their quality characteristic satisfies predetermined lower limit. Conforming products are sold at a regular price, whereas all others are reprocessed or sold at a discounted price. Typical quality characteristics applicable for such strategy are weights, volume, number and concentration. Products produced by a production process may deviate from the process mean because of variations in materials, labor and operational conditions. The process mean may be adjusted to a higher value in order to reduce the proportion of the nonconforming items. Using a higher process mean, however, may result in a higher production cost. Consequently, the decision of selecting a process mean should be based on the 731
732
tradeoff among production cost, payoff of conforming items, and the costs incurred due to nonconforming items. Techniques to determine the optimum process mean have been discussed and developed for more than 40 years. Bettes (1962) solved the problem of choosing the optimum values for the process mean and upper specification limit. Hunter and Kartha (1977) considered the problem of selecting the optimum process mean when underfilled items are sold at a reduced price. Bisgaard et al. (1984) extended Hunter and Kartha's model to a situation where underfilled items are sold at a price proportional to the amount of ingredient used. Golhar (1987) considered the problem of selecting the optimum process mean in a canning process; cans filled above a lower specification limit are sold at a fixed price and the underfilled cans are emptied and refilled at a reprocessing cost. Boucher and Jafari (1991), and Al-Sultan (1994) discussed situations in which the items are subjected to lot-by-lot acceptance sampling rather than complete inspections. Tang and Lo (1993), Lee and Jang (1997), and Lee et al. (2001) determined the optimum process mean and the screening limits when a surrogate variable is used in inspection. Elsayed and Chen (1993) determined optimum levels of process parameters for products with multiple characteristics, and Arcelus and Rahim (1994) developed a model for simultaneously selecting optimum target means for both variable and attribute quality characteristics. Chen and Chung (1996) considered an economic model for determining the most profitable target value and the optimum measuring precision level for a production process. Hong and Elsayed (1999) studied the effects of measurement errors on process target, Pfeifer (1999) showed the use of an electronic spreadsheet program as a solution method, and Hong et al. (1999) presented the problem of jointly determining the optimum process mean and screening limits in situations where there are several markets with different price/cost structures. Most recently, Rahim and Shaibu (2000) and Lee et al. (2004) applied the Taguchi loss function to determine the optimum process target and variance. Kim et al. (2000) proposed a model for determining the optimal process target with the consideration of variance reduction and process capability. Teeravaraprug and Cho (2002) designed the optimum process target levels for multiple quality characteristics, and Rahim et al. (2002) considered the problem of selecting the most economical target mean and variance for a continuous production process. Finally, Duffuaa and Siddiqui (2003) considered process targeting with multi-class screening and measurement error.
733
Although the quality engineering literature related to this issue contains a vast collection of work, some questions still remain unanswered. In all the previous studies, they considered the case where only one product type is produced. In some situations, however, multiple products may be produced through a common production process. For example, many types of electronic devices are processed in some metal plating process. The plating thickness of electronic device depends on the volume of the electronic device. In this situation, the previous studies cannot be applied directly. Assuming that the quality characteristics of multiple products are normally distributed with known variances and a common process mean, the optimum target value of the common process mean is found by maximizing the expected profit function which involves selling price, production, inspection, scrap costs. The proposed model is demonstrated with an illustrative numerical example and sensitivity analyses are performed. 2. The Model Consider a production process where multiple products are produced continuously through a common process. Suppose that the quality characteristics (X{, X2,..., Xn ) of products are independently and normally distributed with a common process mean JU and known variances (cr, , <J2,..., an ) . Every item is inspected prior to the shipment to determine whether they are conforming to the lower and upper specification limits L- and £/, or not. If L-, < X-, < Uj, then the item is sold at a fixed selling price per item of the i
product Ai for / = 1, 2, • • •, n . Otherwise, the
item is scrapped at a scrap cost S . The production cost per item is assumed to be equal for different product types, i.e., B + Cxt, where B and C represent the fixed per item and the unit cost of a quantity of material. Let Tt be the indicator variable, which takes 1 when an i
product type is produced and 0
otherwise. It should be noted that £[7^] = a. for i = \,2,---,n th
denotes the production proportion of the / The profit function P(x{ ,x2i...,x„;//)
, where at n
product type and thus V
= P can then be written as
a
=\.
734
\A, -B-Cxt -C,), (-B-Cxi -C,-S), P= (-B - Ox, -Cj- S),
forT, = 1,Lj < x ,
0,
(1)
for ^ = 0,
where C and C ; arethe unit cost of a quantity of material and the inspection cost per item, respectively. Taking the expectatioin on both sides of equation (1) yields
£(/>) = £
a,. JM-B-Cxi-CMixJdXi-a,
-a, ^(B +
j(B + Cxi +
CI+S)fi(xi)dxi
Cxi+C1+S)fi(xi)dxi
(2)
Ui
where /.[•) represents the probability density function of Xt . After some algebra, equation (2) can be rewritten as
E{P) = Yjai{Ai+S)U Ui-M
-
— B-C/u-C,-S,
(3)
1=1
where ^(-) and O(-) are the standard normal density function and the standard normal distribution function, respectively. The optimum common process mean jU can then be obtained by maximizing E(P) , of which solution procedures are discussed in the next section. 3. The Optimum Solution The optimum solution to equation (3) can be obtained by taking the first and second derivatives with respect to (J., which are given by dE(P) _
and
^ajjAi+S)
t
-
\Li-M)
I
°i
•c,
J
(4)
735
dzE{P) 2
dM
=-z
a,(Ai+S)
r
(U,-M)*
ut-^
i=\
(5) respectively. It is reasonable to set the process mean greater than the lower specification limits and smaller than the upper specification limits, i.e., JU > L, and // < £/, for all / , for which equation (5) is negative. Therefore, the expected profit function is concave when jU > Li and fi < Ut for all i, and thus the global maximum is guaranteed at the value of // satisfying dE(P) I d/U = 0 . Consequently, the optimum common process mean JU is the value of jJ, such that
aM+S)
I ai J
(=! jU>Lj
-
and // < t/(. for all /
•-c,
(6)
Computation search algorithms, such as bisection search and golden section search, may be employed to obtain the optimum process mean JU from equation (6). In most cases, the optimum solution has been obtained within a few seconds using a simple computer program using FORTRAN and IMSL (International Mathematical and Statistical Libraries) subroutines on a 586PC. An Illustrative Example: Consider a production process where three types of electronic devices are passed through a common copper plating process. The plating thickness of electronic devices depends on the volume of the electronic devices. From the past data, it is known that the plating thicknesses {Xs ,X2,Xi)of three electronic devices are normally distributed with known variances ( <x,2 =(\.\/jm)2,a\ ={\.2fjm)2,al =(l.2jum)2 ) and a common process mean JU . The production proportions of three electronic devices are GC] — 0.4 a n d a 2 = a3 = 0.3 . Suppose that the cost components and lower specification limits of (X], X2, X3) are ^ , = $30.5, ^ 2 = $32.5, ^ 3 = $ 3 4 . 5 , = $6.0 , C = $1.0 , C 7 =$0.8 , 5 1 = $2.5 , L, = 13.25//m , L2=\2>.5fjm 3= 14.0/urn , £/,= 18.O/0w , L 2 =18.25//m and Z, 3 =18.75/^K The optimum common process mean jU is obtained from equation (6) employing the bisection search algorithm, which yields //* = 15.759//W with the expected profit E(P) = $6.8337 .
736
The optimum common process mean fj,
for the above example are given
in Table 1 for selected values of the production proportion ax, CC2, and a3 ranging from 0.1 to 0.9. The computational results agree with our intuition that * the optimum common process mean jU decreases as CCl and CC2 increase, whereas the process mean is to be set at a larger value as GC3 increases.
Table 1. fX for selected values of Of], a2
a2
a,
«3
(a, =a 3 )
a
(«2 = 3 )
and a3 .
*
(a, =a2)
0.1
15.902
0.1
15.829
0.1
15.634
0.2
15.857
0.2
15.813
0.2
15.703
0.3
15.808
0.3
15.797
0.3
15.770
0.4
15.759
0.4
15.782
0.4
15.835
0.5
15.707
0.5
15.766
0.5
15.898
0.6
15.652
0.6
15.751
0.6
15.958
0.7
15.594
0.7
15.735
0.7
16.018
0.8
15.534
0.8
15.720
0.8
16.076
0.9
15.471
0.9
15.706
0.9
16.135
4. Conclusion We suggested a model for determining the optimum common process mean for a production process where multiple products are processed. An economic model is constructed which involves selling prices, costs of production and inspection, and losses due to the scraps. We assumed the quality characteristics of the products are independently and normally distributed with known variances and a common process mean. The optimum common process mean is obtained by maximizing expected profit function. The solution is shown to be unique and optimum under the reasonable assumption that the proportion of
737
defective items is less than 50%. However, closed form expression for the optimum common process mean is not obtained, and a numerical search algorithm such as bisection method is used. In this study, we considered the case where inspection is performed directly the quality characteristics of interest. In some situations, it is impossible, or not economical, to inspect the quality characteristics of interest It will be of interest to consider the case where inspection is performed the surrogate variables correlated with the quality characteristics of interest. References 1.
2.
3.
4. 5.
6.
7.
8.
9. 10.
K.. S. Al-Sultan, An Algorithm for the Determination of the Optimal Target Values for Two Machines in Series with Quality Sampling Plans, International Journal of Production Research 12, (1994). F. J. Arcelus and M. A. Rahim, Simultaneous Economic Selection of a Variables and an Attribute Target Mean, Journal of Quality Technology 26, 125-133(1994). D. C. BETTES, Finding an Optimum Target Value in Relation to a Fixed Lower Limit and an Arbitrary Upper Limit, Applied Statistics 11, 202-210 (1962). S. Bisgaard, W. G. Hunter and L. Pallesen, Economic Selection of Quality of Manufactured Product, Technometrics 26, 9-18 (1984). T. O. Boucher and M. A. Jafari, The Optimum Target Value for Single Filling Operations with Quality Sampling Plans, Journal of Quality Technology 23, 44-47 (1991). S. L. Chen and K. J. Chung, Selection of the Optimal Precision Level and Target Value for a Production Process: the Lower-Specification-Limit Case, HE Transactions 28, 979-985 (1996). S. DUFFUAA and A. W. SIDDIQUI, Process Targeting with Multi-Class Screening and Measurement Error, International Journal of Production Research 41, 1373-1391 (2003). E. A. ELSAYED and A. CHEN, Optimal Levels of Process Parameters for Products with Multiple Characteristics, International Journal of Production Research 31, 1117-1132(1993). D. Y. Golhar, Determination of the Best Mean Contents for a Canning Problem, Journal of Quality Technology 19, 82-84 (1987). S. H. Hong and E. A. Elsayed, Setting Optimum Mean for Processes with Normally Distributed Measurement Error, Journal of Quality Technology 31,338-344(1999).
738
11. S. H. Hong, E. A. Elsayed and M K. Lee, Optimum Mean Value and Screening Limits for Production Processes with Multi-Class Screening, International Journal of Production Research 37, 155-163 (1999). 12. W. G. Hunter and C. P. Kartha, Determining the Most Profitable Target Value for a Production Process, Journal of Quality Technology 9, 176-181 (1977). 13. Y. J. KIM, B. R. CHO and M. D. PHILLIPS, Determination of the Optimal Process Mean with the Consideration of variance Reduction and Process Capability, Quality Engineering 13, 251-260 (2000). 14. M. K. Lee and J. S. Jang, The Optimum Target Values for a Production Process with Three-class Screening, International Journal of Production Economics 49, 91-99 (1997). 15. M. K. LEE, S. H. HONG and E. A. ELSAYED, The Optimum Target Value under Single and Two-Stage Screenings, Journal of Quality Technology 33, 506-514 (2001). 16. M. K. LEE, S. B. KIM, H. M. KWON and S. H. HONG, Economic Selection of Mean Value for a Filling Process Under Quadratic Quality Loss, International Journal of Reliability, Quality and Safety Engineering 11,506-514(2004). 17. P. E. PFEIFER, A General Piecewise Linear Canning Problem Model, Journal of Quality Technology 31, 326-337 (1999). 18. M. A. RAHIM, J. BHADURY and K. S. AL-SULTAN, Joint Economic Selection of Target Mean and Variance, Engineering Optimization 34, 1-14 (2002). 19. M. A. RAHIM and A. B. SHAIBU, Economic Selection of Optimal Target Values, Process Control and Quality 11, 369-381 (2000). 20. K. TANG, and J. LO, Determination of the Process Mean when Inspection Is Based on a Correlated Variable, HE Transactions 25, 66-72 (1993). 21. J. TEERAVARAPRUG, and B. R. CHO, Designing the Optimal Process Target Levels for Multiple Quality Characteristics. International Journal of Production Research 40, 37-54 (2002).
SOME ADVANCED CONTROL CHARTS FOR MONITORING WEIBULL-DISTRIBUTED TIME BETWEEN EVENTS J.Y. LIU, M. XIE, T.N.GOH Department of Industrial and Systems Engineering, National University of Singapore 10 Kent Ridge Crescent, 119260, Singapore Time-between-events (TBE) data are available in industries such as manufacturing, maintenance, and even in service. Recently, control charts have been shown to be useful for the time-between-events to detect changes in the statistical distribution, especially the mean change. A common assumption for control chart design is the time between occurrences of events is exponentially-distributed. However, this is valid only when the events occurrence rate is constant. In this paper, a version of exponentially weighted moving average (EWMA) chart is developed for monitoring Weibull-distributed TBE data. The Average Run length (ARL) and Average Time to Signal (ATS) properties are examined, and an example is given for illustration.
1. Introduction Control charts have been shown to be effective for process monitoring not only in manufacturing but also in maintenance and service where time-between-event (TBE) is an important factor to show the capability and stableness of the process (Ye et al. 2003; Wang and Tsung, 2005; and Zhou et al. 2006). Here, the word events can stand for nonconforming items in a manufacturing process, failures in a maintenance process, accidents in a traffic system, diseases in health care, etc. The events occurrence rate of such a process can be monitored by a control chart which plots the time between events occurs, namely, time-between- events (TBE) chart. TBE charts are especially suitable when the events rarely occur and therefore it is quite difficult to form rational subgroups as the traditional Shewhart control charts require. There have been increasing interests on TBE charts recently. Some researchers suggested employing a control chart based on probability control limits like the Cumulative Count of Conforming (CCC) chart (e.g. Calvin, 1983; Goh,1987; and Bourke,1991), the Cumulative Quantity Control (CQC) chart (Chan et al. ,2000) or their extensions (Xie et al. 2002 and Schwertman,2005). Others proposed to apply the Cumulative SUM (CUSUM) and the Exponentially Weighted Moving Average (EWMA) methods for TBE data directly, as shown by Gan (1998) and Lucas (1985). Moreover, Shewhart 739
740
control charts can also be used to monitor TBE data after a proper transformation, see Radaelli (1998) and Jones and Champ (2002). However, most of the current studies on TBE charts are based on the assumption that the occurrence of events can be modeled by a homogeneous Poisson process, thus the time between two successive events follows exponential distribution. However, the assumption is true only when the events occurrence rate is constant, and as a result may limit the scope of TBE chart. A possible extension is to use Weibull distribution to simulate various TBE situations (including exponential) with non-constant events occurrence rate by varying its scale and shape parameters. This is especially useful in reliability monitoring, where events occurrence rate is rarely constant due to the aging property. Control charts for Weibull data have been studies by many scholars. Earlier studies focus on Shewhart control charts (e.g. Nelson, 1979; Ramalhoto and Morais, 1999). Xie et al. (2002) developed another control chart, named /-chart, for monitoring exponential and Weibull distributed time between failures based on probability control limits. Moreover, the CUSUM and EWMA charts can also be applied for monitoring Weibull-distributed data. Chang and Bai (2001) proposed a heuristic method of constructing X, CUSUM, and EWMA chart for skewed populations with weighted standard deviations. Hawkins and Olwell (1998) studied the optimal design of CUSUM for Weibull data with fixed shape parameter. Borror et al. (2003) investigated the robustness of TBE CUSUM for Weibull-distributed and Lognormal-distributed TBE data. However, few methods have been proposed using EWMA chart. Zhang and Chen (2004) developed a lower-sided and upper-sided EWMA chart for detecting mean changes of censored Weibull lifetimes with fixed censoring rate and shape parameter. In this study we considered the EWMA for complete Weibull data with known parameters. The rest of this paper is organized as follows: Section 2 briefly introduces two existing TBE charts for monitoring Weibull data, and then Section 3 describes the construction of the Weibull EWMA chart with an illustrative example. Section 4 discusses the Average Run Length (ARL) and Average Time to Signal (ATS) properties of the Weibull EWMA chart. Finally, some conclusions are given in Section 5. 2. Control Charts for Weibull-distributed TBE Data Let A',, X2, denote a sequence of time between events data, which are independent Weibull random variables with probability density function:
741 /
/(*)=
N'T"'
Afi. \rj
eKP> ,x>0,/3>0,5>0
(1)
where ft is the scale parameter and rj is the shape parameter. 2.1. The Cumulative Quantity Control (CQC) Chart The CQC chart, also referred as /-chart in Xie et al. 2002, monitors the time between two successive events based on probability control limits. If the TBE is Weibull-distributed and the acceptable false alarm probability is a, the upper control limit (UCL), central line (CL) and lower control limit (LCL) of the CQC can be calculated as: i/-i
l/>7
LCL = P In
2-a
,CL = p[\n(2)f\UCL = 0
a
(2)
so that the probability of a point is located above UCL or below LCL is equal to all. 2.2. The Weibull CUSUM Chart The basic statistics for upper and lower-sided CUSUM chart are S;
=max{0,S;_l+(Xl-k)}
5:=min{0,5 / l 1 +(^, -k)}
(3)
where k is the reference value and can be determined by the following formula when X follows Weibull distribution. k=
pr
(4)
/?, and /?o are the out-of-control and in-control scale parameter, respectively. An out-of-control alarm will arise if the control statistics go beyond the control limits, i.e., S~ < -h or 5,+ > h. Hawkins and Olwell (1998) provides the detailed calculation methods of ARL for Weibull CUSUM chart. 3. The EWMA Chart for Monitoring Weibull-distributed TBE Another alternative is to use EWMA chart for monitoring the Weibulldistributed TBE. The statistic of two-sided Weibull EWMA chart is
Z,=AX,+{\-X)Z,_
(5)
742
where ?, is the smoothing constant that satisfies 0 < k <1. Usually the starting value is set to be the process target, i.e. Z0=/u0, where /J0 the mean of Weibull data. With this definition, it can be deducted that £(Z,) = //„, Var{Z,)=a^-^j\-{\-Xf]
(6)
Therefore, the UCL, CL and LCL can be calculated by UCL = n0+Lv JVar(Z,);CL = na;LCL = //„ - LjVar{Z,)
(7)
where the Lv and LL are the design parameters which influence the width of the control limits. An out-of-control alarm will arise when X exceeds either UCL or LCL. Since the time between events is always positive, the LCL will be set to be zero if the calculated LCL is less than zero. For large value oft, the variance of Z, will approximate constant, and the upper and lower asymptotic control limits hu and hL are given by
An out-of-control signal will arise when Zt< /zL or Zt > hv. Here is an illustrative example of the Weibull EWMA chart. Table 1 shows a set of time between failures data for monitoring the reliability of a process. The first 20 observations are simulated following Weibull distribution with shape parameter r/=2 and scale parameter/?= 10 hours. The next 20 observations were generated following Weibull distribution with r\=2 and /?=5 hours. A twosided Weibull EWMA chart is designed so that the in-control ARL=370 (A=0.10, LV=LL=2.70). Fig. 1 shows the Weibull EWMA chart for the data in Table 1. An out-of-control alarm is raised from the 33rd point, which indicates that the mean time to failure may have decreased. Therefore, engineers need to check the process and try to find out the reasons for it so as to further improve the reliability of the process. 4. Properties of Weibull EWMA chart The ARL properties of an EWMA scheme can be approximated using Markov Chain approach similar to that described by Brook and Evans (1972). The continuous state Markov chain can be evaluated by discretizing the infinite-state transition probability matrix.
743 Table 1 .Time between failures (TBF) data for Weibull EWMA chart
EWMA
UCL
LCL
Failure No.
TBF (hours)
EWMA
UCL
LCL
8.37
6.31
685
5.33
21
1.13
6.69
7.82
4.35
2
3.25
6.01
7.11
5.06
22
6.05
6.63
7.82
4.35
3
4.43
5 85
7.28
4.89
23
3.53
6.32
7.82
4.35
4
6.62
5.93
7.40
4.77
24
4.25
6.11
7.83
4.35
Failure No.
TBF (hours)
1
5
4.48
5.78
7.49
4.68
25
1.70
5.67
7.83
4.35
6
6.44
5.85
7.56
4.61
26
2.61
5.36
7.83
4.35
7
10.36
6.30
7.62
4.55
27
4.43
5.27
7.83
4.34
8
11.13
6.78
7.66
4.51
28
5.99
5.34
7.83
4.34
9
10.37
7.14
7.69
4.48
29
2.22
5.03
7.83
4.34
10
7.92
7.22
7.72
4.45
30
3.50
4.88
7.83
4.34
11
5.65
7.06
7.74
4.43
31
3.48
4.74
7.83
4.34 4.34
12
10.83
7.44
7.76
4.41
32
2.41
4.50
7.83
13
4.20
7.11
7.77
4.40
33
1.43
4.20
7.83
4.34
14
7.52
7.16
7.78
4.39
34
2.75
4.05
7.83
4.34
15
9.97
7.44
7.79
4.38
35
4.59
4.11
7.83
4.34
16
5.94
7.29
7.80
4.37
36
4.75
4.17
7.83
4.34
17
7.77
7.34
7.81
4.37
37
1.29
3.88
7.83
4.34
18
9,00
7.50
7.81
4.36
38
1.47
3.64
7.83
4.34
19
6.04
7.36
7.81
4.36
39
4.72
3.75
7.83
4.34
20
6.90
7.31
7.82
4.35
40
1.68
3.54
7.83
4.34
J 0 -i
15 0 -i
0
1
?
r—
10
,
1?
— - ,
20
,
25
r-
30
,
35
.
,
40
Failure No
Figure 1. The two-sided EWMA chart for monitoring Weibull distributed time between failures
744
Consider a two-sided Weibull EWMA chart with design parameters A, hv and hL, the interval between the lower and upper control limit (hL, hv) is divided into m subintervals of width w. w can be expressed as: W =
h -h ?LL^L m
(9)
The EWMA control statistics Z, is said to be in transient sate (j) at time (t) if hL+jw
j = K + 0 + 0.5)w>y = °X •••m-X
(10)
The control statistics Zt is regarded as in the absorbing state m if the point goes outside the control limits, i.e. Zt> hv or Zt
_p\K+pv-{\-*)m, \ A
P„=\X.
c^+(y+l>v-(l-^K]
c x
'
A
< t < ^ } + ^ , >^±*i},/.„,1„.„_l
/>,„;= 0,7= 0,1,.../H-l P
i=0X..m-l }j=0,l,...,m-l
(11)
=1
r mm
Let Q be the matrix of transition probabilities obtained by deleting the last row and column of P. The vector of ARLs 6 can be calculated with 9 = {I-Q)'\
(12)
where 1 is an m*\ vector of Is and I is a m*m identity matrix. The elements in the vector 0 are the ARL's when the EWMA chart starts in various states. The first element in the vector 6 gives the average run length for the Weibull EWMA chart starting from zero. Let the rth element be the ARL given that the EWMA chart starts from, r can be is achieved by Z
o-hl.
;i3)
745 where [C] stands for the largest integer not greater than C. Hence, the Average Time to Signal (ATS), which presents the average time until an alarm is observed, can be obtained by
ATS^B^X,
=E(R)E(x)=ARL-j3-r
(14)
where R is the number of points ploted until an alarm signals. ARL and ATS are useful measurements to judge the efficiency of a TBE chart for detecting changes in statistical distributions. Table 2 lists some ARL and ATS values for the Weibull EWMA chart in the illustrative example (>.=0.10, LV=LL=2.70). The shape parameter is fixed at 2.0, and scale parameter varies from 2 tolO. The in-control scale /?=10 hours, and in-control ARL=370. Fig. 2 presents the ARL curve for the Weibull EWMA chart, from which we can see that the Weibull EWMA chart is very sensitive to the scale parameter shifts. The ATS value implies that the average time to an out-of-control alarm will be around 46 hours for detecting the scale parameter's change from 10 hours to 5 hours. Table 2. Some ARL and ATS values for the Weibull EWMA 2
5
6
8
8.5
9
10
ARL
5.34
10.38
15.16
64.94
120.67
249.19
370.84
ATS
9.47
45.99
80.63
460.40
908.99
1987.51
3286.48
Scale/?
11
12
13
14
15
18
20
ARL
Scale/?
89.19
35.14
19.89
13.55
10.24
5.98
4.74
ATS
869.48
373.76
229.12
168.14
136.14
95.40
84.07
scale parameter
Figure 2. The ARL curve of the Weibull EWMA chart.
746 5. Conclusions Control charts for Weibull-distributed TBE can be very useful for monitoring reliability processes. In this paper, an EWMA scheme for the monitoring of Weibull-distributed TBE data is proposed, and the ARL and ATS properties are investigated. The results show that Weibull EWMA chart is sensitive to the process shifts and can be very effective for reliability monitoring. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.
T.W. Calvin, IEEE Trans. Comport. Hybr. 6, 323 (1983). T.N. Goh, Qual. Assur. 13, 18(1987). P.D. Bourke, J. Qual. Technol. 23,225(1991). L. Y. Chan, M. Xie and T. N. Goh, Int. J. Prod. Res. 38, 397(2000). F.F. Gan, J. Qual. Technol. 30, 55(1998). J.M. Lucas, Technometrics 27, 129(1985). G. Radaelli, Total Qual. Mgnt. 9, 33 (1998). L.A. Jones and C.W.Champ, Qual. Reliab. Engng. Int. 18, 479 (2002). P.R. Nelson, IEEE Trans. Reliab. 28, 283(1979). M. F. Ramalhoto and M. Morais, J. Appl. Stat. 26,129 (1999). M. Xie, T. N. Goh and P. Ranjan, Reliab. Eng. Syst. Safe. 77, 143(2002). Y.S. Chang and D.S. Bai, Qua!. Reliab. Engng. Int.M, 397(2001). D.M. Hawkins and D.H. Olwell, Cumulative Sum Charts and Charting for Quality Improvement, New York: Springer, (1998). C. M. Borror, J.B. Keats and D.C. Montgomery, Int. J. Prod. Res. 41, 3435(2003). L.Y. Zhang and G.M. Chen, J. Qual. Technol^, 321(2004). D. Brook and D.A. Evans, Biometrika 59, 539(1972). N.C. Schwertman, Qual. Reliab. Engng. Int. 21,743(2005). N.Ye, S.Vilbert and Q.Chen, IEEE Trans. Reliab. 52, 75(2003). S. Zhou, D. Sun and J. Shi, IEEE Trans. Autom. Sci. Engng. 3, 60 (2006). K.Wang and F. Tsung, Qual. Reliab. Engng. Int. 21,677(2005).
THE RELATIONSHIP BETWEEN PCIS AND 6 a
L. Y. OUYANG, C. H. HSU Graduate Institute of Management Science, Tamkang Taipei, Taiwan, R.O.C. E-mail [email protected]
University,
Six Sigma has already become an efficient improvement technique adopted by a great number of enterprises. Numbers of Sigma has become a tool of measuring process capability in some enterprises. However, many enterprises still use Process Capability Indices (PCIs) to measure the process capability. The paper will research the relationship between PCIs and number of Sigma. In bilateral specifications, the paper will research the relationship between the PCIs which are Cp, Cpk, Cpm and Cpmk and number of Sigma. In unilateral specifications, the paper will research the relationship between the PCIs which are Cpu and Cpl and number of Sigma. If supplier and buyer use different tools to measure the process capability, the relationship can decrease the communicate noise.
1.
Introduction
In the 1980s and early 1990s, Motorola was one of many U.S. and European corporations whose lunch (along with all other meals and snacks) was being eaten by Japanese competitors. Motorola's top leaders conceded that the quality of its products was awful. They were, to quote one Motorola Six Sigma veteran, "In a word of hurt." Like many companies at the time, Motorola didn't have one "quality" program, it had several. But in 1987, a new approach came out of Motorola's Communications Sector—at the time headed by George Fisher, later top exec at Kodak. The innovative improvement concept was called "Six Sigma" (Pande et al., 2000). Six Sigma is named after the process that has six standard deviations on each side of the specification window. Such a process produces 3.4 defects per one million opportunities in the long term (Wyper and Harrison, 2000). Based on Tong et al.(2004), six sigma has been initiated using statistical tools and techniques in business, transactional, and manufacturing process. It has been proven to be successful in reducing costs, improving cycle times, eliminating defects, raising customer satisfaction, and significantly increasing profitability. In the first 5 years of Six Sigma implementation, Motorola achieved saving of SUS 2.2 billion. Other companies followed, e.g. 747
748
GE, ABB, Bombardier and Allied Signal (Wyper and Harrison, 2000). Apparently, Six Sigma has already become an efficient improvement technique adopted by a great number of enterprises. Numbers of Sigma has become a tool of measuring process capability in some enterprises. However, many enterprises still use Process Capability Indices (PCIs) to measure the process capability. If supplier and buyer use different tools to measure the process capability, the communicate noise will be maybe happened. So the relationship between number of Sigma and PCIs is an important topic. Because PCIs can differentiate two parts which are unilateral specifications and bilateral specifications, the paper will research the relationship between the two parts of PCIs and number of Sigma. In bilateral specifications, the paper will research the relationship between the PCIs which are Cp, CPk, Cpm and Cpmk and number of Sigma. In unilateral specifications, the paper will research the relationship between the PCIs which are Cpu and Cpi and number of Sigma. Based on Linderman et al.(2003), Motorola set this goal so that process variability is ± 6crfrom the mean. They further assumed that the process was subject to disturbances that could cause the process mean to shift by as much as 1.5
The Relationship Between ka and PCI when the Process Mean shifts 1.5cr
Because the specifications are different in different product, manager of process can't evaluate process performance from //and a right away. For the above reason, Juran (1974) combined process parameters with product specifications to bring up the idea of Process Capability Lndices(PCIs). Following the above idea, Kane(1986) proposed that Cp, Cpk evaluated process capability of bilateral specifications. And then Chan et al.(1988) used the idea of loss function to improve Cpk and to propose Cpm. Pearn et al. (1992) proposed another index called Cpmk which can reflect the degree that process mean shifts the process centering. The section defines Cp, Cpk, Cpm, and Cpmk to compute the corresponding value in 6a, 5a, Aa and 3owhen the process mean shifts 1.5a. The indices Cp, Cpk, Cpm, and Cpmk have been defined as fallows:
749
c =*o
3a
d-\fi-f\ \r i
c
pk
3a
=
d
' C pm
d
-\jU-T\
3^1^-T)1
pmk
The section will research the corresponding values of Cp, Cpk, Cpm, and Cpmk when the process mean shifts 1.5 a (i. e., I \i-T I =1.5
_d_ 3
d-\ju-T\ 3a
ka-l.5a 3a
c> 'pk = pm
k 3J325
2
3^a +(ju-Tf
2
k-1.5
ka 3-sla +(l.5a)2
2
d-\ju-T\ pmk
k 3
=
3-Ja +{M-T)
2
kcr-1.5a 2
=
3ja +(\.5a)
2
fc-1.5 3J325
The corresponding values which are Cp, Cpk, Cpm and Cpmk can be computed as follows table 1 in 6a, 5a, 4a and 3a when the process mean shifts 1.5 a .
750 Table 1: the corresponding value of PCIs of bilateral specifications when the process mean shifts 1.5
yield
cP
Cpk
^pm
^prnk
3(7
0.9331928
1.00
0.50
0.55
0.28
4(7
0.9937903
1.33
0.83
0.74
0.46
5a
0.9997673
1.67
1.17
0.92
0.65
6a
0.9999966
2.00
1.50
1.11
0.83
Because Six Sigma was allowed that the process mean can shift 1.5cr off the target based on Linderman et al.(2003), the process mean shifts right or left 1.5cr off the target in smaller-the-better larger-the-better process. If we will find the minimum value of Cpu of arriving 6 a standard, the process mean must shift right 1.5cr off the target. Because the value of Cpu is more and more large when the process mean shifts left. For the same cause, if we will find the minimum value of CP| of arriving 6 cr standard, the process mean must shift left 1.5 a off the target. When that the process quality level arrived ka (i. e., d = ka) and the process mean shifts 1.5cr, two unilateral specifications PCIs which are Cpu and Cpi can be showed as fallows: _USL-n pu
=
3cr
=
la
H-LSL "'
(k-l.5)cr
=
3cr
k-\.5 3
(k-l.5)cr 3cr
=
£-1.5 3
The corresponding values which are Cpu and Cpi can be computed as follows table 2 in 6
yield
L-pu
Cpi
la
0.9331928
0.50
0.50
4(7
0.9937903
0.83
0.83
5(7
0.9997673
1.17
1.17
6a
0.9999966
1.50
1.50
751 3. Conclusions Because Six Sigma has already become an efficient improvement technique adopted by a great number of enterprises, numbers of Sigma has become a tool of measuring process capability in some enterprises. No papers research the relationship between PCIs and numbers of Sigma perfectly yet. So the paper researched the relationship between PCIs and numbers of Sigma. In the second section, the relationship between PCIs and numbers of Sigma was built, and the corresponding values which are Cp, Cpk, Cpm, Cpmk, Cpu and Cpi were computed in 6a, 5a, 4a and 3
6. 7.
8.
9.
R. A. Boyles, Process capability with asymmetric tolerances, Communications in Statistics-Simulation and Computation 23, 615-643 (1994). L. K. Chan, S. W. Cheng and F. A. Spiring, A new measure of process apability: Cpm, Journal of Quality Technology 20(3), 162-175 (1988). J. M. Juran, Juran's quality control handbook, 3 rd Edition, McGraw-Hill, New York (1974). V. E. Kane, Process capability indices, Journal of Quality Technology 18, 41-52(1986). K. Linderman, R. G, Schroeder, S. Zaheer and A. S. Choo, Six Sigma: a goal-theoretic perspective, Journal of Operations Management 21, 193-203 (2003). P. S. Pande, R. P. Neuman and R. R. Gavanagh, The Six Sigma Way, McGraw-Hill, New York (2000). W. L. Pearn, S. Kotz and N. L. Johnson, Distributional and inferential properties of process capability indices, Journal of Quality Technology 24, 216-231 (1992). J. P. C. Tong, F. Tsung and B. P. C. Yen, A DMAIC approach to printed circuit board quality improvement, International Journal of Advance Manufacturing Technology 23, 523-531 (2004). B. Wyper and A. Harrison, Deployment of Six Sigma methodology in human resource function: a case study, Total Quality Management 11, NOS4&5, 720-727 (2000).
C O N F I D E N C E INTERVALS FOR M E A S U R E S OF VARIABILITY IN B A L A N C E D M I X E D G A U G E R&R STUDY
D . J. P A R K Division of Mathematical Sciences College of Natural Science Pukyong National University 599-1 Daeyeon 3-Dong Nam-Gu, Pusan, 608-737, South Korea E-mail: [email protected]
We consider a design for a balanced mixed model with covariates considered as fixed effects and one factor as random effects. In this paper, some methods for constructing confidence intervals on measures of variability in repeatability and reproducibility are provided. The objective of this study is to compare the confidence intervals on repeatability and reproducibility to apply for R&R study. A numerical example is provided.
1. Introduction In many measurement studies, a gauge is used to obtain replicate measurements on units by several different operators, setups, or time periods. One of the common experiments employed in industry is to properly monitor manufacturing process, e.g., repeatability and reproducibility. The variability is inherent in the measurement system, which we generally think of as the precision of the gauge. Burdick et al.(2005) define that repeatability represents the gauge variability when it is used to measure the same unit(with the same operator or setup or in the same time period). Reproducibility is referred to as the variability arising from different operators, setups, or time periods. Borror et al.(1997) considered the capability of a gauge as designing a statistical experiment in which several manufactured parts are measured under controlled conditions. The gauge capability is expressed as variance components, one variance components for the repeatability of the gauge and the other variance component for the reproducibility of the gauge. These variance components are analyzed using 752
753
ANOVA method that leads to a direct and convenient method of estimating the variance components in an experimental design. 2. The Balanced Mixed Model A traditional gauge study utilizes a random two-factor design with a random sample of operators and of parts. That is, o operators are randomly chosen to measure p randomly selected parts from manufacturing process. Each measurement is made n times by each operator. The measurements, Yijk, are modeled by Yljk = lx + P, + Oj + (PO)ij + Eijk i = l,.-,p;j
(1)
= l,...,o;fc= l,...,n
where n is a constant as overall mean, Pi, Oj, (PO)ij, Eijk are jointly independent normal random variables with means of zero and variances aP, a o> a%0> a n d a%> respectively. Adamec and Burdick(2003) extended results of the typical two-factor repeatability and reproducibility (R&R) study to a situation with three random factors. Unlike this type of traditional model, the measurements of parts, Yij, can be linearly related to a covariate (or concomitant variable), Xij. I ope r a t o r s ^ operating machines) are randomly sampled and J measurements are made for each operator(or operating machine). Table 1.
Breaking Strength Data.
(Y = strength in pounds and X = diameter in 1 0 - 3 inches) Machine
1
Machine
2
Machine
3
Y
X
Y
X
Y
X
36
20
40
22
35
21
41
25
48
28
37
23
39
24
39
22
42
26
42
25
45
30
34
21
49
32
44
28
32
15
Montgomery(1984) published a data set in Table 1 of this experiment. We here assume that three different operating machines were randomly chosen and they measured a monofilament fiber strength for a textile company. The process engineer is interested in determining the measurement of variability in the breaking strength of the fiber measured by the three machines
754
randomly chosen and the measurement of variability among three machines. However, the strength of a fiber is related to its diameter, with thicker fibers being generally stronger than thinner ones. A random sample of five fiber specimens is selected from each machine. The fiber strength(Yy) and corresponding diameter(Xij) for each specimen are shown in Table 1. Then the replication of measurements, Yjj, is modeled by Y{j = n + pXij +Oi + E^ i = l,...,I;j
(2)
= 1,..., J
where \i is a constant, /3 is linear regression coefficient indicating the dependency of measurements Yij, X^ is a covariate, Oj and E^ are jointly independent normal random variables with means of zero and variances O"Q and a\, respectively. This model has both fixed effects for a covariate and random effects for operating machines so that it is a mixed model with two error terms. One possible partitioning for ANOVA for model (2) is shown in Table 2. Table 2.
ANOVA for model(2)
sv
DF
SS
Mean
1
IJY2
Covariate after Mean
1
MQ\&xxa
Operating Machines adjusted for Covariate after Mean
I- 1
dyya
Error
IJ -- 7 - 1
•5yyw
Total
IJ
^i^-jYij
i
Jxxw)
\ P^Oxxw
HQ\&xxa
~T~ ^>xxw)
P^Jxxw
The notation in ANOVA Table 2 is as follows: $w = Sxyw/Sxxw, \^xya Y..j
~r ^xyw)/\^xxa i ^xya
=
i &xxw))
J*ji\Xi.
^xxa
A )(li
J^iy-^-i.
A...J > ^yya
— Y..), £>xxw = 2-li2-Jj\Xij
2
Y,iT,j(Yij-Yi.) , Sxyw = Y,iY,j(Xij-Xi.)(Yij-Yi.). of model (2), RA/{d2E + Ja20) ~ X?_2> RW/
ti\V
=
^yyw
P\y&xxwi
PA
=
—
Pc = J2-'i\ii.
— A j J , Jyyw
=
Under the assumptions where R ~ Xjj-i-i A =
^xya/^xxai
and liA
anu
n\y
are independent. Reproducibility in model (2) is the variation attributed to using different operating machines to measure the specimens. Repeatability represents the
755
variation within specimens on the same operating machine. The variance component that measures reproducibility is <JQ and the variance component that describes repeatability is a\: reproducibility
®O
(3) a
repeatability
=
a
E
W
Parameters to make the estimates are shown in Table 3. Table 3.
Parameters of interest in model (2)
Parameter
Definition
IP = o"o
Measurement of variability attributed to using different operating machines
1M = &%
Measurement of variability within specimens on the same operating machine
3. Confidence Intervals on Measurement of Variability Using the distributional properties in model (2) the parameters of interest in Table 3 are written as functions of expected mean squares. E(SA) =
(5)
= a
E = ®w
where S^ = RW/(U - I - I) and S\ = RA/(I - 2). Modified large sample method that Burdick and Graybill(1992) discussed can be employed for constructing confidence intervals on individual variance components. Another method for constructing confidence intervals on the parameters in Table 3 could be SAS(Statistical Analysis System) procedure MIXED employing Restricted Maximum Likelihood Estimation(REML)which are generally preferred to the ANOVA estimators. The concept of generalized inference that Tsui and Weerahandi(1989) introduced for testing hypotheses can also be used to construct confidence intervals on the parameters.
756
3.1. Modified Large Sample
Method
3.1.1. Confidence Interval on ~JM Using equation (5) measurement of repeatability 7 M is written as 1M
= 0W.
(6)
Since modified sum of squares Rw/°% is a chi-square random variable with IJ — I — 1 degrees of freedom. The exact 100(1 — a)% confidence interval on repeatability 7 M is c2
c2
F
(7)
'F
where ir(a/2:<^1,d/2) is the 100(1 — a/2) percentile F—value with df\ and dfo degrees of freedom. 3.1.2. Confidence Interval on 7 P Using equation (5) measurement of reproducibility jp is written as IP =
j
•
(8)
An approximate 100(1 — a)% confidence intervals on 7 P is constructed using the method of Ting et al.(1990) 1 SA ~ Sw - (GxSA + H2Sw J
+ Gi2SASw)?,
&A ~~ $w + (HiSA + G2SW + where d
= 1-
1/F(Q/2:/_2J0O),
Hi2SASw)i
G 2 = 1 - l/Fia/2.u_j_hoo),
l / - f ( l - a / 2 : / - 2 , o o ) ~ 1, # 2 = l/F(l-a/2:IJ-I-l,oo)
~ 1> Gi2
GlFf-HR/Ft, Fx = ^ ( a / 2 : / _ 2 , / j _ / _ i ) , # 1 2 = and F 2 = i r ( 1 _ Q / 2 :/-2,/j-/-i)3.2. Restricted
Maximum
(9)
Likelihood Estimation
#i =
= [(-Fl - I ) 2 ~
[(1-F2)2-H*F$-Gl]/F2,
Method
Measurements of repeatability 7 M and reproducibility 7 P can be obtained using the PROC MIXED of SAS that reports asymptotic standard errors for restricted maximum likelihood estimators. The likelihood-based statistics to estimate variance components is Wald statistics Z which is computed as the parameter estimate divided by its asymptotic standard error. The asymptotic standard errors are computed from the inverse of the second
757 derivative matrix of the likelihood with respect to each of the variance parameters. Wald statistics Z is valid for large samples but unreliable for small data sets. 3.2.1. Confidence Interval on 7 M An approximate 100(1 - a)% confidence interval on measurement of repeatability 7 M is thus obtained using asymptotic standard error by REML method: v\ 5% 2 -X(i/i,l-<*/2)
v\ &%
T
2 X(Vua/2)
(10)
where v\ is degrees of freedom defined as v-i = 2 x Z\, Z\ is Wald statistics defined as Z\ = d\/S.E.(a2E), and b\ is the REML estimator. 3.2.2. Confidence Interval on "/p Similarly, an approximate 100(1 — a)% confidence interval on measurement of reproducibility •yp is obtained using asymptotic standard errors for REML estimators: vi <3Q 2 X(j/ 2 ,l- a /2)
'
VI <5O 2 X(i/2,a/2)
(11)
where v2 is degrees of freedom as v2 = 2 x Z\, Z2 is Wald statistics defined as Z2 = &O/S.E.(&Q), and &Q is the REML estimator. 3.3. Generalized
Confidence
Intervals
Application of the generalized inference concept requires a generalized pivotal quantity(GPQ) which extends the standard notion of a pivotal quantity. 3.3.1. Confidence Interval on ^M Since Ry/jo-\ ~ X / J _ J _ I > @W quantity as follows:
m
equation (5) can be written as a pivotal
6w _(IJ-I-
l)s2w
= —w*—
(12)
where s ^ is an observed value of S^ and W* = (IJ — I—l)Syy/o~%. Define R\ as the solution for 6\y- The distribution of R\ is completely determined
758 by W* using Monte Carlo methods. An approximate 100(1—a)% confidence interval on repeatability is defined as (13)
[-^la^'-Rli-o/a]
where i?i o / 2 and Ril_a/2 are the percentile a/2 and 1 - a/2 of the distribution of R\, respectively. 3.3.2. Confidence Interval on 7p Since RA/{&% + JO-Q) ~ Xi-2> ®o in equation (5) can be written as a generalized pivotal quantity as follows: (T 7 _ T _ 1 U 2
{I ~ 2 ) 42 U*
e0-)
(IJ-I-l)sw W*
(14)
where s\ is an observed value of S\ and U* = (I - 2)S2A/{o-2E + J
are
where i?2Q/2 d R2l_a/2 bution of R2, respectively. Table 4. Parameter 7F = ° o
Method MLS GEN
=o\
the percentile a/2 and 1 — a / 2 of the distri-
Two-sided 90% Confidence Intervals using d a t a in Table 1
REML
1M
(15)
Lower Limit
Upper Limit
Interval Length
0
52.58
52.58
0.22
3338.76
3338.54
0
52.71
52.71
MLS
1.42
6.12
4.70
REML
1.43
6.24
4.81
GEN
1.43
6.19
4.76
MLS, REML, and GEN in Table 4 represent modified large sample method, restricted maximum likelihood estimation method, and generalized confidence intervals, respectively. Because of limited space, we did not show simulation results for three methods. The generalized confidence intervals were calculated based on 10,000 simulated values of R\ and Z?2- Since Wald
759 statistics Z is valid for large sample, it is not surprising for Table 4 results t h a t the R E M L method generates wider interval length for reproducibility parameter ^p t h a n MLS and G E N methods.
4.
Conclusion
W h e n practitioners meet an experimental design where the measurements of parts, Yij, are linearly related t o a covariate, Xij, and the measurements are made for operating methods(or operating machines) randomly sampled in manufacturing process, they can use three methods provided in this paper to construct confidence intervals on repeatability and reproducibility. In order to measure variability of repeatability three methods generate desirable confidence interval lengths. However, MLS and G E N methods are recommended t o measure the variability of reproducibility when sample sizes are moderately small. REML method is useful only for large samples.
References 1. E. Adamec and R. K. Burdick, Confidence Intervals for a Discrimination Ratio in a Gauge R&R Study with Three Random Factors, Quality Engineering 15(3), pp.383-389 (2003). 2. C. M. Borr, D. C. Montgomery, and G. C. Runger, Confidence Intervals for Variance Components from Gauge Capability Studies Quality and Reliability Engineering International 13, pp.361-369 (1997). 3. R. K. Burdick, C. M. Borror, and D. C. Montgomery, Design and Analysis of Gauge R&cR Studies, ASA-SIAM Series on Statistics and Applied Probability (2005). 4. R. K. Burdick and F. A. Graybill, Confidence Intervals on Variance Components, Marcel Dekker, Inc. (1992). 5. D. C. Montgomery, Design & Analysis of Experiments 2ed. John Wiley & Sons, Inc. (1984). 6. N. Ting, R. K. Burdick, F. A. Graybill, S. Jeyaratnam, and T.-F.C. Lu., Confidence Intervals on Linear Combinations of Variance Components, Journal of Statistical Computation and Simulation, 35, pp. 135-143 (1990) 7. K. Tsui and S. Weerahandi, Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters, Journal of the American Statistical Association, 84, pp. 602-607 (1989)
T H E R A P E U T I C DECISION M A K I N G FOR U N C E R T A I N T Y AVERSE PATIENT *
T . S A T O W A N D H. K A W A I Tottori University Faculty of Engineering, 4-101 Koyama-Minami, Tottori 680-8552, Japan E-mail: {zwsatow, kawai}@sse.tottori-u.ac.jp
A therapeutic strategy is planned by a doctor who takes charge of medical treatment. The strategy is based on a criterion which maximizes an expected utility of patient. In general, the axiom of Savage's expected utility is used to measure the patient's utility. However, Savage's utility can not be used when several probability measures are nominated as a candidate for the derivation of the utility. For measuring a patient's subjective value, health-related QOL scale is considered as one of most important issue. As a recent topic, Q-TWiST(Quality-adjusted Time Without Symptoms and Toxicity method) is a powerful tool to measure the variable subjective value of patient. In order to derive Q-TWiST, the survival probability of patient has to be estimated from censored or non-censored clinical data. Q-TWiST takes different values when there are two or more recommended estimation methods. Which values should be adopted by the viewpoint of patient? For the purpose of solving the above-mentioned problems, Q-TWiST which is composed by several probability measures is proposed. The sensitivity analysis concerning the utility value which depends on the state of the patient is made.
1. Introduction 1.1. Quality
of Life
An extension of survival time has been considered as a criterion for highquality medical treatment, because it is easy to do an objective assessment. Misunderstanding of which life-prolongment is the profit of patient was caused by an overemphasis of life-sustaining treatment. Therefore, the situation which increases the patient's load is caused by intervention which does not consider patient's well-being. Such an overemphasis on the objective "This work is supported by Grant-in-Aid for Young Scientists (B) 17710137.
760
761 assessment can satisfy only with the unilateral side of wide-ranging health care. As a standard concept of the world, the goal of health care is not only helping patient's long life but also enhancing Quality Of Life(QOL). In the current constitution of the World Health Organization(WHO), the following principles are basic to the happiness, harmonious relations and security of all peoples: Health is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity1. As we can face with informed consent, patient's independence and dignity begin to be being esteemed. This would be an end of paternalism in the relation between doctor and patient. How should we evaluate patient's profit, i.e. patient's well-being? One of answers to the question is QOL which attracts attention in recent years. A lot of health services and outcome researches were carried out in many countries in 1980's. As a result, some health-related QOL measures were proposed, and the application to a medical practice site started, too. WHO defines QOL and tries to spread the concept. WHOQOL is "An individual's perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns. It is a broad ranging concept affected in a complex way by the person's physical health, psychological state, personal beliefs, social relationships and their relationship to salient features of their environment" 2 . In addition, MOS 36-Item Short-Form Health Survey(SF-36) can be also given as the typical health-related QOL 3 . These QOL measure only the "quality" of a patient's prognosis. Quality-Adjusted Life Years(QALYs) was proposed as one method of implementing the trade-off between "quality" and "quantity". As an extended model of QALYs, Quality-adjusted Time Without Symptoms and Toxicity method(Q-TWiST) was proposed 4 . Q-TWiST is used when we want to decide a treatment method.
1.2. Quality-adjusted Toxicity
Time Without Symptoms
and
Q-TWIST is a statistics technique proposed as a model by whom the side reaction by the therapeutic intervention etc. can be considered for the cancerous patient 5 ' 6 - 7 . Q-TWiST consists of 3 health states; TOX(toxicity), TWiST(time without symptoms or toxicity) and REL(relapse). TOX is time duration with conscious side reaction by treatment. TWiST is time duration without conscious side reaction and symptom. REL, sometimes calls PROG (progression) is time duration from admission of relapse to the
762
death. Utility coefficients are given to each health state. Q-TWiST is defined that the sum of the products of the proportion of survived patient of each health multiplied by the utility value of each health state. In a survival time analysis, Kaplan-Meier or Cutler-Ederer methods are used in comparing between treatment groups. The Kaplan-Meier method is recommended when the number of cases is less than or equal to 50. On the other hand, the Cutler-Ederer method is adopted when there are a lot of numbers of cases.
1.3. The aim of work The threshold value is about 50 as previously stated. Which method should be adopted when the number of cases is about 50? In other words, you can not judge which estimation to have to use. As another instance, some different case studies for the same disease are reported under the same estimation method. By the difference of estimation method or of case report, survival curves take different sharp. As a result, Q-TWiST takes a different value. Therefore, the possibility that different treatment method is recommended is caused. It causes confusion to the decision making of therapeutic strategy. In general, the behavior of decision maker follows a rule which maximizes his(her) expected utility. This is the concept of Savage's 8 expected utility theory. A probability measure underlies the behavior of decision maker, and a unique probability measure is assigned to future events. On the other hand, even if any unique probability measures are assigned, it is known that behaviors not expressible exist. As a famous instance is known as Ellsberg paradox 9 . In order to solve the Ellsberg paradox, Gliboa and Schmeidler 10,11 ' 12 developed Savage's axiom. In their axiom, the ambiguity can be expressed by two or more probability measures. Under these measures, the decision maker behaves for maximizing an expected utility which is calculated by using the most pessimistic probability. It is called uncertainty averse(or ambiguity averse) since the most pessimistic probability is used. In this research, it thinks about decision making for planning a treatment strategy when the patient is uncertainty averse. The conception and definition of uncertainty averse is introduced in section 2. In section 3, the formulation of our Q-TWiST is proposed. The sensitivity analysis of Q-TWiST is in section 4. Finally, a summary and problem in the future are described in section 5.
763
2. Uncertainty Averse Let S be a state space, and T be an algebra which consists of events. A function ip : T —» [0,1] is called probability capacity when the following conditions are satisfied. V # ) = o , ^ ( s ) = i,
(i)
(\M, B € .F)J4 C B => V(^) < V W
(2)
It says that the probability capacity is convex if the following inequality is satisfied. (\/A,Bef),
iP(AuB)
+ i>(ADB)
> rp{A)+^{B).
(3)
Further, it is called probability charge when the equal sign is always satisfied. Let define the core of ip. The core of probability capacity tp is Q{iP) = { p | p e probability charge, (yA e T), ip(A) < p(A) < ip'{A)} , (4) where rl>'(A) = l-,l>(Ac).
(5)
c
The notation A is a complementary set of A. The ip' : T —» [0,1] is called conjugate. The core contains all probability measures that are compatible with the information conveyed by the probability capacity ip and the conjugate ip' 13 . Now, we interpret the state space 5 as a state space which expresses the condition of patient. Let X be a clinical treatment set, and / and g are functions to translate S to X. We call each function an act. A set H consists of close and convex probability measures on (S,T). A notation >indicates the order of preference. If you prefer B more than A, then B x A. For / and g, if
/ x g <=^> min j / u(f(s))p{ds)\p
e H I > min I / u{g(s))p(ds)\p G H I , (6)
then acts / and g is called uncertainty averse. The function u is a utility function. Such preference is called maxmin expected utility (MMEU). It was axiomatized by Gilboa and Schmeidler11. On the other hand, Schmeidler12
764
axiomatizes the order of preference for the probability capacity. For u X —» R, probability charge ip, f > g <==> m i nJ / « ( / ( s ) ) # t e ) j > min i^J u( 5 ( s ))V(d*)| .
(7)
This is called Choquet expected utility(CEU). Especially, if ip is convex, then J u(f(s))i>(ds) = min Uu(f(s))p(ds)\p
£ H\ .
(8)
3. Q-TWiST 3.1. Q-TWiST
Derivation
Q-TWiST is calculated by the mean duration of 3 clinical health states and their utility coefficients. Usually, the utility coefficient for TWiST is assumed to be 1. The utility coefficient for death is 0. Utility coefficients for TOX and REL depend on the individual. If utility coefficients for TOX and REL are /^TOX and ^REL respectively, then Q-TWiST can be calculated by the following equation. QTWiST = fiToxL[TOX]
+ L[TWiST] + »RELL[REL}.
(9)
In general, coefficients /XTOX and //REL are weights between 0 and 1. L[TOX], L[TWiST], and L[REL] in Eq.(9) are the mean duration with being each state, respectively. The outline of this model is as follows. The patient can select one method from n kinds of treatment methods. There are m probability measures to estimate the survival probability. In this subsection, we use a treatment method k (= 1, 2, • • • ,n) and probability measure i (= 1,2, • • • , m). Usual Q-TWiST is defined on a unique probability measure. Let T^t be the length of the TOX period, T$i be the time to relapse, and T ^ be the time to end point. In general, the end point is interpreted as the death of patient. However, it should be noted that it is not all. A function Sji(t) is the survival estimator for S^^t) = P{T^ > t}. By using the estimator Sj^t) for each treatment method, Q-TWiST which is derived in Eq.(9) is rewritten as follows.
QTWiSTi(fc) = /iTOxTOXi(A;) + TWiST^A;) + where
^RELREL^/C),
(10)
765
TOMk)
= f " SiMdt, Jo TWiST^fc) = p {Sl/t) - SlM} RELi(fc) =
MREL p
{SIM
(11) dt,
(12)
~ S£i(t)} dt.
(13)
The upper bound rfc denotes the time to the end of evaluation period. As an estimation method of rjt, a median value at the follow-up study period is used. However, there is criticism to use of the median follow-up time. Shuster14 was pointing out that the definition of the median follow-up time is different in abstracts at ASCO and AACR. Figure 1 shows the survival probability for each health state. Q-TWiST is obtained by multiplying the area of each state by its utility value.
60
120
ISO
240
300
360
420
480
Days
Figure 1.
3.2. Uncertainty Averse
Partitioned survival probability
Q-TWiST
In the previous subsection, we used a unique probability measure i to derive Q-TWiST. Now it is assumed that some appropriate measures appear as a candidate of Q-TWiST. For arbitrary time v t (> 0), a minimum element of the survival probability vector is defined as follows. SM£(0 = Min {S^(t),
S* 2 (0, ••• , S £ m ( 0 } .
(14)
The Q-TWIST value when the uncertainty averse patient selects treatment method k is QTWiST(fc) =
MTOXTOX(A;)
+ TWiST(A;) + ^RELREL(A;),
(15)
766
where
TOX(fc)= / Jo
SMf(t)dt,
.
TWiST(fc) = J " J5M2fc(i) - SM?(t)} dt, REL(k)=
|^M 3 f c (i)-5M 2 f e (t)}df.
(16)
(17) (18)
Therefore, the best treatment for the uncertainty averse patient, k*, is k* = {k; Max QTWiST(fc), k = 1,2,-- • ,«} . 3.3. Sensitivity
(19)
Analysis
A sensitivity analysis for Q-TWiST is useful to select a treatment method. It assumes that the difference between QTWiST(A) and QTWiST(B) is G(A,B), i.e. G{A, B) = QTWiST(>4) - QTWiST(B).
(20)
If G(A, B) takes a positive value then the treatment A is preferred (avoided) than the treatment B, i.e. G(A,B) > (<)0 <=> A >- (~<)B. The relation G(A, B) — 0 is rewritten asiollows. _ REL(g) - KEL(A) MTOX- TOx(A)-TOX(B)MREL
TWiST(i4) - TWiST(ff) TOX(y4)-TOX(S) '
(
'
Eq.(21) can be considered as a threshold line which decides the recommended treatment method. If a plot (//REL,/^TOX) is in the left hand side of Eq.(21) then the treatment A should be selected. On the other hand, if a plot (/iREL,MTOX) is in the right hand side of Eq.(21) then the treatment B should be selected. (Refer to Figure 2.) 4. Conclusion We proposed the basic theoretical formulation of Q-TWIST for the uncertainty averse patient. The sensitivity analysis for Q-TWiST is introduced. Usually, the utility value /i-rox and ^REL are estimated from QOL information. However, the case, where enough QOL data cannot be collected happens frequently. It is necessary to devise the estimation method of appropriate weight when QOL information is not enough.
767
_ 8KI.;»; - HKl.l/tt ! / CIVMC - TOX.,,:, -.T
/
A » B /
/ * -< B
Figure 2. Sensitivity analysis with a threshold line
References 1. World Health Organization, BASIC TEXTS Forty-Fourth Edition, WORLD HEALTH ORGANIZATION, (2004). 2. World Health Organization, WHOQOL Measuring Quality of Life, WORLD HEALTH ORGANIZATION, (1997). 3. http://www.sf-36.org/ 4. R.D. Gelber and A. Goldhirsch, A New Endpoint for the Assessment of Adjuvant Therapy in Postmenopausal Women with Operable Breast Cancer, Journal of Clinical Oncology, 4, 1772-1779, (1986). 5. P. P. Glasziou, B. F. Cole, R. D. Gelber, et at, Quality Adjusted Survival Analysis with Repeated Quality of Life Measures, Statistics in Medicine, 17, 1215-1229, (1998). 6. A. Goldhirsch, R. Gelber, R. Simes, et al., Costs and Benefits of Adjuvant Therapy in Breast Cancer: A Quality-Adjusted Survival Analysis, Journal of Clinical Oncology, 7, 36-44, (1989). 7. M.A. Nooija, J.C. J.M. de Haesb, L.V.A.M. Beexc, et al. for the EORTC Breast Cancer Group, Continuing Chemotherapy or not after the Induction Treatment in Advanced Breast Cancer Patients: Clinical Outcomes and Oncologists ' Preferences, European Journal of Cancer, 39, 612-621, (2003). 8. J.L. Savage, The Foundations of Statistics, John Wiley, New York, (1954). 9. D. Ellsberg, Risk, Ambiguity, and the Savage Axioms, Quarterly Journal of Economics, 75, 643-669, (1961). 10. I. Gilboa, Expected Utility with Purely Subjective Non-Additive Probabilities, Journal of Mathematical Economics, 16, 65-88, (1987). 11. I. Gilboa and D. Schmeidler, Maxmin Expected Utility with Non-Unique Prior, Journal of Mathematical Economics, 18, 141-153, (1989). 12. D. Schmeidler, Subjective Probability and Expected Utility without Additivity, Econometrica, 57, 571-587, (1989). 13. K.C. LO, Correlated Equilibrium under Uncertainty, Mathematical Social Sciences, 44, 183-209, (2002). 14. J.J. Shuster, Median Follow-Up in Clinical Trials, Journal of Clinical Oncology, 9, 191-192, (1989).
THE AUTOCORRELATED GWMA CONTROL CHART
SHEY-HUEI SHEU Department of Industrial Management, National Taiwan University of Science and Tchnology, 43, Keelung Road, Section 4, Taipei 106, Taiwan
SHIN-LI LU Department of Industrial Management, National Taiwan University of Science and Technology, 43, Keelung Road, Section 4, Taipei 106, Taiwan
Traditionally, using a control chart to monitor a process assumes that process observations are normally and independently distributed. In fact, for many processes, products are either connected or autocorrelated and, consequently, obtained observations are autocorrelative rather than independent. In this scenario, applying an independence assumption instead of the autocorrelation for process monitoring is unsuitable. This study examines a generally weighted moving average (GWMA) with a time-varying control chart for monitoring the mean of a process based on autocorrelated observations from a first-order autoregressive process (AR( 1)) with random error.
1. Introduction Statistical Process Control (SPC) is a significant quality control issue in which data analysis is employed to determine whether the process is under control. One primary aim of SPC is to detect instantly a process shift and adopt the necessary corrective actions to improve process quality. Control charts are widely applied tools for monitoring processes. How to best select and apply a control chart, therefore, is a critical task. Alwan and Reoberts [1] applied an appropriate time series model to illustrate autocorrelated processes and used control charts to monitor forecast errors of the process. Montgomery and Mastrangelo [2] proposed a method which fitted a proper time series model for original data and then constructed the exponentially weighted moving average (EWMA) control chart for the residuals. Thombs and Padgett [3] studied Shewhart charts when process observations can be modeled as an AR(1) process with random error. Wardell, Moskowitz and Plante [4] considered four charts in the presence of data correlation and 768
769 compared the performance of the four charts when the process can be described as an autoregressive moving average process of order 1 and 1 (ARMA (1,1)).. Lu and Reynolds [5] considered the performance of EWMA charts of the residuals and observations. Studies have demonstrated that the EWMA chart of observations is more efficient than the Shewhart observational chart in detecting small shifts of autocorrelated processes. Sheu and Griffith and Sheu and Lin [6] proposed and applied an expanded EWMA control chart, called the generally weighted moving average (GWMA), to enhance the detection ability of control charts. They indicated that the GWMA control chart performs substantially better than both the Shewhart and EWMA control charts for monitoring small process mean shifts. This work focuses on the GWMA control chart of autocorrelated observations to monitor the process mean. The GWMA control chart performance is compared with that of the EWMA control chart. Numerical simulation is used to assess the ARL properties of various process mean shifts and adjusted parameters with different levels of autocorrelation. 2. The AR (1) Process with Random Error The AR(1) process with random error is presented. It is assumed that Xt can be written as Xt=nt+s, ,t = 1,2,3 (1) where nt is random process mean at time t and £t 's are independent and have a common normal distribution with mean 0 and variance a2 . Supposing that fxt be specialized as anAR(l) process with process mean £ 0 ,thus, /*/=(!->)£o + M-\ +a, ,t = 1,2,3 (2) where a, is the white noise of the AR(1) process at time t and 0 is the AR(1) parameter satisfying \<j>\ < 1. We may assume that a, 's are independent normal random variables with mean 0 and variance a2, independent of the s, 's. If we suppose that the starting value ju0 follows a normal distribution with mean £0 and variance cr2/(l-02), then ju, will be a normal distribution with mean £ 0 and variance alM = — a —, \-tt>2
for all t > 1
hence, the mean and variance of process observations X, 's respectively are:
Var(Xl) = a2x=*l
+
a2£=^-I
+
<j2
770
One can conveniently define y/ = —j- =
*"
to be the proportion of the
process variance that is due to the AR(1) process. The covariance between Xt and Xl+l is tjxr^, the autocorrelation coefficient between Xt and Xt+l is P = 4>¥ •
The AR(1) process observed with random error is equivalent to an ARMA( 1,1) process (see Box, Jenkins, and Reinsel [7] ). The ARMA(1,1) process can be represented as *,=(l-0£o+#rr-i+*,-<*,-! (3) where b, is the random white noise of the ARMA( 1,1) process at time t and follows independent normal random variables with mean 0 and variance a\, £0 is the mean of the observation, 9 is the MA parameter, and ^ is the same AR parameter as in Eq. (2). Most applications, it is interpretable based on the AR(1) with random error; however, for some purposes, the ARMA(1,1) model can be convenient for acquiring original observations and estimating parameters. Hence, as long as parameters satisfy Q<9<<j><\ and
771 = (P0 -P,)Xj
+
{Px -P2)Xj_l+-
+ {Pj_l-PJ)Xl +PjMo
(4)
The expected value of Eq. (4) is E(Yj) =E[(P0-Pl)Xj+(P]-P2)Xj_i+-
+ (PJ_l-PJ)Xl+PJM0]
=#,(5)
The variance of Eq. (4) is Var(Yj) = f(P0 -j>) 2 + (/> -P2)2 + - + {PH -Pj)2]-a2
(6)
Supposing L denotes the width of the control limits, the central limit (CL ), upper control limit (UCL ) and lower control limit (LCL ) of the GWMA control chart can be written as UCL^ft+LjVariYj) CL = ^ LCL^/Jo-LjVariYj)
(7)
When statistics Y • falls outside the range of control limits, it indicates that the process is out of control and some actions should be taken adequately. 4. Constructing Autocorrelated GWMA Control Charts Generally, process data are assumed to be independent in conventional control charts. This work, investigates GWMA control charts for monitoring the mean of autocorrelated observations based on an AR(1) process with random error. Notably, it has the property of a time series AR(1) and independent data. Therefore, we propose that the data from an AR(1) process with random error are close to practical autocorrelated processes. According to Eqs. (1) and (4), the autocorrelated GWMA control statistic, Yj can be written as Yj =tp(M
= j-i +1)(//,. + e.) + P(M >j)Z0
(8)
The expected value of Eq. (8) will be E(Yj) = fp(M
= j-i + l)E(fi, + ei) + P(M > j)E(40) =
(9)
1=1
The variance of Eq. (8) will be Var(Yj) = Var(tP(M
= j-i +1)(//,. + *.) + P(M > j)^)
;=i
= Qj (*l +
+ \)P{M =j-k
+l ^ a l
(10)
772 J
-
where Qj = J^[P(M = i)] .The time-varying control limits of the (=1
autocorrelated GWMA control chart will be UCL = £0+LjVar(YJ)
LCL = 40-L^Var(Xj)
(11)
Particularly, we choose Pj = qJ where design parameter q is constant (0
02)
(=1
The variance of Eq. (12) will be Var(Zj)=Qjal+2
P2 M-
^ L2 _ [ g Z g - - « ^ 1Y JH + T ^ ^ < 13 ) \-q 1-& l-^" \+q
where p = l - g . W h e n 7 increases to 00, Var(Zj) will increase to a limit value, VariZu)=-P-£4-o>+ol) \ + q \-
(14)
The fixed-width control limits of the EWMA control chart of autocorrelated observations will be UCL = <^0+Laz CL = {0
(16)
LUL = gQ-L(jz Hence, the EWMA control chart becomes a special case of the GWMA control chart of autocorrelated observations when a = 1. 5. Performance Measurement and Comparison Average Run Length simply is the average number of points plotted before an out-of-control signal is observed. Generally, control chart performance is measured by the ARL. An ARL should be sufficiently large to reduce false alarms when a process is under control and adequately small to detect shifts
773
rapidly when a process is out of control. Numerical analyses and computer simulations in this work are used to estimate ARL of a control chart, (see Roberts [8] and Crowder [9]). The following steps are recommended: Stepl. Give parameters
774 Table 1 .ARLs of GWMA charts with time-varying control limits when (// = 0 . 9
+ -= 0.4 5
* = 0.8
a = 0.5
q==0.5 a = 0.9 a = 0.7
a = 1.0
a = 0.5
a = 0.7
q=0.5 a = 0.9
a = 1.0
L=2.950
L=2.934
L=2.928
L=2.926
L=2.754
L=2.744
L=2.747
L=2.750
0.00
370.65
369.82
370.32
370.30
370.46
370.58
370.55
370.87
0.25
217.11
227.56
239.87
243.42
279.09
287.53
292.91
294.11
0.50
89.51
97.32
106.60
110.31
156.58
167.11
174.08
175.92
0.75
42.80
46.24
50.63
52.48
87.30
94.87
99.45
101.30
1.00
23.37
24.64
26.90
28.11
51.71
56.20
59.67
61.00
1.25
14.44
14.68
15.66
16.15
32.44
34.93
37.18
37.92
1.50
9.54
9.46
9.89
10.19
21.33
22.65
23.99
24.45
2.00
4.96
4.80
4.83
4.88
9.54
9.91
10.38
10.60
3.00
1.93
1.88
1.85
1.85
2.12
2.11
2.14
2.16
q=i0.9
a = 0.5
or = 0.7
a = 0.9
a = 1.0
a = 0.5
q=0.9 a = 0.7 a = 0.9
L=2.778
L=2.659
L=2.616
L=2.613
L=2.439
L=2.353
0.00
370.55
370.27
370.25
370.36
369.83
370.28
370.57
370.03
0.25
115.50
117.92
136.51
147.65
207.97
222.29
243.59
253.84
0.50
44.48
42.75
46.90
50.52
96.80
102.90
118.60
127.45
0.75
23.49
21.88
22.79
23.89
52.92
54.71
62.29
67.08
1.00
14.43
13.14
13.39
13.81
32.27
32.52
36.56
39.31
1.25
9.61
8.73
8.75
8.93
20.87
20.61
22.63
24.13
8
L=2.365
a = 1.0 L=2.384
1.50
6.84
6.15
6.11
6.21
13.82
13.45
14.61
15.50
2.00
3.87
3.50
3.43
3.46
6.18
5.84
6.21
6.55
3.00
1.68
1.56
1.53
1.53
1.51
1.44
1.44
1.48
6. Conclusions In sophisticated industries with automatic process development, quality characteristics in many applications no longer meet a standard independent assumption.Lu and Reynolds [5] argued that the EWMA control chart of autocorrelated observations is more efficient than Shewhart control chart in detecting small process mean shifts. This work considered a GWMA control chart for monitoring the process mean in which the observations can be modeled as an AR(1) process with random error. This study utilized a simulation to acquire the ARL of the autocorrelated GWMA control chart. The autocorrelated GWMA control chart proved superior to the autocorrelated EWMA control chart for detecting small process mean shifts at low levels of autocorrelation. However, for high levels of autocorrelation, the autocorrelated GWMA control chart also
775
performs well at the shifted size within 3a, especially in small process mean shifts. The GWMA control chart of autocorrelated observations reduces the type II errors when the process mean is out of control and parameter a < 1. Hence, when the process is autocorrelated, the GWMA control chart is superior for detecting small process mean shifts. References 1. L. C. Alwan and H. V. Roberts, Time-Series Modeling for Statistical Process Control, Journal of Business and Economic Statistics, 6, 87-95. (1998). 2. D. C. Montgomery and C. M. Mastrangelo, Some Statistical Process Control Methods for Autocorrelation Data, Journal of Quality Technology 23,179-193(1991) 3. C. S. Padgett, L. A. Thombs and W. J. Padgett, On the a -risk for Shewhart Control Charts, Communications in Statistics-Simulation and Computation 21,1125-1147.(1992). 4. D. G Wardell, H. Moskowitz and R. D. Plante, Control Charts in the Presence of Data Correlation, Management Science 38, 1084-1105. (1992). 5. C. W. Lu and M. R. Reynolds Jr., EWMA Control Charts for Monitoring the Mean of Autocorrelated Processes, Journal of Quality Technology 31, 166-188 (1999a). 6. S. H. Sheu and T. C. Lin, The Generally Weighted Moving Average Control Chart for Detecting Small Shifts in the Process Mean, Quality Engineering 16,209-231.(2003). 7. G E. P. Box, G. M. Jenkins and G. C. Reinsel, Time Series Analysis: Forecasting and Control, 3rd, Prentice-Hall, Englewood Cliffs, New Jersey, (1994) 8. S. W. Roberts, Control Chart Tests Based on Geometric Moving Average, Technometrics 42, 97-102. (1959). 9. S.V. Crowder, Design of Exponentially Weighted Moving Average Schemes, Journal of Quality Technology, 21, 155-161. (1989).
AN EXTENDED EXPONENTIALLT WEIGHTED MOVING AVERAGE CONTROL CHART FOR MONITORING POISSON OBSERVATIONS SHEY-HUEI SHEU Department of Industrial Management, National Taiwan University of Science and Technology, 43 Keelung Road, Section 4, Taipei 106, Taiwan TSE-CHIEH LIN, SHIH-HUNG TAI Department of Industrial Management, Lunghwa University of Science and Technology, 300 Wan-Shou Road, Section I,Kueishan, 33306 Taoyuan, Taiwan
The c chart is often used to monitor the number of non-conforming products. When small shifts in the nonconformities of process mean result from assignable causes, classical c charts are relatively inefficient in detecting small shifts in the nonconformities of process mean. Therefore, the Poisson exponentially weighted moving average (Poisson EWMA) control scheme is another superior alternative to c charts. In this paper, we extended the Poisson EWMA control chart. This generalized chart, is called herein the Poisson generally weighted moving average (Poisson GWMA) control chart. This study presents the Poisson GWMA control chart for monitoring the Poisson counts. Simulation is used herein to evaluate the average run length (ARL) properties of the c chart, the Poisson EWMA control chart and the Poisson GWMA control chart. The Poisson GWMA control chart is superior to the Poisson EWMA control chart as measured by ARL. An example is also given to illustrate this study.
1. Introduction Attribute data are based on counts, or the number of times a particular event is observed. The events may be the number of nonconforming items, the number of defects, or any other distinct occurrences that are operationally defined, Including for example, defects per printed circuit board, the number of blemishes per tire and the number of accidents per month in a plant. Such counts are often well fitted by a Poisson distribution. In many manufacturing and business processes, it is often important to monitor various counts that can be modeled using Poisson random variables (see, e.g., Refs, 1-3). The control limits and performance measures for the c
776
777 chart is based on the Poisson distribution. The c chart is perhaps the simplest type of control chart for monitoring the Poisson counts. When small shifts in the nonconformities of process mean result from assignable causes, classical c charts are relatively inefficient in detecting small shifts in the nonconformities of process mean. Therefore, the Poisson-EWMA control scheme is another superior alternative to c charts. Gan (4) introduced a modified version of the EWMA control chart, which monitors the mean of the Poisson distribution. In Gan's method, no information is lost because the exact EWMA statistic is maintained and plotted. Borror, Champ and Rigdon (5) evaluated the average run lengths (ARLs) of the Poisson EWMA chart for various in-control means, using the Markov chain approach and simulation. The Poisson EWMA control chart has smaller ARL than the c-chart. Sheu and Lin (6) introduced the generally weighted moving average (GWMA) control charts to enhance the detection ability of control charts. The GWMA control chart is more sensitive in detecting small shifts in the process mean, and can spot small shifts in the initial process, due to its added adjustment parameter a . In this study, we propose the GWMA chart to monitor the mean of a Poisson distribution. It is called the Poisson GWMA control chart. Simulation is performed to evaluate the ARL properties of the Poisson GWMA control chart. 2. The Poisson GWMA Control Chart For a sequence of independent samples, let q. represent the probability of the occurrence of event A at the j-th sample and 1 — q . represent the probability of the occurrence of event B at the j-th sample. Events A and B are assumed to be complementary and mutually exclusive. Let M count the number of samples until the first occurrence of event A since the previous occurrence of event A. L e t P ; = P(M > j) . That is, Pj is the probability that the event A does not occur in the first j samples. We_assume throughout that the domain of Pj is {0,1,2, •••} and that 1 = P o > P i > ••• . We use the notation {Pj} as an abbreviation for ^sequence of probabilities. The sequence {Pj } is supposed to be known. Let Pj = P(M = j) = Py_, - Pj = PH (1 -
Zi.)
(Note: assumes probabilities > 0 for all j , which may not be the case if probability distribution changes over time for unstable processes.).
778
Hence, event A occurs with probability q • = 1 - _ J J Pj-l occurs with probability i _ q. = E' '
at the ;'-th sample. Event B
at they'-th sample.
Pj-\
Therefore, 00
m=\
= P(M = \) + P{M = !) + ••• + P(M =j) + P(M>
= {P0-Pl) =1
+ (Pl-P2)
+ - + (Pj-l-Pj)
j)
(!)
+ Pj
If (Po - P i ) > (Pi -Pi) > •••> (Pj-i -Pj), then the current sample is weighted most heavily, and the preceding samples are assigned lower weightings; the earliest sample is weighted least. P(M = j) can be regarded as the weight in the weighted moving average. The weight of the current sample is P(M = 1) and the previous observation has a weight of P(M = 2) . Accordingly, P(M > j) is weighted c 0 , where C0 is the target mean. Hence, all of the samples are considered, but the current sample is weighted most heavily. The weights decrease with the age of the samples. Let Y. denote the generally weighted moving average in the plotted test statistics at time j {j = 1,2,3, •••) and C represents the observation at the time j . C •, j — 1,2,3, • • • are assumed to be independent identically distributed Poisson random variables with mean c, where c is the mean number of defects per inspection unit. When the process is in control, C = C 0 , where C0 is the target value of mean number of defects per inspection unit. The Y0 represents the starting value that is set by the practitioner. Setting Y0 = C0 is convenient. For the Poisson GWMA control chart the sample statistic is a weighted average of the current observation C . and all previous observations, with the current observation most heavily weighted. Consequently, Y, can be configured as, Y = P ( M = 1)C, + i W = 2 ) C , . _ , +-+P(M=j)C] = (P0 -Pi)Cj +(/>, -Pi)CH +-+{PJA -Pj)Cx
+P(M>j)c0 +Pjc0
(2)
779
LetPj =qJ .where design parameter q is constant (0 < q < 1) , (j = 0,1,2,3, • • •) and a is adjustment parameter determined by the practitioner. Thus, the expected value of Eq. (2) can then be calculated as, E(YJ)=E[(q0" -/)Cj
+ ( / -qT)CH
+-+{q(Hf
~qfY\ + / c 0 ]
p )
The variance is, Var(Yj) = [(qoa -q[°f
+ ( / -q2")2 + ... + {q(Hf
-qf )2}Var{C)
= [(<7°" - / ) 2 + ( ? r -q2")2+... + (qU-,r ~qf)2]c0
(4)
where, Q} = (9°° - qx° f + (ql" - q2" )2 + • • • + (q^" - qJ" )2 . Let L denotes the multiplier to determine the width of the control limits, the control limits can be written as,
CL =c 0
(5)
LCL = c0-L^JQjCQ If Cj , j — 1,2,3, • • • are independent identically distributed Poisson random variables, then the Poisson GWMA statistic given in Eq. (2) will be nonnegative and the Poisson GWMA chart will not signal at a LCL that is less than or equal to zero. Thus, if the computed value of LCL is less than zero, then we set LCL=0. The Poisson GWMA statistics are then plotted on a control chart with the UCL and LCL, and if any point exceeds the control limits, then the Poisson GWMA chart shows an out-of-control signal. 3. Comparison of control charts The ARL is a useful measure for the design of control charts. The ARL is defined as the average number of points plotted before an out-of-control signal is given. When the process is under control, the ARL should be sufficiently large to avoid false alarms.
780 When the process is out of control, the ARL should be sufficiently small to rapidly detect shifts. However, the ARL can be obtained through numerical analysis and computer simulation (see, e.g., Refs, 7-10). The process is said to be in control when C = c0 and out of control when the mean shifts to some other value, say, C = C, . The early detection of a shift to C, > C0 is desirable and so adjustments can be made to bring the process mean back in control. The early detection of a shift to C, < C0 is desirable to allow any beneficial changes to be made. A downward shift in C may be caused by errors in the inspection of the units, rather than by an actual change in the process. For example, a new inspector may not have been trained properly to inspect the process output. In such a case, quick detection is desirable. Simulations under the following conditions were conducted to evaluate the ability of the Poisson GWMA chart to detect shifts. First, the in-control ARL is maintained at approximately 1234, and second, the in-control ARL is maintained at approximately 1384. First, the in-control ARL is maintained at approximately 1234. The top of Table 1 compares the ARLs of the Shewhart c chart, of Gan (4) Ceil EWMA control charts (CEWMA), of Borror, Champ and Rigdon (5) Poisson EWMA control charts (PEWMA), and of Poisson GWMA control charts (PGWMA) proposed herein. The lower and upper control limits (LCL, UCL) of the Shewhart c, CEWMA (A =0.27), PEWMA (X =0.27) control charts are given by (5, 35), (13,27) and (25.864, 14.136), and the in-control ARLs are 1218.6, 1234.5 and 1233.4, respectively. The in-control process mean c 0 is 20. For a fair comparison with CEWMA and PEWMA chart, in-control ARL of Poisson GWMA control chart with design parameter q=0.73 and different adjustment parameters a {a =1.00, a =0.90, a =0.80, a =0.75, and a =0.50) is maintained at approximately 1234 by changing the width of the control limitsL (Z=3.320,1=3.331,1=3.344,1=3.352, andi=3.398). Table 1 presents the ARLs for various values of c when the in-control process mean is ° =20. The Poisson GWMA control charts are superior to the Shewhart c chart based on ARL. For most of process mean shifts, the Poisson GWMA control charts with q=0.73 yield lower ARLs than the Ceil EWMA control charts and the Poisson EWMA control charts.
Table 1. Comparison of the ARLs of the Poisson GWMA control charts and other control charts.

       Shewhart    Gan's      Borror et al.'s           Poisson GWMA (q = 0.73)
       c chart     CEWMA      PEWMA           α=1.00    α=0.90    α=0.80    α=0.75    α=0.50
  c    (λ = 1)     (λ = 0.27) (λ = 0.27)      L=3.320   L=3.331   L=3.344   L=3.352   L=3.398
  10      34.2        7.9        3.4             2.8       2.9       3.0       3.7       3.1
  12     131.6       18.9        4.8             4.4       4.3       4.5       5.8       4.7
  14     553.7      138.5        8.5             8.0       8.0       8.1      10.5       8.3
  16    2426.2     5566.2       24.2            22.1      21.2      21.1      25.4      23.8
  18    4876.2    89688.2      197.2           198.9     176.4     157.6     149.7     138.4
  20*   1218.6     1234.5     1233.4          1233.4    1233.3    1233.5    1233.5    1233.3
  22     266.1       78.8       85.2            83.9      76.9      71.0      68.4      64.4
  24      76.0       18.1       17.7            17.0      16.3      15.8      15.7      17.5
  26      27.5        8.4        7.8             7.1       7.1       7.1       7.2       8.3
  28      12.2        5.4        4.8             4.2       4.2       4.3       4.3       5.0
  30       6.4        4.0        3.5             2.9       2.9       3.0       3.4       3.0

* The in-control process mean is 20 (c_0 = 20). The actual process mean is c.
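As a rough illustration of how ARLs like those in Table 1 can be estimated by simulation, the following sketch (our own, reusing the two helper functions above; the number of runs and the truncation length are arbitrary choices) averages the run lengths of simulated Poisson sequences:

    import numpy as np

    def estimate_arl(c_actual, c0=20.0, q=0.73, alpha=1.0, L=3.320,
                     n_runs=500, max_len=6000, seed=None):
        """Monte Carlo ARL estimate; runs with no signal are censored at
        max_len, which slightly biases very long ARLs downward."""
        rng = np.random.default_rng(seed)
        ucl, lcl = poisson_gwma_limits(c0, q, alpha, L, max_len)
        total = 0
        for _ in range(n_runs):
            counts = rng.poisson(c_actual, size=max_len)
            y = poisson_gwma_statistics(counts, c0, q, alpha)
            signals = np.nonzero((y > ucl) | (y < lcl))[0]
            total += (signals[0] + 1) if signals.size else max_len
        return total / n_runs

    # For example, estimate_arl(24.0) should land near the Table 1 value of about 17.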
4. Conclusions
This paper presents a way to extend the exponentially weighted moving average control chart to the case of Poisson counts of nonconformities. The GWMA control charts proposed for monitoring Poisson counts are superior to Shewhart c charts in terms of ARL. For most shifts in the process mean, Poisson GWMA control charts have shorter ARLs than modified EWMA control charts and Poisson EWMA control charts. Owing to its superiority in detecting shifts, the Poisson GWMA control chart should be given high priority by quality control practitioners.

References
1. A. J. Duncan, Quality Control and Industrial Statistics, 5th ed., Irwin, Homewood, Illinois (1986).
2. D. C. Montgomery, Introduction to Statistical Quality Control, 4th ed., John Wiley & Sons, New York (2001).
3. H. Gitlow, A. Oppenheim and R. Oppenheim, Quality Management: Tools and Methods for Improvement, 2nd ed., Irwin, Burr Ridge, Illinois (1995).
4. F. F. Gan, Monitoring Poisson Observations Using Modified Exponentially Weighted Moving Average Control Charts, Communications in Statistics - Simulation and Computation 19, 103-124 (1990).
5. C. M. Borror, C. W. Champ and S. E. Rigdon, Poisson EWMA Control Charts, J. Qual. Technol. 30, 352-361 (1998).
6. S. H. Sheu and T. C. Lin, The Generally Weighted Moving Average Control Chart for Detecting Small Shifts in the Process Mean, Qual. Eng. 16(2) (2003).
7. S. W. Roberts, Control Chart Tests Based on Geometric Moving Averages, Technometrics 1, 239-250 (1959).
8. S. V. Crowder, A Simple Method for Studying Run Length Distributions of Exponentially Weighted Moving Average Control Charts, Technometrics 29, 401-407 (1987).
9. P. B. Robinson and T. Y. Ho, Average Run Lengths of Geometric Moving Average Charts by Numerical Methods, Technometrics 20, 85-93 (1978).
10. C. H. White and J. B. Keats, ARLs and Higher-Order Run-Length Moments for the Poisson CUSUM, J. Qual. Technol. 28, 363-369 (1996).
11. IMSL, User's Manual: Statistical Library, Version 1.1, IMSL Inc., Houston, TX (1989).
BUILD UP THE PRODUCT SATISFACTION PERFORMANCE MEASUREMENT BY USING 6-SIGMA METHODOLOGY AND S/N RATIO APPROACH - AN EXAMPLE OF PDA

C. J. TAO
Department of Industrial Engineering & Management, National Chin-Yi Institute of Technology, Taichung, Taiwan, R.O.C.

S. C. CHEN
Department of Industrial Engineering & Management, National Chin-Yi Institute of Technology, Taichung, Taiwan, R.O.C.

H. C. TSUNG
Department of Electric Machinery, National Chin-Yi Institute of Technology, Taichung, Taiwan, R.O.C.
Originally, the personal digital assistant (PDA) offered only personal information management functions. Because the PDA is regarded as less essential than the mobile phone and less capable than the notebook PC, its product features tend to be overlooked. If the factors that affect satisfaction and importance during product promotion can be clarified, this will help the spread of the product and the development of its market. This study investigates satisfaction with, and the perceived importance of, the product features, brand name and service convenience of the PDA, in order to clarify the factors that affect the desire to purchase a PDA and thereby support the future promotion of information appliances (IA). This article therefore takes the PDA as an example and uses the DMAIC methodology of 6-Sigma to build a measurement and improvement model for raising overall customer satisfaction performance. First, we define the problem and the measurement model, and use a questionnaire to measure the performance of customer satisfaction and importance. We then construct the satisfaction performance matrix and the overall satisfaction performance control chart, and use them as analysis tools to identify the key improvement and review items. Next, cause-and-effect diagrams are used to analyze these items and find corrective actions, which are then implemented for the critical review and improvement items. Finally, the questionnaire survey is repeated for the critical items and the overall satisfaction performance control chart is rebuilt to control and sustain the results of the corrective actions. Through this comprehensive measurement and improvement model, an enterprise can quickly and effectively measure, analyze, improve and control its service quality and, at reasonable cost, effectively raise overall customer satisfaction performance, creating high value-added quality competitiveness and enhancing its profitability.
1. Introduction
With the coming of the so-called post-PC era of the personal computer industry, compact and easy-to-use products have become the new favorites of the market, and many companies are considering a move into the R&D of information appliances (IA). IA products not only implement the 4C concept (Computer, Communication, Consumer electronics and Contents) over the Internet, but also offer the convenience of portability. Most past research on new products has concerned the diffusion of innovations; few studies have examined consumers' evaluations of product feature attributes. This article therefore takes the PDA as an example and applies the DMAIC methodology described in the 6-Sigma work of Michael (2002) to raise overall customer satisfaction performance. First, with reference to Dabholkar et al. (1996) and Parasuraman et al. (1985, 1991), a questionnaire on the PDA was designed around five quality compositions: function, specification, system, service convenience and brand name; from these, 30 quality feature items were deployed as the questions. These 30 questions were used to measure satisfaction and importance performance, and the performance matrix and the overall satisfaction performance control chart were built as analysis tools to find the key items for review and improvement. These items were then analyzed with cause-and-effect diagrams to find corrective actions for improvement. Finally, the questionnaire was redesigned for the key review and improvement items, the survey was repeated, and the overall satisfaction performance control chart was set up to control and maintain the effectiveness of the corrective actions. With the comprehensive measurement and improvement model of this paper, a PDA enterprise can focus on the relevant quality items and, under time and cost constraints, carry out quick and effective measurement; guided by the overall customer satisfaction performance values, it can find the quality items whose cost needs review and whose satisfaction needs improvement, and thus, at reasonable cost, raise overall customer satisfaction effectively and create high value-added quality competitiveness and the capability of gaining profit.

2. The Definition of the Service Quality Measurement Model
Service quality is defined from the viewpoints of Dabholkar et al. (1996) and Parasuraman et al. (1985, 1991). In the following, with reference to the concepts of Hung et al. (2003), Lin et al. (2005) and Chen et al. (2005, 2006), we define indices for the importance that customers expect of every quality item and for the perceived satisfaction level after use. The respective formulas are

$$P_I = \frac{\mu_I - \min}{R}, \qquad (1)$$

$$P_S = \frac{\mu_S - \min}{R}, \qquad (2)$$
where P_S is the satisfaction index value, P_I is the importance index value, μ_S is the average satisfaction value, μ_I is the average importance value, min = 1 is the minimum value of the k-point scale, and R = k - 1 is the full range of the k-point scale. Both index values lie between 0 and 1. Because the index values P_I and P_S cannot reflect the variation of the questionnaire responses, this article uses the S/N concept of Taguchi (1989) to modify the index values; comparing the modified values with P_I and P_S reveals the variation behind the index values, which helps to interpret the questionnaire results and to find more appropriate corrective actions. The formulas are

$$SN_I = \frac{P_I}{s_I}, \qquad (3)$$

$$SN_S = \frac{P_S}{s_S}, \qquad (4)$$

where s_I and s_S denote the standard deviations of the importance and satisfaction responses, respectively.
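A minimal sketch of these indices, assuming the signal-to-noise reading SN = P/s of Eqs. (3)-(4) (the index value over the standard deviation of the rescaled responses); the function names and the k = 5 scale are illustrative assumptions rather than the authors' code:

    import numpy as np

    def p_index(scores, k=5):
        """P index of Eqs. (1)-(2): (mean - min) / R with min = 1, R = k - 1."""
        return (np.mean(scores) - 1.0) / (k - 1.0)

    def sn_index(scores, k=5):
        """Assumed SN index of Eqs. (3)-(4): P divided by the standard
        deviation of the responses rescaled to [0, 1]."""
        x = (np.asarray(scores, dtype=float) - 1.0) / (k - 1.0)
        return x.mean() / x.std(ddof=1)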
In the following, we modify the performance evaluation matrix proposed by Hung et al. (2003), Lin et al. (2005) and Chen et al. (2005, 2006) and use it as the evaluation tool. This performance evaluation matrix expresses both the importance and the satisfaction of every service quality item. Because these coordinates cannot accurately and objectively determine how many items need to be improved or reviewed, we use the difference between the quality importance coordinate and the satisfaction coordinate as the overall satisfaction index value:

$$SN_{I-S} = SN_I - SN_S. \qquad (5)$$
SN_I is the importance performance index, SN_S is the satisfaction performance index, and SN_{I-S} is the overall satisfaction performance index. When the overall satisfaction performance value SN_{I-S} is positive, satisfaction needs to be improved, and resources should be increased to raise satisfaction to the proper block; similarly, when SN_{I-S} is negative, one should review whether the resource investment can be decreased to lower cost.

3. The Definition, Measurement and Analysis of the PDA's Quality Satisfaction
In accordance with the evaluation model above, this article investigates the importance and satisfaction of five quality compositions: function, specification, system, brand name and service convenience. These five compositions were deployed into 30 question features, which served as the items of a questionnaire administered to customers in the information industry market in the Taiwan area. In total, 100 copies of the questionnaire were released. After retrieval and screening, excluding members of the general public who had not purchased a PDA and 37 ineffective copies (incomplete, or with the same answer for every question), 63 effective questionnaires were collected, an effective retrieval rate of 63%.

Next, a reliability analysis of the questionnaire was carried out using the most common measure of internal consistency, Cronbach's alpha coefficient. According to Gay (1992), a coefficient greater than 0.90 indicates excellent reliability, and a scale with a coefficient greater than 0.80 is acceptable, although there is no consistent standard among scholars; DeVellis (1991) and Nunnally (1978) consider 0.70 the minimum acceptable value. Statistical analysis shows that the average Cronbach's alpha of the importance responses for the five quality compositions of the PDA is above 0.7487, and the alpha values of the other compositions are all above 0.7134. In summary, the reliability of both the importance and the satisfaction scales of this questionnaire is acceptable.
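The internal-consistency figures quoted above can be reproduced with the standard Cronbach's alpha formula; a minimal sketch (our own illustration) for a matrix of questionnaire responses:

    import numpy as np

    def cronbach_alpha(item_scores):
        """Cronbach's alpha for an (n_respondents x n_items) score matrix:
        alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
        X = np.asarray(item_scores, dtype=float)
        k = X.shape[1]
        item_var = X.var(axis=0, ddof=1).sum()
        total_var = X.sum(axis=1).var(ddof=1)
        return k / (k - 1.0) * (1.0 - item_var / total_var)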
For the analysis of the importance and satisfaction of the product and service quality of the PDA, we first compute statistics for every quality question to obtain the importance value P_I and the satisfaction value P_S; then, using the formulas above, we obtain the importance performance value SN_I, the satisfaction performance value SN_S and the overall satisfaction performance value SN_{I-S}; and finally we calculate the quartiles Q1, Q2, Q3 and the standard deviation of the importance values (P_I and SN_I) and of the satisfaction values (P_S and SN_S), and use these coordinates to draw the box-and-whisker plot and to fill in Table 1.

Table 1. Quality satisfaction and importance performance table. (For each measured question, grouped by composition, the table lists SN_I, SN_S and SN_{I-S}; rows include, under Function, 1. Easy to control, 2. Stock news, 3. Entertainment, 4. Image processing and 5. Touch panel, and, under Service, 26. Service convenience, 27. Brand name reputation, 28. Payment flexibility, 29. Passion of serviceman and 30. Professional knowledge, followed by the first quartile Q1, the second quartile Q2, the third quartile Q3 and the standard deviation of each column.)
Because the evaluation matrix is established from the quality performance values P_I and P_S, which are built from the average and the target value and thus consider only the shift and not the dispersion, the coordinates of the quality features cluster too closely, and the quality items needing review and improvement cannot be identified accurately. Therefore, this article applies the SN ratio concept of Taguchi (1989) and adds the standard deviation as a parameter to set up the evaluation index values SN_I and SN_S. These index values are used to set up the performance matrix; the three coordinates given by the quartiles Q1, Q2, Q3 of the importance values SN_I and the satisfaction values SN_S are used to draw the box of the box-and-whisker plot, and the minimum and maximum coordinates are used as its whiskers. The plot thus measures the dispersion of importance and satisfaction, and the overall satisfaction level is judged by the block in which the box-and-whisker plot lies, as shown in Figure 1.
Figure 1. The evaluation matrix of quality performance SN_I and SN_S before improvement. (Scatter of the quality items together with the box-and-whisker plot; horizontal axis: importance SN_I; vertical axis: satisfaction SN_S.)
From Figure 1 we can see that the box-and-whisker plot lies in the improvement zone, meaning that for most quality features the satisfaction is lower than the level their importance would lead customers to expect; that is, most quality features cannot meet the customers' expected satisfaction. Since the evaluation matrix of SN_I and SN_S does not easily admit upper and lower control limits, it is hard to find the quality items that need to be improved and reviewed. Therefore, this paper adopts the concept of the s control chart to define the overall satisfaction control chart. First, using the formula above and the distance from each coordinate in the quality performance evaluation matrix to the target line, the two-dimensional coordinates (SN_I, SN_S) of the performance matrix are transformed into the one-dimensional quality satisfaction performance value SN_{I-S}. We then set up the center line and the upper and lower control limits: the center line T is 0, and the control limits are plus and minus one standard deviation. Since one standard deviation is 6.56815, the upper control limit UCL is 6.56815 and the lower control limit LCL is -6.56815. Finally, the overall satisfaction performance value SN_{I-S} of every quality item is drawn on the overall satisfaction performance control chart, as shown in Figure 2.
Figure 2. The overall satisfaction performance control chart before improvement. (Overall satisfaction SN_{I-S} plotted against the items of quality, with center line 0 and control limits at plus and minus 6.56815.)
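Given the plotted SN_{I-S} values, the improvement and review zones of Figure 2 can be read off mechanically; a small sketch (our own, with the one-standard-deviation limit 6.56815 taken from the text):

    def classify_items(sn_is, sigma=6.56815):
        """Return 1-based item numbers in the improvement zone (above
        UCL = sigma) and in the review zone (below LCL = -sigma)."""
        improve = [i + 1 for i, v in enumerate(sn_is) if v > sigma]
        review = [i + 1 for i, v in enumerate(sn_is) if v < -sigma]
        return improve, review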
From Figure 2, in the SN_{I-S} performance control chart the area above the upper control limit is the improvement zone, and nine items lie in it: item 1, easy to control; item 3, video entertainment; item 5, touch panel; item 12, expandable function; item 15, document processing; item 24, down-time rate; item 25, time to repair; item 29, service passion; and item 30, professional knowledge. For these items we need to increase the resource investment to raise the satisfaction level. The area below the lower control limit is the review zone, and two items lie in it: item 21, brand name, and item 22, market share.

4. The Improvement and Control of the Product and Service Quality Features of the PDA
First, focusing on the nine improvement items, a 4P analysis of process, man and facility was carried out, yielding nine critical corrective actions: consider customization and differences between customer segments when designing the PDA; enhance the PDA's video entertainment technique; enhance the touch-control technique and quality; design the survey questionnaire to find the actual needs of customers; establish a relationship with Microsoft to set up a co-platform with Office; enhance the quality of the battery power supply system; increase the number of repair stations; enhance the training of service attitude; and, considering the importance of customers' needs among the quality features, simplify the product design. Finally, to bound these actions we propose two counter plans: set up a cost control mechanism that weighs the design budget against the investment benefit, and set up schedule control for the above tasks and related personnel, with periodic review and revision against competitors' products.

In addition, according to the analysis above, two items should be reviewed for a possible decrease in resource investment to lower cost. After discussion and analysis we identified two main factors and, accordingly, three corrective actions: 1. Because customers do not particularly respect the brand name, consider decreasing the advertising expense and promoting the product bundled with other relevant products. 2. Increase the product's added value to the customer to expand the market growth rate. 3. Enhance the market survey so as to design a PDA product that really meets customers' requirements. Finally, we propose two counter action plans: 1. Review and approve the reliability of the market survey questionnaire and the analysis process. 2. Set up a control mechanism for advertising promotion and strictly control the investment benefit of advertising.

For control, we first carry out the planned critical corrective actions, counter plans and execution steps for the specific customer segments; then, for the deviated question items, we redesign the questionnaire and, from the coordinates of importance and satisfaction (SN_I, SN_S) on the performance matrix, calculate the overall satisfaction performance value SN_{I-S} of every service quality item and set up the overall satisfaction performance control chart accordingly, using it as a tool for controlling service quality. Following the formulas above, the SN_{I-S} value of every service quality item is calculated and a control chart with control limits set between -20 and +20 is built. With this overall satisfaction performance control chart, managers can detect whether a previously deviated service quality item falls outside the control limits; if it does, the related corrective action strategy and counter plan must be revisited. In the real example of this paper, all of the previously deviated product quality features now lie within the control limits, as shown in Figure 3. This means the expected improvement target has been achieved, so the process should be standardized: set up the post-improvement quality plan and standard operating procedure, and expand education and training for the related process personnel.
Figure 3. The overall satisfaction performance control chart after improvement. (All SN_{I-S} values lie within the control limits of -20 and +20.)
5. Conclusion
Using the DMAIC evaluation methodology of 6-Sigma provided in this paper, an industrial enterprise can focus on the product quality and service quality of its industry. In the example case of this paper, the evaluation analysis found two quality items with corresponding actions: consider reducing the brand-name advertising expense and promote the product by bundling it with other related products; and enhance the market survey to design a PDA product that meets customer requirements, raising the added value the product brings to the customer so as to enlarge the market growth rate and quality. In addition, nine service items need improvement: easy to control, video entertainment, touch panel, expandable function, document processing, down-time rate, time to repair, service passion and professional knowledge. Nine corresponding corrective actions were found: consider customization and the differentiation of customer segments when designing the PDA; enhance the PDA's video entertainment technique; enhance the function and quality of the touch panel; design a questionnaire and enhance the market survey to understand the real needs of customers; build a co-platform with Microsoft Office; increase the quality of the battery's power supply system; increase the number of repair stations; enhance the training of service attitude; and, considering the importance of customer requirements among the quality features, simplify the product design. This study also found that because the main functions of the personal digital assistant (PDA) cannot meet the customers' basic needs, the overall satisfaction level is lower than the importance level.
References
1. S. C. Chen, K. S. Chen and T. C. Hsia, Promoting Customer Satisfactions by Applying Six Sigma: Example of Automobile Industry Process, Quality Management Journal 12(4), 21-33 (2005).
2. S. C. Chen, K. S. Chen and C. J. Tao, 6 Sigma Methodology in Constructing the Measurement Model for ISO-9001 Implementation, Total Quality Management & Business Excellence 17, 131-147 (2006).
3. Dabholkar, Thorpe and Rentz, Do Customer Returns Enhance Product and Shopping Experience Satisfaction?, The International Review of Retail, Distribution and Consumer Research 8(1), 1-13 (1996).
4. R. F. DeVellis, Scale Development: Theory and Applications, SAGE, London (1991).
5. L. R. Gay, Educational Research: Competencies for Analysis and Application, Macmillan, New York (1992).
6. A. Parasuraman, V. A. Zeithaml and L. L. Berry, A Conceptual Model of Service Quality and Its Implications for Future Research, Journal of Marketing 49, 41-50 (1985).
7. A. Parasuraman, V. A. Zeithaml and L. L. Berry, Understanding Customer Expectations of Service, Sloan Management Review, 39-48 (1991).
8. G. Taguchi, E. A. Elsayed and T. C. Hsiang, Quality Engineering in Production Systems, McGraw-Hill, New York (1989).
9. Y. H. Hung, M. L. Huang and K. S. Chen, Service Quality Evaluation by Service Quality Performance Matrix, Total Quality Management 14(1), 79-89 (2003).
10. W. T. Lin, S. C. Chen and K. S. Chen, Evaluation of Performance in Introducing CE Marking on the European Market to the Machinery Industry in Taiwan, International Journal of Quality and Reliability Management 22(5), 503-517 (2005).
11. Michael, Lean Six Sigma, McGraw-Hill, New York, 170-177 (2002).
12. W. A. Shewhart, Statistical Methods from the Viewpoint of Quality Control, The Graduate School, The Department of Agriculture (1939).
13. J. C. Nunnally, Psychometric Theory, 2nd ed., McGraw-Hill, New York (1978).
AUTHOR INDEX
Ahn, S.E., 569, 610; Akiba, T., 107; Arafuka, M., 660
Bae, C.O., 142; Bae, S.J., 267; Bertolini, M., 116; Bevilacqua, M., 116
Cha, J.H., 3; Cha, S.W., 486; Chakraborty, T., 369; Chang, C.M., 310; Chang, W.L., 318; Chaturvedi, S.K., 11; Chen, C.H., 329; Chen, J., 634; Chen, J.A., 337, 345, 687; Chen, J.W., 80; Chen, K.S., 703; Chen, M.C., 195, 203; Chen, S.C., 783; Chen, T.W., 693; Cheng, C.Y., 195, 203; Chien, Y.H., 337, 345, 687; Cho, B.R., 724; Cho, G.S., 24, 32; Choi, C.H., 353; Choi, I.J., 709; Choi, S.K., 211; Choi, S.S., 549; Chu, Y.K., 654; Chukova, S., 361; Chung, I.H., 185; Chung, S.W., 494; Chung, Y.J., 64; Cui, L., 97; Cui, Y.H., 645
Ding, J.Y., 48; Dohi, T., 219, 243, 251, 395, 427, 443, 451, 577, 676
Esaulova, V., 559
Finkelstein, M., 559; Fujiwara, T., 395, 435
Giri, B.C., 369; Goh, T.N., 739; Gotoh, H., 577; Guo, R., 585, 595, 645
Ha, Y.J., 227; Ham, J.K., 518; Han, S.W., 553; Hayakawa, Y., 361; Hiroshima, T., 124; Hong, J.W., 301; Hong, S.H., 709, 731; Horikoshi, Y., 132; Hsieh, C.Y., 703; Hsu, C.H., 747; Huang, J.M., 703; Hwang, D.H., 518; Hwang, H.S., 24, 32
Imaizumi, M., 40; Ishii, T., 395; Ito, K., 235; Iwamoto, K., 243, 251
Jin, L., 132; Johnston, M.R., 361; Jung, J.S., 486; Jung, M., 541
Kaio, N., 219, 243; Kamakura, T., 626; Kambayashi, Y., 124, 176; Kang, C.W., 267; Kang, K.M., 385; Kang, T.H., 494; Karim, M.R., 377; Kawai, H., 760; Ke, J.B., 48, 80; Ke, J.C., 654; Kim, C.M., 502, 510, 541; Kim, Dohoon, 56; Kim, Do Hoon, 259; Kim, Dong Hyeok, 525; Kim, D.K., 403; Kim, H.M., 569; Kim, HoGyun, 142, 185; Kim, Hu Gon, 64; Kim, H.S., 411; Kim, Jae Hoon, 603; Kim, Jae Hwan, 150; Kim, J.R., 56; Kim, J.S., 150; Kim, Jin Woo, 486; Kim, Jong Woon, 603; Kim, M., 477; Kim, M.S., 486, 518; Kim, S.J., 267; Kim, S.M., 518; Kim, W.H., 569, 610; Kim, Y.B., 301; Kim, Y.J., 717, 724; Kimura, M., 40; Kimura, Mitsuhiro, 419; Kondo, H., 660; Kwon, H.M., 709, 731
Lee, B.J., 267; Lee, E.Y., 211; Lee, G.L., 158; Lee, H.J., 486; Lee, H.K., 533; Lee, H.W., 275; Lee, J.K., 486; Lee, M.K., 709, 731; Lee, O.S., 525; Lee, S.W., 533; Lie, C.H., 301; Lim, H.S., 477; Lim, J.H., 259, 618; Lim, K.E., 211; Lin, C.S., 72; Lin, T.C., 776; Liu, J.Y., 739; Lu, S.L., 768
Maeji, S., 293; Maruo, T., 251; Mishra, R., 11; Mizutani, S., 285; Moon, J.S., 486; Mukuda, M., 166
Nagatsuka, H., 107, 626; Nakagawa, T., 235, 285, 293, 634, 660; Nakamura, S., 660; Naruse, K., 293; Nishimaki, T., 285
Oh, G.T., 518; Okamura, H., 251, 427, 577, 676; Osaki, S., 219; Ouyang, L.Y., 747
Paik, C.H., 64; Park, C.S., 569, 610; Park, D.H., 403, 411, 618; Park, D.J., 752; Park, J.H., 301; Park, J.S., 603; Park, J.W., 518; Park, J.Y., 435; Park, N.I., 275; Park, S.J., 477
Qian, C.H., 634
Rinsaka, K., 443, 451
Satow, T., 760; Seo, J.H., 510, 541; Sheu, S.H., 72, 768, 776; Shibata, K., 451; Shin, G.H., 301; Shin, J.C., 486; Shingyochi, K., 176; Suzuki, K., 132, 377
Tai, S.H., 776; Tamura, N., 668; Tamura, Y., 459; Tao, C.J., 783; Tateishi, K., 676; Teramoto, K., 235; Tokuno, K., 467; Tsujimura, Y., 124, 166, 176; Tsung, H.C., 783
Wang, K.H., 48, 80; Wang, M., 310; Wang, T.C., 693; Woo, C.S., 549
Xie, M., 739
Yamachi, H., 124; Yamada, S., 411, 459, 467; Yamamoto, H., 107, 124, 176, 626; Yanagi, S., 89; Yang, S.L., 703; Yang, X., 97; Yasui, K., 40; Yeh, R.H., 318; Yeo, I.K., 403; Yuge, T., 89; Yum, B.J., 477; Yun, W.Y., 185, 227, 353, 385, 494, 603
Zhao, X., 97
Advanced Reliability Modeling II: Reliability Testing and Improvement (AIWARM 2006). The theme of AIWARM 2006 is reliability testing and improvement; the contributions, in reliability and maintenance engineering, provide a presentation of theory and practice.