Replacement Models with Minimal Repair


Author: Lotfi Tadj | M.-Salah Ouali | Soumaya Yacout | Daoud Ait-Kadi


Springer Series in Reliability Engineering

For further volumes: http://www.springer.com/series/6917

Lotfi Tadj • Mohamed-Salah Ouali • Soumaya Yacout • Daoud Ait-Kadi

Editors

Replacement Models with Minimal Repair


Editors Prof. Lotfi Tadj Department of Finance, Information Systems and Management Science Sobey School of Business Saint Mary’s University 923 Robie Street Halifax, NS B3H 3C3 Canada e-mail: [email protected] and School of Business Administration Dalhousie University 6100 University Avenue Halifax NS B3H 3J5 Canada e-mail: [email protected]

Prof. Soumaya Yacout Department of Mathematics and Industrial Engineering Ecole Polytechnique de Montreal Montreal Québec Canada e-mail: [email protected] Prof. Daoud Ait-Kadi Department of Mechanical Engineering Laval University Quebec Canada e-mail: [email protected]

Assoc. Prof. Mohamed-Salah Ouali Department of Mathematics and Industrial Engineering Ecole Polytechnique de Montreal Montreal Québec Canada e-mail: [email protected]

ISSN 1614-7839 ISBN 978-0-85729-214-8

e-ISBN 978-0-85729-215-5

DOI 10.1007/978-0-85729-215-5 Springer London Dordrecht Heidelberg New York British Library Cataloguing in Publication Data A Catalogue record for this book is available from the British Library © Springer-Verlag London Limited 2011 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made. Cover design: eStudio Calamar, Berlin/Figueres Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)

To the loving memory of my parents
Lotfi Tadj

To my parents, my wife, my children, my sister, and my brothers
Mohamed-Salah Ouali

To my husband, my children, and my grandchildren
Soumaya Yacout

To my parents, my brothers Mohamed, Abdellatif and Ahmed, my sisters Saadia and Hafida, my wife Guylaine, my son Camil, and all the friends and members of my research team
Daoud Ait-Kadi

Preface

This book is dedicated exclusively to the subject of minimal repair. It presents the state of the art and recent advances in studying the effects of minimal repair on a system’s state, and introduces a large number of mathematical models that can be used to plan minimal repair and maintenance actions efficiently.

The book contains six parts. Part I is dedicated to the mathematical modeling of minimal repair. It consists of three chapters (1, 2, and 3) that analyze in depth, respectively, the effects of minimal repair on replacement strategies, the generalization of these strategies when information concerning the system’s condition is available, and the case in which competing dependent and independent failure modes are considered. Chapter 1 is an exhaustive literature review that divides replacement models with minimal repair into two categories: age replacement and block replacement. In each category, papers are divided into three groups. The first group presents the replacement models in which a system is minimally repaired up to time T and replaced at the first failure after T. These models are called the T-policy. The second group of models is similar to the T-policy, the difference being that replacement is signaled when a prespecified number of minimal repairs is reached. Finally, in the third group, replacement is performed when either a time T or a prespecified number, n, of minimal repairs is reached. In each of these three groups, the presented models aim at optimizing an objective function. This function usually represents cost and may take into consideration production costs, inventory costs, warranty costs, inspection costs, costs of imperfect repairs, leasing costs, and outsourcing costs. All the models of this chapter are presented in a way that allows practitioners to use the results without having to go through the details of the mathematical derivation.

In Chap. 2, a repair is defined according to its relation with the number of failures or with the system’s condition. Traditionally, minimal repair has no effect on either the age or the condition of the system. This can be expressed by a failure rate or a number of failures that are not affected by the minimal repair. In this chapter, the author gives mathematical expressions of failure rates which are


functions of age and of condition, respectively. The condition is defined by the number of shocks that the system receives. He also gives a mathematical expression of the failure rate of a system composed of many components which are minimally repaired upon failure. A mathematical definition of the minimal repair process is introduced. A repair is minimal if the repair process is not a stopping time. A mathematical condition for minimal repair is also given. Finally, the author presents two optimal replacement policies under minimal repair. The first is a period replacement policy. The second is an optimal policy for replacing a system composed of many components.

In Chap. 3, minimal repair models with competing failure modes are considered. The failure modes are categorized as maintainable and non-maintainable. The rate of maintainable failure modes is improved by preventive maintenance actions, while the rate of non-maintainable failure modes is unaltered. A cost function is introduced for the cases of dependent and independent failure modes.

Part II is dedicated to preventive maintenance models and optimal scheduling of imperfect preventive maintenance activities. It consists of two chapters (4 and 5) that analyze the effect of PM actions on the parameters of the hazard rate, and discuss optimal schedules of two periodic imperfect PM policies. In Chap. 4, the author uses a novel approach to review preventive maintenance models that appear in the literature. He first differentiates, on the one hand, between maintenance strategies, namely Reliability Centered Maintenance (RCM), Total Productive Maintenance (TPM), and Risk Based Maintenance (RBM), and, on the other hand, between maintenance policies, namely Preventive Maintenance (PM) and Corrective Maintenance (CM). The author then introduces a mathematical formulation of the hazard rate with four parameters, and shows that the existing literature on PM can be categorized according to the effect of PM actions on these four parameters, since these actions affect either one of these parameters or a combination of them. This leads to a change in the scale parameter or the location parameter or both, thus affecting the hazard rate or the virtual age of a system or both. The author concludes by noting that the effect of PM actions on the shape parameter has not been found in the literature. In Chap. 5, the authors compare two imperfect preventive maintenance policies, which they call local and global. The local policy has a local effect on wear-out, while the global policy restores the global wear-out. For both policies, the optimal number of preventive actions and the optimal period between these actions are calculated.

Each of Parts III, IV, and V consists of one chapter. Chapter 6 presents a new warranty servicing strategy with imperfect repair. The authors study the case of a product sold with a two-dimensional warranty defined by age and usage. The strategy is based on finding a specified region of the warranty defined by these two parameters. This involves finding three disjoint intervals before the expiration of the warranty. If the first failure occurs in the specified middle interval, it is rectified by an imperfect repair, all other failures being minimally repaired. For a given usage rate, the values of the intervals are selected such that the expected warranty servicing cost is minimized.


Chapter 7 is dedicated to mathematical models combining a burn-in procedure and general maintenance policies. Burn-in is intended to eliminate early failures. If the burn-in period is too long, it may induce unnecessary failures, while if it is too short, it will miss some early failures. Thus, the objective of mathematical modeling is to find the optimal burn-in period, combined with optimal maintenance actions of replacement and minimal repair, such that the average cost is minimized. Stochastic models for burn-in procedures in an accelerated environment, and optimal accelerated burn-in and maintenance actions with age or block replacement and failures of type I and type II, are also proposed.

Part V (Chap. 8) presents methods for parameter estimation in some minimal repair models. In this chapter it is assumed that minimal repair affects the virtual age of the system. The virtual age depends on the actual age of the system and on the degree of repair, which can be at one of four levels: perfect, imperfect, minimal and sloppy. Two parameter estimation models are introduced. The first model estimates the conditional probability distribution of the degree of the nth repair by using a Hidden Markov Model and the Expectation–Maximization algorithm. In the second model, the transition probabilities of a Markov chain are estimated. The M-ary detection procedure is used in Electrical Engineering to describe sequential hypothesis testing for M hypotheses, and it is applied in this chapter to find the hypothesis that best represents the sequential states of a system subjected to minimal repair.

Chapter 9 is dedicated to the subject of product support. This means the design of all elements of after-sale service, including installation, training of operators, maintenance, repair, warranty, and in particular the availability of spares. The objective is to increase the performance of after-sale service while keeping the costs at an acceptable level. The author gives special consideration to the surrounding environment, which affects the product’s performance. He emphasizes that product support is usually considered in the design phase; this is called design for supportability. He also explains that product reliability characteristics are called product dependability because the product’s availability depends on those characteristics, on maintainability and on maintenance support. The chapter discusses why the geographical location of the product is a critical factor of product support: it determines service delivery strategies, spare parts logistics and inventory management, which aim at minimizing the product support costs of ordering, holding, and transportation. A spare parts management program ensures the availability of spare parts at optimal cost by categorizing the spare parts into classes based on their importance to the production operation, their costs, and their number in the system. Since the optimal number of spare parts depends on the demand rates, which in turn depend on the product’s reliability, reliability prediction methods and spare parts provisioning methods occupy a lengthy part of this chapter. These methods include the Poisson process, the renewal process, the normal distribution, the constant interval replacement model, the age-based prevention model, the Bayesian approach and the proportional hazards model. The estimation and calculation of the required number of spare parts, taking into account their techno-economical characteristics and the operating environment, is also discussed.


In conclusion, this book is a useful reference for faculty members, researchers, and practitioners who are interested in all aspects of minimal repair and its effect on maintenance policies and strategies. It is presented in a way that seeks a middle ground between very detailed mathematical models and simple practical use of some interesting results and models found in the literature. The references are chosen so that the subject of minimal repair is covered in a complete and versatile way that was thought to be the most interesting to the reader.

Montreal, June 2010

Soumaya Yacout

Acknowledgments

We take the opportunity to thank and acknowledge all contributors to the realization of this book. The contribution of all the authors, who responded positively to our call for a chapter on minimal repair, has been invaluable in the development of the book. We thank Claire Protherough, Senior Editorial Assistant at Springer UK, for her patience and kind guidance. Many thanks to the Department of Mathematics and Industrial Engineering at École Polytechnique de Montréal for offering Lotfi Tadj the position of Associate Professor for the year 2009–2010, during which this book was produced.


Contents

Part I Mathematical Modeling of Minimal Repair

A Survey of Replacement Models with Minimal Repair
Mohamed-Salah Ouali, Lotfi Tadj, Soumaya Yacout and Daoud Ait-Kadi

Information-Based Minimal Repair Models
Terje Aven

Minimal Repair Models with Two Categories of Competing Failure Modes
Inma T. Castro

Part II Preventive Maintenance

Preventive Maintenance Models: A Review
Shaomin Wu

Optimal Schedules of Two Periodic Imperfect Preventive Maintenance Policies and Their Comparison
Dohoon Kim, Jae-Hak Lim and Ming J. Zuo

Part III Two-Dimensional Warranty

Warranty Servicing with Imperfect Repair for Products Sold with a Two-Dimensional Warranty
Bermawi P. Iskandar and Nat Jack

Part IV Burn-in

A Survey of Burn-in and Maintenance Models for Repairable Systems
Ji Hwan Cha

Part V Filtering

Filtering and M-ary Detection in a Minimal Repair Maintenance Model
Lakhdar Aggoun and Lotfi Tadj

Part VI Product Support

Efficient Product Support—Optimum and Realistic Spare Parts Forecasting
Behzad Ghodrati

Index

List of Contributors

Daoud Ait-Kadi, Department of Mechanical Engineering, Laval University, Quebec, G1K 7P4, Canada, e-mail: [email protected]

Lakhdar Aggoun, Department of Mathematics and Statistics, Sultan Qaboos University, P.O. Box 36, Al-Khod 123, Muscat, Sultanate of Oman, e-mail: [email protected]

Terje Aven, Faculty of Science and Technology, University of Stavanger, 4036, Stavanger, Norway, e-mail: [email protected]

Inma Torres Castro, Department of Mathematics, University of Extremadura, Caceres, Spain, e-mail: [email protected]

Ji Hwan Cha, Department of Statistics, Ewha Womans University, Seoul, 120750, Korea, e-mail: [email protected]

Behzad Ghodrati, Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, ON, M5S 3G8, Canada, e-mail: [email protected]

Bermawi P. Iskandar, Departemen Teknik Industri, Institut Teknologi Bandung, Jalan Ganesha 10, Bandung, 40132, Indonesia, e-mail: [email protected]

Nat Jack, Dundee Business School, University of Abertay Dundee, Dundee, DD1 1HG, UK, e-mail: [email protected]

Dohoon Kim, Graduate School, Kyonggi University Suwon, Gyenggi-do, 443–760, Korea, e-mail: [email protected]

Jae-Hak Lim, Department of Accounting, Hanbat National University, Yusonggu, Daejon, 305–719, Korea, e-mail: [email protected]

Mohamed-Salah Ouali, Département de Mathématiques et Génie Industriel (MAGI), École Polytechnique de Montréal, 2500 chemin de Polytechnique, Montréal, Québec, H3T 1J4, Canada, e-mail: [email protected]


Lotfi Tadj, Department of Finance, Information Systems and Management Science, Sobey School of Business, Saint Mary’s University, 923 Robie Street, Halifax, Nova Scotia, B3H 3C3, Canada, e-mail: [email protected]

Shaomin Wu, School of Applied Sciences, Cranfield University, Bedfordshire, MK43 0AL, UK, e-mail: [email protected]

Soumaya Yacout, Département de Mathématiques et Génie Industriel (MAGI), École Polytechnique de Montréal, 2500 chemin de Polytechnique, Montréal, Québec, H3T 1J4, Canada, e-mail: [email protected]

Ming J. Zuo, Department of Mechanical Engineering, University of Alberta, Edmonton, Alberta, T6G 2G8, Canada, e-mail: [email protected]

Part I

Mathematical Modeling of Minimal Repair

A Survey of Replacement Models with Minimal Repair Mohamed-Salah Ouali, Lotfi Tadj, Soumaya Yacout and Daoud Ait-Kadi

Abbreviations

MR minimal repair
PM preventive maintenance
CM corrective maintenance
NHPP non-homogeneous Poisson process
Cdf cumulative distribution function
Sf survival function
pdf probability density function
pmf probability mass function

M.-S. Ouali and S. Yacout
Département de Mathématiques et Génie Industriel (MAGI), École Polytechnique de Montréal, 2500 chemin de Polytechnique, Montreal, QC H3T 1J4, Canada
e-mail: [email protected]

S. Yacout
e-mail: [email protected]

L. Tadj
Sobey School of Business, Saint Mary’s University, Halifax, NS B3H 3C3, Canada
e-mail: [email protected]

L. Tadj
School of Business Administration, Dalhousie University, Halifax, NS B3H 4H6, Canada

D. Ait-Kadi
Department of Mechanical Engineering, Laval University, Quebec, G1K 7P4, Canada
e-mail: [email protected]

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_1, Springer-Verlag London Limited 2011


M.-S. Ouali et al.

1 Introduction

Asset management of industrial systems requires the implementation of various maintenance activities, mainly grouped into two categories: preventive maintenance (PM) and corrective maintenance (CM). PM activities, such as inspection and lubrication at set frequencies or the replacement of components known to deteriorate, are usually prescribed by the system’s designer. They are also proposed by the maintenance department in order to reduce the stress on wear-out components caused by an operating environment or operating conditions that are not recommended. CM activities are undertaken to repair failed components. They require a prior diagnosis of the failure in order to identify and isolate the failed component and then to replace it with a new or used one. CM should be avoided as much as possible because it is very costly in terms of highly skilled labor, duration of labor, urgent purchase of spare parts, loss of production and, above all, reduced safety of the system and its immediate environment. For several industrial systems, it is also possible to repair a failed component without replacing it. This form of CM is common practice in industry and is known as troubleshooting. From an experimental point of view, troubleshooting can be considered a minimal repair (MR), which restores the failed component to the condition it was in just before the failure occurred. Moreover, it is reasonable to consider the replacement of a failed component by a new one as a minimal repair of the entire system. Based on this main characteristic, many researchers have proposed practical replacement models with minimal repair. This survey reviews the main replacement models with minimal repair published over the last five decades.

It is customary to classify models into two general types: descriptive and prescriptive. Descriptive models describe some current real-world situation, while prescriptive models prescribe what the real-world situation should be, that is, the ‘optimal’ situation at which to aim (Gross and Harris [62]). The development of reliability theory is primarily dominated by descriptive models. Some attention has nonetheless been given to optimization. Prescriptive models are generally standard reliability models with a superimposed cost function to be optimized with respect to some parameters, such as the life cycle length of some component or system. A notion virtually always linked to prescriptive reliability models is that of MR, a topic that has been receiving increasing attention.

1.1 Terminology

Besides minimal repair, terms such as imperfect repair, overhaul, preventive maintenance, imperfect maintenance, and other repair and/or replacement policies are used by researchers. Such a number of different appellations and fundamental concepts, which sometimes sound synonymous, may be confusing. Endrenyi et al. [59] made an attempt to offer a consistent set of definitions which may be acceptable to most users. They note that no standard nomenclature exists in this field. For the novice, for completeness, and to avoid any ambiguities in the rest of this survey, we reproduce here the most relevant definitions:

• Failure: the termination of the ability of a device to perform a required function.
• Random failure: a failure whose rate of occurrence (intensity) is constant, and independent of the device’s condition.
• Deterioration (wear or wear-out): a process by which the rate of failure increases due to loss of strength, the effects of usage, environmental exposure or passage of time.
• Deterioration failure: a failure resulting from the deterioration of a device.
• Restoration: an activity which improves the condition of a device. If the device is in a failed condition, the intent of restoration is the re-establishment of a working state.
• Replacement: restoration wherein a device is removed and one in better condition is put in its place; if the device is failed, it is replaced by a working one. It is often assumed that the device so installed is new.
• Repair: restoration wherein a failed device is returned to operable condition. Note: It is common to use the term corrective maintenance for both replacement and repair.
• Minimal repair: repair of limited effort wherein the device is returned to the operable state it was in just before failure.
• Maintenance: restoration wherein an un-failed device has, from time to time, its deterioration stopped, reduced or eliminated. Note: It is common to call this concept planned maintenance or preventive maintenance. These terms are meant to contrast with corrective maintenance (see Repair).
• Scheduled maintenance: a maintenance carried out at regular intervals (rigid schedule). Note: Another term often used for this activity is preventive maintenance.
• Predictive maintenance: a maintenance carried out when it is deemed necessary, based on periodic inspections, diagnostic tests or other means of condition monitoring.
• Emergency maintenance: a predictive maintenance that must be carried out immediately, or with the shortest delay possible, after condition monitoring detects a danger of imminent failure.
• Minor maintenance: maintenance of limited effort and effect. Note: If deterioration is modeled in discrete stages and the intent of maintenance is to improve conditions by just one stage, the maintenance procedure is often called minimal.
• Overhaul: maintenance or repair requiring major effort and resulting in a significant improvement of the device’s condition. Note: Occasionally the terms maintenance-overhaul and repair-overhaul are used to indicate the distinction. In most cases, however, this is not necessary.
• Minor overhaul: an overhaul of substantial effort yet involving only a limited number of parts, whose effect is a considerable improvement of the equipment’s condition.
• Major overhaul: an overhaul of extensive effort and duration which involves most or all parts of the equipment and results, as far as possible, in the good as new condition. Note: A major overhaul usually involves complete disassembly and maintenance of all parts of the equipment, and replacement of some.

Therefore, a replacement resets the age of the system to 0 while a MR does not change the age of the system. A MR involves only that amount of work which is necessary to restore the system to its operating condition.

1.2 Brief History of Minimal Repair

The very first mention of the notion of MR is found in Morse [115], who calls it optimum repair effort. It is, however, used in a queueing rather than a reliability framework, and there is no mention whatsoever of failure rate. Considering a single machine that is subject to breakdowns, Morse uses queueing theory arguments to derive the mean duration of a repair as

\[ T_s = T_a \left[ \sqrt{g T_a / c_m} - 1 \right]^{-1}, \]

where $T_a$ is the mean running life of a machine, $g$ the income brought by the monthly output of a machine that is worked full time, and $c_m$ the unit repair cost. The objective function (monthly revenue) that he uses to reach this result is

\[ R(T_s) = \frac{g T_a}{T_a + T_s} - \frac{c_m}{T_s}. \]

In the context of reliability theory, Barlow and Hunter [14] introduce the notion of periodic replacement or overhaul with MR for any intervening failures. In this model it is assumed that the system failure rate remains unchanged by any repair of failures between the periodic replacements. That is, the system after each MR has the same failure rate as before failure: the failed system is restored to a condition which is statistically the same as its condition just prior to failure.
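Morse’s closed-form repair duration can be checked numerically against his revenue function. The following Python sketch is illustrative only: the function names and the numerical values chosen for T_a, g and c_m are assumptions made for this example and do not come from Morse [115].

```python
import math


def revenue(ts, ta, g, cm):
    """Monthly revenue R(Ts) = g*Ta/(Ta + Ts) - cm/Ts."""
    return g * ta / (ta + ts) - cm / ts


def optimal_repair_time(ta, g, cm):
    """Mean repair duration Ts = Ta / (sqrt(g*Ta/cm) - 1).

    The expression is only meaningful when g*Ta/cm > 1; otherwise
    the bracket is not positive and no interior optimum exists.
    """
    ratio = math.sqrt(g * ta / cm)
    if ratio <= 1.0:
        raise ValueError("need g*Ta/cm > 1 for a positive repair time")
    return ta / (ratio - 1.0)


if __name__ == "__main__":
    # Hypothetical values, chosen only so that the arithmetic is easy to follow.
    ta, g, cm = 4.0, 100.0, 1.0
    ts_star = optimal_repair_time(ta, g, cm)
    print("optimal mean repair time:", ts_star)
    # Revenue is lower on either side of the stationary point.
    for factor in (0.9, 1.0, 1.1):
        print(factor, revenue(factor * ts_star, ta, g, cm))
```

Setting the derivative of R(Ts) to zero gives (Ta + Ts)/Ts = sqrt(g Ta / c_m), which rearranges to the expression used in `optimal_repair_time`.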

1.3 Definitions of Minimal Repair

We are dealing in this survey with the ‘physical’ and not the ‘statistical’ MR of a system or component. For a comparison of these two notions, see for example Arjas [5].


A definition of MR used by Sheu [155] and then by others, e.g., Bagai and Jain [8] and Bae and Lee [7], uses the Sf as follows:

Definition 1 If $F$ is the lifetime distribution of a device, the distribution of the failure time $X$ following a perfect repair is always $F$; but the failure-time distribution following a MR performed at age $s$ is given by

\[ P\{X > s + t \mid X > s\} = \bar{F}(t \mid s) = \frac{\bar{F}(s + t)}{\bar{F}(s)}, \]

where $\bar{F}(\cdot) \equiv 1 - F(\cdot)$.

This definition is written in a more formal way by Nakagawa and Kowada [126] as follows: suppose the system begins to operate at time 0 and that the time for repair is negligible. Let $Y_0, Y_1, Y_2, \ldots$ ($Y_0 = 0$) denote the system failure times and let $X_n = Y_n - Y_{n-1}$ ($n = 1, 2, \ldots$) denote the times between failures.

Definition 2 Let $F(t) \equiv P\{X_n \le t\}$ for $t \ge 0$. The system undergoes MR at failures if and only if

\[ P\{X_n \le x \mid X_1 + X_2 + \cdots + X_{n-1} = t\} = \frac{F(t + x) - F(t)}{\bar{F}(t)}, \qquad (n = 2, 3, \ldots), \tag{1} \]

for $x > 0$, $t \ge 0$ such that $F(t) < 1$.

Nakagawa and Kowada [126] derived an expression for the Cdf of the number of failures when the unit is always subjected to a MR after each failure. Their derivation is based on the Cdf of the time between failures. Murthy [116] derives an expression for the Cdf of the number of failures over an interval, using a simpler, more direct, conditional probability approach. Baxter [17], Gupta and Kirmani [64], and Kochar [92] have established monotonicity properties of interoccurrence times in the sense of the usual stochastic order. Kirmani and Gupta [91] and Yue and Cao [193] have established similar results for occurrence times.
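The conditional survival function in Definition 1 translates directly into a way of sampling successive failure times under MR. The sketch below is a minimal illustration under stated assumptions: a Weibull lifetime with shape `beta` and scale `eta` is assumed (the definitions above hold for any F), repair times are negligible, and all function names are hypothetical.

```python
import math
import random


def weibull_sf(t, beta, eta):
    """Survival function of the assumed Weibull lifetime: exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def residual_life_after_mr(s, u, beta, eta):
    """Solve sf(s + t) / sf(s) = u for t, i.e. invert Definition 1 at age s."""
    return eta * ((s / eta) ** beta - math.log(u)) ** (1.0 / beta) - s


def simulate_failures(horizon, beta, eta, rng):
    """Failure times in [0, horizon] when every failure is minimally repaired."""
    age, times = 0.0, []
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
        age += residual_life_after_mr(age, u, beta, eta)
        if age > horizon:
            return times
        times.append(age)  # MR: the age is not reset after the failure


if __name__ == "__main__":
    rng = random.Random(2011)
    beta, eta, horizon = 2.0, 1.0, 2.0
    runs = [len(simulate_failures(horizon, beta, eta, rng)) for _ in range(5000)]
    print("average number of failures:", sum(runs) / len(runs))
    print("cumulative hazard (T/eta)**beta:", (horizon / eta) ** beta)
```

Because the age is carried over from one failure to the next, the simulated failure counts should average close to the cumulative hazard over the horizon, which is the behavior expected of a process with the same failure rate before and after each repair.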

1.4 Mathematical Modeling of Minimal Repair

Successive system failures can be modeled using a point process formulation, and it is the type of corrective maintenance action performed at each failure that determines the type of this point process. For example,

• if the repaired item is returned to a ‘good as new’ state (perfect repair) and repair time is negligible, the sequence of failure times comprises a renewal process;
• if the repaired item is returned to a ‘good as new’ state (perfect repair), repair times are not negligible, and the sequence of random variables consisting of an up time and the subsequent repair time is independent and identically distributed, then an alternating renewal process results;
• if the repaired item is returned to a ‘good as old’ state (MR) and repair time is negligible, the sequence of failure times comprises a NHPP. This sequence may comprise a semi-renewal process;
• if the repaired item is returned to a ‘good as old’ state (MR) and repair times are not negligible, an alternative approach is to use some kind of monotone process.

If the items are stochastically deteriorating, the successive operating periods after repairs will become shorter and shorter, whereas the mean lengths of the repair periods will be increasing. As a first order approximation, Lam [95] studied the geometric process replacement model in which the successive operating periods $\{X_n, n = 1, 2, \ldots\}$ of an item form a non-increasing geometric process and the consecutive repair periods $\{Y_n, n = 1, 2, \ldots\}$ constitute a nondecreasing geometric process. Other processes have been considered in the case of other maintenance actions. The interested reader is referred to the review of Lindqvist [104] for more details.
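The shrinking operating periods and growing repair periods of the geometric process model can be made concrete with a small calculation. The sketch below assumes the usual ratio parametrization of a geometric process, in which the n-th period has mean equal to the first mean divided by ratio**(n-1); this parametrization, the function names and the numbers are assumptions for illustration and are not taken from the text above.

```python
def expected_periods(first_mean, ratio, n):
    """Expected lengths of the first n periods of a geometric process.

    Assumed parametrization: E[X_k] = first_mean / ratio**(k - 1).
    ratio >= 1 gives non-increasing means (operating periods),
    0 < ratio <= 1 gives nondecreasing means (repair periods).
    """
    if first_mean <= 0 or ratio <= 0 or n < 0:
        raise ValueError("first_mean and ratio must be positive, n non-negative")
    return [first_mean / ratio ** k for k in range(n)]


def cycle_availability(up_means, down_means):
    """Fraction of each up/down cycle spent operating, using mean durations."""
    return [u / (u + d) for u, d in zip(up_means, down_means)]


if __name__ == "__main__":
    # Hypothetical values: operating periods halve, repair periods double.
    up = expected_periods(10.0, 2.0, 4)
    down = expected_periods(1.0, 0.5, 4)
    print("expected operating periods:", up)
    print("expected repair periods:   ", down)
    print("per-cycle availability:    ", cycle_availability(up, down))
```

The per-cycle availability falls with each repair, which is the qualitative reason a replacement eventually becomes preferable to a further repair in such models.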

1.5 Two-Dimensional Failure Modeling with Minimal Repair

Baik et al. [11, 12] discuss two-dimensional failure modeling for a system where degradation is due to age and usage. They extend the concept of MR from the one-dimensional case to the two-dimensional case and characterize the failures over a two-dimensional region under MR. An application of this important result to a manufacturer’s servicing costs for a two-dimensional warranty policy is given, and they compare the MR strategy with the strategy of replacement upon failure.

1.6 Prescriptive Modeling

The goal of prescriptive modeling is to help the decision maker rather than to explain an observed behavior. In the context of replacement models with MR, most of the models aim at determining the optimal replacement time. In order to achieve that goal, the renewal reward theorem (see, e.g., Ross [147], p. 52) is used as follows. Let $X_i$ denote the length of the $i$th successive replacement cycle for $i = 1, 2, \ldots$. Let $R_i$ denote the operational cost over the renewal interval $X_i$. Then $\{(X_i, R_i)\}$ constitutes a renewal reward process. If $C(t)$ denotes the expected cost of operating the unit over the time interval $[0, t]$, then it is well known that

\[ \lim_{t \to \infty} \frac{C(t)}{t} = \frac{E[R_1]}{E[X_1]}. \tag{2} \]

For the infinite horizon case, we want to find optimal values of the parameters which minimize the total expected long-run cost per unit time given by (2). As can be seen, of utmost importance is the determination of the expected length of the


system life cycle. We will show in the survey its various formulas under the different replacement strategies and models. As for the expected operational cost over the first interval, it is generally written as

$$E[R_1] = c_1 + c_2 \times (\text{expected number of minimal repairs}), \tag{3}$$

where $c_1$ is the replacement cost and $c_2$ is the MR cost.

Another point of view is considered by Chen and Jin [36], who argue that traditional PM policies are all studied on the basis of the expected cost criterion, without considering the management risk due to cost variability, which could lead to a crisis. To address this problem, they propose to consider the effects of both cost expectation and cost variability on the optimal maintenance policy. The long-run variance of the cost is defined to represent the maintenance management risk, and the objective function is revised accordingly to obtain a maintenance policy that is sensitive to the variability of the cost. A discrete time scale is considered. Let $t = 1, 2, \ldots$ denote the discrete time units, and let $C_t(p)$ denote the cost spent at time unit $t$ under maintenance policy $p$. The long-run variance of the cost under maintenance policy $p$ is defined as

$$V(p) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left[ C_t(p) - \phi(p) \right]^2,$$

where $T$ is the horizon of maintenance and

$$\phi(p) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} C_t(p)$$

is the long-run average cost per unit time under policy $p$. The variability-sensitive optimization problem is formulated as

$$\min_{p \in K} \left[ (\phi(p))^2 + k V(p) \right], \quad k \in [0, \infty),$$

where $K$ is the class of maintenance policies considered, and $k$ is the cost-variability-sensitive factor.
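As a rough illustration of the variability-sensitive criterion, the sketch below replaces the two limits by finite-horizon averages over an observed cost sequence. The function names and the two cost streams are hypothetical and are not taken from Chen and Jin [36]; the point is only that two policies with the same average cost can be ranked differently once the factor $k$ is positive.

```python
from statistics import fmean


def average_cost(costs):
    """Finite-horizon estimate of phi(p): mean cost per time unit."""
    return fmean(costs)


def cost_variance(costs):
    """Finite-horizon estimate of V(p): mean squared deviation from phi(p)."""
    phi = average_cost(costs)
    return fmean([(c - phi) ** 2 for c in costs])


def objective(costs, k):
    """Variability-sensitive objective phi(p)^2 + k * V(p), with k >= 0."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return average_cost(costs) ** 2 + k * cost_variance(costs)


# Two hypothetical cost streams with the same average but different spread.
steady = [3, 3, 3, 3, 3, 3]    # small, regular maintenance cost
bursty = [0, 0, 0, 0, 0, 18]   # rare but expensive intervention

for k in (0.0, 1.0):
    print(k, objective(steady, k), objective(bursty, k))
```

With $k = 0$ the two streams are indistinguishable, since only the average cost enters; any positive $k$ penalizes the bursty stream.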

1.7 Applications of Minimal Repair

MR means that the system is brought to the condition it had immediately before the failure occurred, i.e., the age of the system is not changed as a result of the repair. The purpose of the repair is to get the system back in operation as soon as possible. For many real-world systems, a MR is enough to get the system operational again. For example, if a television set stops functioning because of the failure of an integrated-circuit panel, the MR to replace the panel is sufficient to fix the set. Also, if the water pump fails on a car, the MR consists only of installing a new water pump.


However, real-world applications of MR are not as frequent in the literature as one would expect. In fact, only a limited number of application papers came to our attention. We review them briefly below.

Das and Acharaya [51] present optimal block replacement policies for the preventive replacement of induced draft fan systems in India. Induced draft fans are used to handle flue gas and fly ash in thermal power plants. They derive closed-form expressions whose solutions give optimum PM intervals.

Kumar [93] illustrates some situations in which the proportional hazards model (PHM) and its extensions can be used to identify the most important covariates influencing electric load-haul-dump machines used for the transport of fragmented ore from a production face to an ore-pass at the LKAB Kiruna Iron Ore Mine, Kiruna, Sweden.

Lindqvist [104] reports studies related to the oil and gas installations on the Norwegian continental shelf, where analyses of operation data are used, e.g., for descriptions and comparisons of equipment, optimization of maintenance procedures, and control and improvement of safety and cost-effectiveness.

Stillman [185] considers repairable-system and life-data methods for assessing PM of power distribution systems which contain a multiplicity of feeder lines. The research associated with this work is related to a very large, widespread rural-provincial system and a portion of an urban network in Australia.

Gasmi et al. [61], see also Kahle [84], develop estimation procedures to obtain an operating/repair effects model consistent with data obtained from hydro-electric turbines. Operating data from one specific turbine of the British Columbia Hydro-Electric Power Generation System were used to test these procedures.

Siqueira [183] presents an optimization model whose solution yields the best maintenance frequency for reliability-centered maintenance activities and reports tests conducted on the model by Companhia Hidro Eléctrica do Sao Francisco (CHESF), the largest electric utility company in Brazil.

Caballero [23] describes how the Weibull point process is applied to repairable systems and reports applications to the Cuban sugar industry.

Lugtigheid et al. [105] describe in great detail a case study for an original equipment manufacturer, Materials and Manufacturing Ontario (MMO) of Canada, operating a fleet of mobile machines under a full maintenance and repair contract (MARC).

In order to describe the various mathematical models used in reliability theory in conjunction with MR, we recall that replacement policies are classified into age replacement policies and block replacement policies.

1.8 Replacement Policies

The oldest replacement schemes are the age replacement and block replacement policies; see Barlow and Proschan [15]. In the first, a component is replaced at a


certain age or when it fails, whichever comes first. In the second, all devices in a given class are replaced at predetermined intervals, or when they fail.
• Age replacement: a unit is always replaced at the time of failure or $T$ hours after its installation, whichever occurs first. Unless otherwise specified, $T$ is constant. When $T$ is random, the policy is called random age replacement.
• Block replacement: all units of a given type are replaced simultaneously at times $kT$ $(k = 1, 2, \ldots)$ and at failure.

The last policy is easier to administer (especially if the ages of components are not known) and may be more economical than a policy based on individual replacement. Popova and Popova [142] characterize other replacement policies by features such as time horizon, objective function, and failure time distribution. See also the recent book of Nakagawa [124].

The goal of this paper is to compile, as best as possible, an exhaustive list of all the research dealing with the mathematical modeling of the notion of MR. Of course, we may have missed some papers, and we apologize to the concerned researchers if this is the case. There are already many surveys that complement each other and, among them, provide a comprehensive explanation and bibliography of research into maintenance and replacement models, but there is no survey that is devoted specifically to the topic of MR. These surveys are McCall [109], Pierskalla and Voelker [141], Sherif and Smith [154], Thomas [187], Valdez-Flores and Feldman [189], Cho and Parlar [46], Jensen [75], Pham and Wang [140], Dekker et al. [54], and Wang [190]. We tried as much as possible in our survey to avoid any overlap with these papers, and only a few papers, mainly those of historical interest, may have been cited in the other surveys too.

One difficulty that we faced in compiling our list of models comes from the fact that researchers tend to combine different activities into a single integrated model; see for example Dekker [53], Murthy and Asgharizadeh [117], Castanier et al. [26], and Aghezzaf and Najid [1]. They try to come up with a single framework which covers several optimization models. This may have led us to some occasional repetitions in our listing. Still, we believe that this is not too high a price to pay, since we are able to offer some kind of classification of the research on the mathematical modeling of MR.

Section 2 deals with MR in the context of the age replacement policy, while Sect. 3 describes the various features considered by researchers in conjunction with MR in the context of the block replacement policy. We have further divided each section into three subsections dealing with the T-policy, N-policy, and (N, T)-policy, respectively, where $T$ is the time to replacement and $N$ is the number of repairs before replacement. Each subsection surveys the various features considered in conjunction with MR by researchers. Within each feature, papers are listed, as much as possible, in the order of their appearance. We note that the literature on the age replacement policy is far more abundant than the literature on the block replacement policy. Section 4 concludes the survey.


2 Age Replacement

The decision variable that triggers a replacement can be either continuous or discrete. When it is continuous, generally a time $T$, the replacement policy is sometimes referred to as a T-policy. When it is discrete, generally a predetermined number $N$ of intervening MRs allowed, we will call the replacement policy an N-policy. A combination of both policies, in which a replacement takes place when a replacement time $T$ or a replacement number $N$ is reached, whichever occurs first, will be called an (N, T)-policy.
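The three triggers can be written as a single decision rule. The following is only a minimal sketch: the function name and arguments are hypothetical, and it assumes that the discrete trigger fires once the count of minimal repairs has reached $N$, a convention that varies between the papers surveyed below.

```python
def replacement_due(age, repairs_done, T=None, N=None):
    """Return True when the policy calls for a replacement.

    age          -- time elapsed since the last replacement
    repairs_done -- minimal repairs carried out since the last replacement
    T            -- replacement age (None for a pure N-policy)
    N            -- allowed number of minimal repairs (None for a pure T-policy)
    """
    if T is None and N is None:
        raise ValueError("at least one of T and N must be given")
    age_trigger = T is not None and age >= T
    count_trigger = N is not None and repairs_done >= N
    # (N, T)-policy: whichever trigger is reached first.
    return age_trigger or count_trigger


print(replacement_due(5.0, 0, T=5.0))          # T-policy
print(replacement_due(4.0, 2, N=3))            # N-policy
print(replacement_due(4.0, 3, T=10.0, N=3))    # (N, T)-policy
```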

2.1 T-policy

Muth [119] is the first to consider an age replacement policy in which the system is minimally repaired up to a time $T$ and replaced at the first failure after $T$. Let $F(x)$ and $f(x)$ denote the Cdf and pdf of the time to failure $X$. Since failures under MR occur according to a NHPP with rate $h(x) = f(x)/[1 - F(x)]$, equal to the hazard rate of $X$, the mean number of failures in $[0, T]$ is given by

$$H(T) = \int_0^T h(x)\,dx.$$

When the system has age $T$, the expected time remaining to the next failure is $r(T)$, the mean residual life function, defined by

$$r(T) = E[X - T \mid X > T] = \frac{1}{1 - F(T)} \int_T^\infty [1 - F(z)]\,dz.$$

Therefore, the expected duration of a cycle is given by $E[\text{cycle}] = T + r(T)$. Muth shows that the expected cost per unit of time over an infinite horizon is given by

$$C(T) = \frac{c_o + c_r H(T)}{T + r(T)},$$

where $c_o$ and $c_r$ represent the cost of a replacement and the cost of a MR, respectively, and investigates the conditions under which an optimal solution $T^*$ exists.

Yun [194] generalizes the model of Muth [119] to the case when the MR cost $c_1(t)$ at time $t$ is no longer constant but increases with system age. The duration of the


replacement cycle is still $T + r(T)$, as in Muth's model, but the mean cost rate becomes

$$C(T) = \frac{c_0 + \int_0^T c_1(t) h(t)\,dt}{T + r(T)},$$

where $c_0$ is the replacement cost, and $h(t)$ and $r(t)$ are the hazard rate and the mean residual life function, respectively.

Butani [22] generalizes the model of Muth [119] to the case when the system is subject, with some probability $p$, to a major breakdown before the age $T$. He assumes that the system is replaced on a major failure before age $T$ or on any failure after age $T$. He shows that the expected cycle length in this case is given by

$$E[\text{cycle}] = \int_0^T \bar{G}(t)\,dt + \bar{G}(T)\, r(T),$$

where $r(t)$ is the mean residual life function, $\bar{G}(t) = 1 - G(t) = e^{-pQ(t)}$, and $Q(t)$ is the cumulative hazard rate of the system. The expected cost rate is then given by

$$C(T) = \frac{c_0 + c_1 \left[ \int_0^T \bar{G}(t)\,dQ(t) - G(T) \right]}{\int_0^T \bar{G}(t)\,dt + \bar{G}(T)\, r(T)},$$

where $c_0$ is the replacement cost and $c_1$ is the mean value of the repair cost.

Sheu [157] tries to generalize the previous models in two ways. First, he assumes that if an operating unit fails at age $y < T$, it is either replaced by a new unit with probability $p(y)$ at a cost $c_0$, or it undergoes MR with probability $q(y) = 1 - p(y)$. Otherwise, a unit is replaced when it fails for the first time after age $T$. Second, he takes general random repair costs: the cost $g(C(y), c_i(y))$ of the $i$th MR of a unit at age $y$ depends on the random part $C(y)$ and the deterministic part $c_i(y)$. This paper seems to have some inaccuracies, which are corrected by Sheu and Liou [175]. They show that the expected length of the replacement cycle is in this case

$$E[\text{cycle}] = \int_0^T \bar{F}_p(y)\,dy + \bar{F}_p(T)\, U(T),$$

where $\bar{F}_p(y) = \exp\left( -\int_0^y p(x) r(x)\,dx \right)$ is the survival distribution of the time between successive unplanned replacements, $U(T) = \int_T^\infty [1 - F(y)]\,dy \,/\, [1 - F(T)]$, and $F(x)$ and $r(x)$ are the cumulative life distribution and failure rate of the item, respectively. To find the optimal $T$, Sheu and Liou derive the following formula for the long-run expected cost per unit time:

$$C(T) = \frac{c_0 + \int_0^T h(s)\, \bar{F}_p(s)\, q(s)\, r(s)\,ds}{\int_0^T \bar{F}_p(y)\,dy + \bar{F}_p(T)\, U(T)},$$

where $h(y) = E_{M(y)} E_{C(y)}\, g(C(y), c_{M(y)+1}(y))$ and $M(y)$ counts the number of MRs in $[0, y]$.
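To make the basic T-policy of Muth [119] from the start of this subsection concrete, the sketch below evaluates the cost rate, read here as $C(T) = (c_o + c_r H(T)) / (T + r(T))$, and searches a grid for its minimizer. Everything specific is an assumption made for illustration: a Weibull lifetime with $F(x) = 1 - \exp(-(x/\eta)^\beta)$, so that $H(T) = (T/\eta)^\beta$, the parameter names `beta` and `eta`, the cost values, and the truncation of the mean residual life integral.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def muth_cost_rate(T, beta, eta, c_o, c_r):
    """C(T) = (c_o + c_r * H(T)) / (T + r(T)) for a Weibull lifetime."""
    H = (T / eta) ** beta                    # mean number of MRs in [0, T]
    # r(T) = integral over z >= T of exp(H(T) - H(z)); the integral is
    # truncated where the conditional survival probability equals exp(-50).
    upper = eta * (H + 50.0) ** (1.0 / beta)
    r = simpson(lambda z: math.exp(H - (z / eta) ** beta), T, upper)
    return (c_o + c_r * H) / (T + r)


def best_T(beta, eta, c_o, c_r, T_max=5.0, step=0.01):
    """Grid search for the replacement age with the smallest cost rate."""
    grid = [i * step for i in range(1, int(round(T_max / step)) + 1)]
    return min(grid, key=lambda T: muth_cost_rate(T, beta, eta, c_o, c_r))


T_star = best_T(beta=2.0, eta=1.0, c_o=5.0, c_r=1.0)
print(T_star, muth_cost_rate(T_star, 2.0, 1.0, 5.0, 1.0))
```

A grid search is used only because it is easy to check by hand; with an increasing hazard rate the cost rate in this example first decreases and then increases, so any one-dimensional minimizer would serve equally well.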

2.1.1 Cost Limit Replacement Policy

This policy is used when the repair cost of a failed system is random. In this case, one has to be careful not to spend more than the replacement cost on a catastrophic failure. Under this policy, when a system requires repair, it is first inspected and the repair cost is estimated. Repair is then undertaken only if the estimated cost is less than a certain amount, known as the ‘repair cost limit’. However, the repair cannot return the system to ‘as new’ condition but instead returns it to the average condition for a working system of its age. Assuming a Weibull distribution of time to failure,

$$F(t) = 1 - e^{-(\lambda t)^{\beta}},$$

and a negative exponential distribution with mean $\mu$ of the estimated repair cost,

$$G(c) = 1 - e^{-c/\mu},$$

Park [132] shows that the duration of a replacement cycle is given by

$$E[\text{cycle}] = \frac{1}{\lambda}\, \Gamma\!\left( 1 + \frac{1}{\beta} \right) e^{c/(\beta\mu)},$$

while the average cost per unit time is

$$C(c) = \frac{\lambda e^{-c/(\beta\mu)}}{\Gamma(1 + \beta^{-1})} \left[ r - \mu - c + \mu e^{c/\mu} \right].$$

Here $r$ is the replacement cost. Note that the decision variable is not the time $T$ but the repair cost limit $c$. Chung [48] derives an upper bound for the optimal repair cost limit $c^*$. Then, with this upper bound, a simple and accurate algorithm to obtain $c^*$ is developed.

Arguing that the general shape of a true dynamic repair cost limit should resemble an exponentially declining curve to reflect the depreciation of assets, Park [133] extends Park [132] to the case of an exponentially declining repair cost limit

$$c(t) = r e^{-at},$$


where $t$ is time and $a$ is the depreciation rate. The resulting long-run average cost per unit time from repairs and replacement is, however, too complex, and no optimization is attempted to derive the optimal values of $r$ and $a$.

Bai and Yun [10] generalize the model of Park [132] by assuming a general time to failure with hazard rate $h(t)$ and cumulative hazard $H(t)$. The MR cost has Cdf $G(\cdot)$. Their decision variables are the minimal repair cost limit $L$ and the replacement period $T$. They formulate the mean cycle duration as

$$E[\text{cycle}] = M(T)\, U(T, L) + \int_0^T U(t, L)\,dt,$$

where $M(t)$ is the mean residual life function and $U(t, L) = e^{-H(t)\bar{G}(L)}$, with $\bar{G}(L) = 1 - G(L)$. The mean cost rate is obtained as

$$C(T, L) = \frac{c_0 + \dfrac{c_1 + E_L\, G(L)}{1 - G(L)} \left[ 1 - U(T, L) \right]}{M(T)\, U(T, L) + \int_0^T U(t, L)\,dt},$$

where $c_0$ and $c_1$ are the replacement and inspection costs, respectively, and $E_L$ is the mean value of the repair cost. This paper is generalized by Yun and Bai [195] to the case of imperfect inspection. The decision variables are again $T$ and $L$.

Butani [22] generalizes the model of Park [132] to the case when the system is subject, with some probability $p$, to a major breakdown before the age $T$. He assumes that the system is replaced on a major failure before age $T$ or on any failure after age $T$. When a non-major failure occurs (with probability $\bar{p} = 1 - p$) before age $T$, its repair cost is determined by inspection. If the repair cost does not exceed the predetermined value $L$, the system is minimally repaired; otherwise it is replaced. He shows that the expected cycle length in this case is given by

$$E[\text{cycle}] = \int_0^T e^{-H(t)}\,dt + e^{-H(T)}\, r(T),$$

where $r(t)$ is the mean residual life function, $H(t) = [1 - \bar{p} K(L)]\, Q(t)$, $Q(t)$ is the cumulative hazard rate of the system, and $K(x)$ is the Cdf of the MR cost. The expected cost rate is then given by

$$C(T) = \frac{c_0 + \dfrac{\bar{p}\, [c_1 K(L) + c_2]}{1 - \bar{p} K(L)} \left[ 1 - e^{-H(T)} \right]}{\int_0^T e^{-H(t)}\,dt + e^{-H(T)}\, r(T)},$$

where $c_0$ is the replacement cost, $c_1$ is the mean value of the repair cost, and $c_2$ is the inspection cost.

Beichelt [19] considers a system subject to two types of failures, where the failure type probabilities are allowed to depend on the system age at failure time.


Type I failures are removed by MRs; type II failures are removed by replacements. Next he identifies the failure types as follows:
1. A type I failure occurs if and only if the random repair cost $C$ is less than or equal to a given repair cost limit $c$;
2. A type II failure occurs if and only if the random repair cost $C$ exceeds the given repair cost limit $c$.

Systems with cost limit replacement policies turn out to be special cases of systems subject to two types of failures.

Chien and Chen [42] also divide system failures into two categories: a type I failure is a minor failure that can be corrected by MR, and a type II failure, which occurs with probability $p$, is a catastrophic failure in which the system is damaged beyond repair. They consider a model based on a cumulative repair cost limit. A cumulative repair cost limit policy uses information about a system's entire repair cost history to decide whether the system is repaired or replaced. They also take a random lead-time for replacement delivery into account as follows: if an ordered spare unit has not arrived when a replacement is necessary, the replacement must be postponed. In other words, if a spare unit is available, the system is replaced preventively at age $T$, or at the $j$th type I failure at which the accumulated repair cost exceeds the pre-determined limit $n$, or at the first type II failure, whichever occurs first. Once the spare is ordered, the lead-time has Cdf $L(t)$ and finite mean $\mu_L$. For this model, the expected length of a replacement cycle is given by

$$E[\text{cycle}] = \sum_{j=0}^{\infty} G^{(j)}(n) \int_0^T L(t)\, \bar{F}_p(t)\, p_j(t)\,dt + \mu_L,$$

where $G(\cdot)$ is the repair cost Cdf, $\bar{F}_p(t) = e^{-pK(t)}$ with $K(t)$ denoting the cumulative hazard rate, and $p_j(t) = [(1 - p)K(t)]^j e^{-(1 - p)K(t)} / j!$ is the probability of $j$ type I failures in $[0, t]$. The resulting expected operating cost in a replacement cycle is, however, too complex, and no optimization is attempted to derive the optimal value of $T$. A generalization of this model with optimization is presented in Chien et al. [44].
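Returning to the repair cost limit model of Park [132] described at the start of this subsection, the following sketch evaluates the cycle length and the cost rate, read here as $E[\text{cycle}] = \Gamma(1 + 1/\beta)\, e^{c/(\beta\mu)} / \lambda$ and $C(c) = (r - \mu - c + \mu e^{c/\mu}) / E[\text{cycle}]$, and locates the best limit on a grid. The function names, the parameter values, and the restriction of the search to $0 \le c \le r$ are assumptions made for illustration; the restriction is adequate when $\beta > 1$.

```python
import math


def park_cycle_length(c, lam, beta, mu):
    """E[cycle] = Gamma(1 + 1/beta) * exp(c / (beta * mu)) / lam."""
    return math.gamma(1.0 + 1.0 / beta) * math.exp(c / (beta * mu)) / lam


def park_cost_rate(c, lam, beta, mu, r):
    """Cost per unit time: expected cost of a cycle over its expected length."""
    cycle_cost = r - mu - c + mu * math.exp(c / mu)
    return cycle_cost / park_cycle_length(c, lam, beta, mu)


def best_limit(lam, beta, mu, r, step=0.01):
    """Grid search on [0, r] for the repair cost limit minimizing C(c)."""
    grid = [i * step for i in range(int(round(r / step)) + 1)]
    return min(grid, key=lambda c: park_cost_rate(c, lam, beta, mu, r))


c_star = best_limit(lam=1.0, beta=2.0, mu=1.0, r=5.0)
print(c_star, park_cost_rate(c_star, 1.0, 2.0, 1.0, 5.0))
```

Under the same reading, setting the derivative of $\log C(c)$ to zero gives $(\beta - 1)\mu\,(e^{c/\mu} - 1) + c = r$, whose left-hand side increases in $c$ when $\beta > 1$; this can be used to cross-check the grid result.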

2.1.2 Inventory Models

Some authors have studied replacement policies for systems subject to failure in the context of inventory control theory, mainly Economic Order Quantity (EOQ) and Economic Production Quantity (EPQ) type models. We first mention the EOQ models. Park and Park [135] consider the joint inventory and replacement model with MR for any failures between replacements. In this policy, $Q$ units are purchased per order. The operating unit is replaced after use during the time interval $T_i$ if the inventory level is $(i - 1)$. The entire cycle repeats after $\sum_{i=1}^{Q} T_i$. The decision


variables are the order quantity $Q$ and the replacement intervals $\{T_i\} = (T_Q, T_{Q-1}, \ldots, T_1)$. Denoting by $H(\cdot)$ the cumulative hazard function, the total cost per unit time is

$$C(Q, \{T_i\}) = \frac{c_o + c_p Q + c_f \sum_{i=1}^{Q} H(T_i) + c_h \sum_{i=1}^{Q} (i - 1) T_i}{\sum_{i=1}^{Q} T_i},$$

where $c_f$ is the expected cost of a MR, $c_p$ is the expected cost of a replacement, $c_h$ is the inventory carrying cost per item per unit time, and $c_o$ is the fixed ordering cost.

The notion of random lead-time mentioned in the previous paragraph is usually taken into account in EOQ type models and has been considered by many researchers. Sheu and Griffith [170] consider a system subject to two types of failures. At age $y$, a type I (minor) failure occurs with probability $q(y)$ and is corrected with MR. A type II (catastrophic) failure occurs with probability $p(y) = 1 - q(y)$ and is followed by unit replacement. If the random lead-time finishes before the type II failure or before the scheduled replacement of a unit, the replacement can be made immediately when the type II failure or the scheduled replacement takes place. Otherwise, the replacement must wait until the random lead-time finishes, in which case no scheduled replacement can be made because there is no unit available for replacement. The replacement policy can be summarized as follows:
1. If the ordered spare arrives before time $T$ and no type II failure occurs before time $T$, then the delivered unit is put into stock and the unit is replaced by that spare at age $T$, at a cost $g_1$ (preventive replacement).
2. If the ordered spare arrives after time $T$ and no type II failure occurs before the arrival of the ordered spare unit, then the unit is replaced by the spare as soon as the spare is delivered, at a cost $g_2$ (delayed preventive replacement).
3. If the ordered spare arrives before a type II failure which occurs before the time $T$, then the delivered unit is put into stock and the unit is replaced by the spare upon the type II failure, at a cost $g_3$ (corrective replacement).
4. If a type II failure occurs before the arrival of the ordered spare, then the unit is shut down and replaced by the spare as soon as the spare is delivered, at a cost $g_4$ (delayed corrective replacement).

Denoting by $G(x)$ the Cdf of the lead-time of a new unit for replacement and letting $\bar{G}(x) = 1 - G(x)$, the expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^T \bar{F}_p(t)\, G(t)\,dt + \int_0^\infty [1 - G(t)]\,dt,$$

where $\bar{F}_p(y) = \exp\left( -\int_0^y p(x) r(x)\,dx \right)$ is the survival function. Also, denoting by $c_h$ the cost rate for stocking a unit and by $c_s$ the cost rate resulting from system down, the average cost per unit time is


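$$C(T) = \frac{E[R_1]}{\int_0^T \bar{F}_p(t)\, G(t)\,dt + \int_0^\infty [1 - G(t)]\,dt},$$

where

$$\begin{aligned}
E[R_1] ={}& g_1 \bar{F}_p(T)\, G(T) + g_2 \int_T^\infty \bar{F}_p(t)\,dG(t) + g_3 \int_0^T G(t)\,dF_p(t) + g_4 \int_0^\infty F_p(t)\,dG(t) \\
&+ \int_0^T \bar{F}_p(t)\, G(t)\, h(t)\, q(t)\, r(t)\,dt + \int_0^\infty \int_0^t \bar{F}_p(y)\, h(y)\, q(y)\, r(y)\,dy\,dG(t) \\
&+ c_h \int_0^T \bar{F}_p(t)\, G(t)\,dt + c_s \int_0^\infty \bar{G}(t)\, F_p(t)\,dt.
\end{aligned}$$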

Here, $h(y) = E_{M(y)} E_{W(y)}\, \phi(W(y), c_{M(y)+1}(y))$, where $M(t)$ counts the number of MRs in $[0, t]$, $\phi(W(y), c_i(y))$ is the cost of the MR of the unit at age $y$, and $W(y)$ is the random repair cost at age $y$.

Jhang [78] extends the model of Sheu and Griffith [170] by assuming that units are inspected upon delivery. Each time a replacement takes place, a new unit is ordered at an order and inspection cost $c_0$, and it is inspected after its arrival. Letting $N_1$ denote the total number of successive orders for each replacement until the arrival of the first accepted unit, $N_1$ has a geometric distribution with constant probability $p_1$ that the inspected unit has no defect. The random delivery time $L_i$ of the $i$th order has Cdf $G(x)$, pdf $g(x)$, and finite mean $\mu$. The total delivery time is $L = \sum_{i=1}^{N_1} L_i$. The replacement policy is the same as in Sheu and Griffith [170], with costs now being $g_1 = N_1 c_0 + c_r$, $g_2 = N_1 c_0 + c_{r'}$, $g_3 = N_1 c_0 + c_{f'}$, and $g_4 = N_1 c_0 + c_f$, where $c_r$, $c_{r'}$, $c_{f'}$, $c_f$ are the replacement costs for the states 1, 2, 3, 4 described above in the model of Sheu and Griffith [170]. In this case, the expected duration of the replacement cycle is given by

$$E[\text{cycle}] = \sum_{n=1}^{\infty} q_1^{n-1} p_1 \int_0^T \bar{F}_p(t)\, G_n(t)\,dt + \frac{\mu}{p_1},$$

where $G_n(x)$ is the $n$-fold convolution of $G(x)$ with itself and $q_1 = 1 - p_1$. The expected cost per unit time is given by

$$C(T) = \frac{E[R_1]}{\sum_{n=1}^{\infty} q_1^{n-1} p_1 \int_0^T \bar{F}_p(t)\, G_n(t)\,dt + \dfrac{\mu}{p_1}},$$

with

$$\begin{aligned}
E[R_1] ={}& \frac{c_0}{p_1} + \sum_{n=1}^{\infty} \Bigg( c_r \bar{F}_p(T)\, G_n(T) + c_{r'} \int_T^\infty \bar{F}_p(t)\,dG_n(t) + c_{f'} \int_0^T G_n(t)\,dF_p(t) \\
&+ c_f \int_0^\infty F_p(t)\,dG_n(t) + \int_0^T \bar{F}_p(t)\, G_n(t)\, h(t)\, q(t)\, r(t)\,dt \\
&+ \int_0^\infty \int_0^t \bar{F}_p(y)\, h(y)\, q(y)\, r(y)\,dy\,dG_n(t) + c_s \int_0^T \bar{F}_p(t)\, G_n(t)\,dt \\
&+ c_d \int_0^\infty \bar{G}_n(t)\, F_p(t)\,dt \Bigg)\, q_1^{n-1} p_1,
\end{aligned}$$

where $c_d$ is the cost per unit time resulting from system down, $c_s$ is the cost per unit time for stocking a unit, $r(t)$ is the failure rate function, and $h(y)$ is defined as above in the model of Sheu and Griffith [170].
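The joint ordering and replacement cost of Park and Park [135], given at the start of this subsection, is simple enough to evaluate directly. The sketch below reads it as $C(Q, \{T_i\}) = [c_o + c_p Q + c_f \sum_i H(T_i) + c_h \sum_i (i-1) T_i] / \sum_i T_i$. The function name, the cumulative hazard $H(t) = t^2$, and all numerical values are hypothetical choices for illustration.

```python
def joint_cost_rate(intervals, cum_hazard, c_o, c_p, c_f, c_h):
    """Cost per unit time C(Q, {T_i}) for replacement intervals T_1..T_Q.

    intervals[i - 1] is T_i, the interval used while i - 1 spares are in stock.
    """
    Q = len(intervals)
    repairs = sum(cum_hazard(T) for T in intervals)      # expected number of MRs
    holding = sum((i - 1) * T for i, T in enumerate(intervals, start=1))
    return (c_o + c_p * Q + c_f * repairs + c_h * holding) / sum(intervals)


def quadratic_hazard(t):
    """Hypothetical cumulative hazard H(t) = t^2."""
    return t * t


# Order Q = 2 units and use each for one time unit.
rate = joint_cost_rate([1.0, 1.0], quadratic_hazard, c_o=2.0, c_p=3.0, c_f=1.0, c_h=0.5)
print(rate)
```

Evaluating the function over candidate vectors of intervals for each $Q$ is one direct, if crude, way to explore the trade-off between ordering, holding, and repair costs.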

2.1.3 Production Models

Dagpunar [50] examines the problem of lot-sizing in a production facility in the face of machine breakdowns. A single machine produces components at a rate $P$ and the consumer demand rate is $D$ $(< P)$. Let $c_2$ and $t_2$ denote the expected cost and the known duration of a production set-up. The set-up includes a machine maintenance which restores it to ‘as-new’ condition. Between consecutive set-ups the machine may fail, in which case it undergoes MRs, the expected duration and expected cost of each repair being $t_1$ and $c_1$, respectively. The expected time between consecutive set-ups is

$$E[\text{cycle}] = \frac{q}{D} + t_1 R(q/P),$$

where $q$ is the production lot-size and $R(t)$ is the cumulative hazard of the time to failure. The decision variable being the lot-size $q$, the expected cost rate is given by

$$C(q) = \frac{c_2 + c_1 R(q/P) + h(1 - D/P) \left\{ 0.5\, q^2 / D + t_1 P \int_0^{q/P} [R(q/P) - R(t)]\,dt \right\}}{q/D + t_1 R(q/P)}.$$

Dohi et al. [57] consider a single-product manufacturing system with production lot-size $q$ and production rate $p$ per unit time. The decision variable is $T = q/p$. They introduce a maximum number $k \ge 0$ of allowable MRs per cycle and assume that a major repair, with repair time $L$, is started when the number of failures exceeds $k$. Since the products are uniformly demanded after the machine breakdown, a shortage will occur if the major repair is needed and if the completion time of repair is longer than the time when the inventory level becomes zero even


after the $(k + 1)$st breakdown. After the major repair, or after completing production without failure for one production period, the production machine becomes as good as new. Denoting by $R(t)$ the cumulative hazard and by $F(t)$ the Cdf of the repair time $L$, the mean duration of one cycle is found to be

$$\begin{aligned}
E[\text{cycle}] ={}& \int_0^T \int_{(p-d)t/d}^{\infty} (t + s)\,dF(s)\,dG_k(t) + \int_0^T \int_0^{(p-d)t/d} \left( t + \frac{(p - d)t}{d} \right) dF(s)\,dG_k(t) \\
&+ \left( T + \frac{(p - d)T}{d} \right) \bar{G}_k(T),
\end{aligned}$$

where $G_k(t) = \sum_{i=k+1}^{\infty} P(i, t)$, $\bar{G}_k(t) = 1 - G_k(t)$, $P(i, t) = [R(t)]^i e^{-R(t)} / i!$, and $d$ is the demand rate. Also, the expected cost per unit time in the steady state is

$$C(T) = \frac{E[R_1]}{E[\text{cycle}]},$$

where

$$\begin{aligned}
E[R_1] ={}& c_p + c_i \left\{ \int_0^T \frac{(p - d)t}{2} \left( t + \frac{(p - d)t}{d} \right) dG_k(t) + \frac{(p - d)T}{2} \left( T + \frac{(p - d)T}{d} \right) \bar{G}_k(T) \right\} \\
&+ c_s\, d \int_0^T \int_{(p-d)t/d}^{\infty} \left( s - \frac{(p - d)t}{d} \right) dF(s)\,dG_k(t) + c_r \int_0^T \int_0^\infty s\,dF(s)\,dG_k(t) \\
&+ c_m \left\{ \sum_{i=0}^{k} i\, P(i, T) + \sum_{i=k+1}^{\infty} k\, P(i, T) \right\},
\end{aligned}$$

and the cost components are as follows: $c_p$ is the fixed production and PM cost per lot, $c_r$ is the major repair cost per unit time, $c_s$ is the shortage cost per unit, $c_i$ is the inventory holding cost per unit time per unit, and $c_m$ is the MR cost per failure.

Makis [106] studies a model with lost sales in which the lot size $Q$, the production rate $P$, and the demand rate $D$ are fixed and both the time to failure and the machine replacement time are generally distributed. Denote by $T_M$ the replacement time and by $F_M(t)$ the Cdf of $T_M$. The machine is replaced after completing $m$ production runs. For this model, the expected duration of a replacement cycle is given by

E½cycle ¼ E½TM þ

mQ=D Z 0

PD ðP DÞt 1 dt: ½1 FM ðtÞ mP mP


Denoting by $S$ the set-up cost, $R$ the production resumption cost after MR, $K$ the preventive replacement cost, $c(t)$ the MR cost at age $t$, $h(x)$ the inventory holding cost per unit time when $x$ units are present, $h$ the operating cost per unit time, and $L$ the cost associated with one unit of lost demand, the expected average cost per unit time is given by

$$C(Q, m) = \frac{E[R_1]}{E[\text{cycle}]},$$

where E½R1 ¼ K þ mS þ h þ

mQ=P Z

0

mQ þ LDE½TM P

P ðP DÞt ðP DÞL ðR þ cðtÞÞrðtÞ þ h ½1 FM ðtÞ D m m ðP DÞt dt; m

and $r(t)$ is the failure rate of the time to failure. Note that the decision variables of this model are the lot size $Q$ and the number $m$ of production cycles before replacement of the production system.

Sheu and Chen [164] develop an integrated model for the joint determination of both the economic production quantity and the level of PM for an imperfect production process. The effect of PM activities on the deterioration pattern of the process is modeled using the imperfect maintenance concept. In this concept, it is assumed that after performing PM, the ageing of the system is reduced in proportion to the PM level. After a period of time in production, the process may shift to an out-of-control state, either of type I or of type II. A MR will remove the type I out-of-control state. If a type II out-of-control state occurs, the production process has to stop, and then restoration work is carried out. The production rate $P$ and the demand rate $D$ are assumed to be finite. The system is inspected periodically and $h_j$ is the length of the $j$th inspection interval. For this model, it is shown that the expected inventory cycle length is given by

$$E[\text{cycle}] = \frac{P}{D} \sum_{j=1}^{m} h_j \prod_{i=1}^{j-1} (1 - \theta p_i),$$

where

$$p_j = \frac{F(y_j) - F(w_{j-1})}{\bar{F}(w_{j-1})}$$

is the conditional probability that the process shifts to the out-of-control state given that the process was in an in-control state, $m$ is the number of inspections per cycle, $\theta$ is the probability of a type II out-of-control state when the system is out of control, and $y_j$ ($w_j$) represents the actual age of the system right before (after) the $j$th PM. The expected total cost per unit time is the ratio of the expected cost over the renewal interval to the expected cycle length, where the expected cost over the renewal interval is composed of the set-up cost, inspection cost, inventory holding cost, quality-related costs, and PM cost. The decision variables are the lengths $h_1, h_2, \ldots, h_m$ of the inspection intervals, the cost $C_{pm}$ of PM, and the number $m$ of inspections.

Chen [35] extends Sheu and Chen [164] to take into account the possibilities of inspection errors and shortages. Using the same notation as Sheu and Chen [164], the expected length of a production cycle becomes

$$E[\text{cycle}] = \frac{P}{D} \sum_{j=1}^{m} h_j \prod_{i=1}^{j-1} \left[ (1 - p_i)(1 - \alpha\theta) + p_i (1 - \theta + \beta\theta) \right],$$

where $\alpha$ is the probability of exceeding the control limits given that the process is in control and $\beta$ is the probability of not exceeding the control limits given that the process is out of control. The expected cost over the renewal interval is also adjusted by including the shortage cost component and the cost of producing defective items, which results in a very complex expression.
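For the lot-sizing model of Dagpunar [50] at the start of this subsection, the cost rate can be evaluated in closed form once a cumulative hazard is chosen. The sketch below is illustrative only. It assumes a Weibull cumulative hazard $R(t) = (t/\eta)^\beta$, for which $\int_0^{\tau} [R(\tau) - R(t)]\,dt = \tau R(\tau)\, \beta / (\beta + 1)$, and it treats $h$, which the text above does not define, as an inventory holding cost per unit per unit time. All names and numerical values are hypothetical.

```python
def dagpunar_cost_rate(q, P, D, c1, t1, c2, h, beta, eta):
    """Expected cost rate C(q) under an assumed Weibull cumulative hazard."""
    tau = q / P                                   # production time of one lot
    R_tau = (tau / eta) ** beta                   # expected number of MRs per lot
    integral = tau * R_tau * beta / (beta + 1.0)  # int_0^tau [R(tau) - R(t)] dt
    holding = h * (1.0 - D / P) * (0.5 * q * q / D + t1 * P * integral)
    cycle = q / D + t1 * R_tau                    # expected time between set-ups
    return (c2 + c1 * R_tau + holding) / cycle


# Hypothetical data: with c1 = t1 = 0 the expression reduces to the classical
# EPQ cost c2 * D / q + 0.5 * h * (1 - D / P) * q.
no_failures = dagpunar_cost_rate(100.0, P=100.0, D=50.0, c1=0.0, t1=0.0,
                                 c2=100.0, h=1.0, beta=2.0, eta=5.0)
with_failures = dagpunar_cost_rate(100.0, P=100.0, D=50.0, c1=20.0, t1=0.1,
                                   c2=100.0, h=1.0, beta=2.0, eta=5.0)
print(no_failures, with_failures)
```

The reduction to the classical EPQ cost when repairs are free and instantaneous is a convenient sanity check on any implementation of this formula.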

2.1.4 Job Shop Scheduling

Production is also considered in job shop scheduling. The most basic version of job scheduling is as follows: given $n$ jobs of varying sizes, which need to be scheduled on $m$ identical machines, minimize the total length of the schedule, that is, the time at which all the jobs have finished processing. Many variations of the problem exist. Preventive maintenance planning and production scheduling are two activities that are inter-dependent but most often performed independently.

Cassady and Kutanoglu [24] consider a single machine in a manufacturing system that is required to process a set of $n$ jobs. The purpose of production scheduling for this particular problem is to choose an optimal sequence for the jobs. The machine of interest is subject to MR upon failure, and can be renewed by PM. Assuming an age-based PM policy is applied, i.e., PM is performed on the machine after $s$ time units of operation, the expected length of the replacement cycle is given by

$$E[\text{cycle}] = s + H(s)\, t_r + t_p,$$

where $t_r$ is the time required to repair the machine, $t_p$ is the time required to perform PM on the machine, and $H(s)$ is the expected number of machine failures in $s$ time units. Cassady and Kutanoglu [24] propose an integer programming model that coordinates PM planning decisions with single-machine scheduling decisions so that the total expected weighted completion time of jobs is minimized.
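The cycle length $E[\text{cycle}] = s + H(s)\, t_r + t_p$ can be explored numerically. The sketch below assumes a Weibull-type expected number of failures $H(s) = (s/\eta)^\beta$ and, purely as an illustrative criterion that is not stated in the text above, looks at the fraction of the cycle spent operating, $s / E[\text{cycle}]$. Under these assumptions and $\beta > 1$, that fraction is largest at $s = \eta\, [t_p / ((\beta - 1) t_r)]^{1/\beta}$. Function names and values are hypothetical.

```python
def cycle_length(s, beta, eta, t_r, t_p):
    """E[cycle] = s + H(s) * t_r + t_p with H(s) = (s / eta) ** beta."""
    return s + (s / eta) ** beta * t_r + t_p


def uptime_fraction(s, beta, eta, t_r, t_p):
    """Share of the cycle spent operating (illustrative criterion only)."""
    return s / cycle_length(s, beta, eta, t_r, t_p)


def best_interval(beta, eta, t_r, t_p):
    """Stationary point of the uptime fraction; needs beta > 1."""
    if beta <= 1:
        raise ValueError("an interior optimum needs an increasing failure intensity")
    return eta * (t_p / ((beta - 1.0) * t_r)) ** (1.0 / beta)


s_star = best_interval(beta=2.0, eta=100.0, t_r=5.0, t_p=2.0)
print(s_star, uptime_fraction(s_star, 2.0, 100.0, 5.0, 2.0))
```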


2.1.5 Shock Models Many PM models for systems subject to shocks have been considered. Different models have been proposed, depending on the type of damage the shock causes to the system. For example, it may have the system fail when the cumulative damage reaches some threshold level; it may have the system fail when the stress generated by the shock exceeds some critical level; it may increase the failure rate of the system by a certain amount; or it may increase the operating cost of the system. The shock could also be lethal or no-lethal. Sheu [159] considers an optimal ordering policy of a system subject to shocks. Shocks that arrive according to a NHPP. As shocks occur, a system has two types of failure. Type I failure (minor failure) is removed by a MR, whereas when type II failure (catastrophic failure) occurs a unit has to be replaced. Given a specified time T0 ; the replacement policy of the model can be summarized as follows: 1. if the type II failure occurs before T0 ; then the system is shut down and replaced by the spare as soon as the spare is delivered; 2. if the type II failure occurs between T0 and the arrival of the regular ordered spare, then the system is shut down and replaced by the spare as soon as the spare is delivered; 3. if the type II failure occurs after the arrival of the regular ordered spare, then the system is replaced by the delivered spare immediately irrespective of the state of the original system. The lead-time of an expedited (regular) order is random with pdf ke ðxÞ ðkr ðxÞÞ and finite mean ue ður Þ: Then, the expected length of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^{T_0} \bar{H}(y)\,dy + \bar{H}(T_0)\,u_r + H(T_0)\,u_e,$$

where $\bar{H}(t) = 1 - H(t)$ and

$$\bar{H}(t) = \sum_{k=0}^{\infty} \frac{e^{-K(t)}[K(t)]^k}{k!}\,\bar{P}_k.$$

Here $\bar{P}_k$ is the (known) probability that the first $k$ shocks are type I failures, while $K(t) = \int_0^t r(s)\,ds$, where $r(t)$ denotes the intensity rate of the shocks to which the system is subject at age $t$. By introducing a cost $c_e$ ($c_r$) of an expedited (regular) order, a shortage cost $c_s$, and a cost $c_k$ for the $k$th MR, Sheu [159] derives the expected cost per unit time in the long run as follows:

$$C(T_0) = \frac{E[R_1]}{E[\text{cycle}]},$$


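where

$$E[R_1] = c_e H(T_0) + c_r \bar{H}(T_0) + c_s\left(u_e H(T_0) + u_r \bar{H}(T_0) - \int_0^{\infty}\!\int_{T_0}^{T_0+x} \bar{H}(y)\,dy\,k_r(x)\,dx\right) + \int_0^{\infty} \sum_{k=0}^{\infty} c_{k+1}\bar{P}_{k+1} \int_0^{T_0+x} \frac{e^{-K(y)}[K(y)]^k}{k!}\,r(y)\,dy\,k_r(x)\,dx.$$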
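As a minimal numerical sketch of the survival function and the cycle length in Sheu's [159] model, the code below assumes (these are illustrative choices, not part of the source) that each shock is independently a type I failure with probability `q`, so that $\bar{P}_k = q^k$, and that the shock intensity follows a power law with $K(t) = (t/\eta)^\beta$. Under these assumptions the series for $\bar{H}(t)$ collapses to $e^{-(1-q)K(t)}$, which gives a convenient check.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def cumulative_intensity(t, eta=10.0, beta=2.0):
    """K(t) for an assumed power-law shock intensity."""
    return (t / eta) ** beta


def h_bar(t, q=0.8, terms=100):
    """Series for H-bar(t) with the assumed choice P-bar_k = q**k."""
    lam = cumulative_intensity(t)
    term = math.exp(-lam)        # k = 0 term of the series
    total = term
    for k in range(1, terms):
        term *= lam * q / k      # ratio of consecutive series terms
        total += term
    return total


def expected_cycle(T0, u_e, u_r):
    """E[cycle] = int_0^T0 H-bar(y) dy + H-bar(T0) * u_r + H(T0) * u_e."""
    surv = h_bar(T0)
    return simpson(h_bar, 0.0, T0) + surv * u_r + (1.0 - surv) * u_e


# The series and the closed form exp(-(1 - q) * K(t)) should agree.
print(h_bar(5.0), math.exp(-0.2 * cumulative_intensity(5.0)))
print(expected_cycle(T0=8.0, u_e=1.0, u_r=2.0))
```

The mean lead-times `u_e` and `u_r` enter only through the probability that the type II failure occurs before or after $T_0$, which is why a single evaluation of $\bar{H}(T_0)$ suffices.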

In Sheu [160], a system is subject to shocks that arrive according to a NHPP. As shocks occur, the system has two types of failure. Type I failure (minor failure) is removed by a MR, whereas type II failure (catastrophic failure) is removed by an unplanned (or unscheduled) replacement. Letting $r(t)$ denote the intensity rate at which a system of age $t$ is subject to shocks and $K(t) = \int_0^t r(s)\,ds$, the expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^{T} \bar{G}(t)\,dt,$$

where $T$ denotes the planned replacement period,

$$\bar{G}(t) = \sum_{k=0}^{\infty} \frac{e^{-K(t)}[K(t)]^k}{k!}\,\bar{P}_k,$$

and $\bar{P}_k$ is the (known) probability that the first $k$ shocks are type I failures. Also, letting $q_k = \bar{P}_k/\bar{P}_{k-1}$, the expected long-run cost per unit time is given by

$$C(T) = \frac{1}{\int_0^T \bar{G}(t)\,dt}\left\{ R_2 + (R_1 - R_2)\,[1 - \bar{G}(T)] + \int_0^T \sum_{k=0}^{\infty} \bigl(n_{k+1}(t)\,q_{k+1}\,r(t) + m_k(t)\bigr)\,\bar{P}_k\,\frac{e^{-K(t)}[K(t)]^k}{k!}\,dt \right\},$$

where $R_2$ is the cost of the planned replacement of the system at age $T$; $R_1$ is the cost of the unplanned replacement of the system at the time of type II failure; $g(c(t), c_k(t))$ is the cost of the $k$th MR at age $t$, with expected cost $n_k(t) = E_{c(t)}[g(c(t), c_k(t))]$, where $c(t)$ is the age-dependent random part and $c_k(t)$ is the deterministic part, which depends on the age and the number of the MR; and $m_k(t)$ denotes the cost per unit time of maintenance of the system at time $t \in [S_k, S_{k+1})$, where $S_k$ is the arrival time of the $k$th shock for $k = 0, 1, 2, \ldots$ with $S_0 = 0$. Sheu [160] also computes the expression of the total α-discounted cost for the age replacement maintenance policy.
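To illustrate how such a cost rate can be minimized over the planned replacement period $T$, the sketch below makes several simplifying assumptions that are not in Sheu [160]: $\bar{P}_k = q^k$ (so $q_k = q$), a constant expected MR cost `c_mr`, a constant maintenance cost rate `c_op`, and a power-law intensity. With these choices the sum over $k$ collapses and $\bar{G}(t) = e^{-(1-q)K(t)}$. The expected replacement cost is written as the planned cost $R_2$ plus the extra cost $R_1 - R_2$ weighted by the probability of a type II failure before $T$.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


ETA, BETA, Q = 10.0, 2.0, 0.7    # assumed intensity and failure-type parameters


def intensity(t):
    """Assumed power-law shock intensity r(t)."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1)


def g_bar(t):
    """Survival function of the time to the first type II failure."""
    return math.exp(-(1.0 - Q) * (t / ETA) ** BETA)


def cost_rate(T, R1=50.0, R2=10.0, c_mr=2.0, c_op=0.1):
    """Long-run cost per unit time for planned replacement at age T."""
    replacement = R2 + (R1 - R2) * (1.0 - g_bar(T))
    running = simpson(lambda t: (c_mr * Q * intensity(t) + c_op) * g_bar(t), 0.0, T)
    return (replacement + running) / simpson(g_bar, 0.0, T)


grid = [0.5 * i for i in range(2, 81)]       # T = 1.0, 1.5, ..., 40.0
best_T = min(grid, key=cost_rate)
print(best_T, cost_rate(best_T))
```

A coarse grid search is used here only to keep the example short; any one-dimensional optimizer could be substituted.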


Sheu and Chien [164, 165] generalize Sheu [159] by considering a model which incorporates the ordering policy for a spare, the storage of a spare, a random lead-time for delivering a spare, and system downtime while waiting for a spare. The system is subject to shocks that arrive according to a NHPP. The summary of the replacement policy is the same as in Sheu and Griffith [170], already described above. The expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^{T} \bar{H}(t)\,G(t)\,dt + \int_0^{\infty} \bar{G}(t)\,dt,$$

where

$$\bar{H}(t) = \sum_{k=0}^{\infty} \frac{e^{-K(t)}[K(t)]^k}{k!}\,\bar{P}_k,$$

$H(t) = 1 - \bar{H}(t)$, and $G(t)$ is the Cdf of the lead-time, with $\bar{G}(t) = 1 - G(t)$. The average cost per unit time is given by

$$C(T) = \frac{E[R_1]}{\int_0^T \bar{H}(t)\,G(t)\,dt + \int_0^{\infty} \bar{G}(t)\,dt},$$

where

$$\begin{aligned}
E[R_1] ={}& g_1 \bar{H}(T)\,G(T) + g_2 \int_T^{\infty} \bar{H}(t)\,dG(t) + g_3 \int_0^{T} G(t)\,dH(t) + g_4 \int_0^{\infty} H(t)\,dG(t) \\
&+ \int_0^{T} \sum_{i=1}^{\infty} a_i(t)\,\frac{e^{-K(t)}[K(t)]^{i-1}}{(i-1)!}\,\bar{P}_i\,G(t)\,r(t)\,dt \\
&+ \int_0^{\infty}\!\int_0^{t} \sum_{i=1}^{\infty} a_i(y)\,\frac{e^{-K(y)}[K(y)]^{i-1}}{(i-1)!}\,\bar{P}_i\,r(y)\,dy\,dG(t) \\
&+ c_h \int_0^{T} \bar{H}(t)\,G(t)\,dt + c_s \int_0^{\infty} \bar{G}(t)\,H(t)\,dt.
\end{aligned}$$

Here, $a_i(t) = E_{W(t)}[\phi(W(t), c_i(t))]$, where $\phi(W(t), c_i(t))$ is the cost of the $i$th MR of the unit at age $t$.

Chien and Sheu [43] consider an operating system subject to shocks that arrive according to a NHPP $\{N(y),\ y \ge 0\}$ with intensity $r(t)$ and mean value function $K(t) = \int_0^t r(u)\,du$. As shocks occur, the system has two types of failure: type I failure (minor) or type II failure (catastrophic). A generalization of the age replacement policy for such a system is proposed and analyzed in this study. Under such a policy, if an operating system suffers a shock and fails at age $y\ (\le t)$, it is either replaced by a new system (type II failure) or it undergoes MR (type I failure). Otherwise, the system is replaced when the first shock after $t$ arrives, or when the total operating time reaches age $T$ $(0 \le t \le T)$, whichever occurs first. The expected duration of the replacement cycle of this model is given by

$$E[\text{cycle}] = \int_0^{t} \bar{G}(y)\,dy + \bar{G}(t)\,U(t, T),$$

where

$$U(t, T) = \int_0^{T-t} \bar{F}_t(x)\,dx, \qquad F_t(x) = \frac{F(t+x) - F(t)}{\bar{F}(t)}, \qquad \bar{G}(y) = \sum_{k=0}^{\infty} \frac{e^{-K(y)}[K(y)]^k}{k!}\,\bar{P}_k,$$

with $\bar{F}_t(x) = 1 - F_t(x)$ and $G(y) = 1 - \bar{G}(y)$. Here $F(x)$ is the Cdf of the waiting time until the first shock. The long-run expected cost per unit time of this policy is given by

$$C(t, T) = \frac{c_u\,G(t) + \int_0^{t} E[\tilde{a}_{N(y)+1}(y)]\,r(y)\,dy + \bigl[c_r F_t(T-t) + c_p \bar{F}_t(T-t)\bigr]\,\bar{G}(t)}{\int_0^{t} \bar{G}(y)\,dy + \bar{G}(t)\,U(t, T)},$$

where $c_u$ is the cost of a replacement performed before age $t$; $c_r$ is the cost of a replacement performed at the first shock after $t$; $c_p$ is the cost of a replacement performed when the total operating time reaches age $T$; $\phi(c(y), c_i(y))$ is the cost of the $i$th minimal repair at age $y$, where $c(y)$ is the age-dependent random part and $c_i(y)$ is the deterministic part that depends on the age and the number of the MR; $a_i(y) = E_{c(y)}[\phi(c(y), c_i(y))]$ is the expected cost of the $i$th MR at age $y$; $\tilde{a}_i(y) = a_i(y)\,\bar{P}_i$; and $\bar{P}_k$ is the probability that the first $k$ shocks are type I failures.
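The cycle length of the generalized age replacement policy of Chien and Sheu [43] can be sketched numerically. The example assumes, for illustration only, a homogeneous Poisson shock process with constant intensity `RATE` and $\bar{P}_k = q^k$; then $\bar{G}(y) = e^{-(1-q)\,\mathrm{RATE}\,y}$, the time to the first shock is exponential, and a closed form is available for comparison. All names and parameter values are hypothetical.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def expected_cycle(t, T, g_bar, F):
    """E[cycle] = int_0^t G-bar(y) dy + G-bar(t) * U(t, T), where
    U(t, T) = int_0^{T-t} F-bar_t(x) dx and F-bar_t(x) = (1 - F(t+x)) / (1 - F(t))."""
    f_bar_t = lambda x: (1.0 - F(t + x)) / (1.0 - F(t))
    U = simpson(f_bar_t, 0.0, T - t)
    return simpson(g_bar, 0.0, t) + g_bar(t) * U


RATE, Q = 0.5, 0.9                                   # assumed shock rate and type I probability
g_bar = lambda y: math.exp(-(1.0 - Q) * RATE * y)    # survival of time to first type II failure
F = lambda x: 1.0 - math.exp(-RATE * x)              # Cdf of the waiting time to the first shock

numeric = expected_cycle(4.0, 6.0, g_bar, F)

# Closed form under the same assumptions, used only as a cross-check.
p = 1.0 - Q
closed = ((1.0 - math.exp(-p * RATE * 4.0)) / (p * RATE)
          + math.exp(-p * RATE * 4.0) * (1.0 - math.exp(-RATE * 2.0)) / RATE)
print(numeric, closed)
```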

2.1.6 Burn-In Models

Burn-in is a method used to improve the quality of products. It screens out defective units before they are shipped to customers or put into field operation. That is, before delivery to the customer, the units are tested under conditions that approximate the working conditions in field operation. Units that fail during the burn-in procedure are scrapped or repaired, and only those that survive the burn-in procedure are considered to be of good quality. These are then shipped to the customers or put into field operation. Because burn-in is usually costly, it is important to determine the appropriate length of this procedure.

Sheu and Chien [165, 166] consider a system subject to two types of failures. Type I (minor) failure occurs with probability $1 - p$ and is removed by a MR, whereas type II failure (catastrophic failure) occurs with probability $p$ and is removed only by a complete repair (replacement). If the unit fails before completion of the burn-in time $b$, only MR is done for the type I failures, with shop MR cost $c_{sm}$, and the burn-in procedure is continued for the repaired unit. A complete repair is performed for the type II failures, with shop complete repair cost $c_s$; then the time is reset to 0, and the repaired unit is burned in again, and so on.


The procedure stops when no type II failure occurs during $[0, b)$ for the first time. Let $h(b)$ be the cost incurred until the first unit survives burn-in without type II failure. If the unit survives the burn-in procedure, it is then put into field operation and is operated until the first occurrence of type II failure. A gain is obtained which is proportional to the length of the residual life. Only MR is done for each type I failure, with repair cost $c_m$, and a replacement is performed for type II failure, with cost $c_f$. Denoting by $\bar{G}(t)$ the Sf of the waiting time until the first type II failure, the cost function is given by

$$C(b) = -c_s + (c_m - c_{sm})\left(\frac{1}{p} - 1\right) + \frac{c_0 \int_0^{b} \bar{G}(t)\,dt + c_{sm}\left(\frac{1}{p} - 1\right) + c_s - K \int_b^{\infty} \bar{G}(t)\,dt}{\bar{G}(b)},$$

where $c_0$ is the cost rate for operating the burn-in procedure and $K$ is the gain proportionality constant. Note that the decision variable is the burn-in time $b$.

Sheu and Chien [165, 166] also consider a second model where, instead of the gain proportional to the mean life of the unit in field operation, the expenditure due to replacement at a catastrophic failure during field operation is taken into account. They assume that if the unit fails in field use, only MR is performed for the type I failure, with MR cost $c_m$, whereas the unit is replaced by another burned-in unit, with cost $c_f$, when type II failure first occurs. The long-run average cost becomes

$$C(b) = \frac{\left[c_f + c_m\left(\frac{1}{p} - 1\right)\right] - \left[(c_f - c_s) + (c_m - c_{sm})\left(\frac{1}{p} - 1\right)\right] G(b) + c_0 \int_0^{b} \bar{G}(t)\,dt}{\int_b^{\infty} \bar{G}(t)\,dt},$$

where $G(b) = 1 - \bar{G}(b)$.
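The long-run average cost of the second burn-in model can be explored numerically. The sketch below is only an illustration under stated assumptions: the failure rate is taken to be $r(t) = A e^{-t/\tau} + r_0$ (decreasing towards a constant level), a failure is type II with constant probability `P`, and $\bar{G}(t) = \exp(-P \int_0^t r(u)\,du)$. The cost is written in renewal-reward form (expected cost until the next type II failure in the field divided by the mean field life of a burned-in unit), with $1/p - 1$ the expected number of type I failures per type II failure. All parameter values are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


P = 0.2                          # assumed probability that a failure is type II
A, TAU, BASE = 0.5, 2.0, 0.05    # assumed failure rate r(t) = A*exp(-t/TAU) + BASE


def g_bar(t):
    """Survival function of the time to the first type II failure."""
    cum = A * TAU * (1.0 - math.exp(-t / TAU)) + BASE * t
    return math.exp(-P * cum)


def cost_rate(b, c0=1.0, cs=5.0, csm=1.0, cm=2.0, cf=20.0):
    """Long-run average cost for burn-in time b (second model)."""
    k = 1.0 / P - 1.0                        # type I failures per type II failure
    burn_in = c0 * simpson(g_bar, 0.0, b)    # operating cost of burn-in
    failed = 1.0 - g_bar(b)                  # G(b)
    numerator = (cf + cm * k) - ((cf - cs) + (cm - csm) * k) * failed + burn_in
    horizon = b + 40.0 / (P * BASE)          # survival beyond this point is negligible
    return numerator / simpson(g_bar, b, horizon, n=8000)


for b in (0.0, 1.0, 2.0, 5.0):
    print(b, round(cost_rate(b), 4))
```

Truncating the infinite integral at `horizon` is a pragmatic choice for this particular failure rate; a different tail would need a different cutoff.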

Cha et al. [31] also assume that two types of system failures may occur. Type I failure (minor failure) occurs with probability $1 - p(t)$ and can be removed by MR, and type II failure (catastrophic failure) occurs with probability $p(t)$ and can only be removed by replacement. A new system is burned in for time $b$, and it is put in field use if it survives the burn-in. In field use, the system is replaced by another system, which has survived the same burn-in time $b$, at the field use age $T$ or at the time of the first type II failure, whichever occurs first. For each type I failure occurring during field use, only MR is performed. Let $r(t)$ denote the failure rate of the system, let $K_p(t) = \int_0^t p(u)\,r(u)\,du$, and let $\bar{G}_b(t) = e^{-[K_p(b+t) - K_p(b)]}$, with $G_b(t) = 1 - \bar{G}_b(t)$. Then the length of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^{T} \bar{G}_b(t)\,dt + \left[\int_0^{T} r(b+t)\,\bar{G}_b(t)\,dt - G_b(T)\right] m_1 + G_b(T)\,m_2 + \bar{G}_b(T)\,m_3,$$

where $m_1$, $m_2$, and $m_3$ represent the mean time of a MR, the mean time of an unplanned replacement caused by a type II failure, and the mean time of a planned PM replacement at the field use age $T$, respectively. Cha et al. [31] look for the burn-in time and replacement policy maximizing the steady-state availability of a repairable system, defined as

$$A(b, T) = \frac{E[\text{total up time in a renewal cycle}]}{E[\text{length of renewal cycle}]}.$$

For this model,

$$E[\text{total up time in a renewal cycle}] = \int_0^{T} \bar{G}_b(t)\,dt.$$
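A minimal sketch of the availability criterion, assuming a constant type II probability `P` and the illustrative failure rate $r(t) = A e^{-t/\tau} + r_0$ (neither is specified in the source), with up time accumulated until the field use age $T$ or the first type II failure. With constant `P`, the expected number of MRs in field use equals $(1/P - 1)\,G_b(T)$, which serves as a consistency check.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


P = 0.2                          # assumed constant probability of a type II failure
A, TAU, BASE = 0.5, 2.0, 0.05    # assumed failure rate r(t) = A*exp(-t/TAU) + BASE


def rate(t):
    return A * math.exp(-t / TAU) + BASE


def cum_rate(t):
    return A * TAU * (1.0 - math.exp(-t / TAU)) + BASE * t


def g_bar_b(t, b):
    """Survival of the time to the first type II failure after burn-in time b."""
    return math.exp(-P * (cum_rate(b + t) - cum_rate(b)))


def expected_minimal_repairs(b, T):
    """Expected failures in field use minus the probability of a type II failure."""
    failures = simpson(lambda t: rate(b + t) * g_bar_b(t, b), 0.0, T)
    return failures - (1.0 - g_bar_b(T, b))


def availability(b, T, m1=0.1, m2=2.0, m3=0.5):
    """Steady-state availability: expected up time over expected cycle length."""
    up = simpson(lambda t: g_bar_b(t, b), 0.0, T)
    surv = g_bar_b(T, b)
    cycle = up + expected_minimal_repairs(b, T) * m1 + (1.0 - surv) * m2 + surv * m3
    return up / cycle


for b in (0.0, 2.0, 5.0):
    print(b, round(availability(b, T=10.0), 4))
```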

Cha [27] generalizes the model of Sheu and Chien [165, 166] by considering the following two burn-in procedures:

1. Burn-in procedure I: consider a fixed burn-in time $b$, and begin to burn in a new component. If the component fails before burn-in time $b$, then repair it completely regardless of the type of failure, with shop complete repair cost $c_s$, and then burn in the repaired component again (i.e., restart the burn-in procedure), repeating as necessary.
2. Burn-in procedure II: consider a fixed burn-in time $b$, and begin to burn in a new component. On each type I failure during burn-in, only MR is done, with shop MR cost $c_{sm}$, and the burn-in procedure is continued for the repaired component. If a type II failure occurs before burn-in time $b$, then a complete repair is performed, with shop complete repair cost $c_s$, and the burn-in procedure is restarted for the repaired component.

For each procedure, three different cost functions are considered. The first includes a gain due to no type II failure within the mission time. The second includes a gain proportional to the mean time to the first type II failure. The third assumes a replacement at type II failure during field operation.

In research on optimal burn-in, the assumption of a bathtub-shaped failure rate function is commonly adopted. Cha [28] considers a more general assumption on the shape of the failure rate function, which includes the bathtub-shaped failure rate function as a special case.

2.1.7 Warranted Systems

A warranty is a contractual obligation offered by the manufacturer in connection with the sale of a product. A warranted product entails a greater cost to the supplier than an identical item sold without warranty. Also, a buyer should be willing to pay more for a warranted item than for an identical item without warranty. Warranty policies can be divided into renewing warranties and non-renewing warranties. Under a renewing warranty, an item failing before the end of its warranty period is replaced by a new one (or restored to as good-as-new), and the


warranty is renewed. Under a non-renewing warranty, failed items are replaced or repaired by the manufacturer at no cost or at a partial cost to the user during the warranty period, and the original warranty is not altered.

Sahin and Polatoglu [148] consider a system where the user can modify a standard warranty arrangement by applying a maintenance procedure for a period of time following the expiration of warranty. They study both cases where maintenance is performed after either a renewing or a non-renewing warranty. For each case, two types of replacement policies following the expiration of warranty are considered: (1) the user applies MR for a fixed length of time and replaces the unit by a new one at the end of this period, and (2) the unit is replaced by the user at first failure following the MR period.

1. In the case of a renewing warranty, a cycle begins with the installation of a new item. If the item fails during its warranty period $w$, it is replaced by a new one under the same warranty and a new cycle begins. When it survives to age $w$, the cycle is extended by a maintenance period of fixed length $s$, during which the user maintains the item by MR. There is a replacement at the end of this period, paid for in full by the user. For this fixed-maintenance-period policy, the mean cycle length is

$$E[\text{cycle}] = \int_0^{w} t\,f(t)\,dt + (w + s)\,\bar{F}(w),$$

where $f(t)$ and $\bar{F}(t)$ are the pdf and Sf of the failure time. In the variable-maintenance-period policy, the maintenance period terminates at the time of first failure after $w + s$, not at $w + s$. The mean cycle length becomes

$$E[\text{cycle}] = \int_0^{w} t\,f(t)\,dt + [w + s + l(w + s)]\,\bar{F}(w),$$

where $l(t) = \int_t^{\infty} \bar{F}(u)\,du \,/\, \bar{F}(t)$ is the mean residual life at age $t$.

2. Under renewing warranty, the age of the unit surviving to the end of a warranty period is always $w$. Under non-renewing warranty, the age $y$ of the item that is in use at the end of a warranty period could be anywhere between 0 and $w$. If $y$ is known, maintenance policies are non-stationary. In this case, for a fixed-maintenance-period policy, the length of the cycle is $E[\text{cycle}] = w + s$, while for a variable-maintenance-period policy, it is $E[\text{cycle}] = w + s + l(w + s)$. If $y$ is unknown, maintenance policies are stationary. In this case, for a fixed-maintenance-period policy, the cycle length is again $E[\text{cycle}] = w + s$,


while for a variable-maintenance-period policy, the cycle length depends on whether the manufacturer replaces an item failing under warranty by a new one (replacement warranty) or the manufacturer performs MR under warranty (repair warranty), so that

$$E[\text{cycle}] = w + s + \begin{cases} \displaystyle\int_s^{\infty} \bar{G}_w(t)\,dt \,\Big/\, \bar{G}_w(s), & \text{replacement warranty}, \\[1ex] l(w + s), & \text{repair warranty}, \end{cases}$$

where $\bar{G}_w(t)$ is the survival function. For all these models, the length $w$ of the warranty period is assumed to be given while the decision variable is the length $s$ of the maintenance period.

Zuo et al. [201] consider a multi-state deteriorating and repairable product. The product may experience $N$ different working states (gradually deteriorating from state 1 to state $N$). It may fail from any working state. When an item fails during the warranty period of length $T$, the manufacturer has the option of either repairing it using MR or replacing it with a new one free of charge to the customer. Specifically, if the failed item is in failure state $i$ $(1 \le i \le N)$ and the residual warranty time is $t$ $(0 \le t < T)$, then it is replaced by a new one if and only if $K + 1 \le i \le N$ and $t \ge a$; otherwise, it is minimally repaired, where $1 \le K \le N$ and $0 < a \le T$. The manufacturer's decision of repair or replacement thus depends on two variables: the residual warranty period threshold $a$ (from the present time to expiration of warranty) and the degree $K$ of deterioration of the failed item. These variables are chosen so as to minimize the expected servicing cost during the warranty period.

Ja et al. [71] aim at estimating warranty costs during the life cycle of a product. This is important to the manufacturer, who has to plan for creating a fund for warranty reserves. Replacement or repair costs associated with product failure within the warranty period are drawn from this fund. They consider a policy where warranty is not renewed on product failure within the warranty period but the product is minimally repaired by the manufacturer. MR costs $D(t)$ are assumed to depend on the product age. The sales process is modeled by an NHPP with rate function $k(t)$, and the quantities sold are independent and identically distributed (iid), with batch sizes having the common mean $E[Y]$. Then, the total discounted warranty cost of all sales during the product life cycle $L$, where the decision variable is the warranty length $W$, is given by

$$C(W) = \int_0^{W} E[D(t)]\,r(t)\,e^{-at}\,dt \;\; E[Y] \int_0^{L} e^{-at}\,k(t)\,dt,$$

where $a$ is the continuous discounting factor and $r(t)$ is the product failure rate.

Ja et al. [72] consider a stochastic sales process and derive the first and second moments of the producer's total discounted warranty cost of a single sale for single-component items under four different kinds of warranty policies from the manufacturer's perspective: renewable free-replacement, nonrenewable free-replacement, renewable pro-rata, and nonrenewable minimal-repair warranty policies. They also compute the mean and variance of the producer's total discounted warranty cost of the aggregate sales. Furthermore, they use those quantities to derive the warranty reserve by applying a normal approximation, based on a desired coverage probability that warranty reserves do not drop below zero.

Sheu and Yu [176] consider a repair–replacement warranty strategy where a repairable item is sold with a non-renewing free replacement warranty of period $W$, which requires the manufacturer either to repair or to replace the item when it fails. The maintenance strategy is characterized by two parameters $K$ and $L$, where $0 < K < L < W$, and is defined in the following way:

1. All item failures that occur in the interval $(0, K)$ are rectified by MRs.
2. At the first failure in the interval $[K, L]$, the failed product is replaced with a new one, and any subsequent failures in this interval are minimally repaired.
3. Any failure during the period $[L, W]$ is always minimally repaired.

The cost structure comprises the cost $c_r$ to replace an item and the cost $g(c(y), c_i(y))$ of the $i$th MR at age $y$, where $c(y)$ is an age-dependent random part and $c_i(y)$ is a deterministic part depending on the age and the number of MRs. The expected cost of servicing the warranty over the warranty period, in which the decision variables are $K$ and $L$, is given by

$$J(K, L) = R_h(W) + \frac{1}{\bar{F}(K)}\left\{ \bar{F}(K)\,[h(K) - G(K)] - \bar{F}(L)\,[h(L) - G(L)] - \int_K^{L} \bar{F}(y)\,h(W - y)\,r(W - y)\,dy \right\},$$

where $R_h(t) = \int_0^t h(z)\,r(z)\,dz$, $G(t) = h(t) - c_r + R_h(W) - R_h(t) - R_h(W - t)$, $h(z) = E_{N(z)} E_{c(z)}\bigl[g(c(z), c_{N(z)+1}(z))\bigr]$, $N(t)$ is the NHPP with intensity $r(t)$ representing the number of MRs, and $\bar{F}(t) = 1 - F(t)$, where $F(t)$ is the Cdf of the item lifetime.

Jhang [77] considers two-phase warranty models for repairable products. The first phase $[0, W]$ is the warranty period and the second phase $(W, T + W)$ is the buyer survival period. The products have two types of failures. Type I failures (minor failures) occur with probabilities $q(z)$ in the first phase and $\tilde{q}(z)$ in the second phase, and are removed by MRs in both phases. Type II failures (catastrophic failures) occur with probability $p(z) = 1 - q(z)$ in the first phase and are removed by replacements in the first phase. They take place with probability $\tilde{p}(z) = 1 - \tilde{q}(z)$ in the second phase, in which case the life of the product is supposed to end. A new product is purchased at time $T + W$ or upon the type II failure. Whenever a replacement takes place, the spare unit is ordered and then delivered; therefore, the lead-time $L$ is considered. Letting $r(t)$ denote the failure rate during the first phase, $\bar{F}_p(y) = e^{-\int_0^y p(x)\,r(x)\,dx}$, and using similar notation augmented with a tilde in the second phase, the mean cycle length is given by

$$E[\text{cycle}] = W + \int_0^{T} \bar{\tilde{F}}_{\tilde{p}}(t)\,dt.$$

Also, with $V(t)$ being the renewal function associated with $F_p(t)$ with $L$ displacements, the expected cost to society per unit time is

$$C(T) = \frac{N(T)}{W + \int_0^{T} \bar{\tilde{F}}_{\tilde{p}}(t)\,dt},$$

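where

$$\begin{aligned}
N(T) ={}& c_r\left[F_p(W) + \int_0^{W} V(W - y)\,dF_p(y)\right] \\
&+ c_d\left[\int_{W-L}^{W} F_p(y)\,dy + \int_0^{W} V(W - y)\,F_p(y)\,dy - \int_0^{W} V(W - y - L)\,F_p(y)\,dy\right] \\
&+ \int_0^{W} \bigl(1 + V(W - y)\bigr)\,\bar{F}_p(y)\,h(y)\,q(y)\,r(y)\,dy + \int_0^{T} \bar{\tilde{F}}_{\tilde{p}}(z)\,\tilde{h}(z)\,\tilde{q}(z)\,\tilde{r}(z)\,dz,
\end{aligned}$$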

where $c_r$ is the replacement cost and $c_d$ is the total downtime cost. Here, the cost of the $i$th MR at time $S_i$ is $g(c_0(S_i), c_i(S_i))$, where $c_0(S_i)$ is the age-dependent random part and $c_i(S_i)$ is the deterministic part, which depends on the age and the number of the MR, and $h(y) = E_{M(y)} E_{c_0(y)}\bigl[g(c_0(y), c_{M(y)+1}(y))\bigr]$, where $M(y)$ counts the number of MRs in $[0, y]$.

Chien [37] considers a repairable product sold under a failure-free renewing warranty agreement, subject to two types of failure. Type I failure (a minor failure) occurs with probability $q(t)$ and can be rectified by MRs. Type II failure (a catastrophic failure) occurs with probability $p(t) = 1 - q(t)$ and can be rectified only by replacement. Out of warranty, a product is completely replaced whenever it reaches the use time $T$, i.e. when its total use age reaches $W + T$, where $W$ is the warranty period. Denoting by $r(t)$ the failure rate of the product, the expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^{W+T} \bar{G}(u)\,du,$$

where

$$\bar{G}(t) = e^{-\int_0^t p(u)\,r(u)\,du}.$$

It is assumed that the consumer has purchased a product sold under a failure-free renewing warranty. Within the warranty period $W$, the manufacturer must keep the product free of failure. Although the maintenance is free, the consumer will experience inconvenience or loss incurred by the product failure. Let $c_{s1}$ and $c_{s2}$ be the costs incurred by the consumer resulting from type I and type II failures, respectively. Out of warranty, all the repair and replacement costs due to product failure are incurred by the consumer. A preventive out-of-warranty replacement policy is considered, in which MRs or replacement take place according to the following scheme. Out of warranty, a product is completely replaced whenever it reaches the use time $T$ (i.e. total use age $W + T$) at a cost $c_{r1}$ (planned replacement). If the product fails at time of use $y \in (0, T)$, then it is either replaced, with probability $p(W + y)$ (type II failure), at a cost $c_{r2}$ (unplanned replacement), or it undergoes MR, with probability $q(W + y) = 1 - p(W + y)$ (type I failure), at a cost $c_m$. After a complete out-of-warranty replacement (i.e. planned or unplanned), the procedure is repeated. The decision variables being the warranty period and the out-of-warranty replacement age from the perspective of the buyer, the expected total cost per unit time over the life cycle is given by

$$C(T, W) = \frac{E[R_1]}{\int_0^{W+T} \bar{G}(u)\,du},$$

where, with $G(t) = 1 - \bar{G}(t)$,

$$E[R_1] = c_{s1} \int_0^{W} q(u)\,r(u)\,\bar{G}(u)\,du + c_{s2}\,G(W) + (c_{s1} + c_m) \int_W^{W+T} q(u)\,r(u)\,\bar{G}(u)\,du + (c_{r2} + c_{s2})\,[\bar{G}(W) - \bar{G}(W + T)] + c_{r1}\,\bar{G}(W + T).$$

Sheu and Chien [167] consider a system subject to two types of failures. Type I failure occurs with probability $q = 1 - p$, is minor, and can be corrected by MR. Type II failure occurs with probability $p$, is catastrophic, and can be removed by replacement only. The product is sold under warranty, and burn-in is required before the product is put on sale. Denoting by $\bar{F}(t)$ the Sf of the failure time and $\bar{G}(t) = \bar{F}(t)^p$, the expected total manufacturing cost per unit for products with burn-in time $s$ is given by

$$m(s) = \frac{c_0 + c_1 + c_2 \int_0^{s} \bar{G}(u)\,du + \frac{q c_3}{p}\,G(s)}{\bar{G}(s)}, \qquad G(s) = 1 - \bar{G}(s),$$


where $c_0$ is the manufacturing cost per unit without burn-in, $c_1$ is the fixed setup cost of burn-in per unit, $c_2$ is the cost per unit time of burn-in per unit, and $c_3$ is the minimal repair cost per type I failure during burn-in. The first warranty policy considered is the failure-free policy, in which the manufacturer is responsible for all the repair and replacement costs during the warranty period $[0, T]$. Two cases are considered under this policy, namely renewing and non-renewing. In the case of the failure-free policy with renewing, the expected total cost (manufacturing plus warranty costs) per unit sold is given by

$$C(s) = \left[\frac{q(c_3 + c_4)}{p} + c_4 + m(s)\right] \frac{\bar{G}(s) - \bar{G}(s + T)}{\bar{G}(s + T)},$$

where $c_4$ is the extra cost incurred when a failure occurs during the warranty period, regardless of the failure type. In the case of the failure-free policy without renewing, the expected total cost per unit sold is given by

$$C(s) = \left[\frac{q(c_3 + c_4)}{p} + c_4 + m(s)\right] V_s(T),$$

where $V_s(T)$ is the expected number of replacements in $[0, T]$. The second warranty policy considered is the rebate policy, in which all type I failures in the warranty period $[0, T]$ are rectified (through MR actions) by the manufacturer free of cost, and the buyer is refunded a proportion of the sales price $c_p$ when the type II failure occurs for the first time. The amount of rebate, $R(t)$, is a function of the type II failure time $t$ and is assumed to be a linear function of $t$:

$$R(t) = \begin{cases} k c_p \left(1 - \dfrac{a t}{T}\right), & 0 \le t \le T, \\[1ex] 0, & t > T. \end{cases}$$

The expected total cost per unit sold is given by

$$C(s) = \frac{1}{T\,\bar{G}(s)}\left\{ \frac{q(c_3 + c_4)}{p}\,T\,[\bar{G}(s) - \bar{G}(s + T)] + k c_p \left[ T\,\bar{G}(s) - (1 - a)\,T\,\bar{G}(s + T) - a \int_0^{T} \bar{G}(s + t)\,dt \right] \right\}.$$
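The manufacturing cost with burn-in and the expected cost of the rebate policy can be checked numerically. The sketch below assumes a Weibull lifetime, so that $\bar{G}(t) = \bar{F}(t)^p = \exp(-p\,(t/\eta)^\beta)$; the parameter values and function names are hypothetical. The rebate part is cross-checked against a direct integration of $R(t)$ against the density of the first type II failure time of a unit that survived burn-in.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


P, ETA, BETA = 0.3, 50.0, 0.7    # assumed type II probability and Weibull parameters


def g_bar(t):
    """G-bar(t) = F-bar(t)**P for a Weibull lifetime."""
    return math.exp(-P * (t / ETA) ** BETA)


def manufacturing_cost(s, c0, c1, c2, c3):
    """m(s): expected manufacturing cost per unit that survives burn-in time s."""
    q = 1.0 - P
    numerator = c0 + c1 + c2 * simpson(g_bar, 0.0, s) + q * c3 / P * (1.0 - g_bar(s))
    return numerator / g_bar(s)


def rebate_policy_cost(s, T, c3, c4, k, cp, a):
    """Expected warranty cost per unit sold under the linear rebate policy."""
    q = 1.0 - P
    gs, gsT = g_bar(s), g_bar(s + T)
    repairs = q * (c3 + c4) / P * (gs - gsT)
    rebate = k * cp * (gs - (1.0 - a) * gsT
                       - a / T * simpson(lambda t: g_bar(s + t), 0.0, T))
    return (repairs + rebate) / gs


print(manufacturing_cost(s=5.0, c0=100.0, c1=2.0, c2=0.5, c3=4.0))
print(rebate_policy_cost(s=5.0, T=20.0, c3=4.0, c4=1.0, k=0.8, cp=150.0, a=0.5))
```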

The decision variable in these models is the burn-in time $s$.

Rinsaka and Sandoh [146] consider a case where a manufacturer offers an additional warranty service under which the failed system is replaced by a new one at its first failure, but MRs are carried out on the system for its succeeding failures before the warranty expires. The optimal choice for the customer is based on maximizing the customer's expected utility function, while the manufacturer is assumed to be risk neutral and is interested in maximizing expected profit.

Chen and Chien [32] study the effect of PM carried out by the buyer on items sold under a free-replacement renewing warranty. For the manufacturer, if the product fails during the warranty period, it shall either be repaired (minor failure) or replaced completely (catastrophic failure) by a new one with a new warranty and at no cost to the buyer. When the product is out of warranty and fails due to a catastrophic failure before its useful life limit, the buyer incurs a penalty cost, the amount of which depends on the failure time. Under such a framework, the cost model is derived and the effects of the following three PM options on the cost are examined from both the manufacturer's and the buyer's perspectives:

Option A. No PM action over the useful life of each item.
Option B. Continuous PM action over the whole useful life of each item.
Option C. No PM over the warranty period and continuous PM over the post-warranty period.

The results show that a significant cost saving can be obtained by taking the optimal PM action route.

Chien and Chen [41] consider an age-replacement policy in which MR or replacement takes place according to the following scheme. If the product fails before age $T$, it is either replaced by a new product (due to type II failure, with probability $p$) at a downtime cost $c_d$ and a purchasing cost $c_p$, or it undergoes MR (due to type I failure, with probability $1 - p$) at a MR cost $c_m$. Otherwise, the product is preventively replaced whenever it reaches age $T$. For a repairable product purchased under the renewing free-replacement warranty (RFRW), if a failure occurs within the warranty period $w$, either a new product with the same warranty is offered, free of charge by the seller, to replace the failed one (type II failure), or a MR is performed free of charge by the seller (type I failure). For this model, the expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^{T} \bar{G}(u)\,du,$$

where $\bar{G}(t) = e^{-\int_0^t p\,K(u)\,du}$ and $K(u)$ is the hazard (failure rate) function. The total cost incurred in a renewal cycle depends on whether a replacement is performed within the warranty period or not. Therefore the two cases $T \ge w$ and $T < w$ are considered. In the first case, with $G(t) = 1 - \bar{G}(t)$, the long-run expected cost rate is given by

$$C(T) = \frac{c_p\,\bar{G}(w) + \left[c_d + c_m\left(\frac{1}{p} - 1\right)\right] G(T) - c_m\left(\frac{1}{p} - 1\right) G(w)}{\int_0^{T} \bar{G}(u)\,du},$$

while in the second case, it becomes

$$C(T) = \frac{c_p\,\bar{G}(T) + c_d\,G(T)}{\int_0^{T} \bar{G}(u)\,du}.$$

In this model, the length $w$ of the warranty interval is supposed to be known a priori, and $T$ is the decision variable.
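The two-case cost rate of the age-replacement policy under RFRW can be sketched as follows. The example assumes a Weibull cumulative hazard, so that the survival function of the time to the first type II failure is $\bar{G}(t) = \exp(-p\,(t/\eta)^\beta)$; this choice and all parameter values are illustrative assumptions. Out-of-warranty MR costs are counted only for type I failures after $w$, and the two branches should agree at $T = w$.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


P, ETA, BETA = 0.3, 10.0, 2.5    # assumed type II probability and Weibull parameters


def g_bar(t):
    """Survival function of the time to the first type II failure."""
    return math.exp(-P * (t / ETA) ** BETA)


def cost_rate(T, w, c_p=100.0, c_d=30.0, c_m=5.0):
    """Long-run expected cost rate for preventive replacement age T and warranty w."""
    G = lambda t: 1.0 - g_bar(t)
    mean_cycle = simpson(g_bar, 0.0, T)
    if T >= w:
        k = 1.0 / P - 1.0        # expected type I failures per type II failure
        cost = c_p * g_bar(w) + (c_d + c_m * k) * G(T) - c_m * k * G(w)
    else:
        cost = c_p * g_bar(T) + c_d * G(T)
    return cost / mean_cycle


w = 3.0
grid = [0.5 * i for i in range(1, 61)]       # T = 0.5, 1.0, ..., 30.0
best_T = min(grid, key=lambda T: cost_rate(T, w))
print(best_T, cost_rate(best_T, w))
```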


Chien [39] is a very slight modification of Chien and Chen [41]. Chien [40] is another modification, in which the RFRW is assumed to be imperfect. Whenever a product fails during its warranty period, it is replaced by a repaired one at no cost to the purchaser, and a new full warranty is issued. Under the imperfect RFRW, the failure characteristic of the repaired product is assumed to be inferior to that of a new product.

Jung et al. [82] deal with a replacement model following the expiration of a renewing warranty and of a non-renewing warranty. If the system fails during its warranty period $w$, it is replaced with a new one. If the system survives to age $w$, then it is minimally repaired at each failure and the life cycle is extended by a maintenance period of fixed length $s$. Arguing that the expected length of downtime of the system during the life cycle can be considered a significant factor when the optimal maintenance period following the expiration of the renewing warranty is sought, Jung et al. [82] use both the expected downtime and the expected cost rate as criteria to determine the optimality of the replacement period.

1. Renewing warranty. Both the renewing free-replacement warranty (RFRW) and the renewing pro-rata warranty (RPRW) policies are considered. In this case, the expected length of the life cycle is

$$E[\text{cycle}] = I(w) + (w + s)\,\bar{F}(w),$$

where $I(w) = \int_0^{w} t\,f(t)\,dt$, and $f(t)$ and $\bar{F}(t)$ are the pdf and Sf of the time to first failure of the system, respectively, with $F(t) = 1 - \bar{F}(t)$. Denote by $q(t)$ the failure rate, and let $c_r$ be the unit cost of replacement, $c_m$ the unit cost of MR, $c_{fw}$ the unit failure cost during the warranty period, and $c_{fm}$ the unit failure cost during the maintenance period. Then, the expected cost rate per unit time for the case of renewing warranty is

$$C(s) = \frac{c_0 + (c_m + c_{fm})\,\bar{F}(w) \int_w^{w+s} q(t)\,dt}{I(w) + (w + s)\,\bar{F}(w)},$$

where

$$c_0 = \begin{cases} \dfrac{c_r}{w}\,I(w) + c_{fw}\,F(w) + c_r\,\bar{F}(w), & \text{RPRW}, \\[1ex] c_{fw}\,F(w) + c_r\,\bar{F}(w), & \text{RFRW}, \end{cases}$$

while the expected downtime per unit time is

$$D(s) = \frac{d(w)\,F(w) + \left[d_r + \int_w^{w+s} d(t)\,q(t)\,dt\right] \bar{F}(w)}{I(w) + (w + s)\,\bar{F}(w)},$$

where $d_r$ is the length of downtime due to replacement at the end of the maintenance period and $d(w)$ is the downtime function due to MR at time $w$.

2. Non-renewing warranty. Both the non-renewing free-replacement warranty (NFRW) and the non-renewing pro-rata warranty (NPRW) policies are considered. In this case, the expected cycle length is $E[\text{cycle}] = w + s$.


Let $y$ denote the age of the system in use at the end of the non-renewing warranty period and $\ell$ be the number of replacements during the non-renewing warranty period. Then, the expected cost rate per unit time for the case of non-renewing warranty is

$$C(s) = \frac{c_1 + (c_m + c_{fm}) \int_y^{y+s} q(t)\,dt}{w + s},$$

where

$$c_1 = \begin{cases} \dfrac{c_r\,(w - y)}{w}\,I(w) + \ell\,c_{fw} + c_r, & \text{NPRW}, \\[1ex] \ell\,c_{fw} + c_r, & \text{NFRW}, \end{cases}$$

while the expected downtime per unit time is given as

$$D(s) = \frac{\ell\,d_w + d_r + \int_y^{y+s} d(t)\,q(t)\,dt}{w + s},$$

where $d_w$ represents the length of downtime due to replacement during the warranty period.

Jung et al. [83] study the Sahin and Polatoglu [148] maintenance model under the renewing warranty from the user's point of view. They define the life cycle anew from the user's perspective and discuss the optimal maintenance policy after the renewing warranty has expired. They argue that, from the user's point of view, it is more reasonable to assume that the life of a system ends when the system is replaced by a new one at the user's expense after the renewing warranty has expired. This is because the manufacturer replaces or repairs at its own expense for failures during the renewing warranty period.

Finally, to complete the discussion of warranted systems, we mention the warranty servicing strategy for items sold with a two-dimensional warranty, where the failed item is replaced by a new one when it fails for the first time in a specified region of the warranty and all other failures are repaired minimally. A typical example is an automobile warranted for 3 years or 60,000 km of travel. The main goal is to determine the subregions so that the associated expected warranty servicing cost per item sold is minimized; see for example the papers of Iskandar et al. [70], Chukova and Johnston [47], and Jack et al. [73].
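The two-dimensional servicing rule just described can be sketched as a simple decision function. The warranty limits follow the 3-year, 60,000 km example in the text; the replacement subregion, the class `Region`, and the function `service_action` are hypothetical and serve only to show the logic (replace at the first failure inside the subregion, minimally repair otherwise).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in the (age in years, usage in km) plane."""
    age_lo: float
    age_hi: float
    use_lo: float
    use_hi: float

    def contains(self, age, usage):
        return (self.age_lo <= age <= self.age_hi
                and self.use_lo <= usage <= self.use_hi)


WARRANTY = Region(0.0, 3.0, 0.0, 60_000.0)
# Hypothetical replacement subregion; in practice it is the decision variable.
REPLACE_REGION = Region(1.0, 2.0, 20_000.0, 40_000.0)


def service_action(age, usage, replaced_before):
    """Return the manufacturer's action for a failure at (age, usage)."""
    if not WARRANTY.contains(age, usage):
        return "out of warranty"
    if REPLACE_REGION.contains(age, usage) and not replaced_before:
        return "replace"
    return "minimal repair"


print(service_action(1.5, 30_000.0, replaced_before=False))  # replace
print(service_action(1.5, 30_000.0, replaced_before=True))   # minimal repair
print(service_action(3.5, 30_000.0, replaced_before=False))  # out of warranty
```

Optimizing the subregion would require an expected-cost model on top of this rule, which is the subject of the cited papers and is not attempted here.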

2.1.8 Inspection Models

Failures are not always self-announcing and, sometimes, inspections are required to determine the exact state of the system. Some systems have been studied in which a failure is detected only by an inspection. Inspections are carried out as per schedule until the detection of failure and its subsequent repair or replacement, and the process continues for an infinite number of cycles. The costs considered in this case include the cost of a single inspection, the cost of repairing a failure, and the cost of delay in failure detection per unit time. We have already mentioned some models that deal with inspection: Bai and Yun [10] and Butani [22] in the cost replacement policy models, Jhang [78] in the inventory models, and Sheu and Chen [164] and Chen [35] in the production models.

Wang [191] considers an inspection model for a process with two types of inspections and repairs. A shift in product quality caused by minor process defects may be identified and rectified by routine inspections and repairs. A major defect, caused by a major mechanical or electrical problem, can be observed only when the defect has led to a breakdown of the process or when it is revealed by a major inspection, which is followed by an appropriate major repair action at the time of the inspection. The following notation is used:

$X_1$: the random time to the initial point of a major defect, with pdf $f_1(x_1)$ and CDF $F_1(x)$
$X_2$: the random time to failure from the initial point of a major defect, with pdf $f_2(x_2)$ and CDF $F_2(x)$
$r$: the probability of a perfect major inspection
$t$: time interval of a minor inspection
$T$: time interval of a major inspection, $T = kt$ for simplicity, where $k > 1$
$T_{mr1}$: major repair time for rectifying the major defect identified at a major inspection point
$T_{mr2}$: major repair time due to a major failure

For this model, the renewal cycle, which is terminated by either a major inspection or failure repair, has expected length given by

$$E[\text{cycle}] = \sum_{i=1}^{\infty}\sum_{j=1}^{i}\left[\int_{(j-1)T}^{jT} (iT + T_{mr1})(1-r)^{i-j}\, r f_1(x_1)\left[1 - F_2(iT - x_1)\right]dx_1 + \int_{(i-1)T}^{iT}\int_{(j-1)T}^{jT} (x + T_{mr2})(1-r)^{i-j} f_1(x_1) f_2(x - x_1)\,dx_1\,dx\right].$$

Now, introduce the unit costs

$C_{mi}$: average cost of a minor inspection
$C_{ma}$: average cost of a major inspection
$C_{mr}$: average cost of a minor repair
$C_{mr1}$: average cost of a major repair for the defect identified at a major inspection
$C_{mr2}$: average cost of a major repair due to a major failure


Then, the expected renewal cycle cost $E[C_c]$ is

$$E[C_c] = \sum_{i=1}^{\infty}\sum_{j=1}^{i}\Bigg\{\int_{(j-1)T}^{jT}\Big(i(C_{ma} + kC_{mi}) + C_{mr1} + E[C_{ijs}(x_1)]\Big)(1-r)^{i-j}\, r f_1(x_1)\left[1 - F_2(iT - x_1)\right]dx_1$$
$$\qquad + \int_{(i-1)T}^{iT}\int_{(j-1)T}^{jT}\Big((i-1)(C_{ma} + kC_{mi}) + C_{mr2} + \text{int}\!\left[\frac{x - (i-1)T}{t}\right]C_{mi} + E[C_{ijf}(x_1, x)]\Big)(1-r)^{i-j} f_1(x_1) f_2(x - x_1)\,dx_1\,dx\Bigg\}.$$

Here, $\text{int}[\cdot]$ denotes the integer function, which returns the largest integer less than or equal to its argument; $E[C_{ijs}(x_1)]$ is the expected minor repair cost minus the expected profit when the major inspection repair was done at $iT$ and $x_1 \in ((j-1)T, jT]$; and $E[C_{ijf}(x_1, x)]$ is the expected minor repair cost minus the expected profit when $x \in ((i-1)T, iT]$ and $x_1 \in ((j-1)T, jT]$. These last two expressions are further developed and approximated for ease of computation.

2.1.9 Deteriorating Systems

Deteriorating systems have been considered in the context where repair times are not negligible. These systems are stochastically deteriorating, i.e., the lengths of the operating intervals are stochastically decreasing, whereas the durations of the repairs are stochastically increasing.

Sim and Endrenyi [182] propose a Markov model for a continuously operating device whose condition deteriorates with time in service. The deterioration levels of the system are classified into non-negative integers, i.e., $S = \{1, 2, 3, \ldots, k\}$, and the device has a deterioration failure immediately following the completion of $k$ stages of deterioration. Following a deterioration failure, the device is overhauled, i.e., restored to 'as good as new'. The overhaul duration is exponentially distributed. The duration of each deterioration stage has an exponential distribution. Besides deterioration failure, the device is also subject to Poisson failures. Poisson failures occur at a uniform rate or intensity, independent of the deterioration stage of the device. MR is performed and the repair duration is exponentially distributed. Periodically, the device is removed from operation for maintenance. For the first $s-1$ maintenances since the device was 'as good as new', the maintenance is minimal; maintenance $s$ is a major maintenance. An exact recursive algorithm computes the steady-state probabilities $P(i, j, n)$, where $i$ is the deterioration-stage index, $j$ is the minimal maintenance number, and $n$ describes the state of the device: $n = 0$ is the state following a Poisson failure, $n = 1$ is the operating state, and $n = 2$ is the minimal-maintenance state. Costs $(c_m, c_M, c_0, c_d)$ are assigned for the unit times of the various outages (minor and major maintenance, and repairs after Poisson and deterioration failures), and the total cost is defined as the weighted sum

$$C(\lambda_m) = c_m P_m + c_M P_M + c_0 P_0 + c_d P_d,$$

where $P_d = c_d P(1, 0, 1)$ is the steady-state probability that the device is being overhauled following a deterioration failure, $P_M = c_M P(1, 0, 1)$ is the steady-state probability that the device is out of service due to major maintenance, $P_m = \sum_{i=1}^{k}\sum_{j=1}^{s-1} P(i, j, 2)$, and $P_0 = \sum_{i=1}^{k}\sum_{j=0}^{s-1} P(i, j, 0)$. Note that the decision variable is $\lambda_m$, where $\lambda_m^{-1}$ is the mean time to the next maintenance event.

Ohnishi et al. [130] investigate an optimal minimal-repair and replacement problem for a discrete-time Markovian deterioration system. It is assumed that the system is partially observable through a monitoring mechanism that yields a signal related probabilistically to the exact level of its deterioration. The problem is to find a minimal-repair and replacement policy that minimizes the expected total discounted cost over the infinite horizon.

Soro et al. [184] develop a model for evaluating the availability, the production rate and the reliability function of multi-state degraded systems subjected to minimal repairs and imperfect preventive maintenance. The status of the system is considered to degrade with use. It is assumed that the system can consecutively degrade into several discrete states, which are characterized by different performance rates, ranging from perfect functioning to complete failure. In addition, the system can fail randomly from any operational state and can be minimally repaired.
2.1.10 Improving and Deteriorating Systems

A repairable system improves or deteriorates with time according to whether the times between two successive repairs tend to get larger or smaller in some sense, usually in terms of one of the partial orders defined for life distributions. Bagai and Jain [8] study improvement and deterioration for a repairable system, in particular in terms of the effect of ageing on the distribution of the time to first failure under a NHPP. They consider a unit which, upon failure, is replaced by a new unit with probability $p$ or is minimally repaired with probability $q = 1 - p$. The expected length of a replacement cycle is found to be

$$E[\text{cycle}] = \int_0^T \bar{F}_p(y)\,dy,$$

where $T$ is the time to a planned replacement, $\bar{F}_p(t) = \exp\left\{-\int_0^t p\,\bar{F}^{-1}(y)\,dF(y)\right\}$, and $\bar{F}(t)$ is the Sf of a new unit. Denote by $c_1$ the cost of an unplanned replacement, by $c_2$ the cost of the planned replacement at age $T$, and by $c_0^i(x)$ the cost of MR $i$ at age $x$;


then the long-run average cost rate for such a system under age replacement is given by

$$C(T) = \frac{c_1 F_p(T) + c_2 \bar{F}_p(T) + \int_0^T q\,h(y)\,\bar{F}^{-1}(y)\,\bar{F}_p(y)\,dF(y)}{\int_0^T \bar{F}_p(y)\,dy},$$

where $h(y) = E\left[c_0^{M(y)+1}(y)\right]$ and $M(t)$ is an NHPP with mean function $\int_0^t q\,\bar{F}^{-1}(y)\,dF(y)$.

2.1.11 Imperfect Repair Models

Brown and Proschan [21] introduced the notion of imperfect repair: when a device fails, with probability $p$ it is returned to the 'good-as-new' state (perfect repair), and with probability $q = 1 - p$ it is returned to the 'bad-as-old' state (MR). If $p = 0$, then the repair is always a MR, and if $p = 1$, then the repair is always a perfect repair. This notion has been generalized and modified by many authors. Block et al. [20] assume that the probability of a perfect repair depends on the age of the unit at its failure. Sheu and Griffith [168] consider a bivariate imperfect repair model, which is the multivariate version of the imperfect repair model of Brown and Proschan [21]. Lim et al. [101] propose a Bayes imperfect repair model by treating the probability of perfect repair as a random variable. Cha and Kim [29] show the existence of the steady-state availability of the age-dependent MR model under the assumption of non-negligible repair times.

Makis and Jardine [107] consider a model where the system can be replaced at any time at a cost $c_0$ and, upon failure, the system can be either replaced at the cost $c_0$ or undergo a repair at a cost $c_1(x)$, where $x$ is the age of the system at failure. The repair is imperfect with the following three possible outcomes:

1. the system is as good as new with probability $p(x)$;
2. the system is restored to the functioning state just prior to failure with probability $q(x)$ (MR);
3. the repair is unsuccessful with probability $s(x) = 1 - p(x) - q(x)$; the system must be scrapped and replaced at the additional cost $c_0$.

Denoting by $h(t)$ the hazard rate, the expected length of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^T \bar{G}(t)\,dt + \bar{G}(T)\tau(T),$$

where $\tau(t)$ is the mean residual life and $\bar{G}(y) = \exp\left\{-\int_0^y [1 - q(x)]h(x)\,dx\right\}$, while the expected average cost per unit time under a $T$-policy is given by

$$C(T) = \frac{c_0 + \int_0^T [c_1(t) - c_0 p(t)]\,h(t)\,\bar{G}(t)\,dt}{\int_0^T \bar{G}(t)\,dt + \bar{G}(T)\tau(T)}.$$
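For the Brown and Proschan setting with a constant probability $p$ of perfect repair, the survival function of the time to the first perfect repair, $\exp\{-p\int_0^t \bar{F}^{-1}(y)\,dF(y)\}$, reduces to $\bar{F}(t)^p$, because $dF(y)/\bar{F}(y)$ is the hazard rate times $dy$. The sketch below checks this numerically and evaluates the truncated mean $\int_0^T \bar{F}(y)^p\,dy$. The Weibull lifetime, the Simpson integrator and all function names are assumptions made for illustration only (shape $\beta \ge 1$ is assumed so that the hazard is finite at zero).

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def weibull_sf(t, beta, eta):
    """Survival function of the ASSUMED Weibull lifetime of a new unit."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t, beta, eta):
    """Hazard rate of the assumed Weibull lifetime (finite at 0 for beta >= 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def sf_to_first_perfect_repair(t, p, beta, eta):
    """exp(-p * integral_0^t hazard(y) dy), evaluated by numerical integration."""
    cumulative = simpson(lambda y: weibull_hazard(y, beta, eta), 0.0, t)
    return math.exp(-p * cumulative)


def expected_cycle(T, p, beta, eta):
    """Integral over [0, T] of the survival function F_bar(y) ** p."""
    return simpson(lambda y: weibull_sf(y, beta, eta) ** p, 0.0, T)
```

Setting `p = 1` recovers ordinary age replacement, and small `p` lengthens the cycle because most failures are only minimally repaired.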


Lim and Park [99] consider the imperfect-repair model in which a unit is either perfectly repaired or minimally repaired, with known, fixed probabilities. The exact expected cost rate in the long run is obtained for the exponential distribution. However, only an upper bound for the expected cost rate is obtained for general life distributions. Lim et al. [100] generalize the model of Lim and Park [99] to the case where the repair time is a random variable instead of being negligible. Again, only an upper bound is obtained for the expected cost rate.

Cui et al. [49] consider an interesting problem which arises when both types of repair (minimal and perfect) are possible. The problem is to determine the repair policy, that is, the type of repair which should be carried out after a failure. Two models are studied. In the first model, a fixed amount of resource is available for minimal and perfect repairs. That is, the numbers of repairs allowed for MR and perfect repair (PR) are $N_m$ and $N_p$, respectively. In the second model, the total available resource $C$ is fixed and the costs of each minimal and perfect repair are $c_1$ and $c_2$, respectively. The numbers of MRs and PRs are not fixed, as long as the total resource consumed is less than $C$.

2.1.12 General Repair Models

While the notion of imperfect repair generalizes that of MR, the notion of general repair generalizes that of imperfect repair. When a general repair is performed at a failure instant, it returns the system to a working condition between a 'good-as-new' state (a perfect repair) and a 'bad-as-old' state (a MR). In other words, a general repair rejuvenates the system and brings its condition to a level somewhere between as good as new and just prior to the overhaul. Two main models have been suggested for addressing the issue of the general repair of such systems: either the age of the system is improved (younger) or its failure rate is improved (reduced).
Each of these two models has its advantages and its drawbacks. For other models proposed in the literature, see for example Doyen and Gaudoin [58].

Kijima [87] proposes a general repair model. The notion of the 'age' of the product and the degree of repair (also called improvement factor, lack of perfection, restoration factor, parameter of rejuvenation, etc.) are used to define the virtual age of the product. If the system has the virtual age $V_{n-1} = y$ immediately after the $(n-1)$th repair, the $n$th failure time $X_n$ is assumed to have the Sf $\bar{F}(x + y)/\bar{F}(y)$, where $\bar{F}(x)$ is the Sf of the failure time of a new system. A general repair is represented as a sequence of random variables $A_n$ taking values between 0 and 1, where $A_n = 1$ means a MR and $A_n = 0$ a perfect repair. Depending on how the repair affects the virtual age process, the following two models are constructed. In model I, $V_n = V_{n-1} + A_n X_n$, and in model II, $V_n = A_n(V_{n-1} + X_n)$.

Scarsini and Shaked [152] consider an item which can be repaired $N$ times, where each repair is general in the sense of Kijima [87]. Denote by $X_i$ the times between repairs or maintenance epochs of an item, by $A_i$ the lack of perfection of the respective repair or maintenance actions, and by $V_i$ the value of the virtual age of the item right after the respective repairs or improvements. Then, $V_i = A_i(V_{i-1} + X_i)$. The virtual age of the item at time $t$ is given by

$$V(t) = \begin{cases} t, & \text{if } t < X_1,\\[4pt] V_n + t - \sum_{i=1}^{n} X_i, & \text{if } \sum_{i=1}^{n} X_i \le t < \sum_{i=1}^{n+1} X_i,\quad n = 1, 2, \ldots, N.\end{cases}$$

Now, suppose that the item continuously yields a benefit whose rate $b(v)$ at any time $t$ depends only on the virtual age $v$ of the item at time $t$. Then, the total benefit generated by the item when its virtual age is a random variable is given by

$$\int_0^{\sum_{i=1}^{N+1} X_i} b(V(t))\,dt = \sum_{i=1}^{N+1}\int_0^{X_i} b(V_{i-1} + t)\,dt.$$
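The two virtual-age recursions of Kijima [87] and the piecewise definition of $V(t)$ can be illustrated with a short sketch. The function names are hypothetical, and the inter-repair times and repair degrees are passed in as plain lists rather than sampled from a lifetime distribution.

```python
def kijima_virtual_ages(x, a, model="I"):
    """Virtual ages V_1, ..., V_n right after each repair, starting from V_0 = 0.

    x: inter-repair times X_1, ..., X_n
    a: repair degrees A_1, ..., A_n in [0, 1] (1 = minimal, 0 = perfect repair)
    model "I":  V_n = V_{n-1} + A_n * X_n
    model "II": V_n = A_n * (V_{n-1} + X_n)
    """
    v = 0.0
    ages = []
    for xi, ai in zip(x, a):
        if model == "I":
            v = v + ai * xi
        elif model == "II":
            v = ai * (v + xi)
        else:
            raise ValueError("model must be 'I' or 'II'")
        ages.append(v)
    return ages


def virtual_age_at(t, x, ages):
    """V(t): the virtual age after the last repair before t, plus time since it."""
    elapsed = 0.0  # calendar time of the most recent repair
    v = 0.0        # virtual age right after that repair
    for xi, vi in zip(x, ages):
        if t < elapsed + xi:
            return v + (t - elapsed)
        elapsed += xi
        v = vi
    return v + (t - elapsed)
```

With all repair degrees equal to 1 both models return the cumulative operating time (minimal repair), and with all degrees equal to 0 the virtual age is reset to zero at every repair (perfect repair).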

Dimitrov et al. [56] assume that a sold item with hazard function $\Lambda(t)$ is covered by warranty for a calendar time of duration $T$, according to the free replacement warranty over the interval $[t_0, t_0 + T]$. They also assume that during its usage, the product is maintained under age-reducing repairs according to Kijima's model I, i.e., if $X_i$ denotes the inter-repair or inter-maintenance times of the product and $\delta_i$ denotes the lack of perfection of the $i$th repair, then $T_i = T_{i-1} + \delta_i X_i$ is the value of the virtual age of the product immediately after the $i$th repair. The moment of purchase $t_0$ is a free-of-charge maintenance check-up. At the expiration of every $b$ units of time, where $b$ is the inter-maintenance time, a PM must be performed (these are the planned check-ups). Assuming a constant age-correcting factor $\delta_i = \delta$, the age-dependent repair/maintenance check-up of factor $\delta$ made at age $u$ costs $c_r(u, \delta)$. All other failures are fixed by MRs at a cost $c_m$ each. The expected warranty cost is given by the expression

$$C(T) = \sum_{k=1}^{[T/b]} c_r(t_0 + kb, \delta) + c_m\Bigg\{\sum_{k=1}^{[T/b]}\Big[\Lambda\big((t_0 + (k-1)b)\delta + b\big) - \Lambda\big((t_0 + kb)\delta\big)\Big] + \Lambda\Big(\big(t_0 + [T/b]\,b\big)\delta + T - [T/b]\,b\Big) - \Lambda\Big(\big(t_0 + [T/b]\,b\big)\delta\Big)\Bigg\},$$

where $[T/b]$ is the largest integer less than or equal to the number $T/b$.

Aven and Castro [6] consider a system subject to two types of failures. Type 1 failures arrive according to a NHPP with intensity function $r_1(t)$ and are minimally repaired at a cost $c_1$. Type 2 failures arrive according to a NHPP with intensity function $r_2(t)$. In this case, the system is minimally repaired with probability $p$ and


replaced with probability $1 - p$. The associated costs are $c_{2,m}$ and $c_{2,r}$, respectively. The system is replaced at a constant time $T$ after its installation or at a nonrepairable type 2 failure, whichever occurs first. A cost $c_r$ is incurred whenever a planned replacement is performed. Let $\Lambda_2(t) = \int_0^t r_2(u)\,du$ and $\bar{F}_M(t) = e^{-(1-p)\Lambda_2(t)}$. Then, the total expected discounted cost $C(T)$ in $[0, \infty)$ is given by

$$C(T) = \frac{E[R_1]}{\int_0^T \alpha e^{-\alpha t}\bar{F}_M(t)\,dt},$$

where $\alpha$ is a positive discount factor, and

$$E[R_1] = c_1\int_0^T e^{-\alpha t} r_1(t)\bar{F}_M(t)\,dt + c_{2,m}\,p\int_0^T e^{-\alpha t} r_2(t)\bar{F}_M(t)\,dt + c_{2,r}(1-p)\int_0^T e^{-\alpha t} r_2(t)\bar{F}_M(t)\,dt + c_r\bar{F}_M(T)e^{-\alpha T}.$$

The decision variable is the planned replacement time $T$. Further, Aven and Castro [6] assume that failures of type 2 are safety critical and that, to control the risk, management has specified a requirement that the probability of at least one such failure occurring in the interval $[0, A]$ should not exceed a fixed probability limit $\omega$, called the safety probability limit.

Yun et al. [196] look at two warranty servicing strategies (strategies 1 and 2) involving minimal and imperfect repairs. In both strategies, a failed item is subjected to at most one imperfect repair over the warranty period $W$. In servicing strategy 1, all item failures occurring in the interval $[0, x)$ are minimally repaired. The first failure in the interval $[x, y]$ is rectified by imperfect repair, with the proportional reduction factor in the hazard rate $\delta(t)$ depending on the age $t$ at failure. The cost of the imperfect repair is $c_i(\delta(t), t)$. All subsequent failures in this interval are minimally repaired, as are those that occur in the remaining interval $(y, W]$. This strategy is characterized by the set $\{x, y, \Delta(x, y)\}$ consisting of the two parameters $x$ and $y$ and the function $\Delta(x, y) = \{\delta(t),\ x \le t \le y\}$. These are the decision variables that need to be selected optimally to minimize the expected warranty servicing cost given by

$$C(x, y, \Delta(x, y)) = c_r\left[\int_0^W r(u)\,du\right]\frac{f(y)}{\bar{F}(x)} + \int_x^y \frac{f(t)}{\bar{F}(x)}\Big\{c_i(\delta(t), t) - c_r\,\delta(t)\,[r(t) - r(0)](W - t)\Big\}\,dt,$$

where $f(t)$, $\bar{F}(t)$, and $r(t)$ are the pdf, the Sf, and the hazard rate of the time to first item failure. In servicing strategy 2, the proportional reduction in the hazard rate does not change with time over the interval $[x, y]$, i.e., $\delta(t) = \delta$, and the cost of an imperfect repair is given by $c_i(\delta)$. The problem is thus simplified and the decision variables are $x$, $y$, and $\delta$.

2.1.13 Opportunity-Based Replacement Models

To be cost effective, preventive replacements of some components may be delayed to some time during which the unit is not required for service. Such idle moments can be created by many mechanisms, e.g., by breakdowns of the other units in a series configuration with the unit in question, and in such cases we speak of maintenance opportunities.

Jhang and Sheu [79] propose an opportunity-based age replacement policy with MR. The system has two types of failures. Type I failures (minor failures) occur with probability $q(z)$ and are removed by MRs, whereas type II failures occur with probability $p(z) = 1 - q(z)$ and are removed by replacements. Type I and type II failures are age-dependent. A system is replaced at a type II failure (catastrophic failure) or at the first opportunity after age $T$, whichever occurs first. The cost of the MR of the system, $\phi(C(z), c(z))$, at age $z$ depends on the random part $C(z)$ and the deterministic part $c(z)$. Opportunities arise according to a Poisson process, independent of failures of the component, and the time $W$ between successive opportunities has an exponential distribution with finite mean $E[W]$ and pdf $g_W(\cdot)$. For this model, the expected length of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^{\infty}\int_0^{T+w} \bar{F}_p(z)\,dz\,g_W(w)\,dw.$$

The total expected long-run cost per unit time is

$$C(T) = \frac{\int_0^{\infty}\left[c_p + (c_f - c_p)F_p(T + w) + \int_0^{T+w} h(z)\bar{F}_p(z)q(z)r(z)\,dz\right]g_W(w)\,dw}{\int_0^{\infty}\int_0^{T+w} \bar{F}_p(z)\,dz\,g_W(w)\,dw},$$

where $c_f$ denotes the cost of replacement at a type II failure, $c_p$ denotes the cost of replacement at the opportunity after age $T$, and $h(z) = E_{C(z)}[\phi(C(z), c(z))]$.

2.1.14 Leased Equipment

Leasing a product rather than owning it is becoming more and more popular because of rapid advances in technology and increasing product complexity. The maintenance of the product is usually specified in a lease contract.

Jaturonnatee et al. [74] develop a model of a PM policy for leased equipment with corrective MRs. It is assumed that the lessor carries out the maintenance of the equipment. The equipment is leased for a period $L$. The contract involves two penalties for the lessor. Penalty-1 is incurred if the equipment is not restored from the failed state to the working state within a reasonable time $\tau$. Penalty-2 is incurred if failures occur during the lease period. For CM actions, all failures are rectified through MR. In this case, equipment failures with no PM actions occur according to a NHPP with intensity function $\lambda_0(t) = r(t)$, where $r(t)$ is the hazard function associated with the first failure. For the PM policy, the equipment is subjected to $k$ PM actions over the lease period. At the $j$th PM action, the intensity function is reduced by $\delta_j$, so that

$$\lambda(t) = \lambda_0(t) - \sum_{i=0}^{j} \delta_i. \qquad (4)$$

The cost of a PM action is given by $C_p(\delta) = a + b\delta$, where $a$ and $b$ are constants. The total expected cost of this model is given by

$$C(k, t, \delta) = c_f\Lambda(L) + \sum_{j=1}^{k}(a + b\delta_j) + c_t\Lambda(L)\int_{\tau}^{\infty}[1 - G(y)]\,dy + c_n\Lambda(L),$$

where $c_f$ is the average cost of a CM action to rectify a failure, $c_t$ is the penalty cost per unit time of penalty-1, $c_n$ is the penalty cost per failure of penalty-2, $G(y)$ is the repair time Cdf, and $\Lambda(L)$ is the cumulative failure intensity function. The decision variables of the policy are (1) the number $k$ of PM actions to be carried out over the lease period, (2) the time instants $t = (t_1, \ldots, t_k)$ for such actions, and (3) the levels of the actions $\delta = (\delta_1, \ldots, \delta_k)$.

Yeh and Chang [192] propose a maintenance scheme in which PM actions are taken when the failure rate of the leased product reaches a certain threshold value. They assume that a new product is leased for a period of $L$. The lifetime distribution of the product is Weibull with scale parameter $\lambda$ and shape parameter $\beta$. Within the lease period, any failure of the leased product is rectified by MRs. Each MR incurs a fixed repair cost $c_r$ to the lessor and requires a random amount of repair time $t_r$ that follows a general Cdf $G(t_r)$. If the repair time exceeds a pre-specified time limit $\tau$, then there is a penalty $c_s$ to the lessor. To reduce the number of failures within the lease period, imperfect PM actions with degree $\delta$ are carried out whenever the failure rate of the product reaches a threshold value $\theta$. The cost to perform an imperfect PM with degree $\delta$ is $c_p(\delta) = a + b\delta$. The expected total cost within the lease period $L$ for this model is given by

$$C(n, \delta, \theta) = [c_r + c_s\bar{G}(\tau)]\left\{(\lambda L)^{\beta} + \delta\left[\sum_{i=1}^{n}\left(\frac{\theta + (i-1)\delta}{\beta\lambda^{\beta}}\right)^{\frac{1}{\beta-1}} - nL\right]\right\} + n(a + b\delta),$$

where $n$ is the number of PM actions during the lease period and $\bar{G}(\tau) = 1 - G(\tau)$ is the probability that a repair exceeds the time limit.
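The threshold-based lease cost of Yeh and Chang [192] is in closed form, so it is easy to evaluate. The sketch below assumes the Weibull hazard $\lambda\beta(\lambda t)^{\beta-1}$ with $\beta > 1$, so that the $i$th PM occurs when the reduced hazard reaches $\theta$, i.e., at $t_i = [(\theta + (i-1)\delta)/(\beta\lambda^{\beta})]^{1/(\beta-1)}$. Function and parameter names are hypothetical, and `prob_exceed` stands for the probability $1 - G(\tau)$ that a repair exceeds the time limit.

```python
def pm_times(n, delta, theta, lam, beta):
    """Times at which the reduced hazard reaches the threshold theta (beta > 1)."""
    return [
        ((theta + (i - 1) * delta) / (beta * lam ** beta)) ** (1.0 / (beta - 1.0))
        for i in range(1, n + 1)
    ]


def lease_cost(n, delta, theta, L, lam, beta, c_r, c_s, prob_exceed, a, b):
    """Expected total cost over a lease of length L with n threshold-based PMs."""
    if beta <= 1.0:
        raise ValueError("the threshold rule needs an increasing hazard (beta > 1)")
    times = pm_times(n, delta, theta, lam, beta)
    if any(t > L for t in times):
        raise ValueError("every PM time must fall inside the lease period")
    # Expected failures without PM, minus delta * (L - t_i) saved by each PM.
    expected_failures = (lam * L) ** beta + delta * (sum(times) - n * L)
    return (c_r + c_s * prob_exceed) * expected_failures + n * (a + b * delta)
```

For example, with $\lambda = 1$, $\beta = 2$, $L = 2$, $\theta = 2$ and $\delta = 1$, the single PM takes place at $t_1 = 1$ and the expected number of failures drops from 4 to 3.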


2.1.15 Outsourcing Models

Models where a manufacturer outsources its maintenance activities to a contractor instead of carrying them out in-house have received little attention. Tarakci et al. [186] discuss this type of model from both the manufacturer's and the contractor's points of view. They also consider both the case where no learning takes place and the case where learning takes place. A manufacturer has a finite production horizon of length $Y$. There are two types of maintenance activities: periodic PM and MR at failure. Let $T_p$ and $T_r$ denote the average time required for the contractor to perform a PM and a MR operation, respectively. The manufacturer's payment to the contractor has two components: a fixed payment of $P$ over the time horizon, and a cost subsidization scheme for every preventive maintenance and MR activity that the contractor performs. Let $s_p$ and $s_r$ represent the ratios of the PM cost $c_p$ and the MR cost $c_r$ that the manufacturer subsidizes, respectively. It is assumed that the contractor has a pre-determined reservation (minimum) profit over the production horizon $Y$, denoted by $\pi_0$, for participating in the contractual relationship. Let the net revenue of the manufacturer, after taking into account the production-related costs, be $R$ per unit of time that the process is in operation. If no learning takes place, then the contractor's expected profit over the time horizon is given by

$$P - N[(1 - s_p)c_p] - N[(1 - s_r)c_r]\,M\!\left(\frac{Y}{N} - T_p\right),$$

where $M(t)$ is the expected number of failures in the time interval $[0, t]$, while the manufacturer's expected profit is

$$R(Y - NT_p) - Nc_p - N(c_r + RT_r)\,M\!\left(\frac{Y}{N} - T_p\right) - \pi_0.$$

For both problems, the decision variable is the number $N$ of PM activities performed by the contractor. Tarakci et al. also propose formulations for the cases of natural learning and learning through effort.

2.1.16 MAP Failures

Montoro-Cazorla and Pérez-Ocón [114] consider a system subject to two types of failures, external and internal.
The operational time has a phase-type distribution (PH-distribution), while failures arrive following a Markovian arrival process (MAP). Some failures require the replacement of the system, and others a MR. They derive the mean number of repairable failures per unit time $v$ and the mean number of replacements per unit time $r$, and write the mean cost of the system as $C = c_0 + c_I v + c_R r$, where $c_0$ is the benefit per unit time while operational, $c_I$ is the cost per unit time in imperfect repair, and $c_R$ is the cost per replacement.
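Returning to the outsourcing model of Tarakci et al. [186] without learning, the two profit expressions can be evaluated directly once $M(t)$ is specified. The sketch below assumes, for illustration only, a power-law expected number of failures $M(t) = (t/\eta)^{\beta}$ and searches over the number $N$ of PM activities by enumeration; the function names and the parameters `beta` and `eta` are hypothetical.

```python
def expected_failures(t, beta, eta):
    """ASSUMED power-law expected number of failures in [0, t]."""
    return (t / eta) ** beta


def _operating_interval(N, Y, T_p):
    """Length Y/N - T_p of an operating interval between consecutive PMs."""
    interval = Y / N - T_p
    if interval <= 0:
        raise ValueError("Y / N must exceed the PM duration T_p")
    return interval


def contractor_profit(N, Y, P, c_p, c_r, s_p, s_r, T_p, beta, eta):
    """P - N(1 - s_p)c_p - N(1 - s_r)c_r M(Y/N - T_p)."""
    m = expected_failures(_operating_interval(N, Y, T_p), beta, eta)
    return P - N * (1 - s_p) * c_p - N * (1 - s_r) * c_r * m


def manufacturer_profit(N, Y, R, c_p, c_r, T_p, T_r, pi0, beta, eta):
    """R(Y - N T_p) - N c_p - N(c_r + R T_r) M(Y/N - T_p) - pi0."""
    m = expected_failures(_operating_interval(N, Y, T_p), beta, eta)
    return R * (Y - N * T_p) - N * c_p - N * (c_r + R * T_r) * m - pi0


def best_pm_count(profit, n_max):
    """Enumerate N = 1, ..., n_max and return the N with the largest profit."""
    return max(range(1, n_max + 1), key=profit)
```

A caller would pass `best_pm_count` a one-argument function of `N`, for instance a lambda that fixes all other parameters of `manufacturer_profit`.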


2.1.17 Multi-Unit Systems

In recent years there has been an increasing interest in multi-component systems. However, the cost analysis of these systems is normally extremely difficult because of the large number of system states involved. The optimal replacement rule could have a very complex structure.

Sheu and Kuo [174] consider the $k$-out-of-$n$ system consisting of $n$ iid components, each with constant failure rate $\lambda$. The components of the system have two failure types. Type I failure occurs with probability $q$ and is corrected with MR, whereas type II failure occurs with probability $p = 1 - q$ and the failed component is left idle. The system is completely replaced whenever it reaches age $T$ at a cost $nc_1$ (planned replacement). It is also completely replaced at the occurrence of the $(n-k+1)$th idle component at a cost $nc_1 + c_2$ (unplanned replacement at system failure). The cost of the MR is $c$. Letting

$$\bar{F}_{j,p}(t) = \sum_{i=n-j+1}^{n}\binom{n}{i}\left(e^{-p\lambda t}\right)^{i}\left(1 - e^{-p\lambda t}\right)^{n-i},$$

the expected cycle length is given by

$$E[\text{cycle}] = \int_0^T \bar{F}_{n-k+1,p}(y)\,dy.$$

Also, the expected long-term average cost, where the decision variables are the number $n$ of components and the replacement time $T$, is obtained for the model as

$$C(n, T) = \frac{nc_1 + c_2F_{n-k+1,p}(T) + cq\lambda\int_0^T\left[\sum_{j=1}^{n-k}\bar{F}_{j,p}(t) + k\bar{F}_{n-k+1,p}(t)\right]dt}{\int_0^T \bar{F}_{n-k+1,p}(y)\,dy}.$$

Sheu and Jhang [173] consider an $N$-component system with shock failure interaction. The $i$th component $(1 \le i \le N)$ is subject to shocks that arrive according to a NHPP $\{N_i(t), t \ge 0\}$ with intensity rate $r_i(t)$ and mean value function $\Lambda_i(t) = \int_0^t r_i(s)\,ds$. As shocks occur, the $i$th component has two types of failures. Type I failure (minor failure) is removed by a MR, whereas type II failure (catastrophic failure) induces a total failure of the system and is removed by an unplanned (or unscheduled) replacement of the system. The expected cycle length of an age replacement cycle is given by

$$E[\text{cycle}] = \int_0^T \bar{G}(t)\,dt,$$

where

$$\bar{G}(t) = \prod_{i=1}^{N}\left(\sum_{k=0}^{\infty}\frac{e^{-\Lambda_i(t)}[\Lambda_i(t)]^{k}}{k!}\,\bar{P}_{k,i}\right).$$


Here $\bar{P}_{k,i}$ is the (known) probability that the first $k$ shocks of component $i$ are type I failures. Now, let $q_{k,i} = \bar{P}_{k,i}/\bar{P}_{k-1,i}$ and $\bar{H}_i(t) = \prod_{j \ne i}^{N}\bar{G}_j(t)$; then the expected long-run cost per unit time for the age replacement policy is given by

$$C(T) = \frac{E[R_1]}{\int_0^T \bar{G}(t)\,dt},$$

where

$$E[R_1] = R_2 + (R_1 - R_2)G(T) + \int_0^T\sum_{i=1}^{N}\sum_{k=0}^{\infty}\left[n_{k+1,i}(t)\,q_{k+1,i}\,r_i(t) + m_{k,i}(t)\right]\bar{P}_{k,i}\,\bar{H}_i(t)\,\frac{e^{-\Lambda_i(t)}[\Lambda_i(t)]^{k}}{k!}\,dt.$$

Here, $R_2$ is the cost of the planned replacement of the system at age $T$; $R_1$ is the cost of the unplanned replacement of the system at the time of a type II failure; $n_{k,i}(t) = E_{c_i(t)}\left[g_i(c_i(t), c_{k,i}(t))\right]$ is the expected cost of the $k$th MR of component $i$ at age $t$, where $c_i(t)$ is the age-dependent random part and $c_{k,i}(t)$ is the deterministic part, which depends on the age and the number of the MR; and $m_{k,i}(t)$ denotes the cost per unit time of maintenance of component $i$ at time $t \in [S_{k,i}, S_{k+1,i})$, where $S_{k,i}$ is the arrival time of the $k$th shock of component $i$. Sheu and Jhang [173] also derive the expression for the total $\alpha$-discounted cost for each policy.

Monga and Zuo [112] present a study on reliability-based design of a system considering burn-in, warranty and maintenance. They consider a series-parallel system comprising $n$ subsystems in series, where each subsystem consists of identical components connected in parallel. The lifetimes of the components are independent random variables. It is assumed that PM is performed when the system reaches a maximum allowed failure rate. If the system fails between these PM actions, MRs are performed. A simple non-renewable free-replacement warranty policy is used, which means that if a failure occurs during the warranty period, then the related costs are borne by the manufacturer; these costs are higher than the repair costs during burn-in. The goal is to determine the optimal number of parallel components in each subsystem, the optimal burn-in duration, the optimal PM schedule and the optimal system replacement time in order to minimize the total costs during the system's life cycle. The total system life cycle cost includes manufacturing costs, installation and setup costs, and warranty and post-warranty costs.

Jhang and Sheu [80] consider an $N$-component system with failure interaction. The $i$th component $(1 \le i \le N)$ is subject to shocks that arrive according to a NHPP $\{N_i(t), t \ge 0\}$ with intensity rate $r_i(t)$ and mean value function $\Lambda_i(t) = \int_0^t r_i(s)\,ds$. As shocks occur, the $i$th component has two types of age-dependent failures. Type I failure (minor failure) occurs with probability $q_i(t)$ and is removed by a MR, whereas type II failure (catastrophic failure) occurs with probability $p_i(t) = 1 - q_i(t)$ and is removed by an unplanned replacement of the system. The expected cycle length of an age replacement cycle is given by

$$E[\text{cycle}] = \int_0^T \bar{G}(t)\,dt,$$

where

$$\bar{G}(t) = \exp\left\{-\sum_{i=1}^{N}\int_0^t p_i(x)r_i(x)\,dx\right\}.$$
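The cycle length for the model of Jhang and Sheu [80] only requires the system survival function $\bar{G}(t)$, which can be computed by nested numerical integration. The following sketch is a minimal illustration under assumed inputs: the type II probabilities $p_i$ and the shock intensities $r_i$ are supplied as Python callables, and the Simpson helper and function names are hypothetical.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def system_sf(t, p, r):
    """G_bar(t) = exp(-sum_i integral_0^t p_i(x) r_i(x) dx).

    p, r: lists of callables, one pair (p_i, r_i) per component.
    """
    exponent = sum(
        simpson(lambda x, pi=pi, ri=ri: pi(x) * ri(x), 0.0, t)
        for pi, ri in zip(p, r)
    )
    return math.exp(-exponent)


def expected_cycle(T, p, r):
    """E[cycle] = integral_0^T G_bar(t) dt for planned replacement age T."""
    return simpson(lambda t: system_sf(t, p, r), 0.0, T)
```

For two components with constant $p_i = 0.5$ and $r_i = 1$, the survival function is $e^{-t}$, so the expected cycle length for $T = 1$ is $1 - e^{-1}$.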

Now, denote by $R_1$ the cost of the unplanned replacement of the system at a type II failure; by $R_2$ the cost of the planned replacement of the system at age $T$; by $h_{k,i}(t) = E_{c_i(t)}\left[g_i(c_i(t), c_{k,i}(t))\right]$ the expected cost of the $k$th MR of component $i$ at age $t$, where $c_i(t)$ is the age-dependent random part and $c_{k,i}(t)$ is the deterministic part, which depends on the age and the number of the MR; and by $m_{k,i}(t)$ the cost per unit time of maintenance of component $i$ at time $t \in [S_{k,i}, S_{k+1,i})$, where $S_{k,i}$ is the time of the $k$th MR of component $i$. Then, the expected long-run cost per unit time for the age replacement policy is given by

$$C(T) = \frac{R_2 + (R_1 - R_2)G(T) + \sum_{i=1}^{N}\int_0^T\left[h_i(t)q_i(t)r_i(t) + m_i(t)\right]\bar{G}(t)\,dt}{\int_0^T \bar{G}(t)\,dt}.$$

Here $h_i(t) = E_{M_i(t)}[h_{M_i(t)+1,i}(t)]$, where $M_i(t)$ counts the number of type I failures of component i in $[0, t]$, and $m_i(t) = E_{M_i(t)}[m_{M_i(t),i}(t)]$. Jhang and Sheu [80] also derive the expression for the total discounted cost for each policy.

Cassady et al. [25] develop a mathematical programming model for making selective maintenance decisions for military equipment that performs sequences of missions and is repaired only between missions. The system consists of m independent subsystems connected in series, and each subsystem i contains a set of $n_i$ independent and identical components connected in parallel. Each component, each subsystem, and the system itself can be in only one of two states: functioning properly or failed. At the end of a mission, a number of maintenance actions are available to the decision maker, namely MR of failed components, replacement of failed components, and replacement of functioning components (PM). The model identifies the optimal maintenance actions.

Bai and Pham [9] consider a series system with q independent repairable components under a free repair warranty (FRW) policy with a fixed period w. A series system fails if and only if any of its components fails. Each time a warranted system fails before w, there must be one and only one failed component in the system. Under the FRW policy, the failed component is identified and minimally repaired free of charge to the consumer. Let $\lambda_i(t)$ be the failure rate function of component i. Two methods of time discounting are used: continuous discounting and discrete discounting. Under continuous discounting, the expected discounted warranty cost (DWC) is given by

$$C(w) = \sum_{i=1}^{q} c_i \int_0^w e^{-\delta u}\,\lambda_i(u)\,du,$$

where $c_i$ is the repair cost per failure of component i and $\delta$ is the discount rate, while under discrete discounting it is given by

$$C(w) = \sum_{i=1}^{q} c_i \int_0^w (1+\delta)^{-u}\,\lambda_i(u)\,du.$$
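The two discounted warranty cost expressions can be evaluated numerically once failure rate functions are chosen. The following is a minimal sketch, not the authors' procedure: the Weibull failure rates, the helper names (`simpson`, `weibull_rate`, `dwc`) and the composite Simpson rule are all illustrative assumptions.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

def weibull_rate(scale, shape):
    """Assumed failure rate of a Weibull lifetime (use shape >= 1 here)."""
    return lambda t: (shape / scale) * (t / scale) ** (shape - 1)

def dwc(costs, rates, w, delta, continuous=True):
    """Expected discounted warranty cost over [0, w] for a series system.

    costs[i] is the repair cost c_i, rates[i] the failure rate of component i,
    and delta the discount rate.
    """
    if continuous:
        disc = lambda u: math.exp(-delta * u)
    else:
        disc = lambda u: (1 + delta) ** (-u)
    return sum(c * simpson(lambda u: disc(u) * lam(u), 0.0, w)
               for c, lam in zip(costs, rates))

# Example: two components, warranty period w = 2, discount rate 5%.
rates = [weibull_rate(3.0, 1.5), weibull_rate(4.0, 2.0)]
cost_continuous = dwc([10.0, 25.0], rates, 2.0, 0.05, continuous=True)
cost_discrete = dwc([10.0, 25.0], rates, 2.0, 0.05, continuous=False)
```

Since $(1+\delta)^{-u} \ge e^{-\delta u}$ for $\delta \ge 0$, the discrete-discounting cost is never smaller than the continuous one for the same $\delta$.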

Many models presented in the literature assume that a system is subject to two types of failures which occur, at some time t, with respective probabilities $p(t)$ and $1 - p(t)$. Cha and Mi [30] obtain general results on some probability functions and apply them to study the shapes of $p(t)$. They also apply their results to determining the optimal inspection and allocation policies in maintenance problems. Other shapes for $p(t)$ are given in Samrout et al. [149]. Kim and Kuo [89] establish the trade-off between the reliability of components during system burn-in and develop an optimal burn-in time for non-series systems to maximize system reliability. Two types of repair are considered: MR at the time of system failure, and repair at the time of component or connection failure. Nahas et al. [120] deal with the preventive maintenance optimization problem for multi-state systems (MSS). This problem was initially addressed and solved by Levitin and Lisnianski [96]. It consists in finding an optimal sequence of maintenance actions which minimizes maintenance cost while providing the desired system reliability level. Nahas et al. [120] propose an optimization method based on the extended great deluge algorithm. This method has the advantage of being simple and of requiring less implementation effort than other methods.

2.1.18 Statistical Perspectives

There are quite a few papers that deal with MR modeling from a statistical point of view. We have grouped these papers into two categories.

• Bayesian perspective: all of the above literature assumes that the failure parameters are known with certainty. This assumption is not always correct. When it does not hold, Bayesian analysis is one way to incorporate the parameter uncertainty into the decision making process. The Bayesian approach expresses and updates the uncertain parameters for determining an optimal maintenance policy. First, a prior distribution is assumed for the parameters. Once the failure data and the posterior distributions of the parameters are obtained, the objective function is derived by taking the expectation of the total expected long-run cost per unit time given by (2) with respect to the parameters, resulting in a new objective function. Few papers investigate Bayesian maintenance policies; examples are Mazzuchi and Soyer [108], Sheu et al. [180], Dayanik and Gürler [52], and Sheu et al. [181].

• Other perspectives: Gupta and Kirmani [65] deal with predictors of a future MR epoch. Consider a system where the process of failures and MRs continues indefinitely. Let $R_n$ denote the time at which the nth MR is made. A basic problem of practical importance in the study of repairable systems is to predict the time at which a future MR will be needed. More precisely, one would like to predict the value of $R_{n+k}$, for a specified positive integer k, when the available information consists of the observed values of $R_1, R_2, \ldots, R_n$. Purohit [144] remarks that, besides PM policies based on the MR model, the proportional hazards model is a plausible model which accounts for the possibility of additional damage at failures. Therefore, a test is designed to discriminate between these two models. Heidergott [67] shows how to apply the technique of smoothed perturbation analysis (SPA) to optimize threshold values in a maintenance model. He considers a system in which a single component is minimally repaired upon failure up to an age threshold t and is replaced immediately upon failure after that age. If the component survives until time $t_p > t$, a preventive replacement is performed. Lim and Lie [98] propose an imperfect-repair model for repairable systems where two repair modes, perfect and minimal, occur in accordance with a Markov chain. This model generalizes the imperfect repair model of Brown and Proschan [21] by allowing first-order dependency between two consecutive repair modes. The estimation procedure is developed in a parametric framework for incomplete data where some repair modes are not recorded.
The expectation maximization (EM) principle is used to address the incomplete-data problem. Under the assumptions that the lifetime distribution belongs to a parametric family with an aging property and an explicit form of the Sf, an algorithm is developed for finding the maximum likelihood estimates (MLE) of the transition probabilities of the repair modes as well as of the distribution parameters. Ahmadi and Arghami [3], Ahmadi and Balakrishnan [4], and Baratpour et al. [13] note that the study of upper record values is the same as the study of the lifetime of a component with m MRs. Consider a component with lifetime distribution $F(t)$. Let $T(m)$ denote the lifetime of the component if m MRs are allowed. Then $T(m)$ has the same distribution as the mth upper record value derived from iid observations with Cdf $F(t)$. Agustin [2] examines the problem of goodness-of-fit concerning the distribution of the initial failure times of a repairable system. The imperfect repair model of Block et al. [20] is considered. Upon failure, the system is either perfectly repaired with probability $p(t)$ or minimally repaired with probability $q(t) = 1 - p(t)$. The hypotheses of interest are $H_0: \lambda = \lambda_0$ versus $H_a: \lambda \ne \lambda_0$, where $\lambda_0$ is a fully specified hazard rate function. This problem is often encountered in
the engineering and operations research settings where one is typically interested in predicting the reliability of systems, scheduling maintenance, and providing spare parts. Guérin et al. [63] define two accelerated life models (the Arrhenius-exponential model and the Peck–Weibull model) for repairable systems. They show that it is possible to estimate the reliability of a product during its development with a small number of prototypes, using accelerated life testing with the ability to minimally repair when a failure occurs. This method improves the accuracy of the estimated reliability parameters, since that accuracy is linked to the number of failure times that are available. Using model II of Kijima [87], Mettas and Zhao [110] explore the general renewal processes (GRP) to model and analyze complex repairable systems with various degrees of repair. They present a general likelihood function formulation for single and multiple repairable systems to estimate the GRP parameters. They also develop confidence bounds based on the Fisher information matrix. For model I of Kijima [87], a closed form solution of the g-renewal equation is not available, and even numerical solutions are extremely difficult to obtain because of the mathematical complexity of the equation. Kaminskiy and Krivtsov [85] propose an approximate solution using the Monte Carlo (MC) simulation technique. Newby [129] is concerned with the deterioration of one-shot devices that are kept in storage and taken into use when required. For a complex device, limited non-destructive testing and repair is possible short of using it in a destructive test. The tests are not perfect and can give false positive and false negative results. When a fault is indicated, a minimal repair is carried out.
The objective is to establish levels of reliability of individual components which, together with the inspection regime, give a particular level of reliability in the delivered components. Assuming a general distribution for the time to fail in storage, the likelihood function is developed and used to estimate the parameters of the model.

2.2 N-policy

2.2.1 Optimal Number of Minimal Repairs

Park [131] presents a model for determining the optimal number of MRs before replacement. The basic concept parallels the periodic replacement model with MR at failure introduced by Barlow and Hunter [14], the only difference being that the replacement is signaled by the number n of previous MRs performed on the unit. The total cost per unit time in a replacement cycle is

$$C(n) = \frac{(n-1)\,c_f + c_r}{E[\text{cycle}]},$$

where $c_f$ is the average cost of a failure (MR cost) and $c_r$ is the average replacement cost. Because MR is performed on failure, the failure rate resumes at $h(t)$ instead of returning to $h(0)$, and thus the individual failure times of the unit are not renewal points. The pdf of the time to the nth failure, denoted $f_n(t)$, needs to be determined in order to compute the expected cycle length $E[\text{cycle}] = \int_0^\infty t\,f_n(t)\,dt$. Let $N(t)$ denote the number of failures in $[0, t]$ and $P_n(t) = \Pr\{N(t) = n\}$. Park shows that $P_n(t) = \mathrm{poim}(n; H(t))$, the Poisson probability mass function with mean $H(t)$, and therefore $f_n(t) = -\frac{d}{dt}\sum_{k=0}^{n-1} P_k(t)$. For example, for the Weibull distribution, $H(t) = (\lambda t)^{\beta}$ and

$$E[\text{cycle}] = \frac{\Gamma(n + 1/\beta)}{\lambda\,\Gamma(n)}.$$
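For the Weibull case, the cost rate $C(n)$ can be evaluated directly from the closed form of $E[\text{cycle}]$ and minimized by enumeration. The sketch below is illustrative only: the function names, the parameter values and the search bound `n_max` are assumptions, not part of Park's paper.

```python
import math

def expected_cycle(n, lam, beta):
    """E[cycle] = Gamma(n + 1/beta) / (lam * Gamma(n)) for H(t) = (lam*t)**beta."""
    # lgamma avoids overflow of Gamma(n) for large n.
    return math.exp(math.lgamma(n + 1.0 / beta) - math.lgamma(n)) / lam

def cost_rate(n, c_f, c_r, lam, beta):
    """C(n) = ((n - 1) * c_f + c_r) / E[cycle]."""
    return ((n - 1) * c_f + c_r) / expected_cycle(n, lam, beta)

def optimal_n(c_f, c_r, lam, beta, n_max=1000):
    """Integer n in 1..n_max with the smallest cost rate (direct search)."""
    return min(range(1, n_max + 1),
               key=lambda n: cost_rate(n, c_f, c_r, lam, beta))

# Example with assumed costs: MR cost 1, replacement cost 4.5, Weibull shape 2.
n_star = optimal_n(c_f=1.0, c_r=4.5, lam=1.0, beta=2.0)
```

With $\beta = 1$ the failure rate is constant and $E[\text{cycle}] = n/\lambda$, so replacement brings no benefit when $c_r > c_f$; a finite optimum requires an increasing failure rate ($\beta > 1$).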

2.2.2 Cost Limit Replacement Policy

To generalize the previous models based on the number of minor failures of Park [131] and on the constant repair cost limit of Park [132], Park [134] defines

1. Catastrophic failure: a failure is termed catastrophic when its repair cost is estimated to exceed the replacement cost r of the system.
2. Major failure: a failure is termed major when its repair cost is estimated to be between the minor repair–cost limit c and the replacement cost r of the system.
3. Minor failure: a failure which is neither catastrophic nor major.

When the failed system requires repair, it is first inspected and the repair cost is estimated. MR is then undertaken only if the estimated cost is less than the minor repair–cost limit, or if the estimated cost is less than the replacement cost and the predetermined number n of major failures has not been reached. For a random repair cost with pdf $g(y)$ and a general time-to-failure distribution, Park [134] shows that the expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^\infty e^{-(1-G_1)H(t)} \sum_{k=0}^{n-1} \frac{[G_2 H(t)]^k}{k!}\,dt,$$

where $G_1 = \int_0^c g(y)\,dy$, $G_2 = 1 - \int_0^r g(y)\,dy$, and $H(t)$ is the cumulative hazard. The resulting long-run average cost per unit time from repairs and replacement is, however, too complex, and no optimization is attempted to derive the optimal values of c and n.

2.2.3 Inventory Models

Whenever a unit is to be replaced, a new unit may not be immediately available if the procurement lead-time is not negligible. In this case, an ordering policy should be designed to determine when to order a spare and when to replace the operating unit. Sheu et al. [179] present a model for determining the optimal ordering point n and the optimal number k of MRs before replacement, which includes the optimal number of MRs before replacement of Park [131] as a special case. An order for a spare is placed at the time $S_n$ of the nth failure at a cost $c_2$. After the lead-time L, the spare is delivered. If the kth failure occurs before $S_n + L$, then the unit is shut down and is replaced at time $S_n + L$, with a cost $c_s$ per unit time suffered for shortage. If the kth failure occurs after $S_n + L$, then the unit is replaced at time $S_k$ by the spare in inventory, with an inventory cost $c_h$ per unit time. The original unit is thus replaced at the kth failure or at time $S_n + L$, whichever comes last. MR is performed, at a cost $c_1$, for failures in the interval $[0, S_k)$, where $S_k$ is the time of the kth failure. Denoting by $r(x)$ the failure rate function and by $R(x)$ the cumulative hazard function, the expected duration of a replacement cycle for this model is given by

$$E[\text{cycle}] = \left(\int_0^\infty \int_x^{x+L} f_{S_n,S_k}(x,y)\,dy\,dx\right)\left(\int_0^\infty \sum_{j=0}^{n-1} \frac{e^{-R(x)}[R(x)]^j}{j!}\,dx + L\right) + \left(\int_0^\infty \int_{x+L}^\infty f_{S_n,S_k}(x,y)\,dy\,dx\right)\left(\int_0^\infty \sum_{j=0}^{k-1} \frac{e^{-R(x)}[R(x)]^j}{j!}\,dx\right),$$

where

$$f_{S_n,S_k}(x,y) = \frac{e^{-R(y)}\,[R(x)]^{n-1}\,[R(y) - R(x)]^{k-n-1}\,r(x)\,r(y)}{(n-1)!\,(k-n-1)!}$$

is the joint pdf of $S_n$ and $S_k$ $(n < k)$. Also, the expected cost per unit time in the steady state is

$$C(n,k) = \frac{E[R_1]}{E[\text{cycle}]},$$

where

$$E[R_1] = (k-1)c_1 + c_2 + \left(\int_0^\infty \int_x^{x+L} f_{S_n,S_k}(x,y)\,dy\,dx\right)\left(c_s \int_0^\infty \int_x^{x+L} (x + L - y)\,f_{S_n,S_k}(x,y)\,dy\,dx\right) + \left(\int_0^\infty \int_{x+L}^\infty f_{S_n,S_k}(x,y)\,dy\,dx\right)\left(c_h \int_0^\infty \int_{x+L}^\infty (y - x - L)\,f_{S_n,S_k}(x,y)\,dy\,dx\right).$$

Sheu [158] generalizes the previous paper of Sheu et al. [179] in three ways. First, the system is assumed to be subject to two types of failures. Type I failure (minor failure) occurs at age y with probability $q(y)$ and is corrected with MR, whereas type II failure (catastrophic failure) occurs with probability $p(y) = 1 - q(y)$ and is corrected with replacement. Second, the expected cost of a MR is assumed to have an age-dependent random part and a deterministic part which depends on the age and on the number of MRs. Third, two different lead-times are considered. An expedited order is placed at the time of a type II failure with lead-time $L_e$, and a regular order is placed at the time of the nth type I failure with lead-time $L_r$. The expressions of the expected cycle duration and of the expected cost per unit time in the long run are too lengthy and are not reproduced here. The decision variables are the ordering point and the number of MRs before replacement.

2.2.4 Production Models

Hsu and Kuo [69] address the effects of MR and replacement on a queue-like production system. They consider an unreliable queue-like production system where parts arrive according to a Poisson process with arrival rate $\lambda$. The system is replaced with an identical one when a failure occurs and at least N parts have already been produced. Any failure that occurs before N parts are processed is handled by MR. The amount of time required to process a part is stochastic and follows a known pdf $g(\cdot)$ with mean s. Denoting by $f(t)$, $R(t)$, and $h(t)$ the pdf, Sf, and failure rate of the production system lifetime, the expected duration of a replacement cycle is shown to be

$$E[\text{cycle}] = \frac{1}{\lambda s}\int_0^\infty g_N(x)\left[\int_x^\infty \frac{t\,f(t)}{R(x)}\,dt\right]dx + T_r + n_b T_m,$$

where

$$n_b = \int_0^\infty g_N(x)\left[\int_0^x h(t)\,dt\right]dx$$

is the expected number of system failures before N parts are processed, $T_m$ is the mean time to perform a MR, $T_r$ is the mean time to replace the entire system, and $g_N(x)$ is the Nth convolution of $g(x)$. Also, the long-run expected profit per unit time obtained in a production cycle is

$$C(N) = \frac{E[R_1]}{\frac{1}{\lambda s}\int_0^\infty g_N(x)\left[\int_x^\infty \frac{t\,f(t)}{R(x)}\,dt\right]dx + T_r + n_b T_m},$$

where

$$E[R_1] = \frac{p}{s}\int_0^\infty g_N(x)\left[\int_x^\infty \frac{t\,f(t)}{R(x)}\,dt\right]dx - \int_0^\infty g_N(x)\left[\int_0^x c_m(t)\,h(t)\,dt\right]dx - c_r,$$


and p is the revenue obtained from a processed part, $c_r$ is the expected cost of system replacement, and $c_m(x)$ is the expected cost of MR at age x. Note that the decision variable is the number N of parts to be processed before the system is replaced with a new one.

Hsu [68] considers a policy which calls for a PM operation whenever N parts have been processed. If a failure occurs and at least K PM operations have been carried out, the system is replaced by a new one. Otherwise, a failure is handled by MR. This generalizes Hsu and Kuo [69], which corresponds to the case $K = 0$. Consider the unreliable queue-like production system where parts arrive according to a Poisson process with arrival rate $\lambda$. The amount of time required to process a part follows a known pdf $h(\cdot)$ with mean s. The system's pdf, hazard, and cumulative hazard functions are denoted by $f(\cdot)$, $r(\cdot)$, and $R(t)$, respectively. The age of the production system becomes $x(\cdot)$ units of time younger with each PM. If K is the number of PM operations to be carried out before the system is replaced by a new one, let $A_i$ denote the age of the production system immediately after the ith PM operation is carried out. Then

$$A_i = \max[0,\; T_{i-1} - x(T_{i-1})], \quad i = 1, 2, \ldots, K,$$

where

$$T_{i-1} = A_{i-1} + \int_0^\infty h_N(y)\,dy,$$

$h_N(y)$ is the Nth convolution of $h(y)$, and $A_0 = 0$. The expected residual life RL of the production system after the Kth PM has been carried out is

$$E[RL] = \int_{A_K}^\infty (t - A_K)\,\frac{f(t)}{R(A_K)}\,dt.$$

Finally, the expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \frac{1}{\lambda s}\left(\int_0^\infty h_{KN}(x)\,dx + E[RL]\right) + n_f T_m + K T_p + T_r,$$

where

$$n_f = \sum_{i=0}^{K-1} \int_0^\infty \int_{A_i}^{A_i + y} r(t)\,h_N(y)\,dt\,dy$$

is the expected number of system failures before KN items are processed, given that a PM operation is carried out whenever N items are processed, $T_m$ is the mean time to perform a MR, $T_p$ is the mean duration of a PM operation, and $T_r$ is the mean time to replace the system. Now, if p is the revenue obtained from a processed part, $c_m(\cdot)$ is the cost of a MR, which is a non-decreasing function of the system's age, $c_p$ is the cost of a PM operation, and $c_r$ is the cost of replacing the system by a new one, then the long-run expected profit per unit time obtained in a production cycle is

$$P(N, K) = \frac{E[R_1]}{E[\text{cycle}]},$$

where

$$E[R_1] = \frac{p}{s}\left(\int_0^\infty h_{KN}(x)\,dx + E[RL]\right) - \sum_{i=0}^{K-1} \int_0^\infty \int_{A_i}^{A_i + y} c_m(t)\,r(t)\,h_N(y)\,dt\,dy - K c_p - c_r.$$

Note that the decision variables are the number N of parts to process before a PM operation is carried out and the number K of PM operations to be carried out before the system is replaced by a new one.

2.2.5 Shock Models

Sheu and Griffith [169] consider a system subject to shocks that arrive according to a NHPP with intensity $r(t)$ and mean value function $\Lambda(t) = \int_0^t r(u)\,du$. As shocks occur, the system has two types of failures. Type I failure (minor failure) is removed by a MR, whereas type II failure (catastrophic failure) is removed by replacement. The probability of a type II failure is permitted to depend on the number of shocks since the last replacement. The system is replaced at the time of a type II failure or at the nth type I failure, whichever comes first. The expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^\infty \bar H(t)\,dt,$$

where

$$\bar H(t) = \sum_{k=0}^{n-1} \frac{e^{-\Lambda(t)}[\Lambda(t)]^k}{k!}\,\bar P_k$$

and $\bar P_k$ is the probability that the first k shocks are type I failures.
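As a numerical illustration of the cycle-length formula, the sketch below integrates the survival function of the cycle, taken as a sum over fewer than n shocks, with the trapezoidal rule. The homogeneous shock process, the geometric form $\bar P_k = q^k$ and all function names are assumptions made for the example; they are not taken from Sheu and Griffith [169].

```python
import math

def survival(t, n, big_lambda, p_bar):
    """Probability that fewer than n shocks occurred by t and all were type I."""
    m = big_lambda(t)
    return sum(math.exp(-m) * m ** k / math.factorial(k) * p_bar(k)
               for k in range(n))

def expected_cycle(n, big_lambda, p_bar, horizon, steps=20000):
    """Trapezoidal approximation of the integral of the survival over [0, horizon].

    The horizon must be large enough for the survival function to be negligible.
    """
    h = horizon / steps
    total = 0.5 * (survival(0.0, n, big_lambda, p_bar)
                   + survival(horizon, n, big_lambda, p_bar))
    total += sum(survival(j * h, n, big_lambda, p_bar) for j in range(1, steps))
    return total * h

# Assumed example: shocks at constant rate 2, each shock is type I with
# probability 0.9 independently of the others, replacement at the 3rd type I failure.
rate, q, n = 2.0, 0.9, 3
cycle = expected_cycle(n, lambda t: rate * t, lambda k: q ** k, horizon=30.0)
```

Under these assumptions each term integrates to $q^k / \text{rate}$, so the result can be checked against $(1 + q + q^2)/\text{rate}$.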

Denoting by $R_2$ the replacement cost of the system; by $a_k(t) = E_{c_0(t)}[g(c_0(t), c_k(t))]$ the expected cost of the kth MR at age t, where $c_0(t)$ is the age-dependent random part and $c_k(t)$ is the deterministic part that depends on the age and on the number of MRs; and by $b_i(t)$ the cost per unit time of maintenance of the system at time $t \in [S_i, S_{i+1})$, where $S_i$ is the arrival time of the ith shock, the expected cost per unit time for an infinite time span is given by

$$C(n) = \frac{E[R_1]}{\int_0^\infty \bar H(t)\,dt},$$

where

$$E[R_1] = R_2 + \sum_{k=1}^{n-1} \int_0^\infty a_k(t)\,\frac{e^{-\Lambda(t)}[\Lambda(t)]^{k-1}}{(k-1)!}\,r(t)\,\bar P_k\,dt + \int_0^\infty \sum_{k=0}^{n-1} b_k(t)\,\frac{e^{-\Lambda(t)}[\Lambda(t)]^k}{k!}\,\bar P_k\,dt.$$

Lai and Leu [94] consider a system that is subject to shocks according to a NHPP with rate $\lambda(t)$. There are two types of shocks. A type II shock occurs with probability $1 - p$ and causes the system to fail; such a failure is corrected by a MR. A type I shock occurs with probability p and causes damage to the system in the sense that it increases the failure rate by a certain amount; the failure rate also increases with age due to the aging process without external shocks. The system is replaced at the time of the first type I failure or the nth type II failure, whichever occurs first. Letting $r_j(t)$ denote the failure rate of the system when the age of the system is t and the number of type I shocks which have arrived until t is j, the time $Y_1$ to type I failure for the system has Sf

$$\bar F_p(t) = \exp\left(-\int_0^t r(x, p\lambda)\,dx\right),$$

where

$$r(x, p\lambda) = \sum_{j=0}^{\infty} r_j(x)\,\frac{\left[\int_0^x p\lambda(u)\,du\right]^j}{j!}\,e^{-\int_0^x p\lambda(u)\,du}.$$

Letting $T_1, T_2, \ldots, T_n, \ldots$ be the successive times of type II shocks, and $Z = \min(Y_1, T_n)$ represent the time to replacement, the expected length of the replacement cycle is

$$E[\text{cycle}] = \sum_{k=0}^{n-1} \int_0^\infty \bar F_p(t)\,P_k(t)\,dt,$$

where

$$P_k(z) = \frac{\left[\int_0^z (1-p)\lambda(x)\,dx\right]^k}{k!}\,e^{-\int_0^z (1-p)\lambda(x)\,dx}.$$

The long-run expected cost per unit time is given by

$$C(n) = \frac{c_m \sum_{k=0}^{n-2} \int_0^\infty \bar F_p(z)\,P_k(z)\,(1-p)\lambda(z)\,dz + c_0}{\sum_{k=0}^{n-1} \int_0^\infty \bar F_p(t)\,P_k(t)\,dt},$$

where $c_0$ is the replacement cost while $c_m$ is the mean repair cost.

2.2.6 Imperfect Repair Models

Bae and Lee [7] modify the imperfect repair model of Brown and Proschan [21] where, upon failure, a device is returned to the 'good-as-new' state with probability p and to the functioning state (MR) with probability $q = 1 - p$, by restricting the number of consecutive MRs to n. They argue that the original model allows MRs to be performed repeatedly without limit but that, in practice, once only MRs have been performed several times in a row, it is preferable to perform a perfect repair, regardless of p, at the next failure of the device. Hence, they limit the number of consecutive MRs to n; that is, if only MRs have been consecutively performed n times since the last perfect repair, a perfect repair is performed with probability 1 at the next failure of the device. As a practical example, a flat tire is replaced with a new one if the tire was sealed several times previously, even though it is possible to seal it again. They show that the Sf $\bar F_{p,n}(t)$ of the time between two successive perfect repairs is given by

$$\bar F_{p,n}(t) = \bar F(t) \sum_{k=0}^{n} \frac{[-q \ln \bar F(t)]^k}{k!},$$

where $F(t)$ is the life distribution of the device. Bae and Lee [7] is, however, not concerned with optimization. They do not try to find the optimal value of n which minimizes the long-run average cost per unit time, but are interested in studying the preservation of ageing properties of the modified model.
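The survival function of the time between two successive perfect repairs is easy to evaluate for any baseline Sf. The short sketch below uses an exponential baseline purely as an assumed example; the function name and arguments are illustrative.

```python
import math

def sf_between_perfect_repairs(t, n, p, base_sf):
    """Sf of the time between perfect repairs with at most n consecutive MRs.

    base_sf(t) is the Sf of the device lifetime and must be positive at t.
    """
    q = 1.0 - p
    s = base_sf(t)
    x = -q * math.log(s)  # q times the cumulative hazard at t
    return s * sum(x ** k / math.factorial(k) for k in range(n + 1))

# Assumed baseline: exponential lifetime with unit rate.
base = lambda t: math.exp(-t)
limited = sf_between_perfect_repairs(2.0, n=2, p=0.3, base_sf=base)
nearly_unlimited = sf_between_perfect_repairs(2.0, n=60, p=0.3, base_sf=base)
```

Two limiting cases give a quick sanity check: with $n = 0$ (or $p = 1$) the expression reduces to $\bar F(t)$, and as n grows it tends to $[\bar F(t)]^{p}$, which is the Brown and Proschan [21] result without a limit on consecutive MRs.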

2.2.7 Improving and Deteriorating Systems

Lam [95] uses a geometric process (GP) maintenance model to study a deteriorating system and an improving system. A stochastic process $\{Z_n, n = 1, 2, \ldots\}$ is called a GP if there exists a real $a > 0$, called the ratio of the GP, such that $\{a^{n-1} Z_n, n = 1, 2, \ldots\}$ forms a renewal process. Lam [95] considers a system that is repaired whenever it fails and that is replaced by an identical new one following the Nth failure time. Let $X_1$ be the operating time after the installation or a replacement and, for $n > 1$, let $X_n$ be the operating time of the system after the $(n-1)$th repair; then $\{X_n, n = 1, 2, \ldots\}$ forms a GP with $E(X_1) = \lambda$ and ratio a. Also, let $Y_n$ be the repair time after the nth failure; then $\{Y_n, n = 1, 2, \ldots\}$ constitutes a GP with $E(Y_1) = \mu$ and ratio b. The system is assumed to be a deteriorating system if the successive operating times form a decreasing GP while the consecutive repair times constitute an increasing GP, which is ensured by taking $a \ge 1$ and $0 < b \le 1$. The system is assumed to be an improving system if the successive operating times form an increasing GP while the consecutive repair times constitute a decreasing GP, which is ensured by taking $0 < a \le 1$ and $b \ge 1$, except for the case $a = b = 1$. Moreover, denote the replacement time by Z and assume that $E(Z) = \tau$. Then, the expected length of a cycle is given by

$$E[\text{cycle}] = \lambda \sum_{k=1}^{N} \frac{1}{a^{k-1}} + \mu \sum_{k=1}^{N-1} \frac{1}{b^{k-1}} + \tau.$$

Denoting by r the operating reward rate, by c the repair cost rate, and by R the replacement cost, the long-run average cost per unit time is given by

$$C(N) = \frac{(c + r)\,\mu \sum_{k=1}^{N-1} \frac{1}{b^{k-1}} + R + r\tau}{\lambda \sum_{k=1}^{N} \frac{1}{a^{k-1}} + \mu \sum_{k=1}^{N-1} \frac{1}{b^{k-1}} + \tau}.$$
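The geometric-process cost rate is a ratio of finite sums, so the replacement number N can be found by direct enumeration. The following sketch evaluates the expression for $C(N)$ as stated above; the function names, the parameter values and the search bound `N_max` are assumptions chosen for illustration.

```python
def geometric_sum(ratio, terms):
    """Sum of 1 / ratio**(k-1) for k = 1..terms (zero when terms is 0)."""
    return sum(ratio ** -(k - 1) for k in range(1, terms + 1))

def cycle_length(N, lam, mu, tau, a, b):
    """Expected cycle: N operating periods, N - 1 repairs, one replacement."""
    return lam * geometric_sum(a, N) + mu * geometric_sum(b, N - 1) + tau

def cost_rate(N, lam, mu, tau, a, b, c, r, R):
    """C(N) as given in the text, with reward rate r and repair cost rate c."""
    repair_time = mu * geometric_sum(b, N - 1)
    return ((c + r) * repair_time + R + r * tau) / cycle_length(N, lam, mu, tau, a, b)

def optimal_N(lam, mu, tau, a, b, c, r, R, N_max=200):
    """Integer N in 1..N_max with the smallest cost rate (direct search)."""
    return min(range(1, N_max + 1),
               key=lambda N: cost_rate(N, lam, mu, tau, a, b, c, r, R))

# Assumed deteriorating system: a >= 1 and 0 < b <= 1.
params = dict(lam=10.0, mu=2.0, tau=1.0, a=1.1, b=0.9, c=3.0, r=5.0, R=100.0)
N_star = optimal_N(**params)
```

For a deteriorating system the operating times shrink and the repair times grow with each repair, so the cost rate eventually increases in N and the search returns an interior optimum.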

2.3 (N, T)-policy

Nakagawa and Kowada [126] consider a replacement model where a system is replaced at time T or at the nth failure after its installation, whichever occurs first. They show that the mean time to replacement is given by

$$E[\text{cycle}] = \sum_{j=0}^{n-1} \int_0^T \frac{[R(t)]^j}{j!}\,e^{-R(t)}\,dt,$$

where $R(t)$ is the cumulative hazard, and that the expected cost per unit time is

$$C(n, T) = \frac{c_1\left[n - \sum_{j=0}^{n-1} (n - j)\,\frac{[R(T)]^j}{j!}\,e^{-R(T)}\right] + c_2}{\sum_{j=0}^{n-1} \int_0^T \frac{[R(t)]^j}{j!}\,e^{-R(t)}\,dt},$$

where $c_1$ is the cost of failure and $c_2$ is the cost of replacement.

Sheu et al. [177] propose a generalized replacement policy where a system has two types of failures and is replaced at the nth type I failure (minor failure), at the first type II failure (catastrophic failure), or at age T, whichever occurs first. Type I and type II failures are age-dependent. Type I failures occur with probability $q(z)$ and are removed by MRs. Type II failures occur with probability $p(z) = 1 - q(z)$ and the unit has to be replaced. The cost of the MR of the system at age z depends on the random part $C(z)$ and on the deterministic part $c(z)$. The expected length of a replacement cycle is given by

$$E[\text{cycle}] = \sum_{k=0}^{n-1} \int_0^T \bar F_p(z)\,p_k(z)\,dz,$$

where $\bar F_p(z) = e^{-\int_0^z p(x) r(x)\,dx}$ and $p_k(z)$ is the probability of k type I failures during $[0, z]$. The total expected long-run cost per unit time is

$$C(n, T) = \frac{c_3 + (c_2 - c_3) \sum_{i=0}^{n-1} \bar F_p(T)\,p_i(T) + \sum_{i=0}^{n-2} \int_0^T h(z)\,\bar F_p(z)\,p_i(z)\,q(z)\,r(z)\,dz}{\sum_{k=0}^{n-1} \int_0^T \bar F_p(z)\,p_k(z)\,dz},$$

where $c_3$ is the cost of replacement at the nth type I failure or first type II failure and $c_2$ is the cost of replacement at age T. Here $r(z)$ is the system failure rate and $h(z) = E_{C(z)}[g(C(z), c(z))]$.

2.3.1 Cost Limit Replacement Policy

In Kapur and Garg [86], the repair cost is estimated by inspection at each minimal failure and $G(u)$ is the Cdf of the MR cost. The system is replaced at the nth type I failure (minimal failure), at the first type II failure (catastrophic failure), at age T, or when the estimated repair cost of a minimal failure exceeds the pre-determined limit L, whichever occurs earliest. Type I and type II failures are assumed to be age-dependent by letting $p(t)$ be the probability of a type I (minimal) failure when the age of the system reaches t. Denoting by $q(t)$ and $Q(t)$ the hazard rate and the cumulative hazard rate of the system, respectively, Kapur and Garg [86] show that the expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \sum_{j=0}^{n-1} G^j(L) \int_0^T A_j(x)\,dx,$$

where

$$A_j(x) = \frac{\left[\int_0^x p(t)\,q(t)\,dt\right]^j}{j!}\,e^{-Q(x)}, \quad j = 0, 1, 2, \ldots,$$

denotes the probability of j type I failures and no type II failure in $(0, x)$. Also, the mean cost rate is found to be

$$C(n, T, L) = \frac{c_0 + \left(c_1 + E[L]\,G(L)\right) \sum_{j=0}^{n-2} \int_0^T A_j(x)\,p(x)\,q(x)\,dx}{\sum_{j=0}^{n-1} G^j(L) \int_0^T A_j(x)\,dx},$$

where $c_0$ is the replacement cost, $c_1$ is the inspection cost at a minimal failure, and $E[L]$ is the mean value of the repair cost.

2.3.2 Shock Models

Chien et al. [45] consider a system subject to shocks that arrive according to a NHPP with intensity $r(t)$ and mean value function $\Lambda(t) = \int_0^t r(u)\,du$. As shocks occur, the system has two types of failures: type I failure (minor failure) is rectified by a MR, whereas type II failure (catastrophic failure) is removed by replacement. The probability of a type II failure is permitted to depend on the number of shocks since the last replacement. This paper proposes a generalized replacement policy where a system is replaced at the nth type I failure, at the first type II failure, or at age T, whichever occurs first. The expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \int_0^T \bar G(t)\,dt,$$

where

$$\bar G(t) = \sum_{k=0}^{n-1} \frac{e^{-\Lambda(t)}[\Lambda(t)]^k}{k!}\,\bar P_k$$

and $\bar P_k$ is the (known) probability that the first k shocks are type I failures. The expected cost rate is given by

$$C(n, T) = \frac{R_2 + (R_1 - R_2)\left[1 - \bar G(T)\right] + \sum_{k=0}^{n-2} \int_0^T h(t)\,\frac{e^{-\Lambda(t)}[\Lambda(t)]^k}{k!}\,r(t)\,\bar P_{k+1}\,dt}{\int_0^T \bar G(t)\,dt},$$

where $R_1$ is the cost of replacement at the nth type I failure or first type II failure, $R_2$ is the cost of replacement at age T, and $g(c_1(t), c_2(t))$ is the cost of MR at age t, with expected cost $h(t) = E_{c_1(t)}[g(c_1(t), c_2(t))]$, where $c_1(t)$ is the age-dependent random part and $c_2(t)$ is the age-dependent deterministic part.

2.3.3 Warranted System

Murthy et al. [118] consider a modification to the failure-free warranty policy as follows. Under this policy, should the first failure occur within a period T $(\le W)$ subsequent to the sale of the item, the consumer has a choice between the following two options: (1) obtain a total refund, or (2) get a new replacement item with a new warranty, identical to the original warranty, and a lump sum $C_c$ as compensation for the inconvenience of experiencing an early failure. All failures beyond T and within warranty are repaired at no cost to the consumer. Murthy et al. also assume that some of the items produced can be non-conforming. An item produced is conforming with probability $p_1$ and non-conforming with probability $p_2$, with $p_1 + p_2 = 1$. Let $F_1(t)$ and $F_2(t)$ denote the failure distribution function for conforming and non-conforming items, respectively. To model dissatisfied consumer behavior, Murthy et al. introduce the probability q $(0 \le q \le 1)$ that a dissatisfied consumer will choose to collect the refund (under the money-back guarantee option) and to join the ranks of lost consumers as follows:

$$q = \begin{cases} 1, & \text{for } X \le T_0, \\ F(T_0)/F(T), & \text{for } T_0 < X \le T, \end{cases}$$

where X denotes the age of the item at failure. Now, denote by $C_m$ the manufacturing cost per item, $S$ $(> C_m)$ the sale price per item, $C_h$ the handling cost for each warranty claim, $C_r$ $(< C_m)$ the cost of each minimal repair, and $C_g$ the cost to the manufacturer for each consumer lost. Then, the expected total warranty cost per item is given by $C(T, W) = A(T, W) + B(T, W)$, where

$$A(T, W) = \frac{F(T_0)\,C_1 + [F(T) - F(T_0)]\,C_2}{F(T_0) + 1 - F(T)},$$

with $C_1 = S + C_h + C_g$, $C_2 = C_m + C_h + C_c$, and

$$B(T, W) = \frac{(C_r + C_h)\left[p_1 \bar F_1(T)\,R_1(T, W) + p_2 \bar F_2(T)\,R_2(T, W)\right]}{F(T_0) + 1 - F(T)}.$$

Here, $R_1(T, W)$ and $R_2(T, W)$ represent the expected number of failures for a conforming item and a non-conforming item, respectively, and are given by

$$R_i(T, W) = \log\frac{\bar F_i(T)}{\bar F_i(W)}, \quad i = 1, 2.$$

Jhang [76] generalizes Jhang [77] to the case where the process of buying a new product is conducted at time $T + W$, at a type II failure, or at the nth type I failure, whichever occurs first. Using the same notation, the mean cycle length becomes

$$E[\text{cycle}] = W + \sum_{k=0}^{n-1} \int_0^T \bar{\tilde F}_{\tilde p,k}(t)\,dt,$$

where

$$\bar{\tilde F}_{\tilde p,k}(t) = e^{-\int_0^t \tilde q(x)\tilde r(x)\,dx}\,\frac{\left[\int_0^t \tilde q(x)\tilde r(x)\,dx\right]^k}{k!},$$

while the expected cost of the society per unit time becomes

$$C(n, T) = \frac{E[R_1]}{W + \int_0^T \bar{\tilde F}_{\tilde p}(t)\,dt},$$

where

$$E[R_1] = c_r\left[F_p(W) + \int_0^W V(W - y)\,dF_p(y)\right] + c_d\left[\int_{W-L}^{W} F_p(y)\,dy + \int_0^W V(W - y)\,F_p(y)\,dy - \int_0^W V(W - y - L)\,F_p(y)\,dy\right] + \int_0^W \left(1 + V(W - y)\right)\bar F_p(y)\,h(y)\,q(y)\,r(y)\,dy + \sum_{i=1}^{n-1} \int_0^T \bar{\tilde F}_{\tilde p}(z)\,\tilde h_i(z)\,\tilde p_{i-1}(z)\,\tilde q(z)\,\tilde r(z)\,dz.$$

2.3.4 Deteriorating Systems

Sheu [162] considers a deteriorating system subject to two types of failures. Type I failure (minor failure) is removed by a repair, whereas when a type II failure (catastrophic failure) occurs the unit has to be replaced. The system is replaced at the Nth type I failure, at the first type II failure, or at the working age T, whichever occurs first. It is assumed that the operating intervals $X_n$ of the system after the $(n-1)$th repair form a stochastically decreasing sequence with decreasing means $E[X_n] = \lambda_n$, while the repair periods $Y_n$ after the nth failure form a stochastically increasing sequence with increasing means $E[Y_n] = \mu_n$. Let $U_n = \sum_{i=1}^{n} X_i$ and let $F_k(t)$ denote the Cdf of $U_k$. Then, the expected length of a replacement cycle is given by

$$E[\text{cycle}] = \sum_{k=1}^{N} \bar P_{k-1} \int_0^T \left[\bar F_k(t) - \bar F_{k-1}(t)\right]dt + \sum_{k=1}^{N-1} \mu_k\,\bar P_k\,F_k(T),$$

where $\bar P_k$ is the probability that the first k failures are type I failures. Denoting the repair cost rate by $c_1$, the reward rate whenever the system is operating by $c_2$, and the replacement cost by $c_3$, the expected cost rate is given by

$$C(N, T) = \frac{c_1 \sum_{k=1}^{N-1} \mu_k\,\bar P_k\,F_k(T) + c_3 - c_2 \sum_{k=1}^{N} \bar P_{k-1} \int_0^T \left[\bar F_k(t) - \bar F_{k-1}(t)\right]dt}{\sum_{k=1}^{N} \bar P_{k-1} \int_0^T \left[\bar F_k(t) - \bar F_{k-1}(t)\right]dt + \sum_{k=1}^{N-1} \mu_k\,\bar P_k\,F_k(T)}.$$


3 Block Replacement

As in the case of age replacement, we review in this section the models under the T-policy, the N-policy, and the (N, T)-policy.

3.1 T-policy

Barlow and Hunter [14] considered the case of periodic replacement at times $T, 2T, 3T, \ldots$ (for some $T > 0$) with MR whenever the system fails between replacements. They considered a cost $c_1$ for each replacement and a cost $c_2$ for each MR, and showed that the total cost per unit time is

$$C(T) = \frac{c_2 H(T) + c_1}{T},$$

where $H(t)$ is the cumulative hazard.
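For a concrete cumulative hazard, the periodic replacement cost rate can be minimized in closed form. The sketch below assumes a Weibull cumulative hazard $H(t) = (\lambda t)^{\beta}$ with $\beta > 1$, which is an illustrative choice and not part of the original result; setting $dC/dT = 0$ then gives $c_2(\beta - 1)(\lambda T)^{\beta} = c_1$.

```python
def cost_rate(T, c1, c2, lam, beta):
    """C(T) = (c2 * H(T) + c1) / T with the assumed H(t) = (lam * t)**beta."""
    return (c2 * (lam * T) ** beta + c1) / T

def optimal_T(c1, c2, lam, beta):
    """Minimizer of C(T) for the Weibull case; requires beta > 1."""
    if beta <= 1:
        raise ValueError("a finite optimum needs an increasing failure rate (beta > 1)")
    return (c1 / (c2 * (beta - 1))) ** (1.0 / beta) / lam

# Assumed costs: replacement cost 8, MR cost 2, Weibull shape 2.
T_star = optimal_T(c1=8.0, c2=2.0, lam=1.0, beta=2.0)
best = cost_rate(T_star, 8.0, 2.0, 1.0, 2.0)
```

When $\beta \le 1$ the failure rate does not increase with age, the cost rate decreases in T, and periodic replacement is never worthwhile, which is why the helper rejects that case.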

3.1.1 Modified Policy

Nakagawa [122] proposes four models of modified periodic replacement with MR at failures when the scheduled replacement time is specified.

• If a failure occurs just before the replacement time, three models are considered. In model (A), a unit that fails during $(0, T_0]$ undergoes MR, while a unit that fails during $(T_0, T)$ remains as it is until the replacement time. The expected cost rate, where $T_0$ is the decision variable, is given by

$$C(T_0) = \frac{c_1 R(T_0) + c_2 + c_3\left[(T - T_0) - L(T_0, T)\right]}{T},$$

where $R(t)$ is the cumulative hazard, $L(a, b)$ is the mean life between times $a$ and $b$ of the unit of age $a$, $c_1$ is the cost of MR at failure, $c_2$ is the cost of scheduled replacement, and $c_3$ is the cost rate for time between failure and its detection. In model (B), a unit that fails during $(T_0, T)$ is replaced by one of the spares and the expected cost rate is

$$C(T_0) = \frac{c_1 R(T_0) + c_2 + c_4 \int_{T_0}^{T} N_1(T - t)\,dF(t) / \bar F(T_0)}{T},$$

where $F(t)$ is the Cdf of the time to failure of the unit, $\bar F(t)$ is the Sf, $N_1(t)$ is the renewal function of a spare, and $c_4$ is the replacement cost of a spare. In model (C), a unit that fails during $(T_0, T)$ is replaced by a new unit and the expected cost rate is

$$C(T_0) = \frac{c_1 R(T_0) + c_2 + c_5 H(T_0, T)}{T_0 + L(T_0, T)},$$

where $H(t, t_1)$ is the average hazard rate of the unit and $c_5$ is the additional cost of non-scheduled replacement caused by failure.

• If a failure occurs well before the replacement time, then model (D) is to replace the unit at failure or at time $T_1$, whichever occurs first. The expected cost rate is then

$$C(T_1) = \frac{c_1 R(T) + c_2 + c_5 H(T, T_1)}{T + L(T, T_1)}.$$

Beichelt [18] considers a system with failure rate $h(x)$ subject to two types of failures. At age $x$, a type I failure occurs with probability $q(x)$ and is removed by MR, while a type II failure occurs with probability $p(x) = 1 - q(x)$ and is removed by replacement. Letting $M(t)$ and $N(t)$ denote the expected numbers of type I and type II failures in $(0, t)$, the long-run cost rate is given by

$$C(T) = \frac{c_1 M(T) + c_2 N(T) + c_3}{T},$$

where $c_1$ is the cost of MR, $c_2$ is the cost of a replacement after a type II failure, and $c_3$ is the cost of a replacement by PM. The functions $M(t)$ and $N(t)$ satisfy

$$M(t) = L(t) + \int_0^t M(t - x)\,dG(x),$$

and

$$N(t) = G(t) + \int_0^t N(t - x)\,dG(x),$$

where $\bar G(t) = 1 - G(t) = e^{-\int_0^t p(x)h(x)\,dx}$ and $L(t) = \int_0^t \bar G(x)h(x)\,dx - G(t)$. Their Laplace transforms can be easily obtained, but the inversion must be done by numerical procedures.

Nakagawa and Yasui [128] suggest five replacement policies where a unit is replaced at periodic times $jT$ $(j = 1, 2, \ldots)$ and the replacement cost is expensive when the number of events occurring in $(0, T)$ is greater than a threshold level. They show how usual models for inspection, periodic replacement, block replacement, parallel systems, and cumulative damage can be transformed into replacement models with threshold levels.
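Beichelt's renewal-type equations for $M(t)$ and $N(t)$ above can also be solved directly on a time grid instead of inverting Laplace transforms. The sketch below is one simple way to do so, under stated assumptions: it uses a first-order Riemann-Stieltjes discretisation (not a method taken from Beichelt [18]), and the function name, grid size, and the constant $h$ and $p$ in the example are hypothetical. For constant $h(x) = \lambda$ and $p(x) = p$, the exact answers are $N(t) = p\lambda t$ and $M(t) = (1 - p)\lambda t$, which gives a convenient check.

```python
import math

def solve_beichelt(h, p, T, n=500):
    """Approximate (M(T), N(T)) from the renewal-type equations on a grid of n steps.

    h(x): failure rate; p(x): probability that a failure at age x is of type II.
    """
    dt = T / n
    t = [i * dt for i in range(n + 1)]
    # Gbar(t) = exp(-int_0^t p(x) h(x) dx), integrated with the trapezoidal rule.
    ph = [p(x) * h(x) for x in t]
    Gbar, acc = [1.0], 0.0
    for i in range(1, n + 1):
        acc += 0.5 * (ph[i - 1] + ph[i]) * dt
        Gbar.append(math.exp(-acc))
    G = [1.0 - g for g in Gbar]
    # L(t) = int_0^t Gbar(x) h(x) dx - G(t)
    gh = [Gbar[i] * h(t[i]) for i in range(n + 1)]
    L, acc = [0.0], 0.0
    for i in range(1, n + 1):
        acc += 0.5 * (gh[i - 1] + gh[i]) * dt
        L.append(acc - G[i])
    # Increments of G over each grid cell, used as Stieltjes weights.
    dG = [0.0] + [G[j] - G[j - 1] for j in range(1, n + 1)]
    M = [0.0] * (n + 1)
    N = [0.0] * (n + 1)
    for i in range(1, n + 1):
        M[i] = L[i] + sum(M[i - j] * dG[j] for j in range(1, i + 1))
        N[i] = G[i] + sum(N[i - j] * dG[j] for j in range(1, i + 1))
    return M[n], N[n]

# Illustrative check with constant failure rate 1.0 and type II probability 0.3.
M_T, N_T = solve_beichelt(lambda x: 1.0, lambda x: 0.3, T=5.0)
cost = (1.0 * M_T + 4.0 * N_T + 2.0) / 5.0   # C(T) with assumed c1=1, c2=4, c3=2
```

The scheme is only first-order accurate in the step size, so a finer grid or a higher-order quadrature would be needed where precision matters.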


3.1.2 Inventory Models

When the procurement lead-time is not negligible, an ordering policy should determine when to order a spare and when to replace the operating unit. Park and Park [136] propose two ordering policies for replacement with MR which include the 'periodic replacement with MR at failure' of Barlow and Hunter [14] as a special case. Let $t_0$ represent the planned order time and $t_1$ the planned replacement time measured from $t_0$. The ordering lead-time has pdf $g(x)$, Cdf $G(x)$, and mean $m$.

In Policy 1, an order for a spare is placed at time $t_0$ and the original unit is replaced as soon as the ordered spare is delivered. The expected cycle length in this case is $E[\text{cycle}] = t_0 + m$, and the cost rate is

$$c(t_0) = \frac{c_p + \int_0^\infty H(t_0 + x)g(x)\,dx}{t_0 + m},$$

where $c_p$ is the cost of a replacement and $H(t)$ is the cumulative hazard of a unit.

In Policy 2, the original unit is replaced at time $t_0 + t_1$ if a spare is available, or as soon as the ordered spare arrives. The expected cycle length becomes

$$E[\text{cycle}] = t_0 + t_1 + \int_{t_1}^\infty [1 - G(x)]\,dx,$$

while the cost rate becomes

$$c(t_0, t_1) = \frac{c_p + H(t_0 + t_1)G(t_1) + \int_{t_1}^\infty H(t_0 + x)g(x)\,dx + c_h \int_0^{t_1} G(x)\,dx}{t_0 + t_1 + \int_{t_1}^\infty [1 - G(x)]\,dx},$$

where $c_h$ is the cost of holding a spare. Park and Park also extend their results to the case when replacement between failures is postponed till the next failure, as in Muth [119]. The cost rates of Policies 1 and 2 are, respectively, modified to

$$c(t_0) = \frac{c_p + \int_0^\infty H(t_0 + x)g(x)\,dx + c_h \int_0^\infty r(t_0 + x)g(x)\,dx}{t_0 + m + \int_0^\infty r(t_0 + x)g(x)\,dx},$$

and

$$c(t_0, t_1) = \frac{N(t_0, t_1)}{t_0 + t_1 + \int_{t_1}^\infty [1 - G(x)]\,dx + r(t_0 + t_1)G(t_1) + \int_{t_1}^\infty r(t_0 + x)g(x)\,dx},$$

where

$$N(t_0, t_1) = c_p + H(t_0 + t_1)G(t_1) + \int_{t_1}^\infty H(t_0 + x)g(x)\,dx + c_h\left[\int_0^{t_1} G(x)\,dx + r(t_0 + t_1)G(t_1) + \int_{t_1}^\infty r(t_0 + x)g(x)\,dx\right],$$

and $r(t)$ is the mean residual life of a unit.

Sheu [161] generalizes Park and Park [136] by considering a general ordering policy with number-dependent MR and random lead-time according to the following scheme. A system has failure rate function $r(t)$ and cumulative hazard function $\Lambda(t)$. There are two types of failures: a type I failure (minor failure) is removed by a MR, whereas a type II failure (catastrophic failure) is removed by a replacement. If a type II failure occurs before a specified time $T_0$, then an expedited order is made at the time of the type II failure. Otherwise, a regular order is made at time $T_0$. The replacement policy can be summarized as follows:

1. If a type II failure occurs before $T_0$, then the system is shut down and replaced by the spare as soon as the spare is delivered.
2. If a type II failure occurs between $T_0$ and the arrival of the regular ordered spare, then the system is shut down and replaced by the spare as soon as the spare is delivered.
3. If a type II failure occurs between the arrival of the regular ordered spare and $T_0 + T_1$, where $T_1$ is measured from the ordering time $T_0$, then the delivered spare is put into inventory and the system is replaced by that spare at the time of the type II failure.
4. If the regular ordered spare arrives before $T_0 + T_1$ and the type II failure occurs after $T_0 + T_1$, then the delivered spare is put into inventory and the system is replaced by that spare at time $T_0 + T_1$.
5. If the regular ordered spare arrives after $T_0 + T_1$ and no type II failure occurs before the arrival of the regular ordered spare, then the system is replaced by the spare as soon as the spare is delivered.

The random lead-time $L_e$ of an expedited order has pdf $k_e(x)$ and finite mean $u_e$. The random lead-time $L_r$ of a regular order has pdf $k_r(x)$ and finite mean $u_r$. The expected duration of a replacement cycle for this model is given by

$$E[\text{cycle}] = u_e[1 - \bar H(T_0)] + u_r \bar H(T_0) + \int_0^{T_0} \bar H(y)\,dy + \int_0^{T_1}\int_{T_0 + x}^{T_0 + T_1} \bar H(y)\,dy\,k_r(x)\,dx,$$

where $\bar H(y)$, the Sf of the waiting time until the first type II failure of the system, is given by

$$\bar H(y) = \sum_{k=0}^{\infty} \frac{e^{-\Lambda(y)}[\Lambda(y)]^k}{k!}\,\bar P_k,$$

and $\bar P_k$ is the probability that the first $k$ failures are type I failures and is assumed to be known. Introducing the cost $c_e$ incurred for each expedited order made up to time $T_0$, the cost $c_r$ incurred for each regular order made at time $T_0$, the cost $c_s$ per unit time resulting from the shortage, the cost $c_h$ per unit time for holding an item in stock, and the cost $c_k$ of the $k$th MR, the expected cost per unit time in the long run is

$$C(T_0, T_1) = \frac{E[R_1]}{u_e[1 - \bar H(T_0)] + u_r \bar H(T_0) + \int_0^{T_0} \bar H(y)\,dy + \int_0^{T_1}\int_{T_0 + x}^{T_0 + T_1} \bar H(y)\,dy\,k_r(x)\,dx},$$

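where

$$\begin{aligned}
E[R_1] ={}& c_e[1 - \bar H(T_0)] + c_r \bar H(T_0) + c_s\left(u_e H(T_0) + u_r \bar H(T_0) - \int_0^\infty \int_{T_0}^{T_0 + x} \bar H(y)\,dy\,k_r(x)\,dx\right) \\
&+ c_h \int_0^{T_1}\int_{T_0 + x}^{T_0 + T_1} \bar H(y)\,dy\,k_r(x)\,dx \\
&+ \int_0^{T_1} \sum_{k=0}^{\infty} c_{k+1}\bar P_{k+1} \int_0^{T_0 + T_1} \frac{e^{-\Lambda(y)}[\Lambda(y)]^k}{k!}\,r(y)\,dy\,k_r(x)\,dx \\
&+ \int_{T_1}^{\infty} \sum_{k=0}^{\infty} c_{k+1}\bar P_{k+1} \int_0^{T_0 + x} \frac{e^{-\Lambda(y)}[\Lambda(y)]^k}{k!}\,r(y)\,dy\,k_r(x)\,dx,
\end{aligned}$$

with $H(T_0) = 1 - \bar H(T_0)$.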
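The survival function $\bar H(y)$ that drives Sheu's cycle length and cost is a Poisson mixture of the probabilities $\bar P_k$, so it is straightforward to evaluate numerically. The following minimal sketch truncates the series after a fixed number of terms; the function name, the truncation length, and the example inputs (a Weibull-type cumulative hazard and $\bar P_k = q^k$) are illustrative assumptions, not part of Sheu [161]. With $\bar P_k = q^k$ the series sums to $e^{-(1 - q)\Lambda(y)}$, which provides a check.

```python
import math

def survival_first_type2(cum_hazard, p_bar, y, terms=200):
    """Sf of the time to the first type II failure.

    Evaluates sum_k exp(-L) * L**k / k! * p_bar(k) with L = cum_hazard(y),
    truncated after `terms` terms (an assumed, generous cut-off).
    """
    lam = cum_hazard(y)
    weight = math.exp(-lam)          # Poisson probability of k = 0 failures
    total = 0.0
    for k in range(terms):
        total += weight * p_bar(k)
        weight *= lam / (k + 1)      # Poisson recursion from k to k + 1
    return total

# Illustrative inputs: Lambda(y) = (y / 10) ** 2 and p_bar(k) = 0.8 ** k.
value = survival_first_type2(lambda y: (y / 10.0) ** 2, lambda k: 0.8 ** k, y=15.0)
```

For a large cumulative hazard the truncation length would need to grow with $\Lambda(y)$ to keep the neglected Poisson tail small.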

Chien [38] considers a variation of the model of Sheu [161] with two types of failures, age-dependent MR, and random lead-time, and incorporates a new decision variable, the allowable inventory time $T_h$ for a spare. The allowable inventory period is measured from the instant the ordered spare is delivered. Using the same notation as Sheu [161], the replacement policy can be summarized as follows:

1. If a type II failure occurs before $T_0$, then the system is shut down and replaced by the expedited ordered spare as soon as the spare is delivered.
2. If a type II failure occurs between $T_0$ and $T_0 + L_r$, then the system is shut down and replaced by the regular ordered spare as soon as the spare is delivered.
3. If a type II failure occurs between $T_0 + L_r$ and $T_0 + L_r + T_h$, then the delivered spare is placed into the inventory, and the system is replaced by that spare at the time of the type II failure.
4. If a type II failure occurs after $T_0 + L_r + T_h$, then the delivered spare is placed into the inventory, and the system is replaced by that spare at time $T_0 + L_r + T_h$.

The expected duration of the replacement cycle is then

$$E[\text{cycle}] = u_e F_p(T_0) + u_r \bar F_p(T_0) + \int_0^{T_0} \bar F_p(y)\,dy + \int_0^\infty \int_{T_0 + x}^{T_0 + x + T_h} \bar F_p(y)\,dy\,k_r(x)\,dx,$$

where $\bar F_p(y) = e^{-\int_0^y p(x)r(x)\,dx}$, while the expected cost per unit time in the long run is given by

$$C(T_0, T_h) = \frac{E[R_1]}{u_e F_p(T_0) + u_r \bar F_p(T_0) + \int_0^{T_0} \bar F_p(y)\,dy + \int_0^\infty \int_{T_0 + x}^{T_0 + x + T_h} \bar F_p(y)\,dy\,k_r(x)\,dx},$$

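where

$$\begin{aligned}
E[R_1] ={}& c_e F_p(T_0) + c_r \bar F_p(T_0) + c_h \int_0^\infty \int_{T_0 + x}^{T_0 + x + T_h} \bar F_p(y)\,dy\,k_r(x)\,dx \\
&+ c_s\left(u_e F_p(T_0) + u_r \bar F_p(T_0) - \int_0^\infty \int_{T_0}^{T_0 + x} \bar F_p(y)\,dy\,k_r(x)\,dx\right) \\
&+ \int_0^\infty \int_0^{T_0 + x + T_h} \bar F_p(y)h(y)q(y)r(y)\,dy\,k_r(x)\,dx.
\end{aligned}$$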

Arguing that the policy maximizing profit rate does not discriminate among large and small investments, Chen and Chien [32, 33] also modify the model of Sheu [161] by using the cost effectiveness as an alternative criterion. The cost effectiveness is defined as (availability)/(expected cost rate). They claim that this criterion is suitable for reflecting efficiency per dollar spent and is useful for the effective use of available money, especially when the benefits obtained from investment are difficult to quantify. The replacement policy can be summarized as follows:

1. A type II failure occurs before the scheduled ordering time $T$; i.e., an expedited order is made and the failed system is replaced correctively by the spare as soon as the spare is delivered.
2. A type II failure occurs between $T$ and the arrival of the ordered spare; i.e., a regular order is made and the failed system is replaced correctively by the spare as soon as the spare is delivered.
3. No type II failure occurs before the arrival of the ordered spare; i.e., a regular order is made and the un-failed system is replaced preventively by the spare as soon as the spare is delivered.

Using the same notation as Sheu [161], the expected cycle length is given by

$$E[\text{cycle}] = u_r - (u_r - u_e)F_p(T) + \int_0^T \bar F_p(x)\,dx.$$

Now the cost effectiveness per unit time in the long run is given by

$$CE(T) = \frac{\text{availability}}{\text{expected cost rate}} = \frac{\text{expected up time in a cycle}}{\text{expected cost per cycle}} = \frac{U(T)}{C(T)},$$

where

$$U(T) = \int_0^\infty \int_0^{T + y} \bar F_p(x)\,dx\,k_r(y)\,dy,$$

and

$$\begin{aligned}
C(T) ={}& (c_r + c_p) + \left[(c_e - c_r) - c_d(u_r - u_e)\right]F_p(T) + (c_c - c_p)\int_0^\infty F_p(T + y)k_r(y)\,dy \\
&+ c_d \int_0^\infty \int_T^{T + y} F_p(x)\,dx\,k_r(y)\,dy - v_s \int_0^\infty \int_{T + y}^{\infty} \bar F_p(x)\,dx\,k_r(y)\,dy \\
&+ \int_0^\infty \int_0^{T + y} \bar F_p(x)h(x)q(x)r(x)\,dx\,k_r(y)\,dy.
\end{aligned}$$

The unit costs are defined as follows: $c_e$ ($c_r$) is the cost for an expedited (regular) order, $c_c$ ($c_p$) is the cost for a corrective (preventive) replacement, $c_d$ is the cost rate resulting from system downtime, and $v_s$ is the salvage value per unit time for the residual lifetime of an un-failed system.
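To make the simplest ordering model of this subsection concrete, the sketch below evaluates the Policy 1 cost rate of Park and Park [136], $c(t_0) = \left[c_p + \int_0^\infty H(t_0 + x)g(x)\,dx\right]/(t_0 + m)$, exactly as stated earlier (the formula carries no explicit MR cost coefficient). The exponential lead-time density, the quadratic cumulative hazard in the example, the midpoint quadrature, and all names are illustrative assumptions. For $H(t) = (t/\eta)^2$ and an exponential lead-time with mean $m$, the integral equals $(t_0^2 + 2t_0 m + 2m^2)/\eta^2$, which is used as a check.

```python
import math

def policy1_cost_rate(t0, cp, m, H, upper_mult=40.0, n=20000):
    """Cost rate (cp + E[H(t0 + X)]) / (t0 + m), X ~ exponential lead-time with mean m.

    The expectation is computed with a midpoint rule on [0, upper_mult * m];
    the truncation point and step count are assumed, not prescribed by the source.
    """
    upper = upper_mult * m
    dx = upper / n
    integral = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        g = math.exp(-x / m) / m          # exponential lead-time pdf
        integral += H(t0 + x) * g * dx
    return (cp + integral) / (t0 + m)

# Illustrative values: H(t) = (t / 50) ** 2, mean lead-time 5, replacement cost 3.
rate = policy1_cost_rate(t0=40.0, cp=3.0, m=5.0, H=lambda t: (t / 50.0) ** 2)
```

Scanning `t0` over a grid with this function gives a rough picture of where the ordering time should be placed for the assumed inputs.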

3.1.3 Shock Models

We have already described the system studied by Sheu [160] under an age replacement policy. Using a result of Savits [151], Sheu [160] derives the expected long-run cost per unit time for the block policy from the expected long-run cost per unit time for the age policy. Denote by $J_A(T)$ and $J_B(T)$ the expected long-run costs per unit time for the age policy and the block policy, respectively. Then

$$J_A(T) = A(T)/E[Y \wedge T], \qquad (5)$$

$$J_B(T) = B(T)/T, \qquad (6)$$

where $Y$ is the random variable with Sf

$$\bar G(y) = \sum_{k=0}^{\infty} \frac{e^{-\Lambda(y)}[\Lambda(y)]^k}{k!}\,\bar P_k.$$

Also, the operational costs $A(T)$ and $B(T)$ over the renewal intervals are related by

$$B(T) = \int_{[0, T)} A(T - x)\,dU(x), \qquad (7)$$

where $U(x)$ is the renewal function generated by the random variable $Y$, so that

$$J_B(T) = \frac{1}{T}\left\{R_2 + R_1 V(T) + \int_0^T [1 + V(T - t)]\left[\sum_{k=0}^{\infty} \frac{e^{-\Lambda(t)}[\Lambda(t)]^k}{k!}\,\bar P_k\left(\xi_{k+1}(t)q_{k+1}r(t) + \nu_k(t)\right)\right] dt\right\},$$

where $V(x) = U(x) - 1$. Sheu [160] also computes the expression of the total $\alpha$-discounted cost for the block replacement policy from the total $\alpha$-discounted cost for the age replacement policy.

Sheu and Griffith [171] generalize the block replacement policy of Sheu [160] to the case where a system is not necessarily replaced by a new one but can be replaced by a used one. Under such a policy, an operating system is preventively replaced by a new one at times $iT$ $(i = 1, 2, \ldots)$ independently of its failure history. If the system fails in $((i - 1)T, iT - \delta)$ it is either replaced by a new one or minimally repaired, and if it fails in $[iT - \delta, iT)$ it is either replaced by a used one or minimally repaired. The expression of the expected cost rate with $T$ and $\delta$ as decision variables is quite complex and no optimization is attempted to derive the optimal values.

3.1.4 Warranted Systems

Pascual and Ortega [138] consider equipment that receives three kinds of maintenance actions: MR (as good as before the failure), imperfect repair (between as good as after the previous repair and as good as before the repair), or replacement (as good as new). The equipment receives $n - 1$ imperfect repairs during its life. The interval between imperfect repairs $T_s$ is constant and the life-cycle is given by $T = nT_s$. The quality of an imperfect repair (as well as its cost) depends on the improvement factor $p$ $(0 < p < 1)$. The decision variables being the number $n$ of imperfect repairs, the life-cycle duration $T$, and the warranty interval $T_w$, the total expected cost per unit time is given by

$$C(T, T_w, n) = \frac{c_r + c_o(p)(n - 1) + c_{fm}\int_0^T \lambda(t)\,dt + c_{im}\int_{T_w}^T \lambda(t)\,dt}{T},$$

where $c_{im}$ is the repair cost, $c_{fm}$ is the downtime cost of a repair, $c_o(p)$ is the cost of an imperfect repair, $c_r$ is the cost of a replacement, and $\lambda(t)$ is the equipment failure rate.


3.1.5 Inspection Models

Mohandas et al. [111] consider a system in which failures are detected only by inspection. A MR is performed when the system is found to be in a failed state during an inspection, unless the inspection coincides with a pre-set replacement time, in which case the system is replaced. Here a failure means that the system starts producing items outside tolerance limits. If the system is found to be in a failed state during an inspection, the entire production output during the interval prior to it is scrapped. As such, the revenue ($r$ units per unit time) accrues only for the expected duration the system is known (for sure) to be in good operating condition. An inspection is always carried out at time $T_j$ to see whether the system has failed and, if so, to reject the items produced in the interval prior to it. The time required for a replacement is $t_r$. The inspection times are denoted by $t_i$ for $i = 1, 2, \ldots, n(j)$, where $t_{n(j)} = T_j$. Also, denote by $F(t)$ and $G(t)$ the Cdf and Sf of the system's failure time distribution, respectively, and consider the following costs: (1) the cost of MR is $KF(t)$, where $t$ refers to the operating age of the equipment; (2) the cost of an inspection is $I$; (3) the operating cost per unit time is $c$; and (4) the cost of a replacement is $R_0$. Then, the profit per unit time for a given replacement time is given by

$$P(t_1, \ldots, t_{n(j)}; n(j)) = \frac{N(t_1, \ldots, t_{n(j)}; n(j))}{T_j + t_r},$$

where

$$N(t_1, \ldots, t_{n(j)}; n(j)) = r\sum_{i=1}^{n(j)} \frac{G(t_i)}{G(t_{i-1})}(t_i - t_{i-1}) - cT_j - I\,n(j) - K\sum_{i=1}^{n(j)-1} \frac{G(t_{i-1}) - G(t_i)}{G(t_{i-1})}\,[1 - G(t_i)] - R_0.$$

Note that the decision variables are $t_i$ $(i = 1, \ldots, n(j))$ and $n(j)$.

Hariga and Azaiez [66] also consider a production facility subject to failure at random times in which failures are detected only by inspection. The process can be in either an 'in-control' or an 'out-of-control' state. During the 'in-control' state, the process generates a net profit $p_i$ per unit of time, while during the 'out-of-control' state it yields a smaller profit $p_o$ per unit of time. The elapsed time of the process in the 'in-control' state before shifting to the 'out-of-control' state is a random variable $t$ with known reliability function $R(t)$. At predetermined times $t_j$ $(j = 1, 2, \ldots, n)$, inspections are carried out at cost $c_i$ each to detect the operating state of the production facility. In case an 'out-of-control' state is detected, a MR at cost $c_r$ is performed to bring the process back to the 'in-control' state. After the $n$th inspection, further repairs may no longer be justified economically, at which point the production process should be replaced at a cost $c_p$. The expected net profit per unit of time generated from the operation of the production facility is given by

$$P(t_1, \ldots, t_n; n) = \frac{\sum_{j=1}^{n} \dfrac{(p_i - p_o)\int_{t_{j-1}}^{t_j} R(t)\,dt + c_r R(t_j)}{R(t_{j-1})} - n(c_i - c_r) - c_p}{t_n}.$$

The decision variables are the inspection frequency $n$ and the inspection times $t_j$ $(j = 1, 2, \ldots, n)$.

3.1.6 Improving and Deteriorating Systems

Deshpande and Singh [55] investigate the effects of ageing of 'the distribution of time to first failure' on improvement and deterioration of a repairable system subject to MRs. Denoting by $C$ the system replacement cost, by $C_i$ the cost incurred for MR $i$, and by $q(i, t)$ the cost rate of system maintenance at time $t$ between the $i$th and $(i + 1)$th failures, they use, for the expected cost per unit time incurred during $[0, T]$,

$$C(T) = \frac{C + E\left[\sum_{i=1}^{N(T)} C_i\right] + \int_0^T E\left[q(N(u), u)\right] du}{T},$$

where $T$ is the replacement time and the random variable $N(t)$ represents the number of failures in $[0, t]$.

3.1.7 Imperfect Repair Models

Sheu et al. [178] apply a PM at time $jT$, resulting with probability $q_j$ in the unit having the same failure rate as before PM (imperfect PM), and with probability $\theta_j = 1 - q_j$ in an as good as new unit (perfect PM). Three repair models are considered.

1. Model 1: the system is renewed at each failure between PMs via major repair. If a failure occurs before the scheduled PM, the major repair is performed immediately; otherwise preventive maintenance is performed. After a major repair, the system returns to age 0. The expected duration of a replacement cycle is given by

$$E[\text{cycle}] = \sum_{j=1}^{\infty} (\bar P_{j-1} - \bar P_j)\int_0^{jT} \bar F(t)\,dt,$$

where $\bar P_j$ is the probability that the first $j$ PMs are imperfect and $\bar F(t)$ is the Sf of the time to failure of a unit. The expected cost per unit time is

$$C(T) = \frac{R_1 \sum_{j=1}^{\infty} \bar P_{j-1}\bar F(jT) + R_2\left[1 - \sum_{j=1}^{\infty} (\bar P_{j-1} - \bar P_j)\bar F(jT)\right]}{\sum_{j=1}^{\infty} (\bar P_{j-1} - \bar P_j)\int_0^{jT} \bar F(t)\,dt},$$

where $R_1$ is the cost of each PM and $R_2$ is the cost of each major repair.

2. Model 2: the system undergoes only MR at failures between PMs. If a failure occurs before the scheduled PM, MR is performed immediately. The expected cost per unit time of this model is given by

$$C(T) = \frac{1}{T}\left[R_1 + R_3\,\frac{\sum_{j=1}^{\infty} (\bar P_{j-1} - \bar P_j)\int_0^{jT} r(t)\,dt}{\sum_{j=1}^{\infty} \bar P_{j-1}}\right],$$

where $R_3$ is the cost of each MR and $r(t)$ is the hazard rate of a unit.

3. Model 3: failures are not fixed until a perfect PM is achieved. If a failure occurs before the scheduled PM, an imperfect repair can be made immediately, but it does not fix the failure; failures are only fixed by perfect maintenance. The expected cost per unit time of this model is given by

$$C(T) = \frac{1}{T}\left[R_1 + R_4\,\frac{\sum_{j=1}^{\infty} (\bar P_{j-1} - \bar P_j)\int_0^{jT} F(t)\,dt}{\sum_{j=1}^{\infty} \bar P_{j-1}}\right],$$

where $F(t) = 1 - \bar F(t)$ and $R_4$ is the cost per unit of time of the time elapsed between failure and perfect maintenance. For each model, the decision variable is the PM time $T$.

3.1.8 General Repair Models

Nakagawa [121] considers an imperfect PM in which the unit is minimally repaired if it fails. The PM does not return the time origin all the way to zero but to $x$. The failure rate of the unit after PM is improved (reduced to a fraction of its value just before PM) to become $\lambda(x) = g(c_1, \theta)\lambda(x + T)$, where $c_1$ is the amount of resource consumed in PM, $0 \le g < 1$, and $\theta$ is some parameter in $g(c_1, \theta)$. The expected cost rate is given by

$$C(x, c_1) = \frac{c_1 + c_2\int_0^T \lambda(t + x)\,dt}{T},$$

where $c_2$ is the cost of a MR at failure.

Kijima et al. [88] consider the general repair model in which the state of a failed system is brought to a level somewhere between completely new and prior to failure. Assuming all repairs are of equal degree $\theta$, they use Kijima's model I, where the virtual age $V_n$ after the $n$th repair satisfies $V_n = V_{n-1} + \theta X_n$, where $X_n$ is the $n$th failure time. Let

$$q(t \mid x) = \frac{f(t + \theta x)}{1 - F(\theta x)},$$

where $f(x)$ and $F(x)$ are the lifetime pdf and Cdf, respectively, of a new system. The function $h(t)$ satisfying the integral equation

$$h(t) = q(t \mid 0) + \int_0^t h(x)q(t - x \mid x)\,dx,$$

is called a g-renewal density and

$$H(t) = \int_0^t h(y)\,dy,$$

is called a g-renewal function. Using $H(t)$, the cost function for this model is given by

$$C(T) = \frac{c_0 + c_1 H(T)}{T},$$

where $c_0$ is the replacement cost and $c_1$ is the repair cost.

Nakagawa [123] assumes that PM is done at fixed intervals $x_k$ $(k = 1, \ldots, N - 1)$ and replacement is done at PM $N$. If the system fails between PMs, it undergoes MR. He also assumes that the PM is imperfect and introduces improvement factors in either the hazard rate or the age of the system. The following two PM policies are thus considered:

1. In Model A, the hazard rate after PM $k$ becomes $a_k h(t)$ when it was $h(t)$ in period $k$ of PM. The expected length of a replacement cycle is given by $E[\text{cycle}] = x_1 + \cdots + x_N$, and the mean cost rate of the system is given by

$$C(x_1, \ldots, x_N) = \frac{c_1\sum_{k=1}^{N} A_k H(x_k) + (N - 1)c_2 + c_3}{x_1 + \cdots + x_N}.$$

2. In Model B, the age after PM $k$ reduces to $b_k t$ when it was $t$ before PM. The expected length of a replacement cycle is given by

$$E[\text{cycle}] = \sum_{k=1}^{N-1} (1 - b_k)x_k + x_N,$$

and the mean cost rate of the system is given by

$$C(x_1, \ldots, x_N) = \frac{c_1\sum_{k=1}^{N} \int_{b_{k-1}x_{k-1}}^{x_k} h(t)\,dt + (N - 1)c_2 + c_3}{\sum_{k=1}^{N-1} (1 - b_k)x_k + x_N}.$$

For both models, $c_1$ is the cost of the MR at failure, $c_2$ is the cost of a scheduled PM, and $c_3$ is the cost of a replacement. Also, $H(t)$ is the cumulative hazard of the system and $A_k = \prod_{j=0}^{k-1} a_j$, where $a_k$ is the improvement factor in hazard rate in period $k$ of PM with $1 = a_0 < a_1 \le a_2 \le \cdots \le a_{N-1}$.

Zhang and Jardine [199] consider a system that undergoes a MR whenever a failure occurs and that is completely renewed whenever it reaches a certain age after the last renewal. In the cycle between two consecutive renewals, a fixed number $m$ of overhauls are performed, dividing the cycle into $(m + 1)$ periods of equal length $\tau$. An overhaul improves the system, while a MR returns the system to the condition just before that failure. Denoting by $v_{k-1}(t)$ the system failure rate function just before the overhaul, by $v_k(t)$ the failure rate function right after the overhaul, and letting $p \in [0, 1]$, Zhang and Jardine assume that an overhaul improves the system by a degree $p$ if, for all $t$ after this overhaul,

$$v_k(t) = p\,v_{k-1}(t - \tau) + (1 - p)\,v_{k-1}(t). \qquad (8)$$

The expected unit-time cost over an infinite time horizon is given by

$$C(n, \tau) = \frac{c_r + (n - 1)c_o + c_m\tilde H(n\tau)}{n\tau},$$

where $n = m + 1$; $c_m$, $c_o$, $c_r$ are the costs of MR, overhaul, and renewal, respectively;

$$\tilde H(n\tau) = \sum_{i=0}^{n} \binom{n}{i} p^{n-i}(1 - p)^{i-1} H(i\tau),$$

and $H(t)$ is the originally expected number of failures in the interval $[0, t)$. Note that $n$ and $\tau$ are the decision variables.

Zhang and Love [200] study a repairable system subject to failure. At each failure epoch, a general repair that returns the system to a working condition somewhere between 'good-as-new' (a perfect repair) and 'bad-as-old' (a MR) is performed. They use a Markov structure (an absorbing Markov chain) consistent with the Kijima Type-II repair model to discretize the virtual age of the system and regard each virtual age as a state of the system. The repair time is not assumed to be negligible. For $v = 0, 1, \ldots, V$, let

$$p_v = \int_0^h \frac{f(x + vh)}{1 - F(vh)}\,dx,$$

where $f(t)$ and $F(t)$ are the failure density function and Cdf, respectively, and $h$ is the length of a transition step. Two policies are investigated in this research:

1. Overhauls are performed at fixed intervals. In this case, the expected length of a replacement cycle is given by

$$E[\text{cycle}] = \left[\sum_{x=0}^{V-1} \frac{1}{1 - p_x} + \sum_{y=0}^{V-1} \frac{p_y}{1 - p_y} + 1\right] h,$$

while the long-run average cost is given by

$$C(V) = \frac{c_0 + c_1\sum_{y=0}^{V-1} \frac{p_y}{1 - p_y}}{\left[\sum_{x=0}^{V-1} \frac{1}{1 - p_x} + \sum_{y=0}^{V-1} \frac{p_y}{1 - p_y} + 1\right] h},$$

where $c_0$ is the cost of a replacement and $c_1$ is the cost of a repair.

2. Overhauls are performed at variable intervals, on the first failure following a predetermined time. In this case, the expected length of a replacement cycle is given by

$$E[\text{cycle}] = \left[\sum_{x=0}^{V-1} \frac{1}{1 - p_x} + \sum_{y=0}^{V-1} \frac{p_y}{1 - p_y} + 1\right] h + \xi(V),$$

where $\xi(V)$ represents the expected time to the next failure given that the virtual state is $V$, while the long-run average cost is given by

$$C(V) = \frac{c_0 + c_1\sum_{y=0}^{V-1} \frac{p_y}{1 - p_y}}{\left[\sum_{x=0}^{V-1} \frac{1}{1 - p_x} + \sum_{y=0}^{V-1} \frac{p_y}{1 - p_y} + 1\right] h + \xi(V)}.$$

For both policies, the decision variable is the time to replace the system.

Lin et al. [102] introduce two categories of failure modes, maintainable and nonmaintainable, into the modeling of preventive maintenance (PM) activities. PM only reduces the hazard rate of the maintainable failure modes of the system and does not affect the hazard rate of the nonmaintainable failure modes. PM is performed at a sequence of times $t_1, t_2, \ldots, t_{N-1}$ and the system is replaced at $t_N$. Minimal repair is performed at failures between PMs. Replacement restores the system to the as good as new state. Let $h_a(t)$ and $h_b(t)$ denote the hazard rates of the nonmaintainable and maintainable failure modes of the system, respectively. Also, let $a_k$ and $b_k$ denote the adjustment factor in hazard rate of the maintainable failure modes after the $k$th PM and the adjustment factor in effective age due to the $k$th PM, respectively, and let $A_k = \prod_{i=0}^{k-1} a_i$. The hazard rate due to the maintainable failure modes is $A_k h_b(t)$ between the $(k - 1)$th and the $k$th PMs.
The hazard rate of the nonmaintainable failure modes in this time interval is $h_a(t)$. Thus, the hazard rate of the system in the time interval $(t_{k-1}, t_k)$ is $h_k(t) = h_a(t) + A_k h_b(t)$. The scheduled PM intervals $x_k$ and the effective age $y_k$ of the system just before the $k$th PM are such that

$$y_k = x_k + b_{k-1}y_{k-1}, \qquad x_k = y_k - b_{k-1}y_{k-1}.$$

Hence the mean cost rate is

$$C = \frac{c_r + c_p(N - 1) + c_m\sum_{k=1}^{N} \left[H_k(y_k) - H_k(b_{k-1}y_{k-1})\right]}{\sum_{k=1}^{N-1} (1 - b_k)y_k + y_N},$$

where $c_m$ is the cost of minimal repair, $c_p$ is the cost of PM, $c_r$ is the cost of replacement, and $H_k(t)$ is the cumulative hazard of the system between the $(k - 1)$th and the $k$th PMs. Two alternatives are used to determine the PM intervals. In the first one, optimal PM intervals are selected to minimize the mean cost rate, i.e., the PM intervals are decision variables in the optimization problem. In the second one, PM intervals are determined by a hazard rate limit.

Seo and Bai [153] study a system with MR at failure, imperfect repair (they call it periodic overhaul), and replacement at the $N$th overhaul, similar to that of Zhang and Jardine [199]. Instead of (8), they assume that the hazard rate function $h_n(t)$ in the $n$th overhaul period, i.e., during $((n - 1)T, nT]$, is given by

$$h_{n+1}(v_n(T)) = h_n(V(v_{n-1}(T), T)), \qquad (9)$$

where $T$ is the scheduled interval between overhauls, $v_n(T)$ is the virtual age right after the $n$th overhaul, and $V(v, T)$ is a virtual age function of the system that specifies the functional relationship between $v$ and $T$. Then, taking $V(v, X) = v + \theta X$ where $0 \le \theta \le 1$, the virtual age at the $n$th overhaul becomes $v_n(T) = h_{n+1}^{-1}\left[h_n(v_{n-1}(T) + \theta T)\right]$. The expected cost rate over an infinite time horizon is given for this model by

$$C(N, T) = \frac{c_1\sum_{n=1}^{N} \int_{v_{n-1}(T)}^{v_{n-1}(T) + T} h_n(t)\,dt + (N - 1)c_2 + c_3}{NT},$$

where $c_1$ is the cost of MR at failure, $c_2$ is the cost of a scheduled overhaul, and $c_3$ is the cost of a replacement.

Zequeira and Berenguer [198] study a system with MR at failure, periodic overhaul (they call it imperfect repair), and replacement at the $N$th overhaul, similar to that of Zhang and Jardine [199] and Seo and Bai [153]. However, they assume two types of failure modes, maintainable and non-maintainable, and, instead of (8) and (9), they use the following system failure rate:

$$r_{k,T}(t) = h(t) + \lambda(t - (k - 1)T) + p_k(t)h(t), \qquad (k - 1)T \le t < kT, \qquad (10)$$

where $p_k(t)$ is a function that models the dependence between maintainable and non-maintainable failure modes for $t \in ((k - 1)T, kT]$, $k = 1, 2, \ldots, N$; $h(t)$ is the hazard rate of the non-maintainable failure modes; and $\lambda(t)$ is the hazard rate of the maintainable failure modes if $p_k(t) = 0$, $t \ge 0$. The cost rate for an infinite time span is then given by

$$C(T, N) = \frac{c_r + (N - 1)c_p + c_m\sum_{k=1}^{N} \int_{(k-1)T}^{kT} r_{k,T}(t)\,dt}{NT},$$

where $c_m$ is the cost of a MR, $c_p$ is the cost of a PM action, and $c_r$ is the cost of system replacement.

Pascual et al. [139] propose a model similar to that of Zhang and Jardine [199] and further assume that repair and overhaul times are not negligible. Their system is also subject to an aging process according to relation (8). They consider equipment that may fail over time, with MRs applied at each failure. It may also be subject to periodic overhauls. After some time $T_1$, the equipment is replaced by a new one. Repairs and overhauls take $T_r$ and $T_o$ time units to be performed. Preventive actions are applied at equal time intervals $T_s$. The life-cycle duration is an integer number $n$ of periods between overhauls, that is, $T_1 = nT_s$. Consequently, there are $n - 1$ overhauls during a life-cycle. The expected global cost per unit time for this model is given by

$$C(T_s, T_o, T_r, n) = \frac{n_r(nT_s)(c_{c,d} + c_{c,i}) + (n - 1)(c_{o,d} + c_{o,i}) + c_{R,g}}{nT_s},$$

where $c_{o,i}$ is the preventive intervention cost, $c_{o,d}$ is the downtime cost for overhauls, $c_{c,i}$ is the corrective intervention cost, $c_{c,d}$ is the downtime cost for repairs, and $c_{R,g}$ is the overall replacement cost. All these costs are derived. A non-linear mixed integer formulation that minimizes the expected overall cost rate with respect to repair, overhaul and replacement times and the overhaul improvement factor is presented.
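Several of the overhaul models in this subsection build on relation (8) of Zhang and Jardine [199]. As a small worked illustration, the sketch below evaluates their expected number of failures $\tilde H(n\tau) = \sum_i \binom{n}{i} p^{n-i}(1 - p)^{i-1} H(i\tau)$ and the resulting cost rate $C(n, \tau)$. The function names, the choice $H(t) = t^2$, and the cost values are assumptions made for the example only. The $i = 0$ term is skipped because $H(0) = 0$, which also avoids dividing by zero when $p = 1$. Two limiting cases give a check: $p = 0$ (overhauls have no effect) yields $H(n\tau)$, and $p = 1$ (each overhaul renews the failure rate) yields $nH(\tau)$.

```python
from math import comb

def expected_failures(n, tau, p, H):
    """Expected failures over n periods of length tau with overhaul degree p.

    Sums comb(n, i) * p**(n - i) * (1 - p)**(i - 1) * H(i * tau) for i = 1..n;
    the i = 0 term vanishes because H(0) = 0.
    """
    return sum(comb(n, i) * p ** (n - i) * (1 - p) ** (i - 1) * H(i * tau)
               for i in range(1, n + 1))

def overhaul_cost_rate(n, tau, p, H, c_r, c_o, c_m):
    """Cost rate C(n, tau) = (c_r + (n - 1) * c_o + c_m * H~(n * tau)) / (n * tau)."""
    return (c_r + (n - 1) * c_o + c_m * expected_failures(n, tau, p, H)) / (n * tau)

# Illustrative values: H(t) = t**2, two periods of unit length, overhaul degree 0.5.
H = lambda t: t ** 2
half = expected_failures(2, 1.0, 0.5, H)
rate = overhaul_cost_rate(2, 1.0, 0.5, H, c_r=10.0, c_o=2.0, c_m=1.0)
```

Evaluating `overhaul_cost_rate` over a grid of `n` and `tau` is one simple way to explore the two decision variables for an assumed $H(t)$.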

3.1.9 Multi-Unit Systems

Uematsu and Nishida [188] deal with a replacement model where a system undergoes MR before time $T$ and is replaced periodically at scheduled times $kT$ $(k = 1, 2, \ldots)$. The system is made up of a large number of components. Not all of the components in the system are necessarily in use all the time, and thus the system can go on working after the original failure of a particular component even if it is not repaired yet. In such a case, the system fails again a relatively short time later, when an attempt is made to use the failed component. This process continues until the failed component is found. Original failures of components constitute a main process. At each point of the main process an attempt is made to find the failure, the attempt succeeding with probability $1 - r$, independently of other attempts to find the main failure. If the attempt does not succeed, the failure recurs $S$ times after the initial occurrence. These failures, which are caused by a main failure, are called subsidiary failures. Introducing the costs $c_1$ to search for a failure, $c_2$ of a MR, and $c_3$ of a replacement, the expected cost per unit time is given by

$$C(T) = \frac{c_1\Lambda(T) + c_2\int_0^T \lambda(T - u)H(u)\,du + c_3}{T},$$

where $\Lambda(t) = \int_0^t \lambda(u)\,du$ is the mean number of main failures and $H(t)$ is the mean number of subsidiary failures.

Sheu [155] first considers a two-unit system where unit 1 contains components $C_1$ and $C_3$ while unit 2 contains components $C_2$ and $C_3$. Component $C_i$ has life distribution $F_i$. The time to failure of the system follows the multivariate exponential distribution with Sf given by

$$\bar F(x_1, x_2) = \bar F_1(x_1)\bar F_2(x_2)\bar F_3(\max(x_1, x_2)).$$

The two-unit system is completely replaced whenever it reaches age $T$ at a cost $c_0$. If component $C_i$ fails at age $y < T$, it causes failure of unit $i$ and undergoes MR. The cost of the $m$th MR is $c_{i,m}(y)$. Letting $\lambda_i(t)$ denote the failure rate function of $F_i$ and $N_i(t)$ denote the number of MRs performed on the component $C_i$ of age $t$, the expected long-run cost per unit of time is given by

$$C(T) = \frac{c_0 + \int_0^T \sum_{i=1}^{3} h_i(y)\lambda_i(y)\,dy}{T},$$

where $h_i(y) = E[c_{i,N_i(y)+1}(y)]$. Sheu [155] then extends his model to an $n$-unit system.

Sheu [156] generalizes Sheu [155] to the case where the cost of the $j$th MR of a component $C_i$ at age $y < T$ is $g_i(c_i(y), c_{i,j}(y))$, where $c_i(y)$ is the age-dependent random part and $c_{i,j}(y)$ is the deterministic part, which depends on the age and the number of the MR to the component. The formula of the expected long-run cost per unit of time is unchanged, except that now $h_i(y) = E_{N_i(y)}E_{c_i(y)}\left[g_i(c_i(y), c_{i,N_i(y)+1}(y))\right]$, where $N_i(T)$ denotes the number of MRs performed on component $i$ in the age interval $[0, T)$.

Monga et al. [113] are concerned with the reliability based design (RBD) of a mixed series-parallel system with deteriorative components. PM is performed on the system at times when the system reaches the maximum allowed failure rate. If the system fails in between these intervals, then MRs are performed. The system comprises $n$ subsystems in series.
Each subsystem consists of $(1 + m_j)$ identical components in active redundancy. PM is performed at intervals $x_i = t_i - t_{i-1}$. During $x_1$, the component has undergone no PM and the component failure rate is $a_{j0} h_j(x_1)$ with $a_{j0} = 1$. At the end of the first interval the component undergoes PM and hence its failure rate is now increased to $a_{j0} a_{j1} h_j(x_2)$. In general form, the hazard rate of a component of the $j$th subsystem during the $i$th interval is $h_{ji}(x_i) = A_{ji} h_j(x_i)$, where $A_{ji} = \prod_{k=0}^{i-1} a_{jk}$. Thus, $r_{ji}(x_i) = e^{-\int_0^{x_i} h_{ji}(t)\,dt}$ and the system reliability can be written

$$R(x_i) = \prod_{j=1}^{n} \left\{ 1 - \left[1 - r_{ji}(x_i)\right]^{1+m_j} \right\}.$$

Finally, the system hazard rate is $h(x_i) = -R'(x_i)/R(x_i)$. Now, the system is either maintained preventively or replaced when the system failure rate reaches $\xi$, which leads to the constraint $h(x_i) \le \xi$. The average annual cost of the system for $i$ PM intervals, before the $i$th PM action, is given by

$$C(m_1, \ldots, m_n) = \frac{\sum_{j=1}^{n} a_j (1 + m_j) AC_j + IC + (i-1) \sum_{j=1}^{n} (1 + m_j) MC_j + \sum_{j=1}^{n} CM_{ji}}{t_i},$$

where $a_j$ is the assembly coefficient of a component in the $j$th subsystem, $AC_j$ is the acquisition cost of a component in the $j$th subsystem, $IC$ is the installation cost of the system, $MC_j$ is the PM cost of subsystem $j$, and $CM_{ji}$ is the expected MR cost for subsystem $j$ during $[0, T_i]$. Note that the design problem consists in determining the optimal number $(1 + m_j)$ of parallel components in each subsystem.

Sheu and Jhang [172] propose a two-phase maintenance policy for a group of $N$ identical repairable units. They define the time interval $(0, T]$ as the first phase, and the time interval $(T, T + W]$ as the second phase. Units have two types of failures. Type I failures (minor failures) are removed by MRs (in both phases), whereas type II failures (catastrophic failures) are removed by replacements (in the first phase) or are left idle (in the second phase). During the second phase, if a unit fails at residual age $y$, where $y$ represents the time to failure measured from the instant the unit passed $T$, it is either left idle with probability $\tilde p(y)$ (type II failure), or it undergoes MR with probability $\tilde q(y) = 1 - \tilde p(y)$ (type I failure). A group maintenance is conducted at time $T + W$ or upon the $k$th idle, whichever comes first. Let $Y(T)$ be the excess or residual life of a unit in use at $T$, denote its failure rate function by $\tilde r(t)$, and let

$$\bar{\tilde F}_{\tilde p}(t) = e^{-\int_0^t \tilde p(x)\,\tilde r(x)\,dx}.$$

Let $W_1, W_2, \ldots, W_N$ be iid random variables with Sf $\bar{\tilde F}_{\tilde p}(t)$ and let $W_{(1)}, W_{(2)}, \ldots, W_{(N)}$ be the corresponding order statistics. Denote $s = \min\{W, W_{(k)}\}$. Then, the expected duration of a replacement cycle is given by $E[\text{cycle}] = T + E[s]$, where $E[s]$ is shown to be

$$E[s] = \sum_{i=0}^{k-1} \binom{N}{i} \int_0^W \left[\tilde F_{\tilde p}(t)\right]^{i} \left[\bar{\tilde F}_{\tilde p}(t)\right]^{N-i} dt.$$

Let $c_0$ be the fixed cost of inspecting the units; then the expected cost per unit time for an infinite time span is given by

$$C(T, W, k) = \frac{B(T) + E[K_m] + E[K_d] + E[K_o] + E[K_r] + c_0}{T + E[s]},$$


where $B(T)$ is the expectation of the total operational cost over $(0, T]$; $K_m$ is the total MR cost incurred over the time interval $(T, T + s]$; $K_d$ is the total downtime cost incurred over the time interval $(T, T + s]$; $K_o$ is the total overhaul cost at $T + s$; and $K_r$ is the total replacement cost at $T + s$. All these expressions have been derived and are not reproduced here for brevity. The resulting long-run average cost per unit time is, however, too complex, and no optimization is attempted to derive the optimal values of $T$, $W$ and $k$.

We have already described the system studied by Sheu and Jhang [173] under an age replacement policy. Using the results (5), (6), and (7) of Savits [151], Sheu and Jhang [173] derive the expected long-run cost per unit time for the block policy from the expected long-run cost per unit time for the age policy as

$$J_B(T) = \frac{1}{T} \left\{ R_2 + R_1 V(T) + \int_0^T \big(1 + V(T - t)\big) \sum_{i=1}^{N} \sum_{k=0}^{\infty} \left[ n_{k+1,i}(t)\, q_{k+1,i}\, r_i(t) + m_{k+1,i}(t) \right] \frac{e^{-\Lambda_i(t)} [\Lambda_i(t)]^k}{k!}\, P_{k,i}\, H_i(t)\, dt \right\}.$$

Sheu and Jhang [173] also compute the expression of the total $\alpha$-discounted cost for the block replacement policy from the total $\alpha$-discounted cost for the age replacement policy.

Sandve and Aven [150] study the optimal replacement problem of a monotone system comprising $n$ components, where the components are ‘minimally’ repaired at failures. They assume that the system is replaced at the stopping time $T$ so that

$$E[\text{cycle}] = E\left[\int_0^T dt\right].$$

Three categories of replacement policies are investigated.

1. The system (all components) is replaced at fixed time intervals $T$. In this case, $E[\text{cycle}] = T$.

2. The system is replaced at time $S$ or at the first failure after time $T$, whichever comes first. Let $Y_T$ denote the time to the first component failure after $T$. Then the time to replace the system is $\min\{Y_T, S\}$. In this case, the following approximation is used:

$$E[\text{cycle}] \approx T + \int_T^S e^{-\sum_{i=1}^{n} [\Lambda_i(t) - \Lambda_i(T)]}\, dt,$$

where $\Lambda_i(t)$ is the mean value function of the number of failures of component $i$ in $[0, t)$ when time is measured in operating time.


3. The system is replaced at a time which is dependent on the condition of the system. The states of components are monitored, and each time a component fails, a preventive replacement time is calculated. This replacement time depends on the history of the system, so that the system is replaced at this time or a new replacement time is calculated at the subsequent component failure, whichever comes first. Let $Y_1 < Y_2 < Y_3 < \cdots$ denote the times at which a component failure occurs and let $R_m$ denote the replacement time from $Y_m$, which is based on the history up to time $Y_m$. In this case,

$$E[\text{cycle}] = E\left[\sum_{m=0}^{\infty} \int_{Y_m}^{(Y_m + R_m) \wedge Y_{m+1}} dt\right].$$
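The approximation used for the second policy above can be evaluated numerically. The following is a minimal sketch, not a method from Sandve and Aven [150]: it assumes power-law mean value functions $\Lambda_i(t) = (t/\eta_i)^{\beta_i}$ and a composite Simpson rule, and every function and parameter name is hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def expected_cycle_policy2(T, S, components):
    """Approximate E[cycle] = T + int_T^S exp(-sum_i [L_i(t) - L_i(T)]) dt.

    components: list of (eta, beta) pairs. L_i(t) = (t / eta) ** beta is an
    assumed power-law mean value function, chosen only for illustration.
    """
    def mean_value(t):
        return sum((t / eta) ** beta for eta, beta in components)

    base = mean_value(T)
    return T + simpson(lambda t: math.exp(-(mean_value(t) - base)), T, S)
```

With a single component and $\beta = 1$ the integral has the closed form $\eta\,(1 - e^{-(S-T)/\eta})$, which gives a convenient check of the numerical routine.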

We have already described the system studied by Jhang and Sheu [80] under an age replacement policy. Using the results (5), (6), and (7) of Savits [151], Jhang and Sheu [80] derive the expected long-run cost per unit time for the block policy from the expected long-run cost per unit time for the age policy as

$$J_B(T) = \frac{1}{T} \left\{ R_2 + R_1 V(T) + \int_0^T \left[1 + V(T - t)\right] \sum_{i=1}^{N} \left[ h_i(t)\, q_i(t)\, r_i(t) + m_i(t) \right] G(t)\, dt \right\}.$$

Jhang and Sheu [80] also compute the expression of the total $\alpha$-discounted cost for the block replacement policy from the total $\alpha$-discounted cost for the age replacement policy.

Lin et al. [103] deal with the optimal design of a mixed series-parallel system with deteriorating components. Consider a system which consists of $n$ subsystems in series. The $j$th subsystem is a parallel system of $m_j$ identical components for $j = 1, 2, \ldots, n$. The system undergoes periodic PM at equal intervals of length $W/(N+1)$, where $W$ is the length of the warranty period and $N$ is the number of PMs during the warranty period. MR is performed at failures between two successive PMs during the warranty period. Imperfect repair is adopted to model the effect of PM. Let $x_k$ denote the age of the system immediately before the $k$th PM. It is assumed that the system ages from $d_k x_k$ to $x_{k+1}$, where $d_k$ is the age reduction factor. Both free and pro-rata warranty policies are considered. Under a free warranty policy, the PM cost is paid by the manufacturer; under a pro-rata warranty policy, the PM cost to the manufacturer is proportional to the length of the remaining warranty period. Under both policies, the MR cost is paid by the manufacturer. Denote by $c_0$ the installation or assembly cost of the system, independent of the number of components in each subsystem, by $c_j$ the cost of a component in the $j$th subsystem, by $c_{pj}$ the cost of PM of a component in the $j$th subsystem, and by $c_{rj}$ the cost of a MR of the $j$th subsystem. Then, under a free warranty policy, the total cost of the system is

$$C(m_1, \ldots, m_n, N) = c_0 + \sum_{j=1}^{n} \left[ (c_j + N c_{pj})\, m_j + c_{rj} \sum_{k=0}^{N} \ln \frac{R_{sj}(d_k x_k)}{R_{sj}(d_k x_{k+1})} \right],$$

while under a pro-rata warranty policy, it is

$$C(m_1, \ldots, m_n, N) = c_0 + \sum_{j=1}^{n} \left[ \left(c_j + \frac{N c_{pj}}{2}\right) m_j + c_{rj} \sum_{k=0}^{N} \ln \frac{R_{sj}(d_k x_k)}{R_{sj}(d_k x_{k+1})} \right].$$

Here $R_{sj}$ is the reliability of the $j$th subsystem. The optimal system design is a problem that considers the selection among various design configurations for minimal cost while meeting other constraints such as reliability. Thus, the objective is to select optimal values of $m_1, m_2, \ldots, m_n$, and $N$ to minimize the total cost of the system under the constraint that the system reliability at the end of the warranty period is no less than $R_0$, that is,

$$\prod_{j=1}^{n} R_{sj}(x_{N+1}) \ge R_0.$$
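The structure of this design problem can be illustrated with a deliberately simplified sketch. It is not the procedure of Lin et al. [103]: it treats the component reliabilities at the end of the warranty period as given inputs, keeps only the acquisition part $c_0 + \sum_j c_j m_j$ of the cost (the PM and MR terms are ignored), and searches the redundancy levels by exhaustive enumeration. All names and the bound `m_max` are hypothetical.

```python
import itertools
import math

def system_reliability(r, m):
    """Series system of parallel subsystems: prod_j [1 - (1 - r_j) ** m_j]."""
    return math.prod(1.0 - (1.0 - rj) ** mj for rj, mj in zip(r, m))

def cheapest_design(r, c, c0, R0, m_max=6):
    """Enumerate m_j in 1..m_max for every subsystem.

    Returns (cost, m) for the cheapest design whose reliability is at least
    R0, or None when no enumerated design satisfies the constraint.
    """
    best = None
    for m in itertools.product(range(1, m_max + 1), repeat=len(r)):
        if system_reliability(r, m) >= R0:
            cost = c0 + sum(cj * mj for cj, mj in zip(c, m))
            if best is None or cost < best[0]:
                best = (cost, m)
    return best
```

Exhaustive enumeration is only practical for a handful of subsystems; it is used here because it makes the constraint $\prod_j R_{sj} \ge R_0$ explicit rather than because it is efficient.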

Li and Xu [97] study a multi-component repairable system implementing a coordinated random group replacement (CRGR) policy, which, at each fixed maintenance epoch, only replaces a subset of the components determined by the probability law of a multivariate stochastic indicator process. The remaining components not replaced at the epoch are left in the functioning condition they were in immediately prior to the epoch. They use a maintenance policy in which imperfect repairs are performed at failure for individual components (unplanned repairs). The performance measure is the multivariate counting process of unplanned component failures.

Park and Yoo [137] compare three replacement policies for a group of $N$ identical units with Sf $\bar F(x)$:

1. In policy I, each unit undergoes MR at failure during $(0, T]$ and all units are replaced at $T$. The expected cost rate of this policy is given by

$$C(T) = \frac{c_r + c_f H(T)}{T},$$

where $c_f$ is the cost of a MR, $c_r$ is the cost of a group replacement per unit, and $H(t)$ represents the expected number of MRs during $(0, t]$ per unit.

2. In policy II, the group replacement interval is divided into repair and waiting intervals. Each unit is minimally repaired on failure during $(0, t]$; no repair is made during $(t, t + w)$ and failed units are left idle. All units are replaced at $t + w$. The expected cost rate of this policy is given by

$$C(t, w) = \frac{c_r + c_f H(t) + c_d L_t(w)}{t + w},$$

where $c_d$ is the downtime cost per unit and $L_t(w)$ represents the expected downtime per unit, given by $L_t(w) = w - \int_t^{t+w} \bar F(x)\,dx / \bar F(t)$.

3. In policy III, each unit undergoes MR at failure during $(0, s]$. Beyond $s$, no repair is made until the $k$th failure, $1 \le k \le N$. All units are replaced on the $k$th failure beyond $s$. The expected cost rate of this policy is given by

$$C(s, k) = \frac{c_r + c_f H(s) + c_d D_N^{(k)}(s)}{s + \mu_N^{(k)}(s)},$$

where

$$D_N^{(k)}(t) = [\bar F(t)]^{-N} \sum_{i=1}^{k-1} \frac{i}{N} \binom{N}{i} \int_t^{\infty} \left[\bar F(t) - \bar F(x)\right]^{i} \left[\bar F(x)\right]^{N-i} dx$$

is the mean downtime,

$$\mu_N^{(k)}(t) = [\bar F(t)]^{-N} \sum_{i=0}^{k-1} \binom{N}{i} \int_t^{\infty} \left[\bar F(t) - \bar F(x)\right]^{i} \left[\bar F(x)\right]^{N-i} dx$$

is the mean waiting interval to a group replacement, and $\bar F(t)$ is the unit’s Sf.
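The first two cost rates above are simple enough to evaluate directly. The sketch below is illustrative only and is not taken from Park and Yoo [137]: it assumes a Weibull unit, so that $H(t) = (t/\eta)^{\beta}$ and $\bar F(x) = e^{-H(x)}$, and all function and parameter names are hypothetical. Under that assumption, setting $dC/dT = 0$ for policy I gives the closed form $T^* = \eta\,[c_r / (c_f(\beta - 1))]^{1/\beta}$ for $\beta > 1$.

```python
import math

def weibull_H(t, eta, beta):
    """Cumulative hazard of an assumed Weibull unit (expected MRs per unit)."""
    return (t / eta) ** beta

def cost_policy1(T, cr, cf, eta, beta):
    """Policy I: C(T) = (cr + cf * H(T)) / T."""
    return (cr + cf * weibull_H(T, eta, beta)) / T

def optimal_T_policy1(cr, cf, eta, beta):
    """Closed-form minimiser of cost_policy1; a finite optimum needs beta > 1."""
    if beta <= 1:
        raise ValueError("a finite optimum requires beta > 1")
    return eta * (cr / (cf * (beta - 1))) ** (1 / beta)

def expected_downtime(t, w, eta, beta, n=2000):
    """L_t(w) = w - int_t^{t+w} Fbar(x) dx / Fbar(t), by composite Simpson."""
    if w == 0:
        return 0.0

    def sf(x):
        return math.exp(-weibull_H(x, eta, beta))

    h = w / n  # n is even, as Simpson's rule requires
    total = sf(t) + sf(t + w)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * sf(t + k * h)
    return w - (total * h / 3) / sf(t)

def cost_policy2(t, w, cr, cf, cd, eta, beta):
    """Policy II: C(t, w) = (cr + cf * H(t) + cd * L_t(w)) / (t + w)."""
    downtime = expected_downtime(t, w, eta, beta)
    return (cr + cf * weibull_H(t, eta, beta) + cd * downtime) / (t + w)
```

Setting `w = 0` in `cost_policy2` recovers `cost_policy1`, which reflects the fact that policy II contains policy I as a special case.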

3.1.10 Finite Time Horizon

Few papers have treated maintenance over a finite time span, because optimal policies are theoretically more difficult to discuss in that setting. Nakagawa and Mizutani [127] study, among other models with a finite time span, two models with the MR assumption.

1. A periodic replacement with MR: a unit with cumulative hazard rate $H(t)$ has to operate for a finite interval $[0, S]$. If it fails between replacements, MR is made. To maintain the unit, the interval $S$ is partitioned equally into $N$ parts, and the unit is replaced at periodic times $kT$, $k = 1, 2, \ldots, N$, where $NT = S$. Let $c_1$ be the cost of MR and $c_2$ be the cost of planned replacement. Then, the total expected cost until time $S$ is

$$C(N) = N \left[ c_1 H\!\left(\frac{S}{N}\right) + c_2 \right].$$

2. An imperfect PM model with MR: a unit has to operate for a finite interval $[0, S]$. The PM is done at planned times $T_k$, $k = 1, 2, \ldots, N - 1$, and the unit is replaced at time $T_N \equiv S$. Only MR is made at failures during $[0, S]$. The failure rate in the $k$th PM period becomes $b_k h(t)$ when it was $h(t)$ in the $(k-1)$th PM period, i.e., the failure rate is $B_k h(t)$ for $0 < t \le T_k - T_{k-1}$, where $B_k \equiv \prod_{j=0}^{k-1} b_j$ $(k = 1, 2, \ldots, N)$. Let $c_1$ be the cost of each MR, $c_2$ be the cost of each PM, and $c_3$ be the cost of replacement at time $S$. Then, the total expected cost until replacement is

$$C(N) = c_1 \sum_{k=1}^{N} B_k \int_0^{T_k - T_{k-1}} h(t)\,dt + (N - 1) c_2 + c_3.$$

In the case where the PM is periodic, this expression becomes

$$C(N) = c_1 \sum_{k=1}^{N} B_k H\!\left(\frac{S}{N}\right) + (N - 1) c_2 + c_3.$$
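For the first model, the number of replacement intervals $N$ is the only decision variable, so the optimum can be found by scanning integers. The sketch below is an illustration under an assumed Weibull cumulative hazard $H(t) = (t/\eta)^{\beta}$, which is not specified in Nakagawa and Mizutani [127]; the function names and the search bound `N_max` are hypothetical.

```python
def total_cost(N, S, c1, c2, eta, beta):
    """C(N) = N * [c1 * H(S / N) + c2] with assumed H(t) = (t / eta) ** beta."""
    return N * (c1 * (S / (N * eta)) ** beta + c2)

def optimal_N(S, c1, c2, eta, beta, N_max=10_000):
    """Smallest integer N in 1..N_max that minimises total_cost (full scan)."""
    return min(range(1, N_max + 1),
               key=lambda N: total_cost(N, S, c1, c2, eta, beta))
```

Under the same Weibull assumption with $\beta > 1$, treating $N$ as continuous gives $N^* = (S/\eta)\,[c_1(\beta - 1)/c_2]^{1/\beta}$, so the integer optimum is one of the two integers adjacent to $N^*$; the scan is kept only because it needs no such argument.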

Yun and Nakagawa [197] consider maintenance policies for products in which the economical life cycle of products is a random variable. The system is minimally repaired at failure before age $T$ and it is replaced at age $T$. The expected present value of total maintenance cost is

$$C(T) = \sum_{k=0}^{\infty} e^{-khT} \left[ c_2 \bar G(kT) + c_1 \int_0^T \bar G(t + kT)\, dB(t) \right],$$

where $c_1$ is the minimal repair cost, $c_2$ is the replacement cost, $h$ is the discount rate, and $G(t)$ and $\bar G(t)$ are the distribution and survivor functions of the random life cycle. When the system is replaced at sequential times $T_k$, the expected present value of total maintenance cost is given by

$$C(T) = \sum_{k=0}^{\infty} e^{-hT_k} \left[ c_2 \bar G(T_k) + c_1 \int_0^{T_{k+1} - T_k} \bar G(t + T_k)\, dB(t) \right].$$

3.1.11 Bayesian Perspective

There are few papers which investigate Bayesian block replacement policies. We cite Mazzuchi and Soyer [108], Lim et al. [101], Chen and Popova [34], Pulcini [143], Juang and Anderson [81], and Kim et al. [90].

3.2 N-policy

3.2.1 General Repair Models

Bartholomew-Biggs et al. [16] deal with the problem of scheduling sequential imperfect PM of some equipment. They use Kijima’s models in which each application of PM reduces the equipment’s effective age. When a system failure occurs, minimal repair takes place instantly and PM is completed instantly. The system may have two categories of failure modes, i.e., maintainable and non-maintainable. The failure rate for non-maintainable parts of the system is not affected by minimal repair, PM or system failure. The failure rate for maintainable parts of the system is not changed by minimal repair but it is changed whenever a PM is performed. Letting $t_k$ represent the time duration from $t = 0$ to the time of the $k$th PM and $N$ represent the total number of PMs performed, i.e., the $N$th PM is a system replacement, the mean lifetime cost is given by

$$C(N) = \frac{\dfrac{c_r}{c_p} + (N - 1) + \dfrac{c_m}{c_p} \sum_{j=1}^{N} H_{j-1}(t_j)}{t_N},$$

where $H_{j-1}(t)$ is the cumulative failure rate from $t_{j-1}$ to $t$ when PM occurs at $t$, $t_{j-1} \le t \le t_j$; $c_p$ is the cost of a PM, $c_m$ is the cost of a minimal repair, and $c_r$ is the cost of a system replacement. Note that the decision variable is the number of PMs performed.

3.3 (N, T)-policy

3.3.1 Shock Models

Nakagawa and Kijima [125] apply the periodic replacement with MR at failure to cumulative damage models. A unit fails with probability $p(z)$ when the total amount of damage becomes $z$, and undergoes only MR at failures. To prevent failures, a unit is replaced at time $T$, at shock $N$ or at damage $Z$, whichever occurs first. Letting $F(t)$ denote the Cdf of the time between successive shocks and $G(x)$ denote the Cdf of the amount of damage to the unit produced by a shock, with the superscript $(j)$ denoting the $j$-fold Stieltjes convolution, the mean time to replacement is given by

$$E[\text{cycle}] = \sum_{j=0}^{N-1} G^{(j)}(Z) \int_0^T \left[ F^{(j)}(t) - F^{(j+1)}(t) \right] dt,$$

while the mean cost rate is given by

$$C(T, N, Z) = \frac{A(T, N, Z)}{\sum_{j=0}^{N-1} G^{(j)}(Z) \int_0^T \left[ F^{(j)}(t) - F^{(j+1)}(t) \right] dt},$$

where

$$A(T, N, Z) = c_1 \sum_{j=1}^{N-1} F^{(j)}(T) \int_0^Z p(z)\, dG^{(j)}(z) + c_2 \sum_{j=0}^{N-1} G^{(j)}(Z) \left[ F^{(j)}(T) - F^{(j+1)}(T) \right] + c_3 F^{(N)}(T)\, G^{(N)}(Z) + c_4 \sum_{j=1}^{N} F^{(j)}(T) \left[ G^{(j-1)}(Z) - G^{(j)}(Z) \right],$$

and $c_1$ is the cost of MR at failure, and $c_2$, $c_3$, $c_4$ are the costs of scheduled replacement at time $T$, at shock $N$, and at damage $Z$, respectively.

Sheu and Chang [163] propose a sequential PM policy for a system subject to shocks. The shocks arrive according to a NHPP $\{N_i(t), t \ge 0\}$, whose intensity function $r_i(t)$ varies with the number of maintenance actions $(i - 1)$ that have already been carried out, and the time $t$ that has elapsed since the last maintenance action. Upon the arrival of the $k$th shock, the system is maintained or repaired minimally with probability $h_{i,k}$ and $q_{i,k}$, respectively, depending on the number of maintenance actions $(i - 1)$ that have already occurred and the ordinal number of the arriving shock since the last maintenance. In addition, a planned maintenance is carried out as soon as $T_i$ time units have elapsed since the $(i - 1)$th maintenance action. If $i = N$, the system is replaced rather than maintained. Denoting by $\bar P_{i,k}$ the known probability that the first $k$ shocks are type I failures for a system subjected to $(i - 1)$ maintenance(s), and by $\Lambda_i(t) = \int_0^t r_i(s)\,ds$, the expected length of the cycle for this model is given by

$$E[\text{cycle}] = \sum_{i=1}^{N} \int_0^{T_i} \bar G_i(y)\,dy,$$

where $\bar G_i(y) = \sum_{k=0}^{\infty} H_k(\Lambda_i(y))\, \bar P_{i,k}$ and $H_k(\Lambda_i(y)) = e^{-\Lambda_i(y)} [\Lambda_i(y)]^k / k!$. The expected cost per unit of time is

$$C(N, \{T_i\}) = \frac{E[R_1]}{\sum_{i=1}^{N} \int_0^{T_i} \bar G_i(y)\,dy},$$

where

$$E[R_1] = (N - 1) c_o + c_R + \sum_{i=1}^{N} c_B\, G_i(T_i) + \sum_{i=1}^{N} \int_0^{T_i} \sum_{k=0}^{\infty} g_{i,k+1}(t)\, q_{i,k+1}\, r_i(t)\, H_k(\Lambda_i(t))\, \bar P_{i,k}\, dt.$$

Also, $q_{i,k+1} = \bar P_{i,k+1} / \bar P_{i,k}$; $c_o$ is the cost of a planned maintenance, $c_o + c_B$ is the cost of an unplanned maintenance, $c_R$ is the cost of a planned replacement, $c_R + c_B$ is the cost of an unplanned replacement, $g_i(c_i(y), c_{i,k}(y))$ is the cost of the $k$th MR of a system subjected to $(i - 1)$ maintenance(s) at time $y < T_i$, where $c_i(y)$ is the age-dependent random part and $c_{i,k}(y)$ is the deterministic part, which depends on the age and the number of the MR, and $g_{i,k}(t) = E_{c_i(t)}[g_i(c_i(t), c_{i,k}(t))]$.

Qian et al. [145] consider replacement and MR policies for an extended cumulative damage model with maintenance at each shock. Shocks occur according to a NHPP with intensity $r(t)$ and mean value function $\Lambda(t) = \int_0^t r(u)\,du$. The replacement policy can be summarized as follows:

1. When the total damage $x$ does not exceed a failure level $K$, the system undergoes maintenance at each shock, and the maintenance cost is $c_2 + c_0(x)$.

2. When the total damage has reached the failure level $K$, the system fails and undergoes MR at each failure. The MR cost is $c_3 = c_2 + c_0(K)$.

3. The system is replaced at periodic time $T$, or at the $N$th failure, whichever occurs first, and the replacement cost is $c_1$.

The amount of damage due to the $j$th shock has Cdf $G_j(x)$ and the total damage up to the $j$th shock has Cdf $G^{(j)}(x) = G_1 * G_2 * \cdots * G_j(x)$, where the asterisk represents the Stieltjes convolution. In this case, the mean time to replacement is given by

$$E[\text{cycle}] = \sum_{j=0}^{\infty} G^{(j-N+1)}(K) \int_0^T H_j(t)\,dt,$$

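where $H_j(t) = \dfrac{[\Lambda(t)]^j}{j!} e^{-\Lambda(t)}$. Also, the expected cost rate is found to be

$$C(T, N) = \frac{E[R_1]}{\sum_{j=0}^{\infty} G^{(j-N+1)}(K) \int_0^T H_j(t)\,dt},$$

where

$$E[R_1] = c_1 + \sum_{j=1}^{\infty} H_j(t) \sum_{i=1}^{j} \int_0^K \left[c_2 + c_0(x)\right] dG^{(i)}(x) + c_3 \sum_{j=1}^{\infty} H_j(t) \sum_{i=j-N+2}^{j} \left[1 - G^{(i)}(K)\right].$$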

4 Conclusions

This paper compiles an exhaustive list of research dealing with the mathematical modeling of the notion of minimal repair. There are already many surveys that complement each other and, among them, provide a comprehensive explication and bibliography of research into maintenance and replacement models. But there is no survey that is devoted specifically to the topic of minimal repair. In this survey, we have tried as much as possible to avoid any overlap with the already published surveys. Only a few papers, mainly those of historical interest, may have been cited in the other surveys too.

In the present survey, the already published replacement models dealing with minimal repair are grouped into two main categories: age replacement models and block replacement models. Within each of these two categories, the identified models are divided into three subcategories (i.e., T, N, and (N,T) policies). We note that very few papers concern the block replacement model with minimal repair under the N-policy alone. The age T of the device, the number N of intervening minimal repairs, and both of them (N,T) are used to trigger a replacement in the context of the T, N, and (N,T)-policies, respectively. Published replacement models under the T-policy subcategory, for both the age and the block replacement categories, are more abundant in the literature than those under the N and (N,T)-policies. The same holds when comparing the number of replacement models devoted to single-component systems with those devoted to multi-component systems. These policies quickly become complex to model, and need more sophisticated tools to solve, when a multi-component system is considered.

This survey of replacement models with minimal repair has summarized more than 200 papers published between 1958 and 2008. It clearly shows that research interest over these five decades has moved progressively from single-component to multi-component systems. The notion of minimal repair is well studied in the context of single-component systems. As Murthy [116, 118] has mentioned, we think that minimal repair is a suitable model for complex systems: the failure rate of the system will remain substantially the same after the replacement of a failed component by a new and identical one. Moreover, the trend in recent years is to make further use of minimal repair as a fundamental principle for modeling multi-component systems.

References

1. Aghezzaf EH, Najid NM (2008) Integrated production planning and preventive maintenance in deteriorating production systems. Inf Sci 178:3382–3392
2. Agustin MZ (2002) Generalized hazard-based goodness-of-fit tests for a repairable system. J Stat Comput Simul 72:519–531
3. Ahmadi J, Arghami NR (2001) Some univariate stochastic orders on record values. Commun Stat Theory Methods 30:69–74
4. Ahmadi J, Balakrishnan N (2005) Preservation of some reliability properties by certain record statistics. Statistics 39:347–354
5. Arjas E (2002) Predictive inference and discontinuities. J Nonparametr Stat 14:31–42
6. Aven T, Castro IT (2008) A minimal repair replacement model with two types of failure and a safety constraint. Eur J Oper Res 188:506–515
7. Bae J, Lee EY (2001) A repair policy with limited number of minimal repairs. J Nonparametr Stat 13:153–163
8. Bagai I, Jain K (1994) Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Trans Reliab 43:156–162
9. Bai J, Pham H (2004) Discounted warranty cost of minimally repaired series systems. IEEE Trans Reliab 53:37–42
10. Bai DS, Yun WY (1986) An age replacement policy with minimal repair cost limit. IEEE Trans Reliab R-35:452–454
11. Baik J, Murthy DNP, Jack N (2004) Two-dimensional failure modeling with minimal repair. Nav Res Logist 51:345–362
12. Baik J, Murthy DNP, Jack N (2006) Erratum: Two-dimensional failure modeling with minimal repair. Nav Res Logist 53:115–116
13. Baratpour S, Ahmadi J, Arghami NR (2007) Some characterizations based on entropy of order statistics and record values. Commun Stat Theory Methods 36:47–57
14. Barlow R, Hunter L (1960) Optimum preventive maintenance policies. Oper Res 8:90–110
15. Barlow R, Proschan F (1975) Mathematical theory of reliability. Wiley, New York
16. Bartholomew-Biggs M, Zuo MJ, Li XM (2009) Modelling and optimizing sequential imperfect preventive maintenance. Reliab Eng Syst Saf 94:53–62
17. Baxter LA (1982) Reliability applications of the relevation transform. Nav Res Logist Q 29:323–330
18. Beichelt F (1981) A generalized block-replacement policy. IEEE Trans Reliab R30:171–172
19. Beichelt F (1992) A general maintenance model and its application to repair limit replacement policies. Microelectron Reliab 32:1185–1196
20. Block HW, Borges WS, Savits TH (1985) Age-dependent minimal repair. J Appl Probab 22:370–385
21. Brown M, Proschan F (1983) Imperfect repair. J Appl Probab 20:851–859
22. Butani NL (1991) Replacement policies based on minimal repair and cost limit criterion. Reliab Eng Syst Saf 32:349–355
23. Caballero SM (2006) Weibull point process applied to repairable systems (an application to Cuban sugar industry). http://www.maintenanceworld.com/Articles/caballeros/weibullpoint.html. Accessed 06 March 2010
24. Cassady CR, Kutanoglu E (2005) Integrating preventive maintenance planning and production scheduling for a single machine. IEEE Trans Reliab 54:304–309
25. Cassady CR, Murdock WP, Pohl EA (2001) Selective maintenance for support equipment involving multiple maintenance actions. Eur J Oper Res 129:252–258
26. Castanier B, Bérenger C, Grall A (2003) A sequential condition-based repair/replacement policy with non-periodic inspections for a system subject to continuous wear. Appl Stoch Models Bus Ind 19:327–347
27. Cha JH (2005) On optimal burn-in procedures—a generalized model. IEEE Trans Reliab 54:198–206
28. Cha JH (2006) An extended model for optimal burn-in procedures. IEEE Trans Reliab 55:189–198
29. Cha JH, Kim JJ (2002) On the existence of the steady state availability of imperfect repair model. Sankhya: Ind J Stat 64B:76–81
30. Cha JH, Mi J (2007) Some probability functions in reliability and their applications. Nav Res Logist 54:128–135
31. Cha JH, Lee S, Mi J (2004) Bounding the optimal burn-in time for a system with two types of failure. Nav Res Logist 51:1090–1101
32. Chen J-A, Chien Y-H (2007) Renewing warranty and preventive maintenance for products with failure penalty post-warranty. Qual Reliab Eng Int 23:107–121
33. Chen M, Chien Y-H (2007) Optimal spare ordering policy for preventive replacement with age-dependent minimal repair under cost effectiveness criterion. In: Proceedings of the IEEE international conference on industrial engineering and engineering management IEEE, pp 636–639


34. Chen TM, Popova E (2000) Bayesian maintenance policies during a warranty period. Commun Stat Stoch Models 16:121–142
35. Chen Y (2006) Optimal inspection and economical production quantity strategy for an imperfect production process. Int J Syst Sci 37:295–302
36. Chen Y, Jin J (2003) Cost-variability-sensitivity preventive maintenance considering management risk. IEE Trans 35:1091–1101
37. Chien Y-H (2005) Determining optimal warranty periods from the seller’s perspective and optimal out-of-warranty replacement age from the buyer’s perspective. Int J Syst Sci 36:631–637
38. Chien Y-H (2005) Generalized spare ordering policies with allowable inventory time. Int J Syst Sci 36:823–832
39. Chien Y-H (2008) A general age-replacement model with minimal repair under renewing free-replacement warranty. Eur J Oper Res 186:1046–1058
40. Chien Y-H (2008) Optimal age-replacement policy under an imperfect renewing free-replacement warranty. IEEE Trans Reliab 57:125–133
41. Chien YH, Chen JA (2007) Optimal age-replacement policy for repairable products under renewing free-replacement warranty. Int J Syst Sci 38:759–769
42. Chien YH, Chen JA (2007) Optimal age-replacement model with minimal repair based on cumulative repair cost limit and random lead-time. In: Proceedings of the IEEE international conference on industrial engineering and engineering management IEEE, pp 636–639
43. Chien Y-H, Sheu S-H (2006) Extended optimal age-replacement policy with minimal repair of a system subject to shocks. Eur J Oper Res 174:169–181
44. Chien Y-H, Sheu S-H, Chang C-C (2009) Optimal age-replacement time with minimal repair based on cumulative repair cost limit and random lead time. Int J Syst Sci 40:703–715
45. Chien Y-H, Sheu S-H, Zhang ZG, Love E (2006) An extended optimal replacement model of systems subject to shocks. Eur J Oper Res 175:399–412
46. Cho DI, Parlar M (1991) A survey of maintenance models for multi-unit systems. Eur J Oper Res 51:1–23
47. Chukova S, Johnston MR (2006) Two-dimensional warranty repair strategy based on minimal and complete repair. Math Comp Modell 44:1133–1143
48. Chung KJ (1995) A note on the upper bound and an algorithm for the optimal repair cost under minimal repair. Int J Qual Reliab Manag 12:85–88
49. Cui L, Kuo W, Loh HT, Xie M (2004) Optimal allocation of minimal and perfect repairs under resource constraints. IEEE Trans Reliab 53:193–199
50. Dagpunar JS (1997) The effect of minimal repairs on economic lot-sizing. Microelectron Reliab 37:417–419
51. Das KK, Acharya D (1988) Optimal replacement policy for induced draft fans. Microelectron Reliab 28:519–523
52. Dayanik S, Gürler U (2002) An adaptive Bayesian replacement policy with minimal repair. Oper Res 50:552–558
53. Dekker R (1995) Integrating optimisation, priority setting, planning and combining of maintenance activities. Eur J Oper Res 82:225–240
54. Dekker R, Wildeman R, Van der Duyn Schouten FA (1997) A review of multi-component maintenance models with economic dependence. Math Methods Oper Res 45:411–435
55. Deshpande JV, Singh H (1995) Optimal replacement of improving and deteriorating repairable systems. IEEE Trans Reliab 44:500–504
56. Dimitrov B, Chukova S, Khalil Z (2004) Warranty costs: an age-dependent failure/repair model. Nav Res Logist 51:959–976
57. Dohi T, Kaio N, Osaki S (1998) Minimal repair policies for an economic manufacturing process. J Qual Maint Eng 4:248–262
58. Doyen L, Gaudoin O (2004) Classes of imperfect repair models based on reduction of failure intensity or virtual age. Reliab Eng Syst Saf 84:45–56


59. Endrenyi J, Aboresheid S, Allan RN, Anders GJ, Asgarpoor S, Billinton R, Chowdhury N, Dialynas EN, Fipper M, Fletcher RH, Grigg C, McCalley J, Meliopoulos S, Mielnik TC, Nitu P, Rau ND, Reppen ND, Salvaderi L, Schneider A, Singh Ch (2001) The present status of maintenance strategies and the impact of maintenance on reliability: A report of the IEEE/PES task force on impact of maintenance strategy on reliability of the reliability, risk and probability applications subcommittee. IEEE Trans Power Syst 16:638–646
60. Finkelstein M (2007) Imperfect repair and lifesaving in heterogeneous populations. Reliab Eng Syst Saf 92:1671–1676
61. Gasmi S, Love CE, Kahle W (2003) A general repair, proportional-hazards, framework to model complex repairable systems. IEEE Trans Reliab 52:26–32
62. Gross D, Harris CM (1998) Fundamentals of queueing theory, 3rd edn. Wiley, New York
63. Guérin F, Dumon B, Lantieri P (2004) Accelerated life testing on repairable systems. In: Proceedings of the annual reliability and maintainability symposium (RAMS), pp 340–345
64. Gupta RC, Kirmani SNUA (1988) Closure and monotonicity properties of nonhomogeneous Poisson processes and record values. Probab Eng Inf Sci 2:475–484
65. Gupta RC, Kirmani SNUA (1989) On predicting repair times in a minimal repair process. Commun Stat Simul Comput 18:1359–1368
66. Hariga M, Azaiez N (2006) Heuristic procedures for the single facility problem with minimal repair and increasing failure rate. J Oper Res Soc 57:1081–1088
67. Heidergott B (1999) Optimisation of a single-component maintenance system: a smoothed perturbation analysis approach. Eur J Oper Res 119:181–190
68. Hsu L-F (1999) Simultaneous determination of preventive maintenance and replacement policies in a queue-like production system with minimal repair. Reliab Eng Syst Saf 63:161–167
69. Hsu L-F, Kuo S (1994) Optimal replacement schedules in a queue-like production system with minimal repair. Reliab Eng Syst Saf 46:189–198
70. Iskandar BP, Murthy DNP, Jack N (2005) A new repair–replace strategy for items sold with a two-dimensional warranty. Comp Oper Res 32:669–682
71. Ja S-S, Kulkarni VG, Mitra A, Patankar JG (2001) A nonrenewable minimal repair warranty policy with time-dependent costs. IEEE Trans Reliab 50:346–352
72. Ja S-S, Kulkarni VG, Mitra A, Patankar JG (2002) Warranty reserves for nonstationary sales processes. Nav Res Logist 49:499–513
73. Jack N, Iskandar BP, Murthy DNP, Boondiskulchok R (2009) A repair–replace strategy based on usage rate for items sold with a two-dimensional warranty. Eur J Oper Res 174:201–215
74. Jaturonnatee J, Murthy DNP, Boondiskulchok R (2006) Optimal preventive maintenance of leased equipment with corrective minimal repairs. Reliab Eng Syst Saf 94:611–617
75. Jensen U (1991) Stochastic models of reliability and maintenance: an overview. In: Özekici S (ed) Reliability and maintenance of complex systems. Springer, Berlin
76. Jhang JP (2005) A study of the optimal use period and number of minimal repairs of a repairable product after the warranty expires. Int J Syst Sci 36:697–704
77. Jhang JP (2005) The optimal used period of repairable product with lead-time after the warranty expiry. Int J Syst Sci 36:423–431
78. Jhang JP (2001) A generalized age replacement policy with random delivery time and inspection. Int J Syst Sci 32:321–329
79. Jhang JP, Sheu S-H (1999) Opportunity-based age replacement policy with minimal repair. Reliab Eng Syst Saf 64:339–344
80. Jhang JP, Sheu S-H (2000) Optimal age and block replacement policies for a multicomponent system with failure interaction. Int J Syst Sci 31:593–603
81. Juang M-G, Anderson G (2004) A Bayesian method on adaptive preventive maintenance problem. Eur J Oper Res 155:455–473
82. Jung KM, Han SS, Park DH (2008) Optimization of cost and downtime for replacement model following the expiration of warranty. Reliab Eng Syst Saf 93:995–1003


83. Jung KM, Park M, Park DH (2010) System maintenance cost dependent on life cycle under renewing warranty policy. Reliab Eng Syst Saf 95:816–821 84. Kahle W(2007) Optimal maintenance policies in incomplete repair models. Reliab Eng Syst Saf 92:563–565 85. Kaminskiy M, Krivtsov V (2006) A Monte Carlo approach to estimation of g-renewal process in warranty data analysis. Reliab Theory Appl 1:29–31 86. Kapur PK, Garg RB (1989) Optimal number of minimal repairs before replacement with repair cost limit. Reliab Eng Syst Saf 26:35–46 87. Kijima M (1989) Some results for repairable systems with general repair. J Appl Probab 26:89–102 88. Kijima M, Morimura H, Suzuki Y (1988) Periodical replacement problem without assuming minimal repair. Eur J Oper Res 37:194–203 89. Kim KO, Kuo W (2009) Optimal burn-in for maximizing reliability of repairable non-series systems. Eur J Oper Res 193:140–151 90. Kim HS, Kwon YS, Park DH (2007) Adaptive sequential preventive maintenance policy and Bayesian consideration. Commun Stat Theory Methods 36:1251–1269 91. Kirmani SNUA, Gupta RC (1992) Some moment inequalities for the minimal repair process. Probab Eng Inf Sci 6:245–255 92. Kochar SC (1996) Some results on interarrival times of nonhomogeneous Poisson processes. Probab Eng Inf Sci 10:75–85 93. Kumar D (1995) Proportional hazards modelling of repairable systems. Qual Reliab Eng Int 11:361–369 94. Lai M-T, Leu B-Y (1996) An economic discrete replacement policy for a shock damage model with minimal repairs. Microelectron Reliab 36:1347–1355 95. Lam Y (2003) A geometric process maintenance model. Southeast Asian Bull Math 27:295–305 96. Levitin G, Lisnianski A (2000) Optimization of imperfect preventive maintenance for multistate systems. Reliab Eng Syst Saf 267:193–203 97. Li H, Xu SH (2004) On the coordinated random group replacement policy in multivariate repairable systems. Oper Res 52:464–477 98. Lim D-H, Lie CH (2000) Analysis of system reliability with dependent repair modes. 
IEEE Trans Reliab 49:153–162 99. Lim D-H, Park DH (1999) Evaluation of average maintenance cost for imperfect-repair model. IEEE Trans Reliab 48:199–204 100. Lim J-H, Kim D-K, Park DH (2005) Cost evaluation for an imperfect repair model with random repair time. Int J Syst Sci 36:717–726 101. Lim J-H, Lu K-L, Park DH (1998) Bayesian imperfect repair model. Commun Stat Theory Methods 27:965–984 102. Lin D, Zuo MJ, Yam RCM (2001) Sequential imperfect preventive maintenance models with two categories of failure modes. Nav Res Logist 48:172–183 103. Lin D, Zuo MJ, Yam RCM, Meng MQ-H (2000) Optimal system design considering warranty, periodic preventive maintenance, and minimal repair. J Oper Res Soc 51:869–874 104. Lindqvist BH (1999) Repairable systems with general repair. In: Proceedings of European Safety and Reliability Conference, Munich, Germany, pp 13–17 105. Lugtigheid D, Jardine AKS, Jiang X (2007) Optimizing the performance of a repairable system under a maintenance and repair contract. Qual Reliab Eng Int 23:943–960 106. Makis V (1998) Optimal lot-sizing/preventive replacement policy for an EMQ model with minimal repairs. Int J Logist Appl 1:173–180 107. Makis V, Jardine AKS (1991) Optimal replacement of a system with imperfect repair. Microelectron Reliab 31:381–388 108. Mazzuchi TA, Soyer R (1996) A Bayesian perspective on some replacement strategies. Reliab Eng Syst Saf 51:295–303 109. McCall JJ (1965) Maintenance policies for stochastically failing equipment: a survey. Manag Sci 11:493–524


110. Mettas A, Zhao W (2005) Modeling and analysis of repairable systems with general repair. In: Proceedings annual reliability and maintainability symposium, Alexandria, Virginia, USA 111. Mohandas K, Chaudhuri D, Rao BVA (1992) Optimal periodic replacement for a deteriorating production system with inspection and minimal repair. Reliab Eng Syst Saf 37:73–77 112. Monga A, Zuo MJ (1998) Optimal system design considering maintenance and warranty. Comp Oper Res 25:691–705 113. Monga A, Zuo MJ, Toogood R (1995) System design with deteriorative components for minimal life cycle costs. IEEE Int Conf Syst Man Cybern 2:1843–1848 114. Montoro-Cazorla D, Pérez-Ocón R (2006) Reliability of a system under two types of failures using a Markovian arrival process. Oper Res Lett 34:525–530 115. Morse PC (1958) Queues, inventories, and maintenance. Wiley, New York 116. Murthy DNP (1991) A note on minimal repair. IEEE Trans Reliab 40:245–246 117. Murthy DNP, Asgharizadeh E (1999) Optimal decision making in a maintenance service operation. Eur J Oper Res 116:259–273 118. Murthy DNP, Djamaludin I, Wilson RJ (1995) A consumer incentive warranty policy and servicing strategy for products with uncertain quality. Qual Reliab Eng Int 11:155–163 119. Muth EJ (1977) An optimal decision rule for repair vs replacement. IEEE Trans Reliab R26:179–181 120. Nahas N, Khatab, A, Ait-Kadi D, Nourelfath M (2008) Extended great deluge algorithm for the imperfect preventive maintenance optimization of multi-state systems. Reliab Eng Syst Saf 93:1658–1672 121. Nakagawa T (1979) Imperfect preventive maintenance. IEEE Trans Reliab 28:402–402 122. Nakagawa T (1981) Modified periodic replacement with minimal repair at failure. IEEE Trans Reliab R-30:165–168 123. Nakagawa T (1988) Sequential imperfect preventive maintenance policies. IEEE Trans Reliab 37:295–298 124. Nakagawa T (2005) Maintenance theory of reliability. Springer, London 125. 
Nakagawa T, Kijima M (1989) Replacement policies for a cumulative damage model with minimal repair at failure. IEEE Trans Reliab 38:581–584 126. Nakagawa T, Kowada M (1983) Analysis of a system with minimal repair and its application to replacement policy. Eur J Oper Res 12:176–182 127. Nakagawa T, Mizutani S (2009) A summary of maintenance policies for a finite interval. Reliab Eng Syst Saf 94:89–96 128. Nakagawa T, Yasui K (1991) Periodic-replacement models with threshold levels. IEEE Trans Reliab 40:395–397 129. Newby M (2008) Monitoring and maintenance of spares and one shot devices. Reliab Eng Syst Saf 93:588–594 130. Ohnishi M, Morioka T, Ibaraki T (1994) Optimal minimal-repair and replacement problem of discrete-time Markovian deterioration system under incomplete state information. Comp Ind Eng 27:409–412 131. Park KS (1979) Optimal number of minimal repairs before replacement. IEEE Trans Reliab R-28:137–140 132. Park KS (1983) Cost limit replacement policy under minimal repair. Microelectron Reliab 23:347–349 133. Park KS (1985) Pseudo-dynamic cost limit replacement model under minimal repair. Microelectron Reliab 25:573–579 134. Park KS (1985) Optimal number of major failures before replacement. Microelectron Reliab 25:797–805 135. Park YT, Park KS (1985) Optimal stocking for replacement with minimal repair. Microelectron Reliab 25:147–155 136. Park KS, Park YT (1986) Ordering policies under minimal repair. IEEE Trans Reliab R35:82–84


137. Park KS, Yoo YK (2004) Comparison of group replacement policies under minimal repair. Int J Syst Sci 35:179–184 138. Pascual R, Ortega JH (2006) Optimal replacement and overhaul decisions with imperfect maintenance and warranty contracts. Reliab Eng Syst Saf 91:241–248 139. Pascual R, Meruane V, Rey PA (2008) On the effect of downtime costs and budget constraint on preventive and replacement policies. Reliab Eng Syst Saf 93:144–151 140. Pham H, Wang H (1996) Imperfect maintenance. Eur J Oper Res 94:425–438 141. Pierskalla WP, Voelker JA (1976) A survey of maintenance models: the control and surveillance of deteriorating systems. Nav Res Logist Q 23:353–388 142. Popova E, Popova I (2008) Replacement strategies. In: Ruggeri F, Faltin F, Kenett R (eds) Encyclopedia of statistics in quality and reliability. Wiley, Chichester 143. Pulcini G (2001) On the prediction of future failures for a repairable equipment subject to overhauls. Commun Stat Theory Methods 30:691–706 144. Purohit SG (1994) Testing for the minimal repair model versus additional damage at failures. Commun Stat Simul Comput 23:89–107 145. Qian C, Nakamura S, Nakagawa T (2003) Replacement and minimal repair policies for a cumulative damage model with maintenance. Comp Math Appl 46:1111–1118 146. Rinsaka K, Sandoh H (2006) A stochastic model on an additional warranty service contract. Comp Math Appl 51:179–188 147. Ross SM (1970) Applied probability models with optimization applications. Holden-Day, San Francisco 148. Sahin I, Polatoglu H (1996) Maintenance strategies following the expiration of warranty. IEEE Trans Reliab 45:220–228 149. Samrout M, Chatelet E, Kouta R, Chebbo N (2009) Optimization of maintenance policy using the proportional hazard model. Reliab Eng Syst Saf 94:44–52 150. Sandve K, Aven T (1999) Cost optimal replacement of monotone, repairable systems. Eur J Oper Res 116:235–248 151. Savits TH (1988) A cost relationship between age and block replacement policies. 
J Appl Probab 25:789–796 152. Scarsini M, Shaked M (2000) On the value of an item subject to general repair or maintenance. Eur J Oper Res 122:625–637 153. Seo JH, Bai DS (2004) An optimal maintenance policy for a system under periodic overhaul. Math Comp Modell 39:373–380 154. Sherif YS, Smith ML (1981) Optimal maintenance models for systems subject to failure—a review. Nav Res Logist Q 28:47–74 155. Sheu S-H (1990) Periodic replacement when minimal repair costs depend on the age and the number of minimal repairs for a multi-unit system. Microelectron Reliab 30:713–718 156. Sheu S-H (1991) Periodic replacement with minimal repair at failure and general random repair cost for a multi-unit system. Microelectron Reliab 31:1019–1025 157. Sheu S-H (1991) A general age replacement model with minimal repair and general random repair cost. Microelectron Reliab 31:1009–1017 158. Sheu S-H (1997) A generalized model for jointly determining the optimal ordering point and the optimal number of minimal repairs before replacement. Int J Syst Sci 28:759–766 159. Sheu S-H (1997) An optimal ordering policy of a system subject to shocks. Int J Syst Sci 28:241–247 160. Sheu S-H (1998) A generalized age and block replacement of a system subject to shocks. Eur J Oper Res 108:345–362 161. Sheu S-H (1999) A general ordering policy with number-dependent minimal repair and random lead-time. Ann Oper Res 91:227–250 162. Sheu S-H (1999) Extended optimal replacement model for deteriorating systems. Eur J Oper Res 112:503–516 163. Sheu S-H, Chang T-H (2002) Generalized sequential preventive maintenance policy of a system subject to shocks. Int J Syst Sci 33:267–276


164. Sheu S-H, Chen J-A (2004) Optimal lot-sizing problem with imperfect maintenance and imperfect production. Int J Syst Sci 35:69–77 165. Sheu S-H, Chien Y-H (2004) Optimal age-replacement policy of a system subject to shocks with random lead-time. Eur J Oper Res 159:132–144 166. Sheu S-H, Chien Y-H (2004) Minimizing cost-functions related to both burn-in and fieldoperation under a generalized model. IEEE Trans Reliab 53:435–439 167. Sheu S-H, Chien Y-H (2005) Optimal burn-in time to minimize the cost for general repairable products sold under warranty. Eur J Oper Res 163:445–461 168. Sheu S-H, Griffith WS (1992) Multivariate imperfect repair. J Appl Probab 29:947–957 169. Sheu S-H, Griffith WS (1996) Optimal number of minimal repairs before replacement of a system subject to shocks. Nav Res Logist 43:319–333 170. Sheu S-H, Griffith WS (2001) Optimal age-replacement policy with age-dependent minimal repair and random-lead-time. IEEE Trans Reliab 50:302–309 171. Sheu S-H, Griffith WS (2002) Extended block replacement policy with shock models and used items. Eur J Oper Res 140:50–60 172. Sheu S-H, Jhang J-P (1996) A generalized group maintenance policy. Eur J Oper Res 96:232–247 173. Sheu S-H, Jhang J-P (1998) Optimal age and block replacement policies for a multicomponent system with a shock type failure interaction. Int J Syst Sci 29:805–817 174. Sheu S-H, Kuo C-M (1994) Optimization problems in k-out-of-n systems with minimal repair. Reliab Eng Syst Saf 44:77–82 175. Sheu S-H, Liou C-T (1992) An age replacement policy with minimal repair and general random repair cost. Microelectron Reliab 32:1283–1289 176. Sheu S-H, Yu S-L (2005) Warranty strategy accounts for bathtub failure rate and random minimal repair cost. Comp Math Appl 49:1233–1242 177. Sheu S-H, Griffith WS, Nakagawa T (1995) Extended optimal replacement model with random minimal repair costs. Eur J Oper Res 85:636–649 178. 
Sheu S-H, Lin Y-B, Liao G-L (2005) Optimal policies with decreasing probability of imperfect maintenance. IEEE Trans Reliab 54:347–357 179. Sheu S-H, Liou C-T, Tseng B-C (1992) Optimal ordering policies and optimal number of minimal repairs before replacement. Microelectron Reliab 32:995–1002 180. Sheu S-H, Yeh RH, Lin Y-B, Juang M-G (1999) A Bayesian perspective on age replacement with minimal repair. Reliab Eng Syst Saf 65:55–64 181. Sheu S-H, Yeh RH, Lin Y-B, Juang M-G (2001) A Bayesian approach to an adaptive preventive maintenance model. Reliab Eng Syst Saf 71:33–44 182. Sim SH, Endrenyi J (1993) A failure–repair model with minimal and major maintenance. IEEE Trans Reliab 42:134–140 183. Siqueira IP (2004) Optimum reliability-centered maintenance task frequencies for power system equipments. 8th Intemational conference on probabilistic methods applied to power systems, Iowa State University, Ames, IA, pp 162–167 184. Soro IW, Nourelfath M, Ait-Kadi D (2010) Performance evaluation of multi-state degraded systems with minimal repairs and imperfect preventive maintenance. Reliab Eng Syst Saf 95:65–69 185. Stillman RH (2003) Power line maintenance with minimal repair and replacement. In: Proceedings of the annual reliability and maintainability symposium, pp 541–545 186. Tarakci H, Tang K, Teyarachakul, S (2009) Learning effects on maintenance outsourcing. Eur J Oper Res 192:138–150 187. Thomas LC (1986) A survey of maintenance and replacement models for maintainability and reliability of multi-item systems. Reliab Eng 16:297–309 188. Uematsu K, Nishida T (1987) The branching nonhomogeneous Poisson process and its application to a replacement model. Microelectron Reliab 27:685–691 189. Valdez-Flores C, Feldman RM (1989) A survey of preventive maintenance models for stochastic deteriorating single-unit systems. Nav Res Logist Q 36:419–446


190. Wang H (2002) A survey of maintenance policies of deteriorating systems. Eur J Oper Res 139:469–489 191. Wang W (2009) An inspection model for a process with two types of inspections and repairs. Reliab Eng Syst Saf 94:526–533 192. Yeh RH, Chang WL (2007) Optimal threshold value of failure-rate for leased products with preventive maintenance actions. Math Comp Modell 46:730–737 193. Yue D, Cao J (2001) Some results on successive failure times of a system with minimal instantaneous repairs. Oper Res Letters 29:193–197 194. Yun WY (1989) An age replacement policy with increasing minimal repair cost. Microelectron Reliab 29:153–157 195. Yun WY, Bai DS (1988) Repair cost limit replacement policy under imperfect inspection. Reliab Eng Syst Saf 23:59–64 196. Yun WY, Murthy DNP, Jack N (2008) Warranty servicing with imperfect repair. Int J Prod Econ 111:159–169 197. Yun WY, Nakagawa T (2010) Replacement and inspection policies for products with random life cycle. Reliab Eng Syst Saf 95:161–165 198. Zequeira RI, Bérenguer C (2006) Periodic imperfect preventive maintenance with two categories of competing failure modes. Reliab Eng Syst Saf 91:460–468 199. Zhang F, Jardine AKS (1998) Optimal maintenance models with minimal repair, periodic overhaul and complete renewal. IIE Transactions 30:1109–1119 200. Zhang ZG, Love CE (2000) A simple recursive Markov chain model to determine the optimal replacement policies under general repairs. Comp Oper Res 27:321–333 201. Zuo MJ, Liu B, Murthy DNP (2000) Replacement-repair policy for multi-state deteriorating products under warranty. Eur J Oper Res 123:519–530

Information-Based Minimal Repair Models Terje Aven

T. Aven (✉)
Faculty of Science and Technology, University of Stavanger, 4036 Stavanger, Norway
e-mail: [email protected]

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_2, Springer-Verlag London Limited 2011

1 Introduction

Using the minimal repair concept, it is possible to describe in a simple way the fact that many repairs in real life bring the system to a condition which is basically the same as it was just before the failure occurred. Such a repair may be used to model a system where a component of the system is replaced or repaired. Of course, the purpose of the repair action is not to bring the system to the exact same condition. Rather, the purpose is to bring the system back into operation as soon as possible. But looking at the condition of the system after the repair, it is reasonable to assume that the system state has not changed.

To formalize this idea, assume that the system is installed at time $t = 0$ and is subject to failure at a random time $T_1$. The system has a failure rate (intensity) at time $t$ given by the function $\lambda(t)$, meaning that the probability that the system fails during the interval $(t, t + h)$ is approximately $\lambda(t)h$ if the system has survived until time $t$. Here $h$ is a small number. Suppose the repair is minimal in the sense that the state or condition of the system after the repair is as good as it was immediately before the failure occurred. Minimal repair means that the age of the system is not disturbed by the failures. Consequently, the failure rate at time $t$ is $\lambda(t)$, independent of the number of failures that have occurred up to time $t$. We ignore the duration of the repairs. If $N_t$ represents the number of failures in the time period $[0, t]$, it follows that $N_t$ is a non-homogeneous Poisson process with intensity function $\lambda(t)$. This is the basic minimal repair model presented in 1960 by Barlow and Hunter [8].

This model has been extended in many ways since then, see e.g. Aven [1, 2], Aven and Jensen [6, 7], Phelps [16], Bergman [11], Block et al. [12], Stadje and Zuckerman [21], Shaked and Shanthikumar [20], Beichelt [11], Zhang and Jardine [23], Finkelstein [14] and Zequeira and Bérenguer [22]. A Bayesian approach is presented and discussed in Mazzuchi and Soyer [15]. Many special cases of the basic minimal repair model have been addressed in the literature. Two of the most frequently studied special cases are:

$$\lambda(t) = \lambda\beta(\lambda t)^{\beta - 1} \quad \text{(Power law)}$$

$$\lambda(t) = \lambda e^{\beta t} \quad \text{(Log linear model).}$$
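The two special cases can be made concrete with a minimal Python sketch. It evaluates the mean value function $\Lambda(t) = \int_0^t \lambda(u)\,du$ (the expected number of failures in $[0, t]$) for both intensities and simulates failure times for the power law case. The function names are illustrative, and the simulation uses the standard time-transformation of a unit-rate Poisson process, which is an assumption of this sketch and not a method taken from the chapter.

```python
import math
import random


def power_law_mean(t, lam, beta):
    """Expected failures in [0, t] for lambda(t) = lam*beta*(lam*t)**(beta-1)."""
    return (lam * t) ** beta


def log_linear_mean(t, lam, beta):
    """Expected failures in [0, t] for lambda(t) = lam*exp(beta*t), beta != 0."""
    return lam / beta * (math.exp(beta * t) - 1.0)


def simulate_power_law_failures(horizon, lam, beta, rng):
    """Failure times in [0, horizon] of a minimally repaired system.

    If G_1 < G_2 < ... are the points of a unit-rate Poisson process, then
    T_n = G_n**(1/beta) / lam (the inverse of the mean value function applied
    to G_n) are the points of the power-law process.
    """
    times, g = [], 0.0
    while True:
        g += rng.expovariate(1.0)          # next point of the unit-rate process
        t = g ** (1.0 / beta) / lam        # map it through the inverse of Lambda
        if t > horizon:
            return times
        times.append(t)


rng = random.Random(2011)                  # fixed seed, only for reproducibility
failure_times = simulate_power_law_failures(10.0, 0.2, 2.0, rng)
```

With `lam = 0.2`, `beta = 2.0` and a horizon of 10 the expected number of failures is `power_law_mean(10.0, 0.2, 2.0)`, i.e. 4, and the simulated counts should average close to that value over many runs.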

In this article we will focus on extensions of the basic model, and we will give special attention to the general minimal repair models presented and discussed by Aven [1, 2] and Aven and Jensen [6]. The present paper is partly based on Aven [4], Aven [3] and Sandve and Aven [18]. The minimal repair models are used to describe and predict the performance of repairable systems. A number of applications relate to optimal maintenance and replacement policies, see the above cited references. Two applications are presented in Sect. 4.

2 A Doubly Stochastic Poisson Process Approach

In this section we study the concept of minimal repair as it was introduced by Aven [1] in an optimal replacement analysis context. Let $X_t$, $t \ge 0$, be an observable stochastic process, possibly a vector process, representing the condition of the system at time $t$. We define $N_t$ as the number of failures in $[0, t]$. The failure intensity process, which is denoted $\lambda_t$, may depend on $X_s$, $0 \le s \le t$. Often we can formulate the relation in the following way:

$$\lambda_t = v(X_t),$$

where $v(x)$ is a positive deterministic function. The interpretation of $\lambda_t$ is that given the history of the system up to time $t$, the probability that the system will fail in the interval $(t, t + h)$ is approximately $\lambda_t h$. In other words, $\lambda_t$ is the expected number of failures per unit of time, given the information up to time $t$.

If the failure intensity process depends only on the state process $X_t$ and not on the failure process $N_t$, we can interpret the repairs as minimal: a repair which changes neither the age of the system nor the information about the condition of the system. In this case, the running information about the condition of the system can be thought of as relating to a system which is always functioning. A way of formalizing this is to assume that given the state process $X$, the failure process follows a non-homogeneous Poisson process with intensity $v(X_t)$, i.e. $N$ is a doubly stochastic Poisson process (Cox process) with intensity $v(X_t)$. Hence, if the whole $X$ process is known, the intensity at time $t$ is $v(X_t)$, independent of previous failures. This process models a system which is minimally repaired, as a repair does not change the information about the condition of the system. If we know that the state is $x$ at time $t$, the failure intensity is $v(x)$, independent of the history of the failure process. If $X_t = t$ for all $t$, we are back to the basic minimal repair model, where $N_t$ is a non-homogeneous Poisson process.

Example 1 Shock model. Assume that shocks occur to the system at random times, each shock causing a random amount of damage, and these damages accumulate additively. At a shock, the system fails with a given probability. A system failure can occur only at the occurrence of a shock. Let $V_t$ denote the number of shocks in $[0, t]$, and let $Y_i$ denote the amount of damage caused by the $i$th shock. We assume that $V_t$ is a Poisson process with rate $\nu$, and that the $Y_i$ are independent and identically distributed random variables with distribution $H(y)$. Let $X_t$ denote the accumulated damage in $[0, t]$, i.e.

$$X_t = \sum_{i=1}^{V_t} Y_i.$$

Now, if the system is active before time $t$, and the accumulated damage equals $x$, and a jump of size $y$ occurs at $t$, then the probability of failure at this point is $p(x + y)$, where $0 < p(x) < 1$ for all $x$. This model is a special case of the general set-up described above. The failure intensity process of the counting process $N_t$ equals

$$\nu \int_0^\infty p(X_t + y)\, dH(y).$$

For a formal proof of this result, see [6, 7]. These references also include extensions of this model, allowing the probability $p$ to depend on the number of failures that have occurred. However, the repairs are then not minimal repairs.

Example 2 Monotone system. Consider a binary, monotone system $\phi$ of $n$ independent components. We refer to [7, 9] for the definition of such a system. Let $N_t(i)$ denote the number of failures of component $i$ in $[0, t]$, and $N_t$ the number of system failures in the same interval. The counting process $N_t(i)$ is assumed to have an intensity process $\lambda_t(i)$. Let $X_t(i)$ equal 1 if component $i$ is functioning at time $t$ and 0 otherwise, and write $X_t = (X_t(1), \ldots, X_t(n))$. Hence the failure process of the system $N_t$ has an intensity $\lambda_t$ given by

$$\lambda_t = \sum_{i=1}^{n} \lambda_t(i)\, X_t(i)\, \bigl(1 - \phi(0_i, X_t)\bigr)\, \phi(X_t), \tag{1}$$

where $\phi(\cdot_i, x) = \phi(x_1, \ldots, x_{i-1}, \cdot, x_{i+1}, \ldots, x_n)$. Observe that $X_t(i)\,(1 - \phi(0_i, X_t))\,\phi(X_t)$ is either 0 or 1, and equals 1 if and only if the system is functioning, component $i$ is functioning and the system fails if component $i$ fails.


Assume now that if the system fails, it is minimally repaired in the following sense: If a component fails and causes system failure, then this component is minimally repaired in the traditional sense. A component which fails without causing system failure is not repaired. We assume that component $i$ when not being repaired has a lifetime $R_i$ with distribution function $F_i(t)$ and failure rate equal to $r_i(t)$. The $n$ components are assumed to be independent. It follows that we have a special case of the general set-up with $\lambda_t$ having the form (1) with $\lambda_t(i) = r_i(t) X_t(i)$. The process $X_t(i)$, $t \ge 0$, is in this case either identical to one, or one up to $R_i$ and then zero. If component $i$ is in series with the rest of the system, then $X_t(i) \equiv 1$.
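The system intensity (1) can be evaluated directly for a given component state vector. The following is a minimal sketch under stated assumptions: the 2-out-of-3 structure function and the numerical component intensities are hypothetical choices made for illustration, and the helper names are not from the chapter.

```python
import math


def two_out_of_three(x):
    """Hypothetical structure function: the system works if at least 2 of 3 work."""
    return 1 if sum(x) >= 2 else 0


def with_component(x, i, value):
    """State vector x with component i set to `value`, i.e. phi's argument (._i, x)."""
    return x[:i] + (value,) + x[i + 1:]


def system_failure_intensity(phi, x, rates):
    """Equation (1): sum over i of rate_i * x_i * (1 - phi(0_i, x)) * phi(x).

    `rates[i]` plays the role of lambda_t(i) at the current time t.
    """
    return sum(
        rates[i] * x[i] * (1 - phi(with_component(x, i, 0))) * phi(x)
        for i in range(len(x))
    )


rates = (0.1, 0.2, 0.3)  # illustrative component intensities at some time t
all_up = system_failure_intensity(two_out_of_three, (1, 1, 1), rates)
one_down = system_failure_intensity(two_out_of_three, (1, 1, 0), rates)
```

With all three components up, no single component failure brings the system down, so the system intensity is zero. With component 3 down, components 1 and 2 are both critical and the system intensity is the sum of their intensities.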

3 A General Point Process Approach

In this section we extend the analysis of the previous section to a general point process, using the setup of Aven and Jensen [6, 7]. Let $(T_n)$, $n \in \mathbb{N}$, where $\mathbb{N} = \{1, 2, \ldots\}$, be a point process on a basic probability space $(\Omega, \mathcal{F}, P)$ describing the failure times at which instantaneous repairs are carried out, i.e. $(T_n)$ is an increasing sequence of positive random variables which may also take the value $+\infty$:

$$0 < T_1 \le T_2 \le \ldots$$

The inequality is strict unless $T_n = \infty$. We always assume that $T_\infty = \lim_{n \to \infty} T_n = \infty$. The corresponding counting process $N = (N_t)$, $t \in \mathbb{R}_+$,

$$N_t(\omega) = \sum_{n \ge 1} I(T_n(\omega) \le t),$$

where $I(\cdot)$ denotes the indicator function, represents the number of failures up to time $t$. Here $\mathbb{R}_+ = [0, \infty)$. The information up to time $t$ is represented by the pre-$t$-history ($\sigma$-algebra) $\mathcal{F}_t$, which contains all events of $\mathcal{F}$ that can be distinguished up to and including time $t$. The filtration $\mathbb{F} = (\mathcal{F}_t)$, $t \in \mathbb{R}_+$, is assumed to satisfy the usual conditions of completeness and right continuity. The main assumption now is that the failure counting process is integrable and admits an $\mathbb{F}$-intensity $\lambda = (\lambda_t)$, i.e. a decomposition

$$N_t = \int_0^t \lambda_s\, ds + M_t, \tag{2}$$

with a mean zero martingale $M = (M_t)$. The question now is which counting processes can be classified as minimal repair processes (MRPs) and which cannot. Different types of repair processes are characterized by different intensities $\lambda$. The repairs are minimal if the intensity $\lambda$ is not affected by the occurrence of failures or, in other words, if one cannot determine the failure time points from the observation of $\lambda$. More formally, minimal repairs can be characterized as follows.

Definition 1 Let $(T_n)$, $n \in \mathbb{N}$, be a point process with an integrable counting process $N$ and corresponding $\mathbb{F}$-intensity $\lambda$. Suppose that $\mathbb{F}^\lambda = (\mathcal{F}^\lambda_t)$, $t \in \mathbb{R}_+$, is the filtration generated by $\lambda$: $\mathcal{F}^\lambda_t = \sigma(\lambda_s, 0 \le s \le t)$. Then the point process $(T_n)$ is called a minimal repair process (MRP) if none of the variables $T_n$, $n \in \mathbb{N}$, for which $P(T_n < \infty) > 0$, is an $\mathbb{F}^\lambda$-stopping time, i.e. for all $n \in \mathbb{N}$ with $P(T_n < \infty) > 0$ there exists $t \in \mathbb{R}_+$ such that $\{T_n \le t\} \notin \mathcal{F}^\lambda_t$.

This is a rather general definition which comprises many special cases. It is easily verified that the non-homogeneous Poisson process with a time-dependent deterministic function $\lambda_t = \lambda(t)$ is an MRP, because here $\mathcal{F}^\lambda_t = \{\Omega, \emptyset\}$ for all $t \in \mathbb{R}_+$, so clearly the failure times $T_n$ are not $\mathbb{F}^\lambda$-stopping times. If the intensity is not deterministic but a random variable $\lambda(\omega)$ which is known at the time origin ($\lambda$ is $\mathcal{F}_0$-measurable) or, more generally, $\lambda = (\lambda_t)$ is a stochastic process such that $\lambda_t$ is $\mathcal{F}_0$-measurable for all $t \in \mathbb{R}_+$, i.e. $\mathcal{F}_0 = \sigma(\lambda_s, s \in \mathbb{R}_+)$ and $\mathcal{F}_t = \mathcal{F}_0 \vee \sigma(N_s, 0 \le s \le t)$, then the process is called a doubly stochastic Poisson process or a Cox process. The failure (minimal repair) times are not $\mathbb{F}^\lambda$-stopping times, since $\mathcal{F}^\lambda_t = \sigma(\lambda) \subset \mathcal{F}_0$ and $T_n$ is not $\mathcal{F}_0$-measurable. In the following we give another characterization of an MRP.

Proposition 1 Assume that $P(T_n < \infty) = 1$ for all $n \in \mathbb{N}$ and that there exist versions of conditional probabilities $F_t(n) = E[I(T_n \le t) \mid \mathcal{F}^\lambda_t]$ such that for each $n \in \mathbb{N}$, $(F_t(n))$, $t \in \mathbb{R}_+$, is an ($\mathbb{F}^\lambda$-progressive) stochastic process.

1. Then the point process $(T_n)$ is an MRP if and only if for each $n \in \mathbb{N}$ there exists some $t \in \mathbb{R}_+$ such that $P(0 < F_t(n) < 1) > 0$.
2. If furthermore $(F_t) = (F_t(1))$ has $P$-a.s. continuous paths of bounded variation on finite intervals, then

$$1 - F_t = \exp\left\{-\int_0^t \lambda_s\, ds\right\}.$$

See Aven and Jensen [6] for the proof.
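As a check of Proposition 1 against the basic model of Sect. 1, the following worked equation treats the deterministic intensity $\lambda_t = \lambda(t)$, for which the filtration generated by $\lambda$ is trivial and the conditional probability reduces to an ordinary probability. The power law case is shown as an illustration, assuming $\lambda > 0$ and $\beta > 0$ so that $P(T_n < \infty) = 1$.

```latex
\begin{align*}
\mathcal{F}^\lambda_t = \{\Omega, \emptyset\}
  \;&\Longrightarrow\;
  F_t(1) = P(T_1 \le t) = 1 - P(N_t = 0)
         = 1 - \exp\Bigl\{-\int_0^t \lambda(s)\,ds\Bigr\}, \\
\lambda(t) = \lambda\beta(\lambda t)^{\beta-1}
  \;&\Longrightarrow\;
  1 - F_t(1) = \exp\bigl\{-(\lambda t)^{\beta}\bigr\},
  \qquad 0 < F_t(1) < 1 \ \text{ for all } t > 0 .
\end{align*}
```

The first line agrees with part 2 of the proposition, and the strict inequalities in the second line give the condition in part 1 for $n = 1$.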

4 Applications to Optimal Replacement

In this section we consider two applications, the first with minimal repairs at system failures and the second with minimal repairs at component failures.


4.1 Minimal Repairs at System Failures

Consider the setup of Sects. 2 and 3. Suppose now that a planned replacement of the system is scheduled at time $T$, which may depend on the condition of the system, i.e. on the process $X_t$. The replacement time $T$ is a stopping time in the sense that the event $\{T \le s\}$ depends on the process $X_t$ up to time $s$. There is no planned replacement if $T = \infty$. The following simple cost structure is assumed: A planned replacement of the system costs $K$ ($> 0$) and a repair/replacement at system failure costs $c$ ($> 0$). It is assumed that the systems generated by replacements are stochastically independent and identical, that the same replacement policy is used for each system, and that replacements and repairs take negligible time. The problem is to determine a replacement time minimizing the long run (expected) cost per unit time. Let $M^T$ and $S^T$ denote the expected cost associated with a replacement cycle and the expected length of a replacement cycle, respectively. We restrict our attention to $T$'s having $M^T < \infty$ and $S^T < \infty$. Then using [17], Theorem 3.16, the long run (expected) cost per unit time can be written:

$$B^T = \frac{M^T}{S^T} = \frac{c\,E N_T + K}{E T}. \tag{3}$$

Using (2) (ref. also [7, 13]), it follows from (3) that

$$B^T = \frac{c\,E \int_0^T \lambda_t\, dt + K}{E \int_0^T dt}. \tag{4}$$

We note that the optimality criterion has the same form as the one analyzed by Aven and Bergman [5]. Below the main results obtained in [5] are summarized. Introduce $a_t = c\lambda_t$ and assume that $a_t$ is non-decreasing in $t$. Define the replacement time $T_\delta$ as the first point in time at which the process $a_t$ reaches or exceeds $\delta$, i.e. $T_\delta = \inf\{t \ge 0 : a_t \ge \delta\}$. We assume $E T_\delta < \infty$. It can be seen that $T_\delta$ minimizes

$$M^T - \delta S^T = E \int_0^T [c\lambda_t - \delta]\, dt + K.$$

The results of [7] follow. Let $B(\delta) = B^{T_\delta}$.


The stopping time $T_{\delta^*}$, where $\delta^* = \inf_T B^T$, minimizes $B^T$. The value $\delta^*$ is given as the unique solution of the equation $\delta = B(\delta)$. Moreover, if $\delta > \delta^*$, then $\delta > B(\delta)$; if $\delta < \delta^*$, then $\delta < B(\delta)$; $B(\delta)$ is non-increasing for $\delta \le \delta^*$, non-decreasing for $\delta \ge \delta^*$, and $B(\delta)$ is left-continuous. It follows from (4) and Fubini's theorem that

$$B(\delta) = \frac{c \int_0^\infty E[I(a_t < \delta)\lambda_t]\, dt + K}{\int_0^\infty E\, I(a_t < \delta)\, dt}.$$

Hence if $a_t = c\,v(X_t)$, where $v(x)$ is a deterministic function of $x$, and $Q_t(\cdot)$ is the distribution of $X_t$, we may write

$$B(\delta) = \frac{c \int_0^\infty \int I(c\,v(x) < \delta)\, v(x)\, Q_t(dx)\, dt + K}{\int_0^\infty \int I(c\,v(x) < \delta)\, Q_t(dx)\, dt}. \tag{5}$$

Note that if $X_t$ is a vector process, then one of the components of $X_t$ may be the time $t$. Examples of applications can be derived from the models presented in Examples 1 and 2 above, ref. Aven [3]. We briefly look at the shock model. In this case the function $v(x)$ is given by

$$v(x) = \nu \int_0^\infty p(x + y)\, dH(y).$$

Suppose the parameters of the model are $\nu = 1$, $K = 1$, $c = 2$, $Y_i \equiv 1$, $p(x) = 1 - e^{-x/4}$. Hence $\lambda_t = 1 - e^{-(X_t + 1)/4}$. Using formula (5), it is not difficult to find the optimal policy: Replace the system when the number of shocks, $V_t$, equals 3. The average cost then equals approximately 1.1; see Aven [3].
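The numerical result for the shock model can be reproduced with a short Python sketch. It assumes, as the monotonicity of $a_t$ suggests, that attention can be restricted to policies of the form "replace at the $m$th shock": with $Y_i \equiv 1$ the damage is $X_t = k$ between shocks $k$ and $k + 1$, each such period has expected length $1/\nu$, and so both integrals in (4) reduce to finite sums. The names `average_cost` and `best_m` are illustrative.

```python
import math

NU, K, C = 1.0, 1.0, 2.0  # shock rate, planned replacement cost, repair cost


def v(x):
    """Failure intensity at damage level x: nu * p(x + 1) with p(x) = 1 - exp(-x/4)."""
    return NU * (1.0 - math.exp(-(x + 1.0) / 4.0))


def average_cost(m):
    """Long run cost per unit time when the system is replaced at the m-th shock.

    Expected failures per cycle: sum_{k=0}^{m-1} v(k) / nu.
    Expected cycle length: m / nu.
    """
    expected_failures = sum(v(k) / NU for k in range(m))
    return (C * expected_failures + K) / (m / NU)


costs = {m: average_cost(m) for m in range(1, 11)}
best_m = min(costs, key=costs.get)
print(best_m, round(costs[best_m], 3))  # prints: 3 1.095
```

The minimum is attained at $m = 3$ with a cost of about 1.095, which matches the policy and the value of approximately 1.1 quoted above. It is also consistent with the threshold characterization: $c\,v(2) \approx 1.055$ lies below the optimal cost while $c\,v(3) \approx 1.264$ lies above it.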

4.2 Minimal Repairs at Component Failures

Consider a monotone system $\phi$ comprising $n$ independent components, which are minimally repaired at failures. Let $X_t(i)$ be a binary random variable representing the state of component $i$ at time $t$, $t \ge 0$, $i = 1, 2, \ldots, n$. The random variable $X_t(i)$ equals 1 if the component is functioning at $t$ and 0 otherwise. Assume $X_0(i) = 1$. Let $N_t(i)$ denote the number of failures of component $i$ in $[0, t]$, and let $N^0_s(i)$ denote the associated process representing the number of failures of component $i$ in $[0, s]$ when time is measured in operating time. We assume that $N^0_s(i)$ is a non-homogeneous Poisson process with intensity function $\lambda_i(s)$. Let $\Lambda_i(s) = \int_0^s \lambda_i(u)\, du$ denote the mean value function of the process $N^0_s(i)$. Hence the minimal repairs at the component level are traditional minimal repairs as defined in Sect. 1. The setup and analysis can easily be extended to doubly stochastic Poisson processes. We then have to replace $\lambda_i(t)$ by $v_i(U_t)$, where $U$ is the underlying state process representing the condition of the system and $v_i$ is a positive deterministic function, see Sect. 2.

Let $Z_t(i)$ denote the operating time of component $i$ at time $t$. Then it is not difficult to see that $N_t(i)$ is a counting process with intensity process $\lambda_i(Z_t(i)) X_t(i)$. Let $p_i(t) = 1 - q_i(t) = P(X_t(i) = 1)$. Furthermore, let $S_{in}$ denote the $n$th failure time of component $i$. We assume that the repair/restoration times are independent with distribution function $G_i(t)$. Let $\bar{G}_i(t) = 1 - G_i(t)$. Each component is minimally repaired at failures, which corresponds to the assumption that $N^0_s(i)$ is a non-homogeneous Poisson process. The following cost structure is assumed:

• A system replacement costs $K$, $K > 0$.
• The cost of a minimal repair of component $i$ is $c_i$, $c_i \ge 0$.
• The cost of a system failure of duration $t$ is $k + bt$.

The system is assumed to be replaced at the stopping time $T$. After a replacement the system is assumed to be as good as new, i.e. the process restarts itself.

4.2.1 Optimization Function

Let $M^T$ and $S^T$ denote the expected cost associated with a replacement cycle and the expected length of a replacement cycle, respectively. Then, again using [17], Theorem 3.16, the long run (expected) cost per time unit can be written:

$$B^T = \frac{M^T}{S^T} = \frac{E[\text{cost in } [0, T]]}{E\,T}.$$

It is tacitly understood that the expectations are finite. In a replacement cycle the cost of the replacement and the minimal repairs equals $K + \sum_{i=1}^n c_i N_T(i)$. In addition we have a cost associated with system failures. It is not difficult to see that this cost equals $k N_T + b \int_0^T [1 - \Phi_t]\,dt$, where $N_t$ represents the number of system failures in $[0, t]$.


It then follows that the cost/optimization function can be written:

$$B^T = \frac{K + \sum_{i=1}^n E\int_0^T c_i\,dN_t(i) + k\,E N_T + E\int_0^T b(1 - \Phi_t)\,dt}{E\,T}. \tag{6}$$

Thus (6) expresses the expected cost per unit of time, and the problem of finding an optimal replacement time is reduced to that of minimizing this function with respect to $T$. Using that $N_t(i)$ is a counting process with intensity process $\lambda_i(Z_t(i))X_t(i)$, it follows that

$$E\sum_{i=1}^n \int_0^T dN_t(i) = E\sum_{i=1}^n \int_0^T \lambda_i(Z_t(i))X_t(i)\,dt. \tag{7}$$

Similarly, we obtain the following expression for the expected number of system failures in a replacement cycle:

$$E N_T = E\sum_{i=1}^n \int_0^T [\phi(1_i, X_t) - \phi(0_i, X_t)]\,dN_t(i) = E\sum_{i=1}^n \int_0^T [\phi(1_i, X_t) - \phi(0_i, X_t)]\,\lambda_i(Z_t(i))X_t(i)\,dt, \tag{8}$$

where $\phi(1_i, X_t) - \phi(0_i, X_t)$ equals 1 if and only if component $i$ is critical, i.e. the state of component $i$ determines whether the system functions or not. Combining (6), (7) and (8) we get

$$B^T = \frac{E\int_0^T a_t\,dt + K}{E\int_0^T dt}, \tag{9}$$

where

$$a_t = \sum_{i=1}^n \big[c_i + k(\phi(1_i, X_t) - \phi(0_i, X_t))\big]\lambda_i(Z_t(i))X_t(i) + b[1 - \Phi_t]. \tag{10}$$

Observe that $Z_t(i) \approx t$ if the downtimes are relatively small compared to the uptimes. We see from the above expression for $B^T$ that it is basically identical to the one analyzed in [5]. Unfortunately, $a_t$ does not have non-decreasing sample paths. Hence we cannot apply the results of [5]. In theory, Markov decision processes can be used to analyze the optimization problem. The Markov decision process is characterized by a stochastic process $Y_t$, $t \ge 0$, defined here by

$$Y_t = (S_t, X_t, V_t, W_t),$$

where

• $S_t$ = time since the last replacement
• $X_t = (X_t(1), X_t(2), \ldots, X_t(n))$, with $X_t(i)$ = state of component $i$ at time $t$
• $V_t = (V_t(1), V_t(2), \ldots, V_t(n))$, with $V_t(i)$ = duration of the downtime of component $i$ at $t$ since the last failure of the component
• $W_t = (W_t(1), W_t(2), \ldots, W_t(n))$, with $W_t(i)$ = accumulated downtime of component $i$ at $t$ since the last replacement

At each time $t$, the state $Y_t$ is observed, and based on the history of the process up to time $t$, an action is chosen. In this case there are two possible actions: "not replace" and "replace". Here we shall, however, not analyze this approach any further. From a practical point of view the Markov decision approach is not very attractive in this case. The state space is very large and the cost rate function is not "monotone", cf. [19].

Instead, we shall look at a rather simple class of replacement policies: replace the system at $S$ or at the first component failure after $T$, whichever comes first. Here $T$ and $S$ are constants with $T \le S$. We refer to this policy as a $(T, S)$ policy. Such a policy might be appropriate if, for example, the system failure cost is relatively large and a failure of a component often results in other components being critical (this will be the case if the system has minimal cut sets comprising two components).

4.2.2 Replacement Policies (T, S)

Let $g_T$ denote the first component failure after $T$. Then, from (9), it follows that

$$B(T, S) = \frac{\int_0^T E a_t\,dt + \int_T^S E[I(t < g_T)\,a_t]\,dt + K}{T + \int_T^S P(t < g_T)\,dt},$$

where $a_t$ is defined by (10). To compute $B(T, S)$ we will make use of the approximation $Z_t(i) \approx t$. This means that the downtimes are relatively small compared to the uptimes. Using that the structure function of a monotone system can be written as a sum of products of component states with each term of the sum multiplied by a constant, it is seen that

$$a_t \approx \sum_l v_t(l) \prod_{i \in A_l} X_t(i) + \text{constant},$$

for some deterministic functions $v_t(l)$ and sets $A_l \subset \{1, 2, \ldots, n\}$. It suffices therefore to calculate expressions of the form

$$\int_0^T v_l(t) \prod_i p_i(t)\,dt \tag{11}$$

and

$$\int_T^S v_l(t)\,E\Big[\prod_i X_t(i)\,I(t < g_T)\Big]\,dt. \tag{12}$$

To compute (11) we make use of the following formula for $q_i(t) = 1 - p_i(t)$:

$$q_i(t) \approx \int_0^t \bar G_i(t - y)\,\lambda_i(y)\,e^{-(\Lambda_i(t) - \Lambda_i(y))}\,dy. \tag{13}$$
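Formula (13) is straightforward to evaluate numerically. The following is a minimal sketch, assuming a hypothetical component with constant intensity and exponentially distributed repair times (neither is prescribed by the text) and using a plain trapezoidal rule; the function name `unavailability` is illustrative. For this special case the integral has the closed form $\frac{\lambda}{\lambda + \mu}\big(1 - e^{-(\lambda + \mu)t}\big)$, which is used as a check.

```python
import math


def unavailability(t, lam, Lam, G_bar, steps=2000):
    """Approximate q_i(t) from formula (13) with the trapezoidal rule.

    lam   : intensity function lambda_i
    Lam   : mean value function Lambda_i
    G_bar : survival function of the repair time, 1 - G_i
    """
    if t <= 0:
        return 0.0
    h = t / steps
    total = 0.0
    for j in range(steps + 1):
        y = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * G_bar(t - y) * lam(y) * math.exp(-(Lam(t) - Lam(y)))
    return total * h


# Hypothetical example: constant intensity `rate`, exponential repairs with rate `mu`.
rate, mu, t = 0.1, 2.0, 5.0
q = unavailability(t, lambda y: rate, lambda s: rate * s,
                   lambda x: math.exp(-mu * x))
closed_form = rate / (rate + mu) * (1.0 - math.exp(-(rate + mu) * t))
print(q, closed_form)
```

With $q_i(t)$ available, $p_i(t) = 1 - q_i(t)$ can be inserted into (11) and integrated in the same way.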

To establish (13) we note that

$$q_i(t) = \int_0^t P(X_t(i) = 0 \mid S_{iN_t(i)} = y)\,H_i(dy, t),$$

where $H_i(y, t) = P(S_{iN_t(i)} \le y)$. It is seen that $P(X_t(i) = 0 \mid S_{iN_t(i)} = y) \approx \bar G_i(t - y)$, and using that

$$H_i(y, t) = P(S_{iN_t(i)} \le y) = P(N_t(i) - N_y(i) = 0) \approx e^{-(\Lambda_i(t) - \Lambda_i(y))},$$

formula (13) follows. The accuracy of formula (13) is studied in [19].

It remains to compute (12). Here we shall present a very simple approximation formula. Observing that $I(t < g_T) = 1$ means that there are no component failures in the interval $(T, t]$, and that the components are most likely to be up at time $T$, we have

$$E\Big[\prod_i X_t(i)\,I(t < g_T)\Big] \approx P(\text{no component failures in } (T, t]) = \prod_{i=1}^n P(N_t(i) - N_T(i) = 0) \approx e^{-\sum_{i=1}^n (\Lambda_i(t) - \Lambda_i(T))}.$$
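The same approximation gives the denominator of $B(T, S)$, the expected cycle length $T + \int_T^S P(t < g_T)\,dt$. The sketch below evaluates it with a trapezoidal rule; the two-component example with constant intensities 0.2 and 0.3 is a hypothetical choice, and the name `expected_cycle_length` is illustrative. For constant intensities with total rate $\lambda$ the result reduces to $T + (1 - e^{-\lambda(S - T)})/\lambda$, which serves as a check.

```python
import math


def expected_cycle_length(T, S, Lams, steps=2000):
    """T + integral over [T, S] of exp(-sum_i (Lam_i(t) - Lam_i(T))) dt.

    Lams is a list of mean value functions Lambda_i, one per component.
    """
    def no_failure_prob(t):
        return math.exp(-sum(Lam(t) - Lam(T) for Lam in Lams))

    h = (S - T) / steps
    total = 0.5 * (no_failure_prob(T) + no_failure_prob(S))
    for j in range(1, steps):
        total += no_failure_prob(T + j * h)
    return T + total * h


# Hypothetical example: two components with constant intensities 0.2 and 0.3.
Lams = [lambda s: 0.2 * s, lambda s: 0.3 * s]
length = expected_cycle_length(10.0, 14.0, Lams)
expected = 10.0 + (1.0 - math.exp(-0.5 * 4.0)) / 0.5
print(length, expected)
```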

An approximate value of $B(T, S)$ can now be calculated and an optimal policy determined. In case of doubly stochastic Poisson processes, it suffices to compute

$$\int_0^T E\Big[v_l(U_t)\,E\Big[\prod_i X_t(i) = 1 \,\Big|\, \mathcal{F}_0\Big]\Big]\,dt \tag{14}$$

and

$$\int_T^S E\Big[v_l(U_t)\,E\Big[\prod_i X_t(i)\,I(t < g_T) \,\Big|\, \mathcal{F}_0\Big]\Big]\,dt, \tag{15}$$

where $v_l$ are deterministic functions. Conditional on $\mathcal{F}_0$ we can copy the arguments for the non-homogeneous Poisson process case, and then by integrating over the distribution of $U$, we can obtain compact expressions for the optimization criterion. We omit the details.

The $(T, S)$ policy can be improved by taking into account which component fails. Instead of replacing the system at the first component failure after $T$ (assuming this occurs before $S$), we might replace the system at the first component failure resulting in a critical component, or wait until the first system failure after $T$. In Aven and Bergman [5] (see the previous section) it is shown that the problem of minimizing $B^T$ can be solved by minimizing the function

$$L_d^T = M^T - d\,S^T = E\int_0^T [a_t - d]\,dt + K.$$

If $T$ minimizes $L_d^T$, where $d = \inf_T B^T$, then $T$ also minimizes $B^T$. Hence we can focus on $L_d^T$. It is clear from the expression of $L_d^T$ that an optimal policy will be greater than or equal to the stopping time

$$T_d = \inf\{t : a_t \ge d\}.$$

Using the optimal average cost $B(T, S)$ as an approximation for $d$ we can obtain an improved replacement policy $(T_d, S)$.

An alternative replacement policy is obtained by considering the time points where component failures occur as decision points. Let $T_i$ be the point in time of the $i$th component failure and let $\mathcal{F}_i$ denote the history up to time $T_i$. Then, based on $\mathcal{F}_i$, we determine a time $R_i$ ($\in [0, \infty]$) such that the system is replaced at $T_i + R_i$ if $T_i + R_i < T_{i+1}$. The value of $R_i$ is determined by minimizing the conditional expected cost from $T_i$ until the next decision point or replacement time, whichever occurs first, i.e. $R_i$ minimizes

$$g(r) = \int_{T_i}^{T_i + r} E[(a_t - d)\,I(t < T_{i+1}) \mid \mathcal{F}_i]\,dt.$$
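The reduction from minimizing $B^T$ to minimizing $L_d^T$ suggests a simple fixed-point iteration on $d$, which can be sketched for the special case of a deterministic, non-decreasing cost rate $a(t)$ and a deterministic replacement time. This is an illustration under those assumptions only, not the stochastic setting of this section; the functions `first_crossing` and `minimize_cost_rate`, the example $a(t) = 2t$ and the cost $K = 10$ are hypothetical. Here $B(T) = (K + \int_0^T a(u)\,du)/T$, and each step sets $d = B(T)$ and then $T = \inf\{t : a(t) \ge d\}$.

```python
def first_crossing(a, level, t_max, tol=1e-12):
    """inf{t in [0, t_max] : a(t) >= level} for a non-decreasing a, by bisection.

    Returns (approximately) t_max if the level is not reached before t_max.
    """
    if a(0.0) >= level:
        return 0.0
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a(mid) >= level:
            hi = mid
        else:
            lo = mid
    return hi


def minimize_cost_rate(a, A, K, T0, t_max, iterations=50):
    """Iterate d <- B(T), T <- inf{t : a(t) >= d}, where B(T) = (K + A(T)) / T.

    A must be the integral of a from 0 to T (assumed supplied by the caller).
    """
    T = T0
    for _ in range(iterations):
        d = (K + A(T)) / T
        T = first_crossing(a, d, t_max)
    return T, (K + A(T)) / T


# Hypothetical example: a(t) = 2t, so A(T) = T^2 and B(T) = 10/T + T.
T_opt, B_opt = minimize_cost_rate(lambda t: 2.0 * t, lambda t: t * t,
                                  K=10.0, T0=1.0, t_max=100.0)
print(T_opt, B_opt)  # analytic optimum: T = sqrt(10), B = 2 * sqrt(10)
```

In this example the iteration coincides with the Babylonian square-root recursion $T \leftarrow (10/T + T)/2$, which explains its fast convergence.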

The performance of the above policies is studied in [19].


References

1. Aven T (1983) Optimal replacement under a minimal repair strategy - a general failure model. Adv Appl Prob 15:198–211
2. Aven T (1987) A counting process approach to replacement models. Optimization 18:285–296
3. Aven T (1996) Optimal replacement of monotone repairable systems. Chapter in Lecture notes. Reliability and maintenance of complex systems. Springer-Verlag, New York, NATO ASI
4. Aven T (2008) General minimal repair models. In: Ruggeri F, Faltin F, Kenett R (eds) Encyclopedia of statistics in quality and reliability. Wiley, Chichester
5. Aven T, Bergman B (1986) Optimal replacement time, a general set-up. J Appl Prob 23:432–442
6. Aven T, Jensen U (2000) A general minimal repair model. J Appl Prob 37:187–197
7. Aven T, Jensen U (1999) Stochastic models of reliability. Springer, New York
8. Barlow R, Hunter L (1960) Optimum preventive maintenance policies. Oper Res 8:90–100
9. Barlow RE, Proschan F (1975) Statistical theory of reliability and life testing. Holt, Rinehart and Winston, New York
10. Beichelt F (1993) A unifying treatment of replacement policies with minimal repair. Nav Res Log Quart 40:51–67
11. Bergman B (1985) On reliability theory and its applications. Scand J Statist 12:1–41
12. Block H, Borges W, Savits T (1985) Age-dependent minimal repair. J Appl Prob 22:370–385
13. Brémaud P (1981) Point processes and queues. Martingale dynamics. Springer, New York
14. Finkelstein MS (2004) Minimal repair in heterogeneous populations. J Appl Prob 41:281–286
15. Mazzuchi TA, Soyer R (1996) A Bayesian perspective on some replacement strategies. Reliab Eng Syst Saf 51:295–303
16. Phelps R (1983) Optimal policy for minimal repair. J Opl Res 34:425–427
17. Ross SM (1970) Applied probability models with optimization applications. Holden-Day, San Francisco
18. Sandve K (1996) Cost analysis and optimal maintenance planning for monotone repairable systems. PhD thesis, University of Stavanger and Robert Gordon University
19. Sandve K, Aven T (1999) Cost optimal replacement of a monotone, repairable system. Eur J Oper Res 116:235–248
20. Shaked M, Shanthikumar G (1986) Multivariate imperfect repair. Oper Res 34:437–448
21. Stadje W, Zuckerman D (1991) Optimal maintenance strategies for repairable systems with general degree of repair. J Appl Prob 28:384–396
22. Zequeira RI, Bérenguer C (2006) Periodic imperfect preventive maintenance with two categories of competing failure modes. Reliab Eng Syst Saf 91:460–468
23. Zhang F, Jardine AK (1998) Optimal maintenance models with minimal repair, periodic overhaul and complete renewal. IIE Trans 30:1109–1119

Minimal Repair Models with Two Categories of Competing Failure Modes Inma T. Castro

I. T. Castro, Department of Mathematics, University of Extremadura, Avenida de la Universidad, s/n, 10071 Caceres, Spain

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_3, © Springer-Verlag London Limited 2011

1 Introduction

The analysis of maintenance strategies is an important issue in reliability engineering. These maintenance strategies can be analyzed by obtaining information about their basic stochastic characteristics and attempting to characterize the optimum maintenance strategy, that is, the maintenance strategy that obtains the best value for an objective function. Over the last decades, many maintenance strategies have been analyzed for a wide variety of maintenance problems. Under a maintenance strategy, the system may be subject to corrective and preventive maintenance actions. Following the text of O'Connor et al. [5], corrective maintenance actions include all actions to return a system from a failed to an operating or available state. On the other hand, preventive maintenance actions seek to retain the system in an operational or available state by preventing failures from occurring. Very often, corrective maintenance actions are assumed to be minimal. That is, the corrective maintenance actions restore the system to its functioning condition just prior to failure, with the failure rate of the system remaining undisturbed. This assumption seems reasonable for systems consisting of many components, each having their own failure mode, since the repair of the failed component will not influence the system failure rate very much.

Formally, the minimal repair at failures is defined as follows. Suppose a system starts at time 0. If the system fails, it undergoes repair and it begins to operate again. It is assumed that the repair time is negligible. Let us denote by $Y_0, Y_1, \ldots, Y_n, \ldots$

the failure times of the system, where $Y_0 = 0$. The times between failures $X_n = Y_n - Y_{n-1}$, $n = 1, 2, \ldots$ are non-negative random variables. Let $P(X_1 \le t) \equiv F(t)$, for $t \ge 0$. The system undergoes minimal repair at failures if and only if

$$P(X_n \le x \mid X_1 + X_2 + \cdots + X_{n-1} = t) = \frac{F(t + x) - F(t)}{1 - F(t)}, \quad n = 2, 3, \ldots,$$

for $x > 0$, $t \ge 0$, such that $F(t) < 1$. The function

$$\frac{F(t + x) - F(t)}{1 - F(t)}$$

represents the probability that the system of age $t$ fails in $(t, t + x)$, and it is called the failure rate. The definition of minimal repair means that the failure rate remains undisturbed by any minimal repair of failure; the system after each minimal repair has the same failure rate as before the failure. If $F(t)$ has a density function $f(t)$ which is continuous, the function

$$\frac{f(t)}{1 - F(t)}, \quad t \ge 0,$$
is also called the instantaneous failure rate or failure rate, and it has the same monotone properties as

$$\frac{F(t + x) - F(t)}{1 - F(t)}.$$

The monograph of Nakagawa [9] shows many properties of the random variables related to the minimal repair process. Let $G_n(t)$ and $F_n(t)$ be the distribution functions of the random variables $Y_n$ and $X_n$, the successive failure times of the system and the times between failures, assuming minimal repairs at failures with negligible repair time. Then

$$G_n(t) = 1 - \sum_{j=0}^{n-1} \frac{H(t)^j}{j!}\,e^{-H(t)}, \quad t \ge 0,\ n = 1, 2, \ldots,$$

$$F_n(t) = 1 - \int_0^\infty \bar F(t + x)\,\frac{H(x)^{n-2}}{(n-2)!}\,h(x)\,dx, \quad t \ge 0,\ n = 2, 3, \ldots,$$

with $\bar F(t) = 1 - F(t)$, where

$$H(t) = \int_0^t h(u)\,du, \quad t \ge 0,$$
denotes the cumulative hazard rate, $h(t)$ the failure rate function of the time to the first failure $X_1$ at time $t$, and $F(t)$ the distribution of $X_1$. Moreover, denoting by $N(t)$ the number of minimal repairs during $(0, t)$, one obtains that

$$P[N(t) = n] = G_n(t) - G_{n+1}(t) = \frac{H(t)^n}{n!}\,e^{-H(t)}, \quad n = 1, 2, \ldots,$$

hence, the expected number of minimal repairs in the interval $(0, t)$ is given by

$$E[N(t)] = H(t), \tag{1}$$

where $H(t)$ denotes the cumulative hazard rate.

In the reliability literature, many maintenance strategies involving minimal repairs have been developed. Barlow and Hunter [1] introduced the notion of periodic replacement with minimal repairs at failures. They assumed that, after each failure, only minimal repairs are made and the system is replaced by a new one at times $kT$, $k = 1, 2, \ldots$ They called this maintenance model Policy II and they showed how to calculate the optimum period between replacements for an infinite time span. A classical model that combines perfect and minimal repairs is called a $(p, q)$ model. Under this model, the maintenance action consists in replacing the system with probability $p$ or performing a minimal repair with probability $q = 1 - p$. Nakagawa [9] considered preventive maintenance actions under this rule and analyzed the optimum preventive maintenance policies. Nakagawa assumed that, after a preventive maintenance at time $T$, the system is replaced by a new one with probability $p$ and it is minimally repaired with probability $q = 1 - p$. When the system fails, a corrective maintenance is performed and the system is replaced by a new one. In 1983 Brown and Proschan [3] considered corrective maintenance actions under the $(p, q)$ rule; this model is known as the imperfect repair model or Brown–Proschan model. Under the Brown–Proschan model, the system is repaired at failures under the following scheme. With probability $p$ the repair is perfect and the system is replaced by a new one. With probability $q = 1 - p$, the repair is minimal. The Brown–Proschan model has been extended in many ways since then. For example, Block et al. [2] proposed a more general model where the probability of a perfect repair depends on the age of the device. If the system fails at age $t$, then it is either replaced by a new one with probability $p(t)$ or it is minimally repaired with probability $q(t) = 1 - p(t)$.

In this chapter, some maintenance strategies involving minimal repairs are presented, in which preventive maintenance actions are performed to reduce the frequency of failures. These strategies also have in common that the system is subject to different failure modes. Following the description of Mosleh et al. [8], by failure mode we mean "a description of component failure in terms of the component function that was actually or potentially unavailable". Specifically, we assume the system is subject to two modes of failure, called maintainable failure


mode and non-maintainable failure mode, related to the effect of the successive preventive maintenance actions over them. In many cases, some preventive maintenance actions, such as cleaning, greasing, oiling, tightening screws, etc., can mitigate the deterioration effect of some failure mode, restoring the components to an "as good as new" condition with respect to that failure mode and keeping the other remaining failure modes unaffected. Under this situation, Li et al. [7] introduced the concept of two categories of failure mode, maintainable failure mode and non-maintainable failure mode. The effects of the preventive maintenance actions are related to the failure rate function of the system, assuming that a failure rate function is related to each failure mode. Preventive maintenance actions will affect the maintainable failure rate exclusively, whereas the non-maintainable failure rate remains unaltered under the effect of the preventive maintenance actions. Removing the deterioration related to the non-maintainable failure mode is only possible by making a complete overhaul which restores the system to an "as good as new" condition.

In this exposition, we review some minimal repair models with the following characteristics:

1. A single unit system is subject to two modes of failure: maintainable failures and non-maintainable failures. Let $h_m(t)$ and $h_{nm}(t)$ be the failure rate functions for maintainable and non-maintainable failures at time $t$, and let $h(t)$ be the failure rate function of the system at time $t$.
2. Whenever a failure happens, a minimal repair is performed with negligible repair time.
3. Preventive maintenance actions are performed to mitigate the effects of the failures. These preventive maintenance actions are imperfect, that is, they do not restore the system to an as good as new condition.
4. The preventive maintenance actions do not disturb the hazard rate of the non-maintainable failures; they only affect the maintainable failure rate.
5. In the $N$th preventive maintenance action, the system is replaced by a new one.

A realization of this maintenance scheme is shown in Fig. 1.

Fig. 1 Realization of the failure rate of the system (vertical axis: hazard rate; horizontal axis: time, with preventive maintenance times $T_1, \ldots, T_5$; curves $h_{nm}(t)$ and $h_m(t)$)

Assuming a sequence of costs for the different maintenance actions, the search of the best

maintenance strategy for this maintenance scheme is based on cost considerations, minimizing a given objective function. In this work, an infinite time span is assumed. It means that the same maintenance strategy is used over an infinite period of time and applied to a sequence of systems whose failure times have the same known probability distribution. A replacement cycle is the time between successive replacements of the system. Denoting by $C(t)$ the cumulative cost up to time $t$, by $E[C]$ the expected cost in a replacement cycle, and by $E[L]$ the expected length of a replacement cycle, the renewal-reward theorem (see Tijms [10] for more details) states that

$$\lim_{t \to \infty} \frac{C(t)}{t} = \frac{E[C]}{E[L]}, \quad \text{with probability 1.} \tag{2}$$

That is, for almost any realization of the process $\{C(t),\ t \ge 0\}$, the long run cost per time unit is equal to the expected cost during one cycle divided by the expected length of the cycle. For an infinite time span, the expression (2) is used as an appropriate objective function to characterize the optimum maintenance strategy. In the literature, different models are presented to capture the effect of the preventive maintenance actions on the failure rate function of the maintainable failure mode. The present chapter shows some of these models based on the works of Li et al. [7], Zequeira and Bérenguer [11] and Castro [4]. Firstly, in Sect. 2, we show the model of Li et al. [7], where it is assumed that the two modes of failure are independent and compete to cause the system failure. Subsequently, in Sect. 3, we show two different models, based on the works of Zequeira and Bérenguer [11] and Castro [4], where the maintainable and non-maintainable failure rates are dependent. The difference between the two models is the dependence scheme.
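As a small worked instance of (1) and (2), consider the periodic replacement policy of Barlow and Hunter mentioned above: with replacement every $T$ time units and minimal repairs in between, the cycle length is $T$ and the expected cycle cost is $C_r + C_m H(T)$, so the cost rate is $(C_r + C_m H(T))/T$. The sketch below assumes a Weibull cumulative hazard $H(t) = (t/\eta)^\beta$ with $\beta > 1$ and hypothetical cost values; setting the derivative to zero gives $T^* = \eta\,[C_r/(C_m(\beta - 1))]^{1/\beta}$, which is compared with a coarse grid search.

```python
def cost_rate(T, c_r, c_m, shape, scale):
    """Long-run cost per unit time (C_r + C_m * H(T)) / T with H(T) = (T / scale) ** shape."""
    return (c_r + c_m * (T / scale) ** shape) / T


# Hypothetical parameters: replacement cost 10, minimal repair cost 1, Weibull shape 2, scale 1.
c_r, c_m, shape, scale = 10.0, 1.0, 2.0, 1.0

# Coarse grid search over T in (0, 20].
grid = [0.01 * j for j in range(1, 2001)]
T_best = min(grid, key=lambda T: cost_rate(T, c_r, c_m, shape, scale))

# Closed-form optimum for the Weibull case (valid for shape > 1).
T_closed = scale * (c_r / (c_m * (shape - 1.0))) ** (1.0 / shape)
print(T_best, T_closed)
```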

2 Independent Failure Modes

Li et al. [7] analyzed a sequential preventive maintenance model under the general assumption that, at a system failure, a minimal repair is performed and preventive maintenance actions are carried out to reduce the frequency of failures. The system is subject to maintainable failures and non-maintainable failures. Let $h(t)$, $h_m(t)$, and $h_{nm}(t)$ be the system failure rate, the maintainable failure rate, and the non-maintainable failure rate, respectively, at time $t$. Li et al. [7] assumed that both failure mechanisms are independent and compete to cause the system failure, hence the failure rate for the system is given by

$$h(t) = h_m(t) + h_{nm}(t), \quad t \ge 0.$$

The preventive maintenance actions are imperfect and a hybrid model is used to model the effects of the preventive maintenance actions on maintainable failure modes. Suppose that, for a system entering service at time $T_0 = 0$, the first preventive maintenance occurs at time $T_1$. Just before this maintenance is carried out, the system effective age $y_1$ is the same as its calendar age $T_1$. We suppose that the preventive maintenance reduces the effective age to $b_1 T_1$, where $b_1$ is some constant lying between 0 and 1. Then, during the period until the next maintenance at time $T_2$, the effective age of the system is

$$y = b_1 T_1 + x, \quad 0 < x < T_2 - T_1.$$

Furthermore, the failure rate after preventive maintenance may not be precisely the same as for a genuinely younger system. Denoting by $h_m(t)$ the maintainable failure rate function for a new equipment at time $t$, the maintainable failure rate after the first preventive maintenance action is given by

$$h_{m,1}(t) = a_1 h_m(b_1 T_1 + t - T_1), \quad T_1 \le t < T_2,$$

where $a_1 > 1$ is some system-dependent constant. The term $b_1$ denotes the improvement factor in the effective age of the equipment due to the first preventive maintenance. It means that the effective age of the system becomes $b_1 T_1$ right after the first preventive maintenance. The term $a_1$ denotes the adjustment factor for the hazard rate function due to the first preventive maintenance. It means that the failure rate at this moment is not the same as the failure rate when the equipment's age is $b_1 T_1$. Instead, it is equal to $a_1 h_m(b_1 T_1)$. The effective age of the system just before the second preventive maintenance action at time $T_2$ is $y_2 = b_1 T_1 + (T_2 - T_1)$, and immediately after maintenance, this is reduced to $b_2 y_2$ for some $b_2$ such that $b_1 \le b_2 \le 1$, and the maintainable failure rate after the second preventive maintenance is given by

$$h_{m,2}(t) = a_2 a_1 h_m(b_2 y_2 + t - T_2), \quad T_2 \le t < T_3.$$

Assuming that the system is preventively maintained at times $T_1, T_2, \ldots$, let $y_k$ be the effective age of the system just before the $k$th preventive maintenance at time $T_k$ for $k = 1, 2, \ldots, n$. Then

$$y_k = b_{k-1} y_{k-1} + (T_k - T_{k-1}), \tag{3}$$

with maintainable failure rate given by

$$h_{m,k}(t) = A_k h_m(b_k y_k + t - T_k), \quad T_k \le t < T_{k+1},$$

where $b_k$ denotes the adjustment factor in effective age due to the $k$th preventive maintenance action, with

$$0 = b_0 \le b_1 \le b_2 \le \cdots \le b_{N-1} < 1,$$

and

$$A_k = \prod_{i=0}^k a_i \tag{4}$$

denotes the product of the corresponding hazard rate adjustment factors, where $1 = a_0 \le a_1 \le a_2 \le \cdots$. Let $h_k$ be the hazard rate of the system between the $k$th and the $(k+1)$th preventive maintenance action. This function is given by

$$h_k(t) = h_{nm}(t) + A_k h_m(b_k y_k + t - T_k), \quad T_k \le t < T_{k+1},\ k = 0, 1, 2, \ldots$$
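The recursion (3), the products (4) and the resulting system hazard rate can be traced with a few lines of code. This is a minimal sketch: the function names, the maintenance times, the factors $a_k$, $b_k$ and the two hazard functions are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def effective_ages(times, b):
    """Effective ages y_1, ..., y_N just before each maintenance, from recursion (3).

    times = [T_1, ..., T_N]; b = [b_0, ..., b_{N-1}] with b_0 = 0; T_0 = y_0 = 0.
    """
    ys = []
    prev_T, prev_y = 0.0, 0.0
    for k, T_k in enumerate(times, start=1):
        y_k = b[k - 1] * prev_y + (T_k - prev_T)
        ys.append(y_k)
        prev_T, prev_y = T_k, y_k
    return ys


def cumulative_adjustments(a):
    """A_k = a_0 * a_1 * ... * a_k for k = 0, 1, ..., as in (4)."""
    out, prod = [], 1.0
    for a_i in a:
        prod *= a_i
        out.append(prod)
    return out


def system_hazard(t, k, times, ys, A, b, h_m, h_nm):
    """h_k(t) = h_nm(t) + A_k * h_m(b_k * y_k + t - T_k) for T_k <= t < T_{k+1}."""
    T_k = 0.0 if k == 0 else times[k - 1]
    y_k = 0.0 if k == 0 else ys[k - 1]
    return h_nm(t) + A[k] * h_m(b[k] * y_k + t - T_k)


# Hypothetical example values.
times = [2.0, 4.0, 6.0]
b = [0.0, 0.5, 0.5]
a = [1.0, 1.1, 1.2]
ys = effective_ages(times, b)          # y_1 = 2, y_2 = 0.5*2 + 2 = 3, y_3 = 0.5*3 + 2 = 3.5
A = cumulative_adjustments(a)          # 1, 1.1, 1.32
rate = system_hazard(3.0, 1, times, ys, A, b,
                     h_m=lambda t: 2.0 * t, h_nm=lambda t: 0.1)
print(ys, A, rate)
```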

The system is replaced in the $N$th preventive maintenance action. Let $N(T_N)$ be the number of minimal repairs for this maintenance model during the replacement cycle $[0, T_N]$. As only minimal repairs are performed when the system fails, using (1), the expected number of minimal repairs for this maintenance model is given by

$$E[N(T_N)] = \int_0^{T_N} h_{nm}(t)\,dt + \sum_{k=0}^{N-1} \int_{T_k}^{T_{k+1}} h_{m,k}(t)\,dt = H_{nm}(T_N) + \sum_{k=0}^{N-1} A_k \int_{T_k}^{T_{k+1}} h_m(b_k y_k + t - T_k)\,dt, \tag{5}$$

where $H_{nm}(t)$ denotes the cumulative hazard rate for the non-maintainable failures. Using (2), the long run expected cost per time unit, with the effective ages $y_k$ of the system just before the preventive maintenance actions given by (3), is

$$C(T_k, N) = \frac{C_r + C_p(N - 1) + C_m\Big[H_{nm}(T_N) + \sum_{k=0}^{N-1} A_k \int_{T_k}^{T_{k+1}} h_m(b_k y_k + t - T_k)\,dt\Big]}{T_N}, \tag{6}$$

where $C_r$, $C_p$ and $C_m$ denote the replacement cost, the preventive maintenance cost, and the minimal repair cost, respectively, $H_{nm}(t)$ denotes the cumulative hazard rate for the non-maintainable failures and $A_k$ is given by (4). The function (6) is expressed as a function of the effective ages $y_k$ as

$$C(y_k, N) = \frac{C_r + C_p(N - 1) + C_m \sum_{k=1}^N \big[H_k(y_k) - H_k(b_{k-1} y_{k-1})\big]}{\sum_{k=1}^{N-1} (1 - b_k) y_k + y_N}, \tag{7}$$

where the denominator is $T_N$ rewritten through (3), since $T_k - T_{k-1} = y_k - b_{k-1} y_{k-1}$ and $b_0 = 0$, and

where $H_k$ denotes the cumulative failure rate for the system between the $(k-1)$th preventive maintenance and the $k$th preventive maintenance. Using (7) as objective cost function, Li et al. [7] searched for the optimal maintenance strategy, that is, they determined the optimal values of $N$ and $y_k$, $k = 1, 2, \ldots, N$, that minimize the mean cost rate given by (7). Li et al. [7] also analyzed a different approach to this maintenance model, assuming that the preventive maintenance actions are performed whenever the hazard rate of the system reaches a predetermined level $\lambda$.
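The cost rate (6) can be evaluated without numerical integration, because by (3) each integral equals $H_m(y_{k+1}) - H_m(b_k y_k)$, where $H_m$ is the cumulative maintainable hazard of a new system. The sketch below relies on two assumptions that should be checked against Li et al. [7]: $C_m$ is taken to multiply the whole expected number of minimal repairs (5), and all numerical values, the cumulative hazards and the function name `cost_rate_li` are hypothetical.

```python
def cost_rate_li(times, a, b, H_m, H_nm, c_r, c_p, c_m):
    """Cost rate (6) for maintenance times [T_1, ..., T_N], with replacement at T_N.

    a = [a_0, ..., a_{N-1}] with a_0 = 1; b = [b_0, ..., b_{N-1}] with b_0 = 0.
    H_m, H_nm are cumulative hazards (assumed supplied in closed form).
    """
    N = len(times)
    T_prev, y_prev, A_k = 0.0, 0.0, 1.0
    repairs = H_nm(times[-1])                 # non-maintainable part of (5)
    for k in range(N):                        # interval [T_k, T_{k+1})
        A_k *= a[k]                           # A_k = a_0 * ... * a_k
        start_age = b[k] * y_prev             # effective age right after maintenance k
        y_next = start_age + (times[k] - T_prev)
        repairs += A_k * (H_m(y_next) - H_m(start_age))
        T_prev, y_prev = times[k], y_next
    return (c_r + c_p * (N - 1) + c_m * repairs) / times[-1]


# Hypothetical example: H_m(t) = t^2, H_nm(t) = 0.5 t, two intervals.
rate_li = cost_rate_li([2.0, 4.0], a=[1.0, 1.1], b=[0.0, 0.5],
                       H_m=lambda t: t * t, H_nm=lambda t: 0.5 * t,
                       c_r=10.0, c_p=2.0, c_m=1.0)
print(rate_li)  # (10 + 2 + (2 + 4 + 1.1 * 8)) / 4 = 6.7
```

An optimization over $N$ and the maintenance times could then be layered on top of this function, for instance by a grid or a general-purpose optimizer.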

3 Dependent Failure Modes

In contrast to Li et al. [7], Zequeira and Bérenguer [11] assumed that the different failure modes are dependent. A failure rate function is related to each failure mode. The dependence model is expressed in terms of the failure rate: the failure rate of the maintainable failure depends on the failure rate of the non-maintainable failure. They consider a system subjected to preventive and corrective maintenance actions. Minimal repairs are performed at system failures. Preventive maintenance actions are performed periodically at times $kT$, $k = 1, 2, \ldots, N - 1$, and the system is replaced by a new one at time $NT$. Let $h_{m,k}(t)$ be the hazard rate function of the maintainable failures in the interval $[(k-1)T, kT)$ and $h_m(t)$ be the hazard rate of the maintainable failure for a new system. For this maintenance model, the hazard rate of the maintainable failure mode is given by

$$h_{m,k}(t) = h_m(t - (k-1)T) + p(t - (k-1)T)\,h_{nm}(t), \quad (k-1)T \le t < kT,$$

where $p(t) \ge 0$ and $t \ge 0$. The term $p(t - (k-1)T)\,h_{nm}(t)$ expresses the dependence between maintainable and non-maintainable failure modes, and it can be interpreted as the increase in the hazard rate of the maintainable failure mode as a consequence of the degradation of the non-maintainable failure modes. When $p(t) = 0$, $t \ge 0$, maintainable and non-maintainable failures are independent. The authors consider that the following form is a good model for the function $p(t)$:

$$p(t) = p_0 + d_0 h_m(t - (k-1)T), \quad (k-1)T \le t < kT, \quad d_0, p_0 \ge 0.$$

Denoting by $h_k(t)$ the hazard rate function for the system in the $k$th maintenance interval $[(k-1)T, kT)$, one obtains that

$$h_k(t) = h_{nm}(t) + h_m(t - (k-1)T) + p(t - (k-1)T)\,h_{nm}(t), \quad (k-1)T \le t < kT. \tag{8}$$

Let $N(t)$ be the number of minimal repairs in $[0, t]$. Under this maintenance model, the expected number of minimal repairs performed in a replacement cycle is given by

$$E[N(NT)] = \sum_{k=1}^{N} \int_{(k-1)T}^{kT} h_k(t)\,dt, \tag{9}$$

that is, by (8),

$$E[N(NT)] = H_{nm}(NT) + N H_m(T) + \sum_{k=1}^{N} \int_0^T p(t)\,h_{nm}((k-1)T + t)\,dt, \tag{10}$$

where $H_{nm}$ and $H_m$ denote the cumulative failure rates for non-maintainable and maintainable failures, respectively. Denoting by $C_m$ the minimal repair cost, $C_r$ the replacement cost, and $C_p$ the preventive maintenance cost, using (10), the expected cost rate per time unit for this maintenance model is given by

$$C(T, N) = \frac{1}{NT}\left\{C_r + (N - 1)C_p + C_m\left(H_{nm}(NT) + N H_m(T) + \sum_{k=1}^{N} \int_0^T p(t)\,h_{nm}((k-1)T + t)\,dt\right)\right\}.$$
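This cost rate can be evaluated numerically once $p$, $h_{nm}$, $H_{nm}$ and $H_m$ are specified. The following sketch uses a trapezoidal rule for the dependence integrals; the helper names `integrate` and `cost_rate_zb`, the constant $p(t) = 0.5$, the hazards and the costs are hypothetical illustrations, not values from Zequeira and Bérenguer [11].

```python
def integrate(f, lo, hi, steps=1000):
    """Composite trapezoidal rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(f(lo + j * h) for j in range(1, steps))
    return h * (0.5 * (f(lo) + f(hi)) + inner)


def cost_rate_zb(T, N, p, h_nm, H_nm, H_m, c_r, c_p, c_m):
    """Expected cost rate for period T and N intervals, built from formula (10)."""
    dependence = sum(
        integrate(lambda t, k=k: p(t) * h_nm((k - 1) * T + t), 0.0, T)
        for k in range(1, N + 1)
    )
    repairs = H_nm(N * T) + N * H_m(T) + dependence
    return (c_r + (N - 1) * c_p + c_m * repairs) / (N * T)


# Hypothetical example: p(t) = 0.5, h_nm(t) = 2t (H_nm(t) = t^2), H_m(t) = t^2.
rate_zb = cost_rate_zb(1.0, 2, p=lambda t: 0.5,
                       h_nm=lambda t: 2.0 * t, H_nm=lambda t: t * t,
                       H_m=lambda t: t * t, c_r=10.0, c_p=2.0, c_m=1.0)
print(rate_zb)  # (10 + 2 + (4 + 2 + 2)) / 2 = 10
```

For fixed $T$ the function can be scanned over $N = 1, 2, \ldots$, and for fixed $N$ over a grid of $T$ values, mirroring the two one-variable problems described in the text.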

Zequeira and Bérenguer analyzed this maintenance problem in two variables, $T$ and $N$. For fixed $T$, an optimal finite number of preventive maintenances before the total replacement of the system can be obtained. Analogously, for fixed $N$, the optimal length between successive preventive maintenances can be obtained. Numerical examples are given to show the optimization problem in two variables.

In a similar way to Zequeira and Bérenguer, Castro [4] analyzed a system subject to two dependent failure modes, but considering a different dependence model. Maintainable and non-maintainable failures arrive at the system according to non-homogeneous Poisson processes. Minimal repairs are performed at system failures with negligible repair time. Preventive maintenance actions are performed periodically at times $kT$, $k = 1, 2, \ldots, N - 1$. The preventive maintenance actions only reduce the failure rate of the maintainable failures but they do not affect the failure rate of the non-maintainable failures. The system is replaced by a new one at time $NT$. In this new maintenance model, the occurrence of maintainable failures depends on the total number of non-maintainable failures since the installation of the system. Assuming that the failure rate of the maintainable failures is zero for a new system, the maintainable failure rate after the $k$th preventive maintenance action, namely $h_{m,k}(t)$, is given by

$$h_{m,k}(t) = h_m(t - kT)\,a^{N_{nm}(t)}, \quad kT \le t < (k+1)T, \quad a > 1, \tag{11}$$

where $N_{nm}(t)$ denotes the number of non-maintainable failures in $[0, t]$ and $h_m(t)$ the maintainable failure rate for a new system. The adjustment factor $a^{N_{nm}(t)}$ represents the effect of the wear-out of the system (due to the non-maintainable failures) on the occurrence of the maintainable failures. To ease the analytical calculus, Castro [4] used as an approximation of (11) the following expression:

$$h_{m,k}(t) = h_m(t - kT)\,a^{N_{nm}(kT)}, \quad kT \le t < (k+1)T, \quad a > 1, \tag{12}$$


where the adjustment factor $a^{N_{nm}(t)}$, which represents the wear-out effect due to non-maintainable failures in $[0, t]$, is replaced by $a^{N_{nm}(kT)}$, based on the number of non-maintainable failures counted up to the last preventive maintenance action. An important difference with respect to other minimal repair models is that, for this maintenance strategy, the failure rate for maintainable failures is stochastic. Due to the minimal repairs assumption under corrective maintenance actions, the non-homogeneous Poisson process that governs the occurrence of maintainable failures has a stochastic intensity. This process is called a doubly stochastic Poisson process (or Cox process) and it is obtained by randomizing the intensity in a Poisson process. To clarify the exposition of this section, we introduce the definition of doubly stochastic Poisson process and we show the probability distribution of the counting process related to it. General properties of the Cox processes are described in detail in [6].

Definition 1 A random process $\Lambda(t)$, $t \ge 0$, with $\Lambda(0) = 0$, $\Lambda(t) < \infty$ ($0 < t < \infty$) almost surely and non-decreasing trajectories is called a random measure.

Definition 2 Let $\Lambda(t)$ be a random measure and $N_{hpp}(t)$ a homogeneous Poisson process. Then the process $N(t) = N_{hpp}(\Lambda(t))$ is called a doubly stochastic Poisson process or Cox process. In this case, we shall say that the Cox process $N(t)$ is controlled by the process $\Lambda(t)$ or that the process $\Lambda(t)$ controls the Cox process $N(t)$.

An important property of the doubly stochastic Poisson processes is the following. If $\{N(t),\ t \ge 0\}$ is a doubly stochastic Poisson process controlled by the process $\Lambda(t)$, one obtains that

$$P[N(t) = n] = E\left[\frac{1}{n!}\,\Lambda(t)^n e^{-\Lambda(t)}\right], \quad n = 0, 1, 2, \ldots \tag{13}$$

In the maintenance model given in Castro [4], using (12), the random intensity at time $t$, where $kT \le t < (k+1)T$ and $k = 0, 1, 2, \ldots$, for the maintainable failure rate is given by

$$\Lambda_k(t) = \int_{kT}^{t} h_m(u - kT)\,a^{N_{nm}(kT)}\,du = a^{N_{nm}(kT)} H_m(t - kT), \tag{14}$$

where $H_m(t)$ denotes the cumulative failure intensity function for maintainable failures of a new system. Under the assumption of minimal repair when a non-maintainable failure occurs, the distribution of the total number of non-maintainable failures in the interval $[0, t]$ is given by

$$P[N_{nm}(t) = n] = \frac{H_{nm}(t)^{n}}{n!} \exp\{-H_{nm}(t)\}, \quad n = 0, 1, 2, \ldots, \; t > 0, \tag{15}$$


where $H_{nm}(t)$ denotes the cumulative non-maintainable failure rate. Denoting by $N_m(t)$ the number of maintainable failures in $(0, t)$ and using (13) and (14), the distribution of the number of maintainable failures in a maintenance interval is given by

$$P[N_m((k+1)T) - N_m(kT) = n] = E\left[\frac{1}{n!}\left(a^{N_{nm}(kT)} H_m(T)\right)^{n} \exp\{-a^{N_{nm}(kT)} H_m(T)\}\right],$$

for $n = 0, 1, 2, \ldots$ and $k = 0, 1, 2, \ldots$. Furthermore, using (15),

$$P[N_m((k+1)T) - N_m(kT) = n] = \frac{H_m(T)^{n}}{n!} \exp\{-H_{nm}(kT)\} \sum_{z=0}^{\infty} \frac{\left(a^{n} H_{nm}(kT)\right)^{z}}{z!} \exp\{-a^{z} H_m(T)\}. \tag{16}$$

From (16), the expected number of minimal repairs due to maintainable failures between the $k$th and the $(k+1)$th preventive maintenance is given by

$$E[N_m(kT, (k+1)T)] = \sum_{n=0}^{\infty} n\, P[N_m((k+1)T) - N_m(kT) = n] = H_m(T) \exp((a-1) H_{nm}(kT)), \quad k = 0, 1, 2, \ldots \tag{17}$$
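As a numerical sanity check of (16) and (17), the following Python sketch evaluates the series in (16) by truncation and compares the resulting mean with the closed form in (17). The function names, the truncation limits `z_max` and `n_max`, and the numerical values used for $H_m(T)$, $H_{nm}(kT)$, and $a$ are illustrative assumptions and are not taken from Castro [4].

```python
import math

def pmf_maintainable(n, Hm_T, Hnm_kT, a, z_max=80):
    """Truncated evaluation of Eq. (16): P[N_m((k+1)T) - N_m(kT) = n].

    Assumes Hm_T > 0; z_max is a hypothetical truncation point for the series in z.
    """
    if Hnm_kT == 0.0:
        # No non-maintainable failures can have occurred: plain Poisson with mean Hm_T.
        return math.exp(-Hm_T + n * math.log(Hm_T) - math.lgamma(n + 1))
    total = 0.0
    for z in range(z_max):
        # Log Poisson weight of z non-maintainable failures in [0, kT], Eq. (15).
        log_w = -Hnm_kT + z * math.log(Hnm_kT) - math.lgamma(z + 1)
        # Conditional mean number of maintainable failures given z, from Eq. (14).
        lam = (a ** z) * Hm_T
        log_p = -lam + n * math.log(lam) - math.lgamma(n + 1)
        total += math.exp(log_w + log_p)
    return total

def mean_from_pmf(Hm_T, Hnm_kT, a, n_max=400):
    """Mean of the distribution in Eq. (16), summing n * P[n] up to n_max."""
    return sum(n * pmf_maintainable(n, Hm_T, Hnm_kT, a) for n in range(n_max))

def mean_closed_form(Hm_T, Hnm_kT, a):
    """Right-hand side of Eq. (17)."""
    return Hm_T * math.exp((a - 1.0) * Hnm_kT)

if __name__ == "__main__":
    Hm_T, Hnm_kT, a = 2.0, 1.5, 1.2  # hypothetical values
    print(mean_from_pmf(Hm_T, Hnm_kT, a), mean_closed_form(Hm_T, Hnm_kT, a))
```

The two printed values should agree up to truncation error, since conditioning on $N_{nm}(kT)$ gives $E[N_m(kT,(k+1)T)] = H_m(T)\,E[a^{N_{nm}(kT)}]$ and the Poisson probability generating function yields $E[a^{N_{nm}(kT)}] = \exp((a-1)H_{nm}(kT))$.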

Hence, from (17), the expected number of minimal repairs in a replacement cycle is given by

$$E[N(NT)] = \sum_{k=0}^{N-1} H_m(T) \exp((a-1) H_{nm}(kT)) + H_{nm}(NT).$$

Denoting by $C_{m,1}$ the minimal repair cost for maintainable failures, $C_{m,2}$ the minimal repair cost for non-maintainable failures, $C_p$ the preventive maintenance cost, and $C_r$ the replacement cost, the expected cost rate per unit time for this maintenance model is given by

$$C(T, N) = \frac{1}{NT}\left\{ C_{m,1} \sum_{k=0}^{N-1} H_m(T) \exp((a-1) H_{nm}(kT)) + C_{m,2} H_{nm}(NT) + C_r + (N-1) C_p \right\}. \tag{18}$$


Castro [4] used Eq. 18 as the objective function to analyze optimal maintenance strategies for this model. The problem is to find the values of $T$ and $N$ that minimize the function $C(T, N)$ given in (18), that is, the values $T_{opt}$ and $N_{opt}$ such that $C(T_{opt}, N_{opt}) = \inf\{C(T, N);\ T > 0,\ N = 1, 2, 3, \ldots\}$. For fixed $T$, an optimal finite number of preventive maintenances before the total replacement of the system can be obtained. For fixed $N$, the optimal length between successive preventive maintenances can be obtained. If the failure rates of the maintainable and the non-maintainable modes are unbounded, this optimal value of $N$ is finite. Except for some special cases, the optimization problem in two variables has to be analyzed numerically, but the search for the optimal values can be reduced to a finite case.
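To make the numerical search concrete, the following Python sketch evaluates the cost rate (18) and minimizes it over a finite grid. It is only a minimal illustration: the grid search is a simple stand-in and not the optimization procedure of Castro [4], and the cumulative intensities `Hm` and `Hnm`, the cost values, and the grid are hypothetical choices.

```python
import math

def cost_rate(T, N, Hm, Hnm, a, c_m1, c_m2, c_p, c_r):
    """Expected cost rate C(T, N) of Eq. (18).

    Hm and Hnm are callables returning the cumulative intensities H_m(t) and H_nm(t).
    """
    repairs_m = sum(Hm(T) * math.exp((a - 1.0) * Hnm(k * T)) for k in range(N))
    cycle_cost = c_m1 * repairs_m + c_m2 * Hnm(N * T) + c_r + (N - 1) * c_p
    return cycle_cost / (N * T)

def grid_search(T_values, N_values, **params):
    """Return (cost, T, N) minimizing C(T, N) over a finite grid of candidates."""
    return min((cost_rate(T, N, **params), T, N) for T in T_values for N in N_values)

if __name__ == "__main__":
    params = dict(
        Hm=lambda t: t ** 2,         # hypothetical Weibull-type cumulative intensity
        Hnm=lambda t: 0.1 * t ** 2,  # hypothetical
        a=1.05, c_m1=1.0, c_m2=2.0, c_p=3.0, c_r=20.0,  # hypothetical costs
    )
    T_values = [0.1 * i for i in range(1, 101)]
    N_values = range(1, 31)
    best_cost, best_T, best_N = grid_search(T_values, N_values, **params)
    print(f"C = {best_cost:.4f} at T = {best_T:.1f}, N = {best_N}")
```

A grid that is too coarse or too narrow can miss the minimizer, so in practice the grid bounds would be chosen from the finite search region mentioned above.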

References

1. Barlow RE, Hunter LE (1960) Optimum preventive maintenance policies. Oper Res 8:90–100
2. Block HW, Borges WS, Savits TH (1985) Age-dependent minimal repair. J Appl Probab 22:370–385
3. Brown M, Proschan F (1983) Imperfect repair. J Appl Probab 20:851–859
4. Castro IT (2009) A model of imperfect preventive maintenance with dependent failure modes. Eur J Oper Res 196:217–224
5. O'Connor PDT, Newton D, Bromley R (2002) Practical reliability engineering. Wiley, Chichester
6. Grandell J (1976) Doubly stochastic Poisson processes. Springer-Verlag, New York
7. Li D, Zuo MJ, Yam RC (2001) Sequential imperfect preventive maintenance models with two categories of failure modes. Naval Res Logist 48:172–183
8. Mosleh A, Rasmuson DM, Marshall FM (1998) Guidelines on modeling common-cause failures in probabilistic risk assessment. Idaho National Engineering Laboratory, USA
9. Nakagawa T (2005) Maintenance theory of reliability. Springer, London
10. Tijms HC (2003) A first course in stochastic models. Wiley, Chichester
11. Zequeira RI, Bérenguer C (2006) Periodic imperfect preventive maintenance with two categories of competing failure modes. Reliab Eng Syst Safety 91:460–468

Part II

Preventive Maintenance

Preventive Maintenance Models: A Review

Shaomin Wu

1 Introduction

Maintenance expenditure is growing in all industries, despite technological advances. Maintenance planning is therefore important for many industries, especially for utility sectors such as water companies, which own diverse assets that need to be properly maintained. There are three types of maintenance: corrective maintenance (CM), preventive maintenance (PM), and condition-based maintenance (CBM). CBM has increasingly attracted the attention of both academic researchers and industrial practitioners. Undertaking CBM, however, might be unrealistic for some assets, such as water mains, that are geographically too extensive for condition monitoring equipment to be installed along every inch of the pipelines. It is also an essential requirement in various scenarios, for example, when people plan maintenance strategies, select maintenance contractors, or estimate the residual lifetime of important industrial systems (for example, nuclear power plants, planes, and trains) put up for re-sale at the end of their planned life. According to [1], CM is the maintenance carried out after fault recognition and intended to put an item into a state in which it can perform a required function, and PM is the maintenance carried out at predetermined intervals or according to prescribed criteria and intended to reduce the probability of failure or the degradation of the functioning of an item. Maintenance can also be categorized according to its effectiveness in the following way.

S. Wu, School of Applied Sciences, Cranfield University, Bedfordshire, MK43 0AL, UK

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_4, Springer-Verlag London Limited 2011


Better-than-perfect maintenance: Maintenance actions that bring an item's operating condition to a state with a smaller failure rate and/or a slower failure process than that of a brand-new identical item. Such maintenance can occur when a more reliable item, produced as a result of technological advances, is used to replace the old one.

Perfect maintenance: Maintenance actions that restore an item's operating condition to "as good as new". Upon perfect maintenance, the failure rate function of the item is the same as that of a new one. An example of perfect maintenance is replacing an old item with a new identical one.

Minimal maintenance: Maintenance actions that restore an item to the same failure rate as it had when it failed. The operating state of the item after minimal maintenance is often called "as bad as old" in the literature. Changing a flat tire on a car is an example of minimal repair because the overall failure rate of the car is essentially unchanged.

Imperfect maintenance: Maintenance actions that restore an item to a state somewhere between "as good as new" and "as bad as old". Applying anti-corrosion paint to a ship is an example.

Worse maintenance: Maintenance actions that increase the failure rate of the item. Such maintenance might be due to various reasons. For example, the maintenance personnel may lack skill or may damage the item on purpose.

Thus, maintenance activities, including CM and PM, can change a maintained item in one of two ways, and the maintenance can accordingly be described as better or worse. A better maintenance, which includes better-than-perfect maintenance, perfect maintenance, imperfect maintenance, and minimal maintenance, can reduce the hazard rate and/or effective age of a system, whereas a worse one increases them, or even causes the system to fail or break down. Modeling the effectiveness of PM is an active research topic; see [2–18] for examples.
In the literature, a number of PM models have been proposed. Some authors do not distinguish PM models from CM models. CM models nevertheless differ from PM models in the following two respects.

Scheduled and unscheduled: CM is unscheduled, whereas PM is scheduled. CM is carried out only upon failure, whereas PM is conducted at pre-scheduled time points.

Maintenance effectiveness: A CM action might bring the operating condition of an item to the state just before it failed; that is, the CM action is a minimal maintenance, which brings the item from its failed state to the working state without changing its hazard rate. A minimal PM action, however, has no effect on the maintained item and is therefore meaningless.

Considering this difference between PM and CM, maintenance models that can be used for CM might not be applicable to PM. For example, the model suggested by Brown and Proschan [19] assumes that at the time of each failure a perfect maintenance/repair occurs with probability $p$ and a minimal repair occurs with probability $1 - p$, independently of the previous history of repair and maintenance. Since a minimal repair applied as a PM action has no effect, such a PM is meaningless. Another model that cannot be used to describe PM activities is the Block–Borges–Savits model [20], which generalizes the Brown–Proschan model [19] by allowing the probability of a perfect repair to depend on the age of the failed item: at the time of each failure, a perfect maintenance/repair occurs with probability $p(t)$ and a minimal repair occurs with probability $1 - p(t)$.

With PM models, maintenance policies can then be developed. Hundreds of maintenance policies have been introduced. Review papers on maintenance policies include [21–27]. Our primary focus is on the comparison of commonly studied PM models. The chapter emphasizes modeling rather than statistical inference. We have tried to make this review reasonably complete; papers that are not included were either considered not to bear directly on the topic of this chapter or were inadvertently overlooked. Our apologies are extended to both researchers and readers if any relevant papers have been omitted.

The chapter is structured as follows. Section 2 briefly reviews three existing maintenance strategies: reliability centered maintenance (RCM), total productive maintenance (TPM), and risk based maintenance (RBM). Section 3 reviews existing PM models from two aspects: PM models adjusting hazard functions and PM models adjusting calendar ages/virtual ages. Section 4 concludes.

2 PM Strategies

Maintenance strategies, including reliability centered maintenance (RCM), total productive maintenance (TPM), and risk based maintenance (RBM), are procedures to identify the preventive maintenance (PM) requirements of complex systems. They have been recognized and accepted in many industrial fields.

2.1 Reliability Centered Maintenance (RCM)

RCM was initially developed in the civil aircraft industry in the 1960s with the introduction of the Boeing 747 series and the need to lower PM costs while attaining a certain level of reliability. It is a process that can form a vital part of a company's preventive maintenance program, and it is a systematic approach to defining a routine maintenance program composed of cost-effective tasks that preserve important functions. It has been applied in industries where failures can be catastrophic (nuclear, chemical, offshore oil and gas), with the aim of both reducing the amount of preventive maintenance and increasing the availability of the asset. It also aims at identifying the appropriate maintenance action while considering safety and economics; see Rausand [28]. The detailed procedure for applying the RCM process may be industry specific, but the principles are the same [29]. Carrying out an RCM analysis has many advantages, such as an increase in reliability, a record of unacceptable failure modes, continuous reliability improvement, the highlighting of design deficiencies, and the follow-up and traceability of failure modes at an early stage [30]. The strength of RCM also lies in the identification of "hidden failures", that is, failures that will not be detected if they occur under normal operating conditions, and of their consequences [29]. Therefore, in the RCM strategy, the emphasis is on the consequences of failures rather than on their characteristics [31]. However, the need for a highly experienced team and a lot of time to implement RCM for all systems/items of the asset can be a problem for many companies. Moreover, decision making about maintenance is based on failure causes at a system level and their consequences. As an example, a failure of a critical item may put the RCM program in difficulties, as this item will be given priority over the other items and will delay some planned maintenance [28]. RCM has some other strengths and weaknesses. The reader is referred to Moubray [29] for a more detailed discussion.

2.2 Total Productive Maintenance (TPM)

TPM is a Japanese philosophy that was introduced in the mid 1970s by the Japan Institute of Plant Maintenance. This technique aims at a total approach to installation, operation, and maintenance actions. Since its introduction, it has been successfully implemented in the manufacturing industry, mainly in Asia, and has spread across a wide range of industries. The strength of TPM is that it emphasizes a strong relationship between operators and maintainers, both of whom share the responsibility of maintaining a level of equipment effectiveness, and that it aims to eliminate all the problems that could lead to asset failures. One of the weaknesses of TPM is that its fundamental principle is to repair all defects (even the small ones), which is an unachievable goal [32]. Willmott [32] also pointed out, for example, that TPM alone might not be sufficient to guarantee the integrity of a pipeline system because it does not consider the different aspects of risk.

2.3 Risk Based Maintenance (RBM)

RBM has been in use for many years in some industries, such as the offshore industry. It aims to prioritize maintenance efforts on critical equipment, that is, equipment that has a high probability of failure and whose failure can have severe consequences. RBM allows the planning of maintenance and a decision-making process that reduce the probability of failures as well as their consequences [33].


Khan and Haddara [34] propose a quantitative methodology based on three modules: risk estimation, risk evaluation, and maintenance planning. The strengths of RBM lie in the fact that it can increase asset reliability and availability [33] while safety and the environment are also considered, and that it provides answers to important questions such as the causes and consequences of failures, the probability that they will occur, and the frequency of maintenance or inspection for a specific item [34]. One of the weaknesses of applying RBM lies in the selection of risk acceptance criteria during the risk evaluation phase. Moreover, a risk-based approach provides insight only into the uncertainties and possible consequences of an activity, not into the appropriateness of the decision that follows [35]. The reader is referred to [29, 36, 37] for more comprehensive discussions and reviews of RCM, TPM, and RBM, respectively.

3 PM Models

PM models are developed by adjusting either the hazard rate function of an item or its age (the so-called virtual age). Let $0 \le T_1 < T_2 < \cdots$, where $T_k$ is the time interval between the $(k-1)$th and the $k$th PM. We assume that corrective maintenance on any failure between two adjacent PMs is minimal maintenance. Let $h_{k-1}(t)$ denote the hazard function of an item before the $k$th PM. The corresponding density and cumulative distribution functions are denoted by $f_{k-1}(t)$ and $F_{k-1}(t)$, respectively, so that $h_{k-1}(t) = f_{k-1}(t) / (1 - F_{k-1}(t))$. Denote $R_{k-1}(t) = 1 - F_{k-1}(t)$. After the $k$th PM action on an item, the hazard function of the item is assumed to change from $h_{k-1}(t)$ to $h_k(t)$, where $h_k(t)$ can be any non-negative function. A special case of $h_k(t)$ associated with $h_{k-1}(t)$ is given by

$$h_k(t) = a_1 h_{k-1}(a_2 t + a_3) + a_4. \tag{1}$$

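Now let us investigate the more special case in which $F_{k-1}(t)$ is the Weibull cumulative distribution function, $F_{k-1}(t) = 1 - \exp\{-((t - \gamma)/\eta)^{\beta}\}$, which implies

$$h_{k-1}(t) = \frac{\beta}{\eta}\left(\frac{t - \gamma}{\eta}\right)^{\beta - 1}. \tag{2}$$

With Eq. 1, we have

$$h_k(t) = a_1 \frac{\beta}{\eta}\left(\frac{a_2 t - \gamma + a_3}{\eta}\right)^{\beta - 1} + a_4. \tag{3}$$

Eq. 3 can also be rewritten as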


$$h_k(t) = \frac{\beta}{\eta_1}\left(\frac{t - \gamma_1}{\eta_1}\right)^{\beta - 1} + a_4, \tag{4}$$

where $\eta_1 = \eta\,(a_1 a_2^{\beta - 1})^{-1/\beta}$ and $\gamma_1 = (\gamma - a_3)/a_2$. From Eq. 4, we have the following cases.

Case 1 If $a_1 \ne 0$ and $a_1 \ne 1$, $a_2 = 1$, and $a_3 = a_4 = 0$, then the $k$th PM changes the scale parameter of the Weibull distribution from $\eta$ to $\eta\, a_1^{-1/\beta}$, and the other two parameters (the location parameter and the shape parameter) remain unchanged.

Case 2 If $a_1 = 1$, $a_2 \ne 0$ and $a_2 \ne 1$, and $a_3 = a_4 = 0$, then the $k$th PM changes the scale parameter of the Weibull distribution from $\eta$ to $\eta_1 = \eta\,(a_2^{\beta - 1})^{-1/\beta}$ and changes the location parameter from $\gamma$ to $\gamma / a_2$, whereas the shape parameter remains unchanged.

Case 3 If $a_1 = 1$, $a_2 = 1$, $a_3 \ne 0$, and $a_4 = 0$, then the $k$th PM changes the location parameter of the Weibull distribution from $\gamma$ to $\gamma - a_3$, whereas the shape parameter and the scale parameter remain unchanged.

Case 4 If $a_1 = 1$, $a_2 = 1$, $a_3 = 0$, and $a_4 \ne 0$, then the $k$th maintenance changes the survival function from that of the Weibull distribution to a new one; the failure rate of the item is shifted by $a_4$ and is reduced when $a_4 < 0$.

From the above analysis, it is found that:
• $a_1$ changes the scale parameter;
• $a_2$ changes both the scale parameter and the location parameter;
• $a_3$ changes the location parameter. Changing the location parameter in the Weibull distribution actually adjusts the age of the item, which can be associated with the concept of virtual age suggested by Kijima et al. [10] and Kijima [11];
• $a_4$ changes the Weibull distribution to another distribution.

One can therefore call $a_1$ a scale adjustment parameter, $a_3$ a location adjustment parameter, and both $a_2$ and $a_4$ hybrid parameters. Accordingly, we can categorize the corresponding PM models as scale adjustment models, location adjustment models, and hybrid models. PM models associated with these three categories are reviewed below.
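The reparameterization in Eq. 4 can be checked numerically. The Python sketch below evaluates the adjusted hazard of Eq. 3 directly and through the modified scale and location parameters of Eq. 4. The function names and the parameter values are hypothetical, and the sketch assumes $a_1 > 0$, $a_2 > 0$, and arguments to the right of the location parameter.

```python
import math

def weibull_hazard(t, beta, eta, gamma):
    """Three-parameter Weibull hazard of Eq. (2), valid for t > gamma."""
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1.0)

def adjusted_hazard(t, beta, eta, gamma, a1, a2, a3, a4):
    """Eq. (3): h_k(t) = a1 * h_{k-1}(a2 * t + a3) + a4 for a Weibull h_{k-1}."""
    return a1 * weibull_hazard(a2 * t + a3, beta, eta, gamma) + a4

def reparameterized(beta, eta, gamma, a1, a2, a3):
    """Scale eta_1 and location gamma_1 of Eq. (4); assumes a1 > 0 and a2 > 0."""
    eta1 = eta * (a1 * a2 ** (beta - 1.0)) ** (-1.0 / beta)
    gamma1 = (gamma - a3) / a2
    return eta1, gamma1

if __name__ == "__main__":
    beta, eta, gamma = 2.5, 10.0, 1.0      # hypothetical Weibull parameters
    a1, a2, a3, a4 = 0.8, 1.2, 0.5, 0.01   # hypothetical adjustment parameters
    eta1, gamma1 = reparameterized(beta, eta, gamma, a1, a2, a3)
    for t in (2.0, 5.0, 9.0):
        direct = adjusted_hazard(t, beta, eta, gamma, a1, a2, a3, a4)
        via_eq4 = weibull_hazard(t, beta, eta1, gamma1) + a4
        print(t, direct, via_eq4)
```

Setting all but one of the adjustment parameters to their neutral values ($a_1 = a_2 = 1$, $a_3 = a_4 = 0$) reproduces Cases 1 to 4 one at a time.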

3.1 Location Adjustment PM Models

A perfect PM will restore the item to an as-new state. One can therefore express it as $h_k(t) = h_0(t)$, where $t$ is time starting from 0. Nakagawa [4] introduces a PM model known as the age reduction model. He assumes that the age of the item after the $k$th PM reduces to $b_k t$ when it was $t$ before the PM, where $b_k \in (0, 1)$.


Kijima et al. [10] and Kijima [11] introduce two types of maintenance models, type I and type II, using the concept of virtual age. The idea is to distinguish between the system's age, which is the time elapsed since the system was new, usually at time $t = 0$, and the virtual age of the system, which describes its present health condition when compared to a new system. The two models are $V_k = V_{k-1} + \kappa_k X_k$ and $V_k = \kappa_k (V_{k-1} + X_k)$, where $V_k$ is the virtual age of the system immediately after the $k$th PM, $X_k$ is the operating time between the $(k-1)$th and the $k$th maintenance, and $\kappa_k$ is a parameter taking a value between 0 and 1. In both models, if $\kappa_k = 0$ for all $k \ge 1$, then the $k$th PM is a perfect maintenance, whereas if $\kappa_k = 1$ for all $k \ge 1$, then the $k$th PM is a minimal maintenance.

Interesting extensions of the virtual age concept have been made by other authors. Dorado et al. [14] defined a general repair model, the so-called DHS model, that contains many popular repair models and introduces many others. The DHS model is based on two sequences $\{V_k\}$ and $\{\theta_k\}$, called the virtual ages and the life supplements, respectively, satisfying $V_1 = 0$, $\theta_1 = 1$, $V_k \ge 0$, $\theta_k \in (0, 1]$, and $V_k \le V_{k-1} + \theta_{k-1} T_{k-1}$, $k \ge 2$. The joint distribution of the $\{T_k\}$ is given by

$$P(T_k \le t \mid V_1, \ldots, V_k, \theta_1, \ldots, \theta_k, T_1, \ldots, T_{k-1}) = 1 - \frac{1 - F(\theta_k t + V_k)}{1 - F(V_k)}. \tag{5}$$
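The virtual-age recursions and the conditional distribution (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the recursions are started from a virtual age of zero (a new system), and the exponential distribution used in the usage example is chosen only because its memoryless property gives an easily checked value.

```python
import math

def kijima_type1(intervals, kappas, v0=0.0):
    """Type I: V_k = V_{k-1} + kappa_k * X_k, starting from the assumed V_0 = v0."""
    ages, v = [], v0
    for x, kappa in zip(intervals, kappas):
        v = v + kappa * x
        ages.append(v)
    return ages

def kijima_type2(intervals, kappas, v0=0.0):
    """Type II: V_k = kappa_k * (V_{k-1} + X_k), starting from the assumed V_0 = v0."""
    ages, v = [], v0
    for x, kappa in zip(intervals, kappas):
        v = kappa * (v + x)
        ages.append(v)
    return ages

def dhs_conditional_cdf(t, v, theta, F):
    """Right-hand side of Eq. (5) for virtual age v, life supplement theta, and CDF F."""
    return 1.0 - (1.0 - F(theta * t + v)) / (1.0 - F(v))

if __name__ == "__main__":
    X = [10.0, 10.0, 10.0]                   # hypothetical operating times
    print(kijima_type1(X, [0.5, 0.5, 0.5]))  # [5.0, 10.0, 15.0]
    print(kijima_type2(X, [0.5, 0.5, 0.5]))  # [5.0, 7.5, 8.75]
    F_exp = lambda t: 1.0 - math.exp(-t)     # hypothetical unit-rate exponential CDF
    print(dhs_conditional_cdf(2.0, 3.0, 1.0, F_exp))
```

With $\kappa_k = 0$ both recursions return zero virtual ages (perfect maintenance), and with $\kappa_k = 1$ both return the accumulated operating time (minimal maintenance), in line with the two limiting cases described above.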

The DHS model extends the perfect repair model, the minimal repair model, the Kijima I model, and the Kijima II model, among others. Dagpunar [13] considers the case in which the virtual age after the $k$th maintenance can be expressed as $V_k = \phi(V_{k-1} + X_k)$, where $\phi(\cdot)$ is an arbitrary scaling function that models the effectiveness of maintenance. Dorado et al. [14] studied nonparametric statistical inference in a model slightly more general than Kijima's models. More references can be found in [15, 16]. Kijima's virtual age concept was originally introduced to model the effectiveness of CM activities. It has recently been applied to the PM case by some authors (for example, [17, 18]).

Canfield [5] considers the periodic PM case. He distinguishes between the level of the hazard and the shape of the hazard function as they relate to system degradation with time. The hazard level reflects the extent of the system degradation. The shape of the hazard function at a given time reflects the rate at which the hazard is changing. In the Canfield model [5], the effective age after a PM reduces to $t - \tau$ if the item's effective age was $t$ just prior to this PM, while the hazard level remains unchanged, where $\tau$ $(\ge 0)$ is the restoration interval in the effective age of the item due to the $k$th PM. The restoration interval $\tau$ in this model is an index for measuring the quality of PM. The hazard function after the $k$th PM is

$$h_k(t) = h_0(t + k(T - \tau)) + \sum_{i=1}^{k} \left\{ h_0((i-1)(T - \tau) + T) - h_0(i(T - \tau)) \right\}, \tag{6}$$

where $T$ is the fixed time between two adjacent PM actions. When $\tau = T$ and $h_0(0) = 0$, the Canfield model reduces to

$$h_k(t) = h_0(t) + k\, h_0(T). \tag{7}$$
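A short Python sketch of Eq. 6 makes the reduction to Eq. 7 easy to verify. The function name, the baseline hazard `h0`, and the numerical values are hypothetical; the argument `t` is simply passed through as in Eq. 6.

```python
def canfield_hazard(t, k, T, tau, h0):
    """Eq. (6): hazard after the k-th PM for PM period T and restoration interval tau."""
    shift = T - tau
    jumps = sum(h0((i - 1) * shift + T) - h0(i * shift) for i in range(1, k + 1))
    return h0(t + k * shift) + jumps

if __name__ == "__main__":
    h0 = lambda t: 0.01 * t ** 2  # hypothetical baseline hazard with h0(0) = 0
    T, k, t = 5.0, 3, 2.0
    # tau = T: Eq. (6) collapses to Eq. (7), h0(t) + k * h0(T).
    print(canfield_hazard(t, k, T, T, h0), h0(t) + k * h0(T))
    # tau = 0: no restoration, so the hazard is that of an item of age t + k * T.
    print(canfield_hazard(t, k, T, 0.0, h0), h0(t + k * T))
```

The second comparison covers the opposite extreme, $\tau = 0$, in which every term of the sum in Eq. 6 cancels and the PM has no effect on the hazard.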


The parameter $\tau$ in the Canfield model is assumed to be a fixed constant. Wu and Clements-Croome [9] consider $\tau$ as a random variable and develop PM policies.

3.2 Scale Adjustment PM Models

Nakagawa [4] introduces a PM model known as the hazard rate model. He assumes that the hazard rate after the $k$th PM becomes $a\, h_{k-1}(t)$. Chan and Shaw [7] introduce a similar PM model in which they assume that the failure rate after a PM is proportional to that before the PM. In these models the parameter $a$ is assumed to be a fixed constant. Wu and Clements-Croome [9] consider $a$ as a random variable and develop PM policies.

3.3 Hybrid PM Models

Chan and Shaw [7] introduce a PM model, called the failure rate reduction model in [7], which is Case 4 discussed under Eq. 4, that is, $a_1 = 1$, $a_2 = 1$, $a_3 = 0$, and $a_4 \ne 0$ in Eq. 4. Malik [2] introduces a PM model assuming that a PM action reduces the calendar age $t$ to the effective age $t/a$, where $a$ is the improvement factor and $a \ge 1$. He gave an example in which, after maintenance, an item's reliability $R(t)$ becomes $R(t/a)$. If one uses hazard functions to represent Malik's model, it corresponds to the case $a_1 = a_2$, $a_3 = a_4 = 0$ in Eq. 4. Malik [2] recommended using expert judgement to estimate the improvement factor in his model, whereas Lie and Chun [38] proposed using cost information to determine the improvement factor $a$ in the Malik model. Lin et al. [3, 39] introduce a PM model that assumes $a_1 \ne a_2$ and $a_3 = a_4 = 0$ in Eq. 4. They used the PM model to develop PM policies. Seo and Bai [12] introduced a periodic PM model. They define $h_k(x_{k-1}(t)) = h_{k-1}(X(x_{k-2}(t), t))$, where $X(\cdot)$ and $x(\cdot)$ are specified functions, and $T$ is a fixed constant time length between two adjacent PM actions.

Similar maintenance models are also proposed by Lam [40], who defines the geometric process as an alternative to the NHPP (non-homogeneous Poisson process): a sequence of random variables $\{X_k, k = 1, 2, \ldots\}$ is a geometric process if the distribution function of $X_k$ is given by $F(a^{k-1} t)$ for $k = 1, 2, \ldots$, where $a$ is a positive constant. The hazard rate changes from $h_{k-1}(t)$ before a maintenance activity to $a\, h_{k-1}(a t)$ after the maintenance. The change is similar to that in the hybrid PM models. Wang and Pham [41] later refer to a process similar to the geometric process as a quasi-renewal process. Wu and Clements-Croome [42] extend the geometric process by replacing its parameter $a^{k-1}$ with $a_1 a_2^{k-1} + b_1 b_2^{k-1}$, where $a_2 > 1$ and $0 < b_2 < 1$. The geometric process has been studied by many authors (for example, see [6, 43–45]). However, we have found very few works applying the geometric process to modeling PM. Braun et al. [46] introduce a model, called the $\alpha$-series process, in which the survival times after each failure are $X_k / k^{\alpha}$, $k = 1, 2, \ldots$. Finkelstein [47] develops a model in which he defines a general deteriorating renewal process such that $F_{k+1}(t) \ge F_k(t)$.
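The hazard-rate statement for the geometric process can be made concrete with a small Python sketch. If $X_k$ has distribution function $F(a^{k-1} t)$ and $h$ is the hazard of $F$, then differentiating $-\log(1 - F(a^{k-1} t))$ gives the hazard $a^{k-1} h(a^{k-1} t)$ for $X_k$, so each step multiplies the hazard by $a$ and rescales time by $a$. The function names and the Weibull baseline below are hypothetical choices for illustration.

```python
def geometric_process_hazard(h, a, k):
    """Hazard of X_k when its CDF is F(a**(k-1) * t) and h is the hazard of F."""
    c = a ** (k - 1)
    return lambda t: c * h(c * t)

def geometric_process_mean(mu, a, k):
    """E[X_k] = mu / a**(k-1), where mu is the mean of the baseline distribution F."""
    return mu / a ** (k - 1)

if __name__ == "__main__":
    beta, eta = 2.0, 10.0  # hypothetical Weibull baseline (location parameter 0)
    h = lambda t: (beta / eta) * (t / eta) ** (beta - 1.0)
    a, k, t = 1.1, 4, 3.0
    h_k = geometric_process_hazard(h, a, k)
    # Numerical derivative of the cumulative hazard of X_k as an independent check.
    cum = lambda s: ((a ** (k - 1)) * s / eta) ** beta
    eps = 1e-5
    print(h_k(t), (cum(t + eps) - cum(t - eps)) / (2 * eps))
    print(geometric_process_mean(10.0, 2.0, 3))  # 2.5
```

For $a > 1$ the successive operating times shrink in mean by the factor $a$, which is the deteriorating case usually considered for repairable systems.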

3.4 Remarks

Wu and Zuo [48] review existing PM models, investigate their inter-relationships, and propose two new categories of PM models based on hazard functions: linear PM models and nonlinear PM models. The above PM models are defined in terms of hazard functions. For repairable items, however, intensity functions are often used. Doyen and Gaudoin [26] proposed six repair models using intensity functions. These models [26] can be seen as extensions of many of the PM models mentioned above. Research on using intensity functions to define repair models can also be found in Dijoux [49]. Most of the existing research assumes that maintenance cannot bring an item to an operating condition better than that of a new identical item. Some authors, for example Clavareau and Labeau [50, 51], consider replacement policies for better-than-perfect maintenance. In the existing publications, however, little research has been found on PM models that adjust the shape parameter. As shown in Eq. 3, the first three parameters (i.e., $a_1$, $a_2$, and $a_3$) can change the location parameter and the scale parameter, but not the shape parameter. The fourth parameter $a_4$ changes the Weibull distribution to another distribution. As the shape parameter can be more effective in describing the deterioration behavior of an item, it is worthwhile to develop PM models that are able to adjust the shape parameter of the Weibull distribution.

4 Conclusions

RCM (Reliability Centered Maintenance), TPM (Total Productive Maintenance), and RBM (Risk Based Maintenance) are higher-level maintenance strategies, which can be applied in selecting different maintenance policies, including corrective maintenance and preventive maintenance, in different industries. This chapter briefly reviewed these three commonly used approaches. A lower-level maintenance policy might need mathematical models of preventive maintenance (PM) to help in optimizing a pre-specified criterion, such as minimizing the expected cost rate. Selecting a proper PM model can be critically important, as different PM models can measure different levels of maintenance effectiveness. This chapter also reviewed existing PM models. It should be noted that preventive maintenance can also be carried out on multi-component systems and/or multi-state systems. The reader is referred to [23, 52, 53] for a more detailed discussion of these two areas.

Acknowledgements This research was supported by the Engineering and Physical Sciences Research Council (EPSRC) of the United Kingdom (EPSRC Grant reference: EP/G039674/1).

References 1. British Standard (1993) Glossary of terms used in terotechnology. Brit Standard BS 3811:1993 2. Malik MAK (1979) Reliable preventive maintenance scheduling. AIIE Trans (Am Inst Ind Eng) 11(3):221–228 3. Lin D, Zuo MJ, Yam RCM (2000) General sequential imperfect preventive maintenance models. Int J Reliability Qual Safety Eng 7(3):253–266 4. Nakagawa T (1988) Sequential imperfect preventive maintenance policies. IEEE Trans Reliability 37(3):295–298 5. Canfield RV (1986) Cost optimization of periodic preventive maintenance. IEEE Trans Reliability R-35(1):78–81 6. Wu S, Clements-Croome D (2005) Optimal maintenance policies under different operational schedules. IEEE Trans Reliability 54(2):338–346 7. Chan JK, Shaw L (1993) Modeling repairable systems with failure rates that depend on age and maintenance. IEEE Trans Reliability 42(4):566–571 8. Zhang F, Jardine AKS (1998) Optimal maintenance models with minimal repair, periodic overhaul and complete renewal. IIE Trans (Inst Ind Eng) 30(12):1109–1119 9. Wu S, Clements-Croome D (2005) Preventive maintenance models with random maintenance quality. Reliability Eng Syst Safety 90(1):99–105 10. Kijima M, Morimura H, Suzuki Y (1988) Periodical replacement problem without assuming minimal repair. Eur J Oper Res 37(2):194–203 11. Kijima M (1989) Some results for repairable systems with general repair. J Appl Probab 26(1):89–102 12. Seo JH, Bai DS (2004) An optimal maintenance policy for a system under periodic overhaul. Math Comput Modelling 39(4–5):373–380 13. Dagpunar JS (1997) Renewal-type equations for a general repair process. Qual Reliability Eng Int 13(4):235–245 14. Dorado C, Hollander M, Sethuraman J (1997) Nonparametric estimation for a general repair model. Ann Stat 25(3):1140–1160 15. Lindqvist BH (2006) On the statistical modeling and analysis of repairable systems. Stat Sci 21(4):532–551 16. 
Baxter LA, Kijima M, Tortorella M (1996) A point process model for the reliability of a maintained system subject to general repair. Commun Stat Part C Stochastic Models 12(1):37–65 17. Kahle W (2007) Optimal maintenance policies in incomplete repair models. Reliability Eng Syst Safety 92(5):563–565 18. Bartholomew-Biggs M, Zuo MJ, Li X (2009) Modelling and optimizing sequential imperfect preventive maintenance. Reliability Eng Syst Safety 94:53–62 19. Brown M, Proschan F (1983) Imperfect repair. J Appl Probab 20:851–859

Preventive Maintenance Models: A Review

139

20. Block HW, Borges WS, Savits TH (1985) Age-dependent minimal repair. J Appl Probab 22(2):370–385 21. Pham H, Wang H (1996) Imperfect maintenance. Eur J Oper Res 94(3):425–438 22. Scarf PA (1997) On the application of mathematical models in maintenance. Eur J Oper Res 99(3):493–506 23. Dekker R, Wildeman RE, Van Der Duyn Schouten FA (1997) A review of multi-component maintenance models with economic dependence. Math Methods Oper Res 45(3):411–435 24. Dekker R, Scarf PA (1998) On the impact of optimisation models in maintenance decision making: the state of the art. Reliability Eng Syst Safety 60(2):111–119 25. Wang H (2002) A survey of maintenance policies of deteriorating systems. Eur J Oper Res 139(3):469–489 26. Doyen L, Gaudoin O (2004) Classes of imperfect repair models based on reduction of failure intensity or virtual age. Reliability Eng Syst Safety 84(1):45–56 27. Desai A, Mital A (2006) Design for maintenance: basic concepts and review of literature. Int J Product Dev 3(1):77–121 28. Rausand M (1998) Reliability centered maintenance. Reliability Eng Syst Safety 60(2):121–132 29. Moubray J (1997) Reliability-centered maintenance II. Butterworth-Heinemann, Oxford 30. Glover C (2000) Asset management: reliability centred maintenance and overhead lines. IEE Colloquium (Digest) (31):31–35 31. Ostebo R, Nerhus O, Heggland J (1992) Optimising field design and operational requirements by integrated reliability and maintenance analysis. In: European petroleum conference, Cannes, France, pp 187–196 32. Willmott P (1994) Total productive maintenance: the Western way. Butterworth-Heinemann, Oxford 33. Krishnasamy L, Khan F, Haddara M (2005) Development of a risk-based maintenance (RBM) strategy for a power-generating plant. J Loss Prevention Process Ind 18(2):69–81 34. Khan FI, Haddara MM (2003) Risk-based maintenance (RBM): a quantitative approach for maintenance/inspection scheduling and planning. J Loss Prevention Process Ind 16(6):561–573 35. 
Ersdal G, Aven T (2008) Risk informed decision-making and its ethical basis. Reliability Eng Syst Safety 93(2):197–205 36. Ahuja IPS, Khamba JS (2008) Total productive maintenance: literature review and directions. Int J Qual Reliability Manage 25(7):709–756 37. Arunraj NS, Maiti J (2007) Risk-based maintenance—techniques and applications. J Hazardous Mater 142(3):653–661 38. Lie CH, Chun YH (1986) Algorithm for preventive maintenance policy. IEEE Trans Reliability R-35(1):71–75 39. Lin D, Zuo MJ, Yam RCM (2001) Sequential imperfect preventive maintenance models with two categories of failure modes. Naval Res Logistics 48(2):172–183 40. Lam Y (1988) Geometric processes and replacement problem. Acta Math Appl Sin 4(4):366–377 41. Wang H, Pham H (1996) A quasi renewal process and its applications in imperfect maintenance. Int J Syst Sci 27(10):1055–1062 42. Wu S, Clements-Croome D (2006) A novel repair model for imperfect maintenance. IMA J Manage Math 17(3):235–243 43. Wang H, Pham H (2006) Reliability and optimal maintenance. Springer, London 44. Jia J, Wu S (2009) A replacement policy for a repairable system with its repairman having multiple vacations. Comput Ind Eng 57(1):156–160 45. Jia J, Wu S (2009) Optimizing replacement policy for a cold-standby system with waiting repair times. Appl Math Comput 214(1):133–141 46. Braun JW, Li W, Zhao YQ (2005) Properties of the geometric and related processes. Naval Res Logistics 52(7):607–616



Optimal Schedules of Two Periodic Imperfect Preventive Maintenance Policies and Their Comparison

Dohoon Kim, Jae-Hak Lim and Ming J. Zuo

1 Introduction

As most industrial systems become more complex and multi-functional, it is extremely important to avoid catastrophic failure during actual operation as well as to slow down the degradation process of the system. One way of achieving these goals is to perform preventive maintenance while the system is still operating. Although more frequent preventive maintenance (PM) actions would certainly make the system less likely to fail during operation, such a PM policy inevitably incurs a higher cost of maintaining the system. Since Barlow and Hunter [1] proposed two types of PM policies, many authors have addressed the problem of the optimal schedule for the PM policy by determining either the length of the time interval between PM actions or the number of PM actions before replacement, each of which minimizes the expected cost rate. PM actions not only reduce the hazard rate of the system but also slow down the degradation process. A number of PM models which reflect various effects of PM

D. Kim Department of Applied Information Statistics, Kyonggi University Suwon, Gyenggi-do 443-760, Korea e-mail: [email protected] J.-H. Lim (&) Department of Accounting, Hanbat National University, Yusong-gu, Daejon 305-719, Korea e-mail: [email protected] M. J. Zuo Department of Mechanical Engineering, University of Alberta, Edmonton AB T6G 2G8, Canada e-mail: [email protected]

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_5, Springer-Verlag London Limited 2011


actions have been proposed. Different types of PM models studied in the earlier literature are summarized in Pham and Wang [12] and Wang [13]. The earliest PM models assume that the system undergoes PM action at specified times and is restored to as good as new after each PM action. However, although the PM action improves the system and slows down the degradation process, it is very unlikely to restore a practical system in use to a new one. This observation led to the concept of the imperfect PM model, which has attracted the attention of many researchers. Although imperfect maintenance covers various categories of repair and maintenance actions, it is assumed in this paper that imperfect maintenance restores the system operating state to somewhere between as good as new and as bad as old. This is known as the improvement factor model (Pham and Wang [12]; Wang [13]).

The effects of the PM action have been modeled by using the system effective age or the hazard rate function (Zequeira and Bérenguer [14]). Malik [10] introduces the concept of improvement factor, in which the hazard rate after PM action lies between 'as good as new' and 'as bad as old', and proposes an algorithm to determine successive PM intervals in the sequential PM policy. Lie and Chun [8] present a general expression to determine these PM intervals in the PM policy considered by Malik [10]. Canfield [3] suggests an imperfect PM model in which PM action does not reduce the hazard rate but slows down the wear-out speed. In Jayabalan and Chaudhuri [7], after each PM action, the system is in a state between as good as new and as bad as old in such a way that the effective age of the system after PM action is reduced to a certain age proportional to an improvement factor. Chan and Shaw [4] consider two types of hazard rate reduction models after each PM: hazard rate with fixed reduction and hazard rate with proportional reduction. Lim and Park [9] propose a periodic PM policy in which PM action reduces the hazard rate of the system, but the effect of PM diminishes as the number of PMs increases. Recently, Bartholomew-Biggs et al. [2] consider two sequential imperfect PM models: in one, each PM reduces the effective age gained in the PM period just prior to the PM action; in the other, each PM action reduces the effective age gained since $t = 0$.

In practice, decision makers in a maintenance organization often face the problem of selecting the more effective of the following two PM policies. In the first PM policy, a system is maintained by conducting simple PM actions such as replacement or supplement of components at each PM epoch (local PM policy). In vehicle maintenance, changing the engine oil or replacing belts or gaskets at every PM epoch are typical examples of such a PM action. This type of PM action reduces the wear-out of the system that occurred during the most recent operating period. In the second PM policy, the system undergoes an overall inspection and all faulty parts detected by the inspection are maintained or replaced (global PM policy). This type of PM action reduces the wear-out accumulated since the early stage of system operation. If the costs for the two PM actions are equal, it is intuitively clear that the global PM policy outperforms the local PM policy. Since the PM action of the global PM policy, however, is naturally more costly than that of the local PM policy, it is not


easy for the decision makers to determine which PM policy is more effective. Which PM policy should the decision makers take in order to save the maintenance cost of the system? In this paper, we address the above question by developing two periodic imperfect PM policies, which use two types of PM models discussed in Doyen and Gaudoin [5]. Section 2 describes the periodic imperfect PM policies and their assumptions. For each PM policy, the expression of the expected cost rate per unit time is obtained. Section 3 discusses the optimal period and the optimal number of PM actions for each PM policy. In Sect. 4, we conduct an analytical comparison of the optimal schedules of the two periodic imperfect PM policies for various cost structures. Section 5 investigates numerically the sensitivity of the optimal schedules to the cost structures and PM models.

The following notations are adopted throughout this paper.

$h(t)$: hazard rate without PM action
$h_{pm1}(t)$: hazard rate under the local PM action
$h_{pm2}(t)$: hazard rate under the global PM action
$x$: time interval between two successive PM actions
$N - 1$: number of PM actions before replacement
$p$: hazard rate reduction factor due to PM action $(0 \le p \le 1)$
$C_{mr}$: cost of minimal repair at failure for each of the two PM policies
$C_{pm1}$: PM cost of the local PM policy
$C_{pm2}$: PM cost of the global PM policy
$C_{re}$: cost of replacement of both PM policies
$C_1(x, N)$: expected cost rate (per unit time) of the local PM policy
$C_2(x, N)$: expected cost rate (per unit time) of the global PM policy

2 The Proposed PM Policies and the Expected Cost Rates

In this section, we describe two periodic imperfect PM policies, called the local PM policy and the global PM policy, and obtain the expected cost rate per unit time for each PM policy. The local and the global PM policies and their related assumptions are described as follows.

1. The system begins to operate at time $t = 0$.
2. For the local and the global PM policies, a system is preventively maintained at periodic times $kx$ $(k = 1, 2, \ldots, N;\ x \ge 0)$ with PM costs $C_{pm1}$ and $C_{pm2}$, respectively, and is replaced by a new one at the $N$th PM.
3. In the local PM policy, the hazard rate $h_{pm1}(kx^+)$ right after the $k$th PM action is reduced to $h_{pm1}(kx^-) - p\,[h_{pm1}(kx^-) - h_{pm1}((k-1)x^+)]$, where $h_{pm1}(kx^-)$ is the hazard rate just prior to the $k$th PM action, $h_{pm1}((k-1)x^+)$ is the hazard rate right after the $(k-1)$st PM action and $0 \le p \le 1$. In the global PM policy, the hazard rate $h_{pm2}(kx^+)$ right after the $k$th PM action is reduced to $h_{pm2}(kx^-) - p\,h_{pm2}(kx^-)$.
4. The system undergoes minimal repair at failures between PM actions.
5. The PM cost of the global PM policy, $C_{pm2}$, is higher than or equal to that of the local PM policy, $C_{pm1}$.
6. It takes negligible time to perform a repair or a PM action.
7. $h(t)$ is a differentiable, strictly increasing, and convex function.
8. $h(0) = 0$.

It is noted that a PM action has an effect on the relative wear-out since the last PM action in the local PM policy, while a PM action reduces the hazard rate by an amount proportional to the current hazard rate in the global PM policy. It is also noted that the wear-out speed after each PM action is the same as that just before the PM action is conducted. More explicitly, the hazard rates of the proposed periodic imperfect PM policies are as follows.

1. Local PM policy:

$$h_{pm1}(t) = \begin{cases} h(t), & 0 < t \le x, \\ h(t) - p\,h(kx), & kx < t \le (k+1)x, \end{cases} \qquad (1)$$

2. Global PM policy:

$$h_{pm2}(t) = \begin{cases} h(t), & 0 < t \le x, \\ h(t) - p \sum_{j=0}^{k-1} (1-p)^j\, h((k-j)x), & kx < t \le (k+1)x, \end{cases} \qquad (2)$$

for $k = 1, 2, \ldots, N-1$, $0 \le p \le 1$, $h_{pm}(0) = 0$.

Figure 1 shows the hazard rates of the systems with the two PM policies considered. Since PM action in the global PM policy has more effect on the wear-out of the system than that of the local PM policy, it is natural that the system with the global PM policy deteriorates more slowly than the system with the local PM policy.

In order to derive the formula that computes the expected cost rate, we use the well-known fact that the number of minimal repairs between the $(k-1)$st PM and the $k$th PM action follows a nonhomogeneous Poisson process (NHPP) with intensity function $h_{pm}(t)$ (Fontenot and Proschan [6]). Since the life cycle of the system is equal to $Nx$ and the total cost of maintaining the system is obtained as the sum of costs for PM action, minimal repair and replacement, the expected cost rate per unit time during the life cycle can be obtained as follows.

1. Local PM policy:

$$C_1(x, N) = \frac{1}{Nx}\left[ C_{mr}\left\{ H(Nx) - p\,x \sum_{k=0}^{N-1} h(kx) \right\} + (N-1)\,C_{pm1} + C_{re} \right]. \qquad (3)$$


Fig. 1 Hazard rates of systems with the local PM policy and the global PM policy

2. Global PM policy:

$$C_2(x, N) = \frac{1}{Nx}\left[ C_{mr}\left\{ H(Nx) - p\,x \sum_{k=0}^{N-1} \sum_{j=0}^{k-1} (1-p)^j\, h((k-j)x) \right\} + (N-1)\,C_{pm2} + C_{re} \right], \qquad (4)$$

where $H(t) = \int_0^t h(u)\,du$.
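The hazard rates in Eqs. 1 and 2 and the cost rates in Eqs. 3 and 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the hazard `weibull_h` with cumulative hazard `weibull_H` assumes the Weibull case with shape 3 and scale 1 used later in Sect. 5.

```python
import math

def weibull_h(t):
    """Assumed example hazard h(t) = 3 t^2 (Weibull, shape 3, scale 1)."""
    return 3.0 * t ** 2

def weibull_H(t):
    """Cumulative hazard H(t) = t^3 matching weibull_h."""
    return t ** 3

def interval_index(t, x):
    """Return k such that k*x < t <= (k+1)*x."""
    return max(math.ceil(t / x) - 1, 0)

def hazard_local(h, t, x, p):
    """Eq. 1: hazard rate under the local PM policy."""
    k = interval_index(t, x)
    return h(t) if k == 0 else h(t) - p * h(k * x)

def hazard_global(h, t, x, p):
    """Eq. 2: hazard rate under the global PM policy."""
    k = interval_index(t, x)
    reduction = sum((1 - p) ** j * h((k - j) * x) for j in range(k))
    return h(t) - p * reduction

def cost_rate_local(h, H, x, N, p, c_mr, c_pm, c_re):
    """Eq. 3: expected cost rate of the local PM policy."""
    repairs = H(N * x) - p * x * sum(h(k * x) for k in range(N))
    return (c_mr * repairs + (N - 1) * c_pm + c_re) / (N * x)

def cost_rate_global(h, H, x, N, p, c_mr, c_pm, c_re):
    """Eq. 4: expected cost rate of the global PM policy."""
    reduction = sum((1 - p) ** j * h((k - j) * x)
                    for k in range(N) for j in range(k))
    repairs = H(N * x) - p * x * reduction
    return (c_mr * repairs + (N - 1) * c_pm + c_re) / (N * x)
```

With $N = 1$ there is no PM action, so both functions reduce to $[C_{mr} H(x) + C_{re}]/x$.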

3 Optimal Schedules for the Periodic PM Policies

We use the conditions provided by Nakagawa [11] to investigate the optimal period $x^*$ and the optimal number of PM actions $N^*$, which minimize the expected cost rate per unit time.

3.1 Local PM Policy

In order to show the existence and uniqueness of the optimal number $N^*$ and the optimal period $x^*$ which minimize $C_1(x, N)$, we rewrite the expected cost rate as follows.


$$\begin{aligned}
C_1(x, N) &= \frac{1}{Nx}\left[ C_{mr} \int_0^{Nx} h_{pm1}(t)\,dt + (N-1)\,C_{pm1} + C_{re} \right] \\
&= \frac{1}{Nx}\left[ C_{mr} \sum_{k=1}^{N} \int_{(k-1)x}^{kx} \big( h(t) - p\,h((k-1)x) \big)\,dt + (N-1)\,C_{pm1} + C_{re} \right] \\
&= \frac{1}{Nx}\left[ C_{mr} \sum_{k=1}^{N} \int_0^{x} \big[ h(u + (k-1)x) - p\,h((k-1)x) \big]\,du + (N-1)\,C_{pm1} + C_{re} \right].
\end{aligned} \qquad (5)$$

Let $r_k(t) = h(t + (k-1)x) - p\,h((k-1)x)$. Then

$$C_1(x, N) = \frac{1}{Nx}\left[ C_{mr} \sum_{k=1}^{N} \int_0^{x} r_k(t)\,dt + (N-1)\,C_{pm1} + C_{re} \right]. \qquad (6)$$

Nakagawa [11] shows that sufficient conditions for Eq. 6 to have the optimal $N^*$ are (i) $r_k(t)$ is increasing in $k$ and (ii) $r_N(t) \to \infty$ as $N \to \infty$ for all $t \in (0, x)$. He also shows that sufficient conditions for Eq. 6 to have the optimal $x^*$ are (i) $r_k(t)$ is differentiable for all $t \in (0, x)$ and (ii) $r_k(t) \to \infty$ as $t \to \infty$ for all $k$. The following theorems show that $r_k(t)$ in Eq. 6 satisfies all those conditions.

Theorem 1 Suppose that $0 \le p < 1$. If $h(t)$ is strictly increasing and convex in $t \ge 0$, then there exists a finite and unique $N_1^*$ which minimizes the expected cost rate in Eq. 3 for a given $x > 0$.

Proof For any $t \in (0, x)$ and $k$,

$$\begin{aligned}
r_{k+1}(t) - r_k(t) &= h(t + kx) - p\,h(kx) - \big[ h(t + (k-1)x) - p\,h((k-1)x) \big] \\
&= h(t + kx) - h(t + (k-1)x) - p\,\big[ h(kx) - h((k-1)x) \big] \\
&> (1-p)\,\big[ h(kx) - h((k-1)x) \big] \ge 0.
\end{aligned}$$

The inequality holds since $h(t)$ is strictly increasing and convex in $t \ge 0$. Moreover, $r_N(t) = h(t + (N-1)x) - p\,h((N-1)x) > (1-p)\,h((N-1)x)$ goes to infinity as $N \to \infty$, since $h(t)$ is convex and strictly increasing to infinity as $t$ goes to infinity. According to Nakagawa [11], there exists a finite and unique $N_1^*$ which minimizes $C_1(x, N)$ for a given $x > 0$. □

Theorem 2 If $h(t)$ is differentiable and strictly increasing in $t \ge 0$, then there exists a finite and unique $x_1^*$ which minimizes the expected cost rate in Eq. 3 for a given integer $N$.


Proof $r_k(t) = h(t + (k-1)x) - p\,h((k-1)x)$ is differentiable since $h(t)$ is differentiable, and it is strictly increasing to $\infty$ since $h(t) \to \infty$ as $t \to \infty$. According to Nakagawa [11], there exists a finite and unique $x_1^*$ which minimizes $C_1(x, N)$ for a given $N$. □
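Because Theorem 1 guarantees a unique minimizer in $N$, the optimal number of PM actions for a fixed period can be located by scanning $N$ upward until the cost rate first stops decreasing. The sketch below illustrates this under assumptions that are not part of the theorem: the hazard is taken to be $h(t) = \beta t^{\beta-1}$ with $\beta = 3$, the costs are those of the numerical example in Sect. 5, and the function names are hypothetical.

```python
def cost_rate_local(x, N, p, beta=3.0, c_mr=1.0, c_pm=100.0, c_re=2000.0):
    """Eq. 3 for the assumed hazard h(t) = beta * t**(beta - 1), H(t) = t**beta."""
    def h(t):
        return beta * t ** (beta - 1)
    repairs = (N * x) ** beta - p * x * sum(h(k * x) for k in range(N))
    return (c_mr * repairs + (N - 1) * c_pm + c_re) / (N * x)

def optimal_number_of_pms(cost, x, p, n_max=10_000):
    """Return the smallest N with cost(x, N + 1, p) >= cost(x, N, p).

    Relies on the cost rate being unimodal in N, as in Theorem 1.
    """
    for N in range(1, n_max):
        if cost(x, N + 1, p) >= cost(x, N, p):
            return N
    raise ValueError("no minimum found below n_max")

# Example: PM period x = 0.8 and reduction factor p = 0.5.
n_star = optimal_number_of_pms(cost_rate_local, 0.8, 0.5)
```

The scan stops at the first rise, so it evaluates the cost rate only about $N_1^* + 1$ times.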

3.2 Global PM Policy

By utilizing the same technique used in Sect. 3.1, we can show the existence and uniqueness of the optimal number of PMs, $N_2^*$, and the optimal period, $x_2^*$, which separately minimize $C_2(x, N)$. They are summarized in the following theorems.

Theorem 3 If $h(t)$ is strictly increasing in $t \ge 0$, then there exists a finite and unique $N_2^*$ which minimizes the expected cost rate in Eq. 4 for any $x > 0$.

Proof The proof is omitted since it is similar to the proof of Theorem 1. □

Theorem 4 If $h(t)$ is differentiable and strictly increasing in $t \ge 0$, then there exists a finite and unique $x_2^*$ which minimizes the expected cost rate in Eq. 4 for any integer $N$.

Proof The proof is omitted since it is similar to the proof of Theorem 2. □

4 Comparison of Optimal Schedules for the Periodic PM Policies

As we mentioned earlier, if the costs for PM action in both PM policies are the same, it is intuitively clear that the global PM policy outperforms the local PM policy. This can also be shown analytically. When the two PM costs are not equal, it cannot be guaranteed that the global PM policy outperforms the local PM policy. In this section, we investigate the relationship between the optimal schedule of the local PM policy and that of the global PM policy. We consider only the case when the PM cost of the global PM policy is higher than or equal to that of the local PM policy. The results are summarized in the following two subsections.

4.1 Comparison of the Optimal Numbers of PM Actions

Let $N_1^*$ and $N_2^*$ be the optimal numbers of PM actions of the local and the global PM policies, respectively. Then $N_1^*$ and $N_2^*$ are values of $N$ which satisfy the following inequalities. (See Lim and Park [9] for more details.)

$$C_1(x, N+1) - C_1(x, N) \ge 0 \quad \text{and} \quad C_1(x, N-1) - C_1(x, N) > 0, \qquad (7)$$

and

$$C_2(x, N+1) - C_2(x, N) \ge 0 \quad \text{and} \quad C_2(x, N-1) - C_2(x, N) > 0. \qquad (8)$$

It can be easily shown that $C_1(x, N+1) - C_1(x, N) \ge 0$ and $C_1(x, N-1) - C_1(x, N) > 0$ imply that

$$N \int_0^x r_{N+1}(t)\,dt - \sum_{k=1}^{N} \int_0^x r_k(t)\,dt \ge \frac{C_{re} - C_{pm1}}{C_{mr}}, \qquad (9)$$

and

$$(N-1) \int_0^x r_N(t)\,dt - \sum_{k=1}^{N-1} \int_0^x r_k(t)\,dt < \frac{C_{re} - C_{pm1}}{C_{mr}}, \qquad (10)$$

respectively. Let

$$L_1(x, N) = N \int_0^x r_{N+1}(t)\,dt - \sum_{k=1}^{N} \int_0^x r_k(t)\,dt = \sum_{k=1}^{N} \int_0^x \big[ r_{N+1}(t) - r_k(t) \big]\,dt, \qquad (11)$$

where $r_k(t) = h(t + (k-1)x) - p\,h((k-1)x)$. Then $N_1^*$ is the value of $N$ satisfying

$$L_1(x, N) \ge \frac{C_{re} - C_{pm1}}{C_{mr}} \quad \text{and} \quad L_1(x, N-1) < \frac{C_{re} - C_{pm1}}{C_{mr}}.$$

Analogously, $C_2(x, N+1) - C_2(x, N) \ge 0$ and $C_2(x, N-1) - C_2(x, N) > 0$ imply that

$$L_2(x, N) \ge \frac{C_{re} - C_{pm2}}{C_{mr}} \quad \text{and} \quad L_2(x, N-1) < \frac{C_{re} - C_{pm2}}{C_{mr}}, \qquad (12)$$

where

$$L_2(x, N) = N \int_0^x q_{N+1}(t)\,dt - \sum_{k=1}^{N} \int_0^x q_k(t)\,dt = \sum_{k=1}^{N} \int_0^x \big[ q_{N+1}(t) - q_k(t) \big]\,dt,$$

and $q_k(t) = h(t + (k-1)x) - p \sum_{j=0}^{k-2} (1-p)^j\, h((k-j-1)x)$. Then $N_2^*$ is the value of $N$ satisfying Eq. 12.

Figure 2 shows the typical pattern of $L_1(x, N)$ and $L_2(x, N)$ when the initial hazard rate of the system is strictly increasing. It is obvious from Fig. 2 that if the costs for PM actions in the local PM policy and in the global PM policy are the same, $N_2^*$ is greater than $N_1^*$ for any given $x > 0$. That is, the system with the global PM policy can be useful longer than the system with the local PM policy before it has to


Fig. 2 $L_1(x, N)$ and $L_2(x, N)$

be replaced by a new one. When the cost for PM action in the global PM policy is higher than that for PM action in the local PM policy, it is not guaranteed that $N_2^*$ is greater than or equal to $N_1^*$. In some cases, $N_2^*$ could be smaller than $N_1^*$ for given $x > 0$, as shown in Fig. 2. These results are summarized in the following theorem.

Theorem 5 Suppose that $h(t)$ is strictly increasing in $t \ge 0$ and $x > 0$ is given. Then

1. if $C_{pm1} = C_{pm2}$, then $N_2^*$ is greater than or equal to $N_1^*$;
2. when $C_{pm1} < C_{pm2}$,
   a. if $L_2(x, N_1^*) < (C_{re} - C_{pm2})/C_{mr}$, then $N_2^*$ is greater than or equal to $N_1^*$;
   b. if $L_2(x, N_1^* - 1) \ge (C_{re} - C_{pm2})/C_{mr}$, then $N_2^*$ is smaller than or equal to $N_1^*$;
   c. if $L_2(x, N_1^* - 1) < (C_{re} - C_{pm2})/C_{mr}$ and $L_2(x, N_1^*) \ge (C_{re} - C_{pm2})/C_{mr}$, then $N_2^*$ is equal to $N_1^*$.

Proof Since both $L_1(x, N)$ and $L_2(x, N)$ are monotone increasing in $N$ and the optimal number of PMs for each PM policy is the smallest positive integer $N$ such that $L_i(x, N) \ge (C_{re} - C_{pmi})/C_{mr}$ $(i = 1, 2)$, it is sufficient to show that $L_1(x, N_1^*) \ge L_2(x, N_1^*)$. Since

$$r_{N_1^*+1}(t) - r_k(t) = h(t + N_1^* x) - h(t + (k-1)x) - p\,\big[ h(N_1^* x) - h((k-1)x) \big]$$

and

$$q_{N_1^*+1}(t) - q_k(t) = h(t + N_1^* x) - h(t + (k-1)x) - p \left[ \sum_{j=0}^{N_1^*-1} (1-p)^j\, h((N_1^* - j)x) - \sum_{j=0}^{k-2} (1-p)^j\, h((k-j-1)x) \right],$$

it is easy to show that

$$\begin{aligned}
&\big[ r_{N_1^*+1}(t) - r_k(t) \big] - \big[ q_{N_1^*+1}(t) - q_k(t) \big] \\
&\quad = p \left[ \sum_{j=0}^{N_1^*-1} (1-p)^j\, h((N_1^* - j)x) - \sum_{j=0}^{k-2} (1-p)^j\, h((k-j-1)x) - \big\{ h(N_1^* x) - h((k-1)x) \big\} \right] \\
&\quad = p \left[ \sum_{j=1}^{N_1^*-1} (1-p)^j\, h((N_1^* - j)x) - \sum_{j=1}^{k-2} (1-p)^j\, h((k-j-1)x) \right] \\
&\quad = p \left[ \sum_{j=k-1}^{N_1^*-1} (1-p)^j\, h((N_1^* - j)x) + \sum_{j=1}^{k-2} (1-p)^j \big\{ h((N_1^* - j)x) - h((k-j-1)x) \big\} \right] \\
&\quad \ge 0.
\end{aligned}$$

The last inequality holds since $h(t)$ is strictly increasing in $t \ge 0$, and the equality holds when $p = 0$ and when $p = 1$. Therefore,

$$L_1(x, N_1^*) - L_2(x, N_1^*) = \sum_{k=1}^{N_1^*} \int_0^x \Big\{ \big( r_{N_1^*+1}(t) - r_k(t) \big) - \big( q_{N_1^*+1}(t) - q_k(t) \big) \Big\}\,dt \ge 0.$$

Consequently, the optimal number of PMs, $N_2^*$, of the global PM policy is greater than or equal to the optimal number of PMs, $N_1^*$, of the local PM policy. □
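The threshold characterization in Eqs. 11 and 12 and the ordering used in the proof of Theorem 5 can be checked numerically. The following is a minimal sketch, assuming the hazard $h(t) = \beta t^{\beta-1}$ with $\beta = 3$ and the cost values of Sect. 5; the helper names `r_integral`, `q_integral`, `big_l` and `optimal_n` are hypothetical.

```python
def r_integral(k, x, p, beta=3.0):
    """Integral of r_k(t) over (0, x) for the assumed h(t) = beta * t**(beta - 1)."""
    def h(t):
        return beta * t ** (beta - 1)
    return (k * x) ** beta - ((k - 1) * x) ** beta - p * x * h((k - 1) * x)

def q_integral(k, x, p, beta=3.0):
    """Integral of q_k(t) over (0, x) for the same assumed hazard."""
    def h(t):
        return beta * t ** (beta - 1)
    reduction = sum((1 - p) ** j * h((k - j - 1) * x) for j in range(k - 1))
    return (k * x) ** beta - ((k - 1) * x) ** beta - p * x * reduction

def big_l(integral, x, N, p):
    """L_i(x, N) = N * int r_{N+1} - sum_{k=1}^{N} int r_k (Eq. 11 and its analogue)."""
    return N * integral(N + 1, x, p) - sum(integral(k, x, p) for k in range(1, N + 1))

def optimal_n(integral, x, p, threshold):
    """Smallest N with L_i(x, N) >= threshold, using monotonicity of L_i in N."""
    N = 1
    while big_l(integral, x, N, p) < threshold:
        N += 1
    return N

# Equal PM costs: threshold (C_re - C_pm) / C_mr = (2000 - 100) / 1.
threshold = (2000.0 - 100.0) / 1.0
n1 = optimal_n(r_integral, 0.8, 0.5, threshold)
n2 = optimal_n(q_integral, 0.8, 0.5, threshold)
```

Under these assumptions `n2` is never smaller than `n1`, in line with part 1 of Theorem 5.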

4.2 Comparison of the Optimal PM Periods

In order to compare the optimal PM schedules for a given $N$, we take the derivatives of $C_1(x, N)$ and $C_2(x, N)$ with respect to $x$ and set them equal to 0. Then we have

$$\sum_{k=1}^{N} \left\{ x \left[ \int_0^x \frac{d}{dx} r_k(t)\,dt + r_k(x) \right] - \int_0^x r_k(t)\,dt \right\} = \frac{(N-1)\,C_{pm1} + C_{re}}{C_{mr}}, \qquad (13)$$

and

$$\sum_{k=1}^{N} \left\{ x \left[ \int_0^x \frac{d}{dx} q_k(t)\,dt + q_k(x) \right] - \int_0^x q_k(t)\,dt \right\} = \frac{(N-1)\,C_{pm2} + C_{re}}{C_{mr}}, \qquad (14)$$

where $r_k(t) = h(t + (k-1)x) - p\,h((k-1)x)$ and $q_k(t) = h(t + (k-1)x) - p \sum_{j=0}^{k-2} (1-p)^j\, h((k-j-1)x)$.


Let $g_1(x)$ and $g_2(x)$ be the left-hand sides of Eqs. 13 and 14, respectively. Then, by taking the derivatives of $g_1(x)$ and $g_2(x)$, it is clear that if $h(t)$ is a strictly increasing convex function, then both $g_1(x)$ and $g_2(x)$ are increasing in $x$. Let $x_1^*$ and $x_2^*$ be the optimal PM periods of the local and the global PM policies, respectively. Then $x_1^*$ and $x_2^*$ are the values of $x$ satisfying $g_1(x) = [(N-1)\,C_{pm1} + C_{re}]/C_{mr}$ and $g_2(x) = [(N-1)\,C_{pm2} + C_{re}]/C_{mr}$ for a given integer $N > 0$, respectively. It is also obvious that if the costs for PM action in the local and the global PM policies are the same, $x_2^*$ is longer than $x_1^*$ for a given integer $N > 0$. When the cost for PM action in the global PM policy is higher than that for PM action in the local PM policy, the relationship between $x_1^*$ and $x_2^*$ needs to be further investigated. The results of these investigations are summarized in the following theorem.

Theorem 6 Suppose that $h(t)$ is a strictly increasing and convex function and an integer $N$ is given. Then

(a) if $C_{pm1} = C_{pm2}$, then $x_2^*$ is longer than or equal to $x_1^*$;
(b) if $C_{pm1} < C_{pm2}$, then $x_2^*$ is longer than or equal to $x_1^*$.

Proof Since $g_1(x)$ and $g_2(x)$ are increasing in $x$, and $x_1^*$ and $x_2^*$ are the values of $x$ which satisfy $g_1(x) = [(N-1)\,C_{pm1} + C_{re}]/C_{mr}$ and $g_2(x) = [(N-1)\,C_{pm2} + C_{re}]/C_{mr}$, respectively, it suffices to show that $g_1(x) \ge g_2(x)$ for any $x > 0$. We note that $r_k(t) - q_k(t) = p \sum_{j=1}^{k-2} (1-p)^j\, h((k-j-1)x)$ is independent of $t > 0$ and equal to zero when $p = 0$ or $1$. Hence we have

$$\begin{aligned}
g_1(x) - g_2(x) &= \sum_{k=1}^{N} \left\{ x \left[ \int_0^x \left( \frac{d}{dx} r_k(t) - \frac{d}{dx} q_k(t) \right) dt + r_k(x) - q_k(x) \right] - \int_0^x \big( r_k(t) - q_k(t) \big)\,dt \right\} \\
&= \sum_{k=1}^{N} \left\{ x \int_0^x \left( \frac{d}{dx} r_k(t) - \frac{d}{dx} q_k(t) \right) dt \right\} \ge 0.
\end{aligned}$$

The last inequality holds since $h(t)$ is strictly increasing and then

$$\frac{d}{dx} r_k(t) - \frac{d}{dx} q_k(t) = \frac{d}{dx} \big( r_k(t) - q_k(t) \big) = p \sum_{j=1}^{k-2} (1-p)^j\, \frac{d}{dx} \big[ h((k-j-1)x) \big] \ge 0.$$

It is noted that the equality holds for any $x > 0$ when $p = 0$ or $p = 1$. Therefore, $g_1(x) \ge g_2(x)$ for any $x > 0$. Consequently, the optimal period of the global PM policy, $x_2^*$, is longer than or equal to the optimal period of the local PM policy, $x_1^*$. □
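The ordering of the optimal periods in Theorem 6 can be illustrated by minimizing each cost rate numerically for a fixed $N$. The sketch below uses a ternary search rather than solving Eqs. 13 and 14, which is a substitution made for brevity; it assumes the cost rate is unimodal in $x$ on the search interval, the hazard $h(t) = \beta t^{\beta-1}$ with $\beta = 3$, and the cost values of Sect. 5. All names are hypothetical.

```python
def cost_rate(x, N, p, c_pm, policy, beta=3.0, c_mr=1.0, c_re=2000.0):
    """Eq. 3 (policy='local') or Eq. 4 (policy='global') for h(t) = beta * t**(beta - 1)."""
    def h(t):
        return beta * t ** (beta - 1)
    if policy == "local":
        reduction = sum(h(k * x) for k in range(N))
    else:
        reduction = sum((1 - p) ** j * h((k - j) * x)
                        for k in range(N) for j in range(k))
    repairs = (N * x) ** beta - p * x * reduction
    return (c_mr * repairs + (N - 1) * c_pm + c_re) / (N * x)

def optimal_period(N, p, c_pm, policy, lo=1e-3, hi=100.0, iters=200):
    """Ternary search for the minimizing x; assumes unimodality on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if cost_rate(a, N, p, c_pm, policy) < cost_rate(b, N, p, c_pm, policy):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2

# N = 5 and p = 0.5; the global PM cost is either equal to or five times the local one.
x1 = optimal_period(5, 0.5, 100.0, "local")
x2_equal = optimal_period(5, 0.5, 100.0, "global")
x2_costly = optimal_period(5, 0.5, 500.0, "global")
```

In this example the global period exceeds the local one in both cost settings, as Theorem 6 states.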


Remark 1 It is observed from Theorems 5 and 6 that the global PM policy outperforms the local PM policy in the sense of the optimal number of PM actions and the optimal PM period when the cost for PM action in the global PM policy is equal to that for PM action in the local PM policy. It should be noted that when the cost for PM action in the global PM policy is sufficiently higher than that for PM action in the local PM policy, $N_2^*$ could be smaller than or equal to $N_1^*$, while $x_2^*$ is always longer than or equal to $x_1^*$. That does not mean, however, that $C_1(x, N_1^*) \ge C_2(x, N_2^*)$ for given $x > 0$, nor that $C_1(x_1^*, N) \ge C_2(x_2^*, N)$ for a given integer $N > 0$. A numerical investigation is given in Sect. 5.

5 Quantitative Analysis

In order to perform a quantitative investigation of PM schedules under different PM policies, we consider the Weibull distribution with a scale parameter $\lambda$ and a shape parameter $\beta$. The hazard rate of the Weibull distribution is $h(t) = \beta \lambda^{\beta-1} t^{\beta-1}$ for $\lambda > 0$, $\beta > 0$ and $t \ge 0$. We assume that $\beta > 2$ and $\lambda = 1$. Then the hazard rate $h(t)$ is strictly increasing and convex for $t \ge 0$. The cost structures are assumed to be as follows.

• Cost for replacement $C_{re} = 2000$
• Cost for minimal repair $C_{mr} = 1.0$
• Cost for PM in the local PM policy $C_{pm1} = 100$
• Cost for PM in the global PM policy $C_{pm2} = m\,C_{pm1}$, $m = 1, 2, 3, \ldots$
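For this Weibull hazard with $\lambda = 1$, Eqs. 3 and 4 both take the form $C(x, N) = (C_{mr} A_N x^{\beta} + B_N)/(Nx)$, where $B_N = (N-1)\,C_{pm} + C_{re}$ and $A_N$ collects the terms that multiply $x^{\beta}$. Setting the derivative with respect to $x$ to zero gives $x^* = \{B_N / [C_{mr} (\beta - 1) A_N]\}^{1/\beta}$. This closed form is our own rearrangement under the stated assumptions, not a formula given in the text, and the function name below is hypothetical.

```python
def weibull_optimum(N, p, c_pm, policy, beta=3.0, c_mr=1.0, c_re=2000.0):
    """Return (x*, C(x*, N)) for h(t) = beta * t**(beta - 1), i.e. lambda = 1."""
    if policy == "local":
        # Coefficient of x**(beta - 1) in sum_k h(kx), k = 0..N-1 (Eq. 3).
        s = sum(k ** (beta - 1) for k in range(N))
    else:
        # Same coefficient for the double sum in Eq. 4.
        s = sum((1 - p) ** j * (k - j) ** (beta - 1)
                for k in range(N) for j in range(k))
    a = N ** beta - p * beta * s          # A_N
    b = (N - 1) * c_pm + c_re             # B_N
    x_star = (b / (c_mr * (beta - 1) * a)) ** (1 / beta)
    cost = (c_mr * a * x_star ** beta + b) / (N * x_star)
    return x_star, cost

# Example: N = 3, p = 0.5, equal PM costs (m = 1).
x_local, c_local = weibull_optimum(3, 0.5, 100.0, "local")
x_global, c_global = weibull_optimum(3, 0.5, 100.0, "global")
```

For $N = 1$ the expression reduces to minimizing $x^{\beta - 1} + C_{re}/x$ with $C_{mr} = 1$, which gives $x^* = 10$ and a cost rate of 300 when $\beta = 3$.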

5.1 Comparing Schedules Based on the Local and the Global Policies

Table 1 lists the values of the optimal PM periods and the corresponding expected cost rates of both PM policies under various values of $N$ and $m$ when the shape parameter $\beta$ is equal to 3 and the hazard rate reduction factor $p$ is equal to 0.5, which means that each PM action reduces the hazard rate by half. It is noted from Table 1 that when $N = 1$, the optimal PM periods correspond to the times for replacement since there is no PM action. That is why the two PM policies yield the same optimal PM periods for a single PM action, but they produce different results for two or more PM actions. It is also observed that for a given $N$, $x_1^*$ is shorter than or equal to $x_2^*$ and $C_1(x_1^*, N)$ is larger than or equal to $C_2(x_2^*, N)$ when the PM costs for both policies are equal. That comes from the fact that PM action in the global PM policy is more effective in restoring the system than PM action in the local PM policy. Hence, when the PM costs for both policies are equal, it would be better to take the global PM policy. When the PM cost of the global PM policy is higher than that of


Table 1 Optimal PM periods and the expected cost rates of the local and the global PM policies when $\beta = 3$ and $p = 0.5$

| Number of PM actions ($N$) | Local: $x_1^*$ | Local: $C_1(x_1^*, N)$ | Global, $m = 1$: $x_2^*$ | Global, $m = 1$: $C_2(x_2^*, N)$ | Global, $m = 5$: $x_2^*$ | Global, $m = 5$: $C_2(x_2^*, N)$ | Global, $m = 9$: $x_2^*$ | Global, $m = 9$: $C_2(x_2^*, N)$ |
| 1 | 10.0000 | 300.000 | 10.0000 | 300.000 | 10.0000 | 300.000 | 10.0000 | 300.000 |
| 2 | 5.4462 | 289.194 | 5.4462 | 289.194 | 5.7721 | 324.840 | 6.0648 | 358.625 |
| 3 | 3.8352 | 286.818 | 3.8857 | 283.093 | 4.3089 | 348.119 | 4.6621 | 407.539 |
| 4 | 2.9905 | 288.414 | 3.0927 | 278.880 | 3.5573 | 368.958 | 3.9246 | 449.086 |
| 5 | 2.4662 | 291.946 | 2.6107 | 275.786 | 3.0954 | 387.679 | 3.4527 | 485.166 |
| 6 | 2.1078 | 296.531 | 2.2854 | 273.475 | 2.7801 | 404.669 | 3.1426 | 517.091 |
| 7 | 1.8465 | 301.735 | 2.0502 | 271.754 | 2.5495 | 420.250 | 2.9054 | 545.777 |
| 8 | 1.6473 | 307.319 | 1.8716 | 270.496 | 2.3725 | 434.670 | 2.7213 | 571.877 |
| 9 | 1.4903 | 313.144 | 1.7309 | 269.608 | 2.2315 | 448.121 | 2.5733 | 595.872 |
| 10 | 1.3631 | 319.123 | 1.6170 | 269.024 | 2.1161 | 460.752 | 2.4510 | 618.120 |
| 11 | 1.2580 | 325.200 | 1.5226 | 268.689 | 2.0194 | 472.680 | 2.3478 | 638.896 |
| 12 | 1.1695 | 331.336 | 1.4429 | 268.561 | 1.9370 | 483.997 | 2.2592 | 658.415 |
| 13 | 1.0940 | 337.506 | 1.3746 | 268.607 | 1.8656 | 494.779 | 2.1821 | 676.848 |
| 14 | 1.0288 | 343.691 | 1.3154 | 268.799 | 1.8031 | 505.086 | 2.1141 | 694.330 |
| 15 | 0.9717 | 349.877 | 1.2634 | 269.115 | 1.7477 | 514.969 | 2.0535 | 710.975 |
| 16 | 0.9216 | 356.055 | 1.2174 | 269.536 | 1.6981 | 524.470 | 1.9991 | 726.875 |
| 17 | 0.8770 | 362.218 | 1.1763 | 270.047 | 1.6535 | 533.626 | 1.9499 | 742.107 |
| 18 | 0.8370 | 368.361 | 1.1393 | 270.634 | 1.6130 | 542.468 | 1.9051 | 756.737 |
| 19 | 0.8011 | 374.479 | 1.1058 | 271.287 | 1.5760 | 551.021 | 1.8640 | 770.822 |
| 20 | 0.7686 | 380.571 | 1.0754 | 271.996 | 1.5421 | 559.309 | 1.8262 | 784.408 |

the local PM policy, however, the global PM policy is more costly in the sense of the expected cost rate even though the PM period in the global PM policy is longer than that in the local PM policy. Table 1 also shows that the optimal PM periods for both PM policies get smaller as $N$ increases. When the PM costs for both policies are equal, the expected cost rate decreases as $N$ increases from 1 to 3 for the local PM policy and from 1 to 12 for the global PM policy, and it increases beyond these values. Hence it is not beneficial to choose $N$ larger than 3 for the local PM policy or larger than 12 for the global PM policy. For the global PM policy with very high PM cost, the expected cost rate increases as the number of PM actions increases. In this case, it is beneficial to perform no PM action.

Table 2 shows the values of the optimal number of PM actions and their corresponding expected cost rates of both PM policies for various combinations of $C_{pm2}$ and $p$ and $x = 0.8$. It is interesting to note that as the value of $p$, which represents the effect of PM action, increases, the optimal number of PM actions increases and the expected cost rates decrease. In other words, the better the PM effect is, the greater the optimal number of PM actions before replacement is.


Table 2 Optimal number of PM actions and the expected cost rates of the local and the global PM policies when $\beta = 3$ and $x = 0.8$

| $p$ | Local: $N_1^*$ | Local: $C_1(x, N_1^*)$ | Global, $m = 1$: $N_2^*$ | Global, $m = 1$: $C_2(x, N_2^*)$ | Global, $m = 5$: $N_2^*$ | Global, $m = 5$: $C_2(x, N_2^*)$ | Global, $m = 9$: $N_2^*$ | Global, $m = 9$: $C_2(x, N_2^*)$ |
| 0.1 | 13 | 406.252 | 15 | 384.828 | 13 | 848.789 | 12 | 1308.87 |
| 0.2 | 13 | 396.652 | 18 | 354.199 | 16 | 824.300 | 14 | 1290.65 |
| 0.3 | 14 | 386.387 | 21 | 326.176 | 19 | 801.306 | 16 | 1273.06 |
| 0.4 | 14 | 375.155 | 25 | 302.432 | 22 | 781.296 | 19 | 1257.12 |
| 0.5 | 15 | 362.373 | 29 | 282.814 | 26 | 764.438 | 22 | 1243.38 |
| 0.6 | 16 | 347.998 | 33 | 266.454 | 29 | 750.200 | 25 | 1231.61 |
| 0.7 | 18 | 331.024 | 37 | 252.543 | 32 | 738.042 | 28 | 1221.43 |
| 0.8 | 20 | 310.054 | 41 | 240.469 | 36 | 727.428 | 31 | 1212.50 |
| 0.9 | 24 | 281.270 | 45 | 229.790 | 40 | 718.022 | 34 | 1204.55 |

Table 2 also shows that when the PM costs for both policies are equal, $N_1^*$ is smaller than $N_2^*$. This is quite natural since the PM action in the local PM policy has an effect on the relative wear-out since the last PM action, while the PM action has an effect on the global wear-out in the global PM policy. It is observed that as the PM cost of the global PM policy increases, $N_2^*$ decreases and $C_2(x, N_2^*)$ increases. Hence it is not beneficial to take the global PM policy when the PM cost of the global PM policy is very high.

5.2 Effect of Hazard Rate Reduction Factor p

The value of $p$ in both policies represents the effectiveness of a PM action in such a way that the larger $p$ is, the more effective the PM action is. Table 3 and Figs. 3–5 show the optimal PM periods of the local PM policy and the global PM policy with different $m$ and their corresponding expected cost rates for several values of $p$. Results in Table 3 show that as the value of $p$ increases, the optimal PM periods increase and the expected cost rates decrease for both PM policies. The optimal periods in Figs. 3–5 show a similar form, but a closer inspection shows that the optimal periods increase rapidly with $p$ when $N$ is large, while an increase of $p$ is less effective when $N$ is small. It is noted that the expected cost rate decreases rapidly as $p$ increases when $N$ is large. Figures 6–8 show the optimal number of PM actions and their corresponding expected cost rates for several values of $p$ when the PM period is 0.8. Figures 6–8 show that the optimal number of PMs increases and the expected cost rate decreases as $p$ increases. In the sense of time for replacement, Table 3 and Figs. 3–5 and 6–8 show that the system under both PM policies with larger $p$ operates for a longer time. It should be noted that an increase in $p$ has more impact on the global PM


Table 3 Optimal PM periods and the expected cost rates for various values of $p$ and $N = 5$

| $p$ | Local: $x^*$ | Local: $C(x^*, N)$ | Global, $m = 1$: $x^*$ | Global, $m = 1$: $C(x^*, N)$ | Global, $m = 3$: $x^*$ | Global, $m = 3$: $C(x^*, N)$ | Global, $m = 5$: $x^*$ | Global, $m = 5$: $C(x^*, N)$ |
| 0.1 | 2.1789 | 330.439 | 2.2126 | 325.412 | 2.4353 | 394.209 | 2.6233 | 457.439 |
| 0.3 | 2.3049 | 312.378 | 2.4013 | 299.839 | 2.6430 | 363.23 | 2.8470 | 421.491 |
| 0.5 | 2.4662 | 291.946 | 2.6107 | 275.786 | 2.8735 | 334.091 | 3.0954 | 387.679 |
| 0.7 | 2.6849 | 268.165 | 2.8438 | 253.183 | 3.1299 | 306.71 | 3.3717 | 355.905 |
| 0.9 | 3.0101 | 239.197 | 3.1051 | 231.876 | 3.4176 | 280.898 | 3.6815 | 325.953 |

Fig. 3 Optimal PM period (left) and expected cost rate (right) of the local PM policy for given $p$ and $N$

policy than on the local PM policy in the sense of both the optimal PM period and the optimal number of PM actions.

5.3 Effect of Shape Parameter

The shape parameter $\beta$ in the Weibull distribution determines the wear-out speed in the deterioration process. Tables 4 and 5 show the optimal number of PM actions and the optimal PM period when the shape parameter is 3 and 4. It is observed from Tables 4 and 5 that a system with a higher shape parameter needs to be replaced earlier and is less cost effective. This is quite natural since a system with a higher shape parameter deteriorates more rapidly.


Fig. 4 Optimal PM period (left) and expected cost rate (right) of the global PM policy with $m = 1$ for given $p$ and $N$

Fig. 5 Optimal PM period (left) and expected cost rate (right) of the global PM policy with $m = 5$ for given $p$ and $N$

Fig. 6 Optimal number of PM actions and expected cost rate of the local PM policy for various values of $p$ and $x = 0.8$

Fig. 7 Optimal number of PM actions and expected cost rate of the global PM policy with $m = 1$ for various values of $p$ and $x = 0.8$

Opimal number of PMs

Optimal Schedules of Two Periodic Imperfect PM Policies

50

Optimal number of PMs

0

Effect of PM

[Figure: optimal number of PMs and expected cost rate plotted against the effect of PM p]

Fig. 8 Optimal number of PM actions and expected cost rate of the global PM policy with m = 5 for various values of p and x = 0.8

Table 4 Optimal number of PMs and corresponding expected cost rate for different shape parameters (each cell: number of PMs / expected cost rate)

       b = 3                                         b = 4
p      Local PM policy   Global PM policy (m = 1)    Local PM policy   Global PM policy (m = 1)
0.1    13 / 406.252      15 / 384.828                6 / 623.745       7 / 617.747
0.3    14 / 386.387      21 / 326.176                7 / 601.195       7 / 581.781
0.5    15 / 362.373      29 / 282.814                7 / 575.39        8 / 546.317
0.7    18 / 331.024      37 / 252.543                8 / 543.526       9 / 510.8
0.9    24 / 281.27       45 / 229.79                 9 / 496.716       10 / 475.833

Table 5 Optimal PM period and corresponding expected cost rate for different shape parameters with p = 0.5

     b = 3                                                b = 4
     Local PM policy        Global PM policy (m = 1)      Local PM policy        Global PM policy (m = 1)
N    x1*       C1(x*, N)    x2*       C2(x*, N)           x1*       C1(x*, N)    x2*       C2(x*, N)
1    10        300          10        300                 5.0813    524.797      5.0813    524.797
3    3.8352    286.818      3.8857    283.093             1.8471    529.359      1.8545    527.245
5    2.4662    291.946      2.6107    275.786             1.1713    546.392      1.2012    532.796
7    1.8465    301.735      2.0502    271.754             0.8691    569.824      0.9155    540.958
9    1.4903    313.144      1.7309    269.608             0.6963    595.683      0.7536    550.414

6 Summary and Conclusion

A large number of PM models have been proposed and studied in the literature. Based on these models, PM policies have been developed to maintain a system preventively while it is operating and thus to prolong its lifetime by reducing the hazard rate. In this paper, we consider two periodic PM policies, the local PM policy and the global PM policy. Each PM action in the global PM policy affects the global deterioration of the system, while each PM action in the local PM policy reduces only the hazard rate that has built up since the last PM. Under each policy, the system is preventively maintained at periodic times x, 2x, ..., Nx and is replaced by a new system at the Nth PM action. After the kth PM action, the hazard rate is reduced according to the respective PM model. Both proposed policies are PM policies with an improvement factor.

For given costs of PM actions, minimal repairs, and replacement, we derive formulas for the expected cost rate per unit time over the system life cycle under each policy and determine the optimal PM schedules, which minimize these expected cost rates. We show that the optimal schedules of both PM policies exist and are unique. We also compare the optimal schedules of the two policies analytically and show that the global PM policy is better than the local PM policy in terms of the number of PM actions and the PM period. The optimal schedules of both policies coincide when p = 1 and when p = 0. This is to be expected: when p = 1, every PM action restores the system to a state as good as new, and when p = 0, the system after a PM action returns to its state just prior to the PM action, so in both cases the effects of PM actions are the same under the two policies.

Numerical studies show that the global PM policy outperforms the local PM policy when PM costs are equal. When the PM cost of the local PM policy is less than that of the global PM policy, the local PM policy is more cost effective, while the global PM policy results in a longer replacement cycle. For both policies, as the hazard rate reduction factor p increases, the system can operate over a longer replacement cycle and the expected cost rate decreases. However, achieving high values of p would be more costly. The case in which the PM cost depends on the hazard rate reduction factor remains a problem for further study. More extensive tables and graphs for the numerical examples are available from the authors.

Acknowledgements This work was partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

1. Barlow RE, Hunter LE (1960) Optimum preventive maintenance policies. Oper Res 8:90–100
2. Bartholomew-Biggs M, Zuo MJ, Li XM (2009) Modelling and optimizing sequential imperfect preventive maintenance. Reliab Eng Syst Saf 94:53–62
3. Canfield RV (1986) Cost optimization of periodic preventive maintenance. IEEE Trans Reliab 35:78–81
4. Chan J, Shaw L (1993) Modeling repairable systems with failure rates that depend on age and maintenance. IEEE Trans Reliab 42:566–571
5. Doyen L, Gaudoin O (2004) Classes of imperfect repair models based on reduction of failure intensity or virtual age. Reliab Eng Syst Saf 84:45–56
6. Fontenot RA, Proschan F (1984) Some imperfect maintenance models. In: Abdel-Hameed MS, Cinlar E, Quinn J (eds) Reliability theory and models. Academic Press, San Diego
7. Jayabalan V, Chaudhuri D (1992) Cost optimization of maintenance scheduling for a system with assured reliability. IEEE Trans Reliab 41:21–26
8. Lie CH, Chun YH (1986) An algorithm for preventive maintenance policy. IEEE Trans Reliab 35:71–75
9. Lim JH, Park D (2007) Optimal periodic preventive maintenance schedules with improvement factors depending on number of preventive maintenances. Asia Pac J Oper Res 24:111–124
10. Malik M (1979) Reliable preventive maintenance policy. AIIE Trans 11:221–228
11. Nakagawa T (1986) Periodic and sequential preventive maintenance policies. J Appl Probab 23:536–542
12. Pham H, Wang H (1996) Imperfect maintenance. Eur J Oper Res 94:425–438
13. Wang H (2002) A survey of maintenance policies of deteriorating systems. Eur J Oper Res 139:469–489
14. Zequeira RI, Bérenguer C (2006) Periodic imperfect preventive maintenance with two categories of competing failure modes. Reliab Eng Syst Saf 91:460–468

Part III

Two-Dimensional Warranty

Warranty Servicing with Imperfect Repair for Products Sold with a Two-Dimensional Warranty Bermawi P. Iskandar and Nat Jack

B. P. Iskandar, Department of Industrial Engineering, Bandung Institute of Technology, Jalan Ganesa 10, Bandung 40132, Indonesia
N. Jack, University of Abertay Dundee, Dundee Business School, Dundee DD1 1HG, UK

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_6, Springer-Verlag London Limited 2011

1 Introduction

Manufacturers who sell products with warranties incur the additional costs of servicing any claims made by their customers. These warranty costs depend on factors such as product reliability, warranty terms, usage intensity, operating environment, and servicing logistics. They can vary from 2 to 10% of the product sale price, depending on the product and manufacturer; details can be found in the weekly newsletter ‘Warranty Week’ (http://www.warrantyweek.com/). The annual warranty costs for large companies (such as automobile and computer manufacturers) sometimes run into billions of dollars. Consequently, methods for reducing warranty costs are of great interest to manufacturers. Three possible ways of reducing these costs are to improve product reliability, to use preventive maintenance, and to use an effective warranty servicing strategy [35].

In the case of a repairable product, a manufacturer has the choice of repairing the failed item (component, module, assembly or product) or replacing it by a new one. Murthy and Jack [27] review the different repair–replace strategies that have been proposed for products sold with one-dimensional warranties. If the replacement cost is very high compared to the cost of a repair, then a servicing strategy involving replacement is not economical, and a strategy with imperfect repair (which improves the reliability of the repaired item) in place of replacement is more economical. Yun et al. [35] have studied several servicing strategies involving imperfect repair for items sold with a one-dimensional warranty.

With a two-dimensional warranty, the effect of age and usage on product degradation and failure needs to be modeled. The usage can be the output (copies produced for a photocopier), the distance traveled (kilometers for an automobile), or the number of times or hours the product has been used (takeoffs and landings or the total hours flown for an aircraft). Three different approaches have been proposed to model failures involving age and usage. These are discussed in the next section, where we also review the literature on servicing strategies involving options such as minimal repair, imperfect repair, and replacement for products sold with two-dimensional warranties. We then study a servicing strategy with imperfect repair using a new model formulation which is more appropriate for these types of warranty.

The outline of the paper is as follows. In Sect. 2 we give the details of the model formulation. The strategy with imperfect repair depends on the usage rate and is characterized by three parameters. Section 3 deals with the analysis of the model to optimally select the three parameter values, which depend on the usage rate. Two of the parameter values define a two-dimensional region where imperfect repair is carried out, and the third value defines the degree of imperfect repair that should be used. Section 4 looks at the case where the item has a Weibull failure distribution, illustrated with a numerical example. Finally, we conclude with a brief discussion of topics for future research in Sect. 5.

2 Model Formulation

The following notation is used to model the new servicing strategy with imperfect repair.

Notation

$W, U$                      Parameters of the two-dimensional warranty policy
$x$                         Item age
$u$                         Total item usage
$Y$                         Usage rate (random variable)
$y_0$                       Nominal usage rate used in product design
$F_0(x; \alpha_0)$          Failure distribution function with nominal design usage rate $y_0$ and scale parameter $\alpha_0$
$F(x; \alpha(y))$           Conditional failure distribution function given that the usage rate is $Y = y$
$f(x; \alpha(y))$           Density function associated with $F(x; \alpha(y))$
$\bar{F}(x; \alpha(y))$     Survivor function associated with $F(x; \alpha(y))$
$h(x; \alpha(y))$           Hazard rate associated with $F(x; \alpha(y))$
$H(x; \alpha(y))$           Cumulative hazard function associated with $F(x; \alpha(y))$
$\alpha(y)$                 Scale parameter with usage rate $y$ ($= (y_0/y)^{\gamma} \alpha_0$)
$\gamma$                    Parameter of the accelerated failure time (AFT) model ($\ge 1$)
$K_y, L_y$                  Parameters of the warranty servicing strategy under usage rate $y$
$W_y$                       Age of the item at the expiry of the warranty under usage rate $y$
$\lambda(\cdot)$            Base failure intensity function
$\lambda_y(\cdot)$          Failure intensity function under usage rate $y$
$\delta_y$                  Proportional hazard rate reduction for an imperfect repair under usage rate $y$ ($0 \le \delta_y \le 1$)
$C_m$                       Cost of a minimal repair
$C_0$                       Cost of an imperfect repair that achieves a 100% reduction in the hazard rate
$C_i(\delta_y)$             Cost of an imperfect repair under usage rate $y$
$J(K_y, L_y, \delta_y)$     Expected warranty servicing cost under usage rate $y$

2.1 Approaches to Modeling Failures

Three approaches can be used to model failures for products sold with two-dimensional warranties.

2.1.1 Approach 1

The time to first failure is modeled by a bivariate distribution function $F(x, u)$. If failed items are replaced by new ones and replacement times are negligible, then failures over the warranty region occur according to a two-dimensional renewal process [11]. The modeling of failures with repair instead of replacement is still an active area of research (see [2, 3, 30]). Murthy et al. [29], Kim and Rao [20], Yang and Nachlas [33], Pal and Murthy [32], and Jung and Bai [19] have used this approach for the cost analysis of two-dimensional warranty policies.

2.1.2 Approach 2

The two measurement scales (age and usage) are combined to provide a single composite scale $z$ ($z = ax + bu$ is an illustrative example), and failures are modeled as a counting process on this composite scale. Kordonsky and Gertsbakh [21, 22, 23] discuss this method in a reliability context and Gertsbakh and Kordonsky [10] use it in a warranty context. For a more detailed discussion of the method and various related issues, see Duchesne and Lawless [9].

2.1.3 Approach 3

The usage rate $Y$ is assumed to vary from customer to customer but to be constant for a given customer. The random variable $Y$ has density function $g(u)$, $0 \le u < \infty$. Conditional on $Y = y$, the total usage $u$ at age $x$ is given by

\[ u = yx. \tag{1} \]

For a given usage rate $y$, the conditional hazard (failure rate) function for the time to first failure is $h(x, y) \ge 0$, which is a non-decreasing function of the item age $x$ and of $y$. Failures over time are modeled by a counting process. If failed items are replaced by new ones, then this counting process is a renewal process associated with the conditional distribution $F(x, y)$, which can be derived from $h(x, y)$. If failed items are repaired, then the counting process is characterized by a conditional intensity function $\lambda_y(x)$, which is a non-decreasing function of $x$ and $y$. If all repairs are ‘minimal’ [4] and repair times are negligible, then $\lambda_y(x) = h(x, y)$. Murthy and Wilson [28], Iskandar et al. [14], Moskowitz and Chun [26], and Chun and Tang [7] assume $\lambda_y(x)$ to be a linear function of age and usage to build models for warranty cost analysis. Lawless et al. [25] use a different method, utilizing concepts from the accelerated failure time and proportional hazards models (see [24, 5]) to model the effect of usage rate on reliability. A variation of this method is used in this paper.

2.2 Modeling First Failure

We consider a product consisting of several interconnected components whose reliability is a function of the component reliabilities. During its design, decisions are made about the component reliabilities in order to ensure that the product has the desired reliability at some nominal usage rate $y_0$. When the actual usage rate is different from this nominal value, some of the component reliabilities can be affected and this in turn affects the product reliability. As the usage rate increases above the nominal value, the rate of degradation increases and this, in turn, accelerates the time to failure. Consequently, the product reliability decreases [increases] as the usage rate increases [decreases].

The effect of usage rate on degradation can be modeled using an accelerated failure time (AFT) model (see [31, 5]). With the AFT formulation, if $T_0$ [$T_y$] denotes the time to first failure under usage rate $y_0$ [$y$], then we have

\[ \frac{T_y}{T_0} = \left( \frac{y_0}{y} \right)^{\gamma}. \tag{2} \]

If $F_0(x; \alpha_0)$ is the distribution function for $T_0$, where $\alpha_0$ is the scale parameter, then the distribution function for $T_y$ is the same as that for $T_0$ but with scale parameter given by

\[ \alpha(y) = \left( \frac{y_0}{y} \right)^{\gamma} \alpha_0, \tag{3} \]

with $\gamma \ge 1$. Hence, we have

\[ F(x; \alpha(y)) = F_0\left( (y/y_0)^{\gamma} x; \alpha_0 \right). \tag{4} \]

The hazard and the cumulative hazard functions associated with $F(x; \alpha(y))$ are given by

\[ h(x; \alpha(y)) = f(x; \alpha(y)) / \bar{F}(x; \alpha(y)), \tag{5} \]

and

\[ H(x; \alpha(y)) = \int_0^x h(x'; \alpha(y))\,dx', \tag{6} \]

where $f(x; \alpha(y))$ is the associated density function.
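As a rough numerical check of Eqs. (3) and (4), the following Python sketch computes the usage-dependent scale parameter and compares the distribution evaluated with the rescaled parameter against the baseline distribution evaluated on the accelerated time scale. The Weibull baseline, the function names, and the parameter values are assumptions chosen only for illustration.

```python
import math

def aft_scale(y, y0, alpha0, gamma):
    """Scale parameter alpha(y) = (y0 / y)**gamma * alpha0, as in Eq. (3)."""
    return (y0 / y) ** gamma * alpha0

def weibull_cdf(x, alpha, beta):
    """Assumed baseline: Weibull distribution function with scale alpha and shape beta."""
    return 1.0 - math.exp(-((x / alpha) ** beta))

# Illustrative values (assumptions, not prescribed by this section).
y0, alpha0, beta, gamma = 1.0, 1.0, 2.0, 2.0
y, x = 1.5, 0.7

# Left-hand side of Eq. (4): F(x; alpha(y)).
lhs = weibull_cdf(x, aft_scale(y, y0, alpha0, gamma), beta)
# Right-hand side of Eq. (4): F0((y / y0)**gamma * x; alpha0).
rhs = weibull_cdf((y / y0) ** gamma * x, alpha0, beta)

print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, since both evaluate the same distribution on an equivalent time scale.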

2.3 Modeling Subsequent Failures

Subsequent failures depend on the type of action taken to rectify a failed item. For a repairable product, the subsequent failures depend on the type of repair carried out. If this is a minimal repair, the reliability of the product after repair is the same as that just before failure. An imperfect repair (see [8]) improves the reliability of the product, but it is still inferior to that of a new item.

If the failed product is always minimally repaired and repair times are negligible (relative to the mean time between failures) and so can be ignored, then failures over time occur according to a non-homogeneous Poisson process (NHPP). The failure intensity function has the same form as the hazard rate for time to first failure so, if the product has usage rate $y$, the intensity function is

\[ \lambda_y(x) = h(x; \alpha(y)), \tag{7} \]

where $h(x; \alpha(y))$ is the hazard function given by (5).

The hazard rate for the product lifetime after a minimal repair is the same as that before failure. In contrast, an imperfect repair improves the product's reliability and the hazard rate after a repair is smaller. We model this as follows. For a given usage rate $y$, if a failure occurring at age $x$ is rectified by an imperfect repair, then the hazard rate for the product lifetime after the repair, $h(x^+; \alpha(y))$, is given by

\[ h(x^+; \alpha(y)) = h(x^-; \alpha(y)) - \delta_y \left\{ h(x^-; \alpha(y)) - h(0; \alpha(y)) \right\}, \tag{8} \]

where $h(x^-; \alpha(y))$ is the hazard rate just before failure. Imperfect repair times are also assumed to be negligible and so can be ignored.
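The effect of Eq. (8) can be illustrated with a minimal Python sketch. The linear hazard h(x) = 2x used here (a Weibull hazard with unit scale and shape 2) and the function names are assumptions for illustration only.

```python
def hazard_after_repair(h_before, h_new, delta):
    """Eq. (8): hazard just after an imperfect repair.

    h_before is the hazard just before failure, h_new is the hazard of a
    new item (age 0), and delta in [0, 1] is the proportional reduction.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return h_before - delta * (h_before - h_new)

def hazard(x):
    """Assumed illustrative hazard rate: h(x) = 2x."""
    return 2.0 * x

x_fail = 1.5  # age at failure (assumed value)
for delta in (0.0, 0.5, 1.0):
    print(delta, hazard_after_repair(hazard(x_fail), hazard(0.0), delta))
```

With delta = 0 the repair is minimal and the hazard is unchanged, while delta = 1 brings the hazard back to that of a new item; intermediate values interpolate linearly between the two.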

2.4 Warranty Policy and Coverage

The product is sold with a two-dimensional warranty with warranty region $\Omega$, the rectangle $[0, W) \times [0, U)$, where $W$ is the time limit and $U$ the usage limit. The warranty ceases at the first instance when the age of the item reaches $W$ or its usage reaches $U$, whichever occurs first. If the usage rate $y$ is at most $U/W$, then the warranty expires at age $W$ and an estimate of the total usage is $yW$. When $y$ is greater than $U/W$, the warranty expires at age $U/y$, when the usage limit $U$ is reached. If $W_y$ denotes the warranty expiry time when the usage rate is $y$, then

\[ W_y = \begin{cases} W, & \text{if } y \le U/W, \\ U/y, & \text{if } y > U/W. \end{cases} \tag{9} \]

All failures are rectified by the manufacturer at no cost to the customer until time $W_y$.
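Equation (9) translates directly into a short function. The sketch below is illustrative only; the function name is hypothetical and the limits W = 2 and U = 2 are the values used later in the numerical example.

```python
def warranty_expiry(y, W, U):
    """Eq. (9): age at which the warranty expires for usage rate y.

    Light users (y <= U / W) reach the time limit W first; heavy users
    reach the usage limit U first, at age U / y.
    """
    if y <= 0.0:
        raise ValueError("usage rate must be positive")
    return W if y <= U / W else U / y

W, U = 2.0, 2.0  # time limit (years) and usage limit (10^4 km)
for y in (0.5, 1.0, 1.2, 2.0, 4.0):
    print(y, round(warranty_expiry(y, W, U), 2))
```

For these limits the expiry age stays at 2 years up to y = 1 and then decreases as U / y, which is the pattern of the W_y column in Table 1.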

2.5 Warranty Servicing Strategies

For one-dimensional warranties with time limit $W$, Jack and van der Duyn Schouten [16] conjectured that the optimal servicing strategy is characterized by three distinct intervals, $[0, K)$, $[K, L)$, and $[L, W)$. During the first and last intervals, minimal repairs are carried out and, during the middle interval, either minimal repair or replacement by a new item is used, depending on the age of the item at failure. This conjecture was proved by Jiang et al. [18]. Unfortunately, the optimal strategy is difficult to implement, and Jack and Murthy [15] proposed a near-optimal strategy involving the same three intervals but with only the first failure in the middle interval resulting in a replacement and all other failures being minimally repaired. If the cost of an item replacement is very high compared to the cost of a minimal repair, then replacement in the middle interval is not appropriate. In this case, performing a ‘better than minimal’ repair in this interval can be a better option. Two servicing strategies involving imperfect repair for products sold with a one-dimensional warranty have been studied by Yun et al. [35].

There is limited literature on repair–replacement strategies for products sold with two-dimensional warranties. Iskandar and Murthy [12] and Iskandar et al. [13] study two different servicing strategies using Approach 3 to model failures. In each case, the optimal strategy is characterized by three disjoint sub-regions $\Omega_1$, $\Omega_2$, and $\Omega_3$. Chukova and Johnston [6] consider strategies involving a choice between minimal and complete repair. Yun and Kang [34] extend one of the imperfect repair strategies studied by Yun et al. [35] to the case of a two-dimensional warranty. Jack et al. [17] study a repair–replacement strategy where the middle sub-region $\Omega_2$ (in which the first failure to occur results in replacement) is defined in terms of age and usage, and is not restricted to the shape considered by Iskandar et al. [13] and Yun and Kang [34]. In this paper, we discuss a servicing strategy involving imperfect repair where the middle region is similar to that defined in Jack et al. [17].

2.6 A New Servicing Strategy for 2-D Warranties

For an item with usage rate $y$, the time to first failure has distribution function $F(x; \alpha(y))$ and the warranty expires at age $W_y$ given by (9). We propose an imperfect repair strategy which is similar to that in Yun et al. [35]. This involves three disjoint intervals, $[0, K_y)$, $[K_y, L_y)$, and $[L_y, W_y)$, in which only the first failure in the middle interval is imperfectly repaired and all other failures are minimally repaired. For a given usage rate $y$, the values of the parameters $K_y$, $L_y$, and $\delta_y$ are selected to minimize the expected warranty servicing cost. If $K_y^*$ and $L_y^*$ denote the optimal values of $K_y$ and $L_y$ then, as $y$ varies, the set of points $(K_y^*, L_y^*)$ defines a closed curve as indicated in Fig. 1. Let $\Gamma$ denote the region enclosed by this curve. The new strategy with imperfect repair is defined as follows:

For an item sold with a two-dimensional warranty, perform an imperfect repair at the first failure in the region $\Gamma$ and repair all other failures through minimal repair.

Fig. 1 Parameters of warranty servicing strategy with imperfect repair

The region $\Gamma$ depends on the type of model used for item failures and on the cost of each minimal repair and imperfect repair. As in Yun et al. [35], we assume that $C_i(\delta_y) = C_m + \Delta \delta_y^p$ with $p > 1$ and $\Delta = C_0 - C_m$, so the cost of an imperfect repair lies between $C_m$ (when $\delta_y = 0$) and $C_0$ (when $\delta_y = 1$).
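The assumed cost of an imperfect repair can be sketched in a few lines of Python. The function name is hypothetical, and the values C_m = 1, C_0 = 2, and p = 4 are those used later in the numerical example.

```python
def imperfect_repair_cost(delta, c_m, c_0, p):
    """Cost of an imperfect repair: C_i(delta) = C_m + (C_0 - C_m) * delta**p.

    delta in [0, 1] is the proportional hazard rate reduction and p > 1
    makes larger reductions increasingly expensive.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return c_m + (c_0 - c_m) * delta ** p

for delta in (0.0, 0.5, 1.0):
    print(delta, imperfect_repair_cost(delta, c_m=1.0, c_0=2.0, p=4))
```

Because p > 1, the cost rises slowly for small reductions and steeply as delta approaches 1: with these values a 50% reduction costs only 1.0625, against 2 for a full reduction.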

3 Model Analysis and Optimization

3.1 Expected Warranty Servicing Cost

If the item has usage rate $y$, then the warranty expires at time $W_y$. An expression for $J(K_y, L_y, \delta_y)$, the expected warranty servicing cost under the new servicing strategy with imperfect repair, is obtained as follows.

Minimal repairs are carried out during the interval $[0, K_y)$, so the expected repair cost for this period is

\[ \int_0^{K_y} C_m h(t; \alpha(y))\,dt = C_m H(K_y; \alpha(y)). \tag{10} \]

The expected cost for the remaining interval $[K_y, W_y)$ depends on whether the first failure after $K_y$ occurs in $[K_y, L_y)$ or not. Let $X$ denote the time at which the first failure occurs after $K_y$. Since all failures before $K_y$ are minimally repaired, $X$ has density $f(x; \alpha(y)) / \bar{F}(K_y; \alpha(y))$ for $x \ge K_y$. If $X = x$ lies in the interval $[K_y, L_y)$, then this failure is imperfectly repaired and failures over the interval $[x, W_y)$ occur according to an NHPP with intensity function

\[ \lambda_y(t) = h(t; \alpha(y)) - \delta_y \left( h(x; \alpha(y)) - h(0; \alpha(y)) \right). \tag{11} \]

The expected repair cost for the remainder of the warranty period, conditional on $X = x$, is

\[
\begin{aligned}
& C_i(\delta_y) + C_m \int_x^{W_y} \left[ h(t; \alpha(y)) - \delta_y \left( h(x; \alpha(y)) - h(0; \alpha(y)) \right) \right] dt \\
&\quad = C_i(\delta_y) + C_m \left[ H(W_y; \alpha(y)) - H(x; \alpha(y)) - \delta_y (W_y - x) \left( h(x; \alpha(y)) - h(0; \alpha(y)) \right) \right].
\end{aligned}
\tag{12}
\]

If $X$ lies beyond $L_y$, then this failure and all remaining failures over the remainder of the warranty period are minimally repaired. The conditional expected repair cost for the interval $[L_y, W_y)$ is

\[ \int_{L_y}^{W_y} C_m h(t; \alpha(y))\,dt = C_m \left[ H(W_y; \alpha(y)) - H(L_y; \alpha(y)) \right]. \tag{13} \]

Removing the conditioning in (12) and (13) and adding (10) yields

\[ J(K_y, L_y, \delta_y) = \Psi(K_y, L_y) + \Phi(\delta_y, K_y, L_y), \tag{14} \]

where

\[
\begin{aligned}
\Psi(K_y, L_y) &= C_m \left\{ H(K_y; \alpha(y)) + \left[ H(W_y; \alpha(y)) - H(L_y; \alpha(y)) \right] \frac{\bar{F}(L_y; \alpha(y))}{\bar{F}(K_y; \alpha(y))} \right. \\
&\qquad \left. + \int_{K_y}^{L_y} \left[ H(W_y; \alpha(y)) - H(x; \alpha(y)) \right] \frac{f(x; \alpha(y))}{\bar{F}(K_y; \alpha(y))}\,dx \right\} \\
&= C_m \left\{ H(W_y; \alpha(y)) - 1 + \frac{\bar{F}(L_y; \alpha(y))}{\bar{F}(K_y; \alpha(y))} \right\},
\end{aligned}
\tag{15}
\]

and

\[
\begin{aligned}
\Phi(\delta_y, K_y, L_y) &= \int_{K_y}^{L_y} \left\{ C_i(\delta_y) - \delta_y C_m (W_y - x) \left[ h(x; \alpha(y)) - h(0; \alpha(y)) \right] \right\} \frac{f(x; \alpha(y))}{\bar{F}(K_y; \alpha(y))}\,dx \\
&= C_i(\delta_y) \left[ 1 - \frac{\bar{F}(L_y; \alpha(y))}{\bar{F}(K_y; \alpha(y))} \right] - \delta_y C_m \int_{K_y}^{L_y} (W_y - x) \left[ h(x; \alpha(y)) - h(0; \alpha(y)) \right] \frac{f(x; \alpha(y))}{\bar{F}(K_y; \alpha(y))}\,dx.
\end{aligned}
\tag{16}
\]
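To make Eqs. (14) to (16) concrete, the following Python sketch evaluates the expected cost numerically. It is only a sketch under stated assumptions: a Weibull failure distribution with the parameter values of the numerical example in Sect. 4, a nominal usage rate of 1 (not stated explicitly in the text), a composite Simpson rule in place of whatever quadrature the authors used, and hypothetical function names.

```python
import math

# Assumed model constants (Weibull case; Y0 = 1 is an assumption).
ALPHA0, BETA, GAMMA, Y0 = 1.0, 2.0, 2.0, 1.0
W, U = 2.0, 2.0
CM, C0, P = 1.0, 2.0, 4.0

def alpha(y):
    return (Y0 / y) ** GAMMA * ALPHA0          # Eq. (3)

def h(x, y):
    return BETA * x ** (BETA - 1.0) / alpha(y) ** BETA   # Weibull hazard rate

def H(x, y):
    return (x / alpha(y)) ** BETA              # cumulative hazard

def sf(x, y):
    return math.exp(-H(x, y))                  # survivor function

def simpson(g, a, b, n=400):
    """Composite Simpson rule with an even number n of subintervals."""
    if b <= a:
        return 0.0
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3.0

def expected_cost(K, L, delta, y):
    """J(K, L, delta) = Psi + Phi, using the closed forms in Eqs. (15) and (16)."""
    Wy = W if y <= U / W else U / y            # Eq. (9)
    ratio = sf(L, y) / sf(K, y)
    psi = CM * (H(Wy, y) - 1.0 + ratio)
    ci = CM + (C0 - CM) * delta ** P           # imperfect repair cost
    # Density of the first failure after K is f(x) / sf(K) = h(x) sf(x) / sf(K).
    integrand = lambda x: (Wy - x) * (h(x, y) - h(0.0, y)) * h(x, y) * sf(x, y) / sf(K, y)
    phi = ci * (1.0 - ratio) - delta * CM * simpson(integrand, K, L)
    return psi + phi

print(expected_cost(0.64, 1.88, 0.78, 1.0))
print(expected_cost(0.64, 1.88, 0.0, 1.0))
```

Setting delta = 0 should recover the always-minimal-repair cost C_m H(W_y), which equals 4 for y = 1, and the first call should come out close to the value of about 3.01 reported for y = 1 in Table 1.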

3.2 Optimization

We use a two-stage approach to find the optimal parameter values for the servicing strategy. In stage 1, for fixed $K_y$ and $L_y$, we obtain the optimal $\delta_y^*(K_y, L_y)$ that minimizes $J(K_y, L_y, \delta_y)$. Then, in stage 2, we obtain the optimal $K_y^*$ and $L_y^*$ by minimizing $J(K_y, L_y, \delta_y^*(K_y, L_y))$. The optimal proportional reduction in the hazard rate when an imperfect repair is carried out is then given by $\delta_y^*(K_y^*, L_y^*)$.

Stage 1

For fixed $K_y$ and $L_y$, $\delta_y^*(K_y, L_y)$ is obtained by solving the following optimization problem:

\[ \min_{\delta_y \mid K_y, L_y} \Phi(\delta_y, K_y, L_y) = C_i(\delta_y)\,\Phi_1(K_y, L_y) - \delta_y\,\Phi_2(K_y, L_y), \tag{17} \]

subject to the constraint $0 \le \delta_y \le 1$, where

\[ \Phi_1(K_y, L_y) = 1 - \frac{\bar{F}(L_y; \alpha(y))}{\bar{F}(K_y; \alpha(y))}, \]

and

\[ \Phi_2(K_y, L_y) = C_m \int_{K_y}^{L_y} (W_y - x) \left( h(x; \alpha(y)) - h(0; \alpha(y)) \right) \frac{f(x; \alpha(y))}{\bar{F}(K_y; \alpha(y))}\,dx. \]

Differentiating (17) partially with respect to $\delta_y$ yields

\[ \frac{\partial \Phi}{\partial \delta_y} = p (C_0 - C_m)\,\delta_y^{p-1}\,\Phi_1(K_y, L_y) - \Phi_2(K_y, L_y). \tag{18a} \]

Now,

\[ \left. \frac{\partial \Phi}{\partial \delta_y} \right|_{\delta_y = 0} = -\Phi_2(K_y, L_y) < 0 \quad \text{and} \quad \left. \frac{\partial \Phi}{\partial \delta_y} \right|_{\delta_y = 1} = p (C_0 - C_m)\,\Phi_1(K_y, L_y) - \Phi_2(K_y, L_y). \]

If $\Phi_2(K_y, L_y) < p (C_0 - C_m)\,\Phi_1(K_y, L_y)$, then

\[ 0 < \delta_y^*(K_y, L_y) = \left[ \frac{\Phi_2(K_y, L_y)}{p (C_0 - C_m)\,\Phi_1(K_y, L_y)} \right]^{\frac{1}{p-1}} < 1. \tag{18b} \]

If $\Phi_2(K_y, L_y) \ge p (C_0 - C_m)\,\Phi_1(K_y, L_y)$, then

\[ \delta_y^*(K_y, L_y) = 1. \tag{18c} \]

Stage 2

$K_y^*$ and $L_y^*$ are found by solving the following optimization problem:

\[ \min_{K_y, L_y} J(K_y, L_y, \delta_y^*(K_y, L_y)) = \Psi(K_y, L_y) + \Phi(\delta_y^*(K_y, L_y), K_y, L_y), \tag{19} \]

subject to the constraint $0 \le K_y \le L_y \le W_y$. In each stage, we need to use a computational approach to obtain the optimal parameter values.
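The two-stage procedure can be sketched as follows. This is not the authors' implementation: stage 1 uses the closed form of Eqs. (18b) and (18c), while stage 2 is replaced by a coarse grid search over K and L, chosen here only for simplicity. The Weibull model, the nominal usage rate of 1, the Simpson rule, and all function names are assumptions.

```python
import math

# Assumed constants (values of the numerical example; Y0 = 1 is an assumption).
ALPHA0, BETA, GAMMA, Y0 = 1.0, 2.0, 2.0, 1.0
W, U = 2.0, 2.0
CM, C0, P = 1.0, 2.0, 4.0

def alpha(y):
    return (Y0 / y) ** GAMMA * ALPHA0

def h(x, y):
    return BETA * x ** (BETA - 1.0) / alpha(y) ** BETA

def H(x, y):
    return (x / alpha(y)) ** BETA

def sf(x, y):
    return math.exp(-H(x, y))

def simpson(g, a, b, n=100):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3.0

def optimal_delta(phi1, phi2):
    """Stage 1: closed-form minimiser of Phi over delta in [0, 1]."""
    bound = P * (C0 - CM) * phi1
    if phi2 >= bound:
        return 1.0                                   # Eq. (18c)
    return (phi2 / bound) ** (1.0 / (P - 1.0))       # Eq. (18b)

def optimise(y, steps=40):
    """Stage 2 by grid search over 0 <= K < L <= W_y (a simplification)."""
    Wy = W if y <= U / W else U / y
    grid = [Wy * i / steps for i in range(steps + 1)]
    best = (CM * H(Wy, y), 0.0, 0.0, 0.0)  # always-minimal-repair baseline
    for i, K in enumerate(grid):
        for L in grid[i + 1:]:
            ratio = sf(L, y) / sf(K, y)
            phi1 = 1.0 - ratio
            phi2 = CM * simpson(
                lambda x: (Wy - x) * (h(x, y) - h(0.0, y)) * h(x, y) * sf(x, y) / sf(K, y),
                K, L)
            d = optimal_delta(phi1, phi2)
            cost = CM * (H(Wy, y) - 1.0 + ratio) + (CM + (C0 - CM) * d ** P) * phi1 - d * phi2
            if cost < best[0]:
                best = (cost, K, L, d)
    return best

cost, K_opt, L_opt, d_opt = optimise(1.0)
print(cost, K_opt, L_opt, d_opt)
```

A finer grid or a proper constrained optimiser would sharpen the result; the grid search is used only because it makes the constraint 0 ≤ K ≤ L ≤ W_y trivial to enforce.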

4 Special Case: Weibull Failure Time Distribution

The failure distribution function with nominal design usage rate $y_0$ and scale parameter $\alpha_0$ is

\[ F_0(x; \alpha_0) = 1 - \exp\left\{ -(x/\alpha_0)^{\beta} \right\}, \tag{20} \]

where $\beta$ is the shape parameter. The conditional failure distribution function, given the usage rate $y$, is

\[ F(x; \alpha(y)) = 1 - \exp\left\{ -(x/\alpha(y))^{\beta} \right\}, \tag{21} \]

with $\alpha(y)$ given by (3). The hazard function associated with $F(x; \alpha(y))$ is given by

\[ h(x; \alpha(y)) = \beta \left( \frac{y}{y_0} \right)^{\gamma \beta} \frac{x^{\beta - 1}}{\alpha_0^{\beta}}. \tag{22} \]

4.1 Numerical Example

Let the parameter values be as follows:

Warranty policy: $W = 2$ (years) and $U = 2$ ($10^4$ km), so $U/W = 1$ ($10^4$ km per year)
Design reliability: $\alpha_0 = 1$ (year) and $\beta = 2$
AFT model: $\gamma = 2$

The expected warranty servicing cost when the item is always minimally repaired on failure is $J_y^m = J(K_y, L_y, 0) = C_m H(W_y; \alpha(y))$. Table 1 shows the values of $W_y$, $K_y^*$, $L_y^*$, $\delta_y^*$, $J(K_y^*, L_y^*, \delta_y^*)$ and $J_y^m$ as $y$ varies from 0.2 to 4 when $C_m = 1$, $C_0 = 2$, and $p = 4$. The final column of the table gives the percentage reduction in expected servicing costs when the optimal imperfect repair strategy is used instead of always minimal repair.

Table 1 $W_y$, $K_y^*$, $L_y^*$, $\delta_y^*$, $J(K_y^*, L_y^*, \delta_y^*)$ and $J_y^m$ as $y$ varies when $C_m = 1$, $C_0 = 2$

y     W_y    K_y*   L_y*   δ_y*   J(K_y*, L_y*, δ_y*)   J_y^m     % Reduction
0.2   2.00   0.03   1.87   0.09   0.0064                0.0064    0.00
0.4   2.00   0.15   1.87   0.23   0.1019                0.1024    0.40
0.5   2.00   0.19   1.87   0.31   0.2461                0.2500    1.56
0.6   2.00   0.24   1.87   0.39   0.4987                0.5184    3.80
0.7   2.00   0.33   1.87   0.48   0.8882                0.9604    7.52
0.8   2.00   0.43   1.87   0.58   1.4315                1.6384    12.63
0.9   2.00   0.54   1.87   0.68   2.1370                2.6244    18.57
1.0   2.00   0.64   1.88   0.78   3.0134                4.0000    24.67
1.2   1.67   0.60   1.56   0.88   4.0207                5.7600    30.20
1.4   1.43   0.56   1.34   0.98   5.0999                7.8400    34.95
1.6   1.25   0.51   1.19   1.00   6.2790                10.2400   38.68
1.8   1.11   0.48   1.07   1.00   7.6040                12.9600   41.33
2.0   1.00   0.44   0.97   1.00   9.1038                16.0000   43.10
2.5   0.80   0.37   0.80   1.00   13.5703               25.0000   45.72
3.0   0.67   0.32   0.67   1.00   19.0505               36.0000   47.08
3.5   0.57   0.27   0.57   1.00   25.5380               49.0000   47.88
4.0   0.50   0.24   0.50   1.00   33.0295               64.0000   48.39

From the table, the following observations can be made:

• The largest values of $K_y^*$ and $L_y^*$ are obtained when the item operates at the design usage rate ($y = 1$).
• As $y$ decreases below the design usage rate, $K_y^* \to 0$, $L_y^* \to W_y$, $\delta_y^* \to 0$ and so always minimally repairing the item on failure becomes optimal.
• As $y$ increases above the design usage rate, $K_y^* \to 0$, $L_y^* \to W_y$, $\delta_y^* \to 1$ and so it becomes optimal to carry out an imperfect repair at the first failure under warranty, with this repair producing a 100% reduction in the hazard rate for product lifetime.
• The benefits of using the optimal imperfect repair strategy over minimal repair decrease as $y$ decreases below the design usage rate and increase as $y$ increases above the design usage rate.

Figure 2 shows the plot of the imperfect repair region $\Gamma$ using the values of $K_y^*$ and $L_y^*$ given in Table 1.

Fig. 2 Region $\Gamma$ for the values given in Table 1
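The always-minimal-repair column of Table 1 and the percentage reduction can be reproduced with a short calculation. The Python sketch below assumes a nominal usage rate of 1, which the text does not state explicitly but which is consistent with the tabulated values; the function names are hypothetical.

```python
# Assumed constants from the numerical example (Y0 = 1 is an assumption).
ALPHA0, BETA, GAMMA, Y0 = 1.0, 2.0, 2.0, 1.0
W, U, CM = 2.0, 2.0, 1.0

def minimal_repair_cost(y):
    """J_y^m = C_m * H(W_y; alpha(y)) for the Weibull case of Eq. (22)."""
    Wy = W if y <= U / W else U / y
    return CM * (y / Y0) ** (GAMMA * BETA) * (Wy / ALPHA0) ** BETA

def percent_reduction(j_opt, j_min):
    """Percentage saving of the optimal strategy over always minimal repair."""
    return 100.0 * (j_min - j_opt) / j_min

for y in (0.5, 1.0, 2.0, 4.0):
    print(y, round(minimal_repair_cost(y), 4))

# Saving at the design usage rate, using the optimal cost tabulated for y = 1.
print(percent_reduction(3.0134, minimal_repair_cost(1.0)))
```

For y > 1 the warranty expires at U / y, so the cost under minimal repair grows as y^2 rather than y^4 with these parameter values, which is why the column rises from 4 at y = 1 to 64 at y = 4.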

5 Conclusion

In this paper we have studied a new servicing strategy with imperfect repair which produces significant benefits over a repair–replace strategy, especially when the cost of replacement is high. The selection of the failure model and the estimation of the model parameters require field data. This topic is currently under investigation and is discussed in Lawless et al. [25] and Baik and Murthy [1]. Baik and Murthy [1] discuss estimation (of the parameters of the failure distribution function and $\gamma$) using field data. This topic, as well as the model selection (the form of $F_0(x; \alpha_0)$), needs further study.

References

1. Baik J, Murthy DNP (2008) Reliability assessment based on two-dimensional warranty data and accelerated failure time model. Int J Reliability Safety 2:190–208
2. Baik J, Murthy DNP, Jack N (2004) Two-dimensional failure modelling with minimal repair. Naval Res Logistics 51:345–362
3. Baik J, Murthy DNP, Jack N (2006) Erratum: Two-dimensional failure modelling with minimal repair. Naval Res Logistics 53:115–116
4. Barlow RE, Hunter L (1960) Optimal preventive maintenance policies. Oper Res 8:90–100
5. Blischke WR, Murthy DNP (2000) Reliability: modeling, prediction, and optimisation. Wiley, New York
6. Chukova S, Johnston MR (2006) Two-dimensional warranty repair strategy based on minimal and complete repairs. Math Comput Modelling 44:1133–1143
7. Chun YH, Tang K (1999) Cost analysis of two-attribute policies based on the product usage rate. IEEE Trans Eng Manag 46:201–209
8. Doyen L, Gaudoin O (2004) Classes of imperfect repair models based on reduction of failure intensity or virtual age. Reliability Eng Syst Safety 84:45–56
9. Duchesne T, Lawless JF (2000) Alternative time scales and failure time models. Lifetime Data Anal 6:157–179
10. Gertsbakh IB, Kordonsky KB (1998) Parallel time scales and two-dimensional manufacturer and individual customer warranties. IIE Trans 30:1181–1189
11. Hunter JJ (1974) Renewal theory in two dimensions: basic results. Adv Appl Probab 6:376–391
12. Iskandar BP, Murthy DNP (2003) Repair–replace strategies for two-dimensional warranty policies. Math Comput Modelling 38:1233–1241
13. Iskandar BP, Murthy DNP, Jack N (2005) A new repair–replace strategy for items sold with a two-dimensional warranty. Comput Oper Res 32:669–682
14. Iskandar BP, Wilson RJ, Murthy DNP (1994) Two-dimensional combination warranty policies. RAIRO Oper Res 28:57–75
15. Jack N, Murthy DNP (2001) A servicing strategy for items sold under warranty. J Oper Res Soc 52:1284–1288
16. Jack N, Van der Duyn Schouten F (2000) Optimal repair–replace strategies for a warranted product. Int J Prod Econ 67:95–100
17. Jack N, Iskandar BP, Murthy DNP (2009) A new repair–replace strategy based on usage rate for items sold with a two-dimensional warranty. Reliability Eng Syst Safety 94:611–617
18. Jiang X, Jardine AKS, Lugitigheid D (2006) On a conjecture of optimal repair–replacement strategies for warranted products. Math Comput Modelling 44:963–972
19. Jung M, Bai DS (2007) Analysis of field data under two-dimensional warranty. Reliability Eng Syst Safety 92:135–143
20. Kim HG, Rao BM (2000) Expected warranty cost of two-attribute free-replacement warranties based on a bivariate exponential distribution. Comput Ind Eng 38:425–434
21. Kordonsky KB, Gertsbakh I (1993) Choice of the best time scale for system reliability analysis. Eur J Oper Res 65:235–246
22. Kordonsky KB, Gertsbakh I (1995) System state monitoring and lifetime scales—I. Reliability Eng Syst Safety 47:1–14
23. Kordonsky KB, Gertsbakh I (1995) System state monitoring and lifetime scales—II. Reliability Eng Syst Safety 49:145–154
24. Lawless J (1982) Statistical models and methods for lifetime data. Wiley, New York
25. Lawless J, Hu J, Cao J (1995) Methods for estimation of failure distributions and rates from automobile warranty data. Lifetime Data Anal 1:227–240
26. Moskowitz H, Chun YH (1994) A Poisson regression model for two-attribute warranty policy. Naval Res Logistics 41:355–376
27. Murthy DNP, Jack N (2007) Warranty servicing. In: Ruggeri F, Faltin F, Kenett R (eds) Encyclopedia of statistics in quality and reliability. Wiley, Chichester
28. Murthy DNP, Wilson RJ (1991) Modelling two-dimensional warranties. In: Proceedings of the fifth international symposium on applied stochastic models and data analysis, Granada, Spain, pp 481–492
29. Murthy DNP, Iskandar BP, Wilson RJ (1995) Two-dimensional failure free warranties: two-dimensional point process models. Oper Res 43:356–366
30. Murthy DNP, Baik J, Wilson RJ, Bulmer M (2006) Two-dimensional failure modelling. Springer handbook of engineering statistics, pp 97–112
31. Nelson W (1982) Applied life data analysis. Wiley, New York
32. Pal S, Murthy GSR (2003) An application of Gumbel’s bivariate exponential distribution in estimation of warranty cost of motor cycles. Int J Qual Reliability Manag 20:488–502
33. Yang SC, Nachlas JA (2001) Bivariate reliability and availability modelling. IEEE Trans Reliability 50:26–35
34. Yun W-Y, Kang KM (2007) Imperfect repair policies under two-dimensional warranty. J Risk Reliability 221:239–247
35. Yun W-Y, Murthy DNP, Jack N (2008) Warranty servicing with imperfect repair. Int J Prod Econ 111:159–169

Part IV

Burn-in

A Survey of Burn-in and Maintenance Models for Repairable Systems Ji Hwan Cha

1 Introduction

Burn-in is a method used to eliminate initial failures in field use. To burn in a component or system means to subject it to a period of simulated use before it is actually put into service. Because the failure rate is high in the early stages of component life, burn-in has been widely accepted as a method of screening out failures before systems are used in field operation. An introduction to this important area of reliability can be found in Jensen and Petersen [13] and Kuo and Kuo [14].

If burn-in is applied for too long, then items of good quality (here, 'quality' refers only to the length of the lifetime) will also be eliminated or, even if they are not eliminated, their normal lifetimes will be shortened. On the other hand, if burn-in is too short, then items of poor quality will remain in the population, which results in frequent failures in the early stages of component life. In addition, burn-in is usually considered costly. Thus, one of the major problems in the study of burn-in is to decide how long the procedure should last. The best time to stop the burn-in process for a given criterion is called the optimal burn-in time. In the literature, certain cost structures have been proposed, and the corresponding problem of finding the optimal burn-in time has been considered. Other performance-based criteria, for example the mean residual life, the reliability for a given mission time, or the mean number of failures, have also been considered to determine the optimal burn-in time. An excellent survey of studies on burn-in, with plenty of references, can be found in Block and Savits [3].

J. H. Cha (&) Department of Statistics, Ewha Womans University, Seoul 120-750, Korea e-mail: [email protected]

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_7, © Springer-Verlag London Limited 2011


Since the survey provided by Block and Savits [3], there has been much research on burn-in procedures, especially for repairable systems. The main characteristics of recent research include the following: (i) various reliability models which jointly deal with burn-in and maintenance policies have been investigated; (ii) burn-in procedures for a general failure model have been studied; (iii) a stochastic model for the accelerated burn-in procedure has been developed. In this article, recent developments on burn-in procedures are surveyed, focusing mainly on burn-in for repairable systems which incorporates minimal repair during the process. Furthermore, current issues and some topics still to be developed in the area of burn-in are also discussed.

2 Burn-in and Maintenance Policies: Initial Models

In this section, reliability models which jointly deal with burn-in and maintenance policies are surveyed. All the models in this section consider burn-in and maintenance at the same time and, in each model, the properties of the joint optimal solution for burn-in and replacement times are obtained. Since the optimal replacement policy should depend on the distribution of the lifetime of the system used in field operation, it is natural to take both burn-in and maintenance into consideration at the same time. Mi [16] first considered the joint optimization problem for determining optimal burn-in and replacement times. In this section, we survey some initial models, starting from those in [16].

Let $F(t)$ be the distribution function of the lifetime $X$. It is assumed that $X$ is an absolutely continuous random variable, and let $f(t)$ and $r(t) = f(t)/\bar F(t)$, where $\bar F(t) = 1 - F(t)$, be its density and failure rate functions, respectively. Mi [16] studied the optimal burn-in and maintenance policy under the assumption that $F(t)$ has a bathtub-shaped failure rate function, which is defined as follows.

Definition 1 A failure rate function $r(t)$ is said to have a bathtub shape if there exist $0 \le t_1 \le t_2 < \infty$ such that

$$ r(t) \begin{cases} \text{strictly decreases}, & \text{if } 0 \le t \le t_1, \\ \text{is a constant, say } \lambda_0, & \text{if } t_1 \le t \le t_2, \\ \text{strictly increases}, & \text{if } t_2 \le t, \end{cases} $$

where $t_1$ and $t_2$ are called the change points of $r(t)$.

Mi [16] then considered the following burn-in procedure.

Burn-in Procedure A [16] Consider a fixed burn-in time $b$ and begin to burn in a new device. If the device fails before burn-in time $b$, then repair it completely with shop repair cost $c_s > 0$, then burn in the repaired device again, and so on. If the device survives the burn-in time $b$, then it is put into field operation.


Here, it is assumed that the repair is complete, i.e., the repaired device is as good as new. The cost of burn-in is assumed to be proportional to the total burn-in time, with proportionality constant $c_0 > 0$. Let $h(b)$ denote the total cost incurred in obtaining the first device that survives the burn-in procedure. Then the mean cost $E[h(b)]$ can be obtained as follows. Conditioning on whether or not the first device survives the burn-in time $b$, we have

$$
\begin{aligned}
E[h(b)] &= \big(c_0 E[X \mid X < b] + c_s + E[h(b)]\big) F(b) + (c_0 b)\,\bar F(b) \\
&= \big(c_0 E[\min\{X, b\} \mid X < b] + c_s + E[h(b)]\big) F(b) + \big(c_0 E[\min\{X, b\} \mid X \ge b]\big)\,\bar F(b) \\
&= c_0 E[\min\{X, b\}] + c_s F(b) + E[h(b)]\, F(b) \\
&= c_0 \int_0^b \bar F(t)\,dt + c_s F(b) + E[h(b)]\, F(b).
\end{aligned}
$$

From this,

$$ E[h(b)] = c_0\, \frac{\int_0^b \bar F(t)\,dt}{\bar F(b)} + c_s\, \frac{F(b)}{\bar F(b)}. $$
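As a numerical illustration of the formula for $E[h(b)]$, the following Python sketch evaluates the closed-form expression by quadrature and compares it with a direct simulation of Burn-in Procedure A. The exponential lifetime, the cost values, and all function names are assumptions made for this example only; they are not taken from Mi [16].

```python
import math
import random


def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def expected_burn_in_cost(survival, b, c0, cs):
    """E[h(b)] = (c0 * int_0^b Fbar(t) dt + cs * F(b)) / Fbar(b)."""
    sb = survival(b)
    return (c0 * simpson(survival, 0.0, b) + cs * (1.0 - sb)) / sb


def simulate_burn_in_cost(sample_lifetime, b, c0, cs, runs=20000, seed=1):
    """Monte Carlo estimate: repeat burn-in until one device survives time b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        while True:
            x = sample_lifetime(rng)
            if x >= b:            # device survives burn-in
                total += c0 * b
                break
            total += c0 * x + cs  # failed at age x: burn-in time plus shop repair
    return total / runs


if __name__ == "__main__":
    rate = 0.5  # assumed exponential failure rate (illustration only)
    exact = expected_burn_in_cost(lambda t: math.exp(-rate * t), 1.0, 1.0, 2.0)
    approx = simulate_burn_in_cost(lambda rng: rng.expovariate(rate), 1.0, 1.0, 2.0)
    print(exact, approx)
```

The simulation follows the verbal description of Procedure A step by step, so agreement between the two numbers is a check on the conditioning argument rather than on any particular lifetime distribution.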

2.1 Model 1

In field operation, Mi [16] considered two types of replacement policies, depending on whether the device is repairable or not. For a non-repairable device, the age replacement policy is considered. That is, the device is replaced by a new burned-in device at the time of its failure or at 'field-use age' $T$, whichever occurs first. Let $c_f$ denote the cost incurred for each failure in field operation and $c_a$, satisfying $0 < c_a < c_f$, the cost incurred for each non-failed item which is replaced by a new burned-in item at its field-use age $T$. Then, by the theory of renewal reward processes, the long-run average cost rate $c(b, T)$ is given by

$$ c(b, T) = \frac{k(b) + c_f F_b(T) + c_a \bar F_b(T)}{\int_0^T \bar F_b(t)\,dt}, $$

where $\bar F_b(t)$ is the conditional survival function, i.e., $\bar F_b(t) \equiv \bar F(b + t)/\bar F(b)$, $F_b(t) \equiv 1 - \bar F_b(t)$, and $k(b) \equiv E[h(b)]$. In Theorem 1 of Mi [16], the results regarding the optimal burn-in time $b^*$ and the optimal age $T^*$ which satisfy

$$ c(b^*, T^*) = \min_{b \ge 0,\; T > 0} c(b, T) $$


are given. However, there are several 'hidden' properties which can be found in the proof of the theorem, and thus the properties are reformulated as follows.

Theorem 1 [16] Suppose that the failure rate function $r(t)$ is bathtub-shaped and differentiable. Let

$$ B_1 \equiv \left\{ b \ge 0 : \mu(b)\, r(\infty) > \frac{c_f + k(b)}{c_f - c_a} \right\}, $$

where $\mu(b) \equiv \int_0^\infty \bar F_b(t)\,dt$, and $B_2 \equiv [0, \infty) \setminus B_1$. Then the properties of the optimal burn-in time $b^*$ and the replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$ r(b + T) \int_0^T \frac{\bar F(b + t)}{\bar F(b)}\,dt + \frac{\bar F(b + T)}{\bar F(b)} = \frac{c_f + k(b)}{c_f - c_a}. \tag{1} $$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le t_1$ is the value which satisfies $b^* + T^*(b^*) = \min_{0 \le b \le t_1} (b + T^*(b))$.

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. The optimal $(b^*, T^*) = (b^*, \infty)$, where $0 \le b^* \le t_1$ is the value which satisfies

$$ \frac{c_f + k(b^*)}{\mu(b^*)} = \min_{0 \le b \le t_1} \frac{c_f + k(b)}{\mu(b)}. $$

Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. For $b \in B_1$, let $T^*(b)$ be the unique solution of Eq. (1). Furthermore, let $b_1^* \in [0, t_1] \cap B_1$ satisfy

$$ b_1^* + T^*(b_1^*) = \min_{b \le t_1,\; b \in B_1} (b + T^*(b)), $$

and $b_2^* \in [0, t_1] \cap B_2$ satisfy

$$ \frac{c_f + k(b_2^*)}{\mu(b_2^*)} = \min_{b \le t_1,\; b \in B_2} \frac{c_f + k(b)}{\mu(b)}. $$

If

$$ (c_f - c_a)\, r(b_1^* + T^*(b_1^*)) \le \frac{c_f + k(b_2^*)}{\mu(b_2^*)}, $$

then the optimal $(b^*, T^*) = (b_1^*, T^*(b_1^*))$. Otherwise the optimal $(b^*, T^*)$ is $(b_2^*, \infty)$. □
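To make Case 1 of Theorem 1 concrete, the sketch below solves Eq. (1) for $T^*(b)$ by bisection and evaluates the Model 1 cost rate there. Setting the derivative of $c(b, T)$ with respect to $T$ to zero shows that the cost rate at $T^*(b)$ equals $(c_f - c_a)\, r(b + T^*(b))$, which is why minimising $b + T^*(b)$ on the increasing branch of $r$ minimises the cost. The hazard $r(t) = A e^{-t} + C + D t$ (a bathtub shape with a single change point $t_1 = t_2 = \ln(A/D)$), the cost values, and every function name are assumptions chosen for illustration.

```python
import math

A, C, D = 1.0, 0.2, 0.1      # assumed hazard r(t) = A*exp(-t) + C + D*t
T1 = math.log(A / D)         # its single change point (t1 = t2)


def r(t):
    return A * math.exp(-t) + C + D * t


def cum_hazard(t):
    """Lambda(t) = int_0^t r(u) du in closed form."""
    return A * (1.0 - math.exp(-t)) + C * t + 0.5 * D * t * t


def survival(t):
    return math.exp(-cum_hazard(t))


def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def burn_in_cost(b, c0, cs):
    """k(b) = E[h(b)] for Burn-in Procedure A."""
    sb = survival(b)
    return (c0 * simpson(survival, 0.0, b) + cs * (1.0 - sb)) / sb


def mean_life_up_to(b, T):
    """int_0^T Fbar_b(t) dt, with Fbar_b(t) = Fbar(b + t) / Fbar(b)."""
    return simpson(survival, b, b + T) / survival(b)


def cost_rate(b, T, c0, cs, cf, ca):
    """Model 1 cost rate: (k(b) + cf*F_b(T) + ca*Fbar_b(T)) / int_0^T Fbar_b."""
    fb = survival(b + T) / survival(b)
    return (burn_in_cost(b, c0, cs) + cf * (1.0 - fb) + ca * fb) / mean_life_up_to(b, T)


def optimal_age(b, c0, cs, cf, ca):
    """Solve Eq. (1) for T by bisection; assumes 0 <= b <= T1."""
    rhs = (cf + burn_in_cost(b, c0, cs)) / (cf - ca)

    def g(T):
        return r(b + T) * mean_life_up_to(b, T) + survival(b + T) / survival(b) - rhs

    lo = max(T1 - b, 1e-9)   # g < 0 here, and g increases once b + T > T1
    hi = lo + 1.0
    while g(hi) < 0.0:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    b = 0.5
    T = optimal_age(b, 0.1, 0.5, 5.0, 1.0)
    print(T, cost_rate(b, T, 0.1, 0.5, 5.0, 1.0), (5.0 - 1.0) * r(b + T))
```

Because the assumed hazard is unbounded, $B_1 = [0, \infty)$ and only Case 1 arises; a hazard with finite $r(\infty)$ would be needed to exercise Cases 2 and 3.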


2.2 Model 2

For a repairable device, applying the same burn-in procedure as before, block replacement with minimal repair at failure is performed during field operation. More precisely, fix a $T > 0$ and replace the component at times $T, 2T, 3T, \ldots$ with a new burned-in component. Also, at each intervening failure, a minimal repair is performed. Assume $c_m > 0$ is the cost of a minimal repair, and $c_r > 0$ is the cost of replacement. In this case, the long-run average cost rate is given by

$$ c(b, T) = \frac{1}{T} \left( k(b) + c_m \int_b^{b+T} r(t)\,dt + c_r \right). \tag{2} $$

The following theorem, which is restated from Mi [16] on the basis of the proof of the corresponding theorem, provides the properties of the optimal $(b^*, T^*)$ minimizing $c(b, T)$.

Theorem 2 [16] Suppose that the failure rate function $r(t)$ is bathtub-shaped and differentiable. Let

$$ B_1 \equiv \left\{ b \ge 0 : \int_b^\infty [r(\infty) - r(t)]\,dt > \frac{1}{c_m \bar F(b)} \left[ (c_r - c_s)\bar F(b) + c_s + c_0 \int_0^b \bar F(t)\,dt \right] \right\}, $$

and $B_2 \equiv [0, \infty) \setminus B_1$. Then the properties of the optimal burn-in time $b^*$ and the replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$ T\, r(b + T) - \int_b^{b+T} r(t)\,dt = \frac{1}{c_m \bar F(b)} \left[ (c_r - c_s)\bar F(b) + c_s + c_0 \int_0^b \bar F(t)\,dt \right]. \tag{3} $$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le t_1$ is the value which satisfies $b^* + T^*(b^*) = \min_{0 \le b \le t_1} (b + T^*(b))$.

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. The optimal $(b^*, T^*) = (b^*, \infty)$, where $b^*$ can be any value in $[0, \infty)$.

Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. For $b \in B_1$, let $T^*(b)$ be the unique solution of Eq. (3). Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $b^*$ is the value which satisfies

$$ b^* + T^*(b^*) = \min_{b \le t_1,\; b \in B_1} (b + T^*(b)). $$

□
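The structure of Theorem 2 can be sketched numerically as follows. The right-hand side of Eq. (3) equals $(k(b) + c_r)/c_m$, and substituting Eq. (3) into Eq. (2) gives $c(b, T^*(b)) = c_m\, r(b + T^*(b))$, so minimising $b + T^*(b)$ over $b \in [0, t_1]$ minimises the cost rate. The code below solves Eq. (3) by bisection on a grid of $b$ values. The hazard $r(t) = A e^{-t} + C + D t$ with single change point $t_1 = t_2 = \ln(A/D)$, the cost values, and all function names are assumptions for illustration and are not taken from Mi [16].

```python
import math

A, C, D = 1.0, 0.2, 0.1      # assumed hazard r(t) = A*exp(-t) + C + D*t
T1 = math.log(A / D)         # its single change point (t1 = t2)


def r(t):
    return A * math.exp(-t) + C + D * t


def cum_hazard(t):
    """Lambda(t) = int_0^t r(u) du in closed form."""
    return A * (1.0 - math.exp(-t)) + C * t + 0.5 * D * t * t


def survival(t):
    return math.exp(-cum_hazard(t))


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def burn_in_cost(b, c0, cs):
    """k(b) = E[h(b)] for Burn-in Procedure A."""
    sb = survival(b)
    return (c0 * simpson(survival, 0.0, b) + cs * (1.0 - sb)) / sb


def cost_rate(b, T, c0, cs, cm, cr):
    """Eq. (2): (k(b) + cm * int_b^{b+T} r(t) dt + cr) / T."""
    repairs = cum_hazard(b + T) - cum_hazard(b)
    return (burn_in_cost(b, c0, cs) + cm * repairs + cr) / T


def optimal_T(b, c0, cs, cm, cr):
    """Solve Eq. (3) for T by bisection; assumes 0 <= b <= T1."""
    rhs = (burn_in_cost(b, c0, cs) + cr) / cm

    def g(T):
        return T * r(b + T) - (cum_hazard(b + T) - cum_hazard(b)) - rhs

    lo = max(T1 - b, 1e-9)   # g < 0 here, and g increases once b + T > T1
    hi = lo + 1.0
    while g(hi) < 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimum_case1(c0, cs, cm, cr, steps=40):
    """Minimise b + T*(b) over a grid of b in [0, T1] (Case 1 of Theorem 2)."""
    grid = [T1 * i / steps for i in range(steps + 1)]
    pairs = [(b, optimal_T(b, c0, cs, cm, cr)) for b in grid]
    return min(pairs, key=lambda bt: bt[0] + bt[1])


if __name__ == "__main__":
    b, T = optimum_case1(0.1, 0.5, 1.0, 5.0)
    print(b, T, cost_rate(b, T, 0.1, 0.5, 1.0, 5.0))
```

Restricting the search to $b \le t_1$ reduces a two-dimensional minimisation to a one-dimensional root-finding problem per grid point, which is the practical content of the theorem.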


2.3 Model 3

In Model 2, Burn-in Procedure A is applied to repairable devices. In many cases, because of practical limitations, products which fail during burn-in are simply scrapped, regardless of whether they are repairable or not. In this case, Burn-in Procedure A can be applied. But when dealing with an expensive product or a device of some complexity, the complete product will not be discarded on account of a failure during burn-in; rather, a repair can be performed. Cha [4] proposed the following burn-in procedure.

Burn-in Procedure B [4] Consider a fixed burn-in time $b$ and begin to burn in a new component. On each component failure, only a minimal repair is done, with shop minimal repair cost $c_{sm} > 0$, and the burn-in procedure continues for the repaired component. Immediately after the fixed burn-in time $b$, the component is put into field operation.

Note that the total burn-in time for this burn-in procedure is the constant $b$. For a burned-in component, the block replacement policy with minimal repair at failure is adopted in field operation, as in Model 2. Assume $0 < c_{sm} < c_s$; this means that the cost of a minimal repair during burn-in is lower than that of a complete repair, which is a reasonable assumption. Then the long-run average cost rate is given by

$$ c(b, T) = \frac{1}{T} \Big( c_0 b + c_{sm} \Lambda(b) + c_m \big(\Lambda(b + T) - \Lambda(b)\big) + c_r \Big), \tag{4} $$

where $\Lambda(t) \equiv \int_0^t r(u)\,du$. Then it can be shown that

$$ c_B(b, T) \le c_A(b, T), \quad \forall\; 0 < b < \infty,\; 0 < T < \infty, $$

where $c_A(b, T)$ and $c_B(b, T)$ are the cost rate functions in (2) and (4), respectively. This implies that $c_B(b_B^*, T_B^*) \le c_A(b_A^*, T_A^*)$, where $(b_A^*, T_A^*)$ and $(b_B^*, T_B^*)$ are the optimal solutions which minimize $c_A(b, T)$ and $c_B(b, T)$, respectively. Thus we can conclude that Burn-in Procedure B is always preferable to Burn-in Procedure A when the minimal repair method is applicable. Let $(b^*, T^*)$ be the optimal burn-in time and optimal replacement time which minimize the cost rate (4). Then the properties of $b^*$ and $T^*$ are given in the following theorem.

Theorem 3 [4] Suppose that the failure rate function $r(t)$ is bathtub-shaped and differentiable. Let

$$ B_1 \equiv \left\{ b \ge 0 : \int_b^\infty [r(\infty) - r(t)]\,dt > \frac{1}{c_m} \big[ c_r + c_0 b + c_{sm} \Lambda(b) \big] \right\}, $$


and $B_2 \equiv [0, \infty) \setminus B_1$. Then the properties of the optimal burn-in time $b^*$ and the replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$ T\, r(b + T) - \int_b^{b+T} r(t)\,dt = \frac{1}{c_m} \big[ c_r + c_0 b + c_{sm} \Lambda(b) \big]. \tag{5} $$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le t_1$ is the value which satisfies $b^* + T^*(b^*) = \min_{0 \le b \le t_1} (b + T^*(b))$.

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. The optimal $(b^*, T^*) = (b^*, \infty)$, where $b^*$ can be any value in $[0, \infty)$.

Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. For $b \in B_1$, let $T^*(b)$ be the unique solution of Eq. (5). Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $b^*$ is the value which satisfies

$$ b^* + T^*(b^*) = \min_{b \le t_1,\; b \in B_1} (b + T^*(b)). $$

□
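The comparison between Burn-in Procedures A and B in Model 3 can be checked numerically. The two cost rates (2) and (4) differ only in the mean burn-in cost: $k(b) = E[h(b)]$ for Procedure A versus $c_0 b + c_{sm}\Lambda(b)$ for Procedure B. The sketch below evaluates both for an assumed hazard $r(t) = A e^{-t} + C + D t$; the hazard, the cost values, and the function names are illustrative assumptions only.

```python
import math

A, C, D = 1.0, 0.2, 0.1   # assumed hazard r(t) = A*exp(-t) + C + D*t


def cum_hazard(t):
    """Lambda(t) = int_0^t r(u) du in closed form."""
    return A * (1.0 - math.exp(-t)) + C * t + 0.5 * D * t * t


def survival(t):
    return math.exp(-cum_hazard(t))


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def burn_in_cost_A(b, c0, cs):
    """Procedure A: complete shop repair and restart of burn-in on each failure."""
    sb = survival(b)
    return (c0 * simpson(survival, 0.0, b) + cs * (1.0 - sb)) / sb


def burn_in_cost_B(b, c0, csm):
    """Procedure B: minimal repair on each failure, total burn-in time exactly b."""
    return c0 * b + csm * cum_hazard(b)


def cost_rate(k_b, b, T, cm, cr):
    """Common form of Eqs. (2) and (4) for a given mean burn-in cost k_b."""
    return (k_b + cm * (cum_hazard(b + T) - cum_hazard(b)) + cr) / T


if __name__ == "__main__":
    c0, cs, csm, cm, cr = 0.1, 0.5, 0.3, 1.0, 5.0
    for b in (0.0, 0.5, 1.0, 2.0):
        for T in (2.0, 4.0, 8.0):
            cA = cost_rate(burn_in_cost_A(b, c0, cs), b, T, cm, cr)
            cB = cost_rate(burn_in_cost_B(b, c0, csm), b, T, cm, cr)
            print(f"b={b:.1f} T={T:.1f} cA={cA:.4f} cB={cB:.4f}")
```

The ordering does not depend on the bathtub shape: $\int_0^b \bar F(t)\,dt / \bar F(b) \ge b$ and $e^{\Lambda(b)} - 1 \ge \Lambda(b)$ hold for any hazard, so together with $c_{sm} < c_s$ the burn-in cost of Procedure B never exceeds that of Procedure A.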

3 Burn-in Procedures for General Failure Model

In this section, we survey papers on burn-in procedures for a general failure model. In the general failure model, when the unit fails, a Type I failure or a Type II failure may occur with some probabilities. It is assumed that a Type I failure is a minor one and thus can be removed by a minimal repair or a complete repair (or a replacement), whereas a Type II failure is a catastrophic one and thus can be removed only by a complete repair. Such models have been considered in the literature; see, for example, Beichelt [1] and Beichelt and Fischer [2].

3.1 Constant Probability Model

In this model, it is assumed that when the unit fails, a Type I failure occurs with probability $1 - p$ and a Type II failure occurs with probability $p$, $0 \le p \le 1$. Under this model, Cha [5] proposed the following burn-in procedure.

Burn-in Procedure C [5] Consider a fixed burn-in time $b$ and begin to burn in a new component. On each component failure, only a minimal repair is done for a Type I failure, with shop minimal repair cost $0 < c_{sm} \le c_s$, and a complete repair is performed for a Type II failure, with shop complete repair cost $c_s$. The burn-in procedure then continues for the repaired component.


Then, considering both Burn-in Procedures A and C for the general failure model defined above, Cha [5] studied the optimal burn-in and replacement policy. As mentioned before, when the minimal repair method is impractical during burn-in, Burn-in Procedure C cannot be applied and only Burn-in Procedure A should be used. However, if the minimal repair method is applicable, then both Procedures A and C can be considered. Note that Burn-in Procedure A stops the first time there is no failure during a fixed burn-in time $(0, b]$, whereas Procedure C stops the first time there is no Type II failure during a fixed burn-in time $(0, b]$.

In field operation, the component is replaced by a new burned-in component at the 'field use age' $T$ or at the time of the first Type II failure, whichever occurs first. For each Type I failure occurring during field use, only a minimal repair is done. Let $Y_b$ be the time to the first Type II failure of a burned-in component with fixed burn-in time $b$. If we define $G_b(t)$ as the distribution function of $Y_b$ and $\bar G_b(t)$ as $1 - G_b(t)$, then $\bar G_b(t)$ is given by

$$ \bar G_b(t) = P(Y_b > t) = \exp\left\{ -\int_0^t p\, r(b + u)\,du \right\} = \exp\{ -p[\Lambda(b + t) - \Lambda(b)] \}, \quad \forall\, t \ge 0, \tag{6} $$

where $\Lambda(t) \equiv \int_0^t r(u)\,du$. Let the random variable $N(b, T)$ be the total number of minimal repairs of a burned-in component which occur during field operation under burn-in time $b$ and replacement policy $T$. Then, using the results presented in Beichelt [1], it is easy to see that, when $p \ne 0$, the expectation of $N(b, T)$ is given by

$$ E[N(b, T)] = \int_0^T \!\! \int_0^t (1 - p)\, r(b + u)\,du\, dG_b(t) + \left[ \int_0^T (1 - p)\, r(b + u)\,du \right] \bar G_b(T) = \left( \frac{1}{p} - 1 \right) \Big( 1 - \exp\{ -p[\Lambda(b + T) - \Lambda(b)] \} \Big). \tag{7} $$

When $p = 0$, the expectation is given by $E[N(b, T)] = \Lambda(b + T) - \Lambda(b)$. Let $c_f$ denote the cost incurred for each Type II failure in field operation and $c_a$, satisfying $0 < c_a < c_f$, the cost incurred for each non-failed item which is replaced at field use age $T > 0$. Also denote by $c_m$ the cost of a minimal repair which is performed in field operation. When $p = 0$ or $p = 1$, the burn-in and replacement


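model discussed in this section reduces to that in Mi [16] or Cha [4]. Thus, in the discussion below, we assume that $0 < p < 1$. Then, using the results given in (6) and (7), the long-run average cost rate functions for Procedures A and C are given by

$$ c_A(b, T) = \frac{1}{\int_0^T \bar G_b(t)\,dt} \left( \left[ c_0 \frac{\int_0^b \bar F(t)\,dt}{\bar F(b)} + c_s \frac{F(b)}{\bar F(b)} \right] + c_m \left( \frac{1}{p} - 1 \right) \Big( 1 - \exp\{ -p[\Lambda(b + T) - \Lambda(b)] \} \Big) + c_f G_b(T) + c_a \bar G_b(T) \right), \tag{8} $$

and

$$ c_C(b, T) = \frac{1}{\int_0^T \bar G_b(t)\,dt} \left( \left[ c_0 \frac{\int_0^b \bar G(t)\,dt}{\bar G(b)} + c_s \frac{G(b)}{\bar G(b)} + c_{sm} \left( \frac{1}{p} - 1 \right) \big( \exp\{ p \Lambda(b) \} - 1 \big) \right] + c_m \left( \frac{1}{p} - 1 \right) \Big( 1 - \exp\{ -p[\Lambda(b + T) - \Lambda(b)] \} \Big) + c_f G_b(T) + c_a \bar G_b(T) \right), \tag{9} $$

where $c_A(b, T)$ and $c_C(b, T)$ represent the cost rates for Burn-in Procedures A and C, respectively; see Cha [5] for detailed derivations of $c_A(b, T)$ and $c_C(b, T)$. Here $G(t) \equiv G_0(t)$ is the distribution function of the time to the first Type II failure of a new component, so that $\bar G(t) = \exp\{ -p \Lambda(t) \}$ by (6). Cha [5] showed that

$$ \text{(i)}\;\; c_C(0, T; p) = c_A(0, T; p), \quad \forall\; 0 < T \le \infty,\; 0 < p < 1; $$

$$ \text{(ii)}\;\; c_C(b, T; p) < c_A(b, T; p), \quad \forall\; 0 < b < \infty,\; 0 < T \le \infty,\; 0 < p < 1. $$

Then, from the above inequalities, it can be concluded that Burn-in Procedure C is always (i.e., for all $0 < p < 1$) preferable to Burn-in Procedure A when the minimal repair method is applicable.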
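The closed form in Eq. (7) can be checked by simulating the field-use cycle directly. Under minimal repair, successive failures of a component of age $b$ form a nonhomogeneous Poisson process with intensity $r(t)$, which becomes a unit-rate Poisson process on the $\Lambda$ scale; each failure is independently of Type II with probability $p$. The following sketch uses an assumed Weibull-type cumulative hazard, and the parameter values and function names are illustrative only.

```python
import math
import random

ETA, SHAPE = 2.0, 1.5   # assumed cumulative hazard Lambda(t) = (t / ETA) ** SHAPE


def cum_hazard(t):
    return (t / ETA) ** SHAPE


def expected_minimal_repairs(b, T, p):
    """Closed form of Eq. (7); for p = 0 it reduces to Lambda(b+T) - Lambda(b)."""
    delta = cum_hazard(b + T) - cum_hazard(b)
    if p == 0.0:
        return delta
    return (1.0 / p - 1.0) * (1.0 - math.exp(-p * delta))


def simulate_minimal_repairs(b, T, p, runs=50000, seed=7):
    """Average number of Type I failures before the first Type II failure or age T."""
    rng = random.Random(seed)
    limit = cum_hazard(b + T)
    total = 0
    for _ in range(runs):
        level = cum_hazard(b)              # start of field use on the Lambda scale
        while True:
            level += rng.expovariate(1.0)  # next failure of a unit-rate Poisson process
            if level > limit:
                break                      # no further failure before field-use age T
            if rng.random() < p:
                break                      # Type II failure: replacement ends the cycle
            total += 1                     # Type I failure: minimal repair
    return total / runs


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.8):
        print(p, expected_minimal_repairs(0.5, 3.0, p), simulate_minimal_repairs(0.5, 3.0, p))
```

Working on the $\Lambda$ scale avoids inverting the cumulative hazard, so the same simulation applies to any hazard for which $\Lambda$ can be evaluated.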


Now the properties of the optimal burn-in and optimal replacement times are discussed. Note that the cost rate functions in (8) and (9) can be expressed as

$$ c(b, T) = \frac{1}{\int_0^T \bar G_b(t)\,dt} \left( k(b) + c_m \left( \frac{1}{p} - 1 \right) \Big( 1 - \exp\{ -p[\Lambda(b + T) - \Lambda(b)] \} \Big) + c_f G_b(T) + c_a \bar G_b(T) \right), \tag{10} $$

where $k(b)$ is the average cost incurred during a burn-in process for each model. Then the properties of the optimal $(b^*, T^*)$ which minimizes the cost rate (10) are given in the following theorem, in which $\mu(b) \equiv \int_0^\infty \bar G_b(t)\,dt$.

Theorem 4 [5] Suppose that the failure rate function $r(t)$ is bathtub-shaped and differentiable. Let

$$ B_1 \equiv \left\{ b \ge 0 : p\, r(\infty) \int_b^\infty \exp\{ -p[\Lambda(t) - \Lambda(b)] \}\,dt - 1 > \left[ c_m \left( \frac{1}{p} - 1 \right) + (c_f - c_a) \right]^{-1} (c_a + k(b)) \right\}, $$

and $B_2 \equiv [0, \infty) \setminus B_1$. Then the properties of the optimal burn-in time $b^*$ and the replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$ p\, r(b + T) \int_b^{b+T} \exp\{ -p[\Lambda(t) - \Lambda(b)] \}\,dt + \exp\{ -p[\Lambda(b + T) - \Lambda(b)] \} - 1 = \left[ c_m \left( \frac{1}{p} - 1 \right) + (c_f - c_a) \right]^{-1} (c_a + k(b)). \tag{11} $$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le t_1$ is the value which satisfies $b^* + T^*(b^*) = \min_{0 \le b \le t_1} (b + T^*(b))$.

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. The optimal $(b^*, T^*) = (b^*, \infty)$, where $0 \le b^* \le t_1$ is the value which satisfies

$$ \frac{1}{\mu(b^*)} \left[ c_m \left( \frac{1}{p} - 1 \right) + c_f + k(b^*) \right] = \min_{0 \le b \le t_1} \frac{1}{\mu(b)} \left[ c_m \left( \frac{1}{p} - 1 \right) + c_f + k(b) \right]. $$

Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. For $b \in B_1$, let $T^*(b)$ be the unique solution of Eq. (11). Furthermore, let $b_1^* \in [0, t_1] \cap B_1$ satisfy

$$ b_1^* + T^*(b_1^*) = \min_{b \le t_1,\; b \in B_1} (b + T^*(b)), $$


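and $b_2^* \in [0, t_1] \cap B_2$ satisfy

$$ \frac{1}{\mu(b_2^*)} \left[ c_m \left( \frac{1}{p} - 1 \right) + c_f + k(b_2^*) \right] = \min_{b \le t_1,\; b \in B_2} \frac{1}{\mu(b)} \left[ c_m \left( \frac{1}{p} - 1 \right) + c_f + k(b) \right]. $$

If

$$ \left[ c_m \left( \frac{1}{p} - 1 \right) + (c_f - c_a) \right] p\, r(b_1^* + T^*(b_1^*)) \le \frac{1}{\mu(b_2^*)} \left[ c_m \left( \frac{1}{p} - 1 \right) + c_f + k(b_2^*) \right], $$

then the optimal $(b^*, T^*) = (b_1^*, T^*(b_1^*))$. Otherwise the optimal $(b^*, T^*)$ is $(b_2^*, \infty)$. □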

3.2 Time-Dependent Probability Model

In Cha [6], the Constant Probability Model was further extended to the case when the corresponding probabilities change with operating time. Assume now that, when the unit fails at its age $t$, a Type I failure occurs with probability $1 - p(t)$ and a Type II failure occurs with probability $p(t)$, $0 \le p(t) \le 1$. In this model, we employ the same notation and random variables as before. Also, note that if $p(t) = p$ a.e. (w.r.t. Lebesgue measure), $0 \le p \le 1$, the models under consideration reduce to those of Mi [16] and Cha [4, 5]. Thus, we only consider the set of functions $\mathcal{P}$ as the set of all Type II failure probability functions, which is given by

$$ \mathcal{P} = \{ p(\cdot) : 0 \le p(t) \le 1,\; \forall\, t \ge 0 \} \setminus \{ p(\cdot) : p(t) = p \text{ a.e.},\; 0 \le p \le 1 \}. $$

It can be shown that

$$ \bar G_b(t) = \exp\{ -[\Lambda_p(b + t) - \Lambda_p(b)] \}, \quad \forall\, t \ge 0, $$

where $\Lambda_p(t) \equiv \int_0^t p(u)\, r(u)\,du$, and

$$ E[N(b, T)] = \int_0^T r(b + t)\, \bar G_b(t)\,dt - G_b(T). $$

Then, considering both Burn-in Procedures A and C for this extended model, the long-run average cost rate functions are given by

$$ c_A(b, T) = \frac{1}{\int_0^T \bar G_b(t)\,dt} \left( \left[ c_0 \int_0^b \exp\{ -[\Lambda(t) - \Lambda(b)] \}\,dt + c_s \big[ \exp\{ \Lambda(b) \} - 1 \big] \right] + c_m \left[ \int_0^T r(b + t)\, \bar G_b(t)\,dt - G_b(T) \right] + c_f G_b(T) + c_a \bar G_b(T) \right), \tag{12} $$


where $\Lambda(t) \equiv \int_0^t r(u)\,du$, and

$$ c_C(b, T) = \frac{1}{\int_0^T \bar G_b(t)\,dt} \left( \left[ c_0 \int_0^b \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt + c_s \big[ \exp\{ \Lambda_p(b) \} - 1 \big] + c_{sm} \int_0^b (1 - p(t))\, r(t) \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt \right] + c_m \left[ \int_0^T r(b + t)\, \bar G_b(t)\,dt - G_b(T) \right] + c_f G_b(T) + c_a \bar G_b(T) \right). \tag{13} $$

As before, it can be shown that

$$ \text{(i)}\;\; c_C(0, T; p(\cdot)) = c_A(0, T; p(\cdot)), \quad \forall\; 0 < T \le \infty,\; p(\cdot) \in \mathcal{P}; $$

$$ \text{(ii)}\;\; c_C(b, T; p(\cdot)) \le c_A(b, T; p(\cdot)), \quad \forall\; 0 < b < \infty,\; 0 < T \le \infty,\; p(\cdot) \in \mathcal{P}, $$

which ensures the superiority of Burn-in Procedure C when the minimal repair method is applicable. The cost rate functions in (12) and (13) can be rewritten as

$$ c(b, T) = \frac{1}{\int_0^T \bar G_b(t)\,dt} \left( k(b) + c_m \left[ \int_0^T r(b + t)\, \bar G_b(t)\,dt - G_b(T) \right] + c_f G_b(T) + c_a \bar G_b(T) \right), $$

where $k(b)$ denotes the average cost incurred during a burn-in process. Then, under the following assumptions, the properties regarding the optimal burn-in time $b^*$ and the optimal replacement policy $T^*$ can be obtained.

Assumptions
1. The failure rate function $r(t)$ is differentiable and bathtub-shaped with the first change point $s_1$ and the second change point $s_2$.
2. The Type II failure probability function $p(t)$ is differentiable and bathtub-shaped with the first change point $u_1$ and the second change point $u_2$.
3. Let $t_1 \equiv \max(s_1, u_1)$ and $t_2 \equiv \min(s_2, u_2)$; then $t_1 < t_2$ holds.
4. $(c_f - c_a) > c_m$.

Theorem 5 [6] Suppose that assumptions (1)–(4) described above hold. Let the set $B_1$ be

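$$ B_1 \equiv \left\{ b \ge 0 : c_m \int_b^\infty [r(\infty) - r(t)] \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt + \big( (c_f - c_a) - c_m \big) \left[ p(\infty)\, r(\infty) \int_b^\infty \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt - 1 \right] > c_a + k(b) \right\}, $$

and $B_2 \equiv [0, \infty) \setminus B_1$. Then the properties of the optimal burn-in time $b^*$ and replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$ c_m \int_b^{b+T} [r(b + T) - r(t)] \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt + \big( (c_f - c_a) - c_m \big) \left[ p(b + T)\, r(b + T) \int_b^{b+T} \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt + \exp\{ -[\Lambda_p(b + T) - \Lambda_p(b)] \} - 1 \right] = c_a + k(b). \tag{14} $$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le t_1$ is the value which satisfies $b^* + T^*(b^*) = \min_{0 \le b \le t_1} (b + T^*(b))$.

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. The optimal $(b^*, T^*) = (b^*, \infty)$, where $0 \le b^* \le t_1$ is the value which satisfies

$$ \frac{1}{\mu(b^*)} \left[ (c_f - c_m) + c_m \int_{b^*}^\infty r(t) \exp\{ -[\Lambda_p(t) - \Lambda_p(b^*)] \}\,dt + k(b^*) \right] = \min_{0 \le b \le t_1} \frac{1}{\mu(b)} \left[ (c_f - c_m) + c_m \int_b^\infty r(t) \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt + k(b) \right], $$

where $\mu(b)$ is given by

$$ \mu(b) = \int_b^\infty \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt. \tag{15} $$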


Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. Let $T^*(b)$, $b \in B_1$, be the unique solution of Eq. (14) and let $\mu(b)$ be given by (15). Furthermore, let $b_1^* \in [0, t_1] \cap B_1$ be the value which satisfies

$$ b_1^* + T^*(b_1^*) = \min_{b \le t_1,\; b \in B_1} (b + T^*(b)), $$

and $b_2^* \in [0, t_1] \cap B_2$ be the value which satisfies

$$ \frac{1}{\mu(b_2^*)} \left[ (c_f - c_m) + c_m \int_{b_2^*}^\infty r(t) \exp\{ -[\Lambda_p(t) - \Lambda_p(b_2^*)] \}\,dt + k(b_2^*) \right] = \min_{b \le t_1,\; b \in B_2} \frac{1}{\mu(b)} \left[ (c_f - c_m) + c_m \int_b^\infty r(t) \exp\{ -[\Lambda_p(t) - \Lambda_p(b)] \}\,dt + k(b) \right]. $$

If

$$ c_m\, r(b_1^* + T^*(b_1^*)) + \big( (c_f - c_a) - c_m \big)\, p(b_1^* + T^*(b_1^*))\, r(b_1^* + T^*(b_1^*)) \le \frac{1}{\mu(b_2^*)} \left[ (c_f - c_m) + c_m \int_{b_2^*}^\infty r(t) \exp\{ -[\Lambda_p(t) - \Lambda_p(b_2^*)] \}\,dt + k(b_2^*) \right], $$

then the optimal $(b^*, T^*) = (b_1^*, T^*(b_1^*))$. Otherwise the optimal $(b^*, T^*) = (b_2^*, \infty)$. □

Remark 1 In the above theorem, it is assumed that both $r(t)$ and $p(t)$ are bathtub-shaped functions. Cha and Mi [9] investigated how this assumption can be satisfied in practice when a device is composed of two parts (Part A and Part B) which are connected in series. Assume that the failure of Part A causes a catastrophic failure, whereas that of Part B causes a minor failure. Then, in this case, the failure rate of the device is given by $r(t) = r_1(t) + r_2(t)$, and the probability of Type II failure $p(t)$ is given by

$$ p(t) = \frac{r_1(t)}{r_1(t) + r_2(t)}, $$

where $r_1(t)$ and $r_2(t)$ are the failure rate functions of Parts A and B, respectively; see Cha and Mi [9] for more detailed discussion and several examples where $r(t)$ and $p(t)$ have many different shapes. □
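The series-system construction in Remark 1 can be sketched in a few lines. Since $p(t)\, r(t) = r_1(t)$, the Type II cumulative hazard $\Lambda_p(t) = \int_0^t p(u)\, r(u)\,du$ coincides with the cumulative hazard of Part A alone. The two part-level failure rates below are hypothetical choices made only to illustrate this; they are not examples from Cha and Mi [9].

```python
import math


def r1(t):
    """Assumed failure rate of Part A (its failure is catastrophic, Type II)."""
    return 0.05 + 0.02 * t


def r2(t):
    """Assumed failure rate of Part B (its failure is minor, Type I)."""
    return 0.5 * math.exp(-t) + 0.1


def r(t):
    """Failure rate of the series device."""
    return r1(t) + r2(t)


def p(t):
    """Probability that a failure at age t is of Type II."""
    return r1(t) / r(t)


def cum_type2_hazard(t, n=1000):
    """Lambda_p(t) = int_0^t p(u) r(u) du by the trapezoidal rule."""
    h = t / n
    vals = [p(i * h) * r(i * h) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


if __name__ == "__main__":
    for t in (0.0, 1.0, 3.0, 10.0):
        print(t, r(t), p(t), cum_type2_hazard(t))
```

With these assumed rates, $p(t)$ increases from about 0.08 towards 1 as the wear-out of Part A comes to dominate, which illustrates how a time-dependent $p(t)$ arises from constant structure.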


4 Burn-in Procedures in Accelerated Environment

Burn-in is generally considered to be expensive, and the length of burn-in is typically limited. Furthermore, for today's highly reliable products, many latent failures or weak components require a long time to detect or identify. Thus, as remarked in Sect. 8 of Block and Savits [3], burn-in is most often accomplished in an accelerated environment in order to shorten the burn-in process. Cha [7] proposed a stochastic model for the accelerated burn-in procedure.

4.1 Stochastic Model for Accelerated Burn-in Procedure

In this section, the stochastic model for the accelerated burn-in procedure proposed in Cha [7] is briefly introduced. The approach employs the basic statistical property commonly used in accelerated life tests (ALT). In ALT, test units are used more frequently than usual or are subjected to higher than usual levels of stress, such as temperature and voltage. The information obtained from the test performed at the higher level of environment is then used to predict actual product performance at the usual level of environment. Nelson [18] provides an extensive and comprehensive source of background material, practical methodology, basic theory, and examples for accelerated testing. Meeker and Escobar [15] also give a good review of recent research and current issues in ALT.

As before, the random variable $X$ denotes the lifetime of a component used in the usual level of environment and $F(t)$ is the distribution function of $X$. We assume that $X$ is an absolutely continuous nonnegative random variable, and thus the distribution has no probability mass at infinity. Let $f(t)$ be the probability density function of $X$. Its failure rate function $r(t)$ is then given by $r(t) = f(t)/\bar F(t)$, where $\bar F(t)$ is the survival function of $X$. Also, the random variable $X_A$ denotes the lifetime of a component operated in the accelerated level of environment, and $F_A(t)$ and $r_A(t)$ are the corresponding distribution function and failure rate function, respectively. The 'Accelerated Failure Time' (AFT) regression model is the most widely used parametric failure time regression model in ALT. Under this model, higher stress has the effect of shrinking time through a scale factor. This can generally be expressed as

$$ F_A(t) = F(q(t)), \quad \forall\, t \ge 0, \tag{16} $$

where $q(t)$ depends on the accelerated environment. Since the accelerated environment gives rise to higher stresses than the usual environment, reasonable assumptions are $q(t) \ge t$ for all $t > 0$ and $q(0) = 0$. Furthermore, we assume that $q(t)$ in the model (16) is strictly increasing, continuous, and differentiable. Then the model given in (16) implies that $X_A \le_{st} X$. Here, the notation '$\le_{st}$' denotes the usual stochastic order, that is, $Z_1$ is said to be smaller than $Z_2$ in the usual


stochastic order, which is denoted as $Z_1 \le_{st} Z_2$, if $F_2(t) \le F_1(t)$ for all $t \ge 0$, where $F_1(t)$ and $F_2(t)$ are the distribution functions of $Z_1$ and $Z_2$, respectively. From the model (16), the failure rate function in the accelerated environment is given by

$$ r_A(t) = \frac{q'(t)\, f(q(t))}{1 - F(q(t))} = q'(t)\, r(q(t)). $$

On the other hand, right after a new component has been burned in during a fixed burn-in time $b$ under the accelerated environment, the 'virtual age' of the component, transformed to the usual level of environment, would be not less than $b$. Thus, we assume that the survival function of the burned-in component with accelerated burn-in time $b$, which is operated in the usual level of environment, is given by

$$ \exp\left( -\int_0^t r(a(b) + u)\,du \right) = \frac{\bar F(a(b) + t)}{\bar F(a(b))} \equiv \bar F_b(t), \tag{17} $$

where $a(b)$ satisfies $a(b) \ge b$ for all $b \ge 0$ and $a(0) = 0$, and is assumed to be a strictly increasing and differentiable function. Equation (17) implies that the performance of a component with accelerated burn-in time $b$ is the same as that of a component which has been operated in the usual level of environment during time $a(b)$. Hence the function $a(b)$ represents the accelerated ageing process induced by the accelerated burn-in procedure. From (17), it is easy to see that the burned-in component with accelerated burn-in time $b$ and 'field use age' $u$ has failure rate

$$ r(a(b) + u), \quad \forall\, u \ge 0. $$

Now, combining the accelerated burn-in phase and the field use phase, the failure rate function of a component under accelerated burn-in time $b$, which is denoted by $\lambda_b(t)$, can be expressed as

$$ \lambda_b(t) = \begin{cases} q'(t)\, r(q(t)), & \text{if } 0 \le t \le b \text{ (Burn-in Phase)}, \\ r(a(b) + (t - b)), & \text{if } t \ge b \text{ (Field Use Phase)}. \end{cases} \tag{18} $$

Generally, the shapes of $q(t)$ and $a(b)$ depend on the level(s) of stress(es) induced during the accelerated burn-in process. Higher levels of stress would yield rapidly increasing functions $q(t)$ and $a(b)$, whereas lower levels of stress would give rise to slowly increasing $q(t)$ and $a(b)$. Hence, in general, the shapes of $q(t)$ and $a(b)$ could usually be assumed to be similar. Since the conditions on the functions $q(t)$ and $a(b)$ are minimal and not too restrictive, the failure rate model in (18) can be considered a general one, applicable to a wide range of situations. Also note that for the burn-in procedure performed in the usual level of environment (in this case, $q(t) = t$ for all $t \ge 0$ and $a(b) = b$ for all $b \ge 0$), the relationship $\lambda_b(t) = r(t)$, for all $t \ge 0$, holds. Therefore, the accelerated burn-in model under consideration is a generalization of the burn-in model in a usual level of environment.


Remark 2 Similar to the cumulative exposure model described in Nelson [18], assume now that the virtual age $a(t)$ in the normal environment 'produces' the same population cumulative fraction of units failing as the age $t$ does in the accelerated environment. Formally, this means that

$$ F(a(t)) = F_A(t). \tag{19} $$

Applying the inverse operator $F^{-1}$ to both sides of (19), we have

$$ a(t) = F^{-1}(F_A(t)) = q(t), \quad \forall\, t \ge 0. $$

Therefore, under the above-mentioned assumption, $a(t) = q(t)$, $\forall\, t \ge 0$. □

4.2 Accelerated Burn-in and Maintenance Policy

In Cha and Na [10], each of the burn-in and replacement Models 1, 2, and 3 considered in Sect. 2 was extended to the case when burn-in is accomplished in an accelerated environment.

4.2.1 Model 1

We consider burn-in and replacement Model 1: the component is burned in by Burn-in Procedure A under the accelerated environment, and the component surviving the burn-in procedure is put into field operation. In field operation, an age replacement policy is applied to the component. In this case, the long-run average cost rate is given by

$$ c(b, T) = \frac{1}{\int_0^T \bar F_b(t)\,dt} \left( \left[ c_0 \frac{\int_0^b \bar F_A(t)\,dt}{\bar F_A(b)} + c_s \frac{F_A(b)}{\bar F_A(b)} \right] + c_f F_b(T) + c_a \bar F_b(T) \right), \tag{20} $$

where $\bar F_b(t)$ is given by (17) and $F_b(t) \equiv 1 - \bar F_b(t)$. Let $b^*$ be the optimal accelerated burn-in time and $T^*$ be the optimal replacement policy which satisfy

$$ c(b^*, T^*) = \min_{b \ge 0,\; T > 0} c(b, T). $$

Then the properties regarding the optimal accelerated burn-in time $b^*$ and the optimal replacement policy $T^*$ are given in the following theorem.

Theorem 6 [10] Suppose that the failure rate function $r(t)$ is bathtub-shaped and differentiable. Let the set $B_1$ be


J. H. Cha

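$$B_1 \equiv \left\{ b \ge 0 : r(\infty) \int_{a(b)}^{\infty} \exp\{-[\Lambda(t) - \Lambda(a(b))]\}\,dt - 1 > \frac{1}{c_f - c_a} \left[ c_a + c_s[\exp\{\Lambda(\rho(b))\} - 1] + c_0 \int_0^b \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b))]\}\,dt \right] \right\},$$

and $B_2 \equiv [0, \infty) \setminus B_1$. Furthermore, let $a^{-1}(t_1) \ge 0$ be the unique solution of the equation $a(t) = t_1$. Then the properties of the optimal accelerated burn-in time $b^*$ and replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$r(a(b) + T) \int_{a(b)}^{a(b)+T} \exp\{-[\Lambda(t) - \Lambda(a(b))]\}\,dt + \exp\{-[\Lambda(a(b) + T) - \Lambda(a(b))]\} - 1 = \frac{1}{c_f - c_a} \left[ c_a + c_s[\exp\{\Lambda(\rho(b))\} - 1] + c_0 \int_0^b \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b))]\}\,dt \right]. \tag{21}$$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le a^{-1}(t_1)$ is the value which satisfies

$$a(b^*) + T^*(b^*) = \min_{0 \le b \le a^{-1}(t_1)} \left( a(b) + T^*(b) \right).$$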

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. In this case, the optimal $(b^*, T^*) = (b^*, \infty)$, where $0 \le b^* \le a^{-1}(t_1)$ is the value which satisfies

$$\frac{1}{\mu(a(b^*))} \left[ c_f + c_s[\exp\{\Lambda(\rho(b^*))\} - 1] + c_0 \int_0^{b^*} \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b^*))]\}\,dt \right] = \min_{0 \le b \le a^{-1}(t_1)} \frac{1}{\mu(a(b))} \left[ c_f + c_s[\exp\{\Lambda(\rho(b))\} - 1] + c_0 \int_0^b \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b))]\}\,dt \right],$$

where $\mu(a(b))$ is given by

$$\mu(a(b)) \equiv \int_{a(b)}^{\infty} \exp\{-[\Lambda(t) - \Lambda(a(b))]\}\,dt. \tag{22}$$

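Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. For $b \in B_1$, let $T^*(b)$ be the unique solution of the equation (21) and let $\mu(a(b))$ be given by (22). Furthermore, let $b_1^* \in [0, a^{-1}(t_1)] \cap B_1$ satisfy

$$a(b_1^*) + T^*(b_1^*) = \min_{b \le a^{-1}(t_1),\ b \in B_1} \left( a(b) + T^*(b) \right),$$

and $b_2^* \in [0, a^{-1}(t_1)] \cap B_2$ satisfy

$$\frac{1}{\mu(a(b_2^*))} \left[ c_f + c_s[\exp\{\Lambda(\rho(b_2^*))\} - 1] + c_0 \int_0^{b_2^*} \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b_2^*))]\}\,dt \right] = \min_{b \le a^{-1}(t_1),\ b \in B_2} \frac{1}{\mu(a(b))} \left[ c_f + c_s[\exp\{\Lambda(\rho(b))\} - 1] + c_0 \int_0^b \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b))]\}\,dt \right].$$

If

$$(c_f - c_a)\, r(a(b_1^*) + T^*(b_1^*)) \le \frac{1}{\mu(a(b_2^*))} \left[ c_f + c_s[\exp\{\Lambda(\rho(b_2^*))\} - 1] + c_0 \int_0^{b_2^*} \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b_2^*))]\}\,dt \right],$$

then the optimal $(b^*, T^*)$ is $(b_1^*, T^*(b_1^*))$. Otherwise, the optimal $(b^*, T^*)$ is $(b_2^*, \infty)$. $\square$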

4.2.2 Model 2

We consider burn-in and replacement model 2: the component is burned-in by Burn-in Procedure A and the block replacement with minimal repair at failure is applied to the component in field use. In this case, the long-run average cost rate is given by

$$c(b,T) = \frac{1}{T} \left( \left[ c_0 \frac{\int_0^b \bar F_A(t)\,dt}{\bar F_A(b)} + c_s \frac{F_A(b)}{\bar F_A(b)} \right] + c_m[\Lambda(a(b) + T) - \Lambda(a(b))] + c_r \right). \tag{23}$$

Then the properties of the optimal $b^*$ and $T^*$ minimizing $c(b,T)$ in (23) are given in the following theorem.

Theorem 7 [10] Suppose that the failure rate function $r(t)$ is bathtub shaped and differentiable. Let the set $B_1$ be

$$B_1 \equiv \left\{ b \ge 0 : \int_{a(b)}^{\infty} [r(\infty) - r(t)]\,dt > \frac{1}{c_m} \left[ c_r + c_s[\exp\{\Lambda(\rho(b))\} - 1] + c_0 \int_0^b \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b))]\}\,dt \right] \right\},$$

$B_2 \equiv [0, \infty) \setminus B_1$, and $a^{-1}(t_1) \ge 0$ be the unique solution of the equation $a(t) = t_1$. Then the properties of the optimal burn-in time $b^*$ and the replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$T\, r(a(b) + T) - \int_{a(b)}^{a(b)+T} r(t)\,dt = \frac{1}{c_m} \left[ c_r + c_s[\exp\{\Lambda(\rho(b))\} - 1] + c_0 \int_0^b \exp\{-[\Lambda(\rho(t)) - \Lambda(\rho(b))]\}\,dt \right]. \tag{24}$$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le a^{-1}(t_1)$ is the value which satisfies $a(b^*) + T^*(b^*) = \min_{0 \le b \le a^{-1}(t_1)} (a(b) + T^*(b))$.

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. The optimal $(b^*, T^*) = (b^*, \infty)$, where $b^*$ can be any value in $[0, \infty)$.


Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. For $b \in B_1$, let $T^*(b)$ be the unique solution of the equation (24). Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $b^*$ is the value which satisfies

$$a(b^*) + T^*(b^*) = \min_{b \le a^{-1}(t_1),\ b \in B_1} \left( a(b) + T^*(b) \right). \quad \square$$

4.2.3 Model 3

We consider burn-in and replacement model 3: the component is burned-in by Burn-in Procedure B and the block replacement with minimal repair at failure is applied to the component in field use. Then the long-run average cost rate is given by

$$c(b,T) = \frac{1}{T} \left( [c_0 b + c_{sm} \Lambda(\rho(b))] + c_m[\Lambda(a(b) + T) - \Lambda(a(b))] + c_r \right). \tag{25}$$
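To make the structure of (25) concrete, the following sketch evaluates the cost rate and searches a coarse grid for a minimizer. The cumulative hazard `Lam`, the linear acceleration functions, the cost values, and the grid are all hypothetical choices for this example, and a grid search is only a crude stand-in for the analytical characterization given in the theorem that follows.

```python
import math


def cost_rate_model3(b, T, Lam, rho, a, c0, csm, cm, cr):
    """Long-run average cost rate c(b, T) of (25)."""
    if T <= 0:
        raise ValueError("T must be positive")
    burn_in_cost = c0 * b + csm * Lam(rho(b))            # burn-in with minimal repairs
    field_cost = cm * (Lam(a(b) + T) - Lam(a(b))) + cr   # minimal repairs plus replacement
    return (burn_in_cost + field_cost) / T


def Lam(t):
    """Cumulative hazard of the hypothetical rate r(t) = 1/(1+t) + 0.02 t."""
    return math.log1p(t) + 0.01 * t * t


# Assumed parameters (illustrative only).
params = dict(Lam=Lam, rho=lambda t: 2.0 * t, a=lambda s: 2.0 * s,
              c0=0.1, csm=0.5, cm=1.0, cr=5.0)

b_grid = [0.25 * i for i in range(0, 9)]      # b in [0, 2]
T_grid = [0.5 * j for j in range(1, 81)]      # T in (0, 40]

# Tuples compare by cost first, so min() returns the cheapest grid point.
best_cost, best_b, best_T = min(
    (cost_rate_model3(b, T, **params), b, T) for b in b_grid for T in T_grid
)
```

The grid minimizer depends entirely on the assumed costs and hazard; it is meant only to show how the burn-in and field-use terms of (25) combine.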

The properties of the optimal $b^*$ and $T^*$ minimizing $c(b,T)$ in (25) are given in the following theorem.

Theorem 8 [10] Suppose that the failure rate function $r(t)$ is bathtub-shaped and differentiable. Let

$$B_1 \equiv \left\{ b \ge 0 : \int_{a(b)}^{\infty} [r(\infty) - r(t)]\,dt > \frac{1}{c_m} [c_r + c_0 b + c_{sm} \Lambda(\rho(b))] \right\},$$

$B_2 \equiv [0, \infty) \setminus B_1$, and $a^{-1}(t_1) \ge 0$ be the unique solution of the equation $a(t) = t_1$. Then the properties of the optimal burn-in time $b^*$ and the replacement policy $T^*$ can be stated in detail as follows:

Case 1 $B_1 = [0, \infty)$, $B_2 = \emptyset$. Let $T^*(b)$ be the unique solution of the equation

$$T\, r(a(b) + T) - \int_{a(b)}^{a(b)+T} r(t)\,dt = \frac{1}{c_m} [c_r + c_0 b + c_{sm} \Lambda(\rho(b))]. \tag{26}$$

Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $0 \le b^* \le a^{-1}(t_1)$ is the value which satisfies $a(b^*) + T^*(b^*) = \min_{0 \le b \le a^{-1}(t_1)} (a(b) + T^*(b))$.

Case 2 $B_1 = \emptyset$, $B_2 = [0, \infty)$. The optimal $(b^*, T^*) = (b^*, \infty)$, where $b^*$ can be any value in $[0, \infty)$.

Case 3 $B_1 \ne \emptyset$, $B_2 \ne \emptyset$. For $b \in B_1$, let $T^*(b)$ be the unique solution of the equation (26). Then the optimal $(b^*, T^*) = (b^*, T^*(b^*))$, where $b^*$ is the value which satisfies

$$a(b^*) + T^*(b^*) = \min_{b \le a^{-1}(t_1),\ b \in B_1} \left( a(b) + T^*(b) \right). \quad \square$$

5 Current Issues and Future Topics

5.1 Optimal Burn-in Under Generalized Assumption

It is widely believed that many products, particularly electronic products or devices such as silicon integrated circuits, exhibit bathtub shaped failure rate functions. This belief is supported by much experience and extensive data collected by practitioners and researchers in different industries. Hence, as was described in the previous sections, much research on burn-in has been done under the assumption of a bathtub shaped failure rate function. In Mi [17], a more general model for the failure rate function, i.e. the eventually increasing failure rate function, is proposed and the optimal burn-in time is studied under this general failure rate model. This general assumption includes the traditional bathtub shaped failure rate function as a special case. The following is the definition of an eventually increasing failure rate function.

Definition 2 A failure rate function $r(x)$ is eventually increasing if there exists $0 \le x_0 < \infty$ such that $r(x)$ strictly increases in $x > x_0$. For an eventually increasing failure rate function $r(x)$ the first and second wear-out points $t^*$ and $t^{**}$ are defined by

$$t^* = \inf\{t \ge 0 : r(x) \text{ is non-decreasing in } x \ge t\},$$
$$t^{**} = \inf\{t \ge 0 : r(x) \text{ is strictly increasing in } x \ge t\}.$$

Obviously, $0 \le t^* \le t^{**} \le x_0 < \infty$ if $r(x)$ is eventually increasing. For more detailed discussions about general assumptions for the shape of the failure rate function in a burn-in model, see also Cha and Mi [8]. The previous burn-in and replacement models can be studied under the above generalized assumption and related results can be further extended.

5.2 Optimal Burn-in Under Mixed Populations

Due to the high initial failure rate which often occurs in the early stages of a component's life, burn-in has been considered an essential procedure for revealing early failures. In Jensen and Petersen [13], based on various sets of field data, it is observed that the population of produced items is composed of two subpopulations: the strong subpopulation with normal lifetimes and the weak subpopulation with shorter lifetimes. In practice, weak items may be produced along with strong items due to, for example, defective resources and components, worker error, or an unstable production environment caused by uncontrolled significant quality factors. The mixture of these two subpopulations often results in a bimodal distribution, as illustrated in Jensen and Petersen [13]. According to these authors, the infant mortality period of the life cycle, which exhibits a high failure rate, results from failures in the weak subpopulation of a bimodal lifetime distribution. This can also be understood from the fact that weak items tend to fail earlier than strong items. In other words, the weakest populations are dying out first (cf. Finkelstein [11]). Thus, in this context, one of the main purposes of the burn-in procedure is to eliminate the weak subpopulation from the mixed population. Following the observations and ideas given in Jensen and Petersen [13], it can be assumed that the population is a mixture of two ordered subpopulations, the strong subpopulation and the weak subpopulation, and burn-in can be studied under this assumption.

5.3 Shocks as Burn-in

In Finkelstein and Esaulova [12], the non-asymptotic and asymptotic properties of mixture failure rates in heterogeneous populations are studied. It is pointed out that, in a specific setting, a shock may perform a kind of burn-in operation. This idea of screening weak items is very similar to that in Environmental Stress Screening (ESS). In ESS, an extremely high stress is applied and the parts are screened for a specified duration before being assembled into a unit. Thus, a burn-in procedure which incorporates shock operation can be developed under the model proposed by Finkelstein and Esaulova [12], and the burn-in and replacement models can be studied in this context.

6 Concluding Remarks

Maintenance has been a very important issue in reliability and much research has been done on it. In order to model various degrees of repair, many different stochastic processes have been proposed. Among them, the renewal process and the nonhomogeneous Poisson process are the most popular ones. The renewal process is adopted to model the complete repair process, whereas the nonhomogeneous Poisson process can be used for the minimal repair process. Some stochastic processes which are appropriate for intermediate types of repairs have also been proposed. Based on those various stochastic models, many studies on optimizing maintenance have been done in the reliability area.

Since the efficiency of a maintenance policy strongly depends on the failure rate function of the device in field operation, determining an optimal maintenance policy without considering a burn-in procedure results in only a partial optimization. Thus, for overall optimization, burn-in and maintenance should be considered at the same time. In this regard, there has recently been much research on the problem of simultaneous optimization of burn-in and replacement policy, and most of that research has been surveyed in this paper. The burn-in procedures introduced in this paper consider only two extreme types of repairs: minimal repair and complete repair. In practice, there could be intermediate types of repairs during burn-in, and this point could be incorporated in future studies. In the software reliability area, a software testing procedure is applied in order to eliminate early software failures or faults. This procedure is similar to a burn-in procedure for hardware systems. The issues and ideas suggested in this paper could also be applied to the problem of determining an optimal software testing procedure.

Acknowledgments The author greatly thanks Mrs. Sul Ja Choi for helpful discussions and advice on the structure and construction of this paper. This work was supported by the Priority Research Centers Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2009-0093827).

References

1. Beichelt F (1993) A unifying treatment of replacement policies with minimal repair. Nav Res Logist 40:51–67
2. Beichelt F, Fischer K (1980) General failure model applied to preventive maintenance policies. IEEE Trans Reliab R-29:39–41
3. Block HW, Savits TH (1997) Burn-in. Stat Sci 12:1–19
4. Cha JH (2000) On a better burn-in procedure. J Appl Probab 37:1099–1103
5. Cha JH (2001) Burn-in procedures for a generalized model. J Appl Probab 38:542–553
6. Cha JH (2003) A further extension of the generalized burn-in model. J Appl Probab 40:264–270
7. Cha JH (2006) A stochastic model for burn-in procedures in accelerated environment. Nav Res Logist 53:226–234
8. Cha JH, Mi J (2005) Optimal burn-in procedures in a generalized environment. Int J Reliab Qual Saf Eng 12:189–202
9. Cha JH, Mi J (2007) Some probability functions in reliability and their applications. Nav Res Logist 54:128–135
10. Cha JH, Na MH (2009) Accelerated burn-in and system maintenance policies. Commun Stat Theory Methods 38:719–733
11. Finkelstein MS (2008) Failure rate modelling for reliability and risk. Springer, London
12. Finkelstein MS, Esaulova V (2006) Asymptotic behavior of a general class of mixture failure rates. Adv Appl Probab 38:244–262
13. Jensen F, Petersen NE (1982) Burn-in. John Wiley, New York
14. Kuo W, Kuo Y (1983) Facing the headaches of early failures: A state-of-the-art review of burn-in decisions. Proc IEEE 71:1257–1266
15. Meeker WQ, Escobar LA (1993) A review of recent research and current issues of accelerated testing. Int Stat Rev 61:147–168
16. Mi J (1994) Burn-in and maintenance policies. Adv Appl Probab 26:207–221
17. Mi J (2003) Optimal burn-in time and eventually IFR. J Chin Inst Ind Eng 20:533–542
18. Nelson W (1990) Accelerated testing: statistical models, test plans, and data analysis. Wiley, New York

Part V

Filtering

Filtering and M-ary Detection in a Minimal Repair Maintenance Model Lakhdar Aggoun and Lotfi Tadj

L. Aggoun (✉), Department of Mathematics and Statistics, Sultan Qaboos University, P.O. Box 36 Al-Khod 123, Muscat, Sultanate of Oman, e-mail: [email protected]
L. Tadj, Sobey School of Business, Saint Mary’s University, Halifax, NS B3H 3C3, Canada, e-mail: [email protected]; [email protected]
L. Tadj, School of Business Administration, Dalhousie University, Halifax, NS B3H 3J5, Canada

1 Introduction

Consider a system that is subject to random failures. The system has a failure rate at time $t$ given by the function $r(t)$. Suppose that upon failure, a minimal repair is performed, in the sense that the system state or condition is as good as it was immediately before the failure occurred. The minimal repair means that the age of the system is not disturbed by the failures. Consequently the failure rate at time $t$ is still $r(t)$, independent of the number of failures that have occurred up to time $t$. If $N_t$ represents the number of failures in the time period $[0, t]$, it follows that $N_t$ is a nonhomogeneous Poisson process with intensity function $r(t)$. This is the basic minimal repair model presented in 1960 by Barlow and Hunter [2]. This model has been extended in many ways since then; see the chapter "A Survey of Replacement Models with Minimal Repair" in this book. Dimitrov et al. [8] suggest the following classification of repairs:

• A complete repair completely resets the failure rate of the product so that upon restart the product operates as a new one. It is equivalent to a replacement of the faulty item by a new one.

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_8, Springer-Verlag London Limited 2011


• A minimal repair has no impact on the failure rate. The failure rate remains the same as it was prior to the failure. The repair brings the product from a down to an up state without affecting its performance.
• An imperfect repair contributes to some noticeable improvement of the product. This contribution is measured by an age-reducing repair factor. The repair sets back the clock of the repaired item. After the repair, the performance of the item is as it was at an earlier age.
• A sloppy repair may contribute to some noticeable degradation of the product. This contribution is measured by an age-accelerating repair factor. The repair sets forward the clock of the repaired item. After the repair, the performance of the item is as it will be at some later age.

Models of imperfect, sloppy or mixture of repairs belong to the class of age-dependent repair models. The notion of the 'age' of the product and the degree of repair (also called improvement factor, lack of perfection, restoration factor, parameter of rejuvenation, age-reducing repair factor, age-accelerating repair factor, etc.) are used to define the virtual age of the product. The virtual age was explicitly introduced in Kijima et al. [16] and further explored in Kijima [15]. If a new system has virtual age $V_0 = 0$, and the system has the virtual age $V_{n-1} = y$ immediately after the $(n-1)$th repair, then the $n$th failure-time $X_n$ of the functioning system is distributed as

$$P\{X_n \le x \mid V_{n-1} = y\} = \frac{F(x + y) - F(y)}{1 - F(y)},$$

where $F(x)$ is the failure-time distribution of a new system. Let $\rho_n$ be the degree of the $n$th repair. Two models are constructed, depending on how the repair activities affect the virtual age process $\{V_n, n = 0, 1, \ldots\}$:

• Model I The $n$th repair cannot remove the damages incurred before the $(n-1)$th repair. It reduces the additional age $X_n$ to $\rho_n X_n$. Accordingly, the virtual age after the $n$th repair becomes $V_n = V_{n-1} + \rho_n X_n$.
• Model II At the $n$th failure, the virtual age has been accumulated to $V_{n-1} + X_n$. The $n$th repair affects the virtual age so that $V_n = \rho_n (V_{n-1} + X_n)$.
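The two virtual age recursions and the conditional failure-time distribution above can be sketched as follows. The function names, the input sequences, and the exponential lifetime used in the check are illustrative assumptions, not part of the models of Kijima [15].

```python
import math


def conditional_cdf(F, x, y):
    """P{X_n <= x | V_{n-1} = y} = (F(x + y) - F(y)) / (1 - F(y))."""
    return (F(x + y) - F(y)) / (1.0 - F(y))


def virtual_ages(x, rho, model):
    """Virtual ages V_1, ..., V_n from inter-failure times x and repair degrees rho.

    model "I":  V_n = V_{n-1} + rho_n * X_n
    model "II": V_n = rho_n * (V_{n-1} + X_n)
    """
    if len(x) != len(rho):
        raise ValueError("x and rho must have the same length")
    v = 0.0                      # a new system has virtual age V_0 = 0
    ages = []
    for x_n, rho_n in zip(x, rho):
        if model == "I":
            v = v + rho_n * x_n
        elif model == "II":
            v = rho_n * (v + x_n)
        else:
            raise ValueError("model must be 'I' or 'II'")
        ages.append(v)
    return ages


def exp_cdf(t):
    """Hypothetical exponential lifetime distribution with unit rate."""
    return 1.0 - math.exp(-t)
```

For example, `virtual_ages([2.0, 3.0], [0.5, 0.5], "I")` accumulates half of each operating period, while Model II halves the whole accumulated age at every repair. Setting every degree to 1 leaves the age untouched in both models, which corresponds to minimal repair.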

In general, $\rho_n$ is a random variable taking values between 0 and 1. In both models, if $\rho_n = 0$ for all $n \ge 1$ then one has a perfect repair model. Kijima et al. [16] considered a periodic replacement model with a general repair where $\rho_n = \rho$ $(0 \le \rho \le 1)$ for all $n$ in Model I. Brown and Proschan [5] considered the imperfect repair in which, with probability $p$, the repair is perfect and, with probability $1 - p$, it is minimal. This is Model II with $\rho_n$ independent and identically distributed (iid) random variables taking only the two extremal values 0 and 1. Block et al. [4] have generalized this model to the case where the probability of perfect repair depends on the age at failure, while Shaked and Shanthikumar [24] have generalized it to the multivariate case. The virtual age process has also been considered in various maintenance contexts (for recent references, see for example Scarsini and Shaked [22], Zhang and Love [25], Dimitrov et al. [8], Seo and Bai [23], Cui et al. [6], and Jaturonnatee et al. [23]). For an overview of statistical inferences for age-dependent repair models, see Guo and Love [11, 12] and Love and Guo [17, 18].

Recently, Kaminskiy and Krivtsov [14] provided a simulation method for statistical estimation of the parameters in Kijima model I. They assume that the time to first failure (TTFF) is Weibull distributed with shape parameter $\beta$ and scale parameter $\lambda$. However, this approach needs to estimate the distribution of the TTFF from a large amount of data. The Monte Carlo approach requires a very long time to provide the estimates $\hat\beta$, $\hat\lambda$, and $\hat\rho$ of the parameters. Mettas and Zhao [19] propose maximum likelihood estimators to estimate these same parameters. The degree of repair is assumed to be constant in both these papers. Other papers that assume a Weibull baseline failure intensity of the system and a deterministic degree of repair include Gasmi et al. [10] and Kahle [13].

The present paper adds a new dimension to the estimation problem in the class of age-dependent minimal repair models. We go back to the original model of Kijima [15] where the degree of repair $\rho_n$ is random and not deterministic.
To the best of our knowledge, the parameter estimation problem has not been addressed in the case of a random degree of repair. We address two different problems in which the random degree of repair takes one of a finite number of values.

In the first problem, the degree of repair is a stochastic process that switches between a finite number of values due to various phenomena. Switching is assumed to happen according to the jumps of a homogeneous, finite-state Markov chain. We use hidden Markov models (HMM) to estimate and optimally update the conditional probability distribution of the degree of the $n$th repair. An HMM is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved state. An HMM can be considered as the simplest dynamic Bayesian network. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In an HMM, the state is not directly visible, but output dependent on the state is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model. Even if the model parameters are known exactly, the model is still 'hidden'.
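To make the statement that outputs carry information about the hidden states concrete, the sketch below computes the probability of an output sequence by the standard forward recursion for a discrete HMM. It is a generic textbook construction, not the filter developed in this chapter; the two-state parameters `pi0`, `A`, `B` are made up, and `A` is row-stochastic here (the usual convention), unlike the transposed matrix used later in the chapter.

```python
def forward_likelihood(pi, A, B, obs):
    """Probability of the output sequence obs under a discrete HMM.

    pi[i]   : initial probability of hidden state i
    A[i][j] : P(next state = j | current state = i)  (rows sum to 1)
    B[i][o] : P(output = o | state = i)
    """
    if not obs:
        return 1.0
    n_states = len(pi)
    # alpha[i] = P(outputs so far, current state = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-output model.
pi0 = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.9, 0.1],
     [0.2, 0.8]]

likelihood = forward_likelihood(pi0, A, B, [0, 1])
```

The recursion sums over all hidden paths in time linear in the sequence length, instead of enumerating them one by one.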


There are three canonical problems associated with HMMs:

• Given the parameters of the model, compute the probability of a particular output sequence. This requires summation over all possible state sequences, but can be done efficiently using the forward algorithm, which is a form of dynamic programming.
• Given the parameters of the model and a particular output sequence, find the state sequence that is most likely to have generated that output sequence. This requires finding a maximum over all possible state sequences, but can similarly be solved efficiently by the Viterbi algorithm.
• Given an output sequence or a set of such sequences, find the most likely set of state transition and output probabilities. In other words, derive the maximum likelihood estimate of the parameters of the HMM given a dataset of output sequences. No tractable algorithm is known for solving this problem exactly, but a local maximum of the likelihood can be found efficiently using the Baum-Welch algorithm or the Baldi-Chauvin algorithm. The Baum-Welch algorithm relies on the forward-backward algorithm and is a special case of the expectation-maximization (EM) algorithm.

Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics; see for example Rabiner [21].

In the second problem, the degree of repair is a random variable and belongs to a set of hypotheses. At each epoch $n$, a list of $M$ candidate models is available and the optimal one is chosen. The term M-ary detection is used in Electrical Engineering to describe sequential hypothesis testing for more than two candidate model hypotheses. Here we are interested in model-parameter hypotheses. We assume that we have a list of $M$ candidate models from which to choose. These candidate models will be denoted by $H_h$, $h = 1, \ldots, M$. Let $\beta$ be a simple random variable denoting a specific model, with states indexed by $1 \le h \le M$. We shall be interested in computing the posterior probabilities $P(\beta = h \mid O_n)$, where $O_n$ denotes information contained in some observation process. It will be shown that this problem separates into a pure filtering component and a pure estimation component. In the context of M-ary detection this is known as the Separation Theorem, see Poor [20]. Consider a simple random variable $\beta$ taking on values in the canonical basis $(e_1, \ldots, e_M)$ of $\mathbb{R}^M$. We suppose $\beta$ is an indicator random variable such that $\beta = e_h$, that is $\langle \beta, e_h \rangle = 1$, if and only if hypothesis $H_h$ holds. Here $\langle \cdot, \cdot \rangle$ is the usual inner product.

In the next section we introduce a 'reference' probability measure under which all calculations are performed. The 'real world' probability measure under which the dynamics of our model are given is then defined via a suitable martingale. Section 3 deals with the first problem. We estimate recursively the conditional probability distribution of the (hidden) Markov chain which represents the degree of the $n$th repair. Also, using the EM algorithm, the transition probabilities of this (hidden) Markov chain are optimally updated. Section 4 deals with the second problem, the M-ary detection. Section 5 concludes the paper and proposes some directions for future research.

2 Reference Probability Measure

Consider a system that is subject to random failures. Each failure is followed by an age-dependent repair in the sense of Kijima [15]. In this model the parameter of interest, namely the degree of the $n$th repair, is assumed to be random and is allowed to switch between a finite number of values due to various phenomena. Switching is assumed to happen according to the jumps of a homogeneous, finite-state Markov chain. Let $(\Omega, \mathcal{F}, P)$ be a probability space on which we develop a parametric, discrete-time multi-period integer-valued maintenance schedule.

Definition 1 A discrete-time stochastic process $\{\eta_n\}$, with finite-state space $S = \{s_1, s_2, \ldots, s_N\}$, defined on a probability space $(\Omega, \mathcal{F}, P)$ is a Markov chain if

$$P(\eta_{n+1} = s_{i_{n+1}} \mid \eta_0 = s_{i_0}, \ldots, \eta_n = s_{i_n}) = P(\eta_{n+1} = s_{i_{n+1}} \mid \eta_n = s_{i_n}),$$

for all $n \ge 0$ and all states $s_{i_0}, \ldots, s_{i_n}, s_{i_{n+1}} \in S$. Moreover, the Markov chain $\eta_n$ is homogeneous if

$$P(\eta_{n+1} = s_j \mid \eta_n = s_i) \triangleq p_{ji}$$

is independent of $n$. The matrix $P = \{p_{ji}\}$ is called the transition probability matrix of the homogeneous Markov chain and it satisfies the property $\sum_{j=1}^{N} p_{ji} = 1$.

Note that our transition probability matrix $P$ is the transpose of the traditional transition probability matrix defined elsewhere. The convenience of this choice will be apparent later. Consider the filtration $\{\mathcal{F}_n\} = \sigma\{\eta_0, \eta_1, \ldots, \eta_n\}$, and write

$$U_n = (I(\eta_n = s_1), I(\eta_n = s_2), \ldots, I(\eta_n = s_N))'.$$

Then $\{U_n\}$ is a discrete-time Markov chain with state space the set of unit vectors $e_1 = (1, 0, \ldots, 0)', \ldots, e_N = (0, \ldots, 1)'$ of $\mathbb{R}^N$, where the 'prime' means transpose. However, the transition probability matrix of $U$ is $P$. We can write:

$$E[U_n \mid \mathcal{F}_{n-1}] = E[U_n \mid U_{n-1}] = P U_{n-1},$$

from which we conclude that $P U_{n-1}$ is the predictable part of $U_n$, given the history of $U$ up to time $n - 1$, and the non-predictable part of $U_n$ must be $M_n \triangleq U_n - P U_{n-1}$. In fact, it can be easily shown that $M_n \in \mathbb{R}^N$ is a mean 0, $\mathcal{F}_n$-vector martingale and we have the semimartingale (or Doob decomposition) representation of the Markov chain $\{U_n\}$:

$$U_n = P U_{n-1} + M_n. \tag{1}$$
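The representation (1) can be checked numerically on a small example. In the sketch below the two-state matrix and the helper names are hypothetical; the point is only that, with the transposed (column-stochastic) convention, the one-step prediction is the matrix-vector product $P U_{n-1}$ and the term $M_n$ averages to zero given $U_{n-1}$.

```python
def mat_vec(P, u):
    """Matrix-vector product P u, with P[j][i] = p_ji."""
    return [sum(P[j][i] * u[i] for i in range(len(u))) for j in range(len(P))]


def unit_vector(i, n):
    """Unit vector e_{i+1} of R^n (zero-based index i)."""
    return [1.0 if k == i else 0.0 for k in range(n)]


def innovation(P, u, u_prev):
    """M_n = U_n - P U_{n-1}."""
    return [a - b for a, b in zip(u, mat_vec(P, u_prev))]


# Hypothetical transition matrix; each COLUMN sums to 1 (transposed convention).
P = [[0.9, 0.3],
     [0.1, 0.7]]

prev = unit_vector(1, 2)            # chain currently in the second state
prediction = mat_vec(P, prev)       # E[U_n | U_{n-1}] = P U_{n-1} = second column of P

# Average of M_n over the possible next states, weighted by their probabilities.
mean_innovation = [
    sum(prediction[j] * innovation(P, unit_vector(j, 2), prev)[k] for j in range(2))
    for k in range(2)
]
```

Here `prediction` is simply the second column of `P`, and `mean_innovation` is the zero vector up to rounding, consistent with $E[M_n \mid \mathcal{F}_{n-1}] = 0$.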

3 Filtering and Parameter Estimation

Suppose $\rho$, the degree of the $n$th repair, is such a Markov chain with transition probability matrix $Q$ and state space $\alpha = (\alpha_1, \ldots, \alpha_N)$, where the $\alpha_i$ are positive constants less than one, representing possible reductions in age after each repair. For convenience, we identify $\alpha$ with the set of unit vectors $e_1 = (1, 0, \ldots, 0)', \ldots, e_N = (0, \ldots, 1)'$ of $\mathbb{R}^N$. With $V_n$ and $X_n$ representing the virtual life and age after the $n$th failure, respectively, the dynamics of our system are as follows:

$$V_n = V_{n-1} + \langle \alpha, \rho_n \rangle X_n + W_n, \tag{2}$$

where $\langle \cdot, \cdot \rangle$ refers to the usual scalar product. Here we assume that the process $\{X_n\}$ is a sequence of non-negative random variables, independent of $\rho$ (for simplicity), with some probability functions $\phi_n$. The process $\{W_n\}$ is a sequence of non-negative, independent random variables with probability density functions $\psi_n$ representing some "noise" in the dynamics of the system. Define the filtrations

$$\mathcal{G}_n = \sigma(\rho_\ell, X_\ell, V_\ell,\ \ell \le n) \quad \text{and} \quad \mathcal{Y}_n = \sigma(X_\ell, V_\ell,\ \ell \le n).$$

We make the following assumptions:

1. The process $\rho_\ell$ is not observed.
2. The processes $X_\ell$ and $V_\ell$ are either observed or predictable with respect to whatever information is available at time $\ell$.
3. The parameters $\alpha = (\alpha_1, \ldots, \alpha_N)$ are known here. However they could be estimated.

Our goal is to:

1. derive a recursive conditional probability distribution for $\rho$ given the filtration $\mathcal{Y}$, and
2. update estimates of the transition probabilities of the Markov chain $\rho$.

Reference Probability

In our context, the objective of the method of reference probability is to choose a measure $\bar P$, on the measurable space $(\Omega, \mathcal{F})$, under which the process $\rho$ is a sequence of iid random variables uniformly distributed on the set $\{e_1, \ldots, e_N\}$, and the processes $\{V_n\}$ and $\{X_n\}$ are sequences of independent random variables with probability distributions $\psi_n$ and $\phi_n$ respectively, independent of $\rho$. The probability measure $P$ is referred to as the 'real world' measure, that is, under this measure we have

$$\begin{aligned} V_n &= V_{n-1} + \langle \alpha, \rho_n \rangle X_n + W_n, \\ \rho_n &= Q \rho_{n-1} + M_n. \end{aligned} \tag{3}$$

Denote by $\Lambda = \{\Lambda_n, 0 \le n\}$ the stochastic process whose value at time $n$ is given by

$$\Lambda_n = \prod_{k=0}^{n} \lambda_k, \tag{4}$$

where $\lambda_0 = 1$ and

$$\lambda_k = \prod_{\ell, m = 1}^{N} (N Q_{m\ell})^{\langle \rho_k, e_m \rangle \langle \rho_{k-1}, e_\ell \rangle}\, \frac{\psi_k(V_k - V_{k-1} - \langle \alpha, \rho_k \rangle X_k)}{\psi_k(V_k)}. \tag{5}$$

It is easily seen that the sequence $\{\Lambda_n\}_{n \in \mathbb{N}}$ given by (4) is a $\mathcal{G}_n$-martingale under $\bar P$. Define the 'real world' measure $P$ in terms of $\bar P$ by setting

$$\left. \frac{dP}{d\bar P} \right|_{\mathcal{G}_n} \triangleq \Lambda_n.$$

The existence of $P$ follows from the Kolmogorov Extension Theorem. Under probability measure $P$, the 'real world' dynamics in (3) hold. This is seen by defining

$$W_n = V_n - V_{n-1} - \langle \alpha, \rho_n \rangle X_n.$$

Under probability measure $P$, $W_n$ has probability density function $\psi_n$ and $\rho$ is a Markov chain with transition probability matrix $Q$. For proofs and more details on measure change techniques, see Aggoun and Elliott [1] and Elliott et al. [9].

Remark 1 The purpose of the change of measure is to work under a "nice" artificial probability measure under which calculations are made easy. Note that, at each time $n$, the two probability measures are connected via $\Lambda_n$, which is the projection of the Radon-Nikodym derivative $\Lambda$ on the information available at time $n$. This, of course, allows for the results to be expressed under the original 'real world' probability measure $P$.

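3.1 Recursive Estimation

We shall be working under probability measure $\bar P$, under which the process $\rho$ is a sequence of iid random variables uniformly distributed on the set $\{e_1, \ldots, e_N\}$ and $\{V_n\}$ is a sequence of independent random variables with probability distributions $\psi_n$, independent of $\rho$.

Write

$$p_n(u) \triangleq E[\langle \rho_n, e_u \rangle \mid \mathcal{Y}_n],$$
$$q_n(u) \triangleq \bar E[\langle \rho_n, e_u \rangle \Lambda_n \mid \mathcal{Y}_n].$$

Using a generalized version of Bayes' Theorem, see Aggoun and Elliott [1] and Elliott et al. [9], we have

$$p_n(u) = \frac{\bar E[\langle \rho_n, e_u \rangle \Lambda_n \mid \mathcal{Y}_n]}{\bar E[\Lambda_n \mid \mathcal{Y}_n]} = \frac{q_n(u)}{\sum_{j=1}^{N} q_n(j)}.$$

Theorem 1 For $n \ge 1$, we have

$$q_n(u) = \sum_{\ell=1}^{N} Q_{u\ell}\, \frac{\psi_n(V_n - V_{n-1} - \alpha_u X_n)}{\psi_n(V_n)}\, q_{n-1}(\ell).$$

Proof In view of (4) and the independence and distribution assumptions under $\bar P$, we see that

$$\begin{aligned} \bar E[\langle \rho_n, e_u \rangle \Lambda_n \mid \mathcal{Y}_n] &= \bar E[\langle \rho_n, e_u \rangle \Lambda_{n-1} \lambda_n \mid \mathcal{Y}_n] \\ &= \bar E\Big[\langle \rho_n, e_u \rangle \Lambda_{n-1} \prod_{\ell, m = 1}^{N} (N Q_{m\ell})^{\langle \rho_n, e_m \rangle \langle \rho_{n-1}, e_\ell \rangle}\, \frac{\psi_n(V_n - V_{n-1} - \langle \alpha, \rho_n \rangle X_n)}{\psi_n(V_n)} \,\Big|\, \mathcal{Y}_n\Big] \\ &= \sum_{\ell=1}^{N} Q_{u\ell}\, \frac{\psi_n(V_n - V_{n-1} - \alpha_u X_n)}{\psi_n(V_n)}\, \bar E[\langle \rho_{n-1}, e_\ell \rangle \Lambda_{n-1} \mid \mathcal{Y}_{n-1}] \\ &= \sum_{\ell=1}^{N} Q_{u\ell}\, \frac{\psi_n(V_n - V_{n-1} - \alpha_u X_n)}{\psi_n(V_n)}\, q_{n-1}(\ell). \end{aligned}$$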
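One step of the recursion in Theorem 1 might be coded as below. This is only a sketch: the exponential noise density, the two-state matrix `Q`, the values in `alpha`, and the observations are invented for illustration, and `Q[u][l]` follows the chapter's transposed convention (each column sums to 1).

```python
import math


def exp_density(w, rate=1.0):
    """Hypothetical noise density psi: exponential on w >= 0, zero otherwise."""
    return rate * math.exp(-rate * w) if w >= 0 else 0.0


def filter_step(q_prev, Q, alpha, psi, v_prev, v, x):
    """Unnormalized update q_n(u) of Theorem 1.

    Q[u][l] : probability of moving from state l to state u
    alpha[u]: degree of repair attached to state u
    """
    n = len(q_prev)
    denom = psi(v)
    if denom <= 0:
        raise ValueError("psi(V_n) must be positive")
    q = []
    for u in range(n):
        ratio = psi(v - v_prev - alpha[u] * x) / denom
        q.append(ratio * sum(Q[u][l] * q_prev[l] for l in range(n)))
    return q


def normalize(q):
    """p_n(u) = q_n(u) / sum_j q_n(j)."""
    total = sum(q)
    return [q_u / total for q_u in q]


# Illustrative data (assumptions): two repair degrees and one observation.
Q = [[0.9, 0.2],
     [0.1, 0.8]]
alpha = [0.2, 0.8]
q0 = [0.5, 0.5]

q1 = filter_step(q0, Q, alpha, exp_density, v_prev=1.0, v=1.5, x=2.0)
p1 = normalize(q1)
```

In this toy run the observed increase in virtual age (0.5) is smaller than $\alpha_2 X_1 = 1.6$, so the second state would require negative noise and receives zero weight; all posterior mass falls on the first state.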

3.2 Parameter Updating Using the EM algorithm, see Baum and Petrie [3] and Dempster and Laird [7], the parameters of the model are updated. In fact it is a conditional pseudo loglikelihood which is maximized, and the new parameters are expressed in terms of the recursive estimates obtained in Sect. 3.1.

Filtering and M-ary Detection in a Minimal Repair Maintenance Model

215

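Our model is determined by the set of parameters $\theta \triangleq (Q_{m\ell},\ 1 \le \ell, m \le N)$. Suppose our model is determined by such a set $\theta$ and we wish to determine a new set $\hat{\theta} = (\hat{Q}_{m\ell},\ 1 \le \ell, m \le N)$ which maximizes the conditional pseudo-log-likelihood defined below. To replace, at time $n$, the parameters $\theta$ by new ones $\hat{\theta}$, define

$$\hat{\Lambda}_n = \prod_{k=1}^{n} \prod_{\ell,m=1}^{N} \left(\frac{\hat{Q}_{m\ell}(n)}{Q_{m\ell}}\right)^{\langle q_k, e_m\rangle \langle q_{k-1}, e_\ell\rangle}.$$

An argument similar to the one used earlier shows that we can define a new probability measure $\hat{P}$ by setting

$$\left.\frac{d\hat{P}}{dP}\right|_{\mathcal{G}_n} = \hat{\Lambda}_n.$$

It is easy to see that, under $\hat{P}$, the Markov chain $q$ has transition probabilities given by $\hat{Q}_{m\ell}(n)$. Write

$$\log \hat{\Lambda}_n = \sum_{k=1}^{n} \sum_{\ell,m=1}^{N} \langle q_k, e_m\rangle \langle q_{k-1}, e_\ell\rangle \log \hat{Q}_{m\ell}(n) + R,$$

where $R$ does not contain $\hat{\theta}$. Therefore,

$$E[\log \hat{\Lambda}_n \mid \mathcal{Y}_n] = \sum_{k=1}^{n} \sum_{\ell,m=1}^{N} E[\langle q_k, e_m\rangle \langle q_{k-1}, e_\ell\rangle \mid \mathcal{Y}_n] \log \hat{Q}_{m\ell}(n) + \hat{R}. \tag{6}$$

Now the parameters $\hat{\theta}$ must satisfy

$$\sum_{m=1}^{N} \hat{Q}_{m\ell}(n) = 1. \tag{7}$$

We wish, therefore, to choose $\hat{\theta}$ to maximize (6) subject to the constraint (7). The optimum choice of $\hat{\theta}$ is

$$\hat{Q}_{m\ell}(n) = \frac{E[M_n^{(m,\ell)} \mid \mathcal{Y}_n]}{E[J_n^{\ell} \mid \mathcal{Y}_n]} = \frac{\bar{E}[\Lambda_n M_n^{(m,\ell)} \mid \mathcal{Y}_n]}{\bar{E}[\Lambda_n J_n^{\ell} \mid \mathcal{Y}_n]}, \qquad \forall \text{ pairs } (\ell, m),\ \ell \ne m.$$

Therefore, to re-estimate the parameters $\theta$ we shall require estimates of

1. $M_n^{(m,\ell)}$, a discrete-time counting process for the state transitions $e_\ell \to e_m$, where $\ell \ne m$:

$$M_n^{(m,\ell)} = \sum_{k=1}^{n} \langle q_{k-1}, e_\ell\rangle \langle q_k, e_m\rangle. \tag{8}$$

2. $J_n^{\ell}$, the cumulative sojourn time spent by the process $q$ in state $e_\ell$:

$$J_n^{\ell} = \sum_{k=1}^{n} \langle q_{k-1}, e_\ell\rangle.$$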
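The step from (6) and (7) to the optimum can be filled in with a Lagrange multiplier argument. What follows is a sketch rather than the authors' own derivation: the multipliers $\mu_\ell$ and the shorthand $\hat{M}_n^{(m,\ell)} = E[M_n^{(m,\ell)} \mid \mathcal{Y}_n]$, $\hat{J}_n^{\ell} = E[J_n^{\ell} \mid \mathcal{Y}_n]$ are notation introduced here, and it is assumed that the count in (8) is also defined for $m = \ell$, so that $\sum_{m} M_n^{(m,\ell)} = J_n^{\ell}$.

```latex
\begin{align*}
L(\hat{\theta}, \mu)
  &= \sum_{\ell,m=1}^{N} \hat{M}_n^{(m,\ell)} \log \hat{Q}_{m\ell}(n)
     + \sum_{\ell=1}^{N} \mu_\ell \Big( 1 - \sum_{m=1}^{N} \hat{Q}_{m\ell}(n) \Big), \\
\frac{\partial L}{\partial \hat{Q}_{m\ell}(n)}
  &= \frac{\hat{M}_n^{(m,\ell)}}{\hat{Q}_{m\ell}(n)} - \mu_\ell = 0
  \quad\Longrightarrow\quad
  \hat{Q}_{m\ell}(n) = \frac{\hat{M}_n^{(m,\ell)}}{\mu_\ell}, \\
1 = \sum_{m=1}^{N} \hat{Q}_{m\ell}(n)
  &= \frac{1}{\mu_\ell} \sum_{m=1}^{N} \hat{M}_n^{(m,\ell)}
   = \frac{\hat{J}_n^{\ell}}{\mu_\ell}
  \quad\Longrightarrow\quad
  \hat{Q}_{m\ell}(n) = \frac{\hat{M}_n^{(m,\ell)}}{\hat{J}_n^{\ell}}.
\end{align*}
```

The second form of the estimate, written with $\bar{E}$ and $\Lambda_n$, then follows from the generalized Bayes' Theorem, because the common factor $\bar{E}[\Lambda_n \mid \mathcal{Y}_n]$ cancels between numerator and denominator.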

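Rather than directly estimating the quantities $M_n^{(m,\ell)}$ and $J_n^{\ell}$, recursive forms can be found to estimate the related product quantities $M_n^{(m,\ell)} q_n \in \mathbb{R}^N$, $J_n^{\ell} q_n \in \mathbb{R}^N$, etc. The outputs of these filters can then be manipulated to marginalize out the process $q$, resulting in filtered estimates of the quantities of primary interest, namely $M_n^{(m,\ell)}$ and $J_n^{\ell}$. Write

$$\xi_n(M_n^{(m,\ell)} q_n) \triangleq \bar{E}[\Lambda_n M_n^{(m,\ell)} q_n \mid \mathcal{Y}_n].$$

Lemma 1 The process $\xi_n(M_n^{(m,\ell)} q_n)$ is computed recursively by the dynamics

$$\begin{aligned}
\xi_n(M_n^{(m,\ell)} q_n) &= \sum_{i,j=1}^{N} Q_{ji}\, \frac{w_n(V_n - V_{n-1} - a_j X_n)}{w_n(V_n)}\, \big\langle \xi_{n-1}(M_{n-1}^{(m,\ell)} q_{n-1}), e_i \big\rangle\, e_j \\
&\quad + Q_{m\ell}\, \frac{w_n(V_n - V_{n-1} - a_m X_n)}{w_n(V_n)}\, q_{n-1}(\ell)\, e_m.
\end{aligned}$$

Here $i$ and $j$ are summation indices, whereas $(m, \ell)$ is the fixed transition being counted.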

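Proof In view of (5) and (8), we have

$$\begin{aligned}
\xi_n(M_n^{(m,\ell)} q_n) &= \bar{E}\big[\big(M_{n-1}^{(m,\ell)} + \langle q_{n-1}, e_\ell\rangle \langle q_n, e_m\rangle\big)\, q_n \Lambda_{n-1} \lambda_n \mid \mathcal{Y}_n\big] \\
&= \bar{E}\big[M_{n-1}^{(m,\ell)} q_n \Lambda_{n-1} \lambda_n \mid \mathcal{Y}_n\big] + \bar{E}\big[\langle q_{n-1}, e_\ell\rangle \langle q_n, e_m\rangle\, q_n \Lambda_{n-1} \lambda_n \mid \mathcal{Y}_n\big].
\end{aligned}$$

The first expectation is simply

$$\begin{aligned}
\bar{E}\big[M_{n-1}^{(m,\ell)} q_n \Lambda_{n-1} \lambda_n \mid \mathcal{Y}_n\big]
&= \sum_{i,j=1}^{N} Q_{ji}\, \frac{w_n(V_n - V_{n-1} - a_j X_n)}{w_n(V_n)}\, \bar{E}\big[M_{n-1}^{(m,\ell)} \Lambda_{n-1} \langle q_{n-1}, e_i\rangle \mid \mathcal{Y}_{n-1}\big]\, e_j \\
&= \sum_{i,j=1}^{N} Q_{ji}\, \frac{w_n(V_n - V_{n-1} - a_j X_n)}{w_n(V_n)}\, \big\langle \xi_{n-1}(M_{n-1}^{(m,\ell)} q_{n-1}), e_i \big\rangle\, e_j.
\end{aligned}$$

The second expectation yields:

$$\begin{aligned}
\bar{E}\big[\langle q_{n-1}, e_\ell\rangle \langle q_n, e_m\rangle\, q_n \Lambda_{n-1} \lambda_n \mid \mathcal{Y}_n\big]
&= Q_{m\ell}\, \frac{w_n(V_n - V_{n-1} - a_m X_n)}{w_n(V_n)}\, \bar{E}\big[\Lambda_{n-1} \langle q_{n-1}, e_\ell\rangle \mid \mathcal{Y}_{n-1}\big]\, e_m \\
&= Q_{m\ell}\, \frac{w_n(V_n - V_{n-1} - a_m X_n)}{w_n(V_n)}\, q_{n-1}(\ell)\, e_m.
\end{aligned}$$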

Write

$$\xi_n(J_n^{\ell} q_n) \triangleq \bar{E}[\Lambda_n J_n^{\ell} q_n \mid \mathcal{Y}_n].$$

A similar argument shows that:

Lemma 2 The process $\xi_n(J_n^{\ell} q_n)$ is computed recursively by the dynamics

$$\begin{aligned}
\xi_n(J_n^{\ell} q_n) &= \sum_{i,j=1}^{N} Q_{ji}\, \frac{w_n(V_n - V_{n-1} - a_j X_n)}{w_n(V_n)}\, \big\langle \xi_{n-1}(J_{n-1}^{\ell} q_{n-1}), e_i \big\rangle\, e_j \\
&\quad + \sum_{m=1}^{N} Q_{m\ell}\, \frac{w_n(V_n - V_{n-1} - a_m X_n)}{w_n(V_n)}\, q_{n-1}(\ell)\, e_m.
\end{aligned}$$

The filter recursions given by Lemmata 1 and 2 provide updates to estimate product processes, each involving the process $q$. What we would like to do is manipulate these filters so as to remove the dependence upon the process $q$. This manipulation is routine. Since $q$ takes values on a canonical basis of indicator functions (in fact the standard unit vectors of $\mathbb{R}^N$), we see that

$$\begin{aligned}
\big\langle \xi_n(M_n^{(m,\ell)} q_n), \mathbf{1} \big\rangle &= \big\langle \bar{E}[\Lambda_n M_n^{(m,\ell)} q_n \mid \mathcal{Y}_n], \mathbf{1} \big\rangle \\
&= \bar{E}[\Lambda_n M_n^{(m,\ell)} \langle q_n, \mathbf{1}\rangle \mid \mathcal{Y}_n] \\
&= \xi_n(M_n^{(m,\ell)}),
\end{aligned}$$

etc. Here $\mathbf{1} = (1, 1, \ldots, 1)' \in \mathbb{R}^N$.

Remark 2 The revised parameters $\hat{\theta}(n)$ give new probability measures for the model. The quantities $M_n^{(m,\ell)}$ etc. can then be re-estimated using the new parameters and perhaps new data.
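The recursions of Lemmata 1 and 2, the marginalization against $\mathbf{1}$ and the re-estimation of the transition probabilities can be put together in a short sketch. It is an illustration only, under several assumptions: the names `em_filter_step` and `reestimate_Q` are hypothetical, the likelihood ratios $w_n(V_n - V_{n-1} - a_j X_n)/w_n(V_n)$ are supplied precomputed in `ratios`, `Q[m][l]` holds $Q_{m\ell}$, and the counts are also tracked for $m = \ell$ so that each re-estimated column sums to one.

```python
def em_filter_step(q_prev, xi_M, xi_J, Q, ratios):
    """Advance q_n, xi_n(M^(m,l) q_n) and xi_n(J^l q_n) by one observation.

    ratios[j] is assumed to hold w_n(V_n - V_{n-1} - a_j X_n) / w_n(V_n).
    xi_M[m][l] and xi_J[l] are vectors of length N.
    """
    N = len(q_prev)

    def propagate(vec):
        # sum_{i,j} Q[j][i] * ratios[j] * <vec, e_i> e_j
        return [ratios[j] * sum(Q[j][i] * vec[i] for i in range(N))
                for j in range(N)]

    q_new = propagate(q_prev)  # recursion of Theorem 1

    new_M = [[None] * N for _ in range(N)]
    for m in range(N):
        for l in range(N):
            vec = propagate(xi_M[m][l])
            # Extra term of Lemma 1: a transition e_l -> e_m at time n.
            vec[m] += Q[m][l] * ratios[m] * q_prev[l]
            new_M[m][l] = vec

    new_J = []
    for l in range(N):
        vec = propagate(xi_J[l])
        # Extra term of Lemma 2: the chain was in e_l at time n - 1.
        for m in range(N):
            vec[m] += Q[m][l] * ratios[m] * q_prev[l]
        new_J.append(vec)

    return q_new, new_M, new_J


def reestimate_Q(xi_M, xi_J):
    """Q_hat[m][l] = <xi_n(M^(m,l) q_n), 1> / <xi_n(J^l q_n), 1>."""
    N = len(xi_J)
    return [[sum(xi_M[m][l]) / sum(xi_J[l]) for l in range(N)]
            for m in range(N)]
```

Starting from zero vectors for `xi_M` and `xi_J`, one call per observation followed by `reestimate_Q` gives the revised parameters. The unnormalized quantities can shrink or grow quickly over long records, so a practical version would rescale them at each step; the ratio in `reestimate_Q` is unaffected as long as all vectors are scaled by the same factor.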

4 M-ary Detection

Suppose now that the degree of repair $q$ is a random variable. The dynamics (2) are now

$$V_n = V_{n-1} + q X_n + W_n. \tag{9}$$

Define the filtrations

$$\mathcal{G}_n = \sigma(q, X_\ell, V_\ell,\ \ell \le n) \qquad \text{and} \qquad \mathcal{Y}_n = \sigma(X_\ell, V_\ell,\ \ell \le n).$$

We make the following assumptions:

1. $q$ is not known.
2. The processes $X_\ell$ and $V_\ell$ are either observed or predictable with respect to whatever information is available at time $\ell$.

Our goal is to derive a recursive conditional probability distribution for $q_h$, $h = 1, \ldots, M$, given the filtration $\mathcal{Y}$. The probability measure $P$ is referred to as the 'real world' measure, that is, under this measure we have

$$V_n = V_{n-1} + q X_n + W_n. \tag{10}$$

Denote by $\Lambda = \{\Lambda_n,\ 0 \le n\}$ the stochastic process whose value at time $n$ is given by

$$\Lambda_n = \prod_{k=0}^{n} \lambda_k, \tag{11}$$

where $\lambda_0 = 1$ and

$$\lambda_k = \sum_{h=1}^{M} \langle b, e_h\rangle\, \frac{w_k(V_k - V_{k-1} - q_h X_k)}{w_k(V_k)}. \tag{12}$$

It is easily seen that the sequence $\{\Lambda_n\}_{n \in \mathbb{N}}$ given by (11) is a $\mathcal{G}_n$-martingale. Define the 'real world' measure $P$ in terms of $\bar{P}$ by setting

$$\left.\frac{dP}{d\bar{P}}\right|_{\mathcal{G}_n} \triangleq \Lambda_n.$$

The existence of $P$ follows from the Kolmogorov Extension Theorem. Under the probability measure $P$, the 'real world' dynamics in (10) hold. This is seen by defining $W_n = V_n - V_{n-1} - q_h X_n$: under $P$, $W_n$ has probability density function $w_n$. For proofs and more details on measure change techniques, see Aggoun and Elliott [1] and Elliott et al. [9].

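4.1 Recursive Estimation

We shall be working under the probability measure $\bar{P}$. Write

$$p_n(h) \triangleq E[\langle b, e_h\rangle \mid \mathcal{Y}_n], \qquad q_n(h) \triangleq \bar{E}[\langle b, e_h\rangle \Lambda_n \mid \mathcal{Y}_n].$$

Here $q_n(h)$, written with an argument, is the unnormalized conditional probability of hypothesis $h$ and should not be confused with the hypothesized repair degree $q_h$. Using a generalized version of Bayes' Theorem, see Aggoun and Elliott [1] and Elliott et al. [9], we have

$$p_n(h) = \frac{\bar{E}[\langle b, e_h\rangle \Lambda_n \mid \mathcal{Y}_n]}{\bar{E}[\Lambda_n \mid \mathcal{Y}_n]} = \frac{q_n(h)}{\sum_{j=1}^{M} q_n(j)}.$$

Theorem 2 For $n \ge 1$, we have

$$q_n(h) = \frac{w_n(V_n - V_{n-1} - q_h X_n)}{w_n(V_n)}\, q_{n-1}(h).$$

Proof In view of (11) and the independence and distribution assumptions under $\bar{P}$, we see that

$$\begin{aligned}
\bar{E}[\langle b, e_h\rangle \Lambda_n \mid \mathcal{Y}_n]
&= \bar{E}[\langle b, e_h\rangle \Lambda_{n-1} \lambda_n \mid \mathcal{Y}_n] \\
&= \bar{E}\Big[\langle b, e_h\rangle \Lambda_{n-1}\, \frac{w_n(V_n - V_{n-1} - q_h X_n)}{w_n(V_n)} \,\Big|\, \mathcal{Y}_n\Big] \\
&= \frac{w_n(V_n - V_{n-1} - q_h X_n)}{w_n(V_n)}\, \bar{E}[\langle b, e_h\rangle \Lambda_{n-1} \mid \mathcal{Y}_{n-1}] \\
&= \frac{w_n(V_n - V_{n-1} - q_h X_n)}{w_n(V_n)}\, q_{n-1}(h).
\end{aligned}$$

The normalized probabilities are simply

$$p_n(h) = \frac{q_n(h)}{\sum_{j=1}^{M} q_n(j)}.$$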
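The recursion of Theorem 2 amounts to multiplying each hypothesis weight by a likelihood ratio at every observation. The sketch below works in the log domain so that the small density values $w_n(V_n)$ do not underflow. It is an illustration under assumptions, not the authors' code: the names `log_gauss`, `mary_step` and `posterior` are hypothetical, the noise density is assumed to be a zero-mean Gaussian that is the same for all $n$, the prior over hypotheses is taken as uniform, and the observation sequences are made-up numbers generated with a true repair degree of 0.5.

```python
import math


def log_gauss(x, std=1.0):
    """Log of the assumed zero-mean Gaussian noise density."""
    return -0.5 * (x / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def mary_step(log_q_prev, hypotheses, v_prev, v_now, x_now, log_w=log_gauss):
    """log q_n(h) = log q_{n-1}(h) + log w(V_n - V_{n-1} - q_h X_n) - log w(V_n)."""
    return [lq + log_w(v_now - v_prev - q_h * x_now) - log_w(v_now)
            for lq, q_h in zip(log_q_prev, hypotheses)]


def posterior(log_q):
    """Normalized probabilities p_n(h), computed stably from log q_n(h)."""
    top = max(log_q)
    weights = [math.exp(lq - top) for lq in log_q]
    total = sum(weights)
    return [wt / total for wt in weights]


hypotheses = [0.1, 0.5, 0.9]      # candidate repair degrees q_h (illustrative)
log_q = [0.0, 0.0, 0.0]           # uniform prior, unnormalized
x_obs = [2.0, 1.5, 3.0, 2.5, 1.0, 2.0]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15]

v = 0.0
for x, e in zip(x_obs, noise):
    v_next = v + 0.5 * x + e      # data generated with true degree 0.5
    log_q = mary_step(log_q, hypotheses, v, v_next, x)
    v = v_next

p = posterior(log_q)
best = hypotheses[p.index(max(p))]
```

Choosing the hypothesis with the largest $p_n(h)$ is one natural decision rule; other rules, for example with unequal costs, would use the same probabilities.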

5 Conclusion

We have considered in this paper a system that is subject to random failures. Each failure is followed by an age-dependent repair in the sense of Kijima Model I [15]. As in Kijima's model, we have assumed that the degree of repair is a random variable. Two problems have been addressed. The first problem is an application of Hidden Markov Models to a maintenance problem. We have assumed that the degree of repair switches between a finite number of values and that switching happens according to the jumps of a homogeneous, finite-state Markov chain. Using the change of measure technique, various quantities of interest are derived and parameters are updated via the EM algorithm.

In the second problem, we extended the proposed model to M-ary detection. We have assumed that the degree of repair belongs to a set of hypotheses from which to choose in an optimal manner. In effect, the formulation is something like a discrete and finite version of the EM algorithm where, rather than considering an uncountable collection of model parameter sets in the space of all admissible models, we consider a finite collection in this space. A possible extension of the proposed M-ary detection model is to consider non-'static' hypotheses, that is, hypotheses that may change at each time $n$ according to some probability model.

References

1. Aggoun L, Elliott RJ (2004) Measure theory and filtering: introduction with applications. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press
2. Barlow R, Hunter L (1960) Optimum preventive maintenance policies. Oper Res 8:90–110
3. Baum LE, Petrie T (1966) Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat 37:1554–1563
4. Block HW, Borges WS, Savits TH (1985) Age-dependent minimal repair. J Appl Prob 22:370–385
5. Brown M, Proschan F (1983) Imperfect repair. J Appl Prob 20:851–859
6. Cui L, Kuo W, Loh HT, Xie M (2004) Optimal allocation of minimal and perfect repairs under resource constraints. IEEE Trans Reliab 53:193–199
7. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39:1–38
8. Dimitrov B, Chukova S, Khalil Z (2004) Warranty costs: an age-dependent failure/repair model. Nav Res Logistics 51:959–976
9. Elliott RJ, Aggoun L, Moore JB (1995) Hidden Markov models: estimation and control. Applications of Mathematics, vol 29. Springer-Verlag, New York
10. Gasmi S, Love CE, Kahle W (2003) A general repair, proportional-hazards, framework to model complex repairable systems. IEEE Trans Reliab 52:26–32
11. Guo R, Love CE (1992) Statistical analysis of an age model for imperfectly repaired systems. Qual Reliab Eng Int 8:133–146
12. Guo R, Love CE (1994) Simulating non-homogeneous Poisson processes with proportional intensities. Nav Res Logistics 41:507–522
13. Kahle W (2007) Optimal maintenance policies in incomplete repair models. Reliab Eng Syst Saf 92:563–565
14. Kaminskiy M, Krivtsov V (2006) A Monte Carlo approach to estimation of g-renewal process in warranty data analysis. Reliab Theory Appl 1:29–31
15. Kijima M (1989) Some results for repairable systems with general repair. J Appl Probab 26:89–102
16. Kijima M, Morimura H, Suzuki Y (1988) Periodical replacement problem without assuming minimal repair. Eur J Oper Res 37:194–203
17. Love CE, Guo R (1994) Utilizing Weibull failure rates in repair limit analysis for equipment replacement/preventive maintenance decisions. J Oper Res Soc 47:1366–1376
18. Love CE, Guo R (1994) Simulation strategies to identify the failure parameters of repairable systems under the influence of general repair. Qual Reliab Eng Int 10:37–47
19. Mettas A, Zhao W (2005) Modeling and analysis of repairable systems with general repair. In: Proceedings Annual Reliability and Maintainability Symposium, Alexandria
20. Poor HV (1988) An introduction to signal detection and estimation. Springer-Verlag, New York
21. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77:257–286. http://ieeexplore.ieee.org/iel5/5/698/00018626.pdf?arnumber=18626. Accessed 10 Mar 2010
22. Scarsini M, Shaked M (2000) On the value of an item subject to general repair or maintenance. Eur J Oper Res 122:625–637
23. Seo JH, Bai DS (2004) An optimal maintenance policy for a system under periodic overhaul. Math Comput Model 39:373–380
24. Shaked M, Shanthikumar JG (1986) Multivariate imperfect repair. Oper Res 34:437–448
25. Zhang ZG, Love CE (2000) A simple recursive Markov chain model to determine the optimal replacement policies under general repairs. Comput Oper Res 27:321–333

Part VI

Product Support

Efficient Product Support—Optimum and Realistic Spare Parts Forecasting Behzad Ghodrati

B. Ghodrati
Division of Operation and Maintenance Engineering, Luleå University of Technology, SE-971 87 Luleå, Sweden
e-mail: [email protected]

L. Tadj et al. (eds.), Replacement Models with Minimal Repair, Springer Series in Reliability Engineering, DOI: 10.1007/978-0-85729-215-5_9, Springer-Verlag London Limited 2011

1 Introduction and Background

Generally, due to a lack of technology and other compelling factors (such as economic limitations and environmental conditions), it is impossible to design a product that will completely fulfill its expected function throughout its entire life cycle. Therefore, support is vital to enhance system effectiveness and minimize unplanned stoppages. Product support, also commonly called after-sales service, consists of the different forms of assistance and support that manufacturers offer customers to help them gain the maximum value from products. Typical technical forms of support include installation, maintenance and repair services, and spare parts availability. This assistance can be provided in different forms and at different stages of the product life cycle.

Product support falls into two broad categories, namely support to the customer and support to the product. The research presented in this chapter is focused on support to the product, which is greatly influenced by the product reliability characteristics. Product reliability characteristics (see e.g. Blanchard and Fabrycky [7]) are important for us to understand. Specifically, we must ascertain:

• how the product reliability characteristics influence product support and
• how to evaluate support requirements (e.g. spare parts), using what are called dependability characteristics.

The operating environment parameters for the product also influence its dependability characteristics. Consequently, these factors influence the dimensioning of product support and its evaluation and forecasting to achieve efficiency
and cost-effectiveness. For existing systems and machines, incorporating environmental parameters in reliability analysis is a powerful tool for forecasting services, repairs and spare parts requirements affected by environmental factors. In fact, reliability is a function of time/load and the operating environment of a product; the latter comprises factors such as the surrounding environment (e.g. temperature, humidity, and dust), condition-indicating parameters (e.g. vibration and pressure), and human aspects (e.g. operator skill). The variables related to these factors are called covariates. Spare parts are a product support issue and can be divided into two types, namely, repairable and non-repairable. Actually, for many types of spare parts, subassemblies and modules, replacing them upon failure is more economical than repairing them. For example, bearings, gears, electronic modules, gaskets, seals, filters, light bulbs, hoses, and valves are parts which tend to be replaced rather than repaired. These parts are referred to as service parts or non-repairable parts. Here, we deal with non-repairable parts. Spare parts management and logistics is an aspect of product support management which influences the product life cycle cost. The availability of spare parts upon demand decreases product down-time and increases utilization of the system/machine and consequently the profitability of the project. If the optimum number of spare parts is stored in the inventory, this minimizes the product life cycle cost as a goal function; the optimum number is calculated taking different factors into account, such as the part criticality, the part purchasing cost, the distance between the manufacturer and user, and the lead-time. The principal objective of any inventory management system is to achieve an adequate service level with minimum risk and a minimum inventory investment. 
Since investments in spare parts can be substantial, management is interested in decreasing stock levels whilst maximizing the service performance of a spare part management system. To determine the best improvement actions, performance indicators (such as the fill rate and service rate) are needed. For example, sometimes the duration of unavailability of parts is a major factor of concern, making the waiting time for parts a more relevant performance indicator.

1.1 Problem Definition and Discussion

As a result of certain limitations in the design phase, such as the technology used (which may not be state-of-the-art), economic limitations, and environmental conditions, systems/machines may not meet users' requirements fully in terms of system performance and effectiveness. In addition, poorly designed reliability and maintainability characteristics combined with a poor maintenance and product support strategy often lead to unplanned and unforeseen stoppages [49]. Therefore, the need for support to compensate for this weakness is vital. When studying the concept of "product support", there are a few key questions to consider, namely:

• What is product support? Why is it so important?
• Which factors influence product support and how do they do so?
• How can we consider and integrate these factors into product design and product support to minimize the product's life cycle cost (LCC)?

The concept of product support/after-sales service includes the different forms of assistance that manufacturers/suppliers offer customers to help them gain the maximum profit from a product [48]. Typical forms of support include installation, training the operators/users of the product, maintenance and repair services (generally termed service), documentation, availability of spare parts, upgrades (enhanced functionality), customer consulting, and warranty schemes [25]. In fact, product support entails all the activities necessary "to ensure that a product is available for trouble-free use to consumers over its useful life span". But product support is important for manufacturers as well because:

• It is essential for achieving customer satisfaction and good long-term relationships.
• It can provide a competitive advantage [2]. As product differentiation becomes harder in many markets, companies are increasingly regarding customer support as a potential source of competitive advantage.
• It plays a role in increasing the success rate of new products [12].
• It can be a major source of revenue [24]. Over the working lifetime of a product, the support revenues from a customer may be far higher than the initial product revenue. However, this often receives too little management attention.

An important aspect of user/customer satisfaction is reducing the down-time and repair costs of the system/machine. The nature and reliability of the equipment obviously have a large influence on the key elements of product support. Customers expect reliable products and a quick response in the event of failure. The spare part, as an item of product support, is important.
The logistics of spare parts and their inventory levels differ depending on the spare part in question, and the common approaches to stock control in manufacturing situations do not apply to spare parts [20]. In the area of parts logistics, supplying spare parts can be a highly profitable business. With the expansion of high-technology equipment in industries worldwide, the need for spare parts to maximize the utilization of this equipment is paramount. Spare parts forecasting and management improve productivity by reducing idle machine time and increasing resource utilization [54]. Spare parts provisioning and inventory control are complex [44] because of the trade-offs necessary to maximize the availability of slow- and fast-moving parts [20]. The effectiveness of spare parts management is based on factors which require improvements in data acquisition and in methods of forecasting the spare parts requirements, analyzing the data on the demand for such parts, and developing proper stocking and ordering criteria. The early evaluation of all aspects of product support at the design stage has been termed "design for supportability" [25]. It is generally recognized that engineers with experience of environmental factors influencing the technical
characteristics of the product and customer support should be involved in the development stage. The reliability of a system can be defined as ‘‘the ability of a system/machine to perform or operate a required function without failure under given conditions for a given time interval’’ (International Electrotechnical Vocabulary [IEV] 191-02-06). It is a function of time influenced by the environment in which the system is operating. The modern concept of reliability is a quantitative measure that can be specified and analyzed. Reliability is now a parameter of design that can be traded off against other parameters such as cost and performance. It is necessary to express reliability as a quantitative measure because of the ever-growing complexity of systems, the competitiveness in the market, and the scarcity of resources [37]. The analysis of field data helps the designer and engineer to modify the design and/or product support strategy to improve the system reliability and calculate the required spare parts. Sound spare parts management improves productivity by reducing idle machine time and increasing resource utilization [54]. Clearly, spare provisioning is a complex problem and requires an accurate analysis of all conditions and factors that affect the selection of appropriate spare provisioning models. The literature contains a number of papers in the general area of spare provisioning, especially in spare parts logistics (e.g. Chelbi and Ait-Kadi [11]; Kennedy et al. [35]). Most such research deals with repairable systems and spares inventory management (e.g. Aronis et al. [3]). For the most part, previous research relies on a queuing theory approach to determining the required spare parts stock to ensure specified system availability (e.g. Dhakar et al. [18]; Huiskonen [27]). These models have been extended to incorporate the inventory management aspect of maintenance (e.g. Kumar et al. [41]). 
Quantitative techniques based on reliability theory have also been used in the development of methods to forecast the failure rates of the required items to be purchased and/or stocked [30, 23, 33, 47, 61]. The resulting failure rates have been used to determine more accurate demand rates. In the specific area of spare parts management of non-repairable (mechanical) systems, which often fail with time-dependent failure rates (ageing), some renewal theory based prediction models are available for forecasting the needs for spares in a planning horizon [23, 42]. Finally, most of the research in the spare parts domain concerns inventory management. Granted, guaranteeing the availability of systems/machines requires that spare parts always be available on demand. However, estimating and calculating the number of spare parts that must be stored to ensure their availability when required, with respect to techno-economical issues (reliability, maintainability, life cycle cost, etc.), although crucial, have rarely been considered or studied (a notable exception being Sheikh et al. [58]). Most of the research on reliability considers the operation time as the only variable required to estimate reliability. Covariates are seldom included in reliability models (parametric reliability methods such as exponential and Weibull reliability models; see for example O'Connor [53]; Høyland and Rausand [26]). None of the surveyed literature that contains required spare parts calculations
based on the reliability characteristics of the product has considered the operating environment as a factor influencing reliability [30, 47]. But not considering covariates may give rise to errors in the estimation of the reliability characteristics of a system and lead to wrong conclusions concerning product support and spare parts forecasting. In short, such estimations are not accurate enough, because the reliability characteristics of a product are a function of the operation time and operating environment. For instance, in the mining industry, the majority of down-time can be due to shortage or waiting for spare parts which, in turn, stems from a wrong estimation of spare parts consumption. Mining companies often follow the recommendation of manufacturers regarding spare parts consumption, but this is only based on the operating time of the machine. Often manufacturers do not take into account the effect of operating environment on the reliability of the system/components, and this generally has a negative effect on the life-length of components and spare parts. Thus, as mentioned earlier, it is very important to assess the effect of the operating environment on the life-length of components to arrive at a better estimate of spare parts consumption. If the effect of operating environment is known, this will facilitate better optimal spare parts planning. It is therefore desirable to estimate the magnitude of the effects of the various covariates so that the reliability characteristics of a system can be interpreted more efficaciously. Kumar [37] has studied some of the methods suitable for reliability analysis of a system whose lifetime is influenced by covariates. However, most of the reliability methods used for spare parts forecasting and calculation (as mentioned earlier) do not take into consideration the effect of covariates, leading to inappropriate forecasting and poor inventory management.
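The effect described above can be made concrete with a small numerical sketch. It rests on assumptions that are not taken from the text: a constant failure rate for a non-repairable part in continuous use, a log-linear (proportional hazards type) dependence of that rate on the covariates, and Poisson demand over the planning horizon. The names `adjusted_rate` and `spares_needed`, the coefficient 0.5 and all numbers are hypothetical.

```python
import math


def adjusted_rate(base_rate, betas, covariates):
    """Assumed covariate model: rate = base_rate * exp(sum(beta_i * z_i))."""
    return base_rate * math.exp(sum(b * z for b, z in zip(betas, covariates)))


def spares_needed(rate, horizon, confidence):
    """Smallest s with P(demand <= s) >= confidence for Poisson(rate * horizon)."""
    mean = rate * horizon
    term = math.exp(-mean)        # P(demand = 0)
    cumulative = term
    s = 0
    while cumulative < confidence:
        s += 1
        term *= mean / s          # P(demand = s) from P(demand = s - 1)
        cumulative += term
    return s


base = 0.001                      # failures per operating hour (illustrative)
hours = 4000.0                    # planning horizon in operating hours

# Forecast based on operating time only.
s_time_only = spares_needed(base, hours, 0.95)

# Same part in a harsher environment: one covariate z = 1 with coefficient 0.5.
s_with_env = spares_needed(adjusted_rate(base, [0.5], [1.0]), hours, 0.95)
```

Under these assumptions the time-only forecast understates the stock needed in the harsher environment, which is the kind of estimation error the paragraph above warns about.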

2 Product Support

Due to technological, economic, and environmental constraints in the design phase, machines/systems are often unable to fulfill customers' needs completely in terms of system performance during their entire life cycle. Poorly designed technical characteristics of the system and a poor product support strategy (in the case of new products) play a role in this failure to perform. If we wish to compensate for this shortcoming in existing products, thereby enhancing system efficiency and preventing unplanned stoppages (Fig. 1), we need to recognize the importance of support. That being said, industrial systems need support throughout their lifetimes. The dimensioning of product support (spare parts and a service delivery system/strategy in our case) is greatly influenced by the product design characteristics. The relationship between product exploitation (the type of use and application environment), product design, and product support is illustrated in Fig. 2 [50]. The broken lines indicate a technological pull, whereas the continuous lines indicate a technological push.

Fig. 1 Typical reasons for unplanned stoppage creation

Fig. 2 The relationship between product design characteristics, product exploitation, and product support (Source: [50])

So that industrial systems can perform their expected function, some essential and typical technical forms of support include installation, maintenance, repair services, and provision of spare parts. Such forms of support are supplied by original equipment manufacturers (OEM)/suppliers, and are characterized as product support. Maintenance is, in general, a process that can be defined as the combination of all the technical and associated administrative actions, including supervision actions, intended to retain an item in, or restore it to, a state in which it can perform a required function (International Electrotechnical Vocabulary [IEV] 191-07-01). Maintenance objectives can be summarized under four headings: ensuring the system function (availability, efficiency, and product quality); ensuring the system life (asset management); ensuring safety; and ensuring human well-being [17]. For production equipment, ensuring the system function should be the prime maintenance objective. Here, maintenance has to provide the right (but not necessarily the maximum) reliability, availability, efficiency, and capability (i.e. producing the right quality) of production systems, in accordance with the need for these characteristics.

Consideration of maintenance should start in the design phase of systems. The maintenance concept or strategy describes what events (e.g. failure, passing of time) trigger what type of maintenance (inspection, repair, replacement) and can be determined both after the design phase and during the operations phase. In general, maintenance management attempts to optimize the maintenance tasks; minimizing the repair time is an issue of maintenance optimization that includes the availability of spare parts when required. By and large, the product support and maintenance needs of systems are decided during the design and manufacturing phase (see e.g. Blanchard [6]; Blanchard and Fabrycky [7]; Dekker [17]; Goffin [25]). Product support is important in the modern industrial world. Today, management pays close attention to product support because it:

• plays a key role in achieving customer satisfaction,
• can be a considerable source of revenue and profit, and
• can provide a competitive advantage in marketing.

Leading companies achieve a competitive advantage with product support. For example, some companies focus on design for supportability (e.g. Kodak); others believe that the capability to upgrade is important and, to reduce costs, focus on the field upgradeability of products (e.g. Hewlett-Packard) [25]. In brief, product support is an essential part of modern business. Its importance lies in increasing customer satisfaction, so that customers become interested in purchasing the product again and again.

3 Factors Influencing Product's Dependability

Dependability is a collective term used to describe availability performance and its influencing factors: reliability performance, maintainability performance, and maintenance support performance. Dependability is used only for general description in non-quantitative terms (International Electrotechnical Vocabulary [IEV] 191-02-03). The reliability, availability, and maintainability of a product are important and have an immense influence on product support. Products usually require maintenance and the installation of spare parts; both are performed at regular times to ensure product reliability. The effective performance of systems is critical to the success of all organizations, and reliability and availability measures are one of the most common sets of measures used in evaluating the performance of the organization's equipment [10]. When reliability and availability performance are inadequate, engineers need to prioritize their improvement efforts. These efforts could comprise actions that reduce the occurrence of system failure and improve the execution of system maintenance. Improvement of the speed of equipment repair is effective when implemented with adequate product support and availability of spare parts.

Some measures (e.g. the reliability importance measure and the availability importance measure) provide an index for use in developing an availability improvement strategy [10]. The availability importance measure, for instance, denotes the marginal, relative improvement in availability resulting from decreasing the failure rate or increasing the repair rate of components. In addition, high reliability does not mean that the product will be maintenance-free, since materials degrade over time, and many technical characteristics are dependent on the same mechanisms needing maintenance (e.g. friction clutches, brakes, etc.). When defining reliability, failure is defined as the termination of the ability of an item to perform a required function (International Electrotechnical Vocabulary [IEV] 191-04-01). After failure the item has a fault. Failure is an event, as distinguished from a fault, which is a state. To produce reliable products, to respond quickly to service demands, and to avoid user/customer dissatisfaction by reducing the system down-time (less repair time) and repair costs, companies should consider the reliability characteristics at the design and product support dimensioning stages. Additionally, as mentioned before, high reliability does not mean that we do not need to perform service or maintenance, but that service or maintenance is needed to a lesser degree. However, the design-out-maintenance approach often proves too costly or even impossible in state-of-the-art technology. Companies seek a design for easy, cost-effective, and efficient maintenance and support.
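One way to read the availability importance idea mentioned above is as a sensitivity of system availability to each component's failure and repair rates. The sketch below is an assumption-laden illustration and not necessarily the definition used in [10]: it takes the steady-state availability of a component to be repair rate over the sum of failure and repair rates, treats the system as independent components in series, and uses the hypothetical names `availability`, `series_availability` and `sensitivities`.

```python
def availability(failure_rate, repair_rate):
    """Steady-state availability of one component (assumed model)."""
    return repair_rate / (failure_rate + repair_rate)


def series_availability(components):
    """Availability of independent components in series.

    components is a list of (failure_rate, repair_rate) pairs.
    """
    result = 1.0
    for lam, mu in components:
        result *= availability(lam, mu)
    return result


def sensitivities(components, index):
    """Partial derivatives of system availability with respect to the
    failure rate and the repair rate of one component."""
    lam, mu = components[index]
    others = series_availability(components[:index] + components[index + 1:])
    d_failure = -mu / (lam + mu) ** 2 * others   # raising lam lowers availability
    d_repair = lam / (lam + mu) ** 2 * others    # raising mu raises availability
    return d_failure, d_repair


# Illustrative rates per hour for two components.
components = [(0.01, 0.5), (0.002, 0.1)]
ranking = [sensitivities(components, i) for i in range(len(components))]
```

Comparing the magnitudes in `ranking` suggests where a reduction in failure rate, or a faster repair supported by spare parts on hand, would improve availability the most.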

3.1 Product Exploitation Product exploitation refers to the situation of the operator, the work conditions and the environmental factors. The environmental conditions in which the equipment is to be operated, the road conditions, maintenance facilities, maintenance crew training, operator training, etc., often have a considerable influence on product reliability characteristics [40]) and hence on product support. Thus, the operating environment should be considered seriously when dimensioning product support and drawing up service delivery performance strategies, since it will have an impact on the operational and maintenance costs and service quality. Generally, the recommended maintenance program for systems and components is based on their age with no consideration of the operating environment. This, in turn, leads to many unexpected system and component failures. It creates poor system performance and a higher Life Cycle Cost (LCC) due to unplanned repairs and/or restoration, as well as more support. Product exploitation should be considered in the design phase of a new product and the support dimensioning phase of an existing product, to provide a support plan for achieving the optimum conditions. In other words, the users’ environments must be analyzed before deciding the service and maintenance concept for industrial systems/products. Furthermore, the users and the operating environment

Efficient Product Support


can influence the degree of support needed to achieve the expected performance level [50]. Service, repair, and other issues of product support should be designed according to the system’s operating environment parameters. For example, we cannot offer the same support to unique systems which are working in different geographical locations such as Argentina and Russia.

3.2 Product Geographical Locations

This factor is important in the delivery of support and service for products. If the manufacturer is located close to the user, it may take less time to obtain spare parts and assistance, while if the manufacturer is far away the service delivery system becomes critical. The distance of the user from the manufacturer, distributor/supplier will also influence spare parts management. Finally, in product support, a prompt response to a customer’s request plays a key role in customer satisfaction. Therefore, with respect to these points (fast response, ease of repair, and spare parts), the geographical distribution of customers is a critical factor in determining service delivery strategies, spare parts logistics, and inventory management. In spare parts logistics, for instance, the geographical distribution of the customers (the product working places) has an influence on the lead-time, and consequently, on the quantity of stored parts. In addition, there has to be a trade-off between the product’s dependability and its geographical location (Fig. 3). To arrive at optimal product dependability characteristics for various geographical locations, the LCC analysis is a useful and powerful tool in correct decision-making. In other words, when considering and analyzing the life cycle cost of a new product, we can find out which rate (percentage) of reliability and availability should be designed for a product in relation to its geographical location, in order to optimize the LCC.

Fig. 3 The trade-off between product dependability and geographical location of product (adapted from Markeset and Kumar [50])

4 Product Support Logistics

Optimal spares provisioning is a prerequisite for all types of maintenance tasks, including inspections, preventive maintenance, and repairs. With the exception of preventive activities, spare parts for maintenance tasks are usually required at


B. Ghodrati

random intervals. Thus, the fast and secure coordination of the demand for spare parts with the supply of spare parts at the required time is an important factor in the punctual execution of the maintenance process. Missing materials are a frequently cited reason for delay in the completion of maintenance tasks. As spare parts for machinery are often of a very high quality, this problem cannot be solved simply by increasing the warehouse stock. Through the optimization of product support logistics, material stocks of spare parts can be optimized to support maximum availability with minimum stocks. The aim of product support logistics is to minimize the product support costs, including costs for ordering, holding, transportation, product down-time, etc. Therefore, since the present work deals with spare parts issues in product support, in the field of logistics we will discuss spare parts inventory management and the ordering process for required spare parts. The conditions for planning the logistics of spare parts differ from those for planning the logistics of other materials in several ways:
• The service requirements are higher, as the financial effects of stock-outs may be significant.
• The demand for parts may be sporadic and difficult to forecast.
• The prices of individual parts may be high.
These conditions make it necessary to streamline the logistic system of spare parts. Because streamlined processes save time and money, spare parts management is an important area of inventory research [27].

4.1 Spare Parts Management

The spare parts program of a plant is an essential part of the overall spare parts management, because it ensures that there will always be an adequate supply of spare parts at hand when they are needed, and that the plant will never experience costly delays in repairs while awaiting spare parts. However, maintaining this inventory can also result in significant additional costs for the plant/product operation if it is not optimized. An effective spare parts management program has several broad objectives:
• To ensure that the spare parts inventory contains at least one of every part which is likely to be needed to carry out repairs of an important system/component whose failure would result in an unacceptable impact on plant safety or production;
• To ensure that the “at-hand” replenishment of spare parts for each important component is sufficient to prevent any unacceptable losses in plant or safety system availability which would follow the occurrence of more than one failure during a typical inventory replenishment cycle;
• To maintain the necessary inventory at the optimum cost.


Historically, inventories have several different classes of spare parts, each of which tends to have its own individual analytical requirements when decisions are made about whether to stock and how many to stock; each of the following types affects the determination of its service level differently:
• Expensive spare components or large assemblies for important systems/components, e.g. hydraulic pumps in LHDs (loading, hauling and dumping machines), etc.;
• Medium-grade piece parts and complete assemblies for important systems, e.g. pumps, valves, controls, and breakers;
• Smaller (cheap) generic parts needed to maintain and repair important systems/components, e.g. bolts, fasteners, gaskets, cables, and connectors.
After the classification of spare parts has been carried out, the next task is to optimize the actual number of spares purchased for the on-site inventory.

4.2 Spare Parts Inventory

Inventory control of spare parts plays an increasingly important role in modern operations management. The trade-off is clear: on one hand, a large number of spare parts ties up a large amount of capital, while on the other hand, too little inventory may result in poor customer service or extremely costly emergency actions [3]. A general approach which can be used to determine an appropriate inventory and its replenishment for existing equipment is shown in Fig. 4. Following a decision that a particular part should be kept in the inventory, the next question to be answered is how many parts the inventory should contain. Part replenishment is determined on the basis of the expected usage rate of the part and the economic risk associated with allowing a depleted inventory to occur during the part replenishment cycle which generally follows the removal of parts from stock. Factors which influence the usage rate and replenishment include:
• the part failure rate and usage rate per component, and
• the number of similar components.
These provide the annual usage rate. However, to control handling charges and to exploit the substantial discounts offered by some vendors for multiple purchases, many smaller parts have “economic ordering quantities”. All these factors are considered in the determination of replenishment: the analyst calculates the probability of incurring additional failures during the re-order and inventory replenishment cycle. The probability that a spare will not be available when needed will be a function of the number normally held in the inventory (part replenishment), the number of similar components and their reliability, the operating environment


Fig. 4 Spare parts optimization process (determination of inventory) (adapted from IAEA [28])

(part usage rates) and the time taken to restock parts after they have been removed from the inventory (replenishment cycle). When the dependability analysis has shown a component to be a candidate for inclusion in the spare part inventory because it has a high importance measure, the benefit from maintaining spares can be calculated by varying the associated mean down-time and measuring its effect on the system’s equivalent forced outage rate (EFOR).


When the part usage rate is calculated from the number of similar components and their failure rates and compared to the length of the part replacement cycle, it is possible to calculate the probability that a needed part will not be available. The principal objective of any inventory management system, as mentioned earlier, is to achieve an adequate service level with a minimum inventory investment and minimum administrative costs. For instance, attempting to save on ordering costs by ordering more than what is needed ties up capital in the inventory. To solve this dilemma, the economic order quantity (EOQ) (Fig. 5) can be used. This is the lot size that minimizes the total inventory cost; it accounts for both holding and ordering costs, with shortages eliminated, and can be calculated as [36]:

$$\mathrm{EOQ} = \sqrt{\frac{2DS}{H}}, \tag{1}$$

where:
D the annual demand (units/year) [equals NT in one year]
S the cost of ordering or setting up one lot ($/lot)
H the cost of holding one unit in the inventory for a year (often calculated as a proportion of the item’s value)
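As a minimal sketch of how Eq. (1) might be evaluated, the following Python fragment computes the EOQ and compares the annual cost at neighbouring lot sizes. The cost expression (D/Q)S + (Q/2)H is the usual textbook assumption behind the EOQ rather than something stated in this chapter, and the demand and cost figures are hypothetical.

```python
import math


def economic_order_quantity(annual_demand, order_cost, holding_cost):
    """Lot size sqrt(2DS/H) from Eq. (1)."""
    if annual_demand < 0 or order_cost < 0 or holding_cost <= 0:
        raise ValueError("demand and order cost must be >= 0, holding cost > 0")
    return math.sqrt(2.0 * annual_demand * order_cost / holding_cost)


def annual_inventory_cost(lot_size, annual_demand, order_cost, holding_cost):
    """Assumed cost model: ordering cost (D/Q)*S plus holding cost (Q/2)*H."""
    ordering = annual_demand / lot_size * order_cost
    holding = lot_size / 2.0 * holding_cost
    return ordering + holding


# Hypothetical figures: 1200 units/year, $50 per order, $3 per unit per year.
D, S, H = 1200.0, 50.0, 3.0
eoq = economic_order_quantity(D, S, H)
for q in (150.0, eoq, 250.0):
    print(f"lot size {q:7.1f} -> annual cost {annual_inventory_cost(q, D, S, H):.2f}")
```

Under this cost model, any lot size other than the EOQ gives a higher annual cost, which is the trade-off between ordering and holding described above.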

Fig. 5 Economic order quantity (Source: Krajewski and Ritzman [36])

5 Reliability Prediction Methods

Reliability plays an important role in many aspects of a product life cycle. For example, accurate reliability estimates are required before a design is released to the production facility and before the product is finally released for distribution to


customers. Furthermore, reliability estimation is critical in determining the optimal maintenance, inspection, spare parts estimation, and replacement schedules. In other words, accurate estimation and prediction of the hazards of mechanical systems are critical to maintenance (particularly predictive maintenance) activities [21]. This has led many manufacturers to conduct real-life testing on components and products to estimate their reliabilities accurately. Failure prediction of mechanical systems can be conducted in two ways: fault diagnosis from condition monitoring signals and statistical analysis of historical failure data. Most existing statistical models require historical operational data and mature statistical techniques to be effective. The statistical method is an approach for predicting product reliability under special circumstances (e.g. prediction at later stages, updating the predictions for an item while some data are available, or reliability prediction of an existing item). When the failure time data involve complex distributions that are largely unknown, or when the number of observations is small, making it difficult to fit a failure time distribution accurately and avoid making assumptions that would be difficult to test, non-parametric statistics-based models are used. A widely used non-parametric method is “multiple regression”. It is assumed that the covariates (external factors influencing reliability, e.g. operating environmental factors, applied stresses, etc.) are independent variables of the regression model used to predict the time to failure of the individual component. The proportional hazards model (PHM) is a non-parametric approach developed by Cox [13], in which a baseline hazard function is modified multiplicatively by covariates. PHM, represented by $h(t) = h_0(t)\,\psi(z; \alpha)$, indicates that the hazard of a system will change when its covariates change, i.e.
covariates are explanatory variables, and hazard is the response variable in PHM. The advantage of this approach is that it is essentially distribution-free, and no additional assumptions are necessary about the failure times. The model is quite flexible. For example, if one assumes a particular form for the baseline hazard function $h_0(t)$, a fully parametric proportional hazards model is obtained; Jardine et al. [31] use a Weibull hazard function for $h_0(t)$ in their PHM model. A distribution-free model can be obtained if no specific form is assumed for $h_0(t)$. Another method for modeling failures at the component level is the stress-strength model. If $X$ represents the strength of a part which is subjected to a stress $Y$, there are two alternatives in reliability estimation (see Blischke and Murthy [9] for more information):

Deterministic stress and random strength: In this case, the reliability $R$ is given by:

$$R = P\{X > Y\} = 1 - F_X(Y), \tag{2}$$

where $F_X$ is the distribution function of $X$.


Random stress and strength: In this case, we define $Z = X - Y$ and the reliability is given by:

$$R = P\{Z > 0\} = 1 - F_Z(0). \tag{3}$$

For example, if the stress and strength are exponentially distributed, then:

$$R = \frac{\lambda_Y}{\lambda_X + \lambda_Y}. \tag{4}$$
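The closed form for exponentially distributed stress and strength can be checked numerically. The sketch below, with hypothetical rate values and function names chosen only for illustration, compares the ratio of rates with a simple Monte Carlo estimate of P{X > Y}, where X (strength) has rate λ_X and Y (stress) has rate λ_Y.

```python
import random


def stress_strength_reliability(rate_strength, rate_stress):
    """Closed form for exponential strength X and stress Y: rate_Y / (rate_X + rate_Y)."""
    return rate_stress / (rate_strength + rate_stress)


def simulate_reliability(rate_strength, rate_stress, trials=200_000, seed=1):
    """Monte Carlo estimate of P{X > Y} with independent exponential X and Y."""
    rng = random.Random(seed)
    survived = sum(
        rng.expovariate(rate_strength) > rng.expovariate(rate_stress)
        for _ in range(trials)
    )
    return survived / trials


# Hypothetical rates: mean strength 1/0.5 = 2.0, mean stress 1/2.0 = 0.5.
exact = stress_strength_reliability(0.5, 2.0)
estimate = simulate_reliability(0.5, 2.0)
print(f"closed form {exact:.4f}, simulated {estimate:.4f}")
```

The simulation assumes that stress and strength are independent, which is the usual reading of the model but is not stated explicitly in the text.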

Since the product is employed under conditions that are unknown and perhaps new, and many items are intended for a use different from typical and tested applications, it is common to modify predicted values by the application of environmental and other influencing factors. The aim is to account for different conditions, such as temperature, humidity, voltage stress, and so on. The adjustment is accomplished by multiplication of the predicted failure rate by appropriate constants. Generally, this method of prediction is particularly useful for items where failure is a result of breakage, rupture, etc. [9]. The ‘‘Part Stress Analysis’’ and the ‘‘Parts Count’’ methods are another two methods of reliability prediction, but they vary in the degree of information needed to apply them. The Part Stress Analysis method requires a greater amount of detailed information and is applicable during the later design phase, when actual products/components are being designed (MIL-HDBK-217F). The Parts Count method requires less information, generally part quantities, the quality level, and the application environment, and is applicable in the early design phase and during proposal formulation. For the most part, the Parts Count method results in a more conservative estimate (i.e. higher failure rate) of the system reliability than the Part Stress method (MIL-HDBK-217F). The Part Stress Analysis method is actually a refinement of the Parts Count method in that it involves the same basic steps [29]. In the former method, the system must be defined and a reliability model developed, while the latter is based on the principle that the reliability of any component depends upon the baseline failure rates and the environments in which the item is to be used. Basic assumptions of the Parts Count method are that the baseline failure rates are constant with time, and part failures are independent of each other. 
The part failure rates are calculated by multiplying the baseline failure rates by an appropriate environmental factor. Comparison of the different methods has led to the stress analysis method being used as a standard approach in the reliability prediction of electronic modules, where the failure rates are constant. To summarize our discussion of different reliability prediction methods, the ‘‘Proportional Hazard Model’’ is applicable in a variety of cases, especially for the prediction of the reliability and failure rate of mechanical parts. As mentioned, it is essentially distribution-free, no additional assumption is necessary about the failure times, and it is quite flexible. In addition, it uses covariates that affect the hazard of a system and is more suitable for situations where environmental


conditions are monitored (as in our case), as environmental conditions can cause the failure rate of a system to change.

5.1 Reliability Models

The exponential and Weibull reliability models are most commonly used for the reliability analysis of systems. The main assumption in the exponential model is that the times between failures are exponentially distributed, or expressed more simply, the failure (hazard) rate is independent of time. The failure of electronic components that have a constant failure rate follows this model, but mechanical parts do not conform to the exponential model (i.e. do not have a constant failure rate), and fail due to ageing with time. Ageing or wear-out mechanisms such as corrosion, oxidation, and wear are time-dependent processes. They result in increasing failure rates for the parts, characterized by the Weibull model with shape parameter $\beta > 1$. The Weibull reliability model is a versatile model for characterizing the life of machine parts (mechanical systems). The failure density function of the two-parameter Weibull distribution is defined as:

$$f(t) = \frac{\beta t^{\beta - 1}}{\eta^{\beta}} \exp\left[-(t/\eta)^{\beta}\right], \quad t \ge 0,\ \eta > 0,\ \beta > 0. \tag{5}$$

The reliability function and the hazard (failure) rate are given by:

$$R(t) = \int_t^{\infty} f(x)\,dx = \exp\left[-(t/\eta)^{\beta}\right], \tag{6}$$

$$\lambda(t) = (\beta/\eta)(t/\eta)^{\beta - 1}, \tag{7}$$

where $t > 0$, $\beta > 0$, $\eta > 0$. The parameter $\eta$ is the “characteristic life” parameter and has the same units as $t$; the parameter $\beta$ is a “shape” parameter and is a non-dimensional quantity. The great versatility of the Weibull distribution stems from the possibility of adjusting it to fit the many cases where the hazard rate either increases or decreases, because this distribution has no fixed characteristic shape. When $\beta = 1$, which represents a constant failure rate, the reliability model reduces to

$$R(t) = \exp(-\lambda t), \quad t \ge 0, \tag{8}$$

with the failure rate:

$$\lambda(t) = \frac{1}{\eta} = \frac{1}{\mathrm{MTBF}} \quad (\text{MTTF for non-repairable components}).$$

This model represents the exponential reliability model. In it, $R(t)$ is the reliability of the system, $\lambda$ is the constant failure rate $= 1/\mathrm{MTTF}$, and $t$ is the period of


operation. The exponential distribution is the most widely used and well-established statistical distribution, and it explains the general failure distribution of a system during its normal operating life period, when failure occurs at random. The most important factors for the applicability of this model are that the hazard rate must be constant, and the age should have no effect on the failure rate of the system. When $\beta > 1$, which represents an increasing failure rate in the Weibull model, the $\beta$ and $\eta$ parameters can be determined by plotting $\ln \ln(1/R(t_i))$ against $\ln(t_i)$: the slope of the best-fitting straight line through these points gives $\beta$, and $\eta$ follows from the intercept, which equals $-\beta \ln \eta$ [$R(t_i)$ is estimated with the median rank formula, $1 - R(t_i) \approx (i - 0.3)/(n + 0.4)$; its advantages are that it is relatively easy to put confidence limits on the line, and censored data can be dealt with].
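The graphical procedure can be sketched as an ordinary least-squares fit. The fragment below is an illustration under stated assumptions, not the author's implementation: it handles a complete (uncensored) sample only, takes the median rank (i − 0.3)/(n + 0.4) as an estimate of the cumulative failure probability F(t_i) = 1 − R(t_i), and recovers η from the intercept −β ln η of the fitted line. The function names are hypothetical.

```python
import math


def median_ranks(n):
    """Median rank approximation of F(t_i) for the i-th ordered failure of n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull(failure_times):
    """Least-squares line through (ln t_i, ln ln(1/R_i)); returns (beta, eta)."""
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0:
        raise ValueError("need at least two positive failure times")
    xs = [math.log(t) for t in times]
    # R_i = 1 - F_i, so ln ln(1/R_i) = ln(-ln(1 - F_i)).
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(n)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope of the fitted line
    intercept = y_mean - beta * x_mean    # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Synthetic check: times placed exactly on a Weibull line (beta = 2, eta = 100).
sample = [100.0 * (-math.log(1.0 - f)) ** 0.5 for f in median_ranks(10)]
print(fit_weibull(sample))
```

With real failure data the points scatter around the line, and the quality of the fit should be judged before the estimates are used.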

5.2 Operating-Environment-Based Reliability Analysis

Only the parametric reliability methods with a specific assumption about the lifetime distribution (e.g. an exponential or a Weibull distribution) were popular at the beginning of reliability analysis of systems [26, 53]. However, restrictions on the fulfillment of assumptions of distribution fitting led to the development of non-parametric reliability models based on the method suggested by Kaplan and Meier [34] and Nelson [51]. The advantages of non-parametric models are that no specific distributional form needs to be assumed concerning the failure data and censored data can be considered easily [37]. These models can be used for modeling the effect of factors other than time (e.g. operating environment and the system/machine situation) as covariates on the reliability of the system. A major contribution to the concept of non-parametric regression methods for modeling the effects of covariates was made with the introduction of the Proportional Hazard Model (PHM) suggested by Cox [13]. Even so, a literature survey indicates that only a relatively small number of industrial applications of these methods (especially in spare parts forecasting) have been performed and reported (e.g. Jardine et al. [32]; Kumar and Klefsjö [39]). Most of the previous research on the reliability analysis of systems considers operation time as the only variable for estimating the reliability of a system. However, as mentioned earlier, factors other than time influence the reliability characteristics of a system in its operation life cycle. These include the operating environment (e.g. temperature, pressure, humidity, or dust), the operating history of the machine (e.g. overhauls, effects of repair or types of maintenance), or the type of design or material; these are called risk factors or covariates. They affect the failure behavior of a system, but are usually ignored in reliability analysis.
Because the operating environment influences the system reliability characteristics, it should be taken seriously when a reliability and hazard rate analysis is performed. To sum up, reliability can be defined on the basis of the intended function, the product operating life (time), and the environment of use (which


includes exterior influence factors such as dust, temperature, etc., and the operators’ skills and competence). One method for analyzing the effects of covariates on the hazard rate (reliability) is to use regression models, which can be broadly classified into two groups, parametric and non-parametric regression models, on the basis of the approaches used [45, 46]. In parametric models, the lifetime of a system is assumed to have a specific distribution that depends on covariates, one example of such a model being the ‘‘Weibull regression model’’. In non-parametric models, however, the general approach is to decompose the hazard rate into two parts. The ‘‘proportional hazard model’’ [13], as said before, is an example of a non-parametric model, which was initially developed to assess the effects of environmental covariates on the hazard (covariates are explanatory variables and the hazard is the response variable in PHM) and is used here for calculating the system’s failure rates. The proportional hazard model (PHM) was initially applied in medical analysis [14] and thereafter was applied and used in engineering reliability analysis (e.g. Ansell and Phillips [1]; Jardine et al. [32]).

5.2.1 Proportional Hazard Model (PHM)

The proportional hazard model (PHM) is a valuable statistical procedure to estimate the risk of equipment failing due to operating environment and conditions. The PHM model is based on the assumption that the hazard function for an item/component of equipment is a product of the baseline hazard function of that item and an exponential term incorporating the effect of a number of explanatory variables or covariates. The generalized form of the proportional hazards model (PHM) that is most commonly used is written as [13]:

$$h(x, z) = h_0(x)\,\psi(z\alpha), \tag{9}$$

where $h(x, z)$ is the hazard function; $z\alpha = \sum_{i=1}^{n} z_i \alpha_i$; $\alpha$ (column vector) is the unknown parameter of the model or regression coefficient of the corresponding $n$ covariates $z$ (row vector consisting of the covariate parameters), indicating the degree of influence which each covariate has on the hazard function; and $h_0(x)$ is the baseline hazard rate. The PHM as an ingenious distribution-free approach to the analysis of data was first suggested by Cox [13]. If one assumes a particular form for $h_0(t)$, a fully parametric proportional hazard model is obtained. The most important such model is the Weibull model, for which $h_0(t) = (\beta/\eta)(t/\eta)^{\beta - 1}$; this also includes the exponential model as the special case $\beta = 1$. An advantage of the PHM method is that it is essentially distribution-free: certain properties of the procedure do not depend upon the underlying lifetime distribution or, in other words, on $h_0(t)$. This is actually true only when there is no censoring, but with many types of censoring, the dependence on $h_0(t)$ is small


[45]. If the data come from a specific proportional hazard model such as the Weibull model, there will be some loss of efficiency in using the distribution-free approach rather than the one based on the correct parametric model. In certain situations, however, this loss of efficiency is negligible [45]. In this model, it is assumed that, in the real life of a system, the hazard (failure) rate is influenced by the time during which, and the covariates under which, it operates. In other words, the hazard rate of a system is the product of the baseline hazard rate $\lambda_0(t)$, dependent on time only, and another positive functional term basically independent of time. This term incorporates the effects of a number of covariates, such as temperature, pressure, and others. The effects of the covariates may be to increase or to decrease the hazard rate. For example, in the case of bad operating conditions, poor and incomplete maintenance, or incorrect spare parts, the observed hazard rate is greater than the baseline hazard rate. However, in the case of good operating conditions, or improved and reliable components of a system, the observed hazard rate will be smaller than the baseline hazard rate [39]. The basic concept of this model is shown in Fig. 6. The baseline hazard (failure) rate is assumed to be identical to the total hazard rate when the covariates have no influence on the failure pattern.

Fig. 6 Effects of risk factors (covariates) on the hazard rate of the system (Source: Kumar and Klefsjö [39])

Therefore, the observed hazard rate of a system, with the effects of the covariates expressed in exponential form, may be given as:

$$\lambda(t, z) = \lambda_0(t) \exp(z\alpha) = \lambda_0(t) \exp\left(\sum_{j=1}^{n} z_j \alpha_j\right), \tag{10}$$

where $z_j$, $j = 1, 2, \ldots, n$ are the covariates associated with the system, and $\alpha_j$, $j = 1, 2, \ldots, n$ are the unknown parameters of the model, defining the effects of each of the $n$ covariates. The multiplicative factor, $\exp(z\alpha)$, may be termed the relative risk of failure due to the presence of the covariate $z$. The reliability functions are given by:


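$$R(t) = \left[R_0(t)\right]^{\exp\left(\sum_{j=1}^{n} z_j \alpha_j\right)}, \tag{11}$$

where

$$R_0(t) = \exp\left[-\int_0^t \lambda_0(x)\,dx\right] = \exp\left[-\Lambda_0(t)\right], \tag{12}$$

and $R_0(t)$ is the baseline reliability function dependent only on time, and $\Lambda_0(t)$ is the cumulative baseline hazard rate.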
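To make the multiplicative structure concrete, the following sketch evaluates the hazard rate and the reliability of a proportional hazard model with a Weibull baseline, one of the parametric choices mentioned earlier. The parameter values, the single "dust" covariate and all function names are hypothetical and serve only as an illustration.

```python
import math


def baseline_hazard(t, beta, eta):
    """Weibull baseline hazard (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def baseline_reliability(t, beta, eta):
    """Weibull baseline reliability exp[-(t/eta)**beta]."""
    return math.exp(-((t / eta) ** beta))


def relative_risk(covariates, coefficients):
    """Multiplicative factor exp(sum_j z_j * alpha_j)."""
    if len(covariates) != len(coefficients):
        raise ValueError("one coefficient is needed per covariate")
    return math.exp(sum(z * a for z, a in zip(covariates, coefficients)))


def phm_hazard(t, beta, eta, covariates, coefficients):
    """Observed hazard: baseline hazard times the relative risk."""
    return baseline_hazard(t, beta, eta) * relative_risk(covariates, coefficients)


def phm_reliability(t, beta, eta, covariates, coefficients):
    """Observed reliability: baseline reliability raised to the relative risk."""
    return baseline_reliability(t, beta, eta) ** relative_risk(covariates, coefficients)


# Hypothetical case: one binary covariate (z = 1 for a dusty site) whose
# coefficient ln 2 doubles the hazard relative to the baseline.
z, alpha = [1.0], [math.log(2.0)]
print(phm_hazard(50.0, 2.0, 100.0, z, alpha))
print(phm_reliability(50.0, 2.0, 100.0, z, alpha))
```

In practice the coefficients are estimated from failure data together with the recorded covariates; here they are simply assumed.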

6 Spare-Parts Forecasting: A Review

Spare parts provisioning has become a vital component of a company’s inventory management in recent years, a phenomenon attributable to various factors. In order to be successful in today’s highly competitive market, manufacturers must provide high-quality, reliable products as well as excellent customer support. A lack of spare parts will result in a significant decrease in systems availability, reducing customer satisfaction and harming customer loyalty. Many industries now require close to 100% system availability. For instance, a processing plant or a power generating unit, often representing an investment of millions of dollars, can easily lose too much production through relatively short interruptions [5]. Nevertheless, manufacturers do not want their capital tied up in inventory because this money could be invested in other areas that yield a decent rate of return. Excessive spare parts also result in higher holding costs and possible product deterioration and obsolescence. When the availability of a physical system within an isolated environment for an extended period is a concern, for example, aboard a ship at sea or aboard a space shuttle, it is necessary to design a spares provisioning policy [33]. In short, spare part provisioning is important for systems with a long useful life, or when short repair times or a considerable independence from the manufacturer is required [5]. It is essential to be able to accurately predict the number of spare parts required in a certain time horizon to reduce the total operation cost. It should be emphasized that spare parts estimation is only a single aspect of a much broader topic: inventory management. The present concern is the number of spare parts required during a certain planning horizon, whether one month, six months, one year, or so on.
After obtaining a forecast for the planning horizon, ordering strategies are proposed for different categories of parts in view of their cost and criticality to the operation [58]. The spare parts discussed here are non-repairable; in other words, repairing the component is extremely difficult or it is more economical to replace than to repair. The planning horizon could be the lead time of a component, for instance, the time required to receive a component on site after an order has been placed, thus ensuring that no stock-out occurs [16].


Many models have been established to estimate the number of spare parts required for certain operations. Each model has its unique advantages and disadvantages. In order to determine the optimal model, it is necessary to evaluate the circumstances. The fundamental concept, however, remains the same across all models. Every method assumes that as a system operates, over time its components will experience failures, which can be caused by initial design flaws or simple wear-out. The probability of failure can be modeled with certain statistical distributions, such as the exponential, normal, and Weibull distributions. After analyzing the previous failure data for each component, a cumulative failure distribution can be created to show the fraction of the population that fails before a certain time $t$. At this point, the time horizon $t$ needs to be arbitrarily defined, and the percentage of parts expected to fail before time $t$ can be estimated. When this percentage is multiplied by the total number of components in the system, the actual expected number of failures can be determined [62]. In this section we discuss some of the most prominent spare parts provisioning methods including their basic concepts, criteria, and applicability. The goal is to provide a basic understanding of current models in the field of spare parts forecasting.
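The basic calculation described above, the fraction failing before the horizon multiplied by the number of components, can be sketched in a few lines. The Weibull form of the cumulative failure distribution and all numerical values are assumptions made for illustration; the sketch counts only first failures and ignores failures of the replacement parts within the horizon.

```python
import math


def weibull_cdf(t, beta, eta):
    """Fraction of the population expected to fail before time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def expected_failures(n_components, horizon, beta, eta):
    """Number of components times the fraction failing before the horizon."""
    return n_components * weibull_cdf(horizon, beta, eta)


# Hypothetical fleet: 100 identical components, characteristic life 4000 h,
# shape parameter 1.5, planning horizon 1000 h.
print(expected_failures(100, 1000.0, 1.5, 4000.0))
```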

6.1 The Poisson Process Model

6.1.1 Concept

During their useful lifetimes, many components exhibit a constant hazard rate. This implies that the occurrence of failures is purely random and that there is no deterioration of the strength or soundness of the components with time. In this case, the instantaneous rate of failure is the same at any time. Although this assumption is not realistic over the whole life of a component, it is a good approximation during the useful lifetime (the horizontal portion of the bathtub curve) of the component [56]. The homogeneous Poisson process is simply a special case of the renewal process where:

$$F(x) = 1 - e^{-\lambda x} = P\{\tau_i \le x\}, \quad i = 1, 2, \ldots, \tag{13}$$

where the $\tau_i$ are statistically independent non-negative random variables (e.g. failure-free operating times).

$$P\{N(T, m) = k\} = \frac{a^k}{k!}\, e^{-a}. \tag{14}$$

m number of components
T planning horizon


a expected number of failures in $[0, T]$; $a = mT/\mu = m\lambda T$
$\mu$ mean time to failure of one component
$\lambda$ hazard rate
$P\{N(T, m) = k\}$ probability of having exactly $k$ failures in $[0, T]$
$P\{N(T, m) < k\}$ probability of having fewer than $k$ failures in $[0, T]$
$P\{S(k, m) > T\}$ probability that the time until the $k$th failure is greater than $T$

$$P\{N(T, m) < k\} = P\{S(k, m) > T\} = \sum_{i=0}^{k-1} \frac{a^i}{i!}\, e^{-a} \ge p \tag{15}$$

It is now possible to calculate the value of $k$, which indicates the stock level that ensures a reliability $p$ (the probability of not running out of spare parts). Finding the value of $k$ can be accomplished either analytically or through the Poisson distribution table. To use the Poisson table, set $a = m\lambda T$ and search the table for the lowest value of $k$, corresponding to $a$, that provides a probability at least as high as desired [33].
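The table look-up can also be done numerically. The sketch below, with hypothetical function names and input values, accumulates Poisson probabilities for a = mλT and returns the smallest k for which the probability of fewer than k failures in [0, T] is at least p, following the definitions of P{N(T, m) < k} given above.

```python
import math


def poisson_cdf(n, mean):
    """P{N <= n} for a Poisson-distributed count with the given mean."""
    term = math.exp(-mean)   # probability of zero failures
    total = term
    for i in range(1, n + 1):
        term *= mean / i     # builds mean**i / i! * exp(-mean) recursively
        total += term
    return total


def required_stock_level(m, hazard_rate, horizon, p):
    """Smallest k with P{N(T, m) < k} >= p, where a = m * lambda * T."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    a = m * hazard_rate * horizon
    if a > 700.0:
        # exp(-a) underflows for very large a; this simple sketch stops here.
        raise ValueError("expected number of failures too large for this sketch")
    k = 1
    while poisson_cdf(k - 1, a) < p:
        k += 1
    return k


# Hypothetical case: 10 components, 0.001 failures/h each, 500 h horizon.
print(required_stock_level(10, 0.001, 500.0, 0.95))
```

The result depends directly on the constant hazard rate assumption, so the same inputs with an ageing component would call for a different model.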

6.1.2 Criteria and Conditions

The homogeneous Poisson process may be used to analyze equipment reliability when the following conditions are satisfied [43]:
• The lifetime distribution of the device is exponential, i.e. the failure rate is constant, as the Poisson Process Model assumes.
• A new device is put into operation at time zero.
• Whenever there is a failure, the device is either immediately repaired to “as good as new” condition or immediately replaced by a new identical device.
• The lifetimes of all devices are independent of one another.
• Random and independent failures occur over time.
• The process is memoryless.
• The data are iid (independent and identically distributed).

6.1.3 Applicability

The Poisson Process Model is one of the most popular spare parts estimation methods because it is relatively simple to implement with respect to both the required data collection and the analysis [21]. This simplicity also makes it economical, since less time and manpower are required to conduct the analysis. An underlying advantage of this model is that the pooled output of several independent processes naturally tends to be a Poisson process, so the method provides a superior estimate for multi-component systems [61]. This approximation does not assume that the number of demands is very large (a reasonable assumption for relatively expensive, highly reliable components) [16]. The disadvantages of the model include the fundamental assumption of a constant failure rate, which means that the time to failure of the system must follow an exponential distribution. However, industrial systems often have age-related failure mechanisms [21], which indicate a non-constant failure rate that appears as a bathtub curve. Another disadvantage is its relative inaccuracy. Therefore, we do not suggest applying this model when dealing with the availability of critical spare parts.

6.2 The Renewal Process Model

6.2.1 Concept

The Renewal Process Model is used to analyze the replacement of components upon failure in order to find the distribution and the mean number of replacement parts. Assume that a system is composed of many components acting independently and in such a way that an individual component failure causes a system failure. A renewal process, in which a failed component is immediately replaced with a new one, will cause the system to reach a steady state with a constant number of failures per unit of time. A renewal process is characterized by one entity, the distribution of the time between renewals, denoted by $F(t)$. If $N(t)$ represents the number of renewals (in our case the number of failures) that occur by time $t$, and if one assumes that the time-to-failure random variables $X_i$ are independent and have a common distribution $F(t)$, then the probability distribution of the number of failures is given by:

$$P\{N(t) = n\} = F^{n}(t) - F^{n+1}(t), \tag{16}$$

$$F^{n}(t) = \int_{0}^{t} F^{n-1}(t - x)\, dF(x), \tag{17}$$

$$M(t) = \sum_{n=1}^{\infty} F^{n}(t). \tag{18}$$

$N(t)$: number of renewals that occur by time $t$
$F(t)$: distribution of the time between renewals
$F^{n}(t)$: probability that the $n$th renewal occurs by time $t$
$M(t)$: expected number of renewals during a period of length $t$


The Weibull distribution can be applied to the failure function here, since the Weibull reliability model is the most versatile model for characterizing the life of machine parts [22]:

$$F(t) = 1 - \exp\{-(t/\eta)^{\beta}\}, \tag{19}$$

$$N_t = M(t) = E[N(t)] = \frac{t}{\bar{T}} + \frac{\zeta^{2} - 1}{2} + \zeta \sqrt{\frac{t}{\bar{T}}}\, \Phi^{-1}(p), \tag{20}$$

$$\zeta = \frac{\sigma(T)}{\bar{T}}.$$

$p$: probability of spare parts availability
$\zeta$: coefficient of variation of the time to failure
$\sigma(T)$: standard deviation of the time to failure
$\bar{T}$: average time to failure
$\Phi^{-1}(p)$: inverse normal function

The coefficient of variation of the time to failure can be calculated from the shape and scale parameters as follows:

$$\bar{T} = \eta\, \Gamma\!\left(1 + \frac{1}{\beta}\right), \tag{21}$$

$$\sigma(T) = \eta \sqrt{\Gamma\!\left(1 + \frac{2}{\beta}\right) - \Gamma^{2}\!\left(1 + \frac{1}{\beta}\right)}. \tag{22}$$
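As an illustration of how Eqs. 20 to 22 fit together, the sketch below computes the Weibull mean, standard deviation, and coefficient of variation, and then the approximate number of spares. It is only a sketch under stated assumptions: the function names and the example parameters are hypothetical, and the inverse normal function is taken to be the inverse of the standard normal cumulative distribution.

```python
import math
from statistics import NormalDist


def weibull_mean_std(eta: float, beta: float) -> tuple[float, float]:
    """Mean (Eq. 21) and standard deviation (Eq. 22) of the time to failure."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return eta * g1, eta * math.sqrt(g2 - g1 ** 2)


def spares_needed(t: float, eta: float, beta: float, p: float) -> float:
    """Approximate number of spares over operating time t (Eq. 20)."""
    mean, std = weibull_mean_std(eta, beta)
    zeta = std / mean  # coefficient of variation
    z = NormalDist().inv_cdf(p)  # inverse standard normal, assumed meaning of the inverse normal function
    return t / mean + (zeta ** 2 - 1.0) / 2.0 + zeta * math.sqrt(t / mean) * z


if __name__ == "__main__":
    # Hypothetical parameters: eta = 2000 h, beta = 3, 5600 h of operation, p = 0.95.
    print(spares_needed(t=5600.0, eta=2000.0, beta=3.0, p=0.95))
```

In practice the result would be rounded up to the next integer, since spare parts are ordered in whole units.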

6.2.2 Criteria and Conditions

• The system is restored to ‘‘as good as new’’ condition after the replacement [5].
• There is no correlation in the data [33].

6.2.3 Applicability

The Renewal Process Model is widely applied in the industrial world because it offers an accurate prediction of the number of required spare parts in a given planning horizon. This model can incorporate non-constant failure rates in its calculations; hence, it forecasts the number of needed spare parts more realistically. This model, nonetheless, requires more input data, which makes it quite expensive to implement; furthermore, a production plant may not have an adequate amount of recorded data to analyze. The underlying assumption of this model is that the system is restored to its original condition, or ‘‘as good as new’’, after the replacement, but this may not be the case, since the act of replacement may have undermined the efficiency of the system.

6.3 The Normal Distribution Model

6.3.1 Concept

The normal distribution is one of the best known and most widely used two-parameter distributions. It was discovered by De Moivre in 1733 as the limiting form of the binomial distribution for discrete variables. The normal distribution is often a good fit for the sizes of manufactured parts, populations of living organisms, magnitudes of certain electrical signals, and other natural phenomena. Its use in reliability evaluation is rather limited, except in the wear-out area [56].

$$P\{N(T, m) < k\} = P\{S(k, m) > T\} = 1 - \Phi\!\left(\frac{\left[T - (\mu k / m)\right] m}{\sigma \sqrt{k}}\right). \tag{23}$$

$P\{N(T, m) < k\}$: probability of having fewer than $k$ failures in $[0, T]$
$P\{S(k, m) > T\}$: probability that the time until the $k$th failure is greater than $T$
$\mu$: mean time to failure of one component
$m$: number of components
$\sigma$: standard deviation of the time to failure of a component
$\Phi(\cdot)$: cumulative standard normal distribution
$T$: planning horizon

It is possible to calculate the required number of spares for a given desired probability (reliability) $p$ from the following equation:

$$P\{S(k, m) > T\} = p. \tag{24}$$

The number $k$ can be found from $\frac{[T - (\mu k / m)]\, m}{\sigma \sqrt{k}} = -z_p$, where $z_p$ is obtained from a standard cumulative normal distribution table found in any statistics textbook:

$$k = \left( \frac{z_p \sigma}{2\mu} + \sqrt{\left(\frac{z_p \sigma}{2\mu}\right)^{2} + \frac{T m}{\mu}} \right)^{2}. \tag{25}$$
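The closed form of Eq. 25 can be checked against Eq. 23 numerically. The sketch below assumes that $z_p$ is the standard normal quantile $\Phi^{-1}(p)$ and that the argument of $\Phi$ in Eq. 23 equals $-z_p$ at the solution, a sign convention adopted here because it reproduces Eq. 25; the function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def normal_model_spares(T: float, m: int, mu: float, sigma: float, p: float) -> float:
    """Number of spares k from Eq. 25 (not rounded)."""
    z_p = NormalDist().inv_cdf(p)  # assumed definition of z_p
    a = z_p * sigma / (2.0 * mu)
    return (a + math.sqrt(a * a + T * m / mu)) ** 2


def prob_no_shortage(k: float, T: float, m: int, mu: float, sigma: float) -> float:
    """P{S(k, m) > T} from Eq. 23."""
    x = (T - mu * k / m) * m / (sigma * math.sqrt(k))
    return 1.0 - NormalDist().cdf(x)


if __name__ == "__main__":
    # Hypothetical inputs: 10 components, mean life 500 h, std 100 h, 1000 h horizon.
    k = normal_model_spares(T=1000.0, m=10, mu=500.0, sigma=100.0, p=0.95)
    print(math.ceil(k), prob_no_shortage(k, 1000.0, 10, 500.0, 100.0))
```

Substituting the unrounded $k$ back into Eq. 23 should return the target probability $p$, which is a quick consistency check on the two formulas.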


6.3.2 Criteria and Conditions

• The planning time horizon is large compared to the mean time to failure of the components.

6.3.3 Applicability

The Normal Distribution Model is applied when the planning time horizon is large in comparison to the mean time to failure of the components, which implies that there will be a large number of failures. The advantage of this method is that it incorporates a measure of the variability of demand for spares through the standard deviation of demand of an individual component, which might be unknown [16]. The disadvantage of this model is that, for expensive and highly reliable equipment, the conditions required for applying the approach are less likely to be met in a regular operational environment, since a large number of failures is unlikely with highly reliable components.

6.4 The Constant Interval Model

6.4.1 Concept

The constant interval model is a preventive measure that takes into account the spare parts required for each preventive replacement as well as the spares for any failure replacements [30]. Units or parts are replaced at scheduled times (for example, during scheduled shutdown maintenance of the machine) or at periodic intervals. However, if any unit fails during these scheduled intervals, additional failure replacements will also be made. These additional failure replacements will be rather small in number provided the scheduled replacement intervals are optimal or near optimal [58]:

$$EN(T, t_p) = \frac{T}{t_p} + H(t_p)\, \frac{T}{t_p}. \tag{26}$$

$EN(T, t_p)$: expected number of spare parts required over the planning horizon $T$ when preventive replacement occurs at time $t_p$
$T$: planning horizon
$t_p$: preventive replacement time
$T / t_p$: number of preventive replacements in the interval $(0, T)$
$H(t_p)\, T / t_p$: number of failure replacements in the interval $(0, T)$
$H(t_p)$: expected number of failures in an interval of length $t_p$

In general,

$$H(T) = \sum_{i=0}^{T-1} \left[1 + H(T - i - 1)\right] \int_{i}^{i+1} f(t)\, dt, \quad T \ge 1. \tag{27}$$

To determine the optimal preventive replacement time $t_p$, a model must be developed to minimize the total downtime per unit time $D(t_p)$ [30]:

$$D(t_p) = \frac{H(t_p)\, T_f + T_p}{t_p + T_p}. \tag{28}$$

$T_p$: downtime due to a preventive replacement
$T_f$: time required to make a failure replacement
$H(t_p)\, T_f$: expected downtime due to failures

There will be a set of predetermined values for $t_p$ with known values for $T_p$, $T_f$ and $H(t_p)$. The corresponding value of $D(t_p)$ can therefore be calculated for every $t_p$; the $t_p$ that results in the lowest $D(t_p)$ is the optimal replacement time.
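The search over candidate values of $t_p$ can be sketched as follows. This is an illustrative sketch only: it assumes integer time periods, a Weibull failure distribution, and the starting value $H(0) = 0$ for the recursion of Eq. 27, none of which is prescribed above; all names and numbers are hypothetical.

```python
import math


def weibull_cdf(t: float, eta: float, beta: float) -> float:
    """Assumed failure distribution F(t); any other CDF could be substituted."""
    return 1.0 - math.exp(-((t / eta) ** beta)) if t > 0 else 0.0


def expected_failures(t_max: int, eta: float, beta: float) -> list[float]:
    """H(0), ..., H(t_max) from the recursion of Eq. 27, assuming H(0) = 0."""
    H = [0.0] * (t_max + 1)
    for T in range(1, t_max + 1):
        H[T] = sum(
            (1.0 + H[T - i - 1])
            * (weibull_cdf(i + 1, eta, beta) - weibull_cdf(i, eta, beta))  # integral of f over [i, i+1]
            for i in range(T)
        )
    return H


def downtime_per_unit_time(tp: int, H: list[float], Tf: float, Tp: float) -> float:
    """D(tp) from Eq. 28."""
    return (H[tp] * Tf + Tp) / (tp + Tp)


def best_interval(t_max: int, eta: float, beta: float, Tf: float, Tp: float) -> int:
    """Candidate tp in 1..t_max with the lowest downtime per unit time."""
    H = expected_failures(t_max, eta, beta)
    return min(range(1, t_max + 1), key=lambda tp: downtime_per_unit_time(tp, H, Tf, Tp))


def expected_spares(T: float, tp: int, H: list[float]) -> float:
    """EN(T, tp) from Eq. 26."""
    return T / tp * (1.0 + H[tp])


if __name__ == "__main__":
    # Hypothetical inputs in weeks: eta = 20, beta = 3, Tf = 2, Tp = 0.5.
    H = expected_failures(52, 20.0, 3.0)
    tp = best_interval(52, 20.0, 3.0, Tf=2.0, Tp=0.5)
    print(tp, expected_spares(T=520.0, tp=tp, H=H))
```

Once the optimal interval is known, the same table of $H$ values gives the expected number of spares over the planning horizon through Eq. 26.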

6.4.2 Criteria and Conditions

• All the parts are replaced at a predetermined time regardless of their current condition.
• The distribution of failure of a unit is known by investigating its life data.
• Replacement time is negligible.

6.4.3 Applicability

The advantage of this model is that it replaces all components under preventive maintenance at specific time intervals. For example, the brake pads of haul trucks in mining may all be replaced every six months, and the company may shut down its entire operation for a few days to carry out the maintenance. In this case, there is no need to keep track of each component, which allows the company to save money. The disadvantage of this method is that, since everything is replaced at a set time, a recently replaced component may shortly be replaced again. For instance, if a brake pad fails and is replaced a few days before the scheduled preventive maintenance, the new brake pad will only serve for a short time before being replaced by another new pad. This could be very uneconomical if the spare part is expensive. The constant interval model should, therefore, be used for fast-moving, cheap, and highly consumable components, not for expensive and critical parts.

6.5 The Age-Based Preventive Replacement Model

6.5.1 Concept

In the case of the age-based preventive replacement model, we calculate the expected time to replacement and divide the planning horizon by this time [30]:

$$EN(T, t_p) = \frac{T}{t_p R(t_p) + M(t_p)\left[1 - R(t_p)\right]}. \tag{29}$$

$t_p$: preventive replacement time
$T$: planning horizon
$EN(T, t_p)$: expected number of spare parts required over the planning horizon $T$ when preventive replacement occurs at time $t_p$
$R(t_p)$: probability of a preventive replacement cycle
$M(t_p)$: mean time to failure when preventive replacement occurs at age $t_p$

$$\text{Number of spares required due to preventive replacement} = \frac{T}{t_p R(t_p)}. \tag{30}$$

$$\text{Number of spares required due to corrective replacement} = \frac{T}{M(t_p)\left[1 - R(t_p)\right]}. \tag{31}$$
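Equation 29 can be evaluated numerically once a lifetime distribution is chosen. The sketch below is an illustration under explicit assumptions: lifetimes are taken to be Weibull, $R(t_p)$ is taken to be the reliability at age $t_p$, and $M(t_p)$ is taken to be the mean of the lifetime distribution truncated at $t_p$, computed with a midpoint rule. These interpretations, and all names and values, are assumptions rather than definitions given in the text.

```python
import math


def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """Assumed R(t): probability that a unit survives to age t."""
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t: float, eta: float, beta: float) -> float:
    return (beta / eta) * (t / eta) ** (beta - 1.0) * math.exp(-((t / eta) ** beta))


def truncated_mean(tp: float, eta: float, beta: float, steps: int = 20000) -> float:
    """Assumed M(tp): mean life of units that fail before age tp (midpoint rule)."""
    h = tp / steps
    integral = h * sum(
        (i + 0.5) * h * weibull_pdf((i + 0.5) * h, eta, beta) for i in range(steps)
    )
    return integral / (1.0 - weibull_reliability(tp, eta, beta))


def expected_spares_age_based(T: float, tp: float, eta: float, beta: float) -> float:
    """EN(T, tp) from Eq. 29."""
    R = weibull_reliability(tp, eta, beta)
    M = truncated_mean(tp, eta, beta)
    return T / (tp * R + M * (1.0 - R))


if __name__ == "__main__":
    # Hypothetical inputs: eta = 1000 h, beta = 2.5, tp = 600 h, 10 000 h horizon.
    print(expected_spares_age_based(T=10000.0, tp=600.0, eta=1000.0, beta=2.5))
```

Under these assumptions the denominator of Eq. 29 is the expected length of one replacement cycle, so shortening $t_p$ always increases the expected number of spares consumed.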

6.5.2 Criteria and Conditions

• The aging and condition of the components are observed.
• The distribution of failure of a unit is known by investigating its life data.
• The age of a unit is defined as its operating time, not calendar time.
• Failures can be detected instantly.
• Replacement time is negligible.
• A newly installed unit begins to operate immediately.


6.5.3 Applicability

The advantage of this method is that it keeps track of the history of every component; therefore, it reveals the age of the component and avoids the mistake of replacing a new component. When this model is adopted, the company does not need to shut down for days to carry out preventive maintenance, because the maintenance is performed on an ongoing basis. The disadvantage lies in the fact that an extensive amount of data must be collected and managed to ensure that the information on every component is known. This is very costly, and the model is thus only applicable to more expensive and critical parts.

6.6 Bayesian Approach for the Special Case of a Lack of Data

6.6.1 Concept

In the real world, a complete set of data to determine the failure rate of a system is often difficult and might even be impossible to acquire, especially in the case of a new system that has not operated long enough for an adequate amount of data to be collected. Nevertheless, the Bayesian approach allows prior information to be incorporated into the estimation of the parameters of interest. The Bayesian approach incorporates:

1. Previous system estimates.
2. Generic information coming from actual data from similar systems.
3. Generic information from reliability sources.
4. Expert judgment and belief.

Previous information and experience are classified as the prior; this is combined with the information collected after the operation of the system, termed the posterior. Together, they provide a more accurate estimate of the values of the desired parameters. When there is a lack of data, the failure rate is uncertain; therefore, the Gamma distribution can be used with parameters $\alpha$ and $\beta$. The values of $\alpha$ and $\beta$ here are derived from previous information, making this an a priori estimate:

$$\alpha = \frac{[E(\lambda)]^{2}}{\mathrm{Var}(\lambda)}, \tag{32}$$

$$\beta = \frac{E(\lambda)}{\mathrm{Var}(\lambda)}. \tag{33}$$

$E(\lambda)$: mean of the rate of demand
$\mathrm{Var}(\lambda)$: variance of the rate of demand


The posterior distribution of the failure rate reflects additional information gathered about this rate through observation of the failure/demand process [16]. The fundamental assumptions of the posterior distribution include:

1. The arrival of demands from a single component follows a Poisson process.
2. A Poisson distribution is assumed for the superposition of the demand processes, with rate $m\lambda$, where $m$ is the number of processes being superposed.

The information gained from the posterior observation can be used to update the original values of $\alpha$ and $\beta$ as well as the mean demand/failure rate:

$$\alpha' = \alpha + N, \tag{34}$$

$$\beta' = \beta + T_{int}, \tag{35}$$

$$E[\lambda] = \frac{\alpha'}{\beta'},$$

where $N$ is the average number of failures per unit in the period of interest and $T_{int}$ is the predetermined period of interest.
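The prior and posterior calculations of Eqs. 32 to 35 amount to a few lines of arithmetic. The following is a minimal sketch with hypothetical function names and illustrative numbers (a prior mean rate of 0.002 failures per hour with variance 1e-6, then 3 failures observed over 1000 hours).

```python
def gamma_prior(mean_rate: float, var_rate: float) -> tuple[float, float]:
    """Prior parameters (alpha, beta) from Eqs. 32 and 33."""
    return mean_rate ** 2 / var_rate, mean_rate / var_rate


def gamma_posterior(alpha: float, beta: float, n_failures: float, t_int: float) -> tuple[float, float]:
    """Updated parameters (alpha', beta') from Eqs. 34 and 35."""
    return alpha + n_failures, beta + t_int


def posterior_mean_rate(alpha: float, beta: float, n_failures: float, t_int: float) -> float:
    """Updated mean demand/failure rate E[lambda] = alpha' / beta'."""
    a_post, b_post = gamma_posterior(alpha, beta, n_failures, t_int)
    return a_post / b_post


if __name__ == "__main__":
    # Hypothetical prior belief and observation, for illustration only.
    alpha, beta = gamma_prior(mean_rate=0.002, var_rate=1e-6)
    print(posterior_mean_rate(alpha, beta, n_failures=3, t_int=1000.0))
```

The updated mean lies between the prior mean and the observed rate, and moves toward the observed rate as the observation period grows.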

6.6.2 Applicability

The advantage of the Bayesian approach is that it can provide accurate estimates of the number of spares required even when little information is available. This is useful when a new component is installed and few data are available for analyzing its failure rate. In such a scenario, analysts can use information from the old component, either an older version of the new component or the same component worn down over time. The disadvantage of the Bayesian approach is that it requires a prior distribution to be specified for all unknown parameters. When there is concrete prior knowledge about the parameters, this can be done. In many cases, however, prior knowledge is either vague or non-existent, making it difficult to specify a unique prior distribution. Different people, having different opinions, may suggest different priors and arrive at different answers. The question of ‘‘objectivity’’ is a concern here. Engineers often disagree on the interpretation of (classical) statistical conclusions, possibly because of the different ‘‘prior information’’ they have.

6.7 Proportional Hazards Model (PHM)

6.7.1 Concept

Every spare parts estimation method requires the value of a hazard rate, which depicts the instantaneous failure rate of a system. Manufacturers often provide information on the required number of spare parts for each component of a system for a stated period of time, but their predictions are usually inaccurate, as the operating environment of the system plays a significant role in its failure rate. The Proportional Hazards Model (PHM) takes these environmental impacts as covariates, making it the most useful model for determining the actual hazard rate:

$$\lambda(t) = \lambda_0(t) \exp\!\left(\sum_{j=1}^{n} \alpha_j z_j\right). \tag{36}$$

The PHM is composed of two parts. The first component is a time-dependent baseline hazard rate, which can be modeled in terms of a Weibull distribution with shape parameter $\beta$ and scale parameter $\eta$:

$$\lambda_0(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta - 1}. \tag{37}$$

The second component is an exponential function that accounts for the effects of covariates, including temperature, operator skill, dust, etc. The covariates may increase or decrease the baseline hazard rate depending on the nature of the particular covariate:

$$\lambda(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta - 1} \exp\!\left(\sum_{j=1}^{n} \alpha_j z_j\right). \tag{38}$$

Here $z_j$ is a row vector consisting of the covariates and $\alpha_j$ is a column vector consisting of the regression parameters. The covariates $z_j$ are associated with the system, and the $\alpha_j$ are the unknown parameters of the model defining the effects of the covariates (Ghodrati, 2005). The partial likelihood method is normally used to estimate the values of $\alpha_j$, and the details of this estimation can be found in many statistics books. The basic assumptions of the Proportional Hazards Model are the following [38]:

1. All influential covariates are included in the model.
2. The ratio of any two hazard rates is constant with respect to time; therefore, the baseline hazard rate is identical, and the covariates have a multiplicative effect on the hazard rate.

A goodness-of-fit test is frequently applied to evaluate the validity of the proportionality assumption. The plots of the logarithm of the estimated cumulative hazard rate against time, with the covariate in question taken as strata, should simply be shifted relative to one another by an additive constant, namely the estimate of the regression parameter of that covariate [22].
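To make the multiplicative structure of Eq. 38 concrete, the sketch below evaluates a Weibull baseline hazard and scales it by the covariate term. The function names, parameter values, and the single binary covariate (for example, a dusty versus a clean environment) are hypothetical; in practice the regression parameters would be estimated from data.

```python
import math


def weibull_baseline_hazard(t: float, eta: float, beta: float) -> float:
    """Baseline hazard rate of Eq. 37."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def phm_hazard(t: float, eta: float, beta: float,
               alphas: list[float], covariates: list[float]) -> float:
    """Hazard rate of Eq. 38: baseline hazard times exp(sum of alpha_j * z_j)."""
    score = sum(a * z for a, z in zip(alphas, covariates))
    return weibull_baseline_hazard(t, eta, beta) * math.exp(score)


if __name__ == "__main__":
    # Hypothetical values: one covariate with regression parameter 0.7.
    for t in (100.0, 1000.0):
        clean = phm_hazard(t, 2000.0, 2.0, [0.7], [0.0])
        dusty = phm_hazard(t, 2000.0, 2.0, [0.7], [1.0])
        print(t, dusty / clean)  # the ratio does not depend on t
```

The constant ratio printed for both times is the proportionality property stated in the second assumption.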


6.7.2 Applicability

The advantage of the PHM is clear: it can incorporate the effects of covariates into the estimation of the hazard rate of the system components. Covariates often have dramatic effects on the failure rate of the system; for instance, a component is more prone to failure when it works in the desert than indoors in a controlled climate. Another significant advantage of the PHM is that both of its components can take different forms. The baseline hazard rate can be expressed as a Weibull, normal, exponential, or even hyper-exponential distribution [30]. The covariate term can take on such forms as exponential, logistic, inverse linear, and linear. This flexibility gives the PHM the ability to model a wide variety of data, making it suitable for many applications.

6.8 Conclusions

Many different models can be used for spare parts estimation. Each has its own advantages and disadvantages, and a model should be selected according to the conditions of the individual case. The renewal process model, nonetheless, is the most comprehensive model, because it can deal with non-constant failure rate scenarios. Over a component’s lifetime, its failure rate usually follows a bathtub curve. The Poisson process model is appropriate for the useful lifetime of a component, since it assumes a constant hazard rate, and the normal distribution model is suitable for the wear-out region of the bathtub curve (the right-hand tail). The renewal process model, meanwhile, can be used to model the entire life of a component, including the infant-mortality period, the useful life, and the wear-out period. Although the renewal process model is more complicated than the other models, its comprehensiveness outweighs its shortcomings. When hampered by a lack of data, the Bayesian approach can be used to forecast the number of spare parts required by combining previous experience with current information. The updating of parameters is a simple and intuitive process. The Proportional Hazards Model (PHM) incorporates the effects of covariates in the calculation of the hazard rate, thus predicting the failure behavior of a system more accurately.

7 Realistic Spare Parts Estimation (Forecasting)

As mentioned before, a great deal of research has considered the general area of spare provisioning, especially spare parts logistics. Most of these studies deal with repairable systems and spares inventory management, and use a queuing theory approach to determine the spare parts stock needed to ensure a specified availability of the system. The following features have been extensively discussed:

• Mostly repairable systems.
• Queuing theory, with the demand rate $\lambda$ and the repair rate $\mu$.

There is a problem with this, since the failure rate is based on the operational time to failure, while the demand rate (used in inventory models) and the repair rate (used in availability models) are based on calendar time. This distinction has not been dealt with. These queuing theory based models primarily deal with constant failure rates and constant repair rates (exponential time to failure and time to repair), although this assumption is restrictive, particularly for mechanical parts. Mechanical parts often fail due to aging over time. Aging or wear-out mechanisms, such as creep, fatigue, corrosion, oxidation, diffusion, and wear, are all time-dependent processes. Meanwhile, quantitative techniques based on reliability theory have been used for estimating the failure rates of the components to be purchased and/or stocked [30, 23, 61]. This failure rate has been used to determine more accurate demand rates.

Most of the studies in the spare parts domain have been in inventory management, guaranteeing the availability of systems/machines by ensuring that spare parts are always available on demand. The estimation and calculation of the required number of spare parts considering their techno-economical characteristics (reliability, maintainability, life cycle cost, etc.) have rarely been considered. In addition, none of the surveyed literature dealing with required spare parts calculations based on the reliability characteristics of a product has considered the operating environment as a factor influencing reliability (e.g. Jardine and Tsang [30]; Lewis [47]).
The estimates are not sufficiently accurate because, in real-life situations, as mentioned earlier, several factors other than time influence the reliability characteristics of parts/systems. By taking these factors (covariates) into account in our calculations, we can treat the term $\exp(\alpha z)$ in the hazard rate function $h(t, z)$ as a constant coefficient proportionate to the actual working conditions. Then,

$$R(t) = \left[\exp\!\left(-\int_{0}^{t} \lambda_0(x)\, dx\right)\right]^{\exp\left(\sum_{j=1}^{n} \alpha_j z_j\right)} = \left[\exp(-\Lambda_0(t))\right]^{\exp\left(\sum_{j=1}^{n} \alpha_j z_j\right)}. \tag{39}$$

Therefore, it appears reasonable to take the operating environment into account when studying and analyzing system reliability for the purpose of estimating and forecasting the required spare parts, an aspect that has been almost neglected until now.


7.1 Product Reliability Characteristics and Operating Environment Based Spare Parts Estimation

As mentioned earlier, the environmental conditions in which equipment is to be operated, such as temperature, humidity, and dust, often have considerable influence on the product’s reliability characteristics [9, 40]. The operating environment and its factors, represented by covariates, should be seriously considered when dimensioning product support and drawing up service delivery performance strategies, as these factors are likely to have a significant impact upon the operational/maintenance cost and service quality. Some important examples of operating environment factors (covariates) are:

• Working environment:
  – Climatic conditions such as the temperature and humidity in which a system will be working.
  – Physical environment factors such as dust, smoke, fumes, corrosive agents, and the like.
• User characteristics such as operator skill, education, culture, and language.
• Operating place or location: this factor refers to workplace settings such as outdoor (free) or closed (surrounded) spaces, the branch of industry that will be using the product, and/or other characteristics of the area (such as mines) where a product will be used.
• Level of application: the system may be intended to have a major/main purpose, a minor or auxiliary purpose, or even a standby purpose in an operational setup.
• Work time and period of operation: planning may call for a product to be in continuous or part-time operation.

The covariates influence the hazard (failure) rate of the system (including its components), so that the observed hazard rate may be either greater or smaller than the baseline hazard rate (Fig. 6). For better estimation of the reliability characteristics, the use of regression models is suggested because of their ability to include covariates. As noted, the proportional hazards model (PHM) was introduced by Cox [13] and is a regression-type model.
The PHM complements the set of tools used in reliability analysis and has some particularly advantageous features [39, 32]. It is classified as a multiplicative and mostly non-parametric regression model which considers covariates and which assumes that the hazard rate of a system/component is the product of the baseline hazard rate $\lambda_0(t)$, dependent on time only, and a positive functional term $\psi(z, \alpha)$, basically independent of time, incorporating the effects of a number of covariates such as temperature, pressure, and changes in design. Thus:

$$\lambda(t) = \lambda(t, z) = \lambda_0(t)\, \psi(z, \alpha), \tag{40}$$

where $z$ is a row vector consisting of the covariates, and $\alpha$ is a column vector consisting of the regression parameters.

Two popular mathematical models used in spare parts provisioning are based on renewal theory and on the homogeneous Poisson process as a special case of a renewal process. The homogeneous Poisson process can be used whenever the failure rate is constant (meaning that each failure mode and the other factors which influence the demand should follow the exponential distribution). Whenever the failure rate is not constant, we use renewal theory to forecast demands for spares. It is important to note that this statement is valid only for non-repairable spares (components).

In addition, when a system that comprises several different non-repairable components fails due to the failure of any of the components constituting the system, the failed item is replaced with a new one. In other words, minimal repair is carried out on the system, and the failure rate of the system after the replacement of the failed component is the same as before the failure. Therefore, in this category of system, failures occur according to a non-stationary Poisson process [8]. The failure rate associated with the failure distribution function is a reliability measure, used in this case for calculating the average required number of spare parts for a defined time horizon. However, for each non-repairable item/component in a system, failures, and hence replacements, occur over time according to a renewal process, as each failed item is replaced by a new one [8]. Put otherwise, the time to failure (or time between replacements) is used as a reliability measure for estimating the number of required spare parts (Fig. 7).

Fig. 7 Comparison between the failure rate of a single component and that of a system

7.1.1 Poisson Process Model for Forecasting Required Spare Parts

With the assumption that parts/components are replaced upon failure, homogeneous Poisson process models can be used when the time to failure follows an exponential distribution with a constant mean value, i.e. when the failure rate is constant. A constant failure rate could mean that the number of occurrences per time unit does not vary over time, but normally means that the conditional probability of failure per time unit is constant. Consequently, items with an age-related failure mechanism cannot be modeled using the Poisson process. However, the Poisson process can be used to model higher indenture spares such as Line Replaceable Units (LRUs) in the steady state. For an LRU with a large number of components, each of which can be modeled by an independent renewal process, theorems by Palm [55] and Drenick [19] state that in the steady state the time between removals follows an exponential distribution, i.e. the demand follows a Poisson process.

The exponential reliability model is a simple and applicable model to use, especially when the effects of covariates are considered in the study of non-repairable elements/systems. In this case, the total number of spare parts required, with the assumption of an exponentially distributed lifetime, can be calculated from the following equation (see Billinton and Allan [4] and Kumar et al. [42] for background information):

$$1 - P(t) = \exp(-\lambda t) \sum_{k=0}^{n} \frac{(\lambda t)^{k}}{k!}, \tag{41}$$

where

$P(t)$: probability of a shortage of spare parts ($1 - P(t)$ is the confidence level of spare part availability, or service level)
$\lambda$: failure rate of the part in question (taking the effect of covariates into account)
$t$: operation time of the system
$n$: total number of spare parts available in period $t$

This equation is based on the Poisson distribution, which gives the probability that an isolated event occurs a specified number of times in a given interval of time. As mentioned before, one requirement of the Poisson distribution is that the hazard rate be constant. In such circumstances, the hazard rate is generally termed the failure rate.

If $q$ represents the number of identical parts in use at the same time, then $q$ enters the equation as a multiplier, i.e. $\lambda t$ is replaced by $\lambda t q$. In this way, the calculated $n$ represents the total required number of spare parts for the whole system.
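The effect of the operating environment on the shortage probability of Eq. 41 can be illustrated as follows. This sketch assumes that the covariates act on a constant baseline failure rate through the PHM factor $\exp(\sum \alpha_j z_j)$; the function names, the baseline rate, and the regression parameter are hypothetical.

```python
import math


def adjusted_rate(lam0: float, alphas: list[float], covariates: list[float]) -> float:
    """Constant failure rate scaled by the covariate term (assumed PHM form)."""
    return lam0 * math.exp(sum(a * z for a, z in zip(alphas, covariates)))


def shortage_probability(n: int, lam: float, t: float, q: int = 1) -> float:
    """P(t) from Eq. 41, with lambda*t multiplied by q parts in simultaneous use."""
    a = lam * t * q
    term = math.exp(-a)  # k = 0 term
    cdf = term
    for k in range(1, n + 1):
        term *= a / k  # builds (lambda*t*q)^k / k! incrementally
        cdf += term
    return 1.0 - cdf


if __name__ == "__main__":
    # Hypothetical case: 10 identical parts, 5000 h, 9 spares in stock.
    lam0 = 1e-4
    lam_env = adjusted_rate(lam0, alphas=[0.5], covariates=[1.0])
    print(shortage_probability(9, lam0, 5000.0, q=10))
    print(shortage_probability(9, lam_env, 5000.0, q=10))
```

Comparing the two printed values shows how a stock level that looks adequate under the baseline failure rate can fall short once an adverse covariate raises the rate.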

7.1.2 Renewal Process Model for Forecasting Required Spare Parts

Renewal theory was originally used to analyze the replacement of equipment upon failure, and to find the distribution and the mean of the number of replacements [42]. It is the most appropriate tool for predicting the demand for consumable items.

Generally, with analytical models, function evaluation is faster, making optimization feasible. The classes of analytical models that we compare are based on general renewal processes, because component (non-repairable part) failure processes are naturally described by these processes. The theory of renewal processes is well developed [23, 57]. An (ordinary) renewal process is characterized by one entity, the distribution of the time between renewals, denoted by $F(t)$. If $N(t)$ represents the number of renewals (in our case the number of failures) that occur by time $t$, and if we assume that the time-to-failure random variables $X_i$, $i \ge 1$, are independent and have a common distribution $F(t)$, then the probability distribution of the number of failures is given by:

$$P\{N(t) = n\} = F^{n}(t) - F^{n+1}(t), \tag{42}$$

where $F^{n}(t)$ is the $n$-fold convolution of $F(t)$ and is given by:

$$F^{n}(t) = \int_{0}^{t} F^{n-1}(t - x)\, dF(x). \tag{43}$$

Here $F^{n}(t)$ denotes the probability that the $n$th failure will occur by time $t$. The expected number of failures, $M(t)$, during a period of length $t$ is given by:

$$M(t) = \sum_{n=1}^{\infty} F^{n}(t). \tag{44}$$

The above equation is known as the renewal function and gives the expected number of renewals during $(0, t)$; it can also be written as (Blischke and Murthy [8]):

$$M(t) = F(t) + \int_{0}^{t} M(t - x) f(x)\, dx. \tag{45}$$

For instance, for an exponential time-to-failure distribution:

$$F(t) = 1 - \exp(-\lambda t). \tag{46}$$

Because the Weibull reliability model is an appropriate model for characterizing the life of machine parts (mechanical systems), by substituting the Weibull cumulative distribution function for the time to failure, we have:

$$F(t) = 1 - \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]. \tag{47}$$

Consider replacements of a part having an average time to failure denoted by $\bar{T}$ and a standard deviation of the time to failure denoted by $\sigma(T)$ (so that $\zeta = \sigma(T)/\bar{T}$ denotes the coefficient of variation of the time to failure). If the operation time $t$ of the system or machine in which this part is installed is quite long, and several replacements need to be made during this period, the average number of failures $E[N(t)] = M(t)$ will stabilize to the asymptotic value of the renewal function [23]:

$$N_t = M(t) = E[N(t)] = \frac{t}{\bar{T}} + \frac{\zeta^{2} - 1}{2}, \tag{48}$$

which is the average number of failures in time $t$, and the corresponding failure intensity or renewal rate function is given by:

$$m(t) = \frac{dM(t)}{dt} = \frac{dE[N(t)]}{dt} = \frac{1}{\bar{T}}. \tag{49}$$

The difficulty in obtaining $M(t)$ (the renewal function, i.e. the mean number of renewals) from Eq. 45 is that it appears on both sides of the equation. If Eq. 45 can be approximated so that $M(t)$ appears only on the left-hand side and the right-hand side contains only known or prescribed functions, then we have an approximate solution that may be obtained either analytically or computationally. Different approximations have been suggested by various researchers and research centers (Spearman [60]; Smeitink and Dekker [59]; Blischke and Murthy [8]; Cui and Xie [15]; NTNU [52]). The approximation method used here to determine the number of failures and the average number of required spare parts (the method suggested by Gnedenko) was compared with several of these suggested methods (for $t = 5600$ h). The results (presented in Table 1) were very close to each other, indicating the acceptability of our results.

Table 1 The mean number of renewals (renewal function) for the Weibull distribution with different values of the shape parameter and different approximation methods

         M_Spearman   M_Murthy   M_NTNU   M_Dekker   M_Gnedenko
β = 1    2.800014     2.80003    2.8000   2.799192   2.8026
β = 3    2.375542     2.37554    2.3678   2.401217   2.3641
β = 5    2.271713     2.37171    2.3037   2.261201   2.2447

The standard deviation of the number of failures in time $t$ is:

$$\sigma[N(t)] = \zeta \sqrt{\frac{t}{\bar{T}}}. \tag{50}$$

If the time $t$ in the above equations, representing a planning horizon, is large, then $N(t)$ is approximately normally distributed (based on a central limit theorem) with mean $\bar{N}(t)$. Therefore, the approximate number of spares $N_t$ needed during this period with a probability of shortage of $1 - p$ is given by:

$$N_t = \frac{t}{\bar{T}} + \frac{\zeta^2 - 1}{2} + \zeta \sqrt{\frac{t}{\bar{T}}}\, \Phi^{-1}(p), \qquad (51)$$

Efficient Product Support


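where $\Phi^{-1}(p)$ is the inverse of the standard normal distribution function, available in probability textbooks. Assuming the Weibull reliability model to be a versatile model for characterizing the life of mechanical parts, and integrating the effect of covariates through the proportional hazard model, we have:

$$\lambda(t) = \frac{\beta_0}{\eta_0}\left(\frac{t}{\eta_0}\right)^{\beta_0 - 1} \exp\left(\sum_{j=1}^{n} \alpha_j z_j\right) = \frac{\beta_0 t^{\beta_0 - 1}}{\eta_0^{\beta_0}} \exp\left(\sum_{j=1}^{n} \alpha_j z_j\right) \qquad (52)$$

$$\lambda(t) = \frac{\beta_0 t^{\beta_0 - 1}}{\eta_0^{\beta_0} \exp\left(-\sum_{j=1}^{n} \alpha_j z_j\right)} \qquad (53)$$

$$\lambda(t) = \frac{\beta_0 t^{\beta_0 - 1}}{\left[\eta_0 \exp\left(-\frac{1}{\beta_0}\sum_{j=1}^{n} \alpha_j z_j\right)\right]^{\beta_0}}. \qquad (54)$$

This equation has the form of a Weibull hazard rate, with the shape parameter and scale parameter given by:

$$\begin{cases} \beta = \beta_0 \\ \eta = \eta_0 \left[\exp\left(\sum_{j=1}^{n} \alpha_j z_j\right)\right]^{-1/\beta_0} \end{cases} \qquad (55)$$

The reliability model obtained by assuming that $\beta_0$ is the baseline shape parameter and $\eta_0$ is the baseline scale parameter can be defined as:

$$F(t) = 1 - R(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] = 1 - \exp\left(-\left\{\frac{t}{\eta_0 \left[\exp\left(\sum_{j=1}^{n} \alpha_j z_j\right)\right]^{-1/\beta_0}}\right\}^{\beta_0}\right) \qquad (56)$$

Thus, it can be concluded that the influencing covariates change the scale parameter only, while the shape parameter remains virtually unchanged. Here $\beta_0$ and $\eta_0$ are the initial (baseline) shape and scale parameters, respectively, in the Weibull distribution. The coefficient of variation of the time to failure can be calculated from the existing shape and scale parameters as follows:

$$\zeta = \frac{\sigma(T)}{\bar{T}}, \qquad (57)$$

where

$$\bar{T} = \eta\, \Gamma\left(1 + \frac{1}{\beta}\right), \qquad (58)$$

$$\sigma(T) = \eta \sqrt{\Gamma\left(1 + \frac{2}{\beta}\right) - \Gamma^2\left(1 + \frac{1}{\beta}\right)}. \qquad (59)$$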
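The chain from covariates to a spare parts figure (Eq. 55 for the adjusted scale parameter, Eqs. 57 to 59 for the moments, Eq. 51 for the number of spares) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the baseline values, the single covariate and the confidence level $p = 0.95$ are chosen for demonstration rather than taken from the chapter.

```python
import math
from statistics import NormalDist


def phm_weibull_params(beta0, eta0, alphas, zs):
    """Eq. 55: covariates rescale the scale parameter; the shape is unchanged."""
    linear_term = sum(a * z for a, z in zip(alphas, zs))
    return beta0, eta0 * math.exp(-linear_term / beta0)


def weibull_mean_std(beta, eta):
    """Eqs. 58 and 59: mean and standard deviation of the time to failure."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return eta * g1, eta * math.sqrt(g2 - g1 * g1)


def spares_needed(t, beta, eta, p):
    """Eq. 51: spares for horizon t with shortage probability 1 - p."""
    mean, std = weibull_mean_std(beta, eta)
    zeta = std / mean  # coefficient of variation, Eq. 57
    return (t / mean + (zeta ** 2 - 1.0) / 2.0
            + zeta * math.sqrt(t / mean) * NormalDist().inv_cdf(p))


# Illustrative inputs only (assumptions, not data from the chapter).
beta0 = 3.0
eta0 = 3000.0 / math.gamma(1.0 + 1.0 / beta0)  # baseline mean life of 3000 hrs
alphas, zs = [math.log(1.5)], [1.0]            # one covariate, exp(sum) = 1.5
beta, eta = phm_weibull_params(beta0, eta0, alphas, zs)
print(math.ceil(spares_needed(5600.0, beta, eta, p=0.95)))
```

Rounding the result up with `math.ceil` is a design choice of this sketch, made because spare parts are ordered in whole units; Eq. 51 itself returns a real number.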

8 Study and Analysis of Exponential and Weibull Models in Spare Parts Estimation

There are advantages and disadvantages to implementing the exponential and/or the Weibull renewal processes in spare parts estimation. On the one hand, the exponential model is simple and easy to implement, both for the required data collection and for the analysis. On the other hand, the approximated Weibull renewal process model is more appropriate for calculating the number of required spare parts accurately. Below are the results of a comparison of the two methods, based on the implemented calculation process for different values of the baseline mean time to failure ($MTTF_0$), the shape parameter ($\beta$), and the effect of covariates [$\text{Co.Eff.} = \exp\left(\sum_{j=1}^{n} \alpha_j z_j\right)$, taken as an assumed fixed coefficient]. In the implemented calculation process, we use both the exact exponential model and the approximated Weibull renewal model to estimate the average required number of spare parts in a specified planning horizon.

• In both the exact exponential and the approximated Weibull models, the number of required spare parts decreases as the baseline mean time to failure increases. The ratio of the number of spare parts estimated through the exponential method to that estimated through the Weibull method is approximately two to one (on average). In addition, the slope of the curves is steep before 3000 hrs and subsides afterwards. We can therefore conclude that for the working period before 3000 hrs, it is more beneficial to use the Weibull model, which is more accurate (there is a big difference in the number of required spare parts compared to the exponential model). For the period after 3000 hrs, the exponential model, which is easy to use and implement, can replace the Weibull model. From another point of view, for a less expensive component, it might be more economical to use the exponential model than the accurate but costly (more time-consuming) Weibull model. (Figure 8 shows the average number of required spare parts for different values of the baseline mean time to failure. The shape parameter ($\beta = 3.5$), the coefficient of covariates ($\text{Co.Eff.} = 1.5$) and the system operation time ($t = 5600$ hrs) were assumed to be constant.)

Fig. 8 The plot for the average number of required spare parts against the baseline mean time to failure for the Weibull and exponential methods

• The exponential model is influenced more by covariates than the Weibull renewal model, but only because the covariates directly multiply the failure rate in the exponential model. In the Weibull model, the covariates affect only the scale parameter as a factor in the failure rate (Fig. 9, where $\beta = 3$, $MTTF_0 = 3000$ hrs and $t = 5600$ hrs were assumed to be constant). When the effect of covariates is equal to one, there is no environmental/external influence on the number of failures and consequently on the number of required spare parts. A value of the effect of covariates greater than one indicates a worse operating condition for the system, whereas a value less than one represents a good/improved working environment.

Fig. 9 The plot for the average number of required spare parts against the effect factor of covariates for the Weibull and the exponential methods


• The multi-comparison of the effects of covariates and the $\beta$ value (Fig. 10) shows that the covariates have less influence on the average required number of spares for components with a high $\beta$, thus confirming our previous statement about covariates.

Fig. 10 The multi-analysis of the effects of covariates and the $\beta$ value on the average number of required spare parts

• With an increasing $\beta$ value, the average number of required spare parts decreases. This is predictable (see footnote 1): an increasing $\beta$ means that the component failure rate increases. The system failure intensity is not affected in the same way, meaning that for $\beta \le 1.5$ the exponential model is more suitable in terms of application and analysis costs (in Fig. 11, $\text{Co.Eff.} = 1.5$, $MTTF_0 = 3000$ hrs and $t = 5600$ hrs were assumed to be constant; see also Fig. 12).

Footnote 1: The curve of the probability density function moves to the right with an increasing $\beta$ value (Fig. 12). This means that the probability of failure at the beginning of operation is higher for a low $\beta$ value than for a high $\beta$ value. In other words, with an increasing $\beta$ value, the time to the first failure increases as well, which is the case considered here for non-repairable modules. However, with regard to the useful life of equipment, if the equipment’s life tends to be infinite, the system/component with a high $\beta$ value will need more spares. If the manufacturer offers a longer warranty period just by increasing the $\beta$ value of components, this may result in the customer being dissatisfied after the warranty period due to an increased number of failures. Therefore, there is an important trade-off between the $\beta$ value, the life length of the equipment and the warranty cost.


Fig. 11 The plot for the average number of required spare parts against the $\beta$ value

Fig. 12 The plot of the pdf for different values of $\beta$
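The comparison discussed in this section can be illustrated with a short calculation. The sketch below makes several assumptions that are not confirmed by the text above: the exponential method is modeled as a Poisson quantile with rate $\text{Co.Eff.}/MTTF_0$, the Weibull method uses the normal approximation of Eq. 51 with the covariate-adjusted scale parameter of Eq. 55, results are rounded up to whole parts, and the confidence level $p = 0.95$ and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def exponential_spares(t, mttf0, co_eff, p):
    """Smallest n with P(N <= n) >= p for a Poisson count with mean rate * t.

    The covariate factor multiplies the constant failure rate directly.
    Requires 0 < p < 1.
    """
    mean_failures = t * co_eff / mttf0
    n = 0
    term = math.exp(-mean_failures)  # P(N = 0)
    cdf = term
    while cdf < p:
        n += 1
        term *= mean_failures / n    # P(N = n) from P(N = n - 1)
        cdf += term
    return n


def weibull_spares(t, mttf0, beta, co_eff, p):
    """Eq. 51 with the covariate-adjusted scale parameter, rounded up."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    eta0 = mttf0 / g1                       # baseline scale from baseline mean
    eta = eta0 * co_eff ** (-1.0 / beta)    # covariates act on the scale only
    mean = eta * g1
    zeta = math.sqrt(g2 - g1 * g1) / g1     # coefficient of variation
    z = NormalDist().inv_cdf(p)
    return math.ceil(t / mean + (zeta ** 2 - 1.0) / 2.0
                     + zeta * math.sqrt(t / mean) * z)


# Settings similar to those quoted for Fig. 9; p = 0.95 is an assumption.
t, mttf0, beta, p = 5600.0, 3000.0, 3.0, 0.95
for co_eff in (0.5, 1.0, 1.5, 2.0):
    print(co_eff,
          exponential_spares(t, mttf0, co_eff, p),
          weibull_spares(t, mttf0, beta, co_eff, p))
```

In this sketch the covariate factor enters the exponential estimate linearly but enters the Weibull estimate only through the power $1/\beta$, which is one way to see why the Weibull estimate reacts less strongly to covariates.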

References

1. Ansell JI, Philips MJ (1997) Practical aspects of modeling of repairable systems data using proportional hazards models. Reliab Eng Syst Saf 58:165–171
2. Armistead CG, Clark G (1992) Customer service and support. Pitman, London
3. Aronis KP, Magou I, Dekker R, Tagaras G (2004) Inventory control of spare parts using a Bayesian approach: a case study. Eur J Oper Res 154:730–739
4. Billinton R, Allan RN (1983) Reliability evaluation of engineering systems: concepts and techniques. Pitman Books Limited, Boston
5. Birolini A (2004) Reliability engineering theory and practice, 4th edn. Springer, New York


6. Blanchard BS (2001) Maintenance and support: a critical element in the system life cycle. In: Proceedings of the International Conference of Maintenance Societies, Melbourne
7. Blanchard BS, Fabrycky WJ (1998) Systems engineering and analysis, 3rd edn. Prentice-Hall, Upper Saddle River
8. Blischke WR, Murthy DNP (1994) Warranty cost analysis. Marcel Dekker Inc., New York
9. Blischke WR, Murthy DNP (2000) Reliability: modeling, prediction, and optimization. Wiley, New York
10. Cassady CR, Pohl EA, Jin S (2004) Managing availability improvement efforts with importance measures and optimization. IMA J Manage Math 15:161–174
11. Chelbi A, Ait-Kadi D (2001) Spare provisioning strategy for preventively replaced systems subjected to random failure. Int J Prod Econ 74:183–189
12. Cooper RG, Kleinschmidt EJ (1993) Major new products: what distinguishes the winners in the chemical industry. J Prod Innov Manage 10:90–111
13. Cox DR (1972) Regression models and life-tables. J R Stat Soc B 34:187–220
14. Cox DR, Oakes D (1984) Analysis of survival data. Chapman and Hall, London
15. Cui LR, Xie M (2003) Some numerical approximations for renewal function of large Weibull shape parameter. Communications in Statistics: B: Simulation and Computation 32(1):1–16
16. Darko ML (2007) Optimization of critical spare parts inventories: a reliability perspective. Ph.D. thesis, University of Toronto, Canada
17. Dekker R (1996) Applications of maintenance optimization models: a review and analysis. Reliab Eng Syst Saf 51:229–240
18. Dhakar TS, Schmidt CP, Miller DM (1994) Base stock level determination for high cost low demand critical repairable spares. Comput Oper Res 21:411–420
19. Drenick RF (1960) The failure law of complex equipment. J Soc Ind Appl Math 8:680–690
20. Fortuin L, Martin H (1999) Control of service parts. Int J Oper Prod Manage 19:950–971
21. Ghodrati B (2005) Reliability and operating environment based spare parts planning. Ph.D. thesis, Luleå University of Technology, Sweden, ISSN 1402–1544
22. Ghodrati B, Kumar U (2005) Reliability and operating environment based spare parts estimation approach: a case study in Kiruna Mine, Sweden. J Qual Maint Eng 11:169–184
23. Gnedenko BV, Belyayev YK, Solovyev AD (1969) Mathematical methods of reliability theory. Academic Press, New York
24. Goffin K (1998) Evaluating customer support during new product development: an exploratory study. J Prod Innov Manage 15:42–56
25. Goffin K (2000) Design for supportability: essential component of new product development. Res Technol Manage 43:40–47
26. Høyland A, Rausand M (1994) System reliability theory: models and statistical methods. John Wiley and Sons, New York
27. Huiskonen J (2001) Maintenance spare parts logistics: special characteristics and strategic choices. Int J Prod Econ 71:125–133
28. IAEA (International Atomic Energy Agency) (2001) Reliability assurance programme guidebook for advanced light water reactors. IAEA-TECDOC-1264, Vienna
29. Intellect (2003) Reliability: a practitioner’s guide. The Information Technology Telecommunications and Electronics Association, Relex Software Corporation
30. Jardine AKS, Tsang AHC (2006) Maintenance, replacement and reliability: theory and application. CRC, Taylor and Francis, Boca Raton
31. Jardine AKS, Joseph T, Banjevic D (1999) Optimizing condition-based maintenance decisions for equipment subject to vibration monitoring. J Qual Maint Eng 5:192–202
32. Jardine AKS, Banjevic D, Wiseman M, Buck S, Joseph T (2001) Optimizing mine haul truck wheel motors’ condition monitoring program: use of proportional hazards modeling. J Qual Maint Eng 7:286–301
33. Kales P (1998) Reliability: for technology, engineering, and management. Prentice-Hall Inc., USA
34. Kaplan EL, Meier P (1958) Non-parametric estimation from incomplete observations. J Am Stat Assoc 53:457–481


35. Kennedy WJ, Patterson JW, Fredendall LD (2002) An overview of recent literature on spare parts inventories. Int J Prod Econ 76:201–215
36. Krajewski LJ, Ritzman LR (2005) Operations management: processes and value chains, 7th edn. Pearson Prentice Hall, New Jersey
37. Kumar D (1996) Reliability analysis and maintenance scheduling considering operating conditions. Ph.D. thesis, Luleå University of Technology, Sweden
38. Kumar D, Klefsjö B (1994) Proportional hazards model: an application to power supply cables of electric mine loaders. Int J Reliab Qual Saf Eng 1:337–352
39. Kumar D, Klefsjö B (1994) Proportional hazards model: a review. Reliab Eng Syst Saf 44:177–188
40. Kumar D, Klefsjö B, Kumar U (1992) Reliability analysis of power transmission cables of electric mine loaders using the proportional hazard model. Reliab Eng Syst Saf 37:217–222
41. Kumar KR, Loomba APS, Hadjinicola GC (2000) Theory and methodology: marketing-production coordination in channels of distribution. Eur J Oper Res 126:189–217
42. Kumar UD, Crocker J, Knezevic J, El-Haram M (2000) Reliability, maintenance and logistic support: a life cycle approach. Kluwer Academic Publishers, USA
43. Kuo W, Zuo MJ (2003) Optimal reliability modeling. John Wiley and Sons, New Jersey
44. Langford JW (1995) Logistics: principles and applications. McGraw-Hill Inc, New York
45. Lawless JF (1982) Statistical models and methods for lifetime data. John Wiley and Sons, New York
46. Lawless JF (1983) Statistical methods in reliability (with discussion). Technometrics 25:305–335
47. Lewis EE (1996) Introduction to reliability engineering. John Wiley and Sons, New York
48. Markeset T (2003) Dimensioning of product support: issues, challenges, and opportunities. Ph.D. thesis, Stavanger University College, Norway, ISBN 82-7644-197-1
49. Markeset T, Kumar U (2003) Design and development of product support and maintenance concepts for industrial systems. J Qual Maint Eng 9:376–392
50. Markeset T, Kumar U (2003) Integration of RAMS and risk analysis in product design and development work processes: a case study. J Qual Maint Eng 9:393–410
51. Nelson W (1969) Hazard plotting for incomplete failure data. J Qual Tech 1:27–52
52. NTNU (2005) Calculation of renewal function in the Weibull distribution. http://www.ntnu.no/ross/info/ownprog.php. Accessed on September 20, 2005
53. O’Connor PDT (1991) Practical reliability engineering, 3rd edn. John Wiley and Sons, West Sussex
54. Orsburn DK (1991) Spares management handbook. McGraw-Hill, USA
55. Palm C (1938) Analysis of the Erlang traffic formula for busy-signal arrangements. Ericsson Tech 5:39–58
56. Ramakumar R (1993) Engineering reliability: fundamentals and applications. Prentice Hall, Englewood Cliffs
57. Rigdon SE, Basu AP (2000) Statistical methods for the reliability of repairable systems. John Wiley and Sons, New York
58. Sheikh AK, Younas M, Raouf A (2000) Reliability based spare parts forecasting and procurement strategies. In: Ben-Daya M, Duffuaa SO, Raouf A (eds) Maintenance, modeling and optimization. Kluwer Academic Publishers, Boston
59. Smeitink E, Dekker R (1990) A simple approximation to the renewal function. IEEE Transactions on Reliability 39(1):71–75
60. Spearman ML (1989) A simple approximation for IFR Weibull renewal function. Microelectron Reliability 29(1):73–80
61. Wååk O, Alfredsson P (2001) Constant vs. non-constant failure rates: some misconceptions with respect to practical applications. Systecon Publications, Sweden
62. Wong JYF, Chung DWC, Ngai BMT, Banjevic D, Jardine AKS (1997) Evaluation of spares requirements using statistical and probability analysis techniques. Trans Mech Eng IEAust 22:77–84
