ADVANCES IN MATHEMATICAL MODELING FOR RELIABILITY
Advances in Mathematical Modeling for Reliability
Edited by
Tim Bedford, John Quigley, Lesley Walls, Babakalli Alkali, Alireza Daneshkhah and Gavin Hardman
Department of Management Science, University of Strathclyde, Glasgow, UK
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 2008 The authors and IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.
ISBN 978-1-58603-865-6
Library of Congress Control Number: 2008926420
Published by IOS Press under the imprint Delft University Press
Publisher: IOS Press, Nieuwe Hemweg 6B, 1013 BG Amsterdam, Netherlands; fax: +31 20 687 0019; e-mail: [email protected]
Distributor in the UK and Ireland: Gazelle Books Services Ltd., White Cross Mills, Hightown, Lancaster LA1 4XS, United Kingdom; fax: +44 1524 63232; e-mail: [email protected]
Distributor in the USA and Canada: IOS Press, Inc., 4502 Rachael Manor Drive, Fairfax, VA 22032, USA; fax: +1 703 323 3668; e-mail: [email protected]
LEGAL NOTICE: The publisher is not responsible for the use which might be made of the following information.
PRINTED IN THE NETHERLANDS
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
Introduction
The Mathematical Methods in Reliability conferences serve as a forum for discussing fundamental issues of mathematical modeling in reliability theory and its applications. It is a forum that brings together mathematicians, probabilists, statisticians, and computer scientists with a central focus on reliability. The University of Strathclyde hosted the fifth conference in the series in Glasgow in 2007. Previous conferences were held in Bucharest, Romania; Bordeaux, France; Trondheim, Norway; and Santa Fe, New Mexico, USA. This book contains a selection of papers originally presented at the conference and now made available to a wider audience in revised form. The book is organized into sections that represent different themes from the meeting, each an important current research area within the field.
1. Graphical Modeling and Bayesian Networks
Graphical methods are becoming increasingly popular for modeling and supporting the computation of the reliability of complex systems. The papers in this section address a number of challenges currently facing these methods. Langseth provides a brief review of the state of the art of Bayesian Networks in relation to reliability and then focuses on the current challenges of modeling continuous variables within this framework. Hanea and Kurowicka extend the theory of non-parametric continuous Bayesian Networks to include ordinal discrete random variables, where dependence is measured through rank correlations. Donat, Bouillaut and Leray develop a Bayesian Network approach to capture reliability that changes dynamically. Jonczy and Haenni develop a method using propositional directed acyclic graphs to represent the structure function and hence facilitate the computation of the reliability of networks.
2. Repairable Systems Modeling
One of the fundamental problems in reliability is to find adequate models for failure and repair processes. Simple renewal models provide familiar examples to students of probability and reliability, and provide the basic building blocks for many commercial simulation packages. However, such models do not come close to describing the complex interactions between failure and repair. The paper of Kahle looks at the way (possibly) incomplete repair interacts with the failure process through a Kijima-type process. Often the overall failure-repair process in real systems follows a homogeneous Poisson process, and Kahle shows that maintenance schedules can be constructed to generate this type of output. Volf looks at models where degradation is modeled through a number of shocks or some other random process, and considers how one can choose optimal repair policies that
stabilize the equipment hazard rate. Finally, Daneshkhah and Bedford show how Gaussian emulators can be used to perform computations of availability. A major problem in practice is to understand how sensitive availability calculations are to parameter choices, and emulators provide the potential to perform such calculations on complicated systems to a fair degree of accuracy and in a computationally efficient manner.
3. Competing Risks
Competing risks arise in reliability and maintenance analysis through the ways in which data is censored. Rather than getting “pure” failure data we usually have a messy mixture of data, for there may be many different reasons for taking equipment offline and bringing it back to “as new”, or at least to an improved state. A competing risk model is used to model the times at which such failure causes would be realized, taking into account possible interdependencies between them. There has been a growing interest in competing risk modeling over the last 10-15 years, and the papers presented here demonstrate this. Dewan looks at the interrelationship between various kinds of independence assumptions in competing risk modeling. Sankaran and Ansa consider the problem in which the failure cause is sometimes masked, and additional testing might be required to find the true failure cause. The final two papers of this section move from IID models of competing risks to take a point process perspective. Lindqvist surveys a number of recent papers on this topic and discusses the benefits of moving to this wider framework. Finally, Dijoux, Doyen and Gaudoin generalize the “usual” independent competing risks theory for IID models and show that in the point process generalization one can properly formulate and solve the corresponding identifiability issues.
4. Mixture Failure Rate Modeling
Mixture models provide a means of analyzing reliability problems where there exist, for example, multiple failure modes or heterogeneous populations. It will not always be possible to observe all factors influencing the time to event occurrence, hence a random effect, called a frailty, can be included in the model. A frailty is an unobserved proportionality factor that modifies the hazard function of an item or group of items. Frailty models can be classed as univariate, when there is a single survival endpoint, or multivariate, when there are multiple survival endpoints such as under competing risks or recurrent event processes. There is much interest in modeling mixtures and frailty in survival analysis. We include two papers in this area. Finkelstein and Esaulova derive the asymptotic properties of a bivariate competing risks model, where the lifetime of each component is indexed by a frailty parameter and, under the assumption of conditional independence of the components, the correlated frailty model is considered. The other paper, due to Badía and Berrade, aims to give insights into the properties of the reversed hazard rate, defined as the ratio of the density to the distribution function, and the mean inactivity time in the context of mixtures of distributions.
5. Signature
The signature of a system is a vector whose i-th element is the probability that the system fails upon the i-th component failure. The Samaniego representation of the failure time of a system separates the properties of the system, captured through the signature, from the probability distributions of the component lifetimes. Such a representation is effective for comparing the reliability of different systems. This section is concerned with developments of the Samaniego representation. Rychlik develops bounds for the distributions and moments of coherent system lifetimes. Triantafyllou and Koutras develop methods to facilitate the calculation of the signature of a system through generating functions. Hollander and Samaniego develop a new signature-based metric for comparing the reliability of systems. An important generalization of the concept of independence is that of exchangeability. This assumption is key to Bayesian and subjectivist modeling approaches. The paper of Spizzichino considers symmetry properties arising as a result of exchangeability and discusses generalizations to non-exchangeable systems.
6. Relations among Aging and Stochastic Dependence Aging properties have always played an important role in reliability theory, with a multiplicity of concepts available to describe subtle differences in aging behavior. A particularly interesting development is to place such aging concepts in a multivariate context, and consider how multiple components (or multiple failure modes) interact. The paper of Spizzichino and Suter looks at aging and dependence for generalizations of the Marshall-Olkin model. Their work develops closure results for survival copulas in certain classes with specified aging properties. Belzunce, Mulero and Ruiz develop new variants on multivariate increasing failure rate (IFR) and decreasing mean residual life (DMRL) notions. Some of the basic properties and relationships between these definitions are given.
7. Theoretical Advances in Modeling, Inference and Computation
This collection of papers is concerned with developments in modeling, inference and computation for reliability assessment. Ruggeri and Soyer develop hidden Markov modeling approaches and self-exciting point process models to address the issue of imperfect reliability development of software. Huseby extends the use of matroid theory to directed network graphs and derives results to facilitate the calculation of the structure function. Coolen and Coolen-Schrijner extend nonparametric predictive inference techniques to address k-out-of-m systems.
8. Recent Advances in Recurrent Event Modeling and Inference
Recurrent event processes are processes in which repeated events are generated over time. In reliability and maintenance, recurrent event processes
may correspond to failure events of repaired systems, processes for detection and removal of software faults, filing of warranty claims for products, and so forth. Common objectives for recurrent event analysis include describing the individual event processes, characterizing variation across processes, determining the effect of external factors on the pattern of event occurrence, and modeling multi-state event data. Model classes include Poisson, renewal and intensity-based models, for which a variety of parametric, semi-parametric and non-parametric inference is being developed. There has been growing interest in recurrent event analysis and modeling in reliability, medical and related fields, as the papers presented here demonstrate. Adekpedjou, Quiton and Peña consider the problem of detecting outlying inter-event times and examine the impact of an informative monitoring period in terms of loss of statistical efficiency. Mercier and Roussignol study and compute the first-order derivatives of some functionals of a piecewise deterministic Markov process, used to describe the time-evolution of a system, to support sensitivity analysis in dynamic reliability. Lisnianski considers a multi-state system with a range of performance levels which are observed together with the times at which the system makes a transition in performance state, and provides a method for estimating the transition intensities under the assumption that the underlying model is Markovian. Finally, van der Weide, van Noortwijk and Suyono present new results in renewal theory with costs that can be discounted according to any non-increasing discount function.
Acknowledgments The organization of the conference was made possible by the hard work of a number of different people working at Strathclyde: Anisah Abdullah, Babakalli Alkali, Samaneh Balali, Tim Bedford, Richard Burnham, Daosheng Cheng, Alireza Daneshkhah, Gavin Hardman, Kenneth Hutchison, Alison Kerr, Haiying Nan, John Quigley, Matthew Revie, Caroline Sisi, Lesley Walls, Bram Wisse. The conference itself was sponsored by the University of Strathclyde, Glasgow City Council and Scottish Power, whom we thank for their contributions to the event.
Contents

Introduction  v

1. Graphical Modeling and Bayesian Networks
Bayesian Networks in Reliability: The Good, the Bad, and the Ugly (H. Langseth)  1
Mixed Non-Parametric Continuous and Discrete Bayesian Belief Nets (A. Hanea & D. Kurowicka)  9
A Dynamic Graphical Model to Represent Complex Survival Distributions (R. Donat, L. Bouillaut, P. Aknin & P. Leray)  17
Network Reliability Evaluation with Propositional Directed Acyclic Graphs (J. Jonczy & R. Haenni)  25

2. Repairable Systems Modeling
Some Properties of Incomplete Repair and Maintenance Models (W. Kahle)  32
On Models of Degradation and Partial Repairs (P. Volf)  39
Sensitivity Analysis of a Reliability System Using Gaussian Processes (A. Daneshkhah & T. Bedford)  46

3. Competing Risks
On Independence of Competing Risks (I. Dewan)  63
Bivariate Competing Risks Models Under Masked Causes of Failure (P.G. Sankaran & A.A. Ansa)  72
Competing Risks in Repairable Systems (B.H. Lindqvist)  80
Conditionally Independent Generalized Competing Risks for Maintenance Analysis (Y. Dijoux, L. Doyen & O. Gaudoin)  88

4. Mixture Failure Rate Modeling
Asymptotic Properties of Bivariate Competing Risks Models (M. Finkelstein & V. Esaulova)  96
On the Reversed Hazard Rate and Mean Inactivity Time of Mixtures (F.G. Badía & M.D. Berrade)  103

5. Signature
Bounds on Lifetimes of Coherent Systems with Exchangeable Components (T. Rychlik)  111
On the Signature of Coherent Systems and Applications for Consecutive k-out-of-n:F Systems (I.S. Triantafyllou & M.V. Koutras)  119
The Use of Stochastic Precedence in the Comparison of Engineered Systems (M. Hollander & F.J. Samaniego)  129
The Role of Signature and Symmetrization for Systems with Non-Exchangeable Components (F. Spizzichino)  138

6. Relations Among Aging and Stochastic Dependence
Generalized Marshall-Olkin Models: Aging and Dependence Properties (F. Spizzichino & F. Suter)  149
New Multivariate IFR and DMRL Notions for Exchangeable Dependent Components (F. Belzunce, J. Mulero & J.-M. Ruiz)  158

7. Theoretical Advances in Modeling, Inference and Computation
Advances in Bayesian Software Reliability Modeling (F. Ruggeri & R. Soyer)  165
Signed Domination of Oriented Matroid Systems (A.B. Huseby)  177
Nonparametric Predictive Inference for k-out-of-m Systems (F.P.A. Coolen & P. Coolen-Schrijner)  185

8. Recent Advances in Recurrent Event Modeling and Inference
Some Aspects Pertaining to Recurrent Event Modeling and Analysis (A. Adekpedjou, J. Quiton & E.A. Peña)  193
Sensitivity Estimates in Dynamic Reliability (S. Mercier & M. Roussignol)  208
Renewal Theory with Discounting (J.A.M. van der Weide, J.M. van Noortwijk & Suyono)  217
Point Estimation of the Transition Intensities for a Markov Multi-State System via Output Performance Observation (A. Lisnianski)  227

Keyword Index  235
Author Index  237
Bayesian Networks in Reliability: The Good, the Bad, and the Ugly
Helge LANGSETH
Department of Computer and Information Science, Norwegian University of Science and Technology, N-7491 Trondheim, Norway; E-mail: [email protected]
Abstract. Bayesian network (BN) models are gaining ever more popularity as a tool in reliability analysis. In this paper we consider some of the properties of BNs that have made them popular, consider some of the recent developments, and also point to the most important remaining challenges when using BNs in reliability.
Keywords. Bayesian nets, reliability analysis, inference, hybrid models
1. The Good: The Foundation of Bayesian Networks
A Bayesian Network (BN), [20,15], is a compact representation of a multivariate statistical distribution function. A BN encodes the probability density function governing a set of random variables {X_1, . . . , X_n} by specifying a set of conditional independence statements together with a set of conditional probability functions. More specifically, a BN consists of a qualitative part, a directed acyclic graph where the nodes mirror the random variables X_i, and a quantitative part, the set of conditional probability functions. An example of a BN over the variables {X_1, . . . , X_5} is shown in Figure 1; only the qualitative part is given. We call the nodes with outgoing edges pointing into a specific node the parents of that node, and say that X_j is a descendant of X_i if and only if there exists a directed path from X_i to X_j in the graph. In Figure 1, X_1 and X_2 are the parents of X_3, written pa(X_3) = {X_1, X_2} for short. Furthermore, pa(X_4) = {X_3} and, since there is no directed path from X_4 to any of the other nodes, the descendants of X_4 are given by the empty set and, accordingly, its non-descendants are {X_1, X_2, X_3, X_5}. The graph represents the assertion that each variable is conditionally independent of its non-descendants in the graph given its parents in the same graph; other conditional independence statements can be read off the graph by using the rules of d-separation [20]. The graph in Figure 1 asserts, for instance, that for all distributions compatible with it, X_4 is conditionally independent of {X_1, X_2, X_5} when conditioned on X_3. When it comes to the quantitative part, each variable is described by the conditional probability function of that variable given the parents in the graph, i.e., the collection of conditional probability functions {f(x_i | pa(x_i))}_{i=1}^{n} is required.
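The graph notions just defined are easy to compute from an edge list. The sketch below (Python) uses the edges the text states for Figure 1 (pa(X_3) = {X_1, X_2}, pa(X_4) = {X_3}); the text does not list the edges involving X_5, so X_5 is shown here as a root, which is an assumption for illustration only.

```python
# Edges stated in the text for Figure 1; X5's edges are not listed, so it is
# treated as a root here (illustrative assumption).
edges = [(1, 3), (2, 3), (3, 4)]

def parents(v):
    """pa(X_v): nodes with an edge pointing into X_v."""
    return {a for a, b in edges if b == v}

def descendants(v):
    """Nodes reachable from X_v along directed paths."""
    out, stack = set(), [v]
    while stack:
        a = stack.pop()
        for s, t in edges:
            if s == a and t not in out:
                out.add(t)
                stack.append(t)
    return out

print(sorted(parents(3)))      # [1, 2]
print(sorted(descendants(1)))  # [3, 4]
print(sorted(descendants(4)))  # []  (X4 has no descendants, as in the text)
```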
The underlying assumptions of conditional independence encoded in the graph allow us to calculate the joint probability function as
Figure 1. An example BN over the nodes {X_1, . . . , X_5}. Only the qualitative part of the BN is shown.
f(x_1, . . . , x_n) = ∏_{i=1}^{n} f(x_i | pa(x_i)).    (1)
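Equation (1) can be evaluated directly once the conditional probability tables are fixed. The sketch below (Python; the CPT numbers are made up, and pa(X_5) = {X_3} is an assumption, since the text does not list X_5's parents) builds the factorized joint for a binary version of the network in Figure 1 and checks that it sums to one:

```python
import itertools

# Hypothetical binary CPTs; parent sets for X3 and X4 follow the text,
# pa(X5) = {X3} is assumed for illustration.
parents = {1: (), 2: (), 3: (1, 2), 4: (3,), 5: (3,)}

# cpt[i] maps a tuple of parent values to P(X_i = 1 | parents)
cpt = {
    1: {(): 0.3},
    2: {(): 0.6},
    3: {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9},
    4: {(0,): 0.2, (1,): 0.7},
    5: {(0,): 0.25, (1,): 0.8},
}

def joint(x):
    """Equation (1): the product of the conditionals f(x_i | pa(x_i))."""
    p = 1.0
    for i in range(1, 6):
        pa_vals = tuple(x[j - 1] for j in parents[i])
        p1 = cpt[i][pa_vals]
        p *= p1 if x[i - 1] == 1 else 1.0 - p1
    return p

# Sanity check: the factorized joint sums to 1 over all 2^5 configurations.
total = sum(joint(x) for x in itertools.product([0, 1], repeat=5))
print(round(total, 10))  # 1.0
```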
BNs originated in the field of Artificial Intelligence, where they were used as a robust and efficient framework for reasoning with uncertain knowledge. The history of BNs in reliability can (at least) be traced back to [2] and [1]; the first real attempt to merge the efforts of the two communities is probably the work of [1], which proposes the use of the GRAPHICAL-BELIEF tool for calculating reliability measures concerning a low pressure coolant injection system for a nuclear reactor. Reliability analysts are more and more frequently choosing to use BNs as their modeling framework. Their choice is partly motivated by BNs being particularly easy to use in interaction with domain experts, a feature of high importance also in the reliability field [22]. This ease of use is obtained by seeing the BN as a model of causal influence, and although this interpretation is not necessarily correct in general, it can be defended if some additional assumptions are made [21]. Finally, it is worth noticing that BNs constitute a flexible class of models, as any joint statistical distribution can be represented by a BN. This can, for instance, be utilized to extend traditional fault-tree models to incorporate dependence between basic events (e.g., common-cause failures) [16]. The sound mathematical foundation, the ease of interpretation, and the usefulness in applications are “the good features” of Bayesian Networks.
2. The Bad: Building Quantitative Models
Bayesian networks are quantitative stochastic models, and therefore require quantitative parameters to be fully specified. This is obviously not particular to Bayesian networks, but since building the BN structure is such a simple and intuitive procedure, the burden of eliciting the quantitative part of the BN from experts often comes as a surprise to the reliability analyst. We therefore consider this to be “the bad part” of BN usage in reliability. To elicit the quantitative part from experts, one must acquire all conditional distributions {f(x_i | pa(x_i))}_{i=1}^{n} in Equation (1). To get a feeling for the assessment burden, consider Figure 1, and assume all variables are discrete with k states. We then need to quantify

q_i = (k − 1) · k^{|pa(x_i)|}    (2)

parameters to specify f(x_i | pa(x_i)) for a fixed variable x_i, and therefore q = Σ_i q_i parameters to specify the full model. In total we need q = 11 parameters if k = 2 and 1,179 parameters
if k = 10. Although the last number may be too large for the individual parameters to be handled in detail, the BN still keeps the knowledge acquisition burden as low as possible (through the factorized representation of Equation (1)). Had we not utilized the BN structure, the full joint distribution would require q = 31 (q = 99,999) parameters for k = 2 (k = 10). The parametrization is, however, not optimized; it is merely defined to be sufficient to encode any distribution compatible with the conditional independence statements encoded in the graph. Many researchers have therefore explored even more cost-efficient representations, including deterministic relations, noisy-OR relations [12] and general independence of causal influence models [6], logistic regression, and the IPF procedure [25]. Finally, vines have been proposed as another natural modeling framework for the reliability analyst [4]. Using vines can dramatically simplify the elicitation of the quantitative parameters. Conditional rank correlations (realized by copulas) model the dependence structure among the variables, and are therefore the fundamental quantitative input when modeling with vines. Recent developments by Hanea and Kurowicka [11] extend these ideas. The authors show how one can build non-parametric Bayesian networks (containing both discrete and continuous variables) while still using conditional rank correlations to define the quantitative part of the model.
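To make the parameter-count contrast concrete, here is a sketch (Python; the probabilities are made up) of Equation (2) next to a noisy-OR parameterization, one of the cost-efficient representations cited above, which replaces the full table by one activation probability per parent plus a leak term:

```python
def params_per_node(k, n_parents):
    # Equation (2): a node with k states and n_parents k-state parents
    # needs (k - 1) * k^{n_parents} free parameters for a full CPT.
    return (k - 1) * k ** n_parents

def noisy_or(p, leak=0.0):
    # Noisy-OR: each active parent independently fails to "inhibit" X with
    # probability p_j; the leak covers causes outside the model.
    def prob(parent_states):  # returns P(X = 1 | parent states)
        q = 1.0 - leak
        for pj, active in zip(p, parent_states):
            if active:
                q *= 1.0 - pj
        return 1.0 - q
    return prob

cpt = noisy_or([0.9, 0.7], leak=0.05)
print(params_per_node(2, 2))  # full CPT for a binary node, 2 binary parents: 4
print(round(cpt((1, 1)), 4))  # noisy-OR needs only 3 numbers: 0.9715
print(round(cpt((0, 0)), 4))  # leak only: 0.05
```

The saving is modest here, but for a node with many parents the full CPT grows exponentially while noisy-OR stays linear in the number of parents.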
3. The Ugly: Hybrid Models
3.1. Background
BNs have found applications in domains like, e.g., software reliability [8], fault finding systems [13], and structural reliability [5]; see [16] for an overview. A characteristic feature of these problem domains is that all variables are discrete (e.g., the variables’ states are {failed, operating}). The preference for discrete variables in the BN community is mainly due to the technicalities of the calculation scheme. BNs are equipped with efficient algorithms for calculating arbitrary marginal distributions, say, f(x_i, x_j, x_k), as well as conditional distributions, say, f(x_i, x_j | x_k, x_ℓ), but the base algorithm only works as long as all variables are discrete [14].¹ We note that the BNs’ applicability in reliability analysis would be enormously limited if one were to consider only discrete variables, and that the simplicity of making Bayesian network models does not go well together with the difficulties of inference in the models. Finding robust and computationally efficient inference techniques applicable to BNs containing both continuous and discrete variables (so-called hybrid BNs) is therefore a hot research area. However, overviews of the current usage of BNs among practitioners show that the available techniques are relatively unknown, and many consider the poor treatment of hybrid BNs the missing ingredient for BNs to become even more popular in the reliability community. We therefore dub hybrid models “the ugly part” of using BNs in reliability.
¹ Some models containing discrete and Gaussian variables can also be handled. However, these models, called conditional Gaussian (CG) distributions [17], impose modeling restrictions we would like to avoid, and are therefore not considered here.
3.2. An example model
We will consider a very simple hybrid BN to exemplify why inference in hybrid BNs is difficult, and to show how approximate techniques can be used. The model is shown in Figure 2, with four binary variables (T_1, . . . , T_4) and two continuous variables (Z_1 and Z_2).
Figure 2. A model for the analysis of human reliability. A subject’s ability to perform four different tasks T_1, . . . , T_4 is influenced by the two explanatory variables Z_1 and Z_2. The explanatory variables are drawn with a double line to signify that these variables are continuous.
This model, which can be interpreted as a factor analyzer for binary data, was called a latent trait model in [3]. In reliability, similar models can be used to predict humans’ ability to perform some tasks in a given environment (we are extending ideas from the THERP methodology² here). With this interpretation, T_i is a person’s ability to correctly perform task i (i = 1, . . . , 4), and T_i takes on the values 1 (“success”) or 0 (“failure”). Each T_i is influenced by a set of explanatory variables Z_j, j = 1, 2. The goal of the model is to quantify the effect the explanatory variables have on the observable ones, and to predict a subject’s ability to perform the tasks T_1, . . . , T_4. We have a mixture of both discrete and continuous variables in this model, and this will eventually lead to problems when trying to use this model for inference. Assume first that the explanatory variables are used to model the environment, that the environment can be considered constant between subjects, and that it can be disclosed in advance (that is, the variables are observed before inference is performed). An example of such a factor is “lack of lighting”, with the assumption that the luminous flux can be measured in advance, and that it affects different people in the same way. Each T_i is given by logistic regression, meaning that we have

P(T_i = 1 | z) = (1 + exp(−w_i^T z))^{−1}

for a given set of weights w_i. Here element j of w_i quantifies how covariate j influences a person’s ability to perform task i. As long as Z is observed, this is a simple generalized linear model, where T_k is conditionally independent of T_l given Z. Therefore, inference in this model can be handled; note that Z can simply be regarded as a tool to fill in the probability tables for each T_i in this case.
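With z observed, filling a table entry is a single logistic evaluation. A minimal sketch (Python; the weight and covariate values are illustrative, not from the paper):

```python
import math

def p_success(w, z):
    """Logistic model: P(T_i = 1 | z) = (1 + exp(-w^T z))^{-1}."""
    return 1.0 / (1.0 + math.exp(-sum(wj * zj for wj, zj in zip(w, z))))

# Hypothetical weights for one task and an observed covariate vector z:
print(round(p_success((2.0, -1.0), (0.5, 0.2)), 4))  # sigma(0.8) ≈ 0.69
```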
Next, assume that some of the explanatory variables are used to model subjectspecific properties, like a subject’s likelihood for omitting a step in a procedure (this is one of the explanatory variables often used when employing the THERP methodology, see, e.g., [23]). It seems natural to assume that these explanatory variables are unobserved, and for the case of simplicity, to give them Gaussian distributions a priori, Zj ∼ N (μj , σj2 ). 2 THERP:
Technique for Human Error Rate Prediction [24]
[Figure 3 shows six contour panels of the posterior over (z_1, z_2) on [−3, 3]²: (a) “Exact” results; (b) Discretize(5); (c) Discretize(10); (d) MTEs; (e) MCMC (10³); (f) MCMC (10⁶).]
Figure 3. Results of some approaches to approximate f (z|e); see text for details.
Assume that we are interested in calculating the likelihood of an observation e = {T_1 = 1, T_2 = 1, T_3 = 1, T_4 = 1} (i.e., Z is unobserved) as well as the joint posterior distribution f(z_1, z_2 | e). It is straightforward to see that the likelihood is given by

P(e) = ∫_{R²} [ ∏_{i=1}^{4} (1 + exp(−w_i^T z))^{−1} ] · exp( −Σ_{j=1}^{2} (z_j − μ_j)² / (2σ_j²) ) / (2πσ_1σ_2) dz,    (3)

but unfortunately this integral has no known analytic representation in general. Hence, we cannot calculate the likelihood of the observation in this model. Note that this is a consequence not of the modeling language (the use of a BN), but of the model itself. For the rest of this section we will consider some of the simpler schemes for approximating the calculations in Equation (3).

3.3. Approximative inference
We will use the model in Figure 2 as our running example, and for each approximative method we will calculate both the likelihood of the observation, P(e), as well as the posterior distribution over the explanatory variables, f(z|e). In Figure 3 part (a) we have calculated this posterior using a numerical integration scheme (with a 200 × 200 integration grid) to obtain what we will consider the gold-standard solution for what the posterior should look like. The parameters in the example are

[w_1 w_2 w_3 w_4] = [ +2 +1 −1 +1 ; −1 +1 +1 +2 ],

μ_1 = μ_2 = 0, and σ_1² = σ_2² = 1, which gave a likelihood of 0.0695.

There are several different approaches to efficient inference in hybrid BNs. At a high level, we may divide them into two categories:
1. Change the distribution functions, so that the mathematical operations (e.g., the integral in Equation (3)) can be handled analytically.
2. Approximate the difficult operations by sampling or some other numerical method.
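The gold-standard computation can be reproduced in outline by brute-force quadrature of Equation (3) with the parameters stated in the text. This is a sketch: the paper specifies only a 200 × 200 grid, so the integration range [−5, 5]² and the midpoint rule here are assumptions.

```python
import math

# Columns of the stated weight matrix: w_i = (w_i1, w_i2); mu_j = 0, sigma_j = 1.
W = [(2.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, 2.0)]

def integrand(z1, z2):
    """N(0, I) prior density times the four logistic likelihood terms (all T_i = 1)."""
    prior = math.exp(-0.5 * (z1 * z1 + z2 * z2)) / (2.0 * math.pi)
    lik = 1.0
    for w in W:
        lik *= 1.0 / (1.0 + math.exp(-(w[0] * z1 + w[1] * z2)))
    return prior * lik

# Midpoint rule over a 200 x 200 grid on [-5, 5]^2 (prior mass outside is negligible).
n, lo, hi = 200, -5.0, 5.0
h = (hi - lo) / n
P_e = sum(integrand(lo + (i + 0.5) * h, lo + (j + 0.5) * h)
          for i in range(n) for j in range(n)) * h * h
print(round(P_e, 4))  # the text reports a likelihood of about 0.0695
```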
The simplest approach to continuous variables is to simply discretize them. The idea is to divide the support of any distribution function into r intervals I_k, k = 1, . . . , r, and thereby translate the hybrid BN into a discrete one. A random variable T ∈ [a, b] would then be recoded into a new variable, say T′; T′ ∈ {Low: T ∈ I_1, Medium: T ∈ I_2, High: T ∈ I_3} if we chose to use r = 3 intervals. Mathematically, this corresponds to replacing the density functions in Equation (3) with piecewise constant approximations. The granularity of the model is controlled by choosing a “good” number of states for the discrete variable; a larger number of states will improve model expressibility, but at the expense of computational efficiency (very simply put, the computational burden grows at least as fast as the number of cells in each conditional probability table, cf. Equation (2), and in practice much faster). Attempts to automatically choose the number of states have been developed; see, e.g., [19]. Figure 3 (b) gives the joint posterior when each continuous variable is discretized into 5 states, and Figure 3 (c) gives the same for a 10-state discretization. The calculated likelihoods were 0.0686 and 0.0694, respectively.

Moral et al. [18] developed a framework for approximating any hybrid distribution arbitrarily well by employing mixtures of truncated exponential (MTE) distributions, and they also showed how the BNs’ efficient calculation scheme can be extended to handle the MTE distributions. The main idea is again to divide the support of any distribution function into intervals I_k, and approximate each part of the distribution by a sum of truncated exponential functions; each exponential is linear in its argument:

f̃(x|θ) = a_0^{(k)}(θ) + Σ_{i=1}^{m} a_i^{(k)}(θ) exp( b_i^{(k)}(θ) x )  for x ∈ I_k.

We typically see 1 ≤ r ≤ 4 and 0 ≤ m ≤ 2 in applications; notice that setting m = 0 gives us the standard discretization. Clever choices for the parameter values {a_0^{(k)}(θ), a_i^{(k)}(θ), b_i^{(k)}(θ)} that make f̃ as close as possible to the original distribution (in the KL sense) are tabulated for many standard distributions in [7]. Parameter values from [7] were also used to generate the results in Figure 3 (d); each distribution was divided into 2 intervals, and within each interval a sum of m = 2 exponential terms was used.³ The likelihood was calculated as 0.0695, and the approximation to the joint f(z|e) is also extraordinarily good, even in the tails of the distribution.

Markov Chain Monte Carlo [9] is a sampling scheme for approximating any distribution by simulation.⁴ Of particular interest for the BN community is BUGS [10], which is a modeling language that takes as its input a BN model and estimates any posterior probability from this model using sampling. The approach is very general, and proceeds by generating random samples from the target distribution (this can be done even when the algebraic form of the function is unknown), and then approximating the target distribution by the empirical distribution of the samples. For the results given in Figure 3 (e) and (f), 10³ and 10⁶ samples were generated from f(z|e) respectively, giving likelihoods of 0.0690 (10³ samples) and 0.0692 (10⁶ samples). If we compare the results, we see that the likelihoods calculated by the different methods are roughly the same, and comparable to the “exact” result. However, Figure 3 shows fairly large differences between the approximations of f(z|e) given by the three
³ Note that this means that the density of each Gaussian, which is quadratic in the exponential function, is approximated by sums of terms that are linear in the exponential function.
⁴ The non-parametric hybrid BNs of [11] also lend themselves to inference by Markov Chain Monte Carlo.
Bayesian Networks in Reliability - H. Langseth
mentioned methods. The quality of the approximation is far better using MTEs than using standard discretization. This is particularly true in the tails of the distribution, and since reliability analysts are often concerned with infrequent events, this is an important finding for practitioners in reliability. MCMC simulation is a very popular method for approximate inference, but care must be taken to use enough samples to achieve high-quality approximations in the tails of the distributions.
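The sample-size caveat is easy to demonstrate on a toy tail probability. The sketch below (our illustration, not the experiment from the paper) estimates P(Z > 2) for a standard normal by plain Monte Carlo; with 10^3 samples the tail estimate is noticeably noisy, while 10^6 samples come close to the exact value.

```python
import math
import random

def tail_estimate(n, threshold=2.0, seed=1):
    """Plain Monte Carlo estimate of P(Z > threshold), Z ~ N(0, 1):
    approximate the target by the empirical distribution of n samples."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n

exact = 0.5 * math.erfc(2.0 / math.sqrt(2.0))  # P(Z > 2), about 0.0228
small = tail_estimate(10**3)
large = tail_estimate(10**6)
```

With a fixed seed the estimates are reproducible; the point is that the small run typically misses the tail probability by a far larger relative margin than the large run.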
4. Discussion

In this paper we have briefly described why BNs have become a popular modeling framework for reliability analysts. The main reasons are (in our experience) the intuitive representation and the modeling flexibility of BNs; these properties make a BN a well-suited tool for cooperating with domain experts. Furthermore, the efficient calculation scheme is (where applicable) an advantage when building complex models; real-life models containing thousands of variables are not uncommon, and speed of inference is therefore of utmost importance. We then turned to challenges when using BNs, and pinpointed the quantification process as the bottleneck of the modeling phase. We also touched briefly upon inference in hybrid models, which many see as the Achilles' heel of the BN framework.
References

[1] R. G. Almond, An extended example for testing GRAPHICAL-BELIEF, Technical Report 6, Statistical Sciences Inc., Seattle, WA, 1992.
[2] R. E. Barlow, Using influence diagrams. In C. A. Clarotti and D. V. Lindley, editors, Accelerated Life Testing and Experts' Opinions in Reliability, number 102 in Enrico Fermi International School of Physics, pages 145–157, North-Holland, Elsevier Science Publishers B.V., 1988.
[3] D. J. Bartholomew, Latent Variable Models and Factor Analysis, Charles Griffin & Co., London, UK, 1987.
[4] T. J. Bedford and R. M. Cooke, Vines – a new graphical model for dependent random variables, The Annals of Statistics, 30(4) (2002), 1031–1068.
[5] A. Bobbio, L. Portinale, M. Minichino and E. Ciancamerla, Improving the analysis of dependable systems by mapping fault trees into Bayesian networks, Reliability Engineering and System Safety, 71(3) (2001), 249–260.
[6] C. Boutilier, N. Friedman, M. Goldszmidt and D. Koller, Context-specific independence in Bayesian networks. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, pages 115–123, San Francisco, CA, 1996.
[7] B. R. Cobb, P. P. Shenoy and R. Rumí, Approximating probability density functions in hybrid Bayesian networks with mixtures of truncated exponentials, Statistics and Computing, 16(3) (2006), 293–308.
[8] N. E. Fenton, B. Littlewood, M. Neil, L. Strigini, A. Sutcliffe and D. Wright, Assessing dependability of safety critical systems using diverse evidence, IEE Proceedings Software Engineering, 145(1) (1998), 35–39.
[9] W. Gilks, S. Richardson and D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice, Interdisciplinary Statistics, Chapman & Hall, London, UK, 1996.
[10] W. Gilks, A. Thomas and D. J. Spiegelhalter, A language and program for complex Bayesian modeling, The Statistician, 43 (1994), 169–178.
[11] A. Hanea and D. Kurowicka, Mixed non-parametric continuous and discrete Bayesian belief nets, in this collection, 2008.
[12] D. Heckerman and J. S. Breese, A new look at causal independence. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 286–292, San Francisco, CA, Morgan Kaufmann Publishers, 1994.
[13] F. V. Jensen, U. Kjærulff, B. Kristiansen, H. Langseth, C. Skaanning, J. Vomlel and M. Vomlelová, The SACSO methodology for troubleshooting complex systems, Artificial Intelligence for Engineering, Design, Analysis and Manufacturing, 15(5) (2001), 321–333.
[14] F. V. Jensen, S. L. Lauritzen and K. G. Olesen, Bayesian updating in causal probabilistic networks by local computations, Computational Statistics Quarterly, 4 (1990), 269–282.
[15] F. V. Jensen and T. D. Nielsen, Bayesian Networks and Decision Graphs, Springer-Verlag, Berlin, Germany, 2007.
[16] H. Langseth and L. Portinale, Bayesian networks in reliability, Reliability Engineering and System Safety, 92(1) (2007), 92–108.
[17] S. L. Lauritzen and N. Wermuth, Graphical models for associations between variables, some of which are quantitative and some qualitative, The Annals of Statistics, 17 (1989), 31–57.
[18] S. Moral, R. Rumí and A. Salmerón, Mixtures of truncated exponentials in hybrid Bayesian networks. In Sixth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, volume 2143 of Lecture Notes in Artificial Intelligence, pages 145–167, Springer-Verlag, Berlin, Germany, 2001.
[19] M. Neil, M. Tailor and D. Marquez, Inference in Bayesian networks using dynamic discretisation, Statistics and Computing, 17(3) (2007), 219–233.
[20] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Mateo, CA, 1988.
[21] J. Pearl, Causality – Models, Reasoning, and Inference, Cambridge University Press, Cambridge, UK, 2000.
[22] J. Sigurdsson, L. Walls and J. Quigley, Bayesian belief nets for managing expert judgment and modeling reliability, Quality and Reliability Engineering International, 17 (2001), 181–190.
[23] O. Sträter, Considerations on the elements of quantifying human reliability, Reliability Engineering and System Safety, 82(2) (2004), 255–264.
[24] A. D. Swain and H. E. Guttman, Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278, Nuclear Regulatory Commission, Washington, DC, 1983.
[25] J. Whittaker, Graphical Models in Applied Multivariate Statistics, John Wiley & Sons, Chichester, UK, 1990.
Mixed Non-Parametric Continuous and Discrete Bayesian Belief Nets

Anca HANEA 1 and Dorota KUROWICKA
Delft Institute of Applied Mathematics, Delft University of Technology, The Netherlands

Abstract. This paper introduces mixed non-parametric continuous and discrete Bayesian Belief Nets (BBNs) using the copula-vine modeling approach. We extend the theory for non-parametric continuous BBNs to include ordinal discrete random variables. The dependence structure among the variables is given in terms of (conditional) rank correlations. We use an adjusted rank correlation coefficient for discrete variables, and we emphasize the relationship between the rank correlation of two discrete variables and the rank correlation of their underlying uniforms. The approach presented in this paper is illustrated by means of an example.

Keywords. Non-parametric Bayesian nets, copula, vines
Introduction

Applications in various domains often lead to high-dimensional dependence modeling. Problem owners are becoming increasingly sophisticated in reasoning with uncertainty. This motivates the development of generic tools that can deal with two problems: uncertainty and complexity. Graphical models provide a general methodology for approaching these problems. A Bayesian belief net is a probabilistic graphical model that encodes the probability density or mass function of a set of variables by specifying a number of conditional independence statements in the form of an acyclic directed graph, together with a set of probability functions.

Our focus is on mixed non-parametric continuous and discrete BBNs. In a non-parametric continuous BBN, nodes are associated with arbitrary continuous invertible distribution functions and arcs with (conditional) rank correlations, which are realized by a copula with the zero independence property [1]. The (conditional) rank correlations assigned to the arcs are algebraically independent, and there are tested protocols for their use in structured expert judgment [2]. We note that quantifying BBNs in this way also requires assessing all (continuous, invertible) one-dimensional marginal distributions. On the other hand, the dependence structure is meaningful for any such quantification, and need not be revised if the univariate distributions are changed.

We extend this approach to include ordinal discrete random variables, which can be written as monotone transforms of uniform variates. The dependence structure, however, must be defined with respect to the uniforms. The rank correlation of two discrete variables and the rank correlation of their underlying uniforms are not equal; therefore one needs to study the relationship between these two rank correlations.

The paper is organized as follows: Section 1 briefly introduces the normal copula vine modeling approach to non-parametric continuous BBNs [3]. Section 2 presents a correction for the population version of Spearman's rank correlation coefficient r for discrete random variables, and describes the relationship between the rank correlation of two discrete variables and the rank correlation of their underlying uniforms [4]. An application model is presented in Section 3, and Section 4 presents conclusions.

1 Corresponding Author: Anca Hanea, Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands; E-mail: [email protected].
1. Non-Parametric Continuous BBNs

A continuous non-parametric BBN is a directed acyclic graph, together with a set of (conditional) rank correlations and a set of marginal distributions. Nodes are associated with arbitrary continuous invertible distribution functions. For each variable i with parents i_1, ..., i_{p(i)}, we associate the arc i_{p(i)-k} → i with the conditional rank correlation

$$\begin{cases} r(i,\, i_{p(i)}), & k = 0,\\ r(i,\, i_{p(i)-k} \mid i_{p(i)}, \ldots, i_{p(i)-k+1}), & 1 \le k \le p(i)-1. \end{cases}$$

The assignment is vacuous if {i_1, ..., i_{p(i)}} = ∅. Hence, every arc in the BBN is assigned a (conditional) rank correlation between parent and child. These assignments are algebraically independent and they uniquely determine the joint distribution [5]. The proof of this fact is based on the close relationship between non-parametric BBNs and vines [6,7]. Using the same relationship, we use a sampling protocol based on vines to specify and analyze the BBN structure. Unfortunately, sampling a large BBN structure with a general copula may require extra calculations; these calculations consist of numerical evaluations of multiple integrals, which are very time consuming. This disadvantage vanishes when the normal copula is used. The details of the normal copula vine approach to non-parametric continuous BBNs are explained in [3].

In this paper we consider BBNs whose nodes represent both discrete and continuous variables. We enrich the theory of non-parametric continuous BBNs to incorporate discrete ordinal variables, i.e. variables that can be written as monotone transforms of uniform variables. The dependence structure must be defined with respect to the underlying uniforms. The rank correlation of two discrete variables and the rank correlation of their underlying uniforms are not equal, hence one needs to establish the relationship between them.
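To illustrate the bookkeeping of this assignment, the following sketch (a hypothetical helper, with our own naming, not code from the paper) lists the (conditional) rank-correlation labels attached to the arcs pointing into a node, given an ordered list of its parents i_1, ..., i_p:

```python
def arc_rank_correlations(child, parents):
    """Return the (conditional) rank-correlation labels for the arcs
    parents[p-1-k] -> child, k = 0 .. p-1, in the ordering
    r(i, i_p), r(i, i_{p-1} | i_p), ..., r(i, i_1 | i_p, ..., i_2)."""
    labels = []
    p = len(parents)
    for k in range(p):
        parent = parents[p - 1 - k]        # i_{p-k}
        conditioning = parents[p - k:]     # already-assigned parents i_{p-k+1}, ..., i_p
        if conditioning:
            cond = ", ".join(str(c) for c in reversed(conditioning))
            labels.append(f"r({child}, {parent} | {cond})")
        else:
            labels.append(f"r({child}, {parent})")
    return labels
```

For a node 4 with parents [1, 2, 3] this yields r(4, 3), r(4, 2 | 3) and r(4, 1 | 3, 2), matching the assignment scheme above.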
2. Spearman's Rank Correlation for Ordinal Discrete Random Variables

The definition of the population version of Spearman's rank correlation coefficient is given in terms of the probabilities of concordance and discordance (e.g., [8]), which we denote Pc and Pd, respectively. Consider a population distributed according to two variates X and Y. Two members (X1, Y1) and (X2, Y2) of the population are called concordant if X1 < X2, Y1 < Y2 or X1 > X2, Y1 > Y2. They are called discordant if X1 < X2, Y1 > Y2 or X1 > X2, Y1 < Y2. The population version of Spearman's r is defined as r = 3 · (Pc − Pd), where (X1, Y1) has distribution FXY with marginal distributions FX and FY, and X2, Y2 are independent with distributions FX and FY; moreover, (X1, Y1) and (X2, Y2) are independent (e.g., [9]).

The above definition is valid only for populations for which the probabilities of X1 = X2 and Y1 = Y2 are zero. In order to formulate the population version of Spearman's rank correlation r for discrete random variables, one needs to correct for the probabilities of X1 = X2 and Y1 = Y2. This correction is derived in [4]; in this section we present the main results.

Consider a discrete random vector (X1, Y1) with distribution pij, i = 1, ..., m; j = 1, ..., n. We denote by pi+, i = 1, ..., m the marginal distribution of X1, and by p+j, j = 1, ..., n the marginal distribution of Y1. Let (X2, Y2) be another random vector, as in the definition of the population version of Spearman's r, and let its distribution be denoted by qij, i = 1, ..., m; j = 1, ..., n. Each qij can be written as pi+ · p+j. The adjusted rank correlation coefficient of two discrete variables X and Y is given by the following equation, derived in [4]:

$$\bar r \;=\; \frac{P_c - P_d}{\sqrt{\Big(\sum_{j>i} p_{i+}p_{j+} - \sum_{k>j>i} p_{i+}p_{j+}p_{k+}\Big)\Big(\sum_{j>i} p_{+i}p_{+j} - \sum_{k>j>i} p_{+i}p_{+j}p_{+k}\Big)}} \qquad (1)$$
Special classes of discrete distributions of ordinal variables can be constructed by specifying the marginal distributions and a copula2, say Cr, parameterized by its rank correlation r. Specifying only the marginal distributions and the correlation of the copula significantly reduces the quantification burden. Nevertheless, the rank correlation of two discrete variables is, in general, not equal to the correlation of their underlying uniforms, hence to the correlation of the copula. There is a relationship between these correlations, which is given by Eq. (1), where

$$P_c - P_d \;=\; \sum_{i=1}^{m}\sum_{j=1}^{n} p_{i+}\, p_{+j}\, \tilde C_{ij}^{\,r} \;-\; 1 \qquad (2)$$

and $\tilde C_{ij}^{\,r}$ is calculated as

$$\tilde C_{ij}^{\,r} \;=\; C_r\!\Big(\sum_{k=1}^{i} p_{k+},\, \sum_{l=1}^{j} p_{+l}\Big) + C_r\!\Big(\sum_{k=1}^{i-1} p_{k+},\, \sum_{l=1}^{j} p_{+l}\Big) + C_r\!\Big(\sum_{k=1}^{i} p_{k+},\, \sum_{l=1}^{j-1} p_{+l}\Big) + C_r\!\Big(\sum_{k=1}^{i-1} p_{k+},\, \sum_{l=1}^{j-1} p_{+l}\Big),$$

with the convention that an empty sum equals zero.
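The adjusted coefficient can be checked numerically. The sketch below (our code; the copula is passed in as a plain Python function C(u, v)) computes r̄ using the normalization of Eq. (1) and a full-range form of relation (2): as expected, the independence copula yields r̄ = 0, and the comonotone (upper Fréchet) copula yields r̄ = 1 in the symmetric cases tested.

```python
import math

def rbar(p_row, p_col, copula):
    """Adjusted Spearman rank correlation of two ordinal discrete
    variables with margins p_row, p_col, joined by copula C(u, v)."""
    # cumulative margins u_0 = 0, ..., u_m = 1
    u = [0.0]
    for p in p_row:
        u.append(u[-1] + p)
    v = [0.0]
    for p in p_col:
        v.append(v[-1] + p)
    # numerator: P_c - P_d from the copula at the cumulative margins
    pc_minus_pd = -1.0
    for i in range(1, len(p_row) + 1):
        for j in range(1, len(p_col) + 1):
            c_tilde = (copula(u[i], v[j]) + copula(u[i - 1], v[j])
                       + copula(u[i], v[j - 1]) + copula(u[i - 1], v[j - 1]))
            pc_minus_pd += p_row[i - 1] * p_col[j - 1] * c_tilde
    # denominator: normalization of Eq. (1), computed from each margin
    def bracket(p):
        k = len(p)
        pairs = sum(p[a] * p[b] for a in range(k) for b in range(a + 1, k))
        triples = sum(p[a] * p[b] * p[c] for a in range(k)
                      for b in range(a + 1, k) for c in range(b + 1, k))
        return pairs - triples
    return pc_minus_pd / math.sqrt(bracket(p_row) * bracket(p_col))

independence = lambda u, v: u * v
comonotone = lambda u, v: min(u, v)   # upper Frechet bound
```

For instance, with the margins p = [0.01, 0.98, 0.01] used in Figure 1, `rbar(p, p, independence)` is 0 up to rounding, and `rbar([0.5, 0.5], [0.5, 0.5], comonotone)` equals 1.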
We will denote by r̄C the rank correlation of two discrete ordinal variables whose joint distribution is constructed using their marginals and the copula Cr. If Cr is a positively ordered copula [8], then r̄C is an increasing function of the rank correlation of the underlying uniforms. We further investigate the relationship between r̄C and the dependence parameter r of the copula. We choose different copulas (with more emphasis on the normal copula) and different marginal distributions for two discrete random variables X and Y. If we consider two ordinal responses X and Y, both uniformly distributed across a small number of states, r̄C and r tend to be very similar, for any choice of a positively ordered copula. Moreover, r̄C covers the whole range of r. Increasing the number of states for X and Y makes r̄C approximately equal3 to r.

When the marginal distributions are not uniform, the relationship changes. Figure 1 presents the relationship between r and r̄C for two discrete variables X and Y with 3 states each. Their marginal distributions are equal and symmetric. We use Frank's copula to obtain Figure 1(a), and the normal copula in Figure 1(b). As both Frank's copula and the normal copula are positively ordered, r̄C is an increasing function of r. Since the marginal distributions are symmetric, the range of rank correlations realized for the discrete variables is the entire interval [−1, 1]. Notice that the relationship is very nonlinear; this strong nonlinearity is caused by the choice of p2+ = p+2 = 0.98.

Figure 1. The relationship between the parameter r of the copula and r̄C, for discrete random variables with equal and symmetric marginal distributions: (a) p1+ = p+1 = p3+ = p+3 = 0.01, p2+ = p+2 = 0.98 and Frank's copula; (b) the same margins and the Normal copula.

If we now consider variables with identical, but not symmetric, marginal distributions, the relationship is no longer symmetric around 0. We choose p1+ = p+1 = p2+ = p+2 = 0.01, p3+ = p+3 = 0.98. In this case the whole range of positive dependence can be attained, but the range of negative association is bounded below, as shown in Figure 2 (left). If the margins are not identical but "complementary", in the sense that p1+ = p+3, p2+ = p+2 and p3+ = p+1, then the entire range of negative association is possible, but the range of positive association is bounded above. We further consider the variables X and Y such that p1+ = 0.01, p2+ = 0.98, p3+ = 0.01 (for X) and p+1 = 0.19, p+2 = 0.01, p+3 = 0.80 (for Y). We can observe in Figure 2 (right) that both positive and negative dependencies are bounded. One can also calculate bounds for r̄C by using the Fréchet bounds for Cr in expression (2); these bounds are shown in Figure 2 (right). Since we know the bounds, we can normalize the rank coefficient r̄C such that it covers the entire interval [−1, 1].

The upshot of this discussion is that, given a copula, we can always find the correlation of that copula corresponding to the required correlation between two discrete variables, as well as between one discrete and one continuous variable4.

2 The class of discrete distributions that we obtain will depend on the choice of the copula and its properties.
3 10 states for each variable will suffice to obtain differences of order 10−3 between r̄C and r.
4 Relation (2) allows us to calculate the rank correlation between a discrete and a continuous variable.
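Numerically, the inversion described in this upshot is a one-dimensional root-finding problem: for a positively ordered family, r̄C increases with the dependence parameter, so bisection applies. A sketch using Frank's copula (our code, not from the paper; Frank's θ plays the role of the dependence parameter, and `rbar` re-implements the adjusted coefficient):

```python
import math

def frank(theta):
    """Frank copula C_theta(u, v) for theta > 0 (a positively ordered family)."""
    def c(u, v):
        num = math.expm1(-theta * u) * math.expm1(-theta * v)
        return -math.log1p(num / math.expm1(-theta)) / theta
    return c

def rbar(p_row, p_col, copula):
    """Adjusted rank correlation of two ordinal discrete variables."""
    cum = lambda p: [sum(p[:i]) for i in range(len(p) + 1)]
    u, v = cum(p_row), cum(p_col)
    pc_pd = -1.0
    for i in range(1, len(p_row) + 1):
        for j in range(1, len(p_col) + 1):
            c_tilde = (copula(u[i], v[j]) + copula(u[i - 1], v[j])
                       + copula(u[i], v[j - 1]) + copula(u[i - 1], v[j - 1]))
            pc_pd += p_row[i - 1] * p_col[j - 1] * c_tilde
    def bracket(p):
        k = len(p)
        return (sum(p[a] * p[b] for a in range(k) for b in range(a + 1, k))
                - sum(p[a] * p[b] * p[c] for a in range(k)
                      for b in range(a + 1, k) for c in range(b + 1, k)))
    return pc_pd / math.sqrt(bracket(p_row) * bracket(p_col))

def theta_for_rbar(target, p_row, p_col, lo=1e-6, hi=50.0):
    """Bisect on theta until the discrete rank correlation hits target."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if rbar(p_row, p_col, frank(mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Note that a target outside the Fréchet bounds of r̄C for the given margins has no solution, so in practice the bounds should be checked before bracketing.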
Figure 2. The relationship between the parameter r of the Normal copula and r̄C, for discrete random variables with equal, not symmetric (left), and different (right) marginal distributions.
3. Illustrations

We explain the methodology of building and quantifying mixed non-parametric continuous and discrete BBNs using a highly simplified version of a problem investigated in a project undertaken by the European Union. The project, Beneris (for Benefit and Risk), aims to estimate the beneficial and harmful health effects in a specified population resulting from exposure to various contaminants and nutrients through the ingestion of fish [10]. The variables we are interested in are the cancer risk and cardiovascular risk resulting from exposure to fish constituents. These risks are defined in terms of remaining lifetime risks. The three fish constituents considered are dioxins/furans, polychlorinated biphenyls, and fish oil. The first two are persistent and bio-accumulative toxins which cause cancer in humans, and fish are a significant source of exposure to these chemicals. Fish oil is derived from the tissues of oily fish and has high levels of omega-3 fatty acids, which regulate cholesterol and reduce inflammation throughout the human body; hence it influences the cardiovascular risk. Moreover, personal factors such as smoking, socioeconomic status and age may influence cancer and cardiovascular risk. Smoking is measured as yearly intake of nicotine through smoking and passive smoking, while socioeconomic status is measured by income, represented by a discrete variable with 4 income classes. Age is taken, in this simplified model, as a discrete variable with 2 states, 15 to 34 years and 35 to 59 years (we are considering only a segment of the whole population). Figure 3 shows the version of the model that we consider. The distributions of the variables are presented on the right-hand side of Figure 3. These marginal distributions can be obtained either from data or from experts, but in this particular case they are chosen by the authors for illustrative purposes only.
There are 2 discrete (age and socioeconomic status) and 6 continuous random variables. Some indication of the relationships between them is given in their description above. These relationships are represented as arcs of the BBN. The (conditional) rank correlations assigned to these arcs must be gathered from existing data or expert judgment [2]; in this example they are chosen by the authors. Figure 4 (left) presents the same BBN, only now (conditional) rank correlations are assigned to each arc, except one. The arc between the two discrete variables "age" and "soci_econ_status" is not assigned any
Graphical Modeling and Bayesian Networks
Figure 3. Simplified Bayesian Belief Net for fish consumption risks.
Figure 4. BBN for fish consumption risks with (conditional) rank correlations assigned to the arcs (left). The relation between r, for the Normal copula, and r¯C of "age" and "soci_econ_status" (right).
rank correlation. Let us assume that the correlation between them can be calculated from data, and that its value is 0.63. As we stressed in the previous sections, the dependence structure in the BBN must be defined with respect to the underlying uniforms. Hence, we first have to find the rank correlation of the underlying uniforms, r, which corresponds to r̄C = 0.63 for the normal copula. The relationship between r and r̄C is shown in Figure 4 (right). We read from the graph that in order to realize a correlation of 0.63 between the discrete variables, we must assign the rank correlation 0.9 to the arc of the BBN. Similarly, we can choose the required correlations between a uniform variable underlying a discrete variable and other continuous variables (e.g. the uniform underlying "age", and "cardiaovasc_risk"). Figures 3 and 4 (left) are obtained with a software application called UniNet5. UniNet allows for the quantification of mixed non-parametric continuous and discrete BBNs [11,12].

The quantified model can now be used to investigate many different types of questions. Let us assume, for example, that we are interested in the cancer risk of a young person with low socioeconomic status who smokes. Figure 5 (left) presents how these conditions affect the distribution of the cancer risk. The gray distributions in the background are the unconditional marginal distributions, provided for comparison. The conditional means and standard deviations are displayed under the histograms.

We may ask another question, namely "what are the main contributors to a very high risk of cancer?". We condition on the 0.9 value of cancer risk. Figure 5 (right) summarizes the combination of factors that increases the risk of cancer to 0.9.

Figure 5. Conditionalised BBN.

From the shift of the distributions, one can notice that a person who is neither very young nor very wealthy, smokes a lot, and ingests more dioxins/furans and polychlorinated biphenyls is more likely to get cancer. Because smoking, socioeconomic status, and age also influence the cardiovascular risk, the shift in their distributions causes an increase in the cardiovascular risk as well.

5 A version of UniNet will shortly be available at http://dutiosc.twi.tudelft.nl/risk/index.php.
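The mechanics of such conditioning can be imitated on a two-variable fragment. The sketch below (our illustration with invented margins, not the Beneris quantification) couples a uniform score underlying "age" with a "risk" variable through the normal copula, using the relation ρ = 2 sin(πr/6) between the Spearman rank correlation r and the Pearson correlation ρ of the normal copula, and compares conditional and unconditional mean risk:

```python
import math
import random

def sample_pairs(rank_corr, n, seed=7):
    """Draw (u, risk): u is the uniform underlying "age"; risk has an
    invented exponential margin; dependence via the normal copula."""
    rho = 2.0 * math.sin(math.pi * rank_corr / 6.0)  # normal-copula Pearson rho
    rng = random.Random(seed)
    phi = lambda z: 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u, v = phi(z1), phi(z2)
        pairs.append((u, -math.log(1.0 - v) / 10.0))  # invert the risk margin
    return pairs

pairs = sample_pairs(0.6, 200_000)
uncond = sum(r for _, r in pairs) / len(pairs)
young = [r for u, r in pairs if u < 0.3]  # condition on a low "age" score
cond = sum(young) / len(young)
```

With a positive rank correlation, conditioning on a low age score shifts the whole risk distribution downward, which is the qualitative effect visible in the conditionalised histograms.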
4. Discussion

We have extended the theory for continuous BBNs to include discrete random variables that can be written as monotone transforms of uniform variables. In this approach, the dependence structure must be defined, via (conditional) rank correlations, with respect to the uniform variates. We have described the relationship between the rank correlation of two discrete variables (r̄C) and the rank correlation of their underlying uniforms (r). The methodology presented in this paper is implemented in the software package UniNet, and it has been successfully applied in real-world applications that involve hundreds of variables [10,12].
References

[1] D. Kurowicka and R. M. Cooke, Distribution-free continuous Bayesian belief nets. In Proceedings of the Mathematical Methods in Reliability Conference, 2004.
[2] O. Morales, D. Kurowicka and A. Roelen, Eliciting conditional and unconditional rank correlations from conditional probabilities, Reliability Engineering and System Safety, doi:10.1016/j.ress.2007.03.020, 2007.
[3] A. M. Hanea, D. Kurowicka and R. M. Cooke, Hybrid method for quantifying and analyzing Bayesian belief nets, Quality and Reliability Engineering International, 22(6) (2006), 613–729.
[4] A. M. Hanea, D. Kurowicka and R. M. Cooke, The population version of Spearman's rank correlation coefficient in the case of ordinal discrete random variables. In Proceedings of the Third Brazilian Conference on Statistical Modelling in Insurance and Finance, 2007.
[5] D. Kurowicka and R. M. Cooke, Uncertainty Analysis with High Dimensional Dependence Modelling, Wiley, 2006.
[6] R. M. Cooke, Markov and entropy properties of tree and vine-dependent variables. In Proceedings of the Section on Bayesian Statistical Science, American Statistical Association, 1997.
[7] T. J. Bedford and R. M. Cooke, Vines – a new graphical model for dependent random variables, The Annals of Statistics, 30(4) (2002), 1031–1068.
[8] R. B. Nelsen, An Introduction to Copulas, Lecture Notes in Statistics, Springer-Verlag, New York, 1999.
[9] H. Joe, Multivariate Models and Dependence Concepts, Chapman & Hall, London, 1997.
[10] P. Jesionek and R. M. Cooke, Generalized method for modeling dose–response relations – application to the BENERIS project, European Union project report, 2007.
[11] D. A. Ababei, D. Kurowicka and R. M. Cooke, Uncertainty analysis with UNICORN. In Proceedings of the Third Brazilian Conference on Statistical Modelling in Insurance and Finance, 2007.
[12] O. Morales-Napoles, D. Kurowicka, R. M. Cooke and D. Ababei, Continuous-discrete distribution free Bayesian belief nets in aviation safety with UniNet, Technical Report, TU Delft, 2007.
A Dynamic Graphical Model to Represent Complex Survival Distributions

Roland DONAT a,c,1, Laurent BOUILLAUT a, Patrice AKNIN a and Philippe LERAY b
a Laboratory of New Technologies, INRETS, France
b Laboratory of Computer Science, Polytech' Nantes, France
c Laboratory of Computer Science, Information Processing, and Systems, INSA Rouen, France

Abstract. Reliability analysis has become an integral part of system design and operation. This is especially true for systems performing critical tasks. Moreover, recent works in reliability involving the use of probabilistic graphical models, also known as Bayesian networks, have proved relevant. This paper describes a specific dynamic graphical model, named the graphical duration model (GDM), to represent complex stochastic degradation processes with any kind of state sojourn time distributions. We give qualitative and quantitative descriptions of the proposed model and detail a simple algorithm to estimate the system reliability. Finally, we illustrate our approach with a three-state system subject to one context variable and non-exponential sojourn time distributions.

Keywords. Probabilistic graphical models, Graphical duration models, Reliability
Introduction

Reliability analysis has become an integral part of system design and operation. This is especially true for systems performing critical applications. Typically, the results of such analyses are given as inputs to a decision support tool in order to optimize maintenance operations. Unfortunately, in most cases the system state cannot be evaluated exactly. This is one of the reasons which has led to the important development of probabilistic methods in reliability.

A wide range of works on reliability analysis is available in the literature. In numerous applications, the aim is to model a multi-state system and therefore to capture how the system state changes over time. This problem can be partially solved using the Markov framework. The major drawback of this approach comes from the constraint on state sojourn times, which are necessarily exponentially distributed. This issue can be overcome by the use of semi-Markov models [1], which allow any kind of sojourn time distributions. On the other hand, one may be interested in modeling the context impacting the system degradation. A classic manner to address such an

1 Corresponding Author: Institut National de Recherche sur les Transports et leur Sécurité, Laboratoire des Technologies Nouvelles, 2 avenue du Général Malleret-Joinville, 94114 Arcueil Cedex, France; E-mail: [email protected].
issue consists in using a Cox model or a more general proportional hazards model [2]. Nevertheless, as far as we know, it is unusual to find works considering both approaches at the same time. Moreover, recent works in reliability involving the use of Probabilistic Graphical Models (PGMs), also known as Bayesian Networks (BNs), have proved relevant. For instance, the authors in [3] show how to model the dependability of a complex system by means of PGMs. In [4], the authors explain how fault trees can be represented by PGMs. Finally, in [5], the authors explain how to exploit Dynamic PGMs (DPGMs) to study the reliability of a dynamic system represented by a Markov chain.

Our work aims to describe a general methodology to model the stochastic degradation process of a system, allowing any kind of state sojourn time distributions along with an accurate context description. We meet these objectives using a specific DPGM called the Graphical Duration Model (GDM).

This paper is divided into four sections. Section 1 briefly describes the PGM and DPGM formalism. Section 2 then introduces GDMs by defining both their structure and their quantitative part. Section 3 depicts a simple iterative method to compute the reliability of a system represented by a GDM. Finally, to illustrate our methodology, we study in Section 4 a three-state system subject to one context variable and non-exponential duration distributions.
1. Probabilistic Graphical Models

Probabilistic Graphical Models (PGMs), also known as Bayesian Networks (BNs) [6], provide a formalism for reasoning about partial belief under conditions of uncertainty. This formalism relies on probability theory and graph theory. Indeed, PGMs are defined by a Directed Acyclic Graph (DAG) G = (X, E) over a sequence of nodes X = (X1, ..., XN) representing random variables that take values in given domains X1, ..., XN. The set of edges E encodes the existence of correlations between the linked variables. The strength of these correlations is quantified by conditional probabilities. A PGM is a pair (G, {Pn}1≤n≤N), where G = (X, E) is a DAG and {Pn}1≤n≤N denotes the set of Conditional Probability Distributions (CPDs) associated with each variable Xn and its parents. We refer to the sequence of random variables X_pan as the "parents" of Xn in the graph G. Exploiting the conditional independence relationships introduced by the edges of G, the joint probability over X can be economically rewritten in the product form

$$P(x_1, \ldots, x_N) \;=\; \prod_{n=1}^{N} P_n(x_n \mid x_{pa_n}), \qquad (1)$$
where the general notation x_S (resp. X_S) denotes the projection of a sequence x (resp. X) onto a subset of indices S. Besides, both the DAG and the CPDs of a PGM can be learned automatically [7] if data or expert opinion is available. Using PGMs is also particularly interesting because of the possibility of propagating knowledge through the network: various inference algorithms can be used to compute marginal probability distributions over the system variables, the most classical relying on a junction tree [8].

A Dynamic Graphical Model to Represent Complex Survival Distributions - R. Donat et al.

In addition, inference in PGMs can take any variable observations (also called evidence) into account so as to update the marginal distributions of the other variables. On the other hand, the classic PGM formalism cannot represent temporal stochastic processes. For this purpose, Dynamic Probabilistic Graphical Models (DPGMs), also known as Dynamic Bayesian Networks (DBNs), have been developed [9]. Strictly speaking, a DPGM extends the PGM formalism to model probability distributions over a collection of random variables (X_t)_{t∈ℕ*} = (X_{1,t}, ..., X_{N,t})_{t∈ℕ*} indexed by the discrete time t. A DPGM is defined as a pair (M_1, M_→). M_1 is a PGM representing the prior distribution P_1(X_1) = ∏_{n=1}^{N} P_{n,1}(X_{n,1} | X_{pa_n,1}). M_→ is a particular PGM, called an s-slice Temporal Probabilistic Graphical Model (s-TPGM), which defines the distribution of X_t given (X_τ)_{t−(s−1)≤τ≤t−1}, where s ≥ 2 denotes the temporal dependency order of the model. In this paper, we set s = 2, so that M_→ is a 2-TPGM representing the distribution P_→(X_t | X_{t−1}) = ∏_{n=1}^{N} P_{n,→}(X_{n,t} | X_{pa_n,t}). Consequently, it is possible to compute the joint distribution over the random variables (X_t)_{1≤t≤T} by simply "unrolling" the 2-TPGM into a sequence of length T as follows:

P((X_t)_{1≤t≤T}) = P((X_{1,t}, ..., X_{N,t})_{1≤t≤T}) = P_1
∏_{t=2}^{T} P_→(X_t | X_{t−1}) = ∏_{n=1}^{N} P_{n,1}(X_{n,1} | X_{pa_n,1}) · ∏_{t=2}^{T} ∏_{n=1}^{N} P_{n,→}(X_{n,t} | X_{pa_n,t}).
Finally, since a DPGM can be viewed as one large unrolled PGM, these dynamic models inherit some of the convenient properties of static PGMs. On the other hand, performing inference in such models can raise computational problems if the sequence length is too large, and specific methods have been developed to partially address this issue [9]. In Section 3, a simple inference method is described to compute the reliability of a system represented by a Graphical Duration Model (GDM).
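Before moving on, the factorization of equation (1) can be made concrete with a minimal sketch. The three-node chain and its CPD numbers below are invented for illustration and are not taken from the paper:

```python
from itertools import product

# Hypothetical chain PGM X1 -> X2 -> X3 (binary variables); the CPD numbers
# are made up for illustration only.
P1 = {0: 0.6, 1: 0.4}                               # P(X1)
P2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}     # P2[x1][x2] = P(X2 | X1)
P3 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}     # P3[x2][x3] = P(X3 | X2)

def joint(x1, x2, x3):
    """Equation (1): the joint is the product of each CPD given its parents."""
    return P1[x1] * P2[x1][x2] * P3[x2][x3]

# A joint built this way is automatically normalized.
total = sum(joint(*xs) for xs in product([0, 1], repeat=3))
```

The same product form carries over to the unrolled DPGM, with one factor per variable and per time slice.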
2. Introducing the Graphical Duration Models

2.1. Graphical structure

In this article, we propose to extend the variable-duration models introduced in [9] to build a comprehensive model for complex survival distributions. The 2-TPGM associated with the underlying model, called a Graphical Duration Model (GDM), is depicted in Figure 1. It describes in a flexible and accurate way the discrete survival function of a system given its context. It relies on the two following variables: the system state X_t, and the duration variable X^D_t describing the time spent in the current system state. Moreover, a binary transition variable J_t is added to explicitly characterize the system transitions from one state to another. A collection of context variables (covariates) Z_t = (Z_{1,t}, ..., Z_{P,t}) can also be used to model the system context. As shown in Figure 1, the current system state X_t depends on the previous duration X^D_{t−1} and the previous system state X_{t−1}. The process generated by a GDM is thus
Graphical Modeling and Bayesian Networks
similar to a discrete semi-Markov process [10]. Indeed, thanks to the variable X^D_t, it is possible to specify any kind of state sojourn-time distribution. Consequently, exploiting the powerful modeling properties of PGMs, GDMs are convenient, intuitive, and easily generalizable tools for representing complex degradation processes.
Figure 1. Representation of a GDM. The Zp,t ’s represent the system covariates. Xt is the system state and XtD is the duration variable in the current state. Jt is the explicit transition variable of the system.
2.2. CPDs

The following paragraphs specify each CPD involved in a GDM except those for the distribution of Z_t, since the modeling of the system context depends strongly on the application. Hence, in the sequel, we suppose that the probability distribution of Z_t is known. In addition, as is usually the case in PGMs, all the CPDs are assumed to be finite and discrete, i.e., representable as multidimensional tables. First of all, we refer to X = {x^1, ..., x^K} and Z = {z^1, ..., z^L} as the domains associated with the system state variable and the context variable, respectively.

Let us begin with the CPD of the initial system state given its context, namely P(X_1 | Z_1). This CPD describes the probability that the system starts in a given state x ∈ X under a particular context configuration z ∈ Z. Next, it is necessary to define the transition CPD from one state to another. When a transition occurs at time t, i.e., if J_{t−1} = 1, the probability that the system moves from state x^k to state x^l conditionally on the context z is given by the homogeneous transition matrix A(·, z, ·). On the other hand, while there is no transition, i.e., if J_{t−1} = 0, the system deterministically remains in the previous state x^k, and the corresponding transition matrix reduces to the identity whatever the context. As a result, the CPD of X_t, t ≥ 2, is

P(X_t = x^l | X_{t−1} = x^k, J_{t−1} = j, Z_t = z) = { A(k, z, l)  if j = 1,
                                                     { δ(k = l)    if j = 0,
where δ is the characteristic function. The initial duration CPD encodes the sojourn-time distribution for each state given the context z, such that P(X^D_1 = d | Z_1 = z, X_1 = x^l) = ϕ(z, l, d), where ϕ(z, l, d) gives the probability of remaining d time units in state x^l given the context z. Besides, as we made the discrete and finite assumption for all the CPDs, the domain of X^D_t, t ≥ 1, has to be discrete and finite, which is not natural for a duration distribution. We overcome this issue by setting an upper time bound Dmax large enough compared to the dynamics of the studied system. Consequently, X^D_t takes its values in the set X^D = {1, ..., Dmax}.

The CPD of X^D_t, t ≥ 2, plays an analogous role, except that it also has to update, at each time step, the remaining time to be spent in the current state. While the previous remaining duration is greater than one (i.e., X^D_{t−1} > 1), the remaining sojourn time is deterministically counted down. When the previous remaining duration reaches the value one, a transition is triggered at time t, and a sojourn time for the new current state x^l is drawn according to ϕ(z, l, ·). In other words, the CPD of X^D_t, t ≥ 2, is defined by

P(X^D_t = d' | X_t = x^l, X^D_{t−1} = d, J_{t−1} = j, Z_t = z) = { δ(d' = d − 1)  if j = 0,
                                                                 { ϕ(z, l, d')    if j = 1.

Note that the discrete-time assumption imposed by the DPGM formalism can easily be overcome: the authors in [11] present a survey of discrete lifetime distributions and explain how to derive the usual continuous ones (e.g., exponential, Weibull, ...) in the discrete case.

Finally, J_t is the random variable characterizing transitions between two different system states. More precisely, when J_t = 1, a transition is triggered at time t and the system state changes at time t + 1; the system state remains unchanged while J_t = 0. A transition is triggered at time t if and only if the current remaining duration reaches the value one. Consequently, the CPD of J_t is deterministic and simply defined by P(J_t = 1 | X^D_t = d) = δ(d = 1).

3. Reliability Estimation using GDM

Let us assume that the set of system states X is partitioned into two sets U and D (i.e., X = U ∪ D with U ∩ D = ∅), for the "up" states and the "down" states respectively (i.e., OK and failure situations). The discrete-time system reliability is then defined as the function R : ℕ* → [0, 1], where R(t) represents the probability that the system has always stayed in an up state until time t, i.e., R(t) = P(X_1 ∈ U, ..., X_t ∈ U). In addition, interesting metrics such as the failure rate or the MTTF can be derived from this definition (cf. [11] for details). As the reliability estimation boils down to a probability computation, we propose the following inference algorithm to compute R(t):
1: Compute P(X_1, X^D_1) and find P(X_1) = Σ_{X^D_1} P(X_1, X^D_1)
2: for t = 2 to T do
3:   Compute P(X_t | X_{t−1}) = Σ_{X^D_{t−1}, J_{t−1}, Z_t} P(X_{t−1}, X^D_{t−1}) P(J_{t−1} | X^D_{t−1}) P(Z_t) P(X_t | X_{t−1}, J_{t−1}, Z_t)
4:   Compute P(X_t, X^D_t)
5:   Find R(t) = P(X_1 ∈ U) ∏_{τ=2}^{t} P(X_τ ∈ U | X_{τ−1} ∈ U)
6: end for

Note that the computation of the distribution P(X_t, X^D_t) can be achieved by means of any classic PGM inference algorithm. Some details about this simple inference method are given in [12].
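The recursion above can be sketched for the simplified case of a single, frozen context (so that A and ϕ become plain arrays, and J_t is implicit in the event X^D_t = 1). The function name and the zeroing-out scheme for failed mass are ours, not the paper's; it propagates the defective joint over (state, remaining duration), so the surviving mass is R(t) directly:

```python
import numpy as np

def reliability_gdm(p0, A, phi, up, T):
    """Sketch of R(t) = P(X_1 in U, ..., X_t in U) for a GDM with one frozen context.

    p0  : initial state distribution, shape (K,)
    A   : jump transition matrix A[k, l], shape (K, K)
    phi : sojourn-time pmf, phi[k, d-1] = P(D = d | state k), shape (K, Dmax)
    up  : indices of the 'up' states U
    """
    K, Dmax = phi.shape
    mask = np.zeros(K)
    mask[list(up)] = 1.0
    # alpha[k, d-1] = P(state k, remaining sojourn d, no down state so far)
    alpha = (p0[:, None] * phi) * mask[:, None]
    R = [alpha.sum()]
    for _ in range(2, T + 1):
        stay = np.zeros_like(alpha)
        stay[:, :-1] = alpha[:, 1:]          # X^D > 1: count the duration down
        jump = alpha[:, 0] @ A               # X^D = 1: J = 1, a transition fires
        alpha = stay + jump[:, None] * phi   # draw a sojourn time in the new state
        alpha *= mask[:, None]               # keep only trajectories staying 'up'
        R.append(alpha.sum())
    return np.array(R)

# Sanity check: a geometric sojourn time (parameter 0.5) in a single up state
# before an absorbing down state gives R(t) = P(sojourn >= t) = 0.5**(t-1).
_phi0 = np.array([0.5 ** (d + 1) for d in range(60)])
_phi0[-1] = 0.5 ** 59                        # censor the tail at Dmax
R_demo = reliability_gdm(np.array([1.0, 0.0]),
                         np.array([[0.0, 1.0], [0.0, 1.0]]),
                         np.vstack([_phi0, _phi0]), up=[0], T=5)
```

With context variables, the same recursion runs per context configuration weighted by P(Z_t), as in step 3 above.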
4. Application

To illustrate our approach, we use a GDM to model the behavior of a three-state system representing a production machine. This machine is supposed to be subject to one covariate, namely its production speed. Hence, the resulting GDM consists of: one covariate Z_{1,t} representing the speed level, "low" or "high"; the system state X_t, which can be "nominal" (N) or "degraded" (D) for the up states and "failed" (F) for the down state; and the duration variable X^D_t, for which we arbitrarily set Dmax = 150 months, large enough since our analysis is performed over only 100 months. The transition matrix A and the sojourn-time distribution ϕ for each state and each context are given in Table 1. Note that in this example all the sojourn times are assumed to have discrete right-censored Weibull distributions, denoted by W^r(μ, γ), where μ and γ are the classic scale and shape parameters and r is the right-censoring time bound ensuring the finiteness of the distribution. In other words, for each context z_1 and each state l, the probability of a sojourn time d is given by

ϕ(z_1, l, d) = [F(d) − F(d − 1)] δ(1 ≤ d ≤ r − 1) + [1 − F(r − 1)] δ(d = r),

where F is the cumulative distribution function of the well-known continuous Weibull distribution with scale parameter μ_{z_1,l} and shape parameter γ_{z_1,l}.

The GDM used in this example has been implemented in the MATLAB environment, using the free Bayes Net Toolbox (BNT). The corresponding reliability estimates are presented in Figure 2(a); the associated failure rate and MTTF are depicted in Figures 2(b) and 2(c), respectively. These figures characterize the behavior of the studied system for different operating policies, controlled by the percentage of high-speed production per time unit. Useful information about the covariate effects can thus be deduced from such an analysis, and such survival analyses can be essential inputs for reliability-based maintenance decision-support tools.
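The censored discrete Weibull pmf ϕ can be coded directly from the formula above; the function name is ours, and the whole tail mass beyond r − 1 is lumped at d = r so that the pmf is finite and sums to one:

```python
import math

def censored_discrete_weibull(mu, gamma, r):
    """pmf of W^r(mu, gamma): mass F(d) - F(d-1) at d = 1, ..., r-1, and the
    remaining tail 1 - F(r-1) lumped at d = r, where F is the continuous
    Weibull CDF with scale mu and shape gamma."""
    F = lambda d: 1.0 - math.exp(-((d / mu) ** gamma))
    pmf = [F(d) - F(d - 1) for d in range(1, r)]
    pmf.append(1.0 - F(r - 1))
    return pmf  # pmf[d - 1] = phi(d)

# Sojourn distribution of the nominal state at low speed: W^150(30, 1)
phi_N_low = censored_discrete_weibull(30.0, 1.0, 150)
```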
Table 1. Parameters of the GDM. W^r(μ, γ) denotes the discrete right-censored Weibull distribution. The right-censoring parameter r is set to Dmax = 150 months.

(a) A(k, low, l):
        N    D      F
  N     0    9/10   1/10
  D     0    0      1
  F     0    0      1

(b) A(k, high, l):
        N    D      F
  N     0    3/10   7/10
  D     0    0      1
  F     0    0      1

(c) Conditional duration distributions ϕ:
  speed   state N       state D
  low     W^r(30, 1)    W^r(20, 3)
  high    W^r(20, 1)    W^r(10, 3)
5. Discussion

The proposed method based on GDMs aims to study the behavior of a complex system. Our approach turns out to be a satisfying and comprehensive solution for modeling and estimating the reliability of a complex system. Indeed, the proposed modeling is generic, since it is possible to take into account the context of the system along with an accurate description of its survival distributions. In addition, as this work is based on graphical models, the underlying approach is intuitive and easily generalizable. The encouraging results presented in this paper confirm that GDMs are competitive reliability-analysis tools for practical problems. In future work, we will address the problem of maintenance modeling for systems represented by GDMs.
References

[1] N. Limnios and G. Oprisan, Semi-Markov Processes and Reliability, Statistics for Industry & Technology, Springer, 2001.
[2] R. Kay, Proportional hazard regression models and the analysis of censored survival data, Applied Statistics, 26 (1977), 227–237.
[3] H. Boudali and J. B. Dugan, A discrete-time Bayesian network reliability modeling and analysis framework, Reliability Engineering & System Safety, 87 (2005), 337–349.
[4] H. Langseth and L. Portinale, Bayesian networks in reliability, Reliability Engineering & System Safety, 92 (2007), 92–108.
[5] P. Weber and L. Jouffe, Reliability modeling with dynamic Bayesian networks, in: 5th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Washington, D.C., USA, 2003.
[6] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
[7] R. E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2003.
[8] S. L. Lauritzen and D. J. Spiegelhalter, Local computations with probabilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society, 50 (1988), 157–224.
[9] K. P. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, PhD thesis, University of California, Berkeley, 2002.
[10] V. Barbu, M. Boussemart and N. Limnios, Discrete time semi-Markov processes for reliability and survival analysis, Communications in Statistics – Theory and Methods, 33 (2004), 2833–2868.
[11] C. Bracquemond and O. Gaudoin, A survey on discrete lifetime distributions, International Journal of Reliability, Quality, and Safety Engineering, 10 (2003), 69–98.
[12] R. Donat, L. Bouillaut, P. Aknin, P. Leray and D. Levy, A generic approach to model complex system reliability using graphical duration models, in: Proceedings of the Fifth International Mathematical Methods in Reliability Conference, Glasgow, 2007.
Figure 2. Reliability and related metrics over time (in months) for different functioning policies (0%, 25%, 50%, 75%, and 100% high-speed production): (a) reliability, (b) failure rate, (c) MTTF versus the percentage of high production speed.
Network Reliability Evaluation with Propositional Directed Acyclic Graphs¹

Jacek JONCZY^a and Rolf HAENNI^{a,b}
^a Institute of Computer Science and Applied Mathematics, University of Bern, Switzerland
^b Bern University of Applied Sciences, Biel, Switzerland

Abstract. This paper proposes a new and flexible approach to network reliability computation. The method is based on Propositional Directed Acyclic Graphs (PDAGs), a general graph-based language for the representation of Boolean functions. We introduce an algorithm which creates, in polynomial time, a generic structure-function representation for reliability networks. In contrast to many existing methods, our method does not rely on the enumeration of all mincuts or minpaths, which may be infeasible in practice. From this representation, we can then derive the structure functions for different network reliability problems. Based on the compact PDAG representation, we can either compute the exact reliability or estimate the reliability by means of an approximation method.

Keywords. Network reliability, PDAG, structure function representation
Introduction

The computation of network reliability plays an important role in operations research, computer science, and engineering. Its main application area is large-scale systems such as computer and communication networks, where a high degree of reliability is crucial. In this paper, the emphasis lies on a compact and efficient representation of the network's structure function, which describes the network operation based on structural and statistical network properties. The goals of this paper are, first, to introduce a new method which provides a compact structure-function representation by means of a polynomial-time algorithm, and second, to show how the exact network reliability can be computed based on a generic representation of the structure function.

Preliminaries. The network model used in this paper consists of a probabilistic graph G = (V, E), where V = {v1, ..., vn} and E = {e1, ..., em} are the sets of nodes and edges, respectively, together with a set of probabilities p1, ..., pm such that with each edge ei a probability of operation pi = p(ei) is associated. The reliability analysis is based on the following assumptions:

• networks are directed,

¹ This research is supported by the Swiss National Science Foundation, project no. PP002–102652. Many thanks also to Michael Wachter and Reto Kohlas for their helpful comments.
• nodes are perfectly reliable, i.e., failures are consequences of edge failures only,
• edges fail statistically independently with known probabilities qi = 1 − pi, and
• the underlying structure function is monotone in each argument.

In this paper, we consider only directed networks with edge failures. This assumption eases the discussion of the proposed method without implying any conceptual restrictions: first, any network with node failures is polynomial-time reducible to an equivalent directed network with edge failures only [6]; and second, undirected networks are easily transformed into directed networks by replacing every undirected edge by two opposing directed edges, each inheriting the failure probability of the original undirected edge [6].

Structure Function. We follow the usual convention that each edge ei is in either of two possible states, namely operational (1) or failed (0). This motivates the definition of a Boolean function Φ: Ω → {0, 1}, where Ω = {0, 1}^m is the state space under consideration. The function Φ is called the structure function (SF), and its evaluation for a given state S ∈ Ω yields the network operation: a network is in an operational state (failure state) S iff Φ(S) = 1 (0). The reliability of a network is then defined as the probability that the SF evaluates to 1. The representation of the SF is crucial for reliability computation; therefore, the method proposed in this paper aims to represent the SF in a most compact and efficient form.

Network Reliability Problems. The following classical network reliability problems are of main interest in this paper.² In the directed case, the source-to-terminal connectedness, denoted by Conn2, is the probability that for two specified nodes s (source) and t (terminal), there exists at least one (directed) path from s to t.
The source-to-all-terminal connectedness, or Conn∀k, is the probability that for a given source node s ∈ V, there is a path from s to each of the k nodes in T, where T ⊆ V \ {s} is the set of terminal nodes. In this paper, we additionally consider the source-to-any-terminal connectedness, denoted by Conn∃k: the probability that there exists at least one path from the source node s to one or more terminal nodes in T. Note that Conn2 is a special case of both Conn∀k and Conn∃k. For any of these problems, corresponding undirected counterparts exist. They are all covered by our method, but will not explicitly take part in our discussion.

Existing Computational Methods. Several methods exist for the representation of the SF. Complete state enumeration is the simplest, but also the most inefficient method (exponential in the network size). The enumeration of minpaths or mincuts, which results in a corresponding DNF (Disjunctive Normal Form) representation of the SF, may be more efficient, but is still exponential in the worst case. On top of that, one is faced with the generally hard problem of making the DNF disjoint in order to compute the exact reliability. This is the reason why most enumeration techniques require, in a second step, a disjoint form of the resulting terms; usually, some sum-of-disjoint-products (SOP) method is applied, see [1,9,2]. Another classical method is the inclusion-exclusion expansion, which is very inefficient in its own right; a more sophisticated alternative is provided by the domination theory [11]. Algebraic methods for the representation of the SF can be found in [12]. More recent is the use of Binary Decision Diagrams (BDDs) [5], which are based on Shannon's decomposition principle, a common factoring method. Often a particular type of BDD, so-called Ordered BDDs (OBDDs), is constructed from minpath, mincut, or EED (Edge Expansion Diagram) representations, see [10] for the latter. Of course, it is also possible to generate an OBDD directly from the graph representation of the network, see [4,8,15]. The direct application of OBDDs is a very common approach, as they inherently provide a natural form of disjointness from the beginning.

² We partially adopt the notation from [3,6] and slightly modify some definitions.

A New Approach to Network Reliability - J. Jonczy & R. Haenni

Our New Approach. We propose a flexible approach to compute network reliability. In essence, the proposed procedure consists of four consecutive phases: (1) creation of a generic SF representation; (2) obtaining a specific SF by appropriate instantiation; (3) transformation of the specific SF into a disjoint form; and (4) exact calculation of the reliability based on the SF obtained in (3). All phases involve PDAGs (Propositional Directed Acyclic Graphs), a recently introduced graph-based language for the representation of Boolean functions [13,14]. We explain in Section 1 how to use PDAGs for computing network reliability. The method we propose is useful for both exact and approximate calculation of network reliability.
1. Reliability Evaluation with PDAGs

Let us start this section by giving a more detailed overview of the four phases which constitute the proposed reliability evaluation process:

Phase 1. Transformation of an initial network representation N into a generic SF representation ϕ_s for all Conn∃k problems with a fixed source node s. The proposed algorithm sequentially eliminates, in polynomial time, all nodes except s.

Phase 2. Creation of the specific SF ϕ_{s,∃T} for the problem Conn∃k by an appropriate instantiation of ϕ_s relative to a set of terminal nodes T, in linear time. This includes Conn2 as a special case for T = {t}, in which case the SF is denoted by ϕ_{s,t}. From such representations for different t, we may then derive SFs ϕ_{s,∀T} for arbitrary Conn∀k problems.

Phase 3. Transformation of a specific SF resulting from Phase 2 into a logically equivalent but disjoint form, denoted by ϕ^cd_{s,∃T}, ϕ^cd_{s,∀T}, or ϕ^cd_{s,t}, respectively. This transformation is necessary for an efficient probability computation afterwards, but may result in an exponentially larger representation of the SF.

Phase 4. Exact reliability computation by calculating the probability of the disjoint SF from Phase 3, in time linear in its size.

In the following discussion, we concentrate on the Conn2 problem in order to ease the presentation of our evaluation method. Furthermore, the SFs will be represented by means of PDAGs. A PDAG is a graphical representation of a Boolean function, where the logical conjunction ∧ is represented by a △-node, the disjunction ∨ by a ▽-node, and the negation ¬ by a ♦-node. Leaf nodes are represented by circles and labeled with Boolean variables, ⊤ (true), or ⊥ (false) [13]. Formally, a network is represented by means of a connectivity matrix N = (ϕij)n×n, where n = |V|.
Each coefficient ϕij is a PDAG representation of a local SF with the following semantics: ϕij ≠ ⊥ represents, for i ≠ j, the reachability of node vj from node vi relative to the nodes already eliminated; in the initial matrix, it implies the existence of an edge from vi to vj. Conversely, ϕij = ⊥ means that there is no edge from vi to vj.³ The diagonal coefficients ϕii represent the generic SF for the reachability of node vi. Initially, we write ϕii = λvi for all vi ∈ V. Later, in Phase 2, the λvi play the role of terminal selectors: the imposed value of each λvi determines whether the corresponding node vi belongs to the terminal set T or not.

Phase 1: Generating the Generic Structure Function

Consider the network from Fig. 1, which will be used as a reference example in the sequel. At the beginning, we impose a variable ordering on V = {A, B, C, D}, which arranges the nodes increasingly according to their distance from a previously fixed source node s; ties are resolved randomly.⁴ In our example, we have chosen node A as the source node and obtain A, B, C, D as the initial ordering. The reverse order of the nodes will then be used as the elimination sequence according to which the nodes are removed from the matrix.

Figure 1. A simple reliability network (nodes A, B, C, D; edges e1: A→B, e2: A→C, e3: B→D, e4: C→D, e5: C→B).
The elimination algorithm performs two essential operations. The first is a reachability update (Line 5 in Alg. 1) and is based on the following observation: when a node vk is eliminated from the network, its reachability information, represented by ϕkk, must be passed to all nodes vi with an outgoing arc to vk; their respective reachability information ϕii is updated accordingly. In our example, when node D is eliminated, the coefficients ϕ22 and ϕ33 for nodes B and C, respectively, are updated: λB becomes λB ∨ (e3 ∧ λD), and λC becomes λC ∨ (e4 ∧ λD). The intuition behind this operation is the following: the reachability of node D is represented by λD, but D is also reachable via e3 or e4. The latter require nodes B and C to be reachable as well, which is expressed by λB

 1: function genericStructFunction((ϕij)n×n)
 2:   for k from n downto 2 do
 3:     for i from k − 1 downto 1 do
 4:       if ϕik ≠ ⊥ then
 5:         ϕii ← ϕii ∨ (ϕik ∧ ϕkk);       // reachability update
 6:         for j from 1 to k − 1 where j ≠ i do
 7:           ϕij ← ϕij ∨ (ϕik ∧ ϕkj);     // transitivity update
 8:         end
 9:       end
10:     end
11:   end
12:   return ϕ11;

Algorithm 1. Algorithm creating the generic structure function from a given network N = (ϕij)n×n.
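Alg. 1 can be transcribed directly in Python. The sketch below is ours: formulas are nested tuples with simplifying connectives rather than shared PDAG nodes (so it is a tree-based approximation of a real PDAG implementation), and the usage at the end instantiates the terminal selectors for Conn2 with t = D and checks the exact reliability by state enumeration, using the edge probabilities assigned in Phase 4 of the example:

```python
from itertools import product

F, T = 'F', 'T'   # stand-ins for the PDAG constants ⊥ and ⊤

def OR(a, b):
    if a == F: return b
    if b == F: return a
    if a == T or b == T: return T
    return ('or', a, b)

def AND(a, b):
    if a == F or b == F: return F
    if a == T: return b
    if b == T: return a
    return ('and', a, b)

def generic_structure_function(N):
    """Alg. 1: eliminate nodes n, n-1, ..., 2 from the connectivity matrix.
    N[i][j] is an edge variable (or F); N[i][i] is the terminal selector."""
    n = len(N)
    phi = [row[:] for row in N]
    for k in range(n - 1, 0, -1):
        for i in range(k - 1, -1, -1):
            if phi[i][k] != F:
                phi[i][i] = OR(phi[i][i], AND(phi[i][k], phi[k][k]))  # reachability
                for j in range(k):
                    if j != i:
                        phi[i][j] = OR(phi[i][j], AND(phi[i][k], phi[k][j]))  # transitivity
    return phi[0][0]

def evaluate(f, env):
    """Evaluate a formula under a {variable: bool} assignment."""
    if f == T: return True
    if f == F: return False
    if isinstance(f, str): return env[f]
    op, a, b = f
    return (evaluate(a, env) and evaluate(b, env)) if op == 'and' \
        else (evaluate(a, env) or evaluate(b, env))

# Connectivity matrix of the example network (node order A, B, C, D)
N = [['lA', 'e1', 'e2', F],
     [F,    'lB', F,    'e3'],
     [F,    'e5', 'lC', 'e4'],
     [F,    F,    F,    'lD']]
phi_A = generic_structure_function(N)

# Phase 2 instantiation (t = D), then exact reliability by enumeration
selectors = {'lA': False, 'lB': False, 'lC': False, 'lD': True}
edges, p = ['e1', 'e2', 'e3', 'e4', 'e5'], [0.8, 0.9, 0.7, 0.5, 0.9]
rel = 0.0
for bits in product([False, True], repeat=5):
    env = dict(selectors, **dict(zip(edges, bits)))
    if evaluate(phi_A, env):
        w = 1.0
        for pi, b in zip(p, bits):
            w *= pi if b else 1.0 - pi
        rel += w
```

The enumeration here serves only as a check; the point of the PDAG approach is precisely to avoid it via the cd-PDAG of Phase 3.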
and λC, respectively. The second operation is a transitivity update (Line 7 in Alg. 1) and is performed whenever the currently eliminated node vk has outgoing arcs in the current connectivity matrix. Hence, for an edge from vk to vj, we check whether there are alternative paths towards the sink node vj (which has not yet been eliminated) from another node vi via vk. If so, the corresponding ϕij is updated. In the example, C has one outgoing arc in the subgraph with V \ {D}, namely e5. Thus, when C is eliminated, ϕ12 is updated: e1 becomes e1 ∨ (e2 ∧ e5). Intuitively, this means that B is reachable from A by e1, or via C by e2 and e5. The core of the algorithm is shown in Alg. 1.⁵ The following matrices illustrate the process for each elimination step: the initial matrix NABCD, then NABC after eliminating D, NAB after eliminating C, and finally NA after eliminating B. At the end of the algorithm, ϕ11 in the matrix NA represents the generic source-to-any-terminal connectivity information with A as source node. For reasons of space, the PDAGs ϕij are written as (nested) propositional formulas within the matrices.

³ Similarly, we could set ϕij to ⊤ for a perfectly reliable edge from vi to vj.
⁴ This is a heuristic called Largest Distance First (LDF); of course, other heuristics are also possible.
NABCD =
  [ λA  e1  e2  ⊥  ]
  [ ⊥   λB  ⊥   e3 ]
  [ ⊥   e5  λC  e4 ]
  [ ⊥   ⊥   ⊥   λD ]

NABC =
  [ λA  e1               e2              ]
  [ ⊥   λB ∨ (e3 ∧ λD)   ⊥               ]
  [ ⊥   e5               λC ∨ (e4 ∧ λD)  ]

NAB =
  [ λA ∨ (e2 ∧ (λC ∨ (e4 ∧ λD)))   e1 ∨ (e2 ∧ e5)  ]
  [ ⊥                              λB ∨ (e3 ∧ λD)  ]

NA = [ λA ∨ (e2 ∧ (λC ∨ (e4 ∧ λD))) ∨ ((e1 ∨ (e2 ∧ e5)) ∧ (λB ∨ (e3 ∧ λD))) ]

Phase 2: Creating the Specific Structure Function for Conn2

In the context of our sample network, the generic SF obtained in Phase 1 is denoted by ϕ_A. Its exact meaning depends on the respective values of the terminal selectors λA, ..., λD, which must be instantiated accordingly with ⊤ and ⊥. If we are interested in the Conn2 problem with source node s = A and terminal node t = D, then in order to obtain the corresponding specific SF ϕ_{A,D} we must instantiate the terminal selectors within ϕ_A as follows: λD is set to ⊤, and λA, λB, and λC are all set to ⊥.⁶ Note that the instantiation always runs in time linear in the number of PDAG nodes. After all simplifications, we get the new formula

ϕ_{A,D} = (e2 ∧ e4) ∨ ((e1 ∨ (e2 ∧ e5)) ∧ e3),

which is the specific SF of interest and serves as the starting point for Phase 3. The corresponding PDAG ϕ_{A,D} is depicted in Fig. 2(a).

⁵ The algorithm avoids performing every update, which leads in the case of a cycle-free network to the complexity O(n²). In general, the algorithm runs in O(nm), hence in the worst case, i.e. if m ≈ n², in O(n³).
⁶ For Conn∃k, λvi is set to ⊤ for all vi ∈ T, and λvj is set to ⊥ for all vj ∈ V \ T. For Conn∀k, k accordingly instantiated SFs for corresponding Conn2 problems are conjoined.

Phase 3: Exact Reliability Computation – cd-PDAG Construction

The primary goal of our method is to compute the exact reliability Conn2, which in our example is equal to the probability P(ϕ_{A,D}) of the specific SF. To compute this probability, the PDAG ϕ_{A,D} must first be transformed into a logically equivalent cd-PDAG
ϕ^cd_{A,D}, satisfying two key properties called decomposability (c) and determinism (d). We prefer cd-PDAGs to other Boolean representation languages like OBDDs or d-DNNFs (deterministic, decomposable Negation Normal Forms) due to their relative succinctness (compactness). For a complete discussion of PDAGs, and of succinctness in particular, please refer to [13]. In this context, we apply the following general strategy for the cd-PDAG construction, which consists of two consecutive transformations:

(1) PDAG to d-PDAG: Replace each ▽-node ϕ1 ∨ ϕ2 by ϕ1 ∨ (ϕ2 ∧ ¬ϕ1), where ϕ1 and ϕ2 are arbitrary PDAGs. All new ▽-nodes then satisfy determinism. Note that the new △-nodes are not necessarily decomposable. The operation of making a PDAG deterministic runs in time linear in its size.

(2) d-PDAG to cd-PDAG: Check for each remaining △-node ϕ1 ∧ ϕ2 whether decomposability holds. Whenever ϕ1 and ϕ2 have a common sub-PDAG ψ, decomposability is not satisfied. In such a case, we condition ϕ1 ∧ ϕ2 on ψ and ¬ψ. This procedure is applied recursively to each newly created △-node until decomposability holds for all △-nodes. Computationally, this is the hardest task of the entire procedure and may require exponential time.

Figure 2. (a) PDAG ϕ_{A,D} representing the SF for Conn2. (b) d-PDAG after transformation (1) applied to ϕ_{A,D}. (c) cd-PDAG after transformation (2) applied to ϕ^d_{A,D} and the calculation of its probability.
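The linear-time arithmetic of Phase 4 can be sketched on a conditioned form of ϕ_{A,D}. The tree encoding and names below are ours, and the deterministic, decomposable form shown (obtained by conditioning on e2, as in the text) may be shaped differently from the one in Fig. 2(c), but it reproduces the probability 0.8147 reported for the example:

```python
# Edge operation probabilities from the example network
p = {'e1': 0.8, 'e2': 0.9, 'e3': 0.7, 'e4': 0.5, 'e5': 0.9}

def prob(f):
    """Bottom-up probability of a cd-PDAG-like expression tree:
    'and' -> product (valid by decomposability), 'or' -> sum (valid by
    determinism), 'not' -> 1 - x (the leaf edges are independent)."""
    if isinstance(f, str):
        return p[f]
    if f[0] == 'and':
        return prob(f[1]) * prob(f[2])
    if f[0] == 'or':
        return prob(f[1]) + prob(f[2])
    return 1.0 - prob(f[1])  # 'not'

# A deterministic, decomposable form of phi_A,D, conditioned on e2:
# (e2 ∧ (e4 ∨ (¬e4 ∧ (e1 ∨ (¬e1 ∧ e5)) ∧ e3))) ∨ (¬e2 ∧ e1 ∧ e3)
phi_cd = ('or',
          ('and', 'e2',
           ('or', 'e4',
            ('and', ('not', 'e4'),
             ('and', ('or', 'e1', ('and', ('not', 'e1'), 'e5')), 'e3')))),
          ('and', ('not', 'e2'), ('and', 'e1', 'e3')))

result = prob(phi_cd)  # exact reliability for Conn2 with s = A, t = D
```

Every ∨-node has mutually exclusive children and every ∧-node has variable-disjoint children, which is exactly what licenses the +/∗ replacement.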
As shown in Fig. 2(a), the two gray ▽-nodes are not deterministic. In order to achieve determinism, we perform transformation (1) to obtain the d-PDAG ϕ^d_{A,D} depicted in Fig. 2(b). This is still not enough for the probability computation, since the gray △-node is not decomposable. Thus, we now apply transformation (2) by conditioning on the variable e2, which is the only common sub-PDAG of the gray △-node's children. The resulting cd-PDAG ϕ^cd_{A,D} is depicted in Fig. 2(c).

Phase 4: Exact Reliability Computation – Probability Calculation

The final step of the exact reliability computation consists in the probability calculation based on the obtained cd-PDAG. At this point, we can still assign arbitrary probabilities to the edges, or perform a sensitivity analysis if required. In our sample network, we have assigned independent probabilities pi to each arc ei, as indicated at the leaves of the cd-PDAG in Fig. 2(c): p1 = 0.8, p2 = p5 = 0.9, p3 = 0.7, and p4 = 0.5. By replacing each △-node by a ∗ operation and each ▽-node by a + operation, and by subtracting from 1 the probability of a ♦-node's child, we can recursively compute the probability of ϕ^cd_{A,D}
A New Approach to Network Reliability - J. Jonczy & R. Haenni
in time linear in its size. The resulting probability P(ϕ^cd_{A,D}) = 0.8147 corresponds to the source-to-terminal connectedness with source A and terminal D.

2. Discussion

We have presented a new method for the evaluation of network reliability. It allows one to create a generic representation of the SF by means of a polynomial-time algorithm. With PDAGs, we have a powerful tool at hand for obtaining an SF representation in a compact and flexible form, in some cases superior to other Boolean representation languages. The introduction of terminal selectors gives us the ability to select, by an appropriate instantiation, the specific SFs for the problems Conn_∃k, Conn_∀k (indirectly), and Conn_2 (the latter being a special case of the former two). Based on the resulting PDAG, we can either compute the exact reliability by a preceding transformation into a cd-PDAG, or directly (i.e. without further transformations) approximate the reliability by a sampling method. The method described is general in the sense that it includes undirected networks and node failures as special cases.

References

[1] J. A. Abraham, An improved algorithm for network reliability, IEEE Transactions on Reliability, 28 (1979), 58–61.
[2] B. Anrig and P. A. Monney, Using propositional logic to compute the probability of diagnoses in multistate systems, International Journal of Approximate Reasoning, 20(2) (1999), 113–143.
[3] M. O. Ball, C. J. Colbourn and J. S. Provan, Network reliability. In M. O. Ball, T. L. Magnanti, C. L. Monma and G. L. Nemhauser, editors, Network Models, volume 7 of Handbooks in Operations Research and Management Science, pages 673–762. Elsevier, 1995.
[4] A. Bobbio, C. Ferraris and R. Terruggia, New challenges in network reliability analysis. In CNIP'06, International Workshop on Complex Network and Infrastructure Protection, pages 554–564, 2006.
[5] R. E. Bryant, Graph-based algorithms for Boolean function manipulation, IEEE Transactions on Computers, 35(8) (1986), 677–691.
[6] C. J. Colbourn, The Combinatorics of Network Reliability, Oxford University Press, New York, 1987.
[7] A. Darwiche, A compiler for deterministic, decomposable negation normal form. In AAAI'02, 18th National Conference on Artificial Intelligence, pages 627–634, Edmonton, Canada, 2002.
[8] G. Hardy, C. Lucet and N. Limnios, K-terminal network reliability measures with binary decision diagrams, IEEE Transactions on Reliability, 56(3) (2007), 506–515.
[9] K. D. Heidtmann, Smaller sums of disjoint products by subproducts inversion, IEEE Transactions on Reliability, 38(4) (1989), 305–311.
[10] S. Y. Kuo, S. K. Lu and F. M. Yeh, Determining terminal-pair reliability based on edge expansion diagrams using OBDD, IEEE Transactions on Reliability, 48(3) (1999), 234–246.
[11] A. Satyanarayana and A. Prabhakar, New topological formula and rapid algorithm for reliability analysis of complex networks, IEEE Transactions on Reliability, R-27 (1978), 82–100.
[12] D. R. Shier, Network Reliability and Algebraic Structures, Clarendon Press, Oxford, 1991.
[13] M. Wachter and R. Haenni, Propositional DAGs: a new graph-based language for representing Boolean functions. In P. Doherty, J. Mylopoulos and C. Welty, editors, KR'06, 10th International Conference on Principles of Knowledge Representation and Reasoning, pages 277–285, Lake District, UK, 2006. AAAI Press.
[14] M. Wachter, R. Haenni and J. Jonczy, Reliability and diagnostics of modular systems: a new probabilistic approach. In C. A. González, T. Escobet and B. Pulido, editors, DX'06, 17th International Workshop on Principles of Diagnosis, pages 273–280, Peñaranda de Duero, Spain, 2006.
[15] X. Zang, H. Sun and K. S. Trivedi, A BDD-based algorithm for reliability graph analysis, Technical report, Department of Electrical Engineering, Duke University, 2000.
Some Properties of Incomplete Repair and Maintenance Models

Waltraud KAHLE 1, Otto-von-Guericke-University, Germany

Abstract. We consider an incomplete repair model, that is, a model in which the impact of repair is not minimal (as in the homogeneous Poisson process) and not "as good as new" (as in renewal processes), but lies between these boundary cases. The repairs are assumed to impact the failure intensity following a virtual age process of the general form proposed by Kijima. In previous works, field data from an industrial setting were used to fit several models, and in most cases the estimated rate of occurrence of failures was that of an underlying exponential distribution of the time between failures. In this paper it is shown that there exist maintenance schedules under which the failure behavior of the failure-repair process becomes a homogeneous Poisson process. Further, examples of optimal maintenance under incomplete repair are given.

Keywords. Incomplete repair, Kijima type repairs, Virtual age, Time scale transformation, Preventive maintenance, Kijima model, Partial repair
Introduction

In this research, we are concerned with the statistical modeling of repairable systems. Our particular interest is the operation of electrical generating systems. As is usual for repairable systems, we assume that the failure intensity at a point in time depends on the history of repairs. In the environment under investigation, maintenance actions were regularly carried out, and we assume that such actions impacted the failure intensity. Specifically, we assume that maintenance actions served to adjust the virtual age of the system in a Kijima type manner [5], [4]. Kijima proposed that the state of the machine just after repair can be described by its so-called virtual age, which is smaller (younger) than the real age. In his framework, the rate of occurrence of failures (ROCOF) depends on the virtual age of the system. Our immediate interest was to obtain an operating/repair effects model consistent with data obtained from a selected hydro-electric turbine unit within the British Columbia Hydro-Electric Power Generation System. The data, collected over the period January 1977 to December 1999, contain 496 sojourns with 160 failures. Two types of repairs are recorded by maintenance personnel, major repairs and minor repairs; the classification of repairs into these two categories is made at the time of the repair. Within this period, 50 major repairs and 96 minor repairs were conducted. All 50 major repairs resulted from a censoring decision (i.e., a decision to shut the system down). Furthermore,

1 Corresponding Author: Waltraud Kahle, Institute of Mathematical Stochastics, Otto-von-Guericke-University, D-39016 Magdeburg, Germany; E-mail: [email protected]
of the 96 minor repairs, 1 was undertaken immediately following a failure; the remaining 95 are censored minor repairs. In addition to the sojourn and censoring times of these stoppages, the data also included the times to repair the system. These times ranged from 1 minute to 66,624 minutes (approximately 46 days). In this paper, we assume that the baseline failure intensity of the system follows a Weibull distribution,

λ(x) = (β/α)(x/α)^{β−1}, α > 0, β > 0.
1. Kijima Type Repairs

Consider the impact of repairs. A system (machine) starts working with an initial prescribed failure rate λ_1(t) = λ(t). Let t_1 denote the random time of the first sojourn. At this time t_1 the item is repaired with degree ξ_1. When the system is minimally repaired, the degree is equal to one; if the repair makes the system as good as new, this degree is zero. The virtual age of the system at time t_1, following the repair, is v_1 = ξ_1 t_1, implying that the age of the system is reduced by the maintenance action. The distribution of the time until the next sojourn then has failure intensity λ_2(t) = λ(t − t_1 + v_1). Assume now that t_k is the time of the k-th (k ≥ 1) sojourn and that ξ_k is the degree of repair at that time. We assume that 0 ≤ ξ_k ≤ 1 for k ≥ 1. After repair, the failure intensity during the (k + 1)-th sojourn is determined by

λ_{k+1}(t) = λ(t − t_k + v_k), t_k ≤ t < t_{k+1}, k ≥ 0,
where, for Kijima's Type II imperfect repair model, the virtual age is v_k = ξ_k(v_{k−1} + (t_k − t_{k−1})); that is, the repair resets the intensity of failure in proportion to the virtual age. Kijima's Type I imperfect repair model suggests that, upon failure, the repair undertaken could serve to reset the intensity only as far back as the virtual age at the start of working after the last failure, that is, v_k = v_{k−1} + ξ_k(t_k − t_{k−1}). The process defined by v(t; ξ_k, k = 1, 2, ...) = t − t_k + v_k, t_k ≤ t < t_{k+1}, k ≥ 0, is called the virtual age process [6]. Figure 1 shows the mean number of failures over time for a minimal repair process (the Weibull process), where the degree of repair is 1; for the Weibull renewal process, where the degree of repair is 0; and for some degrees of repair between 0 and 1 under Kijima Type II. In the two extreme cases, the expected number of failures is the cumulative hazard function (t/α)^β for the Weibull process and the solution of the renewal equation for the Weibull renewal process. In the general case an explicit calculation of the expected number of failures is possible only in some very special cases. For the plot, 100 failure processes with 50 failures each were simulated with parameters α = 1, β = 1.5. Each line shows the mean number of failures from these 100 simulations for
Repairable systems modeling

[Figure 1 plots the mean number of failures against t for the Weibull process (degree of repair 1), the renewal process (degree 0), and degrees of repair 0.5, 0.3 and 0.03.]

Figure 1. Mean number of failures under incomplete repair
the degrees of repair 1, 0.5, 0.3, 0.05 and 0. The straight line is the renewal function of a homogeneous Poisson process. It can be seen that the mean function of the counting process is convex even for renewal processes. In [2] a generalized Kijima type model was considered, in which a major repair gives an additional impact. It was shown that the likelihood function can be developed from the general likelihood function for observations of point processes [7], and that the likelihood ratio statistic can be used to find confidence estimates for the unknown parameters. The numerical results for this data file are surprising: under different assumptions about the repair actions (renewals, Kijima Type I or II, mixtures of Kijima type repairs and renewals depending on the time required for repair), the value of β was estimated to be approximately 1; see [2]. That is, the failure intensity is more or less constant. But in this case the failure behavior does not depend on maintenance actions. The results suggest that in practice the engineers follow a good maintenance policy, that is, they make repairs in accordance with the state of the system. The idea is that such a policy makes the apparent failure behavior of the system that of an exponential distribution. This is consistent with our data. In Figure 2 we see the cumulative distribution function of the operating time between failures together with the fitted CDF of an exponential distribution, and the Q-Q plot (observed quantiles against the quantiles of the exponential model). These plots suggest reasonable agreement with the exponential model if we consider only the failure process and ignore all maintenance events. Note that the renewal function of a counting process in which the time between failures has an increasing failure rate is convex. Behavior like that in our real data can occur only if there are preventive maintenance actions.
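A simulation such as the one underlying Figure 1 can be sketched as follows (an illustrative reimplementation, not the original code; all names are ours). Given virtual age v after a repair, the conditional sojourn is sampled by inverting P(X > x | v) = exp(−(Λ(v + x) − Λ(v))) with Λ(t) = (t/α)^β:

```python
import math
import random

def simulate_kijima2(alpha, beta, xi, n_failures, rng):
    """Simulate calendar failure times of a process with Weibull baseline
    cumulative hazard Lambda(t) = (t/alpha)**beta and Kijima Type II repairs
    of constant degree xi (xi = 1: minimal repair, xi = 0: renewal)."""
    t, v = 0.0, 0.0                    # calendar time, virtual age after repair
    times = []
    for _ in range(n_failures):
        u = 1.0 - rng.random()         # u in (0, 1]
        # invert P(X > x | v) = exp(-(Lambda(v + x) - Lambda(v)))
        w = alpha * ((v / alpha) ** beta - math.log(u)) ** (1.0 / beta)
        t += w - v                     # sojourn since the last repair
        times.append(t)
        v = xi * w                     # Kijima II: rescale the whole virtual age
    return times
```

Averaging the counting processes of many such simulated paths reproduces mean-number-of-failures curves like those in Figure 1.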
Definition 1 A maintenance policy is called failure rate optimal if the state-dependent preventive maintenance actions lead to a constant ROCOF of the failure process.
Figure 2. Operating time between failures: CDF and exponential Q-Q plot
2. Some Properties of Processes with Preventive Incomplete Maintenance

Let us consider the case in which, after each failure, the system is repaired minimally. Then the failure process is an inhomogeneous Poisson process. Let Λ_0(t) be the mean function of this failure process. Further, preventive maintenance actions are undertaken at time points t_k = k · Δ. The degree of each preventive maintenance is ξ, the same for every maintenance action. Such preventive maintenance leads to a decrease of the virtual age, as described in the previous section. The following theorems give an idea of the asymptotic behavior of the resulting failure process.

Theorem 1 If preventive maintenance actions follow a Kijima Type I incomplete repair process, then the failure process is an inhomogeneous Poisson process with mean function Λ(t) = Λ_0(ξt).

Proof: The virtual age of the failure process is influenced only by preventive maintenance, because after a failure the system is repaired minimally. After the k-th maintenance action the virtual age of the system is

v_k = v_{k−1} + ξ(t_k − t_{k−1}) = v_{k−2} + ξ(t_{k−1} − t_{k−2}) + ξ(t_k − t_{k−1}) = ... = ξ · Σ_{i=1}^{k} (t_i − t_{i−1}) = ξ · t_k.
Theorem 2 If preventive maintenance actions follow a Kijima Type II incomplete repair process, then the failure process is a renewal process in which the time between failures has a truncated distribution.

Proof:

v_k = ξ(v_{k−1} + (t_k − t_{k−1})) = ξ^2(v_{k−2} + (t_{k−1} − t_{k−2})) + ξ(t_k − t_{k−1}) = ... = Σ_{i=1}^{k} ξ^i · (t_{k−i+1} − t_{k−i}) ≤ max_i(t_i − t_{i−1}) · (1 − ξ^k)/(1 − ξ).

The incomplete repair process tends to a renewal process in which the waiting time has a left-truncated Weibull distribution.
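The two virtual-age recursions used in the proofs can be checked numerically. In particular, with constant Δ the Type II age equals the geometric sum Δ · Σ_{i=1}^{n} ξ^i = Δξ(1 − ξ^n)/(1 − ξ), which respects the bound in the proof (a small sketch; the function names are ours):

```python
def virtual_age_type1(xi, Delta, n):
    """Kijima Type I preventive maintenance at t_k = k*Delta:
    v_k = v_{k-1} + xi*(t_k - t_{k-1}), so v_n = xi * n * Delta (Theorem 1)."""
    v = 0.0
    for _ in range(n):
        v += xi * Delta
    return v

def virtual_age_type2(xi, Delta, n):
    """Kijima Type II preventive maintenance at t_k = k*Delta:
    v_k = xi*(v_{k-1} + (t_k - t_{k-1})), a bounded geometric sum (Theorem 2)."""
    v = 0.0
    for _ in range(n):
        v = xi * (v + Delta)
    return v
```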
3. Optimal Maintenance as Time Scale Transformation

Following an idea in [1], we assume that through repair actions the time scale is transformed by a function W(t). Let Λ_0(t) be the baseline cumulative hazard function and let Λ_1(t) = Λ_0(W(t)) be the resulting hazard after a transformation of the time scale. For the Weibull hazard Λ_0(t) = (t/α)^β and W(t) = t^{1/β} we get

Λ_1(t) = Λ_0(t^{1/β}) = t/α^β,
that is, the hazard function of an exponential distribution with parameter λ_1 = 1/α^β. In practice we have repair actions at discrete time points, which leads to the question of the degrees of repair at these points. Let us consider two examples. In both examples we assume that after a failure the system is repaired minimally. Additionally, maintenance actions are carried out regularly, and we assume that they serve to adjust the virtual age of the system in a Kijima type manner.

Example 1: Assume that the distances between maintenance actions are constant and all repair actions follow the Kijima Type I repair process. Let t_1, t_2, ... be the time points of maintenance actions and Δ = t_k − t_{k−1}, k = 1, 2, ..., where t_0 = 0, be the constant distance between maintenances. Then it is possible to find a discrete time transformation which consists of different degrees of repair. Let the sequence of degrees be

ξ_k = (k^{1/β} − (k − 1)^{1/β}) / Δ^{1−1/β}.

Then the virtual age v_n of the system at time t_n = n · Δ can be found to be

v_n = Δ · Σ_{k=1}^{n} ξ_k = Δ · Σ_{k=1}^{n} (k^{1/β} − (k − 1)^{1/β}) / Δ^{1−1/β} = (n · Δ)^{1/β}.

Figure 3. A discrete time transformation
Example 2: Again we assume that the distances between maintenance actions are constant, but now we consider the Kijima Type II repair process. In this case the appropriate sequence of degrees of repair is

ξ_k = k^{1/β} / ((k − 1)^{1/β} + Δ^{1−1/β}).

In both cases the sequence is decreasing; that is, with increasing time the repairs must become better. It should be noted that in the case of a time scale transformation it is not necessary to distinguish between Kijima Types I and II: in both examples the virtual age at maintenance points is reset to that of the continuous time transformation, as shown in Figure 3. Figure 4 shows the cumulative hazard functions for a Weibull process without maintenance (solid line) and with maintenance actions every Δ = 0.1 time units (broken line). For this, a Weibull process with parameters α = 1 and β = 2.5 and 30 failures was simulated. The difference Δ = 0.1 between maintenance actions is relatively small, and the empirical cumulative hazard function of the process with preventive maintenance is close to that of a Poisson process. The dotted line shows the theoretical cumulative hazard function of a homogeneous Poisson process. There are many other possibilities for finding failure rate optimal maintenance policies. One other very simple policy is to consider constant degrees of repair; it is easy to see that in this case the repair actions must take place more often as time increases.
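Both sequences of degrees can be checked numerically: driving the respective virtual age recursions with ξ_k should reproduce the continuous transformation v_n = (nΔ)^{1/β} at every maintenance point (an illustrative sketch; the function names are ours):

```python
def degrees_type1(beta, Delta, n):
    """Example 1: xi_k = (k**(1/beta) - (k-1)**(1/beta)) / Delta**(1 - 1/beta)."""
    b = 1.0 / beta
    return [(k**b - (k - 1)**b) / Delta**(1.0 - b) for k in range(1, n + 1)]

def degrees_type2(beta, Delta, n):
    """Example 2: xi_k = k**(1/beta) / ((k-1)**(1/beta) + Delta**(1 - 1/beta))."""
    b = 1.0 / beta
    return [k**b / ((k - 1)**b + Delta**(1.0 - b)) for k in range(1, n + 1)]

def check_transformation(beta, Delta, n):
    """Virtual ages after n maintenances under both schemes, plus the target."""
    # Kijima I: v_n = Delta * sum of the degrees (the sum telescopes)
    v1 = Delta * sum(degrees_type1(beta, Delta, n))
    # Kijima II: v_k = xi_k * (v_{k-1} + Delta)
    v2 = 0.0
    for xk in degrees_type2(beta, Delta, n):
        v2 = xk * (v2 + Delta)
    return v1, v2, (n * Delta) ** (1.0 / beta)
```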
[Figure 4 plots the number of failures against t for the Weibull process without maintenance and with maintenance actions every Δ = 1.0, 0.5 and 0.1 time units, together with the continuous transformation.]
Figure 4. Weibull process without and with preventive maintenance actions
4. Discussion

We have considered failure rate optimal maintenance under the assumption that the maintenance action has an impact between the two extreme cases of minimal repair and renewal. To find cost optimal maintenance, it is necessary to define a cost function which describes the costs of repair actions according to the degree of repair. Further, additional assumptions about the times of maintenance actions must be made, because frequent small repairs and rare large repairs may cause the same costs.
References

[1] M. S. Finkelstein, Modeling a Process of Non-Ideal Repair. In N. Limnios, M. Nikulin (Eds.), Recent Advances in Reliability Theory, Statistics for Industry and Technology, pp. 41–53. Birkhäuser, Boston, 2000.
[2] S. Gasmi, C. E. Love and W. Kahle, A General Repair/Proportional Hazard Framework to Model Complex Repairable Systems, IEEE Transactions on Reliability, 52 (2003), 26–32.
[3] W. Kahle and C. E. Love, Modeling the Influence of Maintenance Actions. In B. H. Lindqvist, K. A. Doksum (Eds.), Mathematical and Statistical Methods in Reliability, Quality, Reliability and Engineering Statistics, pp. 387–400. World Scientific Publishing Co., 2003.
[4] M. Kijima, Some results for repairable systems with general repair, Journal of Applied Probability, 26 (1989), 89–102.
[5] M. Kijima, H. Morimura and Y. Suzuki, Periodical Replacement Problem without Assuming Minimal Repair, European Journal of Operational Research, 37 (1988), 194–203.
[6] G. Last and R. Szekli, Stochastic Comparison of Repairable Systems by Coupling, Journal of Applied Probability, 35 (1998), 348–370.
[7] R. S. Liptser and A. N. Shiryayev, Statistics of Random Processes, vol. II, Springer, New York, 1978.
On Models of Degradation and Partial Repairs 1

Petr VOLF 2, Institute of Information Theory and Automation, Prague, Czech Republic

Abstract. Models of imperfect repairs are mostly based on a reduction of the cumulated hazard rate, either directly or indirectly (by shifting the virtual age of the system). If the state of the system is characterized by a process of deterioration, the repair degree can be connected with the reduction of the deterioration level. Such a view actually transforms the time scale (or the scale given by the cumulated hazard rate) to the scale of the growing deterioration. Among the problems connected with such models (consistency of statistical analysis, assessment of model fit, etc.), we shall discuss mainly the question of repair schemes, their consequences, and the possibilities of an 'optimal' repair policy leading to hazard rate stabilization.

Keywords. Partial repair, Kijima model, intensity of failure, degradation, incomplete repair, optimal maintenance
Introduction

In reliability models it is often assumed that the intensity of failure of a technical device is influenced by a process of degradation. The degradation level is either observed directly, or just indirectly, through statistical data. Further, let us assume that the component corresponding to the device deterioration can be controlled. Hence, it is possible to search for the relationship between the extent of repair (taken as the reduction of a certain value characterizing the damage) and the "repair level" in the sense of Kijima models (Kijima, [4]), i.e. taken as the reduction of the virtual age of the object or, in other words, as the increase of its survival time. In this contribution, we concentrate on the case where the degradation is modeled via a non-decreasing function or a random process, for instance the step-wise random shocks process (actually a compound Poisson process or its generalization), with known or estimable statistical characteristics. The effect of degradation level reduction will be studied and the prolonged expected lifetime after such actions evaluated. Repair strategies optimal with respect to certain requirements will be considered, too. Finally, we shall consider the case where the hazard rate model consists of several components ('repairable' or not) and examine the same questions, namely that of the repair scheme and of the impact of repairs on the intensity of failures.

1 The research was supported by the project of the GA AV ČR No. IAA101120604.
2 Corresponding Author: Petr Volf, Institute of Information Theory and Automation, 18208 Prague 8, Czech Republic; E-mail: [email protected]
1. Basic Scheme of Repairs

Let us first recall briefly the most common schemes of repair of a repairable component and the relationship with the distribution of the time to failure. Renewal means that the component is repaired completely (e.g. exchanged for a new one) and that, consequently, the successive random variables – the times to failure – are distributed identically and independently. The resulting intensity of the stream of failures is called the renewal density, and has the meaning

h(t) = lim_{d→0+} P(failure occurs in [t, t + d)) / d.

Its integral (i.e. the cumulated intensity) is then H(t) = E[N(t)] = Σ_{k=0}^{∞} k · P(N(t) = k), where N(t) is the number of failures in (0, t]. Let f(t), F(t) denote the density and distribution function of the time to failure. Then the so-called renewal equation h(t) = f(t) + ∫_0^t h(t − u) f(u) du holds, provided the 'renewal' occurs just after each failure; consequently also H(t) = F(t) + ∫_0^t H(t − u) f(u) du.
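As a numerical illustration (our sketch, not from the paper), the renewal equation for H can be solved on a grid; for exponential times to failure the result must reproduce the known renewal function H(t) = λt of the Poisson process:

```python
import math

def renewal_function(F, t_max, n):
    """Solve H(t) = F(t) + int_0^t H(t - u) dF(u) on a grid of n steps,
    using exact increments of F and a right-endpoint rule for H(t - u)."""
    dt = t_max / n
    H = [0.0] * (n + 1)
    for i in range(1, n + 1):
        acc = F(i * dt)
        for j in range(1, i + 1):
            acc += H[i - j] * (F(j * dt) - F((j - 1) * dt))
        H[i] = acc
    return H

lam = 2.0
F_exp = lambda t: 1.0 - math.exp(-lam * t) if t > 0 else 0.0
H = renewal_function(F_exp, 1.0, 400)
# for exponential times to failure, H(t) = lam * t
```

The same routine works for any distribution function F; for the Weibull case it gives the lower curve of Figure 1 in the previous paper.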
1.1. Repairable models with general repair

There are several natural ways in which the notion of complete repairs can be generalized to partial and incomplete repairs. One such contribution is the paper of M. Kijima [4]. Let F again be the failure distribution of a new system. Assume that at each time the system fails, after a lifetime T_n from the preceding failure, a maintenance activity takes place (executed in negligible time) which reduces the virtual age to some value V_n = y, y ∈ [0, T_n + V_{n−1}], immediately after the n-th repair (V_0 = 0). The distribution of the n-th failure time T_n is then

P[T_n ≤ x | V_{n−1} = y] = (F(x + y) − F(y)) / (1 − F(y)).
M. Kijima then specified several sub-models of imperfect repairs. Denote by A_n the degree of the n-th repair (a random variable taking values between 0 and 1). In Model I, the n-th repair cannot remove the damage incurred before the (n−1)-th repair: V_n = V_{n−1} + A_n · T_n. On the contrary, Model II allows for such a reduction of the virtual age, namely V_n = A_n · (V_{n−1} + T_n). Special cases include the perfect repair model with A_n = 0, the minimal repair model with A_n = 1, and the frequently used variant with constant degree A_n = a. Naturally, there are many other generalizations; e.g. we can consider a randomized degree of repair, or a regressed degree (based on the system history). A set of variant models is also due to M. S. Finkelstein [2], who actually 'accelerated' the virtual time after each 'renewal' repair. It means that the distributions of T_i, the time to failure after the i-th repair, differ. A reasonable assumption is that T_i is stochastically non-increasing in i, T_{i+1} ≤_st T_i, i.e. F_{i+1}(t) ≥ F_i(t). The simplest example assumes that F_i(t) = F(u^{i−1} t), u > 1; a generalization can consider an accelerated time model with time-dependent functions W_i(t), i.e. F_{i+1}(t) = F_i(W_i(t)), where usually W_i(t) ≥ t, W_i′(t) ≥ 1. It follows that F_i(t) = F_0(W_0(W_1(...(W_{i−1}(t))...))). The interpretation is straightforward: the values of W_i(t) measure (reflect) the relative speed of degradation.
2. A Model for Preventive Repairs

Let us consider the following simple variant of the Kijima II model with constant degree 1 − δ, and assume that it is used to describe the change of the system's virtual age after preventive repairs. Further, let us assume that after a failure the system is repaired just minimally, or that the number of failures is much smaller than the number of preventive repairs. Let Δ be the (constant) time between these repairs, and V_n, V*_n the virtual ages before and after the n-th repair:

V_n = V*_{n−1} + Δ, V*_n = δ · V_n.
If we start from time 0, then V_1 = Δ, V*_1 = δΔ, V_2 = δΔ + Δ = Δ(δ + 1), V*_2 = Δ(δ^2 + δ), V_3 = Δ(δ^2 + δ + 1), etc. Consequently, V_n → Δ/(1 − δ), i.e. the virtual age 'stabilizes': for each δ and Δ there is a limit, meaning that the actual intensity of failures h(t) 'oscillates' between h_0(δΔ/(1 − δ)) and h_0(Δ/(1 − δ)), where h_0(t) is the hazard rate of the time-to-failure distribution of the non-repaired system. Simultaneously, the cumulated intensity increases regularly through intervals of length Δ by

dH = H_0(Δ/(1 − δ)) − H_0(δΔ/(1 − δ)),

i.e. 'essentially' with the constant slope a = dH/Δ.

Example: Let us consider the Weibull model with H_0(t) = α · t^β (β > 1, say). In that case

dH = αΔ^β (1 − δ^β)/(1 − δ)^β, a = αΔ^{β−1} (1 − δ^β)/(1 − δ)^β.

As special cases, again the perfect repairs, δ = 0, minimal repairs with δ ∼ 1, and the exponential distribution case with β = 1 can be considered. Figure 1 shows a graphical illustration of such a stabilization in the case that the hazard rate h_0(t) increases exponentially.

Remark 1 If the Kijima II model holds (with constant times between repairs Δ), it is always possible to stabilize the intensity by selecting an upper value H* and repairing whenever the cumulated hazard reaches that value. Then V_n = V = H_0^{−1}(H*), V*_n = δV_n again, and the interval between repairs should be Δ = V(1 − δ). On the contrary, if we can reduce just the last time increment (Kijima I model), for degrees δ_n and intervals Δ_n of repairs we get V_n = Σ_{k=1}^{n} δ_k Δ_k; in the constant-Δ case we have to decrease δ_k to 0 in order to keep V_n stabilized. Similarly, in the case of the accelerated model of repairs, there has to be a trade-off between the acceleration and the decrease of the inter-repair intervals.

2.1. An optimal selection of repair interval and degree

If we consider the stabilized case and, moreover, the failures are much less frequent than the preventive repairs, then there quite naturally arises the problem of the selection of δ for a given repair interval Δ (or of the optimal selection of both). By optimization we mean here the search for values yielding the minimal costs of repairs, which makes sense especially in the case when the repairs after failures are too expensive.
[Figure 1 shows, for the exponentially increasing hazard rate h_0(t) = 0.01·exp(0.5·t) with δ = 0.7, the oscillating hazard rate (top panel) and the cumulated hazard rate with and without repairs (bottom panel).]
Figure 1. Case of exponentially increasing h0 (t) = 0.01 ∗ exp(0.5 ∗ t), δ = 0.7, Δ = 1
Let C0 be the cost of failure (and its repair), C1 (δ, Δ) the cost of the preventive repair. Then the mean costs to a time t can be written as C ≈ C0 · E(N (t)) +
t · C1 (δ, Δ), Δ
where E(N (t)) is the mean number of failures up to t, which is actually H(t), H is the cumulated intensity of failures under our repairs sequence. Namely E(N (t)) ≈ Δt · dH. For instance, in the Weibull model with H0 (t) = α · expβ we already have seen that dH = αΔβ (1 − δ β )/((1 − δ)β ). The problem is the selection of function C1 , it should reflect the extent of repair. It leads us to the idea to evaluate the level of system damage, deterioration, and connect the repair with its reduction.
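With a concrete (hypothetical) choice of C_1, the selection of δ and Δ can be done by a simple grid search over the mean cost rate C/t ≈ (C_0 · dH + C_1(δ, Δ))/Δ. This is an illustrative sketch only; the cost function and all constants are assumptions of ours, not from the paper:

```python
def mean_cost_rate(delta, Delta, alpha=1.0, beta=2.0, C0=10.0, c1=2.0, c2=0.5):
    """Mean cost per unit time in the stabilized Weibull case.  The preventive
    repair cost C1(delta, Delta) = c1*(1 - delta) + c2 is a hypothetical choice."""
    dH = alpha * Delta**beta * (1 - delta**beta) / (1 - delta)**beta
    C1 = c1 * (1.0 - delta) + c2
    return (C0 * dH + C1) / Delta

# crude grid search over the repair degree delta and the interval Delta
best = min((mean_cost_rate(d / 100.0, D / 100.0), d / 100.0, D / 100.0)
           for d in range(1, 100) for D in range(5, 300, 5))
```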
3. Incomplete Repair Reducing the System Deterioration

Let us therefore consider a function S(t) (or a latent random process) evaluating the level of degradation after a time t of system usage. In certain cases we can imagine S(t) = ∫_0^t s(u) du, where s(u) ≥ 0 is the stress at time u. We further assume that the failure occurs when S(t) crosses a random level X. Recall also that (in the non-repaired system) the cumulated hazard rate H_0(t) of the random variable T = time to failure has a similar meaning, namely the failure occurs when H_0(t) crosses a random level given by an Exp(1) random variable. Hence, as T > t ⟺ X > S(t), i.e. F̄_0(t) = F̄_X(S(t)), where by F̄ we denote the survival function, we obtain H_0(t) = − log F̄_X(S(t)).
We can again consider some special cases, for instance:
• X ∼ Exp(1); then H_0(t) = S(t),
• S(t) = c · t^d, d ≥ 0, and X is Weibull(a, b); then T is also Weibull (α = a·c^b, β = b·d), i.e. H_0(t) = α · t^β.

Let us now imagine that the repair reduces S(t) as in the Kijima II model, to S*(t) = δ · S(t). In the Weibull case considered above we are able to connect such a change with the reduction of the virtual time from t to some t*:

S(t*) = S*(t) ⟹ t* = δ^{1/d} · t,

so that the virtual time reduction follows the Kijima II model, too, with δ_t = δ^{1/d}. As has been shown, each selection of δ, Δ leads (converges) to a stable ('constant' intensity) case. For other forms of the function S(t), e.g. if it is of exponential form, S(t) ∼ e^{ct} − 1, such a tendency to a constant intensity does not hold in general. Nevertheless, it is possible to select convenient δ and Δ, as noted in Remark 1.
4. Degradation as a Random Process

In the case where we cannot observe the function S(t) directly, and it is actually just a latent factor influencing the lifetime of the system, it can be modeled as a random process. There are several possibilities, for instance:
1. S(t) = Y · S_0(t), where Y > 0 is a random variable,
2. a diffusion with trend function S_0(t) and Brownian motion B(t), S(t) = S_0(t) + B(t),
3. S(t) cumulating a random walk s(t) ≥ 0,
4. a compound Poisson process (and its generalizations).

Though the last choice, sometimes connected also with the "random shock model", differs from the others because its trajectories are not continuous, we shall add several remarks mainly on this case. The compound point process is the following random sum

S(t) = Σ_{T_j ≤ t} Y(T_j) = ∫_0^t Y(u) dN(u),

with the counting process N(t) yielding the random times T_j, and random variables Y(t) > 0 giving the increments. It holds that

E S(t) = ∫_0^t λ(u) · μ(u) du, var(S(t)) = ∫_0^t λ(u) · (μ^2(u) + σ^2(u)) du,

where λ(u) is the intensity of N(t) and μ(u), σ^2(u) are the mean and the variance of Y(u). Again, it is assumed that the failure occurs when the process S(t) crosses a level x. Hence S(t) < x ⟺ t < T, therefore F̄_0(t) = F_{S(t)}(x), where F_{S(t)}(x) is the compound distribution at t. If X is a random level, then the right side has the form ∫_0^∞ F_{S(t)}(x) dF_X(x).
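Since the compound distribution F_{S(t)}(x) is rarely available in closed form, it can be estimated by random generation. A minimal Monte Carlo sketch (the exponential jump law and all parameters are illustrative choices of ours):

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def compound_value(rate, jump_mean, t, rng):
    """One realization of S(t): N(t) ~ Poisson(rate * t) jumps, each
    exponentially distributed with mean jump_mean."""
    n = sample_poisson(rate * t, rng)
    return sum(rng.expovariate(1.0 / jump_mean) for _ in range(n))

def survival_estimate(rate, jump_mean, t, x, reps, rng):
    """Monte Carlo estimate of Fbar_0(t) = P(S(t) < x)."""
    hits = sum(1 for _ in range(reps)
               if compound_value(rate, jump_mean, t, rng) < x)
    return hits / reps
```

Here E S(t) = rate · t · jump_mean and var S(t) = rate · t · 2 · jump_mean², matching the moment formulas above for constant λ, μ, σ.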
The evaluation of the compound distribution is not an easy task, not even in the simplest version, the compound Poisson process. Approximations exist (often derived in the framework of financial and insurance mathematics). Another way to evaluate it is by random generation.

4.1. Partial repairs and their optimization

What happens when, as in the preceding cases, repairs of degree (1 − δ) are applied to the system at regular time intervals Δ? It is assumed that when we decide to repair, we are able to observe the actual state of S(t). Random generation shows that the system then behaves similarly to the non-randomized case and has the tendency to stabilize the intensity. We can now return to the 'cost optimization' problem already described in Section 2.1, but there without specifying the function C1(δ, Δ). It can now be specified, for instance, as C1 · (dS(t))^γ + C2, where dS(t) = S(t)(1 − δ) = S(t_end) − S(t_init), and C1, C2 are constants, the latter evaluating a fixed cost of each repair. Hence the mean costs per time t can be expressed as

C0 · E(N(t)) + (t/Δ) · [C1 · {(1 − δ)S(t_rep)}^γ + C2].
Of course, a proper selection of the costs and of the function C1 in a real case is a matter of system knowledge and experience. We performed several randomly generated examples; in some cases it was possible to find a minimum w.r.t. δ and Δ for given other parameters, mostly a minimum over Δ for fixed δ, while the optimal δ for a selected Δ often lay close to the complete or minimal repair degree.
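To make the cost-optimization step concrete, the following sketch evaluates a long-run mean cost rate on a grid of (δ, Δ) for one simple, fully assumed configuration: a linear trend S0(t) = a·t, a multiplicative intensity h0·exp(S(t)) with failures minimally repaired, and the stabilized post-repair level s* = δaΔ/(1 − δ) to which repeated repairs of degree (1 − δ) converge (this stabilization mirrors the behavior noted above). All constants (a, h0, C0, C1, C2, γ) are illustrative, not taken from the paper.

```python
import numpy as np

def cost_rate(delta, Delta, a=0.2, h0=0.05, C0=10.0, C1=2.0, C2=1.0, gamma=1.0):
    """Long-run mean cost per unit time under repairs of degree (1 - delta)
    every Delta time units, for S0(t) = a*t and hazard h0*exp(S(t)).
    The post-repair degradation level stabilizes at s* = delta*a*Delta/(1-delta),
    the fixed point of s -> delta*(s + a*Delta)."""
    s_star = delta * a * Delta / (1.0 - delta)
    # expected failures per cycle: integral of h0*exp(s_star + a*s), s in [0, Delta]
    exp_failures = h0 * np.exp(s_star) * (np.exp(a * Delta) - 1.0) / a
    removed = (1.0 - delta) * (s_star + a * Delta)   # degradation removed by one repair
    return (C0 * exp_failures + C1 * removed**gamma + C2) / Delta

# grid search for the cheapest policy
deltas = np.linspace(0.05, 0.95, 19)
Deltas = np.linspace(0.5, 20.0, 40)
rates = np.array([[cost_rate(d, D) for D in Deltas] for d in deltas])
i, j = np.unravel_index(rates.argmin(), rates.shape)
best = (deltas[i], Deltas[j], rates[i, j])
```

The grid search stands in for the randomly generated examples mentioned above; in a stochastic version of the model, each grid cell would instead be estimated by simulation.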
5. Degradation Process as a Part of Intensity Model

When the degradation process is just one of the factors influencing the survival of the system, it is quite natural to use it as a covariate in a regression model of the failure intensity. In the situation when such a factor is not observed directly, it is more appropriate to use a model with a latent component. In either case, it makes sense to consider a failure intensity having several parts, one of them expressing the influence of the degradation process of interest. Moreover, if the additive form of the intensity model is used (as in the Aalen regression model for intensity), the components stay separated even when integrated to the cumulative intensity. Let us therefore recall several basic regression models for intensities of failures:
1. In the additive (Aalen's) model, the total intensity is the sum of the intensities of components, e.g. h(t) = h1(t) + h2(t).
2. In the multiplicative model h(t) = h1(t) · h2(t). Cox's model uses the form h(t) = h0(t) · exp(B·z(t)), where z(t) is the regressor, in our case some characteristic of the deterioration (cf. Bagdonavicius and Nikulin [1]).
3. The accelerated failure-time model H(t) = H0(V(t)) was already briefly recalled here, too, in connection with the model of growing virtual age proposed, for instance, in the paper of Finkelstein [2].
Models of Degradation and Partial Repairs - P. Volf
The regression schemes mentioned above offer different possibilities for modeling the impact of degradation and then of repairs. Let us demonstrate this on the multiplicative model. Namely, let the underlying hazard rate of a non-repaired system be h0(t) · exp(S(t)), where the function S(t) > 0 is non-decreasing and characterizes the degradation of a repairable component. Let us, for simplicity, consider just full repairs at regular time intervals Δ, and follow the system without failure. It starts at time 0; at times n·Δ its intensity of failure is h(n·Δ) = h0(n·Δ) exp(S(Δ)), which the repair reduces to h0(n·Δ) (S(t) is reset to 0). Thus we can speak here of a constant degree exp(−S(Δ)) of reduction of the intensity, but if h0 is increasing, the whole h(t) remains increasing with the same trend. In the time interval s ∈ ((n−1)Δ, nΔ) the intensity is then h(s) = h0(s) exp{S(s − (n−1)Δ)}. Consequently, this yields a case different from the accelerated scheme studied in Finkelstein [2]. The assumption of additive hazards leads to another set of models. It is also worth noting that the additive model corresponds, to a certain extent, to the case of a serial system. In a serial scheme of two independent parts the failure time of the system is T = min(T1, T2), i.e. F̄(t) = F̄1(t) · F̄2(t), so that H(t) = H1(t) + H2(t), too. Thus another natural source of models comes directly from the structure of the analyzed system, described e.g. with the aid of a fault tree. Once the fault tree is constructed, it is possible to consider different repair strategies (of components, subsystems) and, at least by random generation, to evaluate their costs. Such an approach is arguably the most valuable from the practical point of view, but even here a proper description of the consequences of partial repairs (e.g. by the models discussed above) is indispensable.
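The periodic reset of S(t) under full repairs can be visualized directly. The sketch below assumes, purely for illustration, a linearly increasing baseline h0(t) = c·t and linear degradation S(u) = b·u; the hazard drops by the constant factor exp(−S(Δ)) at each repair epoch while the overall trend keeps increasing, exactly as described above, and the survival function follows from the cumulative hazard.

```python
import numpy as np

def hazard(t, Delta=2.0, c=0.05, b=0.3):
    """Hazard of a system with baseline h0(t) = c*t, degradation S(u) = b*u,
    and full repairs (S reset to 0) every Delta time units."""
    since_repair = t % Delta          # elapsed time in the current repair cycle
    return c * t * np.exp(b * since_repair)

t = np.linspace(0.0, 10.0, 5001)
h = hazard(t)
# cumulative hazard by the trapezoid rule, then survival F_bar(t) = exp(-H(t))
H = np.concatenate([[0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(t))])
survival = np.exp(-H)
```

Plotting h against t would show the saw-tooth pattern: jumps down by the factor exp(−bΔ) at t = Δ, 2Δ, ..., superimposed on the increasing baseline c·t.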
6. Discussion

The objective of the paper was to propose several new models of (incomplete) repairs based on the process of system deterioration. There are many different real cases corresponding to different model forms. However, especially when the deterioration process is latent, its proper modeling and estimation is crucial for further assessment of the system's optimal performance and of the effect of repairs. Contemporary statistical techniques based on the Bayesian approach and random generation can be very helpful in such analysis and should become an indispensable tool in future work on modeling deterioration and repair schemes.
References
[1] V.B. Bagdonavicius and M.S. Nikulin, Generalized proportional hazards model: Estimation, Lifetime Data Analysis 3 (1999).
[2] M.S. Finkelstein, Modeling a Process of Non-Ideal Repairs. In: N. Limnios and M. Nikulin (Eds.), Recent Advances in Reliability Theory, Birkhäuser, Series Statistics for Industry and Technology (2000), 41–53.
[3] W. Kahle and C.E. Love, Modeling the Influence of Maintenance Actions. In: B.H. Lindquist and K.A. Doksum (Eds.), Mathematical and Statistical Methods in Reliability, World Scientific Publishing Co., Series on Quality, Reliability and Engineering Statistics (2003), 387–400.
[4] M. Kijima, Some results for repairable systems with general repair. Journal of Applied Probability 26 (1989), 89–102.
Sensitivity Analysis of a Reliability System Using Gaussian Processes

Alireza DANESHKHAH 1, Tim BEDFORD
Department of Management Science, University of Strathclyde

Abstract. The availability of a system under a failure/repair process is a function of time which can be calculated numerically. The sensitivity analysis of this quantity with respect to changes in the parameters is the main objective of this paper. In the simplest case, when the failure/repair process is (continuous-time, discrete-state) Markovian, explicit formulas are well known. Unfortunately, in more general cases this quantity can be a complicated function of the parameters, so that computation of the sensitivity measures may be infeasible or very time-consuming. In this paper, we present a Bayesian framework, originally introduced by Oakley and O'Hagan [7], which unifies the various tools of probabilistic sensitivity analysis. These tools are known as Bayesian Analysis of Computer Code Outputs (BACCO). In our case, we only need to quantify the availability measure at a few parameter values as inputs, and then use BACCO to obtain the interpolating function and the sensitivity to the parameters. The paper gives a brief introduction to BACCO methods and to the availability problem, illustrates the technique through an example, and makes a comparison to other methods available. Keywords. Availability, Bayesian analysis, computer models, emulator, Gaussian process, sensitivity analysis
Introduction

In this paper we present a new approach to the sensitivity analysis of quantities of interest in reliability analysis, such as availability/unavailability. Sensitivity analysis is concerned with understanding how changes in the model inputs or distribution parameters would influence the output(s) or inference. Suppose that our deterministic model can be written as y = f(x), where x is a vector of input variables (or parameters) and y is the model output. For example, the inputs can be the parameters of the failure and repair densities, and the output can be the availability A(t) at time t. The traditional method of examining the sensitivity of a model to changes in its input variables is local sensitivity analysis, which is based on derivatives of f(.) evaluated at x = x0 and indicates how the output y will change if the baseline input values are perturbed slightly (see [2] for the different local sensitivity measures commonly

1 Corresponding Author: Department of Management Science, University of Strathclyde, Graham Hills Building, 40 George Street, Glasgow, G1 1QE, Scotland; E-mail: [email protected].
Sensitivity Analysis of a Reliability System - A. Daneshkhah & T. Bedford
used in Bayesian analysis). This is clearly of limited value in understanding the consequences of real uncertainty about the inputs, which would in practice require more than infinitesimal changes in the inputs. Furthermore, these methods are computationally very expensive for complex models and usually require a considerable number of model runs if Monte Carlo based methods are used to compute the sensitivity measures. For instance, Marseguerra et al. [5] use Monte Carlo simulation to calculate first-order differential sensitivity indices of the basic events characterizing the reliability behavior of a nuclear safety system. They report that the computation of the sensitivity indices for the system unavailability at the mission time by Monte Carlo simulation requires 10^7 iterations. This issue is particularly interesting in the case where the model is so complex that simply computing the output for any given set of input values is a non-trivial task. Large process models in engineering, environmental science, reliability analysis, etc., are often implemented in complex computer codes that require many minutes, hours or even days for a single run; these are called expensive models. Unfortunately, the local sensitivity measures computed using Monte Carlo simulation methods, or the standard sensitivity techniques introduced by Saltelli et al. [10], require a very large number of model runs. Even for a model that takes just one second to run, most sensitivity analysis measures may demand millions of model runs, for which 11.5 days of continuous CPU time would be needed. We therefore need more efficient computational tools to implement sensitivity analysis. Oakley and O'Hagan [7] present a Bayesian approach to sensitivity analysis which unifies the various methods of probabilistic sensitivity analysis; it will be briefly introduced in Section 1.
This approach is computationally highly efficient and allows effective sensitivity analysis to be achieved using far smaller numbers of model runs than Monte Carlo methods require. The range of tools used in this approach also enables us to do uncertainty analysis, prediction, optimization and calibration. Section 2 is dedicated to presenting this method. In Section 3, we present two examples in which the sensitivity of the availability with respect to changes in the parameters of the failure and repair densities is examined by the new method. In the first example the failure and repair densities are exponential, used to validate the technique by comparison with known formulas; in the second they are Weibull.
1. Probabilistic Sensitivity Analysis

We briefly introduce the most well-known probabilistic sensitivity analysis approaches addressed in the literature (see [11]), focusing on main effects and interactions and on variance-based methods; see Oakley and O'Hagan [7] for the details of other sensitivity measures. We first introduce some notation. We suppose that x = {x1, ..., xd}, where xi is the ith element of x; the subvector (xi, xj) is denoted by xi,j, and in general if p is a set of indices then xp is the subvector of x whose elements have those indices. We also denote by x−i the subvector of x containing all elements but xi.
1.1. Main effects and interactions

This sensitivity approach is focused on the decomposition of the function f(.) into main effects and interactions:

y = f(x) = E(Y) + Σ_{i=1}^d z_i(x_i) + Σ_{i<j} z_{i,j}(x_{i,j}) + ... + z_{1,2,...,d}(x),

where z_i(x_i) = E(Y | x_i) − E(Y) is the main effect of x_i, z_{i,j}(x_{i,j}) = E(Y | x_{i,j}) − z_i(x_i) − z_j(x_j) − E(Y) is the first-order interaction between x_i and x_j, and the higher-order terms are defined analogously.

1.2. Variance-based methods

With V_i = var{E(Y | x_i)} and V_{Ti} = var(Y) − var{E(Y | x_{−i})}, the variance-based sensitivity indices are

S_i = V_i / var(Y),   S_{Ti} = V_{Ti} / var(Y) = 1 − S_{−i},

where S_i is the main effect index of x_i, and S_{Ti} is the total effect index of x_i. The details of other sensitivity analysis methods, such as variance decomposition and regression components, can be seen in Oakley and O'Hagan [7].
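For cheap functions these indices can be estimated by plain Monte Carlo with the usual pick-freeze estimators, which makes the later contrast with the emulator approach concrete. The sketch below is illustrative only: the toy function is an assumption chosen so the answers are known exactly; for f(x) = x1 + 2·x2 with independent U(0,1) inputs, S1 = ST1 = 0.2 and S2 = ST2 = 0.8.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # toy additive model: no interactions, so S_i = ST_i analytically
    return x[:, 0] + 2.0 * x[:, 1]

N, d = 100_000, 2
A = rng.random((N, d))          # two independent input samples
B = rng.random((N, d))
fA, fB = f(A), f(B)
V = np.var(np.concatenate([fA, fB]))   # total output variance

S, ST = [], []
for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]          # "freeze" column i from the second sample
    fABi = f(ABi)
    S.append(np.mean(fB * (fABi - fA)) / V)        # main-effect index S_i
    ST.append(0.5 * np.mean((fA - fABi) ** 2) / V)  # total-effect index ST_i
```

Note the cost: (d + 2)·N model evaluations, which is exactly what becomes prohibitive for expensive models and motivates the emulator-based approach of Section 2.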
2. Emulator-based Sensitivity Analysis

The Bayesian inference tools developed by Oakley and O'Hagan [7] for estimating the quantities of interest in sensitivity analysis of expensive functions are presented in this section. A key benefit of this approach is that its implementation is feasible with a much smaller number of model runs, and it can estimate many of the probabilistic sensitivity measures introduced in Section 1 from a single set of runs. The most important aspect of this Bayesian approach is that the model (function) f(.) is treated as an unknown function. In a pragmatic sense f(x) is unknown for any particular input configuration x until we actually run the model for those inputs. We then formulate a prior distribution for f(.). This prior is updated according to the usual Bayesian paradigm, using as data the outputs yi = f(xi), i = 1, ..., N, from a set of runs of the model. The result is a posterior distribution for f(.), which is used to make formal Bayesian inferences about the sensitivity measures introduced in the previous section.

2.1. Inference About Functions Using Gaussian Processes

We first develop a prior distribution for f(.) in the form of a Gaussian process prior and then derive the posterior distribution in the light of data. The key requirement for using a Gaussian process is that f(.) should be a smooth function: if we know the value of f(θ), we should then have some idea about the value of f(θ') for θ' close to θ. This basic assumption of a smooth, continuous function gives the Gaussian process major computational advantages over Monte Carlo methods, since the extra information that is available after each code run (evaluating the function at some value of θ) is ignored in the Monte Carlo simulation approach. Using a Gaussian process prior for f(.) guarantees that the uncertainty about {f(θ1), ..., f(θn)}, for any set of points {θ1, ..., θn}, can be represented through a multivariate normal distribution. The mean of f(θ) conditional on the hyperparameters β can be written as

E(f(θ) | β) = h(θ)^T β,

where h(.) is a vector of q known functions of θ and β is a vector of coefficients. The choice of h(.) is arbitrary, but it should be chosen to incorporate any beliefs that we might have about the form of f(.). The covariance between f(θ) and f(θ') is given by

cov(f(θ), f(θ') | σ²) = σ² c(θ, θ'),

where c(., .) is a monotone correlation function with c(θ, θ) = 1 that decreases as |θ − θ'| increases. Furthermore, the function c(., .) must ensure that the covariance matrix of any set of outputs {f(θi)}_{i=1}^n is positive semidefinite. Throughout this paper, we use the following correlation function, which satisfies all the conditions mentioned above and is widely used in BACCO for its computational convenience:

c(θ, θ' | b) = exp{−b(θ − θ')²},

where b is a smoothness parameter.
We consider proper prior distributions for the hyperparameters (β, σ²) of the form

p(β, σ²) ∝ (σ²)^{−(d+q+2)/2} exp{−[(β − z)^T V^{−1}(β − z) + a] / (2σ²)}.

The output of f(.) is evaluated at n design points θ1, ..., θn to generate the vector y = (f(θ1), ..., f(θn))^T. It should be noticed that these points, in contrast with Monte Carlo methods, are not chosen randomly but are selected to give good information about f(.). They will usually be spread to cover Θ, the parameter space of θ, although their choice will also depend on the distribution G of the uncertain inputs (see [9] for further details on choosing design points). The standardized posterior distribution of f(.) is given by

(f(θ) − m*(θ)) / (σ̂ √(c*(θ, θ))) | y ~ t_{d+n},

where

m*(θ) = h(θ)^T β̂ + t(θ)^T A^{−1}(y − H β̂),

c*(θ, θ') = c(θ, θ') − t(θ)^T A^{−1} t(θ') + (h(θ)^T − t(θ)^T A^{−1} H)(H^T A^{−1} H)^{−1}(h(θ')^T − t(θ')^T A^{−1} H)^T,   (1)

t(θ)^T = (c(θ, θ1), ..., c(θ, θn)),   H^T = (h(θ1)^T, ..., h(θn)^T),

A is the n × n correlation matrix with unit diagonal and off-diagonal entries c(θi, θj), and

β̂ = V*(V^{−1} z + H^T A^{−1} y),
σ̂² = [a + z^T V^{−1} z + y^T A^{−1} y − β̂^T (V*)^{−1} β̂] / (n + d − 2),
V* = (V^{−1} + H^T A^{−1} H)^{−1}.

The outputs corresponding to any set of inputs now have a multivariate t-distribution, with the covariance between any two outputs given by Equation (1). Note that the t-distribution arises as the marginal distribution of f(.) after integrating out the hyperparameters β and σ². In practice, further hyperparameters will be associated with the modeling of the correlation function c(., .), and it is generally impossible to integrate the posterior distribution analytically with respect to these further parameters. Although numerical integration is possible, in particular using Markov chain Monte Carlo (MCMC) sampling, which is computationally highly intensive, it is suggested that it is adequate simply to estimate the hyperparameters of c(., .) from the posterior distribution and then to substitute these estimates into c(., .) wherever it appears in the above formulas (see [4]).
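A minimal numerical check of these formulas is easy to set up. The sketch below is an assumption-laden simplification: it uses the weak-prior (generalized least squares) limit β̂ = (H^T A^{−1} H)^{−1} H^T A^{−1} y rather than the full conjugate prior with (z, V, a), a fixed smoothness b, and a toy sine function standing in for the expensive code. The key property it verifies is that the posterior mean m*(θ) interpolates the design points exactly.

```python
import numpy as np

def corr(x, y, b=5.0):
    """Squared-exponential correlation c(x, y | b) = exp(-b (x - y)^2)."""
    return np.exp(-b * (x - y) ** 2)

# design points and "model runs" (toy stand-in for the expensive code)
theta = np.linspace(0.0, 1.0, 6)
y = np.sin(2 * np.pi * theta) + theta

H = np.column_stack([np.ones_like(theta), theta])   # h(t) = (1, t)^T
A = corr(theta[:, None], theta[None, :])            # correlation matrix of the design
Ainv = np.linalg.inv(A)
# GLS estimate of beta (weak-prior limit of the formulas above)
beta = np.linalg.solve(H.T @ Ainv @ H, H.T @ Ainv @ y)

def m_star(t):
    """Posterior mean of the emulator at input t."""
    tvec = corr(t, theta)                           # t(theta)^T
    return np.array([1.0, t]) @ beta + tvec @ Ainv @ (y - H @ beta)
```

Because t(θi)^T A^{−1} picks out the ith unit vector when θi is a design point, m*(θi) returns exactly the observed run output yi; between design points the emulator smoothly interpolates.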
2.2. Inference for Main Effects and Interactions

In this section, we explain how the sensitivity measures introduced in Section 1 can be estimated from the Gaussian process posterior distribution derived in Subsection 2.1. Our Bayesian approach, like Monte Carlo methods, estimates such measures by formal statistical methods, and these estimates can be accompanied by standard errors or standard deviations to indicate their accuracy. Monte Carlo methods applied to very cheap functions typically employ many thousands of model runs, so that the estimation error is very small. When we consider expensive functions, it will rarely be feasible to do enough runs for the estimation error to be negligible. A key benefit of the methods developed here and in Oakley and O'Hagan [7] is that the standard deviations associated with these estimates are generally very much smaller, often by orders of magnitude, than those obtained from a Monte Carlo method with the same number of model runs. It is this that allows us to achieve useful sensitivity analysis of complex expensive models without having to make prohibitively many runs.

First consider inference about

E(Y | xp) = ∫_{X−p} f(x) dG_{−p|p}(x_{−p} | x_p),

using the notation (introduced earlier) for the space X−p of possible values of x−p and for its conditional distribution given xp, where the uncertainty about the elements of the input X is described by some probability distribution² G. Since this expectation E(Y | xp) is a linear functional of the Gaussian process f(.), its posterior distribution is again t_{d+n}. The posterior mean of this measure is given by

E_post{E(Y | xp)} = R_p(x_p) β̂ + T_p(x_p) e,

where

R_p(x_p) = ∫_{X−p} h(x)^T dG_{−p|p}(x_{−p} | x_p),
T_p(x_p) = ∫_{X−p} t(x)^T dG_{−p|p}(x_{−p} | x_p),

and e = A^{−1}(y − H β̂); R and T (without arguments) denote the corresponding unconditional integrals with respect to G. We are now able to calculate the main effects and interactions as

E_post{z_i(x_i)} = {R_i(x_i) − R} β̂ + {T_i(x_i) − T} e,
E_post{z_{i,j}(x_{i,j})} = {R_{i,j}(x_{i,j}) − R_i(x_i) − R_j(x_j) + R} β̂ + {T_{i,j}(x_{i,j}) − T_i(x_i) − T_j(x_j) + T} e.

Similarly, we can derive the standard deviations of the main effects and interactions; see [7] for the details of this computation.

² Note that the definitions of the main effects and interactions depend on the distribution G of the uncertain input.
We can plot the posterior mean of the main effect, E_post{z_i(x_i)}, against x_i, with bounds of, for example, plus and minus two posterior standard deviations. We can draw E_post{z_i(x_i)} for i = 1, ..., d on a single plot if each input variable is standardized. This plot gives us a good graphical summary of the influence of each variable. We present such plots for the examples given in Section 3.

2.3. Inference for Variances

We now consider posterior inference about the variance-based measures V_i and V_{Ti} introduced in Subsection 1.2. Since these measures are quadratic functionals of f(.), their posterior distributions cannot be obtained analytically. Oakley and O'Hagan [7] proposed a way to calculate the posterior mean of V_p = var{E(Y | X_p)} for any subvector x_p. They first decompose var{E(Y | X_p)} as

var{E(Y | X_p)} = E{E(Y | X_p)²} − E(Y)²

and then calculate the posterior mean of the terms on the right-hand side. We skip the details of these computations and refer the reader to [7]. Note that the posterior means of the main effect variance V_i and the complementary (total) effect variance V_{Ti} (given in Subsection 1.2) can be obtained from the derivation of E_post{var(E(Y | X_p))}.
3. Applications in Reliability Analysis

We exhibit the methodology discussed in this paper, also known as BACCO (Bayesian Analysis of Computer Code Output), with two examples in which we examine the sensitivity of quantities of interest in reliability analysis with respect to changes in the relevant parameters. We are particularly interested in the sensitivity of the availability function with respect to uncertainty in the parameters of the failure and repair distributions. In the first example these distributions are assumed to be exponential, while in the second they are Weibull. The figures and results shown in this section have been obtained using the software GEM-SA. This program generates a statistical emulator of a computer code. The training data required to build the emulator are an arbitrary set of inputs to the code and the corresponding outputs from the code. We use main effects and joint effects to aid model interpretation and model checking as part of a sensitivity analysis, when direct evaluation of the model output is too costly. There are no restrictions on the values of the inputs selected to build the emulator, but a good emulator (one presenting small uncertainties from relatively small numbers of runs) usually requires a well-spaced set of inputs covering the range over which emulation is required. The software can generate an efficient set of data points or inputs based on maximin Latin hypercube or LP-tau designs. The smoothness hyperparameters of the model output can also be estimated using this software. We now introduce our system and the measures required for its reliability analysis. The system that we consider throughout this paper is a simple one-component repairable system. We also assume that, at any given time, a component is either functioning normally or failed, and that the component state changes as time
evolves. Possible transitions of state are shown in Figure 1 for this system.

Figure 1. Transition diagram of component states (normal state ↔ failed state).

A component jumps into the normal state and stays there for some time, then fails and experiences a transition to the failed state. A repairable component stays in the failed state for a period, then its state changes to the normal state when the repair is completed. It is also assumed that at most one transition occurs in a sufficiently small time interval and that the possibility of two or more transitions is negligible. The transition to the normal state is called repair, whereas the transition to the failed state is called failure. We further assume that repairs restore the component to a condition as good as new. Thus, the process consists of alternating repetitions of the repair-to-failure and the failure-to-repair processes. The availability at time t, denoted by A(t), is the probability that the component is functioning at time t. The unavailability at time t, denoted by Q(t), is the probability that the component is in the failed state at time t. Because a component is either in the functioning state or in the failed state at time t, the unavailability can be derived from the availability and vice versa from

A(t) + Q(t) = 1, for all t.
We assume that the time-to-failure density of the component is f(.) and that the time-to-repair density is g(.). The unconditional failure intensity w(t) is the probability that the component fails per unit time at t; in other words, w(t)dt is the probability that the component fails during [t, t + dt), given that the component was as good as new at t = 0. W(t, t + dt) is the expected number of failures during [t, t + dt) and is given by W(t, t + dt) = w(t)dt. The expected number of failures during [t1, t2) is calculated by

W(t1, t2) = ∫_{t1}^{t2} w(t) dt.
The unavailability Q(t) is then calculated by Q(t) = W(0, t) − V(0, t), where V(0, t) is the analogously defined expected number of repairs during [0, t).
We are now able to calculate the combined-process probabilistic parameters. The components that fail during [t, t + dt) are classified into two types:

Type 1. A component that has been normal to time t and fails during [t, t + dt), given that it was as good as new at time zero.

Type 2. A component that was repaired during [u, u + du), u < t, has been normal to time t, and fails during [t, t + dt), given that the component was as good as new at time zero.

The unconditional failure intensity is then calculated by

w(t) = f(t) + ∫_0^t f(t − u) v(u) du,   (2)

where f(t − u)dt is the probability that the component has been normal to time t and failed during [t, t + dt), given that it was as good as new at time zero and was repaired at time u. The components that are repaired during [t, t + dt) are of the following type:

Type 3. A component that failed during [u, u + du), has remained failed until time t, and is repaired during [t, t + dt), given that the component was as good as new at time zero. The probability for this type of component is w(u)du · g(t − u)dt, so we have

v(t) = ∫_0^t g(t − u) w(u) du.   (3)

The unconditional failure intensity w(t) and the repair intensity v(t) are calculated by an iterative numerical integration of Equations (2) and (3) when the densities f(t) and g(t) are given. If a rigorous, analytical solution is required, Laplace transforms can be used.

Example 1. To validate the sensitivity approaches presented in this paper, we first apply these methods to the well-known example where f(.) and g(.) both follow exponential distributions with constant failure and repair rates λ and μ, respectively. The availability A(t) is then given in closed form as

A(t) = μ/(μ + λ) + (λ/(μ + λ)) · exp(−(μ + λ)t).
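Equations (2) and (3) can be solved on a grid by a simple left-Riemann iteration, and in the exponential case the result can be checked against the closed form A(t) = μ/(μ + λ) + (λ/(μ + λ))·exp(−(μ + λ)t). The sketch below is illustrative; the rates λ = 0.1, μ = 0.5 and the step size are arbitrary choices, not values from the paper.

```python
import numpy as np

lam, mu = 0.1, 0.5                 # illustrative failure and repair rates
dt, T = 0.002, 10.0
n = int(T / dt) + 1
t = np.arange(n) * dt
f = lam * np.exp(-lam * t)         # time-to-failure density
g = mu * np.exp(-mu * t)           # time-to-repair density

# iterative numerical integration of Eqs. (2) and (3)
w = np.zeros(n)                    # unconditional failure intensity
v = np.zeros(n)                    # repair intensity
w[0] = f[0]
for k in range(1, n):
    w[k] = f[k] + dt * np.dot(f[k:0:-1], v[:k])   # Eq. (2), left-Riemann rule
    v[k] = dt * np.dot(g[k:0:-1], w[:k])          # Eq. (3)

Q = np.cumsum(w - v) * dt          # Q(t) = W(0, t) - V(0, t)
A_num = 1.0 - Q
A_closed = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * t)
max_err = np.max(np.abs(A_num - A_closed))
```

The first-order quadrature makes the error O(dt); with dt = 0.002 the numerical availability agrees with the closed form to well within 1% over the whole interval, and A_num approaches the steady-state value μ/(λ + μ) for large t.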
To examine the sensitivity of A(t) with respect to the uncertainties in λ and μ, we use the GEM-SA software to calculate the main effects, their plots, and the other sensitivity measures described above. Plots of the main effects (Figure 2) provide a cheap and effective tool for examining to which of its parameters A(t) is significantly sensitive, and the nature of the possible relationships between A(t) and λ and μ. These plots also suggest that A(t) is a decreasing function of λ and an increasing function of μ. The percentage variance contribution of each parameter's main effect can also be calculated whenever main effects are plotted. The basic idea is as follows: uncertainty in the parameters produces uncertainty in the availability, which can be measured by the
Figure 2. Estimated main effects for the availability function parameters (λ, top panel; μ, bottom panel; 10 runs, t = 10). Solid lines represent estimates of the posterior expectation of A(t = 10) with respect to the unknown parameter distribution. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 10 data points to estimate the main effects.
variance of the availability. How much would this variance be reduced if we were to learn the true value of a parameter? If a parameter contributes a high percentage to the total variance of the availability, then it is this parameter about which we should try to learn more in order to reduce the total uncertainty about the availability. For each parameter, its expected proportional contribution to the variance of the availability function is computed, and this provides a simple means of ranking the parameters. The results are given in the following table:

Parameter    Variance (%)    Total effect (%)
λ            91.99           92.22
μ            7.75            7.78
Total        99.74
According to these results, we can conclude that A(t) is more sensitive to λ at the given time, t = 10. Figures 3 to 6 show the plots of the main effects at different times, using 25 training data points. As time increases from t = 0.01 to t = 1000 (from this point A(t ≥ 1000) = A(∞)), the availability function becomes more sensitive to μ: the percentage variance contribution of λ's main effect decreases from 98.84% to 14.51%, while that of μ increases from 0.1% to 84.71%.

Example 2. When the densities f(t) = Wei(σf, βf) and g(t) = Wei(σr, βr) in (2) and (3) follow Weibull distributions, the availability A(t) does not have a closed form, and we need to estimate this measure for different values of the parameters by numerical methods. We first used a simple method introduced by Xie [12], called the RS-method, for solving renewal-type integral equations based on direct numerical Riemann-Stieltjes integration. We adopted this method to compute w(t) and v(t) in Equations (2) and (3). The results obtained from the generalization of Xie's approach do not match reality. The
Figure 3. Estimated main effects for the availability function parameters. Solid lines represent estimates of the posterior expectation of A(t = 0.01) with respect to the unknown parameter distribution. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.

Figure 4. Estimated main effects for the availability function parameters. Solid lines represent estimates of the posterior expectation of A(t = 10) with respect to the unknown parameter distribution. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
availability values for t ≥ 5 become less than A(∞), which shows that the method does not work for large values of t. We therefore calculate A(t) by a further, simpler numerical method which produces reasonably accurate results. In this method, we first generate 25 data points from the plausible ranges of βf, σr and βr. These points are generated by a maximin Latin hypercube design in GEM-SA. We then simulate w(t) and v(t) and calculate A(t) at these data points, which serve as the data for updating the Gaussian process prior distribution described in Section 2 and calculating the sensitivity measures. We let σf = 1, and the plausible ranges of values of βf, σr and βr are [1.58, 2.58], [0.05, 0.15] and [0.45, 0.55], respectively.
Figure 5. Estimated main effects for the availability function parameters. Solid lines represent estimates of the posterior expectation of A(t = 100) with respect to the unknown parameter distribution. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
[Figure 6 plot: main effects for t = 1000; panels for λ and μ.]
Figure 6. Estimated main effects for the availability function parameters. Solid lines represent estimates of the posterior expectation of A(t = 1000) with respect to the unknown parameter distribution. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
Figures 7 to 12 show the plots of the main effects at different time points, from t = 5 to t = ∞. As time increases from t = 5 to t = 500 (beyond this point A(t ≥ 500) = A(∞)), A(t) is most sensitive to σr, whose percentage variance contribution to the main effect varies between 87% and 91%. On average, βr contributes about 10% of the variance to the main effect. Finally, the percentage variance contribution of βf is less than 1%.
Repairable systems modeling
[Figure 7 plot: main effects for t = 5; panels for βf, σr and βr.]
Figure 7. Estimated main effects for the availability function when Weibull distributions have been used for TTF and TTR. Solid lines represent estimates of the posterior expectation of A(t = 5) with respect to βf, σr and βr. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
[Figure 8 plot: main effects for t = 15; panels for βf, σr and βr.]
Figure 8. Estimated main effects for the availability function when Weibull distributions have been used for TTF and TTR. Solid lines represent estimates of the posterior expectation of A(t = 15) with respect to βf, σr and βr. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
4. Discussion

An alternative sensitivity analysis of quantities of interest in reliability analysis, such as the availability/unavailability function, with respect to changes in uncertain parameters has been presented. The method was originally introduced by Oakley and O'Hagan [7] to examine the sensitivity of a complex model to changes in its inputs, based on an emulator built to approximate the model. It enables us to decompose the output variance into components representing main effects and interactions. The approach allows effective sensitivity analysis to be achieved with a far smaller number of model runs than standard Monte Carlo methods. For example, Marseguerra, Zio and Podofillini [5] employ a Monte Carlo method with 10^7 runs to compute the first-order differential sensitivity indices of the basic events characterizing
[Figure 9 plot: main effects for t = 30; panels for βf, σr and βr.]
Figure 9. Estimated main effects for the availability function when Weibull distributions have been used for TTF and TTR. Solid lines represent estimates of the posterior expectation of A(t = 30) with respect to βf, σr and βr. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.

[Figure 10 plot: main effects for t = 100; panels for βf, σr and βr.]
Figure 10. Estimated main effects for the availability function when Weibull distributions have been used for TTF and TTR. Solid lines represent estimates of the posterior expectation of A(t = 100) with respect to βf, σr and βr. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
the reliability behavior of the transport system, while our approach might require a few hundred runs to achieve the same accuracy, but with broader sensitivity analysis measures. As another comparison between the method presented in this paper and other sensitivity analysis methods, Oakley and O'Hagan [7] present a synthetic example with 15 inputs. Using only 250 simulator runs to build the emulator, they obtain the comprehensive set of sensitivity analysis measures introduced above, while the FAST Monte Carlo method requires 15,360 simulator runs to compute just the variance-based sensitivity indices and total sensitivity indices with the same accuracy.
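The variance-based measure being compared here can also be written down directly. The sketch below is our own illustration on a cheap toy linear function (not the availability model and not the FAST method): the standard pick-freeze Monte Carlo estimator of a first-order index, which is the quantity the emulator approach approximates with far fewer simulator runs.

```python
import numpy as np

def first_order_index(f, d, i, n=200_000, seed=0):
    """Pick-freeze Monte Carlo estimate of the first-order Sobol index
    S_i = Var(E[Y | X_i]) / Var(Y), for independent U(0,1) inputs."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, d))
    b = rng.random((n, d))
    ab = b.copy()
    ab[:, i] = a[:, i]               # freeze coordinate i, resample the rest
    ya, yab = f(a), f(ab)
    return np.cov(ya, yab)[0, 1] / ya.var()

# toy "simulator": Y = 3*X0 + X1, so analytically S_0 = 9/10 and S_1 = 1/10
f = lambda x: 3.0 * x[:, 0] + x[:, 1]
s0 = first_order_index(f, 2, 0)
s1 = first_order_index(f, 2, 1)
```

For a model that is expensive to run, the same estimator would be applied to the emulator's cheap predictions rather than to the simulator itself.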
[Figure 11 plot: main effects for t = 500; panels for βf, σr and βr.]
Figure 11. Estimated main effects for the availability function when Weibull distributions have been used for TTF and TTR. Solid lines represent estimates of the posterior expectation of A(t = 500) with respect to βf, σr and βr. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
[Figure 12 plot: main effects for t = ∞; panels for βf, σr and βr.]
Figure 12. Estimated main effects for the availability function when Weibull distributions have been used for TTF and TTR. Solid lines represent estimates of the posterior expectation of A(t = ∞) with respect to βf, σr and βr. Dotted lines show 95% pointwise probability intervals for these estimates with respect to the Gaussian process distribution. We use 25 data points to estimate the main effects.
It would be desirable to apply this method to study the sensitivity of quantities of interest in reliability analysis for systems with more than one component. We are adopting an emulator-based sensitivity analysis approach for the reliability and uncertainty analysis of a groundwater contaminant transport and remediation system. The sensitivity analysis measures are calculated at a given time, while the functions/quantities of interest in reliability analysis are dynamic, in the sense that they describe the evolving behavior of the system over time. The method, however, is less straightforward for dynamic simulators designed to represent time-evolving systems/models. Conti and O'Hagan [1] suggested the following procedures to emulate a dynamic simulator:

1. The first method uses the multi-output emulator developed by Conti and O'Hagan [1], where the dimension of the output space is q = T and the outputs of the dynamic model are represented by y = (y1, . . . , yT).

2. The second approach requires just one single-output emulator, following an idea originally mentioned by Kennedy and O'Hagan [4] for analysing spatial outputs. Time is treated as an extra input to the model, and the output yt = ft(x) can now be represented as f*(x, t), where t = 1, . . . , T. Emulation of f(.) can therefore be obtained by emulating f*(., .). The training set for constructing this emulator consists of nT outputs generated by the inputs in the grid X × {1, . . . , T}, where X = {x1, . . . , xn} denotes the pre-selected design set at which the computer code is run.

3. The third approach is to emulate the T outputs separately, each via a single-output emulator. Data for the t-th emulator are then provided by the corresponding column of the data set.

We are adopting the first method, which is the simplest from a computational perspective, to examine the sensitivity of A(t) (and other quantities of interest in reliability analysis) with respect to changes in the parameters of the failure and repair densities as the availability evolves over time. It can be concluded that the multi-output emulator reported in [1] can form the basis for the emulation of dynamic simulators, which in practice can be considered a major breakthrough for dealing with issues such as uncertainty analysis, sensitivity analysis (in particular, sensitivity analysis of quantities of interest in reliability analysis which evolve over time) and calibration of dynamic models. However, there are some directions for further research related to these procedures.
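The second (time-as-input) approach changes only how the training data are laid out. A minimal sketch of that layout follows; the toy simulator `toy` is a placeholder of our own, not any model from the text.

```python
import numpy as np

def time_input_training_set(X, T, simulator):
    """Training data for one single-output emulator of f*(x, t) = f_t(x):
    pair every design point x in X with every time index t = 1..T,
    giving n*T input rows on the grid X x {1, ..., T}."""
    Z = np.array([np.append(x, t) for x in X for t in range(1, T + 1)])
    y = np.array([simulator(row[:-1], int(row[-1])) for row in Z])
    return Z, y

X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])   # n = 3 design points
toy = lambda x, t: x.sum() * t                        # placeholder simulator
Z, y = time_input_training_set(X, 4, toy)             # 3*4 = 12 training pairs
```

A single Gaussian process fitted to (Z, y) then emulates the whole time course, at the cost of treating time like any other smooth input.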
The multi-output emulator's improved performance relative to the time-input emulator lies in its more flexible modeling of correlations over time, which is also a source of extra computational load through the need for larger training designs. A more restrictive structure, in which the covariance matrix of the Gaussian process is constrained to follow a standard time series form, would allow good emulation with smaller training samples. Furthermore, using an appropriate time-series structure to set up a more flexible correlation function for the time-input emulator could make the corresponding multi-output emulator easier to use, and this seems worth investigating. The flexibility of the multi-output emulator as the basis for emulation of dynamic models can be improved by relaxing the assumption that the smoothness parameters are common to all outputs; the generality of the many-single-output-emulators approach in this regard is its principal benefit. Conti and O'Hagan [1] reported that combining different smoothness parameters for each input with some covariance structure on the output space, in such a way as to create a valid positive-definite correlation function, seems to be very difficult. The sensitivity methods developed in this paper, based on emulators, can be extended to study the sensitivity of the most complex models, where a single model run takes a considerable amount of time. We are interested in studying the sensitivity of the availability and other important reliability quantities for more complex models consisting of many components.
One way to simplify these complex models is to decompose them into simpler models. In other words, the complex model can be considered as a coupling of sub-models. The task of sensitivity analysis of the complex model is then reduced to the sensitivity analysis of the sub-models, which has been addressed in this paper. The primary foci of the sensitivity analysis for the coupled models are:

1. To examine and understand the sub-models that are used to build the complex model and how they are coupled together.

2. To develop suitable emulators for the sub-models, to be used as the basis for the sensitivity analysis introduced in this paper.

3. To present a sensitivity measure for the complex model in terms of the sensitivity measures calculated for each sub-model.
Acknowledgments This research was supported by the Engineering and Physical Sciences Research Council. The authors also wish to acknowledge discussions with Dr. John Quigley.
References

[1] S. Conti, A. O'Hagan, Bayesian emulation of complex dynamic computer models, Technical Report No. 569/07, Department of Probability and Statistics, University of Sheffield; submitted to Journal of Statistical Planning and Inference, 2007.
[2] P. Gustafson, The Local Sensitivity of Posterior Expectations, unpublished PhD thesis, Department of Statistics, Carnegie Mellon University, 1994.
[3] T. Homma, A. Saltelli, Importance measures in global sensitivity analysis of model output, Reliability Engineering & System Safety 52 (1996), 1-17.
[4] M.C. Kennedy, A. O'Hagan, Bayesian calibration of computer models (with discussion), J. R. Statist. Soc. B 63 (2001), 425-464.
[5] M. Marseguerra, E. Zio, L. Podofillini, First-order differential sensitivity analysis of a nuclear safety system by Monte Carlo simulation, Reliability Engineering & System Safety 90 (2005), 162-168.
[6] R.M. Neal, Annealed importance sampling, Statistics and Computing 11 (2001), 125-139.
[7] J.E. Oakley, A. O'Hagan, Probabilistic sensitivity analysis of complex models: a Bayesian approach, J. R. Statist. Soc. B 66 (2004), 751-769.
[8] A. O'Hagan, M.C. Kennedy, J.E. Oakley, Uncertainty analysis and other inference tools for complex computer codes (with discussion), in: J.M. Bernardo, J.O. Berger, A.P. Dawid, A.F.M. Smith (Eds.), Bayesian Statistics 6, Oxford University Press, Oxford, UK, 1999, 503-524.
[9] J. Sacks, W.J. Welch, T.J. Mitchell, H.P. Wynn, Design and analysis of computer experiments, Statistical Science 4 (1989), 409-435.
[10] A. Saltelli, K. Chan, M. Scott (Eds.), Sensitivity Analysis, Wiley, New York, 2000.
[11] A. Saltelli, S. Tarantola, F. Campolongo, Sensitivity analysis as an ingredient of modeling, Statistical Science 15 (2000), 377-395.
[12] M. Xie, On the solution of renewal-type integral equations, Communications in Statistics B: Computation & Simulation 18 (1989), 291-293.
On Independence of Competing Risks

Isha Dewan 1, Indian Statistical Institute, New Delhi, India

Abstract. Consider a competing risks set-up with two risks and latent failure times X and Y. Statistical analysis of this model has been carried out under three different independence assumptions: independence of X and Y; independence of T = min(X, Y) and δ = I(X < Y); and independence between X and δ. We discuss examples where these forms of independence arise and also the relationships between the three.

Keywords. Competing risks, subdistribution function, cause-specific hazard rate, identifiability.
Introduction

The competing risks situation was first considered in the eighteenth century, when small pox vaccination was being discovered and popularized. [1] was interested in the change in mortality structure if the risk of death due to small pox were totally eliminated, the other risks persisting as before. This question is relevant even today, with small pox replaced by diabetes, tuberculosis, AIDS, etc. Such data also arise from series systems, in unemployment and educational studies, and even in the game of cricket [9]. Let X1, X2, . . . , Xk denote the latent failure times of individuals subject to k risks, where Xi represents the age at death if cause i were the only cause of failure. What is actually observed is the time to failure T, where T = min(X1, X2, . . . , Xk), and the cause of failure δ, where δ = j if Xj = min(X1, X2, . . . , Xk). In reliability engineering T is the lifelength of a series system and δ identifies the component which led to system failure. In survival studies T is the time of failure or the censoring time, depending on whether δ is 1 (the individual dies) or 0 (the failure time is censored). The assumption of 'independence' has been crucial in the statistical analysis of competing risks data. During the early years of development of the subject the latent failure times were assumed to be independent, and all estimation and testing procedures had this as an underlying assumption. In Section 1 we assume that the latent failure times are independent and restate some of the results for independent latent failure times for the sake of completeness. However, the inherent problems in this set-up led to studying the model via the joint distribution of (T, δ). In Section 2 we characterize the independence of T and δ in terms of functions which are identifiable and estimable from competing risks data. In Section 3 we discuss the situation where independence between X and δ arises.
In Section 4 we discuss the implications between these three types of independence, and we conclude in Section 5 with some examples.

1 Corresponding Author: Isha Dewan, Indian Statistical Institute, 7, SJS Sansanwal Marg, New Delhi, India; E-mail: [email protected]
Competing Risks
1. Independent Latent Failures

Suppose that the latent failure times X1, X2, . . . , Xk are independent and identically distributed random variables with a common distribution function H(x); then H is uniquely determined by the distribution function of the minimum, that is, P[T ≤ x] = 1 − [1 − H(x)]^k. If the Xi's are independent but not identically distributed, [6] showed that the joint distribution of (T, δ) uniquely determines Fi, the distribution function of Xi:

Fi(x) = 1 − exp{ −∫_{−∞}^{x} [1 − Σ_{j=1}^{k} F(j, t)]^{−1} dF(i, t) },  i = 1, 2, . . . , k,

where F(j, t) = P[δ = j, T ≤ t]. Thus the marginal, and hence the joint, distribution of the latent failure times is identifiable from the probability distribution of the observable random variables (T, δ). However, [9,16] showed that, in general, when the risks are not independent, neither the joint distribution of the Xi's nor their marginals are identifiable from the probability distribution of (T, δ). They proved that a unique independent and infinitely many dependent probability distributions of (X1, X2, . . . , Xk) correspond to a single probability distribution of (T, δ). Because of this identifiability problem it is not possible to test for independence of the latent failures on the basis of (T, δ) data. Apart from theoretical problems, the assumption of independence may not be realistic in most practical situations. The marginal distribution functions Fi(x) may not represent the probability distribution of lifetimes in any real-life situation. Elimination of the j-th risk may change the environment in such a way that Fi(x) does not represent the lifetime of Xi in the changed scenario. In medical applications there may be strong interactions between different diseases because of the complex underlying physical conditions. Similarly, in the physical sciences system components need not be independent when they share a common load or work under similar environments.
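Berman's identity can be checked numerically. For two independent exponential risks the subdistribution functions are available in closed form, and plugging their derivatives into the formula recovers the exponential marginal exactly. The sketch below uses our own illustrative rates (the exponential example is ours, not from the text).

```python
import numpy as np

lam = np.array([0.5, 1.5])        # rates of two independent exponential risks
Lam = lam.sum()

def subdensity(i, t):
    # f(i, t) = dF(i, t)/dt = lam_i * exp(-Lam * t) for these latent variables
    return lam[i] * np.exp(-Lam * t)

def berman_marginal(i, x, m=20_001):
    """F_i(x) = 1 - exp(-int_0^x [1 - sum_j F(j,t)]^(-1) dF(i,t));
    here 1 - sum_j F(j,t) = S(t) = exp(-Lam * t)."""
    t = np.linspace(0.0, x, m)
    integrand = subdensity(i, t) / np.exp(-Lam * t)
    steps = (integrand[1:] + integrand[:-1]) * np.diff(t) / 2.0  # trapezoid rule
    return 1.0 - np.exp(-steps.sum())

est = berman_marginal(0, 2.0)     # should recover 1 - exp(-lam_0 * 2)
```

Here the integrand collapses to the constant lam_0, so the quadrature reproduces the marginal F_0(x) = 1 − exp(−0.5 x) essentially exactly.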
2. T and δ Independence

Because of the identifiability problem it is necessary to suggest appropriate models, develop methodology and carry out the data analysis in terms of the observable random variables (T, δ) alone [12]. Suppose T is the age of the individual exposed to two risks of failure denoted by 0 and 1. The joint distribution of (T, δ) is specified by the subsurvival functions S(i, t) = P(T ≥ t, δ = i), or equivalently by the subdistribution functions F(i, t) = P(T < t, δ = i). The survival function and the distribution function of T are, respectively, given by S(t) = S(1, t) + S(0, t) and F(t) = F(1, t) + F(0, t). Throughout we assume that the subsurvival functions are continuous with subdensity functions f(i, t), and f(t) = f(0, t) + f(1, t) is the density of T. Let φ = P(δ = 1) = 1 − P(δ = 0). The cause-specific hazard rate for cause i, the probability of instantaneous failure at time t due to the i-th cause given that the unit has survived up to time t, is defined as h(i, t) = f(i, t)/S(t), and the crude hazard rate for cause i, the probability of instantaneous failure at time t due to the i-th cause given that the unit has survived the i-th cause up to time t, is defined as r(i, t) = f(i, t)/S(i, t). The hazard rate of T is h(t) = f(t)/S(t) = h(1, t) + h(0, t). [11] considered the following conditional probability functions Φ1(t) and Φ∗0(t):

Φ1(t) = P[δ = 1 | T ≥ t] = S(1, t)/S(t),   Φ∗0(t) = P[δ = 0 | T < t] = F(0, t)/F(t).   (1)
Here Φ1(t) is the probability of failure due to risk 1 given that the unit has survived up to time t, and Φ∗0(t) is the probability of failure due to risk 0 given that the unit has failed before time t. [11] studied various kinds of dependence between T and δ via Φ1(t) and Φ∗0(t). The nature of dependence between T and δ can be crucial and useful in modeling. If T and δ are independent then S(i, t) = P(δ = i)S(t). Thus the hypothesis of equality of incidence functions, or that of equality of cause-specific hazard rates, reduces to testing whether P(δ = 1) = P(δ = 0) = 1/2. Hence it allows us to study the continuous and the discrete random variable separately. The various types of dependence structures (discussed in [5]) between T and δ can be defined in terms of Φ1(t) and Φ∗0(t) as follows.

1. T and δ are independent iff Φ1(t) = φ, or equivalently Φ∗0(t) = 1 − φ.
2. T and δ are positive quadrant dependent (PQD) if Φ1(t) ≥ Φ1(0) = φ for all t. Equivalently, T and δ are positive quadrant dependent if Φ∗0(t) ≥ Φ∗0(∞) = 1 − φ for all t > 0.
3. δ is right tail increasing (RTI) in T if Φ1(t) is monotone nondecreasing in t.
4. δ is left tail decreasing (LTD) in T if Φ∗0(t) is monotone nonincreasing in t.

[11] considered the problem of testing H0: T and δ are independent against various alternative hypotheses, viz. positive quadrant dependence, right tail increasing and left tail decreasing.

2.1. Characterizing independence

Here we characterize independence/dependence between (T, δ) in terms of the various functions defined above. [15] have discussed various orderings of survival functions and their interrelationships. Here we will see that most of the results can be extended to subsurvival functions. The proofs are simple and hence have been omitted.
Theorem 1 (i) T and δ are independent iff S(1, t)/S(0, t) is a constant (φ/[1 − φ]), or equivalently F(1, t)/F(0, t) is a constant (φ/[1 − φ]),
(ii) T and δ are PQD iff S(1, t)/S(0, t) ≥ φ/[1 − φ], or equivalently F(1, t)/F(0, t) ≤ φ/[1 − φ],
(iii) δ is RTI in T iff S(1, t)/S(0, t) is increasing in t,
(iv) δ is LTD in T iff F(1, t)/F(0, t) is increasing in t.

Hence we can study dependence between (T, δ) via the ratio of subsurvival functions and subdistribution functions. Note that φ/[1 − φ] is the ratio of the probability of failure due
to the first risk to the probability of failure due to the second risk. [11] have made use of the above relationships to construct tests for testing H0 against various alternatives.

Let X and Y be random variables with survival functions F̄ and Ḡ, density functions f and g, hazard rates rX(x) and rY(x), and reverse hazard rates r∗X(x) and r∗Y(x), respectively. It is known [15] that F̄(x)/Ḡ(x) (respectively F(x)/G(x)) is increasing in x iff rX(x) ≤ rY(x) (respectively r∗X(x) ≥ r∗Y(x)). This is known as the hazard rate (reverse hazard rate) ordering of the distribution functions. Here we have the following analogous result.

Lemma 1 S(1, t)/S(0, t) (respectively F(1, t)/F(0, t)) is increasing in t iff r(1, t) ≤ r(0, t) (respectively r∗(1, t) ≥ r∗(0, t)) for all t, where r∗(i, t) is the crude reverse hazard rate of the i-th risk.

Thus the monotonicity of the ratio of subsurvival (subdistribution) functions holds iff the crude hazard (reverse hazard) rates are ordered. We can therefore connect the dependence of T and δ with the crude hazard rates with the help of the following result.

Theorem 2 (i) T and δ are independent iff r(1, t) = r(0, t) (r∗(1, t) = r∗(0, t)) for all t,
(ii) δ is RTI in T iff r(1, t) ≤ r(0, t) for all t,
(iii) δ is LTD in T iff r∗(1, t) ≥ r∗(0, t) for all t.

[10] have discussed various parametric models arising from cause-specific hazard rates, where the subdensity function has a nice form but the subsurvival functions are not in mathematically convenient forms. The following results are useful in characterizing independence/dependence between failure time and cause of failure in terms of subdensity functions and cause-specific hazard rates.

Lemma 2 f(1, t)/f(0, t) increasing in t implies S(1, t)/S(0, t) (F(1, t)/F(0, t)) is increasing in t.

When f(x)/g(x) is increasing in x, this is called the likelihood ratio ordering of the density functions. Then f(x)/g(x) increasing in x implies that F̄(x)/Ḡ(x) (F(x)/G(x)) is increasing in x; that is, likelihood ratio ordering implies hazard (reverse hazard) rate ordering [15]. The above lemma extends this result to subdensity and subsurvival functions. This leads to the following result on the dependence of T and δ in terms of the ratio of subdensity functions.

Theorem 3 (i) T and δ are independent iff f(1, t)/f(0, t) is a constant (φ/[1 − φ]),
(ii) If f(1, t)/f(0, t) ≥ φ/(1 − φ), then T and δ are PQD,
(iii) f(1, t)/f(0, t) increasing in t implies δ is RTI (LTD) in T.

It is interesting to note that f(1, t)/f(0, t) = h(1, t)/h(0, t). Hence we have the following result in terms of the cause-specific hazard rates.

Theorem 4 (i) T and δ are independent iff h(1, t)/h(0, t) is a constant (φ/[1 − φ]),
(ii) If h(1, t)/h(0, t) ≥ φ/(1 − φ), then T and δ are PQD,
(iii) h(1, t)/h(0, t) increasing in t implies δ is RTI (LTD) in T.

Part (i) of Theorem 4 was proved by [13]. The proof here is very simple. Other conditional probabilities of interest are P[T > t | δ = 1] = S(1, t)/φ and P[T > t | δ = 0] = S(0, t)/(1 − φ). These are the conditional subsurvival functions given failure due to risk 1 and 0, respectively. Then we have

Theorem 5 (i) T and δ are independent iff S(1, t)/φ = S(0, t)/(1 − φ),
(ii) T and δ are PQD iff S(1, t)/φ ≥ S(0, t)/(1 − φ), or equivalently F(1, t)/φ ≤ F(0, t)/(1 − φ),
(iii) δ is RTI in T iff [S(1, t)/φ] / [S(0, t)/(1 − φ)] is increasing in t,
(iv) δ is LTD in T iff [F(1, t)/φ] / [F(0, t)/(1 − φ)] is increasing in t.
Hence the dependence structure between T and δ can also be studied via the conditional subsurvival (subdistribution) functions, which behave in a manner similar to the subsurvival functions. T and δ are independent if the conditional probability of surviving t is the same irrespective of the cause of failure, and PQD if the probability of survival given failure due to cause 1 dominates the probability of survival given failure due to the second cause, that is, if risk 1 is less potent. The results in this section are useful in a probabilistic context as they help in looking at various orderings between subdistribution functions, subdensity functions and cause-specific hazards. They are analogous to those available for distribution functions, density functions, failure rates, etc. The shape of the Φ function has been used by [7] to build models for competing risks. The above results will be useful for comparing competing risks in two different environments on the basis of their subdistribution functions, subdensity functions and cause-specific hazards. Besides, the characterization results should be useful in developing new test procedures for testing the hypothesis of independence of failure time and cause of failure against the alternative of positive dependence.
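Lemma 2 and Theorem 3 are easy to probe numerically. For cause-specific Weibull hazards of the form h(i, t) = γi αi t^(αi−1), the family discussed in [10], with α1 > α0 the ratio f(1, t)/f(0, t) = h(1, t)/h(0, t) is increasing, and Lemma 2 then forces S(1, t)/S(0, t) to be increasing as well. A quadrature sketch with parameter values of our own choosing:

```python
import numpy as np

g1, a1 = 1.0, 2.0      # cause-specific hazard h(1,t) = g1*a1*t**(a1-1) = 2t
g0, a0 = 1.0, 1.0      # cause-specific hazard h(0,t) = g0*a0*t**(a0-1) = 1

t = np.linspace(0.0, 6.0, 60_001)
S = np.exp(-(g1 * t ** a1 + g0 * t ** a0))   # S(t) = exp(-sum of cum. hazards)
f1 = g1 * a1 * t ** (a1 - 1) * S             # subdensity f(1,t) = h(1,t)*S(t)
f0 = g0 * a0 * t ** (a0 - 1) * S             # subdensity f(0,t) = h(0,t)*S(t)

def tail_integral(f):
    # S(i,t) = int_t^inf f(i,u) du, via a reverse cumulative trapezoid rule
    steps = (f[1:] + f[:-1]) * np.diff(t) / 2.0
    out = np.zeros_like(f)
    out[:-1] = steps[::-1].cumsum()[::-1]
    return out

S1, S0 = tail_integral(f1), tail_integral(f0)
ratio = S1[:-1] / S0[:-1]   # S(1,t)/S(0,t); increasing because f1/f0 = 2t is
```

The ratio starts at φ/(1 − φ) and climbs toward the local value f1/f0 = 2t at the upper end of the grid, exactly as the lemma predicts.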
3. Independence between X and δ

Consider a component which is subject to failure at a random time X. The failure can be avoided by preventive maintenance at a random time Y. If X < Y, then we observe a failure; otherwise the component has undergone preventive maintenance. This is a competing risks set-up wherein we observe min(X, Y) and I(X < Y). One would expect dependence between the failure time X and the maintenance time Y. During operations a competent maintenance crew will have some information regarding the component. A very good maintenance team will use this information to minimize the repair (replacement) cost over a long time interval. Since the repair (replacement) cost of a critical failure (corresponding to X, corrective maintenance) is much higher than the cost of a degraded failure (corresponding to Y, preventive maintenance), the maintenance team will try to avoid critical failures. Also, the maintenance team will ideally not want to lose too much time through an increased number of repairs (cost) over a long
time interval. Ideally, the component is preventively maintained at time t if and only if it would otherwise have failed shortly after time t. This situation is best explained by the random signs model developed by [8]: consider a component subject to right censoring, where X denotes the time at which the component would expire if not censored; the event that the component's life is censored is independent of the age X at which the component would expire, but given that the component is censored, the time at which it is censored may depend on X. This might arise if a component emits a warning before expiring: if the warning is seen then the component is taken out, thus censoring its life; otherwise it fails. This situation has been modeled as follows. Let X and Y be life variables with Y = X − Wδ, where W, 0 < W < X, is a random variable and δ is a random variable taking values 1 and −1, with X and δ independent. Then

S(1, t) = P[X > t, δ = −1] = P[X > t] P[δ = −1] = SX(t)P[Y > X] = SX(t)S(1, 0).   (2)
[8] was only interested in the identifiability of SX(t). Note that f(1, t) = fX(t)S(1, 0) and r(1, t) = rX(t). However,

S(0, t) = P[Y > t, δ = 1] = P[X − W > t, δ = 1].   (3)
Hence nothing can be said about f(0, t) and r(0, t) in general. However, if W = a, that is, the warning is emitted at a fixed time a, then we have S(0, t) = SX(t + a)S(0, 0), f(0, t) = fX(t + a)S(0, 0) and r(0, t) = rX(t + a). In this case the subdistribution and subdensity functions corresponding to both X and Y are identifiable and hence can be estimated from (failure time, cause of failure) data.
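With W = a these closed forms make the role of the exponential distribution transparent: if X is exponential, memorylessness makes S(1, t)/S(0, t) constant, so by Theorem 1 T and δ are independent. A small numerical sketch of this, with illustrative values of our own:

```python
import numpy as np

lam, a = 1.0, 0.5              # exponential rate of X and fixed warning lag a
p1, p0 = 0.6, 0.4              # S(1,0) = P(delta = -1), S(0,0) = P(delta = 1)
SX = lambda t: np.exp(-lam * t)

# subsurvival functions from the W = a identities in the text
S1 = lambda t: SX(t) * p1      # S(1,t) = S_X(t) * S(1,0)
S0 = lambda t: SX(t + a) * p0  # S(0,t) = S_X(t+a) * S(0,0)

phi1 = lambda t: S1(t) / (S1(t) + S0(t))   # Phi_1(t) = S(1,t)/S(t)

vals = np.array([phi1(t) for t in (0.0, 1.0, 3.0, 10.0)])
# the exponential factor cancels, so Phi_1 is flat at p1 / (p1 + p0*exp(-lam*a))
```

Replacing SX by any non-exponential survival function would leave Φ1(t) non-constant, in line with the characterization proved as Lemma 4 in Section 5.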
4. Relationship between the three types of independence

Suppose X and Y are latent failure times with joint distribution F(x, y) = P[X ≤ x, Y ≤ y]. One can observe T = min(X, Y) and δ = I(X < Y). We are interested in studying the independence/dependence between these two under various assumptions on the latent failures.

4.1. Latent failures are independent

Suppose that the latent failure times X and Y are independent. Under this assumption we have

f(1, t) = f(t)Ḡ(t),  S(t) = F̄(t)Ḡ(t),  h(1, t) = rX(t),  h(0, t) = rY(t),

where rX denotes the failure rate of X. In particular,

Φ1(t) = ∫_t^∞ Ḡ(x)f(x) dx / [F̄(t)Ḡ(t)].

It is easy to see that Φ1(t) is increasing iff

Φ1(t) ≥ h(1, t)/h(t) = h(1, t)/[h(0, t) + h(1, t)] = rX(t)/[rX(t) + rY(t)].   (4)
Hence we have the following result regarding the independence/dependence of T and δ.

Theorem 6 Let X, Y be independent latent failure times. Then (i) T and δ are independent iff rX(t) = k rY(t) for some constant k, (ii) δ is RTI in T iff rX(t)/rY(t) is increasing in t.

Hence, if X and Y are independent and have proportional failure rates, then T and δ are independent. This result was proved earlier by [2] and [14]. If the ratio of the failure rates of the latent failures is increasing in t, then δ is RTI in T.

Theorem 7 Suppose that F̄(t) = exp(−λt), λ > 0, and Y, independent of X, has a decreasing failure rate (DFR). Then δ is RTI in T.

Proof: Notice that rX(t)/rY(t) is increasing in t. The result follows from (ii) of Theorem 6.

Hence, if X and Y are independent, X has an exponential distribution and Y has DFR, then δ is RTI in T.

4.2. Latent failures are dependent

When the latent failures are not independent, we have the following results.

Theorem 8 If (X, Y) is symmetric, then T and δ are independent.

Proof: If (X, Y) is symmetric, S(1, t) = S(0, t), and Φ1(t) = 1/2. Hence T and δ are independent.

[4] have given an example for which T and δ are independent although (X, Y) is not symmetric. Perhaps the additional assumption that Φ1(t) = 1/2 would ensure that (X, Y) is symmetric.
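Theorem 6(i) can be checked by simulation: independent Weibulls with a common shape parameter have proportional failure rates, so the empirical Φ1(t) should be flat. A sketch with parameter choices of our own:

```python
import numpy as np

rng = np.random.default_rng(42)
k = 2.0                        # common Weibull shape => r_X(t)/r_Y(t) constant
a1, a2 = 1.0, 2.0              # scale parameters of X and Y
n = 200_000
x = a1 * rng.weibull(k, n)
y = a2 * rng.weibull(k, n)
t_obs = np.minimum(x, y)       # observed T
delta = (x < y).astype(float)  # observed cause indicator

def phi1(t):
    # empirical Phi_1(t) = P(delta = 1 | T >= t)
    return delta[t_obs >= t].mean()

# r_X/r_Y = (a2/a1)**k = 4, so Phi_1(t) should sit near 4/5 at every t
vals = [phi1(t) for t in (0.0, 0.3, 0.6, 1.0)]
```

Taking different shapes for X and Y instead would make the empirical Φ1(t) drift with t, illustrating part (ii) of the theorem.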
5. Examples [7] have studied the shape of the Φ function for certain families of distributions. Now we look at some examples which make use of the results discussed in above sections. Example 1: [10] considered the cause-specific hazards having the (Weibull ) form h(i, t) = γi αi tαi −1 , i = 1, 2. In this case it is not possible to write the subsurvival
Competing Risks
70
function in the closed form. However, looking at the f (1, t)/f (0, t) and using Theorem 3, they conclude that T and δ are independent for α1 = α2 and δ is RTI in T for α1 > α2 . Example 2: [3] considered a special dependence structure wherein X0 , X1 , X2 are independent exponential random variables with parameters λ0 , λ1 , λ2 , X = X0 +X1 , Y = X0 + X2 . For example X1 , X2 denote latent lifetimes due to diabetes and heart diseases (say). X0 denotes the effect of improved lifestyle, which is beneficial to an individual. X, Y are the modified latent lifetimes after incorporating the effect of lifestyle changes. Here φ1 (t) is a constant and hence T and δ are independent. This example shows that T and δ can be independent even when the latent failures are not independent . In the general case when exponentiality is not assumed, the problem is not identifiable, but the ratio of failure rates is. If one considers the proportional hazards set up ¯ 2 (x) = L ¯ β (x), β > 0, where L ¯ 1 and L ¯ 2 are survivals of X1 and X2 , respectively, L 1 then the ratio f (1, t)/f (0, t) is a constant 1/β and using Theorem 3 we get T and δ are independent . Hence T and δ are independent but failure rates of X and Y are not proportional . Hence the assumption of independence in Theorem 6 cannot be dropped. Remark 1 Theorem 7 illustrates the case where the latent failures X and Y are independent , but T and δ are not independent. In Example 2 the latent failures X and Y are dependent , but T and δ are independent. Hence it is clear that there is no relationship between the independence of latent failures and independence of T and δ. Lemma 3 Under the random signs model with W = a (X and δ are independent), if X has increasing failure rate (IFR) then δ is RTI in T . Proof: If X has increasing failure rate , then r(0, t) ≥ r(1, t) and from Theorem 2 it follows that δ is RTI in T . Also note that Φ1 (t) =
Φ1(t) = S(1, t)/S0(t) = SX(t)S(1, 0) / [SX(t)S(1, 0) + SX(t + a)S(0, 0)],  while  S(1, t)/S(0, t) = constant · SX(t)/SX(t + a).
Then the ratio is increasing in t iff rX(t + a) ≥ rX(t). Hence we reach the same conclusion using Theorem 1. Further, if X has an exponential distribution then Φ1(t) is a constant and T and δ are independent. In fact it is easy to prove the following.

Lemma 4 Under the random signs model with W = a (X and δ are independent), T and δ are independent iff X has an exponential distribution.

Proof: The proof follows from the fact that

S(1, t)/S(0, t) = S(1, 0)/S(0, 0)  iff  SX(t)S(1, 0)/[SX(t + a)S(0, 0)] = S(1, 0)/[SX(a)S(0, 0)],

that is, iff SX(t + a) = SX(t)SX(a). Further, T and δ are PQD iff

Φ1(t) = SX(t)S(1, 0) / [SX(t)S(1, 0) + SX(t + a)S(0, 0)] ≥ Φ1(0),

that is, iff SX(t + a) ≤ SX(t), which is always true.

Independence of competing risks - I. Dewan

To summarize, for the random signs model T and δ are independent iff X has an exponential distribution; they are always PQD; and δ is RTI in T if X has increasing failure rate. Thus the three types of independence discussed above are not equivalent: the various examples discussed clearly show that there is no chain of implications between them. This is because they represent different aspects of the competing risks model.

The results discussed above are useful in all real-life situations where we observe a continuous random variable together with an associated discrete identifier. The characterization results on independence/dependence of T and δ should lead to the development of new testing procedures to compete with the existing ones suggested in [11]. We are also looking at distribution-free tests for independence between X and δ on the basis of competing risks data.
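Lemma 4 can be checked empirically. The following minimal sketch (the sample size, rate, failure probability and the lag a = 0.5 are illustrative assumptions, not from the text) simulates the random signs model with constant W = a and estimates Φ1(t) = P(δ = 1 | T > t) at several time points; for exponential X the estimates are roughly constant, consistent with independence of T and δ.

```python
import random

def simulate_random_signs(n, a, rate, p_failure, seed=1):
    """Random signs model with W = a: on delta = 1 we observe T = X,
    on delta = 0 we observe T = X - a; delta is drawn independently
    of X, as the model requires (X exponential with the given rate)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.expovariate(rate)
        delta = 1 if rng.random() < p_failure else 0
        data.append((x if delta == 1 else x - a, delta))
    return data

def phi1(data, t):
    """Empirical Phi_1(t) = P(delta = 1 | T > t)."""
    tail = [d for (u, d) in data if u > t]
    return sum(tail) / len(tail)

data = simulate_random_signs(200_000, a=0.5, rate=1.0, p_failure=0.6)
vals = [phi1(data, t) for t in (0.0, 0.5, 1.0, 1.5)]
print(vals)  # roughly constant, near p / (p + (1 - p) * exp(-a))
```

Replacing the exponential draw with an IFR lifetime (e.g. a Weibull with shape greater than one) makes the same estimates increase in t, in line with Lemma 3.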
References

[1] D. Bernoulli, Essai d'une nouvelle analyse de la mortalite causee par la petite verole, et des avantages de l'inoculation pour la prevenir. Mem. Acad. R. Sci., 1760, 1-45.
[2] W.R. Allen, A note on the conditional probability of failure when the hazards are proportional. Oper. Res. 11 (1963), 658-659.
[3] I. Bagai and B.L.S. Prakasa Rao, Analysis of survival data with two dependent competing risks. Biometrical J. 34 (1992), 801-814.
[4] R.B. Bapat and S.C. Kochar, Characterizations of identically distributed independent random variables using order statistics. Statist. Probab. Lett. 17 (1993), 225-230.
[5] R.E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart and Winston, New York, 1975.
[6] S.M. Berman, Note on extreme values, competing risks and semi-Markov processes. Ann. Math. Statist. 34 (1963), 1104-1106.
[7] C. Bunea, R. Cooke and B. Lindqvist, Competing risk perspective over reliability databases. In: Mathematical Methods in Reliability - Methodology and Practice (MMR 2002), Trondheim, 131-134.
[8] R.M. Cooke, The design of reliability data bases, part II. Reliability Engineering and System Safety 51 (1996), 209-223.
[9] M.J. Crowder, Classical Competing Risks. Chapman and Hall/CRC, London, 2001.
[10] I. Dewan and S.B. Kulathinal, Parametric models for subsurvival functions in the competing risks set up. Preprint, 2006.
[11] I. Dewan, J.V. Deshpande and S.B. Kulathinal, On testing dependence between time to failure and cause of failure via conditional probabilities. Scand. J. Statist. 31 (2004), 79-91.
[12] J.D. Kalbfleisch and R.L. Prentice, The Statistical Analysis of Failure Time Data. Second Edition, John Wiley, New Jersey, 2002.
[13] S.C. Kochar and F. Proschan, Independence of time and cause of failure in the multiple dependent competing risks model. Statist. Sinica 1 (1991), 295-299.
[14] J. Sethuraman, On a characterization of the three limiting types of the extreme. Sankhyā Ser. A 27 (1965), 357-364.
[15] M. Shaked and J.G. Shanthikumar, Stochastic Orders and Their Applications. Academic Press, Boston, 1994.
[16] A. Tsiatis, A nonidentifiability aspect of the problem of competing risks. Proc. Natl. Acad. Sci. U.S.A. 72 (1975), 20-22.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
Bivariate Competing Risks Models under Masked Causes of Failure

P.G. SANKARAN 1 and Alphonsa Antony ANSA
Department of Statistics, Cochin University of Science and Technology

Abstract. Consider a system consisting of k components, each of which is subject to more than one cause of failure. Owing to inadequacy in the diagnostic mechanism, or reluctance to report a specific cause of failure, the exact cause of failure often cannot be identified. In such situations, where the cause of failure is masked, test procedures restrict the cause of failure to a set of possible types containing the true failure cause. In this paper, we develop a non-parametric estimator for the bivariate survivor function of competing risks models under masked causes of failure, based on the vector hazard rate. Asymptotic properties of the estimator are discussed, and we illustrate the method with a data set.

Keywords. Competing risks, hazard rate, reliability
Introduction

In life testing experiments, the failure (death) of an individual, either a living organism or an inanimate object, may be classified into one of b (b > 1) mutually exclusive classes, usually causes of failure. For example, causes of death of an individual may be classified as cancer, heart disease or other. Competing risks models are useful for the analysis of such data, in which an object is exposed to two or more causes of failure.

There are situations in the analysis of competing risks lifetime data where the exact failure cause cannot be identified easily. This may be due to inadequacy in the diagnostic mechanism, or some individuals may be reluctant to report a specific cause of failure (disease). This phenomenon is termed masking. Examples of masked data in reliability and biomedical contexts can be found in Reiser et al. [10], Flehinger et al. [5] and Goetghebeur and Ryan [7]. In certain situations, we observe a set of possible failure types containing the true type, along with the failure time, which may be subject to censoring. When the set of possible failure types consists of more than one element, the cause of failure is masked. When it is a singleton set, the failure type is exactly observed, and when it contains all possible failure types, the missingness is total.

Flehinger et al. [6] discussed the estimation of survival probabilities due to different types in two stages, under the assumption that the hazards of the various risks are proportional to each other. Recently, Dewanji and Sengupta [4] developed a nonparametric maximum likelihood estimator of the cause-specific hazards in the absence of covariates using the EM algorithm.

1 Corresponding Author: Department of Statistics, Cochin University of Science and Technology, Cochin-682022, Kerala, India; E-mail:
[email protected]
Bivariate Competing Risks Models under Masked Causes of Failure - P.G. Sankaran & A.A. Ansa
Multivariate lifetime data arise when each study unit may experience several events, or when there exists some natural grouping of subjects which induces dependence among failure times of the same group. The sequence of tumor recurrences, the occurrence of blindness in the left and right eyes, and the onset of a genetic disease among family members are examples of situations that provide multivariate failure time data. The problem of masking may arise in multivariate failure time data with competing causes as well. The estimation of the multivariate survivor function in such situations has not been carried out so far. Motivated by this, in the present work we develop a nonparametric estimator of the bivariate survivor function when the causes of failure corresponding to the component lifetimes are masked. The proposed method is a generalization of the analysis of univariate competing risks data under masking given in Dewanji and Sengupta [4] to the bivariate set-up. The extension to the multivariate set-up is direct.
1. Survivor function and hazard rate

Let T = (T1, T2) be a pair of non-negative random variables defined on a probability space (Ω, F, P). The variables T1 and T2 may be thought of as survival times of married couples, failure times of two-component systems, etc. Let S(t1, t2) = P[T1 > t1, T2 > t2] be the survivor function of T. Now we consider the bivariate cumulative hazard vector Λ(t1, t2) = (Λ1(t1, t2), Λ2(t1, t2)), where

Λ1(dt1, t2) = −S(dt1, t2)/S(t1−, t2) = P[T1 ∈ dt1, T2 > t2]/P[T1 ≥ t1, T2 > t2]

and

Λ2(t1, dt2) = −S(t1, dt2)/S(t1, t2−) = P[T1 > t1, T2 ∈ dt2]/P[T1 > t1, T2 ≥ t2]

with Λ1(0, t2) = Λ2(t1, 0) = 0. When S(t1, t2) has a density function f(t1, t2), we have Λ1(dt1, t2) = h1(t1, t2)dt1 and Λ2(t1, dt2) = h2(t1, t2)dt2 with

h1(t1, t2) = lim_{Δt1→0} (1/Δt1) P{T1 ≤ t1 + Δt1 | T1 ≥ t1, T2 > t2}   (1)

and

h2(t1, t2) = lim_{Δt2→0} (1/Δt2) P{T2 ≤ t2 + Δt2 | T1 > t1, T2 ≥ t2}.   (2)

Johnson and Kotz (1975) [8] have shown that the survivor function S(t1, t2) is uniquely determined by (Λ1(t1, t2), Λ2(t1, t2)) as
S(t1 , t2 ) = exp{−Λ1 (t1 , 0) − Λ2 (t1 , t2 )}
(3)
= exp{−Λ1 (t1 , t2 ) − Λ2 (0, t2 )}.
(4)
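Representations (3) and (4) are easy to verify numerically for a concrete dependent model. The sketch below uses a Gumbel-type survivor function exp(−t1 − t2 − θ t1 t2) with an illustrative parameter value (an assumption for this check, not a model from the paper): it recovers h1 and h2 by numerical differentiation of −log S, integrates them as in the definitions of Λ1 and Λ2, and checks both representations.

```python
import math

THETA = 0.5  # hypothetical dependence parameter

def S(t1, t2):
    # Gumbel-type bivariate exponential survivor function
    return math.exp(-t1 - t2 - THETA * t1 * t2)

def h1(t1, t2, eps=1e-6):
    # h1 = -d/dt1 log S(t1, t2), by central difference
    return -(math.log(S(t1 + eps, t2)) - math.log(S(t1 - eps, t2))) / (2 * eps)

def h2(t1, t2, eps=1e-6):
    return -(math.log(S(t1, t2 + eps)) - math.log(S(t1, t2 - eps))) / (2 * eps)

def integral(f, a, b, n=2000):
    # simple trapezoidal rule
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + k * h) for k in range(1, n)) + f(b) / 2)

def Lambda1(t1, t2):
    return integral(lambda u: h1(u, t2), 0.0, t1)

def Lambda2(t1, t2):
    return integral(lambda u: h2(t1, u), 0.0, t2)

t1, t2 = 0.7, 1.3
lhs = S(t1, t2)
rep3 = math.exp(-Lambda1(t1, 0.0) - Lambda2(t1, t2))  # representation (3)
rep4 = math.exp(-Lambda1(t1, t2) - Lambda2(0.0, t2))  # representation (4)
print(lhs, rep3, rep4)  # all three agree
```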
Thus (3) and (4) provide representations of the bivariate survivor function S(t1, t2) in terms of Λ1(t1, t2) and Λ2(t1, t2).

Let C = (C1, C2) be the pair of causes corresponding to T = (T1, T2). Suppose that there are γ1 causes of failure for T1 and γ2 causes of failure for T2. Either C1 or C2, or both, may be missing. Let G = (G1, G2), where Gi is the set of possible causes for the i-th component, i = 1, 2. When there are γi causes of failure for Ti, then Gi ⊆ {1, 2, . . . , γi}, i = 1, 2. The cause-specific hazard functions are given by

h1j(t1, t2) = lim_{Δt1→0} (1/Δt1) P{T1 ≤ t1 + Δt1, C1 = j | T1 ≥ t1, T2 > t2},  j = 1, 2, . . . , γ1,

and

h2j(t1, t2) = lim_{Δt2→0} (1/Δt2) P{T2 ≤ t2 + Δt2, C2 = j | T1 > t1, T2 ≥ t2},  j = 1, 2, . . . , γ2.

Assuming that the failure type j must be a unique element of {1, 2, . . . , γi}, the cumulative hazard function Λi(t1, t2), i = 1, 2, is obtained as

Λ1(t1, t2) = Σ_{j=1}^{γ1} ∫_0^{t1} h1j(u, t2) du   (5)

and

Λ2(t1, t2) = Σ_{j=1}^{γ2} ∫_0^{t2} h2j(t1, u) du.   (6)
Now we define

P^(i)_{gi j}(t1, t2) = P(Gi = gi | Ti = ti, Tk > tk, Ci = j, δi = 1)   (7)

with P^(i)_{gi j}(t1, t2) = 0 if j ∉ gi, j = 1, 2, . . . , γi, i, k = 1, 2 and i ≠ k. Therefore, for fixed j,

Σ_{gi} P^(i)_{gi j}(t1, t2) = 1,  j = 1, 2, . . . , γi, i = 1, 2.   (8)

Assume that the missing mechanism is independent of the censoring mechanism. Then (7) becomes

P^(i)_{gi j}(t1, t2) = P(Gi = gi | Ti = ti, Tk > tk, Ci = j),  j = 1, 2, . . . , γi, i, k = 1, 2, i ≠ k.   (9)

Thus P^(i)_{gi j}(t1, t2) gives the conditional probability of observing gi as the set of possible causes, given failure of the i-th component at time ti due to cause j while the k-th
component survives beyond time tk, i, k = 1, 2, i ≠ k. The hazard rate for failure of the i-th component due to cause j at time ti, with gi observed as the set of possible causes, given that the k-th component survives beyond time tk, is given by

Λ^(i)_{gi j tk}(dti) = lim_{Δti→0} P(Ti ≤ ti + Δti, Ci = j, Gi = gi | Ti ≥ ti, Tk > tk)/Δti,

i, k = 1, 2, i ≠ k and j = 1, 2, . . . , γi. Denote the events {Ti ≤ ti + Δti, Ci = j}, {Gi = gi} and {Ti ≥ ti, Tk > tk} as A, B and C, respectively. Since P(A ∩ B | C) = P(A | C)P(B | A ∩ C), we obtain Λ^(i)_{gi j tk}(dti) as

Λ^(i)_{gi j tk}(dti) = P^(i)_{gi j}(t1, t2) hij(t1, t2),  j = 1, 2, . . . , γi, i, k = 1, 2, i ≠ k.

Thus the hazard rate of the i-th component at time ti, with gi observed as the set of possible causes, given that the k-th component survives beyond time tk, is given by

Λ^(i)_{gi tk}(dti) = lim_{Δti→0} P(Ti ≤ ti + Δti, Gi = gi | Ti ≥ ti, Tk > tk)/Δti
             = Σ_{j∈gi} P^(i)_{gi j}(t1, t2) hij(t1, t2),  i, k = 1, 2, i ≠ k.   (10)
Using (8) and summing (10) over all non-empty subsets gi of {1, 2, . . . , γi}, we get

hi(t1, t2) = Σ_{gi} Λ^(i)_{gi tk}(dti) = Σ_{j=1}^{γi} hij(t1, t2),  i, k = 1, 2, i ≠ k.   (11)

The probabilities P^(i)_{gi j}(t1, t2) are usually unknown and need to be estimated. In order to estimate these probabilities, in practice we make the simple assumption that P^(i)_{gi j}(t1, t2) is independent of t1 and t2, though it may depend on gi and j. Thus the missing pattern is allowed to be non-ignorable.

Denote by Λ^(i)_{tk}(ti) the (2^γi − 1) × 1 vector of the Λ^(i)_{gi tk}(ti)'s, and by Λ^(∗i)_{tk}(ti) the γi × 1 vector of the cumulative cause-specific hazards corresponding to the hij(t1, t2). Since P^(i)_{gi j}(t1, t2) is independent of t1 and t2, we denote P^(i)_{gi j}(t1, t2) by P^(i)_{gi j}, and let Pi denote the (2^γi − 1) × γi matrix of the P^(i)_{gi j}'s. Using (10),

Λ^(i)_{tk}(ti) = Pi Λ^(∗i)_{tk}(ti),  i, k = 1, 2, i ≠ k.   (12)

Let I_{1×γi} be a 1 × γi vector of ones. Then, using (11), we get

Λi(t1, t2) = I_{1×γi} Λ^(∗i)_{tk}(ti),  i, k = 1, 2, i ≠ k.   (13)
2. Non-Parametric Estimation

Under bivariate right censoring, the observable variables are T∗ = (T1∗, T2∗) and δ = (δ1, δ2), where Ti∗ = min(Ti, Zi) and δi = I(Ti = Ti∗), i = 1, 2, and Z = (Z1, Z2) is a pair of fixed or random censoring times. Thus the observed data
are (T1u∗, T2u∗, δ1u, δ2u, G1u, G2u), u = 1, 2, . . . , n. Consider the (2^γi − 1)-dimensional counting process {N^(i)_{gi tk}(ti)}_{gi∈ℵi}, where ℵi consists of all non-empty subsets of {1, 2, . . . , γi} and i, k = 1, 2, with i ≠ k. In practice, N^(i)_{gi tk}(ti) represents the observed number of pairs (T1u∗, T2u∗) for which Tiu∗ ≤ ti and Tku∗ > tk with gi as the observed set of possible causes, i, k = 1, 2, i ≠ k. Denoting N^(i)∗_{gi tk}(ti) = I(Ti∗ ≤ ti, Tk∗ > tk, δi = 0), i, k = 1, 2, i ≠ k, the corresponding intensity process is given by

α^(i)_{gi tk}(ti) = Y(t1, t2) Λ^(i)_{gi tk}(dti),  i, k = 1, 2, i ≠ k,

where Y(t1, t2) = I(T1 ≥ t1, T2 ≥ t2). For each non-empty subset gi of {1, 2, . . . , γi} and for fixed tk,

N^(i)_{gi tk}(dti) = α^(i)_{gi tk}(ti) dti + M^(i)_{gi tk}(dti),  i, k = 1, 2, i ≠ k,

where the M^(i)_{gi tk}(ti)'s are local square integrable martingales with respect to the filtration σ{N^(i)_{gi tk}(u), N^(i)∗_{gi tk}(u) : 0 ≤ u ≤ ti}. Therefore, the estimate of Λ^(i)_{gi tk}(ti) is directly obtained as

Λ̂^(i)_{gi tk}(ti) = ∫_0^{ti} [I(Y(t1, t2) > 0)/Y(t1, t2)] N^(i)_{gi tk}(du),  i = 1, 2.   (14)
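The integral in (14) is a Nelson-Aalen-type estimator: at each observed event the cumulative hazard is incremented by one over the number at risk. A minimal univariate sketch (illustrative data; ties and the bivariate at-risk indicator are ignored here) is:

```python
def nelson_aalen(times, events):
    """Univariate analogue of (14): increments dN(t)/Y(t) summed over
    observed event times; events[i] = 1 for a failure, 0 for censoring."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    cum = 0.0
    steps = []
    for i in order:
        if events[i] == 1:
            cum += 1.0 / at_risk
            steps.append((times[i], cum))
        at_risk -= 1  # this subject leaves the risk set
    return steps

steps = nelson_aalen([2.0, 3.0, 5.0, 7.0], [1, 0, 1, 1])
print(steps)  # → [(2.0, 0.25), (5.0, 0.75), (7.0, 1.75)]
```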
From equation (12), we obtain

Λ̂^(i)_{tk}(ti) = Pi Λ^(∗i)_{tk}(ti) + ε^(i)_{tk}(ti),  i, k = 1, 2, i ≠ k,   (15)

where Λ̂^(i)_{tk}(ti) is the vector of the Λ̂^(i)_{gi tk}(ti)'s and, for fixed tk, ε^(i)_{tk}(ti) is a vector process converging to a vector of Gaussian martingales whose variance function is consistently estimated by the matrix diag(τ̂^(i)_{gi tk}(ti)), where τ^(i)_{gi tk}(ti) is the variance function of Λ̂^(i)_{gi tk}(ti) given by equation (14). Equation (15) can be considered as a linear model with the design matrix Pi to be estimated. Let P̂i denote a consistent estimate of Pi. Then, using the principle of weighted least squares, a consistent estimate of Λ^(∗i)_{tk}(ti) is

Λ̂^(∗i)_{tk}(ti) = (P̂iᵀ W^(i)_{tk}(ti) P̂i)⁻¹ P̂iᵀ W^(i)_{tk}(ti) Λ̂^(i)_{tk}(ti),  i, k = 1, 2, i ≠ k,   (16)

where W^(i)_{tk}(ti) is the inverse of the estimated (2^γi − 1) × (2^γi − 1) diagonal covariance matrix of Λ̂^(i)_{tk}(ti), which is obtained as

W^(i)_{tk}(ti) = diag( 1 / τ̂^(i)_{gi tk}(ti) ).
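The weighted least squares step (16) can be sketched numerically. In the toy example below (γi = 2, so the observed sets are {1}, {2} and {1, 2}; the masking matrix, weights and hazard values are hypothetical), the estimate recovers the cause-specific vector exactly, because the data are generated from the linear model without noise. Note that each column of P sums to one, consistent with (8).

```python
def wls_estimate(P, W, Lam):
    """Solve (16) for a (2**gamma - 1) x gamma system with gamma = 2:
    Lam_star_hat = (P^T W P)^(-1) P^T W Lam, W diagonal (list of weights)."""
    rows = range(len(P))
    # A = P^T W P (2 x 2), b = P^T W Lam (2-vector)
    A = [[sum(P[r][i] * W[r] * P[r][j] for r in rows) for j in range(2)]
         for i in range(2)]
    b = [sum(P[r][i] * W[r] * Lam[r] for r in rows) for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    return [inv[i][0] * b[0] + inv[i][1] * b[1] for i in range(2)]

# rows of P: observed sets {1}, {2}, {1,2}; columns: true causes j = 1, 2
P = [[0.7, 0.0],   # cause 1 reported exactly
     [0.0, 0.8],   # cause 2 reported exactly
     [0.3, 0.2]]   # cause masked as {1, 2}
true_star = [0.4, 0.25]  # hypothetical cumulative cause-specific hazards
Lam = [sum(P[r][j] * true_star[j] for j in range(2)) for r in range(3)]
W = [1.0, 1.0, 1.0]      # equal weights for the sketch
est = wls_estimate(P, W, Lam)
print(est)  # → [0.4, 0.25] up to floating-point rounding
```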
For j ∈ gi,

P^(i)_{gi j} = P(Gi = gi | Ci = j) = P(Ci = j | Gi = gi) P(Gi = gi) / Σ_{gi ∋ j} P(Ci = j | Gi = gi) P(Gi = gi),  j = 1, 2, . . . , γi, i = 1, 2.

The estimate of P^(i)_{gi j} is given by
P̂^(i)_{gi j} = f^(i)_{gi} q^(i)_{j gi} / Σ_{gi ∋ j} f^(i)_{gi} q^(i)_{j gi},

where f^(i)_{gi} denotes the total number of pairs with Tiu ≤ ti and Tku > tk and Gi observed as gi, and q^(i)_{j gi}, j = 1, 2, . . . , γi, i = 1, 2. From (13), we get

Λ̂i(t1, t2) = I_{1×γi} Λ̂^(∗i)_{tk}(ti),  i, k = 1, 2, i ≠ k.   (17)
Note that the estimate Λ̂i(t1, t2), i = 1, 2, cannot be guaranteed to be non-decreasing, although it is expected to be so in large samples because of its consistency, proved later. In practice, we can use the pool adjacent violators algorithm to achieve the monotonic nature. If some of the N^(i)_{gi tk}(ti)'s are not observed to have any jump during the study, the corresponding Λ̂^(i)_{gi tk}(ti) and f^(i)_{gi tk}(ti)'s turn out to be zero, and thus the corresponding rows of Pi are also estimated to be zero. From (3), (4) and (17) we obtain

Ŝ1(t1, t2) = exp{−Λ̂1(t1, 0) − Λ̂2(t1, t2)}   (18)

and

Ŝ2(t1, t2) = exp{−Λ̂1(t1, t2) − Λ̂2(0, t2)}.   (19)
The estimators Ŝi(t1, t2), i = 1, 2, obtained in (18) and (19) may differ; however, both are consistent. The proposed estimator is a linear combination of the two expressions (18) and (19). Thus the estimator Ŝa(t1, t2) is given by

Ŝa(t1, t2) = a(t1, t2) Ŝ1(t1, t2) + (1 − a(t1, t2)) Ŝ2(t1, t2).
(20)
Now the question is how to choose a(t1, t2). We choose the weight a(t1, t2) so that the mean squared error (MSE) of Ŝa(t1, t2) is minimized. The weight is then obtained as

a(t1, t2) = [V(Ŝ2(t1, t2)) − Cov(Ŝ1(t1, t2), Ŝ2(t1, t2))] / [V(Ŝ1(t1, t2)) + V(Ŝ2(t1, t2)) − 2 Cov(Ŝ1(t1, t2), Ŝ2(t1, t2))].   (21)

In practice, we estimate the variances and the covariance by the bootstrap method, which is an extension of the procedure given in Akritas and van Keilegom (2003) [1] to bivariate data. To ensure that Ŝa(t1, t2) belongs to the interval [0, 1], we replace a(t1, t2) by min[1, max{a(t1, t2), 0}].

Remark 1 The extension to the multivariate set-up is direct, as the survival function S(t1, t2, . . . , tk) of (T1, T2, . . . , Tk) can be uniquely represented by

S(t1, t2, . . . , tk) = exp{−Λ1(t1, 0, . . . , 0) − Λ2(t1, t2, 0, . . . , 0) − · · · − Λk(t1, t2, . . . , tk)},

where

Λj(t1, . . . , tj, 0, . . . , 0) = ∫_0^{tj} hj(t1, . . . , tj−1, u, 0, . . . , 0) du

with

hj(t1, t2, . . . , tk) = −∂ log S(t1, t2, . . . , tk)/∂tj,  j = 1, 2, . . . , k.
Remark 2 If both G1 and G2 are singleton sets, then the estimation reduces to the bivariate competing risks case given in Ansa and Sankaran [3].

Remark 3 The method is an extension of Dewanji and Sengupta: when t2 → 0, (19) reduces to the univariate case given in Dewanji and Sengupta [4].

Remark 4 The strong consistency and asymptotic normality of the estimator Ŝa(t1, t2) can be proved using empirical process theory (see Andersen et al. [2] and van der Vaart and Wellner [11]).
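The weight in (21) is the standard variance-minimizing weight for a convex combination of two correlated estimators. The sketch below (with hypothetical variance and covariance values standing in for the bootstrap estimates) computes it, applies the truncation to [0, 1], and confirms that the combined variance is no larger than that of either estimator alone.

```python
def optimal_weight(v1, v2, cov):
    """Weight (21): a = (V(S2) - Cov) / (V(S1) + V(S2) - 2 Cov),
    truncated to [0, 1] as in the text."""
    a = (v2 - cov) / (v1 + v2 - 2 * cov)
    return min(1.0, max(a, 0.0))

# hypothetical bootstrap variance/covariance estimates
v1, v2, cov = 0.004, 0.009, 0.002
a = optimal_weight(v1, v2, cov)
combined = a * a * v1 + (1 - a) ** 2 * v2 + 2 * a * (1 - a) * cov
print(a, combined)  # combined variance does not exceed min(v1, v2)
```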
3. Data Analysis

To illustrate the estimation procedure given in Section 2, we use data on the times to tumor appearance or death for 50 pairs of mice from the same litter in a tumorigenesis experiment (Mantel and Ciminera [9]), as reported in Ying and Wei [12]. We consider T1 and T2 as the failure times (in weeks) for a pair of mice, and Cj (j = 1, 2) indicates whether the failure was the appearance of a tumor (Cj = 1), the occurrence of death prior to tumor appearance (Cj = 2), or a censored observation (Cj = 0). The experiment was terminated at 104 weeks, so there is a common censoring time of 104 weeks across all animals. To introduce masking, we randomly allocated the masked set {1, 2} among the observed lifetimes.

The estimators Ŝ1(t1, t2) and Ŝ2(t1, t2) can be obtained directly from the data using the approach in Section 2 for three cases of q^(i)_{jg}, by giving the values (i) q^(1)_{1g} = 0.98 and q^(2)_{2g} = 0.02, (ii) q^(1)_{1g} = 0.5 and q^(2)_{2g} = 0.5, (iii) q^(1)_{1g} = 0.02 and q^(2)_{2g} = 0.98, with g = {1, 2}. We then calculate a using the bootstrap procedure. The estimator Ŝa(t1, t2) of the survivor function at the time points (55, 90), (97, 79), (87, 74) and (73, 74) is obtained using (20). The values of Ŝa(t1, t2) for the three cases of q^(i)_{jg} are given in Table 1; the standard errors of the estimates are given in brackets.
Table 1. Estimates of the survivor function S(t1, t2)

(t1, t2)   Cases   Ŝ1(t1, t2)       Ŝ2(t1, t2)       a      Ŝa(t1, t2)
(73,74)    (i)     .735941 (.053)   .718968 (.088)   .733   .7314
           (ii)    .729264 (.055)   .709157 (.062)   .559   .7204
           (iii)   .727244 (.065)   .708576 (.060)   .468   .7173
(97,79)    (i)     .399942 (.064)   .377815 (.060)   .467   .3882
           (ii)    .342390 (.068)   .340210 (.061)   .445   .3411
           (iii)   .369583 (.054)   .348013 (.057)   .527   .3593
(87,74)    (i)     .506083 (.044)   .539301 (.048)   .543   .5213
           (ii)    .466971 (.051)   .510490 (.051)   .500   .4887
           (iii)   .493082 (.047)   .507350 (.055)   .567   .4993
(55,90)    (i)     .570129 (.048)   .516792 (.052)   .541   .5457
           (ii)    .611161 (.052)   .590559 (.056)   .532   .6015
           (iii)   .648197 (.043)   .628784 (.057)   .637   .6412

4. Discussion

In this article, we developed non-parametric estimators for the bivariate survivor function and the cause-specific distribution functions of competing risks models under masked causes of failure. Asymptotic properties of the estimators were established. We applied the method to data on the times to tumor appearance or death for pairs of mice from the same litter in a tumorigenesis experiment (Mantel and Ciminera [9]).

The method is developed under the assumption that the failure time vector and the censoring vector are independent. In many practical situations this assumption may not hold, and the case of dependent censoring is yet to be studied. The use of covariates in a regression model is a way to represent heterogeneity in a population; the analysis of multivariate competing risks data in the presence of covariates is an area of research still to be explored.
References

[1] M.G. Akritas and I. van Keilegom, Estimation of bivariate and marginal distributions with censored data, Journal of the Royal Statistical Society, Series B, 65 (2003), 457-471.
[2] P.K. Andersen, O. Borgan, R.D. Gill and N. Keiding, Statistical Models Based on Counting Processes, Springer-Verlag, New York (1993).
[3] A. Antony Ansa and P.G. Sankaran, Estimation of bivariate survivor function of competing risk models under censoring, Journal of Statistical Theory and Applications, 4 (2005), 401-423.
[4] A. Dewanji and D. Sengupta, Estimation of competing risks with general missing pattern in failure types, Biometrics, 59 (2003), 1063-1070.
[5] B.J. Flehinger, B. Reiser and E. Yashchin, Inference about defects in the presence of masking, Technometrics, 38 (1996), 247-255.
[6] B.J. Flehinger, B. Reiser and E. Yashchin, Survival with competing risks and masked causes of failures, Biometrika, 85 (1998), 151-164.
[7] E. Goetghebeur and L. Ryan, Analysis of competing risks data when some failure types are missing, Biometrika, 82 (1995), 821-833.
[8] N.L. Johnson and S. Kotz, A vector multivariate hazard rate, Journal of Multivariate Analysis, 5 (1975), 53-66.
[9] N. Mantel and J.L. Ciminera, Use of logrank scores in the analysis of litter-matched data on time to tumor appearance, Cancer Research, 39 (1979), 4308-4315.
[10] B. Reiser, I. Guttman, D.K.J. Lin, J.S. Usher and F.M. Guess, Bayesian inference for masked system lifetime data, Applied Statistics, 44 (1995), 79-90.
[11] A.W. van der Vaart and J.A. Wellner, Weak Convergence and Empirical Processes with Applications to Statistics, Springer-Verlag, New York (1996).
[12] Z. Ying and L.J. Wei, The Kaplan-Meier estimate for dependent failure time observations, Journal of Multivariate Analysis, 50 (1994), 17-29.
Competing Risks in Repairable Systems

Bo Henry LINDQVIST 1
Norwegian University of Science and Technology, Trondheim, Norway

Abstract. We consider repairable systems where the observed events may be of several types. It is suggested to model the observations from such systems as marked point processes, leading to a need for extending the theory of repairable systems to a competing risks setting. In this paper we consider in particular virtual age models and their extension to the case of several types of events.

Keywords. Repairable system, marked point process, conditional intensity, competing risks, virtual age
Introduction

Data from repairable systems usually contain more information than just the failure times. For example, there may in addition to failure times be information on the times of preventive maintenance (PM), the identity of the failed component, the type of failure, the type of repair, etc. It is therefore reasonable to model repairable systems by marked point processes where the marks label the types of events. As an example, the marks may be of two kinds, corresponding to whether the event is a failure or a PM. We review some recent literature in this direction with the aim of generalizing some of the classical theory of repairable systems to a competing risks setting. The main focus is on generalizing virtual age models to this more general setting. Two relevant references are Doyen and Gaudoin [6], who present a point process approach for modeling of imperfect repair in competing risks situations between failure and PM, and Bedford and Lindqvist [3], who consider a series system of repairable components where only the failing component is repaired at failures. A general setup for these kinds of processes is suggested in the review paper by Lindqvist [9].

The outline of the paper is as follows. In Section 1 we settle the necessary notation. In Sections 2 and 3 we review some basic facts about, respectively, competing risks for non-repairable systems and the classical virtual age models. The main section is Section 4, where we consider the extension of classical virtual age models for repairable systems to the case of several types of events. The final section explains how to derive likelihood functions for data from the studied models.

1 Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7491 Trondheim, Norway; E-mail:
[email protected].
Competing Risks in Repairable Systems - B.H. Lindqvist

Figure 1. Event times (Ti), event types (Ji) and sojourn times (Xi) of a repairable system.
1. Notation and Basic Results

We consider repairable systems where time usually runs from t = 0 and where events occur at ordered times T1, T2, . . .. We assume that the system is always restarted immediately after a failure or a maintenance action, thus disregarding the time durations of repair and maintenance. Types of events (type of maintenance, type of failure, etc.) are recorded as J1, J2, . . ., with Ji ∈ J for some finite set J which depends on the current application. The observable process (T1, J1), (T2, J2), . . . is a marked point process. The inter-event, or inter-failure, times will be denoted X1, X2, . . ., where Xi = Ti − Ti−1, i = 1, 2, . . . (with T0 ≡ 0). Figure 1 illustrates the notation.

We also make use of the counting process representation Nj(t) = number of events of type j in (0, t], which counts the number of events of type j ∈ J, and N(t) = Σ_{j∈J} Nj(t), which counts the number of events irrespective of their types.

To describe probability models for the considered processes we use some notation from the theory of point processes [2]. Let Ft− denote the history of the marked point process up to, but not including, time t. We assume that Ft− includes all information on event times and event types before time t. Formally, Ft− is generated by the set {Nj(s) : 0 ≤ s < t, j ∈ J}. The conditional intensity of the process with respect to events of type j ∈ J is now defined as

γj(t) = lim_{Δt↓0} Pr(event of type j in [t, t + Δt) | Ft−) / Δt,   (1)
which we call the type-specific intensity for j. Thus, γj (t)Δt is, approximately, the probability of an event of type j in the time interval [t, t + Δt) given the history before time t.
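As a concrete (hypothetical) illustration of type-specific intensities, the sketch below simulates a marked point process in which each γj is constant, so the marks form independent Poisson streams; the long-run fraction of type-j events is then γj divided by the total rate. The rates, horizon and seed are illustrative choices.

```python
import random

def simulate_marked_process(gamma, horizon, seed=42):
    """Marked point process on (0, horizon] with constant type-specific
    intensities gamma[j]: the next event time is exponential with rate
    sum(gamma), and the mark j is drawn with probability gamma[j]/sum."""
    rng = random.Random(seed)
    total = sum(gamma)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(total)
        if t > horizon:
            return events
        u = rng.random() * total
        j, acc = 0, gamma[0]
        while u > acc:
            j += 1
            acc += gamma[j]
        events.append((t, j))

events = simulate_marked_process([1.0, 3.0], horizon=5000.0)
frac_type0 = sum(1 for (_, j) in events if j == 0) / len(events)
print(len(events), frac_type0)  # fraction of type-0 events close to 1/4
```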
2. Competing Risks for a Non-Repairable System

Consider first a non-repairable system. Assume that this system may fail due to one of several causes, or may be stopped for preventive maintenance (PM) before it fails, in which case failure is avoided and the failure time is censored. We can formally think of this as having a system with, say, n components, denoted {C1, C2, . . . , Cn}, where a unique failing component can be identified at failures of the system, and where PM is represented by one of these components in order to simplify notation. Let Wj be the potential failure time due to failure of component Cj, j = 1, 2, . . . , n. The actual observation is then the pair (T, J), where T = min(W1, . . . , Wn) is the failure time and J is the identity of the failing component,
say J = j if the component Cj fails. This determines a competing risks situation with n competing risks (Crowder [5], Ch. 3). The joint distribution of (T, J) is identifiable from data, as are the so-called type-specific hazards defined by

hj(t) = lim_{Δt↓0} Pr(t < T ≤ t + Δt, J = j | T > t) / Δt.   (2)
However, neither the joint nor the marginal distributions of the individual potential failure times W1, . . . , Wn are identifiable in general from observation of (T, J) only (Crowder [5], Ch. 7). The dilemma from a practical point of view is of course that these marginal and joint distributions are indeed of interest in reliability applications, for example in connection with maintenance optimization. An example is given next.

2.1. Example: Modeling of Failure vs. Preventive Maintenance

Cooke [4] considered a competing risks situation with a potential failure of a unit at some time W1 and a potential action of preventive maintenance to be performed at time W2. Thus n = 2, while J = 1 corresponds to failure of the unit and J = 2 corresponds to the action of PM. Knowledge of the marginal distribution of W1 would be particularly important, since it is the basic failure time distribution of the unit when there is no PM. However, as already noted, the marginal distributions of W1 and W2 are not identifiable unless specific assumptions are made on the dependence between W1 and W2. The most common assumption of this kind is that W1 and W2 are independent, in which case identifiability follows (Crowder [5], Ch. 7). This assumption is unreasonable in the present application, however, since the maintenance crew is likely to have some information regarding the unit's state during operation. Thus we are in practice faced with a situation of dependent competing risks between W1 and W2, and hence identifiability of the marginal distributions requires additional assumptions.
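The observation scheme (T, J) is easy to simulate. The sketch below uses independent exponential W1, W2 with hypothetical rates — precisely the independence assumption the example warns is often unrealistic in practice — and illustrates that only functionals of the observable pair (T, J), such as P(J = 1) or the mean of T, are directly estimable from the data.

```python
import random

def observe_competing(n, rate_failure, rate_pm, seed=7):
    """Observe (T, J): T = min(W1, W2) and J marks which was smaller,
    with W1 (failure) and W2 (PM) independent exponentials (assumed)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        w1 = rng.expovariate(rate_failure)
        w2 = rng.expovariate(rate_pm)
        out.append((min(w1, w2), 1 if w1 <= w2 else 2))
    return out

data = observe_competing(100_000, rate_failure=0.5, rate_pm=1.5)
p_failure = sum(1 for (_, j) in data if j == 1) / len(data)
mean_t = sum(t for (t, _) in data) / len(data)
print(p_failure, mean_t)  # near 0.25 and 0.5 for these rates
```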
3. Virtual Age Models for a Single Type of Events

The main ingredients of the classical virtual age model [7] are a hazard function ν(·), interpreted as the hazard function of a new system, and a virtual age process, which is a stochastic process depending on the actual repair actions performed. The idea is to distinguish between the system's true age, which is the time elapsed since the system was new, usually at time t = 0, and the system's virtual age, which describes its present condition when compared to a new system. The main feature is that the virtual age can be redefined at failures according to the type of repair performed, while it runs along with the true time between repairs. A system with virtual age v ≥ 0 is assumed to behave exactly like a new system which has reached age v without having failed. The hazard rate of a system with virtual age v is thus ν(v + t) for t > 0.

A variety of so-called imperfect repair models can be obtained by specifying properties of the virtual age process. For this, suppose v(i) is the virtual age of the system immediately after the i-th event, i = 1, 2, . . .. The virtual age at time t > 0 is then defined by A(t) = v(N(t−)) + t − T_{N(t−)}, which is the sum of the virtual age after the last event before t and the time elapsed since the last event. The process A(t), called the virtual age process, thus increases linearly between events and may jump only at events.
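A Kijima-type specification, in which the virtual age immediately after an event is v(i) = ρ · A(Ti−) for a repair effectiveness parameter ρ ∈ [0, 1], can be simulated by inverting the conditional survival function of the next gap. The sketch below uses a Weibull hazard ν(t) = β t^(β−1) with β = 2; the shape and the ρ values are illustrative choices, not from the paper. It contrasts perfect repair (ρ = 0, a renewal process) with minimal repair (ρ = 1, where age accumulates and events therefore arrive faster and faster).

```python
import math
import random

BETA = 2.0  # Weibull shape: nu(t) = BETA * t**(BETA - 1)

def next_gap(v, rng):
    """Gap to the next event for virtual age v, by inverting the
    conditional survival exp(v**BETA - (v + x)**BETA) at a uniform draw."""
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return (v ** BETA - math.log(u)) ** (1.0 / BETA) - v

def simulate_virtual_age(n_events, rho, seed=3):
    """Kijima-type sketch: after each event the virtual age is reset to
    rho times the age just before the event (rho = 0: perfect repair,
    renewal process; rho = 1: minimal repair, ages accumulate)."""
    rng = random.Random(seed)
    v, t, times = 0.0, 0.0, []
    for _ in range(n_events):
        x = next_gap(v, rng)
        t += x
        times.append(t)
        v = rho * (v + x)
    return times

perfect = simulate_virtual_age(20_000, rho=0.0)
minimal = simulate_virtual_age(20_000, rho=1.0)
print(perfect[-1], minimal[-1])  # minimal repair reaches the same count far sooner
```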
Competing Risks in Repairable Systems - B.H. Lindqvist
4. Repairable Systems with Several Types of Events Consider the setup of Section 2, where the system may fail due to one of several causes. Suppose now that the system is repaired after failure, then put into operation, may then fail again, is then repaired, and so on. This can be represented by the marked point process described in Section 2, with marks in J = {1, 2, . . . , n} describing the identity of the failing component (or the type of event). The properties of this process depend on the repair strategy. Various classes of models can be described in terms of a generalization of the virtual age concept. Langseth and Lindqvist [8] suggested a model involving imperfect maintenance and repair in the case of several components and several failure causes. Doyen and Gaudoin [6] developed the ideas further by presenting a general point process framework for modeling imperfect repair in a competing risks situation between failure and PM. Bedford and Lindqvist [3] considered a series system of n repairable components where only the failing component is repaired at failures. Following [9] we next present a generalization of the virtual age models to the case where there is more than one type of event, and where the virtual age process is multidimensional. The first ingredient of a virtual age model for n components is a vector process A(t) = (A_1(t), . . . , A_n(t)) containing the virtual ages of the n components at time t. The crucial assumption is that A(t) = (A_1(t), . . . , A_n(t)) ∈ F_{t−}, which means that the component ages are functions of the history up to time t. As in the case n = 1 (Section 3), it is assumed that the A_j(t) increase linearly with time between events and may jump only at event times. Let v_j(i) be the virtual age of component j immediately after the ith event. The virtual age process for component j is then defined by A_j(t) = v_j(N(t−)) + t − T_{N(t−)}.
The second ingredient of a virtual age model in the case n = 1 is a specification of the hazard function ν(·). For general n we replace this by functions ν_j(v_1, . . . , v_n) for v_1, . . . , v_n ≥ 0, such that the conditional intensity (1) of type j events, given the history F_{t−}, is γ_j(t) = ν_j(A_1(t), . . . , A_n(t)). Thus ν_j(v_1, . . . , v_n) is the intensity of an event of type j when the component ages are v_1, . . . , v_n, respectively. The conditional intensity thus depends on the history only through the virtual ages of the components. The family {ν_j(v_1, . . . , v_n) : v_1, . . . , v_n ≥ 0} describes the failure mechanisms of the components and the dependence between them in terms of the ages of all the components. 4.1. Specific Models and Their Virtual Age Processes Most of the virtual age processes commonly studied in the case n = 1 can be generalized to the present case of several event types.
Competing Risks
4.1.1. Perfect Repair of Complete System Suppose that all components are repaired to as good as new at each failure of the system. In this case v_j(i) = 0 for all j and i, and hence A_j(t) = t − T_{N(t−)} for all j. It follows that we can only identify the "diagonal" values ν_j(t, . . . , t) of the functions ν_j, which, as noted in Section 4.2 below, are given by the type-specific hazards defined in (2). This is not surprising, since perfect repair essentially corresponds to observing i.i.d. realizations of the non-repairable competing risks situation. 4.1.2. Minimal Repair of Complete System In the given setting a minimal repair means that, following an event, the process is restarted in the same state as immediately before the event. This implies that v_j(i) = T_i for all i, j and hence that A_j(t) = t for all j. Note that the complete set of functions ν_j is again not identifiable. For the case n = 1 it is well known that minimal repair results in a failure process which is a non-homogeneous Poisson process (NHPP). In the present case, where several components are minimally repaired, it follows similarly that the failure processes of the individual components are independent NHPPs, with the intensity for component j being ν_j(t, . . . , t) given by (2). 4.1.3. A Partial Repair Model Bedford and Lindqvist [3] suggested a partial repair model for the case of n components, by defining
v_j(i) = 0 if J_i = j, and v_j(i) = v_j(i − 1) + X_i if J_i ≠ j.

Thus, the age of the failing component is reset to 0 at failures, whereas the ages of the other components are unchanged. The virtual age processes are then simply given by A_j(t) = time since the last event of type j. The authors considered a single realization of the process, with the main result being that, under reasonable conditions pertaining to ergodicity, the functions ν_j(v_1, . . . , v_n) are identifiable. 4.1.4. Age Reduction Models Doyen and Gaudoin [6] considered various types of age reduction models, where the main idea can be described as follows. Let there be given so-called age reduction factors 0 ≤ ρ_{j,j′} ≤ 1 for all j, j′ ∈ J and assume that v_j(i) = (1 − ρ_{j,J_i})(v_j(i − 1) + S_i). This means that if the ith failure is of type J_i = j′, then the virtual age of component j immediately before the ith failure, v_j(i − 1) + S_i, is reduced by the factor 1 − ρ_{j,j′}. Note that the partial repair model is obtained when ρ_{j,j′} = 1 if j = j′ and 0 otherwise. Note also that Kijima-type models are obtained if the ρ_{j,j′} are random (and unobserved).
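A minimal simulation of this partial repair model can be written with Ogata-style thinning; in the sketch below (Python) the bounded intensity functions and their parameter values are illustrative assumptions, not taken from [3]:

```python
import random

def nu(j, ages):
    # Hypothetical bounded intensities for two components: component j's
    # intensity grows with its own age and saturates (illustrative only).
    a = [0.2, 0.3]
    b = [0.8, 0.5]
    return a[j] + b[j] * ages[j] / (1.0 + ages[j])

def simulate_partial_repair(horizon, seed=0):
    # Ogata thinning with the constant bound 1.8 >= nu_0 + nu_1 everywhere.
    rng = random.Random(seed)
    bound = 1.8
    t, last = 0.0, [0.0, 0.0]     # time of the last type-j event
    events = []
    while True:
        t += rng.expovariate(bound)        # candidate point
        if t > horizon:
            return events
        # partial-repair ages: A_j(t) = time since the last type-j event
        ages = [t - last[0], t - last[1]]
        rates = [nu(0, ages), nu(1, ages)]
        total = rates[0] + rates[1]
        if rng.random() < total / bound:   # accept with probability total/bound
            j = 0 if rng.random() < rates[0] / total else 1
            events.append((t, j))
            last[j] = t                    # partial repair: reset only component j
```

The saturation in nu keeps the total intensity bounded, which is what makes constant-bound thinning valid here.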
4.2. The Intensity Functions ν_j In principle the functions ν_j(v_1, . . . , v_n) could be any functions of the component ages. Bedford and Lindqvist [3] motivated these functions by writing, for j = 1, . . . , n,

ν_j(v_1, . . . , v_n) = λ_j(v_j) + λ_j*(v_1, . . . , v_n),   (3)
with the convention that λ_j*(v_1, . . . , v_n) = 0 when all the component ages except the jth are 0, in order to have uniqueness. The λ_j(v_j) is then thought of as the intensity of component j when working alone or together with only new components, while λ_j*(v_1, . . . , v_n) is the additional failure intensity imposed on component j by the other components when they are not all new. Note that any functions of v_1, . . . , v_n can be represented this way, by allowing the λ_j* to be negative as well as positive. Langseth and Lindqvist [8] and Doyen and Gaudoin [6] extended the competing risks situation between failure and PM, as described in Section 2.1, and suggested ways to define suitable functions ν_j. Their main ideas can be described as follows. Starting from a state where the component ages are, respectively, v_1, . . . , v_n, let the time to the next event be governed by the competing risks situation between the random variables W_1*, . . . , W_n* with distribution equal to the conditional distribution of W_1 − v_1, . . . , W_n − v_n given W_1 > v_1, . . . , W_n > v_n, where the W_i are those defined in the non-repairable case of Section 2. It is rather straightforward to show that this implies

ν_j(v_1, . . . , v_n) = −∂_j R(v_1, . . . , v_n) / R(v_1, . . . , v_n),   (4)
where R(v_1, . . . , v_n) = P(W_1 > v_1, . . . , W_n > v_n) is the joint survival function of the W_i, and ∂_j R denotes the partial derivative with respect to the jth entry of R. Note that this corresponds to the usual definition of the hazard rate in the case n = 1. Further, we have ν_j(t, t, . . . , t) = h_j(t), where the latter is the type-specific hazard rate given in (2). A final remark on the suggested construction of the functions ν_j is due. It was demonstrated by a counterexample in Bedford and Lindqvist [3] that, even in the case n = 2, it is not always possible to derive a general set of functions ν_j(v_1, . . . , v_n) from a single joint survival function as in (4).
5. The Likelihood Function for Data from Virtual Age Models Suppose that we have observed a single marked point process of the kind described in Section 2, from time 0 to time τ, with observations (T_1, J_1), (T_2, J_2), . . . , (T_{N(τ)}, J_{N(τ)}). The likelihood function is then given by (see [2], Section II.7)

L = [ ∏_{i=1}^{N(τ)} γ_{J_i}(T_i) ] exp( − ∫_0^τ γ(u) du ),

where γ(u) = Σ_j γ_j(u). For data from a virtual age process as described in the previous section we obtain from this the likelihood function
L = [ ∏_{i=1}^{N(τ)} ν_{J_i}(A_1(T_i), . . . , A_n(T_i)) ] exp( − ∫_0^τ Σ_j ν_j(A_1(u), . . . , A_n(u)) du ).
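For a given family of intensity functions, the logarithm of this likelihood can be evaluated numerically. The sketch below (Python) assumes the partial-repair age convention A_j(t) = time since the last type-j event and a crude midpoint rule for the integral; both choices are assumptions for illustration:

```python
import math

def log_likelihood(events, tau, nus, n, grid=2000):
    # events: list of (time T_i, type J_i); nus[j] maps an age vector to nu_j.
    # Ages follow the partial-repair convention A_j(t) = time since the last
    # type-j event (an assumption; other virtual-age rules would only change
    # how `last` is updated).
    ev = sorted(events)
    last = [0.0] * n
    ll = 0.0
    for t, j in ev:                       # event terms: sum_i log nu_{J_i}(A(T_i))
        ages = [t - last[m] for m in range(n)]
        ll += math.log(nus[j](ages))
        last[j] = t
    last = [0.0] * n                      # integral term, midpoint rule
    k, h = 0, tau / grid
    for i in range(grid):
        u = (i + 0.5) * h
        while k < len(ev) and ev[k][0] <= u:
            last[ev[k][1]] = ev[k][0]
            k += 1
        ages = [u - last[m] for m in range(n)]
        ll -= h * sum(nus[j](ages) for j in range(n))
    return ll
```

With constant intensities the result reduces to the familiar homogeneous Poisson log-likelihood, which provides a simple sanity check.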
In the case when we have observations from several independent processes of the same kind, the total likelihood is the product of expressions for L as given above, one for each process. 5.1. Example: Minimal Repair Model (NHPP) In this case A_1(t) = A_2(t) = · · · = A_n(t) = t, so we need only model ν̃_j(v) = ν_j(v, v, . . . , v). It follows that

L = [ ∏_{i=1}^{N(τ)} ν̃_{J_i}(T_i) ] exp( − Σ_j ∫_0^τ ν̃_j(u) du ).   (5)
Abu-Libdeh, Turnbull and Clark [1] considered the extension of this model obtained by introducing both fixed covariates, given by a vector x, and an unobserved heterogeneity, represented by a factor θξ_j for the jth type of event, where θ is gamma-distributed and the vector (ξ_j) is Dirichlet-distributed and independent of θ. Their model then assumes that, conditional on the heterogeneity factor θξ_j, the intensity of a failure of type j is

ν̃_j(t) = θ ξ_j λ_0(t) e^{x′β}

for some baseline intensity function λ_0(t) and a vector β of parameters. The unconditional likelihood function is obtained by taking the expectation of L in (5) with respect to (θ, (ξ_j)). For identifiability it is then necessary to have data from several independent processes.
6. Conclusion Most of the classical theory of repairable systems has been developed without taking into account the possibility of competing risks at each failure time. On the other hand, modern reliability databases almost always contain information on such competing risks, leading to the need for appropriate methods for their analysis. As noted in the seminal papers by Cooke [4], most such databases tacitly assume that risks are independent. This assumption implies, however, that the rate of occurrence of each of the competing risks would be unaffected by removing the others. For competing risks between failure and PM, for example, this would mean that the rate of occurrence of critical failures would be unaffected by stopping preventive maintenance activity, which is of course completely unreasonable. The appropriate way forward is more careful modeling of the competing risks, using in principle all available information. Fortunately, the recent reliability literature includes several papers where competing risks are studied in connection with repaired and maintained systems. The purpose of the present paper has been to touch on some of the topics from these papers.
References
[1] H. Abu-Libdeh, B.W. Turnbull and L.C. Clark, Analysis of multi-type recurrent events in longitudinal studies; application to a skin cancer prevention trial, Biometrics 46 (1990), 1017–1034.
[2] P.K. Andersen, Ø. Borgan, R. Gill and N. Keiding, Statistical Models Based on Counting Processes, Springer, New York, 1993.
[3] T. Bedford and B.H. Lindqvist, The identifiability problem for repairable systems subject to competing risks, Advances in Applied Probability 36 (2004), 774–790.
[4] R.M. Cooke, The design of reliability databases, Part I and II, Reliability Engineering and System Safety 51 (1996), 137–146 and 209–223.
[5] M.J. Crowder, Classical Competing Risks, Chapman & Hall/CRC, Boca Raton, 2001.
[6] L. Doyen and O. Gaudoin, Imperfect maintenance in a generalized competing risks framework, Journal of Applied Probability 43 (2006), 825–839.
[7] M. Kijima, Some results for repairable systems with general repair, Journal of Applied Probability 26 (1989), 89–102.
[8] H. Langseth and B.H. Lindqvist, A maintenance model for components exposed to several failure mechanisms and imperfect repair, in: B.H. Lindqvist and K.A. Doksum (Eds.), Mathematical and Statistical Methods in Reliability, World Scientific Publishing, Singapore, 2003, pp. 415–430.
[9] B.H. Lindqvist, On the statistical modeling and analysis of repairable systems, Statistical Science 21 (2006), 532–551.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
Conditionally Independent Generalized Competing Risks for Maintenance Analysis
Yann DIJOUX a, Laurent DOYEN b and Olivier GAUDOIN a,1
a Grenoble INP, Laboratoire Jean Kuntzmann, France
b Université Pierre Mendès France Grenoble 2, Laboratoire Jean Kuntzmann, France
Abstract. A complex repairable system is subjected to corrective maintenance (CM) and condition-based preventive maintenance (PM) actions. In order to take into account both the dependency between PM and CM and the possibility of imperfect maintenance, a generalized competing risks model has been introduced in [5]. In this paper, we study the particular case in which the potential times to the next PM and CM are independent conditionally on the past of the maintenance process. We address the identifiability issue and find a result similar to that of [2] for usual competing risks. We propose a realistic model with exponential risks and derive the maximum likelihood estimators of its parameters. Keywords. Reliability, imperfect maintenance, competing risks, point processes
Introduction Complex repairable systems are subjected to two kinds of maintenance actions. Corrective maintenance (CM), also called repair, is carried out after a failure and is intended to put the system into a state in which it can perform its function again. Preventive maintenance (PM) is carried out when the system is operating and is intended to slow down the wear process and reduce the frequency of system failures. Planned PM occurs at predetermined times. Condition-based PM occurs at times determined by the results of inspections and degradation or operation controls. In this study, we focus on condition-based PM. Then CM and PM times are both random, and the sequence of maintenance times is a random point process. In [5], we introduced the Generalized Competing Risks (GCR) models: a modeling framework for the maintenance process which takes into account both the possibility of imperfect maintenance and the dependency between CM and condition-based PM. The aim of this paper is to study the particular case of Conditionally Independent Generalized Competing Risks (CIGCR). 1 Corresponding Author: Olivier Gaudoin, Grenoble INP, Laboratoire Jean Kuntzmann, BP 53, 38041 Grenoble Cedex 9, France; E-mail:
[email protected].
1. Modelling of the Maintenance Process The PM-CM process is the sequence of PM times and CM times. Maintenance durations are assumed to be negligible or are not taken into account. We introduce the following notations.
• {C_k}_{k≥1}: the maintenance times (CM and PM), with C_0 = 0.
• {W_k}_{k≥1}: the times between maintenances, W_k = C_k − C_{k−1}.
• K = {K_t}_{t≥0}: the counting maintenance (CM and PM) process.
• N = {N_t}_{t≥0}: the counting CM process.
• M = {M_t}_{t≥0}: the counting PM process.
• {U_k}_{k≥1}: the indicators of maintenance types: U_k = 0 if the kth maintenance is a CM and U_k = 1 if the kth maintenance is a PM.
In the following, bold characters denote vectors, for instance W_k = (W_1, . . . , W_k). The PM-CM process can be written either as a bivariate counting process {N_t, M_t}_{t≥0} or as a colored counting process {K_t, U_{K_t}}_{t≥0}. The color associated with an event of the global maintenance process specifies whether the maintenance is preventive or corrective.
2. Characterization of the PM-CM Process Let H_t = σ({N_s, M_s}_{0≤s≤t}) = σ({K_s, U_{K_s}}_{0≤s≤t}) be the natural filtration generated by the past of the processes N and M at time t. It is well known [1] that the maintenance process is characterized by three stochastic intensities. The CM intensity is:

λ_t^N = lim_{Δt→0} (1/Δt) P(N_{t+Δt} − N_{t−} = 1 | H_{t−})   (1)

The PM intensity is:

λ_t^M = lim_{Δt→0} (1/Δt) P(M_{t+Δt} − M_{t−} = 1 | H_{t−})   (2)

The (global) maintenance intensity is:

λ_t^K = λ_t^N + λ_t^M = lim_{Δt→0} (1/Δt) P(K_{t+Δt} − K_{t−} = 1 | H_{t−})   (3)
In a parametric approach, the parameters θ of the PM and CM intensities can be estimated by the maximum likelihood method. The likelihood function associated with a single observation of the PM-CM process on [0, t] is:

L_t(θ) = [ ∏_{i=1}^{K_t} (λ_{C_i}^N)^{1−U_i} (λ_{C_i}^M)^{U_i} ] exp( − Σ_{i=1}^{K_{t−}+1} ∫_{C_{i−1}}^{C_i} λ_s^K ds )   (4)
In order to build a model of the maintenance process, it is necessary to express the probability of instantaneous PM and CM given all the past of the maintenance process. A realistic model has to take into account both the efficiency of maintenance (which is not necessarily perfect or minimal) and the possible dependency between both kinds of maintenance, due to the fact that CM and PM are linked to the degradation process.
3. Usual Competing Risks Models A simple way of modeling the PM-CM process is the competing risks (CR) approach, developed in the context of maintenance e.g. in [3]. After the kth maintenance, the latent time to the next failure (or the next CM) is a random variable Z_{k+1}. But the failure can be avoided by a potential PM that can take place at a random time Y_{k+1} after the kth maintenance. Z_{k+1} and Y_{k+1} are not observed. The observations are the time to the next maintenance W_{k+1} = min(Y_{k+1}, Z_{k+1}) and the type of the next maintenance U_{k+1} = 1_{{Y_{k+1} ≤ Z_{k+1}}}. Y_{k+1} and Z_{k+1} are called the risk variables. In the usual competing risks problem, it is assumed that the couples {(Y_k, Z_k)}_{k≥1} are independent and identically distributed (iid), so the {(W_k, U_k)}_{k≥1} are also iid. This means that the effect of every PM and CM is supposed to be perfect. The dependency between the two types of maintenance is expressed by the joint distribution of (Y_1, Z_1), characterized by the joint survival function:

S_1(y, z) = P(Y_1 > y, Z_1 > z)   (5)

A well-known problem of usual competing risks models is that the distribution of (Y_1, Z_1) is not identifiable. In fact, the distribution of the observations {(W_k, U_k)}_{k≥1} depends only on the sub-survival functions [4]:

S*_{Z_1}(z) = P(Z_1 > z, Z_1 < Y_1) = P(W_1 > z, U_1 = 0)   (6)

S*_{Y_1}(y) = P(Y_1 > y, Y_1 ≤ Z_1) = P(W_1 > y, U_1 = 1)   (7)
The assumption that the {(Y_k, Z_k)}_{k≥1} are iid is not realistic, because the effects of CM and PM are not all perfect. Moreover, PM and CM should be dependent because:
• PM and CM are linked to the degradation process.
• The aim of PM is to reduce the frequency of failures, so PM should delay CM.
• CM can have an influence on the future PM policy.
It is therefore interesting to generalize the usual competing risks models in order to take into account any kind of imperfect maintenance effect and any kind of dependency between CM and PM.
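The fact that only the sub-survival functions are estimable from (W, U) data is easy to check empirically. The sketch below (Python, with independent exponential risks and illustrative rates chosen only for this example) compares the empirical P(W > t, U = 0) with its closed form:

```python
import math
import random

random.seed(7)

# Simulate iid usual competing risks with *independent* exponential risks
# (illustrative rates, not from the paper): Z ~ Exp(lam_z) is the latent
# time to CM, Y ~ Exp(lam_y) the latent time to PM.
lam_z, lam_y = 1.0, 0.5
n = 200_000
data = []
for _ in range(n):
    z = random.expovariate(lam_z)
    y = random.expovariate(lam_y)
    data.append((min(y, z), 1 if y <= z else 0))   # observed (W, U)

def sub_surv(t, u):
    # Empirical sub-survival function P(W > t, U = u) -- the only
    # distributional information identifiable from (W, U) data.
    return sum(1 for w, uu in data if w > t and uu == u) / n

# For independent exponentials the closed form is
# P(W > t, U = 0) = lam_z / (lam_z + lam_y) * exp(-(lam_z + lam_y) * t).
t = 0.5
exact = lam_z / (lam_z + lam_y) * math.exp(-(lam_z + lam_y) * t)
```

A dependent pair (Y, Z) with different marginals can produce the same sub-survival functions, which is exactly the identifiability problem described above.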
4. Generalized Competing Risks Models By a generalized competing risks (GCR) model [5], we mean a competing risks model for which the couples {(Y_k, Z_k)}_{k≥1} are not assumed to be iid. The couples {(W_k, U_k)}_{k≥1} are therefore also not iid. Thus, the effect of every PM and CM can be imperfect. The usual competing risks objects are naturally generalized by introducing a conditioning on the past of the PM-CM process. The CM-PM conditional generalized survival function is:

S_{k+1}(y, z; W_k, U_k) = P(Y_{k+1} > y, Z_{k+1} > z | W_k, U_k)   (8)

The generalized sub-survival functions are:

S*_{Z_{k+1}}(z; W_k, U_k) = P(Z_{k+1} > z, Z_{k+1} < Y_{k+1} | W_k, U_k)   (9)
S*_{Y_{k+1}}(y; W_k, U_k) = P(Y_{k+1} > y, Y_{k+1} ≤ Z_{k+1} | W_k, U_k)   (10)
The conditional survival functions of the risk variables are:

S_{Z_{k+1}}(z; W_k, U_k) = P(Z_{k+1} > z | W_k, U_k)   (11)

S_{Y_{k+1}}(y; W_k, U_k) = P(Y_{k+1} > y | W_k, U_k)   (12)
The maintenance intensities can be written in terms of the PM-CM survival functions:

λ_t^N = [ −(∂/∂z) S_{K_{t−}+1}(y − C_{K_{t−}}, z − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}}) ]_{(t,t)} / S_{K_{t−}+1}(t − C_{K_{t−}}, t − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}})   (13)

λ_t^M = [ −(∂/∂y) S_{K_{t−}+1}(y − C_{K_{t−}}, z − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}}) ]_{(t,t)} / S_{K_{t−}+1}(t − C_{K_{t−}}, t − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}})   (14)

λ_t^K = −(d/dt) ln S_{K_{t−}+1}(t − C_{K_{t−}}, t − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}})   (15)

Finally, the likelihood (4) can be rewritten as:

L_t(θ) = S_{K_{t−}+1}(t − C_{K_{t−}}, t − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}}) × ∏_{i=1}^{K_t} [ −(∂/∂y) S_i(y, z; W_{i−1}, U_{i−1}) ]_{(W_i,W_i)}^{U_i} [ −(∂/∂z) S_i(y, z; W_{i−1}, U_{i−1}) ]_{(W_i,W_i)}^{1−U_i}   (16)
It can be seen that the PM-CM intensities and the likelihood depend only on the values of the PM-CM survival functions on the first diagonal. Hence the same identifiability problem arises here as in classical competing risks models.
5. Conditionally Independent Generalized Competing Risks Models The simplest way of building a GCR model is to make a conditional independence assumption. The risk variables {(Y_k, Z_k)}_{k≥1} are said to be conditionally independent if they are independent conditionally on the past of the maintenance process:

∀k ≥ 0, ∀y ≥ 0, ∀z ≥ 0,   S_{k+1}(y, z; W_k, U_k) = S_{Y_{k+1}}(y; W_k, U_k) S_{Z_{k+1}}(z; W_k, U_k)   (17)
The corresponding GCR models are called the conditionally independent generalized competing risks models (CIGCR). Note that PM and CM are dependent through the past of the maintenance process.
The maintenance intensities are:

λ_t^N = λ_{Z_{K_{t−}+1}}(t − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}})   (18)

λ_t^M = λ_{Y_{K_{t−}+1}}(t − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}})   (19)

λ_t^K = λ_{W_{K_{t−}+1}}(t − C_{K_{t−}}; W_{K_{t−}}, U_{K_{t−}})   (20)

where λ_X denotes the hazard rate of the random variable X. The conditional survival functions can be expressed as functions of the maintenance intensities:

S_{Z_{k+1}}(z; W_k, U_k) = exp( − ∫_0^z λ_{C_k+u}^N(k; W_k, U_k) du )   (21)

S_{Y_{k+1}}(y; W_k, U_k) = exp( − ∫_0^y λ_{C_k+u}^M(k; W_k, U_k) du )   (22)
Then, a CIGCR model is identifiable. We now have an identifiability result, equivalent to that of [2]. 1. Two CIGCR models with the same CM and PM intensities have the same generalized joint survival function. 2. For every GCR model, there exists a CIGCR model with the same CM and PM intensities. The first result confirms that, for a CIGCR model, S_{k+1} is identifiable for all k. The second proves that this is not true for all GCR models. Hence, in order to predict the future of the maintenance process, it is possible to use a CIGCR model. But in order to obtain information on the failure process without PM, additional assumptions are needed on the joint distribution of (Y_{k+1}, Z_{k+1}) given (W_k, U_k).
6. Exponential CIGCR Models An exponential CIGCR model is such that the conditional distributions of Y_{k+1} and Z_{k+1} given (W_k, U_k) are exponential, with respective parameters λ^Y(W_k, U_k) and λ^Z(W_k, U_k). Then, the conditional survival functions are:

S_{Z_{k+1}}(z; W_k, U_k) = e^{−λ^Z(W_k, U_k) z}   (23)

S_{Y_{k+1}}(y; W_k, U_k) = e^{−λ^Y(W_k, U_k) y}   (24)

The joint survival function is:

S_{k+1}(y, z; W_k, U_k) = e^{−λ^Y(W_k, U_k) y − λ^Z(W_k, U_k) z}   (25)

and the conditional distribution of W_{k+1} is also exponential:

S_{W_{k+1}}(w; W_k, U_k) = e^{−[λ^Y(W_k, U_k) + λ^Z(W_k, U_k)] w}   (26)
The maintenance intensities and the conditional sub-survival functions can easily be derived:

λ_t^N = λ^Z(W_{K_{t−}}, U_{K_{t−}})   (27)

λ_t^M = λ^Y(W_{K_{t−}}, U_{K_{t−}})   (28)

S*_{Z_{k+1}}(z; W_k, U_k) = [ λ^Z(W_k, U_k) / (λ^Y(W_k, U_k) + λ^Z(W_k, U_k)) ] exp{−[λ^Y(W_k, U_k) + λ^Z(W_k, U_k)] z}   (29)

S*_{Y_{k+1}}(y; W_k, U_k) = [ λ^Y(W_k, U_k) / (λ^Y(W_k, U_k) + λ^Z(W_k, U_k)) ] exp{−[λ^Y(W_k, U_k) + λ^Z(W_k, U_k)] y}   (30)
Finally, the likelihood function associated with the observation of the maintenance process on [0, t] is:

L_t(θ) = [ ∏_{i=1}^{K_t} λ^Z(W_{i−1}, U_{i−1})^{1−U_i} λ^Y(W_{i−1}, U_{i−1})^{U_i} ] exp( − Σ_{i=1}^{K_{t−}+1} [λ^Y(W_{i−1}, U_{i−1}) + λ^Z(W_{i−1}, U_{i−1})] W_i )   (31)

In order to build an exponential CIGCR model, it is necessary to define how λ^Y and λ^Z depend on (W_k, U_k). In other words, we have to model the influence of past CM and PM on the next CM and PM.
7. A Tractable Exponential CIGCR Model The dependency between PM and CM can be expressed in the following way. If there have been many failures (CM) in the past, the system is not reliable enough; to improve it, PM has to be performed sooner than expected. In other words, CM accelerates PM. Conversely, if there have been many PM actions, they should delay the occurrence of failures. In other words, PM delays CM. We build a model which reflects these assumptions. We first assume that Z_1 and Y_1 are independent and exponentially distributed with respective parameters λ_c and λ_p. We consider here that delaying a maintenance amounts to multiplying the corresponding rate by a constant α < 1. Similarly, accelerating a maintenance amounts to multiplying the corresponding rate by a constant β > 1. Then, if the first maintenance is a PM (U_1 = 1), we assume that:
• λ^Y(W_1, 1) = λ_p (the PM frequency is unchanged).
• λ^Z(W_1, 1) = αλ_c (the CM frequency is decreased: CM is delayed).
If the first maintenance is a CM (U_1 = 0), we assume that:
• λ^Y(W_1, 0) = βλ_p (the PM frequency is increased: PM is accelerated).
• λ^Z(W_1, 0) = λ_c (the CM frequency is unchanged).
Both cases lead to:
• λ^Y(W_1, U_1) = λ_p β^{1−U_1}.
• λ^Z(W_1, U_1) = λ_c α^{U_1}.
With the same assumptions on the following maintenances, we obtain:

λ^Y(W_k, U_k) = λ_p β^{Σ_{i=1}^{k} (1−U_i)} = λ_p β^{N_{C_k}}   (32)

λ^Z(W_k, U_k) = λ_c α^{Σ_{i=1}^{k} U_i} = λ_c α^{M_{C_k}}   (33)

where N_{C_k} and M_{C_k} are respectively the numbers of CM and PM occurring up to the kth maintenance. Note that N_{C_k} + M_{C_k} = k. The maintenance intensities of this model are:

λ_t^N = λ_c α^{M_t}   (34)

λ_t^M = λ_p β^{N_t}   (35)

λ_t^K = λ_p β^{N_t} + λ_c α^{M_t}   (36)
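This model is straightforward to simulate, since between maintenances the time to the next event is exponential with rate λ_p β^N + λ_c α^M and the event type is chosen proportionally to the two rates. A sketch (Python; the parameter values are illustrative only):

```python
import random

def simulate_cigcr(lam_c=1.0, lam_p=0.5, alpha=0.7, beta=1.5,
                   horizon=50.0, seed=42):
    # Simulate the tractable exponential CIGCR model: between maintenances,
    # the CM intensity is lam_c * alpha**M and the PM intensity is
    # lam_p * beta**N, where N and M count past CM and PM actions
    # (cf. the intensities (34)-(35)). Parameter values are illustrative.
    rng = random.Random(seed)
    t, ncm, npm = 0.0, 0, 0
    history = []                      # list of (C_k, U_k)
    while True:
        rc = lam_c * alpha**npm       # current CM rate
        rp = lam_p * beta**ncm        # current PM rate
        t += rng.expovariate(rc + rp)
        if t > horizon:
            return history
        if rng.random() < rp / (rc + rp):
            npm += 1
            history.append((t, 1))    # PM
        else:
            ncm += 1
            history.append((t, 0))    # CM
```

Simulated histories of this kind can be used to check estimation procedures before applying them to real maintenance data.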
The model parameters have a simple practical interpretation.
• λ_c characterizes the initial reliability: it is the failure rate of the system if it is not maintained.
• λ_p characterizes the initial preventive maintenance policy: it is the PM rate if the system is replaced by a new one at each failure.
• α characterizes the PM efficiency: the smaller α is, the more PM will manage to delay failures.
• β characterizes the reactivity of the maintenance team: the larger β is, the more PM will be anticipated in case of failure.
The likelihood function associated with the observation of k maintenances between 0 and t is:

L_t(λ_p, λ_c, α, β; W_k, U_k) = [ ∏_{i=1}^{k} (λ_c α^{M_{C_{i−1}}})^{1−U_i} (λ_p β^{N_{C_{i−1}}})^{U_i} ] exp( − Σ_{i=1}^{k+1} [λ_p β^{N_{C_{i−1}}} + λ_c α^{M_{C_{i−1}}}] W_i )   (37)

with W_{k+1} = t − C_k. It is then easy to prove that the maximum likelihood estimators λ̂_p, λ̂_c, α̂ and β̂ are such that:
λ̂_p = M_{C_k} / Σ_{i=1}^{k+1} β̂^{N_{C_{i−1}}} W_i ,   λ̂_c = N_{C_k} / Σ_{i=1}^{k+1} α̂^{M_{C_{i−1}}} W_i   (38)

α̂ and β̂ are solutions of two implicit equations:

[ Σ_{i=1}^{k} (1 − U_i) M_{C_{i−1}} ] Σ_{i=1}^{k+1} α̂^{M_{C_{i−1}}} W_i = N_{C_k} Σ_{i=1}^{k+1} α̂^{M_{C_{i−1}}} M_{C_{i−1}} W_i   (39)

[ Σ_{i=1}^{k} U_i N_{C_{i−1}} ] Σ_{i=1}^{k+1} β̂^{N_{C_{i−1}}} W_i = M_{C_k} Σ_{i=1}^{k+1} β̂^{N_{C_{i−1}}} N_{C_{i−1}} W_i   (40)
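In practice, rather than solving the implicit equations directly, one can profile out λ_p and λ_c via (38) and maximize the resulting profile log-likelihood over (α, β) numerically. The sketch below (Python) uses a coarse grid search whose ranges are arbitrary assumptions chosen for illustration:

```python
import math

def profile_loglik(alpha, beta, W, U):
    # W = (W_1, ..., W_{k+1}) with W_{k+1} = t - C_k the censored last gap;
    # U = (U_1, ..., U_k). N[i] and M[i] are the CM/PM counts among the
    # first i maintenances, i.e. N_{C_i} and M_{C_i}.
    k = len(U)
    N, M = [0], [0]
    for u in U:
        N.append(N[-1] + 1 - u)
        M.append(M[-1] + u)
    Sb = sum(beta ** N[i] * W[i] for i in range(k + 1))
    Sa = sum(alpha ** M[i] * W[i] for i in range(k + 1))
    lam_p = M[k] / Sb                  # profiled estimates, as in (38)
    lam_c = N[k] / Sa
    ll = 0.0
    for i in range(k):                 # log of the product in (37)
        if U[i] == 1:
            ll += math.log(lam_p) + N[i] * math.log(beta)
        else:
            ll += math.log(lam_c) + M[i] * math.log(alpha)
    ll -= lam_p * Sb + lam_c * Sa      # = M_{C_k} + N_{C_k} = k at the profile
    return ll, lam_p, lam_c

def fit(W, U):
    # Coarse grid search over alpha in (0, 1] and beta in [1, 3]; the grids
    # are arbitrary and would be refined (or replaced by a proper optimizer)
    # in a real application.
    best = None
    for a in (x / 50 for x in range(5, 51)):
        for b in (1 + x / 20 for x in range(0, 41)):
            ll, lp, lc = profile_loglik(a, b, W, U)
            if best is None or ll > best[0]:
                best = (ll, lp, lc, a, b)
    return best
```

The grid search is crude but illustrates the structure: each (α, β) candidate yields closed-form λ̂_p, λ̂_c, so the optimization is effectively two-dimensional.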
With these estimates, it is possible to assess the system reliability and the efficiency of both types of maintenance.
8. Discussion Generalized competing risks provide a general framework for modeling the maintenance process, with possibly dependent CM-PM and imperfect maintenance. The conditional independence assumption makes it possible to build simple models with a practical interpretation. The identifiability property shows that a CIGCR model can be adapted to each kind of data set. The properties of the exponential CIGCR model remain to be studied, and the model should be applied to real data. Finally, it is possible to build other models, for instance with different CM-PM dependency assumptions (random signs, delay), or with Weibull distributions instead of exponential ones.
References
[1] P.K. Andersen, O. Borgan, R.D. Gill and N. Keiding, Statistical Models Based on Counting Processes, Springer-Verlag, New York, 1993.
[2] R.M. Cooke, The total time on test statistic and age-dependent censoring, Statistics and Probability Letters 18(3) (1993), 307–312.
[3] R.M. Cooke and T. Bedford, Reliability databases in perspective, IEEE Transactions on Reliability 51 (2002), 294–310.
[4] M.J. Crowder, Classical Competing Risks, Chapman & Hall, London, 2001.
[5] L. Doyen and O. Gaudoin, Imperfect maintenance in a generalized competing risks framework, Journal of Applied Probability 43(3) (2006), 825–839.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
Asymptotic Properties of Bivariate Competing Risks Models
Maxim FINKELSTEIN a,1 and Veronica ESAULOVA b
a University of the Free State, Republic of South Africa, and Max Planck Institute for Demographic Research, Germany
b Pearl Group Ltd, UK
Abstract. A bivariate competing risks model is considered for a general class of survival models. The lifetime distribution of each component is indexed by a frailty parameter. Under the assumption of conditional independence of components the correlated frailty model is considered. The explicit asymptotic formula for the mixture failure rate of a system is derived. It is proved that asymptotically, as t → ∞, the remaining lifetimes of components tend to be independent in the defined sense. Keywords. Frailty, Mixture failure rate, Competing risks, Bivariate distribution
Introduction Mixtures of distributions are a convenient tool for analyzing univariate frailty models. This topic has been thoroughly investigated in the literature ([2], [7], [4], [1], [3], [8], to name a few). In [5] a general class of univariate lifetime models with frailties was considered. A basic model for F(t, z), an absolutely continuous cumulative distribution function (cdf) of a lifetime random variable T, was defined as

Λ(t, z) = A(zφ(t)) + ψ(t),
(1)
where Λ(t, z) = ∫_0^t λ(u, z) du is the corresponding cumulative failure rate and z is a realization of the frailty Z. The general assumptions on the functions involved are rather natural: A(s), φ(t) and ψ(t) are differentiable, the right-hand side of (1) is non-decreasing in t and increases to infinity as t → ∞, and A(zφ(0)) + ψ(0) = 0. The proportional hazards (PH), additive hazards (AH) and accelerated life (ALM) models, popular in reliability, survival analysis and risk analysis, are special cases of (1):

PH (multiplicative) Model: Let A(u) ≡ u, φ(t) = Λ(t), ψ(t) = 0. Then

λ(t, z) = zλ(t),   Λ(t, z) = zΛ(t).   (2)
1 Department of Mathematical Statistics, University of the Free State PO Box 339, 9300 Bloemfontein, Republic of South Africa; E-mail:
[email protected]
Accelerated Life Model: Let A(u) ≡ Λ(u), φ(t) = t, ψ(t) = 0. Then

Λ(t, z) = ∫_0^{tz} λ(u) du = Λ(tz),   λ(t, z) = zλ(tz).   (3)

AH Model: Let A(u) ≡ u, φ(t) = t, ψ(t) increasing, ψ(0) = 0. Then

λ(t, z) = z + ψ′(t),   Λ(t, z) = zt + ψ(t).   (4)
In the current study we use and develop the asymptotic methodology employed for the univariate case in order to analyze the behavior of failure rates in the competing risks setting with a bivariate frailty.
1. Bivariate Frailty and Competing Risks Assume that the risks are dependent only via the bivariate frailty (Z_1, Z_2). To construct the corresponding competing risks model, consider first a system of two statistically independent components in series with lifetimes T_1 ≥ 0 and T_2 ≥ 0. The cdf of this system is F_s(t) = 1 − F̄_1(t)F̄_2(t), where F_1(t) and F_2(t) are the cdfs of the lifetime random variables T_1 and T_2 respectively (F̄_i(t) ≡ 1 − F_i(t)). Assume now that the F_i(t), i = 1, 2, are indexed by random variables Z_i in the following conventional sense:

P(T_i ≤ t | Z_i = z) ≡ P(T_i ≤ t | z) = F_i(t, z),   i = 1, 2,

and that the pdfs f_i(t, z) exist. Then the corresponding failure rates λ_i(t, z) are f_i(t, z)/F̄_i(t, z). Let Z_i, i = 1, 2, be non-negative random variables with supports in [a_i, b_i], a_i ≥ 0, b_i ≤ ∞, and pdfs π_i(z). The mixture cdf for the ith component is defined by
Fm,i(t) = ∫_{ai}^{bi} Fi(t, z)πi(z)dz, i = 1, 2. (5)

The corresponding mixture failure rate is

λm,i(t) = ∫_{ai}^{bi} fi(t, z)πi(z)dz / ∫_{ai}^{bi} F̄i(t, z)πi(z)dz = ∫_{ai}^{bi} λi(t, z)πi(z | t)dz, (6)

where the conditional pdf (on the condition that Ti > t) is

πi(z | t) = πi(z) F̄i(t, z) / ∫_{ai}^{bi} F̄i(t, z)πi(z)dz. (7)
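A hedged numerical illustration of (6)–(7), using an assumed example that is not from the paper: for exponential conditional lifetimes with multiplicative frailty, λ(t, z) = z, and a gamma mixing density, the mixture failure rate has the well-known closed form a/(b + t), which the quadrature below reproduces.

```python
import math

# Mixture failure rate (6) for F-bar(t, z) = exp(-z*t) mixed over an
# assumed gamma frailty Z ~ Gamma(shape=a, rate=b).  Closed form:
# lam_m(t) = a / (b + t), decreasing in t -- the classical mixture effect.

a, b = 2.0, 1.0

def gamma_pdf(z):
    return b ** a * z ** (a - 1) * math.exp(-b * z) / math.gamma(a)

def mixture_rate(t, zmax=40.0, n=100000):
    # midpoint-rule evaluation of the ratio in (6)
    dz = zmax / n
    num = den = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        w = gamma_pdf(z) * math.exp(-z * t)   # F-bar(t, z) * pi(z)
        num += z * w * dz                      # f(t, z) = z * exp(-z*t)
        den += w * dz
    return num / den

for t in (0.5, 1.0, 3.0):
    assert abs(mixture_rate(t) - a / (b + t)) < 1e-3
```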
Mixture Failure rate modeling
Assume that the components of our system are conditionally independent given Z1 = z1 , Z2 = z2 . Then the cdf of the system is: Fs (t, z1 , z2 ) = 1 − F¯1 (t, z1 )F¯2 (t, z2 )
(8)
and the corresponding probability density function is fs (t, z1 , z2 ) = f1 (t, z1 )F¯2 (t, z2 ) + f2 (t, z2 )F¯1 (t, z1 ).
(9)
The mixture failure rate of the system in this case is defined as

λm,s(t) = ∫_{a2}^{b2} ∫_{a1}^{b1} fs(t, z1, z2)π(z1, z2)dz1dz2 / ∫_{a2}^{b2} ∫_{a1}^{b1} F̄s(t, z1, z2)π(z1, z2)dz1dz2
= ∫_{a2}^{b2} ∫_{a1}^{b1} λs(t, z1, z2)π(z1, z2 | t)dz1dz2, (10)

where

π(z1, z2 | t) = π(z1, z2) F̄s(t, z1, z2) / ∫_{a2}^{b2} ∫_{a1}^{b1} F̄s(t, z1, z2)π(z1, z2)dz1dz2, (11)

and π(z1, z2) is the bivariate joint probability density function of Z1 and Z2. It is clear that for our series system, defined by (8),

λs(t, z1, z2) = λ1(t, z1) + λ2(t, z2). (12)
If Z1 and Z2 are independent, i.e. π(z1, z2) = π1(z1)π2(z2) for some densities π1(z1) and π2(z2), then π(z1, z2 | t) = π1(z1 | t)π2(z2 | t), which follows easily from definitions (7) and (11). Using equations (10) and (12):

λm,s(t) = ∫_{a2}^{b2} ∫_{a1}^{b1} λs(t, z1, z2)π(z1, z2 | t)dz1dz2
= ∫_{a2}^{b2} ∫_{a1}^{b1} [λ1(t, z1) + λ2(t, z2)] π1(z1 | t)π2(z2 | t)dz1dz2 (13)
= λm,1(t) + λm,2(t).

Hence, when the components of the system are conditionally independent and Z1 and Z2 are also independent, the mixture failure rate of the system is the sum of the mixture failure rates of the individual components.
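The additivity in (13) can be checked numerically. The gamma frailties and exponential conditional lifetimes below are assumed for illustration only; they are not taken from the paper.

```python
import math

# Numerical check of (13): with conditionally independent components and
# *independent* frailties Z1, Z2, the system mixture failure rate equals
# the sum of the component mixture rates.  Assumed example:
# F-bar_i(t, z) = exp(-z*t);  Z_i ~ Gamma(shape A_i, rate 1), independent.

A1, A2 = 2.0, 3.0

def gpdf(z, shape):                  # Gamma(shape, rate 1) density
    return z ** (shape - 1) * math.exp(-z) / math.gamma(shape)

def comp_rate(t, shape):             # closed-form component mixture rate
    return shape / (1.0 + t)

def system_rate(t, zmax=20.0, n=400):
    # 2-D midpoint rule for the ratio in (10) with lambda_s = z1 + z2 (12)
    dz = zmax / n
    num = den = 0.0
    for i in range(n):
        z1 = (i + 0.5) * dz
        for j in range(n):
            z2 = (j + 0.5) * dz
            w = gpdf(z1, A1) * gpdf(z2, A2) * math.exp(-(z1 + z2) * t)
            num += (z1 + z2) * w * dz * dz
            den += w * dz * dz
    return num / den

t = 1.5
assert abs(system_rate(t) - (comp_rate(t, A1) + comp_rate(t, A2))) < 1e-2
```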
2. The Main Asymptotic Result

Assume that the lifetimes of both components belong to the class defined by relation (1). For simplicity, let the unimportant additive term ψ(t) be equal to zero. The corresponding survival functions for the components are

F̄i(t, zi) = e^{−Ai(zi φi(t))}, i = 1, 2. (14)
Theorem 1. Let the survival functions in the competing risks model (8) be defined by equation (14). Suppose that the mixing variables Z1 and Z2 have a joint probability density function π(z1, z2) defined in [0, b1] × [0, b2], 0 < b1, b2 ≤ ∞. Let the following properties hold:

(a) π(z1, z2) = z1^{α1} z2^{α2} π0(z1, z2), where α1, α2 > −1;

(b) π0(z1, z2) is continuous at (0, 0) and π0(0, 0) ≠ 0;

(c) Ai(s), i = 1, 2, are positive, ultimately increasing, differentiable functions with ∫_0^∞ e^{−Ai(s)} s^{αi} ds < ∞.

Assume finally that φ1(t), φ2(t) → ∞ as t → ∞. Then

λm,s(t) ∼ (α1 + 1) φ1′(t)/φ1(t) + (α2 + 1) φ2′(t)/φ2(t). (15)
By the sign ∼ we denote, as usual, asymptotic equivalence: g1(t) ∼ g2(t) as t → ∞ means that g1(t)/g2(t) → 1 as t → ∞. It follows from the additive form of the right-hand side of (15) and the corresponding result for the univariate case [5] that the asymptotic mixture failure rate in our model can be viewed as the sum of the univariate mixture failure rates of the components, each with its own independent frailty. Therefore, the theorem means that the asymptotic mixture failure rate in the correlated frailty model with conditionally independent components is equivalent to the asymptotic mixture failure rate in the independent frailty model. It can also be interpreted as a form of asymptotic independence of the remaining lifetimes of the components in the correlated frailty model.

Proof
We start the proof with the following supplementary lemma.

Lemma 1. Let g(z1, z2) be a nonnegative integrable function on [0, ∞)², and let h(z1, z2) be a nonnegative locally integrable function on [0, ∞)² that is bounded everywhere and continuous at the origin. Then, as t1 → ∞, t2 → ∞:

t1 t2 ∫_0^∞ ∫_0^∞ g(t1 z1, t2 z2)h(z1, z2)dz1dz2 → h(0, 0) ∫_0^∞ ∫_0^∞ g(z1, z2)dz1dz2.

The proof of the lemma is rather straightforward and follows from the change of variables

t1 t2 ∫_0^∞ ∫_0^∞ g(t1 z1, t2 z2)h(z1, z2)dz1dz2 = ∫_0^∞ ∫_0^∞ g(z1, z2) h(z1/t1, z2/t2)dz1dz2

and the standard technique of the dominated convergence theorem (since h(z1/t1, z2/t2) → h(0, 0) as t1 → ∞, t2 → ∞).
Now we proceed with the proof of the theorem. Substituting (8) and (9) into (10):

λm,s(t) = ∫_0^{b1} ∫_0^{b2} f1(t, z1)F̄2(t, z2)π(z1, z2)dz2dz1 / ∫_0^{b1} ∫_0^{b2} F̄1(t, z1)F̄2(t, z2)π(z1, z2)dz2dz1
+ ∫_0^{b2} ∫_0^{b1} f2(t, z2)F̄1(t, z1)π(z1, z2)dz1dz2 / ∫_0^{b1} ∫_0^{b2} F̄1(t, z1)F̄2(t, z2)π(z1, z2)dz2dz1
≡ λ¹m,s(t) + λ²m,s(t). (16)

Consider λ¹m,s(t) and λ²m,s(t) separately. The probability density function of T1 is

f1(t, z1) = A1′(z1 φ1(t)) z1 φ1′(t) e^{−A1(z1 φ1(t))} (17)

and

λ¹m,s(t) = ∫_0^{b1} ∫_0^{b2} A1′(z1 φ1(t)) z1 φ1′(t) e^{−A1(z1 φ1(t)) − A2(z2 φ2(t))} π(z1, z2)dz2dz1 / ∫_0^{b1} ∫_0^{b2} e^{−A1(z1 φ1(t)) − A2(z2 φ2(t))} π(z1, z2)dz2dz1.

Applying the Lemma to the numerator, we see that it is asymptotically equivalent to

[φ1′(t) π0(0, 0) / (φ1(t)^{α1+2} φ2(t)^{α2+1})] ∫_0^∞ A1′(u) u^{α1+1} e^{−A1(u)}du ∫_0^∞ s^{α2} e^{−A2(s)}ds,

and the denominator is equivalent to

[π0(0, 0) / (φ1(t)^{α1+1} φ2(t)^{α2+1})] ∫_0^∞ u^{α1} e^{−A1(u)}du ∫_0^∞ s^{α2} e^{−A2(s)}ds.

Hence

λ¹m,s(t) ∼ [φ1′(t)/φ1(t)] · ∫_0^∞ A1′(u) u^{α1+1} e^{−A1(u)}du / ∫_0^∞ u^{α1} e^{−A1(u)}du. (18)

Due to condition (c) of the Theorem, it can easily be shown that

e^{−A(s)} s^{α+1} → 0 as s → ∞, (19)

and integration by parts with the help of (19) gives ∫_0^∞ A1′(u) u^{α1+1} e^{−A1(u)}du = (α1 + 1) ∫_0^∞ u^{α1} e^{−A1(u)}du. Thus, from (18), λ¹m,s(t) ∼ (α1 + 1)φ1′(t)/φ1(t). Similarly,

λ²m,s(t) ∼ (α2 + 1)φ2′(t)/φ2(t).
3. Discussion

Assumptions (a) and (b) of the Theorem impose certain restrictions on the mixing distribution. The corresponding conditions in the univariate case are satisfied for a wide class of distributions (the admissible class), such as the gamma and the Weibull [5]. In the bivariate case they obviously hold, at least, for all densities that are positive and continuous at the origin.

It is worth interpreting our results in terms of copulas, which can be helpful in analyzing competing risks problems. The following result, which gives simple sufficient conditions, is obvious and its proof is therefore omitted.

Corollary 1. Assume that the bivariate mixing cdf is given by the copula C(u, v): Π(z1, z2) = C(Π1(z1), Π2(z2)), where Π1(z1), Π2(z2) are univariate cdfs whose densities satisfy the univariate conditions [5]

πi(z) = z^{αi} πi,0(z), αi > −1,

where πi,0(z), i = 1, 2, are bounded in [0, ∞), continuous and positive at z = 0 (the admissible class). Then the bivariate conditions (a) and (b) of the Theorem are satisfied if the copula density c(u, v) = ∂²C/∂u∂v can be represented as

c(u, v) = u^{γ1} v^{γ2} c0(u, v), (20)

where c0(u, v) is continuous and positive at (0, 0) and γ1, γ2 ≥ 0.

Example (Farlie–Gumbel–Morgenstern copula). The mixing distribution is defined via the copula

C(u, v) = uv(1 + θ(1 − u)(1 − v)),

where |θ| ≤ 1, u, v ∈ [0, 1]. Since ∂²C/∂u∂v = 1 + θ(1 − 2u)(1 − 2v) is continuous at the origin and positive there if θ > −1, the bivariate conditions hold when −1 < θ ≤ 1. Therefore, the results of the Theorem hold if the univariate cdfs belong to the admissible class.
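The FGM density claim above can be verified numerically; the finite-difference check below is an illustrative sketch, not part of the paper.

```python
# Check that the FGM copula C(u,v) = u*v*(1 + theta*(1-u)*(1-v)) has mixed
# partial d^2 C / du dv = 1 + theta*(1-2u)*(1-2v), which is positive at the
# origin whenever theta > -1, as used in the example above.

def C(u, v, theta):
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def density(u, v, theta, h=1e-4):
    # mixed second partial derivative by central differences
    return (C(u + h, v + h, theta) - C(u + h, v - h, theta)
            - C(u - h, v + h, theta) + C(u - h, v - h, theta)) / (4 * h * h)

theta = 0.8
for (u, v) in [(0.2, 0.3), (0.5, 0.5), (0.7, 0.1)]:
    exact = 1.0 + theta * (1 - 2 * u) * (1 - 2 * v)
    assert abs(density(u, v, theta) - exact) < 1e-5

# positivity at the origin: c(0,0) = 1 + theta > 0 for theta in (-1, 1]
for theta in (-0.9, 0.0, 1.0):
    assert 1.0 + theta > 0
```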
Other mixing distributions that meet the conditions of the Theorem are the Dirichlet distribution [6, p. 485], the inverted Dirichlet distribution [6, p. 491], some types of multivariate logistic distributions [6, p. 551], and some types of bivariate extreme value distributions [6, p. 625]. There are also examples where the conditions of the Theorem do not hold. This happens, e.g., when the joint cdf depends on max(z1, z2) and is not absolutely continuous. The widely used Marshall–Olkin bivariate exponential distribution with the survival function

Π̄(z1, z2) = e^{−γ1 z1 − γ2 z2 − γ12 max(z1, z2)}

is a relevant example. Some multivariate Weibull distributions also employ max functions and are not absolutely continuous at (0, 0). The corresponding examples can also be found in [6, p. 431].
References

[1] F.G. Badia, M.D. Berrade, C.A. Campos and M.A. Navascues, On the behavior of aging characteristics in mixed populations, Probability in the Engineering and Informational Sciences, 15 (2001), 83–94.
[2] H.W. Block, J. Mi and T.H. Savits, Burn-in and mixed populations, Journal of Applied Probability, 30 (1993), 692–702.
[3] H.W. Block, T.H. Savits and E.T. Wondmagegnehu, Mixtures of distributions with increasing linear failure rates, Journal of Applied Probability, 40 (2003), 485–504.
[4] M.S. Finkelstein and V. Esaulova, Modeling a failure rate for the mixture of distribution functions, Probability in the Engineering and Informational Sciences, 15 (2001), 383–400.
[5] M.S. Finkelstein and V. Esaulova, Asymptotic behavior of a general class of mixture failure rates, Advances in Applied Probability, 38 (2006), 244–262.
[6] S. Kotz, N. Balakrishnan and N.L. Johnson, Continuous Multivariate Distributions, Models and Applications, Vol. 1, Wiley, New York, 2000.
[7] J.D. Lynch, On conditions for mixtures of increasing failure rate distributions to have an increasing failure rate, Probability in the Engineering and Informational Sciences, 13 (1999), 33–36.
[8] M. Shaked and F. Spizzichino, Mixtures and monotonicity of failure rate functions, in: Advances in Reliability (N. Balakrishnan and C.R. Rao, eds.), Vol. 20, Elsevier, Amsterdam, 185–198, 2001.
On the Reversed Hazard Rate and Mean Inactivity Time of Mixtures

F.G. BADÍA 1, M.D. BERRADE
Departamento de Métodos Estadísticos, Centro Politécnico Superior, Universidad de Zaragoza, 50018 Zaragoza, Spain

Abstract. The reversed hazard rate, defined as the ratio of the density to the distribution function, is of increasing importance in reliability analysis. Its connection with the mean inactivity time also stands out. Owing to the growing use of both functions, we aim at giving some insight into their properties in mixtures of distributions.

Keywords. Reversed hazard rate, mean inactivity time, mixture
Introduction

Researchers have traditionally focused on both the hazard rate and the mean residual life as the usual reliability measures. As Finkelstein [6] points out, the reversed hazard rate (RHR) and the mean inactivity time (MIT) emerge as new and interesting approaches. He also highlights that the RHR and the MIT can be viewed as 'dual functions' of the hazard rate and the mean residual life, respectively. Consider a lifetime random variable X ≥ 0 with distribution function F(x) and reliability function R(x). The interpretation of the hazard rate as the conditional probability of failure in (x, x + dx] given that no failure has occurred in [0, x] is one of the reasons for its wide use. The reversed hazard rate does not have such a straightforward interpretation: it defines the conditional probability of failure of an object in (x − dx, x] given that the failure had occurred in [0, x]. Block et al. [2] indicate the usefulness of the RHR in the analysis of data with left-censored observations or in discussing lifetimes with a reversed time scale. The reversed hazard rate turns out to be applicable in medical studies (Kalbfleisch and Lawless [11], Gross and Huber-Carol [9]). Thus, the product of the reversed hazard rate and dx is the approximate probability that an individual was infected with a virus in the interval (x − dx, x], provided that he or she was infected in [0, x]. Epidemiological research is concerned with both the instant of infection and the time elapsed since that moment until the time of observation, that is, the MIT. Block et al. [2] present some properties of the RHR function along with its suitability for studying parallel systems and reversed hazard rate ordering in k-out-of-n systems. Finkelstein [6] considers the application of the RHR and the MIT to the ordering of random variables under the proportional reversed hazard rate model.

1 Corresponding author. E-mail: [email protected]
Chandra and Roy [4] point out the growing importance of the RHR and analyze relationships concerning its monotonic behavior. Moreover, they consider implications between the RHR and the MIT, presenting characterizing properties. Ross et al. [14] provide some results related to reversed hazard rate ordering in renewal processes and Markov chains, and Di Crescenzo [5] shows some results on the proportional reversed hazards model concerning aging characteristics and stochastic orders. This article focuses on the study of the RHR and the MIT in mixtures of distributions, providing some mixture-preservation properties. We study the monotonic behavior of such mixtures by means of the conditional expectations of the RHR and the MIT corresponding to the distributions in the mixture (Badía et al. [1]).
1. The RHR and MIT of mixtures

This section is devoted to the role of the reversed hazard rate and the mean inactivity time in arbitrary mixtures. Throughout this paper we do not restrict attention to positive random variables and consider general intervals of support. Let X be a random variable with probability density function f(x) and distribution function F(x). Its reversed hazard rate (RHR) is defined as

q(x) = f(x)/F(x).

The mean inactivity time (MIT) is given by

m(x) = E[x − X | X ≤ x] = ∫_{−∞}^x F(u)du / F(x).

There is hardly any system in modern technology operating under homogeneous conditions, and mixtures of distributions constitute the usual tool for modeling heterogeneity (Proschan [13], Finkelstein and Esaulova [8]). The effect of different environments is described by means of a random variable Z such that the density and distribution functions, as well as the RHR and MIT functions, depend on Z. In what follows, f(x, z) and F(x, z) represent, respectively, the density and distribution functions when conditions are given by Z = z. In addition, the conditional reversed hazard rate given Z = z is q(x, z), whereas m(x, z) denotes the conditional MIT. From now on X will denote the time to failure of the mixture, with q̂(x) and m̂(x) being the corresponding RHR and MIT. Researchers have addressed their attention to the aging properties (decreasing failure rate, increasing mean residual life) which are preserved under mixtures (Proschan [13], Finkelstein [7], Finkelstein and Esaulova [8], Gupta and Gupta [10], Block et al. [3]). Following this approach, we deal with the aging properties concerning both the RHR and the MIT. The next result deals with the preservation of logconvex functions under mixtures.
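As an illustrative sketch (an assumed Exp(1) example, not from the paper), the two definitions can be computed directly, together with the identity m′(x) = 1 − m(x)q(x) that is used later in the proofs.

```python
import math

# RHR q(x) = f(x)/F(x) and MIT m(x) = int F(u)du / F(x) for an assumed
# Exp(1) lifetime (support starting at 0), plus a numerical check of the
# relation m'(x) = 1 - m(x)*q(x).

def F(x):   # Exp(1) distribution function
    return 1.0 - math.exp(-x)

def f(x):
    return math.exp(-x)

def q(x):
    return f(x) / F(x)

def m(x, n=20000):
    # int_0^x F(u) du / F(x), midpoint rule
    dx = x / n
    s = sum(F((i + 0.5) * dx) for i in range(n)) * dx
    return s / F(x)

x, h = 1.7, 1e-4
m_prime = (m(x + h) - m(x - h)) / (2 * h)
assert abs(m_prime - (1.0 - m(x) * q(x))) < 1e-3
```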
The RHR and the MIT of mixtures - F.G. Badía & M.D. Berrade

Proposition 1. A mixture of logconvex functions is also logconvex.

Proof: Let g(x, z) be a logconvex function in x and E[g(x, Z)] the mixture function, with Z being the mixing random variable. Then

g(αx + (1 − α)y, Z) ≤ g(x, Z)^α g(y, Z)^{1−α}

for x and y in the corresponding domain and 0 ≤ α ≤ 1. Taking expectations on both sides of the foregoing expression and applying the Hölder inequality leads to

E[g(αx + (1 − α)y, Z)] ≤ E[g(x, Z)]^α E[g(y, Z)]^{1−α},

and thus the result follows.

The reliability classes defined below will be considered.

Definition 1. Any continuous random variable X with density function f(x) is said to be logconvex or decreasing likelihood ratio (DLR) if f(x) is a logconvex function.

Definition 2. Let X be a random variable with F(x) and q(x) being its corresponding distribution function and reversed hazard rate. We say that X is increasing reversed hazard rate (IRHR) if q(x) increases with x, or equivalently if F(x) is a logconvex function.

Definition 3. Let X be a random variable with R(x) and r(x) being, respectively, its reliability function and hazard rate. We say that X is decreasing failure rate (DFR) if r(x) is non-increasing, or equivalently if R(x) is a logconvex function.

As an immediate consequence of Proposition 1, the following result holds.

Corollary 1. The DFR, DLR and IRHR classes are preserved under mixtures.

Proof: The density function corresponding to a mixture of DLR random variables is a mixture of logconvex density functions. The reliability function corresponding to a mixture of DFR random variables is a mixture of logconvex reliability functions. The distribution function corresponding to a mixture of IRHR random variables is a mixture of logconvex distribution functions.

The next theorem provides bounds for the derivatives of both the RHR and the MIT of the mixture. They are given in terms of expectations of the aging characteristics and their derivatives corresponding to the distributions in the mixture.
Theorem 1. Under appropriate regularity conditions the following inequalities hold:

a) dq̂(x)/dx ≥ E[(dq(x, Z)/dx) F(x, Z)] / E[F(x, Z)]

b) dm̂(x)/dx ≤ (E[f(x, Z)] / E²[F(x, Z)]) E[(F²(x, Z)/f(x, Z)) dm(x, Z)/dx]
Proof: a) The RHR corresponding to a mixture of distributions is given by

q̂(x) = E[f(x, Z)] / E[F(x, Z)] = E[q(x, Z)F(x, Z)] / E[F(x, Z)].

The derivative of the foregoing expression is

dq̂(x)/dx = E[(dq(x, Z)/dx) F(x, Z)] / E[F(x, Z)] + (E[q(x, Z)f(x, Z)]E[F(x, Z)] − E²[f(x, Z)]) / E²[F(x, Z)].

In addition, applying the Cauchy–Schwarz inequality we obtain

E²[f(x, Z)] = E²[(f(x, Z)q(x, Z))^{1/2} F(x, Z)^{1/2}] ≤ E[f(x, Z)q(x, Z)] E[F(x, Z)],

and, hence, the result in a) holds.

b) The MIT of a mixture of distributions can be expressed as

m̂(x) = E[m(x, Z)F(x, Z)] / E[F(x, Z)].

In addition,

d(m(x, Z)F(x, Z))/dx = d/dx ∫_{−∞}^x F(u, Z)du = F(x, Z).

Therefore

dm̂(x)/dx = (E²[F(x, Z)] − E[f(x, Z)]E[m(x, Z)F(x, Z)]) / E²[F(x, Z)].

Given the relationship between the RHR and the MIT,

dm(x, Z)/dx = 1 − m(x, Z)q(x, Z) = 1 − m(x, Z) f(x, Z)/F(x, Z),

the following expression for the MIT is obtained:

m(x, Z) = (1 − dm(x, Z)/dx) F(x, Z)/f(x, Z).

Hence

dm̂(x)/dx = 1 − (E[f(x, Z)]/E²[F(x, Z)]) E[F²(x, Z)/f(x, Z)] + (E[f(x, Z)]/E²[F(x, Z)]) E[(F²(x, Z)/f(x, Z)) dm(x, Z)/dx].

The Cauchy–Schwarz inequality ensures that

E²[F(x, Z)] ≤ E[f(x, Z)] E[F²(x, Z)/f(x, Z)],

and the result in b) is derived.
Figure 1. q(x) and q̂(x) in the proportional RHR model.
It is well known that the interval of support of any IRHR random variable must be of the type (−∞, b] (Block et al. [2]). If we do not restrict attention to the class of lifetime distributions and consider mixtures of distributions with support (−∞, b], the following result is also derived from part a) of Theorem 1.

Corollary 2. The IRHR class is preserved under arbitrary mixtures of IRHR distributions, provided that they can exist when supports (−∞, b] are considered.

Regarding the mean inactivity time, it is not difficult to prove that if q(x) is increasing then m(x) is decreasing. Therefore we must consider intervals of support of the form (−∞, b] to find distributions showing decreasing mean inactivity times (DMIT). If so, the next result is also a consequence of part b) of Theorem 1.

Corollary 3. The DMIT class is preserved under arbitrary mixtures of DMIT distributions, provided that they exist whenever random variables with supports of the type (−∞, b] are considered.

The following example illustrates that the DRHR class is not preserved under mixtures. It is based on the so-called proportional reversed hazard rate model (Finkelstein [6]), defined as

q(x|z) = zq(x)

(1)

where q(x) is a baseline reversed hazard rate and z represents the effect of covariates on it. Here q(x|z) is the RHR of a random variable Xz whose distribution function is F(x)^z, z > 0, with F(x) being the distribution function corresponding to q(x).
Figure 2. The mean inactivity times compounding the mixture.

Example 1. Let Z be a gamma random variable with density

f(z) = e^{−az} a^p z^{p−1} / Γ(p),

and consider the baseline RHR

q(x) = αγ/x^{γ+1} for 0 < x ≤ 1; q(x) = αγ for 1 < x < 2; q(x) = αγ 2^{γ+1}/x^{γ+1} for x ≥ 2,

with α and γ being positive parameters.

Figure 1 illustrates the resulting non-preservation of the DRHR class. It shows the baseline q(x) and the RHR of the mixture, q̂(x), for p = 2, a = 1, α = 2, γ = 1. The former, represented by a dashed line, is a decreasing function, whereas the latter, in solid line, is non-monotonic.

Next, we provide another example to prove that the increasing mean inactivity time (IMIT) class is not preserved under mixtures either.

Example 2. Consider the following piecewise expression for the mean inactivity time:

m(x, c) = x/(c + 1) for 0 < x ≤ 1; m(x, c) = 1/c − e^{−c(x−1)}/(c(c + 1)) for 1 ≤ x ≤ 2; m(x, c) = 1/c − e^{−c}/(c(c + 1)) + (x − 2) for x > 2,
Figure 3. m̂(x): the mean inactivity time of the mixture.

with c being a positive parameter, and the discrete mixture

m̂(x) = [p ∫_0^x F(u, c1)du + (1 − p) ∫_0^x F(u, c2)du] / [pF(x, c1) + (1 − p)F(x, c2)],

where F(x, c1) and F(x, c2) denote the distribution functions corresponding to m(x, c1) and m(x, c2), respectively. Figure 2 contains m(x, c1) and m(x, c2) for c1 = 1 and c2 = 5; both are increasing. The mean inactivity time of the mixture, m̂(x), for c1 = 1, c2 = 5 and p = 0.1, is depicted in Figure 3. The mixture is non-increasing in the interval [1, 2].
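Example 1 can be reproduced numerically. A short calculation (not in the paper) gives the mixture distribution under gamma mixing in closed form: for F(x|z) = F(x)^z with Z ~ Gamma(shape p, rate a), E[F(x)^Z] = (a/(a + Q(x)))^p with Q(x) = −log F(x) = ∫_x^∞ q(u)du, hence q̂(x) = p·q(x)/(a + Q(x)). The sketch below uses the parameter values from the text.

```python
# Numerical sketch of Example 1 (p = 2, a = 1, alpha = 2, gamma = 1):
# the baseline RHR is non-increasing (DRHR), yet the mixture RHR
# q_hat(x) = p*q(x)/(a + Q(x)) rises on (1, 2) and falls beyond 2.

p, a, alpha, gam = 2.0, 1.0, 2.0, 1.0

def q(x):                # baseline RHR from Example 1
    if x <= 1.0:
        return alpha * gam / x ** (gam + 1)
    if x < 2.0:
        return alpha * gam
    return alpha * gam * 2 ** (gam + 1) / x ** (gam + 1)

def Q(x):                # Q(x) = integral of q over (x, infinity)
    if x >= 2.0:
        return alpha * 2 ** (gam + 1) / x ** gam
    if x >= 1.0:
        return alpha * gam * (2.0 - x) + alpha * 2 ** (gam + 1) / 2 ** gam
    return alpha * (x ** -gam - 1.0) + alpha * gam + alpha * 2 ** (gam + 1) / 2 ** gam

def q_hat(x):            # mixture RHR under Gamma(p, a) mixing
    return p * q(x) / (a + Q(x))

xs = [0.5, 1.2, 1.8, 3.0, 5.0]
assert all(q(xs[i]) >= q(xs[i + 1]) for i in range(len(xs) - 1))
# non-monotone mixture RHR, matching the solid curve of Figure 1
assert q_hat(1.2) < q_hat(1.8) and q_hat(3.0) > q_hat(5.0)
```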
2. Discussion

The article by Proschan [13] explaining why the observed times between breakdowns of airplane air-conditioning systems showed decreasing failure rates constitutes a seminal work on the role of mixtures in reliability. As Lawless [12] points out, it is one of the two most cited and influential articles in the reliability literature. The huge amount of research concerning this issue reflects its interest and value in reliability. The hazard rate and the mean residual life turn out to be useful for lifetimes, whereas the newer approaches concerning the reversed hazard rate and the mean inactivity time emerge when the time scale is reversed. However, this situation is not as often encountered in reliability engineering as in medical studies, where cohort analyses frequently imply a retrospective examination of lifetimes. In this work we provide theoretical properties of the RHR and the MIT; however, the most interesting preservation conditions, for the DRHR and IMIT classes, cannot be applied to nonnegative random variables, which are the ones needed for time-to-failure modeling.
Acknowledgments This work has been supported by the University of Zaragoza under project UZ2006-CIE03.
References

[1] F.G. Badía, M.D. Berrade and C.A. Campos, Aging properties of the additive and proportional hazard mixing models, Reliability Engineering and System Safety, 78 (2002), 165–172.
[2] H.W. Block, T.H. Savits and H. Singh, The reversed hazard rate function, Probability in the Engineering and Informational Sciences, 12 (1998), 69–90.
[3] H.W. Block, Y. Li and T.H. Savits, Preservation of properties under mixture, Probability in the Engineering and Informational Sciences, 17 (2003), 205–212.
[4] N.K. Chandra and D. Roy, Some results on reversed hazard rate, Probability in the Engineering and Informational Sciences, 15 (2001), 95–102.
[5] A. Di Crescenzo, Some results on the proportional reversed hazards model, Statistics & Probability Letters, 50 (2000), 313–321.
[6] M. Finkelstein, On the reversed hazard rate, Reliability Engineering and System Safety, 78 (2002), 71–75.
[7] M. Finkelstein, On the shape of the mean residual life function, Applied Stochastic Models in Business and Industry, 18 (2002), 135–146.
[8] M. Finkelstein and V. Esaulova, Modeling a failure rate for a mixture of distribution functions, Probability in the Engineering and Informational Sciences, 15 (2001), 383–400.
[9] S.T. Gross and C. Huber-Carol, Regression models for truncated survival data, Scandinavian Journal of Statistics, 19 (1992), 193–213.
[10] P.L. Gupta and R.C. Gupta, Ageing characteristics of the Weibull mixtures, Probability in the Engineering and Informational Sciences, 10 (1996), 591–600.
[11] J.D. Kalbfleisch and J.F. Lawless, Regression models for right truncated data with applications to AIDS incubation times and reporting lags, Statistica Sinica, 1 (1991), 19–32.
[12] J. Lawless, Introduction to two classics in reliability theory, Technometrics, 42 (2000), 5–6.
[13] F. Proschan, Theoretical explanation of observed decreasing failure rate, Technometrics, 5 (1963), 375–383.
[14] S.M. Ross, J.G. Shanthikumar and Z. Zhu, On increasing-failure-rate random variables, Journal of Applied Probability, 42 (2005), 797–809.
Bounds on lifetimes of coherent systems with exchangeable components Tomasz RYCHLIK 1 Institute of Mathematics, Polish Academy of Sciences Abstract. We consider coherent systems based on dependent components with arbitrary exchangeable and continuous joint distributions. Applying an extension of the Samaniego representation for the system lifetime distributions with independent components to the exchangeable model, we provide some bounds for the distributions and moments of the coherent system lifetimes. In particular, we present sharp upper and lower bounds on the distribution functions and expectations of arbitrary system lifetimes, dependent on the Samaniego signature of the system and the marginal distribution of the components. We further determine more general expectation bounds dependent on the mean and variance of the component lifetime marginal distribution, and respective refinements for restricted classes of distributions. We also consider evaluations of lifetime variances in terms of the marginal distribution and variance of a single component. Keywords. Coherent system, dependent exchangeable component, signature, distribution bound, variance bound, expectation bound
Introduction

For fixed n, let ϕ : {0, 1}^n → {0, 1} denote an arbitrary coherent system with n components. Let nonnegative random variables X1, . . . , Xn represent the random lifetimes of the components of the system, and let X1:n ≤ . . . ≤ Xn:n stand for the respective order statistics. If the lifetime variables are independent identically distributed with a continuous marginal distribution function F, we have the well-known Samaniego [9] representation

P(T ≤ x) = Σ_{i=1}^n pi P(Xi:n ≤ x) (1)

of the distribution function of the system lifetime T = Tϕ(X1, . . . , Xn). Here

pi = P(T = Xi:n), i = 1, . . . , n, (2)

forming the so-called Samaniego signature vector, are uniquely determined by the properties of the system function ϕ, and are independent of the distribution of the component lifetimes. On the other hand, the distribution functions of the order statistics

1 Institute of Mathematics, Polish Academy of Sciences, Chopina 12, 87 100 Toruń, Poland. [email protected]
Signature

P(Xi:n ≤ x) = Σ_{k=i}^n \binom{n}{k} F^k(x)(1 − F(x))^{n−k}, i = 1, . . . , n, (3)
depend merely on the marginal F, and are independent of the system structure. Navarro and Rychlik [2] generalized the Samaniego representation (1)–(2) to the case of an arbitrary continuous exchangeable joint distribution of the component lifetimes. Admitting exchangeability provides a natural and justifiable extension of the i.i.d. model. It means that the components are identical, but they may affect one another in the working system, because the failure of some components increases the burden upon the remaining ones and makes them fail earlier. Clearly, the distributions of order statistics in the general exchangeable case depend on the joint distribution of X1, . . . , Xn, and (3) is only a particular case among the multitude of other options. Navarro and Rychlik [2] combined (1) with results of Rychlik [3] characterizing the distribution functions of order statistics based on dependent random variables with a fixed marginal F to determine optimal lower and upper bounds on the distributions and expectations of system lifetimes with exchangeable components, expressed in terms of the component marginal distribution. The results are presented in Section 2. They are further used in Section 3 for determining more general bounds in terms of the expectation and variance of the marginal distribution. More refined mean-variance evaluations are presented for k-out-of-n systems under the condition that the component lifetime distributions belong to the restricted classes of DFR, IFR, DFRA and IFRA distributions. The results are concluded from Rychlik [4], [6]. Finally, following Rychlik [5], [8], respectively, we describe two types of optimal lower and upper bounds on variances of k-out-of-n systems with exchangeable components, expressed in terms of the marginal distribution and its variance.
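As a sketch (not from the paper), the signature (2) of a small system can be computed by enumerating the n! equally likely failure orders; the series-parallel system below is an assumed example.

```python
from itertools import permutations
from fractions import Fraction

# Samaniego signature by enumeration: p_i is the fraction of failure
# orders in which the i-th component failure kills the system.
# Assumed example system: T = min(X1, max(X2, X3)).

def alive(working):                     # structure function phi
    return working[0] and (working[1] or working[2])

n = 3
counts = [0] * n
for order in permutations(range(n)):    # order in which components fail
    working = [True] * n
    for i, comp in enumerate(order):
        working[comp] = False
        if not alive(working):
            counts[i] += 1
            break

signature = [Fraction(c, 6) for c in counts]   # 3! = 6 orders
assert signature == [Fraction(1, 3), Fraction(2, 3), Fraction(0)]
```

The vector (1/3, 2/3, 0) depends only on the structure function, not on the component distribution, in line with (2).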
1. Distribution bounds

Rychlik [3] proved that the conditions

Σ_{i=1}^n Fi = nF, (4)

F1 ≥ . . . ≥ Fn (5)

are necessary and sufficient for a vector of distribution functions (F1, . . . , Fn) to be the marginal distribution functions of the order statistics X1:n, . . . , Xn:n based on a sample of n arbitrarily dependent random variables X1, . . . , Xn with the common marginal distribution function F. The characterization is valid under the restriction to exchangeable samples as well. Rychlik [3] used (4) and (5) to establish the minimal and maximal linear combinations Σ_{i=1}^n ci Fi of distribution functions of order statistics based on dependent (possibly exchangeable) random variables with the common marginal F, where c1, . . . , cn are arbitrarily fixed real coefficients. In particular, the bounds are valid for the distribution functions of lifetimes of coherent systems with a given Samaniego signature p = (p1, . . . , pn) based on exchangeable continuous random variables with a fixed marginal F.

Bounds on lifetimes of coherent systems with exchangeable components - T. Rychlik

However, the extreme combinations are attained by some discontinuous joint sample distributions such that some fixed groups of order statistics are identical with probability one. Navarro and Rychlik [2] constructed sequences of absolutely continuous exchangeable random variables with given marginal F which attain the bounds in the limit. Precisely, they proved the following.

Theorem 1. Assume that X1, . . . , Xn are nonnegative exchangeable random variables with an absolutely continuous joint distribution and a given marginal distribution function F, and that T = Tϕ(X1, . . . , Xn) is the lifetime of a coherent system with the Samaniego signature p = (p1, . . . , pn) and components with random lifetimes X1, . . . , Xn. Let Gp, Sp : [0, 1] → R denote the greatest convex and smallest concave functions, respectively, satisfying Gp(0) = Sp(0) = 0 and

Gp(j/n) ≤ Σ_{i=1}^j pi ≤ Sp(j/n), j = 1, . . . , n.

Then we have

Gp(F(x)) ≤ FT(x) = P(Tϕ(X1, . . . , Xn) ≤ x) ≤ Sp(F(x)), x ∈ R. (6)

Moreover, there exist sequences P^(k), P̄^(k), k = 1, 2, . . ., of exchangeable absolutely continuous distributions on R^n with the common marginal F such that

F_T^(k)(x) = P^(k)(Tϕ(X1^(k), . . . , Xn^(k)) ≤ x) → Gp(F(x)),

F̄_T^(k)(x) = P̄^(k)(Tϕ(X1^(k), . . . , Xn^(k)) ≤ x) → Sp(F(x)),

uniformly in x ∈ R.

Note that the functions Gp and Sp are continuous piecewise linear distribution functions on [0, 1], and the respective derivatives gp and sp, say, are well-defined step functions except for a finite number of breaking points. They can be written as

gp(x) = gpi, (i − 1)/n < x < i/n, i = 1, . . . , n, (7)

sp(x) = spi, (i − 1)/n < x < i/n, i = 1, . . . , n, (8)
and the finite sequences (7) and (8) are non-decreasing and non-increasing, respectively.

The following expectation bounds are immediate consequences of the Theorem.

Corollary 1. Under the assumptions and notation of Theorem 1, the following bounds are sharp:

∫_0^1 F^{−1}(x)sp(x)dx ≤ ETϕ(X1, . . . , Xn) ≤ ∫_0^1 F^{−1}(x)gp(x)dx, (9)

where F^{−1} denotes the quantile function of F.
114
Signature
In fact, replacing F −1 (x) by g(F −1 (x)) in (9), we also obtain optimal bounds on the expectations of arbitrary non-decreasing functions g of the system lifetime T . In particular, we easily get bounds for arbitrary moments. In the case of k-out-of-n systems, (6) and (9) can be specified as .
$n % n max [F (x) − 1] + 1, 0 ≤ Gk (x) = P(Xn+1−k:n ≤ x) ≤ min F (x), 1 k n+1−k (10) and n n+1−k
1− k−1 n
0
F −1 (x) dx ≤ EXn+1−k:n ≤
n k
1
k 1− n
F −1 (x) dx,
respectively. 2. Expectation bounds Inequalities (9) can also be treated as the optimal bounds on the expectations of arbitrary linear (not necessarily convex) combinations of order statistics. Rychlik [4] used (9) for determining sharp bounds on expectations of L-statistics based on dependent identically distributed samples in terms of some population moments. Below we formulate analogous evaluations for the lifetimes of coherent systems. Theorem 2 Assume that a coherent system ϕ with n components has the Samaniego index p = (p1 , . . . , pn ), and X1 , . . . , Xn are exchangeable continuously distributed lifetimes of components with common expectation μ and variance 0 < σ 2 < ∞. Then the following bounds are optimal & n '1/2 / 1 01/2 1 2 2 − (sp (x) − 1) dx =− s −1 n i=1 pi 0 T −μ ≤ ≤E σ
/
1 0
'1/2 01/2 & n 1 2 (gp (x) − 1) dx = g −1 . n i=1 pi 2
(11)
The bounds are derived by means of the Cauchy-Schwarz inequality. The upper one follows from 1 ET − μ = [F −1 (x) − μ][gp (x) − 1] dx 0
/ ≤
1
[F 0
/ =σ
0
−1
2
(x) − μ] dx
0
1
01/2 [gp (x) − 1] dx 2
01/2 1 2 [gp (x) − 1] dx
1 =σ gpi − 1 n i=1 n
1/2 ,
(12)
Bounds on lifetimes of coherent systems with exchangeable components - T. Rychlik
115
and the lower one is concluded similarly. The equality in (12) holds if gpi − 1 F −1 (x) − μ = " #1/2 , n 1 σ gpi − 1 n
i i−1 <x< , n n
i=1
i = 1, . . . , n,
which means that T has a discrete at most n-valued distribution. To show optimality of the bounds, we should approximate the discrete marginal by continuous ones, and further, approximate discontinuous joint distribution with continuous marginals by absolutely continuous ones. We omit details here. In particular, for k-out-of-n systems we have 4 −
Xn+1−k:n − μ k−1 ≤E ≤ n+1−k σ
4
n−k . k
(13)
Applying the Hölder inequality rather than the Schwarz one, we can generalize (11) and (13), expressing the estimates of the expectations with scale units different from the standard deviation σ (comp. Rychlik [4]). The lower and upper bounds in (11) vanish, if the smallest concave majorant Sp and greatest convex minorant Gp , respectively, are linear. This occurs if j
pi ≤
i=1 j i=1
pi ≥
j , n
j = 1, . . . , n − 1,
j , n
j = 1, . . . , n − 1,
respectively. If all the inequalities are strict, we can apply results of Goroncy and Rychlik [1] and improve the zero bounds as follows ⎛ ⎞ n T −μ j n ⎝ E ≥ min pi − ⎠ , (14) 1≤j≤n−1 σ n n(n − j) i=n+1−j T −μ n ≤ − min E 1≤j≤n−1 σ n(n − j)
&
j
j pi − n i=1
' .
(15)
If some of them become equalities, then the respective tight bounds are zero, and formulas (14) and (15) can be written in this case as well. Especially, for the series and parallel systems, we have E
X1:n − μ 1 ≤ −√ , σ n−1
(16)
1 Xn:n − μ ≥√ , σ n−1
(17)
E
respectively. We can further generalize inequalities (14) to (17) considering different scale units (see Goroncy and Rychlik (2006) for details). More precise moment evaluations are possible when the component lifetimes have distributions from some restricted families. The results can be deduced from Rychlik
116
Signature
[6]. We consider k-out-of-n coherent systems , 1 ≤ k ≤ n − 1, such that the components have a continuous exchangeable joint distribution, and additionally assume that the marginal lifetime distributions belong to a restricted family of distributions. We consider four families of DFR, IFR, DFRA and IFRA distributions, and present mean-variance upper bounds for the lifetime of the system. As above, all the bounds are sharp, but they are not attainable. They are positive, and established by means of projecting properly defined linear functionals over some functional Hilbert spaces onto some convex cones. The method and various applications are comprehensively described in Rychlik [7]. For the series systems, the respective bounds are negative, and should be determined by other methods. Theorem 3 Let X1 , . . . , Xn denote random lifetimes of an k-out-of-n system, 1 ≤ k ≤ n − 1, with an absolutely continuous exchangeable distribution, finite mean μ and variance σ 2 . (i) If the marginal distribution is DFR, then Xn+1−k:n − μ ≤ E σ
56
2n ke − ln nk ,
1
71/2
, if if
k n k n
< e−1 , > e−1 .
(18)
(ii) If the marginal distribution is IFR, then E
1 − 2βe−β − e−2β Xn+1−k:n − μ n ≤ −1 , σ k e−β − 1 + β
(19)
where β = β(k, n) > ln nk is the unique solution to 1 − (1 + β)e−β k n = ln . −β e −1+β n−k k (iii) If the marginal distribution is DFRA, then 6 7 (1 − nk ) ln nk + 1 Xn+1−k:n − μ ≤ E 6 72 . σ 1 + (1 − nk ) ln nk + 1
(20)
(iv) If the marginal distribution is DFRA, then the general upper bounds of (13) holds. All inequalities (18) to (20) and (13) are optimal, but not attainable.
3. Variance bounds Applying (4) and (5), Rychlik [5] described all the possible distribution functions Fk of the kth greatest order statistic based on n dependent random variables with a common marginal F by means of two relations. One is the two-sided inequality (10), and the other is the following inequality 0 ≤ fk (x) =
dFk (x) ≤n dF (x)
F − almost everywhere
(21)
Bounds on lifetimes of coherent systems with exchangeable components - T. Rychlik
117
for the density function fk of Fk with respect to the parent distribution function F , which was proved to exist. The characterization is valid for the distribution functions of lifetimes of k-out-of-n coherent systems based on exchangeable components. This enables us to determine sharp bound on variances of the system lifetimes. It suffices to focus on one-parameter subfamilies of the most dispersed and concentrated elements of the family defined by (10) and (21). The most concentrated family is ⎧ n if x < aα , ⎨ n+1−k F (x), Fα (x) = α, if aα ≤ x < bα , ⎩n (F (x) − 1) + 1, if x ≥ bα , k where aα = F −1
66
1−
k−1 n
7 7 7 6 α , bα = F −1 1 − nk (1 − α) and 0 < α < 1. Therefore 5
VarXn+1−k:n ≤ sup
0<α<1
/
2
x Fα (dx) −
02 8 . x Fα (dx)
(22)
The maximally concentrated distribution functions form two parametric families
Fα1 (x) =
max{0, nF (x) − (n − k)α}, if x < aα , n F (x), 1}, if x ≥ aα , min{ n+1−k
and
Fα2 (x) =
if x < bα , max{0, nk (F (x) − 1 + 1}, min{nF (x) − n + k − (k − 1)α, 1}, if x ≥ bα ,
for 0 < α < 1, with aα , bα defined above. Accordingly, 5 VarXn+1−k:n ≥ min inf
i=1,2 0<α<1
/
2
x Fαi (dx) −
02 8 . x Fαi (dx)
(23)
Both (22) and (23) strongly depend on parameters k, n of the system, and the parent distribution function F . Usually, they have very complicated forms. The exception are the lower variance bounds for the components lifetimes uniformly distributed on some finite interval [0, θ]. Then VarXn+1−k:n ≥
θ2 . 12n2
The corresponding upper variance bounds were calculated in Rychlik [5]. Rychlik (2007) used (22) for establishing general bounds dependent on the components lifetime variance only. The respective trivial lower bounds are derived by simple arguments. Theorem 4 Assume that the component lifetimes X1 , . . . , Xn have an exchangeable absolutely continuous distribution and finite variance σ 2 . Then the variance of the lifetime of k-out-of-n coherent system based on the components satisfies the relations
118
Signature
VarXn+1−k:n 0≤ ≤ max σ2
n n , k n+1−k
. .
Both the inequalities are optimal and non-attainable.
Acknowledgments The paper was prepared under the support of the Polish Ministry of Science and Higher Education Grant no. 1 P03A 015 30.
References [1] [2] [3] [4] [5] [6] [7] [8] [9]
A. Goroncy and T. Rychlik, How deviant can you be? The complete solution. Math. Inequal. Appl., 9 (2006), 633–647. J. Navarro and T. Rychlik, Reliability and expectation bounds for coherent systems with exchangeable components, J. Multivariate Anal., 98 (2007), 102–113. T. Rychlik, Bounds for expectation of L-estimates for dependent samples, Statistics, 24 (1993a), 1–7. T. Rychlik, Sharp bounds on L-estimates and their expectations for dependent samples. Commun. Statist. Theor. Meth. 22 (1993b), 1053–1068. T. Rychlik, Distributions and expectations of order statistics for possibly dependent random variables. J. Multivariate Anal. 48 (1994), 31–42. T. Rychlik, Mean-variance bounds for order statistics from dependent DFR, IFR, DFRA and IFRA samples, J. Statist. Plann. Inference 92 (2001a), 21–38. T. Rychlik, Projecting Statistical Functionals, Lecture Notes in Statistics 160, Springer-Verlag, New York, 2001b. T. Rychlik, Extreme variances of order statistics in dependent samples, submitted for publication, 2007. F. Samaniego, On closure of the IFR class under formation of coherent systems. IEEE Trans. Reliab. R-34 (1985), 69–72.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
119
On the Signature of Coherent Systems and Applications for Consecutive k-out-of-n:F Systems 1 Ioannis S. TRIANTAFYLLOU 2 and Markos V. KOUTRAS 3 Department of Statistics and Insurance Science, University of Piraeus, Greece Abstract. In the present article we develop some tools that facilitate the calculation of the signature of a system by a generating function approach. As an application, we establish a recurrence relation for the computation of the signature of a linear consecutive 2-out-of-n-F :system and provide a formula that expresses the signature of a circular consecutive k-out-of-n:F system in terms of the linear one. Keywords. Signature, consecutive k-out-of-n:F system
Introduction In a linear (circular) consecutive k-out-of-n:F system we have n components which are linearly (circularly) arranged and the system fails if and only if at least k consecutive components fail. The most popular applications of these systems, pertain to telecommunication and pipeline network modeling as well as integrated circuitry design. The term consecutive k-out-of-n:F system was first coined by Chiang and Niu [4] and since then the system has attracted considerable interest (for a review see Chao,Fu and Koutras [2], Chang, Cui and Hwang [3] or Kuo and Zuo [8]). Derman, Liebermann and Ross [5] raised the interesting question whether or not the family of consecutive k-out-of-n:F systems preserves the increasing failure rate (IFR) property i.e. whether system’s lifetime is IFR when its components have statistically independent and identically distributed (i.i.d.) IFR lifetimes. Samaniego [9] offered a complete characterization of the class of coherent systems that have the above property (i.e. are closed under i.i.d. IFR components), by exploiting the notion of signature. More specifically, he proved the following necessary and sufficient condition for a system’s lifetime to be IFR whenever its component lifetimes X1 , X2 , ..., Xn are i.i.d. and follow an IFR distribution: the function 1 This
research work was sponsored by the Pythagoras grant (Ministry of Education and Religious Affairs of Greece). 2 Department of Statistics and Insurance Science, University of Piraeus, 80 Karaoli and Dimitriou str., 18534 Piraeus, Greece; E-mail:
[email protected]. 3 Department of Statistics and Insurance Science, University of Piraeus, 80 Karaoli and Dimitriou str., 18534 Piraeus, Greece; E-mail:
[email protected].
120
Signature
1 2 n · xi i=0 (n − i) · si+1 · i 1 2 g(x) = n−1 n n · xi i=0 ( j=i+1 sj ) · i n−1
must be increasing for x > 0. The quantities si , i=1,2,...,n appearing in g(x) are called signatures of the system and can be defined as si = P (T = X(i) ), i = 1, 2, ..., n where T is system’s lifetime and X(1) ≤ X(2) ≤ ... ≤ X(n) are the order statistics of the sample X1 , X2 , ... , Xn . The signatures si depend only on the system structure and not on the (common) distribution of Xi . As a matter of fact, n!si can be calculated by examining the minimal cut sets of the system and counting how many among the n! equally likely permutations of X1 , X2 , ... , Xn result in a minimal cut set failing upon the occurrence of X(i) . The signature is also connected to many other well known reliability concepts, a fact turning it to a very important tool for investigating and comparing reliability structures. For example, the reliability polynomial can be easily expressed in terms of system’s signature, it is often feasible to establish various stochastic comparisons between systems by comparing their signatures etc. For a detailed and up-to-date presentation of coherent systems and their applications, the interested reader is referred to the excellent publication of Boland and Samaniego [1]. From the aforementioned discussion it is clear that, should one be able to compute the signature of a consecutive k-out-of-n:F system, he could then study its reliability properties and produce new results or establish alternative proofs of already known outcomes. In the present article we provide a formula that facilitates the evaluation of the signature of a reliability structure by a generating function approach. We then apply the general result for a consecutive k-out-of-n:F system and establish a simple relationship between the signature of a linear and a circular consecutive k-out-of-n:F system. Finally, a recurrence is obtained for the signature of a linear consecutive 2-out-of-n:F system. 1. The Generating Function of the Signature of a Coherent System Let X1 , X2 , ... 
, Xn denote the component lifetimes of a reliability structure with n components and T the system’s lifetime. Throughout this work we assume that the lifetimes Xi are i.i.d. and therefore, the probability that the system fails upon the i-th component failure does not depend on the underlying continuous distribution of Xi . Then, the signature of the system is defined as the probability vector (s1 (n),s2 (n),...,sn(n)) with si (n) = P (T = X(i) ), i = 1, 2, ..., n where X(1) ≤ X(2) ≤ ... ≤ X(n) are the order statistics of a random sample drawn from an arbitrary continuous lifetime distribution. If we denote by ri (n) the number of
Signature of Coherent Systems - I.S. Triantafyllou & M.V. Koutras
121
path sets of the structure with exactly i working components, then si (n) is related to the quantities 1 2−1 n ri (n), ai (n) = i
i = 1, 2, ..., n
(1)
through the system of equations ai (n) =
n
sj (n),
i = 1, 2, ...n
j=n−i+1
or equivalently si (n) = an−i+1 (n) − an−i (n),
i = 1, 2, ..., n
(2)
(convention : a0 (n) = 0). Let us next denote by Rn (p) the reliability of the structure, where q = 1 − p is the common failure probability of its components. In the next Proposition, which will be proved useful in the sequel, we shall provide a relation between the generating function of Rn (p) R(z; p) =
∞
Rn (p)z n
n=1
and the double generating function of the quantities 1 2 n i si (n), i
1 ≤ i ≤ n.
The proof of the Proposition will be couched on the following simple result. Lemma 1. The double generating function of ri (n), 1 ≤ i ≤ n is given by n ∞
1 ri (n)t x = R x(1 + t); i n
n=1 i=1
t 1+t
2
Proof: Since (see [1]) Rn (p) =
n i=1
ai (n)
1 2 n n ri (n)pi q n−i pi q n−i = i i=1
the generating function of Rn (p), n = 1, 2, ..., takes on the form 1 2i p R(z; p) = Rn (p)z = ri (n) (qz)n . q n=1 n=1 i=1 ∞
n
n ∞
(3)
122
Signature
Employing the transformations t=
p , x = qz q
we readily obtain p=
t , z = x(1 + t) 1+t
and the result we are chasing, is effortlessly derived.
1 2 n Proposition 1. The double generating function of i si (n), 1 ≤ i ≤ n is given by i 1 2 ∞ n 1 1 ∂R(x(1 + t); 1+t ∂R(x(1 + t); 1+t ) ) n − t(t + 1) . (4) i si (n)ti xn = tx i ∂x ∂t
n=1 i=1
Proof: Substituting expression (1) in the RHS of (2) we may readily verify that 1 2 n i si (n) = (n − i + 1)rn−i+1 (n) − irn−i (n). i
(5)
Therefore 1 2 ∞ ∞ n n n i n i nrn−i+1 (n)ti xn − si (n)t x = i
n=1 i=1
n=1 i=1
2
t ·
n ∞
(i − 1)rn−i+1 (n)t
x −t·
i−2 n
n=1 i=1
n ∞
irn−i (n)ti−1 xn
n=1 i=1
and the result follows after straightforward algebraic manipulations, on noting that the three terms in the RHS can be expressed as ∞ n−1
nrn−i (n)ti+1 xn = tx ·
n=1 i=0
t2
∞ n−1
irn−i (n)ti−1 xn = t2 ·
n=1 i=0
1 ) ∂R(x(1 + t); 1+t
∂x 1 ) ∂R(x(1 + t); 1+t
∂t
−
∞
nr0 (n)tn+1 xn ,
n=1
−
∞
nr0 (n)tn+1 xn ,
n=1
and t
∞ n n=1 i=1
respectively.
irn−i (n)ti−1 xn = t ·
∂R(x(1 + t); ∂t
1 1+t )
Signature of Coherent Systems - I.S. Triantafyllou & M.V. Koutras
123
By casting a glance at Proposition 1, it is readily concluded that, should we have at hand the reliability function of a family of structures, indexed by n (number of components), or equivalently its generating function, we can easily calculate the generating function of a simple multiple of the signature. An illustration of how the forgone analysis can be applied in the special case of consecutive k-out-of-n:F systems, is provided in the next section.
2. The Signature of Consecutive k-out-of-n:F Systems As already mentioned in the Introduction, a linear (circular) consecutive k-out-of-n:F system consists of n components which are linearly (circularly) arranged and the system fails if and only if at least k consecutive components fail. Let sci , si be the signatures of the i-th component in a circular and a linear consecutive k-out-of-n:F system respectively. In the next two Propositions, we provide explicit expressions for the generating functions of the signatures of a circular and a linear consecutive k-out-of-n:F system (multiplied by appropriate binomial coefficients). Proposition 2. The double generating function 1 2 n ∞ n c C(x, t) = i si (n)ti xn i n=1 i=1
of the sequence {i C(x, t) =
1 2 n c si (n) : i = 1, 2, ..., n and n = 1, 2, ...} is given by i
tx3 (tx)2k + tx(−1 + x + tx)2 − x(tx)k k 2 (tx − 1)2 (−1 + x + tx) (−1 + tx)2 (1 + x(−1 − t + (tx)k ))2
+
x(tx)k t(−1 + 2x(1 + t) − (2 + t(2 + t))x2 ) (−1 + tx)2 (1 + x(−1 − t + (tx)k ))2
+
x(tx)k k(−1 + tx)2 (1 + t(−1 + x + tx)) (−1 + tx)2 (1 + x(−1 − t + (tx)k ))2
.
(6)
Proof: Applying identity (4) for a circular consecutive k-out-of-n:F system we may write 1 2 n ∞ 1 1 ∂Rc (x(1 + t); 1+t ∂Rc (x(1 + t); 1+t ) ) n c − t(t + 1) · , i si (n)ti xn = tx · i ∂x ∂t n=1 i=1 (7) where Rc (z;p) is the generating function of the reliability of the system. On the other hand, Rc (z;p) can be expressed as (see e.g. [7]) Rc (z; p) =
1 − kpq k z k+1 (qz)k − 1 − z + pq k z k+1 1 − qz
124
Signature
and therefore Rc (x(1 + t);
(tx)k + k(1 − tx) 1 + k − kx(1 + t) 1 )= + . 1+t −1 + tx 1 + x(−1 − t + (tx)k )
The RHS of (6) is now readily deduced by calculating the partial derivatives of the last formula with respect to x and t and replacing them in (7). Proposition 3. The double generating function L(x, t) =
1 2 ∞ n n i si (n)ti xn i
n=1 i=1
1 2 n of the sequence {i si (n) : i = 1, 2, ..., n and n = 1, 2, ...} is given by i L(x, t) =
(tx)k (k − tx − ktx + (tx)k+1 ) (1 − x − tx + x(tx)k )2
(8)
Proof: The proof follows an exact parallel to the proof of Proposition 2. More specifically, it suffices to write down the general identity (4) for the special case of a linear consecutive k-out-of-n:F system and use the well known expression of the reliability generating function RL (z;p) for this family of structures (c.f. [7]) RL (z; p) =
1 − (qz)k . 1 − z + pq k z k+1
In Proposition 4, we shall establish a simple relation between the signatures of the linear and circular consecutive k-out-of-n:F system. Our proof will be couched on the following Lemma. 1 2 n c Lemma 2. a. The double generating function of (n − i)i si (n) is given by i n ∞ n=1 i=1
1 2 ∂C(x, t) ∂C(x, t) n c −t . (n − i)i si (n)ti xn = x i ∂x ∂t 1
b. The double generating function of ni ∞ n−1 n=2 i=1
1 ni
n−1 i
2
n−1 i
(9)
2 si (n − 1) is given by
si (n − 1)ti xn = x2
∂L(x, t) + xL(x, t). ∂x
(10)
Signature of Coherent Systems - I.S. Triantafyllou & M.V. Koutras
125
1 2 n c Proof: a. The double generating function of (n − i)i si (n) may be written as i n ∞ n=1 i=1
1 2 1 2 n ∞ n c n c i n (n − i)i ni si (n)t x = x si (n)ti xn−1 i i n=1 i=1
−t
n ∞ n=1 i=1
1 2 n c si (n)ti−1 xn i i 2
and the RHS of (9) is directly deduced by observing that 1 2 1 2 ∞ n ∞ n ∂C(x, t) ∂C(x, t) n c n c si (n)ti xn−1 , = = si (n)ti−1 xn . ni i2 i i ∂x ∂t n=1 i=1 n=1 i=1 b. Making use of the obvious identity 1 ni
n−1 i
2
1 = (n − i)i
n−1 i
2
1 +i
n−1 i
2 , 1 ≤ i ≤ n − 1,
we may write ∞ n−1
1 ni
n=2 i=1
n−1 i
2 si (n − 1)ti xn =
∞ n−1
1 (n − 1)i
n=2 i=1
n−1 i
2 si (n − 1)ti xn
∞ n−1 1n − 12 +x i si (n − 1)ti xn−1 i n=2 i=1
and (10) is directly established on observing that the last term in the RHS coincides to L(x,t), while the first one reads x2
1 1 2 2 n ∞ n−1 ∞ n−1 n si (n)ti xn−1 (n − 1)i ni si (n − 1)ti xn−2 = x2 i i
n=2 i=1
n=1 i=1
= x2
∂L(x, t) . ∂x
We are now ready to proceed to the proof of the main result of this section. Proposition 4. The signatures sci (n) and si (n) satisfy the following relation sci (n) = si (n − 1),
1 ≤ i ≤ n − 1.
Proof: Substituting expressions (6) and (8) in the RHS of (9) and (10) respectively, we may readily verify the following identity n ∞ n=1 i=1
1 2 1 2 n ∞ n c n−1 i n (n − i)i ni si (n)t x = si (n − 1)ti xn . i i n=1 i=1
126
Signature
Therefore (n − i)i
2 1 2 1 n c n−1 si (n) = ni si (n − 1), i i
and the proof is completed by canceling the coefficients of sci (n), si (n − 1) in both sides. As a final application of the results presented in the present article, we shall proceed to the computation of the signature of a linear consecutive 2-out-of-n:F system. Substituting k = 2 in (8) we may obtain the double generating function of the quantities i
1 2 n si (n), i
1≤i≤n
as 1 2 n ∞ t2 x2 (tx + 2) n i . si (n)ti xn = i (1 − x − tx2 )2 n=1 i=1
(11)
If we introduce the notation 1 2 n n i si (n)ti cn (t) = i
(12)
i=1
and write (11) in the form (1 − x − tx2 )2
∞
cn (t)xn = t2 x2 (2 + tx)
(13)
n=1
it is easily verified (by noting that the coefficients of xn of the LHS should vanish for n ≥ 4) that the following recursive scheme holds true for cn (t) cn+1 (t) = 2(t + 1)cn (t) − (t + 1)2 cn−1 (t) − 2t2 cn−2 (t) +2t2 (t + 1)cn−3 (t) − t4 cn−5 (t),
n ≥ 5.
We may next recall (12), and picking out the coefficients of ti+1 , i=0,1,...,n, in both sides, the next recurrence relation ensues 1 2 1 1 2 2 n n−1 n−2 j · sj (n) −2j · · sj (n − 1) − 2(j − 1) · · sj−1 (n − 2) j j j−1 1 2 1 2 n−3 n−2 · sj−1 (n − 3) +j · · sj (n − 2) + 2 · (j − 1) · j−1 j 1 2 n−4 +(j − 2) · · sj−2 (n − 4) = 0 j−2
Signature of Coherent Systems - I.S. Triantafyllou & M.V. Koutras
127
for 1 ≤ j ≤ n and n ≥ 5. A sufficient set of initial conditions for the aforementioned recurrence is provided by
s1 (1) = 0,
2 1 2 1 , (s1 (2), s2 (2)) = (0, 1), (s1 (3), s2 (3), s3 (3)) = 0, , 3 3 2 1 1 1 (s1 (4), s2 (4), s3 (4), s4 (4)) = 0, , , 0 2 2
(we also set sj (n) = 0 for j ≤ 0 or j > n).
3. Discussion In the present article, we provide a formula that facilitates the evaluation of a reliability structure by a generating function approach. More specifically, we consider an ncomponent reliability structure whose components are linearly arranged and labeled as 1, 2, ..., n and assume that the principle of operation (or breakdown policy) of the system makes sense for any n. Then, the generating function R(z; p) of the sequence of reliabilities Rn (p), n = 1, 2, ... contains all the information for the reliability characteristics of the n-component systems, n = 1, 2, .... A class of systems, where an explicit expression is available for R(z; p), is the family of Markov chain embeddable structures (see [6] or Chapter 5 in [8]). In view of Proposition 1, if we have at hand an explicit expression for the generating function of the reliabilities Rn (p), we can write down the generating function of a multiple of the signature and thereof extract recurrence relations for the coordinates si (n) of the signature of the sequence of n-component systems, n = 1, 2, .... As an illustration of the general methodology, we considered the consecutive k-out-of-n:F systems and established: 1. a formula that expresses the signature of the circular consecutive k-out-of-n:F system in terms of the linear one, 2. a recurrence relation for the computation of the signature of a linear consecutive 2-out-of-n:F system. It is noteworthy that these results can be alternatively established by a combinatorial approach (see [10]) In closing, we mention briefly another case where our results can be easily applied. Consider the k-out-of-n:F system, which fails whenever at least k (not necessarily consecutive) components fail to work. Manifestly, Rn (p) = 1 −
n 1 2 n i=k
i
q i pn−i
and therefore R(z; p) =
∞ n=1
Rn (p)z n =
(qz)k z − . 1−z (1 − z)(1 − pz)k
128
Signature
A direct application of Proposition 1 yields 1 2 1 2 n ∞ ∞ (tx)k n n k n i n i = k si (n)t x = k t x k+1 i k (1 − x) n=1 i=1 n=1 and considering the coefficients of xn in the two power series we conclude that ⎧ 1 2 ⎨ 0, 1 2 n n i si (n) = i , ⎩ k k
if i =k if i = k.
Consequently, the components of the signature vector of a k-out-of-n:F system are given by
si (n) =
0, 1,
if i =k if i = k ,
a result that could be easily verified by combinatorial arguments as well (see e.g. [1]). Two additional reliability systems, where the signature could be analyzed by the generating function approach illustrated in the present article, are the r-within-consecutive-kout-of-n:F system and m-consecutive-k-out-of-n:F system. An expression for the reliability generating functions of them can be fetched from [6].
References [1]
[2] [3] [4] [5] [6] [7] [8] [9] [10]
P. J. Boland and F. J. Samaniego, The signature of a coherent system and its applications in reliability, Mathematical Reliability: An expository perspective (eds. R. Soyer, Th.A. Mazzuchi and N.D. Singpurwalla), Kluwer Academic Publishers, Boston, 2004. M. T. Chao, J. C. Fu and M. V. Koutras, Survey of reliability studies of consecutive k-out-of-n:F and related systems, IEEE Transactions on Reliability 44 (1995), 120–127. G. J. Chang, L. R. Cui and F. K. Hwang, Reliability of Consecutive- k systems, Kluwer Academic Publishers, Dordrecht, 2000. D. T. Chiang and S. Niu, Reliability of consecutive k-out-of-n:F system, IEEE Transactions on Reliability 30 (1981), 87–89. C. Derman, G. J. Lieberman and S. M. Ross, On the consecutive k-out-of-n:F system, IEEE Transactions on Reliability 31 (1982), 57–63. M. V. Koutras, On a Markov chain approach for the study of reliability structures, Journal of Applied Probability 33 (1996), 357–367. M. V. Koutras and S. G. Papastavridis, On the number of runs and related statistics, Statistica Sinica 3 (1993), 277–294. W. Kuo and M. J. Zuo, Optimal Reliability Modelling: Principles and Applications, John Wiley and Sons, New York, 2003. F. J. Samaniego, On closure of the IF R class under formation of coherent systems, IEEE Transactions on Reliability 34 (1985), 69–72. I. S. Triantafyllou and M. V. Koutras, On the signature of coherent systems and applications, Probability in the Engineering and Informational Sciences 22 (2008), 19–35.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
129
The Use of Stochastic Precedence in the Comparison of Engineered Systems Myles HOLLANDER a,1 , Francisco J. SAMANIEGO b,2 Department of Statistics, Florida State University, Tallahassee, FL 32306, U.S.A.
[email protected] b Department of Statistics, University of California, Davis, Davis, CA 95616, U.S.A.
[email protected] a
Abstract. While various forms of stochastic domination (including stochastic, hazard rate or likelihood ratio ordering) of one random variable over another have proven useful in making comparisons between systems, they share a common limitation. These modes of comparing systems induce only a partial ordering on the class of systems of interest, leaving some pairs of systems non-comparable. Comparisons via stochastic precedence (as defined in [1]) do not suffer from this limitation. In this paper, we describe how stochastic precedence may be used as a metric in comparing arbitrary systems whose components are assumed to be independent and identically distributed with common distribution F . An explicit computational formula is displayed for the relevant probability P (T1 ≤ T2 ), where T1 and T2 are system lifetimes. A necessary and sufficient condition depending solely on system signatures is given for stochastic precedence between system lifetimes. Examples are given that illustrate the fact that systems whose lifetimes are not comparable by stochastic, hazard rate or likelihood ratio ordering may be definitively compared via stochastic precedence. In the final section, we focus on comparisons between systems whose signatures are symmetric. Keywords. reliability, coherent systems, mixed systems, stochastic comparisons, distribution-free measures, system signatures, order statistics, symmetry
Introduction The problem of comparing the performance of competing systems is one of longstanding interest in the engineering sciences. One approach that has proven efficacious in this regard utilizes the notion of system signatures. We begin with a brief review of this particular proxy for system design and how it has been used in the comparative analysis of systems. Assuming that the components of a system of order n have i.i.d. lifetimes with common continuous distribution F , [10] define the “signature” of a system as the probability vector s whose ith element is the probability that the system fails upon the ith component failure. If X1:n ≤ · · · ≤ Xn:n are the n ordered component failures, then the ith element of the signature s is simply si = P (T = Xi:n ), where T is the lifetime of the 1 M. Hollander’s research was supported in part by NIH grant 5 RO1 DK52329 and NHLBI grant 7 RO1 HL 67460. 2 F. J. Samaniego’s research was supported in part by ARO grant WN11NF05-1-0118.
130
Signature
system. Signatures can be computed via combinatorial arguments, as their value is determined solely by the permutation distribution of the n component failure times. Thus, a given signature depends only on the system’s design and not on the underlying distribu tion F . Signature vectors take values in the simplex S = {s ∈ [0, 1]n | si = 1}. [10] provided explicit representations for the survival function of a system’s lifetime (and its density and failure rate when F is absolutely continuous) in terms of the system’s signature vector and the underlying component distribution F . These representations were obtained from a basic identity established in the paper: P (T > t) =
n
si P (Xi:n > t).
(1)
i=1
The i.i.d. assumption in defining a system’s signature requires further comment. The main contribution of the i.i.d. assumption is that it levels the playing field among competing systems. It is clear that a weak system (say an n-component series system) with highly reliable components will outperform a strong system (say an n-component parallel system) whose components have low reliability. When component lifetimes are i.i.d., any differences that remain among the systems of interest can only be attributed to the systems’ designs. In this sense, the signature of a system, as defined above, is a pure measure of a system’s design. For more details on this issue, see [8]. Coherent systems are characterized by the properties of monotonicity and component relevance (see [2] for details). While the concentration on coherent systems has been customary in Reliability Theory, there are good reasons for expanding this class to the class of all stochastic mixtures of coherent systems. Such “mixed systems” can be physically realized by selecting a system at random from the relevant coherent systems in a given problem. (See [4] for further detail). One consequence of this generalization is the fact that every probability vector in the simplex S above can be thought of as the signature of a mixed system. Indeed, the probability vector p can be represented n as the signature of a mixture of k-out-of-n systems, or, more specifically, as p = i=1 pi si:n , where si:n = (0, . . . , 0, 1, 0, . . . , 0) is the signature vector of an i-out-of-n system (with the “1” as the ith element). Broadening our focus to include mixed systems expands the space of signatures to the entire simplex S, the convex hull of the signatures of the coherent systems, and turns the problem of comparing systems into a continuous problem indexed by the vectors in S. 
The benefits of dealing with this expanded space become clear in certain optimization problems in which the optimal system turns out to be a non-degenerate mixture of coherent systems. See [5] for an example of this phenomenon in Reliability-Economics problems in which criterion functions depend on both performance and cost. The results developed below apply to arbitrary mixed systems (a class which of course includes degenerate mixtures, i.e., coherent systems themselves). The utility of signature vectors becomes especially clear in the context of the comparative analysis of competing systems. Kochar, Mukerjee and Samaniego [8] established several preservation theorems that demonstrate that certain key properties of pairs of signature vectors carry over to the corresponding system lifetimes. We will use the notation ≤st , ≤hr and ≤lr for stochastic, hazard rate and likelihood ratio ordering, respectively. (See [13] for definitions.) Kochar, Mukerjee and Samaniego [8] proved the following result for coherent systems. As noted by Block, Dugas and Samaniego [3], the result also holds for mixed systems.
Stochastic Precedence in System Comparisons - F.J. Samaniego & M. Hollander
Theorem 1. Let s1 and s2 be the signatures of two mixed systems of order n, both based on components with i.i.d. lifetimes with common distribution F . Let T1 and T2 be their respective lifetimes. Then: (a) s1 ≤st s2 ⇒ T1 ≤st T2 , (b) s1 ≤hr s2 ⇒ T1 ≤hr T2 , and (c) s1 ≤lr s2 ⇒ T1 ≤lr T2 . The appeal of the preservation results above is that they allow an engineer, when comparing two systems for possible use, to examine the system signatures rather than more complex entities – the two systems' lifetime distributions – and determine, in many cases, which is the preferable system. But one does encounter a limitation in potential applications of Theorem 1. There are systems whose signatures are not comparable, that is, for which neither of the inequalities s1 ≤ s2 or s2 ≤ s1 holds in any of the three stochastic senses mentioned above. The non-comparability problem may not be massive; for example, of the 190 possible comparisons of the 20 coherent systems of size 4, 180 of the pairs are comparable via stochastic ordering. The fact remains that one may not be able to determine, in the comparison in which one is particularly interested, which of two systems is preferable. Further, there are uncountably many pairs of mixed systems of a given size that will not be comparable. As will be seen, non-comparability turns out to be a limitation in the metric used to compare signatures and systems rather than in the signatures themselves. The purpose of this paper is to examine an alternative signature-based metric that permits definitive comparisons (i.e., better, equivalent or worse) between any and all pairs of mixed systems, including comparisons between systems of different sizes. In Section 1, we apply the notion of "stochastic precedence" to the comparison of arbitrary mixed systems based on i.i.d. components from a common distribution F , and we display an explicit formula for making the necessary computations.
Examples are given of comparisons via stochastic precedence of systems that are non-comparable via the orderings discussed above. Section 2 is dedicated to some special results on the comparison of systems whose signature vectors are symmetric.
1. Comparisons via Stochastic Precedence In this section, we will ultimately focus on the comparisons of pairs of systems that are based on components having independent lifetimes with common continuous distribution F . (Our first result is slightly more general.) It was noted above that the various forms of ordering between signatures assumed in Theorem 1 will often apply to the comparison of two systems of interest but that these conditions do not apply to all possible system comparisons. Under each of the three orderings in Theorem 1, the class of mixed systems of a given order is only partially ordered. An example of such non-comparability is the 2-component mixed system with the signature s1 = (1/2, 1/2) and the 4-component mixed system with the signature s2 = (0, 1/2, 1/2, 0). This 2-component system can be shown (using Lemma 3 stated in Section 2) to have the same lifetime distribution as the 4-component system with signature s∗1 = (1/4, 1/4, 1/4, 1/4). It is easy to verify that s∗1 and s2 are non-comparable using the three orderings mentioned above. An example of non-comparable coherent systems is given by the 4-component coherent systems with
Signature
respective minimal cut sets {{1}, {2, 3, 4}} and {{1, 2}, {1, 3}, {1, 4}, {2, 3}}. The first system has signature (1/4, 1/4, 1/2, 0) and the second has signature (0, 2/3, 1/3, 0). It is clear that these systems are not comparable in the st, hr or lr orderings. We will return to an examination of these systems in Examples 1 and 2 below. Arcones, Kvam and Samaniego [1] introduced the notion of “stochastic precedence,” an alternative approach to the notion that one random variable is smaller than another. As we shall see, this approach will prove useful when applied to the comparison of two competing systems. The “sp” relationship is defined as follows. Definition 1. Let Y1 and Y2 be independent random variables with respective distributions F1 and F2 . Then Y1 is said to stochastically precede Y2 (written Y1 ≤sp Y2 ) if and only if P (Y1 ≤ Y2 ) ≥ 1/2. If both Y1 ≤sp Y2 and Y2 ≤sp Y1 hold, the variables are said to be sp-equivalent. Continuous variables Y1 and Y2 are sp-equivalent if and only if they satisfy P (Y1 ≤ Y2 ) = 1/2. Now, let T1 and T2 be independent random variables representing the lifetimes of two mixed systems. The probability P (T1 ≤ T2 ) provides a vehicle for classifying the second system as better than, equivalent to or worse than the first system according to whether this probability is greater than, equal to or less than 1/2. Although we will be primarily interested here in the case in which all components of systems being compared have i.i.d. lifetimes from a common distribution F , we note that the potential for comparison extends to the system lifetimes of pairs of mixed systems in which the underlying component distributions are allowed to be different with, say, components from which the first system is constructed being i.i.d. variables with distribution F1 and components from which the second system is constructed being i.i.d. variables with distribution F2 . 
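The st-comparability claims for the signature pairs above can be checked mechanically: for two signatures of the same order, s1 ≤st s2 holds exactly when the cumulative sums (the cdf) of s1 dominate those of s2 pointwise. A minimal sketch (Python; the helper names are ours), confirming the non-comparability of the pairs just discussed:

```python
from fractions import Fraction as Fr
from itertools import accumulate

def st_leq(s1, s2):
    """s1 <=_st s2 for same-order signatures: cdf of s1 dominates cdf of s2."""
    return all(a >= b for a, b in zip(accumulate(s1), accumulate(s2)))

def st_comparable(s1, s2):
    return st_leq(s1, s2) or st_leq(s2, s1)

s1 = (Fr(1, 4), Fr(1, 4), Fr(1, 2), Fr(0))   # minimal cut sets {1},{2,3,4}
s2 = (Fr(0), Fr(2, 3), Fr(1, 3), Fr(0))      # minimal cut sets {1,2},{1,3},{1,4},{2,3}
assert not st_comparable(s1, s2)             # the coherent pair from the text

s1_star = (Fr(1, 4),) * 4                    # 4-component equivalent of (1/2, 1/2)
s2b = (Fr(0), Fr(1, 2), Fr(1, 2), Fr(0))
assert not st_comparable(s1_star, s2b)       # the mixed pair from the text
```

By contrast, the series signature (1, 0) and the parallel signature (0, 1) are comparable, with st_leq((1, 0), (0, 1)) returning True, as expected.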
Indeed, the following general expression may be established for the probability P (T1 ≤ T2 ). A proof of this result is provided by Hollander and Samaniego [6]. Theorem 2. Let T1 and T2 represent the lifetimes of mixed systems of orders n and m with respective signatures s1 and s2 . Assume that the n components of system 1 have i.i.d. lifetimes governed by the continuous distribution F1 , and let {X1:n , X2:n , . . . , Xn:n } be the corresponding ordered component lifetimes. Similarly, assume that the m components of system 2 have i.i.d. lifetimes governed by the continuous distribution F2 , and let {Y1:m , Y2:m , . . . , Ym:m } be the corresponding ordered component lifetimes. Then

P (T1 ≤ T2 ) = Σ_{i=1}^{n} Σ_{j=1}^{m} s1i s2j P (Xi:n ≤ Yj:m ).   (2)
Before treating the technical issues involved in making comparisons via stochastic precedence, we will digress briefly to provide some justification for using this new approach to comparing systems. The probability P (X ≤ Y ) has long been a staple in probability and statistics. It arose early in the history of nonparametric testing. It is, in fact, the expected value of the famous Mann-Whitney statistic used to test the equality of two distributions F1 and F2 based on a random sample X1 , X2 , . . . , Xn ∼ F1 and an independent random sample Y1 , Y2 , . . . , Ym ∼ F2 . In that context, P (X ≤ Y ) measures the extent to which one distribution (or random variable) is larger than another – the precise interpretation we will be interested in here. The probability P (X ≤ Y ) has also been the principal parameter of interest in the subfield of Reliability called "stress-strength"
testing (see, for example, [7] or [11]). There, the variable Y represents the strength of the material of interest (e.g., welded steel bars used in bridge construction) and the variable X represents the level of stress to which the material is subjected. In this latter context, P (X ≤ Y ), generally referred to as the reliability of the material in question, is the probability that randomly chosen material will withstand the amount of stress that is randomly applied. One would interpret P (X ≤ Y ) > 1/2 as indicating that the material tends to be stronger than the stress placed upon it. We will henceforth restrict our attention to the case in which F1 = F2 ; we will provide an explicit formula for calculating the probability P (T1 ≤ T2 ) when the components of both systems have lifetimes that are i.i.d. with distribution F . Returning to the context in which we will employ this measure – the comparison of two systems’ lifetimes – we can infer that system τ2 will tend to last longer than system τ1 if their respective lifetimes satisfy the condition P (T1 ≤ T2 ) > 1/2. Theorem 2 gives the general expression needed for stochastic-precedence calculations. We now wish to make this formula operational. We first state a lemma which examines stochastic precedence between independently drawn order statistics. For a proof, see [6]. Lemma 1. Let X1 , X2 , . . . , Xn be a random sample of size n from a continuous distribution F , and let X1:n , X2:n , . . . , Xn:n be the corresponding order statistics. Let Y1 , Y2 , . . . , Ym be a random sample of size m from F , and let Y1:m , Y2:m , . . . , Ym:m be the corresponding order statistics. Assume that the two samples are independent. Then for any such distribution F ,
P (Xi:n ≤ Yj:m ) = Σ_{k=i}^{n} [ C(n, k) C(m, j) / C(n+m, k+j) ] · j/(k+j),   (3)

where C(a, b) denotes the binomial coefficient "a choose b".
Lemma 1 permits the exact calculation of the probability which appears in expression (2) for comparing systems via stochastic precedence. Theorem 3. Let T1 and T2 represent the lifetimes of mixed systems of orders n and m with respective signatures s1 and s2 and ordered component lifetimes {X1:n , X2:n , . . . , Xn:n } and {Y1:m , Y2:m , . . . , Ym:m } obtained from two independent i.i.d. samples of sizes n and m from a common distribution F . Then

P (T1 ≤ T2 ) = Σ_{i=1}^{n} Σ_{j=1}^{m} s1i s2j Σ_{k=i}^{n} [ C(n, k) C(m, j) / C(n+m, k+j) ] · j/(k+j).   (4)
Note that the probability P (T1 ≤ T2 ) is a distribution-free measure which takes the same value for all continuous distributions F . This fact does not appear to be widely known, and is of independent interest, as it has direct relevance to the theory of nonparametric tests. The property does not generalize; if the X sample is drawn from the distribution F1 while the Y sample is independently drawn from the distribution F2 , the probability P (T1 ≤ T2 ) will depend on F1 and F2 . Finally, we note that Lemma 1 is a special case of Theorem 4 in [9], a result that applies to inequalities involving an arbitrary number r ≥ 2 of independent order statistics. The lemma above allows us to make Theorem 2 more specific.
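Lemma 1 makes the inner probability in (2) computable exactly. A minimal sketch in Python (the function name is ours), using rational arithmetic so that the distribution-free probabilities come out as exact fractions; it reproduces the order-statistic values quoted in Example 1 below:

```python
from fractions import Fraction as Fr
from math import comb

def p_order_stat_leq(i, n, j, m):
    """Equation (3): P(X_{i:n} <= Y_{j:m}) for two independent i.i.d. samples
    from the same continuous distribution F (the value does not depend on F)."""
    return sum(Fr(comb(n, k) * comb(m, j), comb(n + m, k + j)) * Fr(j, k + j)
               for k in range(i, n + 1))

# Values quoted in Example 1:
assert p_order_stat_leq(1, 4, 2, 4) == Fr(55, 70)
assert p_order_stat_leq(2, 4, 3, 4) == Fr(53, 70)
assert p_order_stat_leq(3, 4, 2, 4) == Fr(17, 70)
assert p_order_stat_leq(4, 4, 2, 4) == Fr(5, 70)
# Identically distributed order statistics are sp-equivalent:
assert p_order_stat_leq(2, 4, 2, 4) == Fr(1, 2)
```

The exact fractions make the distribution-free character of the measure concrete: no F appears anywhere in the computation.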
We now return for a closer examination of the pairs of systems mentioned in the first paragraph of this section. The comparisons we now execute will illustrate the utility of Theorem 3 and the use of stochastic precedence in the comparative analysis of competing systems. Example 1. Let us examine the two mixed systems discussed earlier in this section. We wish to compare the two systems' signatures s1 = (1/4, 1/4, 1/4, 1/4) and s2 = (0, 1/2, 1/2, 0). From the expression in (4), we see that there are precisely eight non-zero coefficients of the form s1i s2j and each takes the value 1/8. It remains to compute the combinatorial portion of the formula. The outcomes of these calculations are as follows: P (X1:4 ≤ Y2:4 ) = 55/70, P (X1:4 ≤ Y3:4 ) = 65/70, P (X2:4 ≤ Y2:4 ) = 35/70, P (X2:4 ≤ Y3:4 ) = 53/70, P (X3:4 ≤ Y2:4 ) = 17/70, P (X3:4 ≤ Y3:4 ) = 35/70, P (X4:4 ≤ Y2:4 ) = 5/70, P (X4:4 ≤ Y3:4 ) = 15/70. From (4), we thus obtain

P (T1 ≤ T2 ) = (1/8) · (55 + 65 + 35 + 53 + 17 + 35 + 5 + 15)/70 = 280/560 = 1/2.
As mentioned earlier, the survival functions of the system lifetimes cross exactly once, with the second system having an initial advantage (that is, a larger probability of survival) but with the first system having a larger probability of survival if both systems survive beyond a certain age. The present computation indicates that the shifting domination of survival functions comes out even in the end. Overall, each of these two systems has an equal chance of surviving longer than the other. Thus, in the sense of stochastic precedence, they are considered equivalent systems, and neither would be preferred over the other. We note that, unlike in the cases of the other orderings we have discussed, a concrete comparison of these two systems can actually be made, and the systems are found to be sp-equivalent. Example 2. We mentioned earlier that the two 4-component coherent systems with respective minimal cut sets {{1}, {2, 3, 4}} and {{1, 2}, {1, 3}, {1, 4}, {2, 3}} are non-comparable in the "st" sense (and thus in the "hr" and "lr" senses as well). Here, we will compare these systems via stochastic precedence. As noted earlier, the relevant system signatures are s1 = (1/4, 1/4, 1/2, 0) and s2 = (0, 2/3, 1/3, 0). Using Theorem 3, the probability P (T1 ≤ T2 ) may be computed as

(1/6)(11/14) + (1/12)(13/14) + (1/6)(1/2) + (1/12)(53/70) + (1/3)(17/70) + (1/6)(1/2),

a sum that reduces to the value 109/210 = 0.519. From this we conclude that the second system will last longer than the first system slightly more than half the time. Thus, in terms of stochastic precedence, the second system is preferred to the first.
We have alluded to the fact that stochastic precedence between two systems is weaker than the standard stochastic ordering, and that the former is in fact implied by the latter. This result is easily proven. Suppose that the variable X1 is stochastically smaller than the variable X2 , that is, assume that X1 ≤st X2 . Let F1 and F2 be the distributions corresponding to these variables. We then have that F1 (x) ≥ F2 (x) for all x > 0. This implies that

P (X1 ≤ X2 ) = ∫_0^∞ F1 (x) dF2 (x) ≥ ∫_0^∞ F2 (x) dF2 (x) = 1/2,
that is, X1 ≤sp X2 . Stochastic precedence is a weak ordering, even in the continuous case, but it is especially so for discrete random variables. It is easy to construct examples showing that the inequality s1 ≤sp s2 does not necessarily imply T1 ≤sp T2 . While simple sufficient conditions on two signature vectors for the inequality T1 ≤sp T2 to hold seem to be quite elusive, it is surprisingly easy to identify necessary and sufficient conditions on the signatures of two systems for T1 ≤sp T2 to hold. This is due to the distribution-free character of the probability P (T1 ≤ T2 ), a fact that immediately leads to an NASC (necessary and sufficient condition) for stochastic precedence between the lifetimes of two mixed systems of arbitrary order based on i.i.d. component lifetimes. Indeed, if the RHS of (4) is denoted by W (s1 , s2 ), then

P (T1 ≤ T2 ) > 1/2 (resp. = 1/2, < 1/2) if and only if W (s1 , s2 ) > 1/2 (resp. = 1/2, < 1/2).   (5)
Remark 1. From Theorem 3, we recognize that W (s1 , s2 ) is precisely equal to the probability P (T1 ≤ T2 ) and is, in fact, simply a computational formula for P (T1 ≤ T2 ). The value of W (s1 , s2 ) relative to the threshold 1/2 may nonetheless legitimately be viewed as yielding a signature-based NASC for stochastic precedence by virtue of the fact that W (s1 , s2 ) is independent of the underlying distribution F and depends solely on the signatures of the two systems involved. While the direct calculation of W (s1 , s2 ) in any specific problem would appear to be a cumbersome matter, it is clear that the formula is easily programmed; thus checking the stochastic precedence of one mixed system relative to another is a process that may be automated in a straightforward manner.
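As the remark notes, the formula is easily programmed. A minimal sketch (Python; the helper names are ours), computing W(s1, s2) in exact rational arithmetic and reproducing Examples 1 and 2:

```python
from fractions import Fraction as Fr
from math import comb

def p_order_stat_leq(i, n, j, m):
    # Equation (3): P(X_{i:n} <= Y_{j:m}), exact and distribution-free.
    return sum(Fr(comb(n, k) * comb(m, j), comb(n + m, k + j)) * Fr(j, k + j)
               for k in range(i, n + 1))

def W(s1, s2):
    """Equation (4): P(T1 <= T2) for mixed systems with signatures s1 (order n)
    and s2 (order m), all components i.i.d. from a common continuous F."""
    n, m = len(s1), len(s2)
    return sum(s1[i - 1] * s2[j - 1] * p_order_stat_leq(i, n, j, m)
               for i in range(1, n + 1) for j in range(1, m + 1))

# Example 1: the two systems are sp-equivalent.
assert W((Fr(1, 4),) * 4, (0, Fr(1, 2), Fr(1, 2), 0)) == Fr(1, 2)
# Example 2: the second system is sp-preferred (W = 109/210 > 1/2).
assert W((Fr(1, 4), Fr(1, 4), Fr(1, 2), 0), (0, Fr(2, 3), Fr(1, 3), 0)) == Fr(109, 210)
```

Comparing W(s1, s2) against 1/2 then automates the NASC of display (5) for any pair of mixed systems, of equal or unequal sizes.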
2. Some Special Results for Systems with Symmetric Signature Vectors In Section 1, we examined two mixed systems having signature vectors (1/2, 1/2) and (0, 1/2, 1/2, 0), respectively. Example 1 continued the discussion, executing the comparison via stochastic precedence of the second system with a 4-component system whose lifetime was stochastically identical to the first. (The justification of this substitution is given below.) The two systems were shown to be equivalent in the sense of stochastic precedence. In this section, we will see that this is merely an example of a quite general phenomenon involving signature vectors that are symmetric. A general result in this regard is stated below. The first two results below hold for systems of the same size (i.e., n = m); the result that follows them confirms that the conclusion holds in general. The following Lemma records some general facts regarding stochastic precedence between independent order statistics. A proof is provided by Hollander and Samaniego [6].
Lemma 2. Let X1 , X2 , . . . , Xn be a random sample of size n from a continuous distribution F , and let X1:n , X2:n , . . . , Xn:n be the corresponding order statistics. Let Y1 , Y2 , . . . , Yn be a random sample of size n from F , and let Y1:n , Y2:n , . . . , Yn:n be the corresponding order statistics. Assuming the two samples are independent, it follows that (i) P (Xi:n ≤ Yi:n ) = 1/2, (ii) P (Xi:n ≤ Yj:n ) = P (Xn−j+1:n ≤ Yn−i+1:n ), and (iii) P (Xi:n ≤ Yj:n ) = 1 − P (Xj:n ≤ Yi:n ). Lemma 2 is the essential tool needed in establishing that any two systems of order n having symmetric signature vectors are sp-equivalent. For a proof, see [6]. Theorem 4. Let X1 , X2 , . . . , Xn and Y1 , Y2 , . . . , Yn be two independent random samples of size n from a continuous distribution F , and let X1:n , X2:n , . . . , Xn:n and Y1:n , Y2:n , . . . , Yn:n be the corresponding order statistics. Let τ1 and τ2 be mixed systems of order n based on components with i.i.d. lifetimes X1 , X2 , . . . , Xn and Y1 , Y2 , . . . , Yn as above. Denote the signatures of τ1 and τ2 by s1 and s2 and their lifetimes by T1 and T2 . If both signature vectors are symmetric, that is, s1,i = s1,n−i+1 for all i and s2,i = s2,n−i+1 for all i, then P (T1 ≤ T2 ) = 1/2, that is, T1 and T2 are sp-equivalent. To extend the result above to systems of different sizes, we will utilize the following tool. For a proof, see [12]. Lemma 3. Let s = (s1 , s2 , . . . , sn ) be the signature of a coherent or mixed system in n i.i.d. components with common lifetime distribution F . Then the coherent or mixed system with n+1 components with i.i.d. lifetimes ∼ F and corresponding to the signature vector

s∗ = ( (n/(n+1)) s1 , (1/(n+1)) s1 + ((n−1)/(n+1)) s2 , (2/(n+1)) s2 + ((n−2)/(n+1)) s3 , . . . , ((n−1)/(n+1)) sn−1 + (1/(n+1)) sn , (n/(n+1)) sn )   (6)
has the same lifetime distribution as the n-component system with signature s. Now consider the comparison of two systems of sizes n and m, where n < m. From Lemma 3, we can identify a system in n + 1 components with i.i.d. lifetimes ∼ F and signature s∗ that has the identical lifetime distribution as the n-component system with signature vector s1 . Moreover, it is easy to show that the signature s∗ will be symmetric if s1 is. Repeated applications of Lemma 3 lead to a signature s∗1 of an m-component system that is symmetric and has the same lifetime distribution as T1 . Theorem 4 thus implies the sp-equivalence of T1 and T2 . We record this as: Corollary 1. Let τ1 and τ2 be mixed systems of orders n and m, respectively, based on components with i.i.d. lifetimes ∼ F , and denote their signatures by s1 and s2 and their lifetimes by T1 and T2 . If both signature vectors are symmetric, that is, s1,i = s1,n−i+1 for all i and s2,i = s2,m−i+1 for all i, then P (T1 ≤ T2 ) = 1/2, that is, T1 and T2 are sp-equivalent. The symmetry of the signatures of the systems being compared is a sufficient condition for sp-equivalence. For an example showing that this condition is not a NASC, see Hollander and Samaniego [6].
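Formula (6) is mechanical to apply. A minimal sketch (Python; the function name is ours) implements the extension and checks the substitution used earlier: the 2-component system with signature (1/2, 1/2) is lifetime-equivalent, after two applications of Lemma 3, to the 4-component system with signature (1/4, 1/4, 1/4, 1/4); the sketch also checks that symmetry of the signature is preserved:

```python
from fractions import Fraction as Fr

def extend_signature(s):
    """Lemma 3 / formula (6): the (n+1)-component signature with the same
    lifetime distribution; componentwise,
    s*_i = ((i-1)/(n+1)) s_{i-1} + ((n+1-i)/(n+1)) s_i, with s_0 = s_{n+1} = 0."""
    n = len(s)
    padded = (0,) + tuple(Fr(x) for x in s) + (0,)   # pad with s_0 = s_{n+1} = 0
    return tuple(Fr(i - 1, n + 1) * padded[i - 1] + Fr(n + 1 - i, n + 1) * padded[i]
                 for i in range(1, n + 2))

s = (Fr(1, 2), Fr(1, 2))
assert extend_signature(s) == (Fr(1, 3), Fr(1, 3), Fr(1, 3))
assert extend_signature(extend_signature(s)) == (Fr(1, 4),) * 4

t = extend_signature((0, Fr(1, 2), Fr(1, 2), 0))
assert t == t[::-1] and sum(t) == 1   # symmetry and normalization preserved
```

Repeatedly applying extend_signature is exactly the device used in the proof of Corollary 1 to bring two systems of different sizes to a common order m.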
References
[1] M. Arcones, P. Kvam and F.J. Samaniego, On Nonparametric Estimation of Distributions Subject to a Stochastic Precedence Constraint, Journal of the American Statistical Association 97 (2002), 170–182.
[2] R.E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, Silver Spring, MD, To Begin With Press, 1981.
[3] H. Block, M. Dugas and F.J. Samaniego, Characterizations of the Relative Behavior of Two Systems via Properties of their Signature Vectors. In N. Balakrishnan, Editor, Advances in Distribution Theory, Order Statistics and Inference, Boston, Birkhauser, 2006.
[4] P. Boland and F.J. Samaniego, The Signature of a Coherent System and its Applications in Reliability. In R. Soyer, T. Mazzuchi and N. Singpurwalla (Editors), Mathematical Reliability: An Expository Perspective, Boston, Kluwer Academic Publishers, 1–29, 2004.
[5] M. Dugas and F.J. Samaniego, On Optimal System Design in Reliability-Economics Frameworks, Naval Research Logistics (in press), 2007.
[6] M. Hollander and F.J. Samaniego, On Comparing the Reliability of Arbitrary Systems via Stochastic Precedence. Technical Report, Department of Statistics, University of California, Davis, 2006.
[7] R. Johnson, Stress-Strength Models for Reliability. In P.R. Krishnaiah, Editor, Handbook of Statistics, Vol. 7: Quality Control and Reliability, New York, Elsevier, 27–54, 1988.
[8] S. Kochar, H. Mukerjee and F.J. Samaniego, The "Signature" of a Coherent System and its Application to Comparisons among Systems, Naval Research Logistics 46 (1999), 507–523.
[9] P. Kvam and F.J. Samaniego, On the Inadmissibility of Empirical Averages as Estimators in Ranked Set Sampling, Journal of Statistical Planning and Inference 36 (1993), 39–55.
[10] F.J. Samaniego, On Closure of the IFR Class Under Formation of Coherent Systems, IEEE Transactions on Reliability R-34 (1985), 69–72.
[11] F.J. Samaniego, Estimation Based on Autopsy Data from Stress-Strength Experiments, Quality Technology and Quantitative Management (Special Issue on Reliability) 4 (2007a), 1–15.
[12] F.J. Samaniego, System Signatures and their Applications in Engineering Reliability, New York, Springer, 2007b.
[13] M. Shaked and J.G. Shanthikumar, Stochastic Orders and Their Applications, San Diego, Academic Press, 1994.
The Role of Signature and Symmetrization for Systems with Non-exchangeable Components Fabio SPIZZICHINO 1 , University “La Sapienza”, Rome, Italy Abstract. The signature of a coherent system S is a feature of its structure function that, in the case when the lifetimes of the components of S are exchangeable, has a key role in the computation of the reliability function. We detail several aspects of this property and discuss the role that the concept of signature can also play in the case when the components’ lifetimes are not exchangeable. Keywords. Permutations of components, Symmetries of a system, Well-designed systems, Right-hand and left-hand cosets, Signature, Exchangeability
Introduction Let S be a reliability system formed with n components C1 , ..., Cn ; we assume that C1 , ..., Cn are working at time 0 and, for t > 0, j = 1, 2, ..., n, we put
Yj (t) = 1 if Cj is working at time t, and Yj (t) = 0 if Cj is down at time t;
YS (t) = 1 if S is working at time t, and YS (t) = 0 if S is down at time t.
We denote by φS the structure function of S, so that φS : {0, 1}^n → {0, 1} and YS (t) = φS (Y1 (t) , ..., Yn (t)). φS is assumed to be coherent (see e.g. [1] for definitions). Denote also by G the set of the path vectors of the system, i.e. G = {y ∈ {0, 1}^n | φS (y) = 1}. The system’s lifetime is then given by XS = inf{t ≥ 0 | YS (t) = 0}, and, denoting by (X1 , ..., Xn ) the components’ lifetimes, we can write XS = ψφ (X1 , ..., Xn ), where ψφ is a suitable non-decreasing function that reflects the structure of S. It is convenient, for what follows, to imagine that each component is capable of working until its own failure, even if the system has already failed. We denote by X(1) , ..., X(n)
1 Fabio L. Spizzichino, Department of Mathematics, University “La Sapienza”, Piazzale A. Moro 2, 00185 Rome, Italy; E-mail:
[email protected]
The role of signature and symmetrization - F. Spizzichino
the order statistics of the vector (X1 , ..., Xn ) and assume that the joint distribution of (X1 , ..., Xn ) is absolutely continuous, so that P {X(1) < ... < X(n) } = 1. Obviously, since the system S fails in concomitance with the failure of one of its components, we have P {XS = X(k) , for some k} = 1. Consider then the events E1 , ..., En , where Ek ≡ {XS = X(k) },
(1)
so that we can also write

XS = ψφ (X1 , ..., Xn ) = Σ_{k=1}^{n} X(k) 1_{Ek}.   (2)
By applying the formula of total probabilities, the system’s reliability RS (t) ≡ P {XS > t} can be written in the form

RS (t) = Σ_{k=1}^{n} P (Ek ) · P {X(k) > t | Ek }.   (3)
For k = 1, 2, ..., n, the probabilities P (Ek ) and P {X(k) > t | Ek } generally depend on both the structure of the system and the joint distribution of (X1 , ..., Xn ). As will be detailed in the next section, a relevant property holds in the case when X1 , ..., Xn are exchangeable: while the quantities P {X(k) > t | Ek } (k = 1, ..., n) depend only on the joint law of the vector (X1 , ..., Xn ), the quantities P (Ek ) (k = 1, ..., n) depend only on the structure of the system S. More precisely, the vector (P (E1 ) , ..., P (En )) coincides with the signature of the system S. The concept of signature thus has fundamental relevance in the exchangeable case. This concept was first introduced by Samaniego in [10] and subsequently reconsidered in several papers (see in particular [7], [2], [4], [3], [8]). In these papers the reader can find several results and examples concerning the concept of signature and its use. In particular, the concept of signature in the case of exchangeability has been studied in [8]. We point out however that the definition of signature as introduced by Samaniego and then used in some of the afore-mentioned papers is, from a conceptual point of view, slightly different from the definition used here (see also Remark 1, below). For definitions and basic properties of exchangeable lifetimes see e.g. [11] and references cited therein. In the case when X1 , ..., Xn are not exchangeable it is natural, in view of the aforementioned simplifying property, to try to compare the actual system reliability with the reliability function corresponding to a suitable other vector of exchangeable lifetimes. Taking into account formula (3), we consider in this respect the exchangeable vector that shares with (X1 , ..., Xn ) the same joint law for the vector of order statistics. This
leads us to analyze and use in the present context the concept of symmetrization of a vector of random variables. Our purpose in the present paper is then to show and discuss several details about the role of symmetrization and of signature in the computation and approximation of the reliability function RS (t). We shall show in this respect that the signature, besides simplifying the expression of the reliability function in the exchangeable case, can also provide some information about symmetry properties of the system S. The latter properties, in their turn, can explain the difference between S and the system formed by components with “symmetrized” lifetimes. From a probabilistic viewpoint, we can say that this paper deals with random permutations of the set {1, 2, ..., n} and the different meanings that they can have in a system-reliability context. The plan of the paper is as follows. In Section 1, we first introduce some useful notation and recall the definition of signature; we then turn to the case of exchangeability and analyze in detail the special form that, under such a condition, is taken by formula (3). Topics in Section 1 are strictly related to topics that have recently been presented in the literature with the aim of highlighting the role of the signature in the case of exchangeability (see e.g. Section 2 of [8]). The form of our presentation, however, is better adapted to our purposes and serves, in particular, to prepare the ground for the subsequent Sections 2 and 3. In Section 2 we analyze some properties of the concept of symmetrization of a multivariate probability distribution, and discuss a related interpretation in the field of system reliability. In Section 3 we will present some results and a short discussion concerning relations between signature and features of symmetry of a given reliability system.
Our arguments in that section will be based both on the topics developed in the previous sections and on elementary notions of group theory (see e.g. [6] for a basic reference). A more advanced algebraic analysis may be a promising subject for further research. Throughout the paper X1 , ..., Xn will denote non-negative random variables, while T1 , ..., Tn and X1∗ , ..., Xn∗ will denote exchangeable non-negative random variables.
1. The Meaning of Signature in the Exchangeable Case We first introduce the {1, 2, ..., n}-valued random variable M defined by

M = k if XS = X(k) ,   (4)

so that P {M = k} = P (Ek ) , k = 1, 2, ..., n. We then consider the random vector (J1 , ..., Jn ) defined as a function of (X1 , ..., Xn ) by the equations

Jh = i if X(h) = Xi , h = 1, 2, ..., n.   (5)
(J1 , ..., Jn ) is then a random permutation of {1, 2, ..., n}. Let P be the set of all permutations of {1, 2, ..., n} and put Ak ≡ {(j1 , ..., jn ) ∈ P | J1 = j1 , ..., Jn = jn ⇒ XS = X(k) }, so that
{(J1 , ..., Jn ) ∈ Ak } = {XS = X(k) } = {M = k} = Ek . A = (A1 , ..., An ) is then a partition of P, determined by φS . Being (A1 , ..., An ) a partition of P, it is Σ_{k=1}^{n} |Ak | = |P| = n!, where |Ak | denotes the cardinality of Ak . Using such notation, and in view of the arguments developed in [7], the concept of signature can be defined in the following, convenient, form. Definition 1. The signature of the system S is the vector p ≡ (p1 , ..., pn ), where

pk ≡ |Ak | / n!, k = 1, 2, ..., n.
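Definition 1 suggests a direct, brute-force computation of the signature by enumerating all n! failure orders. A minimal sketch (Python; the encoding of φS as a function on 0/1 state vectors and the helper name are ours), verified on the series–parallel system min[max(x1, x2), x3] treated in Example 1 below:

```python
from fractions import Fraction as Fr
from itertools import permutations
from math import factorial

def signature(phi, n):
    """Definition 1: p_k = |A_k| / n!. Each permutation of {1,...,n} is read as a
    failure order; we record the index k of the failure that brings the system down."""
    counts = [0] * n
    for order in permutations(range(n)):       # order[h] = component failing (h+1)-th
        state = [1] * n
        for h, comp in enumerate(order):
            state[comp] = 0
            if phi(state) == 0:                # system fails with the (h+1)-th failure
                counts[h] += 1
                break
    return tuple(Fr(c, factorial(n)) for c in counts)

# The system of Example 1: phi(x) = min(max(x1, x2), x3).
phi = lambda x: min(max(x[0], x[1]), x[2])
assert signature(phi, 3) == (Fr(1, 3), Fr(2, 3), 0)
```

The enumeration makes explicit that the signature depends only on φS, not on the lifetime distribution: no probabilistic input appears anywhere in the computation.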
From now on in this Section we assume that the components of S have exchangeable lifetimes. The latter will be denoted by T1 , ..., Tn , in order to distinguish this case from the general case. T(1) , ..., T(n) and TS will respectively denote the order statistics of (T1 , ..., Tn ) and the lifetime of S. The meaning of the symbols Ek , Jk , Ak , pk , introduced above, will be adapted in the obvious way to this case, by replacing X by T . The following Lemma is just an immediate consequence of the definition of exchangeability. Lemma 1. If T1 , ..., Tn are exchangeable then the distribution of (J1 , ..., Jn ) is uniform over P. We notice that Lemma 1 in turn implies: Lemma 2. If T1 , ..., Tn are exchangeable then the vectors (J1 , ..., Jn ) and (T(1) , ..., T(n) ) are stochastically independent. For the present exchangeable case, some fundamental facts related to Definition 1 are shown in the following Proposition 1. For T1 , ..., Tn exchangeable, the following properties hold:

i) for k = 1, 2, ..., n, it is P (Ek ) = pk .   (6)

ii) for k = 1, 2, ..., n, the events {T(k) > t} and Ek are independent.

iii) RS (t) = Σ_{k=1}^{n} pk P {T(k) > t}.   (7)
Proof: Since the event Ek is equivalent to {(J1 , ..., Jn ) ∈ Ak }, i) immediately follows from Definition 1 and Lemma 1, while ii) follows from Lemma 2. iii) is an immediate consequence of i) and ii), in view of (3). Remark 1. The definition of signature only involves the structure of a reliability system and is independent of the joint distribution of its components’ lifetimes. The assumption of exchangeability, however, allows us to establish the validity of the identities in
Signature
(6), i.e. that the probability distribution of M coincides with the signature p. For a given coherent system S, then, the probability distribution of M remains the same, independently of the choice of the joint distribution of (T_1, ..., T_n), within the class of all the exchangeable, absolutely continuous distributions on R^n_+. It is instead easy to find a (non-exchangeable) distribution of lifetimes for which the corresponding distribution of M is different from p.

Example 1 Consider three components with independent, exponentially distributed lifetimes X_1, X_2, X_3 and put E(X_i) = 1/λ_i, i = 1, 2, 3. Consider now the 3-component system S defined by the structure function φ(x_1, x_2, x_3) = min[max(x_1, x_2), x_3]. It is M = 1 if and only if X_(1) = min_{j=1,2,3} X_j = X_3; otherwise it is M = 2. This yields that the signature of S is (1/3, 2/3, 0), whereas, as it is easy to check,

P(M = 1) = λ_3 / (λ_1 + λ_2 + λ_3),  P(M = 2) = (λ_1 + λ_2) / (λ_1 + λ_2 + λ_3).
The distribution of M then coincides with the signature if and only if λ_1 + λ_2 = 2λ_3.

Remark 2 The systems of the type k : n are just defined by the condition P{M = n − k + 1} = 1. In this case, the distribution of M is independent of the choice of the joint distribution of (T_1, ..., T_n), even within the class of all the absolutely continuous distributions on R^n_+.

Remark 3 We saw above that, in the present case of exchangeable lifetimes, the probabilities P(E_k) (k = 1, 2, ..., n) depend only on the structure of the system, while the conditional probabilities P{T_(k) > t | E_k} only depend on the joint distribution of T_1, ..., T_n. More precisely, the P(E_k) are determined by the signature of S; parallel to that, we point out that P{T_(k) > t | E_k} = P{T_(k) > t}.

2. Symmetrization of Components' Lifetimes

Let X_1, ..., X_n be the lifetimes of the components C_1, ..., C_n of a coherent system S and denote by f_X(x_1, ..., x_n) the joint density of X_1, ..., X_n; f_X is not assumed to be necessarily exchangeable. For π ∈ P, x ≡ (x_1, ..., x_n) ∈ R^n_+, let x_π denote the vector (x_{π_1}, ..., x_{π_n}); let furthermore X_π denote the vector (X_{π_1}, ..., X_{π_n}). It is

f_{X_π}(x) = f_X(x_{π^{-1}}),
(8)
where the permutation π^{-1} is the inverse of π. For our purposes here it is convenient to imagine that all the components of S, even if they differ in their reliability, are suited to perform the same job in the system. In this way we can conceive that any component can be allocated in each of the n positions of S (as may happen in practice, for instance, when S is a communication network). Fix now π ∈ P and think of π as a permutation of the components of S. In view of the above, it makes sense to consider a new system S_π, made up by installing the component C_{π_j} in position j (j = 1, 2, ..., n). In other words, the structure function of S_π is defined by

φ_π(x) = φ(x_π).
(9)
The role of signature and symmetrization - F. Spizzichino
Remark 4 Generally, for π′ ≠ π″ ∈ P, the two systems S_{π′}, S_{π″} will be different. However they trivially share the same signature. We may wonder whether this is the unique situation under which two different coherent systems, with the same number of components, can have the same signature. Actually this is not the case, as is shown e.g. by the counterexample in [7], pg. 513.

For π′ ≠ π″ ∈ P, the two systems S_{π′}, S_{π″} may coincide. Formally this means

φ(x_{π′}) = φ(x_{π″}), ∀x ∈ {0, 1}^n.

This defines an equivalence relation ≈ between two elements of P and a partition B of the set P.

Remark 5 In Section 2 and in the above, we respectively introduced A and B, two different partitions of the same set P, both associated to a same system S. A is related to the definition of the signature of S, while B is related to the structure of symmetry (or invariance) of S.

Remark 6 For our discussion it is also useful to keep in mind that a permutation π ∈ P can be seen as a transformation acting either on the system or on the joint density of the components' lifetimes. In fact, for any π ∈ P we can consider the new density f_{X_π} and the new structure φ_π, respectively defined by (8) and (9).

We denote by R_π^(X)(t) the reliability function of S_π and notice that R_π^(X)(t) = R_S^(X_π)(t). Were X_1, ..., X_n actually exchangeable, it would then be R_π^(X)(t) = R_S^(X)(t), ∀π ∈ P. When X_1, ..., X_n are not exchangeable, R_π^(X)(t) and R_S^(X)(t) generally are two different functions. In this respect, we formulate the following

Definition 2 The system S is correctly-designed when it is

R_S^(X)(t) ≥ R_π^(X)(t), ∀π ∈ P.

Here we assume that S is correctly-designed and notice that this assumption in particular implies

R_S^(X)(t) ≥ R*(t), (10)

where we put

R*(t) = (1/n!) ∑_{π∈P} R_π^(X)(t). (11)
We now focus attention on the computation and on the meaning of the function R∗ (t). To this purpose we denote furthermore by X1∗ , ..., Xn∗ the exchangeable lifetimes obtained from X1 , ..., Xn by symmetrization: letting Π be a random permutation of {1, 2, ..., n}, uniformly distributed over P, we set (X1∗ , ..., Xn∗ ) = (XΠ1 , ..., XΠn ) .
The joint density function f_{X*} of (X*_1, ..., X*_n) is obtained by means of symmetrization of the density f_X; in other words, by the formula of total probabilities we can write

f_{X*}(x_1, ..., x_n) = ∑_{π∈P} P(Π = π) f_{X*}(x_1, ..., x_n | Π = π),

whence

f_{X*}(x_1, ..., x_n) = (1/n!) ∑_{π∈P} f_X(x_{π_1}, ..., x_{π_n}). (12)
Lemma 3 The vectors of the order statistics (X*_(1), ..., X*_(n)) and (X_(1), ..., X_(n)) share the same joint law.

Proof: Fix a vector x = (x_1, ..., x_n) ∈ R^n_+ with x_1 < ... < x_n and compare the values taken in x by the joint densities f_{X*_(·)} and f_{X_(·)}, respectively. It is

f_{X_(·)}(x) = ∑_{π∈P} f_X(x_{π_1}, ..., x_{π_n}).

On the other hand, by taking into account (12),

f_{X*_(·)}(x) = ∑_{π∈P} f_{X*}(x_{π_1}, ..., x_{π_n}) = n! f_{X*}(x) = ∑_{π∈P} f_X(x_{π_1}, ..., x_{π_n}).
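Lemma 3 is in fact a pathwise statement: sorting a vector is invariant under any permutation of its coordinates, so the order statistics of X* = X_Π coincide with those of X sample by sample. A minimal sanity check (Python; the exponential sampling is purely illustrative and not part of the paper):

```python
import random

random.seed(2)
for _ in range(100):
    x = [random.expovariate(1.0) for _ in range(5)]
    pi = random.sample(range(5), 5)          # one draw of the random permutation Π
    x_star = [x[i] for i in pi]              # X* = X_Π
    assert sorted(x_star) == sorted(x)       # same order statistics, sample by sample
print("symmetrization leaves the order statistics unchanged")
```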
For a stimulating discussion about the role of exchangeability in the study of order statistics see also [5].

Compare now R_S^(X*)(t) and R_S^(X)(t), the reliability functions of the system S for the two cases where the lifetimes of the components are (X*_1, ..., X*_n) or (X_1, ..., X_n), respectively. Of course, R_S^(X*)(t) and R_S^(X)(t) generally are two different functions. As to R_S^(X)(t) we recall Eq. (3), whereas for R_S^(X*)(t) we can write, by Lemma 3 and by Eq. (7),

R_S^(X*)(t) = ∑_{k=1}^n p_k P{X_(k) > t}. (13)

Taking into account Eq. (12) again, we can then state

Proposition 2

R_S^(X*)(t) = R*(t),

R*(t) = ∑_{k=1}^n p_k P{X_(k) > t}. (14)
For R*(t) = R_S^(X*)(t) we can also give an interpretation as follows. Let us denote by S* the system obtained from S by installing the n components over the different positions in a random order, according to a random permutation Π, uniformly distributed over P. In view of Proposition 2, R*(t) can then be seen as the reliability function of S*, i.e.

R*(t) = R_S^(X*)(t) = R_{S*}^(X)(t). (15)
Remark 7 We can summarize all the above by saying that the fictitious system S* has the same structure as S and exchangeable components; the joint distribution of the corresponding vector of lifetimes is the one obtained by symmetrization from f_X. The corresponding random variable M* has a probability distribution that depends on the structure of S, but does not depend on f_X. Actually the probability distribution of M* is the signature of S. On the other hand, as expressed by the identity (15), S* can also be seen as a system with the same components as S, i.e. with lifetimes X_1, ..., X_n, but with a different structure; the two different structures share however the same signature. It is interesting to notice that, for purposes different from ours, the system S* and Proposition 2 have also been considered in [9], where S* is termed "mean system".
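Proposition 2 and the identity (15) can also be checked numerically. In the sketch below (Python; the rates and sample sizes are arbitrary choices of mine, not from the paper) the lifetimes are independent exponentials with distinct rates, hence non-exchangeable, for the system of Example 1, whose signature is (1/3, 2/3, 0). The average of the indicator {T_{S_π} > t} over all permutations π agrees with the signature mixture of order-statistic indicators, here even pathwise:

```python
import random
from itertools import permutations

def mixture_check(x, t):
    """Return ((1/n!)·#{π : T_{S_π}(x) > t}, Σ_k p_k·1{x_(k) > t}) for Example 1's system."""
    p = [1/3, 2/3, 0]                      # signature of min(max(x1, x2), x3)
    perms = list(permutations(x))
    lhs = sum(min(max(a, b), c) > t for a, b, c in perms) / len(perms)
    xs = sorted(x)
    rhs = sum(pk * (xk > t) for pk, xk in zip(p, xs))
    return lhs, rhs

random.seed(0)
for _ in range(1000):
    x = [random.expovariate(r) for r in (1.0, 2.0, 3.0)]   # non-exchangeable lifetimes
    t = random.uniform(0, 2)
    lhs, rhs = mixture_check(x, t)
    assert abs(lhs - rhs) < 1e-12
print("R*(t) equals the signature mixture, pathwise")
```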
3. Symmetrization and Role of Signature in the Case of Non-exchangeability

In this Section we focus attention on the symmetry properties of the structure function φ of a coherent system S. We still consider the case when the components of S have lifetimes X_1, ..., X_n, with a (not necessarily exchangeable) joint density f_X, and X*_1, ..., X*_n denote the exchangeable lifetimes obtained from X_1, ..., X_n by symmetrization. All the rest of the notation introduced in the previous Section will also be maintained. As an immediate consequence of Proposition 2 and inequality (10), we can write

min_{π∈P} R_π^(X)(t) ≤ R*(t) = R_S^(X*)(t) = R_{S*}^(X)(t) ≤ R_S^(X)(t).

The interest of the above inequality is due to the fact that, by (13), R*(t) can be computed in terms of only the signature of S and the marginal distributions of the order statistics X_(1), ..., X_(n). It is intuitive that the higher the level of symmetry of f_X, and the more φ is symmetric, the smaller is the difference R_S^(X)(t) − R*(t). Our claim is that the signature of S can also have a role in the analysis of the properties of symmetry in the structure of S. For our discussion, the following examples will be useful.

Example 2
In the case of a k : n system S, it is

min_{π∈P} R_S^(X_π)(t) = R_S^(X*)(t) = R_{S*}^(X)(t) = R_S^(X)(t).
In fact, regardless of fX being exchangeable or not, in this case we have
R_S^(X)(t) = R_{S*}^(X)(t) = P{X_(n−k+1) > t},

and, for any pair π′, π″ ∈ P, R_{π′}^(X)(t) = R_{π″}^(X)(t). Then B contains only one set, coinciding with P, and we can say that the system S is "completely symmetric". On the other hand, this is the extreme case in which the signature is completely concentrated, i.e. it is a degenerate distribution; in fact the signature of S is of the form (0, 0, ..., 0, p_{n−k+1} = 1, 0, ..., 0).

In the following two examples we compare the signatures and the partitions B_S respectively associated to two different systems of four components each.

Example 3
Consider the system S with structure given by

φ(y_1, y_2, ..., y_n) = min[y_1, max(y_2, ..., y_n)],

i.e. S is formed as the series of component 1 with the parallel of the remaining components. S thus has a very asymmetric structure. The partition B_S is formed by n sets, containing (n − 1)! permutations each. In this case the signature, which we denote by p^(2), is of the form (1/n, 1/n, ..., 1/n, 2/n, 0).

In the case n = 4, B_S is formed by 4 different sets, containing d = 6 permutations each, and it is p^(2) = (1/4, 1/4, 1/2, 0).

Example 4 Here we directly consider the case n = 4 and the structure φ(y_1, y_2, y_3, y_4) = min(max(y_1, y_2), max(y_3, y_4)). B_S is formed by 3 sets, containing d = 8 permutations each. The associated signature is p^(3) = (0, 1/3, 2/3, 0).

We incidentally notice that it is p^(2) ≺_M p^(3), where ≺_M denotes the majorization ordering (see e.g. [11] and references therein).

We can now guess that the signature of S can have the following role in describing the amount of symmetry in the structure of S: the less the signature is, in some suitable sense, "concentrated", the smaller is the level of symmetry in the structure of S. In this respect, we first introduce some further notation and then show two facts of an arithmetic and algebraic character.

Put P_0 ≡ {π ∈ P | φ(x_π) = φ(x), ∀x ∈ {0, 1}^n}. P_0 is of course one of the sets belonging to the partition B. We also put d = |P_0| and denote by ∘ the operation of composition between two permutations π′, π″ ∈ P, so that φ_{π′}(x_{π″}) = φ(x_{π″∘π′}).

Lemma 4
(a) d is a factor of n!;
(b) any set belonging to B contains d elements;
(c) B contains n!/d sets.
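The objects d, P_0 and the cardinalities |A_k| can all be computed exhaustively for n = 4. The sketch below (Python; the encoding and names are mine, not the paper's) does so for the system of Example 4 and confirms d = 8, that B consists of n!/d = 3 classes (Lemma 4), and that d divides every |A_k| (Proposition 3 below):

```python
from itertools import permutations, product
from math import factorial

n = 4
phi = lambda y: min(max(y[0], y[1]), max(y[2], y[3]))   # the system of Example 4

perms = list(permutations(range(n)))
cube = list(product([0, 1], repeat=n))
phi_pi = lambda pi, y: phi([y[j] for j in pi])          # phi_pi(x) = phi(x_pi)

# P0: permutations leaving the structure function invariant
P0 = [pi for pi in perms if all(phi_pi(pi, y) == phi(y) for y in cube)]
d = len(P0)

# |A_k|: classify each failure order by the index M of the fatal failure
counts = [0] * n
for order in perms:
    state = [1] * n
    for k, comp in enumerate(order, start=1):
        state[comp] = 0
        if phi(state) == 0:
            counts[k - 1] += 1
            break

print(d, factorial(n) // d, counts)   # 8 3 [0, 8, 16, 0]
```

The counts [0, 8, 16, 0] divided by 4! = 24 reproduce the signature p^(3) = (0, 1/3, 2/3, 0) of Example 4.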
Proof: (a) It is easily seen that the set P_0, endowed with the operation ∘, is a subgroup of (P, ∘). Then d, the cardinality of P_0, is a divisor of n! by the Lagrange theorem of group theory. (b) Let π′, π″ ∈ P be such that π′ ≈ π″, i.e., ∀x ∈ {0, 1}^n, φ(x_{π′}) = φ(x_{π″}), or φ_{π′}(x) = φ_{π″}(x). It is then

φ(x_{(π′)^{-1}∘π″}) = φ_{π″}(x_{(π′)^{-1}}) = φ_{π′}(x_{(π′)^{-1}}) = φ(x).

Whence π′ ≈ π″ ⇒ (π′)^{-1} ∘ π″ ∈ P_0. Then any equivalence class belonging to B is a coset relative to P_0 and contains the same number of elements, d, as P_0. (c) is a trivial consequence of (b).

The number d can be considered as an index of the symmetry of S.

Remark 8 In the Examples above, we saw cases where d = min_{k:|A_k|>0} |A_k|. However this does not necessarily happen in the general case. It is easy, for instance, to find examples where S has no symmetries, and then d = 1, whereas |A_k| > 1 for any k such that |A_k| > 0. Concerning the relations between the index d and |A_1|, ..., |A_n|, we can rather state the following result.

Proposition 3 d is a factor of |A_k|, for k = 1, 2, ..., n.

Proof: For h = 1, ..., n, let x^(h) ∈ {0, 1}^n be defined by setting

x_i^(h) = 0 for 1 ≤ i ≤ h,  x_i^(h) = 1 for h + 1 ≤ i ≤ n.
For k = 1, ..., n and σ ∈ P, the condition σ ∈ A_k holds if and only if it is

φ_σ(x^(k−1)) = 1,  φ_σ(x^(k)) = 0,

i.e.

φ((x^(k−1))_σ) = 1,  φ((x^(k))_σ) = 0.

For any π ∈ P_0 we can then write

φ(((x^(k−1))_σ)_π) = 1,  φ(((x^(k))_σ)_π) = 0,

i.e.

φ_{σ∘π}(x^(k−1)) = 1,  φ_{σ∘π}(x^(k)) = 0.
We thus see that also σ ∘ π ∈ A_k, and we can conclude the proof by noticing that A_k can be partitioned as the union of one or more cosets of the form A_σ^(k) ≡ {σ ∘ π | π ∈ P_0}, σ ∈ A_k, containing d elements each.
Acknowledgments

I would like to thank Subhash Kochar for useful discussions on the topics of the present paper. This research was partially supported by University "La Sapienza" and the Italian M.I.U.R. in the frame of a 2006 University Project.
References

[1] R. E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, To Begin With, Silver Spring, MD, 1981.
[2] P. J. Boland, Signatures of indirect majority systems, J. Appl. Probability 38 (2) (2001), 597–603.
[3] P. J. Boland and F. J. Samaniego, The signature of a coherent system and its applications in reliability, in T. Mazzucchi, N. Singpurwalla, and R. Soyer (Eds.), Mathematical Reliability: An Expository Perspective, Kluwer Acad. Publ., Boston, MA, 2004, pp. 3–30.
[4] P. J. Boland, F. J. Samaniego, and E. M. Vestrup, Linking dominations and signatures in network reliability theory, in Mathematical and Statistical Methods in Reliability, World Sci. Publ., River Edge, NJ, 2003, pp. 89–103.
[5] J. Galambos, The role of exchangeability in the theory of order statistics, in G. Koch and F. Spizzichino (Eds.), Exchangeability in Probability and Statistics, North-Holland, Amsterdam-New York, 1982, pp. 75–86.
[6] I. N. Herstein, Topics in Algebra, Blaisdell Publ. Co., New York-Toronto-London, 1964.
[7] S. Kochar, H. Mukerjee, and F. J. Samaniego, The "signature" of a coherent system and its application to comparisons among systems, Naval Res. Logist. 46 (5) (1999), 507–523.
[8] J. Navarro and T. Rychlik, Reliability and expectation bounds for coherent systems with exchangeable components, J. Multivariate Anal. 98 (2007), 102–113.
[9] J. Navarro, N. Balakrishnan, D. Bhattacharya, and F. Samaniego, Signatures of coherent systems and order statistics: a general approach, private communication, May 2007.
[10] F. J. Samaniego, On closure of the IFR class under formation of coherent systems, IEEE Trans. Reliab. R-34 (1985), 69–72.
[11] F. Spizzichino, Subjective Probability Models for Lifetimes, Chapman and Hall/CRC, Boca Raton, FL, 2001.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
Generalized Marshall-Olkin Models: Aging and Dependence Properties

Fabio SPIZZICHINO (a) and Florentina SUTER (b)
(a) University "La Sapienza", Rome, Italy
(b) University of Bucharest, Romania

Abstract. We analyze several aspects of a class of bivariate survival models that arise as a direct generalization of the bivariate exponential Marshall-Olkin model and that describe situations of (possibly dependent) competing risks.

Keywords. Marshall-Olkin model, survival copula, bivariate aging function, dependence, competing risks
Introduction

The topic of competing risks has an increasing importance not only in the field of reliability, but also in the broader field of applied probability (see in particular [4], [5]). Situations of competing risks arise, in fact, in several different applications, with a recent special development in the field of finance and interacting defaults. In the present note we specifically deal with dependence and bivariate aging properties for bivariate survival models created by a certain competing risks situation. In particular we consider a pair of different units whose correlations are determined by the presence of two competing types of risk that act symmetrically on the two units. More precisely, hereafter we present a rather straightforward generalization of the well-known bivariate exponential model of Marshall-Olkin (see e.g. [6], [1]). Inspired by the relevant properties of positive dependence manifested by the Marshall-Olkin model, we then obtain some sufficient conditions that guarantee positive dependence properties for our generalized models. Moreover, we also consider some special types of bivariate aging properties that have been studied in [2], [3] and that are also met in the analysis of the original Marshall-Olkin model. We study the bivariate model defined by the product

H̄(x, y) = H̄_1(x, y) H̄_2(x, y),
(1)
where H̄_1(x, y) and H̄_2(x, y) are two joint exchangeable survival functions with univariate marginal survival functions Ḡ_1(x) = H̄_1(x, 0) and Ḡ_2(x) = H̄_2(x, 0), respectively.

(1) Corresponding Author: University of Bucharest, Department of Mathematics and Computer Science, Academiei 14, 010014 Bucharest, Romania; E-mail: fl
[email protected]
Relations among Aging and Stochastic Dependence
The bivariate survival function H̄ can emerge from special situations where each of two similar units, say U and V, undergoes two competing risks associated with shocks which arrive at random times. More precisely, we consider four different random times T_1, W_1, T_2, W_2 such that the unit U fails at the random time min{T_1, T_2}, and the unit V fails at the random time min{W_1, W_2}. We then analyze properties of the joint survival models of the two failure times

X = min{T_1, T_2},  Y = min{W_1, W_2}. (2)

We assume that the pairs (T_1, W_1) and (T_2, W_2) are stochastically independent and that the joint model of (T_i, W_i) (i = 1, 2) is described by the joint exchangeable survival function H̄_i(x, y), i.e.

P{T_1 > x, W_1 > y} = H̄_1(x, y),  P{T_2 > x, W_2 > y} = H̄_2(x, y).
0 ≤ u, v ≤ 1
(3)
and the function B is defined as 6 7 ¯ log u, − log v) }. ¯ −1 H(− B(u, v) = exp{−G
(4)
The latter can be used in the description of some aging properties of a bivariate survival model (see [2], [3]). In Section 2 we determine the survival copula and the bivariate aging function B for ¯ In Section 3 we deduce some dependence and the generalized Marshall-Olkin model H. ¯2. ¯ ¯ 1 and H aging properties for H in terms of related properties for H
Generalized Marshall-Olkin Models - F. Spizzichino & F. Suter
151
1. Survival Copulas and Aging Functions of the Generalized Marshall-Olkin Model ¯ in (1) we notice that the marginal survival function Concerning the bivariate model H ¯ H¯ is: G ¯ ¯ (x) = G ¯ 2 (x). ¯ 1 (x) · G G H 9H¯ and of the For the model (1), we focus now on properties of the survival copula C bivariate aging function BH¯ . These properties will depend on properties of the survival ¯ i , i = 1, 2. Let us denote by copulas and of the bivariate aging functions of the models H 9i the survival copula and by Bi the bivariate aging function of the model Hi , i = 1, 2. C 9H¯ can be determined from C 91 , C 92 We notice that the survival copula for the model (1), C as follows: 9H¯ (u, v) = C
2
# ## " " −1 " −1 ¯i G ¯i G ¯ ¯ (u) , G ¯ ¯ (v) , 9i G C H H
(5)
i=1
and we shortly denote this operation by: 91 G¯ ,G¯ C 92 . 9H¯ = C C 1 2
(6)
¯ by its definition, can be obtained In a similar way, the bivariate aging function of H, from the bivariate aging functions B1 , B2 as follows: 5
¯ −1 BH¯ (u, v) = exp −G ¯ H
2
8 ¯ i (− log Bi (u, v)) G
,
(7)
i=1
and we shortly denote this operation by: BH¯ = B1 G¯ 1 ,G¯ 2 B2 .
(8)
We notice that the model defined by (1) becomes a Marshall-Olkin model by letting 91 (u, v) = u · v, C
92 (u, v) = min{u, v}, C
and ¯ 1 (x) = exp{−λ1 x}, G
¯ 2 (x) = exp{−λ2 x}. G
(9)
¯ 1 and G ¯ 2 are We also notice that the condition (9), namely that the two marginals G exponential, gives rise to simpler expressions for the formulas (5) and (7) of the survival 9H¯ and the bivariate aging function BH¯ , respectively. In fact these formulas copula C become:
152
Relations among Aging and Stochastic Dependence 2 λi λi 91 G¯ ,G¯ C 9H¯ (u, v) = C 92 (u, v) = 9i u λ1 +λ 2 , v λ1 +λ2 C C 2 1
(10)
i=1
and 2 λi 6 7 BH¯ (u, v) = B1 G¯ 1 ,G¯ 2 B2 (u, v) = (Bi (u, v)) λ1 +λ2 .
(11)
i=1
¯ 2 are not necessarily exponential, a simplified expression like (10) ¯ 1 and G Even if G can also be obtained by considering the following relationship between them. " # ¯ 2 (x) = G ¯ 1 (x) θ . G
(12)
9H¯ is: In this case the survival copula C 1 θ 1 θ 9H¯ (u, v) = C 91 u θ+1 92 u θ+1 91 G¯ ,G¯ C 92 (u, v) = C C , v θ+1 · C , v θ+1 , 2 1 and coincides with (10) when θ =
(13)
λ2 λ1 .
¯ 2. Dependence and Bivariate Aging Properties of H In this section we will prove closure results for dependence and aging properties of ¯2, ¯ (x, y). More precisely, considering arbitrary marginal survival functions G ¯ 1 and G H 9 9 we make some dependence assumptions on C1 and C2 , and we will see how these as9H¯ . Similarly, we will see how dependence assumpsumptions reflect on dependence of C tions on B1 and B2 trigger the dependence properties of BH¯ . Finally, we state some related results by specifying univariate aging properties for the marginal survival functions ¯2. ¯ 1 and G G For our results, we consider some specific dependence families. Let S be the family of exchangeable t-seminorms or semi-copulas, i.e. the family of the functions that fulfill all the properties of the exchangeable copulas, except maybe the rectangular inequality (see e.g. [3]). Then, we consider the following families of semi-copulas introduced in [3]: 1 P+ := {S ∈ S|S(u, v) ≥ uv},
2 P+ :=
. S(u , v) S(u, v) , ∀ 0 < u < u S ∈ S| ≤ ≤ 1, ∀ 0 ≤ v ≤ 1 , u u
3 P+ := {S ∈ S|S(us, v) ≥ S(u, sv), ∀ 0 ≤ v ≤ u ≤ 1, 0 < s < 1} ,
(14)
(15)
(16)
Generalized Marshall-Olkin Models - F. Spizzichino & F. Suter
153
1 2 3 with P− , P− , P− families being defined in an obvious way by inverting the inequalities. Additionally we introduce the family defined as follows:
5 = {S ∈ S|S is T P2 }. P+
(17)
We recall that a function A : R × R → R+ is called T P2 (Totally Positive of Order 2) if A(x , y )A(x , y ) ≥ A(x , y )A(x , y ),
x < x , y < y .
4 4 The symbols P+ , P− were used in [3] to denote families that are not of interest for our purposes. We notice that pairs of exchangeable random variables (X, Y ) whose survival cop9 ula C or bivariate aging function B are in one of the above families, have some interesting dependence or bivariate aging properties:
9 of a pair (X, Y ) is in the family P 1 , then X and Y are • If the survival copula C + positive quadrant dependent P QD(X, Y ) [8]. 1 , then the • If the bivariate aging function B of a pair (X, Y ) is in the family P+ bivariate model has a property which can be interpreted as a bivariate notion of NBU [3]. 2 9 of a pair (X, Y ) is in the family P+ • If the survival copula C , then (X, Y ) has the property that Y is right tail increasing in X (RT I(Y |X)) [8]. 3 • If the bivariate aging function B of a pair (X, Y ) is in the family P+ , then the survival function of (X, Y ) is Schur-concave, a significant positive 2-aging property [3]. 5 9 of a pair (X, Y ) is in the family P+ • If the survival copula C , then (X, Y ) is right corner set increasing RCSI(X, Y ) [8]. 2.1. Dependence properties In this subsection we will analyze dependence properties of the model (1) starting from ¯ 1 and H ¯ 2 . We separately assume that survival assumptions on dependence properties of H 91 , C 92 belong to each of the families listed above and describe in each case the copulas C 9H¯ . corresponding implications holding for the survival copula C Proposition 1 91 , C 92 ∈ P i , for i = 1, 2, 5, then also C 9=C 91 G¯ ,G¯ C 92 ∈ P i . 1. If C + + 1 2 91 , C 92 ∈ P 3 and (12) holds, then also C 9=C 91 G¯ ,G¯ C 92 ∈ P 3 . 2. If C + + 1 2 Proof: 91 , C 92 ∈ P 1 We prove the first item of the proposition for i = 1. We suppose that C + and we recall that the PQD property of the survival copula of a pair of random variables (X, Y ) is equivalent with the PQD property of the joint survival function. Hence ¯ 1 (x)G ¯ 1 (y), ¯ 1 (x, y) ≥ G H
¯ 2 (x, y) ≥ G ¯ 2 (x)G ¯ 2 (y), H
∀ x, y ∈ R.
(18)
154
Relations among Aging and Stochastic Dependence
It follows that: ¯ ¯ 1 (x, y)H ¯ 2 (x, y) ≥ G ¯ 1 (x)G ¯ 1 (y)G ¯ 2 (x)G ¯ 2 (y) = G ¯ H¯ (x)G ¯ H¯ (y), H(x, y) = H
(19)
and this is equivalent to 9H¯ = C 91 G¯ ,G¯ C 92 ∈ P 1 . C + 1 2 2 91 , C 92 ∈ P+ 9H¯ (u, v) /u is a non-increasing and prove that C For i = 2 we assume that C function in u ∈ (0, 1] for all v ∈ [0, 1]. Let 0 < u < u ≤ 1 then
91 G¯ ,G¯ C 92 (u , v) 9H¯ (u , v) C C 2 1 = . u u
(20)
By definition of the operation G¯ 1 ,G¯ 2 it follows that: ## ## " " −1 # " −1 " " −1 # " −1 91 G ¯1 G 92 G ¯2 G ¯1 G ¯ ¯ (u ) , G ¯ ¯ (v) · C ¯2 G ¯ ¯ (u ) , G ¯ ¯ (v) 9H¯ (u , v) C C H H H H = ¯ −1 ¯ H¯ (G u G ¯ (u )) H ## ## " " −1 # " −1 " " −1 # " −1 92 G ¯1 G ¯ ¯ (u ) , G ¯ ¯ (v) ¯2 G ¯ ¯ (u ) , G ¯ ¯ (v) ¯1 G ¯2 G 91 G C C H H H H · = ¯ −1 ¯ −1 ¯ 1 (G ¯ 2 (G G G ¯ (u )) ¯ (u )) H
H
91 , C 92 ∈ P 2 , then we can write Taking into account the fact that C + # ## # ## " " −1 " −1 " " −1 " −1 92 G 91 G ¯1 G ¯2 G ¯1 G ¯ ¯ (u) , G ¯ ¯ (v) ¯2 G ¯ ¯ (u) , G ¯ ¯ (v) 9H¯ (u , v) C C C H H H H · ≤ ¯ −1 ¯ −1 ¯ 1 (G ¯ 2 (G u G G ¯ (u)) ¯ (u)) H H
=
9H¯ (u, v) C u
92 ∈ P 5 . For all x, y, x , y ∈ R, x ≤ x , y ≤ y , this is 91 , C For i = 5 we have that C + equivalent to: ¯ 1 (x, y)H ¯ 1 (x , y ) ≥ H ¯ 1 (x, y )H ¯ 1 (x , y) H ¯ 2 (x , y ) ≥ H ¯ 2 (x, y )H ¯ 2 (x , y) ¯ 2 (x, y)H H From (1) and the above two relations it follows that: ¯ ¯ , y) = H ¯ 1 (x, y)H ¯ 2 (x, y)H ¯ 1 (x , y )H ¯ 2 (x , y ) ≥ H(x, ¯ ¯ , y). H(x, y)H(x y )H(x For the proof of the second item of the proposition, taking into account (13), we can write for all 0 ≤ u < v ≤ 1 and 0 ≤ s ≤ 1 1 θ 1 1 θ θ 91 G¯ ,G¯ C 91 u θ+1 92 u θ+1 92 (us, v) = C C · s θ+1 , v θ+1 · C · s θ+1 , v θ+1 . 1 2
Generalized Marshall-Olkin Models - F. Spizzichino & F. Suter
155
91 , C 92 ∈ P 3 we have that From the assumption that C + 1 θ 1 1 θ θ 91 G¯ ,G¯ C 92 u θ+1 91 u θ+1 , v θ+1 92 (us, v) ≥ C θ+1 θ+1 · s θ+1 C · C · s , v 2 1 91 G¯ ,G¯ C 92 (u, vs). = C 1 2
2.2. Bivariate Aging Properties In this subsection we deal with sufficient conditions for some bivariate aging properties of bivariate models defined by (1). As was discussed in [3], some bivariate aging properties for a bivariate survival model are described by dependence properties for the corresponding function B. In this spirit, we have, for the aging function BH¯ of the model in (1), the following closure result: Proposition 2 1 1 then also BH¯ = B1 G¯ 1 ,G¯ 2 B2 ∈ P+ . 1. If B1 , B2 ∈ P+ 3 3 2. If B1 , B2 ∈ P+ then also BH¯ = B1 G¯ 1 ,G¯ 2 B2 ∈ P+ .
Proof: For the proof of the first item let u, v ∈ [0, 1]. Then 2 8 5 " # −1 ¯ ¯ B1 G¯ 1 ,G¯ 2 B2 (u, v) = exp −GH¯ Gi (− log Bi (u, v)) i=1 1 we obtain Using the fact that B1 , B2 ∈ P+ 2 5 8 # " −1 ¯¯ ¯ i (− log u · v) G B1 G¯ ,G¯ B2 (u, v) ≥ exp −G 1
2
H
i=1
# " ¯ H¯ (− log u · v) = u · v. ¯ −1 G = exp −G ¯ H To prove the second item we notice that for 0 ≤ v ≤ u ≤ 1, 0 ≤ s ≤ 1 we have "
5
#
B1 G¯ 1 ,G¯ 2 B2 (us, v) = exp
¯ −1 −G ¯ H
2
8 ¯ i (− log Bi (us, v)) G
i=1 3 we have Taking into account that B1 , B2 ∈ P+ 2 8 5 # " −1 ¯¯ ¯ i (− log Bi (u, vs)) G B1 G¯ ,G¯ B2 (us, v) ≥ exp −G 1
2
H
i=1
# = B1 G¯ 1 ,G¯ 2 B2 (u, vs) . "
So far in the present note we did not make any special assumptions concerning the marginals Ḡ_1, Ḡ_2. However, some further statements can be obtained by combining our results above with Propositions 5.2 and 5.3 in [3], where the authors prove some relations existing among dependence properties, bivariate aging properties and univariate aging properties for a same bivariate survival model. In particular we can obtain

Corollary 1
1. Let Ĉ_1, Ĉ_2 ∈ P_+^1 and Ḡ_1, Ḡ_2 be NBU. Then Ĉ_H̄ ∈ P_+^1, Ḡ_H̄ is NBU, and B_H̄ ∈ P_+^1.
2. Let Ĉ_1, Ĉ_2 ∈ P_+^2 and Ḡ_1, Ḡ_2 be IFR. Then also B_1, B_2 ∈ P_+^2.
3. Let B_1, B_2 ∈ P_+^1 and Ḡ_1, Ḡ_2 be NWU. Then B_H̄ ∈ P_+^1, Ḡ_H̄ is NWU, and Ĉ_H̄ ∈ P_+^1.
4. Let B_1, B_2 ∈ P_+^3 and Ḡ_1, Ḡ_2 be DFR. Then B_H̄ ∈ P_+^3, Ḡ_H̄ is DFR, and Ĉ_H̄ ∈ P_+^3.
Taking into account the cited results, the proof of the above Corollary is rather straightforward and will be omitted. Some other results in the same direction can also be obtained by replacing the conditions above with the corresponding dual conditions (e.g. by replacing, in 1., Ĉ_1, Ĉ_2 ∈ P_+^1 with Ĉ_1, Ĉ_2 ∈ P_−^1, Ḡ_1, Ḡ_2 NBU with Ḡ_1, Ḡ_2 NWU, Ḡ_H̄ NBU with Ḡ_H̄ NWU and, finally, B_H̄ ∈ P_+^1 with B_H̄ ∈ P_−^1) or by imposing conditions of the TP_2 type.

3. Discussion

The very well-known Marshall-Olkin model is a joint survival model that describes a particular situation of competing risks for two dependent units. This model is characterized by two independent risks acting independently on each unit and a (third) common risk acting on the two units simultaneously. From a probabilistic point of view, the risks are described by three exponential lifetimes with parameters, say, λ_1, λ_2, λ_12, respectively. This model is exchangeable when λ_1 = λ_2.

In this paper we generalized the exchangeable Marshall-Olkin model in the following way: instead of three risks, we considered two similar pairs of risks, not necessarily with exponential laws, each pair affecting one of the two units. Each pair consists of two independent risks. There is however a correlation between the homologous risks acting on the two different components, and this correlation is described by the exchangeable bivariate distributions H̄_1 and H̄_2.

As the Marshall-Olkin model has some special properties of positive dependence, we were interested, for our model, in some dependence and aging properties of the bivariate distribution H̄ of the failure times of the two units. We assumed that H̄_1 and H̄_2 have some dependence and aging properties and we analyzed how these properties reflect on the properties of H̄. We identified some conditions under which properties of the models H̄_1 and H̄_2 are preserved also for H̄.
Taking into account the way it was built, our generalized Marshall-Olkin model can be easily connected to a reliability experiment. However, the same model can find natural and interesting applications in the study of financial risk, in particular in the analysis of
Generalized Marshall-Olkin Models - F. Spizzichino & F. Suter
bankruptcy of twin companies in the same market. In this respect, it would be interesting to analyze the relations between the dependence analysis of H̄ traced here and situations of Default Contagion (see e.g. [7]) for the two companies. An extension to the case where H̄i (i = 1, 2) is not exchangeable could also be the subject of further research.
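The classical Marshall-Olkin construction that this paper generalizes is easy to simulate: each unit fails at the first arrival among its individual risk and the common risk. A minimal sketch, in which the rate values and function names are illustrative choices of ours, not taken from the paper:

```python
import random

def marshall_olkin_pair(lam1, lam2, lam12, rng):
    """Draw (T1, T2) from the Marshall-Olkin model: each unit fails at the
    first arrival of its own individual risk or of the common risk."""
    x1 = rng.expovariate(lam1)    # risk acting on unit 1 only
    x2 = rng.expovariate(lam2)    # risk acting on unit 2 only
    x12 = rng.expovariate(lam12)  # common risk acting on both units
    return min(x1, x12), min(x2, x12)

rng = random.Random(42)
# Exchangeable case: lam1 = lam2
sample = [marshall_olkin_pair(1.0, 1.0, 0.5, rng) for _ in range(50_000)]

# The common shock induces positive dependence: sample Cov(T1, T2) > 0.
m1 = sum(t1 for t1, _ in sample) / len(sample)
m2 = sum(t2 for _, t2 in sample) / len(sample)
cov = sum((t1 - m1) * (t2 - m2) for t1, t2 in sample) / len(sample)
print(cov > 0)
```

The empirical covariance being positive reflects the positive dependence property of the Marshall-Olkin model mentioned in the discussion.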
References
[1] R. E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, To Begin With, Silver Spring, MD, 1981.
[2] B. Bassan and F. Spizzichino, Dependence and multivariate aging: the role of level sets of the survival function, in Y. Hayakawa, T. Irony and M. Xie (Eds.), System and Bayesian Reliability, World Sci. Publ., River Edge, NJ, pp. 229–242, 2001.
[3] B. Bassan and F. Spizzichino, Relations among univariate aging, bivariate aging and dependence for exchangeable lifetimes, J. Multivariate Anal. 93 (2005), 313–339.
[4] T. Bedford and R. M. Cooke, Probabilistic Risk Analysis: Foundations and Methods, Cambridge University Press, 2001.
[5] M. J. Crowder, Classical Competing Risks, Chapman and Hall/CRC, 2001.
[6] A. W. Marshall and I. Olkin, A generalized bivariate exponential distribution, J. Appl. Probability 4 (1967), 291–302.
[7] A. J. McNeil, R. Frey and P. Embrechts, Quantitative Risk Management: Concepts, Techniques, and Tools, Princeton Series in Finance, Princeton University Press, Princeton, NJ, 2005.
[8] R. B. Nelsen, An Introduction to Copulas, Springer, New York, 1999.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
New Multivariate IFR and DMRL Notions for Exchangeable Dependent Components1
Félix BELZUNCE a,2, Julio MULERO b and José-María RUIZ b
a Dpto. Estadística e Investigación Operativa, Universidad de Murcia, Spain
b Dpto. Estadística e Investigación Operativa, Universidad de Murcia, Spain
Abstract. In the literature several authors have proposed multivariate extensions of univariate aging notions such as the IFR (increasing failure rate) and DMRL (decreasing mean residual life) notions. Bassan and Spizzichino [2] and Bassan, Kochar and Spizzichino [1] proposed new multivariate notions when the lifetimes of the components have exchangeable joint probability distributions. These new notions are based on stochastic comparisons of the residual lifetimes of the components and on definitions and characterizations of the IFR and DMRL notions in the univariate case. In this paper we consider new multivariate notions based on known characterizations (see Cao and Wang [7], Belzunce, Hu and Khaledi [4] and Belzunce, Gao, Hu and Pellerey [3]) of the IFR and DMRL notions. Some properties and preservation results under mixtures are also given.
Keywords. Increasing failure rate, decreasing mean residual life, aging, stochastic orders, exchangeable distributions, dependence
Introduction
In the literature several authors have proposed multivariate extensions of univariate aging notions such as IFR (increasing failure rate) and DMRL (decreasing mean residual life). Bassan and Spizzichino [2] and Bassan, Kochar and Spizzichino [1] proposed new multivariate notions when the lifetimes of the components have exchangeable joint probability distributions. These new notions are based on stochastic comparisons of the residual lifetimes of the components and on definitions and characterizations of the IFR and DMRL notions in the univariate case. In this paper we propose and study new multivariate aging notions for exchangeable and possibly dependent components based on other characterizations of the IFR and DMRL notions. In Section 1 we give the definitions and ideas on stochastic orders and bivariate aging notions on which our new notions are based. In Section 2 we give the definition and some properties of these new notions. We deal only with the bivariate case; the ideas given here can easily be extended to the multivariate case following [2]. Throughout the paper we assume that the random variables are non-negative.
1 Supported by Ministerio de Educación y Ciencia under Grant MTM2006-12834 and Fundación Séneca.
2 Corresponding Author: Dpto. Estadística e Investigación Operativa, Universidad de Murcia, Campus de Espinardo, 30100 Espinardo (Murcia), Spain; E-mail: [email protected].
New multivariate IFR and DMRL notions - F. Belzunce et al.
1. Preliminaries on Stochastic Orders and Characterizations of Aging Classes
First we recall the definitions of some stochastic orders that will be used in this paper. The reader can look at Shaked and Shanthikumar [9] for details on stochastic orders.
Definition 1. Given two random variables X and Y, with distribution functions F and G respectively and survival functions F̄ ≡ 1 − F and Ḡ ≡ 1 − G, we say that
a) X is smaller than Y in the stochastic order, denoted by X ≤st Y, if E[φ(X)] ≤ E[φ(Y)] for all increasing functions φ for which the expectations exist.
b) X is smaller than Y in the hazard rate order, denoted by X ≤hr Y, if F̄(x)Ḡ(y) ≥ F̄(y)Ḡ(x) for all x ≤ y.
c) X is smaller than Y in the mean residual life order, denoted by X ≤mrl Y, if

    ∫_x^{+∞} F̄(u) du / F̄(x) ≤ ∫_x^{+∞} Ḡ(u) du / Ḡ(x)   for all x.

d) X is smaller than Y in the increasing convex order, denoted by X ≤icx Y, if E[φ(X)] ≤ E[φ(Y)] for all increasing convex functions φ for which the expectations exist.
e) X is smaller than Y in the increasing concave order, denoted by X ≤icv Y, if E[φ(X)] ≤ E[φ(Y)] for all increasing concave functions φ for which the expectations exist.
f) X is smaller than Y in the Laplace transform order, denoted by X ≤Lt Y, if E[e^{−sX}] ≥ E[e^{−sY}] for all s ≥ 0.
Among these stochastic orders we have the following relationships:

    X ≤hr Y  →  X ≤st Y  →  X ≤icv Y  →  X ≤Lt Y
       ↓            ↓                        ↓
    X ≤mrl Y →  X ≤icx Y  →  E[X] ≤ E[Y]                (1)
The previous stochastic orders have been used to provide characterizations of the IFR and DMRL notions (see Belzunce and Shaked [6]). We recall that a non-negative random variable X, with survival function F̄, is said to be increasing failure rate (denoted by X or F IFR) if F̄(t + x)/F̄(t) is decreasing in t ≥ 0 for all x ≥ 0. A random variable X is said to be decreasing mean residual life (denoted by X or F DMRL) if ∫_t^{+∞} F̄(u) du / F̄(t) is decreasing in t. Given a random variable X, we also denote by Xt ≡ {X − t|X > t} the additional residual lifetime at time t. For these aging notions it is possible to find the following characterizations:
• X is IFR if and only if one of the following equivalent conditions holds:
i) {X − t|X > t} ≥st {X − t′|X > t′} for all t ≤ t′.
ii) {X − t|X > t} ≥hr {X − t′|X > t′} for all t ≤ t′.
Relations among Aging and Stochastic Dependence
• X is said to be DMRL if and only if one of the following equivalent conditions holds:
iii) E{X − t|X > t} ≥ E{X − t′|X > t′} for all t ≤ t′.
iv) {X − t|X > t} ≥mrl {X − t′|X > t′} for all t ≤ t′.
Bassan and Spizzichino [2] and Bassan, Kochar and Spizzichino [1] proposed extensions of these notions in the multivariate case when the lifetimes of the components have exchangeable joint probability distributions. A joint probability distribution F is said to be exchangeable if F is permutation-invariant. The starting point for their proposal is the following. Based on characterization i), it is possible to prove that, given two independent and identically distributed random lifetimes T1 and T2, T1 (and T2) is IFR if, and only if, for all t1 ≤ t2, {T1 − t1 |T1 > t1, T2 > t2} ≥st {T2 − t2 |T1 > t1, T2 > t2}. A similar characterization holds for the IFR notion when replacing the stochastic order by the hazard rate order (from ii)), and a similar one holds for the DMRL notion replacing the stochastic order by the comparison of expectations or by the mean residual life order (from iii) and iv)). Dropping independence, but keeping exchangeability, led Bassan and Spizzichino [2] and Bassan, Kochar and Spizzichino [1] to propose new multivariate notions of aging, as follows.
Definition 2. Let (T1, T2) be an exchangeable random vector; then
a) (T1, T2) is said to have a BIFR distribution if for t1 ≤ t2
{T1 − t1 |T1 > t1 , T2 > t2 } ≥st {T2 − t2 |T1 > t1 , T2 > t2 }. b) (T1 , T2 ) is said to have s-BIFR (in the strong sense) distribution if for t1 ≤ t2
{T1 − t1 |T1 > t1 , T2 > t2 } ≥hr {T2 − t2 |T1 > t1 , T2 > t2 }. c) (T1 , T2 ) is said to have w-BDMRL (in the weak sense) distribution if for t1 ≤ t2
E{T1 − t1 |T1 > t1 , T2 > t2 } ≥ E{T2 − t2 |T1 > t1 , T2 > t2 }. d) (T1 , T2 ) is said to have s-BDMRL (in the strong sense) distribution if for t1 ≤ t2
{T1 − t1 |T1 > t1 , T2 > t2 } ≥mrl {T2 − t2 |T1 > t1 , T2 > t2 }. Following these ideas we propose in the next section new bivariate aging notions based on other characterizations of the IFR and DMRL aging notions.
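The univariate IFR property underlying these definitions, namely that F̄(t + x)/F̄(t) is decreasing in t, can be checked numerically. A small sketch using Weibull survival functions; the shape values 2 and 0.5 are standard illustrative choices, not taken from the paper:

```python
import math

def residual_survival(surv, t, x):
    """P(X - t > x | X > t) = F̄(t + x) / F̄(t)."""
    return surv(t + x) / surv(t)

def weibull_surv(shape):
    # Weibull survival function F̄(t) = exp(-t^shape), scale fixed to 1
    return lambda t: math.exp(-t ** shape)

x = 1.0
ts = [0.1 * k for k in range(1, 30)]

# Shape 2 > 1: increasing failure rate, so residual survival decreases in t.
ifr = [residual_survival(weibull_surv(2.0), t, x) for t in ts]
print(all(a >= b for a, b in zip(ifr, ifr[1:])))   # True: IFR

# Shape 0.5 < 1: decreasing failure rate, residual survival increases in t.
dfr = [residual_survival(weibull_surv(0.5), t, x) for t in ts]
print(all(a <= b for a, b in zip(dfr, dfr[1:])))   # True: not IFR
```

For shape 2 the ratio equals exp(−(2tx + x²)), which is strictly decreasing in t, matching the IFR definition recalled above.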
2. New Bivariate Aging Notions
First we recall some other characterizations of the IFR and DMRL aging notions.
• X is IFR if, and only if, one of the following equivalent conditions holds:
v) {X − t|X > t} ≥icv {X − t′|X > t′} for all t ≤ t′ (see Belzunce, Hu and Khaledi [4]).
vi) {X − t|X > t} ≥Lt {X − t′|X > t′} for all t ≤ t′ (see Belzunce, Gao, Hu and Pellerey [3]).
• X is DMRL if, and only if,
vii) {X − t|X > t} ≥icx {X − t′|X > t′} for all t ≤ t′ (see Cao and Wang [7]).
Therefore, following Bassan and Spizzichino [2] and Bassan, Kochar and Spizzichino [1], and from characterizations v), vi) and vii), we propose the following new definitions.
Definition 3. Let (T1, T2) be an exchangeable random vector; then
a) (T1, T2) is said to have a w-BIFR (in the weak sense) distribution if for t1 ≤ t2 {T1 − t1 |T1 > t1, T2 > t2} ≥icv {T2 − t2 |T1 > t1, T2 > t2}.
b) (T1, T2) is said to have a BIFR(Lt) (in the Lt sense) distribution if for t1 ≤ t2 {T1 − t1 |T1 > t1, T2 > t2} ≥Lt {T2 − t2 |T1 > t1, T2 > t2}.
c) (T1, T2) is said to have a BDMRL distribution if for t1 ≤ t2 {T1 − t1 |T1 > t1, T2 > t2} ≥icx {T2 − t2 |T1 > t1, T2 > t2}.
Based on (1), we have the following relationships among the new bivariate aging notions and the previous ones:

    s-BIFR  →  BIFR  →  w-BIFR  →  BIFR(Lt)
       ↓         ↓                    ↓
    s-BDMRL →  BDMRL  →  w-BDMRL
Next we describe some sufficient conditions for the new notions. Theorem 1. Let (T1 , T2 ) be an exchangeable random vector. If a) {T1 − x|T1 > x, T2 > y} is increasing in y in the icx order, for all x, and b) {T1 |T2 > y} is DMRL for all y then (T1 , T2 ) is BDMRL. Proof. Let t1 ≤ t2 , then we have the following chain of implications: {T1 − t1 |T1 > t1 , T2 > t2 } ≥icx {T1 − t2 |T1 > t2 , T2 > t2 } ≥icx {T1 − t2 |T1 > t2 , T2 > t1 } = {T2 − t2 |T1 > t1 , T2 > t2 },
where the first inequality follows from b) and the characterization vii) of the DMRL aging notion. The second one follows from a) and the exchangeability of the components.
Under similar arguments we have the following results.
Theorem 2. Let (T1, T2) be an exchangeable random vector. If
a) {T1 − x|T1 > x, T2 > y} is increasing in y in the icv order, for all x, and
b) {T1 |T2 > y} is IFR for all y,
then (T1, T2) is w-BIFR.
Theorem 3. Let (T1, T2) be an exchangeable random vector. If
a) {T1 − x|T1 > x, T2 > y} is increasing in y in the Lt order, for all x, and
b) {T1 |T2 > y} is IFR for all y,
then (T1, T2) is BIFR(Lt).
Next, we provide some results about aging properties of the marginals under the new bivariate aging notions. First we recall the following univariate aging notions.
Definition 4. Let X be a non-negative random variable; then
a) X is said to be NBUC if X ≥icx {X − t|X > t} for all t > 0 (see Cao and Wang [7]).
b) X is said to be NBU(2) if X ≥icv {X − t|X > t} for all t > 0 (see Deshpande, Kochar and Singh [8]).
c) X is said to be NBULt if X ≥Lt {X − t|X > t} for all t > 0 (see Belzunce, Ortega and Ruiz [5]).
Now we can give the following results.
Theorem 4. Let (T1, T2) be an exchangeable non-negative random vector. If
a) (T1, T2) is BDMRL [w-BIFR, BIFR(Lt)], and
b) {T1 |T2 > y} ≤icx[icv;Lt] T1,
then T1 is NBUC [NBU(2), NBULt].
Proof. We give the proof for the NBUC case; the NBU(2) and NBULt cases are similar. Given that (T1, T2) is BDMRL and exchangeable, then, for t2 ≥ t1 ≥ 0, we have {T1 − t2 |T1 > t2, T2 > t1} ≤icx {T1 − t1 |T1 > t1, T2 > t2}. Now letting t1 = 0, we have the following chain of inequalities
{T1 − t2 |T1 > t2} ≤icx {T1 |T2 > t2} ≤icx T1,
where the second inequality follows from condition b); therefore T1 is NBUC.
To finish, we provide a result about preservation under mixtures. Let Θ be a random vector with support χ ⊆ Rn, and let Π be its distribution.
Let us consider that given Θ = θ, T1 and T2 are independent with common survival function G(u | θ), then the joint survival function of (T1 , T2 ), is exchangeable and is given by
    ∫_χ Ḡ(t1 | θ) Ḡ(t2 | θ) dΠ(θ).    (2)
Theorem 5. If (T1 | Θ = θ) (and (T2 | Θ = θ)) is DMRL for all θ ∈ χ, then (T1, T2) is BDMRL.
Proof. First we observe that, given two random variables X and Y with survival functions F̄ and Ḡ, X ≤icx Y if and only if ∫_x^{+∞} F̄(u) du ≤ ∫_x^{+∞} Ḡ(u) du for all x. Therefore, from the previous characterization, the result follows if we prove that

    ∫_{x+t1}^{∞} F̄(u, t2) du − ∫_{x+t2}^{∞} F̄(u, t1) du ≥ 0   for all t2 ≥ t1 ≥ 0 and all x ≥ 0.

Now from (2), and assuming that the conditions of Fubini's theorem hold, the previous inequality can be rewritten as

    ∫_χ [ Ḡ(t2 | θ) ∫_{x+t1}^{∞} Ḡ(u | θ) du − Ḡ(t1 | θ) ∫_{x+t2}^{∞} Ḡ(u | θ) du ] dΠ(θ) ≥ 0,

which follows from the hypothesis and from characterization vii).
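The mixture construction in (2) is easy to realize by simulation: draw Θ once, then draw T1 and T2 independently from Ḡ(· | θ). The sketch below uses an assumed Gamma mixing distribution with exponential conditionals, an illustrative choice of ours, and shows empirically that mixing induces dependence between the exchangeable lifetimes:

```python
import random

def mixed_exponential_pair(rng, a=6.0, b=1.0):
    """Given Θ = θ, with θ ~ Gamma(shape a, rate b), draw T1 and T2 i.i.d.
    exponential(θ). Unconditionally (T1, T2) is exchangeable with joint
    survival function of the mixture form (2)."""
    theta = rng.gammavariate(a, 1.0 / b)  # gammavariate takes shape, scale
    return rng.expovariate(theta), rng.expovariate(theta)

rng = random.Random(0)
pairs = [mixed_exponential_pair(rng) for _ in range(100_000)]

# Mixing over Θ induces positive dependence: here Cov(T1, T2)
# = E[1/Θ²] − (E[1/Θ])² = 1/20 − 1/25 = 0.01 > 0.
m1 = sum(p[0] for p in pairs) / len(pairs)
m2 = sum(p[1] for p in pairs) / len(pairs)
cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / len(pairs)
print(cov > 0)
```

Since each conditional exponential distribution is (weakly) DMRL, Theorem 5 applies to such mixtures.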
3. Discussion
In this paper we present some new definitions of bivariate aging notions (they can easily be extended to the multivariate case) for exchangeable random vectors that complement other definitions given previously in the literature. These bivariate aging notions extend some univariate aging notions in the sense of decreasing monotonicity of residual lives in the increasing concave order, the Laplace transform order and the increasing convex order. These new notions can be related to some positive dependence notions. For example, condition a) in Theorems 1, 2 and 3 can be considered a positive dependence notion, in the sense that high survival times of one component are associated with high residual lifetimes of the other component. Jointly with condition b), we have sufficient conditions for the new bivariate aging notions. Of course some work is still to be done. One major topic is to study these bivariate aging notions for some well-known families of copulas, such as Archimedean copulas.
References
[1] B. Bassan, S.C. Kochar and F. Spizzichino, Some bivariate notions of IFR and DMRL and related properties, Journal of Applied Probability 39 (2002), 533–544.
[2] B. Bassan and F. Spizzichino, Stochastic comparisons for residual lifetimes and Bayesian notions of multivariate aging, Advances in Applied Probability 31 (1999), 1078–1094.
[3] F. Belzunce, X. Gao, T. Hu and F. Pellerey, Characterization of the hazard rate order and IFR aging notion, Statistics & Probability Letters 70 (2004), 235–242.
[4] F. Belzunce, T. Hu and B.-E. Khaledi, Dispersion type variability orders, Probability in the Engineering and Informational Sciences 17 (2003), 305–334.
[5] F. Belzunce, E. Ortega and J.M. Ruiz, The Laplace order and ordering of residual lives, Statistics & Probability Letters 42, 145–156.
[6] F. Belzunce and M. Shaked, Stochastic orders and aging notions, in Encyclopedia of Statistics in Quality and Reliability, F. Ruggeri, R. Kenett and F.W. Faltin (Eds.), John Wiley & Sons Ltd, Chichester, UK, (2007), 1931–1935.
[7] J. Cao and Y. Wang, The NBUC and NBWUC classes of life distributions, Journal of Applied Probability 28 (1991), 473–479.
[8] J.V. Deshpande, S.C. Kochar and H. Singh, Aspects of positive aging, Journal of Applied Probability 23 (1986), 748–758.
[9] M. Shaked and J.G. Shanthikumar, Stochastic Orders, Springer, New York, 2007.
Advances in Bayesian Software Reliability Modeling
Fabrizio RUGGERI a,1 and Refik SOYER b
a CNR IMATI, Milano, Italy
b Department of Decision Sciences, The George Washington University, USA
Abstract. This paper reviews recent developments in Bayesian software reliability modeling. In so doing, emphasis is given to two models which can incorporate the case of reliability deterioration due to potential introduction of new bugs to the software during the development phase. Since the introduction of bugs is an unobservable process, latent variables are introduced to incorporate this characteristic into the models. The two models are based, respectively, on a hidden Markov model and a self-exciting point process with latent variables.
Keywords. Hidden Markov models, reliability growth, Markov chain Monte Carlo, self-exciting point process
Introduction
Many papers have been published on software reliability during the last several decades; see Jelinski and Moranda [5] and Musa and Okumoto [8] as examples of early work. Bayesian methods have been widely used in this field, as discussed in Singpurwalla and Wilson [10]. In this paper we review some of the Bayesian models introduced recently, focusing especially on our ongoing research. We present two models that are motivated by the potential introduction of new bugs to the software when fixing the current ones. The first model, based on a hidden Markov chain, assumes that times between failures are exponentially distributed with parameters depending on an unknown latent state variable which, in turn, evolves as a Markov chain. The second model considers a self-exciting point process whose intensity might increase each time a bug is attempted to be fixed. Unobserved outcomes of latent Bernoulli random variables are introduced to model the possible introduction of new bugs and the consequent increase in the intensity function of the process. Both models take into account the possibility of not knowing whether a new bug has been added at each stage, and they can be applied not only to model the failure process but also to infer whether new bugs were introduced at different testing stages.
In Section 1 we will review some of the earlier work on potential introduction of new bugs to the software during the debugging phase. In Section 2 we will describe the hidden Markov model (HMM) and will apply it to Jelinski and Moranda's Naval Tactical data and Musa's System 1 data in Section 3. The self-exciting point process
1 Corresponding Author: Fabrizio Ruggeri, CNR IMATI, Via Bassini 15, I-20133 Milano, Italy; E-mail:
[email protected].
Theoretical Advances in Modeling, Inference and Computation
(SEP) with latent variables will be described in Section 4. Discussion on current research will be presented in Section 5.
1. Earlier Work on Imperfect Debugging
Although the models introduced in this paper are novel, the possibility of imperfect debugging and the introduction of new bugs during software testing have been considered in earlier papers. Here we review some of them.
Gaudoin, Lavergne and Soler [4] considered failures at times T1 < . . . < Tn and modeled the interfailure times with independent exponential distributions. In particular, they took Ti − Ti−1 ∼ E(λi), i = 1, . . . , n, with

    λi+1 = λi e^{−θi},    (1)

where λi and θi, i = 1, . . . , n, are nonnegative. From (1), it is clear that the parameter θi plays a relevant role in describing the effect of the intervention during software testing. If θi = 0, then there is no debugging effect on software reliability, which increases (decreases) if θi > 0 (θi < 0). The latter case is due to the introduction of new bugs to the software. A slightly modified version of this model was proposed by Gaudoin [3], who considered λi+1 = (1 − αi − βi)λi + μβi for modeling the more realistic case where intervention at each stage may introduce new bugs while fixing the existing ones at the same time. The effect of the positive intervention is modeled by α, whereas β is used for the negative one.
A different model to address the same issue was originally proposed by Kremer [6], who considered a birth-death process X(t) denoting the number of bugs in the software at time t. Starting with X(0) = a, pn(t) = Pr{X(t) = n} is obtained as the solution of the differential equation

    p′n(t) = (n − 1)ν(t)pn−1(t) − n[ν(t) + μ(t)]pn(t) + (n + 1)μ(t)pn+1(t), n ≥ 0,

with p−1 ≡ 0 and pn(0) = 1(n = a), where 1(·) is the indicator function. Here ν(t) (birth rate) and μ(t) (death rate) denote, respectively, the rate of introduction of new bugs and the rate of fixing of old ones.
More recently, Durand and Gaudoin [2] considered a hidden Markov model similar to the one we introduce in Section 2, but they took a non-Bayesian approach and used an EM algorithm to obtain maximum likelihood estimates. They applied the Bayesian information criterion (BIC) to choose among models with different numbers of states of the hidden process.
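Model (1) is straightforward to simulate. The following sketch draws interfailure times under a hypothetical debugging history; all numerical values are our own illustrative choices, not from the cited papers:

```python
import math
import random

def simulate_interfailure_times(lam0, thetas, rng):
    """Interfailure times under model (1) of Gaudoin, Lavergne and Soler:
    T_i − T_{i−1} ~ E(λ_i) with λ_{i+1} = λ_i · exp(−θ_i)."""
    rates = [lam0]
    for th in thetas:  # th > 0: effective fix; th < 0: a new bug raises the rate
        rates.append(rates[-1] * math.exp(-th))
    return [rng.expovariate(lam) for lam in rates]

rng = random.Random(1)
# Hypothetical history: four effective fixes and one bad fix (θ = −0.5)
times = simulate_interfailure_times(2.0, [0.3, 0.3, -0.5, 0.3, 0.3], rng)
print(len(times))  # 6 interfailure times, one per failure rate λ_1, ..., λ_6
```

Mostly positive θ values produce a tendency toward longer interfailure times, i.e. reliability growth, while each negative θ pushes the failure rate back up.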
Advances in Bayesian Software Reliability Modeling - F. Ruggeri & R. Soyer
2. A Hidden Markov Model for Software Failures
We assume that, during the testing stages, the failure rate of the software is governed by a latent process Y. Let Yt denote the state of the latent process at time t and, given that the state at time t is i, assume that Xt, the failure time for period t, follows an exponential model given by Xt |Yt = i ∼ E(λ(i)). The states of the latent process reflect the effectiveness of the interventions, i.e. the design changes, made to the software prior to the t-th stage of testing. The failure rate of the software depends on this latent random variable. We assume that the latent process Y = {Yt : t ≥ 1} is a Markov chain with a transition matrix P on a finite state space E = {1, . . . , k}. Given the latent process, we assume that the Xt's are conditionally independent, that is,

    π(X1, X2, . . . , Xn |Y) = ∏_{t=1}^{n} π(Xt |Y).

In the Bayesian setup we assume that the transition matrix P and the failure rates λ(i), i = 1, . . . , k, are all unknown quantities. For the components of the transition matrix, it is assumed that Pi = (Pi1, . . . , Pik), i = 1, . . . , k, i.e. the i-th row of P, follows a Dirichlet distribution Dir(αi1, . . . , αik), as

    π(Pi) ∝ ∏_{j=1}^{k} Pij^{αij − 1},    (2)

with parameters αij, i, j = 1, . . . , k, and such that the Pi's are independent of each other. For a given state i = 1, . . . , k, we assume a Gamma prior λ(i) ∼ G(a(i), b(i)), with independent λ(i)'s. If software failures are observed for n testing stages, then, given the observed data x(n) = (x1, x2, . . . , xn), we are interested in the joint posterior distribution of all unknown quantities Θ = (λ, P, Y(n)), where λ = (λ(1), . . . , λ(k)) and Y(n) = (Y1, . . . , Yn). It is not computationally feasible to evaluate the joint posterior distribution of Θ in closed form. However, we can use a Gibbs sampler to draw samples from the joint posterior distribution. The likelihood function is

    L(Θ; x(n)) = ∏_{t=1}^{n} λ(Yt) e^{−λ(Yt) xt}

and the posterior distribution is given by

    π(Θ|x(n)) ∝ ∏_{t=1}^{n} PYt−1,Yt λ(Yt) e^{−λ(Yt) xt} · ∏_{i=1}^{k} π(Pi) [λ(i)]^{a(i)−1} e^{−b(i)λ(i)},
where π(Pi) is given by (2). The implementation of the Gibbs sampler requires draws from the full conditional distributions of the unknown quantities, that is, the components of Θ. We first note that, given Y(n), the full conditional distribution of the elements of P can be obtained as

    Pi |Y(n) ∼ Dir{αij + ∑_{t=1}^{n} 1(Yt = i, Yt+1 = j); j ∈ E},    (3)

where 1(·) is the indicator function and, given Y(n), the Pi's are obtained as independent Dirichlet vectors. Given Y(n), they are also independent of the other components of Θ. The full conditional posterior distribution of the λ(i)'s can be obtained as

    λ(i)|Y(n), x(n) ∼ G(a∗(i), b∗(i)),    (4)

where

    a∗(i) = a(i) + ∑_{t=1}^{n} 1(Yt = i)   and   b∗(i) = b(i) + ∑_{t=1}^{n} 1(Yt = i) xt.

Finally, we can show that the full conditional posterior distributions of the Yt's are given by

    π(Yt |Y(−t), λ(Yt), x(n), P) ∝ PYt−1,Yt λ(Yt) e^{−λ(Yt) xt} PYt,Yt+1,    (5)

where Y(−t) = {Ys; s ≠ t}. Note that the above is a discrete distribution with constant of proportionality given by

    ∑_{j∈E} PYt−1,j λ(j) e^{−λ(j) xt} Pj,Yt+1.
Thus, we can draw a posterior sample from π(Θ|x(n)) by iteratively drawing from the given full conditional posterior distributions. If we start with an initial value of the states, say Y0(n), then we can update the probability transition matrix via (3). Then, given the data and Y0(n), we can draw the failure rates independently using (4). Given these values, we can use (5) to draw a new sample for the states. We can repeat these iterations many times to obtain a joint posterior sample. The posterior predictive distribution of Xn+1, after observing x(n), is given by

    π(Xn+1 |x(n)) = ∫ ∑_{j∈E} π(Xn+1 |λ(j)) PYn,j π(Θ|x(n)) dΘ,

which can be approximated as a Monte Carlo integral via

    π(Xn+1 |x(n)) ≈ (1/G) ∑_{g=1}^{G} π(Xn+1 |λ^g(Y^g_{n+1})),
where Y^g_{n+1} is sampled given the posterior sample Y^g_n, using the Dirichlet probabilities P_{Y^g_n} given by (3).
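The full conditionals (3), (4) and (5) translate directly into code. The following is a minimal self-contained Gibbs sampler sketch for the two-state exponential HMM, run on synthetic data; the synthetic rates (5 and 0.5), hyperparameters and iteration count are our own illustrative choices, not those used by the authors:

```python
import math
import random

def gibbs_hmm(x, k=2, a=0.01, b=0.01, alpha=1.0, iters=500, seed=0):
    """Gibbs sampler iterating the full conditionals sketched in (3)-(5):
    Dirichlet rows of P, gamma failure rates, and discrete latent states."""
    rng = random.Random(seed)
    n = len(x)
    y = [rng.randrange(k) for _ in range(n)]      # initial latent states
    lam = [1.0] * k
    P = [[1.0 / k] * k for _ in range(k)]
    lam_draws = []
    for _ in range(iters):
        # (3): rows of P given the state path (Dirichlet via normalized gammas)
        for i in range(k):
            counts = [alpha] * k
            for t in range(n - 1):
                if y[t] == i:
                    counts[y[t + 1]] += 1
            g = [rng.gammavariate(c, 1.0) for c in counts]
            s = sum(g)
            P[i] = [v / s for v in g]
        # (4): lam(i) | y, x ~ Gamma(a + n_i, b + sum of x_t with y_t = i)
        for i in range(k):
            n_i = sum(1 for t in range(n) if y[t] == i)
            s_i = sum(x[t] for t in range(n) if y[t] == i)
            lam[i] = rng.gammavariate(a + n_i, 1.0 / (b + s_i))
        # (5): each state given its neighbours, rate and observation
        # (uniform initial-state distribution assumed for t = 0)
        for t in range(n):
            w = []
            for j in range(k):
                p = lam[j] * math.exp(-lam[j] * x[t])
                if t > 0:
                    p *= P[y[t - 1]][j]
                if t < n - 1:
                    p *= P[j][y[t + 1]]
                w.append(p)
            u, c = rng.random() * sum(w), 0.0
            for j in range(k):
                c += w[j]
                if u <= c:
                    y[t] = j
                    break
        lam_draws.append(tuple(lam))
    return lam_draws

# Synthetic data: short failure times first, long ones later
rng = random.Random(42)
data = [rng.expovariate(5.0) for _ in range(30)] + \
       [rng.expovariate(0.5) for _ in range(30)]
draws = gibbs_hmm(data)
post = [sum(d) / len(draws) for d in zip(*draws)]  # posterior mean of each λ
print(sorted(post))  # one rate per latent environment
```

With data generated from two well-separated regimes, the posterior means of the two rates should straddle the two generating values, up to label switching.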
3. Analysis of Software Reliability Data
We next illustrate the use of the HMM by applying it to two well-known datasets: Jelinski and Moranda's Naval Tactical data and Musa's System 1 data.
3.1. Jelinski-Moranda data
The data, presented in Jelinski and Moranda [5], consist of 34 failure times (in days) of a large military system and are referred to as the Naval Tactical Data System (NTDS). In the analysis of the NTDS data, we consider two possible states for Yt, i.e. E = {1, 2}, and assume uniform distributions for the rows Pi, i = 1, 2, of the transition matrix. We describe uncertainty about the λ's by considering diffuse priors λ(i) ∼ G(0.01, 0.01), i = 1, 2. The Gibbs sampler was run for 5000 iterations and no convergence problems were observed. In what follows we present the posterior results for the major quantities of interest, as illustrated by plots and tables.
Figure 1. Posterior distributions of λ(1) and λ(2).
In Figure 1 we present the posterior distributions of λ(1) and λ(2). As can be seen from Figure 1, the posterior distribution of λ(1) is concentrated at higher values than that of λ(2), implying that environment 1 is the less desirable of the two environments. In other words, it represents the environment with higher failure rates and smaller expected times to failure. Posterior distributions of the transition probabilities are presented in Figure 2. We can see from Figure 2 that the process Yt tends to stay in environment 1 (compared to environment 2) from one testing stage to the next. This is implied by the posterior distribution of P11, which is concentrated around values higher than 0.6.

Figure 2. Posterior distributions of transition probabilities.

The posterior predictive distribution of the next time to failure, that is, the distribution of X35, is shown in Figure 3. As we can see from the predictive density, the next time to failure is expected within a few days.
Figure 3. Predictive distribution of 35-th observation.
Table 1 presents the posterior probabilities of environment 1 for time periods t = 1, . . . , 34, as well as the observed times to failure for those periods. As we can see from the table, the posterior probability of the "bad" environment (i.e. environment 1) decreases as we observe longer failure times.
Table 1. Posterior probabilities of state 1 over time.

 t   Xt   P(Yt=1|D)     t   Xt   P(Yt=1|D)     t   Xt   P(Yt=1|D)
 1    9    0.8486       2   12    0.8846       3   11    0.9272
 4    4    0.9740       5    7    0.9792       6    2    0.9874
 7    5    0.9810       8    8    0.9706       9    5    0.9790
10    7    0.9790      11    1    0.9868      12    6    0.9812
13    1    0.9872      14    9    0.9696      15    4    0.9850
16    1    0.9900      17    3    0.9886      18    3    0.9858
19    6    0.9714      20    1    0.9584      21   11    0.7100
22   33    0.2036      23    7    0.3318      24   91    0.0018
25    2    0.6012      26    1    0.6104      27   87    0.0020
28   47    0.0202      29   12    0.2788      30    9    0.2994
31  135    0.0006      32  258    0.0002      33   16    0.1464
34   35    0.0794
3.2. Musa’s System 1 data We next consider the System 1 data of Musa [7] which consists of 136 software failure times. As in the case of the Jelinski-Moranda data, we consider only two states for Yt , and assume uniform distributions for the row vectors Pi of the transition matrix, and the same diffuse gamma distributions for the λ’s. As before 5000 iterations of the Gibbs sampler was run and this led to convergence for all the quantities. The posterior analysis for the major quantities of interest will be presented in the sequel using few plots.
Figure 4. Failure times.
From Figure 4, we can see that the times between failures tend to increase over time, implying an overall reliability growth. The posterior distributions of λ(1) and λ(2) are presented in Figure 5. We can see from Figure 5 that the posterior distribution of λ(1) is concentrated around lower values than that of λ(2). Thus environment 1 is the more desirable of the two environments; that is, it represents the environment with smaller failure rates and larger expected times to failure. In Figure 6 we present the posterior distributions of the transition probabilities. We can see from the figure that the process Yt tends to stay in the same state from one testing stage to the next. The posterior predictive distribution of the next time to failure, that is, the distribution of X137, is shown in Figure 7. As can be seen from the figure, the time to the next failure in this case has more variability than the one for the Jelinski-Moranda data shown in Figure 3.
Figure 5. Posterior distributions of λ(1) and λ(2).
Figure 6. Posterior distributions of transition probabilities.
In Figure 8 we present the posterior probabilities P(Yt = 1|D) for the "good" environment, that is, for environment 1, for time periods t = 1, . . . , 136.

Figure 7. Predictive distribution of 137-th observation.

Figure 8. Posterior probability of Yt = 1.

As we can see from the figure, the posterior probability is rather low for most of the first 80 testing stages, implying that modifications made to the software during these stages did not improve reliability from one period to the next. On the other hand, the posterior probabilities for environment 1 stay above 0.85 for most of the later stages, reflecting the improvement in reliability achieved there. We note that, as in the case of the Jelinski-Moranda data, the higher posterior probabilities in Figure 8 are associated with the longer failure times shown in Figure 4.
Theoretical Advances in Modeling, Inference and Computation
4. Self-exciting Point Process with Latent Variables

Self-exciting point processes have an important role in software reliability since they can be used to unify existing models into a unique class, as shown by Chen and Singpurwalla [1]. In this section, we consider a self-exciting process with latent variables that enables us to infer whether a new bug has been introduced at each testing stage and the process intensity has increased.

We consider a non-homogeneous Poisson process (NHPP) with intensity function μ(t) to describe the behavior of the software when no bugs are added at each testing phase. We assume that the intensity is modified by positive valued functions g(t − t_i) at each testing phase i as a consequence of the introduction of a new bug. We introduce Bernoulli random variables Z_j to describe the introduction of a new bug during the j-th testing phase. As a consequence we consider a self-exciting point process (SEP) with latent variables, with intensity

λ(t) = μ(t) + Σ_{j=1}^{N(t−)} Z_j g_j(t − t_j),
where μ(t) is the intensity of the process without introduction of new bugs, N(t−) is the number of failures right before t, and t_1 < t_2 < . . . < t_n are the failure times in (0, T]. The latent variable Z_j = 1 if a bug is introduced after the j-th failure and Z_j = 0 otherwise, and the function g_j(u) ≥ 0 for u > 0 and g_j(u) = 0 otherwise. Under these assumptions the likelihood function is given by L(θ; t^(n), Z^(n)) = f(t^(n)|Z^(n), θ) f(Z^(n)|θ), where t^(n) = (t_1, t_2, . . . , t_n) and Z^(n) = (Z_1, Z_2, . . . , Z_n), with

f(t^(n)|Z^(n), θ) = ∏_{i=1}^{n} λ(t_i) e^{−∫_0^T λ(t)dt}

= ∏_{i=1}^{n} [ μ(t_i) + Σ_{j=1}^{i−1} Z_j g_j(t_i − t_j) ] exp( −∫_0^T μ(t)dt − Σ_{j=1}^{N(T−)} Z_j ∫_0^{T−t_j} g_j(t)dt ),
and the dependence on θ is suppressed. In our analysis we consider the Power Law process (PLP) with intensity function μ(t) = Mβt^{β−1}, with M > 0 and β > 0. We also assume that g_j ≡ μ for all j, i.e. the contribution of each new bug is represented by the same PLP as the baseline process. In this case we obtain

f(t^(n)|Z^(n), θ) = M^n β^n ∏_{i=1}^{n} [ t_i^{β−1} + Σ_{j=1}^{i−1} Z_j (t_i − t_j)^{β−1} ] e^{−M[ T^β + Σ_{j=1}^{N(T−)} Z_j (T−t_j)^β ]}

= M^n β^n ∏_{i=1}^{n} A_i(β, Z^{(i−1)}) e^{−M B(β, Z^{(n)})},

where Z^{(i)} = (Z_1, . . . , Z_i), A_i(β, Z^{(i−1)}) = t_i^{β−1} + Σ_{j=1}^{i−1} Z_j (t_i − t_j)^{β−1}, and B(β, Z^{(n)}) = T^β + Σ_{j=1}^{N(T−)} Z_j (T − t_j)^β. Considering Z_j ∼ Bern(p_j) for all j, it then follows that
f(t^(n), Z^(n)|θ) = f(t^(n)|Z^(n), θ) f(Z^(n)|θ) = f(t^(n)|Z^(n), θ) ∏_{j=1}^{n} p_j^{Z_j} (1 − p_j)^{1−Z_j}.
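Numerically, the factorized likelihood above is most conveniently evaluated on the log scale. The following sketch is ours, not from the paper (the function name and data layout are assumptions); it computes log f(t^(n)|Z^(n), θ) = n log M + n log β + Σ_i log A_i − M B for the PLP choice g_j ≡ μ:

```python
import math

def sep_plp_loglik(t, Z, M, beta, T):
    """Log-likelihood of the SEP model with PLP baseline:
    log f = n log M + n log beta + sum_i log A_i(beta, Z) - M * B(beta, Z)."""
    n = len(t)
    loglik = n * math.log(M) + n * math.log(beta)
    for i in range(n):
        # A_i = t_i^{beta-1} + sum_{j<i} Z_j (t_i - t_j)^{beta-1}
        A_i = t[i] ** (beta - 1) + sum(
            Z[j] * (t[i] - t[j]) ** (beta - 1) for j in range(i))
        loglik += math.log(A_i)
    # B = T^beta + sum_j Z_j (T - t_j)^beta
    B = T ** beta + sum(Z[j] * (T - t[j]) ** beta for j in range(n))
    return loglik - M * B
```

With β = 1 and all Z_j = 0 the model reduces to a homogeneous Poisson process with rate M, so the log-likelihood becomes n log M − M T, which gives a quick sanity check on the implementation.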
Given the likelihood function, the two plausible strategies are either summing over all Z^(n), so that f(t^(n)|θ) can be obtained, or treating the Z_j's as parameters and using MCMC methods. We follow the latter approach. We assume the prior distributions M ∼ G(α, δ), β ∼ G(ρ, λ) and p_j ∼ Beta(μ_j, σ_j), for all j. Other possibilities for the p_j would be an autoregressive model based on logit(p_j), a more general Markov chain, or a common distribution Beta(μ, σ) for all j. We define p^(n) = (p_1, . . . , p_n), p_{−j} = (p_1, . . . , p_{j−1}, p_{j+1}, . . . , p_n) and Z_{−j} = (Z_1, . . . , Z_{j−1}, Z_{j+1}, . . . , Z_n). Also, we suppress the dependence on t^(n). The full posterior conditionals are given by

• M | β, Z^(n), p^(n) ∼ G(α + n, δ + B(β, Z^(n)))
• β | M, Z^(n), p^(n) ∝ β^{ρ+n−1} ∏_{i=1}^{n} A_i(β, Z^{(i−1)}) e^{−M B(β, Z^{(n)}) − λβ}
• p_j | M, β, Z^(n), p_{−j} ∼ Beta(μ_j + Z_j, σ_j + (1 − Z_j)), ∀j

It follows from the above that

P(Z_j = r | M, β, p^(n), Z_{−j}) = C_r / (C_0 + C_1),   r = 0, 1,

with

C_0 = ∏_{i=j+1}^{n} [ t_i^{β−1} + Σ_{h=1, h≠j}^{i−1} Z_h (t_i − t_h)^{β−1} ]

and

C_1 = e^{−M(T − t_j)^β} ∏_{i=j+1}^{n} [ t_i^{β−1} + Σ_{h=1, h≠j}^{i−1} Z_h (t_i − t_h)^{β−1} + (t_i − t_j)^{β−1} ].
Thus, we can draw a posterior sample from the joint distribution by iteratively drawing from the given full conditional posterior distributions.
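The iterative scheme just described can be sketched in code. The implementation below is our own illustration, not the authors' code: it assumes shape-rate Gamma priors, a common Beta(μ, σ) prior for the p_j, a random-walk Metropolis step for β (whose full conditional is not of standard form; the paper does not specify how it is sampled), and updates each Z_j from a Bernoulli whose odds combine the likelihood ratio C_1/C_0 with the prior odds p_j/(1 − p_j):

```python
import math
import random

def gibbs_sep(t, T, n_iter=200, alpha=1.0, delta=1.0, rho=1.0, lam=1.0,
              mu=1.0, sigma=1.0, seed=1):
    """Sketch of a Gibbs sampler for (M, beta, Z, p) in the SEP/PLP model.
    M and the p_j have conjugate updates; beta gets a Metropolis step;
    each Z_j is drawn from its Bernoulli full conditional."""
    rng = random.Random(seed)
    n = len(t)
    M, beta = 1.0, 1.0
    Z, p = [0] * n, [0.5] * n
    out = []

    def B(b):  # B(beta, Z) = T^b + sum_j Z_j (T - t_j)^b
        return T ** b + sum(Z[j] * (T - t[j]) ** b for j in range(n))

    def log_t_lik(b, m_):  # log f(t | Z, M, beta)
        s = n * math.log(m_) + n * math.log(b) - m_ * B(b)
        for i in range(n):
            s += math.log(t[i] ** (b - 1) + sum(
                Z[j] * (t[i] - t[j]) ** (b - 1) for j in range(i)))
        return s

    for _ in range(n_iter):
        # M | beta, Z ~ Gamma(alpha + n, rate = delta + B); gammavariate takes a scale
        M = rng.gammavariate(alpha + n, 1.0 / (delta + B(beta)))
        # beta | rest: density prop. to beta^{rho-1} e^{-lam beta} times the likelihood
        prop = beta + rng.gauss(0.0, 0.1)
        if prop > 0:
            log_ratio = ((rho - 1) * (math.log(prop) - math.log(beta))
                         - lam * (prop - beta)
                         + log_t_lik(prop, M) - log_t_lik(beta, M))
            if math.log(rng.random()) < log_ratio:
                beta = prop
        # Z_j | rest: Bernoulli with odds = likelihood ratio * prior odds
        for j in range(n):
            Z[j] = 0
            l0 = log_t_lik(beta, M) + math.log(1.0 - p[j])
            Z[j] = 1
            l1 = log_t_lik(beta, M) + math.log(p[j])
            Z[j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(l0 - l1)) else 0
        # p_j | Z_j ~ Beta(mu + Z_j, sigma + 1 - Z_j)
        for j in range(n):
            p[j] = rng.betavariate(mu + Z[j], sigma + 1 - Z[j])
        out.append((M, beta, tuple(Z)))
    return out
```

The posterior probability that a bug was introduced after the j-th failure can then be estimated as the average of the sampled Z_j over iterations (after discarding a burn-in period).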
5. Discussion

Possible extensions of the above models are currently under consideration. For example, in the HMM the dimension of the state space of the Markov chain will typically be unknown, and this can be incorporated into the model as another random quantity. Other possible extensions include a dynamic evolution of the λ(i)'s and a non-homogeneous Markov chain for the states of the latent process Yt. Specification of a prior distribution for the initial environment Y0, which has been assumed as given here, and estimation of the stationary distribution of the Markov chain are other issues under consideration.

Regarding the SEP model, we are aware that the PLP is not the most appropriate choice in this context and that other NHPPs with finite intensities should be explored. Therefore, we plan to consider different baseline processes, possibly in the family of the NHPPs whose intensity function can be written as μ(t) = M g(t; β). Posterior analysis under these NHPPs is very similar to that obtained with the PLP, as discussed in Ruggeri and Sivaganesan [9]. An alternative class of models to the one considered here is NHPPs with change points, also discussed in Ruggeri and Sivaganesan [9]. Other considerations include analysis with different actual software failure data sets and the development of optimal testing policies.
References

[1] Y. Chen and N. D. Singpurwalla, Unification of software reliability models via self-exciting point processes, Advances in Applied Probability 29 (1997), 337–352.
[2] J.-B. Durand and O. Gaudoin, Software reliability modeling and prediction with hidden Markov chains, Statistical Modeling 5 (2005), 75–93.
[3] O. Gaudoin, Software reliability models with two debugging rates, International Journal of Reliability, Quality and Safety 6 (1999), 31–42.
[4] O. Gaudoin, C. Lavergne and J. L. Soler, A generalized geometric de-eutrophication software-reliability model, IEEE Transactions on Reliability R-44 (1994), 536–541.
[5] Z. Jelinski and P. Moranda, Software reliability research, in: Statistical Computer Performance Evaluation, W. Freiberger (Ed.), Academic Press, New York, 1972.
[6] W. Kremer, Birth-death and bug counting, IEEE Transactions on Reliability R-32 (1983), 37–46.
[7] J. D. Musa, Software reliability data, Technical Report, Rome Air Development Center, 1979.
[8] J. D. Musa and K. Okumoto, A logarithmic Poisson execution time model for software reliability measurement, Proceedings of the Seventh International Conference on Software Engineering, 1984, 230–237.
[9] F. Ruggeri and S. Sivaganesan, On modeling change points in nonhomogeneous Poisson processes, Statistical Inference for Stochastic Processes 8 (2005), 311–329.
[10] N. D. Singpurwalla and S. Wilson, Statistical Methods in Software Engineering, Springer-Verlag, New York, 1999.
Signed Domination of Oriented Matroid Systems

Arne Bang HUSEBY 1
Department of Mathematics, University of Oslo, Norway

Abstract. The domination function has played an important part in reliability theory. While most of the work in this field has been restricted to various types of network system models, many of the results can be generalized to much wider families of systems associated with matroids. Previous papers have explored the relation between undirected network systems and matroids. In this paper the main focus is on directed network systems and oriented matroids. Classical results for directed network systems include the fact that the signed domination is either +1 or −1 if the network is acyclic, and zero otherwise. It turns out that these results can be generalized to systems derived from oriented matroids. Several classes of such systems will be discussed.

Keywords. Reliability, directed networks, oriented matroids
Introduction

The domination function has played an important part in reliability theory. Classical references on this topic are [11] and [12]. More recent work in this area related to the present paper includes [4] and [5]. Most of the work in the field has been restricted to various types of network system models. However, many of the results can be generalized to much wider families of systems associated with matroids. Previous papers, e.g., [6], [7], [8], and [10], have explored the relation between undirected network systems and matroids. In this paper we focus on directed network systems and oriented matroids.
1. Basic Concepts

We start out by reviewing the basic concepts of reliability theory (see [1]). A binary monotone system is an ordered pair (E, φ) where E = {1, . . . , n} is a nonempty finite set, and φ is a binary nondecreasing function defined for all binary vectors X = (X_1, . . . , X_n). The elements of E are interpreted as components of some technological system. Each component can be either functioning or failed. The vector X is referred to as the component state vector. That is, for all i ∈ E, X_i = 1 if the i-th component is functioning and zero otherwise.

1 Corresponding Author: Dept. of Mathematics, University of Oslo, P.O. Box 1053 Blindern, N-0316 Oslo, Norway; E-mail: [email protected].

The function φ is called the structure function of the
system and represents the system state as a function of the component states. That is, φ = φ(X) = 1 if the system is functioning and zero otherwise. A minimal path set of a binary monotone system (E, φ) is a minimal subset P ⊆ E such that if X_i = 1 for all i ∈ P , and zero otherwise, then φ(X) = 1. It is well-known (see [1]) that the structure function of a binary monotone system is always multilinear. That is, it can be written in the following form:

φ(X) = Σ_{A⊆E} δ(A) ∏_{i∈A} X_i

The function δ, defined for all subsets A ⊆ E, is called the signed domination function of the system. The system reliability can also be expressed in terms of the signed domination function as:

Pr(φ(X) = 1) = E[φ(X)] = Σ_{A⊆E} δ(A) E[ ∏_{i∈A} X_i ]   (1)
Thus, we see that both the structure function and the system reliability are uniquely determined by the signed domination function. Since the number of terms in the right-hand sum in (1) is 2^n, this formula may be very slow to compute. Fortunately, however, many systems have signed domination functions where δ(A) is zero for a large number of sets. This may simplify the calculations considerably. Formula (1) is of particular interest in the study of directed network systems. Such a system is illustrated in Figure 1. The components of the system are the edges, labeled 1, 2, . . . , 7. The system is said to be functioning if there exists a directed path of functioning edges from the source s to the terminal t. If (E, φ) is a directed network
Figure 1. An acyclic directed network
system, and A ⊆ E, then v(A) denotes the number of nodes adjacent to at least one edge in A. A key result for directed network systems is the following classical theorem (see [11]):

Theorem 1 If (E, φ) is a directed network system, then the signed domination function satisfies the following: δ(A) = (−1)^{|A|−v(A)+1} if A is an acyclic union of minimal path sets (i.e., a union of minimal path sets which does not contain any directed circuit of the network). Otherwise δ(A) = 0.
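Once the signed domination function is known, formula (1) can be evaluated directly. A minimal sketch, assuming independent components; the two domination dictionaries below are standard textbook examples, not taken from this paper:

```python
from math import prod

def reliability_from_domination(delta, p):
    """Formula (1): Pr(phi(X) = 1) = sum over A of delta(A) * E[prod_{i in A} X_i].
    For independent components, E[prod_{i in A} X_i] is the product of the p_i."""
    return sum(d * prod(p[i] for i in A) for A, d in delta.items())

# Two-component series system, phi = X1 X2: delta({1,2}) = 1, zero otherwise.
delta_series = {(1, 2): 1}
# Two-component parallel system, phi = X1 + X2 - X1 X2.
delta_parallel = {(1,): 1, (2,): 1, (1, 2): -1}
p = {1: 0.9, 2: 0.8}
# reliability_from_domination(delta_series, p)   is approx. 0.72
# reliability_from_domination(delta_parallel, p) is approx. 0.98
```

The sparsity remark above is visible here: only the subsets A with δ(A) ≠ 0 need to be stored and summed, rather than all 2^n subsets.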
Oriented Matroid Systems - A.B. Huseby
The main purpose of this paper is to explore the possibility of generalizing the results for directed network systems. It turns out that this can be done within the framework of oriented matroids.
2. Oriented Matroid Systems

A signed set is a set M along with a mapping σ_M : M → {+, −}, called the sign mapping of the set. With a slight abuse of notation, M refers both to the signed set itself as well as the underlying unsigned set of elements. The sign mapping σ_M defines a partition of M into two subsets, M+ = {e ∈ M : σ_M(e) = +} and M− = {e ∈ M : σ_M(e) = −}. M+ and M− are referred to as the positive and negative elements of M respectively. If M is a signed set with M+ = {e_1, . . . , e_i} and M− = {f_1, . . . , f_j}, we indicate this by writing M as {e_1, . . . , e_i, f̄_1, . . . , f̄_j}. If M = M+, M is called a positive set, while if M = M−, M is called a negative set. −M denotes the signed set obtained from M by reversing the signs of all the elements, i.e., σ_{−M}(e) = −σ_M(e) for all e ∈ M. If M is a family of signed sets, the family of sign mappings, {σ_M : M ∈ M}, is called the sign signature of M.

Signed sets can be used to describe paths in directed networks by letting the positive elements represent edges directed the same way as the path, while negative elements represent edges directed the opposite way of the path. As an example consider once again the directed network system shown in Figure 1. The signed minimal path sets from the source s to the terminal t are:

P1 = {1, 4, 6},  P2 = {1, 4, 5̄, 7},  P3 = {1, 3, 5, 6},  P4 = {1, 3, 7},
P5 = {2, 5, 7},  P6 = {2, 3̄, 4, 6},  P7 = {2, 7},
while the positive minimal path sets between s and t are P1, P3, P4, P5, P7. We now proceed by adding an “artificial” edge x from t to s, thus turning all the paths into circuits. See Figure 2. Let M denote the family of all signed circuits in the extended network. We also introduce the following families of sets: P̄ = {(M \ x) : M ∈ M, x ∈ M+}, P = {(M \ x) : M ∈ M, x ∈ M+ and (M \ x)− = ∅}. It is easy to see that P̄ is the family of all signed minimal path sets from the source s to the terminal t, while P is the family of the positive such sets. Given the element x, P̄ and P can be derived from M without any knowledge of the node structure of the network. Thus, all relevant information about the system is stored within M. The family of signed circuits of a directed graph satisfies certain properties which can be formalized within the theory of oriented matroids. An oriented matroid is defined as follows (see [3]):

Definition 1 An oriented matroid is an ordered pair (F, M) where F is a nonempty finite set, and M is a family of signed subsets of F , called signed circuits, satisfying the following properties:
(O1) ∅ is not a signed circuit.
Figure 2. A 2-terminal directed network system with an artificial edge, x
(O2) If M is a signed circuit, then so is −M.
(O3) For all M1, M2 ∈ M such that M1 ⊆ M2, we either have M1 = M2 or M1 = −M2.
(O4) If M1 and M2 are signed circuits such that M1 ≠ −M2, and e ∈ M1+ ∩ M2−, then there exists a third signed circuit M3 with M3+ ⊆ (M1+ ∪ M2+) \ e and M3− ⊆ (M1− ∪ M2−) \ e.

If (F, M) is an oriented matroid, the elements of F may sometimes be interpreted as vectors in a linear space, in which case the circuits correspond to minimal linearly dependent sets. An independent set of an oriented matroid is defined as a set which does not contain any circuit. If (F, M) is an oriented matroid, the rank function of the matroid, denoted ρ(A), is defined for all A ⊆ F as the cardinality of the largest independent subset of A.

Definition 2 Let (E ∪ x, M) be an oriented matroid, and let (E, φ) be a binary monotone system with minimal path set family P given by:

P = {(M \ x) : M ∈ M, x ∈ M+ and (M \ x)− = ∅}   (2)
We then say that (E, φ) is the oriented matroid system derived from the oriented matroid (E ∪ x, M) with respect to x, and write this as (E ∪ x, M) → (E, φ). If (E ∪ x, M) → (E, φ), a subset A ⊆ E is said to be cyclic if there exists a positive circuit M ∈ M such that M ⊆ A. If no such circuit exists, A is said to be acyclic. In particular, the system (E, φ) is said to be cyclic (acyclic) if E is cyclic (acyclic). The class of oriented matroid systems generalizes the class of 2-terminal directed network systems. Moreover, Theorem 1 can be generalized to the class of oriented matroid systems:

Theorem 2 If (E ∪ x, M) → (E, φ), then: δ(A) = (−1)^{|A|−ρ(A∪x)} if A is an acyclic union of minimal path sets (i.e., a union of minimal path sets which does not contain any positive circuit of M). Otherwise δ(A) = 0.

Proof: See [9].
3. Oriented Matrix Systems

In order to introduce the class of oriented matrix systems, we start out by letting (E, φ) be a binary monotone system where E = {1, . . . , n}. If the component state vector is X, we introduce the set A = A(X) = {i : X_i = 1}. For each i ∈ E we associate a vector denoted v_i belonging to some vector space over an ordered field, say e.g., R. We also introduce a “target” vector u belonging to the same vector space. We then define φ(X) to be 1 if there exists {λ_i ≥ 0 : i ∈ A(X)} so that:
Σ_{i∈A} λ_i v_i = u,   (3)
and zero otherwise. Thus, the system is functioning if and only if the convex cone spanned by the vectors {vi : i ∈ A} contains the target vector. We refer to such a system as an oriented matrix system. It can be shown that such a system is in fact a special case of an oriented matroid system. We denote the corresponding matroid by (E ∪ x, M). To the artificial component x we associate the vector v x = −u. The family of signed circuits M consists of the sets M ⊆ (E ∪ x) such that {vi : i ∈ M } is a minimal linearly dependent set of vectors. Thus, if M ∈ M, there exists a set of non-zero constants {λi : i ∈ M } such that:
Σ_{i∈M} λ_i v_i = 0.   (4)
Moreover, given {λ_i : i ∈ M}, the sign map of M is defined so that M+ = {i : λ_i > 0}, while M− = {i : λ_i < 0}. Finally, the rank function of (E ∪ x, M), denoted ρ, reduces to “ordinary” matrix rank. That is, if A ⊆ (E ∪ x), then ρ(A) is equal to the rank of the matrix with columns {v_i : i ∈ A}. We observe that if M ∈ M, x ∈ M+ and (M \ x)− = ∅, we have:

Σ_{i∈M\x} (λ_i / λ_x) v_i = −v_x = u.   (5)
Thus, (M \ x) is indeed a minimal path set of (E, φ). Since (E, φ) is an oriented matroid system, it follows by Theorem 2 that δ(A) = (−1)^{|A|−ρ(A∪x)} if A is an acyclic union of minimal path sets, and zero otherwise. The class of oriented matrix systems can be viewed as a generalization of the class of 2-terminal directed network systems. In particular, if (E, φ) is a 2-terminal directed network system, the associated vectors correspond to the columns of the node-arc incidence matrix of the network graph, including the artificial edge x from the terminal back to the source (see Figure 2). We recall that for an oriented matroid system (E, φ), a subset A ⊆ E is acyclic if A does not contain any positive circuits. Thus, in an oriented matrix system (E, φ) with associated vectors {v_i : i ∈ E}, A ⊆ E is cyclic if there exists a set of nonnegative numbers {λ_i : i ∈ A} where λ_j > 0 for at least one j ∈ A, and such that:

Σ_{i∈A} λ_i v_i = 0.   (6)
Note that if (6) holds for the set of nonnegative numbers {λ_i : i ∈ A} and c > 0, then (6) also holds for {cλ_i : i ∈ A}. Thus, since not all the λ_i's are zero, we may scale them so they add up to 1, in which case the left-hand side of (6) becomes a convex combination of the v_i's. Hence, A is cyclic if and only if 0 is contained in the convex hull of {v_i : i ∈ A}. If not, A is acyclic.
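This convex-hull characterization yields a simple computational acyclicity test. The sketch below is our own and is limited to the planar case for illustration (in general dimension the test is a linear-programming feasibility problem): it decides whether 0 lies in the convex hull of a set of nonzero 2-D vectors by looking at the largest angular gap between their directions.

```python
import math

def contains_origin_2d(vectors, tol=1e-9):
    """0 is outside the convex hull of nonzero 2-D vectors exactly when all
    of their directions fit strictly inside an open half-plane, i.e. when
    the largest angular gap between consecutive directions exceeds pi."""
    angles = sorted(math.atan2(y, x) for x, y in vectors)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps) <= math.pi + tol

# {(1,0), (0,1), (-1,-1)} sums to 0, so that set is cyclic;
# {(1,0), (0,1), (1,1)} lies in an open half-plane, so it is acyclic.
```

In terms of the definitions above: a set A is cyclic exactly when `contains_origin_2d` returns True for the associated vectors {v_i : i ∈ A}.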
4. Oriented k-out-of-n Systems

Let (E, φ) be a binary monotone system where |E| = n, and assume that φ(X) = 1 if |A(X)| ≥ k and zero otherwise. Then the system is said to be a k-out-of-n system. That is, the system is functioning if and only if at least k of the n components are functioning. Thus, the minimal path sets of a k-out-of-n system are all sets P ⊆ E such that |P| = k. The class of k-out-of-n systems has been studied extensively in the reliability literature; see e.g., [1]. An efficient algorithm for calculating the reliability of k-out-of-n systems is given in [2]. In [6] it is shown that k-out-of-n systems can be associated with matroids in the same way as undirected network systems.

It turns out that it is possible to derive oriented matroid systems from the class of k-out-of-n systems as well. Thus, we let E = {1, . . . , n} be a set of components and let k be an integer such that 1 ≤ k ≤ n. We then consider what is known as a “uniform” oriented matroid (E ∪ x, M) with rank k (see [3]). That is, M is given as

M = {M ⊆ (E ∪ x) : |M| = k + 1},   (7)

and equipped with a suitable sign signature. Note that since all the circuits of (E ∪ x, M) contain k + 1 elements, it follows that the largest independent subsets of E ∪ x contain k elements. Thus, by the definition of the rank we indeed have that ρ(E ∪ x) = k.

Then let (E, φ̄) be the binary monotone system with minimal path sets P̄ = {(M \ x) : M ∈ M, x ∈ M+}. Hence, P̄ consists of all subsets of E with cardinality k, so (E, φ̄) is a k-out-of-n system. Now, consider instead the system (E, φ) with minimal path sets P = {P ∈ P̄ : P− = ∅}. Thus, only the positive sets of P̄ are included in P. By definition (E, φ) is an oriented matroid system, and we then refer to this system as an oriented k-out-of-n system. Note that the exact form of (E, φ) depends on the sign signature of (E ∪ x, M). Thus, in general there will be many different types of oriented k-out-of-n systems. Some of these are acyclic, while others are cyclic. In the case where (E, φ) is acyclic, i.e., where E does not contain any positive circuits, it follows by Theorem 2 that:

δ(E) = (−1)^{|E|−ρ(E∪x)} = (−1)^{n−k},   (8)
while in the cyclic case δ(E) = 0.

Example 1 Let (E, φ) be an oriented matrix system where E = {1, . . . , 5}. Assume that the associated vectors v_1, . . . , v_5 all have the same length and are located in the first
Figure 3. Vectors in R3 forming a regular pentagon, and projected into a plane orthogonal to the center point
octant of R^3, forming a regular pentagon. Furthermore, assume that the target vector u is located at the center of this pentagon. The system is illustrated in Figure 3, where we have projected all the points into a plane orthogonal to the center point of the pentagon. As usual we denote the corresponding matroid by (E ∪ x, M), and let v_x = −u. By the choice of v_1, . . . , v_5, v_x it is clear that any set of three of these vectors forms a basis for R^3. Since M by definition consists of the sets M ⊆ (E ∪ x) such that {v_i : i ∈ M} is a minimal linearly dependent set of vectors, it follows that in this case we have M = {M ⊆ (E ∪ x) : |M| = 4}. Thus, (E ∪ x, M) is a uniform oriented matroid, and we have:

ρ(E ∪ x) = rank[v_1, . . . , v_5, v_x] = 3.   (9)
Hence, by the definition of oriented k-out-of-n systems it is evident that (E, φ) is an oriented 3-out-of-5 system. On the other hand, (E, φ) is by definition also an oriented matrix system. Thus, if A(X) ⊆ E is the set of functioning components, it follows that φ(X) = 1 if and only if the target vector u is contained in the convex cone spanned by the vectors {v_i : i ∈ A(X)}. Considering the projection in Figure 3, this is equivalent to the projection of u being contained in the polygon spanned by the projections of the vectors {v_i : i ∈ A(X)}. For this to hold we must have |A(X)| ≥ 3. Moreover, if |A(X)| = 3, the projections cannot be consecutive points in the pentagon. Thus, e.g., the triangle corresponding to the set {1, 2, 4} contains the projection of the target, so φ(1, 1, 0, 1, 0) = 1. On the other hand, the triangle corresponding to the set {1, 2, 3} does not contain the projection of the target, so φ(1, 1, 1, 0, 0) = 0. From this we get that the minimal path sets of the system are P = {P1, . . . , P5} where P1 = {1, 2, 4}, P2 = {2, 3, 5}, P3 = {1, 3, 4}, P4 = {2, 4, 5}, and P5 = {1, 3, 5}. Since all the associated vectors are located in the first octant of R^3, the convex hull of these vectors cannot contain 0. Thus, (E, φ) is acyclic. Hence, by Theorem 2 it follows that δ(E) = (−1)^{|E|−ρ(E∪x)} = (−1)^{5−3} = 1.
5. Discussion

In the present paper we have introduced the class of oriented matroid systems, and shown how the classical domination results for directed network systems can be extended to this
class. Since 2-terminal directed network systems are special cases of oriented matroid systems, the domination results for such network systems are covered completely by our results. In [6] and [7] it was shown that multi-terminal undirected network systems can be handled in a unified way using matroid theory. Thus, a natural conjecture would be that similar unifying results can be obtained in the directed case. Preliminary investigations of this, however, indicate that the problem is much more difficult than in the undirected case, and that certain restrictions will apply.
Acknowledgments

This paper was written with support from the Norwegian Research Council and Johan and Mimi Wesmann’s foundation. We also want to thank Professor Bent Natvig for many helpful comments and suggestions.
References

[1] R. E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing, To Begin With, Silver Spring, MD, 1981.
[2] R. E. Barlow and K. D. Heidtmann, Computing k-out-of-n system reliability, IEEE Trans. Reliability R-33 (1984), 322–323.
[3] A. Björner, M. Las Vergnas, B. Sturmfels, N. White and G. Ziegler, Oriented Matroids, Second Edition, Encyclopedia of Mathematics and its Applications, Cambridge University Press, 1999.
[4] H. Cancela and L. Petingi, Properties of a generalized Source-to-All-terminal Network Reliability Model with Diameter Constraints, OMEGA, International Journal of Management Science (2005), special issue on telecommunications applications.
[5] H. Cancela and L. Petingi, On the Characterization of the Domination of a Diameter-constrained Network Reliability Model, Discrete Applied Mathematics 154 (2006), 1885–1896.
[6] A. B. Huseby, A Unified Theory of Domination and Signed Domination with Application to Exact Reliability Computations, Statistical Research Report no. 3, University of Oslo, 1984.
[7] A. B. Huseby, Domination Theory and the Crapo β-invariant, Networks 19 (1989), 135–149.
[8] A. B. Huseby, On Regularity, Amenability and Optimal Factoring Strategies for Reliability Computations, Statistical Research Report no. 4, University of Oslo, 2001.
[9] A. B. Huseby, Oriented Matroid Systems, Statistical Research Report no. 2, University of Oslo, 2008.
[10] J. Rodriguez and L. Traldi, (K, j)-Domination and (K, j)-Reliability, Networks 30 (1997), 293–306.
[11] A. Satyanarayana and A. Prabhakar, New topological formula and rapid algorithm for reliability analysis of complex networks, IEEE Trans. Reliability R-27 (1978), 82–100.
[12] A. Satyanarayana and M. K. Chang, Network reliability and the factoring theorem, Networks 13 (1983), 107–120.
Nonparametric Predictive Inference for k-out-of-m Systems

F.P.A. COOLEN 1 and P. COOLEN-SCHRIJNER
Durham University, UK

Abstract. We present lower and upper probabilities for the reliability of k-out-of-m systems with exchangeable components. These interval probabilities are based on the nonparametric predictive inferential (NPI) approach for Bernoulli data [5]. It is assumed that test data are available on the components, and that the components to be used in the system are exchangeable with those tested. An attractive feature is the way in which data containing zero failures can be dealt with.

Keywords. Interval probability, system reliability, zero failures
Introduction

Imprecise probabilistic methods in reliability are receiving increasing attention [6,13,22], with uncertainty quantified via lower and upper probabilities [23,24,25]. The authors have developed a novel statistical theory entitled Nonparametric Predictive Inference (NPI) [7,13], which uses lower and upper probabilities. Recent applications include reliability demonstration for failure-free periods [10], (opportunity-based) age replacement [14,16], comparison of success-failure data [12,15], probabilistic safety assessment in case of zero failures [8], and prediction of unobserved failure modes [9].

Reliability assessment for k-out-of-m systems is a traditional problem [1]. Such a system consists of m exchangeable components (often the confusing term ‘identical components’ is used), and the system functions if and only if at least k components function. We apply the NPI method for Bernoulli data [5] to such systems, with inference based on data from tests on n components, which are exchangeable with the components in the system. We only consider situations where components, and the system, either function or not when called upon. We assume that in a test of n components, s functioned and n − s failed, and we use NPI to derive the lower and upper probabilities for the event that the k-out-of-m system, made up of components exchangeable with those n tested, functions.

Section 1 presents the main results together with some of the underlying theory. These results are illustrated in three examples in Section 2, which also serve to discuss some important aspects of this approach. Section 3 provides discussion of several further issues, including a brief comparison to Bayesian methods and comments on related research challenges for generalization of the NPI approach for system reliability.

1 Corresponding Author: Department of Mathematical Sciences, Durham University, Durham, DH1 3LE, United Kingdom; E-mail: [email protected]
1. NPI for k-out-of-m systems

In this paper, NPI for Bernoulli random quantities [5] is used for inference on the reliability of k-out-of-m systems. We refer to [5] for justifications and discussion of related statistical theory. The lower and upper probabilities presented by Coolen [5] fit in the framework of nonparametric predictive inference (NPI) [2,7], hence we also call them NPI(-based) lower and upper probabilities. They have strong consistency properties in the theory of interval probability [2,25], and NPI is ‘perfectly calibrated’ in the frequentist sense [20]. These lower and upper probabilities have the attractive property that, for an event A, the interval created by them always contains the precise empirical probability for A as based on the observed data.

Suppose that we have a sequence of n + m exchangeable Bernoulli trials, each with ‘success’ and ‘failure’ as possible outcomes, and data consisting of s successes in n trials. Let Y_i^j denote the random number of successes in trials i to j; then a sufficient representation of the data for our inferences is Y_1^n = s, due to the assumed exchangeability of all trials. We focus on the lower and upper probabilities for the event Y_{n+1}^{n+m} ≥ k, as this event corresponds to successful functioning of a k-out-of-m system.

Given data consisting of s successes from n components tested, we denote the NPI lower and upper probabilities for the event that the k-out-of-m system functions successfully by P̲(m : k|n, s) and P̄(m : k|n, s), respectively. From the general NPI lower and upper probabilities in [5], the following results are directly derived. For k ∈ {1, 2, . . . , m} and 0 < s < n, the NPI lower probability is

P̲(m : k|n, s) = P̲(Y_{n+1}^{n+m} ≥ k | Y_1^n = s) = 1 − C(n+m, n)^{−1} Σ_{l=0}^{k−1} C(s+l−1, s−1) C(n−s+m−l, n−s)   (1)

where C(a, b) denotes the binomial coefficient,
and the corresponding upper probability is

P̄(m : k|n, s) = P̄(Y_{n+1}^{n+m} ≥ k | Y_1^n = s) = C(n+m, n)^{−1} [ C(s+k, s) C(n−s+m−k, n−s) + Σ_{l=k+1}^{m} C(s+l−1, s−1) C(n−s+m−l, n−s) ]   (2)
For m = 1, so a system consisting of a single component, the lower and upper probabilities for the event that the system functions successfully are (P̲, P̄)(1 : 1|n, s) = ( s/(n+1), (s+1)/(n+1) ). If the observed data are all successes, so s = n, or all failures, so s = 0, then the lower and upper probabilities are, for all k ∈ {1, . . . , m},

(P̲, P̄)(m : k|n, n) = ( 1 − C(n+k−1, n) C(n+m, n)^{−1}, 1 )

(P̲, P̄)(m : k|n, 0) = ( 0, C(n+m−k, n) C(n+m, n)^{−1} )

For series systems, for which k = m, these lower and upper probabilities are (for 0 ≤ s ≤ n)
NPI for k-out-of-m systems - F.P.A. Coolen & P. Coolen-Schrijner

(P̲, P̄)(m : m|n, s) = ( ∏_{j=1}^{m} (s−1+j)/(n+j), ∏_{j=1}^{m} (s+j)/(n+j) ),
and for parallel systems, for which k = 1, they are (for 0 ≤ s ≤ n) ⎛
⎞ m m n − s + j n − s − 1 + j ⎠ ,1 − (P , P )(m : 1|n, s) = ⎝1 − n + j n + j j=1 j=1 An important advantage of the use of lower and upper probabilities in statistical inference occurs in situations with the observations either all successes or all failures, as inferences based on precise probabilistic methods are typically not in agreement with empirical probabilities in such cases. If one has observed zero failures in tests of n components, one might wish to assign a small but non-zero probability to failure of a future component. However, zero failures in n components does not exclude the possibility that failures could never happen. An attractive, albeit informal, manner in which to interpret lower and upper probabilities is to regard the lower probability P (A) as quantifying the evidence in favor of event A, and the upper probability P (A) as quantifying the evidence against event A. From this perspective, if one considers a system consisting of only a single component, the lower and upper probabilities of successful functioning of the one future component, given zero failures in n components tested (so s = n), which are n , 1), are attractive, as the upper probability reflects equal to (P , P )(1 : 1|n, n) = ( n+1 that there is no evidence from the test data against successful functioning of the future component, whereas the lower probability provides a natural cautious inference which can be used in quantitative risk assessment. As such, the results in this paper can be used in zero-failure reliability demonstration from NPI perspective, generalizing the results presented in [10]. This is briefly illustrated in Example 3. For the special cases with m = 1, k = 1 or k = m, for which the lower and upper probabilities of successful system functioning given s successes in n component tests are given above, it is easily seen that the following result holds (for 0 ≤ s < n) P (m : k|n, s) = P (m : k|n, s + 1)
(3)
The result (3) actually holds generally for all k-out-of-m systems as considered in this paper. A direct proof, using expressions (1) and (2), is a complicated exercise in combinatorial analysis. However, this result follows immediately from detailed consideration of the underlying representation assumed for Bernoulli random quantities in the NPI method [5] that is used here. In this paper, we do not provide this detailed justification for the equality (3), but the examples in Section 2 will, of course, illustrate this interesting property of our inferences. The result (3) can obviously be used to reduce computational effort, if lower and upper probabilities are required for all possible values of s. We would also like to emphasize the elegance of this equality, as it implies that the intervals created by corresponding lower and upper probabilities of successful system functioning, for s = 0, 1, . . . , n, form a partition of the interval [0, 1].
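The closed-form product expressions for series and parallel systems, together with property (3) and the resulting partition of [0, 1], can be checked with exact rational arithmetic. An illustrative sketch (the function names are ours; the numerical case m = 10, n = 2 anticipates Example 1 in Section 2):

```python
from fractions import Fraction

def series_bounds(m, n, s):
    """Lower/upper NPI probabilities for a series (m-out-of-m) system."""
    lo = up = Fraction(1)
    for j in range(1, m + 1):
        lo *= Fraction(s - 1 + j, n + j)
        up *= Fraction(s + j, n + j)
    return lo, up

def parallel_bounds(m, n, s):
    """Lower/upper NPI probabilities for a parallel (1-out-of-m) system."""
    lo = up = Fraction(1)
    for j in range(1, m + 1):
        lo *= Fraction(n - s + j, n + j)
        up *= Fraction(n - s - 1 + j, n + j)
    return 1 - lo, 1 - up

# Property (3): the upper probability at s equals the lower at s + 1, so the
# intervals for s = 0, ..., n tile [0, 1].
m, n = 10, 2
assert all(series_bounds(m, n, s)[1] == series_bounds(m, n, s + 1)[0]
           for s in range(n))
assert series_bounds(m, n, 0)[0] == 0 and series_bounds(m, n, n)[1] == 1
```

The first term of the lower product vanishes for s = 0 and the upper product telescopes to 1 for s = n, which is why the partition closes exactly at both ends.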
Theoretical Advances in Modeling, Inference and Computation
2. Examples

Example 1. Consider a series system with 10 exchangeable components (so k = m = 10), and the only information available is the result of a test of 2 components, also exchangeable with the 10 to be used in the system. For the three possible values of the number of successes in the test, s = 0, 1, 2, the NPI lower and upper probabilities $(\underline{P},\overline{P})$ for successful functioning of the system are (0, 1/66) for s = 0, (1/66, 1/6) for s = 1, and (1/6, 1) for s = 2. These values illustrate that the upper probability of successful system functioning given s successes in n tests is equal to the lower probability given s + 1 successes in n tests. The value 0 (1) of the lower (upper) probability for the case s = 0 (s = 2) reflects that in this case there is no strong evidence that the components can actually function (fail). These values emphasize the serious error that can be made if one would plug a 'reasonable' estimate of a probability that a component functions into a formula for a probability of system functioning. For example, based on s = 1 one might be tempted to estimate the probability of successful functioning of this series system by $(1/2)^{10} = 1/1024$, which is far lower than the corresponding lower probability 1/66 from our method, so system reliability would be substantially under-estimated. As an informal argument that leads to a better alternative than such a plug-in approach, and that is in line with our approach, we could reason as follows. On the basis of one success in two tests, the predictive probability for the next component to be successful might be set at (about) 1/2. For the series system to function successfully, all 10 components must function.
Let us then consider the second component in the series system, conditional on the first functioning successfully and the data from the test, hence for this component the information available consists of 2 successful components out of 3, and therefore the predictive probability for this component to function might be set at (about) 2/3. Continuing this reasoning, which acknowledges the interdependence of the 10 components in the system, the predictive probability of successful functioning of the series system would be (about) $\frac{1}{2} \times \frac{2}{3} \times \frac{3}{4} \times \cdots \times \frac{10}{11} = \frac{1}{11}$, which is in between our corresponding lower and upper probabilities. We should emphasize that we only present this latter informal reasoning as a possible explanation why the use of a plug-in estimate is wrong; we do not suggest that the value 1/11 is a 'correct' precise probability in this case, an obvious reason being that this informal argument would lead to precise probability 0 (1) for system functioning in the case s = 0 (s = 2). For a parallel system with 10 components (k = 1, m = 10), with 2 components tested, the NPI lower and upper probabilities of successful functioning of the system are (0, 5/6) for s = 0, (5/6, 65/66) for s = 1, and (65/66, 1) for s = 2. Note here that, mistakenly using a plug-in estimate for the case s = 1, as discussed above, one could think that the value $1 - (1/2)^{10} = 1023/1024$ would be a reasonable estimate of the system reliability. This value is substantially higher than the NPI upper probability 65/66, and far greater than the NPI lower probability 5/6 which one might wish to use for risk assessment from a cautious perspective, and hence there could be a danger of overconfidence in the system's reliability. Example 2.
To further illustrate the NPI results for system reliability, presented in Section 1, Table 1 provides all lower and upper probabilities for the possible cases with n = 4 components tested, of which s functioned successfully, and the system consisting of m = 5 components, of which at least k must function.
Table 1. Lower and upper probabilities for all cases with m = 5 and n = 4
          k=1             k=2             k=3             k=4             k=5
        lower  upper    lower  upper    lower  upper    lower  upper    lower  upper
s=0     0      0.556    0      0.278    0      0.119    0      0.040    0      0.008
s=1     0.556  0.833    0.278  0.595    0.119  0.357    0.040  0.167    0.008  0.048
s=2     0.833  0.952    0.595  0.833    0.357  0.643    0.167  0.405    0.048  0.167
s=3     0.952  0.992    0.833  0.960    0.643  0.881    0.405  0.722    0.167  0.444
s=4     0.992  1        0.960  1        0.881  1        0.722  1        0.444  1
The values in Table 1 show that, in order to get a reasonably large lower probability of successful system functioning, one does not necessarily require most tested components to have functioned well if k is small, which means that the system has much redundancy, but for large values of k one requires (nearly) all tested components to have been successful. Example 3 briefly discusses the related issue of the possible choice between extra testing or extra system redundancy for reliability demonstration.

Example 3. As mentioned in Section 1, the results in this paper can also be used in zero-failure reliability demonstration from an NPI perspective, generalizing the results in [10]. Suppose that for system functioning it is required that k components function, but that redundancy can be built into the system by increasing the total number of components m in the system. For example, the components considered could be batteries required to provide back-up in case of problems with the electricity supply for a safety-critical system, where system functioning requires a minimum of three batteries to function when demanded, but where installing more batteries might provide important redundancy. Rahrouh et al. [21] presented a Bayesian approach for optimal decisions for reliability demonstration, assuming that only component tests with zero failures would lead to release of the system for practical use, as is often the case if high reliability is required. They considered both costs of testing and costs of extra system redundancy, and also took practical constraints with regard to test time and budget into account. Apart from cost and time figures, and related constraints, the key input for such decisions consists of the lower probabilities $\underline{P}(m:k|n,n)$, as a function of m and n for fixed k. Some such values are presented in Table 2, for k = 8 and the cases n = 5, 10, 15, and m varying from 8 to 12.
Of course, the corresponding upper probabilities are all equal to one, as the tests revealed zero failures.

Table 2. Lower probabilities for zero-failure testing with k = 8

            m=8     m=9     m=10    m=11    m=12
s=n=5       0.385   0.604   0.736   0.819   0.872
s=n=10      0.556   0.789   0.895   0.945   0.970
s=n=15      0.652   0.870   0.948   0.978   0.990
The lower probabilities presented in Table 2 can be used in several ways. For example, consider the case m = 8 with 5 zero-failure tests, leading to lower probability 0.385 of successful system functioning. The table shows that increasing the redundancy to m = 9, keeping k = 8, would increase the lower probability to 0.604, while increasing the number of zero-failure tests to 10 would increase the lower probability to 0.556, so if these two actions were available at similar costs, increasing redundancy might be preferred to more tests. However, if 15 tests were possible at a cost similar to the added redundancy, then this might be preferred, as the corresponding lower probability would increase to 0.652, if all 15 tests were successes. Of course, extra testing has the added advantage of possibly finding more failures, in which case one would start the analysis over again after further inspection or development of the components. In our NPI approach, the absence of prior information makes it impossible to infer how likely failures in the tests would be, but in high reliability demonstration one would normally be quite surprised to encounter failures in tests. Table 3 extends this example by presenting the minimum number of zero-failure tests required to achieve a chosen value for the lower probability of successful system functioning, again for k = 8 and m varying from 8 to 12. The requirement considered is $\underline{P}(m:8|n,n) \geq p$ for different values of p.

Table 3. Required number n of zero-failure tests for $\underline{P}(m:8|n,n) \geq p$

            m=8    m=9    m=10   m=11   m=12
p=0.75       24      9      6      4      4
p=0.80       32     11      7      5      4
p=0.85       46     14      8      6      5
p=0.90       72     19     11      8      6
p=0.95      153     30     16     11      9
p=0.99      792     77     33     21     15
The main conclusion to draw from Table 3 is that, in order to demonstrate high reliability via zero-failure testing, one requires quite a large number of successful tests, yet this number can be substantially reduced by building in redundancy.
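Tables 2 and 3 follow from the all-success special case $\underline{P}(m:k|n,n) = 1 - \binom{n+k-1}{n}\binom{n+m}{n}^{-1}$; a sketch reproducing some of the entries (function names ours):

```python
from math import comb
from fractions import Fraction

def lower_zero_failure(m, k, n):
    """NPI lower probability of system functioning after n failure-free tests."""
    return 1 - Fraction(comb(n + k - 1, n), comb(n + m, n))

def min_tests(m, k, p):
    """Smallest n with lower probability at least p (zero-failure demonstration)."""
    n = 1
    while lower_zero_failure(m, k, n) < p:
        n += 1
    return n
```

For m = k = 8 (no redundancy) the lower probability reduces to n/(n + 8), which shows directly why demonstrating p = 0.99 takes nearly 800 failure-free tests.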
3. Discussion

The Bayesian approach to statistics also provides a natural framework for predictive inferences of the kind considered in this paper. If one assumes a Binomial model with a Beta prior distribution, which is a standard Bayesian approach, then the results presented in this paper actually coincide with the corresponding Bayesian results based on two particular Beta prior distributions. The NPI lower probabilities of successful system functioning correspond to Bayesian probabilities based on the Beta(0, 1) prior, and the NPI upper probabilities correspond similarly to the Beta(1, 0) prior (note that these priors are improper, but the corresponding posterior probabilities of interest do exist). This is due to the fact that, generally, for events of the form 'k or more successes out of m trials', the inferences of Coolen [5] coincide with these Bayesian inferences. It should, however, be emphasized that this is not the case for all events considered in the NPI approach [5]. The fact that these inferences provide the same values for the (lower and upper) probabilities considered can be understood from the representation of successes and failures that underlies NPI [5]. In relation to Example 1, it is useful to remark that, if one were to use a Bayesian approach with improper prior Beta(0, 0), and add test data consisting of one
success and one failure (leading to a uniform posterior distribution), then the posterior probability of successful functioning of a 10-out-of-10 system would be equal to 1/11, the value also derived via an informal argument in Example 1. Hartigan [18] proposed the use of either the Beta(0, 1) or the Beta(1, 0) prior for 'cautious' inference, and we also proposed the Beta(0, 1) prior for Bayesian high-reliability demonstration, mainly due to its relation to NPI [11]. There is an ongoing discussion, in the statistical research community, on the benefits and suitable theories of objective and subjective Bayesian methods [3,17]. The NPI approach explicitly attempts to limit subjective aspects, and hence can be considered an alternative to objective Bayesian inference, as discussed in detail by Coolen [7]. In particular, in reliability scenarios where safety criteria must be met, any suggestion of subjectivity in inferences might best be avoided, and NPI provides suitable inferences in such situations. It is quite remarkable that so-called objective Bayesian methods are based on the attempt to select a prior distribution with minimal influence on the final inferences, while avoiding the need for a prior distribution altogether appears a more natural route to 'objective inferences'. In situations where subjective methods are deemed suitable, for example to support decisions involving system reliability within a company, based on the best available expert judgments, the NPI approach does not provide a suitable framework for inference. In such cases, elicitation of expert opinions presents interesting challenges [4], and the Bayesian approach provides much flexibility for modeling of uncertainty and corresponding decision support [17]. However, even in such situations it might be useful to also study the NPI results, assuming that at least some test data on components are available.
If the results of the subjective study fall outside the corresponding NPI-based interval, it is a clear indication of the influence of the subjective input and as such it can be a useful tool for reflection on the appropriateness of the elicited expert opinions. We consider it an advantage of the NPI approach that the inferences are in terms of lower and upper probabilities, as these naturally reflect the amount of information available, and deal in an attractive manner with situations where all test results are failures or all are successes. In practical risk assessment, it is often clear which of the lower and upper probabilities should be used to support decisions, while the difference between corresponding upper and lower probabilities can provide further useful insight. The NPI approach can be extended to more general system configurations, which provides interesting research challenges. For example, systems consisting of different types of components can be considered, which is a relatively straightforward extension. More challenging is development of the NPI approach for systems consisting of parallel and series subsystems, as for such systems the basic NPI results [5] must be extended to take the particular groupings of future components in the system into account. The basic idea of the NPI approach [5] will remain the same, but the combinatorics involved in deriving the lower and upper probabilities will be challenging for larger systems. In a forthcoming paper, we will present a generalization of the approach in this paper by considering systems that consist of series of independent subsystems, with each subsystem i a ki -out-of-mi subsystem as discussed in this paper. For such systems, the NPI approach enables a fast algorithm for optimal redundancy allocation, which is wellknown to be a complex computational problem if component reliability is assumed to be known [19].
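The correspondence with Beta priors noted at the start of this section can be checked numerically: the Bayesian predictive probability of at least k successes in m future trials under the Beta(s, n − s + 1) posterior (Beta(0, 1) prior plus the data) reproduces the NPI lower probability. A sketch with exact arithmetic (function name ours):

```python
from math import comb
from fractions import Fraction

def beta_binomial_tail(k, m, a, b):
    """P(at least k successes in m future trials) when the success
    probability has a Beta(a, b) distribution (beta-binomial predictive)."""
    total = Fraction(0)
    for x in range(k, m + 1):
        term = Fraction(comb(m, x))
        for j in range(x):
            term *= a + j               # rising factorial a^(x)
        for j in range(m - x):
            term *= b + j               # rising factorial b^(m-x)
        for j in range(m):
            term /= a + b + j           # rising factorial (a+b)^(m)
        total += term
    return total

# Beta(s, n-s+1) posterior for n = 4, s = 2 gives the NPI lower probability
# 0.357 = 5/14 for the 3-out-of-5 system of Table 1:
print(beta_binomial_tail(3, 5, 2, 3))   # 5/14
```

The Beta(1, 0)-based counterpart, i.e. the Beta(s + 1, n − s) posterior, similarly reproduces the NPI upper probabilities.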
References

[1] J.D. Andrews and T.R. Moss, Reliability and Risk Assessment (2nd Ed.), Professional Engineering Publishing Ltd, London, 2002.
[2] T. Augustin and F.P.A. Coolen, Nonparametric predictive inference and interval probability, Journal of Statistical Planning and Inference 124 (2004), 251–272.
[3] J. Berger, The case for objective Bayesian analysis, Bayesian Analysis 1 (2006), 385–402.
[4] T. Bedford, J. Quigley and L. Walls, Expert elicitation for reliable system design, Statistical Science 21 (2006), 428–450.
[5] F.P.A. Coolen, Low structure imprecise predictive inference for Bayes' problem, Statistics & Probability Letters 36 (1998), 349–357.
[6] F.P.A. Coolen, On the use of imprecise probabilities in reliability, Quality and Reliability Engineering International 20 (2004), 193–202.
[7] F.P.A. Coolen, On nonparametric predictive inference and objective Bayesianism, Journal of Logic, Language and Information 15 (2006), 21–47.
[8] F.P.A. Coolen, On probabilistic safety assessment in case of zero failures, Journal of Risk and Reliability 220 (2006), 105–114.
[9] F.P.A. Coolen, Nonparametric prediction of unobserved failure modes, Journal of Risk and Reliability 221 (2007), 207–216.
[10] F.P.A. Coolen and P. Coolen-Schrijner, Nonparametric predictive reliability demonstration for failure-free periods, IMA Journal of Management Mathematics 16 (2005), 1–11.
[11] F.P.A. Coolen and P. Coolen-Schrijner, On zero-failure testing for Bayesian high reliability demonstration, Journal of Risk and Reliability 220 (2006), 35–44.
[12] F.P.A. Coolen and P. Coolen-Schrijner, Nonparametric predictive comparison of proportions, Journal of Statistical Planning and Inference 137 (2007), 23–33.
[13] F.P.A. Coolen, P. Coolen-Schrijner and K.J. Yan, Nonparametric predictive inference in reliability, Reliability Engineering and System Safety 78 (2002), 185–193.
[14] P. Coolen-Schrijner and F.P.A. Coolen, Adaptive age replacement based on nonparametric predictive inference, Journal of the Operational Research Society 55 (2004), 1281–1297.
[15] P. Coolen-Schrijner and F.P.A. Coolen, Nonparametric predictive comparison of success-failure data in reliability, Journal of Risk and Reliability 221 (2007), 319–327.
[16] P. Coolen-Schrijner, F.P.A. Coolen and S.C. Shaw, Nonparametric adaptive opportunity-based age replacement strategies, Journal of the Operational Research Society 57 (2006), 63–81.
[17] M. Goldstein, Subjective Bayesian analysis: principles and practice, Bayesian Analysis 1 (2006), 403–420.
[18] J.A. Hartigan, Bayes Theory, Springer, New York, 1983.
[19] W. Kuo and V.R. Prasad, An annotated overview of system-reliability optimization, IEEE Transactions on Reliability 49 (2000), 176–187.
[20] J.F. Lawless and M. Fredette, Frequentist prediction intervals and predictive distributions, Biometrika 92 (2005), 529–542.
[21] M. Rahrouh, F.P.A. Coolen and P. Coolen-Schrijner, Bayesian reliability demonstration for systems with redundancy, Journal of Risk and Reliability 220 (2006), 137–145.
[22] L.V. Utkin and F.P.A. Coolen, Imprecise reliability: an introductory overview, in: G. Levitin (Ed.), Computational Intelligence in Reliability Engineering, Volume 2: New Metaheuristics, Neural and Fuzzy Techniques in Reliability, Springer, New York, 2007, pp. 261–306.
[23] P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman and Hall, London, 1991.
[24] K. Weichselberger, The theory of interval-probability as a unifying concept for uncertainty, International Journal of Approximate Reasoning 24 (2000), 149–170.
[25] K. Weichselberger, Elementare Grundbegriffe einer allgemeineren Wahrscheinlichkeitsrechnung I. Intervallwahrscheinlichkeit als umfassendes Konzept, Physica, Heidelberg, 2001.
Some Aspects Pertaining to Recurrent Event Modeling and Analysis^1

Akim ADEKPEDJOU^a, Jonathan QUITON^{b,2} and Edsel A. PEÑA^{c,3}
^a Department of Mathematics and Statistics, University of Missouri-Rolla
^b Department of Mathematics, Western Kentucky University
^c Department of Statistics, University of South Carolina

Abstract. This article presents some results pertaining to recurrent event modeling and analysis. In particular, we consider the problem of detecting outliers and also examine the impact of an informative monitoring period in terms of loss of efficiency. Aside from the ideas and analytical results, we demonstrate these aspects through an application to the well-used air-conditioning reliability data set in [18].

Keywords. Asymptotic relative efficiency; Koziol-Green model; informative monitoring; outlier detection; Neyman's smooth embedding.
Introduction

Consider $n$ units or systems, with the $i$th unit monitored for the occurrence of a recurrent event of interest over a period $[0, \tau_i]$, where the $\tau_i$s are independent and identically distributed (IID) random variables from a distribution function $G(s) = \Pr\{\tau_i \leq s\}$, which could be a degenerate distribution. Denote by $T_{ij}$ the successive inter-event times and by $S_{i1} < S_{i2} < \ldots$ the successive calendar times of event occurrences for the $i$th unit. It will be assumed that the $T_{ij}$s are continuous and IID from an unknown inter-event distribution function $F(t) = \Pr\{T_{ij} \leq t\}$, and that the $T_{ij}$s are independent of the $\tau_i$s. On the monitoring period, denote by $K_i$ the random number of event occurrences, so $K_i = \max\{k \in \{0, 1, 2, \ldots\} : S_{ik} \leq \tau_i\}$. For the $i$th unit, the observable random vector is

\[
(\tau_i, K_i, T_{i1}, \ldots, T_{iK_i}, \tau_i - S_{iK_i}). \tag{1}
\]
Observe that $\tau_i - S_{iK_i}$ is the right-censored value of the unobserved inter-event time $T_{iK_i+1}$. Furthermore, even though the $T_{ij}$s are IID, the observables $(T_{i1}, \ldots, T_{iK_i})$ are no longer independent owing to the sum-quota constraint given by

\[
\sum_{j=1}^{K_i} T_{ij} \leq \tau_i < \sum_{j=1}^{K_i+1} T_{ij}. \tag{2}
\]

^1 The authors acknowledge research support from NIH Grant R01 GM056182.
^2 Adekpedjou and Quiton have contributed equally to this paper and hence are equal first authors of this article.
^3 Corresponding Author: Professor, Department of Statistics, University of South Carolina, Columbia, SC 29208 USA.
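The data structure in (1) and the sum-quota constraint (2) are easy to illustrate by simulation. The following is a hypothetical sketch of ours (exponential inter-event times, and an exponential monitoring time chosen so that the KG model introduced below holds); it is not code from the paper:

```python
import random

def simulate_unit(theta, beta, rng):
    """Observables (tau_i, K_i, (T_i1..T_iK_i), tau_i - S_iK_i) for one unit:
    exponential(theta) inter-event times, tau ~ exponential(beta * theta),
    so the survivor functions satisfy G-bar = (F-bar)^beta."""
    tau = rng.expovariate(beta * theta)
    gaps, s = [], 0.0
    while True:
        t = rng.expovariate(theta)
        if s + t > tau:                  # the (K_i + 1)-th event falls past tau
            return tau, len(gaps), gaps, tau - s
        s += t
        gaps.append(t)

rng = random.Random(7)
tau, K, gaps, resid = simulate_unit(1.0, 0.5, rng)
```

By construction the observed gaps satisfy the sum-quota constraint, and the residual `tau - s` is exactly the right-censored value of the unobserved next inter-event time.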
Recent Advances in Recurrent Event Modeling and Inference
The typical problem for the above setting is to make inference about the inter-event distribution $F$, or its parameters, based on the observables in (1). Several papers have dealt with such inference problems, such as [21] and [17]. However, there are other issues that have not been touched upon. One of these issues is the consideration of the situation where the distribution $G$ is related to the distribution $F$. This concerns informative censoring, though, even when $G$ is not related to $F$, the sum-quota constraint in (2) still forces an informative censoring structure, not to mention dependent censoring. The second issue is identifying inter-event times or units that are outliers and/or highly influential. This article presents some recent results pertaining to these two issues.

To model informative censoring, we consider the situation where $F$ and $G$ are related according to the Koziol-Green (KG) model (cf. [12]), which postulates that

\[
\bar{G} = \bar{F}^{\beta} \tag{3}
\]

for some $\beta > 0$, where, for a distribution function $H$, $\bar{H} = 1 - H$ is the survivor function. This model has been utilized in the literature as an important model for analytically examining the impact of informative censoring. Among papers that have exploited the nice mathematical properties of this model are [12], [5], [6], and [8]. The last mentioned paper provides a comprehensive summary of work using the KG model until 1988.

It is fruitful to formulate the setting in a stochastic process framework, which eases the derivation of estimators and the consequent derivations of their properties. To do so, we define the following processes, with time index $s$ being calendar time:

\[
N_i^{\dagger}(s) = \sum_{j=1}^{\infty} I\{S_{ij} \leq s \wedge \tau_i\}; \quad
Y_i^{\dagger}(s) = I\{\tau_i \geq s\}; \quad
N_i^{\tau}(s) = I\{\tau_i \leq s\} = 1 - Y_i^{\dagger}(s+). \tag{4}
\]

Another process that will play a role in the development is the backward recurrence time process, which for the $i$th unit is defined via

\[
R_i(s) = s - S_{iN_i^{\dagger}(s-)}. \tag{5}
\]

This denotes the elapsed duration at calendar time $s$ since the last event occurrence. The aggregated processes over the units will be $N^{\dagger} = \sum_{i=1}^{n} N_i^{\dagger}$, $Y^{\dagger} = \sum_{i=1}^{n} Y_i^{\dagger}$, and $N^{\tau} = \sum_{i=1}^{n} N_i^{\tau}$. We also recall that for the inter-event distribution $F$, if $f$ is its associated density function, then its hazard rate function and (cumulative) hazard function are, respectively, defined via

\[
\lambda(t) = \frac{f(t)}{\bar{F}(t)} \quad \text{and} \quad \Lambda(t) = \int_0^t \lambda(w)\,dw = -\log \bar{F}(t). \tag{6}
\]

Observe under the KG model that the hazard rate functions of $\tau_i$ and $T_{ij}$ are proportionally related according to $\lambda_{\tau}(t) = \beta \lambda_T(t)$, so the KG model is also referred to as the proportional hazards model. An important property is that the process $\{M_i^{\dagger}(s) : s \geq 0\}$, where

\[
M_i^{\dagger}(s) = N_i^{\dagger}(s) - \int_0^s Y_i^{\dagger}(v)\,\lambda_T[R_i(v)]\,dv, \quad s \geq 0,
\]
is a square-integrable zero-mean martingale whose predictable quadratic variation process is

\[
\langle M_i^{\dagger} \rangle(s) = \int_0^s Y_i^{\dagger}(v)\,\lambda_T[R_i(v)]\,dv.
\]

Analogously,

\[
M_i^{\tau}(s) = N_i^{\tau}(s) - \int_0^s Y_i^{\dagger}(v)\,\lambda_{\tau}(v)\,dv, \quad s \geq 0,
\]

is a zero-mean square-integrable martingale with predictable quadratic variation process

\[
\langle M_i^{\tau} \rangle(s) = \int_0^s Y_i^{\dagger}(v)\,\lambda_{\tau}(v)\,dv.
\]
These martingale properties are with respect to the filtration or history $\mathcal{F}_{is} = \sigma\{(N_i^{\dagger}(v), N_i^{\tau}(v)) : v \leq s\}$. As a consequence, if we form $\mathcal{F}_s = \bigvee_{i=1}^{n} \mathcal{F}_{is}$, then the aggregated processes

\[
M^{\dagger}(s) = N^{\dagger}(s) - \sum_{i=1}^{n} \int_0^s Y_i^{\dagger}(v)\,\lambda_T[R_i(v)]\,dv = \sum_{i=1}^{n} M_i^{\dagger}(s)
\]

and

\[
M^{\tau}(s) = N^{\tau}(s) - \int_0^s Y^{\dagger}(v)\,\lambda_{\tau}(v)\,dv = \sum_{i=1}^{n} M_i^{\tau}(s)
\]

both form zero-mean square integrable martingales with respect to the aggregated filtration $\{\mathcal{F}_s : s \geq 0\}$. Following [11] (see also [2]), the likelihood process is

\[
L(s; \lambda_T, \lambda_{\tau}) = \left\{ \prod_{i=1}^{n} \prod_{v=0}^{s} \lambda_T[R_i(v)]^{\Delta N_i^{\dagger}(v)} \times \lambda_{\tau}(v)^{\Delta N_i^{\tau}(v)} \right\}
\exp\left\{ -\sum_{i=1}^{n} \int_0^s Y_i^{\dagger}(v)\,\left[\lambda_T[R_i(v)] + \lambda_{\tau}(v)\right]dv \right\}. \tag{7}
\]
Supposing that $\lambda_T(t) = \lambda_0(t; \theta)$, where $\theta \in \Theta \subseteq \mathbb{R}^p$, and if $\lambda_{\tau}(t)$ does not involve $\theta$, that is, the non-informative censoring case, then the likelihood process (for the parameter $\theta$) in (7) reduces to

\[
L(s; \theta) = \left\{ \prod_{i=1}^{n} \prod_{v=0}^{s} \lambda_0[R_i(v); \theta]^{\Delta N_i^{\dagger}(v)} \right\}
\exp\left\{ -\sum_{i=1}^{n} \int_0^s Y_i^{\dagger}(v)\,\lambda_0[R_i(v); \theta]\,dv \right\}. \tag{8}
\]
On the other hand, under the KG model, where $\lambda_{\tau}$ involves the parameter $\theta$, (7) becomes

\[
L(s; \theta, \beta) = \beta^{N^{\tau}(s)} \left\{ \prod_{i=1}^{n} \prod_{v=0}^{s} \lambda_0[R_i(v); \theta]^{\Delta N_i^{\dagger}(v)} \times \lambda_0(v; \theta)^{\Delta N^{\tau}(v)} \right\}
\exp\left\{ -\sum_{i=1}^{n} \int_0^s Y_i^{\dagger}(v)\,\left(\lambda_0[R_i(v); \theta] + \beta \lambda_0(v; \theta)\right)dv \right\}. \tag{9}
\]
This likelihood process in (9) will be used in the developments in Section 1, while the likelihood process in (8), with slight modifications from the embedding framework that will be employed, will be used in Section 2 dealing with outlier detection. In the outlier detection portion, we do not at the outset assume the KG model, though in the application portion in Section 3, we specialize to the case of a homogeneous Poisson process (HPP) event accrual, so λ(t; θ) = θ, together with the assumption that G also follows an exponential distribution. In this case, the KG model naturally arises.
1. Impact of Informative Monitoring

1.1. Parameter Estimation

In this section, we examine some consequences of having an informative endpoint for the monitoring period. We assume the KG model, so that the hazard rate functions of the $T_{ij}$s and $\tau_i$s are related according to $\lambda_{\tau}(t; \theta) = \beta \lambda_0(t; \theta)$, where $\lambda_0(\cdot)$ is the common hazard rate function of the $T_{ij}$s. The parameter vector $(\theta, \beta) \in \Theta \times \mathbb{R}_+$, but its exact value is unknown. The functional form of $\lambda_0(\cdot; \cdot)$ is assumed known. The appropriate likelihood process associated with the recurrent event setting described in the Introduction is that given in (9). Taking the logarithm of this likelihood process, we have the log-likelihood process

\[
l(s; \theta, \beta) = N^{\tau}(s) \log \beta + \sum_{i=1}^{n} \left\{ \int_0^s \log(\lambda_0(v; \theta))\,dN_i^{\tau}(v)
+ \int_0^s \log(\lambda_0[R_i(v); \theta])\,dN_i^{\dagger}(v)
- \int_0^s Y_i^{\dagger}(v)\,\{\beta \lambda_0(v; \theta) + \lambda_0[R_i(v); \theta]\}\,dv \right\}. \tag{10}
\]
Denote by $\nabla_{\theta}$ and $\nabla_{\beta}$ the gradient operators with respect to $\theta$ and $\beta$, respectively. Let $\rho(w, \theta) \equiv \nabla_{\theta} \log \lambda_T(w, \theta)$. Applying these gradient operators to the log-likelihood, we obtain the score processes

\[
U_{\theta}(s; \theta, \beta) = \sum_{i=1}^{n} \left\{ \int_0^s \rho(w, \theta)\,M_i^{\tau}(dw; \theta, \beta)
+ \int_0^s \rho(R_i(w), \theta)\,M_i^{\dagger}(dw; \theta) \right\}; \tag{11}
\]
\[
U_{\beta}(s; \theta, \beta) = \sum_{i=1}^{n} \int_0^s \frac{1}{\beta}\,M_i^{\tau}(dw; \theta, \beta), \tag{12}
\]

where the martingale differentials, which involve the unknown parameters, are

\[
M_i^{\dagger}(dw; \theta) = N_i^{\dagger}(dw) - Y_i^{\dagger}(w)\,\lambda_0[R_i(w); \theta]\,dw;
\]
\[
M_i^{\tau}(dw; \theta, \beta) = N_i^{\tau}(dw) - Y_i^{\dagger}(w)\,\beta \lambda_0(w; \theta)\,dw.
\]

Given an $s^* \in \mathbb{R}_+$, usually taken to be larger than $\tau_{(n)} \equiv \max_{1 \leq i \leq n} \tau_i$, the maximum likelihood estimators (MLE) of $\theta$ and $\beta$, denoted by $(\hat{\theta}(s^*), \hat{\beta}(s^*)) \equiv (\hat{\theta}, \hat{\beta})$, are obtained by solving the set of equations

\[
U_{\theta}(s^*; \theta, \beta) = 0 \quad \text{and} \quad U_{\beta}(s^*; \theta, \beta) = 0. \tag{13}
\]
Generally, no closed-form expressions for $(\hat{\theta}, \hat{\beta})$ will be obtained, and the solution of (13) will be obtained using standard numerical techniques such as the Newton-Raphson (NR) procedure. Thus, for instance, with the observed Fisher information process defined according to

\[
I(s^*; \theta, \beta) = -\begin{pmatrix} \nabla_{\theta\theta} & \nabla_{\theta\beta} \\ \nabla_{\beta\theta} & \nabla_{\beta\beta} \end{pmatrix} l(s^*; \theta, \beta), \tag{14}
\]

the NR iteration updates a current estimate $(\hat{\theta}_{cur}, \hat{\beta}_{cur})$ to obtain a newer estimate $(\hat{\theta}_{new}, \hat{\beta}_{new})$ according to

\[
(\hat{\theta}_{new}, \hat{\beta}_{new}) \leftarrow (\hat{\theta}_{cur}, \hat{\beta}_{cur}) + I(s^*; \hat{\theta}_{cur}, \hat{\beta}_{cur})^{-1} U(s^*; \hat{\theta}_{cur}, \hat{\beta}_{cur}), \tag{15}
\]

where $U(s; \theta, \beta) = (U_{\theta}(s; \theta, \beta), U_{\beta}(s; \theta, \beta))'$ is the score vector process.

A purely nonparametric estimator of the inter-event time distribution for the recurrent event setting where $G$ and $F$ need not be related was presented in [17]. For the purpose of examining the efficiency gain by taking into account the informative monitoring period, we could compare the estimator of the afore-mentioned paper with the estimator obtained by substituting $\hat{\theta}$ for $\theta$ in $F(\cdot; \theta)$. We denote this resulting estimator by $\hat{F}(s^*, t) = F(t; \hat{\theta})$, while the nonparametric estimator in [17] will be labeled $\tilde{F}(s^*, t)$ and will be referred to as the PSH estimator.

1.2. Some Properties and Efficiency Comparisons

For obtaining asymptotic properties of the ML estimators, we could invoke the consistency and asymptotic normality results in [4] and techniques employed in [20]. As was shown in detail in the dissertation work [1], when $n \to \infty$ and under certain regularity conditions, the ML estimators are consistent and asymptotically normal, and consequently, the distribution function estimator $\hat{F}(t)$, by using the delta method, is also asymptotically normal. General expressions for the asymptotic variances of the estimators are presented in Adekpedjou's dissertation ([1]), but here we limit ourselves to the case of a homogeneous Poisson event accrual, that is, when $\lambda_0(t; \theta) = \theta$, so that $\lambda_{\tau}(t) = \beta\theta$. Under this HPP assumption, it turns out that the ML estimators are given by

\[
\hat{\theta} = \frac{\sum_{i=1}^{n} K_i}{\sum_{i=1}^{n} \tau_i} \quad \text{and} \quad \hat{\beta} = \frac{n}{\sum_{i=1}^{n} K_i}, \tag{16}
\]

which are the occurrence/exposure rates. By applying the general asymptotic theory mentioned above, we find that the asymptotic variance of $\hat{F}(s^*, t)$ is given by

\[
\mathrm{Avar}\{\hat{F}(s^*, t)\} = \frac{1}{n} \cdot \frac{\beta (t\theta)^2 \exp(-2\theta t)}{1 - \exp\{-\beta \theta s^*\}}. \tag{17}
\]
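Under the HPP specialization, the closed forms in (16) can be sanity-checked against the log-likelihood (10), which (for $s^* > \tau_{(n)}$) reduces to $(\sum K_i)\log\theta + n\log(\beta\theta) - (1+\beta)\theta\sum\tau_i$. The following is a simulation sketch of our own (fixed seed, not from the paper), confirming that (16) maximizes this reduced log-likelihood:

```python
import math, random

rng = random.Random(42)
theta0, beta0, n = 2.0, 0.5, 300
K, tau = [], []
for _ in range(n):
    t = rng.expovariate(beta0 * theta0)    # informative monitoring time
    s, k = 0.0, 0
    while True:                            # count HPP(theta0) events in [0, t]
        s += rng.expovariate(theta0)
        if s > t:
            break
        k += 1
    tau.append(t)
    K.append(k)

theta_hat = sum(K) / sum(tau)              # occurrence/exposure rates, eq. (16)
beta_hat = n / sum(K)

def loglik(th, be):
    # reduced HPP log-likelihood, with s* taken beyond all tau_i
    return sum(K) * math.log(th) + n * math.log(be * th) - (1 + be) * th * sum(tau)
```

Since this log-likelihood is strictly concave in $(\log\theta, \log\beta)$, the stationary point given by (16) is the unique maximizer, so any perturbed pair must yield a strictly smaller value.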
The asymptotic variance of the nonparametric estimator $\tilde{\bar{F}}(t)$ in [17] can be found in their paper, and by taking the ratio of the two asymptotic variances, and letting $s^* \to \infty$, we obtain the asymptotic relative efficiency of $\tilde{F}$ relative to $\hat{F}$ to be

\[
\mathrm{ARE}\{\tilde{\bar{F}}(t) : \hat{\bar{F}}(t)\} = \frac{\{t\theta(\beta+1)\}^2}{\exp[(\beta+1)\theta t] - 1}. \tag{18}
\]
By examining this ARE function, which is shown in Figure 1, we are able to obtain some results concerning the gain in efficiency when the informative monitoring is taken into account, at least relative to the fully nonparametric estimator in [17].

Figure 1. Plot of the asymptotic relative efficiency of the nonparametric estimator versus the parametric estimator of the inter-event survivor function given in (18). In this plot, the p in the abscissa is the value of the inter-event survivor function.

Of course, it should be noted that the comparison is lopsided, since the F̃ estimator is nonparametric in nature and does not take the KG assumption into account. It does, however, allow us to make some comparisons with efficiency results obtained for single-event settings. For instance, one of the interesting results is that

  sup_{t≥0} ARE{F̃(t) : F̂(t)} = p_0 {log(p_0)}² / (1 − p_0) ≈ .65,   (19)

where p_0 is the solution of the equation exp{p − 1} − √p = 0. The result that this upper bound is much less than unity is not at all surprising in light of the nature of the F̃ estimator. What is surprising, however, is that this is exactly the upper bound obtained in [6] for the ARE of the Kaplan-Meier estimator relative to the ML estimator of the distribution function when the KG assumption is taken into account in single-event settings. In such settings, an important mathematical property under the KG model that is exploited in obtaining results is the independence between min(T_{i1}, τ_i) and I{T_{i1} ≤ τ_i}.
Recurrent Event Modeling and Analysis - A. Adekpedjou et al.
This was, for instance, exploited in [5] to obtain exact results for the Kaplan-Meier estimator, as well as in [6]. However, this independence property can no longer be exploited so directly in the recurrent event setting, hence the surprise that the two ARE upper bounds still coincide.
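The constants in (18)-(19) can be verified numerically: the supremum of the ARE depends only on x = (β + 1)θt, so it is the same for any θ and β, and it matches p_0{log p_0}²/(1 − p_0) with p_0 solving exp{p − 1} = √p. A minimal check (our own sketch, not from the paper):

```python
import math

def are(t, theta=1.0, beta=1.0):
    """ARE in (18); only x = (beta + 1) * theta * t matters."""
    x = (beta + 1.0) * theta * t
    return x * x / math.expm1(x)

# solve exp(p - 1) - sqrt(p) = 0 on (0, 1) by bisection; f(lo) > 0, f(hi) < 0
lo, hi = 1e-6, 0.99
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if math.exp(mid - 1.0) - math.sqrt(mid) > 0.0:
        lo = mid
    else:
        hi = mid
p0 = 0.5 * (lo + hi)

bound = p0 * math.log(p0) ** 2 / (1.0 - p0)             # right-hand side of (19)
sup_are = max(are(0.001 * k) for k in range(1, 20000))  # crude grid search over t
```

Both `bound` and `sup_are` come out near 0.648, i.e. ≈ .65 as stated in (19).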
2. Outlier Detection

2.1. Detecting Outlying Inter-Event Times and Units

The presence of outliers in a data set can wreak havoc on the inferences that will be performed, such as estimation or hypothesis tests. It is therefore of utmost importance to be able to detect outliers, and to take appropriate action on them, prior to proceeding with the desired inferences. In contrast to the recurrent event setting, outlier detection in other settings has been well studied [3]. It continues to be an active research area, for example in multivariate time series [10], in spatio-temporal settings [7,9], and in data mining [14]. In this section, we present some ideas and methods for detecting outliers in recurrent event data of the form described in the Introduction. In such a data set, one may be interested in deciding whether a particular inter-event time is an outlier, or whether a whole observational unit is an outlier. In our development, we do not specifically assume an informative monitoring period, in contrast to the setting of Section 1.

First, we focus on the issue of deciding whether a specified inter-event time is an outlier. Thus, suppose that it is of interest to determine whether a particular inter-event time T_{i0,j0}, the j0th observation in the i0th unit, is an outlier, where we allow for the possibility that j0 = K_{i0} + 1. The starting point is to postulate that, except for the (i0, j0)th inter-event time, all the other inter-event times T_{ij} came from a distribution whose hazard rate function λ_0(t; θ) belongs to a parametric class

  {λ_0(t; θ) | θ ∈ Θ ⊆ R^p};
(20)
whereas T_{i0,j0} has a hazard rate function that is a perturbed version of λ_0(t; θ) of the form

  λ_0(t; θ) exp{γ′ϕ(t)}
(21)
where γ is a k × 1 vector of constants and ϕ(·) is a k × 1 vector of basis functions. Such a specification of hazard rate functions was used in [15] and [16] in goodness-of-fit testing for single-event settings, and its original impetus was [13]'s paper, which introduced this type of 'smooth' embedding for density functions. The (i0, j0)th observation will therefore be declared an outlier if γ ≠ 0, so a decision procedure is to test the null hypothesis (H0) γ = 0 versus the alternative hypothesis (H1) γ ≠ 0 using the observed recurrent event data. The appropriate likelihood is obtained from (8) by replacing the λ_0(t; θ) there by

  λ_T(t; γ, θ) = λ_0(t; θ) exp{γ′ϕ(t) I{i = i0, j = j0}}.
(22)
Observe that this hazard is just λ0 (t; θ) when the inter-event time is not Ti0 ,j0 , and it becomes the perturbed hazard otherwise. Taking the gradients with respect to γ and θ of the resulting log-likelihood process log L(s; γ, θ), we obtain the score process
  U(s; γ, θ) = (U_γ(s; γ, θ), U_θ(s; γ, θ))′
             = Σ_{i=1}^n ∫_0^s ( ϕ′(R_i(v)) I(i = i0, N_i†(v−) = j0 − 1), [∇_θ log λ_T(R_i(v); γ, θ)]′ )′ dM_i†(v; γ, θ),   (23)

where dM_i†(v; γ, θ) = dN_i†(v) − Y_i†(v) λ_0[R_i(v); θ] exp{γ′ϕ[R_i(v)] I(i = i0, N_i†(v−) = j0 − 1)} dv.

The test that T_{i0,j0} is an outlier is to be based on U_γ(s*; γ, θ) with γ = 0 and θ replaced by its ML estimator θ̂ under the restriction γ = 0. Because this score function involves the indicator I(i = i0, j = j0), an asymptotic approach to assessing the magnitude of the observed score statistic is not feasible, even though asymptotic properties of the (restricted) ML estimator of θ were obtained in [20]. The sampling distribution of relevance in performing the test procedure is the one induced by conditioning on the observed τ_i's. This is justified by the fact that we are assuming that the distribution of the τ_i's does not involve θ, hence the τ_i's are technically ancillary. If, on the other hand, we had considered an informative monitoring period as in the setting of Section 1, the relevant sampling distribution would not be a conditional distribution. Sometimes it is possible to obtain the exact sampling distribution of U_γ(s*; 0, θ̂), as in the case of the HPP event accrual model considered in the next subsection. However, we find that a bootstrapping approach is a more appealing alternative, especially in complicated models.

To determine whether T_{i0,j0} is an outlier, we therefore obtain, either analytically or via a bootstrapping approach, the sampling distribution of U_γ(s*; 0, θ̂). When k = 1, from this sampling distribution we obtain, for a given level of significance α, the (α/2) × 100th and (1 − α/2) × 100th percentiles u_{α/2} and u_{1−α/2}, respectively. The decision procedure becomes:

  T_{i0,j0} is declared an outlier whenever U_γ(s*; 0, θ̂) ∉ [u_{α/2}, u_{1−α/2}].   (24)

In the case where k > 1, we instead use a quadratic form of U_γ(s*; 0, θ̂). Details for this more general setting are in Quiton's dissertation ([19]).
This outlier detection approach for an inter-event time is easily extended to determining whether an observational unit i0 is an outlier. The only change in the procedure is to replace I(i = i0, j = j0) in the above developments by I(i = i0). This is tantamount to postulating that the hazard rate function governing the i0th unit is of the form λ_0(t; γ, θ) = λ_0(t; θ) exp{γ′ϕ(t)}. Finally, we point out that, in general, the basis functions could also be made dependent on the parameter θ; however, in our implementation we usually take ϕ(t) = (1, t, t², ..., t^{k−1})′.

2.2. Outlier Detection under HPP Event Accrual

In this subsection we specialize the outlier detection method described in the preceding subsection to an HPP event accrual, so that λ_0(t; θ) = θ and ϕ(t) = 1. The resulting statistic for testing that the inter-event time T_{i0,j0} is an outlier simplifies to
  U_γ(s*; 0, θ̂) = { 1 − θ̂ T_{i0,j0},          if K_{i0} ≥ j0
                   { −θ̂ (τ_{i0} − S_{i0,j0}),  if K_{i0} = j0 − 1    (25)
                   { 0,                          otherwise
where, provided that s* ≥ τ_(n), the occurrence/exposure rate θ̂ = Σ_{i=1}^n K_i / Σ_{i=1}^n τ_i is the ML estimator of θ under the restriction γ = 0. An interesting observation is that this score statistic is proportional to the (i0, j0)th jackknife residual given by

  JR(i0, j0) ≡ θ̂_[i0,j0] − θ̂,
(26)
which is a measure of the change in the value of the ML estimate θ̂ when T_{i0,j0} is excluded from the computation. In (26), θ̂_[i0,j0] is the ML estimate when T_{i0,j0} is excluded.

As indicated in the preceding subsection, the relevant sampling distribution of U_γ(s*; 0, θ̂) is that obtained by conditioning on the τ_i's. Note that this entails marginalizing over the possible values of the K_i's, which, conditional on the τ_i's, each have a Poisson distribution with mean τ_i θ. The cumulative (conditional) sampling distribution of U_γ(s*; 0, θ̂) is

  F_{U_γ}(u) = Σ_{m=0}^∞ P(U_γ(s*; 0, θ̂) ≤ u | K_{i0} = m, τ_{i0}) P(K_{i0} = m | τ_{i0}).   (27)
From (25), the score involves only T_{i0,j0}, S_{i0,j0}, the K_i's, and the τ_i's, so the computation of the distribution in (27) requires the conditional density function of T_{i0,j0}, given τ_{i0} and K_{i0} = m, which is

  f_{T_{i0,j0} | τ_{i0}, K_{i0}}(t | τ_{i0}, m) = (m/τ_{i0}) (1 − t/τ_{i0})^{m−1} I{0 ≤ t ≤ τ_{i0}},   (28)

a well-known result from HPP theory. Figure 2 provides plots of the distribution function in (27) for several values of j0. From the resulting conditional sampling distribution obtained via (27), the decision rule's critical values (u_{α/2}, u_{1−α/2}), given a specified significance level α, are obtained, and the decision on whether the observed value of T_{i0,j0} is an outlier can be made. Details of the implementation of this procedure are in [19]. For determining whether the i0th unit is an outlier in this HPP setting, the relevant score statistic is

  V_γ(γ = 0, θ̂) = K_{i0} − θ̂ τ_{i0}.   (29)
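The conditional law behind (27)-(28) rests on the classical fact that, given K_{i0} = m, the HPP event times on [0, τ_{i0}] are distributed as m sorted Uniform(0, τ_{i0}) draws, so each inter-event gap has survival function (1 − t/τ)^m. A quick Monte Carlo check (our own sketch, using the first gap, i.e. j0 = 1, for simplicity):

```python
import random

rng = random.Random(7)
tau, m = 20.0, 3        # condition on K_i0 = m events in [0, tau]
t = tau / 4.0           # evaluation point

n_sim, count = 50000, 0
for _ in range(n_sim):
    # given K = m, event times are m sorted Uniform(0, tau) draws,
    # so the first gap exceeds t iff every draw exceeds t
    first_gap = min(rng.uniform(0.0, tau) for _ in range(m))
    if first_gap > t:
        count += 1

emp = count / n_sim
theory = (1.0 - t / tau) ** m    # survival at t implied by density (28)
```

With these toy values, theory = 0.75³ = 0.421875 and the empirical frequency agrees to Monte Carlo accuracy.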
The distribution of V_γ follows from the fact that K_{i0} | τ_{i0} ~ Poisson(τ_{i0} θ). Since K_{i0} | τ_{i0} has a discrete (conditional) distribution, a randomized decision rule may need to be employed to obtain an exact α-size decision rule. As in the inter-event time case, the score statistic for this unit-level test is proportional to the jackknife residual; that is,

  V_γ(γ = 0, θ̂) ∝ θ̂_[i0] − θ̂,   (30)

with this jackknife residual measuring the change between the ML estimate θ̂ based on all the observations and the estimate θ̂_[i0] obtained when the i0th unit is excluded from the computations. Thus, because of the relationship between the score statistics and the jackknife residuals arising from this hazard embedding framework, the approach provides a natural way to obtain outlier detection procedures for this recurrent event setting.

Figure 2. The plots are the estimates of the conditional sampling distribution, given the τ_i's, of the score statistic for different j0 values for airplane 6 in the data analysis illustration. The solid curve is the resulting estimate of the conditional sampling distribution, given the τ_i's and K_i's, which was not employed in this article. The abscissa c represents the argument of the distribution functions.
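The proportionality between the unit-level score (29) and the jackknife residual (30) can be made explicit under the HPP occurrence/exposure estimator: a little algebra (ours, not spelled out in the text) gives θ̂_[i0] − θ̂ = −V_γ / Σ_{i≠i0} τ_i, which the following sketch checks on toy numbers:

```python
K = [3, 5, 2, 4]                  # toy event counts per unit
tau = [10.0, 20.0, 5.0, 15.0]     # toy monitoring periods

theta_hat = sum(K) / sum(tau)     # occurrence/exposure rate

checks = []
for i0 in range(len(K)):
    v = K[i0] - theta_hat * tau[i0]                        # score statistic (29)
    theta_del = (sum(K) - K[i0]) / (sum(tau) - tau[i0])    # leave-one-out ML estimate
    jackknife = theta_del - theta_hat                      # residual in (30)
    # proportionality constant works out to -1 / (sum of the other taus)
    checks.append(abs(jackknife - (-v / (sum(tau) - tau[i0]))))
```

The identity holds exactly for every unit, so the discrepancy in `checks` is only floating-point noise.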
3. Some Illustrations

The air-conditioning data set in [18] provides a recurrent event data set that is useful for illustrating our procedures. In this data set, repeated air-conditioning failures are recorded (in operational hours) over some monitoring period for thirteen Boeing 727 airplanes. The calendar times of event occurrences for each of the airplanes are depicted in Figure 3. Assuming an HPP event accrual for each airplane, the ML estimate is θ̂ = 0.0107, and the fitted exponential survival curve, together with the nonparametric PSH estimate of the survival curve in [17], is shown in Figure 3. Observe that, under this HPP model assumption, this ML estimate of θ is the same as the ML estimate under the KG model. Observe also that the PSH estimate crosses the parametric curve only one time, which is indicative of the decreasing failure rate (DFR) nature of the true inter-event distribution. The theorem in [18] explains that this phenomenon is brought forth by a mixture of exponential distributions; that is, the inter-event times are generated from a mixture of exponential distributions with varying hazard rates.

Figure 3. Boeing air-conditioning data set from [18] together with the parametric estimate (under exponentiality) and the nonparametric PSH estimate of the inter-event survivor curve.

With the outlier detection method described in the preceding section, we may also ask whether some of the inter-event times and/or units could be considered outlying. Using the outlier detection procedure described for the HPP event accrual setting in the preceding subsection, we present in Table 1 those inter-event times identified as outliers at α = 0.05. From Table 1, it is interesting to see that shorter inter-event times from plane 6 were identified as outliers, while longer inter-event times from plane 9 were also identified as outliers. Further analysis using outlier detection at the unit level revealed that planes 6 and 9 can be considered outlying units. This is demonstrated in Figure 4, where the score statistics for each of the 13 airplanes are depicted, together with the critical values. This unit-level result is consistent with the observation-level results, in which plane 6 tends to have shorter
Table 1. Inter-event times that were declared to be outliers for each of the thirteen airplanes in the air-conditioning data set found in [18].

Plane   Outliers
1       -
2       413, 447
3       -
4       502, 386
5       -
6       3, 11, 1, 16, 52, 95
7       1
8       13, 14
9       359, 603, 2, 438
10      -
11      -
12      487
13      -
Figure 4. Results of the unit outlier detection procedure reveal that airplanes 6 and 9 were outliers.
inter-event times, thereby having a higher failure rate, while plane 9 tends to have longer inter-event times, resulting in a lower failure rate.

Interestingly, when planes 6 and 9 were excluded from the estimation of the inter-event survivor function, the PSH and the fitted exponential estimates, shown in Figure 5, no longer have the single-crossing property, which seems to indicate that a common exponential distribution could be holding for the event accrual.

Figure 5. The nonparametric and the fitted exponential survivor curves when airplanes 6 and 9 are excluded from the analysis.

Finally, we end by demonstrating the agreement between the results of our bootstrapping approach and the analytical approach for obtaining the critical values of the decision rules under the HPP model. The bootstrap approach implemented is a parametric one, wherein the bootstrap samples are generated from an exponential inter-event distribution whose parameter value is the ML estimate θ̂ obtained from the original data. With the τ_i's considered fixed, recurrent event data are generated for each of the 13 airplanes, with the restriction that for each airplane the sum of the inter-event times is at most τ_i (the sum-quota accrual constraint). For each bootstrap sample, the score function U_γ(s*; 0, θ̂*) is computed, where θ̂* is the ML estimate from that bootstrap sample. The critical values (u*_{α/2}, u*_{1−α/2}) are obtained from the resulting bootstrap sampling distribution of U_γ(s*; 0, θ̂*). Figure 6 shows that the bootstrap critical values for plane 6's inter-event times are very close to the critical values obtained from the analytical method; in the bootstrap experiment, the bootstrap replication number was 1000.
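The parametric bootstrap just described can be sketched as follows. This is our own illustration: only θ̂ = 0.0107 is taken from the text, while the monitoring periods, the choice j0 = 1, and the helper names are toy stand-ins; `gen_unit` enforces the sum-quota constraint.

```python
import random

def gen_unit(theta, tau, rng):
    """Exponential(theta) gaps under the sum-quota constraint:
    keep gaps while their running sum stays below tau."""
    gaps, total = [], 0.0
    while True:
        g = rng.expovariate(theta)
        if total + g > tau:
            return gaps
        gaps.append(g)
        total += g

def u_score(theta_hat, gaps, tau, j0):
    """Score (25) for the j0-th gap of one unit, with phi(t) = 1."""
    if len(gaps) >= j0:
        return 1.0 - theta_hat * gaps[j0 - 1]
    if len(gaps) == j0 - 1:
        return -theta_hat * (tau - sum(gaps))
    return 0.0

rng = random.Random(42)
theta_hat0, taus = 0.0107, [2000.0] * 13   # theta_hat0 from the data; taus are toy values
j0, B, scores = 1, 1000, []
for _ in range(B):
    units = [gen_unit(theta_hat0, t, rng) for t in taus]
    theta_star = sum(len(u) for u in units) / sum(taus)    # bootstrap ML estimate
    scores.append(u_score(theta_star, units[0], taus[0], j0))
scores.sort()
alpha = 0.05
lo, hi = scores[int(B * alpha / 2)], scores[int(B * (1 - alpha / 2)) - 1]
```

The observed score for the chosen (i0, j0) is then declared an outlier when it falls outside [lo, hi], as in (24).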
Acknowledgments

We thank Alexander McLain and Laura Taylor for helpful discussions. We also extend our thanks to Professors Tim Bedford and Babakalli Alkali for organizing the MMR Conference held in Glasgow, Scotland, in July 2007, and for inviting us to contribute this article. We also thank a reviewer for helpful comments.
Figure 6. Outlier detection test for inter-event times using the bootstrap and exact sampling distributions. The dashed curves represent the lower and upper critical curves obtained from the analytical approach, whereas the solid curves are from the bootstrap experiment.

References

[1] A. Adekpedjou, Estimation with Recurrent Event Data under an Informative Monitoring Period, PhD Thesis, University of South Carolina, Columbia, SC, 2007.
[2] P. Andersen, O. Borgan, R. Gill and N. Keiding, Statistical Models Based on Counting Processes, Springer-Verlag, New York, 1993.
[3] R.J. Beckman and R.D. Cook, Outlier...s, Technometrics 25(2) (1983), 119–163. With discussion and reply by the authors.
[4] O. Borgan, Maximum likelihood estimation in parametric counting process models, with applications to censored failure time data, Scandinavian Journal of Statistics 11 (1984), 1–16.
[5] Y. Chen, M. Hollander and N. Langberg, Small-sample results for the Kaplan-Meier estimator, Journal of the American Statistical Association 77 (1982), 141–144.
[6] P.E. Cheng and G.D. Lin, Maximum likelihood estimation of a survival function under the Koziol-Green proportional hazards model, Statistics & Probability Letters 5(1) (1987), 75–80.
[7] T. Cheng and Z. Li, A multiscale approach for spatio-temporal outlier detection, Transactions in GIS 10(2) (2006), 253–263.
[8] S. Csörgő, Estimation in the proportional hazards model of random censorship, Statistics 19(3) (1988), 437–463.
[9] G.M. Davis and K.B. Ensor, Outlier detection in environmental monitoring network: an application to ambient ozone measurements for Houston, Texas, Journal of Statistical Computation and Simulation 76(5) (2006), 407–422.
[10] P. Galeano, D. Peña and R.S. Tsay, Outlier detection in multivariate time series by projection pursuit, Journal of the American Statistical Association 101 (2006), 654–669.
[11] J. Jacod, Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales, Z. Wahrsch. verw. Geb. 31 (1975), 235–253.
[12] J. Koziol and S. Green, A Cramér-von Mises statistic for randomly censored data, Biometrika 63 (1976), 139–156.
[13] J. Neyman, Smooth test for goodness of fit, Skand. Aktuarietidskr. 20 (1937), 149–199.
[14] M. Otey, A. Ghoting and S. Parthasarathy, Fast distributed outlier detection in mixed-attribute data sets, Data Mining and Knowledge Discovery 12 (2006), 203–228.
[15] E.A. Peña, Smooth goodness-of-fit tests for composite hypothesis in hazard based models, Annals of Statistics 26(5) (1998), 1935–1971.
[16] E.A. Peña, Smooth goodness-of-fit tests for the baseline hazard in Cox's proportional hazards model,
Journal of the American Statistical Association 93 (1998), 673–692.
[17] E.A. Peña, R.L. Strawderman and M. Hollander, Nonparametric estimation with recurrent event data, Journal of the American Statistical Association 96 (2001), 1299–1315.
[18] F. Proschan, Theoretical explanation of observed decreasing failure rate, Technometrics 5 (1963), 375–383.
[19] J. Quiton, General Outlier Detection and Goodness of Fit for Recurrent Event Data, PhD Thesis, University of South Carolina, Columbia, SC, 2007.
[20] R. Stocker and E.A. Peña, A general class of parametric models for recurrent event data, Technometrics 49 (2007), 210–220.
[21] M.C. Wang and S.H. Chang, Nonparametric estimation of a recurrent survival function, Journal of the American Statistical Association 94 (1999), 146–153.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
Sensitivity Estimates in Dynamic Reliability Sophie MERCIER 1 and Michel ROUSSIGNOL Université Paris-Est, Laboratoire d’Analyse et de Mathématiques Appliquées, France Abstract. The aim of this paper is to study and to compute first-order derivatives with respect to some parameter p, for some functionals of piecewise deterministic Markov processes (PDMP), in view of sensitivity analysis in dynamic reliability. Such functionals are mean values of some function of the process, cumulated on some finite interval [0, t], and their asymptotic value per unit time. Keywords. Piecewise deterministic Markov process, Importance factor.
Introduction

In dynamic reliability, the time-evolution of a system is described by a piecewise deterministic Markov process (PDMP) (I_t, X_t)_{t≥0} (see [3], [5]). The first component I_t is discrete, with values in a finite state space E. Typically, it indicates the state (up/down) of each component of the system at time t. The second component X_t, with values in V ⊂ R^d, stands for environmental conditions, such as temperature, pressure, and so on. The two components of the process, I_t and X_t, interact with each other: the process jumps at countably many isolated random times; for a jump from (I_{t−}, X_{t−}) = (i, x) to (I_t, X_t) = (j, y) (with (i, x), (j, y) ∈ E × V), the transition rate between the discrete states i and j depends on the environmental condition x just before the jump and is a function x → a(i, j, x). Similarly, the environmental condition X_t just after the jump is distributed according to some distribution μ(i, j, x)(dy), which depends on both components (i, x) just before the jump and on the after-jump discrete state j. Between jumps, the discrete component I_t is constant, whereas the evolution of the environmental condition X_t is deterministic, the solution of a set of ODEs which depends on the fixed discrete state: given that I_t = i for all t ∈ [a, b], we have dX_t/dt = v(i, X_t) for all t ∈ [a, b], where v is a mapping from E × V to V. Under technical assumptions, (I_t, X_t)_{t≥0} is a Markov process with general state space E × V (see [3], [5]). We study quantities of the form:

  R_{ρ0}(t) = E_{ρ0} [ ∫_0^t h(I_s, X_s) ds ]
1 Corresponding Author: Université Paris-Est, Laboratoire d’Analyse et de Mathématiques Appliquées (CNRS UMR 8050), 5 boulevard Descartes, Champs sur Marne, F-77454 Marne-la-Vallée, France. E-mail:
[email protected]
where ρ_0 is the initial distribution of the process and h is some bounded measurable function. We assume that the jump rates for I_t and the function h depend on some parameter p, where p belongs to an open set O ⊂ R (or R^k). The quantities of interest then are the first-order derivatives of R_{ρ0}(t) and of lim_{t→+∞} R_{ρ0}(t)/t with respect to p, which may help to rank input data according to their relative importance. This kind of sensitivity analysis was studied by Gandini [7] and by Cao and Chen [2] for pure jump Markov processes with countable state space. We here present extensions of their results to PDMPs. Note that, due to the reduced size of the present paper, proofs are not provided here; they will be part of a forthcoming paper.

The whole paper is written under the following assumptions (H1), where the exponent (p) reflects dependence on p: for all (i, j, p) ∈ E² × O, the function x → a^{(p)}(i, j, x) is non-negative, bounded and continuous; for all (i, j, x) ∈ E² × V, the function p → a^{(p)}(i, j, x) is differentiable with uniformly bounded derivative for (i, j, x, p) ∈ E² × V × O; for all i, j ∈ E and for every continuous and bounded function ψ : V → R, the function x → μ(i,j,x)ψ := ∫ ψ(y) μ(i,j,x)(dy) is continuous; for all i ∈ E, the function x → v(i, x) is locally Lipschitz continuous and sub-linear; for all (i, x) ∈ E × V, the function p → h^{(p)}(i, x) is differentiable with uniformly bounded derivative for (i, x, p) ∈ E × V × O.

We denote by ρ_t^{(p)}(j, dy) the distribution at time t of the process (I_t^{(p)}, X_t^{(p)})_{t≥0} with initial distribution ρ_0 (independent of p). We then have:

  R_{ρ0}^{(p)}(t) = ∫_0^t ρ_s^{(p)} h^{(p)} ds = Σ_{i∈E} ∫_0^t ∫_V h^{(p)}(i, x) ρ_s^{(p)}(i, dx) ds.
1. Transitory Results

We first introduce the infinitesimal generators of the two Markov processes (I_t^{(p)}, X_t^{(p)})_{t≥0} and (I_t^{(p)}, X_t^{(p)}, t)_{t≥0}:
Definition 1 Let D_{H0} be the set of functions ϕ from E × V to R such that, for all i ∈ E, the function x → ϕ(i, x) is bounded and continuously differentiable and the function x → v(i, x) · ∇ϕ(i, x) is bounded and continuous on V. For ϕ ∈ D_{H0}, we define

  H_0^{(p)} ϕ(i, x) = Σ_{j∈E} a^{(p)}(i, j, x) [μ(i,j,x) ϕ(j, ·)] + v(i, x) · ∇ϕ(i, x)

for all (i, x) ∈ E × V, with a^{(p)}(i, i, x) = − Σ_{j≠i} a^{(p)}(i, j, x) and μ(i,i,x)(dy) = δ_x(dy), where δ_x is the Dirac measure at x.

Let D_H be the set of functions ϕ from E × V × R to R such that, for all i ∈ E and s ∈ R+, the function x → ϕ(i, x, s) is bounded and continuously differentiable on V and the function x → ∂ϕ(i, x, s)/∂s + v(i, x) · ∇ϕ(i, x, s) is bounded and continuous on V. For ϕ ∈ D_H, we define

  H^{(p)} ϕ(i, x, s) = Σ_j a^{(p)}(i, j, x) [μ(i,j,x) ϕ(j, ·, s)] + ∂ϕ(i, x, s)/∂s + v(i, x) · ∇ϕ(i, x, s)
for all (i, x, s) ∈ E × V × R+.

Setting P_t^{(p)}(i, x, j, dy) to be the transition probability distribution of (I_t^{(p)}, X_t^{(p)})_{t≥0}, we then have P_s^{(p)} ϕ = ϕ + ∫_0^s H_0^{(p)} P_u^{(p)} ϕ du for all ϕ ∈ D_{H0}, all s ∈ R+, and (P_s^{(p)} ϕ)(·, ·, s) = ϕ(·, ·, 0) + ∫_0^s (H^{(p)} P_u^{(p)} ϕ)(·, ·, u) du for all ϕ ∈ D_H, all s ∈ R+ (Chapman-Kolmogorov equations). We may now introduce new functions called importance functions:
Definition 2 Let t be fixed. We say that a function ϕ_t^{(p)} ∈ D_H is the importance function associated to (h^{(p)}, t) if ϕ_t^{(p)} is a solution of the differential equation H^{(p)} ϕ_t^{(p)}(i, x, s) = h^{(p)}(i, x) for all s ∈ [0, t[, all (i, x) ∈ E × V, with terminal data ϕ_t^{(p)}(i, x, t) = 0 for all (i, x) ∈ E × V.

In applications, the importance functions will generally be computed numerically. However, an analytical form is available, which is also useful for the asymptotic study.

Lemma 1 Let us assume that the function x → a^{(p)}(i, j, x) is continuously differentiable on V for all i, j ∈ E and all p ∈ O, and that the function v is bounded (assumptions H2). The importance function associated to (h^{(p)}, t) is then unique and it is given by

  ϕ_t^{(p)}(i, x, s) = − ∫_0^{t−s} P_u^{(p)} h^{(p)}(i, x) du  if 0 ≤ s ≤ t,  and 0 otherwise,   (1)

for all (i, x) ∈ E × V.

The proof of the previous result is based on the Chapman-Kolmogorov equations, as well as on the following theorem.
Theorem 1 Let t be fixed. Under assumptions H1 and H2, the function p → R_{ρ0}^{(p)}(t) is differentiable on O and we have:

  ∂R_{ρ0}^{(p)}(t)/∂p = ∫_0^t ρ_s^{(p)} (∂h^{(p)}/∂p) ds − ∫_0^t ρ_s^{(p)} (∂H^{(p)}/∂p) ϕ_{t−s}^{(p)} ds,   (2)

where we set (∂H^{(p)}/∂p ϕ)(i, x, s) := Σ_{j∈E} (∂a^{(p)}/∂p)(i, j, x) [μ(i,j,x) ϕ(j, ·, s)] for all ϕ ∈ D_H, all (i, x, s) ∈ E × V × R+.
Equation (2) is an extension of the results from [7] for pure jump Markov processes.
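For a pure-jump chain with constant rates, everything in (1)-(2) has a closed form, which permits a direct numerical sanity check. The sketch below is our own toy example (a two-state chain with rate a from state 0 to 1, where a plays the role of p, and rate b back); it compares the right-hand side of (2), with ϕ as in (1), against a central finite difference of R(t):

```python
import math

# Two-state pure-jump chain: c = a + b, pi_h = (b*h0 + a*h1)/c,
# and P_v h(i) = pi_h + exp(-c v) * (h(i) - pi_h).
def R(a, b, h, rho00, t):
    """R(t) = integral_0^t rho_s h ds with rho_0 = (rho00, 1 - rho00)."""
    c = a + b
    pi_h = (b * h[0] + a * h[1]) / c
    r0h = rho00 * h[0] + (1.0 - rho00) * h[1]
    return t * pi_h + (1.0 - math.exp(-c * t)) / c * (r0h - pi_h)

def dR_formula(a, b, h, rho00, t, n=20000):
    """Right-hand side of (2) for p = a: dh/dp = 0, and since only a(0,1) = p
    varies, (dH/dp) phi equals -phi(0, s) + phi(1, s) in state 0, and 0 in state 1."""
    c = a + b
    pi_h = (b * h[0] + a * h[1]) / c
    pi0 = b / c

    def phi(i, s):  # importance function (1), closed form for this chain
        u = t - s
        return -(u * pi_h + (1.0 - math.exp(-c * u)) / c * (h[i] - pi_h))

    acc = 0.0  # trapezoidal rule in s
    for k in range(n + 1):
        s = t * k / n
        w = 0.5 if k in (0, n) else 1.0
        rho0s = pi0 + math.exp(-c * s) * (rho00 - pi0)
        acc += w * rho0s * (-phi(0, s) + phi(1, s))
    return -acc * (t / n)   # minus sign in front of the second integral of (2)

a, b, h, rho00, t = 0.7, 0.4, (0.0, 1.0), 1.0, 2.0
eps = 1e-6
fd = (R(a + eps, b, h, rho00, t) - R(a - eps, b, h, rho00, t)) / (2.0 * eps)
formula = dR_formula(a, b, h, rho00, t)
```

The two values agree to the accuracy of the quadrature and the finite difference.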
2. Asymptotic Results We first transform (2) in view of studying its asymptotic expression.
Lemma 2 Under assumptions H1 and H2, we have:

  (1/t) ∂R_{ρ0}^{(p)}(t)/∂p = (1/t) ∫_0^t ρ_s^{(p)} (∂h^{(p)}/∂p) ds + (1/t) ∫_0^t ρ_s^{(p)} (∂H^{(p)}/∂p) [ ∫_0^{t−s} (P_u^{(p)} h^{(p)} − π^{(p)} h^{(p)}) du ] ds

The proof of the previous lemma is based on (1)-(2) and on the fact that (∂H^{(p)}/∂p)[π^{(p)} h^{(p)} · 1] = (π^{(p)} h^{(p)}) (∂H^{(p)}/∂p) 1 = 0, because H^{(p)} 1 = 0 implies (∂H^{(p)}/∂p) 1 = 0.
0
+∞
Pu(p) h(p) (i, x) − π (p) h(p) du
exists for all (i, x) ∈ E × V and (p)
(p)
1 ∂Rρ0 ∂h(p) ∂H0 (t) = π (p) + π (p) U h(p) t→+∞ t ∂p ∂p ∂p lim
(3)
(p) 6 7 (p) ∂H where we set ∂p0 ϕ (i, x) := j∈E ∂a∂p (i, j, x) μ(i,j,x) ϕ(j, ·) for all ϕ ∈ DH0 , all (i, x) ∈ E × V . Also, the function U h(p) is solution of the following differential equation:
(p)
H0 U h(p) (i, x) = π (p) h(p) − h(p) (i, x) The previous theorem provides an extension of the results from [2] for pure jump Markov processes. We now look at two examples. In such examples, dependence on p (namely (p) ) is generally not specified any more, in order to get simpler notations.
3. First Example A single component is considered, which is perfectly and instantaneously repaired at each failure. The time evolution of the component is described by the process (Xt )t≥0 where Xt stands for the time elapsed at time t since the last instantaneous repair. (There is one single discrete state here so that component It is not necessary). The failure rate for the component at time t is λ (Xt ) where λ (·) is some continuous non-negative function.
The process (X_t)_{t≥0} is "renewed" after each repair, so that μ(x)(dy) = δ_0(dy), and the evolution of (X_t)_{t≥0} between renewals is given by g(x, t) = x + t. We are interested in the rate of renewals on [0, t], namely in the quantity Q(t) such that

  Q(t) = R(t)/t = (1/t) E_0 [ ∫_0^t λ(X_s) ds ] = (1/t) ∫_0^t ∫_{R+} λ(x) ρ_s(dx) ds,

where R(t) is the renewal function associated to the underlying renewal process and ρ_s is the distribution of X_s given that X_0 = 0. The function λ(x) depends on some parameter p and we want to compute ∂Q(t)/∂p. Using (1)-(2), we get:

  ∂Q(t)/∂p = (1/t) ∫_0^t ∫_{R+} ρ_s(dx) (∂λ/∂p)(x) (1 − ϕ_t(0, s) + ϕ_t(x, s)) ds,

where ϕ_t is a solution of λ(x)(ϕ_t(0, s) − ϕ_t(x, s)) + ∂ϕ_t(x, s)/∂s + ∂ϕ_t(x, s)/∂x = λ(x) for all s ∈ [0, t[, with ϕ_t(x, t) = 0 for all x ∈ [0, t]. No closed form is available for ϕ_t, and for the numerical computation this equation has been discretized and solved numerically. For the asymptotic quantity, one may prove that:
  ∂Q(∞)/∂p = Q(∞) ∫_0^{+∞} (∂λ/∂p)(x) [ 1 − Q(∞) ∫_0^x e^{−∫_0^v λ(u)du} dv ] dx,

with Q(∞) = 1/E(T1) and E(T1) = ∫_0^{+∞} e^{−∫_0^v λ(u)du} dv (the mean up time).

We take λ(t) = αβt^{β−1} with (α, β) = (10^{−5}, 4), and we compute ∂Q(t)/∂p for t ≤ ∞ and p = α, β. To validate our results, they are compared to those obtained by finite differences with

  ∂Q(t)/∂p ≈ (Q^{(p+ε)}(t) − Q^{(p)}(t)) / ε

for small ε and t ≤ ∞, with p = α, β. For the asymptotic results, we use Q(∞) = 1/E(T1) to compute such a derivative. For the transitory results, we use an algorithm from [8] which provides the renewal function R(t), and hence Q(t) = R(t)/t. The results are gathered in Table 1 for the asymptotic derivatives and plotted in Figures 1 and 2 for the transitory results, both by the present method and by finite differences, which are quite concordant (at least for ε small enough).
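For the Weibull choice λ(t) = αβt^{β−1}, the asymptotic formula above can be cross-checked against a finite difference on Q(∞) = 1/E(T1). The sketch below uses crude numerical integration; the grid parameters are our own assumptions:

```python
import math

alpha, beta = 1e-5, 4.0
N, top = 200000, 60.0           # integration grid: e^{-alpha x^4} is negligible past x ~ 40
dv = top / N

def ET1(al):
    """E(T1) = integral_0^inf exp(-al * v**beta) dv (Riemann sum)."""
    return sum(math.exp(-al * (k * dv) ** beta) for k in range(N)) * dv

Q = 1.0 / ET1(alpha)

# Asymptotic formula with dlambda/dalpha (x) = beta * x**(beta - 1)
# and cumulative hazard Lam(x) = alpha * x**beta:
acc, inner = 0.0, 0.0
for k in range(N):
    x = k * dv
    inner += math.exp(-alpha * x ** beta) * dv   # running integral of e^{-Lam}
    acc += beta * x ** (beta - 1.0) * (1.0 - Q * inner) * dv
dQ_dalpha = Q * acc

eps = 1e-9
fd = (1.0 / ET1(alpha + eps) - 1.0 / ET1(alpha - eps)) / (2.0 * eps)
```

Both values come out near 1.55 × 10³, in line with the ∂Q(∞)/∂α entries of Table 1.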
Table 1. ∂Q(∞)/∂α and ∂Q(∞)/∂β by finite differences (FD) and the present method (MR)

             FD, ε = 10^-4    FD, ε = 10^-6    FD, ε = 10^-8    FD, ε = 10^-10    MR
∂Q(∞)/∂α    5.1 × 10²        1.496 × 10³      1.5504 × 10³     1.5510 × 10³      1.5509 × 10³
∂Q(∞)/∂β    4.3761 × 10^-2   4.3760 × 10^-2   4.3760 × 10^-2   4.3760 × 10^-2    4.3755 × 10^-2
Figure 1. ∂Q(t)/∂α by FD and MR.
Figure 2. ∂Q(t)/∂β by FD and MR.
4. Second Example

We consider a tank that may be filled or emptied using a pump. The pump may be in one of two states: "in" (state 0) or "out" (state 1). The level of liquid in the tank ranges from 0 up to R. The state of the system "tank-pump" at time t is (I_t, X_t), where I_t is the discrete state of the pump (I_t ∈ {0, 1}) and X_t is the continuous level in the tank (X_t ∈ [0, R]). The transition rate from state 0 (resp. 1) to state 1 (resp. 0) at time t is λ_0(X_t) (resp. λ_1(X_t)). The speed of variation of the liquid level in state 0 is v_0(x) = r_0(x), with r_0(x) > 0 for all x ∈ [0, R[ and r_0(R) = 0: the level increases in state 0 until it reaches R, where it remains constant. Similarly, the speed in state 1 is
Recent Advances in Recurrent Event Modeling and Inference
v_1(x) = −r_1(x), with r_1(x) > 0 for all x ∈ ]0, R] and r_1(0) = 0: the level of liquid decreases in state 1 until it reaches 0, where it remains constant. Also, the level in the tank is continuous, so that μ(i, 1 − i, x)(dy) = δ_x(dy) for i ∈ {0, 1} and all x ∈ [0, R]. The functions r_i and λ_i are assumed to be continuous, with λ_i bounded, which ensures an almost surely finite number of jumps on [0, t] (all t ≥ 0). This example is very similar to that from [1]. The main difference is that we here assume X_t to remain bounded (X_t ∈ [0, R]), whereas X_t takes its values in R_+ in the quoted paper. In order to study asymptotic quantities, we assume conditions which ensure the process (I_t, X_t)_{t≥0} to be ϕ-irreducible, in the sense of [6]. Such conditions for irreducibility are very similar to those from [1]: we first take λ_1(0) > 0 and λ_0(R) > 0, which prevents the system from being stuck in states (1, 0) and (0, R), respectively. Setting t^{(i)}_{x→y} for the deterministic time to go from x to y following the curve (g(i, x, t))_{t∈R} (all x, y ∈ [0, R]), we also assume that:
if ∫_x^R du/r_0(u) = t^{(0)}_{x→R} = +∞, then ∫_x^R λ_0(u)/r_0(u) du = +∞, for some x ∈ [0, R[;
if ∫_0^y du/r_1(u) = t^{(1)}_{y→0} = +∞, then ∫_0^y λ_1(u)/r_1(u) du = +∞, for some y ∈ ]0, R].
Under such conditions, the process (I_t, X_t)_{t≥0} may be proved to be positive Harris recurrent, and its single invariant distribution π is available in closed form. We are interested in two quantities: first, the proportion of time spent by the level in the tank between two fixed bounds a and b with 0 < a < b < R, for which we set:

Q_0(t) = (1/t) E_{ρ_0}[∫_0^t 1_{a≤X_s≤b} ds] = (1/t) Σ_{i=0}^{1} ∫_0^t ∫_a^b ρ_s(i, dx) ds = (1/t) ∫_0^t ρ_s h_0 ds
with h_0(i, x) = 1_{[a,b]}(x). Secondly, the mean number of times the pump is turned from state "in" (0) to state "out" (1) per unit time, namely:

Q_1(t) = (1/t) E_{ρ_0}[Σ_{0<s≤t} 1_{I_{s−}=0; I_s=1}]
= (1/t) E_{ρ_0}[∫_0^t λ_0(X_s) 1_{I_s=0} ds] = (1/t) ∫_0^t ρ_s h_1 ds
with h_1(i, x) = 1_{i=0} λ_0(x). For i_1 = 0, 1, we assume that λ_{i_1}(x) depends on some parameter α_{i_1} (but no other data depends on α_{i_1}). For i_0, i_1 ∈ {0, 1}, we then want to compute ∂Q_{i_0}(t)/∂α_{i_1} and ∂Q_{i_0}(∞)/∂α_{i_1}. As for the asymptotic derivatives, one may prove that

U h_{i_0}(1, x) − U h_{i_0}(0, x) = − ∫_0^x (u_{i_0}(1, z)/r_1(z) + u_{i_0}(0, z)/r_0(z)) e^{∫_x^z (λ_1(y)/r_1(y) − λ_0(y)/r_0(y)) dy} dz
Table 2. ∂Q_{i_0}(∞)/∂α_{i_1} by finite differences (FD) and the present method (MR).

        ∂Q_0(∞)/∂α_i                          ∂Q_1(∞)/∂α_i
i       FD                MR                  FD                MR
0       −1.4471 × 10^{−2}  −1.4469 × 10^{−2}  −5.5294 × 10^{−2}  −5.5303 × 10^{−2}
1       −1.7471 × 10^{−2}  −1.7469 × 10^{−2}  −4.9948 × 10^{−2}  −4.9946 × 10^{−2}
Table 3. ∂Q_{i_0}(t)/∂α_{i_1} for t = 2 by finite differences (FD) and the present method (MR).

        ∂Q_0(2)/∂α_i                          ∂Q_1(2)/∂α_i
i       FD                MR                  FD                MR
0       −4.7561 × 10^{−2}  −4.7580 × 10^{−2}  −8.6747 × 10^{−2}  −8.6599 × 10^{−2}
1       −4.5566 × 10^{−3}  −4.5166 × 10^{−3}  −2.7299 × 10^{−2}  −2.7370 × 10^{−2}
Closed forms are then available for ∂Q_{i_0}(∞)/∂α_{i_1} (i_0, i_1 ∈ {0, 1}), using (3). As for the transitory quantities, one needs to compute numerically quantities of the shape ρ_t h, as well as the importance functions ϕ_t^{(i_0)}(i_1, ·, ·) (where i_0, i_1 ∈ {0, 1}). Both are computed using finite volume methods as in [4], and numerical approximations for ∂Q_{i_0}(t)/∂α_{i_1} are derived using (2). The system is assumed to initially be in state (I_0, X_0) = (0, R/2). Besides, we take:

λ_0(x) = x^{α_0};  r_0(x) = (R − x)^{r_0};  λ_1(x) = (R − x)^{α_1};  r_1(x) = x^{r_1}
for x ∈ [0, R], with α_i > 0 and r_i > 1, and the following numerical values: α_0 = 1.05; r_0 = 1.2; α_1 = 1.10; r_1 = 1.1; R = 1; a = 0.3; b = 0.7. As for the first example, we test our results using finite differences (FD). The results are rather stable for different values of ε, and they are provided for ε = 10^{−6}. The asymptotic results are given in Table 2 and the transitory ones in Table 3 for t = 2. The results by FD and MR are very similar, both for asymptotic and transitory quantities, which clearly validates the method.
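The orders of magnitude of Q_0 and Q_1 for this tank-pump model can be sanity-checked by crude Monte-Carlo simulation of the PDMP. The sketch below (illustrative only, not the authors' finite-volume scheme of [4]) uses Euler time-stepping with jump probability λ_i(x)·dt and the numerical values stated above:

```python
import random

# Crude Monte-Carlo sketch of the tank-pump PDMP with the rates above
# (lambda_0(x) = x**a0, r_0(x) = (R-x)**b0, lambda_1(x) = (R-x)**a1,
# r_1(x) = x**b1); Euler stepping with jump probability lambda_i(x)*dt.
def simulate_Q(t_max=2.0, dt=1e-3, n_runs=200, seed=0):
    a0, b0, a1, b1, R = 1.05, 1.2, 1.10, 1.1, 1.0
    a, b = 0.3, 0.7                      # band for Q_0
    rng = random.Random(seed)
    time_in_band = 0.0                   # accumulates 1_{a <= X_s <= b} * dt
    switches_0_to_1 = 0                  # pump switches "in" -> "out"
    for _ in range(n_runs):
        i, x = 0, R / 2.0                # initial state (I_0, X_0) = (0, R/2)
        s = 0.0
        while s < t_max:
            if a <= x <= b:
                time_in_band += dt
            lam = x ** a0 if i == 0 else (R - x) ** a1
            if rng.random() < lam * dt:  # jump: pump changes state, level unchanged
                if i == 0:
                    switches_0_to_1 += 1
                i = 1 - i
            else:                        # deterministic drift of the level
                x += ((R - x) ** b0 if i == 0 else -(x ** b1)) * dt
                x = min(max(x, 0.0), R)
            s += dt
    Q0 = time_in_band / (n_runs * t_max)
    Q1 = switches_0_to_1 / (n_runs * t_max)
    return Q0, Q1

Q0, Q1 = simulate_Q()
```

Note that leaving the level unchanged at a jump matches μ(i, 1 − i, x)(dy) = δ_x(dy) above; the Euler/Bernoulli jump scheme is only a first-order approximation.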
5. Discussion

We have here presented an extension of the results from [2] and [7] to PDMPs. The computation of ∂R/∂p(t) requires the computation of the distribution (ρ_s)_{0≤s≤t} and of the importance function (ϕ_s)_{0≤s≤t}, which are independent of the choice of the parameter with respect to which we differentiate. This means that, in case derivatives with respect to different parameters are needed, say p_1, ..., p_m, both (ρ_s)_{0≤s≤t} and (ϕ_s)_{0≤s≤t} only have to be computed once. By contrast, evaluation of ∂R/∂p_i(t) for some i = 1, ..., m by finite differences requires the computation of (ρ_s)_{0≤s≤t} for the initial parameters p_1, ..., p_m, but also for the same set of parameters with one single p_i substituted by p_i + ε (for the fixed i). This means that evaluation of ∂R/∂p_i(t) for all i = 1, ..., m requires computation
of (ρ_s)_{0≤s≤t} for m + 1 sets of parameters (the initial set plus m other sets). Our method is thus computationally cheaper than finite differences. This paper is a first step toward the computation of derivatives of functionals of PDMPs with respect to some parameter p. Indeed, the present study is restricted to the case where only the discrete transition rates depend on the parameter p. Some additional mathematical work remains to be done to extend the results to more general cases. Also, further thought should be given to the numerical computation of the importance functions in studies larger than the small examples of this paper.

Acknowledgment: the authors would like to thank Anne Barros, Christophe Bérenguer, Laurence Dieulle and Antoine Grall from Troyes Technological University (Université Technologique de Troyes) for having drawn their attention to the present subject.
References
[1] O. Boxma, H. Kaspi, O. Kella and D. Perry, On/off storage systems with state dependent input, output and switching rates, Probab. Engrg. Inform. Sci. 19(1) (2005), 1–14.
[2] X.-R. Cao and H.-F. Chen, Perturbation realization, potentials, and sensitivity analysis of Markov processes, IEEE Trans. Automat. Contr. 42(10) (1997), 1382–1393.
[3] C. Cocozza-Thivent, R. Eymard, S. Mercier and M. Roussignol, Characterization of the marginal distributions of Markov processes used in dynamic reliability, J. Appl. Math. Stoch. Anal. 2006 (2006), 1–18.
[4] C. Cocozza-Thivent, R. Eymard and S. Mercier, A finite volume scheme for dynamic reliability models, IMA J. Numer. Anal. 26(3) (2006), 446–471.
[5] M.H.A. Davis, Markov Models and Optimization, Monographs on Statistics and Applied Probability, Chapman and Hall, London, 1993.
[6] D. Down, S.P. Meyn and R. Tweedie, Exponential and uniform ergodicity of Markov processes, Ann. Probab. 23 (1996), 1671–1691.
[7] A. Gandini, Importance and sensitivity analysis in assessing system reliability, IEEE Trans. Reliab. 39(1) (1990), 61–70.
[8] S. Mercier, Discrete random bounds for general random variables and applications to reliability, European J. Oper. Res. 177(1) (2007), 378–405.
Advances in Mathematical Modeling for Reliability T. Bedford et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved.
Renewal Theory with Discounting

J.A.M. VAN DER WEIDE^{a,1}, J.M. VAN NOORTWIJK^b and SUYONO^c
^a Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5301, NL-2600 GA Delft, The Netherlands.
^b HKV Consultants, P.O. Box 2120, NL-8203 AC Lelystad, The Netherlands.
^c Jurusan Matematika FMIPA, Universitas Negeri Jakarta, Jl. Pemuda no. 10 Rawamangun, Jakarta Timur 13200, Indonesia.
Abstract. To determine optimal investment and maintenance decisions, the total costs should be minimized over the whole life of a system or structure. In minimizing life-cycle costs, it is important to account for the time value of money by discounting and to consider the uncertainties involved. This paper presents new results in renewal theory with costs that can be discounted according to any discount function which is non-increasing over time (such as exponential, hyperbolic, generalized hyperbolic and no discounting). The main results include expressions for the first and second moments of the discounted costs over bounded and unbounded time horizons. The renewal models under discounting are applied to the classic decision problem of dike heightening.
Keywords. Renewal theory; Exponential discounting; Hyperbolic discounting.
Introduction

Several methods to discount future costs have been proposed in different areas such as economics, psychology and engineering; see Frederick et al. [4]. The present value of a cost C at time t is given by D(t)C, where D is the discount function. The simplest example of a discount function corresponds to exponential discounting, for which D(t) = e^{−rt}; see Samuelson [12]. In financial terms, exponential discounting corresponds to continuous compounding with constant interest rate r. In financial markets the interest rate is definitely not constant in time. In financial engineering the interest rate is considered as an explanatory variable. A well-known model for the interest rate is Vasicek's model (see Björk [2]):

dr(t) = (b − a r(t)) dt + σ dW(t),

where a, b, σ are positive parameters and W is a Wiener process. Inflation can be included as a continuous dividend rate by adapting the parameter b. The solution of this equation is given by

^1 Corresponding Author: J.A.M. van der Weide, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5301, NL-2600 GA Delft, The Netherlands, email:
[email protected]
r(t) = r(0) e^{−at} + (b/a)(1 − e^{−at}) + σ ∫_0^t e^{a(u−t)} dW(u),

and the corresponding discount factor is

D(t) = E[e^{−∫_0^t r(u) du}] = exp{ −((ab − σ²/2)/a²) t + ((ab − σ²/2)/a² − σ² B(0, t)/(4a) − r(0)) B(0, t) },
where B(0, t) = (1 − e^{−at})/a. It follows that this discount factor can be approximated with a discount factor with constant rate r = (ab − σ²/2)/a². As an alternative to exponential discounting, several hyperbolic functional forms for the discount function have been proposed: Herrnstein [5] and Mazur [7] suggested the function

D(t) = (1 + βt)^{−1},  β > 0,  (1)
and Loewenstein and Prelec [6] generalized this form to

D(t) = (1 + αt)^{−β/α},  α, β > 0.  (2)
Equations (1) and (2) are called hyperbolic discounting and generalized hyperbolic discounting, respectively. Under hyperbolic discounting, future cost is given more weight than under exponential discounting, and a person's discount rate declines over time rather than remaining constant. A hyperbolic discount function often fits empirical data better than the exponential discount function. Exponential discounting and hyperbolic discounting are special cases of generalized hyperbolic discounting: Equation (2) converges to exponential discounting with rate β as α → 0, and Equation (2) simplifies to hyperbolic discounting for α = β. When decisions about a nuclear waste facility must be made, Atherton and French [1] claim that hyperbolic discounting is more reasonable and justifiable than exponential discounting. Other environmental decision problems to which hyperbolic discounting might better be applied concern global climate change, loss of bio-diversity, thinning of stratospheric ozone, groundwater pollution, minerals depletion, and many others (Weitzman [19]). To assure socio-economically sustainable civil engineering infrastructures, Rackwitz et al. [11] proposed to use the discount rate function r(t) = ρe^{−at} + δ, where ρ, a, δ > 0. The existing literature contains many results on the mean and variance of the exponentially discounted cost, mostly in asymptotic form (Rackwitz [9,10] and van Noortwijk [16]). In this paper, we will study discounting from a very general point of view, with a discount function D which is only assumed to be continuous, non-negative and non-increasing, with D(0) = 1. All discount functions studied in the above mentioned literature are special cases. For the proofs of the theorems and more results we refer to [15].
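The relation between the three discount functions can be illustrated in a few lines of code; the sketch below (illustrative parameter values) checks that (2) reduces to (1) at α = β and approaches exponential discounting as α → 0:

```python
import math

# The three discount functions discussed above; illustrative only.
def D_exp(t, r):
    return math.exp(-r * t)                        # exponential discounting

def D_hyp(t, beta):
    return 1.0 / (1.0 + beta * t)                  # Equation (1)

def D_gh(t, alpha, beta):
    return (1.0 + alpha * t) ** (-beta / alpha)    # Equation (2)

t, beta = 5.0, 0.1
gh_at_alpha_eq_beta = D_gh(t, beta, beta)          # equals D_hyp(t, beta)
gh_small_alpha = D_gh(t, 1e-9, beta)               # close to D_exp(t, beta)
```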
1. Model and Notation

In this paper the times at which costs occur will be modeled as a renewal process. In particular, the times between occurrences are independent and identically distributed.
Renewal Theory with Discounting - J.A.M. van der Weide et al.
Let T, T_j, C, C_j, j ≥ 1, be random variables defined on some probability space (Ω, F, P) such that the random variables T, T_j are positive and such that the sequence {(T, C), (T_j, C_j), j ≥ 1} is an i.i.d. sequence of random vectors with cumulative distribution function H:

H(x, y) = P(T ≤ x, C ≤ y),  x, y ∈ R_+.
It follows that the sequence {T_j, j ≥ 1} is also i.i.d. with cumulative distribution function F(x) = H(x, +∞) and F(0) = 0. Let N = {N(t) : t ≥ 0} be the renewal process associated with the sequence of partial sums (S_j)_{j≥1}:

N(t) = max{j | S_j ≤ t} = Σ_{k=1}^{∞} 1_{S_k ≤ t},  t ≥ 0,
where S_j = T_1 + … + T_j, j ≥ 1, and where 1_A denotes the indicator function of the set A. By convention, S_0 ≡ 0. The random variables S_j, j ≥ 1, can be interpreted as the times at which maintenance actions take place, and C_j is the cost of the maintenance action at time S_j. The total discounted cost over the bounded time horizon [0, t] is then given by:

K(t, D) = Σ_{j=1}^{∞} D(S_j) C_j 1_{S_j ≤ t}.  (3)
In the case that the inter-occurrence times T_j and the costs C_j are independent, the process {K(t, D), t ≥ 0} is known as a compound renewal process; see Morey [8]. In the special case with discount rate r ≡ 0, the process {K(t, D), t ≥ 0} is also known as a renewal-reward process; see Tijms [13, Chapter 2]. For applications in maintenance engineering, the special case C_j = c(T_j) is important, where c : R → R_+ is a given (non-random) Borel function; see van Noortwijk [16].
2. First Moment of Discounted Cost

We will assume that the distribution function F of the renewal times T is absolutely continuous with probability density function f, such that the renewal process N has a renewal density m, i.e.

M(t) = E[N(t)] = ∫_0^t m(u) du.

Let

U(t) = Σ_{k=0}^{∞} F_k(t),  t ≥ 0,

where F_k is the cumulative distribution function of S_k. In particular, for any non-negative Borel function g,

∫_0^t g(x) dU(x) = g(0) + ∫_0^t g(x) m(x) dx = Σ_{k=0}^{∞} E[g(S_k) 1_{S_k ≤ t}].
Note that U(t) = M(t) + F_0(t). For more information about renewal processes, see Tijms [13, Chapters 2 and 8]. Let D be a given discount function, i.e. D is a continuous, non-negative, non-increasing function with D(0) = 1. Suppose first that we are in the special case with constant unit cost, C_j ≡ 1. For the mean of the discounted cost, which we will denote in this case by K(t, D), we get from Equation (3)

E[K(t, D)] = Σ_{j=1}^{∞} E[D(S_j) 1_{S_j ≤ t}] = ∫_0^t D(x) m(x) dx.  (4)

For the special case that T and C are independent we get

E[K(t, D)] = E[C] ∫_0^t D(x) m(x) dx.  (5)

In the general case, we can write

E[K(t, D)] = Σ_{j=1}^{∞} E[D(S_j) C_j 1_{S_j ≤ t}] = ∫_0^t E[D(x + T) C 1_{x+T ≤ t}] dU(x).  (6)
Theorem 1. Let E[C] < ∞ and let D be an arbitrary discount function.
1. If ∫_0^∞ D(x) m(x) dx < ∞, the expected discounted cost over an unbounded horizon is finite:

lim_{t→∞} E[K(t, D)] = ∫_0^∞ E[D(x + T) C] dU(x).  (7)

2. If ∫_0^∞ D(x) m(x) dx = +∞, the long-term expected cost per renewal is given by

lim_{t→∞} E[K(t, D)] / ∫_0^t D(x) m(x) dx = E[C].

Theorem 1 determines the long-term expected cost per renewal. For the purpose of reserving budget for performing future maintenance actions, it is important to determine how much money these actions cost per unit time while taking the discounting into account. In finance, this cost is known as the equivalent average cost per unit time (see e.g. Wagner [17, Chapter 11] and Brealey and Myers [3, Chapter 6]). The expected equivalent average cost (EEAC) per unit time computed over a bounded time horizon of length t is defined as

EEAC = E[K(t, D)] / ∫_0^t D(x) dx.  (8)
Washburn [18] called the EEAC the equivalent rate of spending. For a bounded horizon, and for an unbounded horizon with ∫_0^∞ D(x) dx < ∞, the equivalent average cost per unit time can also be interpreted as a stream of fixed identical costs per unit time sufficient to recover all the necessary discounted costs. In this situation, the present value of the expected equivalent average cost per unit time summed over a bounded time horizon is equal to the total expected discounted costs over the whole time horizon. Under the same assumptions as Theorem 1, the long-term expected equivalent average cost per unit time can be written as follows.

Corollary 1. If ∫_0^∞ D(x) dx = +∞, then the long-term expected equivalent average cost per unit time is

lim_{t→∞} E[K(t, D)] / ∫_0^t D(x) dx = E[C] / E[T].

As an example, consider the case of exponential discounting, for which D(t) = e^{−rt}. In this case we have

∫_0^∞ D(x) m(x) dx = E[e^{−rT}] / (1 − E[e^{−rT}]),

and the limit of the expected value of the discounted cost K(t, r) is given by

lim_{t→∞} E[K(t, r)] = E[e^{−rT} C] / (1 − E[e^{−rT}]).

See [15] for an economic interpretation.
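This limit can be checked by simulation. For T exponentially distributed with rate μ and constant cost C, E[e^{−rT}] = μ/(μ + r), so the limit above reduces to Cμ/r. A Monte-Carlo sketch (illustrative values, not code from the paper):

```python
import math
import random

def mc_discounted_cost(mu, r, C, horizon=40.0, n_runs=20000, seed=42):
    """Average K(t, D) = sum_j C * exp(-r * S_j) over many simulated paths;
    renewal epochs S_j are partial sums of Exp(mu) inter-occurrence times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        s, k = 0.0, 0.0
        while True:
            s += rng.expovariate(mu)      # next renewal epoch S_j
            if s > horizon:               # e^{-r*horizon} is negligible here
                break
            k += C * math.exp(-r * s)     # discounted cost incurred at S_j
        total += k
    return total / n_runs

mu, r, C = 1.0, 0.5, 1.0
estimate = mc_discounted_cost(mu, r, C)
limit = C * mu / r                        # closed form E[e^{-rT}C]/(1 - E[e^{-rT}])
```

With these values the closed-form limit is 2.0, and the simulated average agrees to within Monte-Carlo error.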
3. Second Moment of Discounted Cost

For any discount function D, the second moment of the discounted cost over a bounded time horizon can be written as

E[K²(t, D)] = ∫_0^t E[D²(x + T) C² 1_{x+T ≤ t}] dU(x)
 + 2 ∫_0^t ∫_0^t E[D(x + T) C D(x + T + y + T′) C′ 1_{x+T+y+T′ ≤ t}] dU(x) dU(y),

where (T′, C′) and (T, C) are i.i.d.

Theorem 2. Let E[C²] < ∞. For any discount function D with ∫_0^∞ D²(x) m(x) dx < ∞, we have
lim_{t→∞} E[K²(t, D)] = ∫_0^∞ E[D²(x + T) C²] dU(x)
 + 2 ∫_0^∞ ∫_0^∞ E[D(x + T) C D(x + T + y + T′) C′] dU(x) dU(y).
For a comment about the case that ∫_0^∞ D²(x) m(x) dx = ∞, see Section 2. The expression for the second moment simplifies when T and C are independent:

E[K²(t, D)] = E[C²] ∫_0^t D²(x) m(x) dx + 2 (E[C])² ∫∫_{x+y≤t} D(x) D(x + y) m(x) m(y) dx dy.  (9)

In the case of exponential discounting we get

lim_{t→∞} E[K²(t, r)] = ( E[C² e^{−2rT}] (1 − E[e^{−rT}]) + 2 E[C e^{−rT}] E[C e^{−2rT}] ) / ( (1 − E[e^{−rT}]) (1 − E[e^{−2rT}]) ).
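For exponentially distributed T with rate μ and constant C, all the Laplace transforms above are explicit (E[e^{−rT}] = μ/(μ + r)), so the limiting moments can be evaluated directly. A quick numerical check with the illustrative values μ = 1, r = 0.5, C = 1:

```python
# Limiting first and second moments under exponential discounting,
# for T ~ Exp(mu) and constant cost C, using E[e^{-rT}] = mu/(mu + r).
mu, r, C = 1.0, 0.5, 1.0
L1 = mu / (mu + r)        # E[e^{-rT}]
L2 = mu / (mu + 2 * r)    # E[e^{-2rT}]

EK = C * L1 / (1 - L1)    # limiting mean of K(t, r)
EK2 = (C**2 * L2 * (1 - L1) + 2 * (C * L1) * (C * L2)) / ((1 - L1) * (1 - L2))
var = EK2 - EK**2         # limiting variance
```

For these values the limiting mean is 2, the limiting second moment is 5, and hence the limiting variance is 1.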
4. Generalized Hyperbolic Discounting

The generalized hyperbolic (or gamma) discount factor is given by

D_GH(t) = (1 + αt)^{−β/α},  α, β > 0,

see Loewenstein and Prelec [6]. If α = β, we get the hyperbolic discount function

D_H(t) = 1/(1 + βt).
The limit, as α → 0, is exponential discounting. The generalized hyperbolic discount factor has a kind of Bayesian interpretation as the expected value of an exponential discount factor with uncertain rate r. If the uncertainty in r is modeled by a gamma distributed random variable R with mean β and variance αβ, we get

E[e^{−Rt}] = ∫_0^∞ e^{−rt} (r/α)^{β/α−1} e^{−r/α} / (α Γ(β/α)) dr = (1 + αt)^{−β/α},

see Weitzman [20]. Therefore, generalized hyperbolic discounting is also called gamma discounting. It follows that the expectation of the discounted cost over (0, t] is given by

E[K(t, D_GH)] = E[ Σ_{j=1}^{∞} (1 + αS_j)^{−β/α} C_j 1_{S_j ≤ t} ]
 = ∫_0^∞ E[ Σ_{j=1}^{∞} e^{−rS_j} C_j 1_{S_j ≤ t} ] (r/α)^{β/α−1} e^{−r/α} / (α Γ(β/α)) dr.
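The gamma-mixture identity E[e^{−Rt}] = (1 + αt)^{−β/α} can be checked by direct quadrature; the sketch below uses a simple midpoint rule and the parameter values α = 0.0084, β = 0.0150 that appear in the application of Section 5:

```python
import math

def gamma_mixture_discount(t, alpha, beta, r_max=0.3, n=20000):
    """Midpoint-rule quadrature of int_0^inf e^{-r t} * pdf(r) dr, where
    R ~ Gamma with mean beta and variance alpha*beta; r_max truncates the
    (negligible) upper tail."""
    h = r_max / n
    c = 1.0 / (alpha * math.gamma(beta / alpha))
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        pdf = c * (r / alpha) ** (beta / alpha - 1.0) * math.exp(-r / alpha)
        total += math.exp(-r * t) * pdf * h
    return total

alpha, beta, t = 0.0084, 0.0150, 50.0
numeric = gamma_mixture_discount(t, alpha, beta)
exact = (1.0 + alpha * t) ** (-beta / alpha)
```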
In the term E[Σ_{j=1}^{∞} e^{−rS_j} C_j 1_{S_j ≤ t}] we recognize the expectation of the exponentially discounted cost over (0, t], which is a direct consequence of the interpretation of the generalized hyperbolic discount factor as the expected value of an exponential discount factor with uncertain rate r, where the uncertainty is given by a gamma distribution over the rate r. Assume that we have constant cost C and let the inter-renewal times be exponentially distributed with mean 1/m. In this case, the renewal density m(x) ≡ m. Note that

∫_0^t D_GH(x) m dx = ∫_0^t (1 + αx)^{−β/α} m dx = (m/(α − β)) ((1 + αt)^{(α−β)/α} − 1).

So, for α > β, the integral ∫_0^∞ D_GH(x) dx = +∞. It follows from (5) that

E[K(t, D_GH)] =
(Cm/(α − β)) ((1 + αt)^{(α−β)/α} − 1),  (10)
and from (9)

E[K²(t, D_GH)] = C² ∫_0^t D²_GH(x) m dx + 2 ∫_0^t ( ∫_0^{t−x} D_GH(x + y) m dy ) D_GH(x) m dx
 = (C²m/(α − 2β)) ((1 + αt)^{(α−2β)/α} − 1) + (C²m²/(α − β)²) ((1 + αt)^{(α−β)/α} − 1)².
It follows that for α > 2β,

∫_0^∞ D²_GH(x) dx = +∞  and  lim_{t→∞} E[K²(t, D)] / ∫_0^t D²_GH(x) dx = +∞.
So, in general, we cannot expect that a statement analogous to Theorem 2.1(2) will hold for the second moment of the discounted cost, but note that in this example

Var[K(t, D_GH)] = (C²m/(α − 2β)) ((1 + αt)^{(α−2β)/α} − 1) = C²m ∫_0^t D²_GH(x) dx.  (11)

5. Application

This section illustrates the influence of gamma discounting instead of exponential discounting by revisiting the classic economic decision problem of heightening a dike (adapted from van Dantzig [14]). Suppose it has to be decided how high a dike should be to prevent a polder from flooding. The aim is to determine the cost-optimal dike height for which the expected costs of dike heightening and flooding are minimal. Let the height of the dike h be the decision variable, and let h_0 = 3.25 m be the initial height of the dike at the moment the decision is taken. The costs of heightening the dike by h − h_0 meters are assumed to be linear in h − h_0. They consist of the fixed cost c_f = 1.1 × 10^8 Dutch guilders and the variable cost c_v = 4.0 × 10^7 Dutch guilders: i.e., c_0(h) = c_f + c_v (h − h_0). The dike heightening will be performed at year zero.
[Plot: loss (Dutch guilders, ×10^8) vs. dike height (m +NAP); curves: investment cost plus expected discounted cost of flooding, exponential discounting with r = 0.015, gamma discounting with E(R) = 0.015 and CV(R) = 0.75.]
Figure 1. Mean of discounted costs of dike heightening and flood damage over an unbounded time horizon as a function of the dike height for exponential and gamma discounting.
The problem of computing the expected costs of flooding can be tackled by applying a non-homogeneous Poisson process as follows. Let {N(t) : t ≥ 0} be a non-homogeneous Poisson process with intensity-rate function m(t), and denote the inter-occurrence times of the Poisson events by T_1, T_2, …. Define S_i = Σ_{h=1}^{i} T_h, i = 1, 2, …. In the context of flooding, the intensity rate m(t) can be interpreted as the frequency of flooding dependent on time t > 0. The flooding frequency can include, e.g., the influence of sea-level rise, settlement and dike-improvement measures. Let the associated cost of flooding be time dependent as well (for example, due to economic growth) and denote it by c(t). Furthermore, we apply general discounting in terms of the discount function D(t). The probability of n floods in a period of t years then follows a Poisson distribution with parameter ∫_0^t m(x) dx. Using the probability density function of S_n, the expected discounted cost of flooding in a time period of length t can now be written as

E[K(t, D)] = E[ Σ_{n=1}^{∞} 1_{S_n ≤ t} D(S_n) c(S_n) ] = ∫_0^t D(τ) c(τ) m(τ) dτ.  (12)
Similarly, for the variance we get

Var[K(t, D)] = ∫_0^t D²(τ) c²(τ) m(τ) dτ.  (13)
[Plot: variance of loss (Dutch guilders², ×10^18) vs. dike height (m +NAP); curves: variance of investment cost plus discounted cost of flooding, exponential discounting with r = 0.015, gamma discounting with E[R] = 0.015 and CV[R] = 0.75.]
Figure 2. Variance of discounted costs of dike heightening and flood damage over an unbounded time horizon as a function of the dike height for exponential and gamma discounting.
The only failure mechanism that we consider is overtopping, i.e. flooding of the polder will occur as soon as the sea level exceeds the height of the dike. We assume that the frequency of the sea level exceeding the dike height h is constant in time and can be represented with an exponential function with location parameter s_0 = 1.96 m and scale parameter w = 0.33 m; that is, m(t) ≡ m = exp{−(h − s_0)/w} for all t ≥ 0. Here, we take m constant to simplify the calculations. If the polder is flooded, an economic value of c = 2.4 × 10^10 Dutch guilders is lost. The loss includes, amongst others, human lives, housing, industry, agriculture, stocks, cattle, means of production, and the cost of repairing collapsed dikes. The flood damage cost c is assumed to be constant in time. The present discounted value of future costs is determined by exponential discounting with annual discount rate r = 0.015. According to Equation (12), the expected discounted costs of dike heightening and flood damage over an unbounded horizon (t → ∞) are
lim_{t→∞} E[K(t, r)] = c_f + c_v (h − h_0) + (c/r) exp{−(h − s_0)/w}.

For exponential discounting, the cost-optimal dike height for which the expected discounted costs are minimal is 5.82 m (see Figure 1). The influence of imposing an uncertainty distribution on the discount rate r is as follows. Let us assume that the discount rate R has a gamma distribution with mean 0.015 and coefficient of variation (CV) 0.75 (the latter value is taken from Weitzman [20]). The corresponding values of the parameters of the gamma distribution are α = 0.0084 and β = 0.0150. For gamma discounting,
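Both optima can be reproduced from the closed forms. Setting the derivative of c_f + c_v(h − h_0) + (c/r_eff) exp{−(h − s_0)/w} with respect to h to zero gives h* = s_0 + w ln(c/(c_v r_eff w)), where r_eff = r for exponential discounting and, from (10) with α < β (so that the unbounded-horizon limit of the expected discounted flood cost is cm/(β − α)), r_eff = β − α for gamma discounting. A sketch with the paper's numbers:

```python
import math

cf, cv = 1.1e8, 4.0e7   # fixed / variable heightening cost (Dutch guilders)
c = 2.4e10              # economic loss per flood (Dutch guilders)
s0, w = 1.96, 0.33      # parameters of the flooding frequency m = exp(-(h-s0)/w)

def optimal_height(r_eff):
    # minimizes cf + cv*(h - h0) + (c / r_eff) * exp(-(h - s0) / w) over h
    return s0 + w * math.log(c / (cv * r_eff * w))

h_exp = optimal_height(0.015)             # exponential discounting, r = 0.015
h_gamma = optimal_height(0.015 - 0.0084)  # gamma discounting, r_eff = beta - alpha
```

The result reproduces the 5.82 m optimum for exponential discounting and the (roughly 0.3 m higher) gamma-discounting optimum reported below.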
the expected discounted costs over an unbounded horizon can be computed using Equation (10) for t → ∞. The cost-optimal dike height under gamma discounting is 6.10 m, which is higher than for exponential discounting (see Figure 1). Using Equation (11), we can also compute the variance of the discounted costs over an unbounded time horizon for both exponential and gamma discounting (see Figure 2). Clearly, the uncertainty in the total discounted costs decreases as the dike height is chosen to be higher. The authors believe that incorporating the uncertainty in the discount rate by means of gamma discounting is preferable to ensure sustainable development. For this purpose, the renewal models with discounting presented in this paper can be of great help.
References
[1] E. Atherton and S. French, Valuing the future: a MADA example involving nuclear waste storage, Journal of Multi-Criteria Decision Analysis, 7(6) (1998), 304–321.
[2] T. Björk, Arbitrage Theory in Continuous Time, Second edition, Oxford University Press, 2004.
[3] R.A. Brealey and S.C. Myers, Principles of Corporate Finance, 7th edition, McGraw-Hill/Irwin, New York, 2003.
[4] S. Frederick, G. Loewenstein and T. O'Donoghue, Time discounting and time preference: A critical review, Journal of Economic Literature, 40(2) (2002), 351–401.
[5] R.J. Herrnstein, Self-control as response strength, in: C.M. Bradshaw, E. Szabadi and C.F. Lowe, editors, Quantification of Steady-State Operant Behavior, Elsevier/North-Holland, Amsterdam, 1981.
[6] G. Loewenstein and D. Prelec, Anomalies in intertemporal choice: Evidence and an interpretation, The Quarterly Journal of Economics, 107(2) (1992), 573–597.
[7] J.E. Mazur, An adjustment procedure for studying delayed reinforcement, in: M.L. Commons, J.E. Mazur, J.A. Nevin and H. Rachlin, editors, Quantitative Analysis of Behavior: The Effect of Delay and Intervening Events on Reinforcement Value, Erlbaum, Hillsdale, NJ, 1987, pp. 55–73.
[8] R.C. Morey, Some stochastic properties of a compound-renewal damage model, Operations Research, 14(5) (1966), 902–908.
[9] R. Rackwitz, Optimization—the basis of code-making and reliability verification, Structural Safety, 22(1) (2000), 27–60.
[10] R. Rackwitz, Optimizing systematically renewed structures, Reliability Engineering and System Safety, 73(3) (2001), 269–279.
[11] R. Rackwitz, A. Lentz and M. Faber, Socio-economically sustainable civil engineering infrastructures by optimization, Structural Safety, 27(3) (2005), 187–229.
[12] P.A. Samuelson, A note on measurement of utility, The Review of Economic Studies, 4(2) (1937), 155–161.
[13] H.C. Tijms, A First Course in Stochastic Models, John Wiley & Sons, New York, 2003.
[14] D. van Dantzig, Economic decision problems for flood prevention, Econometrica, 24(3) (1956), 276–287.
[15] J.A.M. van der Weide, Suyono and J.M. van Noortwijk, Renewal theory with exponential and hyperbolic discounting, Probability in the Engineering and Informational Sciences, 22(1) (2008), 53–74.
[16] J.M. van Noortwijk, Explicit formulas for the variance of discounted life-cycle cost, Reliability Engineering and System Safety, 80(2) (2003), 185–195.
[17] H.M. Wagner, Principles of Operations Research, 2nd edition, Prentice-Hall, Englewood Cliffs, NJ, 1975.
[18] A. Washburn, Present values with renewals, Management Science, 38(6) (1992), 846–850.
[19] M.L. Weitzman, Why the far-distant future should be discounted at its lowest possible rate, Journal of Environmental Economics and Management, 36(3) (1998), 201–208.
[20] M.L. Weitzman, Gamma discounting, American Economic Review, 91(1) (2001), 260–271.
Point Estimation of the Transition Intensities for a Markov Multi-state System via Output Performance Observation

Anatoly LISNIANSKI
The Israel Electric Corporation,
[email protected]

Abstract. In this paper a multi-state system is considered, where the system and its components are allowed to have a range of performance levels from complete failure up to perfect function. This paper proposes a technique for the estimation of the transition intensities between all possible performance levels based on the observation of the system output performance. In order to estimate the transition intensities, a special Markov chain embedded in the observed output performance process is defined. By using this technique, all the transition intensities can be estimated from an observed realization of the system output performance stochastic process.
Keywords. Multi-state System, Performance, Reliability Data, Markov Model, Semi-Markov Process.
Introduction

Many real-world systems can perform their tasks with various levels of efficiency, usually referred to as performance rates. A system that can have a finite number of performance rates is called a multi-state system (MSS). A binary-state system is the simplest case of an MSS, having two distinct states (perfect function and complete failure). Point estimation of transition intensities for a two-state (binary) Markov model is a well-understood problem [7], [2]. But until now there have been no investigations that consider this problem in a multi-state context, in spite of the fact that it is a real practical problem [6]. Multi-state models are now widely used in many applications. For example, in the field of power system reliability assessment it has been recognized [1] that the use of simple two-state models to assess the capacity of large generating units can yield pessimistic appraisals. In order to assess generator reliability more accurately, many utilities now use multi-state models instead of two-state representations. A technique called the apportioning method [1] is usually used to create multi-state generating unit models from real-world statistical data. Using this technique, steady-state probabilities of a unit residing at different generator capacity levels can be determined. When the short-term behavior of an MSS is studied, the investigation cannot be based on steady-state (long-term) probabilities. Such an investigation should use an MSS model in which the transition intensities between any states of the model are known.
228
Recent Advances in Recurrent Event Modeling and Inference
The problem is to estimate these transition intensities from actual MSS failure and repair statistics, which are obtained by observing a realization of the output performance, a stochastic process. This problem has not previously been considered in the literature; the corresponding solution is presented in this work.
1. Problem formulation: a multi-state Markov model and observed reliability data

A general Markov model of the MSS with minor and major failures and repairs [6] is presented in Figure 1.
Figure 1. General Markov model for MSS (state-transition diagram: states 1, ..., N connected by transitions with intensities $a_{ij}$)
There are N states in the model, and each state $i \in \{1, \ldots, N\}$ has its own assigned performance level $g_i$. Usually state N is associated with the nominal performance level, state 1 with complete system failure, and the remaining states $i \in \{2, \ldots, N-1\}$ with correspondingly reduced performance levels $g_i$. The transition intensity from state i to state j is designated $a_{ij}$. Observed reliability data for an MSS is usually presented as a realization of a continuous-time, discrete-state stochastic process that describes the MSS output performance $G_A(t)$ as a function of time. As a result, the MSS output performance is known for any time instant $t \in [0, T]$, where T is the total observation time, as are the time instants of MSS transitions from any performance level $g_i$ to level $g_j$, $i, j \in \{1, \ldots, N\}$. An example of a single realization of such a stochastic process is presented in Figure 2. By its nature, the stochastic process $G_A(t)$ is a discrete-state, continuous-time process. For this process the following designations are introduced:

$T_i^{(j)}$ — the length of time the system spends in state i on the j-th occasion it enters that state during the observation time T;

$k_i$ — the accumulated number of system entrances into state i (equivalently, the accumulated number of system exits from state i to any other state) during the observation time T;

$k_{ij}$ — the accumulated number of system transitions from state i to state $j \ne i$ during the observation time T.
Point Estimation of the Transition Intensities - A. Lisnianski
Figure 2. MSS output performance $G_A(t)$ as a stochastic process (single realization; the trajectory moves among the levels $g_1 = 0, g_2, g_3, \ldots, g_{N-1}, g_N$ over $[0, T]$, with sojourn times such as $T_N^{(1)}, \ldots, T_N^{(4)}$, $T_{N-1}^{(1)}$, $T_{N-1}^{(2)}$, $T_3^{(1)}$, $T_1^{(1)}$)
For example, for the realization presented in Figure 2 the MSS resides in state N on four separate occasions during the observation time T ($k_N = 4$); once it transitions from state N to state N-1 ($k_{N,N-1} = 1$), once from state N to state 3 ($k_{N,3} = 1$), and once from state N to state 1 ($k_{N,1} = 1$). Thus, the following data can be derived from observation of the performance stochastic process during time T. For each state i:

1. the sample $\{T_i^{(1)}, T_i^{(2)}, \ldots, T_i^{(k_i)}\}$ of system sojourn (residing) times in state i during the observation time T;
2. the numbers $k_{ij}$ of system transitions from state i to each possible state j during the observation time T;
3. the number $k_i$ of times the system resides in state i (equivalently, the number of system exits from state i to any other state) during the observation time T.

The problem is to estimate the transition intensities $a_{ij}$, $i, j \in \{1, \ldots, N\}$, based on a single realization of the discrete-state, continuous-time stochastic process $G_A(t)$ observed during time T.
2. Description of the method

As shown above, the stochastic process $G_A(t)$ is a discrete-state, continuous-time Markov process. Here we introduce an additional stochastic process associated with $G_A(t)$. If the random times between transitions from state i to state $j \ne i$ in the process $G_A(t)$ are ignored and only the time instants of the transitions are of interest, the resulting process is a discrete-state, discrete-time Markov chain. Such a process is considered only at the time instants of the transitions in the underlying process $G_A(t)$ and is called a Markov chain $G_{Am}(n)$, $n = 0, 1, 2, \ldots$, embedded in the process $G_A(t)$ [3]. The embedded Markov chain $G_{Am}(n)$ is completely defined by the probability distribution of its initial states and the one-step transition probabilities $\pi_{ij}$, $i, j \in \{1, \ldots, N\}$. The transitions between the different states of the model in Figure 1 occur as consequences of events such as failures and repairs. Since the MSS is described by a Markov
model, the cumulative distribution function (cdf) $F_{ij}(t)$ of the time up to the transition from state i to state $j \ne i$ is defined by the corresponding transition intensity:

$F_{ij}(t) = 1 - e^{-a_{ij}t}$   (1)
where $a_{ij}$ is the transition intensity from state i to state j. The function $F_{ij}(t)$ is the distribution of the so-called conditional sojourn time $T_{ij}$ in state i [4], which characterizes the system's sojourn time in state i under the condition that the system transitions from state i to state j. If the MSS is in state i at time instant t = 0, the probability $Q_{ij}(t)$ that it will transition from state i to state j before time t is called a one-step transition probability of the discrete-state, continuous-time process $G_A(t)$. All these probabilities $Q_{ij}(t)$, $i, j \in \{1, \ldots, N\}$, define the kernel matrix $\mathbf{Q}(t)$ [4] for the stochastic process $G_A(t)$:

$\mathbf{Q}(t) = \left| Q_{ij}(t) \right|$   (2)
These one-step probabilities for the kernel matrix may be defined in the following way [6], [5]. Each probability $Q_{ik}(t)$ is defined as the probability that the random variable $T_{ik}$ is the minimum of the set of random variables $T_{ij}$, $j \ne i$ (which define the random residing times in state i up to the possible transitions from state i to all other states). So, for each $k \ne i$, the event of interest is

$T_{ik} < \min \{ T_{i1}, \ldots, T_{i,k-1}, T_{i,k+1}, \ldots, T_{iN} \}$   (3)
Based on (3), one can obtain the one-step probability $Q_{ik}(t)$ as the probability that, under the condition $T_{ik} \le t$, the random variable $T_{ik}$ is less than all the other variables $T_{ij}$, $j \ne i$, $j \ne k$. Hence, for each $i = 1, 2, \ldots, N$ and $k \ne i$ the following can be written:

$Q_{ik}(t) = \Pr \{ (T_{ik} \le t) \cap (T_{i1} > T_{ik}) \cap \cdots \cap (T_{i,k-1} > T_{ik}) \cap (T_{i,k+1} > T_{ik}) \cap \cdots \cap (T_{iN} > T_{ik}) \}$

$\qquad = \int_0^t \left[ 1 - F_{i1}(u) \right] \cdots \left[ 1 - F_{i,k-1}(u) \right] \left[ 1 - F_{i,k+1}(u) \right] \cdots \left[ 1 - F_{iN}(u) \right] dF_{ik}(u)$   (4)
By using (4) and taking into account Expression (1), this becomes

$Q_{ik}(t) = \dfrac{a_{ik}}{\sum_{j=1}^{N} a_{ij}} \left[ 1 - e^{-\sum_{j=1}^{N} a_{ij} t} \right]$   (5)
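Expression (5) can be illustrated by a quick Monte Carlo check: sample the competing conditional sojourn times $T_{ij}$ as independent exponentials with assumed rates $a_{ij}$, and count how often the transition to state k occurs first and before time t. The intensity values below are invented for this sketch, not taken from the paper.

```python
import math
import random

# Assumed transition intensities a_ij out of some state i (illustrative
# values only, not from the paper); keys are destination states.
a = {1: 0.5, 2: 1.5, 3: 2.0}
A = sum(a.values())   # total exit intensity from state i
t = 0.4               # evaluation time for Q_ik(t)

random.seed(1)
n = 200_000
first = {k: 0 for k in a}  # runs in which clock k fires first, before t
for _ in range(n):
    # Competing conditional sojourn times T_ij ~ Exp(a_ij)
    samples = {k: random.expovariate(rate) for k, rate in a.items()}
    k = min(samples, key=samples.get)
    if samples[k] <= t:
        first[k] += 1

for k, rate in a.items():
    simulated = first[k] / n
    exact = rate / A * (1.0 - math.exp(-A * t))  # Expression (5)
    print(f"k={k}: simulated={simulated:.3f}, exact={exact:.3f}")
```

The simulated frequencies agree with the closed-form values from (5) to within Monte Carlo error.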
Based on the one-step probabilities $Q_{ij}(t)$, $i, j \in \{1, \ldots, N\}$, the cdf $F_i(t)$ of the unconditional sojourn time $T_i$ in any state i can be obtained as follows:

$F_i(t) = \sum_{k=1}^{N} Q_{ik}(t) = 1 - e^{-\sum_{j=1}^{N} a_{ij} t}$   (6)
So, for a Markov model of an MSS the unconditional sojourn time $T_i$ is an exponentially distributed random variable with mean

$T_i^{mean} = \dfrac{1}{\sum_{j=1}^{N} a_{ij}} = \dfrac{1}{A},$   (7)

where $A = \sum_{j=1}^{N} a_{ij}$.
According to well-known methods [2], the mean unconditional sojourn time $T_i^{mean}$ can be estimated from the sample $\{ T_i^{(1)}, T_i^{(2)}, \ldots, T_i^{(k_i)} \}$:

$\hat{T}_i^{mean} = \dfrac{1}{k_i} \sum_{j=1}^{k_i} T_i^{(j)}$   (8)
Based on (7) and (8), one can write the following expression for estimating the sum A of the intensities of all the transitions from state i:

$\hat{A} = \dfrac{1}{\hat{T}_i^{mean}} = \dfrac{k_i}{\sum_{j=1}^{k_i} T_i^{(j)}}$   (9)
Using Expression (9) one can estimate only the sum of the intensities of all the transitions from state i. In order to estimate the individual transition intensities, an additional expression must be obtained. Based on the kernel matrix $\mathbf{Q}(t)$ for the stochastic process $G_A(t)$, one can obtain the one-step transition probabilities of the embedded Markov chain $G_{Am}(n)$:

$\pi_{ij} = \lim_{t \to \infty} Q_{ij}(t)$   (10)

Taking into account Expression (5), this leads to
$\pi_{ik} = \lim_{t \to \infty} Q_{ik}(t) = \lim_{t \to \infty} \dfrac{a_{ik}}{\sum_{j=1}^{N} a_{ij}} \left[ 1 - e^{-\sum_{j=1}^{N} a_{ij} t} \right] = \dfrac{a_{ik}}{\sum_{j=1}^{N} a_{ij}}$   (11)
or

$a_{ik} = \pi_{ik} \sum_{j=1}^{N} a_{ij}$   (12)
Based on an observed single realization of the output-performance stochastic process, the one-step transition probabilities $\pi_{ik}$ of the embedded Markov chain can easily be estimated as ratios of the corresponding transition frequencies:

$\hat{\pi}_{ik} = \dfrac{k_{ik}}{k_i}$   (13)
Substituting the estimates (9) and (13) into Expression (12), the following estimate is obtained for the transition intensity:

$\hat{a}_{ik} = \hat{\pi}_{ik} \hat{A} = \dfrac{k_{ik}}{k_i} \cdot \dfrac{k_i}{\sum_{j=1}^{k_i} T_i^{(j)}} = \dfrac{k_{ik}}{\bar{T}_i}, \quad i, k \in \{1, \ldots, N\}, \ i \ne k,$   (14)

where $\bar{T}_i$ is the accumulated time that the system resides in state i during the total observation time T. For a Markov MSS with N states the sum $\sum_{j=1}^{N} a_{ij} = 0$; therefore $\hat{a}_{ii} = - \sum_{j=1, j \ne i}^{N} \hat{a}_{ij}$.
3. The algorithm for estimating the transition intensities

Based on the method described in the previous section, the following data-processing algorithm is suggested for a multi-state Markov system with N possible states:

1. Calculate the accumulated time that the system resides in each state i during the total observation time: $\bar{T}_i = \sum_{j=1}^{k_i} T_i^{(j)}$.
2. Estimate each transition intensity $a_{ij}$ from state i to state $j \ne i$ using $\hat{a}_{ij} = k_{ij} / \bar{T}_i$.
3. Estimate the diagonal elements using $\hat{a}_{ii} = - \sum_{j=1, j \ne i}^{N} \hat{a}_{ij}$.
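The three steps above can be sketched in a few lines of code. The trajectory format used here (a list of (state, sojourn time) pairs in visiting order) is an assumption made for this illustration; the paper itself works directly with the observed process $G_A(t)$.

```python
from collections import defaultdict

def estimate_intensities(trajectory, n_states):
    """Point-estimate the transition-intensity matrix of a Markov MSS from a
    single observed realization, following steps 1-3 of the algorithm.

    `trajectory` lists (state, sojourn_time) pairs in the order visited;
    states are numbered 1..n_states.
    """
    T_acc = defaultdict(float)  # step 1: accumulated time in each state
    k = defaultdict(int)        # observed transition counts k_ij
    for (i, tau), (j, _) in zip(trajectory, trajectory[1:]):
        T_acc[i] += tau
        k[(i, j)] += 1
    # The last sojourn ends with the observation period rather than with a
    # transition; its duration still contributes to the accumulated time.
    last_state, last_tau = trajectory[-1]
    T_acc[last_state] += last_tau

    a_hat = [[0.0] * n_states for _ in range(n_states)]
    for i in range(1, n_states + 1):
        for j in range(1, n_states + 1):
            if j != i and T_acc[i] > 0:
                a_hat[i - 1][j - 1] = k[(i, j)] / T_acc[i]  # step 2
        a_hat[i - 1][i - 1] = -sum(a_hat[i - 1])            # step 3
    return a_hat
```

For example, for the two-state trajectory [(2, 1.0), (1, 2.0), (2, 3.0), (1, 4.0)] there are two 2-to-1 transitions and $\bar{T}_2 = 4$, so the estimate of $a_{21}$ is $k_{21}/\bar{T}_2 = 2/4 = 0.5$.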
4. Numerical example

A diesel generator with a nominal generating capacity of 360 kW is considered. During an observation time of T = 1 year the generator was observed in the following four states: state 4 with the nominal capacity $g_4$ = 360 kW; state 3 with a reduced generating capacity $g_3$ = 325 kW; state 2 with a reduced generating capacity $g_2$ = 215 kW; and state 1 (complete failure) with generating capacity $g_1$ = 0. The corresponding accumulated times that the generator resided in each state i during the observation time were: $\bar{T}_1$ = 480 hours; $\bar{T}_2$ = 472 hours; $\bar{T}_3$ = 511 hours; $\bar{T}_4$ = 7027 hours. The observed numbers $k_{ij}$ of generator transitions from state i to state j are presented in the following table.

From state \ To state    1     2     3     4
1                        -     0     0     31
2                        18    -     0     31
3                        11    0     -     50
4                        20    43    58    -
By using the presented algorithm, the following matrix of point estimates of the transition intensities (in units of 1/hour) was computed:

$\hat{a}_{ij} =
\begin{vmatrix}
-0.065 & 0 & 0 & 0.065 \\
0.024 & -0.110 & 0 & 0.086 \\
0.022 & 0 & -0.120 & 0.098 \\
0.003 & 0.006 & 0.008 & -0.017
\end{vmatrix}$
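Two structural properties of this matrix follow directly from the method: each row must sum to zero (step 3 of the algorithm), and dividing the off-diagonal entries of row i by $-\hat{a}_{ii}$ must recover the embedded-chain probability estimates $\hat{\pi}_{ik}$ of Expression (13), which sum to one. A minimal check of the published values:

```python
# Published point estimates of the transition intensities (per hour).
a_hat = [
    [-0.065, 0.000,  0.000,  0.065],
    [ 0.024, -0.110, 0.000,  0.086],
    [ 0.022, 0.000,  -0.120, 0.098],
    [ 0.003, 0.006,  0.008,  -0.017],
]

for i, row in enumerate(a_hat):
    # Rows of an intensity (generator) matrix sum to zero.
    assert abs(sum(row)) < 1e-9
    # Off-diagonal entries, normalized by the total exit rate -a_ii, give
    # the embedded Markov chain's one-step probabilities, which sum to one.
    pi = [x / -row[i] for j, x in enumerate(row) if j != i]
    assert abs(sum(pi) - 1.0) < 1e-9
print("all rows consistent")
```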
5. Conclusions The problem of point estimation of the transition intensities for a Markov multi-state system via the observed output performance was considered. The corresponding estimates were obtained and an algorithm for their computation was suggested.
Acknowledgments The author would like to thank Dr I. Frenkel for his constructive comments and B. Frenkel for his contribution at the formative stage of this work.
References

[1] R. Billinton and R. Allan, Reliability Evaluation of Power Systems, Plenum Press, New York, 1996.
[2] International Standard IEC 605-4, Procedures for Determining Point Estimates and Confidence Limits for Equipment Reliability Determination Tests, International Electrotechnical Commission, 2001.
[3] V. Korolyuk and A. Swishchuk, Random Evolution for Semi-Markov Systems, Kluwer Academic, Singapore, 1995.
[4] N. Limnios and G. Oprisan, Semi-Markov Processes and Reliability, Birkhäuser, Boston, 2000.
[5] A. Lisnianski and A. Jeager, Time-redundant system reliability under randomly constrained time resources, Reliability Engineering and System Safety 70 (2000), 157-166.
[6] A. Lisnianski and G. Levitin, Multi-State System Reliability: Assessment, Optimization, Applications, World Scientific, Singapore, 2003.
[7] M. Modarres, M. Kaminskiy and V. Krivtsov, Reliability Engineering and Risk Analysis: A Practical Guide, Marcel Dekker, New York, 1999.
Keyword Index

Aging, 158
Aging, bivariate function, 149
Availability, 46
Bayesian analysis, 46
Bayesian nets, 1
Bayesian nets, non-parametric, 9
Bivariate distribution, 96
Coherent system, 111
Competing risks, 63, 72, 80, 88, 96, 149
Computer models, 46
Consecutive k-out-of-n:F system, 119
Copula, 9
Cosets, right-hand and left-hand, 138
Degradation, 39
Dependence, 149
Directed networks, 177
Distribution bound, 111
Distribution-free measures, 129
Efficiency, asymptotic relative, 193
Emulator, 46
Exchangeability, 129
Exchangeable dependent component, 111
Exchangeable distributions, 158
Expectation bound, 111
Exponential discounting, 217
Failure intensity, 39
Failure rate, increasing, 158
Frailty, 96
Gaussian process, 46
Graphical duration models, 17
Hazard rate, 72
Hazard rate, cause-specific, 63
Hazard rate, reversed, 103
Hybrid models, 1
Hyperbolic discounting, 217
Identifiability, 63
Importance factor, 208
Inference, 1
Informative, 193
Intensity, conditional, 80
Interval probability, 185
Kijima model, 32, 39
Kijima type repairs, 32
Koziol-Green model, 193
Maintenance, imperfect, 88
Maintenance, preventive, 32
Marked point process, 80
Markov chain Monte Carlo, 165
Markov models, hidden, 165
Markov multi-state system, 227
Markov process, piecewise deterministic, 208
Marshall-Olkin model, 138
Mean inactivity time, 103
Mean residual life, decreasing, 158
Mixed system, 111
Mixture, 103
Mixture failure rate, 96
Monitoring, 193
Network reliability, 25
Neyman's smooth embedding, 193
Order statistics, 111
Oriented matroids, 177
Outlier detection, 193
Partial repair, 39
PDAG, 25
Permutations of components, 129
Point process, 88
Point process, self-exciting, 165
Probabilistic graphical models, 17
Reliability, 17, 72, 88, 111, 208
Reliability analysis, 1
Reliability growth, 165
Renewal theory, 217
Repair, incomplete, 32
Repairable system, 80
Sensitivity analysis, 46
Signature, 129
Stochastic comparisons, 111
Stochastic orders, 158
Structure function representation, 25
Subdistribution function, 63
Survival copula, 138
Symmetries of a system, 129
Symmetry, 111
System reliability, 185
System signatures, 111
Time scale transformation, 32
Transition intensities, 227
Variance bound, 111
Vines, 9
Virtual age, 32
Well-designed systems, 129
Zero failures, 185
Author Index

Adekpedjou, A., 193
Aknin, P., 17
Ansa, A.A., 72
Badía, F.G., 103
Bedford, T., 46
Belzunce, F., 158
Berrade, M.D., 103
Bouillaut, L., 17
Coolen, F.P.A., 185
Coolen-Schrijner, P., 185
Daneshkhah, A., 46
Dewan, I., 63
Dijoux, Y., 88
Donat, R., 17
Doyen, L., 88
Esaulova, V., 96
Finkelstein, M., 96
Gaudoin, O., 88
Haenni, R., 25
Hanea, A., 9
Hollander, M., 129
Huseby, A.B., 177
Jonczy, J., 25
Kahle, W., 32
Koutras, M.V., 119
Kurowicka, D., 9
Langseth, H., 1
Leray, P., 17
Lindqvist, B.H., 80
Lisnianski, A., 227
Mercier, S., 208
Mulero, J., 158
Peña, E.A., 193
Quiton, J., 193
Roussignol, M., 208
Ruggeri, F., 165
Ruiz, J.-M., 158
Rychlik, T., 111
Samaniego, F.J., 129
Sankaran, P.G., 72
Soyer, R., 165
Spizzichino, F., 138, 149
Suter, F., 149
Suyono, 217
Triantafyllou, I.S., 119
van der Weide, J.A.M., 217
van Noortwijk, J.M., 217
Volf, P., 39