Subjective Probability Models for Lifetimes
Fabio Spizzichino
Department of Mathematics, Università "La Sapienza", Rome, Italy
CHAPMAN & HALL/CRC Boca Raton London New York Washington, D.C.
Library of Congress Cataloging-in-Publication Data

Spizzichino, F. (Fabio), 1948-
Subjective probability models for lifetimes / Fabio Spizzichino.
p. cm. -- (Monographs on statistics and applied probability; 91)
Includes bibliographical references and index.
ISBN 1-58488-060-0 (alk. paper)
1. Failure time data analysis. 2. Probabilities. I. Title. II. Series.
QA276 .S66 2001
519.5--dc21   2001028129
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher. The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying. Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com © 2001 by Chapman & Hall/CRC No claim to original U.S. Government works International Standard Book Number 1-58488-060-0 Library of Congress Card Number 2001028129 Printed in the United States of America 1 2 3 4 5 6 7 8 9 0 Printed on acid-free paper
To Daniela, Valeria and Flavia
Contents Preface Notation and Acronyms 1 Exchangeability and subjective probability 1.1 Introduction 1.2 Families of exchangeable events 1.2.1 Extendibility and de Finetti's theorem 1.2.2 The problem of prediction 1.2.3 More on infinitely extendible families 1.3 Exchangeable random quantities 1.3.1 Extendibility and de Finetti's theorem for exchangeable random variables 1.3.2 The problem of prediction 1.4 de Finetti-type theorems and parametric models 1.4.1 Parametric models and prediction sufficiency 1.5 Exercises 1.6 Bibliography 2 Exchangeable lifetimes 2.1 Introduction 2.2 Positive exchangeable random quantities 2.3 Multivariate conditional hazard rates 2.4 Further aspects of m.c.h.r. 2.4.1 On the use of the m.c.h.r. functions 2.4.2 Dynamic histories, total time on test statistic and total hazard transform 2.4.3 M.c.h.r. functions and dynamic sufficiency 2.5 Exercises 2.6 Bibliography
3 Some concepts of dependence and aging 3.1 Introduction 3.1.1 One-dimensional stochastic orderings 3.1.2 Stochastic monotonicity and orderings for conditional distributions 3.2 Multivariate stochastic orderings 3.2.1 Usual multivariate stochastic ordering 3.2.2 Multivariate likelihood ratio ordering 3.2.3 Multivariate hazard rate and cumulative hazard rate orderings 3.2.4 Some properties of multivariate stochastic orderings and examples 3.3 Some notions of dependence 3.3.1 Positive dependence 3.3.2 Negative dependence 3.3.3 Simpson-type paradoxes and aspects of dependence in Bayesian analysis 3.3.4 Likelihood-ratio comparisons between posterior distributions 3.4 Some notions of aging 3.4.1 One-dimensional notions of aging 3.4.2 Dynamic multivariate notions of aging 3.4.3 The case of exchangeable lifetimes 3.5 Exercises 3.6 Bibliography 4 Bayesian models of aging 4.1 Introduction 4.2 Schur survival functions 4.2.1 Basic background about majorization 4.2.2 Schur properties of survival functions and multivariate aging 4.2.3 Examples of Schur survival functions 4.2.4 Schur survival functions and dependence 4.3 Schur density functions 4.3.1 Schur-constant densities 4.3.2 Examples of Schur densities 4.3.3 Properties of Schur densities 4.4 Further aspects of Bayesian aging 4.4.1 Schur densities and TTT plots 4.4.2 Some other notions of Bayesian aging 4.4.3 Heterogeneity and multivariate negative aging 4.4.4 A few bibliographical remarks
4.4.5 Extensions to non-exchangeable cases 4.5 Exercises 4.6 Bibliography 5 Bayesian decisions, orderings, and majorization 5.1 Introduction 5.1.1 Statistical decision problems 5.1.2 Statistical decision problems and sufficiency 5.1.3 Some technical aspects 5.2 Stochastic orderings and orderings of decisions 5.3 Orderings of residual lifetimes and majorization 5.3.1 The case of observations containing failure data 5.4 Burn-in problems for exchangeable lifetimes 5.4.1 The case of i.i.d. lifetimes 5.4.2 Dependence and optimal adaptive burn-in procedures 5.4.3 Burn-in, optimal stopping, monotonicity, and Markovianity 5.4.4 Stochastic orderings and open-loop optimal adaptive burn-in procedures 5.5 Exercises 5.6 Bibliography Essential bibliography
Essential bibliography

Aldous, D.J. (1983). Exchangeability and related topics. Springer Lecture Notes in Mathematics, 1117.
Barlow, R.E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York.
Brémaud, P. (1981). Point Processes and Queues. Martingale Dynamics. Springer-Verlag, New York.
Chow, Y.S. and Teicher, H. (1978). Probability Theory. Independence, Interchangeability, Martingales. Springer-Verlag, New York.
Cox, D. and Isham, V. (1980). Point Processes. Chapman & Hall, London.
Cox, D. and Oakes, D. (1984). Analysis of Survival Data. Chapman & Hall, London.
de Finetti, B. (1970). Teoria delle Probabilità. Einaudi, Torino. English translation: Theory of Probability. John Wiley and Sons, New York, 1974.
De Groot, M.H. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
Karlin, S. (1968). Total Positivity. Stanford University Press, Stanford, CA.
Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. John Wiley & Sons, New York.
Marshall, A.W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press, New York.
Savage, L.J. (1972). The Foundations of Statistics, 2nd revised edition. Dover, New York.
Shaked, M. and Shanthikumar, J.G. (1994). Stochastic Orders and Their Applications. Academic Press, London.
Preface

From a mathematical point of view, the subject of this monograph can be described as a study of exchangeable, non-negative random variables or, in other words, of symmetric probability measures over $\mathbb{R}_+^n$. However, the interest is essentially in the role of conditioning in lifetime models and, more than on analytical aspects, the focus is on some ideas related to applications, especially in reliability and survival analysis; in fact the random variables of interest, $T_1,\dots,T_n$, have the meaning of lifetimes of different units (or individuals) and most of our attention will be focused on conditional "survival" probabilities of the type
$$P\{T_{h+1} > s_{h+1}, \dots, T_n > s_n \mid T_1 = t_1, \dots, T_h = t_h,\ T_{h+1} > r_{h+1}, \dots, T_n > r_n\}.$$
In particular we study notions of dependence and notions of aging, which provide the tools to obtain inequalities for those conditional probabilities.
The monograph is addressed to statisticians, probabilists, and engineers interested in the methods of Bayesian statistics and in Bayesian decision problems; the aim is to provide a conceptual background which can be useful for sound applications of such methods, mostly (but not exclusively) in reliability and survival analysis. Since the Bayesian approach is mainly motivated by a subjectivist interpretation of probability, our study will be developed within the frame of subjective probability, where the probability of an event is interpreted as a degree of belief, related to a personal state of information; within such a frame, probability is indeed a tool for describing a state of information (or a state of uncertainty). A characterizing feature of subjective probability is, in fact, that any unknown quantity is treated as a random variable (i.e., there is no room for "deterministic but unknown quantities") and uncertainty about it is expressed by means of a probability distribution. This entails that a probability assessment explicitly depends on the information level. In particular, as a basic point of our discussion, we have to point out that aging and dependence properties for lifetimes depend on the actual flow of available information.
In general an approach based on such general lines turns out to be quite flexible and convenient; as a matter of fact it provides a natural setting for those applications in which different information levels must be considered and compared, as can be the case in the problems considered here. On the other hand, the same approach involves, for many probabilistic notions, an interpretation completely different from that in other approaches and, as we shall discuss, this has some effects even on the meaning of the notions of dependence and of aging. In particular, notions of stochastic dependence are to be understood in a special way, since dependence can originate both from "physical" interactions and from situations in which learning is present.
Since ignoring such effects can be a source of logical difficulties in the application of Bayesian methods, we shall extensively discuss aspects of dependence, of aging, and of their mutual relations, in the frame of subjective probability. As I mentioned above, the discussion will be limited to the case of exchangeable lifetimes. For such a case, special notions such as multivariate hazard rate functions, notions of aging and dependence, which arise in reliability and survival analysis, will be discussed in detail. The choice of limiting the study to exchangeable lifetimes has the following motivations:
- it allows us to analyze conceptual points related to the role of information, picking off those aspects which are not essential in the present discussion;
- exchangeability is, in my opinion, the natural background needed to clarify relations between concepts of aging and dependence from the subjective standpoint;
- exchangeability provides an introduction and a general frame for a topic which will be discussed in detail: distributions with Schur-concavity or Schur-convexity properties and their role in the analysis of failure data (such properties are defined in terms of the concept of majorization ordering).
The latter type of distributions describe in fact a very particular case of exchangeability. However, they are of importance both from a conceptual and an application-oriented point of view, in that they give rise, in a sense, to a natural generalization of the fundamental case of independent or conditionally independent (identically distributed) lifetimes with monotone failure rate. Special cases of distributions with Schur properties are the distributions with Schur-constant joint survival functions or Schur-constant densities. These constitute natural generalizations of the basic cases of independent or conditionally independent (identically distributed) exponential lifetimes. Just as the exponential distribution is the basic and idealized probability model for standard reliability methods, so Schur-constant densities can be seen as the idealized models in the setting of multivariate Bayesian analysis of lifetimes. In fact, densities with Schur properties and exchangeable densities could, in their turn, be seen just as the most natural generalizations of Schur-constant densities; furthermore, a suitable property of indifference with respect to aging, or of no-aging, owned by the latter, is an appropriate translation of the memoryless property of exponential distributions into the setting considered here. We shall see (in Chapter 4) that no-aging is substantially a property of exchangeability for residual lifetimes of units of different ages.
The monograph consists of five chapters. The first chapter provides a background to the study of subjective probability and Bayesian statistics. In particular we shall discuss the impact of subjective probability on the language and formalization of statistics and the related role of exchangeable random variables.
In the second chapter, by concentrating attention on the case of exchangeability, we analyze fundamental notions of multivariate probability calculus for non-negative random variables, such as survival functions, conditional survival probabilities given histories of failures and survivals, one-dimensional hazard rate functions, and multivariate conditional hazard rate functions. Based on the fundamental concepts of stochastic comparisons, the third chapter is devoted to the presentation of some notions of stochastic dependence and aging, and to their mutual relations, with an emphasis on the exchangeable case. The fourth chapter will focus on the probabilistic meaning of distributions with Schur-constant, Schur-concave and Schur-convex survival functions or density functions. After discussing these notions, we shall illustrate related properties and some applications to problems in survival analysis. Chapter 5 will be devoted to Bayes decision problems; two main features of this chapter are the following: i) some typical problems in the field of reliability applications are given the shape of Bayes decision problems, special attention being paid to life-testing and burn-in problems; ii) it is shown how different concepts of stochastic orderings, dependence and majorization enter the problem of obtaining inequalities for Bayes decisions when different sets of observed data are compared. In order to limit the mathematical difficulties, all these arguments will be essentially treated under the assumption that the joint distributions of lifetimes admit a joint density function.
My intention was to give each separate chapter a specific character and identity of its own, and thus an effort was made to maintain a reasonable amount of independence among chapters; however, the same group of a few basic models is reconsidered several times between Chapters 2 and 5, to provide examples and comparisons at different stages of the treatment. For the sake of independence among chapters, selective lists of bibliographical references will be presented at the end of each of them. Such lists are far from being exhaustive, and I apologize in advance for the omissions which they surely present. On the other hand, these lists may still be too wide, and the reader may desire an indication of a more basic literature. For this reason a reduced list of really basic, by-now classic, books is presented in the essential bibliography at the end of the monograph. Only a very small part of all the ideas and mathematical material contained in those books was sufficient to provide the basic background for the arguments developed here and, of course, the reader is not at all assumed to be familiar with all of them. Rather, the reader is assumed to have a background in calculus in several variables, basic theory of probability at an intermediate level, fundamentals of Bayesian statistics, and basic elements of stochastic processes, reliability and life testing at an introductory level.
As far as Bayesian statistics is concerned, only the knowledge of the basic language and of Bayes' formula is assumed (the notation of Stieltjes integrals will sometimes be used). As to the reliability background, the reader should be familiar with (and actually interested in the use of) the most common concepts of univariate aging, such as IFR, DFR, NBU, etc. No previous knowledge of the two basic topics of exchangeability and stochastic orderings is strictly assumed.
The project of this monograph evolved over quite a long period, somehow changing its form over the years; such an evolution was assisted, at its different stages, by the project managers at Chapman & Hall, previously, and at CRC Press, lately; I would like to thank them for assistance, encouragement, and for their professional commitment. Special thanks are also due to Marco Scarsini, for encouragement and suggestions that he has been providing since the early stage of the project. The basic kernel of ideas presented here developed over the years, originating from several long discussions that I had with Richard E. Barlow and Carlo A. Clarotti, about ten years ago. Essentially our interest was focused on the meaning of positive and negative aging and on the analysis of conditions which justify procedures of burn-in, for situations of stochastic dependence. On related topics, I also had a few illuminating discussions with Elja Arjas and Moshe Shaked. Several ideas presented here also developed as a consequence of discussions with co-authors of some joint papers and with students of mine, preparing their Tesi di Laurea in Mathematics at the University "La Sapienza" in Rome. At an early stage of the preparation of the manuscript, helpful comments were provided by Menachem Berg, Julia Mortera, Giovanna Nappo and Richard A. Vitale. In the preparation of the most recent version, a big help and very useful comments came from several colleagues, among whom, in particular, are Richard E. Barlow, Bruno Bassan, Uwe Jensen, Giovanna Nappo, Ludovico Piccinato, Wolfgang Runggaldier, Marco Scarsini, and Florentina Petre. I am grateful to Bruno and Florentina also for their help on matters of a technical type: in particular, Bruno introduced me to the pleasures (and initial sorrows) of Scientific Word and Florentina helped me with the insertion of figures. Several grants, from C.N.R. (Italian National Council for Research) and M.U.R.S.T. (Italian Ministry for University and Scientific Research), supported my research activity on the topics illustrated in the monograph, and are here gratefully acknowledged.
Rome, March 2001.
Notation and Acronyms

Notation

$E_1, E_2, \dots$ : random events
$1_E$ : indicator of the event E
$\mathcal{E}$ : family of exchangeable events
$X_1, X_2, \dots$ : random variables
$\mathcal{L}(X)$ : probability law of X
$S_n \equiv \sum_{i=1}^n X_i$, for exchangeable random variables $X_1, X_2, \dots$
$\omega_k^{(n)} \equiv P\{S_n = k\}$, with $X_1, X_2, \dots$ binary exchangeable random variables
$\omega_k \equiv \omega_k^{(k)} = P\{S_k = k\}$
$p_k^{(n)} \equiv \omega_k^{(n)} \big/ \binom{n}{k}$
$N_0$ : maximum rank of a family of exchangeable events
$\mathbb{R}$ : set of the real numbers
$\mathbb{R}_+$ : set of the non-negative real numbers
$[x]_+ \equiv x$ if $x \ge 0$, $0$ if $x < 0$
$E(X)$ : expected value of the random variable X
$X_{(1)}, \dots, X_{(n)}$ : order statistics of the random variables $X_1, X_2, \dots, X_n$
$F^{(n)}(x_1,\dots,x_n) \equiv P\{X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n\}$ : joint distribution function of n exchangeable random variables $X_1, X_2, \dots, X_n$
$f^{(n)}(x_1,\dots,x_n)$ : joint density function of n exchangeable random variables
$f^{(n)}(t_1,\dots,t_n)$ : joint density function of n non-negative exchangeable random variables (lifetimes)
$U_1, \dots, U_n$ : units or individuals
$T_1, T_2, \dots, T_n$ : exchangeable lifetimes
$s_1, \dots, s_n$ : set of ages
$\bar{F}^{(n)}(s_1,\dots,s_n) \equiv P\{T_1 > s_1, T_2 > s_2, \dots, T_n > s_n\}$ : joint survival function of n exchangeable lifetimes
$r(t)$ : (one-dimensional) failure (or hazard) rate function
$R(s) \equiv \int_0^s r(t)\, dt$ : cumulative hazard rate function
D : observed data (possibly containing survival data)
$D'$ : dynamic history of the form $\{T_{i_1} = t_1, \dots, T_{i_h} = t_h,\ T_{j_1} > t, \dots, T_{j_{n-h}} > t\}$
$h_t$ : dynamic history of the form $\{T_{(1)} = t_1, \dots, T_{(h)} = t_h,\ T_{(h+1)} > t\}$ or $\{X_I = x_I,\ X_{\tilde{I}} > \tilde{t}\}$
$\theta$ : parameter in a parametric statistical model
W : state of nature in a Bayes decision problem
$\Pi_0$ : initial (or a priori) distribution for a parameter $\theta$ or for a state of nature W
$\pi_0$ : density of $\Pi_0$ (when it exists)
$\Pi(\cdot \mid D)$, $\pi(\cdot \mid D)$ : conditional distribution and conditional density for $\theta$ or W, given the observed data D
$\lambda^{(n)}(t)$, $\lambda^{(n-h)}(t \mid t_1,\dots,t_h)$ : multivariate conditional hazard rate functions for exchangeable lifetimes $T_1,\dots,T_n$
$\lambda_{i|I}(t \mid x_I)$ : multivariate conditional hazard rate functions for not-necessarily exchangeable lifetimes
$H_t \equiv \sum_{j=1}^n 1_{[T_j \le t]}$ : stochastic process counting the number of failures observed up to time t, for n units with exchangeable lifetimes $T_1,\dots,T_n$
$a \wedge b \equiv \min(a,b)$; $a \vee b \equiv \max(a,b)$
$Y_t \equiv \sum_{i=1}^n (T_{(i)} \wedge t)$ : Total Time on Test (TTT) process for n units with exchangeable lifetimes $T_1,\dots,T_n$
$Z_t \equiv (H_t, Y_t)$
$Y_{(h)} \equiv \sum_{i=1}^h T_{(i)} + (n-h)\,T_{(h)}$ : Total Time on Test cumulated up to the h-th failure, $h = 1, 2, \dots, n$
$C_h \equiv (n-h+1)\,(T_{(h)} - T_{(h-1)})$, $h = 1, \dots, n$ : normalized spacings between order statistics
$\mathcal{H}_t \equiv (H_t, T_{(1)} \wedge t, \dots, T_{(n)} \wedge t)$ : history process
$\preceq_{st}$, $\preceq_{hr}$, $\preceq_{lr}$, $\preceq_{ch}$ : one-dimensional or multidimensional stochastic orderings
$\prec$ : majorization orderings
A : action space
$l$ : loss function
$\delta$ : decision function (or strategy)
$a_S^{(n-h)}(t_1, t_2, \dots, t_h)$ : residual duration of a burn-in procedure, after h failures, according to the burn-in strategy S
Acronyms

a.s. : almost surely
i.i.d. : independent and identically distributed (random variables)
m.c.h.r. : multivariate conditional hazard rate (functions)
TTT : total time on test
st : stochastic (ordering)
hr : hazard rate (ordering)
lr : likelihood ratio (ordering)
TP : totally positive
MTP : multivariate totally positive
PC : positively correlated
PUOD : positively upper orthant dependent
HIF : hazard increasing upon failure
SL : supporting lifetimes
WBF : weakened by failures
IFR : increasing failure rate
DFR : decreasing failure rate
NBU : new better than used
NWU : new worse than used
IFRA : increasing failure rate in average
PF : Pólya frequency
MRR : multivariate reverse regular
NUOD : negatively upper orthant dependent
MIFR : multivariate increasing failure rate
MPF : multivariate Pólya frequency
OLFO : open-loop feedback optimal
Chapter 1
Exchangeability and subjective probability

1.1 Introduction

In this introductory section we want to summarize briefly the main, well-known differences between the subjectivist and frequentist interpretations of probability. Then we sketch the implications that, in the formalism of statistics, are related to that. We aim mainly to stress aspects that sometimes escape the attention of applied statisticians and engineers interested in Bayesian methods. For this purpose we limit ourselves to a concise and informal treatment, leaving a more complete presentation to the specialized literature.
In the subjective approach, randomness is nothing but a lack of information. A random quantity is any unambiguously defined quantity X which, according to your state of information, takes on values in a specified space $\mathcal{X}$, but your information is such that you are not able to claim with certainty which element $x \in \mathcal{X}$ is the "true" value of X. A random event E is nothing but a statement about which you are not sure: you are not able to claim with certainty whether E is true or false. The personal probability of E is your degree of belief that E is true. We start from the postulate that you can assess a degree of belief for any event E of interest in a specified problem: your state of partial information about E is described by the assessment of a degree of belief. Similarly, the state of partial information about a random quantity X is described by a probability distribution for X. Generally a state of partial information is described by a probability distribution on a suitable family of events and, in turn, a probability distribution is induced by a state of information, so that the (personal) probability distribution and the state of partial information are two different aspects of the same object.
Roughly speaking, in the frequentist approach probability has the meaning of frequency of successes in a very large number of analogous trials; i.e. we can define probability only with reference to a collective of events and it is senseless to speak of probability for a singular event (this is an event which cannot be reasonably embedded in some collective). More generally, in the frequentist approach, we must distinguish between random variables (which can be embedded in a collective of observations) and deterministic quantities with an unknown value (these originate from singular experiments); for the latter it is impossible to define a (frequentist) probability distribution.
Let us now underline some fundamental aspects implicit in the subjective approach; these make the latter approach substantially different from the frequentist one and also make clear that the language of statistics is to be radically modified when switching from one approach to the other.
1) Personal probability of an event is by no means a quantity with a universal physical meaning, intrinsically related to the nature of the event; it changes when the state of information changes and, obviously, depends on the individual who assesses it.
2) It is senseless to speak of a probability which is unknown to the individual who is to assess it (even though he/she may have some difficulty in stating it precisely). This is a focal point and we aim to explain it by means of the following argument. Think of the situations when the assessment of the conditional probability $P(E \mid x)$, of a specified event E given the knowledge of the value x taken by some quantity X, is, let us say, straightforward. When the choice of X is appropriate, this can often happen due, for example, to reasons of symmetry, or to large past experience, or to reasons of "intersubjective agreement", and so on. For such situations, consider now the assessment of the probability of E when the value of X is unknown to you. The basic point in the subjective frame is that it is senseless to claim that $P(E \mid x)$ is the "true probability" which is "unknown" due to ignorance about X. In such cases X, being an unknown quantity, is to be looked at as a random quantity and a probability distribution $F_X(x)$ is to be assessed for it. In order to derive the personal probability of E, you must rather "uncondition" with respect to X. This means that the probability of the event E is to be computed according to the rule
$$P(E) = \int P(E \mid x)\, dF_X(x).$$
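To make the "unconditioning" rule concrete, here is a minimal numerical sketch (the quantity X, its distribution and the conditional probabilities below are invented purely for illustration and are not taken from the text):

```python
# Illustration of P(E) = integral of P(E|x) dF_X(x) when X takes finitely many values.
# All numbers below are invented for the illustration.
p_X = {0: 0.2, 1: 0.5, 2: 0.3}             # assessed distribution F_X (here discrete)
p_E_given_x = {0: 0.10, 1: 0.40, 2: 0.80}  # assessed conditional probabilities P(E|x)

# "Unconditioning": mix the conditional probabilities with respect to F_X.
p_E = sum(p_E_given_x[x] * p_X[x] for x in p_X)
print(p_E)  # 0.2*0.10 + 0.5*0.40 + 0.3*0.80 = 0.46
```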
3) Stochastic interdependence between different events (or different random quantities, in general) can exist even if they are deemed to be physically independent. This usually happens when we "uncondition" with respect to some unknown quantity, in the sense specified in the point above. In such cases the knowledge of the values taken by some of the considered events modifies the state of information concerning those events which have not yet been observed. This is the case of interdependence due to an increase of information and it is of fundamental importance in the problems of statistics. This topic will have in particular a crucial role in what follows and we shall come back to it several times. A special case of that is met when dealing with the form of dependence created by situations of conditional independence; such situations will be often considered in this monograph and a basic case of interest is the one described in Example 2.1.
4) All events and all unknown quantities are singular in the subjective approach; in some cases there is a situation of similarity (or symmetry), relative to the state of information of a given individual; this is formalized by means of the concept of exchangeability.
5) Can personal probability, not being an observable physical quantity and depending on an individual's opinion, be of some use in real practical problems? The answer is "Yes": it is to be employed in decision problems under uncertainty. By combining personal probabilities with utilities, one can check if a decision procedure is a coherent one; only coherent procedures are to be taken into consideration. Coherency is a fundamental concept in the subjective approach and the (Kolmogorov-like) axioms of probability are interpreted and justified in terms of coherency.
6) As mentioned, the state of information about a random quantity X is declared by means of a probability distribution. The state of information about X, after observing the value y taken by another quantity Y, will in general be a modified one and then declared by using a different probability distribution: by the constraint of coherency, the latter must coincide with the conditional distribution of X given (Y = y). In "regular" cases, the latter distribution can be obtained from the former by use of Bayes' formula.
7) By the very nature of personal probability, it makes sense to assess probability distributions only for those quantities which have a clear physical meaning and are of real interest in the problem at hand.
Let us now mention some immediate consequences that the above points imply in the treatment of statistics.
a) In orthodox statistics, parameters are normally looked on as deterministic quantities with an unknown "true" value; such a value affects the (frequentist) probability distributions of statistical observations. The latter are random variables. Even though the statistician may have some a priori information about the value of the parameter, no systematic method is available to formalize how to deal with it. In Bayesian statistics, any unknown quantity is treated as a random variable and then there is a sort of symmetry between parameters and statistical observations. The parameter, being a random quantity of its own, follows a (personal) probability distribution; the latter depends on the statistician's state of information: thus we have a prior distribution on the parameter (describing the state of information before taking observations) and a posterior distribution (describing the state of information after taking observations). The terms prior and posterior only have a conventional meaning (for instance a prior can be obtained as a posterior following some previous observations). However, it is to be said that, in view of what was mentioned at point 7) above, one often tries to avoid introducing a parameter, as we shall illustrate later on.
b) The easiest and most fundamental case of interest in statistics is defined, in the orthodox statistics language, by statistical observations which form a sequence of Bernoulli trials (i.e. independent, equiprobable events) with an "unknown" (frequentist) probability; the latter is the parameter which we must estimate on the basis of the observed events. In order to characterize and to study this situation from a subjectivist standpoint, we need to introduce a suitable concept; indeed the observable events are not stochastically independent (since the observation of some carries a piece of information about others); furthermore, subjective probability cannot coincide with the (unknown) frequentist probability. Last, but not least, we must explain, in subjectivist terms, what the frequentist probability (the parameter to be estimated) is. The above leads us to introduce the concept of (a family of) exchangeable events. A natural generalization is the concept of (a family of) exchangeable random quantities, which is introduced to render the concept of random sample in the subjectivist language.
c) Sometimes the parameter in a statistical model has a clear physical meaning of its own. In those cases the very aim of statistical analysis is to study how statistical observations modify the state of information about the parameter itself. In such cases there is a certain analogy between the orthodox and the Bayesian approach: the parameter is to be estimated based on the observed values taken by statistical variables; the differences between the two approaches essentially lie in the methods used for the estimation of parameters, once the values taken by statistical variables have been observed. In the Bayesian approach, the posterior distribution which is obtained by conditioning with respect to the observations constitutes the basis for the estimation procedures. A very important consequence is that, at least in regular cases, the likelihood principle must hold: since the posterior distribution only depends on the prior distribution and on the likelihood function associated with the observed results, it follows that two different results giving rise to the same likelihood (up to some proportionality constant, possibly) also give rise to the same posterior distribution for the parameter and then to the same estimation or decision procedure.
d) More often, however, the parameter in an (orthodox) statistical model merely has a conventional meaning, simply having the role of an index for the distributions of statistical variables. In the subjectivist approach to Bayesian statistics this kind of situation is seen as substantially different from the one described formerly. In this situation, indeed, it is recognized that the actual aim of the statistical analysis is a predictive one: on the basis of already observed statistical variables we need to predict the behavior of not yet observed variables, rather than to estimate the "true" value of a hypothetical parameter. Prediction can be made once the joint (personal) probability of all the conceivable observations (past and future) is assigned. In these cases, then, attention is focused on the probability distribution obtained after "unconditioning" with respect to the unknown parameters. Such distributions are usually called predictive distributions. Problems of statistical inference and decisions formulated in terms of such distributions are said to be predictive inference problems and predictive decision problems, respectively. Certain natural questions may then be put: What is the relationship between the joint (personal) distribution of observables and the (frequentist) model? Can the parameter be given a convincing Bayesian interpretation? These problems are, in some cases, solved through de Finetti-type theorems. This will be briefly explained in Section 1.4.
In the next sections of this chapter we shall dwell in some detail on exchangeable events and exchangeable random variables. This will serve as a basis for what is to be developed in what follows. Furthermore, this will allow us to better clarify the points quoted above. In the special framework of exchangeability, in particular, we shall briefly discuss the spirit of de Finetti-type results. These results, which are useful to explain what a "parameter" is, can be seen from a number of different points of view. As we shall mention, a possible point of view is based on the role of (predictive) sufficient statistics. Although such topics have a theoretical character, they will be important for a better comprehension of basic aspects related to the analysis of lifetimes. If not otherwise specified, the term probability will stand for subjective probability of an individual (perhaps you).
1.2 Families of exchangeable events

We are interested in the probabilistic description of situations where, according to our actual state of information, there is indifference among different events.

Example 1.1. In a factory, a lot of n units of the same type has just been received and not yet inspected. We expect that there are a few defective units in the lot. Before inspection, however, we have no reason to distinguish among different units, as far as their individual plausibility of being defective is concerned: given $0 \le k \le m < n$, any group of m out of the n units has the same probability of containing k defective and $(m-k)$ good units.

The notion of exchangeability, for a family of events, formalizes such an idea of indifference and allows us to understand which are the possible cases of indifference among events. Situations of indifference among (non-binary) random quantities will be studied in the next section, by means of the more general concept of exchangeable random variables.
Let $\mathcal{E} \equiv \{E_1, \dots, E_n\}$ be a finite family of random events; the symbol $X_i$ will denote the indicator of $E_i$, i.e. $X_i$ is the binary random quantity defined by
$$X_i = \begin{cases} 1 & \text{if } E_i \text{ is true,} \\ 0 & \text{if } E_i \text{ is false.} \end{cases}$$

Definition 1.2. $\mathcal{E}$ is a family of exchangeable events if, for any permutation $j_1, j_2, \dots, j_n$ of $1, 2, \dots, n$, $(X_1, \dots, X_n)$ and $(X_{j_1}, \dots, X_{j_n})$ have the same joint distribution: for any $1 \le h < n$,
$$P(X_1 = 1, \dots, X_h = 1, X_{h+1} = 0, \dots, X_n = 0) = P(X_{j_1} = 1, \dots, X_{j_h} = 1, X_{j_{h+1}} = 0, \dots, X_{j_n} = 0). \tag{1.1}$$
For an arbitrary family of events $\mathcal{E} \equiv \{E_1,\dots,E_n\}$, fix $1 \le m \le n$ and consider the probability of all successes over a fixed m-tuple of events $E_{j_1},\dots,E_{j_m}$:
$$p_{j_1,j_2,\dots,j_m} \equiv P(E_{j_1} \cap \dots \cap E_{j_m}).$$
Obviously, $p_{j_1,j_2,\dots,j_m}$ will in general depend on the choice of the indexes $j_1, j_2, \dots, j_m$; but, in the case of exchangeability, one can easily prove the following

Proposition 1.3. Let $\mathcal{E} \equiv \{E_1,\dots,E_n\}$ be a family of exchangeable events. Then, for any $1 \le m \le n$, $p_{j_1,j_2,\dots,j_m}$ is a quantity depending on m but independent of the particular choice of the m-tuple $j_1, j_2, \dots, j_m$: we can find n numbers $0 \le \omega_n \le \omega_{n-1} \le \dots \le \omega_2 \le \omega_1 \le 1$ such that $p_{j_1,j_2,\dots,j_m} = \omega_m$.

From now on in this section, $\mathcal{E} \equiv \{E_1,\dots,E_n\}$ is a family of exchangeable events. For $m = 1, 2, \dots, n$, we shall use the notation
$$p_0^{(m)} \equiv P(X_1 = 0, \dots, X_m = 0), \qquad p_m^{(m)} \equiv P(X_1 = 1, \dots, X_m = 1) = \omega_m,$$
and, for $1 \le h \le m-1$,
$$p_h^{(m)} \equiv P(X_1 = 1, \dots, X_h = 1, X_{h+1} = 0, \dots, X_m = 0).$$
It will also be convenient to use the symbol $\omega_h^{(m)}$ to denote the probability of exactly h successes among any m events from $\mathcal{E}$: we let, for $1 \le h < m < n$,
$$\omega_h^{(m)} \equiv P\Big\{\sum_{i=1}^m X_i = h\Big\} = \binom{m}{h}\, p_h^{(m)}. \tag{1.2}$$
As is immediate to check, it must be
$$p_h^{(m)} = p_h^{(m+1)} + p_{h+1}^{(m+1)}. \tag{1.3}$$
It is then self-evident that, for $1 \le h \le m < n$, $\omega_h^{(m)}$ can be computed in terms of $\{\omega_k^{(n)},\ k = 0, 1, \dots, n\}$. Later on we shall see the explicit form of the relation between $\{\omega_k^{(n)}\}_{k=1,\dots,n}$ and $\{\omega_k^{(m)}\}_{k=1,\dots,m}$, $m < n$ (Proposition 1.9 and Remark 1.12).
In what follows we shall present some relevant examples of exchangeable families, which will clarify fundamental aspects of the definition above.

Example 1.4. (Bernoulli scheme; binomial probabilities). Let $E_1,\dots,E_n$ be judged to be mutually independent and such that $P(E_i) = p$ $(0 < p < 1)$. It is immediate to check that $E_1,\dots,E_n$ are exchangeable and
$$\omega_h = p^h, \qquad \omega_h^{(m)} = \binom{m}{h}\, p^h (1-p)^{m-h} \quad \text{for } 1 \le h < m \le n.$$
To introduce the related next example, it is convenient to think in particular of the scheme of random drawings from an urn. An urn contains $M_1 = \theta M$ green balls and $M_2 = (1-\theta)M$ red balls, where $\theta$ $(0 < \theta < 1)$ is a quantity known to you. We perform n consecutive random drawings with replacement and set
$$E_i \equiv \{\text{a green ball at the } i\text{-th drawing}\}, \qquad 1 \le i \le n.$$
This obviously is a special case of the Bernoulli scheme ($p = \theta$). In drawings with replacement it is immaterial whether M is known or unknown.

Example 1.5. (Random drawings with replacement from an urn with unknown composition; mixture of binomial probabilities). Let us consider the urn as in the example above, but let us think of the case when $\theta$ is unknown to you. $\theta$ is the value taken by a random quantity $\Theta$, and your prior state of information on $\Theta$ is described by a probability distribution $\Pi_0$ on the interval $[0,1]$. Your personal probability is
$$P(E_i) = \int_0^1 \theta \, d\Pi_0(\theta).$$
Moreover, for $i \ne j$,
$$P(E_i \cap E_j) = \int_0^1 \theta^2 \, d\Pi_0(\theta),$$
whence $E_1,\dots,E_n$ are not (subjectively) stochastically independent, in that $P(E_i \cap E_j) \ne P(E_i)P(E_j)$. However, they are exchangeable; in particular they are conditionally independent given $\Theta$ and it is
$$\omega_h^{(m)} = \int_0^1 \binom{m}{h}\, \theta^h (1-\theta)^{m-h}\, d\Pi_0(\theta).$$
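A small numerical sketch of Example 1.5 (assuming, only for illustration, that $\Pi_0$ is a Beta(a, b) distribution; the text does not require this choice). With this prior the integral has a closed form, and one can check both the lack of independence and that the $\omega_h^{(m)}$ sum to one over h:

```python
from math import comb, lgamma, exp

def log_beta(a, b):
    # log of the Euler Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def omega(h, m, a, b):
    # omega_h^(m) = C(m,h) * int_0^1 theta^h (1-theta)^(m-h) dPi_0(theta)
    # with Pi_0 = Beta(a, b); the integral equals B(h+a, m-h+b) / B(a, b).
    return comb(m, h) * exp(log_beta(h + a, m - h + b) - log_beta(a, b))

a, b = 2.0, 3.0
p1 = omega(1, 1, a, b)   # P(E_i) = a/(a+b) = 0.4
p2 = omega(2, 2, a, b)   # P(E_i and E_j) = omega_2^(2) = 0.2
print(p1, p2, p1 * p1)   # p2 != p1**2: exchangeable but not independent
print(sum(omega(h, 5, a, b) for h in range(6)))   # the omega_h^(5) sum to 1
```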
Example 1.6. (Several tosses of the same coin). We toss the same coin n times and let $E_i \equiv \{\text{head at the } i\text{-th toss}\}$. It is natural to assume
$$P(E_1) = P(E_2) = \dots = P(E_n).$$
Two different cases are now possible: you assess stochastic independence among $E_1,\dots,E_n$ (this is a very strong position of yours), or you admit interdependence (this may be more reasonable), i.e. you admit that there are results in the first $(h-1)$ tosses which may lead you to assess a conditional probability, about the outcome of the h-th toss, which differs from the initial one. Only in the first case are we in a Bernoulli scheme ($\omega_k = (\omega_1)^k$); in any case, however, it is natural to assume that $E_1,\dots,E_n$ are exchangeable.
Example 1.7. (Random drawings without replacement from an urn with known composition; hypergeometric probabilities). Let us go back to the example of the urn, containing $M_1 = \theta M$ green balls and $M_2 = (1-\theta)M$ red balls, where both $\theta$ and M are known quantities. Consider the events $E_1,\dots,E_n$, where
$$E_i \equiv \{\text{green ball at the } i\text{-th drawing}\}.$$
This time we perform n random drawings without replacement. $E_1,\dots,E_n$ are such that $P(E_1) = \dots = P(E_n) = \theta$. They are not independent; however they are exchangeable and one has
$$\omega_k = \frac{M_1}{M}\cdot\frac{M_1-1}{M-1}\cdot\frac{M_1-2}{M-2}\cdots\frac{M_1-k+1}{M-k+1},$$
$$\omega_h^{(m)} = \frac{\binom{M_1}{h}\binom{M_2}{m-h}}{\binom{M}{m}}, \qquad \max(0,\, m - M_2) \le h \le \min(m,\, M_1). \tag{1.4}$$

Example 1.8. (Random drawings without replacement from an urn with unknown composition; mixture of hypergeometric probabilities). Consider the case of the previous example with $\theta$ unknown; then $M_1$ and $M_2$ are unknown too. Also in this case $E_1,\dots,E_n$ are exchangeable and, by (1.4) and the rule of total probabilities, one obtains
$$\omega_h^{(m)} = \sum_{k=h}^{M-m+h} \frac{\binom{k}{h}\binom{M-k}{m-h}}{\binom{M}{m}}\, P\{M_1 = k\}. \tag{1.5}$$
Of course we shall understand $\binom{a}{b} = 0$ if $b > a$.
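A quick numerical check of Equation (1.5) (the distribution assessed for $M_1$ below is arbitrary and purely illustrative). Note that Python's math.comb already returns 0 when the lower index exceeds the upper one, which matches the convention just stated:

```python
from math import comb

def omega_hyper(h, m, M, M1):
    # Equation (1.4); math.comb(a, b) is 0 when b > a, matching the stated convention.
    return comb(M1, h) * comb(M - M1, m - h) / comb(M, m)

def omega_mixture(h, m, M, pmf_M1):
    # Equation (1.5): mixture of hypergeometric probabilities over P{M_1 = k}.
    return sum(omega_hyper(h, m, M, k) * p for k, p in pmf_M1.items())

M, m = 10, 4
pmf_M1 = {2: 0.3, 5: 0.5, 8: 0.2}    # an illustrative assessment of P{M_1 = k}

probs = [omega_mixture(h, m, M, pmf_M1) for h in range(m + 1)]
print(probs, sum(probs))                                           # the omega_h^(m) sum to 1
print(omega_mixture(2, m, M, {5: 1.0}), omega_hyper(2, m, M, 5))   # degenerate case reduces to (1.4)
```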
We shall see soon (Remark 1.12) that the situation contemplated in Example 1.8 can be used as a representation of the most general case of finite families of exchangeable events. Before we analyze further, some general facts. For an arbitrary exchangeable family $\{E_1,\dots,E_n\}$, put
$$S_m = \sum_{i=1}^m X_i \qquad (m = 1, 2, \dots, n),$$
so that Equation (1.2) becomes
$$P\{S_n = k\} = \omega_k^{(n)} = \binom{n}{k}\, p_k^{(n)}, \qquad k = 0, 1, \dots, n.$$
Some relevant properties of the sequence $\{S_1,\dots,S_n\}$ directly follow from the very definition of exchangeability.
Due to these properties, $\{S_1,\dots,S_n\}$ has a fundamental role in the present study and is the prototype for a general concept which will be introduced in Section 1.4.
Fix $h \le m < n$; first, it is
$$P\{S_m = h \mid S_n = k\} = \frac{P\big\{(S_m = h) \cap \big(\sum_{i=m+1}^n X_i = k - h\big)\big\}}{P\{S_n = k\}} = \frac{\binom{m}{h}\binom{n-m}{k-h}\, p_k^{(n)}}{\binom{n}{k}\, p_k^{(n)}} = \frac{\binom{m}{h}\binom{n-m}{k-h}}{\binom{n}{k}}. \tag{1.6}$$
By (1.6) and by Bayes' formula, we immediately obtain
$$P\{S_n = k \mid S_m = h\} = \frac{\binom{m}{h}\binom{n-m}{k-h}}{\binom{n}{k}}\; \frac{\omega_k^{(n)}}{\omega_h^{(m)}}. \tag{1.7}$$
Furthermore we have
$$P\{S_n = k \mid X_1,\dots,X_m\} = P\{S_n = k \mid S_1,\dots,S_m\} = P\{S_n = k \mid S_m\} \tag{1.8}$$
and, when $\{E_1,\dots,E_n\}$ is a subfamily of a larger family of exchangeable events $\{E_1,\dots,E_n,E_{n+1},\dots,E_M\}$,
$$P\{S_m = h \mid S_n = k,\, X_{n+1},\dots,X_M\} = P\{S_m = h \mid S_n = k\}. \tag{1.9}$$

Proposition 1.9. For an arbitrary exchangeable family,
$$\omega_h^{(m)} = \sum_{k=h}^{n-m+h} \frac{\binom{k}{h}\binom{n-k}{m-h}}{\binom{n}{m}}\, \omega_k^{(n)}, \qquad 0 \le h \le m. \tag{1.10}$$

Proof. By the rule of total probabilities:
$$\omega_h^{(m)} = \sum_{k=0}^n P\{S_m = h \mid S_n = k\}\, P\{S_n = k\} = \sum_{k=h}^{n-m+h} P\{S_m = h \mid S_n = k\}\, \omega_k^{(n)},$$
whence Equation (1.10) follows by the formula (1.6).
A trivial but fundamental consequence of the definition of exchangeability is illustrated in the following.

Remark 1.10. Let $\mathcal{E} \equiv \{E_1,\dots,E_n\}$ be an exchangeable family. For $1 \le m \le n$, the joint distribution of $(X_1,\dots,X_m)$ is uniquely determined by the probability distribution of the variable $S_n$. More precisely, for $(x_1,\dots,x_m) \in \{0,1\}^m$ such that $\sum_{i=1}^m x_i = h$ $(0 \le h \le m)$, we can write
$$p_h^{(m)} \equiv P\{X_1 = x_1,\dots,X_m = x_m\} = \frac{\omega_h^{(m)}}{\binom{m}{h}} = \sum_{k=h}^{n-m+h} \frac{\binom{n-m}{k-h}}{\binom{n}{k}}\, \omega_k^{(n)} = \sum_{k=h}^{n-m+h} \frac{\binom{k}{h}\binom{n-k}{m-h}}{\binom{m}{h}\binom{n}{m}}\, \omega_k^{(n)}.$$

Example 1.11. In a lot of n units $\{U_1,\dots,U_n\}$, let $S_m$ be the number of defectives out of the sample $\{U_1,\dots,U_m\}$. By imposing exchangeability on the events
$$E_1 \equiv \{U_1 \text{ is defective}\}, \dots, E_n \equiv \{U_n \text{ is defective}\},$$
we have that the distribution of $S_m$ is determined by that of $S_n$ via Equation (1.10). If $P\{S_n = k\} = 1$, for some $k \le n$, then the distribution of $S_m$ is obviously hypergeometric. Generally, (1.10) shows that the distribution of $S_m$ is a mixture of hypergeometric distributions. If $S_n$ is binomial $b(n,p)$ then $S_m$ is binomial $b(m,p)$. If $S_n$ is uniformly distributed over $\{0,1,\dots,n\}$, then $S_m$ is uniformly distributed over $\{0,1,\dots,m\}$ (see Example 1.19).
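The closure properties stated in Example 1.11 can be checked numerically from Equation (1.10); the following sketch (with arbitrary n, m, p) verifies the binomial and the uniform cases:

```python
from math import comb

def omega_m_from_n(h, m, n, omega_n):
    # Equation (1.10): omega_h^(m) = sum_k C(k,h) C(n-k,m-h) / C(n,m) * omega_k^(n).
    return sum(comb(k, h) * comb(n - k, m - h) / comb(n, m) * omega_n[k]
               for k in range(n + 1))

n, m, p = 8, 3, 0.3

# If S_n is binomial b(n, p), then S_m obtained through (1.10) is binomial b(m, p).
omega_n_binom = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
for h in range(m + 1):
    direct = comb(m, h) * p**h * (1 - p)**(m - h)
    assert abs(omega_m_from_n(h, m, n, omega_n_binom) - direct) < 1e-12

# If S_n is uniform over {0,...,n}, then S_m is uniform over {0,...,m}.
omega_n_unif = [1.0 / (n + 1)] * (n + 1)
print([round(omega_m_from_n(h, m, n, omega_n_unif), 6) for h in range(m + 1)])
# -> [0.25, 0.25, 0.25, 0.25]
```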
Remark 1.12. Let $\omega_h^{(m)}$, $0 \le h \le m \le n$, be the set of probabilities associated with a given exchangeable family. Consider now the scheme of random drawings without replacement from an urn containing M balls, among which $M_1$ are green, as in Example 1.7 ($M_1$ is a random number). More precisely, consider the case with $M = n$ and let, for $k = 0, 1, \dots, n$,
$$P\{M_1 = k\} = \omega_k^{(n)}$$
($n = M$ means that "we sample without replacement until the urn is empty"). Thus, for the family $\widehat{\mathcal{E}} \equiv \{\widehat{E}_1,\dots,\widehat{E}_n\}$, with
$$\widehat{E}_i \equiv \{\text{green ball at the } i\text{-th drawing}\},$$
we have
$$\widehat{S}_n = \widehat{S}_M = M_1,$$
whence
$$\widehat{\omega}_k^{(n)} \equiv P\{\widehat{S}_n = k\} = P\{M_1 = k\} = \omega_k^{(n)}.$$
In view of Proposition 1.9,
$$\omega_h^{(m)} = \widehat{\omega}_h^{(m)}, \qquad 0 \le h \le m < n.$$
We can then conclude that, for any finite family of exchangeable events, one can find another family, describing drawings without replacement from an urn, sharing the same joint probability distribution. This argument also shows that Equation (1.10) could be obtained by means of a direct heuristic reasoning, taking into account the formula (1.5).
1.2.1 Extendibility and de Finetti's theorem
Any subfamily $\{E_{i_1},\dots,E_{i_m}\}$ of $\mathcal{E}$ is of course still a family of exchangeable events; thus we can wonder whether $\mathcal{E}$ is, in its turn, a subfamily of a larger family of exchangeable events.

Definition 1.13. $\mathcal{E} \equiv \{E_1,\dots,E_n\}$ is an N-extendible family of exchangeable events $(N > n)$ if we can define events $E_{n+1},\dots,E_N$ such that $\{E_1,\dots,E_N\}$ are exchangeable. $\mathcal{E}$ is infinitely extendible if it is N-extendible for any $N > n$. $\mathcal{E}$ has maximum rank $N_0$ if $\mathcal{E}$ is $N_0$-extendible but not $(N_0+1)$-extendible.

A necessary and sufficient condition for N-extendibility $(N > n)$ will be given in terms of the quantities $\omega_k^{(n)}$ (Remark 1.15). Examples 1.4 and 1.5 show cases of infinite extendibility, while in Example 1.7 we have a case of finite extendibility ($N_0 = M$). In Example 1.8, $N_0 \ge M$ and the actual value of $N_0$ depends on the probabilities $P\{M_1 = k\}$ $(k = 0,\dots,M)$. In Example 1.6 we have no given constraint on $N_0$ (besides, of course, $N_0 \ge n$); however, it can be natural to assume infinite extendibility.

Remark 1.14. Let $Y_1,\dots,Y_n$ be random variables satisfying the conditions $\operatorname{Var}(Y_i) = \sigma^2$ $(i = 1,\dots,n)$, $\operatorname{Cov}(Y_i,Y_j) = \rho\sigma^2$ ($\rho$ has obviously the meaning of correlation coefficient). By considering the variance of the variable $\sum_{i=1}^n Y_i$, we obtain the inequality
$$0 \le \operatorname{Var}\Big(\sum_{i=1}^n Y_i\Big) = \sum_{i=1}^n \operatorname{Var}(Y_i) + \sum_{i \ne j} \operatorname{Cov}(Y_i, Y_j) = n\sigma^2 + n(n-1)\rho\sigma^2,$$
whence
$$\rho \ge -\frac{1}{n-1}.$$
For the indicators of exchangeable events, the above conditions trivially hold with
$$\sigma^2 = P(E_1) - [P(E_1)]^2 = \omega_1 - (\omega_1)^2$$
and
$$\rho = \frac{P(E_1 \cap E_2) - [P(E_1)]^2}{\sigma^2} = \frac{\omega_2 - (\omega_1)^2}{\omega_1 - (\omega_1)^2}.$$
It then follows that, if $\rho < 0$ (i.e. if $(\omega_1)^2 > \omega_2$), we have, for the maximum rank,
$$N_0 \le 1 + \frac{1}{|\rho|} = 1 + \frac{\omega_1 - (\omega_1)^2}{(\omega_1)^2 - \omega_2}.$$
Thus $\rho \ge 0$ (i.e. $(\omega_1)^2 \le \omega_2$) is a necessary condition for infinite extendibility.
Remark 1.15. A necessary and sufficient condition for N-extendibility of $\{E_1,\dots,E_n\}$ $(N > n)$ is the existence of $(N+1)$ quantities $\omega_j^{(N)} \ge 0$ such that
$$\sum_{j=0}^N \omega_j^{(N)} = 1, \qquad \omega_k^{(n)} = \sum_{j=k}^{N-n+k} \frac{\binom{j}{k}\binom{N-j}{n-k}}{\binom{N}{n}}\, \omega_j^{(N)}, \quad 0 \le k \le n.$$
$\omega_j^{(N)}$ can be interpreted as the probability of j successes out of N trials. For our purposes it is convenient to rewrite the above condition of N-extendibility in the form
$$\omega_k^{(n)} = \int_0^1 \frac{\binom{\theta N}{k}\binom{(1-\theta)N}{n-k}}{\binom{N}{n}}\, d\Pi_0^{(N)}(\theta), \tag{1.11}$$
where the symbol $\Pi_0^{(N)}(\cdot)$ denotes the probability distribution of the variable
$$\theta_N \equiv \frac{S_N}{N}$$
($\theta_N$ is the "frequency of successes among the first N events $E_1,\dots,E_N$"). Starting from Equation (1.11), we wonder which constraints must be satisfied by the probabilities $\omega_k^{(n)}$ in the case of infinitely extendible families. This is explained by the following well-known result. We shall not prove it formally; rather, it can be instructive to sketch two different types of proofs.

Theorem 1.16. (de Finetti's theorem for exchangeable events). Let $\{E_1, E_2, \dots\}$ be a denumerable family of exchangeable events. Then there exists a probability distribution $\Pi_0$ on $[0,1]$ such that
$$\omega_k^{(n)} = \int_0^1 \binom{n}{k}\, \theta^k (1-\theta)^{n-k}\, d\Pi_0(\theta). \tag{1.12}$$

Proof 1. We can easily prove the existence of a random variable $\Theta$ such that
$$\lim_{N \to \infty} |\theta_N - \Theta|^2 = 0 \quad \text{a.s.}$$
("law of large numbers for exchangeable events"). This implies that the sequence $\{\Pi_0^{(N)}\}$ converges to a limiting distribution $\Pi_0$. On the other hand, as is well known,
$$\lim_{N \to \infty} \frac{\binom{\theta N}{k}\binom{(1-\theta)N}{n-k}}{\binom{N}{n}} = \binom{n}{k}\,\theta^k(1-\theta)^{n-k},$$
the limit being uniform in $\theta$. We can then obtain (1.12) by letting $N \to \infty$ in Equation (1.11).

Proof 2. Consider the sequence $1 = \omega_0 \ge \omega_1 \ge \omega_2 \ge \dots$ and the quantities
$$p_k^{(n)} = \frac{\omega_k^{(n)}}{\binom{n}{k}}.$$
From Equation (1.3) we can in particular obtain
$$p_{n-1}^{(n)} = \omega_{n-1} - \omega_n, \qquad p_{n-2}^{(n)} = p_{n-2}^{(n-1)} - p_{n-1}^{(n)} = \omega_{n-2} - 2\omega_{n-1} + \omega_n, \quad \dots \tag{1.13}$$
This shows that the sequence $\omega_0 \ge \omega_1 \ge \omega_2 \ge \dots$ is completely monotone: for $k \le n$,
$$(-1)^{n-k}\,\Delta^{n-k}\omega_k \ge 0.$$
Being $\omega_0 = 1$, it follows by the Hausdorff theorem (see e.g. the probabilistic proof in Feller, 1971) that there exists a probability distribution $\Pi_0$ on $[0,1]$ such that $\omega_0, \omega_1, \omega_2, \dots$ is the sequence of the moments of $\Pi_0$:
$$\omega_n = \int_0^1 \theta^n\, d\Pi_0(\theta). \tag{1.14}$$
Finally (1.12) is obtained by using the representation (1.14) in the sequence of identities (1.13).
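A small numerical sketch of Proof 2 (using, purely for illustration, the uniform $\Pi_0$ on $[0,1]$, whose moments are $\omega_k = 1/(k+1)$): the finite differences $p_k^{(n)} = (-1)^{n-k}\Delta^{n-k}\omega_k$ are non-negative, and $\binom{n}{k}p_k^{(n)}$ returns the corresponding $\omega_k^{(n)}$:

```python
from math import comb

n = 6
omega = [1.0 / (k + 1) for k in range(n + 1)]   # moments of the uniform Pi_0 on [0, 1]

def p_kn(k, n, omega):
    # p_k^(n) = (-1)^(n-k) Delta^(n-k) omega_k = sum_j (-1)^j C(n-k, j) omega_{k+j}
    return sum((-1) ** j * comb(n - k, j) * omega[k + j] for j in range(n - k + 1))

# Complete monotonicity: each p_k^(n) is non-negative ...
assert all(p_kn(k, n, omega) >= -1e-15 for k in range(n + 1))

# ... and C(n,k) p_k^(n) recovers omega_k^(n); for the uniform Pi_0 this is 1/(n+1).
print([round(comb(n, k) * p_kn(k, n, omega), 6) for k in range(n + 1)])
# -> all entries equal to 1/7 = 0.142857...
```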
Remark 1.17. Equation (1.12), for all $x \in \{0,1\}^n$ such that $\sum_{i=1}^n x_i = k$, can equivalently be rewritten in the form
$$P\{X_1 = x_1, \dots, X_n = x_n\} = \int_0^1 \theta^k (1-\theta)^{n-k}\, d\Pi_0(\theta). \tag{1.15}$$
If $\Theta$ is a degenerate random variable, i.e. if $P\{\Theta = b\} = 1$ for some $b \in (0,1)$, (1.15) becomes
$$P\{X_1 = x_1, \dots, X_n = x_n\} = b^k (1-b)^{n-k}.$$
Then $E_1,\dots,E_n$ are independent and equiprobable. For the general case of infinite extendibility, when $\Theta$ is not degenerate, $E_1,\dots,E_n$ are conditionally independent and equiprobable, given $\Theta$.
1.2.2 The problem of prediction

Often, when dealing with a family of exchangeable events, the real object of statistical interest is the prediction problem, namely the one of computing, for given $x^0 \equiv (x_1^0,\dots,x_m^0) \in \{0,1\}^m$, the conditional probability
$$P\{X_{m+1} = 1 \mid X_1 = x_1^0, \dots, X_m = x_m^0\},$$
or, more generally, for given n and $x \equiv (x_1,\dots,x_n) \in \{0,1\}^n$,
$$P\{X_{m+1} = x_1, \dots, X_{m+n} = x_n \mid X_1 = x_1^0, \dots, X_m = x_m^0\}$$
(of course it must be $n + m \le N_0$). Denoting $h = \sum_{i=1}^m x_i^0$, we have
$$P\{X_{m+1} = 1 \mid X_1 = x_1^0, \dots, X_m = x_m^0\} = \frac{P\{X_1 = x_1^0, \dots, X_m = x_m^0, X_{m+1} = 1\}}{P\{X_1 = x_1^0, \dots, X_m = x_m^0\}} = \frac{\omega_{h+1}^{(m+1)}\big/\binom{m+1}{h+1}}{\omega_h^{(m)}\big/\binom{m}{h}} = \frac{h+1}{m+1}\; \frac{\omega_{h+1}^{(m+1)}}{\omega_h^{(m)}}. \tag{1.16}$$
Similarly, for $(x_1,\dots,x_n)$ such that $\sum_{i=1}^n x_i = k$, we obtain
$$P\{X_{m+1} = x_1, \dots, X_{m+n} = x_n \mid X_1 = x_1^0, \dots, X_m = x_m^0\} = \frac{\omega_{h+k}^{(n+m)}\big/\binom{n+m}{h+k}}{\omega_h^{(m)}\big/\binom{m}{h}}. \tag{1.17}$$
Example 1.18. Having initially assessed a probability distribution for the number $S_n$ of defective units in a lot of size $n$, we want to compute the conditional probability $P\{X_{m+1} = 1 \mid S_m = h\}$ that the next inspected unit will turn out to be defective, given that we found $h$ defective units in the previous $m$ inspections. To this aim we apply formula (1.16). For instance we obtain:

$$P\{X_{m+1} = 1 \mid S_m = h\} = \frac{k - h}{n - m}$$

in the case $P\{S_n = k\} = 1$, for some $k \le n$;

$$P\{X_{m+1} = 1 \mid S_m = h\} = p$$

in the case when the distribution of $S_n$ is $b(n, p)$; and

$$P\{X_{m+1} = 1 \mid S_m = h\} = \frac{h+1}{m+2}$$

when $S_n$ is uniformly distributed over $\{0, 1, \ldots, n\}$ (see also Example 1.19).

In the case of an infinitely extendible family we can use the representation (1.12), and then the formulae (1.16), (1.17), respectively, become

$$P\{X_{m+1} = 1 \mid X_1 = x_1^0, \ldots, X_m = x_m^0\} = \frac{\int_0^1 \theta^{h+1} (1-\theta)^{m-h}\, d\pi_0(\theta)}{\int_0^1 \theta^{h} (1-\theta)^{m-h}\, d\pi_0(\theta)} \tag{1.18}$$

$$P\{X_{m+1} = x_1, \ldots, X_{m+n} = x_n \mid X_1 = x_1^0, \ldots, X_m = x_m^0\} = \frac{\int_0^1 \theta^{h+k} (1-\theta)^{n+m-h-k}\, d\pi_0(\theta)}{\int_0^1 \theta^{h} (1-\theta)^{m-h}\, d\pi_0(\theta)}. \tag{1.19}$$
For $s = 0, 1, \ldots, m$, consider the probability distribution on $[0,1]$ defined by

$$d\pi_{s,m}(\theta) = \frac{\theta^s (1-\theta)^{m-s}\, d\pi_0(\theta)}{\int_0^1 \xi^s (1-\xi)^{m-s}\, d\pi_0(\xi)}. \tag{1.20}$$

Equations (1.18) and (1.19) can be rewritten

$$P\{X_{m+1} = 1 \mid X_1 = x_1^0, \ldots, X_m = x_m^0\} = \int_0^1 \theta\, d\pi_{h,m}(\theta) \tag{1.21}$$
$$P\{X_{m+1} = x_1, \ldots, X_{m+n} = x_n \mid X_1 = x_1^0, \ldots, X_m = x_m^0\} = \int_0^1 \theta^k (1-\theta)^{n-k}\, d\pi_{h,m}(\theta). \tag{1.22}$$
It is of interest to contrast Equations (1.22) and (1.15). The arguments presented so far can be illustrated by discussing a very simple and well-known case.

Example 1.19. (Inference scheme of Bayes-Laplace). Consider $n$ exchangeable events with a joint distribution characterized by the position

$$\omega_k^{(n)} = \frac{1}{n+1}, \qquad k = 0, 1, \ldots, n.$$

Namely, we assess that the number of successes out of the $n$ trials has a uniform distribution over the set of possible values $\{0, 1, \ldots, n\}$. In this case formula (1.10) gives, for $0 \le h \le m < n$,

$$\omega_h^{(m)} = \frac{1}{n+1} \sum_{k=h}^{n-m+h} \frac{\binom{k}{h}\binom{n-k}{m-h}}{\binom{n}{m}}.$$

For $m = n-1$, one immediately obtains $\omega_h^{(n-1)} = \frac{1}{n}$ and then, by backward induction,

$$\omega_h^{(m)} = \frac{1}{m+1}, \qquad 0 \le h \le m \le n-1.$$

This also shows that we are in a case of infinite extendibility. Indeed, for any $N > n$, we can write, for $\omega_j^{(N)} = \frac{1}{N+1}$,

$$\omega_k^{(n)} = \frac{1}{n+1} = \sum_{j=k}^{N-n+k} \frac{\binom{j}{k}\binom{N-j}{n-k}}{\binom{N}{n}}\, \omega_j^{(N)}.$$

Then, by Theorem 1.16, we can write

$$\omega_k^{(n)} = \frac{1}{n+1} = \int_0^1 \binom{n}{k} \theta^k (1-\theta)^{n-k}\, d\pi_0(\theta).$$

$\pi_0$ must be the limit of the uniform distribution over $\{0, \frac{1}{N}, \frac{2}{N}, \ldots, \frac{N-1}{N}, 1\}$, for $N \to \infty$. Indeed $\pi_0$ is the uniform distribution on the interval $[0,1]$ and, for $k = 0, 1, \ldots, n$, we have the well-known identity

$$\frac{1}{n+1} = \int_0^1 \binom{n}{k} \theta^k (1-\theta)^{n-k}\, d\theta.$$
By applying formula (1.16), or (1.21), we get, for $x^0 \in \{0,1\}^m$ such that $\sum_{i=1}^m x_i^0 = h$,

$$P\{X_{m+1} = 1 \mid X_1 = x_1^0, \ldots, X_m = x_m^0\} = \frac{h+1}{m+2}.$$
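As a small numerical companion to formulae (1.20)-(1.21) in the Bayes-Laplace case, the following Python sketch (an illustration added here; function names are hypothetical) evaluates the mean of the "final" distribution $\pi_{h,m}$ on a grid, for a uniform $\pi_0$, and compares it with the closed form $(h+1)/(m+2)$.

```python
import numpy as np

def predictive_prob(h, m, prior_pdf, grid_size=20001):
    """P{X_{m+1} = 1 | h successes in m trials}: the mean of the
    'final' distribution pi_{h,m}, computed on a grid (equation (1.21))."""
    theta = np.linspace(0.0, 1.0, grid_size)
    # unnormalized posterior weights: theta^h (1 - theta)^(m - h) * prior density
    w = theta**h * (1.0 - theta)**(m - h) * prior_pdf(theta)
    return float(np.sum(theta * w) / np.sum(w))

# Bayes-Laplace scheme: pi_0 uniform on [0, 1]
uniform_prior = lambda t: np.ones_like(t)

h, m = 3, 10
print(predictive_prob(h, m, uniform_prior))  # approximately 0.3333
print((h + 1) / (m + 2))                     # closed form 4/12 = 0.3333...
```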
1.2.3 More on infinitely extendible families

We can now present some more comments about infinitely extendible families of exchangeable events. In the case $N_0 = \infty$, we consider the variable

$$\Theta \equiv \lim_{N \to \infty} \frac{S_N}{N}$$

introduced in Proof 1 of Theorem 1.16. We recall that the distribution $\pi_0$ appearing in (1.12) has the meaning of "initial" distribution of $\Theta$. We also note that, for $s = 0, 1, \ldots, m$, the distribution $\pi_{s,m}$, defined by Equation (1.20), can be interpreted as the final distribution of $\Theta$ after observing $\{S_m = s\}$ (i.e. after observing $s$ successes out of the first $m$ trials). In the beginning, $\pi_{S_m,m}$ is a "random distribution", i.e. a distribution which will depend on the future results of the first $m$ observations. However, before taking the observations, one can claim that, for $m \to \infty$, $\pi_{S_m,m}$ will converge to a degenerate distribution.

As already noticed (Remark 1.17), $E_1, E_2, \ldots$ are conditionally independent and equiprobable given $\Theta$. Two alternative cases are possible:

a) the initial distribution $\pi_0$ is a degenerate one;
b) $\pi_0$ is a non-degenerate distribution.

In case a), $E_1, E_2, \ldots$ are independent and equiprobable. The personal probability $P(E_i)$ coincides with the value $b$ taken by $\Theta$. This is an extreme case, where a person asserts that there is nothing to learn from the observations: $\pi_{S_m,m}$ trivially coincides with $\pi_0$ with probability one for any $m$ and, as a consequence of (1.21), it is

$$P\{X_{m+1} = 1 \mid X_1, \ldots, X_m\} = b \quad \text{a.s.}, \qquad m = 1, 2, \ldots.$$

Case b) is more common. In this case $E_1, E_2, \ldots$ are not independent but positively correlated. $\Theta$ can be interpreted as a parameter. On the basis of $m$ past observations, the probability of success in a single trial is updated according to formula (1.21). When $m$ is very large, such a probability is approximated by the observed frequency of successes.
The above discussion can be reformulated as follows:
Remark 1.20. Let $E_1, E_2, \ldots$ be independent events with $P(E_i) = b$ ($i = 1, 2, \ldots$); i.e. they are exchangeable with

$$\omega_m = b^m, \qquad m = 1, 2, \ldots \tag{1.23}$$

or, equivalently,

$$\omega_h^{(m)} = \binom{m}{h} b^h (1 - b)^{m-h}, \qquad 0 \le h \le m, \quad m = 1, 2, \ldots.$$

Under this condition, by the law of large numbers, it is

$$\frac{S_N}{N} \to b \quad \text{a.s.}, \tag{1.24}$$

i.e. $\Theta = \lim_{N \to \infty} \frac{S_N}{N}$ is a degenerate random variable. On the other hand, for a denumerable sequence of exchangeable events, the validity of the limit (1.24), for some given $b \in (0,1)$, is not only a necessary condition but also a sufficient condition for condition (1.23) to hold. If, on the contrary, the initial distribution $\pi_0$ of $\Theta$ (i.e. the limiting distribution of $\frac{S_N}{N}$) is not degenerate, the events are positively correlated. In both cases, however, it is

$$P\{X_{N+1} = 1 \mid X_1, \ldots, X_N\} - \frac{S_N}{N} \to 0 \quad \text{a.s.}$$

or

$$P\{X_{N+1} = 1 \mid X_1, \ldots, X_N\} \to \Theta \quad \text{a.s.}$$
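The dichotomy between cases a) and b) can be visualized with a small simulation. The following Python sketch (added here as an illustration; the Beta prior and function names are assumptions, not part of the original text) draws exchangeable binary sequences by first sampling $\Theta$ and then generating conditionally i.i.d. Bernoulli trials; the running frequency $S_N/N$ settles on the realized value of $\Theta$, which differs from replication to replication.

```python
import numpy as np

rng = np.random.default_rng(0)

def exchangeable_binary_path(N, a=2.0, b=2.0):
    """One realization of an infinitely extendible exchangeable binary sequence:
    draw Theta ~ Beta(a, b), then X_1, ..., X_N i.i.d. Bernoulli(Theta) given Theta."""
    theta = rng.beta(a, b)
    x = rng.random(N) < theta
    running_freq = np.cumsum(x) / np.arange(1, N + 1)   # S_N / N
    return theta, running_freq

for _ in range(3):
    theta, freq = exchangeable_binary_path(N=100_000)
    print(f"Theta = {theta:.4f},  S_N/N at N = 1e5: {freq[-1]:.4f}")
# In each run S_N/N is close to its own Theta, not to a common constant:
# the limiting frequency is random, with distribution pi_0 = Beta(a, b).
```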
Remark 1.21. By means of the concept of an infinitely extendible family we give a mathematical meaning to the distributions $\pi_0$ and $\pi_{s,n}$, which are fundamental in the practice of Bayesian statistics. Moreover, we recover the concepts of the frequentist approach in subjectivist terms: $\lim_{N \to \infty} \frac{S_N}{N}$ can be seen as the "frequentist probability". Frequentist probability and personal probability are, in any case, two different concepts. They coincide only occasionally, in the case of sequences of events which are (from the subjective point of view) independent and equiprobable. We see from (1.12) that infinitely extendible families can be represented by models of random drawings with replacement from urns (we already saw that an arbitrary family can be represented by models of drawings without replacement). Case a) corresponds to urns with known composition, while case b) corresponds to urns with unknown composition.
Remark 1.22. It is to be stressed, however, that in most cases the urn is purely hypothetical and one thinks of it only for the sake of conceptual simplicity. This happens in all cases when we are not dealing with a real finite population from which we draw with replacement. In these cases the parameter $\Theta$ has a purely hypothetical meaning and is introduced only for mathematical convenience. To clarify the above, compare the situations described in Examples 1.5 and 1.6, respectively. In the former case the aim of the statistical analysis is typically the estimation of the quantity $\Theta$, since we draw observations from a real finite population and $\Theta$ has a clear physical meaning of its own. In the latter case the object of statistical interest is rather the prediction problem. However it can be natural, as already mentioned, to assume infinite extendibility and then to define the variable $\Theta$ also in the present case; $\Theta$ has the meaning of being the limit of $P\{X_{N+1} = 1 \mid X_1, \ldots, X_N\}$, for $N \to \infty$. We remark, in any case, that the concepts of subjective probability and exchangeability put us in a position to deal with the broader class of inference problems which only concern finite collections of similar events. In such problems the inference has a more realistic meaning and prediction is to be made about the quantity

$$S_{N_0} = \sum_{i=1}^{N_0} X_i.$$

We conclude this section by presenting a classical model of "contagious probabilities". This provides an example of conditional independence arising from an urn model different from simple random sampling with replacement.

Example 1.23. (Polya urns and Bayes-Laplace model). An urn initially contains $M_1$ green balls and $M - M_1$ red ones, with $M$ and $M_1$ known quantities, as in Examples 1.4 and 1.7. We perform random drawings from the urn but, this time, after any drawing we put the sampled ball back into the urn together with one additional ball of the same color. Denote again

$$E_i \equiv \{\text{a green ball at the } i\text{-th drawing}\}, \qquad i = 1, 2, \ldots.$$

This gives rise to an infinitely extendible family of exchangeable events. In the present case we have a special type of positive dependence among events, due to the direct influence of observed results on the probability of success in the next trial.
More precisely, it obviously is, for $(x_1^0, \ldots, x_m^0) \in \{0,1\}^m$ such that $\sum_{i=1}^m x_i^0 = h$,

$$P\{X_{m+1} = 1 \mid X_1 = x_1^0, \ldots, X_m = x_m^0\} = \frac{M_1 + h}{M + m}.$$

Noting also that $P\{X_1 = 1\} = \frac{M_1}{M}$, we obtain, for $x \equiv (x_1, \ldots, x_n)$ such that $\sum_{i=1}^n x_i = k$,
$$P\{X = x\} = \frac{\omega_k^{(n)}}{\binom{n}{k}} = \frac{M_1 (M_1 + 1) \cdots (M_1 + k - 1)\,(M - M_1) \cdots (M - M_1 + n - k - 1)}{M (M+1) \cdots (M + n - 1)}.$$

Since we deal with an infinitely extendible exchangeable family, there must exist a mixing distribution $\pi_0$ allowing the quantities $\omega_k^{(n)}$ to have the representation (1.12). Consider in particular the case $M = 2$, $M_1 = 1$. This entails

$$\omega_k^{(n)} = \frac{1}{n+1}, \qquad k = 0, 1, \ldots, n.$$

Thus we see that this model coincides with that of Bayes-Laplace discussed in Example 1.19. Then, in this case, $\pi_0$ is the uniform distribution on the interval $[0,1]$.
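As a numerical illustration (not part of the original text), the following Python sketch simulates Polya urns with $M = 2$, $M_1 = 1$ and records the long-run fraction of green draws in each replication; a histogram of these fractions is approximately uniform on $[0,1]$, in agreement with the identification of $\pi_0$ above.

```python
import numpy as np

rng = np.random.default_rng(1)

def polya_fraction(n_draws, m1=1, m=2):
    """Long-run fraction of green draws in a Polya urn starting with m1 green
    and m - m1 red balls (one extra ball of the drawn color is added back)."""
    green, total, green_draws = m1, m, 0
    for _ in range(n_draws):
        if rng.random() < green / total:
            green_draws += 1
            green += 1
        total += 1
    return green_draws / n_draws

fractions = np.array([polya_fraction(1000) for _ in range(2000)])
# For M = 2, M_1 = 1 the limiting frequency is uniform on [0, 1]:
hist, _ = np.histogram(fractions, bins=10, range=(0.0, 1.0))
print(hist / len(fractions))   # each of the 10 bins gets roughly probability 0.10
```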
1.3 Exchangeable random quantities

In the previous section we dealt with binary random quantities; this allowed us to describe states of probabilistic indifference among several random events. In the present section we aim to formalize the condition of indifference among different random quantities; to this purpose we extend the concept of exchangeability to the case of real random variables $X_1, X_2, \ldots$.

Example 1.24. Consider the lot of $n$ units as in Example 1.1. Here we do not classify units according to being defective or good; rather, the quantity $X_i$ which we associate to the unit $U_i$ ($i = 1, \ldots, n$) is not binary; $X_i$ might be, for instance, the life-length of $U_i$, or its resistance to a stress, and so on. We want to provide a probabilistic description of the case when, before testing, there is no reason to distinguish among the units, as far as our own opinion about $X_1, \ldots, X_n$ is concerned.

The situation of indifference relative to $X_1, \ldots, X_n$ in the example above is formalized by means of the condition of exchangeability for $X_1, \ldots, X_n$.
Definition 1.25. $X_1, \ldots, X_n$ are exchangeable if they have the same joint distribution as $X_{j_1}, \ldots, X_{j_n}$, for any permutation $\{j_1, \ldots, j_n\}$ of the indexes $\{1, 2, \ldots, n\}$.

In other words, $X_1, \ldots, X_n$ are exchangeable if the joint distribution function

$$F^{(n)}(x_1, \ldots, x_n) \equiv P\{X_1 \le x_1, \ldots, X_n \le x_n\}$$

is a symmetric function of its arguments $x_1, \ldots, x_n$. Such a condition then means that the state of information on $(X_1, \ldots, X_n)$ is symmetric. We note that exchangeability is of course actually a property of a joint distribution function $F^{(n)}(x_1, \ldots, x_n)$; when convenient, we speak of an exchangeable distribution function in place of a symmetric distribution function.

Remark 1.26. Independent, identically distributed random variables are trivially exchangeable. Most of the time we shall be dealing with the case of absolutely continuous distributions, for which

$$F^{(n)}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_n} f^{(n)}(\xi_1, \ldots, \xi_n)\, d\xi_1 \cdots d\xi_n,$$

where $f^{(n)}$ denotes the joint probability density function. In such cases it is

$$f^{(n)}(x_1, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F^{(n)}(x_1, \ldots, x_n)$$

and, as is easy to see, $X_1, \ldots, X_n$ are exchangeable if and only if $f^{(n)}$ is symmetric as well.
Example 1.27. Let $(X_1, X_2)$ be a pair of real random variables, uniformly distributed over a region $B$ of the plane:

$$f^{(2)}(x_1, x_2) = \begin{cases} \dfrac{1}{\mathrm{area}(B)} & \text{if } (x_1, x_2) \in B \\ 0 & \text{otherwise.} \end{cases}$$

Then $(X_1, X_2)$ are exchangeable if and only if $B$ is symmetric with respect to the line $x_2 = x_1$.
Figure 1: Example of a symmetric set.

$(X_1, X_2)$ are i.i.d. if and only if $B$ is of the type

$$B \equiv \{(x_1, x_2) \mid x_1 \in A,\ x_2 \in A\}$$

for some (measurable) region $A$ of the real line.

Several further examples of exchangeable random variables will be discussed in the next chapter; the examples there are, however, restricted to the case of non-negative variables. The concept of exchangeable random variables was introduced by de Finetti (1937). It characterizes, in a setting of subjective probability, the common concept of "random sample from a population". The distinction between sampling from finite and infinite populations is rendered in terms of the notions of finite and infinite extendibility, which we are going to recall next.
1.3.1 Extendibility and de Finetti's theorem for exchangeable random variables
We note that if $X_1, \ldots, X_n$ are exchangeable, then $X_{j_1}, \ldots, X_{j_m}$ are also exchangeable, for $\{j_1, \ldots, j_m\}$ any subset of $\{1, 2, \ldots, n\}$. The joint distribution function of $X_{j_1}, \ldots, X_{j_m}$,

$$F^{(m)}(x_1, \ldots, x_m) \equiv \lim_{x_{m+1}, \ldots, x_n \to \infty} F^{(n)}(x_1, \ldots, x_n),$$
depends of course on $m$ and on $F^{(n)}$, but it does not depend on the particular choice of the subset $\{j_1, \ldots, j_m\}$.

Definition 1.28. An exchangeable distribution function $F^{(n)}(x_1, \ldots, x_n)$ is $N$-extendible ($N > n$) if we can find an exchangeable $N$-dimensional distribution function $F^{(N)}$ such that $F^{(n)}$ is an ($n$-dimensional) marginal distribution of $F^{(N)}$. $F^{(n)}$ is infinitely extendible if it is $N$-extendible for any $N > n$.

For a given exchangeable distribution function $F^{(n)}(x_1, \ldots, x_n)$, the $m$-dimensional marginal $F^{(m)}(x_1, \ldots, x_m)$ is of course $n$-extendible.

Remark 1.29. If $X_1, \ldots, X_n$ are such that

$$\rho = \frac{\mathrm{Cov}(X_i, X_j)}{\mathrm{Var}(X_i)} < 0, \qquad 1 \le i \ne j \le n,$$

then their distribution cannot be $N$-extendible for $N > 1 + \frac{1}{|\rho|}$ (see Remark 1.14).

Example 1.30. Consider $n$ exchangeable random quantities $X_1, \ldots, X_n$ such that, for some deterministic value $K$, it is

$$P\Big\{\sum_{i=1}^n X_i = K\Big\} = 1. \tag{1.25}$$

Suppose we could find a further random quantity $X_{n+1}$ such that $X_1, \ldots, X_n, X_{n+1}$ were exchangeable. By symmetry, $P\{\sum_{i=2}^{n+1} X_i = K\} = 1$ would result, whence $P\{X_1 = X_2 = \ldots = X_n = X_{n+1}\} = 1$; i.e. an exchangeable distribution satisfying (1.25) and such that $P\{X_1 = X_2 = \ldots = X_n\} < 1$ cannot be extendible. Further examples of non-extendibility, arising from applications of our interest, will be presented in the next chapter.

Remark 1.31. From Remark 1.29 it follows that infinite extendibility implies $\rho(X_i, X_j) \ge 0$. This is not a sufficient condition, however: we can find positively correlated exchangeable quantities which are not conditionally independent, identically distributed (see e.g. Exercise 1.60).

Our aim now is to discuss how de Finetti's Theorem 1.16 can be extended to the case of infinitely extendible vectors of real exchangeable random variables. We start by considering two situations which obviously give rise to infinite extendibility.

Let $X_1, \ldots, X_n$ be independent, identically distributed (i.i.d.) and let $G(x)$ denote their common one-dimensional distribution function. The joint distribution function

$$F^{(n)}(x_1, \ldots, x_n) = G(x_1)\, G(x_2) \cdots G(x_n) \tag{1.26}$$
is obviously infinitely extendible: for any $N > n$, $F^{(n)}$ is the marginal of

$$F^{(N)}(x_1, \ldots, x_N) = G(x_1)\, G(x_2) \cdots G(x_N).$$

Let $\Theta$ be a random variable taking values in a space $L$ and with a distribution $\pi_0$. Let moreover $X_1, \ldots, X_n$ be conditionally i.i.d., given $\Theta$:

$$F^{(n)}(x_1, \ldots, x_n \mid \Theta = \theta) = G(x_1 \mid \theta)\, G(x_2 \mid \theta) \cdots G(x_n \mid \theta), \tag{1.27}$$

where $G(x \mid \theta)$ denotes their common one-dimensional conditional distribution function. The joint distribution function of $X_1, \ldots, X_n$,

$$F^{(n)}(x_1, \ldots, x_n) = \int_L G(x_1 \mid \theta)\, G(x_2 \mid \theta) \cdots G(x_n \mid \theta)\, d\pi_0(\theta), \tag{1.28}$$

is symmetric; furthermore it is obviously infinitely extendible: for any $N > n$, $F^{(n)}$ is the marginal of

$$F^{(N)}(x_1, \ldots, x_N) = \int_L G(x_1 \mid \theta)\, G(x_2 \mid \theta) \cdots G(x_N \mid \theta)\, d\pi_0(\theta).$$
We can now illustrate de Finetti's representation result. Under quite general conditions, it shows that all possible situations of infinite extendibility are of the type (1.28) or of the type (1.26). In an informal way it can thus be stated as follows:

Theorem 1.32. The distribution of exchangeable random variables $X_1, \ldots, X_n$ is infinitely extendible if and only if $X_1, \ldots, X_n$ are i.i.d. or conditionally i.i.d.

This result can be formulated, and proven, in a number of different ways. Complete proofs can be found in (de Finetti, 1937; Hewitt and Savage, 1955; Loeve, 1960; Chow and Teicher, 1978). A much more complete list of references on exchangeability is contained in the exhaustive monograph by Aldous (1983).

The essence of de Finetti's representation result lies in the special relation existing between a symmetric distribution function $F^{(N)}$ and its $n$-dimensional marginal $F^{(n)}$ ($n < N$). Below, we are going to explain this, rather than dwelling on a formal mathematical treatment. Our discussion will start with some notation and remarks. First of all, for a fixed set of $N$ real numbers $z_1, \ldots, z_N$ (not necessarily all distinct), denote by
$$\eta_N(z) \equiv (z_{(1)}, \ldots, z_{(N)})$$
the vector of the corresponding order statistics. Moreover, denote by the symbol $F_z^{(N)}$ the distribution function corresponding to the discrete $N$-dimensional probability distribution $P_z^{(N)}$, uniform on the set of permutations of $\{z_1, \ldots, z_N\}$, and let $F_z^{(N,n)}$ be the distribution function of the discrete $n$-dimensional probability distribution $P_z^{(N,n)}$, uniform on the set of all the $n$-dimensional vectors of the form $(z_{j_1}, \ldots, z_{j_n})$, with $\{j_1, \ldots, j_n\}$ any subset of $\{1, \ldots, N\}$.

Remark 1.33. For our purposes, it is useful to look at $P_z^{(N)}$ as the probability distribution of $(Y_1, \ldots, Y_N)$, where $(Y_1, \ldots, Y_N)$ is a random permutation of $(z_1, \ldots, z_N)$, i.e. it is a sample of size $N$ without replacement from an $N$-size population with elements $\{z_1, \ldots, z_N\}$. Similarly, $P_z^{(N,n)}$ can be seen as the probability distribution of a random sample of size $n$, $(Y_1, \ldots, Y_n)$, and this shows that $P_z^{(N,n)}$ does coincide with the $n$-dimensional marginal of $P_z^{(N)}$.

We need to emphasize that $F_z^{(N)}$ and $F_z^{(N,n)}$ depend on $z = (z_1, \ldots, z_N)$ symmetrically or, in other words, that they depend on $z$ only through the vector $\eta_N(z)$; to this aim we shall also write

$$F_z^{(N)} \equiv F_{\eta_N(z)}^{(N)}, \qquad F_z^{(N,n)} \equiv F_{\eta_N(z)}^{(N,n)}.$$

For $n = 1$, it is

$$F_{\eta_N(z)}^{(N,1)}(x) = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{[z_i \le x]}, \qquad \forall x \in \mathbb{R}, \tag{1.29}$$

i.e. $F_{\eta_N(z)}^{(N,1)}(x)$ does coincide with the empirical distribution function which gives, for any real number $x$, the fraction of components of $z$ which are not greater than $x$.

Remark 1.34. The well-known fact that, when the size of a sampled population goes to infinity, random sampling without replacement tends to coincide with random sampling with replacement can be rephrased by using the notation introduced above. Let $\{z_1, z_2, \ldots\}$ be a denumerable sequence of real numbers and $(\xi_1, \ldots, \xi_n)$ a given vector; then, for $N \to \infty$,

$$F_{\eta_N(z)}^{(N,n)}(\xi_1, \ldots, \xi_n) \to F_{\eta_N(z)}^{(N,1)}(\xi_1) \cdots F_{\eta_N(z)}^{(N,1)}(\xi_n). \tag{1.30}$$

The primary role of the distributions $P_z^{(N)}$ and $P_z^{(N,n)}$ is explained by the following result. Let $F^{(N)}$ be an $N$-dimensional symmetric distribution function and let $F^{(n)}$ be its $n$-dimensional marginal.
Proposition 1.35. $F^{(n)}$ can be written as a mixture of distribution functions of the type $F_z^{(N,n)}$, the mixing distribution being $F^{(N)}$, i.e.

$$F^{(n)}(\xi_1, \ldots, \xi_n) = \int_{\mathbb{R}^N} F_z^{(N,n)}(\xi_1, \ldots, \xi_n)\, dF^{(N)}(z). \tag{1.31}$$

Proof. Consider $N$ exchangeable random quantities $Z_1, \ldots, Z_N$ with joint distribution function $F^{(N)}$. The property of symmetry means that $F^{(N)}$ coincides with the distribution function of any permutation of $Z_1, \ldots, Z_N$, i.e., using the notation introduced above,

$$F^{(N)}(\xi_1, \ldots, \xi_N) = \int_{\mathbb{R}^N} F_z^{(N)}(\xi_1, \ldots, \xi_N)\, dF^{(N)}(z). \tag{1.32}$$

Equation (1.31) can then be immediately obtained by passing to the $n$-dimensional marginals of both sides of this identity, and then taking into account that $F_z^{(N,n)}$ is the $n$-dimensional marginal of $F_z^{(N)}$:

$$F^{(n)}(\xi_1, \ldots, \xi_n) = \lim_{\xi_{n+1}, \ldots, \xi_N \to \infty} F^{(N)}(\xi_1, \ldots, \xi_N)$$
$$= \int_{\mathbb{R}^N} \Big[ \lim_{\xi_{n+1}, \ldots, \xi_N \to \infty} F_z^{(N)}(\xi_1, \ldots, \xi_N) \Big]\, dF^{(N)}(z) = \int_{\mathbb{R}^N} F_z^{(N,n)}(\xi_1, \ldots, \xi_n)\, dF^{(N)}(z).$$
We can say that the formula (1.32) is of fundamental importance. The above arguments can be summarized as follows: exchangeable random variables $X_1, \ldots, X_N$ have the same distribution as a random permutation of them, and this entails that, for $n < N$, $(X_1, \ldots, X_n)$ has the same joint distribution as a random sample $(Y_1, \ldots, Y_n)$ from a population formed by the elements $\{X_1, \ldots, X_N\}$. The representation (1.31) can be seen as the analog of formula (1.11); for our purposes it is convenient to rewrite it in the form

$$F^{(n)}(\xi_1, \ldots, \xi_n) = E\left[ F_{\eta_N(X_1, \ldots, X_N)}^{(N,n)}(\xi_1, \ldots, \xi_n) \right]. \tag{1.33}$$

The essence of de Finetti's theorem can now be heuristically grasped by letting $N$ go to infinity in (1.33). First of all consider, $\forall \xi \in \mathbb{R}$, the sequence of random variables

$$\left\{ F_{\eta_N(X_1, \ldots, X_N)}^{(N,1)}(\xi) \right\}_{N = 1, 2, \ldots}.$$
By (1.29), $F_{\eta_N(X_1, \ldots, X_N)}^{(N,1)}(\xi)$ is the "frequency of successes" in the family of exchangeable events $\{E_1(\xi), \ldots, E_N(\xi)\}$, where $E_j(\xi) \equiv (X_j \le \xi)$. As mentioned in the previous section, then, there exists a random quantity $F(\xi)$ such that $\{F_{\eta_N(X_1, \ldots, X_N)}^{(N,1)}(\xi)\}$ converges to $F(\xi)$, a.s. Taking into account that, on the other hand, the limit in (1.30) holds, for $N \to \infty$ we have

$$F_{\eta_N(X_1, \ldots, X_N)}^{(N,n)}(\xi_1, \ldots, \xi_n) \to F_{\eta_N(X_1, \ldots, X_N)}^{(N,1)}(\xi_1) \cdots F_{\eta_N(X_1, \ldots, X_N)}^{(N,1)}(\xi_n) \to F(\xi_1) \cdots F(\xi_n).$$

Thus, by letting $N \to \infty$ in equation (1.33), we can write

$$F^{(n)}(\xi_1, \ldots, \xi_n) = E\left[ F(\xi_1) \cdots F(\xi_n) \right]. \tag{1.34}$$

$F(\cdot)$ can be seen as a random one-dimensional distribution function depending on the whole random sequence $\{\eta_N(X_1, X_2, \ldots, X_N)\}$. We can conclude with the following.

Proposition 1.36. $X_1, X_2, \ldots$ are conditionally independent, identically distributed, given $F(\cdot)$.

Remark 1.37. The sequence $\{\eta_N(X_1, X_2, \ldots, X_N)\}$ had, in the last result, a role analogous to that of the sequence $\{\theta_N\}$ in the proof of de Finetti's theorem for exchangeable events, and $F(\cdot)$ now has a role analogous to that of the limit random variable $\Theta$ there. Some of the arguments contained in Remark 1.20 can be extended to the more general case.

Remark 1.38. $F(\cdot)$ can of course be defined only under the condition of infinite extendibility. We stress that $F(\cdot)$ is in general random (i.e. unknown to us, before taking the observations $X_1, X_2, \ldots$). Let us denote by the symbol $\mathcal{L}$ the law of $F(\cdot)$. $\mathcal{L}$ is a probability distribution on the space of all one-dimensional distribution functions. $\mathcal{L}$ is a degenerate law concentrated on a given distribution function $\widehat{F}(\cdot)$ if and only if $X_1, X_2, \ldots$ are independent, identically distributed and $\widehat{F}(\cdot)$ is their one-dimensional distribution. Indeed, if $\mathcal{L}$ is degenerate, then $X_1, X_2, \ldots$ are independent by Proposition 1.36. On the other hand, if $X_1, X_2, \ldots$ are independent, identically distributed with a distribution $\widehat{F}(\cdot)$, then the Glivenko-Cantelli theorem (see e.g. Chow and Teicher, 1978) ensures that

$$\sup_{-\infty < \xi < \infty} \left| F_{\eta_N(X_1, \ldots, X_N)}^{(N,1)}(\xi) - \widehat{F}(\xi) \right| \to 0 \quad \text{a.s.},$$

whence we can claim that $\mathcal{L}$ is concentrated on $\widehat{F}(\cdot)$.
It will be shown in the next section that, under the condition of infinite extendibility, it is in general possible to find different objects with respect to which $X_1, X_2, \ldots$ are conditionally i.i.d. In any case the "limiting empirical distribution" $F(\cdot)$ is one of those, as we have just seen. In special cases (parametric models), such "objects" can also be given a finite-dimensional representation; an instance of that is of course provided by denumerable sequences of $\{0,1\}$-valued exchangeable random quantities. Parametric models are the common models considered in parametric statistics; they are characterized by the existence of a random variable $\Theta$, taking values in a domain $L \subseteq \mathbb{R}^k$ (for some $k = 1, 2, \ldots$), such that $X_1, X_2, \ldots$ are conditionally independent, identically distributed given $\Theta$. This means that, for a suitable family of one-dimensional distribution functions $\{G(\cdot \mid \theta)\}_{\theta \in L}$ and a probability distribution $\pi_0$, we have

$$F^{(n)}(x_1, x_2, \ldots, x_n) = \int_L G(x_1 \mid \theta)\, G(x_2 \mid \theta) \cdots G(x_n \mid \theta)\, d\pi_0(\theta). \tag{1.35}$$

In some of the above cases, the parameter $\Theta$ has a clear physical meaning of its own. In other cases the meaning of $\Theta$ can be explained in terms of the concept of predictive sufficiency and of de Finetti-type results, which will be briefly illustrated in the next section.
1.3.2 The problem of prediction

We now turn to considering the problem of predictive inference for exchangeable random variables, namely to studying the conditional distributions of the variables $X_{m+1}, \ldots, X_{m+n}$, given the values taken by the previous observations $X_1, \ldots, X_m$. We look at such a conditional distribution as a tool for describing changes of uncertainty, caused by previous observations, about (random) quantities still to be observed. Of course the conditional distribution is to be obtained from the assessment of the joint distribution of $X_1, \ldots, X_m, X_{m+1}, \ldots, X_{m+n}$. The possibility of dealing with joint and conditional distributions for the variables $X_1, X_2, \ldots$, avoiding the intervention of a parameter $\Theta$, is one of the fundamental features of the Bayesian approach.

Here we consider a set of exchangeable random quantities $X_1, \ldots, X_N$. We shall assume the existence of a joint density; for $\{j_1, \ldots, j_k\} \subseteq \{1, \ldots, N\}$, the joint density function of the $k$ quantities $X_{j_1}, \ldots, X_{j_k}$ will be denoted by the symbol $f^{(k)}$. Of course, one has

$$f^{(k)}(x_1, \ldots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f^{(N)}(x_1, \ldots, x_k, \xi_1, \ldots, \xi_{N-k})\, d\xi_1 \cdots d\xi_{N-k}. \tag{1.36}$$
For $1 \le m < n + m \le N$, the conditional density of $X_{m+1}, \ldots, X_{m+n}$, given an observation

$$\{X_1 = x_1^0, \ldots, X_m = x_m^0\}$$

such that $f^{(m)}(x_1^0, \ldots, x_m^0) > 0$, is

$$f^{(n)}(x_{m+1}, \ldots, x_{m+n} \mid x_1^0, \ldots, x_m^0) = \frac{f^{(m+n)}(x_1^0, \ldots, x_m^0, x_{m+1}, \ldots, x_{m+n})}{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f^{(m+n)}(x_1^0, \ldots, x_m^0, \xi_{m+1}, \ldots, \xi_{m+n})\, d\xi_{m+1} \cdots d\xi_{m+n}}. \tag{1.37}$$
Consider now in particular the special case of parametric models, defined by a joint distribution of the form (1.35), where $X_1, \ldots, X_n, \ldots$ are conditionally i.i.d. with respect to the parameter $\Theta$. In such a case we have absolute continuity if and only if the one-dimensional distributions $G(\cdot \mid \theta)$ admit density functions $g(\cdot \mid \theta)$, $\theta \in L$, and the $n$-dimensional (predictive) density function $f^{(n)}(x_1, \ldots, x_n)$ takes the form

$$f^{(n)}(x_1, \ldots, x_n) = \int_L g(x_1 \mid \theta) \cdots g(x_n \mid \theta)\, d\pi_0(\theta). \tag{1.38}$$

For $x_1^0, \ldots, x_m^0$ such that

$$f^{(m)}(x_1^0, \ldots, x_m^0) > 0, \tag{1.39}$$

let $\pi_m(\cdot \mid x^0)$ be the conditional (i.e. "posterior") distribution of $\Theta$, given $\{X_1 = x_1^0, \ldots, X_m = x_m^0\}$. By Bayes' theorem, it is

$$d\pi_m(\theta \mid x^0) = \frac{g(x_1^0 \mid \theta) \cdots g(x_m^0 \mid \theta)\, d\pi_0(\theta)}{\int_L g(x_1^0 \mid \theta') \cdots g(x_m^0 \mid \theta')\, d\pi_0(\theta')}. \tag{1.40}$$
In terms of such a conditional distribution, we can write the conditional density $f^{(n)}(x_{m+1}, \ldots, x_{m+n} \mid x^0)$ in a form formally similar to that of equation (1.38) itself, as the following Proposition shows.

Proposition 1.39. Under the condition (1.38), we have, for $x^0 \equiv (x_1^0, \ldots, x_m^0)$ such that (1.39) is satisfied,

$$f^{(n)}(x_{m+1}, \ldots, x_{m+n} \mid x_1^0, \ldots, x_m^0) = \int_L g(x_{m+1} \mid \theta) \cdots g(x_{m+n} \mid \theta)\, d\pi_m(\theta \mid x^0). \tag{1.41}$$
Proof. We only need to notice that, under the condition (1.38), the formula (1.37) results in

$$f^{(n)}(x_{m+1}, \ldots, x_{m+n} \mid x_1^0, \ldots, x_m^0) = \frac{\int_L g(x_{m+1} \mid \theta) \cdots g(x_{m+n} \mid \theta)\, g(x_1^0 \mid \theta) \cdots g(x_m^0 \mid \theta)\, d\pi_0(\theta)}{\int_L g(x_1^0 \mid \theta) \cdots g(x_m^0 \mid \theta)\, d\pi_0(\theta)},$$

whence equation (1.41) immediately follows by taking into account the identity (1.40).

Note that equation (1.41) has a clear heuristic interpretation in terms of conditional independence between $(X_1, \ldots, X_m)$ and $(X_{m+1}, \ldots, X_{m+n})$, given $\Theta$.

The following Remark aims to illustrate heuristically the meaning of the distribution $\pi_0$ in a parametric model. A complete treatment is beyond the scope of this monograph. However, the arguments below can be seen as an appropriate extension of those presented in Remark 1.20.

Remark 1.40. $\pi_0$ is degenerate if and only if $X_1, X_2, \ldots$ are i.i.d. In such a case also $\pi_m(\cdot \mid X_1, \ldots, X_m)$ is obviously degenerate, with probability one, for any $m$. In other cases one can claim that $\pi_m(\cdot \mid X_1, \ldots, X_m)$ will converge, with probability one, toward a degenerate distribution, concentrated on some value $\widehat{\theta}(X_1, X_2, \ldots)$. $\widehat{\theta}(X_1, X_2, \ldots)$ is an $L$-valued random variable, which has the meaning of the Bayes asymptotic estimate of $\Theta$ and, in such cases, its distribution does coincide with $\pi_0$.

It is important to notice the following consequence of equation (1.41) and of the fact that $\pi_m(\cdot \mid X_1, \ldots, X_m)$ is, asymptotically, degenerate: conditionally on a very large number $N$ of past observations $X_1, \ldots, X_N$, further observations $X_{N+1}, \ldots, X_{N+n}$ tend to become independent, identically distributed.
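To make formulae (1.40)-(1.41) concrete, here is a small Python sketch (hypothetical code added for illustration) that discretizes the parameter space and computes the posterior $\pi_m$ and the one-step predictive density for an exponential parametric model $g(x \mid \theta) = \theta e^{-\theta x}$ with a Gamma-type prior; the grid approximation stands in for the integrals over $L$, and the data values are made up.

```python
import numpy as np

# Grid over the parameter space L = (0, +infinity), truncated for this sketch.
theta = np.linspace(0.01, 10.0, 4000)
prior_w = theta * np.exp(-theta)          # Gamma(2, 1) prior density, up to a constant
prior_w /= prior_w.sum()                  # grid weights standing in for d pi_0(theta)

def posterior_weights(data, prior_w):
    """Grid version of equation (1.40): pi_m is proportional to
    g(x_1|theta)...g(x_m|theta) d pi_0(theta), with g(x|theta) = theta * exp(-theta * x)."""
    lik = np.ones_like(theta)
    for x in data:
        lik *= theta * np.exp(-theta * x)
    w = lik * prior_w
    return w / w.sum()

def predictive_density(x_new, post_w):
    """Grid version of equation (1.41) with n = 1: integrate g(x_new|theta) against pi_m."""
    return float(np.sum(theta * np.exp(-theta * x_new) * post_w))

data = [0.8, 1.3, 0.4, 2.1]               # hypothetical observations x_1^0, ..., x_m^0
post_w = posterior_weights(data, prior_w)
print(predictive_density(1.0, post_w))    # one-step-ahead predictive density at 1.0
```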
Remark 1.41. Think of the situation in which $X_1, X_2, \ldots$ are i.i.d. with a distribution depending on a parameter $\theta$; $X_1, X_2, \ldots$ are trivially exchangeable. The property of independence is lost when $\theta$ is considered as a random variable $\Theta$ with a distribution $\pi_0$ of its own and we "uncondition" with respect to $\Theta$, i.e. when we pass from a joint distribution of the form (1.27) to one of the form (1.28). By contrast, the property of exchangeability is maintained under "unconditioning".

Remark 1.42. Stochastic independence is, from a Bayesian standpoint, a stronger condition than physical independence: in most cases we deal with random quantities representing physically independent phenomena which, due to the common dependence of their distribution on the same (random) factor, are not stochastically independent. This aspect will be analyzed further in Subsection 3.3 of Chapter 3.

Finally, we want to direct the reader's attention toward a further class of cases in which changes in the state of information destroy independence without affecting exchangeability.

Remark 1.43. Let $X_1, \ldots, X_n$ be i.i.d. and let $Z(x_1, \ldots, x_n)$ be a symmetric function with values in the image space $Z(\mathbb{R}^n)$. Fix a subset $S_1$ of $Z(\mathbb{R}^n)$ and consider the conditional distribution of $X_1, \ldots, X_n$, given the event $\{Z(X_1, \ldots, X_n) \in S_1\}$. Under such a conditioning operation, the property of independence is generally lost. However, due to the symmetry of $Z$, exchangeability is maintained. We note that, in such cases, we in general meet situations of finite exchangeability.
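As a numerical companion to Remark 1.43 (an added illustration, not from the original text), the following Python sketch conditions i.i.d. standard exponential pairs on the symmetric event that their sum falls in a small interval, and checks that the conditioned coordinates remain exchangeable (matching marginal means) while becoming strongly negatively correlated, hence no longer independent.

```python
import numpy as np

rng = np.random.default_rng(2)

n_samples = 2_000_000
x = rng.exponential(scale=1.0, size=(n_samples, 2))   # i.i.d. Exp(1) pairs

# Condition on the symmetric event {Z(X_1, X_2) = X_1 + X_2 in (1.9, 2.1)}.
s = x.sum(axis=1)
cond = x[(s > 1.9) & (s < 2.1)]

print(cond[:, 0].mean(), cond[:, 1].mean())       # close to each other: marginals still agree
print(np.corrcoef(cond[:, 0], cond[:, 1])[0, 1])  # clearly negative: independence is lost
```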
1.4 de Finetti-type theorems and parametric models

The classic de Finetti Theorem 1.32 shows that the assumption of exchangeability for a denumerable sequence of random quantities is equivalent to a model of (conditional) independence and identical distribution. The law of the random distribution $F$, appearing in de Finetti's representation, is in general a probability law $\mathcal{L}$ on the space of all the one-dimensional distribution functions (see Remark 1.38). As we saw, assessing that $\mathcal{L}$ is a degenerate law concentrated on a single distribution function $\widehat{F}$ is equivalent to imposing that $X_1, X_2, \ldots$ are, in particular, i.i.d. (with distribution $\widehat{F}$). We can claim that a parametric model of the form (1.35) arises when we assess that $\mathcal{L}$ is concentrated on a parametrized sub-family $\{G(\cdot \mid \theta)\}_{\theta \in L}$.

Of course parametric models are very important both in theory and in applications. We shall see, in particular, that notions to be developed in the next chapters take special forms when the probabilistic assessment is described by means of parametric models.
For this reason it is of interest to show possible motivations for assessments of this special kind. Motivations can come from symmetry (or invariance) assumptions, more specific than simple exchangeability, as is demonstrated by de Finetti-type results. A de Finetti-type theorem for a sequence of exchangeable random quantities $X_1, X_2, \ldots$ is any result proving that a specific situation of conditional independence, with a fixed family of conditional distributions, is characterized by means of certain invariance properties of the $n$-dimensional ($n = 1, 2, \ldots$) predictive distributions. This will be illustrated by concentrating attention on a particular case, which has special interest in our setting. We assume the existence of joint density functions, and the symbol $f^{(n)}$ will again denote the $n$-dimensional joint density of $n$ variables $X_{j_1}, X_{j_2}, \ldots, X_{j_n}$.

Theorem 1.44. Let $X_1, X_2, \ldots$ be non-negative random variables and let the density functions $f^{(n)}$ ($n = 2, 3, \ldots$) satisfy the following implication:

$$\sum_{i=1}^n x_i' = \sum_{i=1}^n x_i'' \;\Rightarrow\; f^{(n)}(x_1', \ldots, x_n') = f^{(n)}(x_1'', \ldots, x_n''). \tag{1.42}$$

Then, for a suitable probability distribution $\pi_0(\cdot)$ on $[0, +\infty)$,

$$f^{(n)}(x_1, \ldots, x_n) = \int_0^{\infty} \theta^n \exp\Big\{-\theta \sum_{i=1}^n x_i\Big\}\, d\pi_0(\theta). \tag{1.43}$$
Proof. Put

$$\overline{F}^{(1)}(t) \equiv P\{X_1 > t\}, \qquad \phi_1(t) \equiv f^{(1)}(t) = -\frac{d}{dt}\overline{F}^{(1)}(t), \quad t \ge 0.$$

By the assumption (1.42), we can find non-negative functions $\phi_2, \phi_3, \ldots$ such that

$$f^{(n)}(x_1, \ldots, x_n) = \phi_n\Big(\sum_{i=1}^n x_i\Big). \tag{1.44}$$

By combining equation (1.44) with the condition that $f^{(n)}$ is the $n$-dimensional marginal of the density $f^{(n+1)}$, we obtain, for $n = 1, 2, \ldots$,

$$\phi_n\Big(\sum_{i=1}^n x_i\Big) = \int_0^{\infty} \phi_{n+1}\Big(\sum_{i=1}^n x_i + u\Big)\, du,$$

whence

$$\phi_n(t) = \int_t^{\infty} \phi_{n+1}(v)\, dv.$$
This shows that $\phi_n$ is differentiable and that

$$\frac{d}{dt}\phi_n(t) = -\phi_{n+1}(t).$$

Thus $\overline{F}^{(1)}(t)$ admits derivatives of any order and

$$\frac{d^n}{dt^n}\overline{F}^{(1)}(t) = (-1)^n \phi_n(t); \tag{1.45}$$

furthermore it is obviously such that

$$\overline{F}^{(1)}(0) = 1. \tag{1.46}$$

A function satisfying the conditions (1.45) is completely monotone. Any completely monotone function satisfying the additional condition (1.46) is the Laplace transform of a probability distribution $\pi_0$ on $[0, +\infty)$, i.e., for some $\pi_0$ we can write

$$\overline{F}^{(1)}(t) = \int_0^{\infty} \exp\{-\theta t\}\, d\pi_0(\theta); \tag{1.47}$$

see e.g. Feller (1971), p. 439. Equation (1.43) is then obtained by combining Equation (1.44) with (1.45) and (1.47). A more general proof can be found in Diaconis and Freedman (1987).
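The structure asserted by Theorem 1.44 can be checked numerically in a case where the mixture (1.43) has a closed form. In the following Python sketch (added here as an illustration, with assumed parameter values), the mixing distribution $\pi_0$ is taken to be a Gamma$(a, b)$ law, for which $f^{(n)}(x_1, \ldots, x_n) = b^a \Gamma(a+n) / [\Gamma(a)\,(b + \sum_i x_i)^{a+n}]$; the code verifies the closed form against direct numerical integration and illustrates the Schur-constancy property (1.42).

```python
import numpy as np
from math import gamma as G

def f_joint(x, a=2.0, b=1.0):
    """Closed form of the mixture (1.43) when pi_0 is Gamma(a, b) (rate b):
    f^(n)(x) = b^a * Gamma(a + n) / (Gamma(a) * (b + sum(x))^(a + n))."""
    x = np.asarray(x, dtype=float)
    n, s = x.size, x.sum()
    return b**a * G(a + n) / (G(a) * (b + s) ** (a + n))

def f_joint_numeric(x, a=2.0, b=1.0, grid=200_000, upper=60.0):
    """Direct numerical integration of (1.43) against the Gamma(a, b) density."""
    theta = np.linspace(1e-8, upper, grid)
    prior = b**a / G(a) * theta ** (a - 1) * np.exp(-b * theta)
    integrand = theta ** len(x) * np.exp(-theta * np.sum(x)) * prior
    return float(np.sum(integrand) * (theta[1] - theta[0]))

x1 = [1.0, 2.0, 0.5]
x2 = [3.5, 0.0, 0.0]               # same total 3.5, different coordinates
print(f_joint(x1), f_joint(x2))    # equal: the density depends on x only through sum(x)
print(f_joint_numeric(x1))         # matches the closed form
```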
Remark 1.45. (Mathematical meaning of Theorem 1.44). We say that a joint density function is Schur-constant if it satisfies condition (1.42). Joint (predictive) densities of $n$ conditionally independent, exponentially distributed non-negative random variables (i.e. densities satisfying Equation (1.43)) are special cases of Schur-constant densities. Although these are not the only cases, they nevertheless constitute the only cases compatible with the condition of being the marginal of an $N$-dimensional Schur-constant density for any $N$. Another way to state the same fact is that denumerable products of exponential distributions are the extreme points of the convex set of all the distributions on $\mathbb{R}_+^{\infty}$ with absolutely continuous finite-dimensional marginals satisfying the condition (1.42).

Remark 1.46. (Statistical meaning of Theorem 1.44). As we shall see in Chapter 4, the property (1.42) is equivalent to a very significant condition of invariance in the context of survival analysis, which will be interpreted as a "no-aging property" for an $n$-tuple of lifetimes. Theorem 1.44 states that, in the case of infinite extendibility, imposing such a condition for any $n$ is equivalent to assessing the "lack of memory" model (1.43).
1.4.1 Parametric models and prediction sufficiency

As mentioned above, certain parametric models for sequences of exchangeable random variables can be justified in terms of de Finetti-type theorems. Generally, de Finetti-type theorems can be approached from a number of different mathematical viewpoints (see e.g. Dynkin, 1978; Dawid, 1982; Lauritzen, 1984; Diaconis and Freedman, 1987; Ressel, 1985; Lauritzen, 1988 and references cited therein). Several examples are presented in Diaconis and Freedman (1987). A further, although not exhaustive, bibliography is given at the end of this chapter.

In what follows we shall not analyze specific parametric models. Rather, we aim to give just a mention of the possible role of the notion of prediction sufficiency within the framework of de Finetti-type results.

Let $X_1, X_2, \ldots$ be an $N$-extendible family of exchangeable quantities (possibly $N = \infty$); we limit ourselves to the case when $(X_1, X_2, \ldots, X_m)$ admits a probability density $f^{(m)}$, for $m = 1, 2, \ldots, N$. Let $Z_m$ be a given statistic of $X_1, X_2, \ldots, X_m$.

Definition 1.47. $Z_m$ is a prediction sufficient statistic if, for any $n$ such that $m + n \le N$, $(X_{m+1}, \ldots, X_{m+n})$ and $(X_1, \ldots, X_m)$ are conditionally independent given $Z_m(X_1, \ldots, X_m)$.

The existence of such statistics can be seen as a condition of invariance, stronger than exchangeability, for the joint densities, and we have the following:

Proposition 1.48. The above definition is equivalent to the existence of functions $\psi_{m,n}$ such that, for $x^0 \in \mathbb{R}^m$ and $x \equiv (x_{m+1}, \ldots, x_{m+n}) \in \mathbb{R}^n$, one has

$$f^{(m+n)}(x^0, x) = \psi_{m,n}(Z_m(x^0), x)\, f^{(m)}(x^0). \tag{1.48}$$

Proof. Conditional independence between $(X_{m+1}, \ldots, X_{m+n})$ and $(X_1, \ldots, X_m)$, given $Z_m(X_1, \ldots, X_m)$, means that the conditional density $f^{(n)}(x \mid X_1 = x_1^0, \ldots, X_m = x_m^0)$ depends on $x^0$ only through the value $Z_m(x^0)$, and then

$$f^{(n)}(x \mid X_1 = x_1^0, \ldots, X_m = x_m^0) = \psi_{m,n}(Z_m(x^0), x),$$

whence Equation (1.48) follows for the joint density

$$f^{(m+n)}(x^0, x) = f^{(m)}(x^0)\, f^{(n)}(x \mid X_1 = x_1^0, \ldots, X_m = x_m^0)$$

of $X_1, \ldots, X_m, X_{m+1}, \ldots, X_{m+n}$.
Remark 1.49. Definition 1.47 does not require infinite extendibility. Even in the case of infinite extendibility, in any case, the concept of prediction sufficiency is formulated without referring to a "parameter", i.e. to a random variable $\Theta$ with respect to which $X_1, X_2, \ldots$ are conditionally i.i.d. This feature of the definition is indeed convenient in the present setting, where the problem of interest is just proving the existence of a parameter, in terms of conditions of invariance or of sufficiency.

For our purposes, we are interested in the study of infinitely extendible families admitting chains of prediction sufficient statistics
$$Z_1(X_1),\ Z_2(X_1, X_2),\ \ldots,\ Z_m(X_1, X_2, \ldots, X_m),\ \ldots.$$

More specifically, our attention is to be focused on sequences of prediction sufficient statistics with the following two additional properties:

i) for any $m$, $Z_m(x_1, \ldots, x_m)$ is symmetric, i.e.

$$Z_m(x_1, \ldots, x_m) = Z_m(x_{j_1}, \ldots, x_{j_m}) \tag{1.49}$$

for any permutation $(j_1, \ldots, j_m)$ of $(1, \ldots, m)$;

ii) the sequence $\{Z_m\}_{m = 1, 2, \ldots, N}$ is algebraically transitive, i.e. mappings $\gamma_{m+1}$ exist such that

$$Z_{m+1}(x_1, \ldots, x_m, x_{m+1}) = \gamma_{m+1}(Z_m(x_1, \ldots, x_m), x_{m+1}), \tag{1.50}$$

i.e. such that, for all $x$, the following implication holds:

$$Z_m(x_1', \ldots, x_m') = Z_m(x_1'', \ldots, x_m'') \;\Rightarrow\; Z_{m+1}(x_1', \ldots, x_m', x) = Z_{m+1}(x_1'', \ldots, x_m'', x).$$
Before continuing, it is convenient to reconsider known cases in this framework.
Example 1.50. Let $X_1, X_2, \ldots, X_N$ be $\{0,1\}$-valued exchangeable quantities and consider, for $m = 1, 2, \ldots, N-1$, the statistic

$$S_m(x_1, \ldots, x_m) = \sum_{i=1}^m x_i.$$

$S_m$ obviously is a prediction sufficient statistic, as is shown by formula (1.17). $S_m$ is trivially a symmetric function, and the property (1.50) is satisfied in that

$$S_{m+1}(x_1, \ldots, x_m, x_{m+1}) = S_m(x_1, \ldots, x_m) + x_{m+1}.$$
Example 1.51. Consider a parametric model in which $f^{(n)}$ ($n \ge 1$) is of the form (1.38), with $\{g(x \mid \theta)\}_{\theta \in L}$ in particular being a regular exponential family of order 1. Namely, we assume that, for suitable real functions $a, b, H, K$,

$$g(x \mid \theta) = a(x)\, b(\theta) \exp\{H(x) K(\theta)\}. \tag{1.51}$$

This means that, for $n = 1, 2, \ldots$,

$$f^{(n)}(x_1, \ldots, x_n) = \prod_{i=1}^n a(x_i) \int_L [b(\theta)]^n \exp\{K(\theta)\, Z_n(x_1, \ldots, x_n)\}\, d\pi_0(\theta), \tag{1.52}$$

where $Z_n : \mathbb{R}^n \to \mathbb{R}$ is defined by $Z_n(x_1, \ldots, x_n) \equiv \sum_{i=1}^n H(x_i)$. $Z_m(X_1, \ldots, X_m)$ is sufficient in the usual sense of Bayesian statistics (see e.g. Zacks, 1971). One can easily show that, for any $m$, $Z_m$ is also prediction sufficient. It is immediate that $Z_m$ satisfies the conditions (1.49) and (1.50).

Example 1.52. Let $X_1, \ldots, X_N$ be non-negative exchangeable quantities with a joint density function satisfying condition (1.42). Letting, for $m = 1, \ldots, N-1$,

$$S_m(x_1, \ldots, x_m) \equiv \sum_{i=1}^m x_i,$$

it is immediately seen that $S_m$ is a prediction sufficient statistic, satisfying the conditions (1.49) and (1.50).

Example 1.53. Consider now an arbitrary family $X_1, \ldots, X_N$ of (real-valued) exchangeable quantities and let

$$Z_m(x_1, \ldots, x_m) \equiv \eta_m(x_1, \ldots, x_m),$$

where $\eta_m(x_1, \ldots, x_m)$ denotes the vector $(x_{(1)}, \ldots, x_{(m)})$ of the order statistics of $(x_1, \ldots, x_m)$. Due to exchangeability, it is immediate that $Z_m(X_1, \ldots, X_m)$ is a prediction sufficient statistic. The conditions (1.49) and (1.50) are trivially satisfied.

Remark 1.54. Example 1.53 shows that we can always find at least one sequence of prediction sufficient statistics satisfying the additional properties (1.49) and (1.50): the sequence of order statistics. However, this may not be the only one, as Examples 1.50, 1.51 and 1.52 show.

Let us now come back to the discussion, presented in the previous section, concerning the classical de Finetti representation result 1.32. We can see that
the sequence of order statistics plays there a role similar to that of the specific sequence of statistics

$$S_n(X_1, \ldots, X_n) = \sum_{i=1}^n X_i$$

for the case of exchangeable events. In this respect, recall also Remark 1.37. Starting from these considerations, we can in general remark that any sequence of prediction sufficient statistics can play a similar role, whenever it satisfies the conditions (1.49) and (1.50). Specifically, one can in fact prove that the sequence $\{Z_m\}_{m=1,2,\ldots}$ allows us to represent $X_1, X_2, \ldots$ as a sequence of conditionally i.i.d. random quantities: for fixed $n$, $X_1, X_2, \ldots, X_n$ tend to become conditionally i.i.d. given $Z_N(X_1, X_2, \ldots, X_N)$, when $N \to \infty$. In this respect, the following fact is relevant: consider the variables

$$Z_1(X_1), \ldots, Z_{N-1}(X_1, \ldots, X_{N-1})$$

for exchangeable quantities $X_1, \ldots, X_N$. By taking into account the conditional independence of $(X_1, \ldots, X_m)$ and $(X_{m+1}, \ldots, X_N)$ given $Z_m(X_1, \ldots, X_m)$, one can show that $Z_1(X_1), Z_2(X_1, X_2), \ldots$ have properties which generalize those specified by equations (1.8) and (1.9). More precisely, it is only a technical matter to show the following result.

Proposition 1.55. For $m, n$ such that $m + n < N$, and for arbitrary (measurable) sets $B_m$ and $B_{m+n}$, the following properties hold:

$$P\{Z_{m+n}(X_1, \ldots, X_{m+n}) \in B_{m+n} \mid X_1, \ldots, X_m\} = P\{Z_{m+n}(X_1, \ldots, X_{m+n}) \in B_{m+n} \mid Z_1, \ldots, Z_m\}$$
$$= P\{Z_{m+n}(X_1, \ldots, X_{m+n}) \in B_{m+n} \mid Z_m(X_1, \ldots, X_m)\} \tag{1.53}$$

and

$$P\{Z_m(X_1, \ldots, X_m) \in B_m \mid Z_{m+n}, X_{m+n+1}, \ldots, X_N\} = P\{Z_m(X_1, \ldots, X_m) \in B_m \mid Z_{m+n}(X_1, \ldots, X_{m+n})\}. \tag{1.54}$$

When, in the infinitely extendible case, the sequence

$$Z_1(X_1), \ldots, Z_N(X_1, \ldots, X_N), \ldots$$

converges to a random variable $\Theta$, one can find technical conditions under which $X_1, X_2, \ldots$ are conditionally i.i.d. with respect to $\Theta$.
Remark 1.56. Results of the type mentioned above allow us to characterize, and justify, specific finite-dimensional parametric models of the type (1.35), by assigning the sequence of sufficient statistics. Note also that, in such cases, the "parameter" $\Theta$, being the limit of the sequence $\{Z_N(X_1, \ldots, X_N)\}$, is given a meaning as a function of the whole sequence of the observable quantities $X_1, X_2, \ldots$ (see also the discussion in Dawid, 1986). It is also of interest to point out that, in those cases, it can happen that, for any $n$, $(X_1, \ldots, X_n)$ and $\Theta$ are conditionally independent given $Z_n(X_1, \ldots, X_n)$, i.e. $Z_n(X_1, \ldots, X_n)$ is sufficient for $\Theta$, according to the usual definition.
1.5 Exercises

Exercise 1.57. Prove Proposition 1.3.

Exercise 1.58. For $n$ exchangeable events $E_1, \ldots, E_n$, let $S_n = \sum_{i=1}^n X_i$ be the number of successes. Check that $E_1, \ldots, E_n$ are independent if and only if the distribution of $S_n$ is binomial $b(n, \theta)$, for some $\theta$.

Exercise 1.59. Let $f(x_1, x_2)$ be the symmetric joint density function defined by

$$f(x_1, x_2) = \begin{cases} \alpha & \text{for } (x_1, x_2) \in C \cap A \\ \beta & \text{for } (x_1, x_2) \in C \cap \tilde{A} \\ 0 & \text{for } (x_1, x_2) \notin C \end{cases}$$

where $\alpha, \beta > 0$,

$$A \equiv \{(x_1, x_2) \mid x_1 x_2 \ge 0\}, \qquad C \equiv \{(x_1, x_2) \mid -1 \le x_1 + x_2 \le 1,\ -z \le x_1 - x_2 \le z\}, \quad z > 2,$$

and $\tilde{A}$ denotes the complement of $A$ (see Figure 2).
Figure 2.

Show that $f$ is not 3-extendible. Hint: note that a necessary condition for 3-extendibility is the possibility of finding a domain $\Delta_3 \subseteq \mathbb{R}^3$ such that

(a) for any permutation $(i_1, i_2, i_3)$ of $(1, 2, 3)$, $(x_1, x_2, x_3) \in \Delta_3 \Rightarrow (x_{i_1}, x_{i_2}, x_{i_3}) \in \Delta_3$;

(b) $\Delta_2 = C$, where

$$\Delta_2 \equiv \{(x_1, x_2) \in \mathbb{R}^2 \mid \exists\, \xi \in \mathbb{R} \text{ such that } (x_1, x_2, \xi) \in \Delta_3\}.$$
Exercise 1.60. (Continuation). For the above density, prove that we can have cases of positive correlation or of negative correlation, by suitably choosing the constants $\alpha$ and $\beta$.

The next three exercises concern discrete random variables. Consider random variables $X_1, \ldots, X_n$ and let $\mathcal{X} \equiv \{z_1, \ldots, z_d\}$ be the set of their possible values. Note that the condition of exchangeability in this case becomes

$$P\{X_1 = x_1, \ldots, X_n = x_n\} = P\{X_1 = x_{j_1}, \ldots, X_n = x_{j_n}\},$$

where $x_1, \ldots, x_n \in \mathcal{X}$ and $(x_{j_1}, \ldots, x_{j_n})$ is any permutation of $(x_1, \ldots, x_n)$.
Exercise 1.61. ("Occupation numbers"). Define $N_h \equiv \sum_{j=1}^n \mathbf{1}_{\{X_j = z_h\}}$, $h = 1, 2, \ldots, d$; $N_h$ is the number of variables taking on the value $z_h$, and $\sum_{h=1}^d N_h = n$. Prove that, when $X_1, \ldots, X_n$ are exchangeable, the joint distribution of $(X_1, \ldots, X_n)$ is determined by the joint distribution of $(N_1, \ldots, N_d)$, and derive that $X_1, \ldots, X_n$ are i.i.d. if and only if the distribution of $(N_1, \ldots, N_d)$ is multinomial.

Exercise 1.62. ("Bose-Einstein" model). There are $d$ different objects in a lot, each object being of a different type. We sample an element at random and, after inspection, we reinsert the object into the lot together with another object of the same type; we continue in this way for $n$ subsequent times. For $j = 1, \ldots, n$, $h = 1, 2, \ldots, d$, define

$$X_j = h \quad \text{if the object sampled at the } j\text{-th time is of the type } h.$$

Find the joint distribution of $(X_1, \ldots, X_n)$. (Hint: find the distribution of the vector of occupation numbers $(N_1, \ldots, N_d)$, defined in the previous exercise.)

Exercise 1.63. Let $X_1, \ldots, X_n$ be exchangeable random variables taking values in $\mathbb{N} \equiv \{1, 2, \ldots\}$ and such that

$$P\{X = x\} = \psi_n\Big(\sum_{j=1}^n x_j\Big).$$

Compute the distribution of $S_n \equiv \sum_{j=1}^n X_j$ and the conditional distribution of $(X_1, \ldots, X_n)$ given $\{S_n = s\}$. Check that the latter does not depend on the particular choice of $\psi_n$.

The next exercise concerns continuous variables with "Schur-constant" densities, i.e. satisfying the invariance condition

$$f^{(n)}(x) = \phi_n\Big(\sum_{j=1}^n x_j\Big).$$
Such a condition can be seen as the most direct analog, for the case of continuous random variables, of the notion of exchangeability for binary random variables. Again we denote $S_n \equiv \sum_{j=1}^n X_j$.

Exercise 1.64.
a) Prove that $S_n$ has a density of the form

$$f_{S_n}(s) = K_n(s)\, \phi_n(s), \qquad s \ge 0,$$

where the function $K_n$ does not depend on $\phi_n$.
b) Prove that $X_1, \ldots, X_n$ are i.i.d., exponentially distributed, if and only if the distribution of $S_n$ is gamma with parameters $n$ and $\lambda$, for some $\lambda > 0$:

$$f_{S_n}(s) \propto s^{n-1} \exp\{-\lambda s\}, \qquad s \ge 0,$$

and derive that

$$K_n(s) = \frac{s^{n-1}}{(n-1)!}.$$

c) Prove that, for $1 \le m < n$, the conditional density of $(X_1, \ldots, X_m)$ given $\{S_n = s\}$ is the same as in the special case when $X_1, \ldots, X_n$ are i.i.d., exponentially distributed.
d) In terms of the function $\phi_n$, write the conditional density of $X_n$, given $\{X_1 = x_1, \ldots, X_{n-1} = x_{n-1}\}$.

Exercise 1.65. Let $(Y_1, \ldots, Y_n)$ be a vector of non-negative random variables with a (non-exchangeable) joint density $f_Y$. Prove that we can find an exchangeable density $f^{(n)}$ for random variables $(X_1, \ldots, X_n)$ such that the vectors of order statistics of $(Y_1, \ldots, Y_n)$ and $(X_1, \ldots, X_n)$ have the same joint probability distribution.

Exercise 1.66. Prove Proposition 1.48.

Exercise 1.67. Consider a parametric model with $\{g(x \mid \theta)\}_{\theta \in L}$ being a regular exponential family of order 1, as in Example 1.51. Check that $Z_n(x_1, \ldots, x_n) \equiv \sum_{i=1}^n H(x_i)$ is prediction sufficient.
1.6 Bibliography

Aldous, D.J. (1983). Exchangeability and related topics. Springer Lecture Notes in Mathematics, 1117.
Bernardo, J. and Smith, A.F.M. (1994). Bayesian theory. John Wiley & Sons, New York.
Bruno, G. and Gilio, A. (1980). Application of the simplex method to the fundamental theorem for the probabilities in the subjective theory (Italian). Statistica, 40, 337-344.
Cassel, D., Sarndal, C.E. and Wretman, J.H. (1977). Foundations of inference in survey sampling. John Wiley & Sons, New York.
Chow, Y.S. and Teicher, H. (1978). Probability theory. Independence, interchangeability, martingales. Springer Verlag, New York.
Cifarelli, M.D. and Regazzini, E. (1982). Some considerations about mathematical statistics teaching methodology suggested by the concept of exchangeability. In Exchangeability in Probability and Statistics, G. Koch and F. Spizzichino, Eds. North-Holland, Amsterdam.
Dawid, A.P. (1979). Conditional independence (with discussion). J. Roy. Statist. Soc. Ser. B, 41, no. 1, 1-31.
Dawid, A.P. (1982). Intersubjective statistical models. In Exchangeability in Probability and Statistics, G. Koch and F. Spizzichino, Eds. North-Holland, Amsterdam.
Dawid, A.P. (1985). Probability, symmetry and frequency. British J. Philos. Sci., 36, 107-112.
Dawid, A.P. (1986). A Bayesian view of statistical modelling. In Bayesian Inference and Decision Techniques, P.K. Goel and A. Zellner, Eds. Elsevier, Amsterdam.
de Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti della R. Accademia Nazionale dei Lincei, Ser. 6, Mem. Cl. Sci. Mat. Fis. Nat., 4, 251-299.
de Finetti, B. (1937). La Prévision: ses lois logiques, ses sources subjectives. Ann. Inst. Henri Poincaré, 7, 1-68.
de Finetti, B. (1952). Gli eventi equivalenti ed il caso degenere. G. Ist. Ital. Attuari, 15, 40-64.
de Finetti, B. (1964). Alcune osservazioni in tema di "suddivisione casuale". G. Ist. Ital. Attuari, 27, 151-173.
de Finetti, B. (1970). Teoria delle Probabilità. Einaudi, Torino. English translation: Theory of Probability, John Wiley & Sons, New York (1974).
de Finetti, B. (1972). Probability, induction and statistics. The art of guessing. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, London-New York-Sydney.
Diaconis, P. and Freedman, D. (1987). A dozen de Finetti-style results in search of a theory. Ann. Inst. Henri Poincaré, 23, 397-423.
Diaconis, P. and Freedman, D. (1990). Cauchy's equation and de Finetti's theorem. Scand. J. Stat., 17, 235-250.
Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior probabilities. In Bayesian Statistics, Vol. 2, Bernardo, De Groot, Lindley, Smith, Eds.
Dubins, L. and Freedman, D. (1979). Exchangeable processes need not be mixtures of identically distributed random variables. Z. Wahrsch. verw. Gebiete, 33, 115-132.
Dynkin, E.B. (1978). Sufficient statistics and extreme points. Ann. Probab., 6, 705-730.
Ericson, W.A. (1969). Subjective Bayesian models in sampling finite populations. J. Roy. Statist. Soc. Ser. B, 31, 195-233.
Feller, W. (1968). An introduction to probability theory and its applications, Vol. 1, 3rd edition. John Wiley & Sons, New York-London-Sydney.
Feller, W. (1971). An introduction to probability theory and its applications, Vol. 2. John Wiley & Sons, New York-London-Sydney.
Florens, J.-P., Mouchart, M. and Rolin, J.-M. (1990). Elements of Bayesian statistics. Marcel Dekker, New York-Basel.
Heath, D. and Sudderth, W. (1976). De Finetti's theorem for exchangeable random variables. Amer. Statist., 30, no. 4, 188-189.
Hewitt, E. and Savage, L.J. (1955). Symmetric measures on cartesian products. Trans. Am. Math. Soc., 80, 470-501.
Hill, B.M. (1993). Dutch books, the Jeffrey-Savage theory of hypothesis testing and Bayesian reliability. In Reliability and Decision Making, R.E. Barlow, C.A. Clarotti and F. Spizzichino, Eds. Chapman and Hall, London, 31-85.
Jackson, M., Kalai, E. and Smorodinsky, R. (1999). Bayesian representation of stochastic processes under learning: de Finetti revisited. Econometrica, 67, no. 4, 875-893.
Kendall, D.G. (1967). On finite and infinite sequences of exchangeable events. Studia Scient. Math. Hung., 2, 319-327.
Kyburg, H. and Smokler, H. (Eds.) (1964). Studies in subjective probability. John Wiley & Sons, New York.
Koch, G. and Spizzichino, F. (Eds.) (1982). Exchangeability in probability and statistics. North-Holland, Amsterdam.
Lad, F. (1996). Operational subjective statistical methods. A mathematical, philosophical, and historical introduction. John Wiley & Sons, New York.
Lauritzen, S.L. (1974). Sufficiency, prediction and extreme models. Scand. J. Stat., 1, 128-134.
Lauritzen, S.L. (1984). Extremal point methods in statistics. Scand. J. Stat., 11, 65-91.
Lauritzen, S.L. (1988). Extremal families and systems of sufficient statistics. Springer Verlag, New York.
LeCam, L. (1958). Les propriétés asymptotiques des solutions de Bayes. Publications de l'Institut de Statistique de l'Université de Paris, 7, 1419-1455.
Lindley, D.V. and Novick, M.R. (1981). The role of exchangeability in inference. Ann. Stat., 9, 45-58.
Loeve, M. (1963). Probability theory, 3rd edition. Van Nostrand, Princeton.
Olshen, R. (1974). A note on exchangeable sequences. Z. Wahrsch. verw. Gebiete, 28, 317-321.
Picci, G. (1977). Some connections between the theory of sufficient statistics and the identification problem. SIAM J. Appl. Math., 33, 383-398.
Piccinato, L. (1993). The likelihood principle in reliability analysis. In Reliability and Decision Making, R.E. Barlow, C.A. Clarotti and F. Spizzichino, Eds. Chapman & Hall, London.
Piccinato, L. (1997). Metodi per le decisioni statistiche. Springer Italia, Milano.
Regazzini, E. (1988). Subjective probability. In Encyclopedia of Statistical Sciences, Vol. 9, N. Johnson and S. Kotz, Eds. John Wiley & Sons, New York.
Renyi, A. and Revesz, P. (1963). A study of sequences of equivalent events as special stable sequences. Publ. Math. Debrecen, 10, 319-325.
Ressel, P. (1985). de Finetti-type theorems: an analytical approach. Ann. Probab., 13, 898-922.
Ryll-Nardzewski, C. (1955). On stationary sequences of random variables and de Finetti's equivalence. Coll. Math., 4, 149-156.
Savage, L.J. (1972). The foundations of statistics, 2nd revised edition. Dover, New York.
Scozzafava, R. (1982). Exchangeable events and countable disintegrations. In Exchangeability in Probability and Statistics, G. Koch and F. Spizzichino, Eds. North-Holland, Amsterdam, 297-301.
Spizzichino, F. (1978). Statistiche sufficienti per famiglie di variabili aleatorie scambiabili. In Atti Convegno su Induzione, Probabilità e Statistica, Università di Venezia.
Spizzichino, F. (1982). Extendibility of symmetric probability measures. In Exchangeability in Probability and Statistics, G. Koch and F. Spizzichino, Eds. North-Holland, Amsterdam, 313-320.
Spizzichino, F. (1988). Symmetry conditions on opinion assessment leading to time-transformed exponential models. In Prove di durata accelerate e opinioni degli esperti in affidabilità, C.A. Clarotti and D. Lindley, Eds. North-Holland.
Sugden, R. (1979). Inference on symmetric functions of exchangeable populations. J. R. Stat. Soc. B, 41, 269-273.
Urbanik, K. (1975). Extreme points methods in probability. In Proceedings of the Winter School on Probability, Karpacz, 1975.
Zacks, S. (1971). The theory of statistical inference. John Wiley & Sons, New York.
Chapter 2
Exchangeable lifetimes

2.1 Introduction

In this chapter we shall be dealing with basic aspects of the distribution of $n$ non-negative exchangeable random quantities that will be denoted by $T_1, \ldots, T_n$. $T_1, \ldots, T_n$ can be interpreted as lifetimes either of elements in a biological population or of units of an industrial production; elements or units will sometimes be denoted by $U_1, \ldots, U_n$ and commonly referred to as individuals. The assumption that $T_1, \ldots, T_n$ are exchangeable embodies the idea that $U_1, \ldots, U_n$ are similar, at least as far as our state of information is concerned: we expect that $U_1, \ldots, U_n$ will have different performances and that their lifetimes $T_1, \ldots, T_n$ will take different values, but we have no reason to assess for them different probability laws.

A list of different, simple cases giving rise to exchangeable lifetimes follows; a more precise description of the corresponding probabilistic models will be given in the next sections. These models will also provide the basis for most of the examples and discussions to be presented from now on.
Example 2.1. (Conditionally i.i.d. lifetimes). This is the most common case.
$U_1, \ldots, U_n$ are similar and there is no physical interaction among them; however, their lifetimes are influenced by the values taken by one or more physical quantities, such as intrinsic strength of material or temperature, pressure and, more generally, environmental conditions (similar cases can easily be conceived in the biomedical field). The vector formed by the latter quantities is denoted by the symbol $\Theta$. Suppose that the actual value of $\Theta$ is constant during the lives of $U_1, \ldots, U_n$, but it is unobservable to us and there is uncertainty about it. Then we assess a probability distribution for $\Theta$. $T_1, \ldots, T_n$ may be assumed to be conditionally independent, identically distributed given $\Theta$, and thus exchangeable, but they are not independent. During the observation of progressive survivals
48
EXCHANGEABLE LIFETIMES
and/or of failures of U1 ; :::; Un a continuing process of learning about takes place; this means that we continuously update the distribution of . Example 2.2. (Change-point models). is a non-negative quantity, measuring the level of the instantaneous stress on the individuals U1 ; :::; Un ; the operative lives of U1 ; :::; Un start simultaneously at time t = 0 and the value taken by ; at t = 0; is known to be equal to 1 . At a time > 0; changes its value to 2 : We do not observe directly; however, we observe progressive failures of U1 ; :::; Un . T1 ; :::; Tn are then assumed to be conditionally independent, identically distributed given : What we have observed up to any time t > 0 will be used to update our judgment about the event f tg and about the residual lifetimes Ti ; t: This is a special case of what is known as the \disruption" or \change point" problem. Furthermore we can combine the case above with the situation of Example 2.1, by regarding 1 and 2 as unknown to us. More generally we might think of cases where the value of evolves in time, giving rise to a (non-observable) stochastic process t : This situation may become much more general and complicated than that in the previous example; however, T1 ; :::; Tn remain exchangeable as far as, for all t > 0, all individuals surviving at t have a same probabilistic behavior, conditional on the process t. Example 2.3. (Heterogeneous populations). Among the n individuals U1; :::; Un; some are strong and some are weak. We neither know the identity of weak individuals nor the total number of them and, for j = 1; :::; n, we introduce the binary random variables de ned as follows:
1 U is \weak" j Kj =
0 Uj is \strong" We think of the situation where, if we knew the condition fKj = ig (i = 1; 2), of the individual Uj , then its lifetime Tj would be independent of the other lifetimes and distributed according to a given distribution Gi (t). Thus T1 ; :::; Tn are independent, though non-identically distributed, given K1 ; :::; Kn. Under the condition that K1; :::; Kn are exchangeable, T1 ; :::; Tn are also exchangeable, but not in nitely extendible, in general, according to the de nition given in 1.28. K1; :::; Kn are i.i.d. if and only if the total number M of weak individuals has a binomial distribution b(n; p) for some p (see Exercise 1.58); in such a case T1 ; :::; Tn are i.i.d. as well. The extension of this example to the case when there are more than two categories is rather straightforward; we can in fact consider the case of exchangeable random variables K1 ; :::; Kn which are not necessarily binary (see Subsection 4.3 in Chapter 4).
INTRODUCTION
49
Example 2.4. (Finite exchangeability). The individuals U1; :::; Un are similar and behave independently of each other. P At a certain step, we get the information fS (T1 ; :::; Tn ) = sg, where S (t) = ni=1 ti or S is any other symmetric
statistic. As noticed in Remark 1.43, the distribution of (T1 ; :::; Tn ) conditional on this event is exchangeable; however, T1 ; :::; Tn are not independent. Furthermore their (n-dimensional) joint distribution does not admit a density function. This is a case of nite-extendibility. Example 2.5. (Time-transformed exponential models). The lifetime of each of the individuals U1 ; :::; Un in a population is a deterministic function of a certain resource A available to it, so that we can put Ti = (Xi ); where Xi denotes the initial amount of the resource A initially owned by UPi .n The total initial amount of A in the population is a random quantity X = i=1 Xi . We assume that, conditional on the hypothesis P fX = xg, the joint distribution of X1 ; :::; Xn is uniform over the simplex ni=1 Xi = x; in other words X1 ; :::; Xn are the spacings of a random partition of the interval [0; X ]. T1; :::; Tn are exchangeable; their joint distribution is in nitely extendible or not depending on the marginal distribution of X . Example 2.6. (Exchangeable load-sharing models). The individuals are units which are similar and are installed to work simultaneously in the same system or to supply the same service (think of the set of copy-machines in a university department). Then they share a similar load and, when one of them fails, the others undergo an increased stress. This is then a case of positive dependence. The notion of multivariate conditional hazard rate, which will be de ned later on, provides a convenient tool to describe mathematically such models. Example 2.7. (Exchangeable common mode failure models). Think again of units which are similar and work simultaneously. Now we consider the case when they work independently of each other. However they are sensible to the same shock. The shock arrives at a random time; at any instant t, the hazard rate of the arrival of the shock is independent of the number and identity of the units failed up to t. When the shock arrives all the surviving units fail simultaneously. This is again a case of positive dependence. Also in this case the joint distribution does not admit a density. From now on in this chapter we recall the terminology and fundamental probabilistic concepts which are speci c to the treatment of non-negative random quantities. In the remainder of the present section we consider the onedimensional case, while the next three sections will be devoted to the case of several variables. In particular, the concept of multivariate conditional hazard rate will be introduced in Section 2.3 and speci c aspects of it will be analyzed in Section 2.4.
50
EXCHANGEABLE LIFETIMES
In the one-dimensional cases, fundamental concepts are those of the conditional survival function of a residual life-time at an age s and, for distributions admitting a density function, of hazard rate function. For a distribution function F , such that F (0) = 0, let U be an individual whose lifetime T has the survival function
F (s) P fT > sg = 1 ; F (s). The conditional survival function of the residual lifetime of U at age s is
F s ( ) P fT ; s > jT > sg = F ( + s) , s > 0; > 0 F ( )
(2.1)
R In the absolutely continuous case, when F (s) = 0s f (t)dt (s > 0), the hazard rate function is de ned by P fT ; s jT> sg = lim 1 ; F s () = r(s) = lim !0+ !0+ f (s) = ; d log F (s) ds F (s)
(2.2)
R By introducing the cumulative hazard function R(s) 0s r( )d , one can write F (s) = exp f;R (s)g f (t) = r (t) exp f;R (t)g
(2.3)
F s ( ) = exp fR (s) ; R (s + )g
Example 2.8. The exponential distribution with parameter (F (s) = expf;sg) is characterized by the properties
F s ( ) = exp f; g = F ( ); r(s) = ; 8s > 0; 8 > 0:
Example 2.9. As in Example 2.1, consider the case when T has a conditional survival function G(sj) depending on the value of some physical quantity ; where is a random quantity taking values in a set L and with an initial density 0 . The \predictive" survival and density function of T are then, respectively, F (s) =
Z
L
G(sj)0 ()d; f (t) =
Z
L
g(tj)0 ()d
INTRODUCTION
51
so that, for s > 0; > 0,
R
G( + sj)0 ()d F s ( ) = F ( + s) = LR = F (s ) L G(sj)0 ()d
R
G( +sj) Z L G(sj) G(sj)0 ()d R G(sj) ()d = Gs( j)s (js)d L 0 L
(2.4)
where s (js) is de ned by
s (js) =
R GG(s(jsj))0 (())d : L
0
By Bayes' formula, s (js) can be interpreted as the conditional density of ; given the observation of the survival (T > s): Denoting by r(sj), R(sj) the hazard rate function and the cumulative hazard function of G(sj), respectively, we can also write
F (s) = f (t) =
Z L
Z
L
expf;R(sj)g0()d
r(tj) expf;R(tj)g0 ()d
and the predictive (unconditional) hazard rate function is
R
r(tj) expf;R(tj)g ()d Z r(t) = f (t) = L R expf;R(tj)g (0)d = r(tj)t (jt)d: F (t) 0 L L
(2.5)
The predictive joint survival function and the predictive density, respectively, are given by mixtures of the conditional survival functions and densities, 0 being the compounding density. Notice that also for the predictive hazard rate r(s) we found that it is a mixture of the conditional hazard rate functions r(tj). But, dierently from what happens for F and f , the compounding density this time is t (jt), which varies with t. This fact will have important consequences in the study of \aging properties" of mixtures. Example 2.10. Think of a lifetime T with a (predictive) hazard rate function r(t); suppose that, at a certain step, we get the information fT < g (for a given > 0). The conditional survival function is, for s < ;
F (sjT < ) = F (s);F ( ) 1;F ( )
52
EXCHANGEABLE LIFETIMES
and then the conditional hazard rate function becomes, for t < , r(tjT < ) = r(t) 1F ( ) 1; F (t)
(2.6)
Example 2.11. Consider a single unit in the case envisaged in Example 2.2. More generally, assume that T has a hazard rate function
R
r(tj) 1ft<g r1 (t) + 1ftg r2 (t)
and put Ri (s) 0s ri (t)dt. If is a non-observable random time with a density q(); then the (predictive) survival function of T is
F (s) =
Zs 0
expf;R1 ()g expfR2 () ; R2 (s)gq()d + expf;R1(s)
Z1 s
q()dg:
(2.7)
One can show (see Exercise 2.65) that the predictive hazard failure rate is
r(t) = (t)r1 (t) + [1 ; (t)]r2 (t) where
R
expf;R1 (t)g t1 q()d (t) = = P f > tjT > tg: F (t) Remark 2.12. As (2.2) and (2.3) show, a one-dimensional distribution for a lifetime T , admitting a density, can be equivalently characterized in terms of the distribution function, or the survival function or its hazard rate function. In the next sections we shall consider multivariate analogues of arguments and formulae above.
2.2 Positive exchangeable random quantities In this section we deal with some concepts related to joint distributions of several non-negative random quantities. Our treatment will in particular prepare the ground for the study of multivariate conditional hazard rate functions to be carried out in the next section. Even though in the literature such concepts are introduced dealing with general cases, here we shall introduce them by directly thinking of exchangeable quantities T1 ; :::; Tn; which represent the lifetimes of similar individuals
U1 ; :::; Un :
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
53
If not otherwise stated, we deal with distributions on Rn+ admitting continuous joint density functions; a density function for n lifetimes will be denoted by the symbol f (n) (t1 ; :::; tn ) or f (n) (t): Illustrative examples will be presented at the end of the section. A natural way to describe the joint distribution of T1 ; :::; Tn is by means of their joint (predictive) survival function:
F (n) (s1 ; :::; sn ) P fT1 > s1 ; :::; Tn > sn g which often turns out to be more useful than the density function for the probabilistic ofthe beha modelling vior of U1 ; :::; Un and of our state of information about them. (n) Of course, F can be de ned for any n-tuple of random quantities; in our case, T1 ; :::; Tn being exchangeable, we have that the function F (n) is m per utation-invariant. It can be convenient lo to m pe y,dieren ingeneral, t notation for the argu(n) ( n ) ments of f and F , for which we use the symbols t1 ; :::; tn and s1 ; :::; sn ; respectively: t1 ; :::; tn are interpreted as failure times, while s1 ; :::; sn are interpreted as ages. Denoting by F (n) (s1 ; :::; sn ) the joint distribution function, i.e. F (n) (s1 ; :::; sn ) P fT1 s1 ; :::; Tn sn g, it is,trivially
F (n) (s1 ; :::; sn ) + F (n) (s1 ; :::; sn ) 1 (see Figure 3, for the case n = 2). y
B s2
A
s1
x
Figure 3. F (2) (s1 ; s2 ) = P fT 2 Ag; F (2) (s1 ; s2 ) = P fT 2 B g:
54
EXCHANGEABLE LIFETIMES (Only for n = 1 it is F (1) (s) + F (1) (s) = 1; 8s > 0). Moreover n (n) f (n) (t ; :::; t ) = (;1)n @ F (s1 ; :::; sn ) j 1
n
which is in fact equivalent to
F
(n)
(s1 ; :::; sn ) =
@s1:::@sn
Z1 Z1 s1
:::
sn
s=t
f (n)(t1 ; :::; tn )dt1 :::dtn
(2.8) (2.9)
For 1 k < n, the marginal survival function of k quantities drawn from T1 ; :::; Tn is F (k) (s1 ; :::; sk ) P fT1 > s1 ; :::; Tk > sk g = F (n) (s1 ; :::; sk ; 0; :::; 0) (2.10)
Remark 2.13. In the case of i.i.d. random quantities with one-dimensional
hazard rate function r(t) and cumulative hazard function R(t), the joint survival function can be written as
F (n) (s1 ; :::; sn ) = expf; and their joint density function is
n X j =1
R(sj )g
f (n) (t1 ; :::; tn ) = r(t1 ):::r(tn ) expf;
n X j =1
R(tj )
(2.11)
(2.12)
Following the discussion sketched in the previous chapter, it is natural in the predictive approach to consider the conditional distribution of unobserved variables, given the values taken by those which have been observed. In the present setting we are often interested in computing survival probabilities for lifetimes, conditional on the values taken by already observed lifetimes. That is, we can be interested in conditional probabilities of the type P fTh+1 > sh+1 ; :::; Tn > sn jT1 = t1 ; :::; Th = th g: (2.13) Taking into account the equation (2.9) and adapting (1.37) of Chapter 1, we obtain P fTh+1 > sh+1 ; :::; Tn > sn jT1 = t1 ; :::; Th = th g =
Z1 Z1 sh+1
:::
sn
f (n;h)(th+1 ; :::; tn jT1 = t1 ; :::; Th = th )dth+1 :::dtn =
R 1 ::: R 1 f (n)(t ; :::; t ; t ; :::; t )dt :::dt 1 h h+1 n h+1 n s1 sn f (h)(t1 ; :::; th )
(2.14)
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
55
Remark 2.14. The existence of a joint density f (n) allows us to give a precise meaning to the formal expression
P fTh+1 > sh+1 ; :::; Tn > sn j T1 = t1 ; :::; Th = th g even if fT1 = t1 ; :::; Th = th g is an event of null probability. Indeed we implicitly de ne
P fTh+1 > sh+1 ; :::; Tn > sn jT1 = t1 ; :::; Th = th g lim P fTh+1 > sh+1 ; :::; Tn > sn jt1 T1 t1 + ; :::; th Th th + g (2.15)
!0
Under absolute continuity the above limit exists and Equation (2.14) provides an expression for it. Note that, in the statistical analysis of lifetime data, an event of the type fT1 = t1 ; :::; Th = th g corresponds to the observation of h failures and t1 ; :::; th denote failure times. Commonly, one also observes survival data. That is, one fairly often must also take into account previous pieces of information about survival of some of the individuals U1 ; :::; Un and, rather than in conditional survival probability of the form above, we are interested, for s1 r1 ; :::; sn rn 0, in
P fT1 > s1 ; :::; Tn > sn jT1 > r1 ; :::; Tn > rn g
(2.16)
P fTh+1 > sh+1 ; :::; Tn > sn jDg
(2.17)
or where
D fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 ; :::; Tn > rn g: (2.18) Of course some of the ri (1 i n) can be equal to zero, and, in particular (2.17) reduces to (2.13) when ri = 0 (h + 1 i n). As a function of the variables 1 = s1 ; r1 ; :::; n = sn ; rn , the conditional probability in (2.16) can then be seen to be the joint survival functions of the residual lifetimes Tj ; rj (r = h + 1; :::; n) conditional on the observation of survival data. Trivially, we can write (n) P fT1 > s1 ; :::; Tn > sn jT1 > r1 ; :::; Tn > rn g = F(n) (s1 ; :::; sn ) F (r1 ; :::; rn )
The conditional probability in (2.17) can be de ned as lim P fTh+1 > rh+1 + h+1 ; :::; Tn > rn + n jD g
!0
(2.19)
56
EXCHANGEABLE LIFETIMES
where
D ft1 T1 t1 + ; :::; th Th th + ; Th+1 > rh+1 ; :::; Tn > rn g Again it can be seen as the joint survival function of residual lifetimes of surviving individuals, conditional on the observation of survival data and/or failure data. An explicit expression for it, in terms of f (n) , is provided by the following Proposition 2.15. For h = 1; :::; n ; 1, tj 0 (0 j h), 0 rj sj (h + 1 j n), one has
P fTh+1 > sh+1 ; :::; Tn > sn jDg =
R 1 ::: R 1 f (n)(t ; :::; t ; t :::; t )dt :::dt Rs1h+1 ::: Rs1n f (n)(t1 ; :::; th; th+1;:::; tn)dth+1:::dtn rh+1
h h+1;
1
rn
n
h+1
n
(2.20)
Proof. Since rj sj (h + 1 j n), the event fTh+1 > sh+1 ; :::; Tn > sn g implies fTh+1 > rh+1 ; :::; Tn > rn g. Thus
lim P fTh+1 > sh+1 ; :::; Tn > sn jD g =
!0
lim P fTh+1 > sh+1 ; :::; Tn > sn jt1 T1 t1 + ; :::; th Th th + g !0 P fTh+1 > rh+1 ; :::; Tn > rn jt1 T1 t1 + ; :::; th Th th + g
sh+1 ; :::; Tn > sn jT1 = t1 ; :::; Th = th g = PP ffTTh+1 > > r ; :::; T > r jT = t ; :::; T = t g h+1
h+1
n
n 1
1
h
h
whence (2.20) is obtained by applying (2.14). Now we want to specialize the above arguments to common special cases of in nite extendibility. More precisely we shall now consider parametric models, as de ned by the condition (1.38) in Section 1.3. Let us then assume that the lifetimes T1 ; :::; Tn are conditionally i.i.d., given a parameter taking values in L Rd for some d 1, with an initial density 0 , and that the one-dimensional conditional survival function G(tj) admits a density g(tj) (8 2 L). We denote by r(tj) and R(sj) the hazard and cumulative hazard functions, respectively, corresponding to G(tj).
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
57
By recalling the equations (2.11) and (2.12), we have
F (n) (s1 ; :::; sn ) =
Z L
Z
expf;
f (n) (t1 ; :::; tn ) =
G(s1 j):::G(sn j)0 ()d =
L
n X j =1
Z L
R(sj j)g0 ()d;
(2.21)
g(t1 j):::g(tn j)0 ()d =
(2.22)
8 n 9 < X = r(t1 j):::r(tn j) exp :; R(tj j); 0 ()d: L j =1
Z
Under the assumptions on g(j) which allow interchanging the integration order, by Proposition 2.15, we can then write, for D as in (2.18),
P fTh+1 > sh+1 ; :::; Tn > sn jDg =
R R 1 ::: R 1 g(t j):::g(t j)g(t j):::g(t j) ()dt :::dt d RL Rs1h+1 ::: Rs1n g(t1j):::g(th j)g(th+1j):::g(tnj)0 ()dth+1:::dtnd : L rh+1
rn
1
h
h+1
n
0
h+1
n
(2.23)
Consider now the posterior (i.e. conditional) density of given the observation D. In order to derive it, we can use Bayes' formula two times subsequently: the rst time to account for failure data fT1 = t1 ; :::; Th = th g and the second time to account for survival data fTh+1 > rh+1 ; :::; Tn > rn g. This gives
(jD) = R g(t1 j):::g(th j)G(rh+1 j):::G(rn j)0 () L g (t1 j):::g (th j)G(rh+1 j):::G(rn j)0 ()d as
(2.24)
Equation (2.24) shows that the ratio in the r.h.s. of (2.23) can be rewritten
Z Y n G(s j) j (jD) d L j =h+1 G(rj j)
and we can conclude with the following
(2.25)
58
EXCHANGEABLE LIFETIMES
Proposition 2.16. Assume F (n)(s1 ; :::; sn) to be of the form (2.21). Then it
is
P fTh+1 > sh+1 ; :::; Tn > sn jDg =
Z P fTh+1 > sh+1; :::; Tn > snjg (jD) d L P fTh+1 > rh+1 ; :::; Tn > rn jg
manasatural Notice seen becan that (2.26)
(2.26)
ultivariate extension of (2.4).
In order to illustrate the arguments introduced so far, let us now turn to discussing some of the special cases introduced in the last section.
Example 2.17. (Finite forms of time transformed exponential models). For
s 0, denote by (sn) the simplex de ned by (sn) ft 2 Rn+ j
n X i=1
ti = sg
and, for n lifetimes T1; :::; Tn , we consider the (singular) probability distribution which is uniform over (sn) ; the joint survival function of (T1 ; :::; Tn) is in this case given by (n)
F (s1 ; :::; sn ) =
(
1;
Pni=1 si n;1
P
for 0 ni=1 si s ; otherwise
s
0
we shall also write this as (n)
F (s1 ; :::; sn ) = See Figure 4 for the case n = 2.
Pn s n;1 : 1 ; i=1 i
s
+
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
59
t 1+t2 <s s Z s (t1,t2) t2
s
t1
p
Figure p 4 The bold segment Zs (t1 ; t2 ) has length (1 ; t1 ; t2 ) 2 and ps has length s 2. P fT1 > t1 ; T2 > t2 g = P f(T1; T2 ) 2 Zs (t1 ; t2 )g = (1;t1sp;t22 ) 2 = 1 ; t1+t2 : s +
P
Let Z1 ; :::; Zn be i.i.d. exponential lifetimes and set Sn = ni=1 Zi ; for s > 0, the conditional distribution of Z1 ; :::; Zn given fSn = sg is the uniform distribution over (sn) . This same fact remains true for whatever distribution of Z1; :::; Zn satisfying the condition
P fZ1 > s1 ; :::; Zn > sn g =
n ! X i=1
si
i.e. such that the joint survival function is Schur-constant. This argument shows that, conditioning with respect to Sn and denoting by P(n) the marginal distribution of Sn , we can write, for an arbitrary Schurconstant survival function F (n) ,
F
(n)
(s) =
n ! Z 1 X i=1
si =
0
Pn s n;1 1 ; i=1 i dP (n) (s) : s
+
Z1 ; :::; Zn are i.i.d. exponential (i.e. (t) = expf;tg, for some > 0 ) if and only if P(n) is G(n; ), a gamma distribution with parameters n and : Z1; :::; Zn are conditionally i.i.d. exponential given a parameter , if and only if P(n) is a mixture of gamma distributions with parameters n and ,
60
EXCHANGEABLE LIFETIMES
fG(n; )g0. This happens if and only if the function is such that the idenP tity F (n) (s) = ( ni=1 si ) gives rise to a well-de ned n-dimensional survival function, for any n = 1; 2; :::. We can also state the latter arguments in the following form: for n ! 1, a uniform distribution over the simplex (sn) tends to coincide with the distribution of n i.i.d lifetimes. This is the more general formulation of the de Finetti-type Theorem 1.44 proven by Diaconis and Freedman (1987). For Z1 ; :::; Zn with a Schur-constant survival function, consider now lifetimes T1 ; :::; Tn of the form
T1 = R;1 (Z1 ) ; :::; Tn = R;1 (Zn ) where R is a strictly increasing non-negative function. We have (n)
F T (s) = P fZ1 > R(s1 ); :::; Zn > R(sn )g =
n X i=1
!
R(si ) :
We call this a nite form of time transformed exponential model (see also Barlow and Mendel, 1992). Proportional hazard models which will be discussed in the next example can be obtained from these models in terms of the de Finetti-type result mentioned above.
Example 2.18. (Proportional hazards, conditionally i.i.d. lifetimes with a time transformed exponential model). Let be a non-negative parameter with a prior density 0 () and let T1 ; :::; Tn be conditionally i.i.d given , with a conditional distribution of the form G(sj) = expf;R(s)g
(2.27)
where R : [0; +1) ! [0; +1) is a non-decreasing dierentiable function such that R(0) = 0 and lims!1 R(s) = +1. By denoting (t) dsd R(s)js=t , Equations (2.21) and (2.22), respectively, become
F (n) (s1 ; :::; sn ) = f (n) (t
1 ; :::; tn ) =
Z +1 0
Z +1 Y n n 0
j =1
expf;
n X j =1
(tj ) expf;
R(sj )g0 ()d n X j =1
R(tj )g0 ()d:
(2.28)
(2.29)
POSITIVE EXCHANGEABLE RANDOM QUANTITIES It follows, for D as in (2.18), that
(jD) = k (h; t; r) h expf;[
h X j =1
R(tj ) +
n X j =h+1
61
R(rj )]g0 ()
(2.30)
where k (h; t; r) is the normalization constant. Whence, using (2.26), we obtain P fTh+1 > sh+1 ; :::; Tn > sn jDg =
R 1 h expf;[Ph R(t ) + Pn R(s )]g ()d R01 hexpf;[Phj=1 R(t j) + Pnj=h+1 R(r j)]g 0()d j =1 (n)
0
j
j =h+1
j
0
(2.31)
P
It is worth remarking that F (s1 ; :::; sn ) actually is a function of nj=1 R(sj ); furthermore the above conditional probability depends upon the quantities (t1 ; :::; th ; rh+1 ; :::; rn ) only through the triple
0 h 1 n X X @h; R(tj ); R(rj )A j =1
j =h+1
P
and it depends upon (sh+1 ; :::; sn ) only through the quantity nj=h+1 R(sj ). In particular, in the conditionally exponential caseP(R(s) P s), it depends upon (t1 ; :::; tn ; rh+1 ; :::; rn ) only through the pair (h; hj=1 tj + nj=h+1 rj ). Example 2.19. Continuing Example 2.7, we now consider two lifetimes T1; T2, conditionally independent given the random time , with hazard rate function r(tj) 1ft<g r1 (t) + 1ftg r2 (t)
R q() denotes the density of and Ri (t) = 0t ri (u)du. In order to give F (2) (s1 ; s2 )
an explicit form, it is convenient to write (2)
F (s1 ; s2 ) =
Z s2 s1
Z s1 0
P fT1 > s1 ; T2 > s2 jgq()d+
P fT1 > s1 ; T2 > s2 jgq()d +
Z1 s2
P fT1 > s1 ; T2 > s2 jgq()d:
Whence, it can be checked (Exercise 2.67) that (2) P fT1 > s1 ; T2 > s2 jT1 > r; T2 > rg = F (2)(s1 ; s2 ) F (r; r)
62 =
Z1 0
EXCHANGEABLE LIFETIMES
P fT1 > s1 ; T2 > s2 jgq(jT1 > r; T2 > r)d:
(2.32)
Example 2.20. (Heterogeneous populations). Consider the case of Example 2.3 and let the distribution Gi be characterized by a density function gi (t) and by a one-dimensional hazard rate ri (t) = gi (t)=Gi (t); (i = 0; 1): Furthermore let p0 (m) = P fM = mg (m = 0; 1; :::; n) denote the prior distribution of the total number M of weak individuals. The joint survival function and the joint density function of T1 ; :::; Tn are easily seen to have, respectively, the form F (n) (s1 ; :::; sn ) = n X m=0
n X
m=0
p0 (m) n1!
p0 (m)P fT1 > s1 ; :::; Tn > sn jM = mg = m XY i=1
G1 (si )
Yn i=m+1
G0 (si );
(2.33)
f (n) (t1 ; :::; tn ) = n X m=0
p0 (m) n1!
m XY i=1
r1 (si )G1 (si )
n Y i=m+1
r0 (si )G0 (si );
where the sum is taken over all the permutations = (1 ; :::; n ) of (1; 2; :::; n). The marginal one-dimensional survival function of a lifetime Ti is F (1) (s) = F (n) (s; 0; 0; :::; 0) = G (s) E (M ) + G (t) n;E (M ) 1
0
n
and the marginal one-dimensional density function is f (1)(t) = g (t) E (M ) + g (t) n;E (M ) 1
n
0
n
n
As far as the marginal hazard rate function is concerned, we have (1) (t) = f (1)(t) = (t)r1 (t) + [1 ; (t)]r0 (t) F (t) where G1 (t)E [M ] (t) = G1 (t)E [M ]+G0 (t)(n ; E [M ])
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
63
It can be easily shown that (t) has the meaning (t) = P (Cj = 1jTj > t): As (2.33) shows, the prior distribution of M of course in uences the joint distribution of T1 ; :::; Tn . We consider now the conditional survival probabilities, given failure and survival data; for the sake of notation simplicity, we limit ourselves to the case n = 2. The conditional survival probability P fT2 > s2 jT1 = t1 ; T2 > r2 g can be obtained by specializing the equation (2.20). It can be also obtained by noticing that, for D fT1 = t1 ; T2 > r2 g, it must be P fT2 > s2 jDg =
P fT2 > s2 jT2 > r2 ; K1 = 1; K2 = 1g P fK1 = 1; K2 = 1jDg+ P fT2 > s2 jT2 > r2 ; K1 = 1; K2 = 0g P fK1 = 1; K2 = 0jDg + P fT2 > s2 jT2 > r2 ; K1 = 0; K2 = 1g P fK1 = 0; K2 = 1jDg+ P fT2 > s2 jT2 > r2 ; K1 = 0; C2 = 0g P fK1 = 0; K2 = 0jDg: In the special case M0 b(2; p), where as already remarked T1 ; T2 are
stochastically independent, the above reduces to P fT2 > s2 jDg = P fT2 > s2 jT2 > r2 ; K2 = 1gP fK2 = 1jT2 > r2 g+
P fT2 > s2 jT2 > r2 ; K2 = 0gP fK2 = 0jT2 > r2 g:
Remark 2.21. (\Conditional formula of total probabilities"). A fundamental
property of conditional probability specializes in the following formula. Let E and D be two events with positive probabilities and let be a random quantity taking values in a (one-dimensional or multidimensional) space L. Then one has Z P (E jD) = P (E jD; = )d(jD) (2.34) L
where (jD) denotes the conditional distribution of , given D. This formula has a great importance in our setting and the examples above show the following: computations of conditional survival probabilities given D, with D being a survival data of the type fT1 > r1 ; :::; Tn > rn g or, more in general, of the form (2.18), can just be recognized as dierent applications of this formula, taking as a suitable \latent" variable. Recall that, when we write P (E jC ), C being an event of null probability we tacitly assume that P (E jC ) can be properly de ned as a limit of \well-de ned" conditional probabilities similarly to what we did in Remark 2.14.
64
EXCHANGEABLE LIFETIMES
Example 2.22. (Common mode failures). Consider two units which are similar and work independently one of the other; they are, however, sensible to the same destructive shock. We assume that, in the absence of a shock, the two lifetimes would be independent with the same survival function G and that the waiting time until the shock is W , where the survival function of W is H . We can then write T1 = min(V1 ; W ); T2 = min(V2 ; W ) where V1 ; V2 are independent lifetimes with survival function G. For the joint survival function we have then
F (2) (s1 ; s2 ) = P fV1 > s1 ; V2 > s2 ; W > s1 ; W > s2 g =
G (s1 ) G (s2 ) H (s1 _ s2 ) where s1 _ s2 is the short-hand notation for max(s1 ; s2 ). Note that, since P fT1 = T2g > 0, F (2) (s1 ; s2 ) does not admit a joint den-
sity; then we cannot use the formula (2.20) to compute a conditional survival probability of the type P fT2 > s2 jT1 = t1 ; T2 > r2 g. However a simple reasoning yields, for s2 > r2 > t1 ,
P fT2 > s2 jT1 = t1 ; T2 > r2 g = G (s2 ) H (s2 ) , G (r2 ) H (r2 )
(2.35)
and, for r2 < t1 ,
g(t1 ) P fT2 > s2 jT1 = t1 ; T2 > r2 g = G (s2 ) H (s2 ) G (r2 ) H (t1 ) g(t1 ) + G(t1 )h(t1 )
Example 2.23. (Bivariate model of Marshall-Olkin). Consider again the model in the previous example, assuming in particular that the survival functions G and H are exponential, G(s) = expf;sg, H (s) = expf;0 sg say. In this special case we obtain
F (2) (s1 ; s2 ) = expf;(s1 + s2 ) ; 0 (s1 _ s2 )g
(2.36)
This is known as the bivariate exponential model of Marshall-Olkin (Marshall and Olkin, 1967). Notice that the one-dimensional marginal distribution is exponential. Remark 2.24. In the reliability practice, data are often generated according to a scheme as follows. Think for instance of the case of n similar devices which are available to be installed into m dierent positions (m < n). At time 0, m new devices are installed; at the instant of a failure, the failed device is replaced
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
65
by a new one in the same position. At a generic instant t > 0, the observed data can be represented in the form D fT1 = t1 ; :::Th = th ; Th+1 > r1 ; ::::; Th+m > rm ; Th+m+1 > 0; :::; Tn > 0g where h is the number of replacements that have occurred up to t, t1 ; :::; th are the observed lifetimes for the corresponding devices and 0 < ri t (i = 1; :::; m). ri is the age of the device working in the position i at time t, and ri = t means that no replacement previously occurred in that position. In this scheme it is, at any instant t, h X j =1
tj +
m X j =1
rj = mt:
As special cases of the above scheme, we encounter two special patterns of observed data. (A) One special case is when m = 1; in such a case then, obviously, only one unit is working at a time and then there is at most only one survival data, at any instant t. o
U h+1 Uh . . . . . U3
x
x
U2
x
U1 t3
Figure 5.1.
t1
x
t 2.... t h . . . . . . . . t
t
observation of type (A)
(B) A second pattern of failure and survival data is the one that we obtain from the above scheme when no replacement is scheduled for failed devices. This means that all the individuals start living, or operating, at the same time
66
EXCHANGEABLE LIFETIMES
and that all of those surviving at a time t > 0 share the same age t (m = n). We then consider a duration experiment where the units U1 ; :::; Un are new at time 0 and are simultaneously put under test, progressively recording all the subsequent failure times. At any time t > 0, the available statistical observation then is a history of the form
D0 fTi1 = t1 ; :::; Tih = th ; Tj1 > t; :::; Tjn;h > tg
(2.37)
where 0 h n, 0 < t1 ::: th t, i1 ; :::; ih f1; 2; :::; ng is the set of the indexes denoting the units which failed up to t and
fj1 ; :::; jn;h g f1; 2; :::; ngnfi1; :::; ih g is the set of the indexes denoting the units which are still surviving at time t. U in U in-1 . . . U ih . . . .
o o x
x
U i2 U i1
x t1
Figure 5.2.
t2
th . . . . . . . . t
t
observation of type (B)
This is often described in the literature as the case of longitudinal observations. A dynamic approach is appropriate to deal with such situations. This means in particular that the form of interdependence existing among lifetimes is more conveniently described in terms of the notion of multivariate conditional hazard rates, which will be introduced in the next section. It is to be noticed, on the other hand, that dierent kinds of patterns for failure and survival data observations are often encountered in survival analysis, where data from the biomedical eld are to be considered.
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
67
Remark 2.25. Survival data typically come along with the presence of some
rule for stopping the life-data observation. In some cases the stopping rule can itself be informative, i.e. it can contain some information relatively to the lifetimes we want to predict (see e.g. Barlow and Proschan, 1988). In the derivation of Proposition 2.15, which is the basis for the subsequent developments, we tacitly assumed that the stopping rule is not informative. Parametric models described by (2.21) are often encountered in applications. As mentioned in the previous section, the ( nite-dimensional) parameter can be interpreted as a vector specifying factors (such as structural properties, environmental conditions, or resistance to stress situations) which simultaneously aect the behavior of the individuals U1 ; :::; Un . This is a special case of \in nite extendibility." Note that this latter condition is a natural assumption when we think of units which come from a conceivably in nite population of similar units. In the reliability practice, for instance, this is the case when, for a component in a position of a system, we plan a replacement policy by installing (at the instant of failure or at a xed age) other similar components: the lifetimes of new components, which are subsequently put into operation, form a conceivably in nite population. Conditional independence in (2.21) formalizes the assumption that there is no \physical interaction" among U1; :::; Un and, at the same time, that their behavior is in uenced by the common factors speci ed by . This assumption entails that the stochastic dependence among T1 ; :::; Tn, is only due to the increase of information, about , carried by the observation of failures or survivals of some of the U 's. Actually, this is, in particular, formalized by the formula (2.25) above. However, when at least some of the coordinates of describe environmental conditions, it may be more realistic to admit that the value of changes in time. In recent statistical literature, more and more interest has been directed toward the study of interdependence among lifetimes arising from time-varying environmental conditions. A simple model of this type is provided by a straightforward generalization of the proportional hazard model of Example 2.18: the parameter is replaced by a stochastic process t with state space contained in R+ and (conditional on the knowledge of t ) the hazard rate of each lifetime, at time t > 0, is modeled to be t r(t), where r(t) is a given hazard rate function. This model can be interpreted as follows: under ideal, laboratory conditions, each lifetime has a baseline hazard rate r(t) whereas actual operating environment causes r(t) to be modulated by the environmental factor function t (see the review paper by Singpurwalla, 1995). In a Bayesian viewpoint, the assessment of initial conditions and of the law for the stochastic process t here replaces the prior distribution assessed for a time-constant parameter .
68
EXCHANGEABLE LIFETIMES
Of course dierent kinds of multivariate survival functions (and then of interdependence among T1 ; :::; Tn) are obtained under dierent models for the evolution of t . In some particular cases, the laws of the process t have special structures (see e.g. the models in the examples at the end of this section) which allow one to derive the joint (predictive) survival function. In general it is not at all feasible obtaining the joint survival function in a closed form and it is only possible to write down \dynamic" equations which de ne F (n) in an implicit way. Furthermore an even more complicated situation is in general to be taken into account: the conditional hazard rate of any individual at time t may depend not only on the environmental conditions at the same time t but also on their \past history" and/or on the history of failures and survivals
D fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 ; :::; Tn > rn g observed up to t. Apart from speci c probabilistic models, the mathematical tools to deal with general problems involving time-varying environmental conditions is to be found in the framework of the theory of point processes and of stochastic ltering (Bremaud, 1981; Arjas, 1981). Remark 2.26. Assuming a common dynamic environment presupposes the consideration of units which can work simultaneously, as for instance happens in the situations described in the previous Remark 2.24. However this is in general a rather complicated situation to deal with and one usually restricts attention to the case where dynamic histories are observed of the type in (2.37); i.e., in the case of possibly dynamic environment, one usually considers the case (B), where individuals start living, or operating, at the same time, so that all of those surviving at a time t > 0 share the same age t. This is the case which will be considered in detail in the next section when introducing the concept of multivariate conditional hazard rate functions. Such a concept is in particular helpful to describe cases when, even conditionally on the knowledge of the environment process, the lifetimes are not necessarily independent. The following two examples show special models obtained by specifying arguments contained in Singpurwalla and Yougren (1993) to the case of exchangeable lifetimes. Example 2.27. Let A(t) R0t a()d(t 0) be a given non-decreasing leftcontinuous real valued function and let b 2 (0; +1). A gamma process with parameters A(t) and 1=b is a stochastic process Xt characterized by the following conditions (i) P fX0 = 0g = 1 (ii) Xt has independent increments
POSITIVE EXCHANGEABLE RANDOM QUANTITIES
69
(iii) Xt ; Xs has a gamma distribution with parameters
= A(t) ; A(s); = 1=b(s < t) Consider now the case of two (conditionally) independent lifetimes T1; T2 with a continuous baseline hazard r(t) and environmental factor function t , t being a gamma process with parameters A(t) and 1=b. Then the joint survival function of T1; T2 is given, for s1 s2 , by
Z s1
(2)
F (s1 ; s2 ) = exp ;
Z 0s2
exp ;
s1
Z s1 1 log[1 + b 2r( )d ]a(u)du u 1 Z s2
log[1 + b
u
r( )d ]a(u)du
(2.38)
Example 2.28. Here we consider conditionally independent lifetimes with a constant baseline hazard r and an environmental factor function t of the form t =
1 X
k=0
Xk h(t ; T(k) )
where X1 ; X2 ; ::: are non-negative i.i.d. random variables with a common distribution G, T(1); T(2) ; ::: are the arrival times of a Poisson process with a known rate m(t) and h is a non-negative function such that h(u) = 0 for u 0 (see e.g. Cox and Isham, 1980, p. 135). Furthermore assume that X1 ; X2 ; ::: are independent of T(1); T(2) ; :::. Let G be the Laplace transform of G, it can be proved (Singpurwalla and Yougren, 1993) that, for s1 s2 , the two-dimensional survival function is
F (2) (s1 ; s2 ) = G (r [H (s1 ) + H (s2 )]) exp
Z s1 0
G (r [H (s1 ; u1 ) + H (s2 ; u1 )]) m(u1 )du1
exp
Z s2 s1
G (r H (s2 ; u2)) m(u2 )du2
(2.39)
In the special case when H (u) 1, m(u) m, G is a gamma distribution with parameters and = mr , then (2.39) becomes (2)
F (s1 ; s2 ) =
1+m;(s2;s1) 12 1+m(s1 +s2 )
expf;m s2 g:
70
EXCHANGEABLE LIFETIMES
2.3 Multivariate conditional hazard rates
In this section we assume the existence of a continuous joint density f (n) for lifetimes T1 ; :::; Tn and speci cally consider the \longitudinal" observation of the behavior of the individuals U1 ; :::; Un ; this means that the failure and survival data, observed in the interval [0; t] are of the special form (2.37). This situation leads, in a natural way, to the de nition of the concept of multivariate conditional hazard rate (m.c.h.r.). Such a concept can be seen as a direct extension of the one-dimensional hazard rate; it provides a tool for characterizing (absolutely continuous) distributions on Rn+ , and is an alternative to traditional tools such as density functions or survival functions. In various cases it is a particularly handy method to describe a probabilistic model, since it conveys the \dynamic" character of the statistical observations in the applications we are dealing with. Also in formulating the de nition of m.c.h.r. we shall refer to the special case of exchangeability. More generally, this concept was considered e.g. by Gaver (1963), Lawless (1982), and extensively used later on by Shaked and Shanthikumar in the study of dynamic-type dependence and aging properties of vectors of lifetimes. We shall brie y report the general de nition in the next Chapter. This is related to the more general concept of stochastic intensity, introduced in the framework of the theory of point processes (see Arjas, 1981; Bremaud, 1981). If not otherwise speci ed, the individuals U1 ; :::; Un will be thought of as n pieces of an industrial equipment, since such interpretation actually provides a more exible basis for the language that we shall use. Let F (n) (s1 ; :::; sn ) be the survival function of T1 ; :::; Tn and let T(1) ::: T(n) denote the coordinates of the vector of order statistics; we assume the existence of a density f (n) (t1 ; :::; tn ), so that (2.8) holds; under this assumption, it must be necessarily P fT(1) < ::: < T(n)g = 1: simultaneous failures happen with null probability. Consider then a duration experiment where the units U1 ; :::; Un are new at time 0 and are simultaneously put under test, progressively recording all the subsequent failure times. At any time t > 0 the available statistical observation is a history of the form (2.37). De nition 2.29. For h = 1; :::; n ; 1, the multivariate conditional hazard rate of a unit Uj , surviving at t, given the history in (2.37), is de ned as the limit (tn;h) (t1 ; :::; th ) lim 1 P fT < t + jT = t ; :::; T = t ; T > t; :::; T > tg: !0+
jl
i1
1
ih
h j1
jn;h
where l = 1; 2; :::; n ; h. For h = 0, we de ne (tn) lim+ 1 P fTj < t + jT1 > t; :::; Tn > tg: !0
MULTIVARIATE CONDITIONAL HAZARD RATES
71
The sux (n ; h) denotes the number of the units which are still surviving at time t. Note that, due to exchangeability, the above limit does not depend on the choice of the index j 2 fj1 ; :::; jn;h g. In view of this assumption, moreover, there is no practical interest in recording also the \identities" of the units which fail. Thus we can consider that, in place of (2.37), the observed history is one of the form ht
fT(1) = t1 ; :::; T(h) = th ; T(h+1) > tg
(2.40)
where 0 < t1 ::: th . Also in the de nition of the multivariate conditional hazard rate functions we do not need to refer to the information about which units are still surviving at any time t. Indeed, since we assumed P fT(1) < ::: < T(n) g = 1, we can equivalently de ne (tn;h) (t1 ; :::; th ) (h = 1; :::; n ; 1) and (tn) in terms of the vector of the order statistics by letting
(tn;h) (t1 ; :::; th ) = lim+ (n 1; h) P fT(h+1) t + jT(1) = t1 ; :::; T(h) = th ; T(h+1) > tg
!0
(2.41)
and, for h = 0, 1 P fT t + jT > tg: (tn) = lim (2.42) (1) (1) + n !0 In the present cases, the above limits, de ning (tn;h) (t1 ; :::; th ) and (tn) , exist and expressions for them are provided by the following result. Proposition 2.30. For 1 h n ; 1, 0 < t1 ::: th t,
(tn;h) (t1 ; :::; th ) =
R 1 R 1 (n) th ; t; th+2 ; :::; tn ) dth+2 :::dtn R 1 :::tR 1:::f (tn) (ft ; :::;(t1t; :::; ; t ; t ; :::; t ) dt dt :::dt t
1
t
Furthermore
(n) t
h h+1 h+2
n
h+1 h+2
R 1 ::: R 1 f (n) (t; t ; :::; t ) dt :::dt 2 n 2 n = R 1t R 1t (n) ::: f (t ; t ; :::; t ) dt dt :::dt t
t
1 2
n
1 2
n
n
(2.43)
(2.44)
Proof. For 1 h n ; 1, 0 < t1 ::: th t, consider the conditional survival function of the residual life-times of the (n ; h) units which are still surviving at t, given an observed history of the form (2.37) or (2.40).
72
EXCHANGEABLE LIFETIMES
By specializing (2.20), we obtain that, as a function of the ages sh+1 ; :::; sn , this is given by P fTj1 > sh+1 ; :::; Tjn;h > sn jht g =
R 1 ::: R 1 f (n)(t ; :::; t ; t :::; t )dt :::dt ; n h+1 n Rsh1+1::: R 1snf (n)(t 1; :::; t h; t h+1:::; t )dt :::dt t
t
1
h h+1;
n
h+1
n
In order to obtain the corresponding one-dimensional marginal we then have to compute the latter in the vector of ages (s; t; :::; t), which yields R 1 R 1 ::: R 1 f (n)(t ; :::; t ; t :::; t )dt :::dt 1 h h+1; n h+1 n sR 1t R 1t (n) (t1 ; :::; th ; th+1; :::; tn )dth+1 :::dtn ::: f t t By combining the latter with De nition 2.29, we get
(tn;h) (t1 ; :::; th ) =
R t+ R 1 ::: R 1 f (n)(t ; :::; t ; t :::; t )dt :::dt h h+1; n h+1 n lim+ 1 t R 1 t R 1 t (n) 1 !0 ::: f ( t ; :::; t ; t 1 h h+1; :::; tn )dth+1 :::dtn t t whence we obtain (2.43). For h = 0, the expression (2.44) is obtained similarly. Remark 2.31. In the case when T1; :::; Tn are i.i.d. with a one-dimensional
hazard rate function r(t), Equations (2.43) and (2.44)) trivially reduce to (tn;h) (t1 ; :::; th ) = r(t); (tn) = r(t) (2.45)
Remark 2.32. Equation (2.42) shows that n(tn) is the one-dimensional failure rate function of the variable
T(1) 1min T: j n j
So we can also read (2.42) as f (t) (tn) = nP fTT(1) > tg or (tn) = (1)
fT(1) (t) R n expf;n 0t (un) dug
(2.46)
In an analogous way, we can read (2.41) in the form (tn;h) (t1 ; :::; th ) =
fT(h+1) (tjT(1) = t1 ; :::; T(h) = th ; T(h+1) > th ) (n;h)P fT(h+1) > tjT(1) = t1 ; :::; T(h) = th ; T(h+1) > th g :
(2.47)
MULTIVARIATE CONDITIONAL HAZARD RATES
73
We now proceed to present a few examples. Some of the examples show the special form taken by the m.c.h.r. functions in some remarkable special cases; other examples, alternatively, show how some models of interest can be de ned, in a natural way, starting from the assignment of the set of the m.c.h.r. functions. Example 2.33. Consider two units undergoing a change in the level of stress at a random time as in Example 2.19. From De nition 2.29 we can obtain
(1) t (t1 ) = P f > tjT1 = t1 ; T2 > tgr1 (t) + P f tjT1 = t1 ; T2 > tgr2 (t) (2) t = P f > tjT1 > t; T2 > tgr1 (t) + P f tjT1 > t; T2 > tgr2 (t) (see also Exercise 2.71). Example 2.34. In the case of two individuals in Example 2.20, where
F (2) (s1 ; s2 ) = p0 (0)F 1 (s1 )F 1 (s2 )+ 1 p (1) F (s )F (s ) + F (s )F (s ) + p (2)F (s )F (s ); 0 1 1 2 0 2 1 1 0 0 1 0 2 2 0 we have
(2) t = r1 (t)P fM = 0jT1 > t; T2 > tg + 12 fr1 (t) + r0 (t)gP fM = 1jT1 > t; T2 > tg+
r0 (t)P fM = 2jT1 > t; T2 > tg
Example 2.35. Consider the situation of a Schur-constant joint density already analyzed in Example 1.52 of Chapter 1. Let then T1 ; :::; Tn have a joint density such that, for a suitable function n : R+ ! R+ , n X
f (n)(t1 ; :::; tn ) = n (
i=1
ti ):
By applying the formula (2.9), we can easily check that this is equivalent to
F (n) (s1 ; :::; sn ) = (
n X i=1
si )
74
EXCHANGEABLE LIFETIMES n
with : R+ ! [0; 1] such that n (t) = (;1)n dsd n js=t . Moreover, by (2.41), we can nd (see Caramellino and Spizzichino, 1996) dhh+1 +1 (y ) dy ( n ; h ) t (t1 ; :::; th ) = ; dh dyh (y ) with
y=
h X i=1
ti + (n ; h)t:
Example 2.36. Think of two similar individuals, between whom there is a strong rivalry, so that each individual undergoes a stress which is proportional to the total age cumulated by the other individual. We model this by assuming that their lifetimes T1 and T2 are exchangeable random quantities with a joint distribution characterized by m.c.h.r's of the form (1) (2) t () = t; t (t1 j) = t1 where the \baseline" hazard > 0 is a known constant. For any xed t > 0 and > 0, (1) t (t1 j) is an increasing function of t1 in the interval [0; t) and this can be looked at as a form of negative dependence between T1 and T2 . Example 2.37. An important (and often analyzed) class of multivariate reliability models arises from the assumption that the instantaneous conditional failure rate of a component surviving at t does not depend on the failure-times of previously failed components: the failure rate is assumed to depend only on t, on the number, and on the identities of all the components still surviving at t (\working set"). Such a case can be seen as a generalization of models considered in Kopocinska and Kopocinski (1980); Ross (1984); Shechner (1984) and references cited therein (special cases of this are the \load-sharing models" and the \Ross models"). In order to get, for such a model, a case of exchangeability, we must impose that all the components surviving at t have a same failure rate and that this failure rate depends only on the total number (n;h), but not on the \identities", of the surviving components. Summarizing, we consider those exchangeable models arising from the position (tn;h) (t1 ; :::; th ) = (tn;h) ; h = 1; :::; n ; 1: (2.48) As special cases of (2.48) one has the \linear-breakdown" models, where ( n ; t h) = nL;(th) for a suitable function L(t). By imposing, instead, time-homogeneity we get exchangeable Ross models, where (tn;h) is independent of t. By combining the latter two conditions we obtain the well-known special case (tn;h) = n ; h ;
MULTIVARIATE CONDITIONAL HAZARD RATES
75
where is a given positive quantity. Remark 2.38. Note that the condition (2.48) does not mean that T1; :::; Tn are independent. Indeed (tn;h) (t1 ; :::; th ) is a function of h (beside being in general a function of t), while, as remarked above a necessary and sucient condition for stochastic independence is that (tn;h) does not depend on h. In the particular cases of parametric models that were considered in the last section, (tn;h) (t1 ; :::; th ) and (tn) can be given a particularly signi cant form: Proposition 2.39. Let the joint density function be of the form (2.22). Then it is
(tn;h) (t1 ; :::; th ) = (n) = t
Z L
Z
L
r(tj)(jht )d
r(tj)(jT(h+1) > t)d
(2.49) (2.50)
where ht is as (2.40) and where (jht ) and (jT(h+1) > t) can be obtained as special cases of (2.24). Proof. Replace f (n) in (2.43) with the r.h.s. of (2.22) and change the order of integration. This yields
(tn;h) (t1 ; :::; th ) =
R R 1 R 1 t j):::g(t j)g(tj)g(t j):::g(t j)dt :::dt d 1 h h+2 n h+2 n = R R 1L :::tR 1:::g(tt jg)(:::g ( t j ) g ( t j ) g ( t j ) :::g ( t j ) dt 1 h h+1 h+2 n h+1 dth+2 :::dtn d L t t R hg(t j):::g(t j) g(tj) G (tj) G (t j) :::G (t j)i d n L R g1 (t j):::gh(t jG()tGj)(tj) G (t hj+2 = ) :::G (t j) d L
1
h+2
h
n
Z
r(tj) R g(t1 j):::g(th j)G (tj) G (th+2 j) :::G (tn j) d: L L g (t1 j):::g (th j)G (tj) G (th+2 j) :::G (tn j) By adapting (2.24) to the special case when D = ht as in (2.40), we see that the latter expression becomes
(tn;h) (t1 ; :::; th ) = A similar argument holds for (2.50).
Z L
r(tj)(jht )d;
76
EXCHANGEABLE LIFETIMES
Note that Proposition 2.39 could also be directly proved by making use of Proposition 2.15. Now contrast (2.49) and (2.45); the heuristic interpretation of (2.49) is rather immediate: if we knew the value taken by the parameter , then, due to conditional independence given , the value of the hazard rate for a unit surviving at t would be equal to r(tj); when is unknown the observed history (2.40) is to be taken into account to update the distribution of ; in this respect, see also Remark 2.21. Example 2.40. (Proportional hazard models). This is the case when r(tj) in (2.49) is of the form r(tj) = (t); whence we have
(tn;h) (t1 ; :::; th ) = (t)
Z
L
(jht )d = (t)E (jht )
(2.51)
Remark 2.41. The condition (2.22), for which the m.c.h.r. functions have been derived just above, can give rise to relevant cases of positive dependence. This will be rendered in more precise terms in Chapter 3 (Subsection 3.3). It is easy to understand that cases of positive dependence are also those described in the above Example 2.37, when the function (tn;h) is increasing in h. However one can intuitively see that the former and the latter cases are different in nature: in the former case, dependence is due to \increase of information," that is to conditional independence given the same unknown quantity; in the latter case positive dependence is just due to a \physical" interaction among components. Such a dierence will have an impact on the analysis of multivariate aging. Before continuing it is convenient to state the following Lemma, which can be simply proved by applying the \product rule" formula of conditional probabilities and by taking into account (2.41) and (2.42). Lemma 2.42. For h = 1; :::; n ; 1; 0 < t1 < ::: < th < t, one has P ft T < t + ; :::; th T(h) < th + ; T(h+1) > tg lim 1 (1) 1 !0 h = n! (t1n)
Yh j =2
(tjn;j+1) (t1 ; :::; tj;1 )
8 Z 9 Z tj h < t1 (n) X = exp :;n u du ; (n ; j ; 1) (un;j+1) (t1 ; :::; tj;1 )du; 0 tj;1 j =2
MULTIVARIATE CONDITIONAL HAZARD RATES
77
Proposition 2.39 is a special case of the following more general situation, which is also of potential interest in several applications: T1 ; :::; Tn are conditionally exchangeable, given the value taken by a parameter ( 2 L Rd for some d = 1; 2; :::), with a joint distribution characterized by c.m.h.r.'s (tn) () and (tn;h) (j) depending on . In such cases it is of interest to obtain the (\predictive") m.c.h.r. functions ( n ) t and (tn;h) () of T1 ; :::; Tn in terms of the \initial" density of and of the m.c.h.r.'s (tn) (); (tn;h) (j) conditional on (h = 1; 2; :::; n ; 1): This problem is in general solved by the following Theorem 2.43. For h = 1; :::; n ; 1, one has
Z
(tn;h) (t1 ; :::; th ) = and
(tn) =
Z L
L
(tn;h) (t1 ; :::; th j)(jht )d
(tn) ()(jT(1) > t)d:
(2.52) (2.53)
Proof. We start by proving (2.53). By Bayes' theorem one has
0 () F T(1) (tj) : L 0 () F T(1) (tj) d
(jT(1) > t) = R
Then, in order to prove (2.53), we have to check the identity
(n)
t =
R (n)()F (tj) () d T(1) 0 L Rt : F (tj) () d L T(1)
0
(2.54)
On the other hand, as a consequence of Equation (2.42), we can write
f (tj) (tn) () = n1 T(1) F T(1) (tj) whence (2.54) becomes 1 t =n
(n)
R f (tj) () d R LF T(1) (tj) 0 () d = n1 FfT(1)((ttj)) L T(1)
0
T(1)
which is seen to be true by applying Equation (2.42) to the unconditional distribution of T(1) . Equation (2.52) can be proved in a similar way, as follows. First we need to extend Equation (2.24) to the present more general case.
78
EXCHANGEABLE LIFETIMES
Denoting by f (n) (t1 ; :::; tn j) the conditional joint density of T1 ; :::; Tn, given = (with 2 L), we rst note that, for ht as in (2.40) R 1 ::: R 1 f (n)(t ; :; t ; t ; :; t j) () dt :::dt d h+1 n (jht ) = R Rt 1 Rt 1 (n) 1 h h+1 n 0 : ::: f ( t ; :; t ; t ; ::; t j ) ( ) dt 1 h h +1 n 0 h+1 :::dtn d L t t This can be obtained by Bayes theorem taking into account that, in view of Lemma 2.42, the likelihood associated with the result ht indeed has the form
`( ) =
Z1 Z1 :::
t
t
f (n) (t1 ; :::; th ; uh+1 ; :::; un j)duh+1 :::dun :
Thus our task becomes one of checking that
(tn;h) (t1 ; :::; th ) =
R (n;h)(t ; ::; t j) R 1 ::: R 1 f (n)(t ; ::; t ;t ; uj) () dt dud 1 h h+1 0 h+1 L t R 1R h R t t 1 ::: 1 f (n) (t ; ::; t ; t ; uj) () dt dud 1 h h+1 0 h+1 t
L t
(2.55)
where, for brevity, in the notation we replaced uh+2 ; :::; un by u and duh+2 :::dun by du. From Equation (2.43), we get
(tn;h) (t1 ; :::; th j) (n ; h)
Z1 Z1 t
:::
t
f (n) (t1 ; :::; th ; th+1 ; uj)dth+1 du =
Z1 Z1 t
:::
t
f (n)(t1 ; :::; th ; t; uj)du:
Then the r.h.s. of (2.55) becomes R [R 1 ::: R 1 f (n)(t ; :::; t ; t; uj)du] () d 1 h 0 (n ; h) R R 1L tR 1 (tn) = [ ::: f ( t ; :::; t ; t ; u j ) dt 1 h h+1 h+1 du]0 () d L t t
R
R R
1 ::: 1 [ f (n)(t ; :::; t ; t; uj) () d]du 1 h 0 (n ; h) R 1 tR 1 R t (nL) = ::: [ f ( t ; :::; t ; t ; u j ) 1 h h+1 0 () d]dth+1 du t t L
R 1 ::: R 1 f (n)(t ; :::; t ; t; u)du h R (n ; h) 1 t R 1 t (n) 1 ; ::: f (t ; :::; t ; t; u)dt du t
since
f (n) (t
1
t
1 ; :::; tn ) =
Z L
h
h+1
f (n) (t1 ; :::; tn j)0 () d:
Then (2.55) is proved by taking again into account Equation (2.43).
MULTIVARIATE CONDITIONAL HAZARD RATES
79
Remark 2.44. Let 0 be the initial density of and suppose the history ht as in (2.40) has been observed (h = 0; 1; :::; n ; 1). By Lemma 2.42 we see that the posterior density of can be speci ed by (jht ) / (t1n) () expf;n
Z t1 0
(un) ()du ;
Yh j =2
h X j =2
;(n ; h)
(thn;j+1) (jt1 ; :::; tj;1 )
(n ; j ; 1)
Zt th
Z tj tj;1
(unj+1) (jt1 ; :::; tj;1 )du
(un;h) (jt1 ; :::; th )dug
(2.56)
Note that Equation (2.52) provides an appropriate generalization of (2.49).
Remark 2.45. As mentioned, it is often the case that has to be replaced by
a stochastic process t . Indeed , or some of its coordinates, can be thought of as describing environmental conditions which may be liable to undergo timevariations as also mentioned at the end of the previous section. In such cases the equations in (2.49) and (2.50) have to be suitably modi ed to more general formulae. In order to obtain such formulae a setting of stochastic processes and appropriate general assumptions are needed. However it is still possible to see directly, by means of a heuristic reasoning, what in tractable cases the appropriate generalization of (2.49) and (2.50) should be: the conditional distribution of , given the observed history (2.40 ) is to be replaced by the conditional distribution of t , given the observed history (2.40). Here we shall not consider such a generalization. Rigorous results can be obtained as particular cases of a general theorem from the theory of point processes (see e.g. Bremaud (1981); the techniques of stochastic ltering just serve to obtain the conditional distribution of t , given the observed history; for a deeper mathematical discussion on this topic see Arjas (1989). Remark 2.46. Often it is natural to conceive the units (or \individuals"), whose lifetimes T1; :::; Tn we are studying, as \parts" or \components" of a single system. This is the case, for instance, in Example 2.37. It can be true for biological applications as well, where it is impossible to abstract individuals of a population from their common environment, and in which also interactions among dierent members are necessarily present. We remark however that a speci c feature of our treatment is that, even when the individuals are part of the same system, we are not interested in predicting the evolution of the system
as such (e.g. to estimate its own failure-time); we are interested in the lifetimes of any single unit. This means that we completely ignore the structure function of the system.
Remark 2.47. In modeling dependence among lifetimes, we are rather interested in studying the effects both of the common environment and of the stress (or the help) that affects any single unit at a generic time-instant $t$ as the result of the behavior of all the other units in the interval $[0,t)$. It is to be noted that dependence due to informational effects about the common environment and dependence due to physical effects of interactions among units may overlap in determining the kind of stochastic dependence that is described by means of the m.c.h.r. functions. For the case when environmental factors are constant in time, this is better explained by means of Theorem 2.43.
2.4 Further aspects of m.c.h.r.

In this section we focus attention on some further aspects of interest related to the notion of multivariate conditional hazard rate. For this purpose, we keep the same setting and notation as in Section 2.3, where a longitudinal observation of failure data was considered: individuals $U_1,\ldots,U_n$ are units which start living at a same epoch 0; the quantities $T_1,\ldots,T_n$ denote the lifetimes of $U_1,\ldots,U_n$ and up to any time $t$ a dynamic history of the form
$$
h_t\equiv\{T_{(1)}=t_1,\ldots,T_{(h)}=t_h,\;T_{(h+1)}>t\}
$$
as in (2.40) is observed: $T_{(1)},\ldots,T_{(n)}$ are the order statistics, $h$ is the number of failures observed up to $t$, $t_1\le\cdots\le t_h$ are the corresponding failure times, and all the units surviving at $t$ share the same age $t$. $h$ can be seen as the observed value, at time $t$, of the stochastic process $H_t$ defined as follows:
$$
H_t\equiv\sum_{j=1}^{n}\mathbf{1}_{\{T_j\le t\}},\qquad\forall\,t\ge 0,
\qquad(2.57)
$$
where the symbol $\mathbf{1}_A$ denotes the indicator of the event $A$. $H_t$ counts the number of failures observed up to time $t$; of course $H_0=0$ and the trajectories of $H_t$ are stepwise constant, non-decreasing functions on $[0,+\infty)$, with jumps at the instants $T_{(1)}\le\cdots\le T_{(n)}$.
2.4.1 On the use of the m.c.h.r. functions

We saw, in Proposition 2.30, how $\lambda_t^{(n-h)}(t_1,\ldots,t_h)$ ($0\le h\le n-1$, $0<t_1\le\cdots\le t_h$) are to be computed in terms of the joint density function $f^{(n)}$. Conversely, we can see how $f^{(n)}$ can be computed starting from the knowledge of the m.c.h.r. functions. Indeed the joint density function is determined by the family of functions $\{\lambda_t^{(k)}\}_{k=1,\ldots,n;\,t\ge 0}$ and we can in particular obtain the following Proposition, as a direct consequence of Lemma 2.42.
Proposition 2.48. If the limits defining $\lambda_t^{(n)}$ and $\lambda_t^{(n-h)}(\cdot)$ ($h=1,2,\ldots,n-1$) exist, the joint distribution of $T_1,\ldots,T_n$ admits a joint density $f^{(n)}$. Denoting by $(t_{(1)},\ldots,t_{(n)})$ the vector of order statistics of $(t_1,\ldots,t_n)$, the following relation holds:
$$
f^{(n)}(t_1,\ldots,t_n)=\lambda_{t_{(1)}}^{(n)}\prod_{h=2}^{n}\lambda_{t_{(h)}}^{(n-h+1)}(t_{(1)},\ldots,t_{(h-1)})\,
\exp\Big\{-n\int_0^{t_{(1)}}\lambda_u^{(n)}\,du-\sum_{h=2}^{n}[\,n-(h-1)\,]\int_{t_{(h-1)}}^{t_{(h)}}\lambda_u^{(n-h+1)}(t_{(1)},\ldots,t_{(h-1)})\,du\Big\}
\qquad(2.58)
$$
Note that Equation (2.58) is a multivariate analogue of (2.3).

Example 2.49. Let $\lambda>0$ be a given number and consider the case
$$
\lambda_t^{(n)}=\lambda,\quad t\ge 0;\qquad
\lambda_t^{(n-h)}(t_1,\ldots,t_h)=\lambda,\quad 0\le h\le n-1,\;0<t_1\le\cdots\le t_h\le t.
$$
Then (2.58) yields
$$
f^{(n)}(t_1,\ldots,t_n)=f^{(n)}(t_{(1)},\ldots,t_{(n)})=\lambda^n\exp\Big\{-\lambda\sum_{j=1}^{n}t_j\Big\},
$$
i.e. $T_1,\ldots,T_n$ are i.i.d. with exponential distribution of parameter $\lambda$. More generally, (2.58) shows that the case of i.i.d. lifetimes is characterized by m.c.h.r.'s of the form in (2.45).

Example 2.50. For the case $\lambda_t^{(n-h)}(t_1,\ldots,t_h)=\frac{\lambda}{n-h}$ (linear breakdown model), considered in Example 2.37, we obtain from (2.58) that the corresponding joint density is
$$
f^{(n)}(t_1,\ldots,t_n)=\frac{\lambda^n}{n!}\exp\{-\lambda\,t_{(n)}\}.
$$
The case considered just above (as well as the case in Example 2.36, for instance) provides an example of the following fact: in many situations arising in reliability applications, dependence among lifetimes can be modeled in a completely natural way in terms of the m.c.h.r. functions, whereas in the same cases it can take considerable effort to figure out what the corresponding joint density or joint survival function should be. This fact highlights one reason of interest in Proposition 2.48, which actually permits one to obtain the joint density starting from the m.c.h.r. functions.
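Formula (2.58) also lends itself to direct numerical evaluation. The following Python sketch (the function names, the integration routine and the parameter values are illustrative choices, not taken from the text) evaluates (2.58) for the linear breakdown model of Example 2.50 and compares the result with the closed form $\frac{\lambda^n}{n!}e^{-\lambda t_{(n)}}$.

```python
# A small sketch that evaluates formula (2.58) numerically for the linear
# breakdown model, where lambda_t^{(n-h)}(t_1,...,t_h) = lam/(n-h), and
# compares the result with lam^n/n! * exp(-lam * max(t)).
import math
import numpy as np

def quad_trapz(g, a, b, m=2000):
    xs = np.linspace(a, b, m)
    return np.trapz([g(x) for x in xs], xs)

def density_from_mchr(ts, mchr, n):
    """Evaluate (2.58): mchr(h, u, past) returns lambda_u^{(n-h)}(past)
    after h observed failures; ts is any point of R_+^n."""
    s = np.sort(np.asarray(ts, dtype=float))
    prod = mchr(0, s[0], ())                      # lambda_{t(1)}^{(n)}
    expo = n * quad_trapz(lambda u: mchr(0, u, ()), 0.0, s[0])
    for h in range(1, n):                         # h failures already observed
        past = tuple(s[:h])
        prod *= mchr(h, s[h], past)               # lambda_{t(h+1)}^{(n-h)}
        expo += (n - h) * quad_trapz(lambda u: mchr(h, u, past), s[h-1], s[h])
    return prod * math.exp(-expo)

lam, n = 1.3, 4
mchr = lambda h, u, past: lam / (n - h)           # linear breakdown model
ts = [0.7, 0.2, 1.5, 0.9]
print(density_from_mchr(ts, mchr, n))
print(lam**n / math.factorial(n) * math.exp(-lam * max(ts)))  # closed form
```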
We also notice that, among the different tools which characterize an absolutely continuous probability distribution on $\mathbb{R}_+^n$, the set of m.c.h.r. functions is, in a sense, the most natural one for describing the conditional distribution of the residual lifetimes $T_{j_1}-t,\ldots,T_{j_{n-h}}-t$, given an observed history of the type (2.40). Indeed, for $h=0,1,\ldots,n-1$ and $0<t_1\le\cdots\le t_h\le t$, let us consider the m.c.h.r. functions corresponding to such a conditional distribution; i.e., for $r=1,\ldots,n-h-1$ and $0<u_1\le\cdots\le u_r\le s$, let us consider
$$
\widetilde\lambda_s^{((n-h)-r)}(u_1,\ldots,u_r\mid h;t_1,\ldots,t_h;t)\equiv
\lim_{\varepsilon\to 0^+}\frac{1}{\varepsilon}\,P\{T_{(h+r+1)}-t\le s+\varepsilon\mid T_{(1)}=t_1,\ldots,T_{(h)}=t_h,\;T_{(h+1)}-t=u_1,\ldots,T_{(h+r)}-t=u_r,\;T_{(h+r+1)}-t>s\}
$$
and, for the case $r=0$,
$$
\widetilde\lambda_s^{(n-h)}(\,\cdot\mid h;t_1,\ldots,t_h;t)\equiv
\lim_{\varepsilon\to 0^+}\frac{1}{\varepsilon}\,P\{T_{(h+1)}-t\le s+\varepsilon\mid T_{(1)}=t_1,\ldots,T_{(h)}=t_h,\;T_{(h+1)}-t>s\}.
$$
By the very definition of m.c.h.r. function, we easily see that
$$
\widetilde\lambda_s^{(n-h)}(\,\cdot\mid h;t_1,\ldots,t_h;t)=\lambda_{s+t}^{(n-h)}(t_1,\ldots,t_h)
$$
and, for $r=1,\ldots,n-h-1$,
$$
\widetilde\lambda_s^{((n-h)-r)}(u_1,\ldots,u_r\mid h;t_1,\ldots,t_h;t)=\lambda_{t+s}^{(n-(h+r))}(t_1,\ldots,t_h,u_1+t,\ldots,u_r+t).
\qquad(2.59)
$$
Note that Equation (2.59) can be seen as the direct multivariate analogue of the formula $r_t(s)=r(t+s)$, which holds for one-dimensional hazard rate functions, where $r$ is the failure rate function of a lifetime $T$ and $r_t(s)$ denotes the conditional failure rate of the residual lifetime $T-t$, conditional on $\{T>t\}$.

On the contrary, the set of m.c.h.r. functions is not a suitable tool for deriving marginal distributions, or conditional distributions of a subset of lifetimes given an observation of the form $\{T_{i_1}=t_1,\ldots,T_{i_h}=t_h\}$ for the other lifetimes. Indeed the multivariate conditional hazard rates at any time $t>0$ only allow predictions about the behavior, just after $t$, of those units which still survive at $t$, while they do not allow probabilistic assessments about events before $t$. One aspect of this is that, in particular, no simple equation relates the m.c.h.r. functions of $m$-dimensional marginal distributions to the m.c.h.r. functions of the original $n$-dimensional distribution ($m<n$).
Example 2.51. Consider again the linear breakdown Ross model with $\lambda>0$ given and $n=3$. In such a case
$$
\lambda_t^{(3)}=\frac{\lambda}{3},\qquad
\lambda_t^{(2)}(t_1)=\frac{\lambda}{2},\qquad
\lambda_t^{(1)}(t_1,t_2)=\lambda,\qquad
f^{(3)}(t_1,t_2,t_3)=\frac{\lambda^3}{3!}\exp\{-\lambda\,t_{(3)}\}.
$$
As far as the two-dimensional marginal density function is concerned, we have
$$
f^{(2)}(t_1,t_2)=\int_0^\infty f^{(3)}(t_1,t_2,t_3)\,dt_3
=\int_0^{t_1\vee t_2}f^{(3)}(t_1,t_2,t_3)\,dt_3+\int_{t_1\vee t_2}^{\infty}f^{(3)}(t_1,t_2,t_3)\,dt_3
$$
$$
=\int_0^{t_1\vee t_2}\frac{\lambda^3}{3!}\exp\{-\lambda(t_1\vee t_2)\}\,dt_3
+\int_{t_1\vee t_2}^{\infty}\frac{\lambda^3}{3!}\exp\{-\lambda t_3\}\,dt_3
=\frac{\lambda^2}{3!}\exp\{-\lambda(t_1\vee t_2)\}\,\big(\lambda(t_1\vee t_2)+1\big).
$$
The m.c.h.r. functions of the pair $(T_1,T_2)$ can be obtained by applying Equations (2.41) and (2.42) to the above two-dimensional joint density. Some lengthy computations give
$$
\lambda_t^{(1)}(t_1)=\frac{\lambda(\lambda t+1)}{\lambda t+2},\qquad
\lambda_t^{(2)}=\frac{\lambda(\lambda t+2)}{2(\lambda t+3)}.
\qquad(2.60)
$$
This still defines a non-homogeneous exchangeable model where the m.c.h.r.'s do not depend on past failure times.
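The closed forms in (2.60) can be checked numerically from the bivariate marginal density above. The following Python sketch (with an arbitrary illustrative value of $\lambda$ and of the evaluation times) computes the two m.c.h.r.'s by numerical integration and compares them with (2.60).

```python
# Numerical check of Equation (2.60) (an assumed sketch, not from the book):
# starting from the bivariate marginal f^(2) of Example 2.51, compute the
# m.c.h.r.'s of (T_1, T_2) by integration and compare with the closed forms.
import numpy as np
from scipy.integrate import quad, dblquad

lam = 0.8

def f2(t1, t2):
    m = max(t1, t2)
    return lam**2 / 6.0 * np.exp(-lam * m) * (lam * m + 1.0)

t, t1 = 1.7, 0.4   # a generic time t and an observed first failure time t1 < t

# lambda_t^{(2)}: hazard of one specific unit, given both units alive at t
num2 = quad(lambda v: f2(t, v), t, np.inf)[0]
den2 = dblquad(lambda v, u: f2(u, v), t, np.inf, t, np.inf)[0]
print(num2 / den2, lam * (lam * t + 2) / (2 * (lam * t + 3)))

# lambda_t^{(1)}(t1): hazard of the survivor, given the other unit failed at t1
num1 = f2(t1, t)
den1 = quad(lambda v: f2(t1, v), t, np.inf)[0]
print(num1 / den1, lam * (lam * t + 1) / (lam * t + 2))
```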
2.4.2 Dynamic histories, total time on test statistic and total hazard transform

In the statistical analysis of a dynamic history the following definitions are of interest.

Definition 2.52. The total time on test (in short TTT) process is the stochastic process defined by
$$
Y_t=\sum_{i=1}^{n}\big(T_{(i)}\wedge t\big),\qquad t\ge 0,
\qquad(2.61)
$$
where $a\wedge b$ denotes the minimum between $a$ and $b$, as usual.
In terms of the process $H_t$, we can give $Y_t$ the following equivalent expressions:
$$
Y_t=\sum_{i=1}^{H_t}T_{(i)}+(n-H_t)\,t,\qquad
Y_t=\int_0^t(n-H_s)\,ds.
\qquad(2.62)
$$
Note that $Y_t$ quantifies the total amount of life spent by the individuals $U_1,\ldots,U_n$ in the time interval $[0,t]$. For $h=1,2,\ldots,n$, consider the random variable
$$
Y_{(h)}\equiv Y_{T_{(h)}}.
$$
$Y_{(h)}$ is the total time on test cumulated by $U_1,\ldots,U_n$ until the $h$-th failure, i.e. in the random time interval $[0,T_{(h)}]$. We can also write
$$
Y_{(h)}=\sum_{i=1}^{h}T_{(i)}+(n-h)\,T_{(h)}.
\qquad(2.63)
$$
Definition 2.53. For $h=1,2,\ldots,n$, the $h$-th normalized spacing between order statistics is the random variable defined by
$$
C_h\equiv(n-h+1)\big(T_{(h)}-T_{(h-1)}\big),
\qquad(2.64)
$$
where $T_{(0)}\equiv 0$.

$C_h$ is the total time on test cumulated by the surviving individuals between the $(h-1)$-th and the $h$-th failure; from (2.63) and Definition 2.53, we immediately see that $Y_{(h)}$ can be decomposed in terms of the normalized spacings as follows:
$$
Y_{(h)}=\sum_{i=1}^{h}C_i.
\qquad(2.65)
$$
Example 2.54. In a life testing experiment on 6 similar units, the following failure data have been observed: $t_{(1)}=9{,}000$ h, $t_{(2)}=16{,}000$ h, $t_{(3)}=20{,}000$ h, $t_{(4)}=24{,}600$ h, $t_{(5)}=28{,}000$ h, $t_{(6)}=30{,}400$ h. This result yields
$$
C_1=54{,}000,\;C_2=35{,}000,\;C_3=16{,}000,\;C_4=13{,}800,\;C_5=6{,}800,\;C_6=2{,}400;
$$
$$
Y_{(1)}=54{,}000,\;Y_{(2)}=89{,}000,\;Y_{(3)}=105{,}000,\;Y_{(4)}=118{,}800,\;Y_{(5)}=125{,}600,\;Y_{(6)}=128{,}000.
$$
The corresponding graph of the process $Y_t$ is presented in Figure 6 below.
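The computation of the normalized spacings and of the TTT values from the data above is immediate; the following short Python sketch (an illustrative fragment, with the data of Example 2.54) reproduces it.

```python
# Normalized spacings C_h of (2.64) and TTT values Y_(h) of (2.65)
# for the failure data of Example 2.54 (times in hours).
import numpy as np

t = np.array([9_000, 16_000, 20_000, 24_600, 28_000, 30_400], dtype=float)
n = len(t)

spacings = np.diff(np.concatenate(([0.0], t)))        # T_(h) - T_(h-1)
C = (n - np.arange(1, n + 1) + 1) * spacings          # C_h = (n-h+1)(T_(h)-T_(h-1))
Y = np.cumsum(C)                                      # Y_(h) = C_1 + ... + C_h

print(C)   # [54000. 35000. 16000. 13800.  6800.  2400.]
print(Y)   # [ 54000.  89000. 105000. 118800. 125600. 128000.]
```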
Figure 6. The TTT process $Y_t$ for the data of Example 2.54.

The following definition is also of interest in the setting of longitudinal observations of failure data. Given $t_1,\ldots,t_n$ with $0\le t_1\le\cdots\le t_n$, consider the functions
$$
w_1(t_1)\equiv n\int_0^{t_1}\lambda_u^{(n)}\,du,\qquad
w_i(t_1,\ldots,t_i)\equiv(n-i+1)\int_{t_{i-1}}^{t_i}\lambda_u^{(n-i+1)}(t_1,\ldots,t_{i-1})\,du,\quad i=2,\ldots,n.
\qquad(2.66)
$$
By using this notation, the expression (2.58) for the joint density $f^{(n)}$ becomes
$$
f^{(n)}(t_1,\ldots,t_n)=\lambda_{t_{(1)}}^{(n)}\prod_{h=2}^{n}\lambda_{t_{(h)}}^{(n-h+1)}(t_{(1)},\ldots,t_{(h-1)})\,
\exp\Big\{-\sum_{h=1}^{n}w_h\big(t_{(1)},\ldots,t_{(h)}\big)\Big\}.
$$
Consider now the random variables defined as follows:
$$
W_i\equiv w_i\big(T_{(1)},\ldots,T_{(i)}\big),\qquad i=1,\ldots,n,
\qquad(2.67)
$$
and
$$
\Gamma_j\equiv\frac{W_1}{n}+\cdots+\frac{W_j}{n-j+1},\qquad j=1,\ldots,n.
\qquad(2.68)
$$
Heuristically $\Gamma_j$ is the total hazard cumulated by $U_{(j)}$ during its life, where $U_{(j)}$ is the individual failing at the time $T_{(j)}$.

In what follows we derive interesting properties of $(W_1,\ldots,W_n)$ and $(\Gamma_1,\ldots,\Gamma_n)$. To this purpose, for $(u_1,\ldots,u_n)\in\mathbb{R}_+^n$, we introduce the notation
$$
Z(u_1,\ldots,u_n)\equiv\Big(\frac{u_1}{n},\;\frac{u_1}{n}+\frac{u_2}{n-1},\;\ldots,\;\sum_{i=1}^{n}\frac{u_i}{n-i+1}\Big),
\qquad(2.69)
$$
so that we can write
$$
(\Gamma_1,\ldots,\Gamma_n)=Z(W_1,\ldots,W_n)
\qquad(2.70)
$$
and, from (2.64),
$$
\big(T_{(1)},\ldots,T_{(n)}\big)=Z(C_1,\ldots,C_n).
\qquad(2.71)
$$
Concerning the sum of the coordinates of the transformation $Z$, note that, $\forall\,(u_1,\ldots,u_n)\in\mathbb{R}_+^n$,
$$
\frac{u_1}{n}+\Big(\frac{u_1}{n}+\frac{u_2}{n-1}\Big)+\cdots+\sum_{i=1}^{n}\frac{u_i}{n-i+1}=\sum_{i=1}^{n}u_i.
\qquad(2.72)
$$
The following fact is well known.

Lemma 2.55. Let $T_1,\ldots,T_n$ be exchangeable with a positive density $f^{(n)}(\mathbf{t})$, $\mathbf{t}\in\mathbb{R}_+^n$, and let $\Delta_n\equiv\{\mathbf{t}\in\mathbb{R}_+^n\mid t_1\le t_2\le\cdots\le t_n\}$. The joint density of $(T_{(1)},\ldots,T_{(n)})$ is given by
$$
\widehat f^{(n)}(\mathbf{t})=n!\,f^{(n)}(t_1,\ldots,t_n),\quad\text{for }\mathbf{t}\in\Delta_n,
\qquad(2.73)
$$
and $\widehat f^{(n)}(\mathbf{t})=0$ otherwise.

Concerning the joint distribution of the vector $\mathbf{C}\equiv(C_1,\ldots,C_n)$, we have

Proposition 2.56. $\mathbf{C}$ has an absolutely continuous distribution with density
$$
f_{\mathbf{C}}^{(n)}(c_1,\ldots,c_n)=f^{(n)}\big(Z(c_1,\ldots,c_n)\big),\qquad(c_1,\ldots,c_n)\in\mathbb{R}_+^n.
\qquad(2.74)
$$
Proof. The distribution of $\mathbf{C}$ admits a joint density, since $\mathbf{C}$ is defined by means of the one-to-one differentiable transformation (2.64) in terms of $(T_{(1)},\ldots,T_{(n)})$, which admit a joint density. The inverse of (2.64) is the transformation $Z$ in (2.71). Then one has, for $(c_1,\ldots,c_n)\in\mathbb{R}_+^n$,
$$
f_{\mathbf{C}}^{(n)}(c_1,\ldots,c_n)=\widehat f^{(n)}\big(Z(c_1,\ldots,c_n)\big)\,\big|J_Z(c_1,\ldots,c_n)\big|,
$$
where $J_Z(c_1,\ldots,c_n)$ is the Jacobian of the transformation $Z$. Equation (2.74) is then proved by taking into account (2.73) and by noting that $|J_Z(c_1,\ldots,c_n)|=\frac{1}{n!}$, since
$$
J_Z(c_1,\ldots,c_n)=\det
\begin{pmatrix}
\frac{1}{n} & 0 & 0 & \cdots & 0\\
\frac{1}{n} & \frac{1}{n-1} & 0 & \cdots & 0\\
\frac{1}{n} & \frac{1}{n-1} & \frac{1}{n-2} & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\frac{1}{n} & \frac{1}{n-1} & \frac{1}{n-2} & \cdots & 1
\end{pmatrix}.
$$
At this point it is helpful to look at the special case of independent, identically distributed, standard exponential variables. Let then $T_1,\ldots,T_n$ be such that
$$
f^{(n)}(t_1,\ldots,t_n)=\exp\Big\{-\sum_{i=1}^{n}t_i\Big\}.
\qquad(2.75)
$$
By taking into account Proposition 2.56 and the identity (2.72), we reobtain a well-known result (see e.g. Barlow and Proschan, 1975, p. 60).

Lemma 2.57. $T_1,\ldots,T_n$ are independent, identically distributed, standard exponential variables if and only if the corresponding variables $C_1,\ldots,C_n$ are also independent, identically distributed, standard exponential variables.

By combining this result with the relation (2.71), we also obtain a more general fact: for non-negative random variables $R_1,\ldots,R_n$, consider the vector $(V_1,\ldots,V_n)$ defined by
$$
(V_1,\ldots,V_n)=Z(R_1,\ldots,R_n).
$$
Lemma 2.58. $(V_1,\ldots,V_n)$ is distributed as the vector of the order statistics of $n$ independent, identically distributed, standard exponential variables if and only if $R_1,\ldots,R_n$ are independent, identically distributed, standard exponential variables.

The above Lemma can be used in the derivation of the following result, concerning the vectors $(W_1,\ldots,W_n)$ and $(\Gamma_1,\ldots,\Gamma_n)$ defined in (2.67) and (2.68), respectively.

Proposition 2.59. Assume that (2.75) holds for the joint density of $T_1,\ldots,T_n$. Then $W_1,\ldots,W_n$ are also independent, identically distributed, standard exponential variables and $(\Gamma_1,\ldots,\Gamma_n)$ is distributed as the vector of the order statistics of $n$ independent, identically distributed, standard exponential variables.

Proof. The condition (2.75) is equivalent to $\lambda_t^{(n)}=1$, $\lambda_t^{(n-h+1)}(t_1,\ldots,t_{h-1})=1$. Whence, in this case, one has
$$
w_1(t_1)\equiv n\,t_1,\qquad w_i(t_1,\ldots,t_i)\equiv(n-i+1)\,(t_i-t_{i-1}),\quad i=2,\ldots,n,
$$
or, in other words, by comparing (2.66) with (2.64), we have $(W_1,\ldots,W_n)=(C_1,\ldots,C_n)$. From Lemma 2.57 we have that $W_1,\ldots,W_n$ are i.i.d. standard exponential variables. In order to conclude the proof we only have to take into account (2.70) and to apply Lemma 2.58.

A remarkable fact is that the property of $(W_1,\ldots,W_n)$ and $(\Gamma_1,\ldots,\Gamma_n)$, proved in Proposition 2.59 under the condition (2.75), is generally true, as we are going to show next. To this purpose it is helpful to keep in mind the following elementary result. Let $T$ be a lifetime with a strictly decreasing survival function $\overline G$, hazard rate function $r(t)$, and cumulative hazard function $R(t)=\int_0^t r(\tau)\,d\tau$.

Lemma 2.60. The random variable
$$
V\equiv R(T)=-\log\overline G(T)
$$
has a standard exponential distribution.

Proof. For $v>0$,
$$
P\{V>v\}=P\{T>R^{-1}(v)\}=\exp\{-R(R^{-1}(v))\}=\exp\{-v\}.
$$
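Lemma 2.60 is easily illustrated by simulation; the following Python fragment is a minimal sketch (the Weibull parameters and the sample size are arbitrary illustrative choices).

```python
# Monte Carlo illustration of Lemma 2.60 (an assumed sketch): for a Weibull
# lifetime T with cumulative hazard R(t) = (t/b)**a, the variable V = R(T)
# should be standard exponential.
import numpy as np

rng = np.random.default_rng(0)
a, b, size = 2.3, 1.5, 200_000

T = b * rng.weibull(a, size=size)   # survival function exp{-(t/b)**a}
V = (T / b) ** a                    # V = R(T)

print(V.mean(), V.var())            # both should be close to 1
print(np.mean(V > 1.0), np.exp(-1)) # P{V > 1} compared with exp(-1)
```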
We now turn to considering exchangeable lifetimes $T_1,\ldots,T_n$ with a positive density $f^{(n)}(\mathbf{t})$, $\mathbf{t}\in\mathbb{R}_+^n$. Notice that now we are not requiring, for $f^{(n)}$, the condition (2.75); however we can still claim
Proposition 2.61. a) $W_1,\ldots,W_n$ are i.i.d. with a standard exponential distribution; b) $(\Gamma_1,\ldots,\Gamma_n)$ is distributed as the vector of the order statistics of $n$ independent, identically distributed, standard exponential variables.

Proof. As shown by Equation (2.46), $T_{(1)}$ is a non-negative random variable with hazard rate function $n\lambda_t^{(n)}$. Then $W_1$ has a standard exponential distribution according to Lemma 2.60. Now we aim to obtain the conditional distribution of $W_2$ given $W_1$. Since $w_1(t_1)$ is a one-to-one mapping, we can equivalently compute the conditional distribution of $W_2$ given $T_{(1)}$. By (2.47), the conditional distribution of $T_{(2)}-T_{(1)}$, given $T_{(1)}$, admits the hazard rate function
$$
r_{T_{(1)}}(t)\equiv(n-1)\,\lambda_{T_{(1)}+t}^{(n-1)}\big(T_{(1)}\big).
$$
Then, by applying again Lemma 2.60,
$$
\int_0^{T_{(2)}-T_{(1)}}r_{T_{(1)}}(t)\,dt
$$
has a standard exponential distribution, and it is stochastically independent of $T_{(1)}$. On the other hand
$$
\int_0^{T_{(2)}-T_{(1)}}r_{T_{(1)}}(t)\,dt=(n-1)\int_{T_{(1)}}^{T_{(2)}}\lambda_u^{(n-1)}\big(T_{(1)}\big)\,du.
$$
We can then conclude that $W_2=w_2\big(T_{(1)},T_{(2)}\big)$ has a standard exponential distribution, and it is stochastically independent of $W_1=w_1\big(T_{(1)}\big)$. Continuing in this way, we obtain that $W_h$ has a standard exponential distribution and is stochastically independent of $W_1,\ldots,W_{h-1}$, and a) is proved. In order to prove b) we simply have to recall Equation (2.70) and then apply Lemma 2.58 to the vector $(W_1,\ldots,W_n)$.

A presentation of the result above in a much more general setting has been given in Arjas (1989).

Consider now a strictly positive, exchangeable, and otherwise arbitrary, density $f^{(n)}$ on $\mathbb{R}_+^n$. Proposition 2.61 shows that we can construct a vector of exchangeable lifetimes $T_1,\ldots,T_n$ with density $f^{(n)}$, starting from a vector of i.i.d. standard exponential variables. Such a construction goes along as follows:
i) Fix a vector of i.i.d. standard exponential variables $B_1,\ldots,B_n$.
ii) Consider the transformation $W$ defined by (2.67) (this is a one-to-one transformation from $\mathbb{R}_+^n$ to $\mathbb{R}_+^n$) and denote its inverse by $W^{-1}$. Note that $W^{-1}$ depends on the density $f^{(n)}$ by means of the functions $w_i$ defined by (2.66).
iii) Define a vector of $n$ random variables $\mathbf{T}_{(\cdot)}$ by
$$
\mathbf{T}_{(\cdot)}=W^{-1}(B_1,\ldots,B_n).
$$
$\mathbf{T}_{(\cdot)}$ must be distributed like the vector of order statistics of $n$ lifetimes with density $f^{(n)}$.
iv) Define exchangeable variables $T_1,\ldots,T_n$ by means of a random permutation of the coordinates of $\mathbf{T}_{(\cdot)}$. Then $T_1,\ldots,T_n$ must be distributed according to the density $f^{(n)}$.

Such a construction can be useful in many situations, from the points of view of both theory and practice. It is a special form of the "Total Hazard Construction," defined in general for not-necessarily-exchangeable vectors of lifetimes (see Shaked and Shanthikumar, 1994).
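For the linear breakdown model the inverse transformation $W^{-1}$ is explicit, and the construction in steps i)–iv) is immediate to simulate. The following Python sketch (with $n=2$ and an arbitrary rate $\lambda$; the closed-form survival probabilities used as checks are derived from $f^{(2)}(t_1,t_2)=\frac{\lambda^2}{2}e^{-\lambda(t_1\vee t_2)}$ and are only part of this illustration) simulates the construction and verifies two probabilities of the target distribution.

```python
# Simulation sketch of the total hazard construction (steps i)-iv)) for the
# linear breakdown model with n = 2: here W^{-1} maps (b1, b2) to the scaled
# partial sums (b1/lam, (b1+b2)/lam).
import numpy as np

rng = np.random.default_rng(1)
lam, size = 0.9, 300_000

B = rng.exponential(size=(size, 2))          # step i): i.i.d. standard exponentials
T_ord = np.cumsum(B, axis=1) / lam           # step iii): W^{-1} for this model
perm = rng.integers(0, 2, size=size)         # step iv): random permutation of labels
T1 = np.where(perm == 0, T_ord[:, 0], T_ord[:, 1])
T2 = np.where(perm == 0, T_ord[:, 1], T_ord[:, 0])

s = 1.2
print(np.mean(T1 > s), np.exp(-lam*s) * (1 + lam*s/2))   # marginal survival of T_1
print(np.mean((T1 > s) & (T2 > s)), np.exp(-lam*s))      # joint survival at (s, s)
```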
2.4.3 M.c.h.r. functions and dynamic sufficiency

Let us start from the analysis of two remarkable special cases: Schur-constant joint densities and m.c.h.r. functions not depending on past failure times, considered in Examples 2.35 and 2.37, respectively. In the first case $\lambda_t^{(n-h)}(t_1,\ldots,t_h)$ is only a function of the pair $(h,y)$, where $h$ is the number of failures and $y$ is the observed total time on test, $y=\sum_{i=1}^{h}t_i+(n-h)t$; in the second case $\lambda_t^{(n-h)}(t_1,\ldots,t_h)$ is simply a function of the pair $(h,t)$. These examples suggest a special definition of sufficiency (dynamic sufficiency) for data of the form (2.40). This concept can, in particular, be useful for characterizing special probabilistic models for lifetimes; furthermore it can be used to give simpler forms to the transformation $W^{-1}$ needed for the total hazard construction. Moreover, conditions of dependence or of multivariate aging (to be studied in the next chapters) are more easily checked or imposed for models in which statistics of particularly simple form have the property of dynamic sufficiency.

Denote $(0,t\mathbf{e})\equiv(0,t,\ldots,t)\in\{0\}\times\mathbb{R}_+^n$ and, for $0<t_1\le\cdots\le t_h\le t$,
$$
(h,t_1,\ldots,t_h,t\mathbf{e})\equiv(h,t_1,\ldots,t_h,t,\ldots,t)\in\{h\}\times\mathbb{R}_+^n,
$$
$$
\mathcal{G}\equiv\{(h,t_1,\ldots,t_h,t\mathbf{e})\mid t\ge 0,\;h=1,\ldots,n-1,\;0<t_1\le\cdots\le t_h\le t\},
$$
$$
\widehat{\mathcal{G}}\equiv\mathcal{G}\cup\{(0,t\mathbf{e})\mid t\ge 0\}.
\qquad(2.76)
$$
We can look at $\widehat{\mathcal{G}}$ as the space of possible values taken by dynamic histories. Let $q(h,t_1,\ldots,t_h,t\mathbf{e})$ be a measurable function defined on $\widehat{\mathcal{G}}$ (i.e. $q$ is a function of the generic history).

Definition 2.62. $q$ is a dynamic prediction sufficient (d.p.s.) statistic if the following implication holds:
$$
q(h',t_1',\ldots,t_{h'}',t'\mathbf{e})=q(h'',t_1'',\ldots,t_{h''}'',t''\mathbf{e})
\;\Rightarrow\;
\lambda_{t'}^{(n-h')}(t_1',\ldots,t_{h'}')=\lambda_{t''}^{(n-h'')}(t_1'',\ldots,t_{h''}'').
$$
The following result shows a simple general connection between the definitions of prediction sufficiency and dynamic prediction sufficiency.

Proposition 2.63. Let $f^{(n)}$ be such that $S_m$ ($m=1,\ldots,n-1$) is a sequence of prediction sufficient statistics. Then the mapping $q(h,t_1,\ldots,t_h,t)\equiv(h,S_h(t_1,\ldots,t_h),t)$ is a d.p.s. statistic.

Proof. By (2.43) and by adapting the formula (1.48) of Chapter 1 to the present notation, we can write
$$
\lambda_t^{(n-h)}(t_1,\ldots,t_h)=
\frac{\int_t^\infty\!\cdots\!\int_t^\infty g_{h,n-h}\big(S_h(t_1,\ldots,t_h);\,t,u_{h+2},\ldots,u_n\big)\,du_{h+2}\cdots du_n}
{\int_t^\infty\!\cdots\!\int_t^\infty g_{h,n-h}\big(S_h(t_1,\ldots,t_h);\,u_{h+1},\ldots,u_n\big)\,du_{h+1}\cdots du_n}.
$$
Then $\lambda_t^{(n-h)}(t_1,\ldots,t_h)$ only depends on $h$, $S_h(t_1,\ldots,t_h)$ and $t$.

Conversely, the knowledge of a d.p.s. statistic $q(h,t_1,\ldots,t_h,t\mathbf{e})$ does not give, in general, any indication about prediction sufficient statistics. As formula (2.58) shows, the joint density function depends both on the form of the m.c.h.r. functions and on the values of $t_{(1)},\ldots,t_{(n)}$. Indeed $t_{(1)},\ldots,t_{(n)}$ appear in the r.h.s. of (2.58) also as the points between which the m.c.h.r. functions are to be integrated. This in general prevents those values from being eliminated when a joint density is divided by a marginal density for computing a conditional distribution. A heuristic explanation of this can also be found in the considerations, developed in the previous subsection, concerning the fact that the system of m.c.h.r. functions is not an adequate tool for dealing with marginal distributions, or with conditional distributions when the conditioning event is not a history of "dynamic" type.
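A concrete illustration of dynamic sufficiency is provided by the Schur-constant case. The following Python sketch assumes a proportional hazards model with a Gamma$(\alpha,\beta)$ prior for $\Theta$ and $R(t)=t$ (so that the joint density is Schur-constant); under these assumptions the m.c.h.r. given a dynamic history is the posterior mean of $\Theta$, namely $(\alpha+h)/(\beta+y)$ with $y$ the observed total time on test, so that two histories with the same pair $(h,y)$ give the same hazard. The parameter values are illustrative.

```python
# Dynamic sufficiency of (h, TTT) for a Schur-constant model (a sketch under
# the assumptions stated above: proportional hazards, Gamma(alpha, beta) prior,
# R(t) = t so that r(t) = 1).
alpha, beta, n = 2.0, 3.0, 5

def mchr(failure_times, t):
    """m.c.h.r. of a surviving unit at time t, given failures at failure_times."""
    h = len(failure_times)
    y = sum(failure_times) + (n - h) * t      # total time on test at t
    return (alpha + h) / (beta + y)           # r(t) * posterior mean of Theta

# two different histories with the same number of failures and the same TTT
print(mchr([0.5, 1.5], t=2.0))    # y = 0.5 + 1.5 + 3*2.0 = 8.0
print(mchr([0.9, 1.1], t=2.0))    # y = 0.9 + 1.1 + 3*2.0 = 8.0  -> same value
```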
2.5 Exercises
Exercise 2.64. Let $\Theta$ be a non-negative parameter with a (prior) standard exponential density
$$
\pi_0(\theta)=\exp\{-\theta\},\qquad\theta>0,
$$
and let $T$ be a lifetime with conditional hazard rate, given $\Theta=\theta$,
$$
r(t\mid\theta)=\theta\,t^{\alpha}\qquad(\alpha>-1),
$$
i.e. the conditional distribution of $T$ given $\theta$ is a Weibull distribution with scale parameter depending on $\theta$ and shape parameter $(1+\alpha)$. Prove that the predictive hazard rate function is given by
$$
r(t)=\int_0^\infty r(t\mid\theta)\,\pi(\theta\mid T>t)\,d\theta=\frac{(1+\alpha)\,t^{\alpha}}{1+\alpha+t^{1+\alpha}},\qquad t>0.
$$
Hint: use (2.5).
Exercise 2.65. Show that the hazard rate function for the one-dimensional survival function in (2.7) is given by
$$
r(t)=r_1(t)\,P\{\Theta>t\mid T>t\}+r_2(t)\,P\{\Theta\le t\mid T>t\}.
$$
Hint: differentiate $\overline F(s)$ with respect to $s$ to obtain
$$
f(s)=r_2(s)\exp\{-R_2(s)\}\int_0^{s}\exp\{R_2(\xi)-R_1(\xi)\}\,q(\xi)\,d\xi
+r_1(s)\exp\{-R_1(s)\}\int_{s}^{\infty}q(\xi)\,d\xi.
$$
Exercise 2.66. For a time-transformed exponential model as in Example 2.5, consider the special case when the prior distribution of $\Theta$ is Gamma with shape parameter $\alpha$ and scale parameter $\beta$. Show that, for $D$ as in (2.18),
$$
P\{T_{h+1}>s_{h+1},\ldots,T_n>s_n\mid D\}=
\left(\frac{\beta+\sum_{j=1}^{h}R(t_j)+\sum_{j=h+1}^{n}R(r_j)}
{\beta+\sum_{j=1}^{h}R(t_j)+\sum_{j=h+1}^{n}R(s_j)}\right)^{\alpha+h}.
$$
Hint: notice that the posterior density $\pi(\theta\mid D)$ is again the density of a Gamma distribution, with parameters
$$
\alpha'=\alpha+h,\qquad \beta'=\beta+\Big[\sum_{j=1}^{h}R(t_j)+\sum_{j=h+1}^{n}R(r_j)\Big],
$$
and use formula (2.31).
Exercise 2.67. For two lifetimes $T_1,T_2$ in Example 2.19, prove the formula (2.32).
Hint: start by proving that
$$
\overline F^{(2)}(s_1,s_2)=\int_0^{s_1}\exp\{-2R_1(\xi)\}\exp\{-R_2(s_1)-R_2(s_2)+2R_2(\xi)\}\,q(\xi)\,d\xi
+\int_{s_1}^{s_2}\exp\{-R_1(s_1)-R_1(\xi)\}\exp\{-R_2(s_2)+R_2(\xi)\}\,q(\xi)\,d\xi
+\int_{s_2}^{\infty}\exp\{-R_1(s_1)-R_1(s_2)\}\,q(\xi)\,d\xi.
$$
Exercise 2.68. For Example 2.20 write down explicitly the conditional survival probability $P\{T_2>s_2\mid T_1=t_1,\;T_2>r_2\}$.

Exercise 2.69. Prove that, for $1\le h\le n-1$, $0<t_1\le\cdots\le t_h\le t$,
$$
\lambda_t^{(n-h)}(t_1,\ldots,t_h)=
-\,\frac{\dfrac{\partial^{\,h+1}\overline F^{(n)}(s_1,\ldots,s_h,s_{h+1},t,\ldots,t)}{\partial s_1\cdots\partial s_{h+1}}\Big|_{s_1=t_1,\ldots,s_h=t_h,\,s_{h+1}=t}}
{\dfrac{\partial^{\,h}\overline F^{(n)}(s_1,\ldots,s_h,t,\ldots,t)}{\partial s_1\cdots\partial s_h}\Big|_{s_1=t_1,\ldots,s_h=t_h}}.
\qquad(2.77)
$$
Exercise 2.70. Prove Lemma 2.42.

Exercise 2.71. (Continuation of Exercise 2.67). Consider again the bivariate survival function in Example 2.19 and let
$$
A(s_1,s_2)\equiv r_2(s_1)\exp\{-R_2(s_1)-R_2(s_2)\}\int_0^{s_1}\exp\{-2R_1(\xi)+2R_2(\xi)\}\,q(\xi)\,d\xi,
$$
$$
B(s_1,s_2)\equiv r_1(s_1)\exp\{-R_1(s_1)-R_2(s_2)\}\int_{s_1}^{s_2}\exp\{-R_1(\xi)+R_2(\xi)\}\,q(\xi)\,d\xi,
$$
$$
C(s_1,s_2)\equiv r_1(s_1)\exp\{-R_1(s_1)-R_1(s_2)\}\int_{s_2}^{+\infty}q(\xi)\,d\xi.
$$
Show that
$$
\frac{A(s_1,s_2)}{A(s_1,s_2)+B(s_1,s_2)+C(s_1,s_2)}=P\{\Theta\le s_1\mid T_1=t_1,\,T_2>t\},
$$
$$
\frac{B(s_1,s_2)}{A(s_1,s_2)+B(s_1,s_2)+C(s_1,s_2)}=P\{s_1<\Theta\le t\mid T_1=t_1,\,T_2>t\},
$$
$$
\frac{C(s_1,s_2)}{A(s_1,s_2)+B(s_1,s_2)+C(s_1,s_2)}=P\{\Theta>t\mid T_1=t_1,\,T_2>t\}.
$$
By applying Definition 2.29 then obtain
$$
\lambda_t^{(1)}(t_1)=P\{\Theta>t\mid T_1=t_1,\,T_2>t\}\,r_1(t)+P\{\Theta\le t\mid T_1=t_1,\,T_2>t\}\,r_2(t)
$$
and
$$
\lambda_t^{(2)}=P\{\Theta>t\mid T_1>t,\,T_2>t\}\,r_1(t)+P\{\Theta\le t\mid T_1>t,\,T_2>t\}\,r_2(t).
$$
Exercise 2.72. In the case of two individuals in Example 2.20, where
$$
\overline F^{(2)}(s_1,s_2)=\sum_{m=0}^{2}p_0(m)\,P\{T_1>s_1,T_2>s_2\mid M=m\}
=p_0(0)\,\overline F_1(s_1)\overline F_1(s_2)
+\tfrac12\,p_0(1)\big[\overline F_0(s_1)\overline F_1(s_2)+\overline F_0(s_2)\overline F_1(s_1)\big]
+p_0(2)\,\overline F_0(s_1)\overline F_0(s_2),
$$
show that
$$
\lambda_t^{(2)}=\frac{-\,\dfrac{\partial\overline F^{(2)}(s,t)}{\partial s}\Big|_{s=t}}{\overline F^{(2)}(t,t)}
=\frac{p_0(0)\,f_1(t)\overline F_1(t)+\tfrac12\,p_0(1)\big[f_0(t)\overline F_1(t)+\overline F_0(t)f_1(t)\big]+p_0(2)\,f_0(t)\overline F_0(t)}
{p_0(0)\,\overline F_1(t)\overline F_1(t)+\tfrac12\,p_0(1)\big[\overline F_0(t)\overline F_1(t)+\overline F_0(t)\overline F_1(t)\big]+p_0(2)\,\overline F_0(t)\overline F_0(t)}.
$$
Hint: calculate $\frac{\partial\overline F^{(2)}(s_1,u)}{\partial s_1}\big|_{s_1=t_1}$ and take into account Exercise 2.69.

Exercise 2.73. (Continuation). By using Bayes' formula, show that the latter expression for $\lambda_t^{(2)}$ is equal to
$$
r_1(t)\,P\{M=0\mid T_1>t,T_2>t\}+r_0(t)\,P\{M=2\mid T_1>t,T_2>t\}
+\tfrac12\big[r_1(t)+r_0(t)\big]\,P\{M=1\mid T_1>t,T_2>t\}.
$$
Exercise 2.74. Show that, in the special case $r(t)\equiv r>0$, $a(t)\equiv a>0$, (2.38) becomes (2.36), i.e. the bivariate exponential distribution of Marshall–Olkin, by letting
$$
\lambda_2\equiv a\log\Big(\frac{b+2r}{b+r}\Big),\qquad
\lambda_0\equiv a\log\Big(\frac{(b+r)^2}{b(b+2r)}\Big).
$$
Exercise 2.75. Check the validity of Equation (2.59).

Exercise 2.76. By applying Proposition 2.48, verify the equation
$$
f^{(n)}(t_1,\ldots,t_n)=\frac{\lambda^n}{n!}\exp\{-\lambda\,t_{(n)}\}
$$
for the linear breakdown Ross model in Example 2.50.

Exercise 2.77. Compute the joint density of the bivariate distribution in Example 2.36.

Exercise 2.78. Check the validity of Equation (2.60) at the end of Example 2.51.
2.6 Bibliography

Arjas, E. (1981). The failure and hazard processes in multivariate reliability systems. Math. Oper. Res. 6, 551-562.
Arjas, E. and Norros, I. (1984). Lifelengths and association: a dynamical approach. Math. Oper. Res. 9, 151-158.
Arjas, E. and Norros, I. (1986). A compensator representation of multivariate life length distributions, with applications. Scand. J. Statist. 13, 99-112.
Arjas, E. (1989). Survival models and martingale dynamics. Scand. J. Statist., 16, 177-225.
Aven, T. and Jensen, U. (1999). Stochastic Models in Reliability. Springer Verlag, New York.
Barlow, R.E. and Irony, T. (1993). The Bayesian approach to quality. In Reliability and Decision Making, R.E. Barlow, C.A. Clarotti and F. Spizzichino, Eds., Chapman & Hall, London.
Barlow, R.E. and Mendel, M.B. (1992). de Finetti-type representations for life distributions. J. Am. Stat. Assoc., 87, no. 420, 1116-1122.
Barlow, R.E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York.
Barlow, R.E. and Proschan, F. (1988). Life distributions and incomplete data. In Handbook of Statistics, Vol. 7, P.R. Krishnaiah and C.R. Rao, Eds., Elsevier, 225-249.
Bergman, B. (1985). On reliability and its applications. Scand. J. Statist., 12, 1-41.
Berliner, L.M. and Hill, B.M. (1988). Bayesian nonparametric survival analysis (with discussion). J. Am. Stat. Assoc., 83, 772-784.
Bremaud, P. (1981). Point Processes and Queues. Martingale Dynamics. Springer Verlag, New York.
Caramellino, L. and Spizzichino, F. (1996). WBF property and stochastic monotonicity of the Markov process associated to Schur-constant survival functions. J. Multiv. Anal., 56, 153-163.
Cinlar, E. and Özekici, S. (1987). Reliability of complex devices in random environments. Prob. Engnr. Inform. Sc. 1, 97-115.
Cinlar, E., Shaked, M. and Shanthikumar, J.G. (1989). On lifetimes influenced by a common environment. Stoch. Proc. Appl. 33, 347-359.
Clarotti, C.A. (1992). L'approche bayesienne predictive en fiabilité: "un nouveau regard". CNES Report N. DCQ 176, Paris.
Cox, D. and Isham, V. (1980). Point Processes. Chapman & Hall, London.
Cox, D. and Oakes, D. (1984). Analysis of Survival Data. Chapman & Hall, London.
Diaconis, P. and Freedman, D. (1987). A dozen de Finetti-style results in search of a theory. Ann. Inst. Henri Poincaré 23, 397-423.
Esary, J.D., Marshall, A.W. and Proschan, F. (1973). Shock models and wear processes. Ann. Probab., 1, 627-649.
Gaver, D.P. (1963). Random hazard in reliability problems. Technometrics 5, 211-226.
Heinrich, G. and Jensen, U. (1996). Bivariate lifetime distributions and optimal replacement. Math. Met. Oper. Res., 44, 31-47.
Hougaard, P. (1987). Modelling multivariate survival. Scand. J. Statist. 14, 291-304.
Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. John Wiley & Sons, New York.
Lee, M.T. and Gross, A.J. (1991). Lifetime distributions under unknown environment. J. Stat. Plann. Inf. 29, 137-143.
Lindley, D.V. and Singpurwalla, N.D. (1986). Multivariate distributions for the lifelengths of components of a system sharing a common environment. J. Appl. Prob., 23, 418-431.
Martz, H. and Waller, R.A. (1982). Bayesian Reliability Analysis. John Wiley & Sons, New York.
Nappo, G. and Spizzichino, F. (2000). A concept of dynamic sufficiency and optimal stopping of longitudinal observations of lifetimes. Volume of contributed papers presented at "Mathematical Methods of Reliability", Bordeaux, July 2000.
Norros, I. (1986). A compensator representation of multivariate life length distributions, with applications. Scand. J. Statist. 13, no. 2, 99-112.
Shaked, M. and Shanthikumar, J.G. (1987a). Multivariate hazard rates and stochastic ordering. Adv. Appl. Probab. 19, 123-137.
Shaked, M. and Shanthikumar, J.G. (1987b). The multivariate hazard construction. Stoch. Proc. Appl., 24, 85-97.
Singpurwalla, N.D. (1995). Survival in dynamic environments. Statistical Science, 10, 86-103.
Singpurwalla, N.D. (2000). Some Cracks in the Empire of Chance. Private communication.
Singpurwalla, N.D. and Youngren, M.A. (1993). Multivariate distributions induced by dynamic environments. Scand. J. Statist., 20, 250-261.
Singpurwalla, N.D. and Wilson, S.P. (1995). The exponentiation formula of reliability and survival: does it always hold? Lifetime Data Analysis, 1, 187-194.
Yashin, A.I. and Arjas, E. (1988). A note on random intensities and conditional survival functions. J. Appl. Probab. 25, 630-635.
Chapter 3

Some concepts of dependence and aging

3.1 Introduction

In this chapter we study some concepts of stochastic orderings, dependence and aging properties for vectors of random quantities. Such notions provide the essential tools for obtaining inequalities on conditional probability distributions of residual lifetimes, namely for conditional probabilities of the type
$$
P\{T_{h+1}>s_{h+1},\ldots,T_n>s_n\mid T_1=t_1,\ldots,T_h=t_h,\;T_{h+1}>r_{h+1},\ldots,T_n>r_n\}
\qquad(3.1)
$$
considered in the previous chapter. Such inequalities can be useful in various decision problems, as will also be discussed briefly in Chapter 5. A very rich literature exists concerning the topics of stochastic orderings, dependence and aging properties and their applications in reliability, life-testing and survival analysis; we address the reader to the partial list of references at the end of this chapter for a wider treatment. The aim of this chapter is just to recall, from these fields, the notions that are most strictly of use for our purposes. We shall also discuss some aspects of the concept of dependence in Bayesian analysis. Furthermore we shall consider some preliminaries to the formulation of concepts of aging, for exchangeable lifetimes, in a predictive Bayesian approach.

The fundamental notions are those of univariate and multivariate stochastic orderings. Some of those, for what concerns the univariate case, will be recalled later in this section. Some notions concerning the multivariate case will
be briefly illustrated in the next Section 2; a quite exhaustive reference is provided by the volume by Shaked and Shanthikumar (1994). These topics indeed provide the basis for formulating general definitions and results concerning notions of dependence and of multivariate aging. In Section 3 we shall consider some concepts of dependence. We shall then devote Section 4 to discussing some notions of univariate and multivariate aging.

In order to formulate notions of stochastic orderings and dependence it is necessary to consider random vectors $(X_1,\ldots,X_n)$, in general. We shall then use the following notation: for an ordered subset $I\equiv\{i_1,\ldots,i_{|I|}\}\subseteq\{1,\ldots,n\}$ (where $|I|$ is the cardinality of $I$), $\mathbf{X}_I$ denotes the vector $(X_{i_1},\ldots,X_{i_{|I|}})$; $\mathbf{e}$ denotes a vector with an appropriate number of coordinates all equal to 1, whereas, for $t>0$, $t\mathbf{e}\equiv(t,\ldots,t)$; $\widetilde I$ is the complement of $I$; finally
$$
h_t\equiv\{\mathbf{X}_I=\mathbf{x}_I,\;\mathbf{X}_{\widetilde I}>t\mathbf{e}\}
\qquad(3.2)
$$
denotes the history
$$
\{X_{i_1}=x_{i_1},\ldots,X_{i_{|I|}}=x_{i_{|I|}},\;X_j>t\;(j\in\widetilde I)\},
$$
where $0\le x_{i_1}\le\cdots\le x_{i_{|I|}}\le t$. For a vector of lifetimes $\mathbf{X}$, for a history $h_t$ as in (3.2) and for $j\in\widetilde I$, the symbol $\lambda_{j|I}(t\mid x_{i_1},\ldots,x_{i_{|I|}})$ is used to denote the multivariate conditional hazard rate (see Shaked and Shanthikumar, 1994):
$$
\lambda_{j|I}(t\mid x_{i_1},\ldots,x_{i_{|I|}})=\lim_{\Delta t\to 0^+}\frac{1}{\Delta t}\,P\{X_j\le t+\Delta t\mid h_t\}.
$$
In general, given a history $h_t$, we need to consider the joint distribution of the vector of residual lifetimes $(X_j-t)$, $j\in\widetilde I$, conditional on the observation of $h_t$, which will be denoted by
$$
\mathcal{L}\big[(\mathbf{X}_{\widetilde I}-t\mathbf{e})\mid h_t\big].
$$
Sometimes one must refer to observed histories $h_t$ without specifying which is the set $I$ of indexes of the individuals failed along $h_t$. In such cases the notation $(\mathbf{X}-t\mathbf{e})^{+}$ is used to denote the vector whose components are the residual lifetimes for the surviving individuals and 0 for the failed individuals, so that it is possible to write, for all $t>0$,
$$
\mathbf{X}=(\mathbf{X}\wedge t\mathbf{e})+(\mathbf{X}-t\mathbf{e})^{+},
$$
101
where
(X ^ te) (X1 ^ t; :::; Xn ^ t): When we condition with respect to a history ht as above (where events of null probability can possibly appear in the conditioning) we shall assume the existence of probability density functions for the joint distribution of X. It can also happen that this is tacitly assumed some other times, when clear from the context. When considering equations or inequalities for conditional probabilities of the form (3.1), it will be also understood that they are valid with probability one (i.e. up to possible events of probability 0). Our interest is in the condition of exchangeability and we use special notation when dealing with such a speci c case: as done in most of the last chapter, the symbol T (T1 ; :::; Tn ) denotes a vector of exchangeable lifetimes and the symbols (tn) ; (tn;h)(t1 ; :::; th ) are respectively used in place of jj; (t) and jjI (tjt1 ; :::; th ), for arbitrary j; I; such that j 2= I and jI j = h(1 h < n). Throughout this chapter several examples will be presented to illustrate how various concepts manifest in dierent exchangeable models for lifetimes. As a background for the arguments to be presented henceforth, in the next subsection we recall basic de nitions about notions of orderings between pairs of one-dimensional probability distributions.
3.1.1 One-dimensional stochastic orderings
We start with a very short presentation about totally positive functions of order 2, in the simplest form possible (see Karlin, 1968, for more general de nitions, results, and proofs). Let K (x; y) be a non-negative function of the two variables x; y (x 2 X R, y 2 Y R) De nition 3.1. K (x; y) is totally positive of order 2 (in short TP2) if, for y < y0 , the ratio K (x; y) K (x; y0 ) is a decreasing function of x.
Proposition 3.2. (Basic composition formula). Let K (x; y) and H (y; z) be two TP2 functions (x 2 X ; y 2 Y ; z 2 Z R) and let be a positive measure over Y . Then the function Z L(x; z ) K (x; y)H (y; z )d(y) Y
102
SOME CONCEPTS OF DEPENDENCE AND AGING
is TP2: Let K (x; y) (x 2 X ; y 2 Y ) be a TP2 function and let h(y) be a real function. For a given measure , consider the function g(x) de ned by the integral
g(x)
Z
Y
K (x; y)h(y)d(y)
Proposition 3.3. (Variation sign diminishing property). If h(y) changes sign at most once, then g(x) changes sign at most once. Now we recall, and comment on, three well-known types of stochastic ordering for one-dimensional distributions, namely likelihood ratio ordering, hazard rate ordering and usual stochastic ordering. We prefer in the beginning to formulate de nitions for the case of probability distributions concentrated on R+ [0; +1) and admitting a density. Let then F 1 and F 2 be two one-dimensional survival functions (F 1 (0) = F 2 (0) = 1) and denote by fi and ri the corresponding density functions and failure rate functions, respectively (i = 1; 2):
F i (t) = expf;
Zt 0
ri ( )d g; fi (t) = ri (t)expf;
Zt 0
ri ( )d g; for t 0
(3.3)
De nition 3.4. F 1 is less than F 2 in the usual stochastic ordering, written
F 1 st F 2 , if
F 1 (t) F 2 (t); 8t 0:
De nition 3.5. F 1 is less than F 2 in the hazard rate ordering, written F 1 hr F 2 , if r1 (t) r2 (t); 8t 0: De nition 3.6. F 1 is less than F 2 in the likelihood ratio ordering, written F 1 lr F 2 , if fi (t) is TP2 in i and t, namely f1 (t0 ) f1 (t00 ) ; for 0 t0 t00 : f2 (t0 ) f2 (t00 )
Example 3.7. Let us consider two exponential distributions with parameter 1 and 2 , respectively, i.e.
F i (t) = expf;i tg; ri (t) = i ; It is trivial that 1 > 2 implies F 1 lr F 2 , F 1 rh F 2 and F 1 st F 2 .
INTRODUCTION
103
Example 3.8. For t > 0, let r1 (t) = max(t; 1); r2 (t) = 1: Then 2 f1 (t) = expf;tg, for 0 t 1; f1 (t) = e;1=2 t expf; t2 g, for t > 1
whence
f1 (t) = 1, for 0 t 1; f1 (t) = e;1/2 t expf; t2 + tg, for t > 1 f2 (t) f2 (t) 2 It is F 1 hr F 2 and F 1 st F 2 , but the ratio ff12 ((tt)) is not everywhere decreasing. Example 3.9. Let
r1 (t) = 2, for 0 t 4 and t 5; r1 (t) = 0, for 4 < t < 5 and
r2 (t) = 1; 8t 0: As is immediately seen, F 1 st F 2 but neither F 1 lr F nor F 1 hr F 2 hold true. The following result is obvious. Theorem 3.10. F 1 hr F 2 is equivalent to the condition
F i (t) TP2 in the variables i and t: The following two results are of fundamental importance and are well known. Theorem 3.11. F 1 lr F 2 ) F 1 hr F 2 ) F 1 st F 2. Proof. For v > u 0, F 1 lr F 2 implies
Z vZ 1 u v
f1 (t) f2 (t0 ) dtdt0
that is
F 2 (v )
Zv u
Z 1Z v v
u
f1 (t) dt F 1 (v)
f1 (t) f2 (t0 ) dtdt0
Zv u
f2 (t) dt
or, equivalently,
F 2 (v) F 1 (u) ; F 1 (v) F 1 (v) F 2 (u) ; F 2 (v) which is just the condition F i TP2 in the variables i andRt. R On the other hand r1 (t) r2 (t) implies 0t r1 ( )d 0t r2 ( )d , and then F 2 (t) = expf;
Zt 0
r1 ( )d g expf;
Zt 0
r2 ( )d g = F 2 (t)
Let T1 and T2 be two random variables with survival functions F 1 and F 2 , respectively and let :R+ ! R be a given function. Then the expected value of (Ti ) (i = 1; 2); when it does exist, is denoted by E[
(Ti )] =
Z1 0
(t)dFi (t):
Theorem 3.12. F 1 st F 2 if and only if E [ (T1 )] E [ (T2 )] for any non-decreasing function such that both the expected values exist. Proof. For simplicity's sake we rst consider the case when Fi (t) = 1 ; F i (t) (i = 1; 2) is everywhere continuous and strictly increasing over [0; 1), so that Fi;1 : [0; 1) ! [0; 1) is also continuous and strictly increasing. Let U be a random variable uniformly distributed over the interval [0; 1] and consider the random variables X = F1;1 (U ); Y = F2;1 (U ): It is easy to check that the survival functions of X and Y coincide with F 1 and F 2 , respectively and then we can write E [ (T1 )] = E [ (X )] ; E [ (T2 )] = E [ (Y )] : Furthermore, by the assumption F 1 st F 2 , we have F1;1 F2;1 , whence X Y and then E [ (X )] E [ (Y )] : When Fi (t) (i = 1; 2) is not strictly increasing we can take as Fi;1 the right continuous inverse of Fi ; de ned by F1;1 (u) = supfxjFi (x) ug, 8u 2 (0; 1):
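The quantile-coupling argument used in the proof of Theorem 3.12 is easily illustrated numerically. The following Python sketch (an illustrative fragment; the choice of two exponential distributions with rates 2 and 1, and of the monotone function, is arbitrary) builds $X=F_1^{-1}(U)$ and $Y=F_2^{-1}(U)$ from the same uniform variable and checks the pointwise ordering and the ordering of the expectations.

```python
# A small sketch of the quantile coupling in the proof of Theorem 3.12:
# if F1_bar <= F2_bar pointwise, then X = F1^{-1}(U) <= Y = F2^{-1}(U), so
# E[phi(X)] <= E[phi(Y)] for every non-decreasing phi.
import numpy as np

rng = np.random.default_rng(2)
U = rng.uniform(size=100_000)

X = -np.log(1 - U) / 2.0        # F1^{-1}(U): exponential with rate 2
Y = -np.log(1 - U) / 1.0        # F2^{-1}(U): exponential with rate 1

print(bool(np.all(X <= Y)))             # the coupling is pointwise ordered
phi = lambda x: np.minimum(x, 3.0)**2   # an arbitrary non-decreasing function
print(phi(X).mean(), phi(Y).mean())     # E[phi(X)] <= E[phi(Y)]
```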
Theorems 3.12 and 3.10 show that the de nitions of F 1 st F 2 and F 1 hr F 2 can immediately be extended, in a natural manner, to the arbitrary case of
a pair of distributions which are not necessarily both absolutely continuous. Even the de nition of likelihood ratio ordering can be extended to the general case, by considering the densities of the two distributions with respect to some measure which dominates both of them (such a measure always exists) and by requiring monotonicity of the ratio between two such densities. It is also clear that there is no reason to limit the above de nitions to pairs of one-dimensional distributions concentrated on [0; +1). When convenient, random quantities of concern will also be denoted by symbols X; Y; ::: Even though a stochastic ordering (where stands for st, hr, or lr) is an ordering relation between two probability distributions, one can also write X Y . This means that the distribution of X is majorized by the distribution of Y in the sense.
3.1.2 Stochastic monotonicity and orderings for conditional distributions
We are often interested in comparing dierent one-dimensional distributions which arise as conditional distributions for a same scalar random quantity X , given dierent observed events. In this respect we can obtain concepts of stochastic monotonicity corresponding in a natural way to the notions of stochastic orderings. Let X be given and let Z be a random variable taking values in the domain Z Rd for some d = 1; 2; ::: . We assume that the joint distribution of (X; Z ) is absolutely continuous and consider the conditional distributions of X given events of the type fZ = z g, with z 2 Z ; let fX (jz ) and F X (jz ) in particular denote the conditional density and survival function, respectively. Let e be a given partial ordering de ned on Z and let be a xed onedimensional ordering (thus = st; hr; or lr). De nition 3.13. X is stochastically increasing in Z in the ordering with respect to e , if the following implication holds:
z 0 e z 00 ) F X (jz 0 ) F X (jz 00 ):
When no partial ordering e on Rd is speci ed, one tacitly refers to the natural partial ordering on Rd (of course we have a total ordering in the case d = 1). We now translate some fundamental properties of the univariate lr ordering into properties for conditional distributions in a number of dierent cases. This permits us to underline some aspects of interest in Bayesian statistics. Simple examples of applications will be presented in Chapter 5.
Remark 3.14. (Preservation of the lr ordering under posterization ). Let be a scalar parameter with a prior density () and a (non-necessarily scalar) statistical observation X with a likelihood f (xj). The posterior density of given the observation fX = xg is then given by (jx) / ()f (xj): Compare now the two posterior densities 1 (jx) and 2 (jx), corresponding to two dierent prior densities 1 () and 2 (): i (jx) / i ()f (xj); i = 1; 2: It is obvious that, for x such that f (xj) > 0 for all , the following implication holds: 1 () lr 2 () ) 1 (jx) lr 2 (jx) Remark 3.15. (Creation of lr ordering by means of posterization ). Consider a scalar parameter with a given prior density () and a scalar statistical observation X with a likelihood f (xj). Assume f (xj) to be a TP2 function, i.e. let X be increasing in in the sense of the lr ordering, so that the conditional distribution of X , given f = 0 g is less than the conditional distribution of X , given f = 00 g in the lr ordering, if 0 < 00 . It is readily seen that this is equivalent to the TP2 property of the posterior density (jx) of given fX = xg i.e. to the condition that is increasing in X in the lr ordering: if we compare the eects of two dierent results fX = x0 g and fX = x00 g, with x0 < x00 , then (jx0 ) lr (jx00 ): Compare now the posterior distributions for given two dierent results of the form fX > x0 g and fX > x00 g, again with x0 < x00 : for F (tj) = we have
Z +1 t
f ( j)d;
(jX > x0 ) / () F (x0 j); (jX > x00 ) / () F (x00 j):
It is then obvious that the comparison (jX > x0 ) lr (jX > x00 ) can be obtained by requiring the weaker condition that the function F (xj) is TP2 , namely that X is increasing in in the hr ordering. Note that the same condition also implies (jX = x) lr (jX > x); for any x:
Remark 3.16. (lr ordering for predictive distributions). Consider a scalar parameter and a scalar statistical observation X with a TP2 likelihood f (xj). Compare the two predictive densities g1 (x) and g2 (x), corresponding to two dierent prior densities 1 () and 2 ():
gi (x) =
Z1 -1
i () f (xj)d; i = 1; 2:
As an immediate consequence of the basic composition formula, we get the following implication:
1 () lr 2 () ) g1(x) lr g2 (x):
3.2 Multivariate stochastic orderings In this section we shall discuss de nitions of multivariate concepts of stochastic orderings which extend the notions seen above for the one-dimensional case; this leads to four dierent concepts: a) Usual multivariate stochastic ordering b) Multivariate likelihood ratio ordering c) Cumulative hazard rate ordering d) Multivariate hazard rate ordering
3.2.1 Usual multivariate stochastic ordering
This concept is very well known. For two d-dimensional random vectors
X; Y we want to de ne the condition X st Y. A subset U Rn is an upper set (or an increasing set) if its indicator function is increasing (with respect to the natural partial ordering ), i.e. x0 2 U; x0 x00 ) x00 2 U: De nition 3.17. X is stochastically smaller than Y (written X st Y) if, for any upper set U , one has
P fX 2 U g P fY 2 U g:
Remark 3.18. The condition X st Y trivially implies the inequality FX (x) FY (x); for any x 2 Rn . If n > 1, the vice versa is not true, as well-known examples
show.
A direct multidimensional analog of Theorem 3.12, shows however that Definition 3.17 can be seen as the most natural multidimensional extension of De nition 3.4:
Theorem 3.19. X st Y if and only if E [ (X)] E [ (Y )] for any non-decreasing function :Rn ! R such that both the expected values exist.
Proof. It is obvious that the condition E [ (X)] E [ (Y)] for any nondecreasing function is sucient for X st Y , since
P fX 2 U g = E [1U (X)]; P fY 2 U g = E [1U (Y)] and, U being an increasing set, the indicator function 1U is increasing. On the other hand, any increasing function :Rn ! R can be approximated as follows: (x) = mlim !1 m (x) where m has the form m (x) =
m X i=1
= am;i 1Um;i (x) ; bm;
and am;i > 0 (i = 1; :::; m), Um;i are upper sets of Rn and bm 2 Rn . Then it is E[
(X)] = mlim !1 E [ m (X)] = mlim !1
E[
(Y)] = mlim !1 E [ m (Y)] = mlim !1
m X i=1
m X i=1
am;i P fX 2 Um;i g ; bm am;i P fY 2 Um;i g ; bm
Then, if X st Y, E[
(X)] E [ (Y)] :
3.2.2 Multivariate likelihood ratio ordering
Also the notion of lr ordering admits the following natural extension to the multidimensional case (see Karlin and Rinott, 1980; Whitt, 1982). For two n-dimensional random vectors X; Y with joint densities fX and fY , the condition X lr Y is de ned as follows: De nition 3.20. X is smaller than Y in the likelihood ratio (written X lr Y) if, for any pair of vectors u,v 2 Rn fY (u _ v)fX (u ^ v) fX (u)fY (v) (3.4) where u ^ v (u1 ^ v1 ; :::; un ^ vn ); u _ v (u1 _ v1; :::; un _ vn ):
Remark 3.21. Note that, by taking u v, from the de nition of lr , we immediately obtain
fX (v) fX (u) : fY (v) fY (u)
(3.5)
In the one-dimensional case such a condition just amounts to state X lr Y. In the multivariate case, on the contrary, the two condition are in general not equivalent. As it is easy to show, the condition (3.5), however, implies that X lr Y if X is MTP 2 , according to the next De nition 3.38 (see Kochar, 1999, p. 351); the condition implies that X st Y if X is associated, according to the next De nition 3.37 (see Shaked and Shanthikumar, 1994, p. 118). A fundamental property is the following result, which shows that the lr order is maintained under conditioning upon a suitable class of positive probability events (see e.g. Whitt, 1982 or Shaked and Shanthikumar, 1994, Theorem 4.E.1). The set A Rn is said to be a lattice if the following implication holds: ; 0 2 A ) ^ 0 2 A; _ 0 2 A: (3.6) Theorem 3.22. Let X lr Y and let A be a subset with the lattice property and such that P fX 2 Ag > 0; P fY 2 Ag > 0. Then
XjX 2 A lr YjY 2 A: A further fundamental property of the lr order lies in that it is maintained when passing to marginal distributions: if X lr Y and I f1; :::; ng then XI lr YI :
3.2.3 Multivariate hazard rate and cumulative hazard rate orderings Here we recall the de nitions of two other concepts of multivariate stochastic ordering, namely the multivariate hazard rate ordering and the cumulative hazard rate ordering. In order to de ne them we have to look at vectors of non-negative random quantities, thought of as lifetimes of individuals U1 ; :::; Un . Their probability distributions are assumed to be absolutely continuous. Let Y and X be two n-dimensional vectors of life-times as above and let j (j) and j(j) be the corresponding multivariate hazard functions. De nition 3.23. X is smaller than Y in the multivariate hazard rate ordering (written X hr Y) if 8u > 0,
rjJ (ujxJ ) rjI (ujyI ); whenever I J , r 2= J , xi yi for all i 2 I . In order to de ne the cumulative hazard ordering the following two de nitions are needed.
De nition 3.24. For a dynamic history ht fXI = xI ; XIe > teg, where I fi1; :::; ijI jg, xi1 xi2 ::: xijIj , the cumulative hazard of a component i 2 Ie at time t is de ned by ijI (tjxI ) = jI j Z xij X j =2 xij;1
Z xi1 0
ij; (u)du+
ijfi1 ;:::;ij;1 g (ujxi1 ; xi2 ; :::; xij;1 )du +
Zt xijI j
ijI (ujxI )du
De nition 3.25. For two dynamic histories ht fXI = yI ; XIe > teg; h0s fXJ = xJ ; XJe > seg with xI < te, yJ < se, we say that ht is less severe than h0s whenever 0 < t s; I J; xI yI : Let X (X1 ; :::; Xn ) and Y (Y1 ; :::; Yn ) be two n-dimensional vectors of lifetimes and let j(j) and j(j) be the corresponding cumulative hazard functions.
De nition 3.26. X is smaller than Y in the cumulative hazard rate ordering (written X ch Y) if 8t > 0, rjJ (tjxJ ) rjI (tjyI ) whenever I J; r 2= J; xi yi for all i 2 I; i.e. when ht fYI = yI ; YIe > teg is less severe than
h0t
fXJ = xJ ; YJe > teg:
3.2.4 Some properties of multivariate stochastic orderings and examples
Remark 3.27. Consider the special case when both the vectors X (X1; :::; Xn) and Y (Y1 ; :::; Yn ) have stochastically independent components.
Denote by F i the one-dimensional survival functions of Yi (i = 1; :::; n) and by Gi the one-dimensional survival functions of Xi (i = 1; :::; n), respectively. Assume that F i and Gi , respectively, admit density functions fi , gi . Denote by ri , qi their hazard rate functions and by Ri , Qi their cumulative hazard functions. Which are the meanings of the above notions of multivariate ordering in such a case? One can easily check the following: X lr Y if and only if Xi lr Yi ; i = 1; :::; n: X hr Y if and only if qi (u) ri (u); 8u 0; i = 1; :::; n: X ch Y if and only if Ri (u) Qi(u); 8u 0; i = 1; :::; n: X st Y if and only if Xi st Yi ; i = 1; :::; n: Note, however, that the condition Ri (u) Qi (u); 8u 0 is equivalent to Xi st Yi . Then the two conditions X ch Y and X st Y are equivalent in this special case. A fundamental fact, in general, is the following chain of implications. Theorem 3.28. (See Shaked and Shanthikumar, 1994). Let Y and X be two non-negative random vectors with absolutely continuous distributions. Then the condition X lr Y implies X hr Y. X hr Y implies X ch Y. X ch Y implies X st Y. Remark 3.29. The above chain of implications is analogous to the one holding for univariate orderings where, however, the two concepts X ch Y and X st Y do coincide. When X and Y are not non-negative vectors, one can in any case establish that X lr Y implies X st Y.
The de nitions given so far will now be illustrated by means of examples and counter-examples, taken from exchangeable models considered in Chapter 2. Example 3.30. (hr and lr comparisons ). In Example 2.37 of Chapter 2, we saw that, for the special Ross model characterized by a system of m.c.h.r's of the form
(tn;h) (t1 ; :::; th ) = n ; h ;
n
the corresponding joint density is given by f (n) (t1 ; :::; tn ) = n! expf; t(n) g: Consider now two such models corresponding to a pair of dierent values and b, with 0 < < b, and compare f (n)(t) with fb(n)(t) and f(tk) ()g with fb(tk) ()g. Let T and Tb denote two vectors of random lifetimes with joint density ( n ) f (t) and fb(n) (t) , respectively. It is easy to check that Tb hr T and Tb lr T. Example 3.31. (Comparisons in cases of negative dependence). Let T (T1 ; T2 ) have an exchangeable joint density characterized by the set of m.c.h.r. functions + a (1) (2) t = 2 ; t (t1 ) = ; where a > 0. When a > , we have a case of negative dependence. Consider now Tb (T1 ; T2 ) with a joint distribution de ned similarly but with b + a (1) b(2) t = 2 ; t (t1 ) = b; (2) (1) (1) where b > . Since (2) t < bt and t (t1 ) < bt (t1 ), 80 t t1 , we can wonder whether it is Tb hr T. This is not necessarily the case, since the condition Tb hr T also requires that (2) t bt (t1 ), 8 0 < t1 t, and this condition is not satis ed if a > 2b ; . Denote by FbR;0 the joint predictive survival function of T (T1 ; :::; Tn ) in the statistical model of \proportional hazards" considered in Examples 2.18 and 2.40. is a non-negative parameter with a prior density 0 () and T1 ; :::; Tn are conditionally i.i.d given , with a conditional distribution of the form G(sj) = expf; R(s)g; i.e.
FbR;0 (t1 ; :::; tn )
Z1 0
expf;
n X i=1
R(ti )g0 ()d:
(3.7)
We saw that, given a history ht fTI = tI ; TIe > teg, the posterior density of is given by
(jht ) / h expf;[
h X i=1
R(ti ) + (n ; h)R(t)]g0 ()
and the multivariate conditional hazard rate for a surviving individual Uj is
j (tjht ) = (tn;h) (t1 ; :::; th ) = r(t) E (jht ); where r(t) R0 (t).
(3.8)
In what follows we show that, by comparing distributions of the form (3.7), we can get further examples and counter-examples concerning notions of multivariate stochastic orderings. Example 3.32. ( hr comparison). Consider two dierent initial densities 0 () and e0 () with e() lr 0 (). By using arguments similar to those in Remark 3.14, we see that, for any history ht ,
e(jht ) lr (jht ); where
X e(jht ) / h expf;[ R(ti ) + (n ; h)R(t)]g e0 (): h
i=1
On the other hand, by Remark 3.15, we can also see that if we consider a dierent history h0t , with ht less severe than h0t , then we obtain
(jht ) lr (jh0t ); e(jht ) lr e(jh0t )
whence
e(jht ) lr (jh0t ): By taking into account the expression (3.8) for the multivariate conditional hazard rate, we can conclude
FbR;0 hr FbR;e0
Example 3.33. Here we continue the example above, considering the special case of a comparison between gamma distributions for ; more precisely we take
0 () / 1[>0];1 expf; g
and
e0 () / 1[>0];1 expf; 0 g: In order to get the condition e() lr 0 (), we set 0 > . By specializing the formula (2.29) of Chapter 2, we see that, with the above choice for 0 ()and e0 (), the joint density functions of FbR;0 and FbR;e0 respectively are
fR;0 (t1 ; :::; tn ) / [ + Pn 1R(t )]+n ; i=1 i fR;e0 (t1 ; :::; tn ) / [ 0 + Pn 1 R(t )]+n i=1
i
It can be easily checked that the inequality (3.4) holds and then FbR;0 lr
FbR;e0 :
Example 3.34. For an arbitrarily xed prior probability density 0 over [0; +1), and two absolutely continuous cumulative hazard functions R(s), Re(s) consider the two joint distributions identi ed by FbR;0 and FbRe;0 . One can check (see Exercise 3.84) that the following implication holds true:
R(s) Re(s) ) FbR;0 st FbRe;0 : However the inequality R(s) Re(s) does not necessarily guarantee the condition FbR;0 hr FbRe;0 . Indeed, consider again 0 () / 1[>0];1 expf; g and the histories ht = h0t fTI = tI ; TIe > teg. We can write, with obvious meaning of notation,
j (tjht ) = (tn) = (t) E (jht ) = + n(t)R(t) ej (tjht ) = e(tn) = e(t) E (jht ) = e(t)e : + n R(t) Of course we can nd functions R and Re such that R Re and
(t) e(t) + n R(t) < + n Re(t) ;
for some t > 0.
3.3 Some notions of dependence
Informally, a notion of dependence for a random vector X (X1 ; :::; Xn ) can be looked at as follows: it is a property concerning stochastic orderings and stochastic monotonicity for conditional distributions of a set of coordinates, given the observation of various types of events concerning X. Dependence conditions, on the other hand, may give rise to inequalities between dierent distributions having the same set of (one-dimensional) marginal distributions. The concept of copula has fundamental importance in this respect (see e.g. Nelsen, 1999). Actually a property of dependence is one of the probability distributions of X. Denote by D the class of all probability distributions on Rn which share a xed dependence property D. Since we usually identify the distribution of X by means of its joint survival function F , it will be also written F 2 D to mean that the random vector X satis es the property D.
3.3.1 Positive dependence
In what follows we recall the de nitions of some concepts of positive dependence. Let us start with those which are mostly well known.
De nition 3.35. X is positively correlated, written F 2 PC , if Cov(Xi ; Xj ) 0 for any pair 1 i = 6 j n, provided the covariances do exist. De nition 3.36. X is positively upper orthant dependent, written F 2 PUOD , if
\n
Yn
i=1
i=1
P f (Xi > xi )g
P f(Xi > xi )g
for any choice of xi 2 R (1 i n). In other words F 2 PUOD if and only if random vectors of the form (1(X1 >x1 ) ; :::; 1(Xn >xn) ) are positively correlated. Similarly, one de nes the property F 2 PLOD (positively lower orthant dependent ).
Definition 3.37. X is associated, written F ∈ Assoc, if, for any pair of non-decreasing functions $\psi, \gamma : \mathbb R^n \to \mathbb R$, the pair $(\psi(\mathbf X), \gamma(\mathbf X))$ is positively correlated.
Some other notions of positive dependence can directly originate from notions of stochastic ordering. Consider in particular the multivariate stochastic orderings $\preceq_{lr}$, $\preceq_{hr}$, and $\preceq_{ch}$ defined in the previous section. None of these is actually an ordering in the common sense; indeed none of them has the property of being reflexive: we can easily find random vectors X which do not satisfy $\mathbf X \preceq_{lr} \mathbf X$, $\mathbf X \preceq_{hr} \mathbf X$, or $\mathbf X \preceq_{ch} \mathbf X$. The condition

$\mathbf X \ \preceq_{*}\ \mathbf X,$

where $*$ stands for the symbols lr, hr, or ch, is a condition of positive dependence for X, and this then leads us to the three different concepts defined next.

Definition 3.38. (Karlin and Rinott, 1980). The density of X is multivariate totally positive of order 2, written F ∈ MTP2, if $\mathbf X \preceq_{lr} \mathbf X$, namely if F is absolutely continuous and its density is such that

$f(\mathbf u \wedge \mathbf v)\,f(\mathbf u \vee \mathbf v) \ \ge\ f(\mathbf u)\,f(\mathbf v). \qquad (3.9)$
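As a small illustration of Definition 3.38 (not taken from the book), the following sketch checks inequality (3.9) on a grid for a bivariate standard normal density with non-negative correlation, a standard example of an MTP2 density; the density and the grid are illustrative choices.

```python
import numpy as np
from itertools import product

# Grid check of inequality (3.9) for a bivariate standard normal density
# with correlation rho >= 0; u ^ v and u v v are taken componentwise.

rho = 0.5
norm_const = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))

def f(x, y):
    return norm_const * np.exp(-(x**2 - 2*rho*x*y + y**2) / (2*(1 - rho**2)))

grid = np.linspace(-3, 3, 9)
for u1, u2, v1, v2 in product(grid, repeat=4):
    lhs = f(min(u1, v1), min(u2, v2)) * f(max(u1, v1), max(u2, v2))
    assert lhs >= f(u1, u2) * f(v1, v2) * (1 - 1e-12)
print("inequality (3.9) held at all grid points")
```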
Definition 3.39. (Shaked and Shanthikumar, 1987). A non-negative random vector X is hazard rate increasing upon failure, written F ∈ HIF, if $\mathbf X \preceq_{hr} \mathbf X$, namely if

$\lambda_{i|I}(t\,|\,\mathbf x_I) \ \le\ \lambda_{i|J}(t\,|\,\mathbf y_J) \quad\text{whenever } I \subseteq J,\ i \notin J,\ t\mathbf e_I \ge \mathbf x_I \ge \mathbf y_I,\ \mathbf y_{J\setminus I} \le t\,\mathbf e_{J\setminus I}.$

And finally, in an analogous way, the following definition can be given (see Norros, 1986; Shaked and Shanthikumar, 1990).

Definition 3.40. A non-negative random vector X has supportive lifetimes, written F ∈ SL, if $\mathbf X \preceq_{ch} \mathbf X$.

In order to define a last concept of positive dependence, we consider a non-negative random vector X and compare two different histories $h_t$ and $h'_t$ of the form

$h_t \ \equiv\ \{\mathbf X_I = \mathbf x_I,\ \mathbf X_{\tilde I} > t\mathbf e\}, \qquad (3.10)$

$h'_t \ \equiv\ \{\mathbf X_I = \mathbf x_I,\ X_i = t,\ \mathbf X_{\tilde I\setminus\{i\}} > t\mathbf e\}, \qquad (3.11)$

where $i \in \tilde I$.
Definition 3.41. A non-negative random vector X is weakened by failures, written F ∈ WBF, if

$\mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf x_I,\ X_i = t,\ \mathbf X_{\tilde I\setminus\{i\}} > t\mathbf e\big] \ \preceq_{st}\ \mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf x_I,\ \mathbf X_{\tilde I} > t\mathbf e\big].$

For more details, see Shaked and Shanthikumar (1990); see also Arjas and Norros (1984) and Norros (1985) for different formulations of this concept. Among the notions listed so far there is a chain of implications explained by the following result.
Theorem 3.42. For a non-negative random vector X with an absolutely continuous distribution, one has

MTP2 ⇒ HIF ⇒ SL ⇒ WBF ⇒ Association ⇒ PUOD ⇒ Positive Correlation.

Proof. The chain of implications MTP2 ⇒ HIF ⇒ SL is a direct consequence of Definitions 3.38, 3.39, 3.40 and Theorem 3.28. The implication SL ⇒ WBF can be seen as a corollary of Theorem 3.43 below. The implication WBF ⇒ Association is valid under the condition of absolute continuity (see Arjas and Norros, 1984). The implication Association ⇒ PUOD is an obvious and very well-known consequence of the definition of Association. The implication PUOD ⇒ Positive Correlation is achieved by taking into account the identities

$E(X_iX_j) = \int_0^{\infty}\!\!\int_0^{\infty} P\{X_i > u,\ X_j > v\}\,du\,dv, \quad 1 \le i \ne j \le n,$

and

$E(X_i) = \int_0^{\infty} P\{X_i > u\}\,du.$

The following result shows that the conditions MTP2, HIF, and SL can be characterized in a way which is formally similar to the definition of WBF.
Theorem 3.43. (Shaked and Shanthikumar, 1990). Let X be a non-negative random vector with an absolutely continuous distribution. Then

a) X is SL if and only if
$\mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf t_I, X_i = t, \mathbf X_{\tilde I\setminus\{i\}} > t\mathbf e\big] \ \preceq_{ch}\ \mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf t_I, \mathbf X_{\tilde I} > t\mathbf e\big];$

b) X is HIF if and only if
$\mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf t_I, X_i = t, \mathbf X_{\tilde I\setminus\{i\}} > t\mathbf e\big] \ \preceq_{hr}\ \mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf t_I, \mathbf X_{\tilde I} > t\mathbf e\big];$

c) X is MTP2 if and only if
$\mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf t_I, X_i = t, \mathbf X_{\tilde I\setminus\{i\}} > t\mathbf e\big] \ \preceq_{lr}\ \mathcal L\big[(\mathbf X_{\tilde I\setminus\{i\}} - t\mathbf e)\,\big|\,\mathbf X_I = \mathbf t_I, \mathbf X_{\tilde I} > t\mathbf e\big].$

Also for the concepts of positive dependence presented in this section we can get concrete examples by analyzing, once again, the special exchangeable models introduced in Chapter 2.

Example 3.44. (Proportional hazard models). Consider a proportional hazard model $\hat F_{R,\pi_0}$. This is MTP2 for any choice of the initial density $\pi_0$, as an application of Theorem 3.59 below.

Example 3.45. (Schur-constant densities). Let $\mathbf T \equiv (T_1,\dots,T_n)$ have a Schur-constant density: $f^{(n)}(\mathbf t) = \phi_n\big(\sum_{i=1}^n t_i\big)$ for a suitable function $\phi_n : \mathbb R_+ \to \mathbb R_+$. When $\phi_n(t) = \int_0^{+\infty} \theta^n \exp\{-\theta t\}\,\pi_0(\theta)\,d\theta$, this is a proportional hazard model $\hat F_{R,\pi_0}$, with $R(t) = t$, and then it is MTP2. On the other hand, note that T is MTP2 if and only if

$\phi_n\Big(\sum_{i=1}^n (x_i \wedge y_i)\Big)\,\phi_n\Big(\sum_{i=1}^n (x_i \vee y_i)\Big) \ \ge\ \phi_n\Big(\sum_{i=1}^n x_i\Big)\,\phi_n\Big(\sum_{i=1}^n y_i\Big).$

Then, since $\sum_{i=1}^n (x_i \vee y_i) - \sum_{i=1}^n x_i = \sum_{i=1}^n y_i - \sum_{i=1}^n (x_i \wedge y_i)$, we see that T is MTP2 if and only if $\phi_n$ is log-convex.

As a matter of fact, the latter condition is also a necessary condition for T being WBF (see Caramellino and Spizzichino, 1996). Since MTP2 ⇒ HIF ⇒ SL ⇒ WBF, we see that, in this special case, the properties MTP2, HIF, SL, and WBF are all equivalent.

Example 3.46. (Hazards depending on the number of survivals). Consider the case of an exchangeable vector T where, for any $I \subset \{1,2,\dots,n\}$, $i \notin I$, $\lambda_{i|I}(t\,|\,\mathbf x_I)$ is a quantity not depending on i and depending on I only through its cardinality $|I| = h$:

$\lambda_t^{(n-h)}(x_1,\dots,x_h) = \lambda_{i|I}(t\,|\,\mathbf x_I) = \varphi_h(t).$

We can of course expect that T satisfies some condition of positive dependence when $\varphi_h(t)$ is a non-decreasing function of h. By applying b) in Theorem 3.43, one can actually check that, in such a case, T is HIF.
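The claim of Example 3.46 can be illustrated by simulation. The sketch below (illustrative parameters, n = 2, and not taken from the book) draws lifetimes from the dynamic construction in which each surviving unit has hazard $\varphi_h = \lambda(1+ch)$, non-decreasing in the number h of observed failures, and estimates the covariance of the two lifetimes, which should turn out non-negative.

```python
import numpy as np

# Simulation sketch for Example 3.46 (illustrative parameters, n = 2): each
# surviving unit has hazard phi_h = lam * (1 + c*h), where h is the number of
# failures observed so far; phi_h is non-decreasing in h, so the sample
# covariance of (T1, T2) is expected to be non-negative (HIF => association).

rng = np.random.default_rng(2)
lam, c, n_rep = 1.0, 2.0, 200_000

def draw_pair():
    # first failure: two units, each at hazard phi_0 = lam
    m = rng.exponential(1.0 / (2 * lam))
    # after the failure the survivor's hazard jumps to phi_1 = lam * (1 + c)
    w = m + rng.exponential(1.0 / (lam * (1 + c)))
    # assign (min, max) to (T1, T2) in random order to keep exchangeability
    return (m, w) if rng.random() < 0.5 else (w, m)

t = np.array([draw_pair() for _ in range(n_rep)])
print("sample covariance:", np.cov(t[:, 0], t[:, 1])[0, 1])  # expected > 0
```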
Example 3.47. Here we go back to the case of units undergoing a change of stress level at a random time $\tau$. By Proposition 2.43, we can get

$\lambda_t^{(n-h)}(t_1,\dots,t_h) = r_1(t)\,P\{\tau > t\,|\,T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)} > t\} + r_2(t)\,P\{\tau \le t\,|\,T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)} > t\}.$

Under the condition $r_1(t) \le r_2(t)$ we have

$\lambda_t^{(n-h)}(t_1,\dots,t_h) \ \le\ \lambda_t^{(n-h')}(t'_1,\dots,t'_{h'})$

for two different observed histories

$h_t \ \equiv\ \{T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)}>t\}, \qquad h'_t \ \equiv\ \{T_{(1)}=t'_1,\dots,T_{(h')}=t'_{h'},\ T_{(h'+1)}>t\},$

if and only if

$P\{\tau > t\,|\,h_t\} \ \ge\ P\{\tau > t\,|\,h'_t\}. \qquad (3.12)$

We now want to show that T is WBF. We must then check that the inequality (3.12) holds when

$h_t \ \equiv\ \{T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)}>t\}, \qquad h'_t \ \equiv\ \{T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)}=t_{h+1},\ T_{(h+2)}>t\}.$

This can be seen by using arguments similar to those of Remark 3.15 and applying Theorem 3.63 below (Exercise 3.86). It is worth remarking that checking the WBF property directly would require lengthy computations even in the simplest case, when $\tau$ is an exponential time and the two subsequent stress levels do not depend on the age of the units, namely $r_1(t) = \lambda_1$, $r_2(t) = \lambda_2$.

Remark 3.48. If $\tau$ were known, then the m.c.h.r. functions $\lambda_t^{(n-h)}(t_1,\dots,t_h)$ in the example above would clearly be equal to $r_1(t)$ or to $r_2(t)$ according to whether $\tau > t$ or $\tau \le t$. Then, conditionally on $\tau$, $\lambda_t^{(n-h)}(t_1,\dots,t_h\,|\,\tau)$ would depend neither on $t_1,\dots,t_h$ nor on h; indeed the lifetimes $T_1,\dots,T_n$ are conditionally independent (and identically distributed) given $\tau$.
Example 3.49. (Common mode failures). Consider the case of lifetimes $T_1,\dots,T_n$ where $T_j = V_j \wedge U$, with $V_1,\dots,V_n$ i.i.d. lifetimes and U a random time, independent of $V_1,\dots,V_n$, denoting the time of arrival of a shock. Of course we expect $T_1,\dots,T_n$ to be positively dependent, due to the common dependence on U. It is easy to see that they are associated, as a consequence of Properties 2 and 4 in Remark 3.50 below; a small simulation illustrating this is given after the remark.

Remark 3.50. The concept of association has important structural properties, whose applications in the reliability analysis of coherent systems are very well known (see Barlow and Proschan, 1975). The same properties hold, however, also for the stronger concept of MTP2. In particular, letting the symbol * stand for "associated" or for "MTP2" (* is seen as a condition on a random vector or on a set of random variables), one has:

Property 1. Any single random variable is *.
Property 2. The union of independent * random vectors is *.
Property 3. A subset of a * set of random variables is * (i.e., the property * is maintained for marginal distributions).
Property 4. If $\mathbf X \equiv (X_1,\dots,X_n)$ is * and $\psi_1,\dots,\psi_m : \mathbb R^n \to \mathbb R$ are increasing functions, then $(\psi_1(\mathbf X),\dots,\psi_m(\mathbf X))$ is *.
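Referring back to Example 3.49, the following Monte Carlo sketch (with illustrative rates, not taken from the book) estimates a pairwise covariance for the common-mode model $T_j = V_j \wedge U$; a non-negative value is expected, in agreement with the association property.

```python
import numpy as np

# Common-mode-failure model of Example 3.49 (illustrative rates):
# T_j = V_j ^ U with V_1,...,V_n i.i.d. exponential and U an independent
# exponential shock time.  The estimated pairwise covariance should be >= 0.

rng = np.random.default_rng(3)
n, n_rep = 3, 200_000
v = rng.exponential(1.0, size=(n_rep, n))      # individual lifetimes V_j
u = rng.exponential(2.0, size=(n_rep, 1))      # common shock time U
t = np.minimum(v, u)                           # T_j = V_j ^ U
print("Cov(T1, T2) estimate:", np.cov(t[:, 0], t[:, 1])[0, 1])  # expected >= 0
```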
Remark 3.51. As is self-evident, there are some basic differences between the notions of positive correlation, PUOD, association and MTP2 on one side, and the notions of WBF, SL, and HIF on the other side. The latter notions are of "dynamic type", i.e. they are formulated in terms of the conditional distribution of the residual lifetimes $(X_j - t)$, $j \in \tilde I$, given an observed history of the form (3.2) (with $I \subset \{1,2,\dots,n\}$). These notions, which are essentially considered only for the case of non-negative random vectors, have been defined in the frame of reliability theory and survival analysis, and their interest has been essentially limited to such a context. On the contrary, the other notions are of interest in all fields of application of probability. It is interesting to remark that, in particular, the notions of MTP2 and association have been dealt with (under different terminology and, up to a certain extent, almost independently) in the two different fields of reliability and statistical mechanics. The implication MTP2 ⇒ Association dates back to Sarkar (1969), Fortuin, Ginibre and Kasteleyn (1971), and Preston (1974); see also the book by Ligget (1985).
3.3.2 Negative dependence
Here we mention a few definitions of negative dependence. Also for this case, many definitions have been presented in the literature; we shall only consider some which may be seen as the negative counterparts of Definitions 3.36, 3.37, 3.38 and, finally, the "condition N".
Examples of exchangeable distributions which exhibit negative dependence properties will be given in some of the exercises at the end of this chapter.

Definition 3.52. $X_1,\dots,X_n$ are negatively upper orthant dependent, written F ∈ NUOD, if

$P\Big\{\bigcap_{i=1}^{n} (X_i > x_i)\Big\} \ \le\ \prod_{i=1}^{n} P\{X_i > x_i\}$

for any choice of $x_i \in \mathbb R$ ($1 \le i \le n$). In other words, F ∈ NUOD if and only if random vectors of the form $(1_{(X_1>x_1)},\dots,1_{(X_n>x_n)})$ are negatively correlated. $X_1,\dots,X_n$ are negatively lower orthant dependent, written F ∈ NLOD, if

$P\Big\{\bigcap_{i=1}^{n} (X_i < x_i)\Big\} \ \le\ \prod_{i=1}^{n} P\{X_i < x_i\}.$

$X_1,\dots,X_n$ are negatively orthant dependent, written F ∈ NOD, if they are both NUOD and NLOD.

Definition 3.53. (Joag-Dev and Proschan, 1983). $X_1,\dots,X_n$ are negatively associated, written F ∈ NA, if for disjoint subsets I and J of $\{1,\dots,n\}$ and increasing functions $\psi$ and $\gamma$,

$\mathrm{Cov}\big(\psi(\mathbf X_I),\ \gamma(\mathbf X_J)\big) \ \le\ 0.$
Definition 3.54. (Karlin and Rinott, 1980b). Let $X_1,\dots,X_n$ admit a joint density function f. f is multivariate reverse regular of order 2 (MRR2) if

$f(\mathbf u \wedge \mathbf v)\,f(\mathbf u \vee \mathbf v) \ \le\ f(\mathbf u)\,f(\mathbf v). \qquad (3.13)$

A different concept, which is similar to, but weaker than, the latter definition is reverse regular of order 2 in pairs.

Definition 3.55. (Block, Savits and Shaked, 1982). $\mathbf X \equiv (X_1,\dots,X_n)$ is RR2 in pairs if, for any $1 \le i \ne j \le n$,

$P\{a \le X_i \le b,\ c \le X_j \le d\}\;P\{a' \le X_i \le b',\ c' \le X_j \le d'\} \ \le\ P\{a \le X_i \le b,\ c' \le X_j \le d'\}\;P\{a' \le X_i \le b',\ c \le X_j \le d\}$

whenever $b < a'$, $d < c'$. A stronger notion is, on the contrary, given in the following
Definition 3.56. (Block, Savits and Shaked, 1982). $\mathbf X \equiv (X_1,\dots,X_n)$ satisfies condition N, written F ∈ N, if there exist a real number s and n+1 independent random variables $S_1,\dots,S_{n+1}$, each with a log-concave density (or PF2 probability function), such that

$(X_1,\dots,X_n) \ =_{st}\ \Big[(S_1,\dots,S_n)\ \Big|\ \sum_{i=1}^{n+1} S_i = s\Big].$

See the papers by Ebrahimi and Ghosh (1981), Karlin and Rinott (1980), Block, Savits and Shaked (1982) and Joag-Dev and Proschan (1983) for several properties of the above concepts and for further definitions of negative dependence. In any case, some fundamental properties are the following:

Property 1. A set of independent random variables is *, where * stands for NA, MRR2, NUOD, NLOD.
Property 2. A subset of * random variables is *, where * stands for NA, NUOD, NLOD.
Property 3. The union of independent sets of * variables is *, where * stands for NA, MRR2, NUOD, NLOD.
Property 4. If $X_1,\dots,X_n$ are * and $\psi_1,\dots,\psi_n$ are increasing functions, then $\psi_1(X_1),\dots,\psi_n(X_n)$ are *, where * stands for NA, RR2 in pairs, NUOD, NLOD.

Actually, for NA something stronger holds: increasing functions defined on disjoint subsets of a set of NA variables are NA.
3.3.3 Simpson-type paradoxes and aspects of dependence in Bayesian analysis
The present subsection will be devoted to the discussion of some aspects, concerning dependence, which arise when probability is understood in a subjective sense. These aspects will also be relevant in our discussion about concepts of multivariate aging. First we notice that exchangeable models are likely to manifest relevant properties of dependence: often, cases of finite extendibility arise from negative, symmetric dependence (think for instance of the case in Example 2.4 of Chapter 2), while cases of infinite extendibility, i.e. cases of conditionally i.i.d. quantities, may manifest some form of positive dependence (in this respect see Theorems 3.57-3.59 below). In general, we point out that the kind of dependence among the variables of interest is sensitive to the state of information of the observer about the same variables.
As already observed, dependence can be created as an effect of uncertainty about some relevant quantities. In particular, variables which are conditionally independent given a quantity $\Theta$ are actually stochastically dependent in their (unconditional) distribution, unless the distribution of $\Theta$ is degenerate. In Examples 3.44 and 3.47 above we found properties of positive dependence for conditionally i.i.d. lifetimes with conditional densities $g(t|\theta)$ satisfying certain conditions of stochastic monotonicity in $\theta$. These provide instances of the following general fact: when $X_1,\dots,X_n$ are conditionally independent given a finite-dimensional parameter $\Theta$, we can get positive dependence properties by combining stochastic monotonicity conditions on the conditional distributions $G_i(x|\theta)$ with positive dependence conditions on $\Theta$ (no such condition is needed when $\Theta$ is a scalar). Some results in this direction are the following.

Theorem 3.57. (Jogdeo, 1978). Let $\Theta$ be an associated vector and let $G_i(x|\theta)$ be stochastically increasing in $\theta$, with respect to the usual stochastic order. Then $(X_1,\dots,X_n)$ is associated.

Theorem 3.58. (Shaked and Spizzichino, 1998). Let $\Theta$ be a scalar and let $G_i(x|\theta)$ be absolutely continuous and stochastically increasing in $\theta$, with respect to the one-dimensional hazard rate order. Then $(X_1,\dots,X_n)$ is WBF.

Theorem 3.59. (Shaked and Spizzichino, 1998). Let $\Theta$ be a scalar and let $G_i(x|\theta)$ be absolutely continuous and stochastically increasing in $\theta$, with respect to the one-dimensional likelihood ratio order. Let also the support of $(X_1,\dots,X_n)$ be a lattice. Then $(X_1,\dots,X_n)$ is MTP2.

Theorems 3.58 and 3.59 can be extended to the case when $\Theta$ is multidimensional, by adding suitable MTP2 properties for the joint distribution of $\Theta$ (see Shaked and Spizzichino, 1998). On this theme see also Cinlar, Shaked and Shanthikumar (1989) and Szekli (1995). Exercises 3.87, 3.88, 3.94 provide examples related to the above results.

Remark 3.60. In conclusion, we see that cases of positive dependence arise from situations of conditional independence, given some relevant parameters, and are due just to lack of knowledge about the same parameters. We are particularly interested in the case of lifetimes $T_1,\dots,T_n$ which, conditionally on $\Theta$, are identically distributed, besides being independent; this is then the case when $G_i(\cdot|\theta)$ does not depend on the index i. Notice that in such situations the one-dimensional marginal of the predictive survival function
$F^{(n)}(t_1,\dots,t_n) = \int G(t_1|\theta)\cdots G(t_n|\theta)\,d\Pi_0(\theta)$

is the mixture-type survival function

$F^{(1)}(t) = \int G(t|\theta)\,d\Pi_0(\theta).$
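For a concrete instance of the mixture formula above, the following sketch (assuming conditionally exponential lifetimes and a gamma prior, both illustrative choices not taken from the book) computes the predictive survival function by numerical integration and compares it with its closed form.

```python
import math
import numpy as np
from scipy.integrate import quad

# Mixture-type predictive survival function, with assumed ingredients:
# G(t|theta) = exp(-theta*t) and a gamma(alpha, lam) prior density pi_0;
# the mixture then has the closed form (lam / (lam + t))**alpha.

alpha, lam = 2.0, 1.5

def pi0(theta):
    return lam**alpha * theta**(alpha - 1) * math.exp(-lam * theta) / math.gamma(alpha)

def predictive_survival(t):
    # F^(1)(t) = integral of G(t|theta) * pi_0(theta) d(theta)
    val, _ = quad(lambda th: math.exp(-th * t) * pi0(th), 0, np.inf)
    return val

for t in (0.5, 1.0, 2.0):
    print(t, predictive_survival(t), (lam / (lam + t)) ** alpha)
```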
This will have an effect on the study of multivariate aging properties of $F^{(n)}$ (see Subsection 4.3). Notice also that these situations of positive dependence can be viewed as conceptually different from other cases of positive dependence, such as the one considered in Example 3.46 above (see also Remark 2.41 in Section 3 of Chapter 2). Consider now cases which are more general than conditional independence; again we may want to compare dependence properties of conditional distributions (given the values of relevant parameters) with dependence properties of the corresponding predictive (i.e. unconditional) distributions. In some cases, lack of knowledge about parameters may create some positive dependence; in other cases, lack of knowledge may destroy some type of dependence present in the conditional distributions. When the latter happens we say that we meet a "Simpson-type paradox"; i.e. we say that a Simpson-type paradox appears when the conditional law of a random vector X exhibits a dependence property for every possible value of the conditioning random vector $\Theta$, but does not exhibit the same property unconditionally. More formally, let D be a property of dependence, $\mathcal D$ the family of joint distributions with the property D, and let $(\mathbf X,\Theta)$ be a pair of random vectors; the paradox can then be described as follows:

$\mathcal L(\mathbf X\,|\,\Theta=\theta) \in \mathcal D,\ \forall \theta \in L, \quad\text{and}\quad \mathcal L(\mathbf X) \notin \mathcal D, \qquad (3.14)$

where L is the support of $\Theta$. In the language of Samuels (1993), this is an association distortion phenomenon. For more details and references about Simpson-type paradoxes see Scarsini and Spizzichino (1999); see also Aven and Jensen (1999). For the purpose of comparing dependence properties of conditional distributions (given the values of some relevant parameters) with dependence properties of the predictive (unconditional) distributions, it can be useful, for instance, to take into account the above Theorem 3.43 and to combine the formulae (2.52), (2.53), and (2.59) of Chapter 2. In order to specifically analyze dependence for the predictive distributions, we take into account monotonicity properties of $\lambda_t^{(n-h)}(t_1,\dots,t_h|\theta)$ and $\lambda_t^{(n)}(\theta)$ and stochastic comparisons between the posterior distributions of the parameter $\Theta$ given different observed histories. In this respect we note that, extending arguments presented in Remark 3.15, it is often possible to compare different posterior distributions in the sense of the likelihood ratio, as will be discussed in the next subsection.
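A toy numerical illustration of the association distortion phenomenon (all ingredients are invented for the purpose and are not taken from the book): conditionally on $\Theta$ the pair below is positively correlated for every value of $\Theta$, but the unconditional covariance is negative, so the property "positively correlated" is an instance of (3.14).

```python
import numpy as np

# Given Theta = theta, (X, Y) is bivariate normal with correlation +0.8 and
# mean (theta, -theta); Theta takes the values -3 and +3 with probability 1/2.
# Conditionally the pair is positively correlated for every theta, while
# unconditionally Cov(X, Y) = 0.8 - Var(Theta) < 0.

rng = np.random.default_rng(4)
n_rep = 200_000
theta = rng.choice([-3.0, 3.0], size=n_rep)
z = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n_rep)
x, y = theta + z[:, 0], -theta + z[:, 1]
print("unconditional Cov(X, Y):", np.cov(x, y)[0, 1])   # expected about -8.2
```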
3.3.4 Likelihood-ratio comparisons between posterior distributions

Likelihood-ratio orderings among posterior distributions can be obtained
by using orderings for the conditional distribution of the data given the parameters (see e.g. Fahmy et al., 1982). Statements of this type were presented in Section 3.1 of this chapter (Remark 3.15), for the special case when both the parameter $\Theta$ and the statistical observation X are scalar random quantities. The definitions of dependence presented so far put us in a position to extend such an analysis to the multivariate case.

For the following three results we consider the case when $(\mathbf X,\Theta)$ admits a joint density $f_{\mathbf X,\Theta}(\mathbf x,\theta)$, where $\Theta \equiv (\Theta_1,\dots,\Theta_d)$ is interpreted as a parameter and $\mathbf X \equiv (X_1,\dots,X_n)$ as a vector of observations. $\pi_0(\theta)$ will denote the prior density of $\Theta$; $f_{\mathbf X}(\mathbf x|\theta)$ will denote the conditional density of X given $(\Theta=\theta)$, and $\pi(\theta|\mathbf x)$ will denote the posterior density of $\Theta$, given a result $(\mathbf X = \mathbf x)$. When X is a vector of random lifetimes, the notation $\pi(\theta|h_t)$ will be used for the posterior distribution of $\Theta$, given a history of the form (3.10). We prefer to postpone the proofs of the following results to the end of this subsection.

The following result is a slight modification of Theorem 3 in Fahmy et al. (1982). Here we assume that the set $L \equiv \{\theta \in \mathbb R^d\,|\,\pi_0(\theta) > 0\}$ is a lattice (see (3.6)).

Theorem 3.61. Assume that

a) for suitable functions $\tau : \mathbb R^n \to \mathbb R^m$, $h : \mathbb R^n \to \mathbb R_+$, and $g : \mathbb R^{m+d} \to \mathbb R_+$, $f_{\mathbf X}(\mathbf x|\theta)$ has the form

$f_{\mathbf X}(\mathbf x|\theta) = h(\mathbf x)\,g(\tau(\mathbf x),\theta);$

b) $g(q_1,\dots,q_m;\theta_1,\dots,\theta_d)$ and $\pi_0(\theta_1,\dots,\theta_d)$ are MTP2.

Then, for every $\mathbf x, \mathbf x'$ such that $\tau(\mathbf x) \ge \tau(\mathbf x')$,

$\pi(\cdot|\mathbf x) \ \succeq_{lr}\ \pi(\cdot|\mathbf x').$

Note that condition a) is equivalent to $\tau(\mathbf x)$ being a sufficient statistic for $\Theta$. Of course one can have in particular $m = n$ and $\tau(\mathbf x) \equiv \mathbf x$. In our setting, i.e. in the case when X is a vector of lifetimes, one may however need to consider observed data of the form (3.10) rather than $\{\mathbf X = \mathbf x\}$. This motivates the interest in the following two results. For simplicity we restrict attention to the case $d = 1$ (similar results might be proved also in the case $d > 1$, but, again, at the cost of more involved arguments and of additional MTP2 conditions for related functions of $(\theta_1,\dots,\theta_d)$).

Theorem 3.62. Assume $f_{\mathbf X}(\mathbf x|\theta) \succeq_{lr} f_{\mathbf X}(\mathbf x|\theta')$ for $\theta > \theta'$ (i.e. X is increasing in $\theta$, in the lr sense) and let $h'_t$ be more severe than $h_t$. Then

$\pi(\cdot|h_t) \ \succeq_{lr}\ \pi(\cdot|h'_t).$
In the next result we consider the special case of more severeness defined by (3.10) and (3.11) (a survival at t is replaced by a failure at t in the definition of $h'_t$). This allows us to achieve a result similar to Theorem 3.62 under a weaker condition. The interest of such a result lies in the fact that it can be combined with Theorem 3.43 in order to prove properties of positive dependence.

Theorem 3.63. Let $\mathbf X \equiv (X_1,\dots,X_n)$ be increasing in $\theta$ in the multivariate hazard rate ordering; then

$\pi(\cdot|h_t) \ \succeq_{lr}\ \pi(\cdot|h'_t).$

Remark 3.64. It can be of more general interest, in Bayesian analysis, to ascertain the existence of relations of stochastic ordering between two posterior distributions corresponding to two different sets of data; such stochastic orderings can be used to obtain orderings between Bayes decisions, as we shall sketch in the last chapter. In particular, likelihood ratio orderings (and the related variation sign diminishing property) have a very important role in decision problems, as pointed out in the classical paper by Karlin and Rubin (1956).

We now turn to proving the results enunciated above.

Proof of Theorem 3.61. First we remark that, by Bayes' formula, one has

$\pi(\theta|\mathbf x) = K(\mathbf x)\,\pi_0(\theta)\,g(\tau(\mathbf x),\theta),$

where $K(\mathbf x) = \big[\int_L \pi_0(\theta)\,g(\tau(\mathbf x),\theta)\,d\theta\big]^{-1}$. Taking into account a), b), and that $\tau(\mathbf x) \ge \tau(\mathbf x')$, we obtain, for any $\theta,\theta' \in L$,

$\pi(\theta\vee\theta'|\mathbf x)\,\pi(\theta\wedge\theta'|\mathbf x') = K(\mathbf x)K(\mathbf x')\,\pi_0(\theta\vee\theta')\,\pi_0(\theta\wedge\theta')\,g\big(\tau(\mathbf x)\vee\tau(\mathbf x'),\,\theta\vee\theta'\big)\,g\big(\tau(\mathbf x)\wedge\tau(\mathbf x'),\,\theta\wedge\theta'\big)$
$\ \ge\ K(\mathbf x)K(\mathbf x')\,\pi_0(\theta)\,\pi_0(\theta')\,g(\tau(\mathbf x),\theta)\,g(\tau(\mathbf x'),\theta') = \pi(\theta|\mathbf x)\,\pi(\theta'|\mathbf x').$
Proof of Theorem 3.62. Let

$h_t \ \equiv\ \{\mathbf X_I = \mathbf x_I,\ \mathbf X_J > t\mathbf e_J,\ \mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}}\},$
$h'_t \ \equiv\ \{\mathbf X_I = \mathbf y_I,\ \mathbf X_J = \mathbf y_J,\ \mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}}\},$

where $0 < y_i \le x_i < t$ for $i \in I$, and $y_j \le t$ for $j \in J$. We can write

$\pi(\theta|h_t) \ \propto\ \pi(\theta\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}})\;f_{\mathbf X_I}(\mathbf x_I\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}};\theta)\;\int_t^\infty\!\!\cdots\!\int_t^\infty f_{\mathbf X_J}(\boldsymbol\xi_J\,|\,\mathbf X_I = \mathbf x_I,\ \mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}};\theta)\,d\boldsymbol\xi_J,$

$\pi(\theta|h'_t) \ \propto\ \pi(\theta\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}})\;f_{\mathbf X_I}(\mathbf y_I\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}};\theta)\;f_{\mathbf X_J}(\mathbf y_J\,|\,\mathbf X_I = \mathbf y_I,\ \mathbf X_{\widetilde{I\cup J}} > t\mathbf e_{\widetilde{I\cup J}};\theta).$

We should check that $\pi(\theta|h_t)/\pi(\theta|h'_t)$ is a non-decreasing function of $\theta$. Now

$\frac{\pi(\theta|h_t)}{\pi(\theta|h'_t)} = \int_t^\infty\!\!\cdots\!\int_t^\infty z(\boldsymbol\xi_J;\theta;\mathbf x_I,\mathbf y_I,\mathbf y_J)\,d\boldsymbol\xi_J,$

where

$z(\boldsymbol\xi_J;\theta;\mathbf x_I,\mathbf y_I,\mathbf y_J) \ \equiv\ \frac{f_{\mathbf X_I}(\mathbf x_I\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta)\;f_{\mathbf X_J}(\boldsymbol\xi_J\,|\,\mathbf X_I = \mathbf x_I,\ \mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta)}{f_{\mathbf X_I}(\mathbf y_I\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta)\;f_{\mathbf X_J}(\mathbf y_J\,|\,\mathbf X_I = \mathbf y_I,\ \mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta)} = \frac{f_{\mathbf X_I,\mathbf X_J}(\mathbf x_I,\boldsymbol\xi_J\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta)}{f_{\mathbf X_I,\mathbf X_J}(\mathbf y_I,\mathbf y_J\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta)}.$

Since, for $\theta > \theta'$, $f_{\mathbf X}(\mathbf x|\theta) \succeq_{lr} f_{\mathbf X}(\mathbf x|\theta')$, it also follows, by Theorem 3.22, that

$f_{\mathbf X}(\cdot\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta) \ \succeq_{lr}\ f_{\mathbf X}(\cdot\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta').$

Then, by the fundamental property that $\preceq_{lr}$ is maintained for marginal distributions, it is, for $\theta > \theta'$,

$f_{\mathbf X_I,\mathbf X_J}(\cdot\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta) \ \succeq_{lr}\ f_{\mathbf X_I,\mathbf X_J}(\cdot\,|\,\mathbf X_{\widetilde{I\cup J}} > t\mathbf e;\theta').$

By taking into account Remark 3.21, we can conclude the proof by noting that $z(\boldsymbol\xi_J;\theta;\mathbf x_I,\mathbf y_I,\mathbf y_J)$ must be a non-decreasing function of $\theta$, since $\mathbf x_I \ge \mathbf y_I$ and $\boldsymbol\xi_J \ge t\mathbf e \ge \mathbf y_J$.
Proof of Theorem 3.63. Note that, by definition of the m.c.h.r. functions, the likelihood functions respectively associated with the observations $h_t$ and $h'_t$ are such that $L_{h'_t}(\theta) = \lambda_{i|I}(t\,|\,\mathbf x_I;\theta)\,L_{h_t}(\theta)$. X increasing in $\theta$ in the multivariate hazard rate ordering in particular implies that $\lambda_{i|I}(t\,|\,\mathbf x_I;\theta)$ is a non-increasing function of $\theta$. Then $\pi(\theta|h'_t)/\pi(\theta|h_t)$ is non-increasing in $\theta$.
3.4 Some notions of aging

Notions of aging are introduced, roughly speaking, to compare conditional survival probabilities for residual lifetimes of the type (3.1), for different ages of some of the surviving individuals. Some of these notions will be recalled in this section. Our main purpose is to highlight part of the motivation for introducing notions of multivariate aging, which we shall call "Bayesian", in the analysis of exchangeable lifetimes; Bayesian notions will be studied in the next chapter. Hence we first recall some well-known concepts of one-dimensional aging; we also see the role of notions of univariate stochastic orderings in characterizing some of these concepts. Later, we shall see how the multivariate concepts of ordering introduced in Section 3.2 are used to formulate some definitions of aging for vectors of random lifetimes. In the last subsection a short discussion will be presented about the range of applicability of such notions, in the frame of exchangeable lifetimes.
3.4.1 One-dimensional notions of aging
Let T be a lifetime with survival function $F(t)$ and let

$F_s(t) = \frac{F(t+s)}{F(s)} = P\{T - s > t\,|\,T > s\}$

denote the survival function of $T - s$, the residual lifetime at age s, conditional on survival at s. The following definitions are well known. The distribution of T is IFR (increasing failure rate) if

$F_s(t)\ \text{is non-increasing in}\ s,\ \forall t > 0 \qquad (3.15)$

or, equivalently, $F$ is log-concave.
The distribution of T is DFR (decreasing failure rate) if

$F_s(t)\ \text{is non-decreasing in}\ s,\ \forall t > 0 \qquad (3.16)$

or, equivalently, $F$ is log-convex. The distribution of T is IFRA (increasing failure rate in average) if

$\frac{-\log F(t)}{t}\ \text{is non-decreasing in}\ t. \qquad (3.17)$

The distribution of T is NBU (new better than used) if

$F_s(t) \ \le\ F(t). \qquad (3.18)$

The distribution of T is NWU (new worse than used) if

$F_s(t) \ \ge\ F(t). \qquad (3.19)$

The properties (3.15), (3.17), and (3.18) define notions of (one-dimensional) positive aging. Of course (3.15) is a special case of (3.17) which, in its turn, is a special case of (3.18); (3.16) is a special case of (3.19). If the distribution admits a density function f, and then a failure rate function r(t), the properties (3.15) and (3.16) are equivalent to r(t) being non-decreasing and non-increasing, respectively.

Fix now a pair $s'' > s' \ge 0$ and consider the conditions

a) $F_{s'}(t) \ \succeq_{st}\ F_{s''}(t),\ \forall t > 0$,

and

b) $F_{s'}(t) \ \succeq_{hr}\ F_{s''}(t),\ \forall t > 0$.

a) is equivalent to

a') $\dfrac{F(s''+t)}{F(s'+t)} \ \le\ \dfrac{F(s'')}{F(s')},\ \forall t > 0,$

while b) is equivalent to

b') $\dfrac{F(s''+t)}{F(s'+t)}$ decreasing in t.
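Conditions a') and b') are easy to visualize numerically. The following sketch (with an illustrative Weibull survival function of shape 2, an IFR law, and illustrative ages $s' < s''$) evaluates the ratio appearing in b') on a grid and checks that it is decreasing in t.

```python
import numpy as np

# Illustration of condition b'): for the Weibull survival function
# F(t) = exp(-t**k) with shape k = 2 (an IFR law), the ratio
# F(s'' + t)/F(s' + t) should be decreasing in t whenever s'' > s'.

k, s1, s2 = 2.0, 0.5, 1.5          # shape, s' and s''

def F(t):
    return np.exp(-t**k)

t = np.linspace(0.0, 3.0, 7)
ratio = F(s2 + t) / F(s1 + t)
print(np.round(ratio, 4))           # expected to be decreasing
assert np.all(np.diff(ratio) <= 1e-12)
```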
Obviously b') is a stronger condition than a'). However, the condition that a') holds for any pair $0 \le s' < s''$ is easily seen to be equivalent to the condition that b') holds for any pair $0 \le s' < s''$, and both are equivalent to the IFR property of F. We can then conclude by noticing that the inequalities

$\mathcal L[(X-t)\,|\,X>t] \ \succeq_{hr}\ \mathcal L[(X-t')\,|\,X>t'] \quad (0 \le t < t') \qquad (3.20)$

and

$\mathcal L[(X-t)\,|\,X>t] \ \succeq_{st}\ \mathcal L[(X-t')\,|\,X>t'] \quad (0 \le t < t') \qquad (3.21)$

are both equivalent to the IFR property of F. This argument suggests, in the case of a lifetime T with a probability density function f, to consider the following stronger concept of positive aging:

$\mathcal L[(T-t)\,|\,T>t] \ \succeq_{lr}\ \mathcal L[(T-t')\,|\,T>t'], \quad\text{for } 0 \le t < t'. \qquad (3.22)$

The comparison (3.22) is equivalent to the condition that the distribution of T is PF2 (Polya frequency of order 2), namely that the density f is a log-concave function on $[0,\infty)$. We note also that the notions of NBU and NWU can be rephrased in terms of stochastic orderings; indeed T is NBU (NWU) if and only if

$\mathcal L(T) \ \succeq_{st}\ (\preceq_{st})\ \mathcal L[(T-t)\,|\,T>t], \quad\text{for } t > 0. \qquad (3.23)$

Let us now consider the case when the survival function of a lifetime T is assessed conditionally on the value of some parameter $\Theta$; denoting the conditional survival function by $G(\cdot|\theta)$ and assuming that $\Theta$ is a random quantity, taking values in a set L with a prior density $\pi_0(\theta)$, we obtain (recall Example 2.9, Chapter 2) that the "predictive" survival function of T is
$F(s) = \int_L G(s|\theta)\,\pi_0(\theta)\,d\theta \qquad (3.24)$

and we can write the predictive survival function of the residual lifetime at age s in the form

$F_s(t) = \int_L G_s(t|\theta)\,\pi_s(\theta\,|\,T>s)\,d\theta,$

where

$G_s(t|\theta) = \frac{G(t+s|\theta)}{G(s|\theta)}, \qquad \pi_s(\theta\,|\,T>s) = \frac{G(s|\theta)\,\pi_0(\theta)}{\int_L G(s|\theta')\,\pi_0(\theta')\,d\theta'}.$
A central point is the comparison between the aging properties of the conditional survival functions $G(t|\theta)$, $\theta \in L$, and of the predictive survival function $F(t)$. The following facts are well known and can have a big impact on the applications.

Proposition 3.65. (see Barlow and Proschan, 1975). If $G(t|\theta)$ is DFR $\forall \theta \in L$, then $F(t)$ is DFR.

Remark 3.66. On the contrary, $G(t|\theta)$ being IFR $\forall \theta \in L$ does not imply that $F(t)$ is IFR.

This fact provides an explanation of an apparent paradox that often arises in engineering practice (see e.g. Proschan, 1963 and Barlow, 1985). It formally describes the following case: we start with a "pessimistic" state of information about $\Theta$ (i.e. we expect that $\Theta$ is such as to prevent a large value of the lifetime T); pessimism is contradicted by the progressive observation of survival, so that we may become more and more optimistic, even if, per se, increasing age deteriorates the unit. These facts can, for example, be illustrated by means of the example considered in Exercise 2.64, where the conditional failure rates are $r(t|\theta) = \theta\,t^{\delta}$ and the predictive failure rate is $r(t) = \alpha\,t^{\delta}/(\lambda + t^{1+\delta})$. For $\delta > 0$, the $r(t|\theta)$ are increasing for all $t>0$, $\theta>0$, while $r(t)$ is increasing in a neighborhood of the origin and decreasing for t large enough. For $-1 < \delta < 0$, the $r(t|\theta)$ are decreasing for all $t>0$, $\theta>0$ and $r(t)$ is decreasing as well. The following remarks are relevant for the arguments that will be developed in the next chapter.
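The phenomenon described in Remark 3.66 can be reproduced with a simple closed-form computation; the ingredients below are illustrative and are not those of Exercise 2.64: conditionally on $\theta$ the lifetime is Weibull with survival function $\exp\{-\theta t^2\}$ (IFR for every $\theta$), while mixing over a gamma prior produces a predictive failure rate which first increases and then decreases.

```python
import numpy as np

# Conditionally on theta, G(t|theta) = exp(-theta * t**2) is IFR for every
# theta.  Mixing over a gamma(alpha, lam) prior gives the predictive survival
# function F(t) = (lam/(lam + t**2))**alpha, whose failure rate
# r(t) = 2*alpha*t/(lam + t**2) increases up to t = sqrt(lam) and then
# decreases: a mixture of IFR distributions need not be IFR.

alpha, lam = 2.0, 1.0
t = np.linspace(0.1, 5.0, 10)
r = 2 * alpha * t / (lam + t**2)
for ti, ri in zip(t, r):
    print(f"t = {ti:4.2f}   predictive failure rate r(t) = {ri:.4f}")
```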
Remark 3.67. Each of the notions IFR, DFR, NBU, NWU can also be looked on as an appropriate inequality for the corresponding joint distribution of n i.i.d. lifetimes. Let us focus attention, for instance, on the notion of IFR and consider lifetimes $T_1,\dots,T_n$ which are i.i.d. with a common survival function $G(\cdot)$: $G(\cdot)$ being IFR is equivalent to requiring

$P\{T_1 > s_1 + \sigma\,|\,D\} \ \ge\ P\{T_2 > s_2 + \sigma\,|\,D\} \qquad (3.25)$

where $D \equiv \{T_1 > s_1,\ T_2 > s_2,\dots,T_k > s_k,\ T_{k+1}=t_{k+1},\dots,T_n = t_n\}$ and whenever $s_1 \le s_2$. In the more general case when $T_1,\dots,T_n$ are exchangeable, the IFR property of their marginal one-dimensional distribution is no longer equivalent to the inequality (3.25). However we can notice that, in the case of stochastic dependence, we are in general more interested in the validity of (3.25) than in the IFR property of the marginal. We can say that, when analyzing a vector of several exchangeable lifetimes, the marginal IFR property is really relevant only in the case of independence.
Remark 3.68. Proposition 3.65 and Remark 3.66, respectively, can be rephrased in the following equivalent ways:

if $G(t|\theta)$ is log-convex $\forall \theta \in L$, then $F(t)$ in (3.24) is log-convex for any initial density $\pi_0$;

$G(t|\theta)$ being log-concave $\forall \theta \in L$ does not imply that $F(t)$ in (3.24) is log-concave.

The theme of aging properties of one-dimensional mixture distributions is of great interest in the field of reliability and survival analysis; there is thus a very rich and still growing literature on this topic. Many authors have focused attention on relations between the behavior of the failure rate of mixtures and the aging properties of the single components of the mixture; this has an interest both in the theory and in applied statistical analysis. We refer in particular to Manton, Stallard and Vaupel (1979), Vaupel and Yashin (1985), Hougaard (1986), Cohen (1986), Gurland and Sethuraman (1995), among many others; see also the review paper by Shaked and Spizzichino (2001). In such a frame it is particularly important, however, to keep in mind that, under suitable conditions, a mixture of IFR distributions can still be IFR itself; sufficient conditions in this sense are shown by Lynch (1999). In the field of reliability, the study of aging properties of mixtures is relevant in the analysis of burn-in problems (see e.g. the review paper by Block and Savits, 1997). In this respect, the discussion in the following example is of interest for our purposes.
Example 3.69. (The burn-in problem). Let U be a unit to be put into operation; T denotes its lifetime and its survival function is $F(s)$. We get a reward K if, in its operative life, it survives a mission time $\mu$; but we incur a loss C if the operative life is smaller than $\mu$. We can decide not to put U into operation (i.e. to discard it), and this causes a cost c ($0 < c < K < C$); if we decide that it is worthwhile to put U into operation, we may prefer to deliver it to operations after a burn-in period. If we choose $\tau$ as the duration of the burn-in period, then the length of the operative life (under the condition $T > \tau$, of course) will be the residual lifetime $(T-\tau)$, and its (conditional) survival function is

$F_\tau(t) \ \equiv\ P\{T-\tau > t\,|\,T>\tau\} = \frac{F(\tau+t)}{F(\tau)}.$

Suppose, moreover, that there is no cost for conducting the burn-in test and that we incur the cost c if $T < \tau$, namely if U fails during the test (i.e. this would have the same economical effect as the decision of discarding U). The expected cost coming from the choice of a duration $\tau$ for the test would
then be

$\psi(\tau) = c\,P\{T<\tau\} + C\,P\{\tau < T < \tau+\mu\} - K\,P\{T > \tau+\mu\} = c + (C-c)\,F(\tau) - (K+C)\,F(\tau+\mu),$

and the optimal (i.e. Bayes) duration is the one that minimizes $\psi$ (see details in Chapter 5). We can have three cases:

a) $(C-c)\,F(\tau) - (K+C)\,F(\tau+\mu) > 0,\ \forall \tau > 0$;

b) $\inf_{\tau \ge 0}\ \big[(C-c)\,F(\tau) - (K+C)\,F(\tau+\mu)\big] = (C-c) - (K+C)\,F(\mu) < 0$;

c) for some $\tau > 0$, $(C-c)\,F(\tau) - (K+C)\,F(\tau+\mu) < \min\{(C-c) - (K+C)\,F(\mu),\ 0\}$.

In case a) it is optimal to discard U; in case b) it is optimal to deliver U to operations immediately; in case c) we can find $\tau^* > 0$ such that it is optimal to conduct a burn-in test of duration $\tau^*$ and then deliver U to operations if it survives the test. Trivially, c) can never apply if F(s) is NBU; it may apply in other cases, for instance when F(s) is NWU. In particular it may be optimal to conduct a burn-in procedure when T is conditionally exponential given a parameter $\Theta$, i.e.

$F(s) = \int_0^{\infty} \exp\{-\theta s\}\,\pi_0(\theta)\,d\theta.$

Reliability engineers may think that it is senseless to burn in a unit with an exponential lifetime, so that the latter conclusion may appear contradictory. The question is: "does it really make a difference whether $\theta$ is known or unknown?" Of course it makes a great difference in the Bayesian paradigm. Indeed, the latter prescribes that decisions are to be taken by minimizing expected loss, which is to be computed taking into account one's own utility function and predictions based on the actual state of information (see Section 1 of Chapter 5). This means the following:
Consider two individuals $I_1$ and $I_2$, both sharing the same utilities and both convinced that the distribution of T is influenced by the factor $\theta$. However, according to $I_1$'s initial state of information, the density of $\Theta$ is $\pi^{(1)}$ and the conditional survival function of T given $(\Theta=\theta)$ is $G^{(1)}(s|\theta)$, while $I_2$ assesses $\pi^{(2)}$ and $G^{(2)}(s|\theta)$, respectively. If, by chance, it happens that

$\int_0^{\infty} G^{(1)}(s|\theta)\,\pi^{(1)}(\theta)\,d\theta = \int_0^{\infty} G^{(2)}(s|\theta)\,\pi^{(2)}(\theta)\,d\theta, \quad \forall s > 0, \qquad (3.26)$

then, from a Bayesian viewpoint, $I_1$ and $I_2$ should make the same decision as to the problem of considering a burn-in procedure, irrespective of their different opinions. Thus the one-dimensional aging properties of the $G^{(i)}(s|\theta)$ do not have a direct impact on the decision; only the one-dimensional aging properties of the predictive survival functions $\int_0^{\infty} G^{(i)}(s|\theta)\,\pi^{(i)}(\theta)\,d\theta$ are relevant.
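The discussion of Example 3.69 can be made concrete with a small numerical sketch; the costs, the mission time and the two-point mixing distribution below are illustrative choices, not taken from the book. The predictive survival function is a mixture of exponentials (an NWU, DFR law), and the expected cost $\psi(\tau)$ attains its minimum at a strictly positive burn-in duration.

```python
import numpy as np

# Burn-in cost of Example 3.69 with illustrative ingredients: T is
# conditionally exponential given Theta, with a two-point prior, so the
# predictive survival function is F(s) = p*exp(-th1*s) + (1-p)*exp(-th2*s).
# We evaluate psi(tau) = c + (C - c)*F(tau) - (K + C)*F(tau + mu) on a grid;
# discarding costs c, delivering immediately corresponds to tau = 0.

c, K, C, mu = 1.0, 4.0, 10.0, 1.0
p, th1, th2 = 0.5, 0.2, 5.0            # prior weights and conditional rates

def F(s):                               # predictive survival function
    return p * np.exp(-th1 * s) + (1 - p) * np.exp(-th2 * s)

taus = np.linspace(0.0, 5.0, 501)
psi = c + (C - c) * F(taus) - (K + C) * F(taus + mu)
print("cost of discarding            :", c)
print("expected cost at tau = 0      :", psi[0])
print("minimal expected cost         :", psi.min(), "at tau =", taus[np.argmin(psi)])
```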
3.4.2 Dynamic multivariate notions of aging
Here we discuss some formulations of the concept of "aging" for a vector of lifetimes. In the literature, several different concepts have been proposed to extend the univariate concept of IFR distribution to the multivariate case; we shall report some of those concepts here. We saw above that a lifetime X is IFR if and only if, for all $0 \le t < t'$ with $t'$ such that $F(t') > 0$, the inequality (3.20) is true. This suggested a definition of MIFR (multivariate increasing failure rate) presented in Shaked and Shanthikumar (1991), which we report here in the following, slightly modified, form.

Definition 3.70. A vector of lifetimes $\mathbf X \equiv (X_1,\dots,X_n)$ is MIFR when, for $0 \le t < t'$, $I \cap J = \emptyset$, $\mathbf x'_I \le \mathbf x_I \le t\mathbf e$, $t\mathbf e < \mathbf x'_J \le t'\mathbf e$, and histories $h_t$, $h'_{t'}$ of the form

$h_t \ \equiv\ \{\mathbf X_I = \mathbf x_I,\ \mathbf X_{\tilde I} > t\mathbf e\}, \qquad h'_{t'} \ \equiv\ \{\mathbf X_I = \mathbf x'_I,\ \mathbf X_J = \mathbf x'_J,\ \mathbf X_{\widetilde{I\cup J}} > t'\mathbf e\}, \qquad (3.27)$

one has

$\mathcal L\big[(\mathbf X_{\widetilde{I\cup J}} - t\mathbf e)\,\big|\,h_t\big] \ \succeq_{hr}\ \mathcal L\big[(\mathbf X_{\widetilde{I\cup J}} - t'\mathbf e)\,\big|\,h'_{t'}\big]. \qquad (3.28)$

Note that, in this definition, the histories $h_t$ and $h'_{t'}$ coincide over the interval $[0,t]$. We saw above that, in the one-dimensional case, the inequality (3.20) is equivalent to the apparently weaker condition (3.21). Of course that cannot be true in general in the multivariate case. This fact then leads to the definition of a weaker concept of MIFR, which can be seen as a special case of the notion of $\mathcal F_t$-MIFR, analyzed by Arjas (1981a).
Definition 3.71. A vector of lifetimes $\mathbf X \equiv (X_1,\dots,X_n)$ is st-MIFR if, for $0 \le t < t'$ and histories $h_t$, $h'_{t'}$ of the form (3.27), one has

$\mathcal L\big[(\mathbf X_{\widetilde{I\cup J}} - t\mathbf e)\,\big|\,h_t\big] \ \succeq_{st}\ \mathcal L\big[(\mathbf X_{\widetilde{I\cup J}} - t'\mathbf e)\,\big|\,h'_{t'}\big]. \qquad (3.29)$

On the other side, the following definition is a natural multivariate analogue of the PF2 property.

Definition 3.72. (Shaked and Shanthikumar, 1991). X is MPF2 if and only if, for $0 \le t < t'$ and histories $h_t$, $h'_{t'}$ of the form (3.27), one has

$\mathcal L\big[(\mathbf X_{\widetilde{I\cup J}} - t\mathbf e)\,\big|\,h_t\big] \ \succeq_{lr}\ \mathcal L\big[(\mathbf X_{\widetilde{I\cup J}} - t'\mathbf e)\,\big|\,h'_{t'}\big]. \qquad (3.30)$

The definitions recalled above are conditional and of dynamic type, in that they are formulated in terms of the comparison between conditional laws of residual lifetimes at two different instants t, t', given histories $h_t$, $h'_{t'}$ which coincide over the interval $[0,t]$. A definition of MIFR which, on the contrary, is not of this type was given in Savits (1985) (see also Shaked and Shanthikumar, 1988, for some related characterizations). For other definitions, which do not directly involve conditional distributions of residual lifetimes given failure and survival data, one can see e.g. Harris (1970) and Marshall (1975). Now we highlight some aspects of the above dynamic definitions. First of all we note that, by taking into account the chain of implications existing among the multivariate stochastic orders (Theorem 3.28), one gets

Theorem 3.73.

X MPF2 ⇒ X MIFR ⇒ X st-MIFR.

Remark 3.74. A common feature of the above definitions is that they imply different types of positive dependence properties. More precisely, the following results hold:

i) X MPF2 ⇒ X MTP2 (Shaked and Shanthikumar, 1991, Remark 6);
ii) X MIFR ⇒ X HIF (Shaked and Shanthikumar, 1991, Remark 5);
iii) X st-MIFR ⇒ X WBF (Norros, 1985).

Example 3.75. (Ross models). Consider a (not necessarily exchangeable) Ross model, where $(X_1,\dots,X_n)$ has an absolutely continuous joint distribution with m.c.h.r. functions such that $\lambda_{i|I}(t\,|\,x_1,\dots,x_h) = \lambda_i(I)$ (with $h = |I|$). In such a case the MIFR property amounts to requiring that $\lambda_i(I) \le \lambda_i(J)$ whenever $I \subseteq J$ (see also Ross, 1984; Norros, 1985; Shaked and
Shanthikumar, 1991). Then, for such a model, X MIFR is equivalent to X HIF.

Remark 3.76. Let $X_1,\dots,X_n$ be independent lifetimes with one-dimensional failure rates $r_i(\cdot)$ and densities $f_i(\cdot)$. Then $\mathbf X \equiv (X_1,\dots,X_n)$ is st-MIFR if and only if each $r_i(\cdot)$ is an increasing function ($i=1,\dots,n$), which is also equivalent to X being MIFR. X is MPF2 if and only if each $f_i(\cdot)$ is a PF2 function ($i=1,\dots,n$). More generally, the one-dimensional marginal distributions of a random vector with the MIFR property are IFR. Many other multivariate extensions of the IFR property require that the marginal distributions be univariate IFR.

An important remark is that any plausibly defined property of aging is liable to be altered or destroyed when we modify the actual flow of information. We should then clarify, in the definition of a notion of aging, which is the state of information upon which we condition when considering the distribution of the residual lifetimes of the still surviving individuals. Here we considered the special case when, at any time t, conditioning is made with respect to the internal history, i.e. to a state of information of the type

$h_t \ \equiv\ \{\mathbf X_I = \mathbf x_I,\ \mathbf X_{\tilde I} > t\mathbf e\}.$

For a much more general treatment and detailed discussion see Arjas (1981a), Arjas (1981b), Arjas and Norros (1991). Many other properties and aspects of Definitions 3.70, 3.71, 3.72 were proved and discussed in more general settings in Arjas (1981a), Arjas (1981b), Arjas and Norros (1984), Norros (1985), Arjas and Norros (1991), Shaked and Shanthikumar (1991), Shaked and Shanthikumar (1994) and references therein.
3.4.3 The case of exchangeable lifetimes
Concerning the above definitions of multivariate aging, we want to discuss here some aspects related to the kind of interdependence described by the notion of exchangeability. In other words, we consider the case of similar individuals, which are embedded in the same environment, which undergo the same situation of stress, and which are subjected to the same shocks, so that the probability model which describes our uncertainty about the corresponding lifetimes is exchangeable (we are thinking of the cases when the lifetimes are not independent). Even if we admit that any single individual deteriorates in time, we cannot expect in general that the vector of lifetimes satisfies any of the dynamic conditions of positive aging defined in the previous subsection.
In fact the latter conditions require that the lifetimes are positively dependent, as a consequence of the results cited in Remark 3.74. We can then exclude, in particular, the validity of MIFR in cases of negative dependence.

Example 3.77. (Negative dependence). Let $\mathbf T \equiv (T_1,T_2)$ have an exchangeable joint distribution characterized by the set of m.c.h.r. functions

$\lambda_t^{(2)} = r_2(t), \qquad \lambda_t^{(1)}(t_1) = r_1(t). \qquad (3.31)$

If $r_2(t) \ge r_1(t)$, this is a case of negative dependence and, accordingly, T is not WBF. Then T cannot be MIFR, even if $r_1(t)$ and $r_2(t)$ are increasing functions.

On the other hand, in the case of exchangeability, positive dependence is often concomitant with situations of uncertainty about some unobserved quantity. In its turn, this implies that the predictive distributions are of mixture type, and we cannot expect that conditions of positive aging generally hold. We now spell out these arguments in more precise detail, by focusing attention on the notion of MIFR for exchangeable lifetimes. Let us then consider an exchangeable vector $\mathbf T \equiv (T_1,\dots,T_n)$ with a joint distribution admitting a density, and then m.c.h.r. functions. The m.c.h.r. functions will be, as usual, denoted by $\lambda_t^{(n-h)}(t_1,\dots,t_h)$ and $\lambda_t^{(n)}$. We can notice that, as a necessary condition for MIFR, we must have in particular

$\lambda_t^{(n)} \ \le\ \lambda_{t'}^{(n)}, \qquad \lambda_t^{(n-h)}(t_1,\dots,t_h) \ \le\ \lambda_{t'}^{(n-h)}(t_1,\dots,t_h), \qquad (3.32)$

where $t_1,\dots,t_h < t < t'$.

Let us focus attention on those cases of exchangeable, positive dependence when, conditionally on a given parameter $\Theta$, $T_1,\dots,T_n$ are exchangeable and the corresponding conditional distributions are characterized by a system of m.c.h.r. functions $\lambda_t^{(n-h)}(t_1,\dots,t_h|\theta)$ and $\lambda_t^{(n)}(\theta)$. Let L denote the set of possible values of $\Theta$ and let $\pi(\theta|h_t)$ denote the conditional density of $\Theta$, given the observation of a dynamic history $h_t$. Recall now Theorem 2.43. The conditions (3.32) become, respectively,

$\int_L \lambda_t^{(n)}(\theta)\,\pi(\theta\,|\,T_{(1)}>t)\,d\theta \ \le\ \int_L \lambda_{t'}^{(n)}(\theta)\,\pi(\theta\,|\,T_{(1)}>t')\,d\theta \qquad (3.33)$

and

$\int_L \lambda_t^{(n-h)}(t_1,\dots,t_h|\theta)\,\pi(\theta|h_t)\,d\theta \ \le\ \int_L \lambda_{t'}^{(n-h)}(t_1,\dots,t_h|\theta)\,\pi(\theta|h'_{t'})\,d\theta, \qquad (3.34)$
where $0 \le t_1 < \dots < t_h \le t < t'$, and

$h_t \ \equiv\ \{T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)}>t\}, \qquad h'_{t'} \ \equiv\ \{T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)}>t'\}.$

Specifically, when $T_1,\dots,T_n$ are conditionally i.i.d. given $\Theta$, denoting by $r(t|\theta)$ the conditional hazard rate functions, the conditions (3.33) and (3.34) become, in particular,

$\int_L r(t|\theta)\,\pi(\theta\,|\,T_{(1)}>t)\,d\theta \ \le\ \int_L r(t'|\theta)\,\pi(\theta\,|\,T_{(1)}>t')\,d\theta$

and

$\int_L r(t|\theta)\,\pi(\theta|h_t)\,d\theta \ \le\ \int_L r(t'|\theta)\,\pi(\theta|h'_{t'})\,d\theta.$
Remark 3.78. Conditions (3.33) and (3.34) constitute a very strong set of conditions, which can seldom be expected to hold, even if some strong positive dependence property is assumed for $\mathbf T \equiv (T_1,\dots,T_n)$ and if $\lambda_t^{(n)}(\theta)$ and $\lambda_t^{(n-h)}(t_1,\dots,t_h|\theta)$ are increasing functions of t, $\forall \theta \in L$. Suppose for instance that $\Theta$ is a scalar quantity and $\lambda_t^{(n)}(\theta)$ is an increasing function of $\theta$ and t. Then, by a fundamental property of the one-dimensional st ordering, (3.33) would be guaranteed by the condition $\mathcal L(\Theta\,|\,T_{(1)}>t) \preceq_{st} \mathcal L(\Theta\,|\,T_{(1)}>t')$, for $t < t'$. The latter inequality is, however, just in contrast with the assumption that $\lambda_t^{(n)}(\theta)$ is increasing in $\theta$, $\forall t > 0$ (see Remark 3.15). Similar arguments can be repeated for (3.34), by using Theorem 3.62.

The points above are illustrated by the following examples. In the first example we analyze the (limiting) case of Schur-constant densities.

Example 3.79. (Schur-constant densities). Let T have a joint density function $f(\mathbf t) = \phi\big(\sum_{i=1}^n t_i\big)$, for a suitable function $\phi$. Then, as we saw in Example 3.45, T is WBF if and only if $\phi$ is log-convex, i.e. positive dependence goes together with negative aging of the one-dimensional marginal distribution. We can see that T is MIFR if and only if $T_1,\dots,T_n$ are independent and exponentially distributed (i.e. $\phi(t) = \lambda^n\exp\{-\lambda t\}$, for some $\lambda > 0$). This fact can also be seen by recalling that, in the absolutely continuous Schur-constant case,

$\lambda_t^{(n-h)}(t_1,\dots,t_h) = b(h,y) \ \equiv\ -\,\frac{\Phi^{(h+1)}(y)}{\Phi^{(h)}(y)},$

where $\Phi$ is such that $\Phi^{(n)} = (-1)^n\,\phi$ (so that the joint survival function is $\Phi(\sum_i t_i)$) and $y = \sum_{i=1}^h t_i + (n-h)t$, and noticing that the condition of MIFR in particular implies that $b(h,y)$ is a non-decreasing function of y.
Example 3.80. The arguments in the example above can be, in some sense, extended to the "time-transformed exponential models", defined by the condition that the joint survival function satisfies

$F^{(n)}(t_1,\dots,t_n) = \Phi\Big(\sum_{i=1}^n R(t_i)\Big).$

For these models, cases of absolute continuity and positive dependence are of the type

$F^{(n)}(t_1,\dots,t_n) = \int_0^{+\infty} \exp\Big\{-\theta\sum_{i=1}^n R(t_i)\Big\}\,\pi_0(\theta)\,d\theta,$

with R an increasing, differentiable function. Letting $\psi(t) = \frac{d}{dt}R(t)$, we have that the functions $\lambda_t^{(n)}$, $\lambda_t^{(n-h)}(t_1,\dots,t_h)$ are of the form

$\lambda_t^{(n)} = \psi(t)\,E\big[\Theta\,\big|\,T_{(1)}>t\big], \qquad \lambda_t^{(n-h)}(t_1,\dots,t_h) = \psi(t)\,E\big[\Theta\,\big|\,T_{(1)}=t_1,\dots,T_{(h)}=t_h,\ T_{(h+1)}>t\big]$
(see Example 2.40). We can now remark that $E[\Theta\,|\,T_{(1)}>t]$ and $E[\Theta\,|\,T_{(1)}=t_1,\dots,T_{(h)}=t_h, T_{(h+1)}>t]$ are non-increasing functions of t; then we cannot expect that the conditions (3.32) generally hold, even if $\psi$ is increasing, i.e. even if $T_1,\dots,T_n$ are conditionally i.i.d. IFR.

The following example shows, more generally, that a vector T being MIFR conditionally on $\{\Theta=\theta\}$, $\forall \theta \in L$, does not imply that it is st-MIFR in the unconditional distribution.

Example 3.81. (Conditional exchangeable Ross models). Let $\Theta$ be a non-negative random quantity, seen as an unobservable parameter, and consider $\mathbf T \equiv (T_1,T_2)$ which, conditionally on $\{\Theta=\theta\}$ ($\theta>0$), is an exchangeable Ross model with

$\lambda_i(\emptyset) = \theta, \qquad \lambda_i(\{j\}) = 2\theta, \qquad i,j = 1,2,\ i \ne j.$

Then T is MIFR in the conditional distribution, given $\{\Theta=\theta\}$, $\forall \theta>0$. Let $\pi_0$ be the prior density of $\Theta$ and fix now $t' > t_1 > t \ge 0$; we can write

$P\{T_2 > t'+s\,|\,T_1=t_1,\ T_2>t'\} = \int_0^\infty P\{T_2>t'+s\,|\,T_1=t_1,\ T_2>t',\ \theta\}\;\pi(\theta\,|\,T_1=t_1,\ T_2>t')\,d\theta,$

$P\{T_2 > t+s\,|\,T_1>t,\ T_2>t\} = \int_0^\infty P\{T_2>t+s\,|\,T_1>t,\ T_2>t,\ \theta\}\;\pi(\theta\,|\,T_1>t,\ T_2>t)\,d\theta.$
$P\{T_2>t'+s\,|\,T_1=t_1, T_2>t',\theta\}$ and $P\{T_2>t+s\,|\,T_1>t, T_2>t,\theta\}$ can be easily computed by taking into account the special form of $\lambda_i(\emptyset)$, $\lambda_i(\{j\})$. $P\{T_2>t'+s\,|\,T_1=t_1, T_2>t'\}$ and $P\{T_2>t+s\,|\,T_1>t, T_2>t\}$ can be explicitly computed by taking, e.g., $\pi_0(\theta) \propto \theta^{\alpha-1}\exp\{-\lambda\theta\}$, $\theta>0$ (see Scarsini and Spizzichino, 1999, for details). One can choose $t'$ large enough and s small enough in order to have

$P\{T_2 > t'+s\,|\,T_1=t_1,\ T_2>t'\} \ >\ P\{T_2>t+s\,|\,T_1>t,\ T_2>t\}.$

Then T is not st-MIFR in the unconditional distribution.
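Example 3.81 can also be checked numerically. In the sketch below the conditional survival probabilities are derived under the stated Ross model (they are not quoted from the book), and the mixing over a gamma-type prior is carried out by numerical integration; with $t'$ large and s small the st-MIFR comparison indeed fails.

```python
import math
import numpy as np
from scipy.integrate import quad

# Given Theta = theta, each unit has hazard theta while both are alive and
# the survivor's hazard becomes 2*theta after the first failure, so
#   P{T2 > t' + s | T1 = t1, T2 > t', theta} = exp(-2*theta*s),
#   P{T2 > t  + s | T1 > t,  T2 > t,  theta} = exp(-2*theta*s)*(1 + theta*s).
# We mix over the prior pi_0(theta) ~ theta^(a-1)*exp(-lam*theta) and the
# posteriors corresponding to the two histories (illustrative a, lam).

a, lam = 1.0, 1.0
t, t1, tp, s = 0.0, 0.5, 10.0, 0.1      # t < t1 < t' ,  small s

def post_mean(fun, lik):
    # E[fun(Theta) | data], posterior density proportional to lik * pi_0
    num, _ = quad(lambda th: fun(th) * lik(th) * th**(a - 1) * math.exp(-lam * th), 0, np.inf)
    den, _ = quad(lambda th: lik(th) * th**(a - 1) * math.exp(-lam * th), 0, np.inf)
    return num / den

# history {T1 = t1, T2 > t'}: likelihood  theta * exp(-2*theta*t')
p_old = post_mean(lambda th: math.exp(-2*th*s), lambda th: th * math.exp(-2*th*tp))
# history {T1 > t, T2 > t}: likelihood  exp(-2*theta*t)
p_young = post_mean(lambda th: math.exp(-2*th*s) * (1 + th*s), lambda th: math.exp(-2*th*t))

print(p_old, p_young, "st-MIFR would require the first to be <= the second")
```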
In conclusion, let us come back to the case of "exchangeable", deteriorating individuals, as considered at the beginning of this subsection. The arguments above point out the following: any notion of multivariate aging which requires, at the same time, both positive dependence and at least some positive aging property in the sense of the inequalities (3.32) cannot be expected to hold in general for vectors of exchangeable lifetimes. We can then wonder which type of inequalities could, in the case of exchangeability, be associated with a judgement of wear-out. This problem gives us part of the motivation for the study of the notions which will be considered in the following chapter.
3.5 Exercises
Exercise 3.82. Check that $\mathbf T \succeq_{lr} \hat{\mathbf T}$ for the exchangeable Ross models characterized by the systems of m.c.h.r. functions

$\lambda_t^{(n-h)}(t_1,\dots,t_h) = \frac{\lambda}{n-h}, \qquad \hat\lambda_t^{(n-h)}(t_1,\dots,t_h) = \frac{\hat\lambda}{n-h},$

with $0 < \lambda \le \hat\lambda$. Using this result we see, in particular, that T is MTP2.

Hint: Recall that if, for a given $\lambda>0$, $\varphi_h(t) = \lambda/(n-h)$, then the corresponding joint density function is

$f^{(n)}(t_1,\dots,t_n) = \frac{\lambda^n}{n!}\exp\{-\lambda\,t_{(n)}\}.$

Exercise 3.83. Consider, more generally, two vectors of lifetimes T and T', corresponding to models characterized by m.c.h.r. functions of the form

$\lambda_t^{(n-h)}(t_1,\dots,t_h) = \varphi_h(t), \qquad \lambda'^{(n-h)}_t(t_1,\dots,t_h) = \gamma\,\varphi_h(t),$

respectively, where $\gamma$ is a positive quantity. Show that, if T and T' are HIF and $\gamma > 1$, then $\mathbf T \succeq_{hr} \mathbf T'$.
The following result is well known. It provides an important characterization of the condition $\mathbf X \preceq_{st} \mathbf Y$. Let X, Y be two n-dimensional random vectors. The condition $\mathbf X \preceq_{st} \mathbf Y$ holds if and only if we can find a probability space $(\Omega,\mathcal F,P)$ and two n-dimensional random vectors $\hat{\mathbf X}, \hat{\mathbf Y} : \Omega \to \mathbb R^n$ such that

$\hat{\mathbf X}(\omega) \le \hat{\mathbf Y}(\omega),\ \forall \omega \in \Omega, \qquad \hat{\mathbf X} =_{st} \mathbf X, \quad \hat{\mathbf Y} =_{st} \mathbf Y.$

Exercise 3.84. By using the result above, check the inequality in Example 3.34.
Exercise 3.85. Consider a proportional hazard model $\hat F_{R,\pi_0}$. Check directly the validity of the MTP2 property when the initial density $\pi_0$ is of the gamma type.

Exercise 3.86. For the disruption model considered in Example 3.47, spell out arguments, similar to those in Remark 3.15, to check the WBF property under the condition $r_1(t) \le r_2(t)$.

Exercise 3.87. Consider the heterogeneous population in Example 2.20. There we have $\Theta \equiv (K_1,\dots,K_n)$, where $K_j$ is a binary random variable indicating the "subpopulation" which the individual $U_j$ belongs to; for the individuals' lifetimes $T_1,\dots,T_n$ we assumed that they are conditionally independent given $\Theta$ and that

$P\{T_j > t\,|\,\Theta = (k_1,\dots,k_n)\} = G_{k_j}(t),$

where $k_1,\dots,k_n \in \{0,1\}$ and $G_0$, $G_1$ are two given (one-dimensional) survival functions. Show that $\mathbf T \equiv (T_1,\dots,T_n)$ is PUOD if $\Theta$ is positively correlated and $G_0(t) \le G_1(t)$ (or $G_0(t) \ge G_1(t)$) for any $t > 0$.

Exercise 3.88. Under the same setting and notation as in the exercise above, show that T is WBF if $G_0 \preceq_{hr} G_1$ and $P\{K_1 = 1\,|\,h_t\} \ge P\{K_1 = 1\,|\,h'_t\}$, where $h_t$, $h'_t$ are two histories as in (3.10) and (3.11).

Hint: note that, for a history $h_t$,

$P\{T_1 > t+s\,|\,h_t\} = \frac{G_0(t+s)}{G_0(t)}\,P\{K_1 = 0\,|\,h_t\} + \frac{G_1(t+s)}{G_1(t)}\,P\{K_1 = 1\,|\,h_t\}.$
Exercise 3.89. (Schur-constant survival functions). Let $\mathbf T \equiv (T_1,\dots,T_n)$ have a Schur-constant joint survival function: $F(t_1,\dots,t_n) = \Phi\big(\sum_{i=1}^n t_i\big)$, for a suitable non-increasing function $\Phi$. Show that F ∈ PUOD if and only if $T_i$ is NWU, and F ∈ NUOD if and only if $T_i$ is NBU.
Hint: Remember that, in this case, the one-dimensional survival function of any lifetime $T_i$ is given by

$F^{(1)}(t) = P\{T_1 > t,\ T_2 > 0,\dots,T_n > 0\} = \Phi(t).$
Exercise 3.90. Show that Schur-constant joint survival functions with the
MTP2 property have DFR one-dimensional marginals (this then happens for conditionally independent, exponential lifetimes).
Exercise 3.91. Consider the special case of the Schur-constant joint survival function with

$\Phi(t) = \frac{1}{s^{\,n-1}}\,\big[s-t\big]_+^{\,n-1}$

($[a]_+ = a$ if $a \ge 0$ and $[a]_+ = 0$ if $a < 0$). Show that $(T_1,\dots,T_{n-1})$ satisfies condition N.

Hint: This distribution corresponds to

$F(t_1,\dots,t_n) = P\Big\{X_1 > t_1,\dots,X_n > t_n\ \Big|\ \sum_{i=1}^n X_i = s\Big\}$

with $X_1,\dots,X_n$ i.i.d. exponentially distributed (the "uniform distribution over the simplex $\sum_{i=1}^n T_i = s$"). It is also F ∈ NA (Joag-Dev and Proschan, 1983, Theorem 2.8).
Exercise 3.92. Check that $\mathbf T \equiv (T_1,T_2)$ in Example 3.31, where

$\lambda_t^{(2)} = \frac{\lambda + a}{2}, \qquad \lambda_t^{(1)}(t_1) = \lambda,$

is RR2 for $a > \lambda$.

Hint: Recall the formula (2.58).
Exercise 3.93. Consider again the case of the heterogeneous population as in Exercise 3.87 above. Show that, under the condition $G_0(t) \le G_1(t)$ (or $G_0(t) \ge G_1(t)$) for any $t > 0$, T is NUOD if $\Theta$ is negatively correlated.

Exercise 3.94. For the same case as above, and comparing also with Exercise 3.88, deduce that the NWU property of the one-dimensional marginal can coexist with both positive and negative correlation.
Exercise 3.95. Show that a mixture of two NWU distributions is NWU.

Exercise 3.96. Give a simple proof of Proposition 3.65 using the simplifying condition that $G(t|\theta)$ is stochastically increasing in $\theta$.
3.6 Bibliography Arjas, E. (1981a). A stochastic process approach to multivariate reliability systems: notions based on conditional stochastic order. Math. Op. Res. 6., No. 2, 263{276. Arjas, E. (1981b). The failure and hazard processes in multivariate reliability systems. Math. Oper. Res. 6, 551-562. Arjas, E. (1989). Survival models and martingale dynamics. Scand. J. Statist. 16, 177-225. Arjas, E. and Norros, I. (1984). Lifelengths and association: a dynamical approach. Math. Op. Res. 9, 151-158. Arjas, E. and Norros, I. (1986). A compensator representation of multivariate life length distributions, with applications. Scan. J. Stat. 13, 99-112. Arjas, E. and Norros, I. (1991). Stochastic order and martingale dynamics in multivariate life length models: a review. In Stochastic Orders and Decision under Risk, K. Mosler and M. Scarsini, Eds. Inst. Math. Statist., Hayward, CA. Aven, T. and Jensen, U. (1999). Stochastic Models in Reliability. Series Application of Mathematics. Springer Verlag, New York. Barlow, R. E. (1985). A Bayesian explanation of an apparent failure rate paradox. IEEE Trans. on Rel., R34, No. 2, 107-108. Barlow, R. E. and Proschan, F. (1966). Mathematical Theory of Reliability. Classics in Applied Mathematics, SIAM, New York. Barlow, R. E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York. Block, H.W. and Savits, T.H. (1997). Burn in. Statistical Science, 12, 1-19. Block, H. W., Savits, T. H. and Shaked, M. (1982). Some concepts of negative dependence. Ann. Probab., 10, 765-772. Block, H. W., Sampson, A. and Savits, T.H. (Eds.) (1991). Topics in Statistical Dependence. Institute of Mathematical Statistics, Hayward, Ca. Brindley, E.C. and Thompson, W.A. (1972) Dependence and aging aspects of multivariate survival. JASA, 67, 822-830. Caramellino, L. and Spizzichino, F. (1996). WBF property and stochastic monotonicity of the Markov process associated to Schur-constant survival functions. J. Multivariate Anal., 56, 153-163. Cinlar, E., Shaked, M. and Shantikumar J.G. (1989). On lifetimes in uenced by a common environment. Stoch. Proc. Appl. 33, 347-359. Clarotti, C.A. and Spizzichino, F. (1990). Bayes burn-in decision procedures. Probab. Engrg. Inform. Sci., 4, 437-445. Cohen, J.E. (1986). An uncertainty principle in demography and the unisex issue. American Statistician, 40, 32-39. Ebrahimi, N. and Ghosh, M. (1981). Multivariate negative dependence. Comm. Statist. A 10, 307-337.
Esary, J.D. and Proschan, F. (1970). A reliability bound for systems of maintained, interdependent components. JASA, 65, 329-338. Esary, J.D., Marshall, A.W. and Proschan, F. (1973). Shock models and wear process. Ann. Probab. 1, 627-649. Esary, J.D., Proschan, F. and Walkup, D.W. (1967). Association of random variables with applications. Ann. Math. Stat., 38, 1466-1474. Fahmy S., de B. Pereira C., Proschan F. and Shaked M. (1982). The in uence of the sample on the posterior distribution. Comm. Statist. A, 11, 1757{1768. Fortuin, C.M. , Ginibre, J. and Kasteleyn, P.W. (1971). Correlation inequalities on some partially ordered set. Comm. Math. Phys. 22, 89-103. Gurland, J. and Sethuraman, J. (1994). Reversal of increasing failure rates when pooling failure data. Technometrics, 36, 416-418. Gurland, J. and Sethuraman, J. (1995). How pooling failure data may reverse increasing failure rates. J. Am. Statist. Assoc., 90, 1416-1423. Harris, R. (1970). A multivariate de nition for increasing failure rate distribution functions. Ann. Math. Statist.,37, 713-717. Hougaard P. (1986). Survival models for heterogeneous populations derived from stable distributions. Biometrika, 73, 387-396. Joag-Dev, K. and Kochar, S. (1996). A positive dependence paradox and aging distributions. J. Indian Statist. Assoc. 34, 105{112. Joag-Dev, K. and Proschan, F. (1983). Negative association of random variables, with applications. Ann. Statist., 11, 286-295. Joag-Dev, K. and Proschan, F. (1995). A general composition theorem and its applications to certain partial orderings of distributions. Stat. Prob. Lett., 22, 111-119. Jogdeo K. (1978). On a probability bound of Marshall and Olkin. Ann. Statist., 6, 232-234. Karlin, S. (1968). Total positivity. Stanford University Press, Stanford, Ca. Karlin, S. and Rinott, J. (1980a). Classes of orderings measures and related correlation inequalities. I multivariate totally positive distributions, J. Mult. Analysis, 10, 467-498. Karlin, S. and Rinott, J. (1980b). Classes of orderings measures and related correlation inequalities. II multivariate reverse rule distributions. J. Mult. Analysis, 10, 499-516. Karlin, S. and Rubin, H. (1956). The theory of decision procedures for distributions with monotone likelihood ratio. Ann. Math. Statist. 27, 272-299. Keilson, J. and Sumita, U. (1982). Uniform stochastic orderings and related inequalities. Canad. J. Statist., 10,181-189. Kemperman, J.H.B. (1977). On the FKG-inequality for measures on a partially ordered space. Indagationaes Mathematicae, 13, 313-331. Kimeldorf, G. and Sampson, A. R. (1989). A framework for positive dependence. Ann. Inst. Math. Statist.,41, 31-45.
Kochar, S. (1999). On stochastic orderings between distributions and their sample spacings. Statist. Probab. Lett., 42, 345-352.
Lehmann, E. L. (1966). Some concepts of dependence. Ann. Math. Statist., 37, 1137-1153.
Liggett, T.M. (1985). Interacting Particle Systems. Springer-Verlag, New York.
Lynch, J. D. (1999). On conditions for mixtures of increasing failure rate distributions to have an increasing failure rate. Probab. Engrg. Inform. Sci., 13, 33-36.
Lynch, J. D., Mimmack, G. and Proschan, F. (1987). Uniform stochastic orderings and total positivity. Canad. J. Statist., 15, 63-69.
Manton, K.G., Stallard, E. and Vaupel, J.W. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439-454.
Marshall, A.W. (1975). Multivariate distributions with monotone hazard rate. In Reliability and Fault Tree Analysis. SIAM, Philadelphia, 259-284.
Mosler, K. and Scarsini, M. (Eds.) (1991). Stochastic Orders and Decision under Risk. Institute of Mathematical Statistics, Hayward, CA.
Mosler, K. and Scarsini, M. (1993). Stochastic Orders and Applications, A Classified Bibliography. Springer-Verlag, New York.
Nelsen, R. B. (1999). An Introduction to Copulas. Springer-Verlag, New York.
Norros, I. (1985). Systems weakened by failures. Stoch. Proc. Appl., 20, 181-196.
Norros, I. (1986). A compensator representation of multivariate life length distributions, with applications. Scand. J. Statist., 13, 99-112.
Pellerey, F. and Shaked, M. (1997). Characterizations of the IFR and DFR aging notions by means of the dispersive order. Statist. Probab. Lett., 33, 389-393.
Preston, C.J. (1974). A generalization of the FKG inequalities. Comm. Math. Phys., 36, 223-241.
Proschan, F. (1963). Theoretical explanation of observed decreasing failure rate. Technometrics, 5, 375-383.
Ross, S. M. (1984). A model in which component failure rates depend only on the working set. Nav. Res. Log. Quart., 31, 297-300.
Samuels, M. L. (1993). Simpson's paradox and related phenomena. J. Am. Statist. Assoc., 88, No. 421, 81-88.
Sarkar, T.K. (1969). Some lower bounds of reliability. Tech. Report No. 124, Dept. of Operations Research and Statistics, Stanford University.
Savits, T. H. (1985). A multivariate IFR distribution. J. Appl. Probab., 22, 197-204.
Scarsini, M. and Spizzichino, F. (1999). Simpson-type paradoxes, dependence and aging. J. Appl. Probab., 36, 119-131.
Shaked, M. (1977). A concept of positive dependence for exchangeable random variables. Ann. Statist., 5, 505-515.
Shaked, M. and Scarsini, M. (1990). Stochastic ordering for permutation symmetric distributions. Stat. Prob. Lett., 9, 217-222.
Shaked, M. and Shanthikumar, J.G. (1987a). Multivariate hazard rates and stochastic ordering. Adv. Appl. Probab., 19, 123-137.
Shaked, M. and Shanthikumar, J.G. (1987b). The multivariate hazard construction. Stoch. Proc. Appl., 24, 85-97.
Shaked, M. and Shanthikumar, J.G. (1988). Multivariate conditional hazard rates and the MIFRA and MIFR properties. J. Appl. Probab., 25, 150-168.
Shaked, M. and Shanthikumar, J.G. (1990). Multivariate stochastic orderings and positive dependence in reliability theory. Math. Oper. Res., 15, 545-552.
Shaked, M. and Shanthikumar, J.G. (1991a). Dynamic multivariate aging notions in reliability theory. Stoch. Proc. Appl., 38, 85-97.
Shaked, M. and Shanthikumar, J.G. (1991b). Dynamic construction and simulation of random vectors. In Topics in Statistical Dependence, H.W. Block, A. Sampson and T.H. Savits, Eds. Institute of Mathematical Statistics, Hayward, CA, 415-433.
Shaked, M. and Shanthikumar, J.G. (1994). Stochastic Orders and Their Applications. Academic Press, London.
Shaked, M. and Spizzichino, F. (1998). Positive dependence properties of conditionally independent random times. Math. Oper. Res., 23, 944-959.
Shaked, M. and Spizzichino, F. (2001). Mixtures and monotonicity of failure rate functions. In Handbook of Statistics: Advances in Reliability, C. R. Rao and N. Balakrishnan, Eds. Elsevier, Amsterdam. To appear.
Simpson, E. H. (1951). The interpretation of interaction in contingency tables. J. Roy. Statist. Soc., 13, 238-241.
Strassen, V. (1965). The existence of probability measures with given marginals. Ann. Math. Statist., 36, 423-439.
Szekli, R. (1995). Stochastic Ordering and Dependence in Applied Probability. Lecture Notes in Statistics 97, Springer-Verlag, New York.
Tong, Y.L. (Ed.) (1991). Inequalities in Statistics and Probability. Institute of Mathematical Statistics, Hayward, CA.
Vaupel, J.W. and Yashin, A.I. (1985). Some surprising effects of selection on population dynamics. The Am. Statist., 39, 176-185.
Veinott, A.F. (1965). Optimal policy in a dynamic, single product, nonstationary inventory model with several demand classes. Operat. Res., 13, 761-778.
Whitt, W. (1979). A note on the influence of the sample on the posterior distribution. J. Am. Statist. Assoc., 74, 424-426.
Whitt, W. (1980). Uniform conditional stochastic order. J. Appl. Prob., 17, 112-123.
Whitt, W. (1982). Multivariate monotone likelihood ratio and uniform conditional stochastic order. J. Appl. Probab., 19, 695-701.
Chapter 4
Bayesian models of aging

4.1 Introduction

In this chapter we introduce notions of multivariate aging that we call "Bayesian", for reasons which will be clarified below. A discussion of the motivations for introducing such notions will be presented here; we shall also analyze some structural differences with respect to those reported in the previous chapter. At the outset, however, we emphasize the basic aspects that the Bayesian notions will have in common with the other notions:
a) the definitions are formulated by means of concepts of stochastic orderings;
b) they concern conditional probability distributions of the residual lifetimes of surviving individuals, given some survival and failure data for interdependent individuals;
c) they arise in a natural way as extensions of inequalities holding, respectively, for independent lifetimes under corresponding conditions of one-dimensional aging.
A specific feature of the Bayesian notions is, on the other hand, that they are essentially formulated for the case of exchangeable lifetimes; we shall briefly discuss possibilities of extensions to other cases at the end of Section 4.4. In order to explain the origin and the spirit of such notions, we start by noticing that various one-dimensional notions, such as IFR, DFR, and NBU, recalled in Section 3.4, can be formulated as conditions of the following type:
$$\mathcal{L}(T - t_1 \mid T > t_1) \ \succeq\ \mathcal{L}(T - t_2 \mid T > t_2), \qquad \forall (t_1,t_2) \in A, \qquad (4.1)$$
by suitably fixing $\succeq$ as one of the one-dimensional stochastic orderings $\succeq_{st}$, $\succeq_{hr}$, $\succeq_{lr}$ and fixing a subset $A \subseteq [0,+\infty) \times [0,+\infty)$. For instance, the concept of IFR can be obtained by letting $A \equiv \{(t_1,t_2) \mid t_1 \le t_2\}$ and replacing $\succeq$ with $\succeq_{st}$ or $\succeq_{hr}$ in (4.1).
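As a small worked check of (4.1), added here for illustration (the specific survival function is our own choice, not from the text), take a Weibull lifetime with
$$\bar G(t) = e^{-t^{\alpha}}, \qquad P\{T > t + \tau \mid T > t\} = \exp\{-[(t+\tau)^{\alpha} - t^{\alpha}]\}.$$
For $\alpha \ge 1$ the exponent $(t+\tau)^{\alpha} - t^{\alpha}$ is non-decreasing in $t$, so the conditional survival probability is non-increasing in $t$; hence (4.1) holds with $\succeq_{st}$ and $A \equiv \{(t_1,t_2)\mid t_1 \le t_2\}$, i.e. the IFR case. For $0 < \alpha \le 1$ the inequality is reversed, corresponding to DFR.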
As a second step, we now convert the relation (4.1) into an equivalent inequality for a set of independent lifetimes $T_1,\dots,T_n$ distributed as $T$. In other words, we base our development on the following lemma, whose proof is trivial.

Lemma 4.1. Let $T_1,\dots,T_n$ be i.i.d. lifetimes, separately satisfying the inequality (4.1), and let $D$ be an event of the form
$$D \equiv \{T_1 > t_1\} \cap \{T_2 > t_2\} \cap H \qquad (4.2)$$
with $(t_1,t_2) \in A$ and
$$H \equiv \{T_3 > t_3,\dots,T_k > t_k,\ T_{k+1} = t_{k+1},\dots,T_n = t_n\}.$$
Then
$$\mathcal{L}(T_1 - t_1 \mid D) \ \succeq\ \mathcal{L}(T_2 - t_2 \mid D). \qquad (4.3)$$
The idea now lies in replacing the condition that $T_1,\dots,T_n$ are i.i.d. with the condition that the joint survival function $\bar F^{(n)}$ of $T_1,\dots,T_n$ is exchangeable, and then taking (4.3) as a notion of aging for $\bar F^{(n)}$. This gives rise to several notions, according to the particular choice of $A$ and $\succeq$ and, possibly, also according to whether the number of failures in the event $H$ is positive or zero. This line is developed in Bassan and Spizzichino (1999), where sufficient conditions for different inequalities of the type (4.3) are given. It originates from previous consideration of the following instance: as a particular multivariate notion of IFR for exchangeable lifetimes we can take the one that arises from imposing (4.3) with
$$A \equiv \{(t_1,t_2) \mid t_1 \le t_2\}, \qquad \succeq\ =\ \succeq_{st}, \qquad k = n-2.$$
In the next Section 4.2, we shall discuss in detail the fact that the latter yields a very natural notion, which is equivalent to the condition of Schur-concavity of the joint survival function; indeed the latter was already considered as a notion of aging (Barlow and Mendel, 1992). In general the concept of Schur-concavity has an important role here, as will be explained in Section 4.2. Section 4.3 will be specifically devoted to cases of absolute continuity; there, we analyze the meaning of Schur properties for the joint density function of exchangeable lifetimes. In Section 4.4 we discuss some further aspects and details of the notions of aging studied here. As an introduction to the more technical treatment presented in the subsequent sections, we point out here some general aspects of definitions originating from (4.3) (and we thus trace some differences with respect to Definitions 3.70-3.72).
First of all we notice that the inequality (4.3) regards the conditional distributions of two residual lifetimes given observed survival and failure data, and that it is not of a dynamic type; in fact different survival data are contained in the observation, i.e., we compare the conditional distributions for two different individuals, of two different ages. While in the inequalities (3.28)-(3.30), on the contrary, all surviving individuals share the same age, we may need, in some decision problems, to deal with individuals who start working (or living) at different time-instants (see in particular Remark 2.24). Definitions 3.70-3.72 actually seem to be specifically appropriate for those cases where the random variables are thought of as lifetimes of devices working simultaneously as different components of the same system. This aspect is also related to a further, main, source of difference between the two types of definitions; the difference regards the kind of interdependence among individuals. In particular, conditions of positive dependence, adapted to describe interactions among devices working simultaneously as different components of the same system, can be substantially different from dependence due to "conditional independence", as Remark 3.67 shows. As a matter of fact, we want definitions of aging which are also compatible with correlations due to situations of learning about some unknown parameter. In this respect, we say that definitions of multivariate aging originating from (4.3) are "Bayesian". More generally, we want definitions compatible with significant cases of exchangeability. As remarked in Section 3.4, we must admit that the validity of a given aging property may depend on the actual flow of information. In fact the validity of some aging property can be destroyed in cases where the effects of a difference in information overlap the effects of different ages for surviving individuals. This is just the case for Definitions 3.70-3.72, where a comparison is established between two conditional distributions having two different conditioning events. However, we want to face the following problem: to single out inequalities which remain unchanged under some special kinds of change of information, namely those which do not destroy exchangeability. This is also explained by the following Remark.
Remark 4.2. As we saw in Chapter 1, the importance of the notion of exchangeability in Bayesian statistics is that it describes joint distributions of random quantities arising from random sampling. In more probabilistic terms, we can say that exchangeability is a common property of the following different kinds of joint distribution:
1) distributions of i.i.d. random variables;
2) distributions of conditionally i.i.d. random variables;
3) conditional distributions of originally i.i.d. random variables $T_1,\dots,T_n$, given symmetric statistics such as $S_n \equiv \sum_{i=1}^{n} T_i$.
This means in particular that exchangeability is maintained under the operations of unconditioning and of conditioning with respect to symmetric statistics, whereas the condition of stochastic independence is in general lost. In our specific setting, we more precisely start from lifetimes which are i.i.d. with monotone failure rates. Under the "Bayesian" operations mentioned above, even the properties of monotonicity of the one-dimensional failure rates can be lost along with stochastic independence. In this respect we pursue multivariate aging notions which are maintained under such operations. The discussion in the following example may help clarify why we are interested in the problem formulated above.
Example 4.3. (The burn-in problem). Here we continue to comment on the
burn-in problem discussed in Example 3.69 of the last chapter. There, we assumed that, in deciding about possible burn-in of the unit $U$, there is no opportunity to observe further lifetimes affected by the same unknown parameter, and we realized that one-dimensional aging properties of the conditional distributions $\bar G(s\mid\theta)$ have no direct impact on the decision: only the one-dimensional aging properties of the "predictive" survival function
$$\int_0^{\infty} \bar G(s\mid\theta)\, \pi_0(\theta)\, d\theta$$
are relevant to the problem. Here we consider, on the contrary, the case when an individual $I$ may use or test $n$ units $U_1,\dots,U_n$ with lifetimes $T_1,\dots,T_n$, and he/she assesses that $T_1,\dots,T_n$ are conditionally independent given $\Theta$, with the same conditional survival function $\bar G(s\mid\theta)$. In such a case $I$ can choose among many more decision procedures than those envisaged in the single-unit case; for instance, $I$ can decide to test a few units up to their failures in order to learn about $\Theta$, so as to predict the behavior of the remaining units, without burning them in. In the choice among different decision procedures (How many units to test? Which rule should be adopted to stop testing?), both $\pi_0$ and $\bar G(s\mid\theta)$ are to be taken into account. Two individuals $I_1, I_2$, making probability assessments through $\pi_i$ and $\bar G_i(s\mid\theta)$ ($i = 1,2$) respectively, may take different decisions even if the identity in (3.26) holds. In particular, one-dimensional aging properties of $\bar G_i(s\mid\theta)$ ($i = 1,2$) can have a direct impact on their decisions. Then we may wonder: "How does the condition of $\bar G_i(s\mid\theta)$ being IFR (or DFR) for any $\theta$ reflect itself in the joint probability model for $T_1,\dots,T_n$?" or, better:
"Which inequalities are inherited by the joint (predictive) probability distribution of $T_1,\dots,T_n$ in the case of monotonicity of the conditional hazard rate functions of $\bar G_i(s\mid\theta)$?"
Definitions of the type (4.3) actually aim to provide solutions to the above problem; an essential feature, in fact, is that they are based only on comparisons between conditional probabilities of different events under the same conditioning. In this respect we shall see, for instance, that definitions of positive aging of the type (4.3) can encompass both the case of i.i.d. lifetimes with IFR distributions and that of conditionally i.i.d. lifetimes with conditionally IFR distributions, as well as other cases of exchangeable lifetimes originating from the condition of one-dimensional IFR (see in particular Remarks 4.19 and 4.41). A further aspect to point out is the following: we shall see that the notions originating from (4.3) allow a certain compatibility between positive aging and negative dependence, as well as between negative aging and positive dependence. Sometimes this may be appropriate, especially in biological applications. For instance, we can have in practice situations of negative dependence which are compatible with reasonable heuristic ideas of positive aging (see also the discussion in Brindley and Thompson, 1972). Finally we note that, in terms of (4.3), we can formulate multivariate notions of positive aging and of negative aging in a symmetric way. From a technical point of view, we notice that an inequality of the type (4.3) is written in terms of one-dimensional stochastic orderings (while Definitions 3.70-3.72 involve notions of multivariate stochastic orderings). Some applications of the notions studied here to problems of Bayesian decisions will be indicated in the next chapter.
4.2 Schur survival functions

In this section we analyze the case of a vector of lifetimes $T_1,\dots,T_n$ with an exchangeable survival function $\bar F^{(n)}$ such that the following condition holds:
$$P\{T_1 > s_1 + \tau \mid D\} \ \ge\ P\{T_2 > s_2 + \tau \mid D\}, \qquad \forall \tau > 0, \qquad (4.4)$$
where $s_1 < s_2$ and $D$ is an event of the particular form
$$D \equiv \{T_1 > s_1\} \cap \{T_2 > s_2\} \cap \{T_3 > s_3\} \cap \dots \cap \{T_n > s_n\}.$$
The condition (4.4) is equivalent to (4.3) with the particular choice $\succeq\ =\ \succeq_{st}$, $k = n-2$. As remarked in Section 4.1, the inequality (4.4) is equivalent to $T_1,\dots,T_n$ being IFR in the case when $T_1,\dots,T_n$ are i.i.d.
In the general case of exchangeability, it can still be seen as a condition of positive aging in the following sense: it asserts that, between two individuals which survived a life test, the "younger" is deemed to have a higher probability than the "elder" of surviving an extra time $\tau$. Notice that in (4.4) we do not compare the conditional distribution of the residual lifetime of an individual with the analogous conditional distribution for the residual lifetime of the same individual considered when "older" than now. Rather, we compare conditional distributions of the residual lifetimes of two individuals of different ages, given the same state of information. We already noticed that this allows us to avoid undue effects of different states of information when comparing two conditional probabilities of survival. As mentioned, the condition (4.4) is strictly related to the concept of Schur-concavity. This will become clear after recalling the definitions of majorization, Schur-convexity, Schur-concavity and some related properties.
4.2.1 Basic background about majorization
In this subsection we recall the basic definitions of the majorization ordering and of the related notions of Schur-convex and Schur-concave functions. A general reference for these topics is the book by Marshall and Olkin, 1979. For a vector $\mathbf{x} \equiv (x_1,\dots,x_n)$ of $\mathbb{R}^n$, denote by $x_{[1]},\dots,x_{[n]}$ the components of $\mathbf{x}$ rearranged in decreasing order, $x_{[1]} \ge \dots \ge x_{[n]}$, and let $\mathbf{a} \equiv (a_1,\dots,a_n)$ and $\mathbf{b} \equiv (b_1,\dots,b_n)$ be two vectors of $\mathbb{R}^n$.

Definition 4.4. The vector $\mathbf{a}$ is greater than $\mathbf{b}$ in the majorization ordering (written $\mathbf{a} \succeq \mathbf{b}$) if
$$\sum_{i=1}^{k} a_{[i]} \ \ge\ \sum_{i=1}^{k} b_{[i]}, \quad k = 1,2,\dots,n-1, \qquad \text{and} \qquad \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i.$$

Denoting by $x_{(1)},\dots,x_{(n)}$ the components of a vector $\mathbf{x}$ rearranged in increasing order, the condition $\sum_{i=1}^{k} a_{[i]} \ge \sum_{i=1}^{k} b_{[i]}$ ($k = 1,2,\dots,n-1$) can be rewritten as
$$\sum_{i=1}^{h} a_{(i)} \ \le\ \sum_{i=1}^{h} b_{(i)}, \quad h = 1,2,\dots,n-1.$$

Example 4.5. In particular
$$(1,0,\dots,0)\ \succeq\ \Big(\tfrac12,\tfrac12,0,\dots,0\Big)\ \succeq\ \dots\ \succeq\ \Big(\tfrac{1}{n-1},\dots,\tfrac{1}{n-1},0\Big)\ \succeq\ \Big(\tfrac1n,\dots,\tfrac1n\Big).$$
Figure 7. $\mathbf{a} \succeq \mathbf{b}$; $\mathbf{c}$ is not comparable with $\mathbf{a}$ nor with $\mathbf{b}$, in the sense of majorization.
In general, starting from any vector $\mathbf{a} \equiv (a_1,\dots,a_n)$, we can create a second vector $\mathbf{a}' \equiv (a'_1,\dots,a'_n)$ such that $\mathbf{a} \succeq \mathbf{a}'$; this can be done by means of the following procedure: all components of $\mathbf{a}'$ are equal to those of $\mathbf{a}$, except the $i$-th and the $j$-th coordinates, where $a_i, a_j$ are respectively replaced by the convex combinations $(1-\alpha)a_i + \alpha a_j$ and $(1-\alpha)a_j + \alpha a_i$, with $0 \le \alpha \le 1$. Note that this also means $a'_i = a_i + \Delta$, $a'_j = a_j - \Delta$, with $\Delta = \alpha(a_j - a_i)$. For a given vector $\mathbf{a} \equiv (a_1,\dots,a_n)$, it will be convenient to introduce the symbol $G^{(i,j)}_{\Delta}(\mathbf{a})$ to denote the vector
$$G^{(i,j)}_{\Delta}(\mathbf{a}) \equiv (a_1,\dots,a_i + \Delta,\dots,a_j - \Delta,\dots,a_n). \qquad (4.5)$$
A fundamental fact about majorization is the following: $\mathbf{a} \succeq \mathbf{a}'$ if and only if $\mathbf{a}'$ can be obtained from $\mathbf{a}$ by means of a finite number of successive transformations of the type in (4.5), with $a_i < a_j$ and $0 \le \Delta \le a_j - a_i$. We point out that the transformation $G^{(i,j)}_{\Delta}$ is usually termed a "T-transform" (e.g. in the book by Marshall and Olkin, 1979). We preferred to use a different notation for practical reasons (in particular, to avoid possible confusion with the notions of "TTT process" and "TTT transform" used in the reliability literature).
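The short Python sketch below is an illustration added here (it is not part of the original text, and all function names are ours): it checks the partial-sum definition of majorization and verifies, on a numerical example, that a transformation of type (4.5) always produces a vector majorized by the original one.

```python
import numpy as np

def majorizes(a, b, tol=1e-12):
    """Return True if a majorizes b: equal totals and the partial sums of the
    decreasingly ordered components of a dominate those of b (Definition 4.4)."""
    a, b = np.sort(a)[::-1], np.sort(b)[::-1]
    if abs(a.sum() - b.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(a)[:-1] >= np.cumsum(b)[:-1] - tol))

def G_transform(a, i, j, delta):
    """The transformation (4.5): move an amount delta from the j-th coordinate to
    the i-th; with a_i < a_j and 0 <= delta <= a_j - a_i the result is majorized by a."""
    out = np.array(a, dtype=float)
    out[i] += delta
    out[j] -= delta
    return out

a = np.array([5.0, 1.0, 3.0, 7.0])
a_prime = G_transform(a, i=1, j=3, delta=2.0)   # a_1 < a_3 and 0 <= delta <= a_3 - a_1
print(majorizes(a, a_prime), majorizes(a_prime, a))                 # True False
print(majorizes(np.array([1.0, 0.0, 0.0]), np.array([1/3, 1/3, 1/3])))  # Example 4.5: True
```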
Definition 4.6. A function $g: A \subseteq \mathbb{R}^n \to \mathbb{R}$ is Schur-convex on $A$ if it is non-decreasing with respect to the majorization ordering, i.e.
$$\mathbf{a},\mathbf{b} \in A,\ \mathbf{a} \succeq \mathbf{b}\ \Rightarrow\ g(\mathbf{a}) \ge g(\mathbf{b}).$$
If $\mathbf{a} \succeq \mathbf{b} \Rightarrow g(\mathbf{a}) \le g(\mathbf{b})$, then $g$ is Schur-concave on $A$.

Since we are essentially interested in Schur properties of functions of lifetimes, we shall understand that we deal with Schur functions defined on $\mathbb{R}^n_+$, if not explicitly mentioned otherwise. We then simply say that $g$ is Schur-convex (Schur-concave) when $g$ is Schur-convex (Schur-concave) on $A \equiv \mathbb{R}^n_+$. We say that $g$ is a Schur function if it is Schur-concave or Schur-convex. It is immediate that any Schur function $g$ must necessarily be permutation invariant since, for any permutation $(\pi_1,\dots,\pi_n)$ of $\{1,2,\dots,n\}$, trivially
$$(a_{\pi_1},\dots,a_{\pi_n}) \succeq (a_1,\dots,a_n) \qquad \text{and} \qquad (a_1,\dots,a_n) \succeq (a_{\pi_1},\dots,a_{\pi_n}).$$
In particular we say that $g$ is Schur-constant if it is, at the same time, Schur-concave and Schur-convex; $g$ is Schur-constant if and only if, for a suitable function $\phi: \mathbb{R} \to \mathbb{R}$, one has
$$g(\mathbf{a}) = \phi\Big(\sum_{i=1}^{n} a_i\Big).$$

Remark 4.7. A function $g$ is Schur-concave if and only if, for each constant $c$, the level set $A_c \equiv \{\mathbf{a} \mid g(\mathbf{a}) \ge c\}$ is such that its indicator function is Schur-concave, i.e. such that
$$\mathbf{y} \in A_c,\ \mathbf{y} \succeq \mathbf{x}\ \Rightarrow\ \mathbf{x} \in A_c. \qquad (4.6)$$
Similarly, $g$ is Schur-convex if and only if, for each constant $c$, the level set $A_c$ is such that its indicator function is Schur-convex (i.e. $\mathbf{x} \in A_c,\ \mathbf{y} \succeq \mathbf{x} \Rightarrow \mathbf{y} \in A_c$).
Figure 8. Example of a set with Schur-concave indicator function.

For our treatment we need some general properties of Schur functions, which will be reported next, directly adapting their formulation to the case $A \equiv \mathbb{R}^n_+$. Fix an ordered subset $I \subseteq \{1,2,\dots,n\}$ and denote, as usual, $\tilde I \equiv \{1,2,\dots,n\} \setminus I$. For a given vector $\mathbf{t}_{\tilde I} \in \mathbb{R}^{n-|I|}_+$, we now look at $g(\mathbf{t}_I, \mathbf{t}_{\tilde I})$ as a function of the vector $\mathbf{t}_I$. Since it is immediate that $\mathbf{t}'_I \succeq \mathbf{t}''_I$ implies $(\mathbf{t}'_I, \mathbf{t}_{\tilde I}) \succeq (\mathbf{t}''_I, \mathbf{t}_{\tilde I})$, we obtain

Proposition 4.8. Let $g: \mathbb{R}^n_+ \to \mathbb{R}$ be a Schur-concave (Schur-convex) function; then $g(\mathbf{t}_I, \mathbf{t}_{\tilde I})$ is Schur-concave (Schur-convex) as a function of $\mathbf{t}_I$, for any $I \subseteq \{1,2,\dots,n\}$ and $\mathbf{t}_{\tilde I} \in \mathbb{R}^{n-|I|}_+$.

In the special case when $I \equiv \{i,j\}$, we shall write, for any given function $g$ and for vectors $\tilde{\mathbf{t}} \equiv (t_1,\dots,t_{i-1},t_{i+1},\dots,t_{j-1},t_{j+1},\dots,t_n) \in \mathbb{R}^{n-2}_+$,
$$g_{(i,j)}(\tau_i,\tau_j;\ \tilde{\mathbf{t}}) \equiv g(t_1,\dots,t_{i-1},\tau_i,t_{i+1},\dots,t_{j-1},\tau_j,t_{j+1},\dots,t_n). \qquad (4.7)$$

The following results provide useful characterizations of Schur functions (for proofs, see Marshall and Olkin, 1979, pp. 55-57).

Proposition 4.9. A function $g$ is Schur-concave (Schur-convex) if and only if $g$ is permutation invariant and $g_{(i,j)}(\xi,\, s-\xi;\ \tilde{\mathbf{t}})$ is increasing (decreasing) in $\xi$ for $0 \le \xi \le s/2$, for each $s > 0$ and $\tilde{\mathbf{t}} \in \mathbb{R}^{n-2}_+$. The above conditions can be expressed in terms of derivatives when $g$ is differentiable.
Proposition 4.10. (Schur, 1923; Ostrowski, 1952). Let $g: (0,+\infty)^n \to \mathbb{R}$ be continuously differentiable. $g$ is Schur-concave (or Schur-convex) if and only if it is permutation invariant and, for each $i, j$,
$$(t_i - t_j)\left(\frac{\partial g}{\partial t_i}(\mathbf{t}) - \frac{\partial g}{\partial t_j}(\mathbf{t})\right) \le 0 \qquad \text{or} \qquad (t_i - t_j)\left(\frac{\partial g}{\partial t_i}(\mathbf{t}) - \frac{\partial g}{\partial t_j}(\mathbf{t})\right) \ge 0, \qquad (4.8)$$
respectively, for all $\mathbf{t} \in (0,+\infty)^n$. The above inequalities are often called Schur's conditions.

The following properties are fundamental for our treatment.

Proposition 4.11. If a symmetric function $g: \mathbb{R}^n \to \mathbb{R}$ is log-concave (log-convex) then it is Schur-concave (Schur-convex).

Proposition 4.12. If $g(\mathbf{a})$ is of the form $g(\mathbf{a}) = \varphi(a_1)\cdots\varphi(a_n)$ then it is Schur-concave (Schur-convex) if and only if $\varphi$ is log-concave (log-convex).
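As a numerical illustration of Schur's conditions (4.8), added here as a sketch (the function names are ours, and the specific survival function is our own example, not from the text), the following Python fragment evaluates $(t_i - t_j)(\partial g/\partial t_i - \partial g/\partial t_j)$ for the joint survival function of i.i.d. Weibull lifetimes, $g(\mathbf{t}) = \exp\{-\sum_i t_i^{\alpha}\}$, at randomly drawn points; for $\alpha > 1$ all values should be $\le 0$ (Schur-concavity), for $\alpha < 1$ all should be $\ge 0$ (Schur-convexity).

```python
import numpy as np

def schur_condition(grad, t):
    """Evaluate (t_i - t_j) * (dg/dt_i - dg/dt_j) for all pairs i < j at the point t.
    Schur-concavity requires every value to be <= 0 (Proposition 4.10)."""
    g = grad(t)
    vals = []
    for i in range(len(t)):
        for j in range(i + 1, len(t)):
            vals.append((t[i] - t[j]) * (g[i] - g[j]))
    return np.array(vals)

def weibull_grad(alpha):
    """Gradient of g(t) = exp(-sum t_i**alpha): dg/dt_i = -alpha * t_i**(alpha-1) * g(t)."""
    def grad(t):
        g = np.exp(-np.sum(t ** alpha))
        return -alpha * t ** (alpha - 1) * g
    return grad

rng = np.random.default_rng(0)
pts = rng.uniform(0.1, 3.0, size=(1000, 3))
print(all(schur_condition(weibull_grad(2.0), t).max() <= 1e-12 for t in pts))   # alpha > 1: Schur-concave
print(all(schur_condition(weibull_grad(0.5), t).min() >= -1e-12 for t in pts))  # alpha < 1: Schur-convex
```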
4.2.2 Schur properties of survival functions and multivariate aging
The starting point for the discussion to be developed here is the relation between the arguments above and the concepts of IFR and DFR. First we recall that a lifetime $T$ has an IFR (DFR) distribution if and only if its survival function $\bar G(t) \equiv P\{T > t\}$ is log-concave (log-convex). Proposition 4.12 then yields

Corollary 4.13. The joint survival function $\bar F^{(n)}(s_1,\dots,s_n)$ of i.i.d. lifetimes $T_1,\dots,T_n$ is Schur-concave (Schur-convex) if and only if the one-dimensional distribution of $T_1,\dots,T_n$ is IFR (DFR).

In view of Corollary 4.13 one can regard the properties of IFR and DFR (for one-dimensional distributions over $[0,\infty)$) as properties of a joint survival function for $n$ i.i.d. lifetimes. Recalling that, in the case when $T_1,\dots,T_n$ are i.i.d., their being IFR is equivalent to the inequality (4.4), Corollary 4.13 can also be reformulated as follows: for i.i.d. lifetimes $T_1,\dots,T_n$, the following three conditions are equivalent:
a) the inequalities (4.4) hold;
b) $\bar F^{(n)}(s_1,\dots,s_n) \equiv P\{T_1 > s_1,\dots,T_n > s_n\}$ is Schur-concave on $\mathbb{R}^n_+$;
c) $T_1,\dots,T_n$ are IFR.
As seen above, it is of interest to consider condition a) even in the general case of exchangeable distributions for $T_1,\dots,T_n$. On the other hand, Corollary 4.13 suggests the study of lifetimes with a Schur-concave joint survival function.
Remark 4.14. Since, as noticed above, a Schur function is necessarily permutation invariant, vectors of lifetimes with Schur joint survival functions are exchangeable.

The following result points out that the equivalence between conditions a) and b) holds for any exchangeable survival function on $\mathbb{R}^n_+$. However, these are in general no longer equivalent to c) if $T_1,\dots,T_n$ are not independent (recall the discussion in Section 3.4 and see Remark 4.17 below). Again let $D \equiv \{T_1 > s_1,\dots,T_n > s_n\}$.

Proposition 4.15. Let $\bar F^{(n)}(s_1,\dots,s_n)$ be an exchangeable joint survival function. Then the following two conditions are equivalent.
a) $0 \le s_1 < s_2 \ \Rightarrow$
$$P\{T_1 > s_1 + \tau \mid T_1 > s_1,\dots,T_n > s_n\} \ \ge\ P\{T_2 > s_2 + \tau \mid T_1 > s_1,\dots,T_n > s_n\}, \qquad \forall \tau > 0;$$
b) $\bar F^{(n)}(s_1,\dots,s_n) \equiv P\{T_1 > s_1,\dots,T_n > s_n\}$ is Schur-concave.

Proof. One has
$$P\{T_i > s_i + \tau \mid D\} = \frac{\bar F^{(n)}(s_1,\dots,s_{i-1},\ s_i + \tau,\ s_{i+1},\dots,s_n)}{\bar F^{(n)}(s_1,\dots,s_n)}.$$
Then the implication a) means that, for $0 \le r_i < r_j$, we have
$$\bar F^{(n)}(r_1,\dots,r_{i-1},\ r_i + u,\ r_{i+1},\dots,r_n)\ \ge\ \bar F^{(n)}(r_1,\dots,r_{j-1},\ r_j + u,\ r_{j+1},\dots,r_n).$$
Since, for $r_i < r_j$,
$$(r_1,\dots,r_{i-1},\ r_i + u,\ r_{i+1},\dots,r_n)\ \preceq\ (r_1,\dots,r_{j-1},\ r_j + u,\ r_{j+1},\dots,r_n),$$
it is immediate that, if $\bar F^{(n)}$ is Schur-concave, then the above implication holds. Vice versa, if the implication a) holds, then $\bar F^{(n)}$ is Schur-concave as a consequence of the characterization given in Proposition 4.9.

Remark 4.16. Proposition 4.15 provides motivation to interpret Schur-concavity of the joint survival function as a condition of multivariate aging. By reversing inequalities, we can also immediately see that $\bar F^{(n)}$ is Schur-convex on $\mathbb{R}^n_+$ if and only if $T_1,\dots,T_n$ are exchangeable and the implication
$$s_1 < s_2 \ \Rightarrow\ P\{T_1 > s_1 + \tau \mid D\} \ \le\ P\{T_2 > s_2 + \tau \mid D\}, \qquad \forall \tau > 0,$$
holds true. By arguments similar to those above, the latter implication can be interpreted as a condition of infant mortality. A further aspect of positive aging exhibited by vectors of lifetimes with a Schur-concave survival function can be obtained by noting the following: an immediate consequence of this assumption is
$$P\{T_1 > s_1,\dots,T_n > s_n\} \ \ge\ P\Big\{T_1 > \sum_{i=1}^{n} s_i\Big\}.$$
In particular, by letting $s_1 = \dots = s_n = u$, we obtain
$$P\Big\{\min_{1\le i\le n} T_i > u\Big\} \ \ge\ P\Big\{\frac{T_1}{n} > u\Big\}, \qquad \text{i.e.} \qquad \min_{1\le i\le n} T_i \ \ge_{st}\ \frac{T_1}{n}.$$
The above inequalities are reversed in the case of Schur-convexity.
Remark 4.17. As already noticed, the property that the one-dimensional marginal survival functions are IFR (i.e. log-concave) is in general lost under mixing (see Remark 3.66 in Section 3.4). Consider in fact the joint survival function of lifetimes $T_1,\dots,T_n$ which are conditionally i.i.d. given a random variable $\Theta$:
$$\bar F^{(n)}(s_1,\dots,s_n) = \int_{L} \bar G(s_1\mid\theta)\cdots\bar G(s_n\mid\theta)\, d\Pi_0(\theta),$$
with $\Pi_0$ a given probability distribution over $L$. $\bar G(\cdot\mid\theta)$ being log-concave $\forall \theta \in L$ (i.e. $T_1,\dots,T_n$ IFR given $\{\Theta = \theta\}$) does not imply that the one-dimensional predictive survival function $\bar F^{(1)}(s) = \int_L \bar G(s\mid\theta)\, d\Pi_0(\theta)$ is log-concave as well. This shows that the one-dimensional marginal of an $n$-dimensional Schur-concave survival function is not necessarily IFR. Here we remark that, on the contrary, the implication in a) remains valid under mixtures. Indeed, consider a vector of $n$ lifetimes $\mathbf{T} \equiv (T_1,\dots,T_n)$ with conditional distributions, given the parameter $\theta$, such that $\forall \tau > 0$, $\forall \theta \in L$, and again with $D \equiv \{T_1 > s_1,\dots,T_n > s_n\}$, one has, for $0 < s_i < s_j$,
$$P\{T_i > s_i + \tau \mid D, \theta\} \ \ge\ P\{T_j > s_j + \tau \mid D, \theta\}.$$
Then
$$P\{T_i > s_i + \tau \mid D\} = \int_L P\{T_i > s_i + \tau \mid D, \theta\}\, d\Pi_0(\theta \mid D) \ \ge\ \int_L P\{T_j > s_j + \tau \mid D, \theta\}\, d\Pi_0(\theta \mid D) = P\{T_j > s_j + \tau \mid D\}.$$
This same fact can be reformulated by means of the following statement.

Proposition 4.18. Let $\bar F^{(n)}$ be a joint survival function of the form
$$\bar F^{(n)}(s_1,\dots,s_n) = \int_L \bar G(s_1,\dots,s_n \mid \theta)\, d\Pi(\theta)$$
with $\bar G(s_1,\dots,s_n \mid \theta)$ Schur-concave (or Schur-convex) $\forall \theta \in L$. Then $\bar F^{(n)}(s_1,\dots,s_n)$ is Schur-concave (or Schur-convex).
Arguments presented above can be summarized as follows.

Remark 4.19. Schur-concavity (or Schur-convexity) of the joint survival function $\bar F^{(n)}$ is a property shared both by i.i.d. lifetimes with IFR (or DFR) distributions and by lifetimes which are conditionally i.i.d. IFR (or DFR).

The following results show further useful closure properties.

Proposition 4.20. Let $\bar F^{(n)}(\mathbf{s})$ be a Schur-concave (Schur-convex) survival function. Then its $k$-dimensional marginal $\bar F^{(k)}$ is Schur-concave (Schur-convex) as well ($k = 2,\dots,n-1$).

Proof. We note that
$$\bar F^{(k)}(s_1,\dots,s_k) = \bar F^{(n)}(s_1,\dots,s_k,0,\dots,0).$$
Then the proof is immediately achieved by taking into account the implication
$$(s'_1,\dots,s'_k) \succeq (s''_1,\dots,s''_k)\ \Rightarrow\ (s'_1,\dots,s'_k,0,\dots,0) \succeq (s''_1,\dots,s''_k,0,\dots,0).$$
Fix an increasing transformation $\varphi: \mathbb{R}_+ \to \mathbb{R}_+$ and consider the transformed lifetimes $U_1,\dots,U_n$ defined by
$$U_i \equiv \varphi(T_i), \qquad i = 1,\dots,n.$$
The joint survival function of $\mathbf{U} \equiv (U_1,\dots,U_n)$ is given by
$$\bar H^{(n)}(u_1,\dots,u_n) = P\{T_1 > \varphi^{-1}(u_1),\dots,T_n > \varphi^{-1}(u_n)\} = \bar F^{(n)}\big(\varphi^{-1}(u_1),\dots,\varphi^{-1}(u_n)\big).$$

Proposition 4.21. Let $\bar F^{(n)}$ be a Schur-convex joint survival function and let the function $\varphi: \mathbb{R}_+ \to \mathbb{R}_+$ be increasing and convex. Then
$$\bar H^{(n)}(u_1,\dots,u_n) = \bar F^{(n)}\big(\varphi^{-1}(u_1),\dots,\varphi^{-1}(u_n)\big)$$
is also Schur-convex. An analogous property holds for Schur-concave survival functions.
We direct the reader to the book by Marshall and Olkin (1979) for the proof of Proposition 4.21, details on compositions of Schur functions, and for many other closure properties which can be converted into closure properties of Schur distributions for lifetimes.
4.2.3 Examples of Schur survival functions
Example 4.22. (Proportional hazard models). Consider the model in Example 2.18, Example 2.40, Chapter 2, and Example 3.46, Chapter 3. For any distribution $\Pi_0$ over $(0,+\infty)$, the corresponding joint survival function $\bar F^{(n)}$ is Schur-concave when the function $R(t)$ is convex, and it is Schur-convex when the function $R(t)$ is concave (see also Barlow and Mendel, 1992 and Hayakawa, 1993).

Example 4.23. (Proportional hazard models; Schur-constant case). In the proportional hazard model with $R(t) = t$, we obtain that the joint survival function is Schur-constant. When the distribution $\Pi_0$ is degenerate on a value $\lambda > 0$, we obtain in particular the case of i.i.d. exponential lifetimes:
$$\bar F^{(n)}(s_1,\dots,s_n) = \exp\Big\{-\lambda\sum_{i=1}^{n} s_i\Big\}, \qquad f^{(n)}(t_1,\dots,t_n) = \lambda^n \exp\Big\{-\lambda\sum_{i=1}^{n} t_i\Big\}.$$
Example 4.24. (Time-transformed exponential models). Continuing Example 2.17, consider more generally survival functions of the form
$$\bar F^{(n)}(s_1,\dots,s_n) = \bar H^{(n)}\big(R(s_1),\dots,R(s_n)\big) = \phi\Big(\sum_{i=1}^{n} R(s_i)\Big), \qquad (4.9)$$
where $R$ is a given increasing function and $\bar H^{(n)}$ is a Schur-constant survival function. Then $\bar H^{(n)}$ is at the same time Schur-concave and Schur-convex, and $\bar F^{(n)}(s_1,\dots,s_n)$ in (4.9) is Schur-concave, Schur-constant, or Schur-convex according to whether $R$ is convex, linear, or concave.
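As a concrete worked instance of (4.9), added here for illustration (the particular choices of $\phi$ and $R$ are our own assumptions, not from the text), take the Schur-constant kernel arising from a gamma mixture of exponentials together with a power time-transformation:
$$\phi(x) = \Big(1 + \frac{x}{b}\Big)^{-a}, \qquad R(s) = s^{\alpha}, \qquad a, b, \alpha > 0,$$
so that
$$\bar F^{(n)}(s_1,\dots,s_n) = \Big(1 + \frac{1}{b}\sum_{i=1}^{n} s_i^{\alpha}\Big)^{-a}.$$
Since $\phi$ is decreasing and $\mathbf{s} \mapsto \sum_i s_i^{\alpha}$ is Schur-convex for $\alpha \ge 1$ and Schur-concave for $0 < \alpha \le 1$, this $\bar F^{(n)}$ is Schur-concave when $\alpha \ge 1$, Schur-constant when $\alpha = 1$, and Schur-convex when $0 < \alpha \le 1$, in agreement with the rule just stated.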
Figure 9. A Schur-constant survival function $\bar F^{(2)}$ is constant over the simplex.
Example 4.25. From the definition of the majorization ordering, it is immediate that $\mathbf{a} \succeq \mathbf{b}$ implies $\max\{a_1,\dots,a_n\} \ge \max\{b_1,\dots,b_n\}$. Then the function $g(\mathbf{a}) \equiv \max\{a_1,\dots,a_n\}$ is Schur-convex, whence, for any non-increasing function $\psi: \mathbb{R} \to \mathbb{R}$, $\psi[\max\{a_1,\dots,a_n\}]$ is Schur-concave. In particular, any joint survival function $\bar F^{(n)}(s_1,\dots,s_n)$ which depends on $(s_1,\dots,s_n)$ only through $\max\{s_1,\dots,s_n\}$ must be Schur-concave. This can be seen as an extreme case of Schur-concavity.
162
BAYESIAN MODELS OF AGING
Consider the extreme cases of positive dependence where $P\{T_1 = \dots = T_n\} = 1$. Letting $\bar F^{(1)}$ denote the one-dimensional survival function of $T_j$ ($j = 1,\dots,n$), the corresponding survival functions are of the form
$$\bar F^{(n)}(s_1,\dots,s_n) = \bar F^{(1)}(\max\{s_1,\dots,s_n\})$$
and are Schur-concave.
Figure 10. A level set of a survival function of the form $\bar F^{(2)}(s_1,s_2) = \psi(\max(s_1,s_2))$.
Example 4.26. (Common mode failures, bivariate model of Marshall-Olkin). The two-dimensional joint survival function
$$\bar F^{(2)}(s_1,s_2) = \exp\{-[\lambda(s_1+s_2) + \lambda_0\max(s_1,s_2)]\},$$
arising as a special case of the common-mode failure model considered in Example 2.23, is Schur-concave (see also Marshall and Olkin, 1974). This property fits with the following heuristic explanation: suppose that one component is inspected at time $s_1$ and the other is inspected at the time $s_2$, and suppose that both are found to be alive. Then we know that there has been no shock before $s_2$, and hence our hopes of finding the first component still alive shortly after time $s_1$ are greater than our hopes of finding the second component alive shortly after time $s_2$.
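The following Monte Carlo sketch, added here for illustration (the shock construction is the standard one for the Marshall-Olkin model; the rate values and all variable names are our own choices), simulates $T_i = \min(Z_i, Z_0)$ with independent exponential individual shocks $Z_1, Z_2$ of rate $\lambda$ and a common shock $Z_0$ of rate $\lambda_0$, and compares the empirical joint survival probability with the displayed formula.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, lam0 = 1.0, 0.5            # individual and common shock rates (illustrative values)
n = 200_000

z1 = rng.exponential(1 / lam, n)     # shock killing component 1 only
z2 = rng.exponential(1 / lam, n)     # shock killing component 2 only
z0 = rng.exponential(1 / lam0, n)    # common shock killing both components
t1, t2 = np.minimum(z1, z0), np.minimum(z2, z0)

s1, s2 = 0.4, 1.1
empirical = np.mean((t1 > s1) & (t2 > s2))
analytic = np.exp(-(lam * (s1 + s2) + lam0 * max(s1, s2)))
print(empirical, analytic)           # the two values should be close
```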
4.2.4 Schur survival functions and dependence
The condition of Schur-concavity for a survival function $\bar F^{(n)}$ can coexist with situations of independence, of positive dependence, or of negative dependence as well. The same happens for survival functions which are Schur-constant or Schur-convex (see in particular Example 4.23 above). However, certain compatibility conditions between Schur properties and dependence properties can be obtained in terms of aging properties of the one-dimensional marginal distributions. For this purpose, recall the definitions of NBU, NWU, PUOD and NUOD given in Chapter 3.

Proposition 4.27. Let $\bar F^{(n)}$ be Schur-concave.
a) If $\mathbf{T} \equiv (T_1,\dots,T_n)$ is NUOD then $T_i$ is NBU ($i = 1,2,\dots,n$).
b) If $T_i$ is NWU ($i = 1,2,\dots,n$) then $\mathbf{T} \equiv (T_1,\dots,T_n)$ is PUOD.

Proof. a) $\bar F^{(n)}$ being Schur-concave implies
$$\bar F^{(n)}(s_1,\dots,s_n) \ \ge\ \bar F^{(n)}\Big(\sum_{j=1}^{n} s_j, 0,\dots,0\Big) = \bar F_{T_i}\Big(\sum_{j=1}^{n} s_j\Big).$$
On the other hand, if $\mathbf{T}$ is NUOD then
$$\bar F_{T_i}(s_1)\cdots\bar F_{T_i}(s_n) \ \ge\ \bar F^{(n)}(s_1,\dots,s_n),$$
whence
$$\bar F_{T_i}(s_1)\cdots\bar F_{T_i}(s_n) \ \ge\ \bar F_{T_i}\Big(\sum_{j=1}^{n} s_j\Big),$$
which in particular implies the NBU property.
b) $T_i$ NWU means, for any $s, t \ge 0$, $\bar F_{T_i}(s+t) \ge \bar F_{T_i}(s)\bar F_{T_i}(t)$, whence, if $\bar F^{(n)}$ is Schur-concave,
$$\bar F^{(n)}(s_1,\dots,s_n) \ \ge\ \bar F_{T_i}\Big(\sum_{j=1}^{n} s_j\Big) \ \ge\ \bar F_{T_i}(s_1)\cdots\bar F_{T_i}(s_n).$$

In a similar way one can prove

Proposition 4.28. Let $\bar F^{(n)}$ be Schur-convex.
a') If $\mathbf{T} \equiv (T_1,\dots,T_n)$ is PUOD then $T_i$ is NWU ($i = 1,2,\dots,n$).
b') If $T_i$ is NBU ($i = 1,2,\dots,n$) then $\mathbf{T} \equiv (T_1,\dots,T_n)$ is NUOD.
Very special cases where $\bar F^{(n)}$ is Schur-convex and $\mathbf{T}$ is PUOD are those of lifetimes which are conditionally independent and identically distributed, with a DFR (conditional) distribution, given some parameter. Then part a') of Proposition 4.28 can be seen as a generalization of the following well-known result, which gives a property of $T_i$ that is stronger than NWU, under stronger dependence conditions among $T_1,\dots,T_n$.

Proposition 4.29. (Barlow and Proschan, 1975). If $T_1,\dots,T_n$ are conditionally independent and identically distributed, with a DFR (conditional) distribution, then $T_i$ is DFR ($i = 1,2,\dots,n$).
4.3 Schur density functions

In the case of independence (and identical distribution), the inequality (4.4), which is equivalent to Schur-concavity of the survival function, is also equivalent to one in which the conditioning event can contain some failure data along with the survival data; i.e., for i.i.d. IFR lifetimes we obviously have, $\forall \tau > 0$, the validity of the following implication:
$$s_1 < s_2 \ \Rightarrow\ P\{T_1 > s_1 + \tau \mid T_1 > s_1, T_2 > s_2,\dots,T_k > s_k,\ T_{k+1} = t_{k+1},\dots,T_n = t_n\}$$
$$\ \ge\ P\{T_2 > s_2 + \tau \mid T_1 > s_1, T_2 > s_2,\dots,T_k > s_k,\ T_{k+1} = t_{k+1},\dots,T_n = t_n\}. \qquad (4.10)$$
Such an equivalence is not true in general when stochastic dependence is present. We turn now to conditions on an exchangeable survival function $\bar F^{(n)}$ which guarantee the validity of (4.10). In order to avoid technical problems in considering conditioning events of the form
$$D \equiv \{T_1 > s_1, T_2 > s_2,\dots,T_k > s_k,\ T_{k+1} = t_{k+1},\dots,T_n = t_n\}, \qquad (4.11)$$
we assume that $\bar F^{(n)}$ admits a joint density $f^{(n)}$. Below, we show that the validity of (4.10) is implied by Schur-concavity of $f^{(n)}$. Actually this is a stronger condition than Schur-concavity of the joint survival function, as can be seen in view of the following property.
For a set $A \subseteq \mathbb{R}^n_+$ and a vector $\mathbf{s} \in \mathbb{R}^n_+$, denote by $A + \mathbf{s}$ the set defined by
$$A + \mathbf{s} \equiv \{\mathbf{t} \in \mathbb{R}^n_+ \mid \mathbf{t} = \mathbf{a} + \mathbf{s}, \text{ for some } \mathbf{a} \in A\},$$
so that the event $\{\mathbf{T} \in A + \mathbf{s}\}$ means $\{\mathbf{T} - \mathbf{s} \in A\}$. Let now $A \subseteq \mathbb{R}^n_+$ have, in particular, a Schur-concave indicator function (i.e. an indicator function satisfying the implication (4.6)).
Theorem 4.30. (Marshall and Olkin, 1974). If $f^{(n)}$ is Schur-concave on $\mathbb{R}^n_+$, then
$$P\{\mathbf{T} \in A + \mathbf{s}\} = \int_{A+\mathbf{s}} f^{(n)}(t_1,\dots,t_n)\, dt_1 \dots dt_n$$
is a Schur-concave function of $\mathbf{s} \in \mathbb{R}^n_+$.

Note that, letting in particular $A \equiv \mathbb{R}^n_+$, it is
$$P\{\mathbf{T} \in A + \mathbf{s}\} = P\{T_1 > s_1,\dots,T_n > s_n\} = \bar F^{(n)}(\mathbf{s}).$$
From Theorem 4.30 we can then obtain

Corollary 4.31. If $f^{(n)}$ is Schur-concave on $\mathbb{R}^n_+$, then the joint survival function $\bar F^{(n)}$ is also Schur-concave on $\mathbb{R}^n_+$. If $f^{(n)}$ is Schur-convex on $\mathbb{R}^n_+$, then the joint survival function $\bar F^{(n)}$ is also Schur-convex on $\mathbb{R}^n_+$.

In the following we shall generally formulate results only for the case of Schur-concavity. We recall that, by Proposition 4.11, the condition that $f^{(n)}$ is Schur-concave is in particular verified when $f^{(n)}$ is symmetric and log-concave.

Lemma 4.32. Let $f^{(n)}$ be a Schur-concave joint density for a vector of lifetimes $T_1,\dots,T_n$. The conditional density of $T_1,\dots,T_k$ given
$$H \equiv \{T_{k+1} = t_{k+1},\dots,T_n = t_n\}$$
is Schur-concave.

Proof. Taking into account the identity
$$f^{(k)}(\tau_1,\dots,\tau_k \mid H) = \frac{f^{(n)}(\tau_1,\dots,\tau_k,\ t_{k+1},\dots,t_n)}{f^{(n-k)}(t_{k+1},\dots,t_n)},$$
the proof is immediate, in view of the implication
$$f^{(n)} \text{ Schur-concave} \ \Rightarrow\ f^{(n)}(\tau_1,\dots,\tau_k,\ t_{k+1},\dots,t_n) \text{ Schur-concave as a function of } (\tau_1,\dots,\tau_k).$$
By simply applying this Lemma, Corollary 4.31, and using Proposition 4.15 for the variables $T_1,\dots,T_k$, we obtain
Proposition 4.33. Let $f^{(n)}$ be a Schur-concave joint density for a vector of lifetimes $T_1,\dots,T_n$. Then, $\forall \tau > 0$, the implication (4.10) holds.

Similarly to the above, one can also easily obtain, for an observation $D$ as in (4.11):

Remark 4.34. If $f^{(n)}$ is Schur-convex, then the following implication holds $\forall \tau > 0$:
$$s_1 < s_2 \ \Rightarrow\ P\{T_1 > s_1 + \tau \mid D\} \ \le\ P\{T_2 > s_2 + \tau \mid D\}. \qquad (4.12)$$

Later in this section we shall consider different aspects of vectors of lifetimes admitting a joint density $f^{(n)}(t_1,\dots,t_n)$ which is a Schur function. Before that, we present a few examples of Schur densities; we start by discussing in some detail the special case of Schur-constant density functions.
4.3.1 Schur-constant densities
Consider the case of lifetimes $T_1,\dots,T_n$ with a Schur-constant joint density $f^{(n)}$:
$$f^{(n)}(t_1,\dots,t_n) = \phi_n\Big(\sum_{i=1}^{n} t_i\Big). \qquad (4.13)$$
We already noticed that $f^{(n)}$ being Schur-constant implies that $\bar F^{(n)}$ is Schur-constant as well. This can also be immediately obtained by taking into account that, in such a case, $f^{(n)}$ is simultaneously Schur-concave and Schur-convex, and then applying Corollary 4.31 above. By differentiation, one can easily check, vice versa, that if $\bar F^{(n)}$ is absolutely continuous and Schur-constant then its density is also Schur-constant; thus an absolutely continuous $\bar F^{(n)}$ is Schur-constant if and only if its density $f^{(n)}$ is such. More precisely, we have that the condition (4.13) is equivalent to
$$\bar F^{(n)}(s_1,\dots,s_n) = \Phi\Big(\sum_{i=1}^{n} s_i\Big)$$
with $\Phi: \mathbb{R}_+ \to [0,1]$ such that $\phi_n(t) = (-1)^n \frac{d^n}{dt^n}\Phi(t)$.
Notice that $\Phi$ is a non-increasing function with the meaning of the one-dimensional survival function of a single lifetime $T_i$ ($i = 1,2,\dots$); in fact
$$\bar F^{(1)}(s) = \bar F^{(n)}(s,0,\dots,0) = \Phi(s).$$
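As a small worked illustration, added here (it is not part of the original text), the "no-aging" behaviour discussed below can be read directly off this representation: for $D \equiv \{T_1 > s_1,\dots,T_n > s_n\}$ and any surviving index $i$,
$$P\{T_i > s_i + \tau \mid D\} = \frac{\bar F^{(n)}(s_1,\dots,s_i+\tau,\dots,s_n)}{\bar F^{(n)}(s_1,\dots,s_n)} = \frac{\Phi\big(\tau + \sum_{j=1}^{n} s_j\big)}{\Phi\big(\sum_{j=1}^{n} s_j\big)},$$
a quantity that does not depend on which individual $i$ is considered, nor on how the total age $\sum_j s_j$ is distributed among the individuals. For instance, with $\Phi(x) = (1+x)^{-a}$ the conditional probability of an extra survival $\tau$ equals $\big[(1 + \sum_j s_j)/(1 + \tau + \sum_j s_j)\big]^{a}$ for every surviving unit.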
Remark 4.35. In Example 2.35 of Chapter 2 we saw that the condition (4.13) also implies that the corresponding m.c.h.r. functions $\lambda_t^{(n-h)}(t_1,\dots,t_h)$ ($h = 2,\dots,n-1$) are Schur-constant functions of $t_1,\dots,t_h$.

The condition (4.13) is trivially verified in the case of independent, identically distributed, exponential lifetimes, where
$$\phi_n(t) = \lambda^n \exp\{-\lambda t\}, \qquad (4.14)$$
and in that of conditionally independent, identically distributed, exponential lifetimes. Several probabilistic properties which hold for the case (4.14) also hold for arbitrary Schur-constant densities. An example is provided by the significant invariance property, concerning conditional distributions of residual lifetimes, that we are going to illustrate now. From Proposition 4.33 and Remark 4.34 we obtain that, under the condition (4.13), the implications (4.10) and (4.12) hold simultaneously, and then we have
$$P\{T_1 > s_1 + \tau \mid D\} = P\{T_2 > s_2 + \tau \mid D\} \qquad (4.15)$$
for any pair $s_1, s_2 > 0$ and for $D$ as in (4.11). For the case of i.i.d. lifetimes, Equation (4.15) is nothing but the "memoryless" property of exponential distributions.

Remark 4.36. More generally, in view of Equation (4.15), the condition (4.13) can be seen as the proper "multivariate" formulation of indifference with respect to age: given the same observation $D$, the conditional probability of extra survival is the same for any surviving individual, irrespective of its own age. Such a property of Schur-constant densities, which could be called "no-aging", was also pointed out by Barlow and Mendel (1992). We notice that something (apparently) stronger happens under the condition (4.13). Consider the conditional density
$$f_{T_1 - s_1, T_2 - s_2,\dots,T_k - s_k}(x_1,\dots,x_k \mid D)$$
of the residual lifetimes $T_1 - s_1, T_2 - s_2,\dots,T_k - s_k$, conditional on the observation $D$. Under the condition that a joint density does exist, we actually have

Proposition 4.37. The following conditions are equivalent:
(a) the joint density $f^{(n)}$ is Schur-constant;
(b) $f_{T_1 - s_1, T_2 - s_2,\dots,T_k - s_k}(x_1,\dots,x_k \mid D)$ is Schur-constant, $\forall s_1,\dots,s_k, t_{k+1},\dots,t_n$;
(c) $f_{T_1 - s_1, T_2 - s_2,\dots,T_k - s_k}(x_1,\dots,x_k \mid D)$ is exchangeable, $\forall s_1,\dots,s_k, t_{k+1},\dots,t_n$;
(d) given $D$, the residual lifetimes $T_1 - s_1, T_2 - s_2,\dots,T_k - s_k$ are identically distributed, $\forall s_1,\dots,s_k$ and $\forall t_{k+1},\dots,t_n$.
Proof. By adapting Proposition 2.15 to the present notation and applying it to the case when (a) holds, one can easily check that (b) holds. Obviously (b) implies (c), which in turn implies (d). In view of Proposition 4.15 and Remark 4.16, (d) implies that the joint survival function must be Schur-constant. As noticed, when a joint density does exist, the latter condition implies (a).

In conclusion, (4.13) is equivalent to a condition of exchangeability (and then of indifference) for the residual lifetimes, which can be interpreted as a condition of indifference with respect to aging. We recall that, combining such a condition with infinite extendibility, we get, via Theorem 1.44, the "lack of memory" models defined by exponentiality or conditional exponentiality of lifetimes (go back to Remark 1.46). Some other characterizations of Schur-constant densities follow from arguments to be developed next (see e.g. Remark 4.51). Further characterizations, which point out different aspects of such distributions, have been discussed by Chick and Mendel (1998). Examples of lifetimes with Schur-constant densities which are not necessarily independent, or conditionally independent, and exponentially distributed (i.e. not necessarily infinitely extendible) can be produced as shown by the next example.

Example 4.38. Consider an arbitrary Schur-constant density
$$f^{(n)}(t_1,\dots,t_n) = \phi_n\Big(\sum_{i=1}^{n} t_i\Big)$$
and random variables $Y_1,\dots,Y_m$ such that
$$(T_1,\dots,T_n) \qquad \text{and} \qquad (Y_1,\dots,Y_m)$$
are conditionally independent, given $S_n \equiv \sum_{i=1}^{n} T_i$. This means that the conditional density $f_{\mathbf{Y}}(\mathbf{y} \mid T_1 = t_1,\dots,T_n = t_n)$ is a Schur-constant function of $t_1,\dots,t_n$. In this case, the conditional density
$$f_{\mathbf{T}}(\mathbf{t} \mid Y_1 = y_1,\dots,Y_m = y_m) \ \propto\ \phi_n\Big(\sum_{i=1}^{n} t_i\Big)\, f_{\mathbf{Y}}(\mathbf{y} \mid T_1 = t_1,\dots,T_n = t_n)$$
is Schur-constant. Note that in general this is not a case of proportional hazard.
In Example 3.45, Chapter 3, we saw that we have the MTP$_2$ property if and only if $\phi_n$ is log-convex, and this implies that the one-dimensional survival function is DFR; if, on the contrary, the one-dimensional survival function is IFR, we have a case of negative dependence. This argument can be used to obtain, by means of the construction considered in Example 4.38, Schur-constant models which are not proportional hazard models (see Exercise 4.75). In order to discuss further aspects of Schur densities, we shall need to come back to the case of Schur-constant densities later on. In particular, in Section 5.3, it will be necessary to point out further aspects of invariance, different from those shown by Proposition 4.37.
4.3.2 Examples of Schur densities
An elementary property of Schur-concavity and Schur-convexity is the one described by Proposition 4.18. It is obvious that an analogous implication is true for joint density functions; that is, we have

Proposition 4.39. Let $f^{(n)}$ be a joint density function of the form
$$f^{(n)}(t_1,\dots,t_n) = \int_L g(t_1,\dots,t_n \mid \theta)\, d\Pi_0(\theta)$$
with $g(t_1,\dots,t_n \mid \theta)$ Schur-concave (or Schur-convex) $\forall \theta \in L$. Then $f^{(n)}(t_1,\dots,t_n)$ is Schur-concave (or Schur-convex).

A different but, in a sense, similar closure property is described next. Let $f^{(n)}$ be a joint density for a vector of lifetimes $T_1,\dots,T_n$ and denote by $f^{(n-1)}(t_1,\dots,t_{n-1} \mid S_n = s)$ the conditional density of $T_1,\dots,T_{n-1}$ given $S_n = s$, where $S_n \equiv \sum_{i=1}^{n} T_i$.

Proposition 4.40. If $f^{(n)}$ is Schur-concave (Schur-convex) then $f^{(n-1)}(t_1,\dots,t_{n-1} \mid S_n = s)$ is Schur-concave (Schur-convex) as well.

Proof. We limit ourselves to considering the case of Schur-concavity. For $s > 0$ and $(t_1,\dots,t_{n-1}) \in \Delta_s^{(n-1)}$, where
$$\Delta_s^{(n-1)} \equiv \Big\{(t_1,\dots,t_{n-1}) \in \mathbb{R}^{n-1}_+ \ \Big|\ \sum_{i=1}^{n-1} t_i \le s\Big\},$$
one has
$$f^{(n-1)}(t_1,\dots,t_{n-1} \mid S_n = s) = \frac{f^{(n)}\big(t_1,\dots,t_{n-1},\ s - \sum_{i=1}^{n-1} t_i\big)}{f_{S_n}(s)}$$
with
$$f_{S_n}(s) = \int_{\Delta_s^{(n-1)}} f^{(n)}\Big(t_1,\dots,t_{n-1},\ s - \sum_{i=1}^{n-1} t_i\Big)\, dt_1 \dots dt_{n-1}.$$
Consider now two vectors $(t'_1,\dots,t'_{n-1}),\ (t_1,\dots,t_{n-1}) \in \Delta_s^{(n-1)}$ such that
$$\sum_{i=1}^{n-1} t'_i = \sum_{i=1}^{n-1} t_i = z, \qquad (t'_1,\dots,t'_{n-1}) \succeq (t_1,\dots,t_{n-1}).$$
If $f^{(n)}$ is Schur-concave, one then has, by Proposition 4.8,
$$f^{(n-1)}(t_1,\dots,t_{n-1} \mid S_n = s) \ \ge\ \frac{f^{(n)}(t'_1,\dots,t'_{n-1},\ s - z)}{f_{S_n}(s)} = f^{(n-1)}(t'_1,\dots,t'_{n-1} \mid S_n = s).$$

Remark 4.41. Coming back to Remark 4.2, we summarize the arguments above as follows. Schur properties of joint densities are lost neither under unconditioning nor under conditioning with respect to the statistic $S_n \equiv \sum_{i=1}^{n} T_i$.
Propositions 4.39 and 4.40 can be used to build several examples of Schur densities. In particular, recalling Proposition 4.12, we get

Proposition 4.42. Let $\Theta$ be a parameter taking values in a set $L$, with a prior distribution $\Pi_0$. Let $T_1,\dots,T_n$ be conditionally i.i.d., given $\{\Theta = \theta\}$, with a conditional density $g(t \mid \theta)$:
$$f^{(n)}(t_1,\dots,t_n) = \int_L g(t_1 \mid \theta)\cdots g(t_n \mid \theta)\, d\Pi_0(\theta).$$
If $g(t \mid \theta)$ is log-concave (log-convex) then $f^{(n)}(t_1,\dots,t_n)$ is Schur-concave (Schur-convex).

Example 4.43. (Conditionally i.i.d. Weibull lifetimes). Let $\Theta$ be a non-negative random variable with a prior distribution $\Pi_0$. Let $T_1,\dots,T_n$ be conditionally i.i.d., given $\{\Theta = \theta\}$ ($\theta > 0$), with Weibull density
$$g(t \mid \theta) = \alpha\theta\, t^{\alpha-1}\exp\{-\theta t^{\alpha}\}, \qquad t \ge 0,$$
where $\alpha$ is a positive constant. $g(t \mid \theta)$ is log-concave if $\alpha > 1$ and it is log-convex if $0 < \alpha < 1$. The joint density of $T_1,\dots,T_n$ is then Schur-concave if $\alpha > 1$ and Schur-convex if $0 < \alpha < 1$. As mentioned in Example 3.46, we have in any case the strong positive dependence property of MTP$_2$.
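The Python sketch below is an added illustration (the Gamma$(a,b)$ prior and all names are our own choices, not from the text): it evaluates the predictive joint density of Example 4.43 in closed form under a gamma prior on $\Theta$ and checks the definition of Schur-concavity/convexity on pairs of comparable vectors obtained by averaging two coordinates, i.e. by a transformation of type (4.5).

```python
import numpy as np
from math import lgamma

def predictive_weibull(t, alpha, a, b):
    """Predictive joint density of conditionally i.i.d. Weibull(alpha, theta) lifetimes
    under a Gamma(a, b) prior on theta (rate parametrization), integrated in closed form:
    f(t) = alpha^n * prod(t_i^(alpha-1)) * b^a * Gamma(a+n) / (Gamma(a) * (b + sum t_i^alpha)^(a+n))."""
    t = np.asarray(t, dtype=float)
    n = t.size
    log_f = (n * np.log(alpha) + (alpha - 1) * np.log(t).sum()
             + a * np.log(b) + lgamma(a + n) - lgamma(a)
             - (a + n) * np.log(b + np.sum(t ** alpha)))
    return np.exp(log_f)

rng = np.random.default_rng(2)
for alpha, schur_concave in [(2.0, True), (0.5, False)]:
    ok = True
    for _ in range(500):
        t = rng.uniform(0.1, 3.0, size=4)
        eps = rng.uniform(0.0, 1.0)
        t_bal = t.copy()                             # average coordinates 0 and 1: t majorizes t_bal
        t_bal[0] = (1 - eps) * t[0] + eps * t[1]
        t_bal[1] = (1 - eps) * t[1] + eps * t[0]
        f_spread = predictive_weibull(t, alpha, 2.0, 1.0)
        f_bal = predictive_weibull(t_bal, alpha, 2.0, 1.0)
        ok &= (f_bal >= f_spread - 1e-12) if schur_concave else (f_bal <= f_spread + 1e-12)
    print(alpha, ok)   # expect True on both lines
```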
Example 4.44. (Conditionally i.i.d. gamma lifetimes). Let $\Theta$ be a non-negative random variable with a prior distribution $\Pi_0$. Let $T_1,\dots,T_n$ be conditionally i.i.d., given $\{\Theta = \theta\}$ ($\theta > 0$), with gamma density
$$g(t \mid \theta) = \frac{\theta^{\beta}}{\Gamma(\beta)}\, t^{\beta-1}\exp\{-\theta t\}, \qquad t \ge 0,$$
where $\beta$ is a positive constant. $g(t \mid \theta)$ is log-concave if $\beta > 1$ and it is log-convex if $\beta < 1$. The joint density of $T_1,\dots,T_n$ is then Schur-concave if $\beta > 1$ and Schur-convex if $\beta < 1$.

In the following example we consider again the case of conditionally i.i.d. lifetimes. In this case, however, the conditional survival functions $\bar G(s \mid \theta)$ are not positive over all of $\mathbb{R}_+$.

Example 4.45. Let $\Theta$ be a non-negative random variable with a prior density $\pi_0$ and let $T_1, T_2$ be conditionally i.i.d., given $\{\Theta = \theta\}$ ($\theta > 0$), with a distribution which is uniform over the interval $[0,\theta]$:
$$g(t \mid \theta) = \frac{1}{\theta}\, \mathbf{1}_{\{0 \le t \le \theta\}}.$$
The joint density $f^{(2)}(t_1,t_2)$ of $T_1, T_2$ is then
$$f^{(2)}(t_1,t_2) = \int_0^{\infty} \frac{1}{\theta^2}\, \mathbf{1}_{\{\max(t_1,t_2) \le \theta\}}\, \pi_0(\theta)\, d\theta = \int_{\max(t_1,t_2)}^{\infty} \frac{\pi_0(\theta)}{\theta^2}\, d\theta.$$
In order to make explicit computation easy, we consider the particular case when $\pi_0$ is a Pareto density with parameters $\alpha_0$ and $u_0$, with $0 < u_0 < \max(t_1,t_2)$:
$$\pi_0(\theta) = (\alpha_0 - 1)\, u_0^{\alpha_0-1}\, \frac{1}{\theta^{\alpha_0}}\, \mathbf{1}_{\{\theta \ge u_0\}}.$$
This yields
$$f^{(2)}(t_1,t_2) = \frac{(\alpha_0 - 1)\, u_0^{\alpha_0-1}}{(\alpha_0 + 1)\, [\max(t_1,t_2)]^{\alpha_0+1}},$$
which is Schur-concave, since it is a decreasing function of $\max(t_1,t_2)$.
Example 4.46. Consider the case of lifetimes $T_1,\dots,T_n$ satisfying the condition N of negative dependence (Definition 3.56): there exist $s > 0$ and log-concave densities $g_1,\dots,g_{n+1}$ such that the joint density of $T_1,\dots,T_n$ is equal to the conditional density of $Z_1,\dots,Z_n$ given $\sum_{i=1}^{n+1} Z_i = s$, where $Z_1,\dots,Z_{n+1}$ are independent variables with densities $g_1,\dots,g_{n+1}$, respectively. $T_1,\dots,T_n$ are exchangeable if $g_1 = \dots = g_{n+1} = g$, and the corresponding density is Schur-concave by Proposition 4.40.
The following examples show cases of Schur densities which have not been obtained from conditionally i.i.d. lifetimes, nor from conditioning i.i.d. lifetimes with respect to the sum.

Example 4.47. (Linear breakdown model). In the special Ross model with m.c.h.r. functions of the form
$$\lambda_t^{(n-h)}(t_1,\dots,t_h) = \frac{\lambda}{n-h}, \qquad (4.16)$$
we have the joint density
$$f^{(n)}(t_1,\dots,t_n) = \frac{\lambda^n}{n!}\, \exp\{-\lambda\max(t_1,\dots,t_n)\},$$
which, by the same argument as above, is also Schur-concave. The interest of this example lies in the fact that it is a case of a Schur-concave density, with positive dependence, which has not been obtained as one of conditionally i.i.d. lifetimes.

Example 4.48. Consider the case of dependence, for a pair of lifetimes $T_1, T_2$, characterized by m.c.h.r. functions as follows:
$$\lambda_t^{(2)} = \frac{\lambda + a}{2}, \qquad \lambda_t^{(1)}(t_1) = \lambda,$$
where $\lambda$ and $a$ are given positive quantities. The corresponding joint density is
$$f^{(2)}(t_1,t_2) = \frac{\lambda(\lambda + a)}{2}\, \exp\{-\lambda\max(t_1,t_2) - a\min(t_1,t_2)\}.$$
If $a < \lambda$, $f^{(2)}(t_1,t_2)$ is Schur-concave and we have a case of positive dependence; if $a > \lambda$, $f^{(2)}(t_1,t_2)$ is Schur-convex and we have a case of negative dependence.
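A quick Monte Carlo check of Example 4.48 is sketched below (added here as an illustration; the sequential scheme simply implements the stated m.c.h.r. functions, and all names are ours): the first failure occurs at an exponential time of total rate $\lambda + a$ and hits either unit with probability $1/2$, after which the survivor has constant hazard $\lambda$. The sign of the empirical correlation between $T_1$ and $T_2$ then reflects the positive dependence claimed for $a < \lambda$ and the negative dependence claimed for $a > \lambda$.

```python
import numpy as np

def simulate_pair(lam, a, size, rng):
    """Simulate (T1, T2) from the m.c.h.r. functions of Example 4.48:
    both units alive -> each has hazard (lam + a)/2; one unit alive -> hazard lam."""
    first = rng.exponential(1 / (lam + a), size)    # time of the first failure (total rate lam + a)
    extra = rng.exponential(1 / lam, size)          # residual lifetime of the surviving unit
    which = rng.integers(0, 2, size)                # which unit fails first (by symmetry, fair coin)
    t1 = np.where(which == 0, first, first + extra)
    t2 = np.where(which == 0, first + extra, first)
    return t1, t2

rng = np.random.default_rng(3)
for a in (0.3, 3.0):                                # a < lam and a > lam, with lam = 1
    t1, t2 = simulate_pair(1.0, a, 400_000, rng)
    print(a, np.corrcoef(t1, t2)[0, 1])             # positive for a = 0.3, negative for a = 3.0
```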
4.3.3 Properties of Schur densities
Propositions 4.39 and 4.40 show closure properties of the family of Schur-concave or Schur-convex density functions. The following results show further useful closure properties.

Proposition 4.49. Let $f^{(n)}$ be a Schur-concave (Schur-convex) density function. Then its $k$-dimensional marginal $f^{(k)}$ is Schur-concave (Schur-convex) as well.

Proof. For an arbitrary vector $(\tau_1,\dots,\tau_{n-k}) \in \mathbb{R}^{n-k}_+$ we have the implication
$$(t'_1,\dots,t'_k) \succeq (t''_1,\dots,t''_k)\ \Rightarrow\ (t'_1,\dots,t'_k,\tau_1,\dots,\tau_{n-k}) \succeq (t''_1,\dots,t''_k,\tau_1,\dots,\tau_{n-k}).$$
Then the assertion follows from the relation
$$f^{(k)}(t_1,\dots,t_k) = \int_0^{\infty}\!\!\cdots\int_0^{\infty} f^{(n)}(t_1,\dots,t_k,\tau_1,\dots,\tau_{n-k})\, d\tau_1 \dots d\tau_{n-k}.$$

As a corollary of Theorem 4.30, one obtains (see Marshall and Olkin, 1974)

Proposition 4.50. Let $\mathbf{T}_1$ and $\mathbf{T}_2$ be two independent vectors of lifetimes with Schur-concave joint densities $f_1^{(n)}$ and $f_2^{(n)}$, respectively. Then the vector $\mathbf{T}_1 + \mathbf{T}_2$ has a Schur-concave joint density.

The assumption that the distribution of $\mathbf{T} \equiv (T_1,\dots,T_n)$ admits a Schur density $f^{(n)}$ can be used to obtain some special properties for multivariate conditional hazard rates, normalized spacings, and the TTT process. In particular one can obtain that some significant properties, valid in the case of vectors of i.i.d. lifetimes with monotone one-dimensional hazard rate functions, also hold for vectors of exchangeable lifetimes, under Schur conditions on the density functions. Next, we shall show a result concerning the vector $\mathbf{C} \equiv (C_1,\dots,C_n)$ of normalized spacings between order statistics (Definition 2.53), while in Subsection 4.4.1 we shall see a related result concerning the so-called TTT plot. From Section 2.4 we recall that the joint density of $\mathbf{C}$ is given by (2.74).

Remark 4.51. When $T_1,\dots,T_n$ are i.i.d. and exponentially distributed, $C_1,\dots,C_n$ are also i.i.d. and exponentially distributed. This is a special case of a remarkable property holding for any Schur-constant joint density $f^{(n)}$ of $T_1,\dots,T_n$. Indeed, from (2.74) we immediately obtain that, if $f^{(n)}$ is Schur-constant, then the joint density $f_{\mathbf{C}}$ of $C_1,\dots,C_n$ coincides with $f^{(n)}$. In particular then, in such a case, $f_{\mathbf{C}}$ is Schur-constant as well and $C_1,\dots,C_n$ are exchangeable. One can also see that, under regularity conditions on $f^{(n)}$, the condition $f_{\mathbf{C}} = f^{(n)}$ implies that $f_{\mathbf{C}}$ and $f^{(n)}$ are Schur-constant (see Ebrahimi and Spizzichino, 1997, for details). Of course, in general, $C_1,\dots,C_n$ cannot be exchangeable. Rather, we can expect that, under conditions of aging, their one-dimensional marginal distributions are stochastically ordered. In fact, in the case when $T_1,\dots,T_n$ are i.i.d. with a one-dimensional survival function $\bar F$, the following result holds:

Proposition 4.52. (Barlow and Proschan, 1966, p. 1585). If $\bar F$ is IFR (DFR), then $C_1 \ge_{st} \dots \ge_{st} C_n$ ($C_1 \le_{st} \dots \le_{st} C_n$).
We shall prove that the same property holds for dependent lifetimes admitting Schur joint densities $f^{(n)}$. For this purpose we need the following lemma, concerning a special aspect of the transformation $Z$ introduced in Section 2.4. For a vector $\mathbf{u} \equiv (u_1,\dots,u_n) \in \mathbb{R}^n_+$ and for $i \in \{1,2,\dots,n-1\}$, denote by $M_i(\mathbf{u})$ the vector
$$M_i(\mathbf{u}) \equiv (u_1,\dots,u_{i-1},\ u_{i+1},\ u_i,\ u_{i+2},\dots,u_n), \qquad (4.17)$$
obtained from $\mathbf{u}$ by interchanging the positions of the $i$-th coordinate and the $(i+1)$-th coordinate. The proof of the following lemma is a bit tedious but elementary (see Exercise 4.76).

Lemma 4.53. If $u_i > u_{i+1}$, then
$$Z(M_i(\mathbf{u})) \ \succeq\ Z(\mathbf{u}).$$

Proposition 4.54. (see e.g. Ebrahimi and Spizzichino, 1997). If $f^{(n)}$ is Schur-concave (Schur-convex) then, for $i = 1,\dots,n-1$,
$$C_i \ \ge_{st}\ C_{i+1} \qquad (C_i \ \le_{st}\ C_{i+1}).$$

Proof. Let us consider the Schur-concave case. We must prove that, $\forall c > 0$,
$$P\{C_i > c\} \ \ge\ P\{C_{i+1} > c\}. \qquad (4.18)$$
Now
$$P\{C_i > c\} = \int_0^{\infty}\!\! d\xi_1 \cdots \int_0^{\infty}\!\! d\xi_{i-1} \int_c^{\infty}\!\! d\xi_i \int_0^{\infty}\!\! d\xi_{i+1} \cdots \int_0^{\infty} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_n. \qquad (4.19)$$
By interchanging the integration order, the r.h.s. of (4.19) can also be written as
$$\int_0^{\infty}\!\! d\xi_1 \cdots \int_0^{\infty}\!\! d\xi_{i-1} \int_0^{\infty}\!\! d\xi_{i+2} \cdots \int_0^{\infty}\!\! d\xi_n \int_c^{\infty}\!\! d\xi_i \int_0^{\infty} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_{i+1}.$$
Similarly,
$$P\{C_{i+1} > c\} = \int_0^{\infty}\!\! d\xi_1 \cdots \int_0^{\infty}\!\! d\xi_{i-1} \int_0^{\infty}\!\! d\xi_{i+2} \cdots \int_0^{\infty}\!\! d\xi_n \int_c^{\infty}\!\! d\xi_{i+1} \int_0^{\infty} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_i.$$
By writing
$$\int_c^{\infty}\!\! d\xi_i \int_0^{\infty} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_{i+1} = \int_c^{\infty}\!\! d\xi_i \int_0^{c} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_{i+1} + \int_c^{\infty}\!\! d\xi_i \int_c^{\infty} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_{i+1}$$
and
$$\int_c^{\infty}\!\! d\xi_{i+1} \int_0^{\infty} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_i = \int_c^{\infty}\!\! d\xi_{i+1} \left[\int_0^{c} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_i + \int_c^{\infty} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_i\right],$$
we see that, in order to achieve (4.18), we only have to show that, $\forall (\xi_1,\dots,\xi_{i-1},\xi_{i+2},\dots,\xi_n)$,
$$\int_c^{\infty}\!\! d\xi_i \int_0^{c} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_{i+1} \ \ge\ \int_c^{\infty}\!\! d\xi_{i+1} \int_0^{c} f_{\mathbf{C}}(\boldsymbol{\xi})\, d\xi_i,$$
or, by taking into account Proposition 2.56,
$$\int_c^{\infty}\!\! d\xi_i \int_0^{c} f^{(n)}(Z(\boldsymbol{\xi}))\, d\xi_{i+1} \ \ge\ \int_c^{\infty}\!\! d\xi_{i+1} \int_0^{c} f^{(n)}(Z(\boldsymbol{\xi}))\, d\xi_i. \qquad (4.20)$$
By using the notation introduced in (4.17), we can write
$$\int_c^{\infty}\!\! d\xi_{i+1} \int_0^{c} f^{(n)}(Z(\boldsymbol{\xi}))\, d\xi_i = \int_c^{\infty}\!\! d\xi_i \int_0^{c} f^{(n)}\big(Z(M_i(\boldsymbol{\xi}))\big)\, d\xi_{i+1}.$$
The validity of (4.20) can then be proved by taking into account that $f^{(n)}$ is Schur-concave and that, in the integration domain $\{\xi_i \ge c,\ 0 \le \xi_{i+1} \le c\}$, it is, by Lemma 4.53,
$$Z(M_i(\boldsymbol{\xi})) \ \succeq\ Z(\boldsymbol{\xi}).$$
Example 4.55. (Linear breakdown models). For the Ross model with joint density
$$f^{(n)}(\mathbf{t}) = \frac{\lambda^n}{n!}\, \exp\{-\lambda\max(t_1,\dots,t_n)\},$$
one has
$$f_{\mathbf{C}}(\mathbf{c}) = \frac{\lambda^n}{n!}\, \exp\Big\{-\lambda\sum_{i=1}^{n} \frac{c_i}{n-i+1}\Big\},$$
i.e. $C_1,\dots,C_n$ are independent and the distribution of $C_i$ is exponential with mean $\mu_i = \frac{n-i+1}{\lambda}$, $i = 1,\dots,n$.
For a slightly more general example, see Exercise 4.73.
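The following simulation sketch is an added illustration (all names are ours); the sequential construction implements the m.c.h.r. description (4.16), under which the waiting time between successive failures is exponential with total rate $\lambda$. It computes the normalized spacings $C_i = (n-i+1)(T_{(i)} - T_{(i-1)})$ and compares their sample means with the values $(n-i+1)/\lambda$ stated in Example 4.55.

```python
import numpy as np

def ross_order_statistics(n, lam, size, rng):
    """Simulate the order statistics of the Ross model of Example 4.47: after h failures
    each of the n-h survivors has hazard lam/(n-h), so the total hazard is always lam and
    the inter-failure times are i.i.d. exponential with rate lam."""
    waits = rng.exponential(1 / lam, size=(size, n))   # inter-failure times
    return np.cumsum(waits, axis=1)                    # T_(1) <= ... <= T_(n)

n, lam = 5, 2.0
rng = np.random.default_rng(4)
ordered = ross_order_statistics(n, lam, 200_000, rng)
gaps = np.diff(np.concatenate([np.zeros((ordered.shape[0], 1)), ordered], axis=1), axis=1)
C = gaps * (n - np.arange(n))                          # C_i = (n-i+1)(T_(i) - T_(i-1))
print(C.mean(axis=0))                                  # approx (n-i+1)/lam = 2.5, 2.0, 1.5, 1.0, 0.5
```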
4.4 Further aspects of Bayesian aging

In this section we just sketch some other aspects which can be of interest for applications and further research. In the next subsection we concentrate attention on the notion of the "scaled" TTT process. We notice that, under Schur conditions for $f^{(n)}$, we can derive special properties for the random variables $Y_{(1)},\dots,Y_{(n)}$, defined in Section 2.4, by using Proposition 4.54. In the statistical analysis of failure data, however, it is more convenient to look at the "scaled" TTT process and at the "scaled" total times on test until successive failures. We shall then illustrate some properties which hold for such notions in connection with aging properties of the original lifetimes.
4.4.1 Schur densities and TTT plots P
Denote as usual Sn ni=1 Ti : De nition 4.56. The scaled total time on test until h-th failure is de ned as the random quantity
Y Qh S(h) ; h = 1; :::; n ; 1 n For h = n, it is of course Qn YS(nn) = 1: The space of possible values of the random vector Q (Q1 ; :::; Qn;1 ) is
;1;n;1 t 2 Rn+;1 jt1 t2 ::: tn;1 1
and the joint density function of Q will be denoted by gQ . Schur properties of the joint density function of (T1 ; :::; Tn ) are immediately re ected by gQ . We have in fact Proposition 4.57. If f (n) is Schur-concave (Schur-convex) then gQ is nondecreasing (non-increasing) on ;1;n;1 , with respect to the natural partial ordering. We prefer to show applications of Proposition 4.57, before proceeding to provide a sketch of the proof. As a rst application, we have Corollary 4.58. Let f (n) be an arbitrary Schur-constant density. Then the distribution of Q is the uniform distribution over ;1;n;1 . Consider now two vectors of lifetimes T (T1 ; :::; Tn) and T0 (T10 ; :::; Tn0 ), admitting joint densities fT(n) and fT(n0 ) respectively. As an interesting consequence of Proposition 4.57 and of the de nition of multivariate lr ordering (De nition 3.20) we have
FURTHER ASPECTS OF BAYESIAN AGING
177
Proposition 4.59. If fT(n) is Schur-concave and fT(n0 ) is Schur-convex, then Q lr Q0: Corollary 4.60. If fT(n) is Schur-concave (or Schur-convex) and fT(n0 ) is Schurconstant, then
Q lr Q0
(or Q0 lr Q)
Remark Pn 4.61. Consider lifetimes Tb1; :::; Tbn with a Schur-constant density f (n)(t) = n ( i=1 ti ).
b is uniform over ;1;n;1 and it is The distribution of the corresponding Q independent of n : In other words, for all vectors having a Schur-constant density, the distribution of the corresponding scaled Total Times on Test is the same as in the case of i.i.d. T1; :::; Tn with a standard exponential distribution. Corollary 4.60 may be seen as a counterpart, for the case of stochastic dependence, of a result valid for i.i.d. lifetimes. In fact, a well-known result is the following
Proposition 4.62. (Barlow and Campo, 1975). Let T1; :::; Tn be i.i.d. with an IFRA (or, in particular IFR) distribution. Then, for the corresponding scaled total time on test, it is Q st Qb :
This result can be used to obtain, under conditions of aging, useful properties of the scaled total time on test plot associated with the longitudinal observation of lifetimes T1 ; :::; Tn. This is de ned as the polygonal line obtained by joining the (n + 1) points of coordinates
1
2
(0; 0) ; n ; Q1 ; n ; Q2 ; :::;
n ; 1
n ; Qn;1 ; (1; 1) :
This statistic appears to be a very interesting one in the analysis of failure data and in particular in the derivation of burn-in and age replacement procedures (see in particular Barlow and Campo, 1975; Bergman, 1979; Cooke, 1993). In the case of i.i.d. lifetimes it can be seen as a sort of empirical counterpart of the notion of total time on test transformation (see Shorack, 1972; Bergman and Klefsjo, 1985; Chandra and Singpurwalla, 1981; Gill, 1986; Klefsjo, 1982). We now give a sketch of the proof of Proposition 4.57; for details see Nappo and Spizzichino (1998). A rst step is to derive the joint law of (Q1 ; :::; Qn;1 ).
178
BAYESIAN MODELS OF AGING
;
To this purpose note that, by taking into account the de nitions of Y(1) ; :::; Y(n) and (C1 ; :::; Cn ), one can write
;T ; :::; T = T (Q ; :::; Q ; S ) 1 n;1 n (1) (n)
where T is the one-to-one dierentiable transformation de ned by
T (Q1 ; :::; Qn;1 ; Sn ) = (1 (Q) Sn ; :::; n (Q) Sn ) with
j ;1 X 1 ; 1 q ; j = 1; :::; n j (q) = n ;qjj + 1 ; n ; i+1 n;1 i i=1
(4.21)
Note also that it is n X j =1
j (q) = 1; 8q (q1 ; :::; qn;1 ) 2 ;1;n;1
(4.22)
By some computation it can ben;shown that the Jacobian of the transforma1 tion T is jJT (q1 ; :::; qn;1 ; s) j = s n! . By recalling Lemma 2.55, we then obtain Proposition 4.63. The vector (Q1; :::; Qn;1; Sn) has the joint density
g(Q1 ;:::;Qn;1;Sn) (q1 ; :::; qn;1 ; s) = f (n) (1 (q) s; 2 (q) s; :::; n (q) s)) sn;1 Whence
Corollary 4.64. The vector Q has the joint density gQ (q1 ; :::; qn;1 ) =
1;1;n;1 (q1 ; :::; qn;1)
Z1 0
f (n) (1 (q) s; :::; n (q) s)) sn;1 ds
Proposition 4.57 can nally be obtained from the following property of the transformation T : Lemma 4.65. Let q; q0 2 ;1;n;1: If qi qi0 for i = 1; :::; n ; 1; then (1 (q0 ) ; 2 (q0 ) ; :::; n (q0 )) (1 (q) ; 2 (q) ; :::; n (q)):
FURTHER ASPECTS OF BAYESIAN AGING
179
4.4.2 Some other notions of Bayesian aging
As said in the rst section of this chapter, one can de ne several notions of aging for exchangeable lifetimes, by xing attention on a given univariate notion and then extending it to the exchangeable case, by requiring the validity of an inequality of the type in (4.3). Following this line, we discussed two dierent multivariate extensions of the notion of IFR, one of which is equivalent to Schur-concavity of the joint survival function and the other, valid for the absolutely continuous case, is implied by Schur-concavity of the joint density. Here we mention a few further de nitions along the same direction. We start with the multivariate extension of the notion of NBU. The univariate notion for i.i.d. lifetimes is, in particular, equivalent to P fT1 > jDg P fT2 > s2 + jDg; 8 (4.23) where D f(T2 > s2 ) \ H g (4.24) with H fT3 > s3 ; :::; Th > sn g: (4.25) It is immediately obvious that (4.23) is equivalent to
F (n) (; s2 ; s3 ; :::; sn ) F (n) (0; s2 + ; s3 ; :::; sn ) = F (n;1) (s2 + ; s3 ; :::; sn )
(4.26)
and that it is implied by Schur-concavity of F (n) . The condition (4.26) can then be seen as a multivariate de nition of NBU for exchangeable lifetimes. Some aspects of this de nition have been analyzed by Bassan and Spizzichino (2000). In particular, it is easy to see that, when T1 ; :::; Tn are conditionally i.i.d., with NBU distribution, given a parameter , then (4.23) holds for the unconditional survival function (see Exercise 4.78). Let us now come to multivariate notions related to stochastic comparisons stronger than st . Here we just mention a couple of cases, focusing attention on situations of positive aging. A more complete treatment, with additional cases, examples and proofs is presented in Bassan and Spizzichino (1999). As seen in Section 3.4, a univariate condition, stronger than IFR, is that a lifetime T is PF2 , i.e. that the density f of T is log-concave, which is equivalent to the stochastic comparison in (3.22). A multivariate extension to exchangeable lifetimes T1 ; :::; Tn is then provided by the condition L(T1 ; s1 jD) lr L(T2 ; s2 jD); for 0 s1 < s2 : (4.27)
180
BAYESIAN MODELS OF AGING
where
D f(T1 > s1 ) \ (T2 > s2 ) \ H g with H containing survival and failure data for lifetimes T3 ; :::; Tn . A set of sucient conditions for (4.27) is: (i) f (n) is log-concave (ii) f (n) is MTP2 : An example is presented in Exercise 4.81. We also saw that, in the univariate case, the IFR property is equivalent to the apparently stronger condition
L(T ; s1 jT > s1 ) hr L(T ; s2 jT > s2 ); 80 s1 < s2 In other words this means that, for i.i.d. lifetimes T1; :::Tn the two conditions
L(T1 ; s1 jD) st L(T2 ; s2 jD); 80 s1 < s2
(4.28)
L(T1 ; s1 jD) hr L(T2 ; s2 jD); 80 s1 < s2 coincide for D f(T1 > s1 ) \ (T2 > s2 ) \ ::: \ (Tn > sn )g:
(4.29)
This is not anymore true in general when T1 ; :::Tn are exchangeable but stochastically dependent. A set of sucient conditions for (4.29) is: (i) The mapping s1 ! F (n) (s1 ; s2 ; s3 ; :::; sn ) is log-concave (ii) The mapping (s1 ; s2 ) ! F (n) (s1 ; s2 ; s3 ; :::; sn ) is TP2 . See Exercises 4.79, 4.80, relative to the case n = 2:
4.4.3 Heterogeneity and multivariate negative aging
It is well known that, for one-dimensional lifetime distributions, situations of heterogeneity create a tendency to negative aging. This is shown in particular by Remark 3.66 and by Proposition 3.65 which states that a mixture of DFR distributions is DFR (then in particular a mixture of exponential distributions is DFR). More in general it has been noticed that properties of negative aging can even arise in the case of mixtures of non-necessarily DFR distributions. In this subsection we generalize the exchangeable hierarchical model of heterogeneous populations, considered in several examples in the previous chapters. The model we obtain accounts, at a time, for heterogeneity and for interdependence among lifetimes; it allows us to clarify some facts related to heterogeneity and aging; in particular we shall see multivariate analogs of Proposition 3.65 and Remark 3.66. Consider a population formed of n individuals U1 ; :::; Un and let (Ki ; Ti ) be a pair of random variables corresponding to Ui , for i = 1; :::; n. We think of
FURTHER ASPECTS OF BAYESIAN AGING
181
Ti as of an observable lifetime, while Zi is an unobservable (endogenous to Ui ) quantity which in uences the distribution of Ti (and only of Ti ). K1; :::; Kn are random variables taking their values in a set K R (in the
previous examples of exchangeable heterogeneous populations the special case K f0; 1g was considered). Then the marginal distribution of Ti is a mixture (the mixture of conditional distributions given (Ki = k); k 2 K). Put K (K1; :::; Kn ) and assume, more precisely, the existence of a family fG(tjk); k 2 Kg of one-dimensional survival functions such that the conditional survival function of Ti , given fK = kg is G(tjki ). Assuming T1 ; :::; Tn to be conditionally independent given K, we see that the joint conditional survival function of T1 ; :::; Tn given K is
F (n) (t1 ; :::; tn jk1 ; :::; kn ) =
Yn
i=1
G(ti jki )
(4.30)
so that, by denoting 0 (k) the joint density of K, the unconditional survival function of T1; :::; Tn is
F (n) (t1 ; :::; tn ) =
Z
n Y
K:::K i=1
G(ti jki )0 (k)dk1 ; :::; dkn
(4.31)
Let us rst consider the case when K1 ; :::; Kn are i.i.d. and denote by p0 their common density, so that their joint density is
0 (k1 ; :::; kn ) =
n Y
i=1
p0 (ki ):
(4.32)
In such a case the unconditional survival function of T1; :::; Tn is
F (n) (t1 ; :::; tn ) =
Z
n Y
G(ti jki )
K:::K i=1
n Y
i=1
p0 (ki )dk1 ; :::; dkn
(4.33)
and it is clear that T1 ; :::; Tn are i.i.d. as well, their common one-dimensional survival function being
F (1) (t) =
Z
K
G(tjk)p0 (k)dk:
(4.34)
Recalling the Corollary 4.13, we reformulate Proposition 3.65 and Remark 3.66, respectively, as follows: Proposition 4.66. If G(tjk) is DFR, 8k 2 K, then F (n)(t1 ; :::; tn) in (4.33) is Schur-convex. Remark 4.67. G(tjk) being IFR 8k 2 K, does not necessarily imply that F (n) (t1 ; :::; tn ) in (4.33) is Schur-concave.
182
BAYESIAN MODELS OF AGING
Here, we are interested in the case when K1 ; :::; Kn are not independent and then when also T1 ; :::; Tn are interdependent. More speci cally, we consider the case when the initial distribution of K (K1 ; :::; Kn) has an exchangeable density function 0(n) (k). In such a case also T1 ; :::; Tn are exchangeable. Proposition 3.65 and Remark 3.66, respectively, can be extended to this case as follows. Remark 4.68. G(tjk) being IFR 8k 2 Z , does not necessarily imply that F (n) (t1 ; :::; tn ) is Schur-concave. The dierence between this claim and the one in Remark 4.67 is just in the fact that there we considered survival functions of the form (4.33), while here the case (4.31) is considered.
Proposition 4.69. Let GG((ttjjkk0)) be non-decreasing in t, for k < k0: If G(tjk) is DFR 8k 2 K, then F (n) (t1 ; :::; tn ) in (4.31) is Schur-convex for any exchangeable joint density 0(n) (k).
Note that the condition that GG((ttjjkk0)) is non-decreasing in t, for k < k0 means that Ki can be seen as a \frailty" for the individual Ui . The proof of Proposition 4.69 substantially relies on the following facts: letting D be an event as in (4.2), we have, by the above assumption of conditional independence, P fTi > si + jDg = E [ G (si + jKi ) jD]; (4.35) G(si jKi ) for 0 s1 < s2 ,
P fK1 > kjDg P fK2 > kjDg; 8k 2 K:
A complete proof is contained in Spizzichino and Torrisi (2001). Such a result therein is obtained as a corollary of a more general result, concerning a multivariate extension of the concept of ultimately DFR univariate distributions, which was considered for instance in Block, Mi and Savits (1993). Simple examples can be found, which con rm what was claimed in Remark 4.68. First examples are in particular provided by the case K1 ; :::; Kn i.i.d., when indeed Remark 4.68 reduces to Remark 3.66. However in the extreme case when P fK1 = K2 ::: = Kng = 1 (4.36) then T1 ; :::; Tn are conditionally independent, identically distributed given K1 and, assuming G(tjk) IFR 8k 2 K, we actually obtain that F (n) (t1 ; :::; tn ) is Schur-concave, since now we reduce to the case of Proposition 4.18. This agrees with the fact that, in such an extreme case of positive dependence among K1 ; :::; Kn , the situation of heterogeneity disappears.
FURTHER ASPECTS OF BAYESIAN AGING
183
In this respect, the case is particularly interesting when
G(tjk) = expf;tkg:
(4.37)
Under this choice, F (n) (t1 ; :::; tn ) is Schur-constant if (4.36) holds, while we have Schur-convexity in the strict sense if the condition (4.36) is replaced by the opposite condition K1; :::; Kn i.i.d. (with a nondegenerate distribution). Further aspects of the model of heterogeneity described above are studied in Gerardi, Spizzichino and Torti (2000a), where a study is carried out of the case when K1 ; :::; Kn are discrete random variables, and in Gerardi, Spizzichino and Torti (2000b), where the conditional distribution of residual lifetimes, given a dynamic history, is studied as a stochastic ltering problem.
4.4.4 A few bibliographical remarks
Some general aspects of probability distributions with Schur properties were studied in the paper by Marshall and Olkin (1974). As mentioned a complete treatment about majorization and Schur-concavity (Schur-convexity) is presented in the book by Marshall and Olkin (1979). In the speci c eld of reliability the concept of majorization was used by Pledger and Proschan (1971), by Proschan (1975), and by several other Authors later on, as accounted in the review paper by Boland and Proschan (1988). Starting in the late eighties, interest about notions of multivariate aging, suitable for a Bayesian approach to reliability, started to arise among a number of dierent people. This was mainly related to research on the meaning of infant mortality (in the case of interdependence) and to the study of optimality of burn-in. In particular there were several discussions between Richard E. Barlow and the author. The idea that Schur properties of joint densities of lifetimes is a notion of multivariate aging is contained in Barlow and Mendel (1992). A discussion on that was presented by R. E. Barlow in the Fall of 1990 at a Conference on Reliability and Decision Making, held in Siena (Italy); it came out rather naturally from a comparison with the particular Schur-constant case, which conveys the idea of indierence with respect to age, or of a Bayesian, multivariate, counterpart of exponentiality.
4.4.5 Extensions to non-exchangeable cases
In many cases, properties of aging for a vector of lifetimes are interesting in view of the fact that they can be converted into corresponding probabilistic assessments on the behavior of the vector of order statistics (see for instance Subsection 4.1 above).
184
BAYESIAN MODELS OF AGING
Here we considered notions of aging for exchangeable lifetimes; however it is well known that, for an arbitrary vector of random variables X1 ; :::; Xn , we can build a vector of exchangeable random variables T1 ; :::; Tn in such a way that the two corresponding vectors of order statistics have the same law; this is in particular true by letting
Ti XPi ; i = 1; :::; n where P is a random permutation of f1; 2; :::; ng. If fX is the joint density of X, then the joint density fT of T is the exchangeable density obtained by symmetrization : X fT (t) = n1! fX (t1 ; :::; tn ) (4.38) where the sum is extended to all the permutations of f1; 2; :::; ng. Then a property of aging for fT in (4.38) can, in a sense, be seen as a property of aging for fX . This shows that it can be of interest to nd sucient conditions on fX which guarantee a given property of aging for fT . A result of this kind (see Marshall and Olkin, 1979, p. 83) is provided by the following.
Proposition 4.70. If fX is convex (but not necessarily permutation-invariant) then fT is Schur-convex. This result is also used in the proof of Theorem 3.1 in Kochar and Kirmani (1995).
4.5 Exercises
Exercise 4.71. Consider the joint survival function F (n) for the model of heterogeneity described in Examples 2.3 and 2.20. Assume that the failure rate functions r0 (t) and r1 (t) are non-increasing and such that r0 (t) r1 (t) and
that K1 ; :::; Kn are (exchangeable) binary random variables. Show that F (n) is Schur-convex. Hint: Show that, for D fT1 > s1 ; :::; Tn > sn g, 0 s1 s2 , and arbitrary s3 ; :::; sn 0, it is
P fK1 = 1jDg P fK2 = 1jDg: Then take into account conditional independence of T1 ; :::; Tn given K1; :::; Kn and arguments in Remark 4.16 (see also arguments in Subsection 4.4).
EXERCISES
185
Exercise 4.72. Consider the change-point model in Example 2.19 and assume r1 (t) = 1 ; r2 (t) = 2 ; q () = expf;g: Find an expression for F (2) and, by applying the Schur's condition (4.8), check if F (2) is Schur-concave. Hint: for s1 s2 ,
F (2) (s1 ; s2 ) =
Z s2 s1
Z s1 0
(2)
F (2) (s1 ; s2 j) q () d +
F (s1 ; s2 j) q () d
Z1 s2
F (2) (s1 ; s2 j) q () d:
Exercise 4.73. Show that, for a Ross model characterized by the condition (tn;h) (t1 ; :::; th ) = 'h ; t(n) = '0 ; with '0 '1 ::: 'n;1 , the MIFR property holds and C1 st ::: st Cn :
Hint: Take into account De nition 2.61 and De nition 3.70. Exercise 4.74. Consider a joint density of the form
f (n) (t) =
t n 1max in i
:
Show that n being non-increasing implies, for the normalized spacings between order statistics,
C1 st ::: st Cn and is equivalent to the MIFR property of f (n).
Exercise 4.75. (Schur-constant, non-in nitely extendible models). T1; :::; Tn and Y1 ; :::; Ym are the lifetimes of individuals forming two dierent populations P1 and P2 , respectively. The individuals in the two populations are competing in the following sense: they need the same resource A for their lives. In particular the need of A for the individual j in P1 is proportional to the lifetime Tj (j = 1; :::; n). The initial amount of A is a > 0. The individuals in P1 use the resource A before the individuals in P2 do. IfP fT1 = t1 ; :::; Tn = tn g, the total amount of A left for individuals in P2 is a ; nj=1 tj . This amount is shared among dierent Pn individuals, and each of them has the same distribution with mean a; mj=1 tj . Assume that the initial density of T1; :::; Tn is Schur-constant. Show that, conditionally on fY1 = y1 ; :::; Ym = ym g, the density of T1; :::; Tn
is Schur-constant and that T1; :::; Tn are not i.i.d. or conditionally i.i.d.
186
BAYESIAN MODELS OF AGING
Exercise 4.76. Prove Lemma 4.53. Hint: Notice that, 8a 2Rn+ the components of the vector u Z (a) are
ordered in the increasing sense:
u1 ::: un ; i.e. ui = u(i) , i = 1; :::; n:
Exercise 4.77. Prove Lemma 4.65. Exercise 4.78. Check that, for lifetimes T1; :::; Tn, conditionally i.i.d. with conditional NBU distributions given a parameter , the inequality in (4.26) holds. Exercise 4.79. Let (T1; T2) be exchangeable lifetimes with survival function F (2) satisfying: (a) F (2) is TP2
(b) the mapping s1 ! F (2) (s1 ; s2 ) is log-concave. For D fT1 > r1 ; Ts > r2 g, 0 r1 ; r2 , prove that
L (T1 ; r1 jD) hr L (T2 ; r2 jD) i.e. that
g R(t) PP ffTT1 ;; rr1 >> ttjjD Dg 2
2
is increasing in t:
Exercise 4.80. (Bivariate models of Marshall-Olkin). Check that, for the joint survival function
F (2) (s1 ; s2 ) = expf;(t1 + t2 ) ; 0 max(t1 ; t2 )g the conditions (a) and (b) of Exercise 4.79 hold.
Exercise 4.81. Consider again the model with joint density of the form n
f (n) (t) = n! expf; 1max t g: in i Show that the inequality (4.27) holds.
BIBLIOGRAPHY
187
4.6 Bibliography Barlow, R.E. and Campo, R. (1975). Total time on test processes and applications to failure data analysis. In Reliability and fault tree analysis, R.E. Barlow, J. Fussel and N. Singpurwalla Eds., SIAM, Philadelphia, 451-481. Barlow, R.E. and Proschan, F. (1966). Inequalities for linear combinations of order statistics from restricted families. Ann. Math. Statist., 37, 1574-1592. Barlow, R.E. and Proschan, F. (1975). Statistical theory of reliability and life-testing. Probability models. Holt, Rinehart and Winston, New York. Barlow, R.E. and Mendel, M.B. (1992). de Finetti-Type representations for life-distributions. J. Am. Statis. Soc., 87, 1116-1122. R.E. Barlow and Mendel, M.B. (1993). Similarity as a characteristic of wear-out. In Reliability and decision making, R.E. Barlow, C.A. Clarotti and F. Spizzichino, Eds., Chapman & Hall, London. Barlow, R.E. and Spizzichino, F. (1993). Schur-concave survival functions and survival analysis. J. Comp. Appl. Math., 46, 437-447. Bartoszewicz, J. (1986). Dispersive ordering and the total time on test transformation. Statist. Probab. Lett., 4, 285-288. Bassan, B. and Spizzichino, F. (1999). Stochastic comparisons for residual lifetimes and Bayesian notions of multivariate ageing. Adv. Appl. Probab. 31, no. 4, 1078{1094. Bassan, B. and Spizzichino, F. (2000). On a multivariate notion of New Better than Used. Volume of contributed papers presented at the Conference \Mathematical Methods of Reliability", Bordeaux, July 2000. Bergman, B. (1979). On age replacement and the total time on test concept. Scand. J. Statist., 6, 161-168. Bergman, B. and Klefsjo;B. (1985). Burn-in models and TTT-Transforms. Quality and Reliability Engineering International, 1, 125-130. Block, H.W., Mi, J. and Savits, T.H. (1993). Burn in and mixed populations. J. Appl. Probab., 30, 692-702. Boland, P.J. and Proschan, F. (1988). The impact of reliability theory on some branches of mathematics and statistics. In Handbook of Statistics, Vol. 7, P.R. Krishnaiah and C.R. Rao Eds, Elsevier Science Pub., 157-174. E.C. Brindley and W.A. Thompson (1972) Dependence and Aging Aspects of Multivariate Survival. JASA, Vol. 67, pg. 822-830. Caramellino, L. and Spizzichino, F. (1994). Dependence and Aging Properties of Lifetimes with Schur-constant Survival Functions. Prob. Engrg. Inform. Sci., 8, 103-111. Chandra, M. and Singpurwalla, N. (1981). Relationships between some notions which are common to reliability theory and Economics. Math. Op. Res., 6, 113-121. Chick, S. E. and Mendel, M. B. (1998). New characterizations of the noaging property and the l1 ;isotropic model. J. Appl. Probab., 35, no. 4, 903{910
188
BAYESIAN MODELS OF AGING
Cooke, R. (1993). The total time on test statistic and age-dependent censoring. Statist. Probab. Lett., 18, 307-312. Ebrahimi, N. and Spizzichino, F. (1997). Some results on normalized total time on test and spacings. Statist. Probab. Lett. 36 (1997), no. 3, 231{243. Gerardi, A., Spizzichino, F. and Torti, B. (2000a). Exchangeable mixture models for lifetimes: the role of \occupation numbers". Statist. Probab. Lett., 49, 365-375. Gerardi A., Spizzichino, F. and Torti, B. (2000b). Filtering equations for the conditional law of residual lifetimes from a heterogeneous population. J. Appl. Probab., 37, no. 3, 823{834. Gill, R.D. (1986). The total time on test plot and the cumulative total time on test statistics for a counting process. Ann. Statist., 4, 1234-1239. Hayakawa, Y. (1993). Interrelationships between lp -isotropic densities and lp -isotropic survival functions, and de Finetti representations of Schur-concave survival functions. Austral. J. Statist. 35, no. 3, 327{332. Kochar, S. C. and Kirmani, S. N. U. A. (1995). Some results on normalized spacings from restricted families of distributions. J. Statist. Plann. Inference 46, no. 1, 47{57. Klefsjo, B. (1982). On aging properties and total time on test transforms. Scand. J. Statist., 9, 37-41. Langberg, N. A., Leon, R.V., and Proschan, F. (1980). Characterizations of nonparametric classes of life distributions. Ann. Probab., 8, 1163-1170. Marshall, A.W. and Olkin I. (1974). Majorization in multivariate distributions. Ann. Statist., 2, 1189-1200. Marshall, A.W. and Olkin I. (1979). Inequalities: theory of majorization and its applications. Academic Press, New York. Nappo, G. and Spizzichino, F. (1998). Ordering properties of the TTT-plot of lifetimes with Schur joint densities. Statist. Probab. Lett. 39, no. 3, 195{203. Ostrowski, A. M. (1952). Sur quelques applications des fonctions convexes et concaves au sens de I. Schur. J. Math. Pures Appl., 31, 253-292. Pledger G. and Proschan, F. (1971). Comparisons of order statistics and of spacings from heterogeneous distributions. Optimizing methods in statistics. Academic Press, New York, 89{113. Proschan, F. (1975). Applications of majorization and Schur functions in reliability and life testing. In Reliability and fault tree analysis, Soc. Indust. Appl. Math., Philadelphia, Pa, 237{258. Schur, I. (1923). Uber eine Klasse von Mittelbildungen mit Anwendungen die Determinanten-Theorie Sitzungber. Berlin Math. Gesellshaft, 22, 9-20. Shorack, G. (1972). Convergence of quantile and spacing processes with applications. Ann. Math. Statist., 43, 1400-1411. Spizzichino, F. (1992). Reliability decision problems under conditions of ageing. In Bayesian Statistic 4, J. Bernardo, J. Berger, A.P. Dawid, and A.F.M. Smith Eds., Clarendon Press, Oxford, 803-811.
BIBLIOGRAPHY
189
Spizzichino, F. and Torrisi, G. (2001). Multivariate negative aging in a exchangeable model of heterogeneity. Statist. Probab. Lett. (To appear).
Chapter 5
Bayesian decisions, orderings, and majorization 5.1 Introduction In this chapter we discuss some aspects of Bayesian decision problems arising in the eld of reliability. We shall also see the role that notions of stochastic orderings, dependence, and majorization can have in those problems. We shall in particular concentrate attention on the cases of two-action decision problems and of burn-in problems. Some aspects of two-action decision problems and of burn-in problems will be discusssed in Section 5.2 and Section 5.4, respectively. Section 5.3 will be speci cally devoted to analyzing the possible role of the notion of majorization, when comparing the informational eects of two dierent sets of survival/failure data, with the same value for the total-time-on-test statistic. In this section, the language and some basic facts concerning decision problems under uncertainty will be brie y recalled; moreover we shall give the shape of decision problems to a few classical problems of the theory of reliability. For further discussions and examples of the role of stochastic orderings in decision problems and reliability see e.g. Torgersen (1994); Block and Savits (1994). A Bayes decision problem under uncertainty is speci ed by the elements
A; V ; l ; where
A is the space of possible \actions" (or \decisions")
V is the space of possible values for an unobservable quantity W is the probability distribution of W l : A V ! R is the loss function 191
192
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
In words the problem amounts to determining an optimal action a 2 A, taking into account that the consequence of an action a depends on the value taken by W , which is unknown to us, at least at the instant when the decision is to be taken. The loss, coming from the choice of a when the value of W is w, is measured by the scalar quantity l (a; w) and our uncertainty on W is speci ed by means of the distribution . In the statisticians' language, W is sometimes called \state of nature ". Commonly the loss l (a; w) is thought of as the opposite of the \utility" coming from the consequence of choosing a if the value of W were w. This motivates that we can consider as an optimal action the one which minimizes the expected value of the loss, namely we consider the following de nition of optimality, corresponding to the principle of maximization of expected utility (see e.g. Savage, 1972, De Groot, 1970, Lindley, 1985, Berger, 1985). When we x a loss function l (a; w), we consider the loss l (a; W ) for any action a 2 A. W being a random variable, l (a; W ) is itself random (see the brief discussion in Subsection 5.1.3); the probability distribution of l (a; W ) depends on l and on the distribution of W . When the distribution of W is , the expected value of l (a; W ) is given by
R (a) E [l (a; W )] =
Z
V
l (a; w)d(w)
R (a) is the \risk of a against ". De nition 5.1. For a xed loss function l , a is a Bayes (or optimal) action against the distribution , if
R (a ) R (a)
(5.1)
for all a 2 A: Of course, when A is not a nite space, it can happen that no Bayes action exists.
Example 5.2. (A typical two-action problem). For a given unit U , we have to decide whether it is good enough for a pre-established task; say for instance that the latter requires a mission time . This can be seen as a decision problem in which A fa1 ; a2 g with: a1 = fdiscarding the unit U g; a2 = fusing U for the missiong: The unobservable quantity W of interest here obviously coincides with T , the lifetime of U .
INTRODUCTION
193
For a given loss function l (a; t) (a 2 A; t 0) the optimal decision will be
a2 if and only if
E [l (a1 ; T )] =
Z1 0
l (a1 ; t)f (t)dt
Z1 0
l (a2 ; t)f (t)dt = E [l (a2 ; T )]
(5.2)
where f is the density function of T . The simplest, reasonable, loss function might be speci ed as follows: l (a1 ; t) = c; 8t > 0
C for t < l (a2 ; t) = ;K for t >
(5.3)
with K > C > c. In such a case the optimal decision will be a2 if and only if E [l (a2 ; T )] = CF ( ) ; KF ( ) c i.e., if and only if
F ( ) CC +;Kc :
(5.4)
Example 5.3. (The spare parts problem). An apparatus, that is to accomplish
a mission of duration , contains a component, which is sensitive to a certain type of shock. The subsequent shocks occur according to a Poisson process Nt of intensity and each shock causes the failure of the component. We want to determine the optimal number of spare copies of the same component, taking into account that any spare part has a cost c (due to purchase, storage, maintenance, ...), that the gain for the accomplishment of the mission is K , and that the cost caused by the failure of the apparatus previous to , is C . Let M be the maximum number of spares which it is practically possible to purchase. This is a decision problem where the space of possible actions A is f0; 1; 2; :::; M g (choosing a = n, means that we provide the apparatus with n spare components). The unobservable quantity of interest is W = N , the number of shocks before the mission time , which coincides with the number of needed spare parts; then V = f0; 1; 2; :::g. The loss function is speci ed by l (a; w) = c a + C 1fw>ag ; K 1fwag Since the distribution of W is obviously Poisson with mean , the expected loss associated with the choice a is 1 ( )i a ( )i X X E [l (a; W )] = c a + C expf; g ; K exp f; g i=a+1 i! i=0 i!
194
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
and then E [l (a; W )]
; E [l (a ; 1; W )] = c ; (C + K ) expf; g (a!)
a
The optimal action is M if
( )a c expf g a! C +K for a = 1; 2; :::; M ; otherwise it is the maximum integer a such that ( )a c expf g: a! C +K Example 5.4. (The burn-in problem for a single unit). The burn-in problem for a unit U described in Example 3.69 is a decision problem with A = [0; +1]; V = [0; 1) and where we can interpret the choice a = 0 as the decision of delivering the unit U to operation immediately and the choice a = 1 as the decision to discard U ; the choice 0 < a < 1 corresponds to conducting a burn-in of duration a; the unit will be delivered to operations, if it survives the test. The state of nature in this problem is the overall lifetime (burn-in + operational life) of U , which we denoted by T and whose survival function was denoted by F . The loss function is l (a; w) = c1fw
a+ g (5.5) More generally we might consider the case where the cost coming from a burn-in of duration a is c(a) and the gain coming from an operative life of duration t yields a gain K (t). It is natural to assume that K is a non-decreasing function, such that K (t) < 0 in the neighborhood of the origin and positive for t large enough; the corresponding loss function takes the form l (a; w) = c (a) 1fw
;
Z +1 a
K (w ; a)f (w) dw:
The loss function considered in (5.5) is in particular obtained by letting c (a) = c; 8a > 0
K (t) =
;C for t < K for t >
(5.7)
INTRODUCTION
195
In such a special case the expected value in (5.7) becomes E [ l (a; T )] = c + (C ; c) F (a) ; (C + K ) F (a + ): The optimal solution for this case will be analyzed in some detail in Section 5.4. Example 5.5. (Optimal age replacement of a single unit). Here we consider a very simple form of the problem of optimal preventive age replacement of a single unit (see e.g. Barlow and Proschan, 1975; Aven and Jensen, 1999; Gertsbakh, 2000); this allows us to emphasize both the analogies and the dierences between such a problem and the problem of optimal burn-in. U is a unit and its lifetime T has a survival function F . U is put into operation at time t = 0, when it is new. As a consequence of an operative life of length r, we get a gain K (r); however we have a cost Q if the unit fails when still in operation. We can then decide to replace U when it cumulates an age a (if it did not fail before); replacement of U when still in operation causes a cost c (c < Q). We face the problem of optimally choosing the age of replacement a. We have a decision problem where the space of possible decisions is V = [0; 1], W = T and the cost function turns out to be: l (a; w) = ;K (w ^ a) + Q1fw
;
Za 0
K (w)f (w)dw + [c ; Q ; K (a)] F (a) + Q:
It is natural to assume that K is a non-decreasing function; if it is also dierentiable we obtain that an optimal replacement age a must be a solution of the equation 0 (5.8) r(a ) = K (a ) ;
Q;c where r denotes the hazard rate function of T .
More precisely, we can conclude as follows: 0 If r(0) < KQ;(0)c and Equation (5.8) is satis ed for some a > 0 then the 0 smallest such a provides the optimal solution; if r(a) < KQ;(ac) , 8a 0, then we can take 0a = 1, meaning that no preventive replacement is to be planned; if r(0) KQ;(0)c , then a = 0 is the optimal choice, meaning that we have to discard the unit immediately.0 The possibility of a burn-in procedure is to be considered provided r(a) < KQ;(ac) , for some a 0:
196
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
Example 5.6. (Inspection procedures for acceptance of lots). Here we consider
the classical situation of quality control. A lot of size N of similar units is to be put into assemblies in a production line; the number of defective units in the lot is a random variable W with a distribution over the set of possible values f0; 1; :::; N g. A simple problem concerns the decision, on the basis of , whether to install the lot immediately or to inspect all the units before installation. In this simple form, the problem is not very realistic. Rather a more common practice consists in inspecting a number n of units and then, on the basis of the observed number of defective units, facing the decision problem described above for the reduced lot of the remaining N ; n units. This shows that a rst problem related with this situation is the one of optimally choosing the number n of units to inspect. For details about the decision-theoretic approach to this problem, see e.g. Deming (1982); Barlow and Zhang (1987). In all the dierent decision problems of interest in this matter we have that the state of nature W is to be taken as the number of defective units among the non-inspected units. Example 5.7. (Design of scram systems). A scram device is a unit designed to detect the occurrence of a situation of danger in a given production system and to consequently shut the production system down. However such a unit can itself fail and behave in an erroneous way; typically it can have two possible failure modes: unsafe: the unit does not recognize the situation of danger safe: the unit shuts the production system down when it is not needed. Due to this, one usually implements a scram system formed of several, say n, units. Since the units are typically similar and all play the same role, we have a situation of symmetry which suggests implementing a k : n system: i.e. the production system will be shut down if and only if at least k out of n units suggest doing so. Related to the optimal design of a scram system, we can consider dierent kinds of decision problems; in any case the decision consists in the optimal choice of the numbers n and k (see Clarotti and Spizzichino, 1996).
5.1.1 Statistical decision problems
The problems considered so far are sometimes called \decision problems without observations". More often however one is interested in a \decision problem with observations" or \statistical decision problem". The latter means that we have a decision problem as de ned above with one more ingredient: the availability of a statistical observation X , before choosing the action a.
INTRODUCTION
X.
197
X is a random variable and we denote by X the space of possible values of We consider (W; X ) as a random variable with values in the product space
V X and denote by P the joint distribution of the pair (W; X ).
We already denoted by the (marginal) distribution of W ; then, in order to describe the joint distribution of (W; X ) we assign the conditional distribution of X , given fW = wg (for w 2 V ); the latter will be denoted by Pw : Thus a statistical decision problem is speci ed by the elements A; V ; l ; X ; P or, equivalently, by the elements A; V ; l ; ; X ; fPw gw2V ; the latter choice is more common and often more convenient in that it is useful, as we shall see, to compare the elements of a decision problem with observations with those of a related decision problem without observations. It is commonly assumed that the statistical model for the observation X is such that any probability distribution Pw (w 2 V ) admits a density, which will be denoted by f (xjw) (see also Subsection 5.1.3).
Example 5.8. (Life-testing). We come back to the two-action decision prob-
lem of Example 5.2, where T is the lifetime of a unit to be possibly used for the pre-established task. Now we suppose that n units, with lifetimes T1 ; :::; Tn, are available and that T; T1; :::; Tn are exchangeable (but are not stochastically independent). This implies that observing T1 ; :::; Tn provides some information about T . Here, T has the role of the state of nature W , while X = (T1 ; :::; Tn ). Specializing the form (1.37) of Chapter 1, we have that fX (jt) is given by the predictive density
fT (tjt) = f (n) (tjt) = R 1 0
(n+1)
R f (t; t1; :::; tn) : ::: 1 f (n+1) (t; ; :::; )d :::d 0
1
n
1
n
while fW (tjT1 = t1 ; :::; Tn = tn ) coincides with the predictive density (n+1) f (1) (tjt) = R 1f (n+1)(t; t1 ; :::; tn ) : (; t1 ; :::; tn )d 0 f
We can then consider the decision problem with observations whose elements are given as follows:
A = fa1; a2 g; V = [0; 1); f (1); X = [0; 1)n ; f (n) (tjt); l where the loss function l is as in (5.3). After making a life-testing experiment on the n available units, i.e. after observing the statistical result
D fT1 = t1 ; :::; Tn = tn g;
198
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
our state of information on T is described by the conditional density fT (tjD) = f (1) (tjt). Conditionally on D, the probability for the unit to survive the mission time , is then given by
F (1) ( jt1 ; :::; tn ) =
Z1
f (n+1) (; t1 ; :::; tn )d;
where for brevity's sake we set in place of R01 f (n+1) (1;t1 ;:::;tn)d . For any such result D, we can consider the optimal decision corresponding to the new state of information; by adapting formula (5.4), we obtain that a2 is optimal if and only if the vector (t1 ; :::; tn ) is such that
Z1
f (n+1) (; t1 ; :::; tn )d CC +;Kc :
We see that the optimal action must be a function of (t1 ; :::; tn ) and, under the considered loss function l , it is then
a (t1 ; :::; tn ) =
a1 a2
R
if R1 f (n+1) (; t1 ; :::; tn )d < CC+;Kc ; if 1 f (n+1) (; t1 ; :::; tn )d > CC+;Kc
(5.9)
we can indierently take a (t1 ; :::; tn ) = a1 or a (t1 ; :::; tn ) = a2 when
Z1
f (n+1) (; t1 ; :::; tn )d = CC +;Kc :
Example 5.9. (Acceptance of lots after inspections). Coming back to Example 5.6, we assume that we inspected n units and we found k defective units and (n ; k) good units.
After this result, we of course install the good units and discard the k defective units; furthermore we reconsider the problem whether to install the remaining (N ; n) units in the lot without further inspection or to inspect all of them. We see that, at this step, we face a new decision problem without observations, where the essential element of judgment is the conditional distribution k;n;k that we assess for the number of remaining defective units, given the result observed in the inspection. Let us now summarize what the above examples say. In the Bayesian approach it is natural to consider the conditional distribution (jx) of the unobservable variable W , given the value x, observed in the statistical experiment performed before taking the decision. Of course (jx) can be derived from the initial distribution and the statistical model ff (xjw)gw2V .
INTRODUCTION
199
Since (jx) completely describes the relevant information we have about W , just after the experiment, we can forget about the observation itself and
reconsider a new decision problem, without observations, speci ed by the elements
A; V ; (jx); l As a function of the observed data x (x 2 X ), it is then natural to consider as optimal any action a(jx), that is a Bayes action against the distribution (jx), i.e. such that
R (a(jx) ) R (a), 8a 2 A:
(5.10)
We see that, in this way, a decision problem with observations is reduced to an appropriate decision problem without observation. Remark 5.10. The distinction between decision problems without observations and decision problems with observations gives rise to a possibly useful mental scheme. However, such a distinction is not really substantial in a Bayesian approach. In fact, as we saw, any decision problem with observations reduces to a decision problem without observations after observing the statistical data; on the other hand, any decision problem without observations is such only in that the information contained in past observed data has already been incorporated within the distribution of unobservable quantities of interest. Notice that a(jx) is then a mapping from X to A. Any mapping with
: X ! A;
in a statistical decision problem, is called decision function or strategy. A decision function is then a rule associating an action (x) 2 A to any possible statistical result x 2 X ; denote by the set of all possible decision functions. Fix now a decision function 2 and consider the loss l ((X ); W ). Notice that, X being a random variable (or vector), (X ) is a random action, i.e. (X ) takes values in A. W is also random and then l ((X ); W ) is a random loss, a scalar random variable; the distribution of l ((X ); W ) of course depends on the joint distribution of (X; W ). As already mentioned, we usually assume that the conditional distributions of X given (W = w) are xed and they admit the densities f (xjw). The marginal (prior) distribution of W is denoted by and the marginal (predictive) density of X is given by
f X (x ) =
Z
V
f (xjw)d(w)
(5.11)
200
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
The expected value of l ((X ); W ), when it does exist, is expressed by : r ( ) E [l ( (X ); W )] =
Z Z X
V
l ((x); w)f (xjw)d(w) dx:
(5.12)
The quantity r () evaluates the risk associated with the strategy , against the distribution , before getting the information provided by statistical data. De nition 5.11. : X ! A is a Bayes decision function (against ) if the following inequality holds ) r ( ) r ( for all 2 : Obviously, when X and A are not nite, the space is not nite, generally, and then it can happen that no Bayes decision function exists (against some xed ). For a xed prior distribution , consider now the decision problem with observations where the original space X is replaced by X fx 2 XjfX (x) > 0g and suppose that a Bayes action a(jx) exists against the conditional distribution (jx), for all x 2 X . Then we have Proposition 5.12. The decision function de ned by b (x) = a(jx); x 2 X ; (5.13) is a Bayes strategy against . Proof. For 2 , it is r ( ) =
Z Z X
Z Z inf
V
X a2A V
Z
l ((x); w)f (xjw)d(w) dx
l (a; w)f (xjw)d(w) dx =
Z
fX (x) ainf l (a; w) f (xjw) d(w) dx 2 A fX (x) X V
On the other hand, by taking into account Bayes' formula, we have, 8a 2 A,
Z
l (a; w) f (xjw) d(w) = E [ l (a; W )jX = x] = R(jx)(a); fX (x) V
INTRODUCTION
201
and then we can write
Z
Z
fX (x) ainf l (a; w) f (xjw) d(w) dx = 2 A fX (x) X V
Z X
fX (x)
Z V
l (a(jx) ; w)d(wjx) dx = r b :
Remark 5.13. It is to be noticed that, for 2 , the computation of the quantity r () involves an integral over the space X of all the possible statistical results. The evaluation of a decision function in terms of the quantity r (), is then an a priori evaluation, which will not take into account the statistical result x 2 X that will be actually observed. Then evaluation of a decision function in terms of r () should not in general be confused with the a posteriori evaluation of actions, conditional on the observation of x. The a priori expected loss, on the other hand, is a useful concept in problems of optimal design of experiments where a decision is really to be taken before getting any statistical observation. For a more detailed discussion and useful examples on these points see Piccinato (1980) and (1993). Often, as for instance in Example 5.8, we are interested in predictive statistical decision problems ; this is the case when we have random variables X1 ; :::; Xn ; Xn+1 ; :::; XN so that X (X1 ; :::; Xn ) is the statistical observation and W (Xn+1 ; :::; XN ) is considered as the state of nature. In particular we can be interested in the cases when (X1 ; :::; XN ) are exchangeable. Remark 5.14. Consider in particular the case of a \parametric model" for the random variables X1 ; :::; Xn ; Xn+1 ; :::; XN in a predictive statistical decision problem; i.e. the case when X1 ; :::; XN are conditionally i.i.d. given a parameter . In such a case it may be mathematically convenient to transform the original decision problem into one where appears as the new state of nature. This can be done by de ning a new loss function as follows
bl(a; ) = E [ l (a; W ) j = ] ; 2 L
Remark 5.15. Notice that the operation above is possible any time when the
original state of nature W is conditionally independent of the statistical observation X , given a parameter . This explains why the parameter in a statistical model often has also the role of state of nature.
202
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
Example 5.16. (Spare parts problem with observations). Let be a nonnegative random variable and consider the spare parts problem in the case when the process counting the arrival of shocks is Poisson with intensity , conditionally on f = g. Before deciding the number of spare parts, we may want to estimate , and thus we observe the number Xs of shocks which occur in a time interval of length s. We assume that Xs and N (the number of shocks during the mission time ) are conditionally independent given f = g, and Poisson distributed with means s and , respectively. We can look at this as a problem where Xs is the observation and is the new state of nature. The corresponding loss function is
bl(a; ) = E [ l (a; W ) j = ] =
c a ; K + (C + K ) expf; g
1 ( )i X i=a+1
i!
(5.14)
Example 5.17. (Life-testing before a burn-in procedure). Suppose that, be-
fore facing the decision problem mentioned in Examples 3.69 and 5.4, we obtained a result
D = fT1 = t1 ; :::; Th = th ; Th+1 > t; :::; Tn > tg from a life-testing experiment on n units, whose lifetimes are judged to be exchangeable with T . We can then look at the conditional (predictive) survival function F (1) (tjD) of T given D, and consider the optimal solution related to such a distribution. The optimal burn-in duration is a function a (t; n; h; t1 ; :::; tn ) of the result
observed in the life-testing experiment. Example 5.18. In the problem of designing a scram system, our decision is to be based on the distributions of the times to safe failure and to unsafe failure, for a scram unit, respectively. Suppose now that the scram units, in our judgement, are similar and not independent; before taking any decision, then, we may want to observe a number, say n, of similar units, each for a total time t. Denoting by Ui and Vi , respectively, the times to safe failure and to unsafe failure of the inspected scram unit i (i = 1; :::; n), we observe a special type of censored data, in fact what we observe is the set of pairs (Ti ; Ei ), where
8 1 if T = U < t < i i Ti = Ui ^ Vi , Ei = : 0 if Ti = Vi < t ; if Ti > t
INTRODUCTION
203
After observing a result D from this inspection,
D = f(T1 = t1 ; E1 = e1 ); :::; (Tk = tk ; Ek = ek ); (Tk+1 > t; Ek+1 = ;) ; :::; (Tn > t; En = ;)g; our decisions are to be based on the joint distributions of the times to safe failures and to unsafe failures of scram units, conditional on D. We have seen that, in the problems of interest in reliability and survival analysis, the following situation is met: for a vector of lifetimes T1 ; :::; Tn ; Tn+1; :::; TN we have that X (T1 ; :::; Tn ) is the statistical observation and W (Tn+1 ; :::; TN ) is the state of nature. In other cases W is a vector of residual lifetimes
Tn+1 ; rn+1 ; :::; TN ; rN : while the observation is of the form
D fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 ; :::; TN > rN g i.e. survival data may appear in the statistical result along with failure data. For this reason we often prefer to denote a decision function by the symbol (D) rather then (x).
5.1.2 Statistical decision problems and suciency
In a statistical decision problem, the Bayes strategy, as a function a (D) of the observation D, is determined on the basis of the posterior distribution W (jD). Consider now the case when we can nd a function S of the observation which is sucient for the prediction of W , i.e. such that, for a pair of dierent set of observations D and Db , the following implication holds:
S (D) = S Db ) W (jD) = W jDb . In such a case we obviously have:
S (D) = S Db ) a (D) = a Db
This shows the interest in understanding the structure of suciency existing in a given decision problem.
204
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
Example 5.19. Consider the life testing problem of Example 5.8 in the case
when the joint distribution of (T; T1 ; :::; Tn) is described by a proportional hazard model. Let D and Db be two dierent observed data of the form
D fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 ; :::; Tn > rn g Db fT1 = t01 ; :::; Th = t0h0 ; Th0+1 > rh0 0 +1 ; :::; Tn > rn0 g By the arguments in Example 2.18 we then have
a (D) = a2 , a Db = a2
whenever
h = h0 ;
h X i=1
R(ti ) =
h X i=1
R(t0i );
n X i=h+1
R(ri ) =
n X i=h+1
R(ri0 )
where R() is the cumulative hazard function. In the case R(t) = t, we get the well known suciency property of the exponential model (see Remark 5.32). Remark 5.20. The proportional hazard models are the only known models where we can nd a xed-dimension sucient statistic of data containing survival data for units of dierent \ages". In the case when all surviving units in the observed data share the same age, i.e. when, for some r > 0, it is
D fT1 = t1 ; :::; Th = th ; Th+1 > r; :::; TN > rg
(5.15)
we can have sucient statistics even for models dierent from those characterized by proportional hazards; in this respect see the arguments in Costantini and Pasqualucci, (1998). When the observation is as in (5.15) and the estimation concerns the residual lifetimes
Th+1 ; r; :::; TN ; r: the notion of \dynamic suciency" introduced in Section 2.4 can also be useful.
5.1.3 Some technical aspects
For practical reasons, we avoided considering, in this section, a few technical aspects, which are important from a mathematical point of view. Some of them, which might be of interest for non-professional mathematicians, will be mentioned here.
INTRODUCTION
205
For a given loss function l(a; w) and for a given random variable W (the \state of nature"), we considered the quantities l(a; W ) as real random variables (a 2 A). Furthermore, for a given random variable, or vector, X (the \observation") and for a decision function , we considered (X ) as a random variable taking values in the space of actions A and l( (X ) ; W ) as real random variables. In order to give the above a rigorous meaning, it is necessary, from a technical point of view, to equip the spaces A, V and X with - elds of subsets B(A), B(V ), and B(X ) respectively, and, related to that, we have to require appropriate measurability conditions for the functions l(; w); l(a; ); (see e.g. Billingsley, 1995). When the space is nite, one tacitly assumes that the - eld coincides with the family of all its subsets; when the space is a regular region of the set R of real numbers or of Rn , for some n, the - eld is the one formed by the Borel sets. In statistical decision problems (i.e., with statistical observations) the probability distributions P ( 2 L) are measures on (X ; B(X )). As mentioned, it is commonly assumed that any P ( 2 L) admits a density, which we denoted by f (xj) (this is not a very precise statement: one should say that f (xj) is a density with respect to a xed \- nite" positive measure on (X ; B(X ))). Typically, in cases of more frequent interest, however, X is indeed a regular region of R or of Rn and it is tacitly understood that is a counting measure or the Lebesgue measure. The assumption mentioned is, in the statistical literature, referred to by saying that the statistical model fP g2L is dominated. This condition of domination is really a very important one, since it ensures the following: If () is the density of the prior distribution with respect to a given - nite positive measure on (L; B(L) ), then also the posterior distribution (jX = x) admits a density (jx) with respect to . The relation between (jx) and () is described by the familiar Bayes' formula. When the statistical model is not dominated, we cannot generally use Bayes' formula anymore (at least in the form as we usually know it) to obtain the posterior distribution from the prior distribution and from the knowledge of the statistical model. In very simple words, we can say that the statistical model is not dominated when, at least for some values x of the statistical observation, the support of the posterior distribution (jX = x) is dierent from the one of the prior (i.e., observing x provides some information of \deterministic" type about the parameter). Example 5.21. Consider the common mode failure model, where we observe
Ti = ^ Ei ;
206
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
E1 ; :::; En being i.i.d lifetimes. The family of conditional distributions P of Ti given ( = ), > 0, is not dominated: given ( = ), there is a positive probability that (Ti = ), and we cannot derive the conditional distribution of (or of Tj , with j 6= i) given (Ti = ti ), by using the Bayes' formula in its common form; even if the prior distribution of admits a density, the posterior distribution of given (Ti = ti ) will not do, since it has a probability mass on the value ti (see e.g. Macci, 1999, for some discussion on the case of non-domination).
5.2 Stochastic orderings and orderings of decisions Comparisons between probability distributions can be used to obtain inequalities among Bayes decisions. In fact relevant aspects in the structure of the loss function l(ai ; w) in a decision problem can be combined with properties of stochastic ordering in order to achieve monotonicity properties of decision procedures. Results of this kind are well known in the classical literature about decision theory (see e.g. Karlin and Rubin, 1956 and Ferguson, 1967). In this section we shall consider Bayes decisions in the frame of reliability, along with some speci c aspects related to the presence of survival data among statistical observations. In particular we concentrate attention on the simple cases when A = fa1; a2 g (two-action problems) and the state of nature W is a scalar quantity (V R). As a rst illustration we start with problems without observations. Often it is reasonable to assume that the function
z (w) = l(a1 ; w) ; l(a2 ; w);
(5.16)
in a two-action decision problem, is monotone on V (i.e. one of the two actions becomes more and more preferable than the other, when the value of w becomes bigger and bigger); to x ideas assume that a1 and a2 are indexed in a way so that z (w) is a non-decreasing function. In this respect we have the following simple result. Let 1 , 2 be two probability distributions for W . Proposition 5.22. If 1 st 2 and z(w) is a non-decreasing function then
a1 = a2 ) a2 = a2 : (5.17) Proof. Let be an arbitrary distribution for W . By Equation (5.2), it is a = a2 if and only if
Z
V
l(a1 ; w)d (w) ;
Z
V
l(a2; w)d (w) =
Z
V
z (w)d (w) 0
STOCHASTIC ORDERINGS AND ORDERINGS OF DECISIONS
R
R
207
Then we have to show that V z (w)d1 (w) 0 implies V z (w)d2 (w) 0. This implication immediately follows from the assumption that z (w) is a nondecreasing function of w and that 1 st 2 (recall Theorem 3.12).
Example 5.23. In the two-action problem of Example 5.2, we accept the more \risky" decision a2 (using U for the mission) if and only if the survival function of T , the lifetime of U , is such that F ( ) CC+;Kc . The loss function l(ai ; t) is such that C for t < l(a1; t) ; l(a2 ; t) = f cc +; K for t is a non-decreasing function. Compare now two distributions F and G for T such that F st G. It is obvious that if we accept a2 against F then we must accept a2 against G, also.
Remark 5.24. In general, in a two-action problem we must implicitly assume that the function z (w) in (5.16) has at least one change of sign (otherwise we would not actually have any signi cant problem). Then the condition that z (w) is non-decreasing implies that z (w) changes sign exactly once or, more in detail, that there exist w0 such that z (w) 0, for w w0 ; z (w) 0, for w w0 .
(5.18)
The latter is obviously a much weaker assumption than monotonicity of z (w). The following result can be seen as a very special consequence of the signvariation diminishing property of TP2 functions; for the reader's use, we provide a direct proof. We notice that in this result only condition (5.18) is actually needed. We consider two prior distributions 1 and 2 with densities 1 and 2 , respectively. Proposition 5.25. Let the two densities 1 ; 2 be positive over V and such that 1 lr 2 . If the condition (5.18) holds, then
a1 = a2 ) a2 = a2 :
Proof. The condition ai = a2 (i = 1; 2) means
Z w0 ;1
jz (w)ji (w) dw
Z1 w0
jz (w)ji (w) dw:
(5.19)
208
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
The condition 1 lr 2 implies the existence of a value w 2 V , such that 2 (w) 1; 8w w and 2 (w) 1; 8w w: (5.20) 1 (w) 1 (w) In the case w = w0 , the result is trivial. Let us distinguish the two cases: a) w < w0 and b) w > w0 : Case a) For i = 1; 2, set
Ri
Zw
;1
jz (w)ji (w) dw; Si
Z w0 w
jz (w)ji (w) dw; Ti
Z1 w0
jz (w)ji (w) dw;
so that the condition ai = a2 can be rewritten as Ri + Si Ti : From (5.20), we can write
R2 = S1 =
Z w0 w
Zw ;1
jz (w)j 2 ((ww)) 1 (w) dw R1 ; 1
jz (w)j1 (w) dw
Z w0 w
jz (w)j 2 ((ww)) 1 (w) dw = S2 1
2 (w0 ) Z w0 jz (w)j (w) dw = 2 (w0 ) S 1 1 (w0 ) w 1 (w0 ) 1
2 (w0 ) T = Z 1 jz (w)j (w) dw T = Z 1 jz (w)j 2 (w) (w) dw: 1 2 1 (w0 ) 1 w0 1 (w) 1 w0 We can then conclude that, under the condition R1 + S1 T1, we have R2 + S2 R1 + 2 ((ww0 )) S1 2 ((ww0 )) R1 + 2 ((ww0 )) S1 = 1 0 1 0 1 0
2 (w0 ) R + 2 (w0 ) S 2 (w0 ) T T 1 (w0 ) 1 1 (w0 ) 1 1 ( w0 ) 1 2 i.e. the condition a2 = a2 . Case b) Now we set
Ri
Z w0
;1
jz (w)ji (w) dw; Si
Zw w0
jz (w)ji (w) dw; Ti
Z1 w
jz (w)ji (w) dw;
STOCHASTIC ORDERINGS AND ORDERINGS OF DECISIONS
209
so that we have the equivalence
ai = a2 , Ri + Si Ti In the present case, we can write
R2 =
Z w0 ;1
jz (w)j 2 ((ww)) 1 (w) dw 2 ((ww0 )) R1 1
1
0
2 (w0 ) S S = Z w jz (w)j (w) dw S 2 1 1 (w0 ) 1 2 w0 T2 =
Z1 w
jz (w)j 2 ((ww)) 1 (w) dw T1 : 1
Then, under the condition R1 + S1 T1 , we obtain
R2 + S2 2 ((ww0 )) R1 + S1 R1 + S1 T1 T2: 1
0
Remark 5.26. Statistical decision problems with two actions correspond, in
the standard statistical language, to problems of testing hypotheses (see e.g. Lehmann 1986; De Groot, 1970). A two-action problem with a loss function satisfying condition (5.18) can be seen as a problem of testing the hypothesis fW w0 g against fW > w0 g and Proposition 5.25 can be used in the derivation of the classical result about existence of uniformly most powerful tests, for families of densities with monotone likelihood ratio (see Lehmann, 1986). Here we are essentially interested in extending the analysis of predictive twoaction life-testing problems, as considered in Example 5.8. More exactly, we consider a two-action decision problem with observations where A fa1; a2 g, T is a non-observable lifetime, and the vector of lifetimes X = (T1 ; :::; Tn ) is the statistical observation. The loss function is speci ed by the functions l(ai ; t), so that we incur a random loss l(ai ; T ) if we choose the action ai (i = 1; 2). For two given histories ht
= fT1 = t1 ; :::; Tm = tm ; Tm+1 > t; :::; Tn > tg
h0t
= fT1 = t01 ; :::; Tm0 = t0m0 ; Tm+1 > t; :::; Tn > tg
(5.21)
210
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
we want to compare a (ht ), a (h0t ), the Bayes decision after observing ht and h0t , respectively. To this purpose, we aim to obtain, for the conditional laws of T given ht and h0t , likelihood-ratio comparisons which could be combined with Proposition 5.25. A result in this direction is the following. Proposition 5.27. Let (T1; :::; Tn; T ) be MTP2 and z(t) = l(a1; t) ; l(a2; t) satisfy the condition (5.18). If h0t is more severe than ht , then
a (h0t ) = a2 ) a (ht ) = a2 :
Proof. First we show that the MTP2 property for (T1 ; :::; Tn ; T ) implies that W = (T1 ; :::; Tn) is increasing in T in the multivariate lr sense.
By the de nition 3.20 of multivariate lr comparison, we only must check that, for arbitrary vectors (t01 ; :::; t0n ) and (t001 ; :::; t00n ), it is
fT1 ;:::;Tn (t0 ^ t00 jt0 ) fT1 ;:::;Tn (t0 _ t00 jt00 ) fT1 ;:::;Tn (t0 jt0 ) fT1 ;:::;Tn (t00 jt00 )
(5.22)
for 0 t0 t00 . This is immediate, taking into account that (T1 ; :::; Tn; T ) is MTP2 and that (5.22) reduces to
fT1 ;:::;Tn;T (t0 ^ t00 ; t0 ^ t00 ) fT1 ;:::;Tn;T (t0 _ t00 ; t0 _ t00 ) fT1 ;:::;Tn;T (t0 ; t0 ) fT1 ;:::;Tn;T (t00 ; t00 ) : Now we are in a position to apply Theorem 3.62, where we take = T: This shows that
fT(1) (tjht ) lr fT(1) (tjh0t )
(5.23)
Then the proof is concluded by applying Proposition 5.25. In Equation (5.21) we considered two observed histories, each containing several survival data of the type fTi > tg for a given t. More generally we can be interested in comparing observed data where surviving individuals have dierent ages. In the following we show a result in this direction; we shall restrict attention to the case of in nite extendibility. Remark 5.28. We recall that, when T1; :::; Tn; T are conditionally independent given a parameter , sucient conditions for the MTP2 property is provided by Theorem 3.59.
STOCHASTIC ORDERINGS AND ORDERINGS OF DECISIONS
211
Let D and Db be two dierent observations of the type
D fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 ; :::; Tn > rn g Db fT1 = t01 ; :::; Th0 = t0h0 ; Th0 +1 > rh0 0 +1 ; :::; Tn > rn0 g;
(5.24)
say. Similarly to the above, denote by a (D) and a (Db ) the Bayes decisions after observing D and Db , respectively, in the predictive two-action problem where T has the role of state of nature, the vector X (T1 ; :::; Tn ) has the role of observable variable, and l(ai ; t) (i = 1; 2) is the loss function. We now consider the case when T1 ; :::; Tn; T are conditionally independent and identically distributed given a parameter with conditional density g(tj) and conditional survival function G (tj); L denotes the space of possible values of and 0 the initial density of . The following result shows that, under conditions analogous to those of Proposition 5.27, we can obtain the same conclusion therein, replacing the histories ht and h0t with observations of the type D and Db in (5.24). Proposition 5.29. Assume g(tj) to be a TP2 function of t and . If h h0; ti t0i (i = 1; :::; h); ri t0i (i = h + 1; :::; h0 ); ri ri0 (i = h0 + 1; :::; n) and z (t) = l(a1; t) ; l(a2 ; t) satisfy the condition (5.18), then
a (Db ) = a2 ) a (D) = a2 :
Proof. Similarly to what was done in the proof of Proposition 5.27, we must check that f (1) (tjD) and f (1) (tjDb ), the conditional densities of T , given D and Db , respectively, are such that
f (1)(tjD) lr f (1) (tjDb ): By conditional independence of T; T1; :::; Tn given , it is f (1) (tjD) =
Z
L
g(tj)(jD)d;
f (1)(tjDb ) =
Z
L
(5.25)
g(tj)(jDb )d:
By the TP2 assumption on g(tj), the stochastic comparison (5.25) is implied by the condition
(jD) lr (jDb )
(5.26)
(see Remark 3.16). By Bayes' formula, it is
(jD) = g(t1 j):::g(th j)G (rh+1 j) :::G (rn j) 0 ()
; (jDb ) = g(t01 j):::g(t0h j)g(t0h+1 j):::g(t0h0 j)G rh0 0 +1 j :::G (rn0 j) 0 ()
212
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
and thus the comparison (5.25) is achieved by checking that the product
g(t1 j):::g(th j) G (rh+1 j) :::G (rh0 j) G;(rh+1 j):::G (rn j) g(t01 j):::g(t0h j) g(t0h+1 j):::g(t0h0 j) G rh0 0 +1 j :::G (rn0 j) in an increasing function of . This can, in its turn, be easily obtained by using once again the assumption that g(tj) is TP2 and by taking into account that the latter implies that G (tj) is TP2 as well (see Theorems 3.10 and 3.11).
5.3 Orderings of residual lifetimes and majorization In the last section we saw the possible role, in the eld of predictive decision problems, of stochastic comparisons between conditional distributions of lifetimes given two dierent survival and/or failure data D and Db , for exchangeable lifetimes T1 ; :::; Tn . In this section we x attention on the case when D and Db have the same value for the total time on test statistic. This topic is related to the analysis of some majorization properties for the joint distribution of T1 ; :::; Tn. In this respect, it is convenient to start by summarizing some relevant properties of Schur-constant densities. Let us consider then a Schur-constant density
f (n) (t
1 ; :::; tn ) = n
n ! X i=1
ti :
(5.27)
Let D be an observation of the form D = fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 ; :::; Tn > rn g and look at the conditional distribution of the residual lifetimes
(5.28)
Th+1 ; rh+1 ; :::; Tn ; rn ; given D. By combining Equation (5.27) with Equation (2.20) of Chapter 2, we obtain that, for 0 rj sj , the conditional survival function is
P fTh+1 > sh+1 ; :::; Tn > sn jDg =
R 1 ::: R 1 (Pn t ) dt :::dt Rs1h+1 ::: Rs1n n (Pni=1 ti ) dth+1:::dtn rh+1
rn n
i=1 i
h+1
n
(5.29)
ORDERINGS OF RESIDUAL LIFETIMES AND MAJORIZATION
213
The latter equation shows the following relevant features of models with Schur-constant densities: The conditional distribution of the residual lifetimes Th+1 ;rh+1 ; :::; Tn ;rn , given D, has itself a Schur-constant survival function and a Schur-constant density Then, given D, Th+1 ; rh+1 ; :::; Tn ; rn are exchangeable (see Proposition 4.37) Let the observation
Db = fT1 = t01 ; :::; Th = t0h ; Th+1 > rh0 +1 ; :::; Tn > rn0 g (5.30) contain the same number of failures, h, as D and have the same value for the
total time on test
Y0 =
h X 0 i=1
ti +
n X i=h+1
ri0 =
h X i=1
ti +
n X i=h+1
ri = Y ;
(5.31)
then the conditional distributions of the residual lifetimes of the surviving individuals, given D and Db respectively, do coincide. We can thus conclude with the following
Proposition 5.30. Let f (n) be Schur-constant. Then the conditional distribu-
tion of the residual lifetimes, given D, is exchangeable and the pair (h; Y ) is a sucient statistic. Remark 5.31. Exchangeability of the residual lifetimes given observations D as in (5.28) is not only necessary, but also a sucient condition for the joint density f (n) to be Schur-constant (see again Proposition 4.37). Remark 5.32. For the case when T1; :::; Tn are conditionally i.i.d. exponential, the suciency property in Proposition 5.30 is a special case of that shown in Example 5.19.
We want now turn to the main problem of this section and then consider the case of a density which is not necessarily Schur-constant. Let D and Db be two dierent observations as in (5.28) and (5.30), respectively. We naturally expect that, even if D and Db satisfy the equality (5.31), i.e. if they contain the same number of failures and have the same value for the total time on test statistic, the conditional distributions, given D and Db , of vectors of the residual lifetimes (Th+1 ; rh+1 ; :::; Tn ; rn ) and (Th+1 ; rh0 +1 ; :::; Tn ; rn0 ) are dierent. Furthermore, such conditional distributions are not exchangeable (if f (n) is in particular Schur-concave or Schur-convex, the marginal distributions are ordered as explained in Proposition 4.33).
214
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
Next, we analyze conditions, for the joint distribution of the lifetimes, under which the two distributions can be compared in some stochastic sense, at least for some pairs D and Db satisfying suitable conditions of majorization. For simplicity's sake we compare the one-dimensional conditional distributions, given D and Db , respectively, of Tj ; rj and Tj ; rj0 for j = h + 1; :::; n; due to exchangeability there is no loss of generality in xing j = n. We start analyzing the case of observations of only survivals, i.e. the case h = 0, furthermore we take rn = rn0 = r, for some r 0: Let us then look at the conditional survival functions F Tn ;r (jD) and F Tn ;r (jDb ) ; where, for two sets of ages (r1 ; :::; rn;1 ; r) and r10 ; :::; rn0 ;1 ; r , it is
D fT1 > r1 ; :::; Tn;1 > rn;1 ; Tn > rg; Db fT1 > r10 ; :::; Tn;1 > rn0 ;1 ; Tn > rg Remember that we generally want D and Db to satisfy the equality (5.31) which here reduces to nX ;1 i=1
ri =
nX ;1 i=1
ri0 ;
in particular we assume, say, the condition
;r0 ; :::; r0 (r ; :::; r ) : 1 n;1 1 n;1
(5.32)
For this case we want to compare F Tn ;r (jD) and F Tn ;r (jDb ) in the sense of the (one-dimensional) hazard rate ordering. It is not restrictive in practice to limit attention to the case r = 0, so that
D fT1 > r1 ; :::; Tn;1 > rn;1 g; Db fT1 > r10 ; :::; Tn;1 > rn0 ;1 g
In view of Theorem 3.10, we look for conditions such that the ratio
F Tn (tjD) F Tn (tjDb )
is a monotone function of t, or in particular an increasing function, say. Since (n) F Tn (tjD) = F(n;(1)r1 ; :::; rr;1 ; t) ; F (r1 ; :::; rr;1 )
and a similar expression is valid for F Tn (tjDb ), we can equivalently consider the condition
F (n) (r1 ; :::; rr;1 ; t) increasing function of t. F (n) (r10 ; :::; rr0 ;1 ; t)
(5.33)
ORDERINGS OF RESIDUAL LIFETIMES AND MAJORIZATION
215
When F (n) admits the rst-order partial derivatives with respect to its variables, the latter condition can be written as @ F (n) (r ; :::; r ; t) @ F (n) (r0 ; :::; r0 ; t) F (n) (r1 ; :::; rr;1 ; t) : 1 r ;1 1 r ;1 @t @t F (n) (r10 ; :::; rr0 ;1 ; t) or as (n) @ @ F (n) (n) 0 0 j @t F (r1 ; :::; rr;1 ; t)j j @t F (r1 ; :::; rr;1 ; t)j (n) (r10 ; :::; rr0 ;1 ; t) ; (5.34) F (r1 ; :::; rr;1 ; t) since F (n) is a non-increasing function of its coordinates and then
@ (n) @t F (r1 ; :::; rr;1 ; t) 0:
Remark 5.33. Assume the existence of a continuous joint density f (n) for
F (n) . Then it is Z1 Z1 @ (n) j @t F (r1 ; :::; rn;1 ; t)j = ::: f (n) (t1 ; :::; tn;1 ; t)dt1 :::dtn;1 : r1 rn;1 By applying Theorem 4.30, we see that Schur-concavity of f (n) is a sucient condition for Schur-concavity of both F (n) and j @t@ F (n) (r1 ; :::; rn;1 ; t)j. On the other hand, when F (n) (r1 ; :::; rr;1 ; t) is Schur-concave, the inequality (5.34) is a stronger condition than Schur-concavity of j @t@ F (n) (r1 ; :::; rr;1 ; t)j.
The following example presents the case of survival functions F (n) , which satisfy the condition (5.33), without being dierentiable nor being necessarily Schur-concave. Example 5.34. (Common mode failure models). Extending Examples 4.26 and 2.22, consider joint survival functions of the form
F (n) (s1 ; :::; sn ) = G(s1 ) ::: G(sn ) H 1max s in i
where G and H are one-dimensional survival functions. The condition (5.33) becomes H (z _ v) increasing in v H (z 0 _ v ) where we set z = max1in;1 ri , z 0 = max1in;1 ri0 . It is easy to see that this happens, irrespective of the choice of G and H (see Exercise 5.67). Notice that log-concavity of G implies Schur-concavity of F (n) .
216
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
Now we consider the case when T1 ; T2 ; :::; Tn are conditionally independent identically distributed, given a scalar parameter , i.e. when the joint survival function is of the form
F (n) (s1 ; :::; sn;1 ; sn ) =
Z Yn
L j =1
G(sj j)0 ()d;
(5.35)
with L R. For this case we have the following easy result. Proposition 5.35. Let G(sj) admit a partial derivative w.r.t. and satisfy the conditions i) GG((ss0jj)) increasing (decreasing) as a function of s and , for s < s0 ii) @@ log G(sj) concave (convex) as a function of s, 8 2 L: Then (5.33) holds. Proof. We can limit to analyze the case GG((ss0jj)) increasing, @@ log G(sj) concave. Due to conditional independence given , we have
F Tn (tjD) = where
Z
L
G(tj)(jD)d; F Tn (tjDb ) =
(jD) / 0 ()
nY ;1 j =1
Z
L
G(tj)(jDb )d
Y G(rj j); (jDb ) / 0 () G(rj0 j) n;1 j =1
Resorting to Remark 3.16, we see that, if GG((ss0jj)) is increasing, we have that F Tn (tjD) is increasing in t when the ratio (jD) is a decreasing function of , F Tn (tjDb ) (jDb )
Qn;1
=1 G(rj j) is a decreasing function of . i.e. when Qnjj=1 ;1 G(rj0 j) Now notice that Q ;1 G(r j) j @ nj=1 Q n ; 1 @ j=1 G(rj0 j) =
2n;1 3 3 2n;1 3 2n;1 ;1 @ G(r0 j) @ G(rj j) nX Y Y X 4 G(rj j)5 4 G(rj0 j)5 4 @ ; @ 0 j 5 : j =1
j =1
j =1
G(rj j)
j =1
Then (recall Proposition 4.12) the condition ii) implies Q ;1 G(r j) j @ nj=1 ;1 G(r0 j) 0: @ Qnj=1 j
G(rj j)
ORDERINGS OF RESIDUAL LIFETIMES AND MAJORIZATION
217
Example 5.36. Continuing Examples 2.18 and 4.22, let us consider the case of proportional hazard models, where (5.35) holds with L = [0; 1) and G(sj) = expf;R(s)g; R(s) being an increasing function. It is G(sj) = expf[R(s0 ) ; R(s)]g G(s0 j) and then GG((ss0jj)) is increasing, for s < s0 ; furthermore
@ @ log G(sj) = ;R(s)
Qn;1
G(rj j) and then we obtain that Qnj=1 is decreasing (or increasing) when R(s) is ;1 j=1 G(rj0 j) convex (or concave), which corresponds to F (n) being Schur-concave (or Schurconvex). Remark 5.37. When the function R() in the example above admits a derivative () and this is non-decreasing, F (n) is Schur-concave and dierentiable and the inequality (5.34) holds. However, in such a case, something even stronger happens. Consider in fact the conditional densities
fTn (tjD) =
Z1 0
g(tj)(jD)d; fTn (tjDb ) =
Z1 0
g(tj)(jDb )d
where
g(tj) = (t) expf;R(t)]g and notice that R(t) being convex (and then F (n) Schur-concave) also means that the ratio g(tj) = (t) expf[R(t0 ) ; R(t)]g g(t0 j) (t0 ) is increasing in for t < t0 . By taking into account the corresponding monotonicity property for ((jjDDb )) , we can then conclude that even the stronger comparison
fTn (tjD) lr fTn (tjDb )
holds.
(5.36)
218
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
5.3.1 The case of observations containing failure data
Now we analyze the case when the observations D and Db also contain failure data for the lifetimes T1 ; :::; Tn;1 ; in this case we need to develop arguments which directly involve the conditional densities and we compare the two conditional distributions in the sense of (one-dimensional) likelihood-ratio ordering. Consider two observations
D = fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 ; :::; Tn;1 > rn;1 g; Db = fT1 = t01 ; :::; Th = t0h ; Th+1 > rh0 +1 ; :::; Tn;1 > rn0 ;1 g such that
h X i=1
and look at the condition
t0i =
h X i=1
ti ;
nX ;1 i=h+1
ri0 =
nX ;1 i=h+1
ri :
fTn (jD) lr fTn (jDb ):
(5.37) (5.38)
(5.39)
First, we focus again attention on the case of conditionally i.i.d. lifetimes, where the joint density is of the form
f (n) (t1 ; :::; tn;1 ; tn ) = and then
fTn (tjD) =
Z L
Z Y n
L j =1
g(tj j)0 ()d:
g(tj)(jD)d; fTn (jDb ) =
with
(jD) / 0 () (jDb ) / 0 ()
hY ;1 j =1 hY ;1 j =1
g(tj j) g(t0j j)
Z L
nY ;1 j =h+1 nY ;1 j =h+1
g(tj)(jDb )d
G(rj j); G(rj0 j):
R As usual we set G(sj) = s1 g(t)dt and assume L R.
(5.40)
(5.41)
ORDERINGS OF RESIDUAL LIFETIMES AND MAJORIZATION
219
Similarly to Proposition 5.35, this time we have the following result, for which we assume the existence of partial derivatives of both g(tj) and G(sj) with respect to the variable : Proposition 5.38. Let gg((tt0jj)) be increasing in , for t < t0 and let @@ log g(tj), @ @ log G(sj) be concave functions of t and s, respectively. Then
fTn (jD) lr fTn (jDb )
whenever t0 t, r0 r: Proof. By taking into account the identities in (5.41) and reasoning as in the proof of Proposition 5.35, the proof reduces to showing that ((jjDDb )) is an increasing function of . We now have
Q
Q
;1 g(t j) n;1 G(r j) j j (jD) _ hj=1 j =h+1 Q Q h ; 1 n ; 1 0 0 : b (jD) j =1 g (tj j) j =h+1 G(rj j)
Similarly to what was done in the proof of Proposition 5.35, the ratio ((jjDDb )) isQhthen easilyQseen to be increasing by developing the partial derivatives of n;1 G(rj j) g ( t j ) j j =1 j = h +1 Qhj=1 g(t0j j) , Qnj=;h1+1 G(rj0 j) with respect to and by taking into account the assumptions of concavity for @@ log g(tj) and @@ log G(sj).
Example 5.39. As in Remark 5.37, take L = [0; 1) and g(tj) of the form g(tj) = (t) expf;R(t)g: It is easy to see that the conditions in Proposition 5.38 hold when (t) is increasing. For this special kind of model where T1 ; :::; Tn are conditionally independent with a joint density of the form
f (n)(t1 ; :::; tn ) =
Z1 0
n (t1 ):::(tn ) expf;
n X i=1
R(ti )g0 () d:
(5.42)
(proportional hazard models with dierentiable cumulative hazard rate functions R), it is interesting to notice that the following conditions are all equivalent: i) (t) is increasing, ii) F () is Schur-concave iii) @@ log G(sj) = ;R(t) is concave, iv) @@ log g(tj) = 1 ; R(t) is concave.
220
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
At the cost of some more assumptions and technicalities the above arguments can be extended to the case when T1 ; :::; Tn are conditionally i.i.d. given a multidimensional parameter . Now we turn however to consider more general cases of exchangeable densities f (n) and we limit to observations exclusively containing failure data:
D = fT1 = t1 ; :::; Tn;1 = tn;1 g; Db = fT1 = t01 ; :::; Tn;1 = t0n;1 g:
(5.43)
Notice that the condition (5.39) becomes fTn (tjT1 = t1 ; :::; Tn;1 = tn;1 ) fTn (tjT1 = t01 ; :::; Tn;1 = t0n;1 ) increasing function of t; which is equivalent to the condition
f (n) ;(t1 ; :::; tn;1 ; t) increasing function of t. f (n) t01 ; :::; t0n;1 ; t
(5.44)
When f (n) admits the rst-order partial derivative @t@ f (n) (t1 ; :::; tn;1 ; t), the latter is in turn equivalent to
@ (n) ; 0 0 (n) @ (n) (n) ; 0 0 @t f (t1 ; :::; tn;1 ; t) f t1 ; :::; tn;1 ; t @t f t1 ; :::; tn;1 ; t f (t1 ; :::; tn;1 ; t) : (5.45)
Consider now
t0 ;t01; :::; t0n;1 t (t1 ; :::; tn;1) :
; We notice that the inequality (5.45), when @t@ f (n) t01 ; :::; t0n;1 ; t 0, is trivially equivalent to @ f (n)(t1; :::; tr;1; t) @ f (n)(t01 ; :::; t0r;1; t) f ((nn))(t01; :::; t0r;1; t) @t @t f (t1 ; :::; tr;1 ; t)
and consider the class of the real-valued functions W (t1 ; :::; tn;1 ; t) satisfying the property
;:::;tn;1 ;t) for W (t0 ; :::; t0 ; t) < 0 jW (t1 ; :::; tn;1 ; t)j W (t01 ; :::; t0n;1 ; t) ff (n) ((tt011 ;:::;t 0n;1 ;t) 1 n;1 W (t1 ; :::; tr;1 ; t) W (t01 ; :::; t0r;1 ; t) for W (t01 ; :::; t0n;1 ; t) 0
; for t01 ; :::; t0n;1 (t1 ; :::; tn;1 ).
(n)
The following result can easily be proved (see Exercise 5.68).
(5.46)
ORDERINGS OF RESIDUAL LIFETIMES AND MAJORIZATION
221
Proposition 5.40. Let f (n) be Schur-concave and @t@ f (n) (t1 ; :::; tn;1 ; t) satisfy ; 0 0 the conditions in (5.46). Then, t1 ; :::; tn;1 (t1 ; :::; tn;1 ) implies fTn (jD) lr fTn (jDb ): Remark 5.41. The hypotheses in Proposition 5.40 are rather strong; however,
they are sucient but not necessary for the result; in some cases when the joint density is Schur-concave, the condition (5.44) can be substantially obtained without conditions of dierentiability. A case of interest is shown in the next example. Example 5.42. (Linear breakdown model). Consider the joint density n
f (n) (t1 ; :::; tn ) = n! expf; t(n) g (see Example 2.50). In this case it is
f (n) ;(t1 ; :::; tn;1 ; u) = expf [u _ z 0 ; u _ z ]g f (n) t01 ; :::; t0n;1 ; u
where we let z max(t1 ; :::; tn;1 ), z 0 max(t01 ; :::; t0n;1 ). Since max(t1 ; :::; tn;1 ) is a Schur-convex function, wehave that, as already ; noticed, f (n) is Schur-concave and that, for t01 ; :::; t0n;1 (t1 ; :::; tn;1 ), it is (n) ;:::;tn;1;u) z 0 z ; whence the desired monotonicity property of ff(n) ((tt011 ;:::;t 0n;1;u) readily follows. Remark 5.43. Arguments above can be reformulated to deal with some other cases by reversing the directions of inequalities and by suitably replacing the condition of Schur-concavity with that of Schur-convexity and vice versa. A natural application of arguments here is, for instance, the life-testing problem with two actions. For the case when the units are conditionally i.i.d. exponential, the following is a well known (and very useful) fact: sometimes we prefer to test a few units for a long time, other times we prefer, for various cost reasons, to test a large number of units, each for a short time. When the total number of failures and the total time on test are the same in the two dierent experiments, we reach the same information concerning the residual lifetimes (of both surviving and new units). This means that we must do the same prediction on the behavior of those units and then any choice between two dierent actions concerning them must also be necessarily the same (see e.g. Barlow and Proschan, 1988). We can then conclude that, as an important issue of conditional exponentiality of similar units, two dierent observations lead to the same Bayes decision, provided the total time on test and the number of failures is the same.
222
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
The same argument can also be repeated for any model with a Schurconstant density, slightly more generally. It is also important to stress that, in Schur-constant models, the conditional distribution for a residual lifetime, given a set of failure and survival data, does not depend on the age of the surviving unit of interest (where age 0 in particular means that the unit is new). This implies that, in predictive decision problems, we do not have to distinguish between new units and used units (provided they have not failed yet). When we drop the assumption that the joint density is Schur-constant, we expect that two dierent observations contain dierent information, even though they present the same number of failures and the same total time on test statistics. It is then natural to wonder which of the two observations is more favorable, to one of the two actions (say a1 ), than the other one. The problem here is in some sense analogous, even though dierent, from what we considered in the previous section. After directing the reader's attention to the uses of notions of stochastic comparisons in decision problems with observations, therein we ordered the conditional distributions, given two sets of data comparable in the sense of less severeness. Here we can repeat arguments analogous to those therein. The dierence is in that we compare two sets of failure or survival data, which present the same value for the total time on test statistic and then which cannot be comparable in the sense of less severeness. Example 5.44. Let T1; T2; :::; Tn be lifetimes with joint density f (n)and consider the lifetesting problem for Tn , with loss function speci ed in (5.3). X = (T1 ; :::; Tn;1 ) is the statistical observation and we denote by Xa2 the region of acceptance of the more risky action a2 , for the Bayes decision function, i.e. we set
Xa2 = f(t1 ; :::; tn;1 ) ja (t1 ; ; :::; tn;1 ) = a2 g: It is then (see also (5.9))
Xa2 = f(t1 ; :::; tn;1 ) j
Z1
fTn (tjt1 ; :::; tn;1 ) CC +;Kc g:
;
If f (n) is such that the condition (5.44) holds for t01 ; :::; t0n;1 (t1 ; :::; tn;1 ), then we have that Xa2 is a Schur-convex set, i.e. its indicator function is Schurconvex. Remark 5.45. Let X be a lifetime with a density f and Y a lifetime with a density g and consider, for r > 0, the conditional densities fX ;r (tjX > r) = R f1(t + r) ; gY ;r (tjY > r) = R g1(t + r) : r f (x)dx r g (x)dx
ORDERINGS OF RESIDUAL LIFETIMES AND MAJORIZATION
223
Then it is immediately seen that f lr g implies fX ;r (jX > r) lr gY ;r (jY > r). This shows that, under the assumptions of Proposition 5.40, we also have, for any r > 0, the implication
;t0 ; :::; t0 (t ; :::; t ) ) F (jDb ) F (jD); 1 n;1 Tn ;r lr Tn ;r 1 n;1
(5.47)
where now
D = fT1 = t1 ; :::; Tn;1 = tn;1 ; Tn > rg; Db = fT1 = t01 ; :::; Tn;1 = t0n;1 ; Tn > rg:
Example 5.46. Some ideas, intrinsic to the arguments of this section, can be
well illustrated by a further analysis of proportional hazard models. Take T1; :::; Tn conditionally independent with a joint density of the form (5.42) We assume (t) = R0 (t) to be monotone; recall that being increasing means that the joint survival function is Schur-concave and that the stochastic comparison
F Tn ;r (jDb ) lr F Tn ;r (jD) holds, for the residual lifetime Tn ; r, with D = fT1 = t1 ; :::; Th = th ; Th+1 > rh+1 :::; Tn;1 > rn;1 ; Tn > rg; Db = fT1 = t01 ; :::; Th = t0h ; Th+1 > rh0 +1 ; :::; Tn;1 > rn0 ;1 Tn > rg
t0 t; r0 r: If is decreasing the comparison
F Tn ;r (jD) lr F Tn ;r (jDb )
on the contrary holds. Take now
rn0 ;1 = rn;1 and suppose rn;1 < r, say, and also compare the conditional survival probabilities
P fTn;1 ; rn;1 > jDg; P fTn ; r > jDg; P fTn;1 ; rn;1 > jDb g; P fTn ; r > jDb g:
224
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
By Proposition 4.33, it is
P fTn;1 ; rn;1 > jDg P fTn ; r > jDg
or
P fTn;1 ; rn;1 > jDg P fTn ; r > jDg
according to being increasing or decreasing. The same happens for conditional probabilities given Db . This then means that if, between two surviving units, we prefer the less aged one, we should also evaluate the observation D \more optimistic" than Db .
5.4 Burn-in problems for exchangeable lifetimes Here we consider the problem of optimally choosing a burn-in procedure for
n units U1; :::; Un with exchangeable lifetimes T1 ; :::; Tn and shall also see some
application of the likelihood ratio orderings. Throughout, the assumptions described below are made (see also Spizzichino, 1991, Runggaldier, 1993b). We assume that the burn-in starts simultaneously for all the units at time 0 when they are new (of age 0); at any time t > 0, all the working units share the same age t. When the burn-in is terminated, all the surviving units are delivered to operation. Furthermore it is assumed that the units undergo the same stress level, both during burn-in and when in operation. The history of what happened up to any time t, before the burn-in experiment is terminated, is a dynamic history of the form: ht = fHt = h; T(1) = t1 ; :::; T(h) = th ; T(h+1) > tg (5.48) as in (2.40); T(1); :::; T(n) are the order statistics of T1 ; :::; Tn and Ht is the process counting the number of failures observed up to t (recall position (2.57)). Here we emphasize the role of the variable Ht and add the index t, to emphasize dependence on time. We assume that we have complete information about what happens during the burn-in, i.e. that, for any t, we are able to actually observe ht . In principle we admit the possibility of having dierent burn-in times for dierent units; it is furthermore assumed that the loss caused by choosing the durations a1 ; :::; an for the burn-in of the n units, when their life-lengths are w1 ; :::; wn respectively, has the additive form
el(a; w) = X l(ai; wi ) n
i=1
(5.49)
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
225
where l(ai ; wi ) is the loss caused by a burn-in of duration ai for a single unit, when its life-length is wi .
5.4.1 The case of i.i.d. lifetimes
Let us start with the case when T1 ; :::; Tn are independent. In such a case, it is easily understood that the optimal burn-in time for each unit is a quantity which is independent of the burn-in times and of the behavior of the other units. We then see that the burn-in problem for T1; :::; Tn reduces to a set of n independent burn-in problems for n single units. In other words, we must minimize, with respect to a 2 Rn+ , the expected value E
he
i
l(a; T) = E
"X n i=1
l(ai ; Ti ):
#
Let us denote by ai the optimal burn-in time for the unit i (i = 1; :::; n). When T1; :::; Tn are independent and identically distributed, symmetry arguments show that it must be
a1 = a2 = :::an = a ; (5.50) for some suitable value a . The value a then minimizes E [ l(a; Ti):] and actually
can be found by minimizing the expression in (5.7).
t (1)
t (2)
a*
Figure 11. n = 4; t(3) ; the deterministic burn-in time a :
a ; t
(4)
;
a
t (3)
t (4) t
are the operative lives of units surviving
226
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
Sometimes (as for instance in De nition 5.55 below) one may need to emphasize the dependence of a on;the lifetime's survival function F . In such cases, we shall use the symbol a F . Remark 5.47. We stress that, when T1; :::; Tn are i.i.d., the quantity a, the same optimal duration of burn-in for all the units, is a deterministic (nonnegative) quantity: it neither depends on the number of failures that can be observed at early times, nor on the values of the failure times that will be possibly observed; in other words, a is to be xed at time 0. It is also remarkable that a does not depend on n, the total number of units to submit to burn-in. In view of what was said before, we continue this subsection by concentrating attention on the burn-in problem for a single unit, the survival function of its lifetime T being denoted by F . Furthermore, we assume in particular that the loss function is as in (5.5); we recall that, in such a case, the quantity to be minimized is E [l (a; T )] = c + (C
; c)F (a) ; (C + K )F (a + ):
(5.51)
Letting
= CC +;Kc ; we can obtain:
(5.52)
; a F = 0 if f (fa(+a) ) ; 8a 0 ; a F = +1 if f (fa(+a) ) ; 8a 0
; ) and f (a + ) for some a > 0: 0 < a F < 1, if ff ((0) f (a) By dierentiating (5.51) with respect to a, we can obtain more speci cally Proposition 5.48. Under the condition that a value ba > 0 exists such that f (a + ) for a ba, f (a + ) for a ba (5.53) f (a) f (a) it is
;
a F = ba:
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
227
Note that the condition (5.53) certainly holds when, for the given value of
and for some a > 0, it is and
f ( ) ; f (a + ) > f (0) f (a) f (a + ) increasing. f (a)
(5.54)
Remark 5.49. Let us consider the problem of optimally choosing the residual duration of burn-in for the residual lifetime T ; t given the survival data T > t. The corresponding density is then
fT ;t (xjT1 > t) = f (t + x) F (t) Assuming the condition (5.53) for f () implies that a similar condition also holds for fT ;t (jT > t), by replacing ba with ba ; t. Then the optimal duration for the surviving unit is ba ; t. This means that (if the unit does not fail before) we have to continue the burn-in up to reaching the age ba. Remark 5.50. Obviously the condition (5.54) certainly holds for all 0 if f () is log-convex. Example 5.51. Consider the case when T is conditionally exponential given a random variable with density 0 : f ( t) =
Z1 0
expf;tg0 () d.
(5.55)
f (t) is then log-convex. If R f ( ) = 01 expf; g0 () d ; f (0) E () the optimal burn-in time is a0 such that R f (a0 + ) 01R expf;( + a )g0 () d f (a0 ) = 01 expf;a g0 () d = E
expf; gjT
= :
1 = a0
(5.56)
In the following result, we consider two dierent densities f and g for the lifetime T , with corresponding survival functions F and G, respectively. We assume that both f and g satisfy the condition ; (5.53).; We want to compare the corresponding optimal burn-in times a F and a G .
228
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
Proposition 5.52. If f lr g, then a ;F a ;G : Proof. By de nition f lr g implies f (a + ) g(a + ) : f (a) g(a)
Then if baf and bag are such that f (a + ) for a ba ; f (a + ) for a ba f f (a) f f (a) and g(a + ) for a ba ; g(a + ) for a ba ; g g (a) g g(a)
we can conclude that it must be baf bag .
5.4.2 Dependence and optimal adaptive burn-in procedures
In this subsection and in the next one we present an introductory discussion and an informative description of the theme of optimal burn-in in the case of dependence among units' lifetimes, without developing formal arguments. Let us then look at the case when T1 ; :::; Tn are dependent: we drop the assumption of stochastic independence but, in order to maintain the symmetry property of the problem, we keep the condition of exchangeability among the Ti0s. Due to the condition of dependence, a continuous process of learning takes place during the burn-in experiment; the consequence of this is twofold: i) At any instant t (during the burn-in procedure) we should take into account the information, regarding residual lifetimes of the units, collected up to t, in order to decide about the residual duration of the procedure itself, for the surviving units. This means that only \adaptive" procedures should be considered. ii) When planning, at time 0, the burn-in procedure, we should optimize also taking into account the expected amount of information that we could collect during the experiment. From a technical point of view, choosing an adaptive burn-in procedure means to choose a stopping time T , adapted to the ow of information described by fht gt0 as in (5.48) i.e. a non-negative random variable T such that, for any t 0, we can establish with certainty if the event fT tg or fT > tg is true only based on the information contained in ht .
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
229
In practice, in order to construct an adaptive procedure we have to reconsider the residual duration of the procedure only at the instants when possible failures arrive and to take into account the information, collected up to those instants (we have then a \multistage" decision problem). This is an intuitive fact, that could, nevertheless, also be proved in rigorous terms by using Theorem T 33 in Bremaud (1981, p. 308). For this reason we consider sequential burn-in procedures of the following type: - At the instant 0, when all the n units are working and are new, we start the burn-in procedure, planning to stop it at a deterministic time a(n) , if no failure will be observed in between. - If T(1) = t1 < a(n) , at t1 we reconsider the residual duration of burn-in for the remaining (n ; 1) units. This will be a non-negative quantity which takes into account the past information (i.e. fT(1) = t1 g) and that ; will be denoted by a(n;1) (t1 ); generally , a(n;1) (t1 ) will be dierent from a(n) ; t1 , as initially planned; notice that, when determining a(n;1) (t1 ), the (n ; 1) surviving units will have an age t1 . - If one of the units fails before t1 + a(n;1) (t1 ) (i.e. if T(2) = t2 with t2 < t1 + a(n;1) (t1 )), we x the residual duration of burn-in equal to a non-negative quantity, denoted by a(n;2) (t1 ; t2 ) and so on. A generic adaptive procedure of this kind will be denoted by the symbol S. When we need to avoid ambiguities and to emphasize that we are dealing with a speci c procedure S we shall write a(Sn) ; a(Sn;1) (t1 ); :::; a(1) S (t1 ; ::; tn;1 ). Then S is determined by the sequence
a(Sn) ; a(Sn;1) (t1 ); a(Sn;2) (t1 ; t2 ); :::; a(1) S (t1 ; ::; tn;1 ):
(5.57)
230
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
a*(2) (t 1 ,t 2) t (3) t (4) t Figure 12. n = 4; t3 ; a(2) (t1 ; t2 ) ; t4 ; a(2) (t1 ; t2 ) are the operative lives units surviving the burn-in time a(2) (t1 ; t2 ) : t (1)
t (2)
a*(4)
a*(3) (t 1)
of
Not any adaptive procedure S will be Bayes optimal with respect to the preassigned loss function el. A Bayes optimal procedure S is one that minimizes the overall expected cost over all possible adaptive procedures of the form in (5.57). Such a procedure can be indicated by S
[a(n) ; a(n;1) (t1 ); a(n;2) (t1 ; t2 ); :::; a(1) (t1 ; ::; tn;1 )]: Remark 5.53. Consider the case when T1; :::; Tn are i.i.d.: F (n) (t1 ; :::; tn ) = F (1) (t1 ) ::: F (1) (tn ):
Due to identical distribution the residual burn-in time must be the same for all the units surviving at an instant s > 0. Due to independence such a residual burn-in time must be independent of the failure times observed past to s and coincides with the optimal solution of the burn-in problem for a single unit of age s. We can expect that it is (see Remark 5.49 for special cases)
a(n) = a (F (1) ); a(n;1) (t1 ) = a (F (1) ) ; t1 ; :::; a(1) (t1 ; ::; tn;1 ) = a (F (1) ) ; tn;1 : See Herberts and Jensen (1999) for a formal treatment. Some discussions about the characterization of S by means of dynamic programming and about other aspects of the Bayes-optimal burn-in procedure, have been presented by Spizzichino, 1991 and Runggaldier, 1993.
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
231
However, a more complex notation would be necessary to describe such a characterization. Furthermore, the problem of actually computing the Bayesoptimal procedure is one of high complexity, if further simplifying assumptions are not made on the structure of the cost functions and on the stochastic model of the vector T1 ; :::; Tn (see also the discussion in the next subsection). For this reason it can be of interest to nd suitable inequalities for the Bayes-optimal procedure; in this respect, one may guess that the functions
a(n;1) (t1 ); a(n;2) (t1 ; t2 ); :::; a(1) (t1 ; ::; tn;1 ) have some monotonicity properties, under suitable assumptions on the loss functions and on the probability model for lifetimes. This topic can be related with the general study of monotonicity of sequential Bayes test (see Brown et al., 1979); it gives still rise to a rather complex problem, which will not be faced, here. These topics are in fact beyond the purposes of this monograph and we shall rather concentrate attention on the concept of open-loop optimal adaptive procedure ; this will be treated in Subsection 5.4.4. Denoting by bS the open-loop adaptive optimal burn-in procedure, which will be de ned therein, we shall see, instead, that some inequalities for the functions
ab(Sn;1) (t1 ); ab(Sn;2) (t1 ; t2 ); :::; ab(1) S (t1 ; ::; tn;1 ) may be obtained as an application of the previous Proposition 5.52.
5.4.3 Burn-in, optimal stopping, monotonicity, and Markovianity
Dierent models and formulations exist of burn-in problems and of their optimal solutions. It is a general fact however that burn-in problems can be seen as optimal stopping problems for suitable stochastic processes (see Aven and Jensen, 1999 and references cited therein). High complexity of optimal stopping problems compels to the search for special structures which can be exploited to reduce complexity or to a priori prove qualitative properties that the optimal solutions should have. Special structures often involve monotonicity properties of the cost functions and/or of the involved stochastic processes. In Aven and Jensen (1999), an approach is illustrated, based on the notion of semimartingale and on a speci c notion of monotonicity. The approach there permits one to deal with quite general models, where, in particular, the processes to be stopped are not necessarily Markov. Indeed there are problems of interest where Markovianity is not a realistic assumption. When, on the other hand, the process to be stopped actually has
232
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
such a property, the theory of optimal stopping of Markov processes is available (see Shiryaev, 1978). Being interested in the study of models admitting dependence, we xed attention on the condition of exchangeability. For exchangeable lifetimes T1; :::; Tn a useful connection with the property of Markovianity is sketched in what follows. For any t > 0, consider the (n + 1)-dimensional random vector de ned by
Ht (Ht ; T(1) ^ t; :::; T(n) ^ t); The rst coordinate Ht counts the number of failures observed up to t, then it is
T(1) = T(1) ^ t; :::; T(Ht ) = T(Ht ) ^ t; T(Ht +1) > t: When T(Ht ) < t (which in regular cases happens with probability one) Ht con-
tains then a redundant information; however it can be convenient for dierent reasons to consider the redundant description Ht in place of (T(1) ^ t; :::; T(n) ^ t). The process fHtgt0 has the following important properties: i) The knowledge of the value of Ht is equivalent to the observation of the event ht in (5.48). ii) fHtgt0 is a Markov process. iii) Any adaptive burn-in procedure for the lifetimes T1; :::; Tn can be seen as a stopping time for fHtgt0 (for the de nition of stopping time, see e.g. Bremaud, 1981, or Shiryaev, 1978). iv) We can assume that the cost function in a burn-in problem is such that the cost of stopping burn-in at a generic instant t is only a function of Ht . Finding a Bayes optimal adaptive burn-in procedure then becomes an optimal stopping problem for the process fHtgt0 , i.e. for a (n + 1) dimensional Markov process. The theory of optimal stopping of Markov processes can be applied to characterize the optimal Bayes burn-in procedures and to study their properties. However, even for Markov processes, analytic procedures to nd the solution of optimal stopping problems are seldom available; also, numerical procedures can be dicult due to great complexity involved in the computations. Since the computational complexity increases with the dimension of the process which is to be stopped, a possibly useful idea is the following: try to transform an optimal stopping problem of fHt gt0 into one for another process of reduced dimension. The notion of dynamic suciency (De nition 2.60) can be of help in this respect. First note that the set Gb de ned in (2.75) coincides with the state-space of the process fHt gt0. Then a function q de ned on Gb gives rise to a new
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
233
stochastic process fQt gt0, where
Qt q (Ht )
It can be seen that, under the assumption that q is a dynamic sucient statistic and under some additional technical condition, Qt is also a Markov process. Let us then think of models for (T1 ; :::; Tn) for which a low-dimension dynamic sucient statistic q exists so that Qt = q (Ht ) is a Markov process, and the dimension of its state-space is smaller than n. Sometimes, in a burn-in problem, the loss function is such that the cost of stopping the burn-in at t is only a function of Qt . In such cases the burn-in problem can become an optimal stopping problem of the process Qt . Example 5.54. As we saw in Subsection 2.3.4, a relevant example of dynamic suciency is the following: when T1 ; :::; Tn admit a Schur-constant joint density, a dynamic sucient statistic can be found in the pair Zt (Ht ; Yt ) (5.58) where Yt is the TTT process de ned in Section 2.4:
Yt =
n ; X
T(i) ^ t :
i=1 It can easily be seen that, when f (n) is Schur-constant, fZt gt0 in (5.58) is a
(two-dimensional) Markov process, which moreover turns out to have the property of stochastic monotonicity when T1 ; :::; Tn are conditionally independent, exponentially distributed (see Caramellino and Spizzichino, 1996). When the cost for conducting a burn-in up to time t depends only on the value of Zt ; one can actually reduce the (Bayes) optimal burn-in problem to an optimal stopping problem for the process fZtgt0 (Costantini and Spizzichino, 1997). Under the condition that T1 ; :::; Tn are conditionally independent and exponentially distributed, such an optimal stopping problem has furthermore a special structure, based on some stochastic comparisons in the lr sense. Such a structure derives from the mentioned stochastic monotonicity of fZtgt0 and from a suitable monotonicity property of the considered cost function; it can be exploited to nd the solution in a rather explicit form. This is then a case where the Bayes optimal burn-in procedure can actually be computed.
5.4.4 Stochastic orderings and open-loop optimal adaptive burn-in procedures In this subsection we give the de nition of open-loop feedback optimal burnin, then we show some related role of the arguments discussed in the previous sections.
234
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
The general concept of open-loop feedback optimality (OLFO) has been introduced within the theory of optimal control, where OLFO solutions of control problems are compared with closed-loop feedback optimal solutions (the reader can nd in the paper by Runggaldier (1993a) a review and an essential list of references on optimal control theory). Closed-loop feedback optimal solutions are those corresponding to Bayes optimality and, as mentioned, the problem of their computation is often a very complex one. OLFO solutions are more easily computable. Actually they are suboptimal according to the Bayesian paradigm of minimizing the expected cost; however they are generally reasonably good and not as complex to be computed. In any case they are based on a still Bayesian (even if suboptimal) logic, as will be seen later for our speci c case of interest. It is to be noted that the problem of nding an optimal burn-in procedure, substantially being a special problem of optimal stopping, can be seen as a very particular problem in the theory of optimal control. Then the concept of open-loop feedback optimality can be applied to the burn-in problem (see Runggaldier, 1993b). We are going to de ne what is meant by the term OLFO burn-in procedure, next. The term adaptive, which is more familiar to statisticians, will be sometimes used in the following in place of feedback. Let us, as usual, denote by F (1) the one-dimensional marginal distribution of T1; :::; Tn . For a dynamic history ht as in (5.48), let moreover F (1) t (jh; t1 ; :::; th ) denote the (one-dimensional) conditional distribution, given ht , of the residual lifetimes of the (n ; h) units (of age t) which are still surviving at t. De nition 5.55. The open-loop optimal adaptive burn-in procedure is the adaptive burn-in procedure bS de ned by the positions
ab(Sn) = a (F (1) ); ab(Sn;1) (t1 ) = a F (1) t1 (j1; t1 ) ;
ab(Sn;2) (t1 ; t2 ) = a F (1) t2 (j2; t1 ; t2 ) ; :::
(1) :::; ab(1) S (t1 ; ::; tn;1 ) = a F tn;1 (jn ; 1; t1 ; :::; tn;1 ) :
(5.59)
Remark 5.56. In words, the essence of an OLFO burn-in procedure bS can be explained as follows: at time 0, we x the burn-in duration equal to
ab(Sn) = a (F (1) ):
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
235
This means the following: initially we take into account that the common marginal survival function of T1; :::; Tn is F (1) , but we ignore the structure of dependence among those lifetimes, i.e. we behave as if T1 ; :::; Tn were independent. If no failure is observed before the time a (F (1) ), then, at that time, we deliver all the units to operation. If, on the contrary, T(1) < a (F (1) ), we reconsider, at T(1) , the duration of the residual burn-in for surviving units. We still go on as if the residual lifetimes were independent; however we take into account that the updated (one-dimensional) survival function for them is now F (1) t1 (j1; t1 ) and then we set
ab(Sn;1) (t1 ) = a F (1) t1 (j1; t1 )
and so on.
Remark 5.57. When T1; :::; Tn are i.i.d., with a marginal survival function
F (1) , the OLFO burn-in procedure does obviously coincide with the closed-loop feedback optimal procedure and it is
ab(Sn) = a F (1) ; ab(Sn;1) (t1 ) = a F (1) ; t1 ; :::
(1) ; tn;1 :::; ab(1) S (t1 ; ::; tn;1 ) = a F
(compare also with Remark 5.53). Example 5.58. (OLFO burn-in of conditionally exponential units). Let T1; :::; Tn be conditionally independent given f = g, with density f (tj) = expf;tg. Denoting by 0 () the prior density of , we then have
f (n) (t1 ; :::; tn ) =
Z1 0
n expf;
n X i+1
ti g0 () d
We assume the loss function as in (5.5). Since the one-dimensional marginal density is as in (5.55), we have that
ab(Sn) = a
where a is given as in (5.56). If T(1) = t1 < ab(Sn) ; we must continue, after t1 the burn-in procedure for an extra time ab(Sn;1) (t1 ). Now note that, conditionally on the observation ht1
= fT(1) = t1 ; T(2) > t1 g;
236
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
the conditional density of becomes (jht1 ) = R 1 expf;nt1g0 () : 0 expf;nt1 g0 () d whence the conditional density of the residual lifetimes is R 1 2 expf;(t + nt )g () d 1 0 f (tj1; t1 ) = 0 R 1 ; exp f; nt g 1 0 () d 0 furthermore ab(Sn;1) (t1 ) = a(jht1 ) : In a similar way we can obtain ab(Sn;2) (t1 ; t2 ); :::; ab(1) S (t1 ; ::; tn;1 ):
Notice that ab(Sn;h) (t1 ; ::; th ) depends only on h and on the observed total time P on test statistic hi=1 ti + (n ; h)th . Now we proceed to brie y discuss how the Proposition 5.52 can be applied to obtain useful properties of an OLFO burn-in procedure, when the loss function is as in (5.5). From the very de nition, we see that an OLFO burn-in procedure depends on F (1) and on the (one-dimensional) conditional distributions (1) (1) F (1) t1 (j1; t1 ) ; F t2 (j2; t1 ; t2 ) ; :::; F tn;1 (jn ; 1; t1 ; t2 ; :::; tn;1 )
for the residual lifetimes of the units, respectively surviving after the progressive failure times T(1) ,...,T(n;1) . On the other hand, in Sections 5.2 and 5.3, we obtained a sample of results leading to establish stochastic comparisons, in the lr sense, between two (onedimensional) conditional distributions for residual lifetimes given two dierent dynamic histories. By Proposition 5.52 then, such stochastic comparisons can be used to possibly obtain appropriate monotonicity properties of an OLFO procedure of burnin. In particular, as a consequence of (5.23) and Proposition 5.52, we have Lemma 5.59. Let the joint density f (n) of T1; :::; Tn be MTP2 and let the conditional survival functions F (1) t (jh; t1 ; :::; th ) admit densities satisfying condition (5.53),8h = 1; 2; :::; n ; 1; 0 t1 ::: th . Then, for 0 t1 t2 ::: th < th+1 such that th+1 < ab(Sn;h) (t1 ; ::; th );
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
237
we have that
ab(Sn;h) (t1 ; ::; th ) is a decreasing function of t1 ; ::; th . Consider now speci cally the case of conditional independence, given a scalar parameter , where the joint density f (n) is of the form (5.40), with g(tj) > 0; 8t > 0; we see that, under suitable additional conditions on g(tj), we have a result stronger than Lemma 5.59. Proposition 5.60. Let log g(tj) be a convex function of t and @@ log g(tj) be a convex, decreasing function of t; 8. Then whenever t0 t:
ab(Sn;h) (t01 ; ::; t0h;1; th ) ab(Sn;h) (t1 ; ::; th;1; th )
Proof. Let be any initial density of and consider the one-dimensional predictive density de ned by
f (t) = We can write, for x > 0,
Z
L
g(tj) () d:
(5.60)
R
f (t + x) = LRg(t + xj) () d = f (t) L g (tj) () d
R
g(t+xj) Z L Rg(tj) g (tj) () d = g (t + xj) (jT = t) d; L g (tj) L g (tj) () d
by setting
(jT = t) = R gg((ttjj)) (()) d : L @ @ log g (tj) being a decreasing function of
decreasing function of and then for t0 > t,
t; 8, it follows that g(gt(+tjxj)) is a
(jT = t0 ) lr (jT = t)
Z g(t0 + xj) Z 0 (jT = t0 ) d g(t + xj) (jT = t) d: L
g(t0 j)
L
g(t0 j)
238
BAYESIAN DECISIONS, ORDERINGS, AND MAJORIZATION
On the other hand, log g(tj) being a convex function of t means that g(t0 + xj) g(t + xj) ; 8; g(t0 j) g(tj) whence Z g(t + xj) Z g(t0 + xj) ( j T = t ) d (jT = t) d: L g (tj) L g (t0 j) Then f (t0 + x) = Z g(t0 + xj) (jT = t0 ) d f (t0 ) L g (t0 j)
Z g(t + xj) (jT = t) d = f (t + x) L
g(tj)
f (t)
and we can then claim that a density f (t) of the form (5.60) is log-convex, for any initial density : Now a conditional survival function of type F (1) t (jh; t1 ; :::; th ) admits a density given by (1) (t + ujh; t1 ; :::; th ) ft(1) (ujh; t1 ; :::; th ) = f (1) F (tjh; t1 ; :::; th )
R g(t + uj) (jh; t ; :::; t ) d 1 h ; = LR G(tj) (jh; t ; :::; t ) d whence
(5.61)
h
1
L
R
ft(1) (u + xjh; t1 ; :::; th ) = LRg(t + u + xj) (jh; t1 ; :::; th ) d ft(1) (ujh; t1 ; :::; th ) L g (t + uj) (jh; t1 ; :::; th ) d By the arguments above we see then that ft(1) h (jh; t1 ; :::; th;1 ; th ) is a logconvex density, for any dynamic history ht as in (5.48). By using Proposition 5.52, we can now conclude the proof by showing that
;
(1) 0 0 ft(1) h (jh; t1 ; :::; th;1 ; th ) lr fth jh; t1 ; :::; th;1 ; th :
In view of Equation (5.61), and since an analogous expression holds for
;
0 0 ft(1) h jh; t1 ; :::; th;1 ; th ;
BURN-IN PROBLEMS FOR EXCHANGEABLE LIFETIMES
239
the above comparison is achieved by showing that (jT1 = t1 ; :::; Th;1 = t0h;1 ; Th = th ) lr Now
(jT1 = t01 ; :::; Th;1 = t0h;1 ; Th = th ):
Q
;1 g(t j) (jT1 = t1 ; :::; Th;1 = th;1 ; Th = th ) = hj=1 j ;1 g(t0 j) ; (jT1 = t01 ; :::; Th;1 = t0h;1 ; Th = th ) Qhj=1 j
@ @ log g (tj) being convex, it follows
@ (jT1 = t1 ; :::; Th = th ) @ (jT1 = t01 ; :::; Th = t0h ) 0:
Proposition 5.61. Let the conditions of Proposition 5.60 hold and let t1; :::; th;1,
et1; :::; eth;1 be such that
t1 ::: th;1 ; et1 ::: eth;1
t1 et1 ; t1 + t2 et1 + et2 ; :::; Then
hX ;1 i=1
ti
hX ;1 i=1
eti :
(5.62)
ab(Sn;h) (et1 ; ::; eth;1 ; th ) ab(Sn;h) (t1 ; ::; th;1 ; th ):
(5.63)
; such Proof. By the conditions (5.62), we can nd a vector t0 t0 ; :::; t0 that
1
h;1
t0 t; t0 et:
Then, by using Lemma 5.59 and Proposition 5.60, we have ab(Sn;h)(et1 ; :::; eth ) ab(Sn;h) (t01 ; ::; t0h ) ab(Sn;h) (t1 ; ::; th ): Notice that the inequality (5.62) obviously is a weaker condition than et t. Remark 5.62. Suppose that we collected two dierent sets of failure data (t1 ; ::; th;1) and (t01 ; ::; t0h;1 ), until time s, for two dierent sets of units U1 ; ::; Un and U10 ; ::; Un0 , respectively. The condition t0 t can be seen as an indication that the units U10 ; ::; Un0 still are (more than it can happen for U1 ; ::; Un) in the period of infantile mortality, at time s. This argument supplies a heuristic interpretation of the result in Proposition 5.61.
5.5 Exercises
Exercise 5.63. Let $Q$, $c$ and $K(t)$, in Example 5.5, be given. Apply the formula (5.8) to find the optimal time of preventive replacement $a$ for a unit whose lifetime has density of the form
\[
f^{(1)}(t)=\int_0^{\infty}\theta\,\rho(t)\exp\{-\theta R(t)\}\,\pi_0(\theta)\,d\theta,\qquad \rho(t)=\frac{d}{dt}R(t).
\]
Deduce that $\rho(t)$ decreasing implies $a=0$ or $a=+\infty$.

Exercise 5.64. In Example 5.5, take now $K(t)=q\,t$, for some $q>0$, and consider the optimal times of replacement $a_1$, $a_2$ for two units $U_1$ and $U_2$, whose lifetimes are IFR and have densities $f_1^{(1)}$ and $f_2^{(1)}$, respectively. Check that $f_1^{(1)}\ge_{lr} f_2^{(1)}$ implies $a_1\ge a_2$.

Exercise 5.65. In Example 5.8, consider the case of a proportional hazard model for $T_1,\ldots,T_n$, defined by a joint density of the form (5.42), with the special choice
\[
\pi_0(\theta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\theta^{\alpha-1}\exp\{-\beta\theta\},
\]
for given $\alpha>0$, $\beta>0$. Write explicitly $a(t_1,\ldots,t_n)$ in terms of $\rho(\cdot)$. Check then directly that $a(t_1,\ldots,t_n)$ is Schur-concave or Schur-convex according to $\rho(\cdot)$ being decreasing or increasing.

Exercise 5.66. For the same model as in Exercise 5.65, find the solution of the predictive spare parts problem in Example 5.16.

Exercise 5.67. Check that the condition (5.33) holds for the joint survival function considered in Example 5.34. Hint: $H$ is decreasing and $\max_{1\le i\le n-1} r_i$ is a Schur-convex function of $(r_1,\ldots,r_{n-1})$.

Exercise 5.68. Prove Proposition 5.40. Hint: split the proof into the two cases $W(t_1',\ldots,t_{n-1}',t)\ge 0$ and $W(t_1',\ldots,t_{n-1}',t)<0$ and notice that, for $t$ such that $\frac{\partial}{\partial t}f^{(n)}(t_1,\ldots,t_{n-1},t)\ge 0$, Schur-convexity of $\frac{\partial}{\partial t}f^{(n)}(t_1,\ldots,t_{n-1},t)$ and the condition $(t_1',\ldots,t_{n-1}')\prec(t_1,\ldots,t_{n-1})$ imply that only two cases are possible:
\[
\text{a)}\quad 0\le\frac{\partial}{\partial t}f^{(n)}(t_1',\ldots,t_{n-1}',t)\le\frac{\partial}{\partial t}f^{(n)}(t_1,\ldots,t_{n-1},t);
\]
\[
\text{b)}\quad \frac{\partial}{\partial t}f^{(n)}(t_1',\ldots,t_{n-1}',t)\le 0\le\frac{\partial}{\partial t}f^{(n)}(t_1,\ldots,t_{n-1},t).
\]
Exercise 5.69. Consider the proportional hazard model for conditionally i.i.d. lifetimes $T_1,\ldots,T_n$ and find sufficient conditions on $\rho(\cdot)$ which imply the assumptions used in Proposition 5.60; check, in particular, that $\rho(\cdot)$ decreasing and log-convex implies that the corresponding one-dimensional density $f^{(1)}$ is log-convex, for any choice of the prior density $\pi_0$. For the case when $\pi_0$ is a gamma density (i.e. the same model as in Exercise 5.65), give an explicit expression for the ratio $\frac{f^{(1)}(t+x)}{f^{(1)}(t)}$. By applying Proposition 5.52, find, for the case above, an expression for the corresponding optimal solution of the one-dimensional burn-in problem, with cost function of the form in (5.5). Consider now the OLFO procedure. Using the expression found above and replacing $\pi_0$ with the conditional density of $\theta$ given an observation of the type $D=\{T_{(1)}=t_1,\ldots,T_{(h)}=t_h\}$, show directly, for the present special case, the validity of the inequality (5.63) for $(t_1,\ldots,t_{h-1})$ and $(\widetilde{t}_1,\ldots,\widetilde{t}_{h-1})$ as in Proposition 5.61.
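For the gamma choice of $\pi_0$ in Exercise 5.65 the integral defining $f^{(1)}$ is available in closed form, $f^{(1)}(t)=\rho(t)\,\alpha\beta^{\alpha}/(\beta+R(t))^{\alpha+1}$, which makes the ratio asked for in Exercise 5.69 explicit. The sketch below uses an arbitrary decreasing, log-convex $\rho$ (assumed only for illustration, not prescribed by the text), compares the closed form with numerical integration over $\theta$, and checks that $f^{(1)}(t+x)/f^{(1)}(t)$ is increasing in $t$, as the log-convexity argument of Proposition 5.60 predicts.

```python
import numpy as np
from math import gamma as gamma_fn

# Assumed ingredients, for illustration only: a decreasing, log-convex conditional
# hazard shape rho and its integral R, plus a gamma(alpha, beta) prior as in
# Exercise 5.65.  None of the numerical values below come from the text.
alpha, beta = 2.0, 1.0
rho = lambda t: 1.0 / (1.0 + t)          # decreasing and log-convex
R   = lambda t: np.log(1.0 + t)          # R(t) = integral of rho over [0, t]

def f1_closed(t):
    """Closed form of f^(1)(t) = int theta*rho(t)*exp(-theta*R(t)) pi_0(theta) d(theta)."""
    return rho(t) * alpha * beta**alpha / (beta + R(t))**(alpha + 1)

def f1_numeric(t, n_grid=200_000, theta_max=80.0):
    """The same integral evaluated by a crude Riemann sum over theta."""
    theta = np.linspace(1e-6, theta_max, n_grid)
    d_theta = theta[1] - theta[0]
    integrand = (theta * rho(t) * np.exp(-theta * R(t))
                 * beta**alpha / gamma_fn(alpha) * theta**(alpha - 1)
                 * np.exp(-beta * theta))
    return float(np.sum(integrand) * d_theta)

assert abs(f1_closed(2.0) - f1_numeric(2.0)) < 1e-4

# The ratio f^(1)(t+x) / f^(1)(t) should be increasing in t (log-convexity).
x = 1.0
ts = np.linspace(0.0, 10.0, 60)
ratios = f1_closed(ts + x) / f1_closed(ts)
assert np.all(np.diff(ratios) > 0)
print("ratio ranges over [%.4f, %.4f]" % (ratios[0], ratios[-1]))
```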
Exercise 5.70. Consider the problem of optimal time replacement for units with exchangeable lifetimes. Similarly to the case of the burn-in problem, we can argue that, in the case of stochastic dependence, the optimal replacement policy should be adaptive, i.e. the decision whether to replace a unit of age $r$ at a time $t$ should depend on the history of failures observed up to $t$. For the case of a loss function specified by setting $K(t)=q\,t$, formulate a definition of "open loop optimal" time replacement policy and explain the possible use of the result in Exercise 5.64.
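To make the adaptive character of the OLFO procedure invoked in the last two exercises concrete, the following sketch shows, for a hypothetical conditionally exponential model with a gamma prior (the simplest special case, assumed here only for illustration), how the conditional density of $\theta$, and with it the one-dimensional predictive residual density, would be recomputed after each observed failure; the actual optimization against a cost function such as (5.5) is not reproduced.

```python
import numpy as np

# Hypothetical conditionally exponential illustration of OLFO-style updating
# (this is not the general model of Exercises 5.69-5.70, just the simplest case):
# lifetimes are i.i.d. exponential(theta) given theta, with a gamma(a0, b0) prior.
a0, b0 = 2.0, 3.0            # assumed prior parameters
n = 10                       # number of units put on test at time 0
failures = [0.4, 0.9, 1.7]   # assumed observed failure times (the dynamic history)

def posterior_params(a, b, failure_times, n_units, now):
    """Gamma posterior after observing the given failures, with the remaining
    units still alive at calendar time `now` (total-time-on-test update)."""
    h = len(failure_times)
    ttt = sum(failure_times) + (n_units - h) * now
    return a + h, b + ttt

def residual_density(u, a, b):
    """Predictive density of the residual life of a surviving unit, obtained by
    mixing the (memoryless) exponential over the current gamma posterior."""
    return a * b**a / (b + u)**(a + 1)

# Recompute the posterior, and hence the predictive residual density, right
# after each failure: this is the information an OLFO decision would be based on.
for h in range(1, len(failures) + 1):
    now = failures[h - 1]
    a, b = posterior_params(a0, b0, failures[:h], n, now)
    print(f"after failure {h} at t={now}: posterior gamma({a}, {b:.2f}), "
          f"residual density at u=1: {residual_density(1.0, a, b):.4f}")
```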
5.6 Bibliography

Aven, T. and Jensen, U. (1999). Stochastic models in reliability. Applications of Mathematics, 41. Springer-Verlag, New York.
Barlow, R.E. (1998). Engineering reliability. Society for Industrial and Applied Mathematics (SIAM), Philadelphia.
Barlow, R.E. and Proschan, F. (1965). Mathematical theory of reliability. John Wiley & Sons, New York.
Barlow, R.E. and Proschan, F. (1975). Statistical theory of reliability and life testing: probability models. Holt, Rinehart and Winston, New York.
Barlow, R.E. and Proschan, F. (1988). Life Distribution Models and Incomplete Data. In Handbook of Statistics, Vol. 7, 225–250. North-Holland/Elsevier, Amsterdam, New York.
Barlow, R.E. and Zhang, X. (1987). Bayesian analysis of inspection sampling procedures discussed by Deming. J. Stat. Plan. Inf. 16, no. 3, 285–296.
Barlow, R.E., Clarotti, C.A. and Spizzichino, F. (Eds) (1990). Reliability and decision making. Proceedings of the conference held at the University of Siena, Siena, October 15–26. Chapman & Hall, London, 1993.
Berg, M. (1997). Performance comparisons for maintained items. Stochastic models of reliability. Math. Meth. Oper. Res. 45, no. 3, 377–385.
Berger, J. (1985). Statistical decision theory and Bayesian analysis. Second edition. Springer-Verlag, New York-Berlin.
Bertsekas, D. (1976). Dynamic programming and stochastic control. Mathematics in Science and Engineering, 125. Academic Press, New York.
Billingsley, P. (1995). Probability and measure. Third edition. John Wiley & Sons, New York.
Block, H.W. and Savits, T.H. (1997). Burn-in. Statistical Science 12, no. 1, 1–19.
Block, H.W. and Savits, T.H. (1994). Comparison of maintenance policies. In Stochastic Orders and Their Applications (Shaked and Shanthikumar, Eds). Academic Press, London.
Bremaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer-Verlag, New York-Berlin.
Brown, L.D., Cohen, A. and Strawderman, W.E. (1979). Monotonicity of Bayes sequential tests. Ann. Statist. 7, 1222–1230.
Caramellino, L. and Spizzichino, F. (1996). WBF property and stochastic monotonicity of the Markov process associated to Schur-constant survival functions. J. Multiv. Anal. 56, 153–163.
Clarotti, C.A. and Spizzichino, F. (1990). Bayes burn-in decision procedures. Probab. Engrg. Inform. Sci. 4, 437–445.
Clarotti, C.A. and Spizzichino, F. (1996). Bayes predictive design of scram systems: the related mathematical and philosophical implications. IEEE Trans. on Rel. 45, 485–490.
Costantini, C. and Pasqualucci, D. (1998). Monotonicity of Bayes sequential tests for multidimensional and censored observations. J. Statist. Plann. Inference 75, no. 1, 117–131.
Costantini, C. and Spizzichino, F. (1997). Explicit solution of an optimal stopping problem: the burn-in of conditionally exponential components. J. Appl. Prob. 34, 267–282.
Deming, W.E. (1982). Quality, Productivity and Competitive Position. M.I.T. Center for Advanced Engineering Study, Cambridge, MA.
De Groot, M.H. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
Ferguson, T.S. (1967). Mathematical statistics: a decision theoretic approach. Academic Press, New York-London.
Gertsbakh, I. (2000). Reliability theory with applications to preventive maintenance. Springer-Verlag, New York.
Herberts, T. and Jensen, U. (1999). Optimal stopping in a burn-in model. Comm. Statist. Stochastic Models 15, 931–951.
Karlin, S. and Rubin, H. (1956). The theory of decision procedures for distributions with monotone likelihood ratio. Ann. Math. Statist. 27, 272–299.
Lehmann, E.L. (1986). Testing statistical hypotheses. Second edition. John Wiley & Sons, New York.
Lindley, D.V. (1985). Making decisions. Second edition. John Wiley & Sons, Ltd., London.
Macci, D.V. (1999). Some further results about undominated Bayesian experiments. Statist. Decisions 17, no. 2, 141–156.
Mosler, K. and Scarsini, M. (Eds) (1991). Stochastic orders and decision under risk. Institute of Mathematical Statistics, Hayward, CA.
Piccinato, L. (1980). On the orderings of decision functions. Symposia Mathematica XXV. Academic Press, London.
Piccinato, L. (1993). The likelihood principle in reliability analysis. In Reliability and Decision Making (Barlow, Clarotti, Spizzichino, Eds). Chapman & Hall, London.
Piccinato, L. (1996). Metodi per le Decisioni Statistiche. Springer Italia, Milano.
Runggaldier, W.J. (1993a). Concepts of optimality in stochastic control. In Reliability and Decision Making (Barlow, Clarotti, Spizzichino, Eds). Chapman & Hall, London.
Runggaldier, W.J. (1993b). On stochastic control concepts for sequential burn-in procedures. In Reliability and Decision Making (Barlow, Clarotti, Spizzichino, Eds). Chapman & Hall, London.
Savage, L.J. (1972). The Foundations of Statistics. Second revised edition. Dover, New York.
Shiryaev, A.N. (1978). Statistical sequential analysis: optimal stopping rules. Springer-Verlag, New York.
Spizzichino, F. (1991). Sequential burn-in procedures. J. Statist. Plann. Inference 29, no. 1-2, 187–197.
Torgersen, E. (1994). Information orderings and stochastic orderings. In Stochastic Orders and Their Applications (Shaked and Shanthikumar, Eds). Academic Press, London.
van der Duyn Schouten, F. (1983). Markov decision processes with continuous time parameter. Mathematical Centre Tracts, 164. Mathematisch Centrum, Amsterdam.