RECENTADVANCES IN STOCHASTIC MODELING AND DATA ANALYSIS
This page intentionally left blank
RECENTADVANCES IN STOCHASTIC MODELING AND DATA ANALYSIS 29 May - 1 June 2007
Chania, Greece
editor
Christos H Skiadas Technical University of Crete, Greece
r pWorld Scientific N E W JERSEY
*
LONDON
*
SINGAPORE
*
BElJlNG
*
SHANGHAI
*
HONG K O N G
*
TAIPEI
*
CHENNAI
Published by World Scientific Publishing Co. Re. Ltd. 5 Toh Tuck Link, Singapore 596224 USA ojjice: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK ojjice: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-PublicationData A catalogue record for this book is available from the British Library.
RECENT ADVANCES IN STOCHASTIC MODELING AND DATA ANALYSIS Copyright 0 2007 by World Scientific Publishing Co. Re. Ltd. All rights reserved. This book, or parts thereox may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-270-968-4 ISBN-10 981-270-968-1
Printed in Singapore by World Scientific Printers (S) Pte Ltd
This volume contains a part of the invited and contributed papers which were accepted and presented at the 12nd International Conference on Applied Stochastic Models and Data Analysis in Chania, Crete, Greece, May 29- June 1, 2007. Since 1981, ASMDA aims to serve as the interface between Stochastic Modeling and Data Analysis and their real life applications particularly in Business, Finance and Insurance, Management, Production and Reliability, Biology and Medicine. Our main objective is to include papers both theoretical and practical, presenting new results having potential for solving real-life problems. Another important objective is to present new methods for solving these problems by analyzing the relevant data. Also, the use of recent advances in different fields will be promoted such as for example, new optimization and statistical methods, data warehouse, data mining and knowledge systems and neural computing. This volume contains papers on various important topics: Stochastic Processes and Models, Distributions, Insurance, Stochastic Modelling for Healthcare Management, Markov and Semi Markov models, Parametric/ Non -Parametric, Dynamical Systems / Forecasting, Modeling and Chaotic Modeling, Sampling and Optimization problems, Data Mining, Clustering and Classification, Applications of Data Analysis and various other applications. The World Scientific had also published the proceedings in two volumes of the 1993 Sixth ASMDA Conference, held also in Chania, Crete, Greece. I acknowledge the valuable support of the Mediterranean Agronomic Institute, Chania, Greece, as well as the IBM France. Sincere thanks must be recorded to those whose contributions have been essential to create the Conference and the Proceedings. Finally, I would like to thank Anthi Katsirikou, Mary Karadima, John Dimotikallis and George Matalliotakis for their valuable support. Chania, July 30,2007 Christos H. Skiadas Editor V
This page intentionally left blank
Contents Preface
V
1 Stochastic Processes and Models
1
An Approach to Stochastic Process using Quasi-Arithmetic Means Etienne Cuvelier and Monique Noirhomme-Fraiture The Quantum Generator of Translations in a Fraction-Dimensional Manifold Paulius MiSkinis Cause of Stock Return Stochastic Volatility: Query by Way of Stochastic Calculus Juho Kanniainen On a Class of Backward Stochastic Differential Equations and Applications to the Stochastic Resonance Romeo Negrea
2 10
18
26
2 Distributions
34
An Application of the Extended Waring Distribution to Model Count Data Variables Jose' Rodriguez Avi, Antonio Conde Sanchez, Antonio Jose' Sa'ez-Castillo and Ma Jose' Olmo Jime'nez Estimation of Simple Characteristics of Samples from Skewed and Heavy-Tailed Distributions Zdengk Fabian Estimating Robustness and Parameter Distribution in Compartmental Models of Neurons Noam Peled and Alon Korngreen On the Stability of Queues with Negative Arrivals Kernane Tewfk Random Multivariate Multimodal Distributions George Kouvaras and George Kokolakis A System Maintained by Imperfect and Perfect Repairs under Phase-Type Distributions Delia Montoro-Gazorla, Rafael Pe'rez-Ocdn and M. Carmen Segovia Asymptotically Robust Algorithms for Detection and Recognition of Signals Veniamin A. Bogdanovich and Aleksey G. Vostretsov
35
vii
43
51
59 68 76
82
viii
Recent Advances in Stochastic Modeling and Data Analysis
Three Parameter Estimation of the Weibull Distribution by Order Statistics Vaida Bartkute and Leonidas Sakalauskas
91
3 Insurance
101
Stochastic Models for Claims Reserving in Insurance Business Tarna's Falukozy, Ildikd Ibolya Vite'z and Miklds Aratd Stochastic Risk Capital Model for Insurance Company Gaida Pettere Measuring Demographic Uncertainty via Actuarial Indexes Mariarosaria Coppola, Ernilia Di Lorenzo, Albina Orlando and Marilena Sibillo Location as Risk Factor. Spatial Analysis of an Insurance Data-Set Ildikd Vite'z A Hierarchical Bayesian Model to Predict Belatedly Reported Claims in Insurances J&os Gyarrnati-Szabd and Lhszld Ma'rkus
102
4 Stochastic Modeling for Healthcare Management
145
Non-Homogeneous Markov Models for Performance Monitoring of Healthcare Sally McClean, Lalit Garg, Brian Meenan and Peter Millard Patient Activity in Hospital using Discrete Conditional Phase-Type (DC-Ph) Models Adele H. Marshall, Louise Burns and Barry Shaw Identifying the Heterogeneity of Patients in an Accident and Emergency Department using a Bayesian Classification Model Louise Burns and Adele H. Marshall Modelling the Total Time Spent in an Accident and Emergency Department and the Associated Costs Barry Shaw and Adele H. Marshall
146
5 Markov and Semi Markov Models
181
Periodicity of the Perturbed Non-Homogeneous Markov System M. A. Syrneonakiand P.-C. G. Vassiliou On the Moments of the State Sizes of the Discrete Time Homogeneous Markov System with a Finite State Capacity G. Vasiliadis and G. Tsaklidis Copulas and Goodness of Fit Tests Pal Rakonczai and Andra's Zernple'ni Discrete Time Semi-Markov Models with Fuzzy State Space Aleka A. Papadopoulou and George M. Tsaklidis An Application of the Theory of Semi-Markov Processes in Simulation Sonia Malefaki and George Iliopoulos
182
114 122
130 137
154
162
172
190
198 206 213
Contents ix
On a Numerical Approximation Method of Evaluating the Interval Transition Probabilities of Semi-Markov Models Dimitrios Bitziadis, George Tsaklidis and Aleka Papadopoulou Markov Property of the Solution of the Generalized Stochastic Equations Khaldi Khaled Partially Markov Models and Unsupervised Segmentation of Semi-Markov Chains Hidden with Long Dependence Noise Je'r6me Lapuyade-Lahorgue and Wojciech Pieczynski
22 1
6 Parametricmon-Parametric
242
x;
Independent Distributed in the Limit Components of Some Chi-Squared Tests Vassilly Voinov, Mikhail Nikulin and Natalie Pya Parametric Conditional Mean and Variance Testing with Censored Data Wenceslao Gonzcilez Manteiga, Ce'dric Heuchenne and Ce'sar Sknchez Seller0 Synthetic Data Based Nonparametric Testing of Parametric Mean-Regression Models with Censored Data Olivier Lopez and Valentin Patilea
229
234
243
25 1
259
7 Dynamical Systems/Forecasting
267
Application of the Single Index Model for Forecasting of the Inland Conveyances Eugene Kopytov and Diana Santalova Development and Application of Mathematical Models for Internet Access Technology Substitution Apostolos N. Giovariis and Christos H. Skiadas Exploring and Simulating Chaotic Advection: A Difference Equations Approach Christos H. Skiadas
268
277 287
8 Modeling and Stochastic Modeling
295
Likelihood Ratio Tests and Applications in 2D Lognormal Diffusions Ramdn Gutie'rrez, Concepcidn Roldan, Ramdn Gutitrrez-Sanchez and Jost Miguel Angulo Cartographical Modeling as a Statistical Method for Monitoring of a Spatial Behaviour of Population Irina Pribytkova Learning and Inference in Switching Conditionally Heteroscedastic Factor Models using Variational Methods Mohamed Saidane and Christian Lavergne
296
304
312
x
Recent Advances in Stochastic Modeling and Data Analysis
Correlation Tests Based NARMAX Data Smoother Validation Li Feng Zhang, Quail Min Zhu and Ashley Lorigden Kernel Based Confidence Intervals for Survival Function Estimation Dimitrios I. Bagkavos, Aglaia Kalamatianou and Dimitrios Ioannides Chaotic Data Analysis and Hybrid Modeling for Biomedical Applications Wlodzimierz Klonowski, Robert Stepien, Marek Darowski and Maciej Kozarski Stochastic Fractal Interpolation Function and its Applications to Fractal Analysis of Normal and Pathological Body Temperature Graphs by Children Anna Sods A Modeling Approach to Life Table Data Sets Christos H. Skiadas and Charilaos Skiadas An Extended Quadratic Health State Function and the Related Density Function for Life Table Data Charilaos Skiadas, George Matalliotakis and Christos H. Skiadas
322 330 338
342
350 360
9 Statistical Applications in Socioeconomic Problems
370
Dumping Influence on a Non Iterative Dynamics Ce'cile Hardouin Firm Turnover and Labor Productivity Growth in the Italian Mechanical Sector Luigi Grossi and Giorgio Gozzi Continuous Sampling Plan under an Acceptance Cost of Linear Form Nicolas Farmakis and Mavroudis Elefheriou A Dynamic Programming Model of a Machine Tool in Flexible Manufacturing Bernard F. Lamond Particle Filter-Based Real-Time Estimation and Prediction of Traffic Conditions Jacques Sau, Nour-Eddin El Faouzi, Anis Ben Aissa and Olivier de Mouzon Probability of Trend Prediction of Exchange Rate by ANFIS George S. Atsalakis, Christos H. Skiadas and Ilias Braimis The Organizational Structure of Greek Libraries: The State of the Art and the Perspective of Team Working Anthi Katsirikou
37 1 382
390 398 406
414 423
10 Sampling and Optimization Problems
433
Applicability of Importance Sampling to Coupled Molecular Reactions Werner Sandmann
434
Contents xi
Bispectrum Estimation for a Continuous-Time Stationary Process from a Random Sampling Karim Benhenni and Mustapha Rachdi Search via Probability Algorithm for Engineering Optimization Problems Nguyen Huu Thong and Tran Van Ha0 Solving the Capacitated Single Allocation Hub Location Problem using Genetic Algorithm Zorica StanimiroviC
442
11 Data Mining and Applications
472
Robust Refinement of Initial Prototypes for Partitioning-Based Clustering Algorithms Sami Ayramo, Tommi Karkkainen and Kirsi Majava Effects of Malingering in Self-Report Measures: A Scenario Analysis Approach Massirniliuno Pastore, Luigi Lombardi and Francesca Mereu The Effect of Agreement Expected by Chance on Some 2 x 2 Agreement Indices Teresa Rivas-Moya Qualitative Indicators of Libraries’ Services and Management of Resources: Methodologies of Analysis and Strategic Planning Aristeidis Meletiou
473
12 Clustering and Classification
511
Languages Similarity: Measuring and Testing Liviu P. Dinu and Denis Encichescu On Clustering Romance Languages Liviu P. Dinu and Denis Encichescu A Clustering Method Associated Pretopological Concepts and k-Means Algorithm T. V. Le, N. Kabachi and M. Lamure Alternatives to the Estimation of the Functional Multinomial Regression Model Manuel Escabias, Ana M. Aguilera and Mariano J. Valderrama A GARCH-Based Method for Clustering of Financial Time Series: International Stock Markets Evidence Jorge Cuiado and Nuno Crato
512
454
464
483
49 1
499
521
5 29 5 37
542
13 Applications of Data Analysis
552
Reliability Problems and Longevity Analysis Anatoli Michalski
553
xii
Recent Advances in Stochastic Modeling and Data Analysis
Statistical Analysis on Mobile Applications among City People: A Case of Bangkok, Thailand Pakavadi Sirirangsi Pollution Sources Detection via Principal Component Analysis and Rotation Marie Chavent, Herve' Gue'gan, Vanessa Kuentz, Brigitte Patouille and J e ' r h e Saracco Option Pricing and Generalized Statistics: Density Matrix Approach Petr Jizba
562 57 1
578
14 Miscellaneous
588
Inference for Alternating Time Series Ursula U. Miiller, Anton Schick and Wolfgang Wefelmeyer Estimation of the Moving-Average Operator in a Hilbert Space Ce'line Turbillon, Jean-Marie Marion and Besnik Pumo Monte Carlo Observer for a Stochastic Model of Bioreactors Marc Joannides, Irbne Larramendy-Valverde and Vivien Rossi Monte Carlo Studies of Optimal Stopping Domains for American Knock Out Options Robin Lundgren SONAR Image Denoising using a Bayesian Approach in the Wavelet Domain Sorin Moga and Alexandru Isar Performance Evaluation of a Tandem Queueing Network Smail Adjabi and Karima Lagha Assessment of Groundwater Quality Monitoring Network Based on Information Theory Malgorzata Kucharek and Wiktor Treichel Improving Type I1 Error Rates of Multiple Testing Procedures by Use of Auxiliary Variables. Application to Microarray Data Maela Kloareg and David Causeur
589
Author Index
597
605 613
62 1
630 636 645
653
CHAPTER 1 Stochastic Processes and Models
An approach t o Stochastic Process using Quasi-Arithmetic Means Etienne Cuvelier and Monique Noirhomme-Fraiture Institut d’Informatique (FUNDP) 21, rue Grandgagnage, 5000 Namur, Belgium (e-mail: e c d i n f 0 . f undp. ac .be, mnoQinf0 . f undp .ac .be)
Abstract. Probability distributions are central tools for probabilistic modeling in data mining. In functional data analysis (FDA) they are weakly studied in the general case. In this paper we discuss a probability distribution law for functional data considered as stochastic process. We define first a new kind of stationarity linked t o the Archimedean copulas, and then we build a probability distribution using jointly the Quasi-arithmetic means and the generators of Archimedean copulas. We also study some properties of this new mathematical tool. Keywords: Functional Data Analysis, Probability distributions, Stochastic Process, Quasi-Arithmetic Mean, Archimedean copulas.
1
Introduction
Probability distributions are central tools for probabilistic modeling in data mining. In functional data analysis , as functional random variable can be considered as stochastic process, the probability distribution have been studied largely, but with rather strong hypotheses , [Cox and Miller, 19651, [Gihman and Skorohod, 19741, [Bartlett, 19781 and [Stirzaker, 20051. Some processes are very famous like Markov process [Meyn and L, 19931. Such a process has the property that present is not influenced by all the past but only by the last visited state. A very particular case is the random walk, which has the property that one-step transitions are permitted only to the nearest neighboring states. Such local changes of state may be regarded as the analogue for discrete states of the phenomenon of continuous changes for continuous states. The limiting process is called the Wiener process or Brownian motion. The Wiener process is a diffusion process having the special property of independent increments. Some more general Markov chain with only local changes of state are permissible, gives also Markov limiting process for continuous time and continuous states. The density probability is solution of a special case of the Fokker-Planck diffusion equation. In preceding work [Cuvelier and Noirhomme-Fraiture, 20051 we used copulas to model the distribution of functional random variables at discrete cutting points. Here, using the separability concept, we can consider the continuous case as the limit of the discrete one. We will use quasi-arithmetic means in order to avoid copulas problem when considering the limit when the number 2
Approach to Stochastic Process using Quasi-Arithmetic Means 3
of cuttings tends to infinit,y. In section 2 we define the concept of distribution of functions and recall the notion of separability. In section 3 we propose to use the Quasi-arithmetic mean in conjunction with an Archimedean generator to build a probability distributions appropriate to the dimensional infiuite nature of the functional data. And in section 4 we study the properties of this new mathematical tool.
2
Distribution of a functional random variable
Let us recall some definitions that will be useful in the following paper.
Definition 1. Let (a,A, P ) a probability space and V a closed real interval. A functional random variable (frv) is any function from V x 0 + R such for any t E V ,X ( t , .) is a real random variable on (a, A, P ) . Each function X ( . , w ) is called a realization. In the following we will write for S ( . , w ) , and for X ( t , .). X,can be considered as a stochastic process.
x,
We study, here, the measurable and bounded functions.
Definition 2. Let 2, a closed real interval, then & ( D ) is the space of real measurable functions u(t) defined on a real interval V such that
-
Definition 3. Let f , g E C 2 ( D ) . The pointwise order between f and g on
V is defined as follows : &E
V >f ( t ) I y ( t )
f
I D 9
(2)
Definition 4. The functional cumulative distribution function (fcdf) of a f r v X on & ( D ) computed at u. E & ( D ) is given by : FX,D(U) = PIX
5 D
I.
(3)
Definition 5 . A frv is called separable if there exists in V an everywhere countable set I of points { t i } and a set N of f 2 of probability 0 such that for an arbitrary open set G c V and an arbitrary closed set F c JR the two sets { w : X ( t , w ) E F, Vt E
G)
{ w : X ( t ,w ) E F, Vt E G n I
}
differ from each other only on the subset N . The set 1 is called the separability set [Gihman and Skorohod, 19741. The space C 2(D ) is a separable Hilbert space. In the following we suppose that any realization of X is in Cz(V).
4 Recent Advances in Stochastic Modeling and Data Analysis
Definition 6. Two frv X , ( t , w ) and X z ( t . w ) (t E V , w E 0)are called stochastically equivalent if for any t E V P(X1(t,w)# X,(t.w))
=0
(4)
The interest of separability comes from the following theorem .
Theorem 1 (J.L. Doob). Let X and J’ be metric spaces, X be separable, y be compact. A n arbitrary random function X ( t , w ) , t E X with values in Y is stochastically equivalent to a certain separable random function.
3
The QAMM and QAMML distributions
In this section we build a sequence of sets that converge toward a separability set of V and at each step we define a probability distribution. Let n E N, and { t y , . . . t;}, n equidistant points of V such that t ; = inf(V) and t; = sup(D), and Vi E (1,. . . , n - 1) we have = A t . Let the two - tll = following sets
n n
A , ( ~ )=
{w 6 R : ~
( t p ,5 ~u(t:)} )
i=l
A(u) = { W E fl : i 5.0 U } We will use the following distribution to approximate the fcdf (3): P[An(u)] = H ( u ( t ; ) >... , ~ ( t : ) )
(5)
where H ( . , . . . , .) is a joint distribution of dimension 11. In previous works (see [Diday, 20021, [Vrac et al., 20011, [Cuvelier and Noirhomme-Fraiture, 20051) the Archimedean copulas were used for the approximation with small value of n. Let us recall the definition and property of copulas.
Definition 7. A copula is a multivariate cumulative distribution function defined on the n-dimensional unit cube [0, 11, such that every marginal distribution is uniform on the interval [0, 11 :
C : [O, 11,
+
[O, 11 ( ~ 1 ,. .. ) z L ~H) c ( ~ 1 ,; . .,
~ n )
The power of copulas comes from the following theorem (see [Nelsen, 19991).
Theorem 2 (Sklar’s theorem). Let H be an n-dimensional distribution function with margins F l , ..., F,. Then there exists an n-copula C such that for all 3: E R” , H ( x ~. ., . , ~ n = ) C ( F l ( x l ) ... ,
)
Fn(Xn)).
(6)
If F1, ...,F, are all continuous, then C is unique; otherwise, C is uniquely determined on Range of F1 x ... xRange of F,.
Approach to Stochastic Process using Quasi-Arithmetic Means 5 Before using copulas, we define a function that gives the distribution of the values of X,for a chosen t E V.
Definition 8. Let
a frv. We define the surface of distributions as follow :
G (tj Y) = PIX,5 Y1
(7)
We can use various methods for determining suitable G for a chosen value of t . Thus for example, if is a Gaussian process with mean value p ( t ) and standard deviation a ( t ) ,then we can use the cdf from N ( p ( t ) ,a ( t ) ) . In other cases we can use the empirical cumulative distribution function to estimate G:
In the following we will alway use this function G in conjunction with a function u of Cz (V): G [t,u ( t ) ] .So, for ease the notations, we will write G [t;u] = G [t,u ( t ) ] .If we use the preceding expression in conjunction with ( 6 ) , then (5) become :
P[A,(u)]= C ( G [ t ; ; u ]..., , G[tt;~])
(9)
An important class of stochastic process is the class of stationary processes. A stochastic process is said to be strictly stationary [Burril, 19721 if its distributions do not change with time; i.e. if for any t l ,..., t , E V and for any h E V ,the multivariate distribution function of . . , Xtn+h)does not depend on h. We propose here a more wide stationary property.
Definition 9. A stochastic process is said copula stationary if V t l , ..., t , E V and for any h E V ,the copula of ...,Xtnth)does not depend on h, i.e. its copula does not change with time. Let us notice that, if we deal with true functional data, realizations of a stochastic process X,we can suppose that there is always the same functional relation between X, and X t for any value s., t E V.If a frv is also a copula stationary stochastic process, then we call it a copula stationary frv. There is an important class of copulas which is well appropriate for copula stationary stochastic processes : the class of Archimedean copulas.
Definition 10. An Archimedean copula is a function from [0, 11, to [ 0 , 1 ]
where
4,called the generator, is a function from [0,1]to [0,ca]such that:
0
4 is a continuous strictly decreasing function,
0
d(0) = co and 4 ( 1 ) = 0,
Recent Advances in Stochastic Modeling and Data Analysis
6
Generator Uom. of 0
mame
Clayton to - 1 -1 - In ea--',+t , Frank , Gumbel-Hougaardl (- lnt)@ I -I
0
= 4-1 is completely monotonic on t in [0, m[ and for all k.
10,
0>0 B >o 0 2 1
m[ i.e.
( - ~ ) ' $ + ( t ) 2 0 for all
Notice that the k-dimension margins of (10) are all the same, and this for any value of 1 5 k 5 n. If X is a copula stationary f r v then expression (9) can be written :
PIAn(u)l =
+
L1
)
4 (G [tl;.I)
(11)
Table 1[Nelsen, 19991 shows three important Archimedean generators for copulas. The distribution (11) with the Clayton generator was already used for clustering of functional data coming from the symbolic data analysis framework (see [Vrac et al., 20011 and [Cuvelier and Noirhomme-Fraiture, 20051). Unfortunately the above limit is almost always null for Archimedean copulas when n + 0;) (see [Cuvelier and Noirhomme-Fkaiture, 2007])! Proposition 1. I f f o r u E &(D) : G ( t ;u ) < 1, Vt E
V ,then
Another objection t o the use of this type of joint distribution is something which we could call volumetric behavior. Definition 11. A function ZL E p , written Q p , if
&(D)is called a functional quantile of value
G(t;Qp) = p , Vt E 2)
(13)
The functional quantile Qp can be seen as the level curve of value p . Now let us remark that for a functional quantile :
P[An(Qp)]= lij
r
C 4(G [tl; i=l
Qpl)
1
= $ .(
' 4( P ) ) < P
+
(14)
And it is easy to see that, if n < m then $J [m#~ (p)] < [nq5(p)],and thus, the more we try to have a better approximation for a functional quantile of value p , the more we move away from reference value p toward zero. A simple way to avoid these two problems is to use the notion of quasi-arithmetic mean, concept which was studied by [Kolmogorov, 19301, [Nagumo, 19301 and [Aczel, 19661.
Approach to Stochastic Process using Quasi-Arithmetic Means
7
Definition 12. Let [ . , b ] be a closed real interval, and n E No. A quasiarithmetic mean is a function M : [a,b]" + [a,b] defined as follows:
where q5 is a continuous strictly monotonic real function. We show below that if we use the generator for Archimedean copulas in (15), we define a cumulative distribution function built from one-dimensional distributions. It's easy to prove the following lemma.
Lemma 1. Let n E No,F be a one dimensional cdf, and q5 a generator of Archimedean copula , then
is also a cdf. In various situations one can apply increasing transformations to the data without destroying the underlying dependence structure. This is classical in multivariate extreme value theory. And for these kind of transformation the copulas does not change. Proposition 2. Let n E No,{Fill5 i 5 ,n} be a set of one dimensional cdf, and q5 a generator of Archimedean copula , then
is a multivariate cdf. Proof. By the above lemma we have that the functions F ; ( z ) are cdf, and as q5 is an "Archimedean generator" so expression (10) is a copula, and thus 3 (Cy=lq5(F;(xi))) is a multivariate cdf. 0 We call the distributions given by the expression (17) the Quasi-Arithmetic Mean of Margins ( Q A M M ) distributions. Now if we use a Q A M M distribution in expression (5) :
then for each n E N we have an approximation, and the limit of the above expression is not always null.
8
Recent Advances in Stochastic Modeling and Data Analysis
Definition 13. Let : X be a frv, u E C z ( D ) ,G its Surface of Distributions and 4 a generator of Archimedean Copulas. We define the Quasi-Arithmetic M e a n of Margins L i m i t ( Q A M M L ) distribution of by :
In fact transformation (16) can be seen like giving an importance to the margins in proportion with the length of an interval [tl,tl+,l]in the approximation of Fxz,(u).
4
QAMML properties
First it is easy to see that the Q A M M L distribution preserves the functional quantiles.
Proposition 3. If I ; ~ C , D ( Q ~=) P
Qp
E
&(D) is a functional quantile of ualue p , t h e n
Now, what is the difference between the quasi-arithmetic mean of margins and the classical mean? Let
p
=
1
L
-
11~Il
G [t;u]d t
and let us define the function tp [t;u] = G [t;u] - p . Thus we can use the following Taylor’s approximation for all t (recall that 0 5 p , G [t;u] 5 1):
4 (G It;74) = 4 (P+ t p ( t I;). = 4(P)
+ 4’(P)tp It;I . + &”(P)-
t;
[t;u]
+ 0 (t; [t;4)
(21)
and then
and so like $ is a decreasing function, 4’’ ( p ) 2 0 and as u a r ( t ) 2 0 we can see that FX ~ ( udecreases ) with the variance of the differences between the function u-knd the quantile function associated to the value of the arithmetic mean of G along u.. And so the Q A M M L distribution is equal to the arithmetic mean only in the case of quantile functions.
Approach to Stochastic Process using Quasi-Arithmetic Means
9
Conclusion I n this paper we do n o t propose a new method in Functional Data Analysis but a new Probabilistic tool. Like in t h e real case, we can hope t h a t this tool can be used for analysis of functional data, like in mixture decomposition, statistical tests, ... Moreover, several ways to improve t h e tool exist. B y example let us note that the QAMML definition (see (19)) uses a uniform distribution over D: other distributions could be considered (see [De Finetti, 19311).
References [Aczel, 19661J Aczel. Lectures o n Functional Equations and Their Applications. Mathematics in Science and Engineering. Academic Press, New York and London, 1966. [Bartlett, 1978lM S Bartlett. An introduction to stochastic processes. Cambridge University Press, Cambridge, 1978. [Burril, 1972lC W Burril. Measure, integration and probability. McGraw-Hill, NewYork, 1972. [Cox and Miller, 1965lD R Cox and HD Miller. The theory of stochastic processes. Methuen, London, 1965. [Cuvelier and Noirhomme-Fraiture, 2005lE. Cuvelier and M. Noirhomme-Fraiture. Clayton copula and mixture decomposition. In ASMDA 2005, pages 699-708, 2005. [Cuvelier and Noirhomme-Fraiture, 2007lE. Cuvelier and M. Noirhomme-Fraiture. Classification de fonctions continues l’aide d’une distribution et d’une densit dfinies dans un espace de dimension infinie. In Extraction et gestion des connaissances EGC’2007, pages 679-690, 2007. [De Finetti, 1931lB De Finetti. Sul concetto di media. Giornale dell’ Instituto Itialiano degli Attuari, 2:369-396, 1931. [Diday, 2002lE. Diday. Mixture decomposition of distributions by copulas. In Classification, Clustering and Data Analysis, pages 297-310, 2002. [Gihman and Skorohod, 197411I Gihman and A V Skorohod. The theory ofstochastic process. Die grundleheren der mathematischen wissenschaften in einzeldarstellungen. Springer, Berlin, 1974. [Kolmogorov, 1930lA Kolmogorov. Sur la notion de moyenne. Rendiconti Accademia dei Lincei, 12(6):388-391, 1930. [Meyn and L, 19931s P Meyn and Tweedie R L. Markov chains and stochastic stability. Communications and Control. Springer-Verlag, New York, 1993. [Nagumo, 1930lM Nagumo. Uber eine klasse der mittelwerte. Japan Journal of Mathematics, 7:71-79, 1930. [Nelsen, 1999lR.B. Nelsen. An introduction to copulas. Springer, London, 1999. [Ramsay and Silverman, 200515 0 Ramsay and B W Silverman. Functional Data Analysis. Springer Series in Statistics. Springer, New-York, 2005. [Stirzaker, 2005lD Stirzaker. Stochastic processes and models. Oxford University Press, Oxford, 2005. [Vrac et al., 2001lMathieu Vrac, Edwin Diday, Alain Chkdin, and Philippe Naveau. Mklange de distributions de distributions, d6composition de melange de copules et application B la climatologie. In Actes du VIIIkme congre‘s de la Socie‘te‘ Francophone de Classijication, pages 348-355, 2001.
The quantum generator of translations in a fraction-dimensional manifold Paulius MiSkinis1>2 NORDITA, Nordic Institute of Theoretical Physics, Blegdamsvej 17, DK-2100 Copenhagen 0, Denmark Vilnius Gediminas Technical University Faculty of Fundamental Sciences Department of Physics Sauletekio Ave.11, LT-10223, Vilnius-40, Lithuania (e-mail: paulius .miskinismfm. vtu.It)
Abstract. In the case of the quantum generalization of random processes with the Hurst index H # 1/2, expression for the quantum Hermitian generator of translations and its eigenfunctions are proposed. The normalization constant has been determined and its relation to the operator of momentum is shown. The interrelation between the momentum and the wave number has been generalized for the processes with a non-integer dimensionality a. Keywords: quantum generator of translations, fractional derivative, non-Markovian process, Hermitian operator.
1
Introduction
The physical basis for the existence of quantum mechanics comprises a series of phenomena described by the mathematical theory of the Wienerian processes. The non-Markovian stochastic process is a natural generalization of the Brownian motion or the Wiener stochastic process [l]. The foundation for this generalization is the theory of stable probability distributions developed by Ldvy [2]. The most fundamental property of such distributions is the stability in respect to addition, in accordance with the generalized central limit theorem. Thus, from the probability theory point of view, the stable probability law is a generalization of the well-known Gaussian law. The nonMarkovian processes are characterized by the Hurst index H , which' takes values 0 < H < 1. At H = 1/2 we have the Gaussian process or the process of the Brownian motion. The non-Markovian stochastic process with stable Ldvy distributions is widely used to model a variety processes such as anomalous diffusion [3], turbulence [4], chaotic dynamics [5], plasma physics [6], financial dynamics [7], biology and physiology [8] (for recent references see e.g. [9-111). 10
The Quantum Generator of Translations 11
The constantly increasing number of experimental facts in various fields of knowledge related to classical non-Wienerian processes evokes a natural desire to “close” the commutative diagram shown in Fig. 1. QM
lh WP
a FQM
lh
a SLP
Fig. 1. Schematic representation of interrelations of Wienerian processes (WP), stable LBvy processes (SLP), quantum mechanics (QM) and fractional quantum mechanics (FQM). and, at least formally, to consider the possible existence of a quantum analogue of a more narrow class of phenomena related to stable Levy processes, the so-called fractional quantum mechanics (FQM) [12-141. Unfortunately, these works are not aimed at a thorough analysis of the properties of the quantum operator of momentum. The lack of such analysis results in some accuracies of even gross blunders while formulating FQM (see Conclusions). The present note offers a brief discussion of one of the crucial issues related to FQM, which is the one-dimensional operator of momentum. Like in usual quantum mechanics (QM), one-dimensional problems are a kind of excess idealization. Nevertheless, they may be used for elucidating the fundamental features of FQM. One-dimensional problems arise while considering the three-dimensional evolutionary equation in which the interaction potential depends on a single coordinate. This fact allows, with the aid of a corresponding factorization, to move to a simpler one-dimensional evolutionary equation. The purpose of this paper is formulation in the explicit form quantum expression of the one-dimensional operator of momentum for the fractional probability processes with a non-integer dimensionality a and the investigation of the interrelation between the quantum generator of translations and the operator of momentum.
2
Fractional quantum operator of momentum
The classical definition of momentum in QM follows from the invariance of the Hamiltonian of the quantum system H with respect to the infinitesimal displacements Sx [15]. Under such transformation, the wave function +(z) turns into the function
+(x here
+ b z ) = +(x) + C5xax+ = (1+Sxa,)
+(z),
axis the differentiation operator over the space variable x.
12 Recent Advances in Stochastic Modeling and Data Analysis
ax$
However, it may turn out that does not exist, but there exists the so-called fractional derivative a:$ in which the order of the derivative a may be both an integer and a fractional number. For the function determined on the whole real axis R, the right and left derivatives of the order a are derived
where [a]and { a } are the integer and the fractional parts of the parameter a. For the bilateral derivatives to exist, it is sufficient that $ ( x ) E C[a](52), where C[a](f2) is a set of continuously differentiated functions of the order [a]determined on the domain 52 [El. Another peculiarity related to the operator of momentum is the expansion of the wave-function $ ( x ) into a Taylor series by fractional powers [17,18]
c [a1
$(x) =
(x - xO)=+n + Rn(x)
7
(3)
n=O
2'
where are numerical coefficients and R,(x) is he residual term, which provides a better approach to the initial function. n all such cases, determination of the quantum operator of momentum should be specified. It is reasonable to suppose that the momentum operator should be proportional to the fractional derivative:
E
@ = ca;$(x)
(4)
here C is a certain coefficient of proportionality. For a -+ 0, we must obtain a usual quantum operator of momentum, 5 = Thus, in FMQ we always deal with two kinds of limit transitions: 1) ti -+ 0, when we shift to classical mechanics, and 2) a -+ 0, when we turn to usual QM (see Fig. 1). The kind of the coefficient C in the expression for the momentum (4) is best defined if on the whole real axis we consider a flat wave of the form
-%ax.
there fi is the Planck constant and 1, is a certain peculiar scale of the length of the nonlocal process under consideration. Let us impose a requirement for the momentum operator (4) to obey the eigenvalue equation p$ = p $ . Applying the property of the fractional derivatives, d;ienx = neenx (ReK > 0), we obtain that
c = (--i)QfiZga--l.
(6)
For the values observed in QM to be real, the corresponding operators should be Hermitian. It is easy to see that the quantum operator of momentum (4) with the constant C from (6) is non-Hermitian. In order to obtain
The Quantum Generator of Translations 13
a Hermitian operator of momentum, to the type (4) operator we will add a Hermite-conjugated operator $+ ; then, the momentum operator determined in this way
-
here is the symbol of transposition, will be clearly Hermitian. Indeed, the momentum operator (7) fi = ($+ + $ - ) / 2 will be Hermitian because of the idempotency of the operation of Hermitian conjugation ((@+)+= $) and the structure of the operator itself ($- = (@+)+).On the other hand, employing the rule of fractional integration by parts, +m
p* =
1,
'p*$+$dx = 2
/
+m
cp* (@-+@+)$d3: = P
-m
,
(8)
we directly see that the momentum operator is Hermitian for the different functions of state 'p and $. Thus, we obtain that the operator (7) is Hermitian and its eigenvalues on the whole real axis are flat waves of the (5) type. Like in the classical case, the eigenvalues of the momentum operator do not belong to the class L 2 ( R ) . Therefore, they don't describe the physically realizable states of the quantum particle. These eigenfunctions should be regarded as the basic functions, which comprise the complete system of functions.
3
Wave function normalization
To determine the constant normalization A in the expression for the flat wave ( 5 ) , we will take that
(9)
s
This is a particular case of the conventional condition $p,$pdx = 6(p' - p ) for p' E 0. Using the property of the &function, we shall obtain that
Let us specify the peculiarities of such normalization. Firstly, generally speaking, the amplitude is a complex magnitude; secondly, it depends on the eigenvalue of the momentum p . Only when a -+ 1, as the case should be,
A + l / m . Inasmuch as the physical sense applies not to the amplitude itself but to IAI2, the complex nature of A does not contradict unitarity. However, because of the complex nature of the amplitude we may get an impression that we deal with a damping wave; however, actually there is no damping, because A # A ( z , t ) . Besides, the same conclusion results from analysis of
14 Recent Advances in Stochastic Modeling and Data Analysis
the dispersion expression of the corresponding Hamiltonian. The dependence A = A(p)is not a matter of principle and may be avoided by a suitable choice of the normalization condition. For instance, under condition
dx = 6 ( ~-' K ) ,
$:t$&
(11)
the dependence A(p) is absent. Another important circumstance should be noted as regards the type of the momentum operator. Transition to momentum representation is not a Fourier transformation. Momentum representation should be understood in the sense of f-representation:
4
Translation operator
Lastly, let us derive the formula to express, through the momentum operator
6, the parallel translation operator in space to any finite (not only infinitesimal) distance. From the definition of such an operator it follows:
Fz+$(x) = $(x
-a),
F?-$(x)
= $(x
+ b) ,
(13)
In this case, a and b denote the values of finite displacements but not the coordinate ends of the interval. Expanding the function $(x - a ) in the neighbourhood of the point x into a Taylor series by fractional powers as in (3) and employing the expression for the "right-hand" and "left-hand" parts of the momentum operator,
$7 = - i h i ; - l q ,
6:
= irii;-l@,
(14)
we obtain that
where EE is the generalized exponential function. These are exactly the finite displacement operators we have been searching for. For a -+ 1, for Tz+ we obtain that
The expression T?- could be obtain by substituting in equation (16) a -+ -b and 6 : -+ p?.
The Quantum Generator of Translations 15
5
The shift operator in a superspace of fractional dimension
The modern theories of quantum field specifically unite physical fields of various statistics. Physical fields are unified into super-multiplets which are irreduceable 2,-graded Poincar6 group representation. 8, In the algebra of this group] alongside the translation generators P, in the ususal space, translation generators in the odd-dimensional sector the - i (70) 8, is present. generator Q = In this part of the paper we consider a generalization of the shift generator Q for the case of fractional dimensional of the odd-dimensional sector of the Minkowski superspace. In this case, differently from the shift generator in the even sector, we cannot rely upon physical intuition, because the odd Grassman variables and parameters are unobservable. To obtain the fractional shift operator Q, we shall employ the Balakrishnan formula, konown form functional analysis [191: N
&
By analogy with P,,
Q = lim E-' (T'
-
I ) , where TE= eEQ ,
E'O
(19)
is an element of Grassman's algebra. We take E" as a matrix operator for which, taking into account its Grassmanian nature,
E
( I + E)"
=I
+
a!&
(20)
is an exact equality. Threfore
For translation operators in the even sector, formula (21) leads to the already discussed operator of the fractional Marchaud derivative a;. Therefore, expression (21) may be regarded as the odd Marchaud derivative. Note here that, differently from the usual fractional Marchaud derivative] integration in (21) over E takes place in the whole space of spinor determination] but not within certain area. Formula (21) allows generalization for the c a e of a! > 1.
16 Recent Advances in Stochastic Modeling and Data Analysis
6
Conclusions
Note that the classical restriction on the smoothness of the wave function $(x) E C2([a,b ] ) does not hold here. The restriction on $(x) follows from the continuity equation; however, in the case of fractional dimension we can show that the condition of continuity is changed, and the limitation on $(x) is reduced to $(x) E d a ] ( [ ab, ] ) . Another note pertains to the structure of the momentum operator. It seems highly significant that the momentum operator consists of two parts the right and left displacements. In classical fractional mechanics, it is quite possible to limit ourselves to one of these two components - $+ or p-. In the quantum case it is impossible, because the full operator of momentum is a Hermitian. The limit transition h + 0 for { a }# 0 means transition to classical fractional mechanics. However, the form of the momentum operator undergoes no qualitative change: fi = $ (p+ fi-), i.e. it consists of two parts, each being proportional to its one-sided derivative. For linear evolutionary equations of classical (not quantum) fractional mechanics this type of structure of the momentum operator may be simplified if p = @+ or p = Ij-. However, here additional considerations are necessary. For the nonlinear fractional evolutionary processes it is impossible in principle, because fi+ = 6- is the condition of smoothness. Note here, too, that all results for the Hermitian operator of momentum are valid in the case of the Riesz quantum derivative: RD,"$(x) K [(-ia)T
+
+
(8) ]: $(x). From the definition of n there follows an interrelation between the momentum and the wave number:
Q: + 1, tc + Ic, and the expression (22) turns into p = hlc. The appearance of the characteristic length scale of I , and the power dependence of the quantum particle momentum on the wave number directly indicate the fractional character of quantum mechanics. Thus we have the Hermitian quantum operator of momentum (7) with the eigenfunctions (5). This allows us to construct the quadratic form of the Hamiltonian H K p 2 instead of the power form H K D,Jp la, and the Hermitian Hamiltonian instead of non-Hermitian proposed in [12,13],and the unitarian Hamiltonian instead of non-unitarian proposed in [14].
for
References [l]M. Kac. Probability and Related Topics in Physical Sciences. Chap. IV., Interscience, New York, 1959.
The Quantum Generator of Translations 17
[2] P. LBvy. The'orie de 17Addition des Variables Ale'atoires GauthierVillaws, Paris, 1937. [3] B.B. Mandelbrot, J.W. van Ness. Fractional Brownian Motionx, Fractional Noises and Applications. SIAM (SOC. Ind. Appl. Math.) Rev. 70(4): 422-437, 1968. [4] J. Klafter, A. Blumen and M.F. Shlesinger. Stochastic pathway to anomalous diffusion. Phys. Rev. A 55(7): 0081-9085, 1987. [5] G.M. Zaslavsky, Fractional kinetic equation for Hamiltonian chaos. Physicu D 76(1-3): 110-372, 1994. [6] G. Zimbardo, P. Veltro, G. Basile and S. Principato. Anomalous d i f i sion and Lvy random walk of magnetic field lines in three dimensional turbulence. Phys. Plasmas 2(7): 2653-2163, 1995. [7] R.N. Mantega and H.E. Stanley. Scaling behaviour in the dynamics of an economic index. Nature 376(6535): 46-48, 1995. [8] B.J. West and W. Deering. Fractal physiology for physicists: Lvy statistics. Phys. Rep. 246(1-2): 1-100, 1994. [9] A. Le Mehautk et al. (Eds.) fiactional differentiotion and its applications. Books on Demand, Norderstedt, 2005. [lo] R. Metzler and J . Klafter. The restaurant at the end of the random walk: rkcent developments in the description of anomalous transport by fractional dynamics. J. Phys. A: Math. Gen. 37: R161-R208, 2004. [ll] G.M. Zaslavsky. Chaos, fractional kinetics, and anotaloub transport. Physics Reports 377: 461-580, 2002. [12] E. Lutz. Fractional Transport Equations for LBvy Stable Processes, Phys. Rev. Lett. 86(12): 2208-2211, 2001. [13] N. Laskin. Fractional quantum mechanics. Phys. Rev. E 62(3): 31353125, 2000. [14] M. Naber. Time fractional Schrodinger equation. J. Math. Phys. 45(8): 3739-3356, 2004. [15] H. Kleinert. Path Integrals in Quantum Mechanics, Statistics and Polymer Physics. World Scientific, Singapore, 1990. [16] V. Marchaud. Sur les dBrivBes et sur les differences des functions de variables rkelles. J. math. pures et uppl. 6(4): 238-235, 1927. [17] S.G. Samko, A.A. Kilbas and 0.1. Marichev. Fractional Integrals and Derivatives. Theory and Applications. Gordon and Breach, Amsterdam, 1993. [18] P. MiSkinis. Nonlinear and nonlocal integrable models. Vilnius, Technika, 2003. [19] E. Hille, R. Phillips. Functional analysis and semigroups. Cambr. Univ. Press. 1961.
Cause of Stock Return Stochastic Volatility: Query by Way of Stochastic Calculus Juho Kanniainen’ Tampere University of Technology Institute of Industrial Management P.O.Box 541, FI-33101 Tampere, Finland (e-mail: juho. kanniainen(0tut. f i) Abstract. This study uses stochastic calculus to investigate the causes of the stock return stochastic volatility. The study aims to advance new explanations of stochastic volatility that hold also if the firm is unleveraged, and if the level of uncertainty about future business conditions does not change. Using the dividend discount model, I show that stock return volatility is admittedly stochastic if future dividends are affected by more than one stochastic state variable. Morever, I study how the mappings of state variables are related to stochastic return volatility. This study also investigates the effects of the discount rate and state variables’ mutual correlation on the level of stock volatility and its fluctuation, finding substantial relationships therein. Keywords: Stochastic calculus, Stochastic volatility, Geometric Brownian motion,
Dividend discount model.
1
Introduction
It is by now a widely accepted observation that the volatilities of individual stocks and aggregate stock markets are not constants, but change stochastically over time. The literature has repeatedly presented sophisticated statistical models t o describe stochastic volatility, but the question of why stock return volatility varies remains open. [Schwert, 19891 presents an extensive analysis of the relation between market volatility and economic activity, confirming Officer’s [Officer, 19731 earlier results that market volatility is higher during economic downturns. This has also been justified by [Hamilton and Lin, 19961, who found that economic recessions are the single most important factor explaining market volatility, accounting for about 60 percent of its variation. Further, changes in volatility are also related t o financial and operating leverage, personal leverage, interest rates, inflation rates, money growth, industrial production growth, trading volume, trading halt, and program trading (see, for example, [Black, 19761, [Christie, 19821, [Mascaro and Meltzer, 19831, [Schwert, 19891, [Schwert, 19901). Stock volatility is also stochastic if a stock has option characteristics; that is, the stock can be viewed as an option on the leveraged firm’s assets (see [Merton, 1974]), and the firm may have numerous growth options. Overall in the literature, the usual explanation of stochastic volatility is either corporate leverage or a change in the level of uncertainty about future macroeconomic conditions. 18
Cause
0.f
Stock Return Stochastic Volatility 19
This study advances new explanations of stochastic volatility by way of stochastic calculus. My explanation does not challenge the earlier explanations but rather complements them. This paper investigates the significance of the above observation by suppressing the other possible causes of stochastic volatility. The assumption of risk-neutrality suppresses leverage effects, the assumption of a constant interest rules out the possibility that the randomness of stock volatility is driven by varying interest rates, and the assumption of constant state variable volatilities eliminates any change in the uncertainty about future macroeconomic conditions. I also examine the effects of state variables’ mutual correlations and the discount rate on the level of volatility and its variation.
2
Model
To suppress the leverage effect, we assume that the risk-neutrality, and hence the discount rate, denoted by p, equals the risk-free interest rate. Moreover, by assuming a constant interest, we rule out the possibility that the randomness of volatility is driven by varying interest rates. The stochasticity of stock volatility arise solely from the dividend process. We denote the dividend stream by {Dt,, D t z , . . . , Dt,}, where dividends occur at (known) times t l , t 2 , . . . ,t,. The stock price is assumed to equal the cumulative present value of its expected dividends, n
v(t)=
Et [Dt,]exP [ - p ( t k
-
t)],
(1)
k=l
Suppose that a discrete dividend stream consists of n discrete dividends. All investors are assumed to monitor the processes of state variables and continuously revising their beliefs regarding to expected dividends. Each dividend is driven by an m stochastic state variable. In addition, the dividends’ state variable vectors need not be equal; that is, the next year’s dividend can be driven by state variables different from those of the dividend paid after five years. Consequently, we specify the dividend process in rather general terms and allow that, for example, each dividend depends on the interest rate and inflation, but that the two first dividends also depend on the oil price, whereas later dividends depend on the price of biodiesel instead of oil. Overall, the matrix X ( t ) E Rmxn, represents the dividend stream information apprearing at time t , and vector X k ( t ) represents the information associated with the stock dividend k (that will be paid at time t k ) . For all k = 1 . . .n, Dtk : Rm ++ R+ is a known mapping, and { X k ( t ) ;t 2 0}, X k E Rm the state variable vector of the dividend Dtk, is a linear diffusion defined on a complete filtered probability space (62, F x k ,P ) . We assume that for all i = 1 . . .m, k = 1 . . . n, { x i k ( t ) ;t 2 0) evolves according to the stochastic differential equation
20
Recent Advances in Stochastic Modeling and Data Analysis
where Wik is a standard Wiener process with the instantaneous correlations dWikdWjl = pik,jldt for all i , j = 1 . . . m, k , 1 = 1 . ..n. Read the above such that Xik is the state variable i of the dividend k (a dividend that will be paid at time t k ) . According to It8’s lemma, the stock price must itself follow the It6 process:
where i ,j = 1 . . .m, k , 1 = 1 . ..n, and where a is the expected price appreciabik E d W i k makes the stock price behave stochastically. tion. The term The rest of the study focuses on just this term.
xi,k
3 3.1
Causes of Stochastic Return Volatility Mappings of State Variable
For simplicity the analysis, let us suppose temporarily that there is only one stochastic state variable driving all the dividends, and that the state variable of the dividends { X ( t ) ;t 2 0 } itself follows the geometric Brownian motion:
d x(t)=ex(t)dt
+ O x ( t ) d w ( t ) , x > 0,
(4)
where 0 E R and 0 E R+ are constants. We will relax assumptions later when we will consider multiple state variables and the alternative processes of state variables. Now the asset price process is in the form
dV = adt
V
av + C-vx -dW ax
= adt
+ OEdW,
where a equals the expected price appreciation and E is the elasticity of the asset price to the state variable. If E is stochastic, so is stock return volatility O&, too. Proposition 1. Suppose that dividends’ state variable x follows geometric Brownian motion. If the elasticity of the asset price t o the state variable is a constant, say (, the asset price V = cxc,where c is a positive constant.
Proof. The solution of the differential equation positive constant. 0
%$ = C is V = cxc,where c is a
Remark 1. The above proposition also says that if the asset price is not in the form of V = cxc,then the elasticity of the asset price to the state variable, and hence also stock return volatility, is not constant.
Cause of Stock Return Stochastic Volatility 21 Example 1 . The above means that the assumption of constant return volatility is invalid, if, for example, the asset price V = clxb + C Z , where b, c 1 , c ~are constants. This is the case if the dividend of time t k ,
If x follows (4), then Now
E
is admittedly stochastic:
av x ax v
--
-
Clxbb C1XbfCZ1
Remark 2. Example 1 has an important economic interpretation. Suppose that the price of output good fluctuates according to (4) and that the dividend of time t k is X ( t k ) - a , where a denotes the production costs of the period. Then return volatility equals cr& and is stochastic. Example 2. Another example of stochastic return volatility could be V = f l ( x ) f z ( x ) . . , where f i , i = 1 , 2 , . . . , is the function of x. This is a case if, for example, the dividend of time tk, Dt, = lnX(tk), when E is stochastic even if discount rate p equals zero and number of dividends n equals one:
+
+.
0
We have now considered transformations that result in return volatility to be stochastic. What kind of dividend transformations do produce constant return volatility? The next proposition clarifies this.
Proposition 2. Suppose that dividends ' state variable x follows geometric Brownian motion. If the dividend of time t has a form Dt, = where a and b are constant numbers, then the elasticity of the asset price to the state variable, E , is constant and equals b. Proof. The proposition is a direct implication of Example 1 with a2 = 0 in (5). 0 3.2
Multiple State Variables
We assume that Dtk
=D(Xk(tk)) =Xlk(tk)
+X2k(tk)
f ' . . f X'TLk(tk)?
(6)
k = 1. . .n, and that the state variables of the dividends { X ( t ) ;t 2 0 } follow the geometric Brownian motion with constant drifts and volatilities :
+
dXilc(t) = QikXilc(t)dt aircXik(t)dWik(t),
(7)
22
Recent Advances in Stochastic Modeling and Data Analysis
where x i k > 0 for all i, k . Now stock price as a cumulative present value of expected dividends can be expressed as
where i = 1. . .m, k = 1. . .n, and where p is the stock’s instantaneous discount rate. The stochastic term in (3) takes the form
i = 1. . . m, k = 1 . . . n. How should we interpret this? What can we say now about stock volatility? Proposition 3. Suppose that there are several (more t h a n one) state variables and that (6), (?’),and (8) hold. T h e n the stock return volatility is a constant if and only zf all the state variables are drived by perferctly correlated W i e n e r processes and volatilities of state variables, cr&, are equal t o each other.
Proof. Without loss of generality, suppose that all dividends depend on the same two state variables, $X_1$ and $X_2$. Technically, for all $k = 1 \ldots n$, $D(X_1(t_k), X_2(t_k)) = X_1(t_k) + X_2(t_k)$. These variables follow geometric Brownian motions driven by the Wiener processes $W_1(t)$ and $W_2(t)$, $dW_1(t)\,dW_2(t) = \rho_{12}\,dt$. Let $W^*$ be a Wiener process independent of $W_1$ and $W_2$. Then we can write $dW_2(t) = \rho_{12}\,dW_1(t) + \sqrt{1 - \rho_{12}^2}\,dW^*(t)$. If $\sigma_{ik} = \sigma$ and $\rho_{12} = 1$, then we can write (9) as follows:

$$\cdots = \sigma V\,dW(t),$$

where $dW_1(t) = dW_2(t) = dW(t)$. Thus, the stock price evolves with constant volatility. We must still prove that if return volatility is constant, then all the state variables are driven by perfectly correlated Wiener processes and the volatilities of the state variables are equal to each other. Let us do this by showing that if the processes are not perfectly correlated, or if the volatilities are not equal, return volatility is not constant. Again, suppose that all dividends depend on the same two state variables, $X_1$ and $X_2$. The price of the stock is

$$V(t) = \sum_{k=1}^{n} \left\{ X_1(t)\exp\!\big((\alpha_1 - \rho)(t_k - t)\big) + X_2(t)\exp\!\big((\alpha_2 - \rho)(t_k - t)\big) \right\}.$$
We can then write (9) as follows:
where

$$\phi_i(t) = \frac{\sigma_i X_i(t) \sum_{k=1}^{n} \exp\!\big((\alpha_i - \rho)(t_k - t)\big)}{V(t)}, \qquad (10)$$

$i = 1, 2$. Let $\phi(t)\,d\widetilde{W}(t) = \phi_1(t)\,dW_1(t) + \phi_2(t)\,dW_2(t)$, when we find that

$$\phi(t) = \sqrt{\phi_1(t)^2 + \phi_2(t)^2 + 2\rho_{12}\,\phi_1(t)\phi_2(t)}. \qquad (11)$$

Therefore, the stock price diffusion takes the form

$$dV(t) = \alpha V(t)\,dt + \phi(t)\,V(t)\,d\widetilde{W}(t). \qquad (12)$$
Clearly, the stock volatility is admittedly stochastic even if either $\sigma_1 = \sigma_2$ or $\rho_{12} = 0$. Note that return volatility depends positively on the correlation $\rho_{12}$. □

I have illustrated the stochastic volatility numerically by generating sample paths for the state variables and supposing that equations (7), (10) and (11) determine the state variables, stock price, and volatility. The illustration can be found at http://www.tut.fi/~kanniain/ASMDA/illustration.pdf. I assumed two state variables, $X_1(t)$ and $X_2(t)$, with a constant correlation and with constant volatilities and drifts. Moreover, I assumed that $n$ dividends will be paid at times (years) $1, 2, 3, \ldots$, and I simulated the time interval $(0,1)$ (the first year). The illustration shows that stock return volatility may vary considerably over time. Moreover, it also examines the effect of the correlation $\rho_{12}$ on the price paths, concluding that if the correlation increases, the stock price becomes more volatile, and the volatility curve moves upward. If the correlation decreases, stock volatility also decreases, and the volatility curve drops. This is in line with our analytical observation of a positive relation between $\rho_{12}$ and stock volatility. The result is analogous to portfolio diversification: if state variables do not correlate mutually, their fluctuations eliminate each other. Moreover, the numerical illustration argues that the greater (smaller) the correlation, the greater (smaller) the stock volatility, but with less (more) fluctuation. The result is also quite intuitive. As we can see from the above equations, the mutual proportions of the state variable values clearly affect the level of stock volatility. Suppose that dividends depend on two variables, and that the volatility of the first variable, $\sigma_1$, is less than the volatility of the second variable, $\sigma_2$. If the value of the second state variable, $X_2$, increases in proportion to the value of the first state variable, $X_1$,
the more volatile state variable gains more weight, and stock volatility increases. Similarly, if $X_1$ increases in proportion to $X_2$, stock volatility decreases. Obviously, the greater (smaller) the correlation between state variables, the less (more) their mutual proportion changes over time. Therefore, if state variables evolve with different volatilities and a low, or even negative, mutual correlation, stock volatility may fluctuate substantially. We could interpret this result economically: if a business depends on homogeneous (heterogeneous) factors (in the sense of statistical dependency), its volatility is high (low) and does not (does) vary much. I also illustrated the effect of the discount rate on volatility and its fluctuation. The effect of the discount rate on stock volatility can be either positive or negative. In addition, the discount rate has a great effect on how stock volatility varies over time.
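The construction used in the proof and in the numerical illustration can be sketched in a few lines of code. The following is a minimal sketch, not the author's original simulation; all parameter values are assumptions chosen for demonstration, and the volatility $\phi(t)$ is computed from equations (10)-(11) as reconstructed above.

```python
import numpy as np

# Two correlated GBM state variables; track the instantaneous return
# volatility phi(t) of (10)-(11) over one simulated year.
rng = np.random.default_rng(0)
alpha = np.array([0.02, 0.02])        # drifts alpha_1, alpha_2 (assumed)
sigma = np.array([0.10, 0.40])        # volatilities sigma_1, sigma_2 (assumed)
rho12, disc = 0.2, 0.06               # correlation and discount rate (assumed)
t_pay = np.arange(1.0, 11.0)          # dividend dates t_k = 1,...,10 (years)
dt, n_steps = 1.0 / 250, 250          # daily steps over the first year
X = np.array([1.0, 1.0])
for step in range(n_steps):
    t = step * dt
    # correlated Brownian increments: dW2 = rho*dW1 + sqrt(1-rho^2)*dW*
    z = rng.standard_normal(2)
    dW1 = np.sqrt(dt) * z[0]
    dW2 = rho12 * dW1 + np.sqrt(1 - rho12**2) * np.sqrt(dt) * z[1]
    X = X * (1 + alpha * dt + sigma * np.array([dW1, dW2]))
    # price as the discounted sum of expected dividends
    w = np.exp((alpha[:, None] - disc) * (t_pay - t))   # 2 x n weight matrix
    V = (X[:, None] * w).sum()
    phi_i = X * w.sum(axis=1) * sigma / V               # phi_1, phi_2 of (10)
    phi = np.sqrt(phi_i[0]**2 + phi_i[1]**2
                  + 2 * rho12 * phi_i[0] * phi_i[1])    # equation (11)
print(f"final price {V:.3f}, instantaneous volatility {phi:.3%}")
```

Setting the two assumed volatilities equal and $\rho_{12} = 1$ makes the printed $\phi$ collapse to the constant $\sigma$, in line with Proposition 3.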
3.3 Alternative Characterizations of State Variable Processes
Finally, we consider alternative processes for the state variables. Suppose that $a_{ik}(t, X_{ik}) = \eta_{ik}(\bar{X}_{ik} - X_{ik}(t))$ and $b_{ik}(t, X_{ik}) = \sigma_{ik}$, in which case the state variable $\{X_{ik}(t);\ t \geq 0\}$, $i = 1 \ldots m$, $k = 1 \ldots n$, evolves according to the stochastic differential equation

$$dX_{ik}(t) = \eta_{ik}\big(\bar{X}_{ik} - X_{ik}(t)\big)\,dt + \sigma_{ik}\,dW_{ik}(t),$$

where $\eta_{ik}$, $\bar{X}_{ik}$, and $\sigma_{ik}$ are constant numbers for all $i, k$. This is the so-called mean-reverting process. Now, because for some $\tau > t$ (see, for example, [Dixit and Pindyck, 1994, p. 74])

$$E_t\big[X_{ik}(\tau)\big] = \bar{X}_{ik} + \big(X_{ik}(t) - \bar{X}_{ik}\big)\exp\!\big(-\eta_{ik}(\tau - t)\big),$$

the stock price with the mapping $D_{t_k} = X_{1k}(t_k) + X_{2k}(t_k) + \cdots + X_{mk}(t_k)$ is

$$V(t) = \sum_{i,k}\Big\{\bar{X}_{ik} + \big(X_{ik}(t) - \bar{X}_{ik}\big)\exp\!\big(-\eta_{ik}(t_k - t)\big)\Big\}\exp\!\big(-\rho(t_k - t)\big).$$
Stock return volatility is now unquestionably stochastic since the stochastic term in (3) takes the form
Note that stock volatility would be stochastic even if $\sigma_{ik} = \sigma$ and $W_{ik}(t) = W(t)$ for all $i = 1 \ldots m$, $k = 1 \ldots n$, $t > 0$, in which case volatility would be equal to
Remark 3. The reason here is that the variance rate does not grow with $x$. Therefore, if dividends are driven by such a process, stock volatility remains unquestionably stochastic even with linear mappings of the state variables.
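A short sketch (with assumed parameter values) makes the point concrete: under the mean-reverting dynamics the diffusion of $V$ is additive in the state variable, so the return volatility depends on the level of $X(t)$.

```python
import numpy as np

# One mean-reverting state variable: since dV has an additive noise term,
# the diffusion coefficient over the price depends on the level of X.
eta, xbar, sig, rho = 1.0, 1.0, 0.3, 0.06   # assumed illustrative values
t_pay = np.arange(1.0, 6.0)                 # dividend dates t_k (assumed)

def return_vol(x, t=0.0):
    w = np.exp(-rho * (t_pay - t))          # discount factors
    e = np.exp(-eta * (t_pay - t))          # mean-reversion decay
    V = np.sum((xbar + (x - xbar) * e) * w) # price formula above
    return sig * np.sum(e * w) / V          # sigma * dV/dX / V

for x in (0.5, 1.0, 2.0):
    print(x, return_vol(x))                 # varies with x: stochastic vol
```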
4 Conclusions
The starting point of this study was that dividends are driven by state variables and that investors monitor the state variables and continuously revise their beliefs regarding the stock price. The paper studied how the mappings of state variables are related to stochastic return volatility. Multiple state variables were also explored. The main result is that stock return volatility is admittedly stochastic if future dividends are affected by more than one stochastic state variable. Thus, the paper affirms the invalidity of geometric Brownian motion as a model of stock prices. We observed that the correlation between state variables affects the volatility dynamics: the greater (smaller) the correlation, the greater (smaller) the stock volatility, with less (more) fluctuation over time. In addition, we found that the discount rate affects volatility and its fluctuation either positively or negatively.
References

[Black, 1976] F. Black. Studies of stock price volatility changes. Proceedings of the 1976 Meetings of the American Statistical Association, Business and Economic Statistics Section, pages 177-181, 1976.
[Christie, 1982] A. A. Christie. The stochastic behavior of common stock variances: Value, leverage and interest rate effects. Journal of Financial Economics, 10:407-432, 1982.
[Dixit and Pindyck, 1994] A. K. Dixit and R. S. Pindyck. Investment Under Uncertainty. Princeton University Press, Princeton, New Jersey, 1994.
[Hamilton and Lin, 1996] J. D. Hamilton and G. Lin. Stock market volatility and the business cycle. Journal of Applied Econometrics, 11:573-593, 1996.
[Mascaro and Meltzer, 1983] A. Mascaro and A. H. Meltzer. Long- and short-term interest rates in a risky world. Journal of Monetary Economics, 12:485-518, 1983.
[Merton, 1974] R. C. Merton. On the pricing of corporate debt: The risk structure of interest rates. Journal of Finance, 29:449-470, 1974.
[Nielsen, 1999] L. T. Nielsen. Pricing and Hedging of Derivative Securities. Oxford University Press, 1999.
[Officer, 1973] R. R. Officer. The variability of the market factor of the New York Stock Exchange. Journal of Business, 46:434-453, 1973.
[Schwert, 1989] G. W. Schwert. Why does stock market volatility change over time? Journal of Finance, 44:1115-1153, 1989.
[Schwert, 1990] G. W. Schwert. Stock market volatility. Financial Analysts Journal, 46:23-34, 1990.
On a class of backward stochastic differential equations and applications to the stochastic resonance

Romeo Negrea

Department of Mathematics, Politehnica University of Timisoara, P-ta Victoriei 2, 300006 Timisoara, Romania (e-mail: negrea@math.uvt.ro)
Abstract. In this paper we intend to prove a result on the solvability of a class of backward stochastic differential equations of McShane type. We consider more general conditions on the coefficient functions and prove a result on existence and pathwise uniqueness of Nagumo type, extending the results of Athanassov ([Athanassov, 1990]) for ordinary differential equations. Finally, we study the control of some electronic circuits in the presence of stochastic resonance.

Keywords: backward stochastic differential equations, adapted solution, pathwise uniqueness, belated integrals, stochastic resonance, signal-to-noise ratio.
1 Introduction
Noise in dynamical systems is usually considered a nuisance. However, in certain nonlinear systems, including electronic circuits and biological sensory systems, the presence of noise can enhance the detection of weak signals. This phenomenon is termed stochastic resonance and is of great interest for electronic instrumentation. The essential ingredient for stochastic resonance is a nonlinear dynamical system, which typically has a periodic signal and noise at the input and an output that is a function of the input as well as of the internal dynamics of the system. The nonlinear component of the dynamical system is sometimes provided by a threshold which must be crossed for the output to be changed or detected. A nonlinear system is essential for stochastic resonance to exist, since in a system that is well characterized by linear response theory the signal-to-noise ratio at the output must be proportional to the signal-to-noise ratio at the input. Engineers have normally sought to minimize the effect of noise in electronic circuits and communication systems. Today, however, it is acknowledged that noise or random motion is beneficial in breaking up the quantization pattern in a video signal, in the dithering of analog-to-digital converters, in the area of Brownian ratchets, etc.
In the field of control, we usually regard $y(\cdot)$ as an adapted control and $x(\cdot)$ as the state of the system. We are allowed to choose an adapted control $y(\cdot)$ which drives the state $x(\cdot)$ of the system to the given target $X$ at time $t = 1$. This is the so-called reachability problem. So, in fact, we are looking for a pair of stochastic processes $\{(x(t), y(t)),\ 0 \leq t \leq 1\}$ with values in $R \times R$ which is $\mathcal{F}_t$-adapted and satisfies the above equation. Such a pair is called an adapted solution of the equation and was introduced in [Pardoux and Peng, 1990]. In electronic circuits with stochastic resonance we have a nonlinear system; therefore, more general coefficient functions must be considered for a backward stochastic differential equation: nonlinear and possibly non-Lipschitz. On the other hand, in practice, the additional input noise (which is responsible for the resonance) and the external perturbations belong to a large class of noises, so the stochastic calculus of McShane type (introduced in [McShane, 1974]) can be a better approach for the mathematical models of some phenomena. In this paper we intend to prove a result on the solvability of a class of backward stochastic differential equations of the following type:
where $\{z_j(t),\ 0 \leq t \leq 1\}$, $j = 1, 2, \ldots, r$, is a stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$ with the natural filtration $\{\mathcal{F}_t,\ 0 \leq t \leq 1\}$, and $X$ is a given $\mathcal{F}_1$-measurable random variable such that $E|X|^2 < \infty$. Moreover, $f$ is a mapping from $\Omega \times [0,1] \times R \times R$ to $R$ which is assumed to be $\mathcal{P} \otimes \mathcal{B} \otimes \mathcal{B} \setminus \mathcal{B}$-measurable, where $\mathcal{P}$ is the $\sigma$-algebra of $\mathcal{F}_t$-progressively measurable subsets of $\Omega \times [0,1]$. Also, $g$ is a mapping from $\Omega \times [0,1] \times R$ to $R$ which is assumed to be $\mathcal{P} \otimes \mathcal{B} \setminus \mathcal{B}$-measurable. All stochastic integrals are belated or McShane integrals. A version of equation (1) which has the most important mathematical properties is its canonical form:
2 Preliminary results
In 1974, E. J. McShane introduced the so-called belated integrals and stochastic differentials and differential systems enjoying the following three
properties: inclusiveness, consistency and stability. McShane's calculus has proved to be very valuable in modeling and is finding applications in physics, engineering and economics ([McShane, 1974], [McShane, 1975]). Therefore, in this section we recall some specific results of the McShane stochastic calculus. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space and let $\{\mathcal{F}_t,\ 0 \leq t \leq a\}$ be a family of complete $\sigma$-subalgebras of $\mathcal{F}$ such that if $0 \leq s \leq t \leq a$ then $\mathcal{F}_s \subset \mathcal{F}_t$.

Every process denoted by $z$ with different affixes will be a real-valued second-order stochastic process adapted to $\{\mathcal{F}_t,\ 0 \leq t \leq a\}$ (i.e. $z(t)$ is $\mathcal{F}_t$-measurable for every $t \in [0,a]$), having a.s. continuous sample functions and satisfying

$$\big|E\big[(z(t) - z(s))^m \mid \mathcal{F}_s\big]\big| \leq K(t - s) \quad \text{a.s.}$$

whenever $0 \leq s \leq t \leq a$, $m = 1, 2, 4$, for a positive constant $K$ (and we say that the process satisfies a $K$-condition). It is known (see [McShane, 1974]) that if $f : [0,a] \to L^2$ is a measurable process adapted to $\mathcal{F}_t$ and if $t \mapsto \|f(t)\|$ is Lebesgue integrable on $[0,a]$, then, if $z_1$ and $z_2$ satisfy a $K$-condition, the McShane integrals

$$\int_0^a f(t)\,dz_1(t), \qquad \int_0^a f(t)\,dz_1(t)\,dz_2(t)$$

exist and the following estimates are true, where $C$ is a positive constant depending only on $K$ and $a$.

3 Theoretical results
Some results under conditions weaker than the Lipschitz conditions for the coefficient functions of a backward stochastic differential equation are given in [Mao, 1995], [Lepeltier and San Martin, 1998], [Lepeltier and San Martin, 2002], [Kobylanski, 2000] or [Constantin, 2004], [Negrea, 2006] and many others. Very interesting applications were given, for example, in [Constantin, 2005], [Ladde and Sambandham, 2003], [Srinivasan and Udayabhaskaran, 1982]. In [Athanassov, 1990] a uniqueness theorem of Nagumo type is proved for the Cauchy problem, generalizing several known uniqueness theorems, together with sufficient conditions guaranteeing the convergence of the Picard successive
approximations for ordinary differential equations. Stochastic generalizations of the results of Athanassov for (forward) stochastic differential equations are given in [Constantin, 1995], [Constantin, 1996], [Constantin, 1998], [Constantin and Negrea, 2004] or [Negrea, 2003] and others. But it is known that the uniqueness of the solution of a Cauchy initial value problem and the convergence of the successive approximations are logically independent, i.e. the uniqueness of the solution does not ensure the convergence of the successive approximations, nor is the converse true. For the equation (4) we consider the following hypotheses:

i) $g_j$ and its partial derivatives are $\mathcal{P} \otimes \mathcal{B} \otimes \mathcal{B}$-measurable functions;

ii) $g_j(\cdot, 0, 0) \in M^2((0,1), R)$, $g(\cdot, 0, 0) \in M^2((0,1), R)$ and $\frac{\partial g}{\partial x}(\cdot, 0, 0) \in M^2((0,1), R)$;

iii) there exists $u(t)$, a continuous, positive and differentiable function on $0 < t \leq 1$ with $u(0) = 0$, having nonnegative derivative $u'(t) \in L([0,1])$ with $u'(t) \to \infty$ as $t \to 0^+$, such that if $\varphi$ is any one of the functions $g_j$, $h_{jk}$ (as in relation (2)), we have

for all $x_1, x_2, y_1, y_2 \in R$, $0 \leq t \leq 1$, and $A = (r+1)C^2 + (r^2+1)C^2 + (2r^2)C^2$;

iv) with the same function $u(t)$ as above,

for all $x, y_1, y_2 \in R$, $0 < t \leq 1$.
In [Negrea and Caruntu, 2007] the result on the existence and uniqueness of the solution of equation (4) is given. More exactly, we have the following theorem:
Theorem 3.1. Let $g_j$ and $h_{jk}$ satisfy the above hypotheses for any $j, k = 1, \ldots, r$ and $X \in L^2(\Omega, \mathcal{F}_1, P, R)$; then there exists a unique pair $(x, y) \in M^2((0,1), R)$ which satisfies equation (4) in the canonical form on $[\delta, 1]$, for any positive constant $\delta$.

Now we will present some results on the stability properties of the solution of equation (1). We consider families of backward stochastic integral equations
with $\lambda \in \Lambda$, an open and bounded set $\subset R^n$.
In a similar way as in [Negrea, 2006], and using the properties (3), (4), (5) and (6), it is easy to prove the following two results:
Theorem 3.2. If, for any $\lambda \in \Lambda$, the coefficient functions $f_\lambda$ and $g_\lambda$ satisfy the hypotheses (i)-(iv), then the family (4) has a unique solution $(x_\lambda, y_\lambda) \in M^2((0,1])$. Moreover, if $|X_{\lambda_m} - X_\lambda| \to 0$ as $m \to \infty$, then we have that
Theorem 3.3. In the hypotheses (i)-(iv), we have that if $\lim_{\lambda \to \lambda_0} |X_\lambda - X_{\lambda_0}| = 0$, then

$$\lim_{\lambda \to \lambda_0} |x_\lambda - x_{\lambda_0}|^2 = 0 \quad \text{and} \quad \lim_{\lambda \to \lambda_0} |y_\lambda - y_{\lambda_0}|^2 = 0,$$

on any compact subset of the domain $(0,1]$, where $\varphi$ is any of the functions $f$ or $g$.
4 Applications to the Stochastic Resonance
In this section we study some applications of the general class of McShane backward stochastic differential equations described above. For a better understanding of the notion of stochastic resonance, we consider the following mechanical experiment: a ball is pushed with a force $F$ along a nonlinear path, and we consider the following three cases: a) Non-perturbed case: if the force $F$ is strong enough and there are no external perturbations, the ball arrives at the final point. b) Perturbed case: if there are some external perturbations, the ball does not climb "the first peak"; the force $F$ is therefore increased and the ball passes the first hill, but if there are more external perturbations the ball cannot pass the second hill and it does not return to the initial point for a new increase of the force $F$.
c) Perturbed case with a "jumper ball": the force $F$ pushes not a simple ball but a jumper ball, a ball with a "noise". If there is a "good noise", the ball passes all the external perturbations and arrives at the final point.

A model of a one-dimensional nonlinear system that exhibits stochastic resonance is the damped harmonic oscillator with the Langevin equation of motion:

$$m\ddot{x}(t) + \gamma\dot{x}(t) = -\frac{dU(x)}{dx} + \sqrt{D}\,\xi(t).$$

This equation describes the motion of a particle of mass $m$ moving in the presence of friction $\gamma$. The restoring force is expressed as the gradient of some bistable or multi-stable potential function $U(x)$. In addition, there is an additive stochastic force $\xi(t)$ with intensity $D$ which, in general, is supposed to be a white Gaussian noise. In the case of a bistable system, in [Gammaitoni et al., 1998], [Harmer et al., 2002] the potential function is the simple symmetric function

$$U(x) = -a\frac{x^2}{2} + b\frac{x^4}{4},$$

and, adding a periodic signal and considering the time-dependent case,

$$U(x, t) = U(x) - Ax\sin(\omega_s t) = -a\frac{x^2}{2} + b\frac{x^4}{4} - Ax\sin(\omega_s t),$$
where $A$ and $\omega_s$ are the amplitude and the frequency of the periodic signal, respectively. More models for the electronic circuits with a stochastic resonance are discrete. In general, they use state-space time-series models given by the expressions below:

$$x_n = f(x_{n-1}) + u_n, \qquad y_n = h(x_n) + w_n,$$
where $f$ and $h$ are nonlinear functions and $u_n$ (input noise) and $w_n$ (external noise) are Gaussian or non-Gaussian noises, respectively. About these models there is a simple observation: we cannot say exactly whether the external perturbation is present only at the discrete moments. A continuous model appears more adequate. This approach is possible only by using the theory of stochastic differential equations. Because the control of the input noise is the key to the benefit of stochastic resonance, we use backward stochastic differential equations. The problem which frequently appears in practice is the value of the initial state $x_0$. This value is "proposed", but under some non-standard external conditions this makes a discontinuity of the sample path of the process $\{x(t)\}$. After this time ($t = 0$), the adapted control $\{y(t)\}$ works and we have continuity of $\{x(t)\}$. Therefore, it is necessary to consider some general coefficient functions and good stability properties of the solutions. On the other hand, stochastic resonance makes possible a control of the electronic circuits under some external stochastic perturbations by controlling the adapted process $\{y(t)\}$ (see [Gammaitoni et al., 1998], [Harmer et al., 2002], [Calvo and Chialvo, 2006]), but the control appears as the result of filtering.
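For intuition, the bistable Langevin model above can be integrated with a simple Euler-Maruyama scheme in the overdamped limit (taking $m \approx 0$ and $\gamma = 1$); all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Overdamped forced double-well: dx = (a*x - b*x^3 + A*sin(w_s*t)) dt
#                                     + sqrt(2*D) dW.
# With a weak periodic signal, a suitable noise level helps the particle
# hop between wells in step with the forcing (stochastic resonance).
a, b, A = 1.0, 1.0, 0.3                       # potential and signal (assumed)
omega_s, D = 2 * np.pi * 0.01, 0.15           # slow forcing, noise intensity
dt, n = 1e-2, 100_000
rng = np.random.default_rng(3)
x = np.empty(n)
x[0] = -1.0                                    # start in the left well
for i in range(1, n):
    t = i * dt
    drift = a * x[i - 1] - b * x[i - 1] ** 3 + A * np.sin(omega_s * t)
    x[i] = x[i - 1] + drift * dt + np.sqrt(2 * D * dt) * rng.standard_normal()
print("fraction of time in right well:", np.mean(x > 0))
```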
About the above models we make the following remarks: a) in choosing the type of stochastic processes that we shall use as models of the noises we meet a dilemma. On the one hand, there is no physical basis for considering functions with properties such as non-differentiability; on the other hand, the involved processes must be of a kind that we can manage mathematically. b) In almost all approaches the disturbance is supposed to be Gaussian white noise, but it is known that an electrical system cannot support more than some limited current or voltage difference without destruction. Therefore, the Gaussian white noise, which is a special derivative of Brownian motion (in the sense of distributions), must be replaced by, or combined with, some other, smoother noises. For some numerical results we consider a Chua-type electronic circuit (see [Harmer et al., 2002], [Beglund and Gentz, 2002]) with a discontinuity at time $t = 0$. More precisely, we consider the coefficient functions
$$f(t, x, y) = -t^{\frac{1}{2}}x^3 + t^{\frac{1}{2}}x + \sin(2\pi t/T), \qquad g(t, x, y) = y + t^{\frac{1}{2}},$$

with $x(0) = 0$, $T = X = 1$ and $z_1 = W$.
For the sequences of successive approximations of $\{x(t)\}$ and $\{y(t)\}$ we obtained the sample paths shown in Fig. 1.

Fig. 1. Sample paths of the signal and the noise.

In conclusion, we can see the effect of the adapted process in the bistable regime of the state process $\{x(t)\}$.
References

[Athanassov, 1990] Z. S. Athanassov. Uniqueness and convergence of successive approximations for ordinary differential equations. Math. Japonica, 53:351-467, 1990.
[Beglund and Gentz, 2002] N. Beglund and B. Gentz. A sample path approach to noise-induced synchronization: Stochastic resonance in a double well potential. Ann. Appl. Probab., 12:1419-1470, 2002.
[Calvo and Chialvo, 2006] O. Calvo and D. R. Chialvo. Ghost stochastic resonance in an electronic circuit. Inter. Journ. of Bifurcation and Chaos, 16:731-735, 2006.
[Constantin and Negrea, 2004] Gh. Constantin and R. Negrea. An application of Schauder's fixed point theorem in stochastic McShane modeling. J. Fixed Point Theory, 5:37-52, 2004.
[Constantin, 1995] A. Constantin. Global existence of solutions for perturbed differential equations. Annali di Mat. Pura ed Appl., IV:237-299, 1995.
[Constantin, 1996] A. Constantin. On the existence and pathwise uniqueness of solutions of stochastic differential equations. Stochastics and Stochastics Reports, 56:227-239, 1996.
[Constantin, 1998] A. Constantin. On the existence and uniqueness of solutions of McShane type stochastic differential equations. Stoch. Anal. Appl., 16:217-229, 1998.
[Constantin, 2004] Gh. Constantin. The uniqueness of solutions of perturbed backward stochastic differential equations. J. Math. Anal. Appl., 300:12-16, 2004.
[Constantin, 2005] Gh. Constantin. An application of the stochastic McShane's equations in financial modeling. In J. Janssen and P. Lenca, editors, Proceedings of Applied Stochastic Models and Data Analysis, pages 917-925, 2005.
[Gammaitoni et al., 1998] L. Gammaitoni, P. Hanggi, P. Jung, and F. Marchesoni. Stochastic resonance. Rev. of Modern Physics, 70:223-287, 1998.
[Harmer et al., 2002] G. P. Harmer, B. R. Davis, and D. Abbot. A review of stochastic resonance: Circuits and measurement. IEEE Trans. on Instr. Measur., 51:299-309, 2002.
[Kobylanski, 2000] M. Kobylanski. Backward stochastic differential equations and partial differential equations with quadratic growth. The Annals of Probability, 28:558-602, 2000.
[Ladde and Sambandham, 2003] G. S. Ladde and M. Sambandham. Stochastic versus Deterministic Systems of Differential Equations. Marcel Dekker, Inc., New York, 2003.
[Lepeltier and San Martin, 1998] J.-P. Lepeltier and J. San Martin. Existence for BSDE with superlinear-quadratic coefficient. Stoch. Stoch. Reports, 63:227-240, 1998.
[Lepeltier and San Martin, 2002] J.-P. Lepeltier and J. San Martin. On the existence or non-existence of solutions for certain backward stochastic differential equations. Bernoulli, 8:123-137, 2002.
[Mao, 1995] X. Mao. Adapted solutions of backward stochastic differential equations with non-Lipschitz coefficients. Stoch. Proc. and Their Appl., 58:281-292, 1995.
[McShane, 1974] E. J. McShane. Stochastic Calculus and Stochastic Models. Academic Press, New York, 1974.
[McShane, 1975] E. J. McShane. Stochastic differential equations. J. Multivariate Analysis, 5:121-177, 1975.
[Negrea and Caruntu, 2007] R. Negrea and B. Caruntu. On a certain class of backward stochastic differential equations of McShane type. Analele Univ. Timisoara, Ser. Mat.-Inf., 45:225-234, 2007.
[Negrea, 2003] R. Negrea. On the uniqueness and convergence of successive approximations for a class of stochastic differential equations. Analele Univ. Bucuresti, Ser. Mat., 52:225-234, 2003.
[Negrea, 2006] R. Negrea. On the existence and uniqueness of solutions for certain backward stochastic differential equations. To appear, 2006.
[Pardoux and Peng, 1990] E. Pardoux and S. G. Peng. Adapted solution of a backward stochastic differential equation. Systems & Control Letters, 14:55-61, 1990.
[Srinivasan and Udayabhaskaran, 1982] S. K. Srinivasan and S. Udayabhaskaran. Modeling and analysis of dynamical systems subject to discontinuous noise processes. Journ. Math. Phys. Sci., 16:415-430, 1982.
CHAPTER 2
Distributions
An application of the Extended Waring distribution to model count data variables

José Rodríguez Avi, Antonio Conde Sánchez, Antonio José Sáez-Castillo, and Mª José Olmo Jiménez

Department of Statistics and Operations Research, University of Jaén, 23071 Jaén, Spain (email: jravi@ujaen.es)

Abstract. An extension of the univariate Waring distribution is proposed as an alternative to the UGWD for overdispersed count data in those cases in which the
parameter estimates do not allow the properties of the UGWD to be used, such as the partition of the variance. Specifically, this model is applied to study the set of variables number of hotels per municipality in the Autonomous Region of Andalusia (Spain) from 1990 to 2003.

Keywords: Waring distribution, Number of hotels, Overdispersed count data.
1 Introduction
To study the impact of tourism on the economy, an interesting variable can be the number of hotels per municipality within a particular geographical framework. Specifically, the aim of this work is to find a probabilistic model to describe, in an appropriate way, the variable X, number of hotels per municipality in the Autonomous Region of Andalusia (Spain) from 1990 to 2003. These data have been compiled from the data bank of the System of Multiterritorial Information of Andalusia (SIMA, by its Spanish acronym, [IEA, 2006]). The observed frequencies for the 14 variables appear in Table 1.
For all years we obtain discrete count variables that have a minimum at 0, which is also a modal value, whereas the maximum is a very high value that increases from one year to another. A summary of the main descriptive characteristics of the 14 variables is shown in Table 2. The aggregation index shows that the 14 variables have strong overdispersion, so the variability is not only due to randomness. Firstly, an important factor that influences the observed variability is the size of the municipality. However, this is not the only deciding factor. Thus, cities of great tourist importance, such as Málaga or Granada, have a greater number of hotels than Sevilla, the capital of Andalusia and the most populated city. The same happens with coastal towns compared with inland towns, etc. Other factors to
Table 1. Observed frequencies for the hotel data from 1990 to 2003
be taken into account are the financial development of the municipality, the business initiative, etc.
Table 2. Descriptive summary
2 Negative binomial model
Since the data have overdispersion, a model that could be proposed to describe them is the negative binomial model, NB, which can be seen as the mixture

$$X \sim \text{Poisson}(\Lambda) \ {\bigwedge}_{\Lambda} \ \text{Gamma}(a, k).$$

This means that the average number of events is not constant but follows a gamma distribution. In this case, the total variance can be split into two components:

$$\text{Var}(X) = E_\Lambda\big(\text{Var}(X \mid \Lambda)\big) + \text{Var}_\Lambda\big(E(X \mid \Lambda)\big). \qquad (2)$$

The second term of (2) represents the variability due to sources of variation that cause overdispersion, whereas the first term reflects the variation due to randomness. The NB(a, p) distribution may also be seen as belonging to the Gaussian hypergeometric distribution family, GHD ([Johnson et al., 2005]), because its probability generating function (p.g.f.) is given by:
Table 3 includes the results obtained using the NB model for the hotel data. The estimates have been obtained by the maximum likelihood method.
Year  NB(a, p)            χ²       p-value  log-likelihood
1990  (0.0799, 0.0996)  13.1479   0.0686    -626.2
1991  (0.0881, 0.0961)  10.2694   0.2466    -686.0
1992  (0.1032, 0.0986)  10.1610   0.3366    -760.1
1993  (0.1057, 0.0997)  10.1809   0.3360    -768.3
1994  (0.1144, 0.1016)  16.7173   0.0533    -803.6
1995  (0.1214, 0.1050)  15.2138   0.0827    -827.3
1996  (0.1242, 0.0995)  15.0755   0.1293    -860.0
1997  (0.1287, 0.0999)  15.7031   0.1085    -880.2
1998  (0.1418, 0.1117)  33.0476   0.0000    -890.1
1999  (0.1466, 0.1094)  28.6091   0.0014    -919.4
2000  (0.1632, 0.1100)  46.2239   0.0000    -928.6
2001  (0.1786, 0.1143)  46.7014   0.0000   -1026.0
2002  (0.1847, 0.1102)  49.3454   0.0000   -1065.7
2003  (0.1945, 0.1074)  56.3517   0.0000   -1112.1

Table 3. MLE, χ²-goodness of fit test and log-likelihood for the hotel data using the NB model
It should be emphasized that only in the first years, until 1997, are the fits acceptable, whereas in the last years the fits are very bad.
3 Fit by a UGWD
The univariate generalized Waring distribution, UGWD, with parameters (a, k and ρ), has also been proposed as a valid model in the case of overdispersion ([Johnson et al., 2005]). It is also a distribution that belongs to the GHD family, since its p.g.f. can be expressed as:
This distribution can be obtained from an NB(a, p) when the parameter p is not constant but follows a BetaI(k, ρ) distribution, that is,

$$X \sim \text{Poisson}(\Lambda) \ {\bigwedge}_{\Lambda} \ \text{Gamma}\!\left(a, \frac{p}{1-p}\right) \ {\bigwedge}_{p} \ \text{BetaI}(k, \rho). \qquad (5)$$
The results obtained by fitting the UGWD model and applying the maximum likelihood method ([Rodriguez Avi et al., 2003]) are shown in Table 4. It can be seen that the values of the χ²-statistic are lower than those provided by the NB fit for all years, and the values of the log-likelihood are greater.

Year  UGWD(a, k, ρ)              χ²       p-value  log-likelihood    AIC
1990  (0.8137, 0.2707, 1.1087)   7.0334   0.2182    -621.71    76.66
1991  (5.8800, 0.1296, 1.8618)   3.5696   0.7347    -681.18    83.95
1992  (5.5370, 0.1562, 1.8669)   4.2428   0.7515    -753.75    84.12
1993  (4.9568, 0.1651, 1.8020)   3.8855   0.7929    -761.52    83.81
1994  (0.8623, 0.4219, 1.1977)   7.8420   0.2449    -794.30    90.12
1995  (1.3074, 0.3299, 1.2557)   6.2382   0.5122    -819.98    91.52
1996  (3.7016, 0.2151, 1.6272)   5.7105   0.6796    -851.34    91.52
1997  (0.5972, 0.6822, 1.1723)   5.8098   0.6685    -872.30    93.46
1998  (0.4958, 1.0040, 1.3141)  13.6770   0.0572    -875.96    96.35
1999  (0.6326, 0.7998, 1.2907)   7.9839   0.3340    -904.80    96.35
2000  (0.7727, 0.7727, 1.3377)   8.2827   0.4064    -963.00   112.04
2001  (0.8332, 0.8331, 1.4081)   7.3912   0.4951   -1003.7    106.40
2002  (0.8636, 0.8581, 1.4044)  10.1913   0.3352   -1042.0    116.67
2003  (0.8957, 0.9029, 1.4158)  12.8730   0.1684   -1084.1    119.10

Table 4. MLE, χ²-goodness of fit test, log-likelihood and AIC for the hotel data using the UGWD model
Comparing Table 3 and Table 4, it is clear that the UGWD model improves the fits provided by the NB model in all cases. Since the UGWD and the NB are not nested models, the Vuong test ([Winkelmann, 2003], pg. 109) has been carried out as an extension of the likelihood ratio test. The null hypothesis indicates that the two models are
equivalent, versus the alternative hypothesis that the UGWD model is preferable to the NB one:

$$H_0 : E\big(L_{UGWD}(\hat\theta_{UGWD}) - L_{NB}(\hat\theta_{NB})\big) = 0 \qquad (6)$$
$$H_1 : E\big(L_{UGWD}(\hat\theta_{UGWD}) - L_{NB}(\hat\theta_{NB})\big) > 0 \qquad (7)$$
The test statistic is given by

$$LR = \frac{\bar{d}}{\sqrt{\hat\omega^2/n}}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \quad d_i = \ell_i^{UGWD} - \ell_i^{NB},$$

where $\hat\omega^2/n$ is the variance of the individual differences in log-likelihood divided by $n$. So, under the null hypothesis, the LR statistic converges in distribution to a standard normal distribution. The results of the test for each year appear in Table 5. The p-value is less than 0.1 in all years, except the year 1990, and decreases, especially from 1998. Therefore, the UGWD model may be considered, in general, to be better than the NB model.

Year  Statistic  p-value
1990   1.0206    0.1537
1991   1.6696    0.0475
1992   1.7954    0.0363
1993   1.7986    0.0360
1994   1.4743    0.0702
1995   1.3395    0.0902
1996   1.8765    0.0303
1997   1.2916    0.0982
1998   2.2484    0.0123
1999   2.2799    0.0113
2000   2.8479    0.0022
2001   3.1587    0.0007
2002   3.2674    0.0005
2003   3.6413    0.0001

Table 5. Vuong test between the UGWD and the NB
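Given the pointwise log-likelihood contributions of the two fitted models, the statistic is a few lines of code. The helper below is a sketch under the formula above, not the authors' implementation; the input arrays are placeholders.

```python
import numpy as np

def vuong(ll_ugwd, ll_nb):
    """Vuong LR statistic from per-observation log-likelihoods.

    Under H0 (equivalent models) the statistic is asymptotically N(0, 1).
    """
    d = np.asarray(ll_ugwd) - np.asarray(ll_nb)   # individual differences d_i
    n = d.size
    omega2 = d.var()                              # sample variance of the d_i
    return d.sum() / np.sqrt(n * omega2)          # = sqrt(n) * dbar / omega

# usage sketch: lr = vuong(loglik_points_ugwd, loglik_points_nb)
```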
Many studies have demonstrated the main properties of the UGWD ([Irwin, 1968], [Irwin, 1975], [Xekalaki, 1983b], [Xekalaki, 1983a], among others). Particularly, in order to interpret the data variability, the decomposition
as a mixture of three distributions provides a partition of the variance into three components ([Irwin, 1968]), valid when $\rho > 2$. The first of these is related to random factors, the second to the variability due to external factors that affect the population (liability), and the third to the differences in the internal conditions of the individuals (proneness) ([Irwin, 1968]). However, $\hat\rho < 2$ in the 14 years studied, so the fitted distributions have infinite variance.
4 Fit by an Extended Waring distribution
Trying to solve the aforementioned problem, the hotel data will be modelled by a tetraparametric GHD of type I ([Rodriguez Avi et al., 2003]) whose p.g.f. has the expression

with $\alpha, \beta, \gamma > 0$ and $0 < \lambda \leq 1$. This distribution is called the Extended Waring distribution, EWD ([Rodriguez Avi et al., 2006]), and includes, as limit cases, the UGWD(a, k, ρ) when $\alpha = a$, $\beta = k$, $\gamma = a + k + \rho$ and $\lambda = 1$, and the NB when $\gamma$ is equal to $\alpha$ or $\beta$ and $\lambda = 1 - p$. Moreover, like the UGWD, the EWD can be expressed as a mixture. Specifically:
$$X \sim \text{Poisson}(\Lambda) \ {\bigwedge}_{\Lambda} \ \text{Gamma}\!\left(\alpha, \frac{\lambda p}{1 - \lambda(1 - p)}\right) \ {\bigwedge}_{p} \ \text{GBeta}(\gamma - \alpha - \beta, \beta, \alpha, \lambda), \qquad (12)$$

where $\text{GBeta}(\gamma - \alpha - \beta, \beta, \alpha, \lambda)$ denotes a generalization of the beta distribution whose density function is
A n Application of the Extended Waring Distribution 41 AIC Year EWD(%P;7;4 xs p - d u e log-likelihood -620.22 77.46 1990 (0.3682,0.3672,1.4041,0.9736) 8.0857 0.1516 84.41 -680.94 1991 (2.6136,0.1464,3.8107,0.9821) 3.8337 0.5726 -753.58 86.50 1992 (3.1237,0.1707,4.5378,0.9870) 4.6440 0.5902 -761.28 85.16 1993 (2.2502,0.1919,3.5235,0.9840) 4.0410 0.6711 -793.15 89.73 1994 (1.1743,0.2906,2.4461,0.9852) 5.3354 0.5016 -818.62 91.55 1995 (0.5193,0.4994,1.7711,0.9742) 4.1903 0.5554 91.07 1996 (0.9950,0.3260,2.1547,0.9791) 4.9794 0.6625 -850.67 -870.40 92.96 1997 (0.9950,0.3415,2.1749,0.9792) 3.2506 0.8609 96.30 -875.10 1998 (0.6287,0.6284,2.2798,0.9859) 12.5357 0.0510 -903.93 96.30 1999 (0.6481,0.6469,2.3196,0.9863) 6.7368 0.4568 112.95 -962.42 2000 (0.7239,0.7239,2.5868,0.9905)12.7388 0.0787 -1003.1 107.82 2001 (0.7964,0.7955,2.8477,0.9930) 7.7258 0.3574 118.41 -1041.7 2002 (0.8238,0.8235,2.9047,0.9936)12.3208 0.1375 120.73 -1083.9 2003 (0.8692,0.8685,3.0368,0.9952) 12.8017 0.1189
Table 6. MLE, X2-goodnessof fit test, log-likelihood and AIC for the hotel data using the E W D model
Year  Total variance  % Randomness  % Liability  % Proneness
2003       -              3.12%        50.96%       45.91%
2002    40.4804           3.69%        55.83%       40.48%
2001    35.2344           3.94%        56.55%       39.31%
2000    31.4221           4.19%        58.80%       37.01%
1999    25.0598           4.76%        61.34%       33.90%
1998    23.1130           4.88%        61.81%       33.31%
1997    22.1181           5.25%        50.55%       44.20%
1996    21.3852           5.25%        50.45%       44.30%
1995    17.5759           5.87%        65.94%       28.19%
1994    21.4973           4.70%        49.28%       46.02%
1993    18.4643           5.17%        64.14%       30.69%
1992    18.7985           5.02%        70.71%       24.28%
1991    15.6626           5.30%        67.28%       27.42%
1990    12.9923           5.55%        71.96%       22.49%

Table 7. Partition of the variance for the hotel data
5 Conclusions
The EWD distribution is more flexible than the UGWD distribution for modelling overdispersed count data sets where the excess dispersion is assumed to be due to external factors (modelled by the gamma distribution) and internal factors (modelled by the beta distribution). So, there are data sets that are adequately modelled by the EWD but not by the UGWD ([Rodriguez Avi et al., 2007a]). When the fits provided by both models are acceptable and similar, the best model is the UGWD, since it has a smaller number of parameters, according to the principle of parsimony. However, the estimation results sometimes do not allow the inherent properties of the UGWD to be exploited, in particular the partition of the variance. In these cases, the EWD model is proposed as an excellent alternative to the UGWD model. This is the case of the example included in this work.
References

[IEA, 2006] Institute of Statistics of Andalusia (IEA). System of Multiterritorial Information of Andalusia (SIMA). Web access: http://www.juntadeandalucia.es/institutodeestadistica/sima/index2.htm, 2006.
[Irwin, 1968] J. O. Irwin. The generalized Waring distribution applied to accident theory. Journal of the Royal Statistical Society A, 131:205-207, 1968.
[Irwin, 1975] J. O. Irwin. The Generalized Waring Distribution. Part I. Journal of the Royal Statistical Society A, 138:18-31, 1975.
[Johnson et al., 2005] N. L. Johnson, A. W. Kemp, and S. Kotz. Univariate Discrete Distributions. Wiley, New Jersey, 2005.
[Rodriguez Avi et al., 2003] J. Rodríguez Avi, A. Conde Sánchez, A. J. Sáez Castillo, and M. J. Olmo Jiménez. Estimation of parameters in Gaussian hypergeometric distributions. Communications in Statistics, Theory and Methods, 32:1101-1118, 2003.
[Rodriguez Avi et al., 2007a] J. Rodríguez Avi, A. Conde Sánchez, A. J. Sáez Castillo, and M. J. Olmo Jiménez. A new extension of the Waring distribution. Computational Statistics and Data Analysis, in press, 2007.
[Rodriguez Avi et al., 2007b] J. Rodríguez Avi, A. Conde Sánchez, A. J. Sáez Castillo, and M. J. Olmo Jiménez. A generalization of the beta-binomial distribution. Journal of the Royal Statistical Society, Series C: Applied Statistics, 56: in press, 2007.
[Rodriguez Avi et al., 2006] J. Rodríguez Avi, A. Conde Sánchez, A. J. Sáez Castillo, and M. J. Olmo Jiménez. Extended Waring bivariate distribution. In R. Herrerias, J. Callejón and J. Herrerias, editors, Distribution Models Theory, pages 221-232, 2006.
[Winkelmann, 2003] R. Winkelmann. Econometric Analysis of Count Data. Springer, Heidelberg, 2003.
[Xekalaki, 1983a] E. Xekalaki. Infinite divisibility, completeness and regression properties of the univariate generalized Waring distribution. Annals of the Institute of Statistical Mathematics, 35:279-289, 1983.
[Xekalaki, 1983b] E. Xekalaki. The univariate generalized Waring distribution in relation to accident theory: Proneness, spells or contagion? Biometrics, 39:887-895, 1983.
Estimation of simple characteristics of samples from skewed and heavy-tailed distributions

Zdeněk Fabián

Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 2, 182 00 Prague, Czech Republic (e-mail: zdenek@cs.cas.cz)

Abstract. We present new characteristics of the central tendency and dispersion of data samples. They are constructed from estimates of parameters of underlying distributions and make possible an easy comparison of results obtained under different assumptions.

Keywords: scalar inference function, generalized moment method.
1 Introduction
The Johnson score, a scalar inference function, was introduced by [1,2] for a large class of continuous probability distributions. It was shown that the Johnson score moments exist under mild regularity conditions even in cases of distributions without mean and variance. The first moment describes the central tendency of the distribution, and the reciprocal value of the second moment the dispersion of the values around the central point. It seems that whereas the mean $m = \int x\,dF(x)$ and variance $\sigma^2 = \int x^2\,dF(x) - m^2$ compare the properties of distribution $F$ with the standard (with the normal distribution), the new characteristics have the ability to compare distributions within parametric families even when the distributions are skewed and/or heavy-tailed. Usually, having an idea about the type of the underlying distribution, the task is to estimate its parameters. We argue for a slight change of view: there are the sample Johnson mean and sample Johnson variance, which are to be estimated as characteristics of data samples taken from the distribution under consideration. They make it possible to compare results of estimation for various assumed distribution families parametrized in arbitrary ways.
2 Johnson score
Let us define the basic concept.

Definition 1. Let $F$ be a distribution with support $X = (a, b) \subseteq R$ and density $f$ continuously differentiable with respect to the variable. Let the mapping
$\eta : X \to R$ be defined by

$$\eta(x) = \begin{cases} x & \text{if } (a, b) = R \\ \log(x - a) & \text{if } -\infty < a < b = \infty \\ \log\dfrac{x - a}{b - x} & \text{if } -\infty < a < b < \infty, \end{cases} \qquad (1)$$

let the function $T(x)$ be given by

$$T(x) = -\frac{1}{f(x)}\,\frac{d}{dx}\left(\frac{f(x)}{\eta'(x)}\right), \qquad (2)$$

and let the solution $x^*$ of the equation

$$T(x) = 0 \qquad (3)$$

be unique. The function

$$S(x) = \eta'(x^*)\,T(x) \qquad (4)$$

will be called a Johnson score of distribution $F$. Since $\eta'(x) > 0$, $x^*$ is the solution of the equation

$$S(x) = 0 \qquad (5)$$

as well.
The philosophy behind Definition 1 is the following. Any distribution $F$ with interval support $X \neq R$ is viewed as a transformed prototype $G$ supported by $R$, that is,

$$F(x) = G(\eta(x)), \qquad x \in X. \qquad (6)$$
Mapping $\eta$ given by (1) is the Johnson transformation [3] adapted for an arbitrary interval support. Denoting by $g$ the density of $G$, the density of the transformed prototype (6) is

$$f(x) = g(\eta(x))\,\eta'(x), \qquad x \in X, \qquad (7)$$
where $\eta'(x)$ is the Jacobian of the transformation. Let $Q$ be the score function of $G$,

$$Q(y) = -\frac{g'(y)}{g(y)}. \qquad (8)$$
While the score function can be taken as a suitable inference function of prototypes, the transformed score function of the prototype,

$$T(x) = Q(\eta(x)), \qquad (9)$$

was found to be a relevant inference function of (6). It is termed by [4] a core function of $F$. Formula (2) follows from (9) by using (8) and (7), and shows
that the core function can be determined, without reference to the prototype, by a special type of differentiation of the density with respect to the variable. Let $G(y - \mu)$ be a prototype with location parameter $\mu \in R$ (expressing its central tendency). Consider the distribution which is the transformed $G(y - \mu)$ on $(a, b) \neq R$. Set

$$t = \eta^{-1}(\mu). \qquad (10)$$

The value (10) is called a Johnson parameter and will be considered, even in multiparameter cases, to express the central tendency of the transformed distribution, the density and core function of which are $f(x; t) = g(\eta(x) - \eta(t))\,\eta'(x)$ and $T(x; t) = Q(\eta(x) - \eta(t))$. It was shown by [4] that

$$\frac{\partial}{\partial t}\log f(x; t) = \eta'(t)\,T(x; t). \qquad (11)$$

The Johnson score $S(x) = \eta'(t)T(x; t)$ of the transformed distribution with Johnson parameter thus equals the likelihood score for this parameter. However, there are distributions without a Johnson parameter, and distributions without parameters at all. Johnson score (4) is a generalization of relation (11) to these distributions. The generalization consists of replacing $t$ in the term $\eta'(t)$ by the zero of the core function, which actually means replacing the location parameter of the prototype $G$ by the mode of $G$. Let us note that the solution of (3) is unique if $G$ is unimodal; in cases of multimodal distributions we consider $x^*$ to be the image of the global maximum of the density. Hence $x^*$ is the most important point of the transformed distribution, characterizing its central tendency, and the Johnson score can be interpreted as the 'score' for this point, which may or may not be a parameter of the distribution. Densities and Johnson scores of the distributions used throughout the paper are given in Table 1.
Table 1. Densities and Johnson scores of some model distributions (normal, lognormal, Weibull, Fréchet, gamma and beta-prime). Γ is the gamma function, B the beta function.
The use of the modified Johnson transformation (1) in Definition 1 leads to Johnson scores expressed by simple formulas for many commonly used
probability distributions. Moreover, it is the only transformation under which the prototype of the lognormal distribution is the normal distribution and the inference function of the uniform distribution is linear.
3 Johnson mean and Johnson variance
The Johnson score moments of a distribution $F$ with Johnson score $S$ are defined by

$$ES^k = \int S^k(x)\,dF(x), \qquad k = 1, 2, \ldots \qquad (12)$$
It can be easily seen that $ES = 0$. From the Cramér-Rao regularity conditions imposed on $F$ it follows that $0 < ES^2 < \infty$.

Definition 2. Let $F$ be a distribution regular in the Cramér-Rao sense with Johnson score $S$. The solution $x^*$ of the equation $S(x) = 0$ will be called a Johnson mean, and the value

$$\omega^2 = (ES^2)^{-1} \qquad (13)$$

a Johnson variance of $F$. By (4), for distributions with support $X = (0, \infty)$, (13) turns into

$$\omega^2 = (x^*)^2 / ET^2. \qquad (14)$$
Let us discuss the Johnson characteristics of the distributions from Table 1. The zero of $S(x) = Q(x)$ of the normal distribution is $x^* = \mu$. Since $EQ^2 = 1/\sigma^2$, the Johnson mean and Johnson variance are equal to the mean and variance. The parameter $t$ of the lognormal, Weibull and Fréchet distributions is the Johnson parameter. Fig. 1 shows densities and Johnson scores of three particular cases of the Weibull distribution with $c = 1$ (exponential distribution), $c = 2$ (Rayleigh distribution) and $c = 3$ (Maxwell distribution). The Johnson means of all three distributions are $x^* = 1$; the means (denoted by stars) have similar values, $m(1) = x^*$. The Fréchet distribution is a heavy-tailed distribution and its mean $m = t\,\Gamma(1 - 1/c)$ and variance $\sigma^2 = t^2[\Gamma(1 - 2/c) - \Gamma^2(1 - 1/c)]$ exist only if $c > 1$ and $c > 2$, respectively. Fig. 2 shows the densities and Johnson scores of Fréchet distributions with Johnson mean $x^* = 1$. Neither of the means (the stars) describes the position of the distribution on the $x$-axis; $m(1)$ does not exist. Fig. 3 shows densities of Fréchet distributions with various Johnson means. The variability of values around the Johnson mean is apparently similar for all four distributions; actually, they have the same Johnson variance $\omega^2 = 1$. The gamma and beta-prime distributions are examples of distributions without a Johnson parameter. By (2), the core function of the gamma distribution is $T(x) = \gamma x - \alpha$, so that $x^* = \alpha/\gamma$. Since $ET^2 = \alpha$, by (14)
Fig. 1. Densities (a) and Johnson scores (b) of Weibull distributions with $x^* = 1$, $c = 1, 2, 3$. The means $m(c)$ are denoted by stars. $m(1) = 1$, $m(2) = 0.885$, $m(3) = 0.893$.
$$\omega^2 = (x^*)^2/\alpha = \alpha/\gamma^2.$$

The Johnson mean and Johnson variance of the gamma distribution are thus equal to the mean and variance. On the other hand, the mean $m = p/(q-1)$ and variance

$$\sigma^2 = \frac{p(p + q - 1)}{(q - 1)^2(q - 2)} \qquad (15)$$

of the beta-prime distribution exist only if $q > 1$ and $q > 2$, respectively. By (2), $T(x) = (qx - p)/(x + 1)$; by (3), $x^* = p/q$; $ET^2 = pq/(p + q + 1)$; and the Johnson variance is, by (14),

$$\omega^2 = \frac{p(p + q + 1)}{q^3}. \qquad (16)$$

Note that (16) looks like (15) with a 'corrected denominator'. The standard deviation and the square root of the Johnson variance of the beta-prime distribution with $p = q$, as functions of $1/q$, are compared in Fig. 4. $\sigma$ blows up at $1/q = 1/2$, whereas the 'Johnson deviation' $\omega$ is comparable with the simulated average median absolute deviation.
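The moment $ET^2 = pq/(p+q+1)$ quoted above is easy to confirm by Monte Carlo; $p$ and $q$ below are example values.

```python
import numpy as np
from scipy.stats import betaprime

# Monte Carlo check of ET^2 = pq/(p+q+1) for the beta-prime core function
# T(x) = (qx - p)/(x + 1).  p, q are illustrative parameter values.
p, q = 3.0, 4.0
x = betaprime.rvs(p, q, size=200_000, random_state=1)
T = (q * x - p) / (x + 1.0)
print(np.mean(T**2), p * q / (p + q + 1))   # both ~ 1.5
```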
Fig. 2. Densities (a) and Johnson scores (b) of Fréchet distributions with $x^* = 1$, $c = 1, 1.5, 2$. The means $m(c)$ are denoted by stars. $m(1)$ does not exist, $m(1.5) = 2.68$, $m(2) = 1.77$.
Fig. 3. Densities of Fréchet distributions, $t = 1, 2, 3, 4$, $\omega = 1$.
4 Estimates
Let $X_1, \ldots, X_n$ be random variables i.i.d. according to $F_\theta$, $\theta \in \Theta$, $\Theta \subset R^m$, with unknown $\theta$, and let $x_1, \ldots, x_n$ be their observed values. The structure of the parameters of various distributions, a consequence of the historical development of mathematical statistics, exhibits a chaotic picture. It is difficult to compare results of estimation for distribution families parametrized in different ways.
Fig. 4. Beta-prime distribution. 1 - $\sigma$, 2 - $\omega$, 3 - simulated MAD.
Both the Johnson mean $x^* : S(x; \theta) = 0$ and the Johnson variance $\omega^2 = 1/ES^2(\theta)$ are functions of $\theta$ and can be constructed from the maximum likelihood estimate of $\theta$. In what follows, AN means 'asymptotically normal'. Since $ES^2(\theta) > 0$, the numbers $\hat{x}^*_n = x^*(\hat\theta_{ML})$ and $\hat\omega^2_n = \omega^2(\hat\theta_{ML})$ characterize a 'center' and the dispersion of the sample. Their asymptotic behavior can be easily established by using the delta method theorem, saying that if $\hat\theta$ is AN$(\theta, \sigma^2)$ and $\varphi(\theta)$ is differentiable at $\theta$ with $\varphi'(\theta) \neq 0$, then $\varphi(\hat\theta)$ is AN$(\varphi(\theta), [\varphi'(\theta)]^2\sigma^2)$ (Corollary to Theorem A, [5], p. 122). Another possibility to estimate $x^*$ and $\omega$ is discussed in [1]. Unlike the usual moments, the sample versions of the Johnson score moments cannot be determined without an assumption about the underlying distribution family. On the other hand, by substituting the empirical distribution function into (12), a system of equations
appears to be an alternative to the maximum likelihood equations in the whole range of the parameters. The estimates from (17) were shown to be asymptotically normal and, in cases of families with bounded Johnson scores, robust and with relative efficiencies near one. The first equation of (17) for distributions with a Johnson parameter is the maximum likelihood equation for this parameter. New results are obtained for distributions without this parameter. Thus, the first equation of (17) for the gamma distribution is $\sum_{i=1}^{n}(\gamma x_i - \alpha) = 0$, from which the estimate of $x^* = \alpha/\gamma$ is $\hat{x}^* = n^{-1}\sum_{i=1}^{n} x_i$. From the first equation of (17) for the beta-prime distribution, $\sum_{i=1}^{n}(qx_i - p)/(x_i + 1) = 0$, the estimate of $x^* = p/q$ is

$$\hat{x}^* = \frac{\sum_{i=1}^{n} x_i/(x_i + 1)}{\sum_{i=1}^{n} 1/(x_i + 1)}.$$
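The two estimates of the Johnson mean just derived differ when the assumed family is wrong, as the following sketch illustrates (simulated beta-prime data with example parameters):

```python
import numpy as np
from scipy.stats import betaprime

# The same sample gives different estimates of the Johnson mean x*
# depending on the assumed family (gamma vs. beta-prime).  p = 3, q = 4
# are example values; the data are simulated, not the paper's samples.
x = betaprime.rvs(3.0, 4.0, size=1000, random_state=2)
x_star_gamma = x.mean()                                    # gamma assumed
x_star_bprime = np.sum(x / (x + 1)) / np.sum(1 / (x + 1))  # beta-prime assumed
print(x_star_gamma, x_star_bprime)   # true x* = p/q = 0.75 under beta-prime
```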
In a simulation study, samples of length 100 were generated consecutively from each distribution listed in the rows of Table 2, each with values of $\theta$ giv-
ing $x^*(\theta) = 1$ and $\omega(\theta) = 1.118$. Both $x^*$ and $\omega$ were estimated under the assumption of each distribution listed in the column headings of Table 2, where the average values over 5000 samples are summarized. It is apparent that erroneous assumptions often lead to unacceptable estimates (note, however, the similar results obtained under the assumptions of the lognormal and beta-prime distributions). By the use of the estimates of the Johnson mean and Johnson variance, it is easy to compare the results of estimation for various assumed distribution families parametrized in arbitrary ways.
Table 2. Estimates of Johnson mean and Johnson deviation (assumed families: Weibull, gamma, lognormal, beta-prime, Fréchet).
Acknowledgements. The work was supported by GA ASCR under grant No. 1ET 400300513.

References

[1] Fabián, Z. (2006). Johnson point and Johnson variance. Proc. Prague Stochastics 2006 (eds. Hušková and Janžura), Matfyzpress, 354-363.
[2] Fabián, Z. (2007). New measures of central tendency and variability of continuous distributions. To appear in Commun. Stat., Theory Methods.
[3] Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36:149-176.
[4] Fabián, Z. (2001). Induced cores and their use in robust parametric estimation. Commun. Stat., Theory Methods, 30(3):537-556.
[5] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
Estimating robustness and parameter distribution in compartmental models of neurons

Noam Peled and Alon Korngreen

The Mina and Everard Goodman Faculty of Life Sciences and the Leslie and Susan Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan, Israel (e-mail: korngra@mail.biu.ac.il)

Abstract. Our understanding of the input-output function of single neurons has been advanced by biophysically accurate multi-compartmental models. The large number of free parameters in these models requires the use of automated approaches for finding their optimal values in a very multi-dimensional parameter plane. Due to inherent noise in the measuring equipment and surroundings, determination of the accuracy of the obtained parameters is near impossible. Here we show that finding the parameter distributions via Monte Carlo simulations reveals the sensitivity of each parameter to noise. This method allows for the reduction of the complexity of the parameter plane, which in turn enables finding the sensitivity of the model to each parameter and the sensitivity of each parameter to noise.

Keywords: Multi-compartmental models, Monte Carlo, Robustness, Noise sensitivity, Global minimum
1 Introduction
The properties and functions of neurons in the CNS have been intensively studied over the past decade [Johnston et al., 2003; Migliore and Shepherd, 2002; Stuart et al., 1999], providing exciting new information that revived the discussion about the computational properties of single neurons [Hausser and Mel, 2003; Mel, 2002; Poirazi and Mel, 2001; Polsky et al., 2004]. Clearly, understanding complex neuronal computations requires functional models which must take into account several types of neuronal properties, e.g., voltage-gated conductances, conductance distributions, etc. This complexity resulted in many parameters in neuronal models being tuned by hand [Rhodes and Llinás, 2001; Schaefer et al., 2003]. However, the large number of free parameters causes these models to be ill constrained [Rhodes and Llinás, 2001; Schaefer et al., 2003]. Previous studies have shown that parameter search methods can be effective in finding matches between single-neuron models and a target data set [Eichler West, 1996; Vanier and Bower, 1999]. Nevertheless, not all parameter search methods perform equally well, and the relative performance of the different methods depends on the model being optimized. Stochastic algorithms such as
simulated annealing [Kirkpatrick et al., 1983] and genetic algorithms (GAs) [Mitchell, 1996] have been found to be the most successful in constraining single-neuron compartmental models, because such algorithms are less greedy and rarely fall into local minima, unlike gradient-descent methods that easily get stuck in a local minimum of the parameter space [Vanier and Bower, 1999]. It has been shown that for a search with a large number of free parameters, the GA method may be the most effective [Keren et al., 2005]. Additionally, the intrinsic parallelism of this method makes it very well suited for implementation on parallel computers [Eichler West, 1996; Keren et al., 2005]. Generally, the model of a neuron can be defined as a function with n parameters describing such properties as the potassium and sodium conductance densities or the capacitance of the plasma membrane. The output of such a function will be the output of electrophysiological measurements, i.e., the current or the voltage of the measured compartment. Over the years, the robustness of neuronal models has been discussed extensively [Schneidman et al., 2000; Goldman et al., 2001; Marder and Goaillard, 2006]. However, to this day the robustness of neuronal compartmental models remains an extremely fluid concept, as no rigorous definition of this property for neuronal models has ever been proposed. In this work we defined robustness as the model's sensitivity to small fluctuations of its parameters. We next addressed the problem of estimating parameters in compartmental models in the face of noise. Specifically, the parameter distributions of these models were calculated. These distributions were then used in calculating indices indicating the parameters' sensitivity to noise and the sensitivity of the model to small fluctuations in the parameters.
2 Methods
Electrophysiological measurements were used to produce the output of the neuronal model. Our aim is to reveal the model parameters using a form of reverse engineering based on the given output. For that purpose some new variables were defined:
DS (Data Set): the model output.
DS_true: the desirable model output, i.e., the output of an electrophysiological measurement.
a: the vector of parameter values {a_1, a_2, ..., a_n}.
a_true: the vector of parameter values yielding DS_true.
To solve the reverse engineering problem a metric on the parameter space was defined, calculating the distance between a given DS and DS_true, yielding zero when the DS is exactly DS_true, using the χ² criterion.
Thus, we obtain an N-dimensional function which yields zero at the global minimum, at a_true. The complexity of the function increases with the number of free parameters, resulting in numerous local minima [Achard and De Schutter, 2006]. To find the global minimum in such a parameter space, stochastic optimization algorithms may be of use [Vanier and Bower, 1999]. For this work a genetic optimization algorithm was used [Mitchell, 1996; Keren et al., 2005]. Electrophysiological measurements of a neuron vary in output at different points in time due to inherent noise; this noise can be divided into two categories: a) equipment noise, and b) behavioral noise of the neuron itself; the latter can be described as "stochastic" as long as its true nature is unknown. In this work we dealt only with the equipment noise, which was assumed to be random white Gaussian, and with deterministic models, where the nature of the noise inserted into the DS is known. Thus, after defining the noise, synthetic data sets were simulated by adding white Gaussian noise to DS_true (Fig. 1). Two additional variables were defined to deal with the noise:
DS_i^s: the ith synthetic data set.
a_i^s: the vector of parameter values which is the global minimum in the parameter space, based on DS_i^s.
The creation of the synthetic data sets enables us to use Monte Carlo simulation (schematically displayed in Figure 1) [Press et al., 2002]. By using the set {a_i^s} it was possible to calculate the parameter distributions.
The distribution generator was the noise, as without adding the noise to DS_true the parameter distributions would be built from only one value. The wider the parameter distribution, the greater is the parameter's sensitivity to the noise. A dispersion measure, the VMR (Variance to Mean Ratio), was applied to compare the widths of several distributions. The VMR was interpreted as an index indicating a parameter's sensitivity to noise. Furthermore, in order to estimate the model robustness, at each minimum the algorithm converged to, the partial derivative (PDer) of the recording with respect to each parameter was calculated numerically. The PDers indicate how sensitive the distance function is to small changes in the parameter values. After summing all the PDers of all parameters in all the a_i^s, a set of the average influence of each parameter was calculated ({pder_i}).
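A compact sketch of this Monte Carlo procedure is given below. It reuses the hypothetical model, distance and ga_search from the previous sketch, and the noise level and finite-difference step are illustrative choices, not values taken from the paper's code.

```python
def monte_carlo_fits(t, target_true, bounds, n_runs=100, noise_sd=0.5):
    # Refit the model to many noisy copies of the "true" recording.
    fits = []
    for _ in range(n_runs):
        synthetic = target_true + rng.normal(0, noise_sd, target_true.shape)
        fits.append(ga_search(t, synthetic, bounds))
    return np.array(fits)                      # rows: the a_i^s vectors

def vmr(fits):
    # Variance-to-mean ratio of each parameter's marginal distribution.
    return fits.var(axis=0) / fits.mean(axis=0)

def mean_pder(fits, t, target_true, h=1e-3):
    # Numerical partial derivative of the distance function at each fitted
    # minimum, averaged over all Monte Carlo fits.
    pders = np.zeros(fits.shape[1])
    for a in fits:
        for j in range(len(a)):
            ap, am = a.copy(), a.copy()
            ap[j] += h
            am[j] -= h
            pders[j] += abs(distance(ap, t, target_true)
                            - distance(am, t, target_true)) / (2 * h)
    return pders / len(fits)
```

Wide marginal distributions (large VMR) flag parameters that are sensitive to noise; large average PDer values flag parameters that strongly constrain the fit.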
Figure 1: The fitted parameters from an experiment are used as surrogates for the true parameters. White Gaussian noise is used to simulate many synthetic data sets. Each of these is analyzed to obtain its fitted parameters.
The reverse engineering problem, based on a synthetic data set, is not a well-defined problem. The synthetic data set is not the true output of a model, due to the addition of noise. As a result, no parameter vector exists that nullifies the distance function, and the function never yields 0. The function infimum can be defined as:
inf_i = dist(DS_true, DS_i^s).
For each minimum the algorithm converged to, there is a difference between each parameter value and its true value. This difference can be defined as the parameter bias:
bias_i := a_i^s − a_true.
Thus, for a parameter displaying a relatively large PDer value, small fluctuations from a_i^s will significantly change the distance function. Therefore, it is possible to suggest that a large PDer value means a high probability of a small bias. Finally,
in order to scale the models a new variable was defined, the Information Index for Model Sensitivity (IIMS):

IIMS = − Σ_{i=1}^{n} p̃der_i ln(p̃der_i) / ln(n),   p̃der_i = pder_i / Σ_{j=1}^{n} pder_j,

where pder_i is the ith component of {pder_i}. This equation, based on information theory, calculates the ratio between the entropy of the PDer distribution and that of a uniform distribution. When the distribution is uniform, the model is sensitive to all the parameters alike. Therefore, IIMS is a factor describing the model's sensitivity to its parameters: 0 ≤ IIMS ≤ 1; when IIMS = 1 the dependence on all the parameters exhibits the same tendency, and when IIMS = 0 the model depends on one parameter alone.
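Under the normalized-entropy reading of the IIMS reconstructed above (an assumption on our part, chosen to match the stated 0–1 range and the two limiting cases), the index can be computed as follows.

```python
import numpy as np

def iims(pder):
    # Normalized entropy of the average-PDer profile: 1 = uniform
    # dependence on all parameters, 0 = dependence on a single parameter.
    p = np.asarray(pder, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum() / np.log(len(pder)))

print(iims([1, 1, 1, 1]))             # -> 1.0
print(iims([1, 0, 0, 0]))             # -> 0.0
```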
3 Results
A model of a soma was constructed, with one dendrite connected to the soma and two more connected to the first one, using the simulation environment NEURON [Carnevale and Hines, 2006]. Each compartment included the basic Hodgkin and Huxley mechanism of potassium and sodium channels [Hodgkin and Huxley, 1952]. Each compartment also contained 3 types of parameters (12 parameters for the complete model):
E_l: leakage reversal potential
g_Na: maximum sodium channel conductance
g_K: maximum potassium channel conductance
DS_true was calculated using these parameter values: E_l = −54.3 mV; g_Na = 0.12 S/cm²; g_K = 0.036 S/cm². A step current of 0.1 nA amplitude and 90 ms duration was injected into the soma, and the voltage was recorded from the soma. After analysis of electrophysiological recordings of L5 pyramidal neurons, the noise amplitude turned out to be around ±0.5 mV. Figure 2 displays the parameter distributions obtained after more than 100 runs of the Monte Carlo simulation. The VMR and PDer calculated from these distributions are shown in figure 3. As seen in figure 3A, E_l, the leakage reversal potential, in each compartment is highly sensitive to noise in comparison to the other parameters. The sensitivity to noise increases with the compartment
Figure 2: Plot of the 12 parameter distributions in a Hodgkin-Huxley neuron with a soma and 3 dendrites.
Figure 3: Plot of the VMR and PDer of the 12-parameter model. Each set of 3 parameters is in a different compartment (s: soma, d_n: dendrite number n). (A) E_l has the largest VMR in each compartment, and it increases with the compartment distance from the soma. (B) g_Na has the largest PDer in each compartment, significantly, and it decreases with the compartment distance from the soma.
distance from the soma. Figure 3B shows that g_Na, the maximum sodium channel conductance density, is the most influential parameter in each compartment. The influence decreases with the compartment distance from the soma. The IIMS is 0.5646; as we can see, almost all the dependence of the model is on a set of half of the parameters.
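The toy morphology used for these results can be reproduced with NEURON's Python interface roughly as follows. Section lengths and diameters are not given in the text, so the geometry here is an arbitrary placeholder.

```python
from neuron import h
h.load_file("stdrun.hoc")

# Soma with one child dendrite, which has two further children.
soma = h.Section(name="soma")
d1 = h.Section(name="d1"); d1.connect(soma(1))
d2 = h.Section(name="d2"); d2.connect(d1(1))
d3 = h.Section(name="d3"); d3.connect(d1(1))

for sec in [soma, d1, d2, d3]:
    sec.L, sec.diam = 50, 2          # placeholder geometry
    sec.insert("hh")                  # Hodgkin-Huxley Na, K and leak channels
    for seg in sec:
        seg.hh.el = -54.3             # mV
        seg.hh.gnabar = 0.12          # S/cm^2
        seg.hh.gkbar = 0.036          # S/cm^2

stim = h.IClamp(soma(0.5))
stim.delay, stim.dur, stim.amp = 5, 90, 0.1   # 0.1 nA step for 90 ms

v = h.Vector().record(soma(0.5)._ref_v)
t = h.Vector().record(h._ref_t)
h.finitialize(-65)
h.continuerun(120)                    # recorded soma voltage plays the role of DS_true
```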
4 Discussion
We observed that E_l has a relatively large sensitivity to noise. Therefore, when measuring E_l in electrophysiological experiments, numerous measurements must be considered in order to extract meaningful results. The extreme width of the E_l spectrum may indicate that the leakage conductance behaves stochastically in the neuron itself. We also observed that g_Na is relatively the most influential parameter on the behavior of the neuron. It is most likely that, when trying to converge to a global minimum for any model, this parameter will have a small bias. In this work we built an infrastructure which calculates the parameters' sensitivity to noise and the robustness of the model. This infrastructure is not limited to computational neuroscience alone, but can be adapted to any field, after determining the N-parameter model, the scalar output and the noise component. Further investigations should take into account the dependence of the distribution {a_i^s} on the distance function used; e.g., the distribution can change when using another distance function, such as ISI (chi-square of the inter-spike intervals), or a more complex one [Keren et al., 2005]. Additionally, the distribution depends on the characteristics of the noise inserted into the DS; inserting more complex noise, non-Gaussian and/or non-white, may lead to different results.
References
[Achard and De Schutter, 2006] P. Achard and E. De Schutter. Complex parameter landscape for a complex neuron model. PLoS Computational Biology, 2(7), 2006.
[Carnevale and Hines, 2006] N.T. Carnevale and M. Hines. The NEURON Book. Cambridge University Press, 2006.
[Eichler West, 1996] R.M. Eichler West. On the Development and Interpretation of Parameter Manifolds for Biophysically Robust Compartmental Models of CA3 Hippocampal Neurons (PhD thesis). Minneapolis, MN: University of Minnesota, 1996.
[Goldman et al., 2001] M.S. Goldman, J. Golowasch, E. Marder, and L.F. Abbott. Global structure, robustness, and modulation of neuronal models. Journal of Neuroscience, 21(14), 2001.
[Hodgkin and Huxley, 1952] A.L. Hodgkin and A.F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 117(4): 500-544, 1952.
[Johnston et al., 2003] D. Johnston, B.R. Christie, A. Frick, R. Gray, D.A. Hoffman, L.K. Schexnayder, S. Watanabe, and L.L. Yuan. Active dendrites, potassium channels and synaptic plasticity. Philos Trans R Soc Lond B Biol Sci 358: 667-674, 2003.
[Keren et al., 2005] N. Keren, N. Peled, and A. Korngreen. Constraining compartmental models using multiple voltage recordings and genetic algorithms. Journal of Neurophysiology 94: 3730-3742, 2005.
[Kirkpatrick et al., 1983] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science 220: 671-680, 1983.
[Marder and Goaillard, 2006] E. Marder and J.M. Goaillard. Variability, compensation and homeostasis in neuron and network function. Nature Reviews Neuroscience 7: 563-574, 2006.
[Mel, 2002] B.W. Mel. What the synapse tells the neuron. Science 295: 1845-1846, 2002.
[Mitchell, 1996] M. Mitchell. An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press, 1996.
[Migliore and Shepherd, 2002] M. Migliore and G.M. Shepherd. Emerging rules for the distributions of active dendritic conductances. Nat Rev Neurosci 3: 362-370, 2002.
[Poirazi and Mel, 2001] P. Poirazi and B.W. Mel. Impact of active dendrites and structural plasticity on the memory capacity of neural tissue. Neuron 29: 779-796, 2001.
[Polsky et al., 2004] A. Polsky, B.W. Mel, and J. Schiller. Computational subunits in thin dendrites of pyramidal cells. Nat Neurosci 7: 621-627, 2004.
[Rhodes and Llinás, 2001] P.A. Rhodes and R.R. Llinás. Apical tuft input efficacy in layer 5 pyramidal cells from rat visual cortex. J Physiol 536: 167-187, 2001.
[Schaefer et al., 2003] A.T. Schaefer, M.E. Larkum, B. Sakmann, and A. Roth. Coincidence detection in pyramidal neurons is tuned by their dendritic branching pattern. J Neurophysiol 89: 3143-3154, 2003.
[Schneidman et al., 2000] E. Schneidman, I. Segev, and N. Tishby. Information capacity and robustness of stochastic neuron models. MIT Press, 2000.
[Stuart et al., 1999] G. Stuart, N. Spruston, and M. Hausser. Dendrites. Oxford, UK: Oxford University Press, 1999.
[Vanier and Bower, 1999] M.C. Vanier and J.M. Bower. A comparative survey of automated parameter-search methods for compartmental neural models. J Comput Neurosci 7(2): 149-171, 1999.
[Press et al., 2002] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C++. Cambridge University Press, 2002.
On the Stability of Queues with Negative Arrivals
Tewfik Kernane
Department of Probability and Statistics, Faculty of Mathematics, University of Sciences and Technology USTHB, BP 32 El-Alia, Algiers, Algeria (e-mail: tkernane@gmail.com)
Abstract. We obtain in this paper stability and instability conditions for some single server queues with negative customers with batch or work removals. A negative customer removes a group of customers or an amount of work, if present, upon its arrival. Under general assumptions of stationary ergodic arrivals of negative customers or services of regular customers, we obtain sufficient stability conditions by strong coupling convergence of the process modeling the dynamics of the system to a stationary ergodic regime. We also obtain instability conditions by convergence in distribution to an improper limiting distribution. The systems considered involve batch or amount-of-work removals, removals of customers from the tail of the queue, removals of customers in service, and batches of arrivals and services of regular customers.
Keywords: Queues, Negative Customers, Stability.
1 Introduction
This paper deals with the problem of stability for single server queueing systems with regular and negative customers. If a regular (or positive) customer enters the system, he takes a place in the waiting zone and waits to be served according to a specified service discipline. On the other hand, if a negative arrival occurs, it removes one or a batch of regular customers if present. The notion of negative customers was originated by Gelenbe [Gelenbe, 1991]. We consider various models of this type, including batch or work removals, batches of arrivals and services, and removals of customers in service or at the tail of the queue. These special models have been studied for their practical interest in telecommunication and computer systems. They may model virus infections in computer systems, deletion of transactions in databases, amongst others. The analysis of stability in queueing systems is of great importance, since all other performance characteristics depend on whether or not the system is stable. The stability conditions for some models of negative customers have been derived by [Gelenbe et al., 1991] under i.i.d. assumptions on service times and inter-arrivals of negative customers, with single deletion of regular customers. In [Kernane and Aissani, 2006], the sufficient stability
condition for a retrial queue with negative customers was obtained under stationary ergodic service times for regular customers. The main approach used here for deriving the strong coupling convergence to a stationary regime was also previously used for retrial queues in [Altman and Borovkov, 1997] and [Kernane and Aissani, 2006]. Harrison and Pitel [Harrison and Pitel, 1996] generalized the theory on negative arrivals to the M/G/1 queue. Several authors generalized this concept, allowing a negative arrival to remove a batch of customers [Chao and Pinedo, 1995], a random amount of work [Boucherie and Boxma, 1995], or indeed all the work in the system ([Chao, 1995], [Jain and Sigman, 1996]). Other studies considered negative customers that accumulate at the nodes of networks but do not need service from the server ([Boucherie and Van Dijk, 1994], [Henderson, 1993], and [Henderson et al., 1994]). Another recent interesting generalization is that of [Zhu and Zhang, 2004], who considered an M/G/1 queue with positive and negative customers that cancel each other out, where the server provides service to either a positive or a negative customer. Our contribution in this paper is based on constructing explicit stochastic recursive sequences which model the dynamics of the considered systems and allow us to obtain explicit stability and instability conditions under general assumptions of stationary ergodic (without independence assumption) service times or inter-arrivals of negative or positive customers. We use the notion of strong coupling convergence to the stationary ergodic regime, which implies coupling convergence and then convergence in total variation. In the subsequent section, we study the stability and instability of models with removal of customers in service under general stationary and ergodic inter-arrivals of negative customers or service times of regular customers; we also allow arrivals in groups of random sizes and service in random batches. In section 3, we consider removals of customers at the tail of the queue and batches of arrivals and services under general stationary and ergodic service times or inter-arrival times of negative customers. We also generalize the models to the case where a negative arrival not only removes regular customers but also breaks the server, and we consider a retrial queue with batches of arrivals, services and removals. In section 4, we consider a system with work removal and batches of arrivals and services.
2 Removal of Customers in Service
2.1 Stationary ergodic arrivals of negative customers
Consider a single server queue at which the arrivals of regular customers follow a Poisson process with rate λ⁺, and at each arrival epoch a random batch of size a_i, with a_i an i.i.d. sequence with mean ā, enters the system. Denote by τ_i⁺ the inter-arrival times of regular customers. The waiting zone is of infinite capacity, and we assume work-conserving disciplines such as FIFO, LIFO or random access to service. In this section, we consider the case when a negative arrival removes the customers (if any) in service (RCS). A negative arrival has no effect on an idle system. Negative customers arrive at times t_n, n = 0, 1, ..., and we denote by τ_n⁻ = t_{n+1} − t_n the inter-arrival times of negative customers. We assume that τ_n⁻ is a stationary (in the strict sense) and ergodic sequence (without independence assumption). Ergodicity means essentially that time averages converge to constants almost surely. Service of regular customers is provided in batches of random size b_j, with b_j an i.i.d. sequence with mean b̄, and the time S_j required to serve them is exponentially distributed with rate μ⁺. The negative arrival removes the batch b_j already in service. Denote by N_{λ⁺}(t) (respectively N_{μ⁺}(t)) the counting Poisson process with rate λ⁺ (resp. μ⁺), which counts the number of arrivals of regular customers (resp. services) during a time interval (0, t). We assume that the input flows of customers, sizes of batches and service times are mutually independent. Let Q(t) be the number of customers present in the system at time t. We consider the process Q_n embedded immediately after time t_n (i.e., Q_n = Q(t_n+)). The process Q_n can be represented as a stochastic recursive sequence (SRS) as follows:

Q_{n+1} = (Q_n + Σ_{i=1}^{N_{λ⁺}(τ_n⁻)} a_i − Σ_{i=1}^{N_{μ⁺}(τ_n⁻)} b_i − b_n)⁺,   (1)
where (z)⁺ = max(0, z). Denote by

ξ_n = Σ_{i=1}^{N_{λ⁺}(τ_n⁻)} a_i − Σ_{i=1}^{N_{μ⁺}(τ_n⁻)} b_i − b_n.   (2)

Put V_0 = 0 and V_n = Σ_{i=1}^{n} ξ_{−i}. The following proposition gives a stability condition, by strong coupling convergence to a unique stationary ergodic regime, and an instability condition, by convergence in distribution to an improper limiting sequence.

Proposition 1. If (λ⁺ā − μ⁺b̄)Eτ₁⁻ < b̄, then the process Q_n is strong coupling convergent to a unique stationary ergodic regime Q̄_n such that Q̄_0 = sup_{n≥0} V_n. If (λ⁺ā − μ⁺b̄)Eτ₁⁻ > b̄, then the process Q_n converges in distribution to an improper limiting sequence.
Proof. From Wald's equation and the memoryless property of the Poisson process,

Eξ_n = λ⁺ā Eτ₁⁻ − μ⁺b̄ Eτ₁⁻ − b̄.   (3)

Since τ_n⁻ is a stationary and ergodic sequence, ξ_n, which is generated by τ_n⁻ (and other i.i.d. sequences), is also a stationary ergodic sequence (see
[Kernane and Aissani, 2006]). If the condition (λ⁺ā − μ⁺b̄)Eτ₁⁻ < b̄ is satisfied, then Eξ_n < 0. From this and Example 11.1 of [Borovkov, 1998], there exists a stationary sequence of renovating events with positive probability for Q_n. Using Theorem 11.4 of [Borovkov, 1998], the SRS Q_n is strong coupling convergent to a unique stationary sequence Q̄_n = U^n Q̄_0, where Q̄_0 is measurable with respect to the σ-algebra generated by τ⁻ and U is the shift transformation of random variables generated by τ⁻ (i.e., τ⁻_{i+1} = U τ⁻_i). Since τ⁻ is ergodic, Q̄_n is also ergodic, being a U-shifted sequence. From Loynes' scheme [Loynes, 1962] (for more details see [Gyorfi and Morvai, 2002]) it follows that the SRS is coupling convergent to the sequence Q̄_n, where Q̄_0 = sup_{n≥0} V_n. Since strong coupling convergence implies coupling, the result follows. For the instability condition, if (λ⁺ā − μ⁺b̄)Eτ₁⁻ > b̄, then Eξ_n > 0. It is well known that for an SRS of the form Q_{n+1} = (Q_n + ξ_n)⁺, the condition Eξ_n > 0 implies that the SRS converges in distribution to an improper limiting sequence (see Theorem 1.7 in [Borovkov, 1976]).
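The drift condition in Proposition 1 is easy to probe numerically by iterating the SRS (1) directly. The sketch below uses arbitrary illustrative rates, i.i.d. exponential negative inter-arrivals (a special case of the stationary ergodic assumption), and geometric batch sizes with the stated means (an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_rcs(lam_pos, mu_pos, lam_neg, a_mean, b_mean, n_steps=200_000):
    # Iterate Q_{n+1} = (Q_n + arrivals - served batches - removed batch)^+
    # between consecutive negative arrivals.
    q = 0
    trace = np.empty(n_steps)
    for n in range(n_steps):
        tau = rng.exponential(1 / lam_neg)            # tau_n^-
        arrivals = rng.poisson(lam_pos * tau)
        services = rng.poisson(mu_pos * tau)
        xi = (rng.geometric(1 / a_mean, arrivals).sum()
              - rng.geometric(1 / b_mean, services).sum()
              - rng.geometric(1 / b_mean))
        q = max(0, q + xi)
        trace[n] = q
    return trace

# Stability check: (lam+ a_bar - mu+ b_bar) E tau^- = (2*2 - 3*2)*1 = -2 < b_bar = 2
trace = simulate_rcs(lam_pos=2.0, mu_pos=3.0, lam_neg=1.0, a_mean=2.0, b_mean=2.0)
print("mean queue length:", trace[100_000:].mean())
```

A bounded, stabilizing trace is consistent with the stability half of the proposition; reversing the drift sign makes the trace diverge.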
Remark 1. In the case of i.i.d. service times (with general distribution B) and N(t) the counting process of a non-delayed renewal process with cycle lengths i.i.d. B, we obtain a more general result, namely the stability condition λ⁺ā Eτ₁⁻ − b̄ EN(τ₁⁻) < b̄.

2.2 Stationary ergodic service times
We assume now that the service times S_n are stationary and ergodic and that arrivals of negative customers occur according to a Poisson process with rate λ⁻. Since the service times S_n are stationary, they have the same distribution function B(t), which we assume has a Laplace-Stieltjes transform (LST) B*(s) = ∫₀^∞ B(t) exp(−st) dt. If it possesses a density b(t), we denote by β*(s) its corresponding LST. Define s_n as the instant at which the (n−1)st service time ends. Service times, input flows of customers and sizes of batches are assumed mutually independent. We consider the process Q_n embedded immediately after time s_n (i.e., Q_n = Q(s_n+)). The process Q_n satisfies the following recursion:

Q_{n+1} = (Q_n + Σ_{i=1}^{N_{λ⁺}(min(S_n, τ_n⁻))} a_i − b_n)⁺.   (4)

In this case Eξ_n = λ⁺ā E(min(S₁, τ₁⁻)) − b̄. Since τ₁⁻ is exponentially distributed with rate λ⁻, we have

E(min(S₁, τ₁⁻)) = ∫₀^∞ (1 − B(s)) exp(−sλ⁻) ds = (1 − λ⁻B*(λ⁻)) / λ⁻,   (5)
or, in terms of the density, E(min(S₁, τ₁⁻)) = (1 − β*(λ⁻))/λ⁻. We have the following proposition, whose proof is omitted since it is similar to that of Proposition 1.
Proposition 2. If λ⁺ā(1 − β*(λ⁻)) < λ⁻b̄, then the process Q_n is strong coupling convergent to a unique stationary ergodic regime Q̄_n such that Q̄_0 = sup_{n≥0} V_n. If λ⁺ā(1 − β*(λ⁻)) > λ⁻b̄, then the process Q_n converges in distribution to an improper limiting sequence.
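For a concrete service-time law the LST in Proposition 2 is available in closed form, so the condition can be checked directly; the sketch below does so for exponential and Erlang-2 services with made-up rates.

```python
def stable_rcs(lam_pos, a_bar, b_bar, lam_neg, beta_star):
    # Proposition 2: lam+ * a_bar * (1 - beta*(lam-)) < lam- * b_bar.
    return lam_pos * a_bar * (1 - beta_star(lam_neg)) < lam_neg * b_bar

mu = 3.0
exp_lst = lambda s: mu / (mu + s)                 # exponential(mu) service
erlang2_lst = lambda s: (mu / (mu + s)) ** 2      # Erlang(2, mu) service

print(stable_rcs(2.0, 1.0, 1.0, 1.5, exp_lst))
print(stable_rcs(2.0, 1.0, 1.0, 1.5, erlang2_lst))
```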
Remark 2. Assuming i.i.d. inter-arrival times of negative customers, exponentially distributed with rate λ⁻ = 1/Eτ₁⁻, and single arrivals and services, so that ā = 1 and b̄ = 1, we obtain from Proposition 1 the well-known stability condition for the M/M/1 queue with negative customers (see [Harrison and Pitel, 1993]):

(λ⁺ − λ⁻)(1/μ⁺) < 1.   (6)

In Proposition 2, if we assume an i.i.d. sequence of service times exponentially distributed with rate μ⁺ = 1/ES₁, we have β*(λ⁻) = μ⁺/(μ⁺ + λ⁻), and condition (6) holds true for this case as well.
3 Removal of Customers at the Tail of the Queue
3.1 Stationary ergodic arrivals of negative customers
The sequence of inter-arrivals of negative customers {τ_n⁻} is assumed stationary ergodic, and customers are removed from the tail of the queue at the arrival epochs t_n in batches of random sizes d_n, with {d_n} an i.i.d. sequence with mean d̄. We assume that the input flows of customers, sizes of batches and service times are mutually independent. Let Q_n be the process embedded just before the arrival of a negative customer. The representation of Q_n as a stochastic recursive sequence (SRS) is given by:

Q_{n+1} = (Q_n + Σ_{i=1}^{N_{λ⁺}(τ_n⁻)} a_i − Σ_{i=1}^{N_{μ⁺}(τ_n⁻)} b_i − d_n)⁺.   (7)

We have the following result:

Proposition 3. If (λ⁺ā − μ⁺b̄)Eτ₁⁻ < d̄, then the process Q_n is strong coupling convergent to a unique stationary ergodic regime Q̄_n such that Q̄_0 = sup_{n≥0} V_n. If (λ⁺ā − μ⁺b̄)Eτ₁⁻ > d̄, then the process Q_n converges in distribution to an improper limiting sequence.
3.2 Stationary ergodic service times

We assume now that the service times S_n are stationary and ergodic and that the inter-arrivals of negative customers are i.i.d. exponentially distributed with
rate λ⁻. The process Q_n is embedded immediately after the end of the (n−1)st service time and satisfies the following recursion:

Q_{n+1} = (Q_n + Σ_{i=1}^{N_{λ⁺}(S_n)} a_i − Σ_{i=1}^{N_{λ⁻}(S_n)} d_i − b_n)⁺.   (8)

We have the following proposition:

Proposition 4. If (λ⁺ā − λ⁻d̄)ES₁ < b̄, then the process Q_n is strong coupling convergent to a unique stationary ergodic regime Q̄_n such that Q̄_0 = sup_{n≥0} V_n. If (λ⁺ā − λ⁻d̄)ES₁ > b̄, then the process Q_n converges in distribution to an improper limiting sequence.
Negative customers breaking the server. An interesting generalization in practice is to consider that the negative customer not only eliminates regular customers but also causes a breakdown of the server, which must immediately be repaired. This is the case in computer systems, where the entry of a virus causes the elimination of programs and thus requires treatment by an antivirus, and this repair time may be random. Upon the arrival of a negative customer, the server fails and immediately undergoes a repair time R_i, i = 1, 2, .... The sequence of repair times {R_i} is assumed stationary and ergodic. We assume that after a repair the server is as good as new and the service of a customer is cumulative. Denote by c_i the size of the batch removed by the ith negative arrival, with {c_i} an i.i.d. sequence with mean c̄. The SRS modeling the system now has the form:

Q_{n+1} = (Q_n + Σ_{i=1}^{N_{λ⁺}(S_n)} a_i − Σ_{i=1}^{N_{λ⁻}(S_n)} c_i + Σ_{i=1}^{N_{λ⁻}(S_n)} Σ_{j=1}^{N_{λ⁺}(R_i)} a_j − b_n)⁺.   (9)
We have the following proposition:
Proposition 5. If (λ⁺ā(1 + λ⁻ER₁) − λ⁻c̄)ES₁ < b̄, then the process Q_n is strong coupling convergent to a unique stationary ergodic regime Q̄_n such that Q̄_0 = sup_{n≥0} V_n. If (λ⁺ā(1 + λ⁻ER₁) − λ⁻c̄)ES₁ > b̄, then the process Q_n converges in distribution to an improper limiting sequence.

Retrial queueing systems with negative customers. We can obtain in a similar manner stability and instability conditions for a special class of queueing systems called retrial queues. We consider arrivals in batches for regular customers: if an arriving group finds the server busy, the whole batch joins a group of blocked customers called the "orbit"; otherwise, if the batch arrival finds the server idle, one of them enters service and the others join the orbit. Customers in orbit reapply at random times to get
served according to the following versatile policy. The probability of a retrial during the time interval (t, t + Δt], given that j customers were in orbit at time t, is (θ(1 − δ_{0j}) + jν)Δt + o(Δt). This versatile retrial policy, introduced by [Artalejo and Gomez-Corral, 1997], incorporates the classical linear policy and the constant one. If ν = 0 we obtain the constant retrial policy studied by [Fayolle, 1995]. Denote by u_n a sequence of independent random variables uniformly distributed on [0, 1], generating the type of customer (external or from the orbit) that gets served at the end of the successive service periods. The stochastic recursive sequence modeling the dynamics of the system has the following representation:

Q_{n+1} = (Q_n + Σ_{i=1}^{N_{λ⁺}(S_n)} a_i − Σ_{i=1}^{N_{λ⁻}(S_n)} c_i + (a₁ − 1) I(u_n ≤ λ⁺/(λ⁺ + θ + Q_nν)) − b_n I(u_n > λ⁺/(λ⁺ + θ + Q_nν)))⁺,   (10)

where a₁ is the size of the first batch arrival if the external arrival occurs before a retrial from the orbit. We have the following proposition:

Proposition 6. 1) If ν = 0, θ > 0 and (λ⁺ā − λ⁻c̄)ES₁ < (θb̄ − λ⁺(ā − 1))/(λ⁺ + θ), then the process Q_n is strong coupling convergent to a unique stationary ergodic regime Q̄_n such that Q̄_0 = sup_{n≥0} V_n. If (λ⁺ā − λ⁻c̄)ES₁ > (θb̄ − λ⁺(ā − 1))/(λ⁺ + θ), then the process Q_n converges in distribution to an improper limiting sequence.
2) If ν > 0, θ ≥ 0 and (λ⁺ā − λ⁻c̄)ES₁ < b̄, then the process Q_n is strong coupling convergent to a unique stationary ergodic regime Q̄_n such that Q̄_0 = sup_{n≥0} V_n. If (λ⁺ā − λ⁻c̄)ES₁ > b̄, then the process Q_n converges in distribution to an improper limiting sequence.

Remark 3. Considering a system with retrials, without negative customers and with single services, we obtain the result of Theorem 7.1 in [Kernane and Aissani, 2006] for a retrial queue with versatile retrial policy and batch arrivals.
4 Work Removals
Consider a single server queue with work removals. The inter-arrival times of regular customers {τ_n⁺} and the service times {S_n} are general stationary and ergodic sequences. Let t_n be the arrival epoch of the nth regular batch of customers. During τ_n⁺, N_{λ⁻}(τ_n⁺) negative arrivals occur, removing i.i.d. amounts of work c_i^a, i = 1, ..., N_{λ⁻}(τ_n⁺), with mean c̄. We assume that the input flows of customers, sizes of batches and service times are mutually independent. Denote by W(t) the workload at time t. The following recursion holds for the workload process W_n = W(t_n−):

W_{n+1} = (W_n + S_n − τ_n⁺ − Σ_{i=1}^{N_{λ⁻}(τ_n⁺)} c_i^a)⁺.   (11)
Denote by ξ_n = S_n − τ_n⁺ − Σ_{i=1}^{N_{λ⁻}(τ_n⁺)} c_i^a. We have Eξ_n = ES₁ − Eτ₁⁺ − λ⁻c̄ Eτ₁⁺. Using the same approach as for the preceding propositions, we have the following result:

Proposition 7. If ES₁ < (1 + λ⁻c̄)Eτ₁⁺, then the process W_n is strong coupling convergent to a unique stationary ergodic regime W̄_n such that W̄_0 = sup_{n≥0} V_n. If ES₁ > (1 + λ⁻c̄)Eτ₁⁺, then the process W_n converges in distribution to an improper limiting sequence.
Remark 4. If we assume an i.i.d. sequence of service times with mean β and a Poisson arrival process with rate λ⁺ = 1/Eτ₁⁺, we obtain the stability condition given in [Boucherie and Boxma, 1995]: λ⁺β < 1 + λ⁻c̄.
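The workload recursion (11) lends itself to the same kind of direct simulation; the following sketch, with arbitrary illustrative distributions (everything exponential, including the removed work), estimates the long-run workload on the stable side of the boundary ES₁ = (1 + λ⁻c̄)Eτ₁⁺.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_workload(es, e_tau, lam_neg, c_mean, n_steps=200_000):
    # W_{n+1} = (W_n + S_n - tau_n^+ - removed work during tau_n^+)^+
    w, total = 0.0, 0.0
    for _ in range(n_steps):
        s = rng.exponential(es)                    # service time S_n
        tau = rng.exponential(e_tau)               # inter-arrival tau_n^+
        removed = rng.exponential(c_mean, rng.poisson(lam_neg * tau)).sum()
        w = max(0.0, w + s - tau - removed)
        total += w
    return total / n_steps

# ES1 = 1.2 < (1 + lam- c_bar) E tau^+ = (1 + 0.5*1.0)*1.0 = 1.5: stable
print(simulate_workload(es=1.2, e_tau=1.0, lam_neg=0.5, c_mean=1.0))
```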
References
[Altman and Borovkov, 1997] E. Altman and A.A. Borovkov. On the stability of retrial queues. Queueing Systems, volume 26, pages 343-363, 1997.
[Artalejo and Gomez-Corral, 1997] J.R. Artalejo and A. Gomez-Corral. Steady state solution of a single-server queue with linear repeated requests. J. Appl. Prob., volume 34, pages 223-233, 1997.
[Borovkov, 1976] A.A. Borovkov. Stochastic Processes in Queueing Theory. John Wiley and Sons, 1976.
[Borovkov, 1998] A.A. Borovkov. Ergodicity and Stability of Stochastic Processes. John Wiley and Sons, 1998.
[Boucherie and Boxma, 1995] R.J. Boucherie and O.J. Boxma. The workload in the M/G/1 queue with work removal. Probab. Engineering and Informational Sci., volume 10, pages 261-277, 1995.
[Boucherie and Van Dijk, 1994] R.J. Boucherie and N.M. Van Dijk. Local balance in queueing networks with positive and negative customers. Ann. Operat. Res., volume 48, pages 463-492, 1994.
[Chao, 1995] X. Chao. A queueing network model with catastrophes and product form solution. Operations Research Letters, volume 18, pages 75-79, 1995.
[Chao and Pinedo, 1995] X. Chao and M. Pinedo. Networks of queues with batch services, signals and product form solutions. Oper. Res. Letters, volume 17, pages 237-242, 1995.
[Fayolle, 1995] G. Fayolle. A simple telephone exchange with delayed feedbacks. In O.J. Boxma, J.W. Cohen, and H.C. Tijms, editors, Teletraffic Analysis and Computer Performance Evaluation, Elsevier Science, Amsterdam, pages 75-79, 1986.
[Gelenbe et al., 1991] E. Gelenbe, P. Glynn, and K. Sigman. Queues with negative arrivals. J. Appl. Prob., volume 28, pages 245-250, 1991.
[Gelenbe, 1991] E. Gelenbe. Queueing networks with negative and positive customers and product form solution. J. Appl. Prob., volume 28, pages 656-663, 1991.
[Gyorfi and Morvai, 2002] L. Gyorfi and G. Morvai. Queueing for ergodic arrivals and services. In I. Berkes et al., editors, Limit Theorems in Probability and Statistics. Fourth Hungarian Colloquium on Limit Theorems in Probability and Statistics, Balatonlelle, Hungary, 1999, volume II. Budapest: Janos Bolyai Mathematical Society, pages 127-141, 2002.
[Harrison and Pitel, 1993] P.G. Harrison and E. Pitel. Sojourn times in single-server queues with negative customers. J. Appl. Prob., volume 30, pages 943-963, 1993.
[Harrison and Pitel, 1996] P.G. Harrison and E. Pitel. The M/G/1 queue with negative customers. Adv. Appl. Prob., volume 28, pages 540-566, 1996.
[Henderson, 1993] W. Henderson. Queueing networks with negative customers and negative queue length. J. Appl. Prob., volume 30, pages 931-942, 1993.
[Henderson et al., 1994] W. Henderson, B.S. Northcote, and P.G. Taylor. Geometric equilibrium distributions for queues with interactive batch departures. Ann. Operat. Res., volume 48, pages 493-511, 1994.
[Jain and Sigman, 1996] G. Jain and K. Sigman. A Pollaczek-Khintchine formula for M/G/1 queues with disasters. J. Appl. Prob., volume 33, pages 1191-1200, 1996.
[Kernane and Aissani, 2006] T. Kernane and A. Aissani. Stability of retrial queues with versatile retrial policy. Journal of Applied Mathematics and Stochastic Analysis, volume 2006, 16 pages, 2006.
[Loynes, 1962] R.M. Loynes. The stability of a queue with non-independent inter-arrival and service times. Proc. Cambridge Philos. Soc., 58: 497-520, 1962.
[Zhu and Zhang, 2004] Y. Zhu and Z.G. Zhang. M/GI/1 queues with services of both positive and negative customers. J. Appl. Prob., volume 41, 2004.
Random Multivariate Multimodal Distributions
George Kouvaras¹ and George Kokolakis²
¹ National Technical University of Athens, Department of Mathematics, Zografou Campus, 15780 Athens, Greece (e-mail: gkouv@math.ntua.gr)
² National Technical University of Athens, Department of Mathematics, Zografou Campus, 15780 Athens, Greece (e-mail: kokolakis@math.ntua.gr)
Abstract. Bayesian nonparametric inference for unimodal and multimodal random probability measures on a finite dimensional Euclidean space is examined. After a short discussion on several concepts of multivariate unimodality, we introduce and study a new class of nonparametric prior distributions on the subspace of random multivariate multimodal distributions. This class in a way generalizes the very restrictive class of random unimodal distributions. A flexible constructional approach is developed using a variant of Khinchin's representation theorem for unimodal distributions. Results using our approach in a bivariate setting with a random draw from a Dirichlet process are presented. Keywords: Convexity, Dirichlet process, Unimodality-Multimodality, Polya trees, Random probability measures.
1 Introduction
Much of nonparametric Bayesian inference has proceeded by modelling the unknown cumulative distribution function (c.d.f.) as a stochastic process. In a fundamental paper, [Ferguson, 1973], a random process, called the Dirichlet process, was defined as a distribution on (P, S), where P is the collection of all probability measures on a measurable space (X, F), endowed with a σ-algebra S. The major drawback of a Dirichlet process is that it selects discrete distributions with probability one [Ferguson, 1973] and [Blackwell, 1973]. Several different classes of nonparametric priors, which all contain the Dirichlet process as a particular case, have been proposed. It seems worth mentioning, among others, the mixture of Dirichlet processes [Antoniak, 1974], which is a Dirichlet process where the base measure is itself random, and the mixture of Dirichlet process prior [Lo, 1984], which is a convolution of a Dirichlet process with an appropriate kernel. After the work of these authors, the study of absolutely continuous random probability measures has
become a very active area of research, touching on both analytical and simulation-based approaches; cf. Polya trees [Lavine, 1992], [Lavine, 1994] and [Kokolakis and Dellaportas, 1996], Dirichlet diffusion trees [Neal, 2001] and Levy-driven processes [Nieto-Barajas and Walker, 2005]. For reviews of nonparametric priors the reader is referred to [Walker et al., 1999], [Hjort, 2003], [Dey, 1998], [Ghosh, 2003] and the references therein. In this paper, we present Bayesian nonparametric inference for unimodal and multimodal random probability measures on a finite dimensional Euclidean space that have a finite expected number of modes. As a consequence, we get a random probability measure that admits a derivative almost everywhere in R^d. The paper is organized as follows. Section 2 gives the essential theoretical background on multivariate unimodality needed to implement our methodology. In section 3, a detailed description of the partial convexification procedure is provided. Random bivariate multimodal probability measures are constructed, and possible modifications and extensions are discussed, in section 4.
2 Univariate and multivariate unimodality
An important property of a distribution is unimodality. A univariate c.d.f. F is said to be unimodal with mode (or vertex) at m if F is convex on (−∞, m) and concave on (m, ∞). We make use of unimodality to get absolutely continuous distribution functions.

2.1 Unimodality on R
For univariate distributions there is a well-known representation theorem due to Khinchin (see [Feller, 1971], p. 158) that refers to classical univariate unimodality.

Theorem 1. A real-valued random variable X has a density unimodal at 0 if and only if it is a product of two independent random variables U and Y, with U uniformly distributed on (0, 1) and Y having an arbitrary distribution.

This can be expressed in the following equivalent form, cf. [Shepp, 1962], [Brunner, 1992] and [Kokolakis and Kouvaras, 2007].
Theorem 2. The c.d.f. F is convex on the negative real line and concave on the positive if and only if there exists a distribution function G on R such that F admits the representation:

F(x) = x F′(x) + G(x),   (1)

for all x points of continuity of G.
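Theorem 1 gives a direct recipe for sampling from a distribution unimodal at 0: draw Y from any c.d.f. G and multiply by an independent U(0, 1) variate. A minimal sketch follows; the choice of G as a two-component normal mixture is ours, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_unimodal_at_zero(n):
    # Y from an arbitrary G (here a bimodal normal mixture), U ~ U(0,1):
    # X = U * Y always has a density with a single mode at 0 (Theorem 1).
    comp = rng.integers(0, 2, n)
    y = np.where(comp == 0, rng.normal(-4, 1, n), rng.normal(5, 1, n))
    u = rng.uniform(0, 1, n)
    return u * y

x = sample_unimodal_at_zero(100_000)
hist, edges = np.histogram(x, bins=80, density=True)
print("empirical mode near zero:", edges[np.argmax(hist)])
```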
2.2 Unimodality on R^d
For multivariate distributions, however, there are several different ways in which unimodality is defined. Among the main types of multivariate unimodality are the following: beta unimodality, which is generated by the Beta distribution Beta(m, u) instead of the Uniform distribution on the interval (0, 1), and contains classical univariate unimodality and some of the existing multivariate notions of unimodality as special cases (star unimodality, α-unimodality); linear unimodality, which is characterized by the unimodality of the distribution of any linear combination of the components of a random vector; and strong unimodality, which is defined via convolutions of unimodal distributions. An extended study of the different types of unimodality and their useful consequences can be found in [Dharmadhikari and Kumar, 1988] and [Bertin et al., 1997]. In what follows we focus our attention on Khinchin's classical unimodality extended to the multivariate case. For the sake of simplicity we restrict ourselves to the bivariate case; results for higher dimensions can easily be derived. According to classical unimodality [Shepp, 1962] we have:

Theorem 3. The c.d.f. F is unimodal at 0 if and only if there is a random vector (X₁, X₂) with c.d.f. F such that

(X₁, X₂) = (Y₁U₁, Y₂U₂),   (2)

where (Y₁, Y₂) and (U₁, U₂) are independent random vectors, (U₁, U₂) is uniformly distributed on the unit square and (Y₁, Y₂) has an arbitrary c.d.f. G.
Equivalent to Theorem 3 is the following.
Theorem 4. The c.d.f. F is unimodal at 0 if and only if, for all points (x₁, x₂) of continuity of G,

F(x₁, x₂) = x₁F_{x₁}(x₁, x₂) + x₂F_{x₂}(x₁, x₂) − x₁x₂f(x₁, x₂) + G(x₁, x₂),   (3)

where G is an arbitrary c.d.f., with subscripts in (3) denoting partial derivatives.

According to the above procedure, i.e. by the componentwise multiplication of two independent random vectors (Y₁, Y₂) and (U₁, U₂), where the latter is uniformly distributed on the unit square, we always get a c.d.f. F with a single mode at zero, no matter what distribution G we start with. To overcome the limitation of always getting a single mode at zero, we propose the following "partial convexification" procedure.
3 Univariate and multivariate partial convexification
The partial convexification procedure for a c.d.f. G relies on using U(α, 1) distributions, with 0 < α < 1, instead of U(0, 1) [Kokolakis and Kouvaras, 2007].
The parameter α can be fixed, or random with a prior distribution p(α) on the interval (0, 1). In this way we obtain a prior distribution on the subspace of multimodal c.d.f.'s. The expected number of modes of F increases from one, when α = 0, to infinity, when α = 1, and is finite when 0 < α < 1. This means that when 0 < α < 1 the c.d.f. F(x) alternates between local concavities and local convexities, i.e., a "partial convexification" of F is produced.

Definition 1. The d-variate c.d.f. F is called partially convexified if there exists a random vector X = (X₁, ..., X_d) with c.d.f. F such that

(X₁, ..., X_d) = (Y₁U₁, ..., Y_dU_d),   (4)

where Y = (Y₁, ..., Y_d) and U = (U₁, ..., U_d) are independent vectors, U is uniformly distributed on the rectangle (α₁, 1) × ... × (α_d, 1) and Y has an arbitrary d-variate c.d.f. G.

3.1 Partial convexification on R
Using the partial convexification procedure, Theorem 2 can be expressed [Kokolakis and Kouvaras, 2007] in the following form.

Theorem 5. The c.d.f. F is partially convexified if there exists a distribution function G on R such that F admits the representation:

F(x) = x F′(x) + (1/(1 − α)) [G(x) − α G(x/α)],   (5)

for all x, x/α points of continuity of G.

3.2 Partial convexification on R²
A generalization to the bivariate case, using Uniform distributions U(α_i, 1), i = 1, 2, with parameters α_i fixed in the interval (0, 1), is as follows.

Theorem 6. The c.d.f. F is partially convexified if there exists a distribution function G on R² such that F admits the representation:

F(x₁, x₂) = x₁F_{x₁}(x₁, x₂) + x₂F_{x₂}(x₁, x₂) − x₁x₂f(x₁, x₂) + Q(x₁, x₂),   (6)

where

Q(x₁, x₂) = [G(x₁, x₂) − α₁G(x₁/α₁, x₂) − α₂G(x₁, x₂/α₂) + α₁α₂G(x₁/α₁, x₂/α₂)] / [(1 − α₁)(1 − α₂)],   (7)

for all (x₁, x₂), (x₁/α₁, x₂), (x₁, x₂/α₂) and (x₁/α₁, x₂/α₂) points of continuity of G.
Proof. We have:

F(x₁, x₂) = P[X₁ ≤ x₁, X₂ ≤ x₂] = P[U₁Y₁ ≤ x₁, U₂Y₂ ≤ x₂] = ∫_{R²} H_{y₁}(x₁) H_{y₂}(x₂) G(dy₁, dy₂),   (8)

where, for fixed y_i ≠ 0, H_{y_i}(·) denotes the c.d.f. of U_i y_i (i = 1, 2), and H₀(·) is degenerate at zero. In the sequel we will only refer to the first quadrant I₁ = {(x₁, x₂) : x₁ ≥ 0, x₂ ≥ 0}. Thus in I₁ the expression (8) can be written as follows:
F(x₁, x₂) = ∫_{(−∞, x₁/α₁] × (−∞, x₂/α₂]} min{1, (x₁ − α₁y₁)/((1 − α₁)y₁)} · min{1, (x₂ − α₂y₂)/((1 − α₂)y₂)} G(dy₁, dy₂),   (9)

with each factor set equal to 1 when the corresponding y_i ≤ 0.
With fixed y_i ≠ 0, the functions H_{y_i}(·) (i = 1, 2) are bounded, with bounded left and right derivatives a.e. in I₁. Applying the bounded convergence
theorem, we conclude that the c.d.f. F is differentiable a.e. in I₁ with respect to the Lebesgue measure. Introducing its 1st and 2nd order mixed derivatives, wherever they exist, into (9), we finally obtain the representation (6), where, when the points (x₁, x₂), (x₁/α₁, x₂), (x₁, x₂/α₂) and (x₁/α₁, x₂/α₂) are all continuity points of G, Q(x₁, x₂) takes the form (7). The corresponding expressions hold in the other quadrants as well. When the origin (0, 0) is a continuity point of G, then f is a density.
4 Random bivariate multimodal probability measures
In our Bayesian model specification we assume the following:
- (Y₁, Y₂) ~ G, where G is a random c.d.f. produced by a Dirichlet process, using the tree structure presented in [Kokolakis and Kouvaras, 2007].
- (U₁, U₂) is uniformly distributed on the rectangle (α₁, 1) × (α₂, 1), with α_i fixed in the interval (0, 1).
- (Y₁, Y₂) and (U₁, U₂) are independent.
By definition, the c.d.f. F of the random vector (X₁, X₂) = (Y₁U₁, Y₂U₂) will be a random partially convexified c.d.f., and thus we obtain a prior distribution on the subspace of bivariate multimodal c.d.f.'s. For demonstration purposes, we consider the generation of a random c.d.f. G from a Dirichlet process, DP[cβ(·)], where the concentration parameter is c = 2 and the base distribution β(·) on R² is a mixture of two bivariate normal distributions. Specifically, β(y) = w₁N(y | μ₁, Σ₁) + w₂N(y | μ₂, Σ₂), where
μ₁, μ₂ are the component mean vectors and Σ₁, Σ₂ the component covariance matrices (the second with variances 16 and covariances 4), with weights w₁ = 0.4 = 1 − w₂. The sample sizes have all been taken equal to 250. The partial convexification procedure has been applied with the parameter α = α₁ = α₂ = 0.80.
(a) Before Partial Convexification.
(b) After Partial Convexification.
Fig. 1. Construction of a random bivariate multimodal distribution using the partial convexification procedure.
In Figure 1(a), a random c.d.f. G from a Dirichlet process is presented. In Figure 1(b), its partially convexified version is presented. We may notice that the partial convexification procedure results in some overestimation of the posterior distributions. Various generalizations of the present result to higher dimensions and correlation structures are possible. Such models can be useful alternatives to the standard Bayesian mixture models for multivariate multimodal distributions.
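A rough sketch of this construction is given below, using a truncated stick-breaking representation of the Dirichlet process in place of the authors' tree-based sampler; the DP machinery is therefore a stand-in, and the mixture means and the second covariance's diagonal are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def base_draw(n):
    # Base measure beta(.): mixture of two bivariate normals, w1 = 0.4.
    comp = rng.random(n) < 0.4
    y = np.empty((n, 2))
    y[comp] = rng.multivariate_normal([0, 0], np.eye(2), comp.sum())
    y[~comp] = rng.multivariate_normal([5, 5], [[16, 4], [4, 16]], (~comp).sum())
    return y

def dp_sample(c=2.0, n=250, trunc=500):
    # Truncated stick-breaking: G = sum_k w_k * delta(atom_k).
    v = rng.beta(1, c, trunc)
    w = v * np.concatenate(([1.0], np.cumprod(1 - v[:-1])))
    atoms = base_draw(trunc)
    idx = rng.choice(trunc, size=n, p=w / w.sum())
    return atoms[idx]                              # (Y1, Y2) draws from G

alpha = 0.80
y = dp_sample()
u = rng.uniform(alpha, 1.0, y.shape)               # (U1, U2) on (0.8, 1)^2
x = u * y                                          # partially convexified draws
```

With α close to 1 the product x stays close to the DP draw, so the multimodal shape of G survives, while α = 0 would collapse everything toward a single mode at the origin.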
References
[Antoniak, 1974] Ch. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist., 2:1152-1174, 1974.
[Bertin et al., 1997] E.M-J. Bertin, I. Cuculescu, and R. Theodorescu. Unimodality of Probability Measures. Kluwer Academic Publishers, 1997.
[Blackwell, 1973] D. Blackwell. Discreteness of Ferguson selections. Annals of Statistics, 1:356-358, 1973.
[Brunner, 1992] L.J. Brunner. Bayesian nonparametric methods for data from a unimodal density. Statistics and Probability Letters, 14:195-199, 1992.
[Dey, 1998] D. Dey, P. Muller, and D. Sinha. Practical Nonparametric and Semiparametric Bayesian Statistics. New York: Springer-Verlag, 1998.
[Dharmadhikari and Kumar, 1988] S. Dharmadhikari and J-D. Kumar. Unimodality, Convexity and Applications. Academic Press, 1988.
[Feller, 1971] W. Feller. An Introduction to Probability Theory and its Applications, Volume 2. Wiley, New York, 2nd edition, 1971.
[Ferguson, 1973] Th.S. Ferguson. A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1:209-230, 1973.
[Ghosh, 2003] J. Ghosh and R. Ramamoorthi. Bayesian Nonparametrics. New York: Springer-Verlag, 2003.
[Hjort, 2003] N. Hjort. Topics in nonparametric Bayesian statistics. In P. Green, N. Hjort, and S. Richardson, editors, Highly Structured Stochastic Systems. Oxford: Oxford University Press, 2003.
[Kokolakis and Dellaportas, 1996] G. Kokolakis and P. Dellaportas. Hierarchical modelling for classifying binary data. In Bayesian Statistics 5, pages 647-652, London, 1996. Oxford University Press.
[Kokolakis and Kouvaras, 2007] G. Kokolakis and G. Kouvaras. On the multimodality of random probability measures. Bayesian Analysis, 2:213-220, 2007.
[Lavine, 1992] M. Lavine. Some aspects of Polya tree distributions for statistical modelling. Annals of Statistics, 20:1222-1235, 1992.
[Lavine, 1994] M. Lavine. More aspects of Polya tree distributions for statistical modelling. Annals of Statistics, 22:1161-1176, 1994.
[Lo, 1984] A.Y. Lo. On a class of Bayesian nonparametric estimates: I. Density estimates. Annals of Statistics, 12:351-357, 1984.
[Neal, 2001] R. Neal. Defining priors for distributions using Dirichlet diffusion trees. Technical report, Department of Statistics, University of Toronto, March 2001.
[Nieto-Barajas and Walker, 2005] L.E. Nieto-Barajas and S.G. Walker. A semiparametric Bayesian analysis of survival data based on Levy-driven processes. Lifetime Data Analysis, 11:529-543, 2005.
[Shepp, 1962] L.A. Shepp. Symmetric random walk. Transactions of the American Mathematical Society, 104:144-153, 1962.
[Walker et al., 1999] S. Walker, P. Damien, P.W. Laud, and A.F.M. Smith. Bayesian nonparametric inference for random distributions and related functions. J. Roy. Statist. Soc., Ser. B, 61:485-527, 1999.
A system maintained by imperfect and perfect repairs under phase-type distributions
Delia Montoro-Cazorla¹, Rafael Pérez-Ocón², and Mª Carmen Segovia³
¹ Departamento de Estadística e I.O., Universidad de Jaén, Jaén, Spain (e-mail: dmontoro@ujaen.es)
² Departamento de Estadística e I.O., Universidad de Granada, 18071 Granada, Spain (e-mail: [email protected])
³ Departamento de Estadística e I.O., Universidad de Granada, 18071 Granada, Spain (e-mail: msegovia@ugr.es)
Abstract. We study a system maintained by means of imperfect repairs before a perfect repair. This model was previously studied by Biswas and Sarkar (2000), who calculated the availability of the system when the operational and repair times were exponential. We extend the model by considering these times to be phase-type distributed. Performance measures are calculated in the stationary regime.
Keywords: Phase-type distribution, repairs, maintenance.
1 Introduction
In certain practical situations it is convenient to keep the system operational as long as possible, even if its condition is not optimal. Mostly for reasons of time and cost, repairs are frequently imperfect, until a perfect repair becomes necessary. In this paper we consider a system that is perfectly repaired after a certain number of imperfect repairs. The successive operational times after repairs are stochastically decreasing, and the successive repair times are stochastically increasing. This model was studied by Biswas and Sarkar (2000) assuming general distributions for the operational and repair times but, if the distributions are not exponential, the application of the main result becomes intractable. We consider phase-type distributions, which can approximate any continuous distribution defined on the positive real line. Moreover, by introducing these distributions, the model is governed by a
Markov process, and this allows well-structured and tractable expressions. Other related papers are those of Lim and Lie (2000), Neuts et al. (2000), and Pérez-Ocón et al. (2004, 2004a, 2004b), and references therein. We calculate explicitly the stationary vector of the Markov process that governs the system and, from it, performance measures of interest: not only the availability, but also the rate of occurrence of failures and the interval reliability.
2 The model
In the present paper we consider the Biswas and Sarkar model, preserving the assumptions and modifying the distributions of the operational and repair times. First, we define the phase-type distributions, which play an important role throughout the paper.

Definition 1. The continuous distribution H(·) on [0, ∞) is a phase-type distribution (PH-distribution) with representation (α, T) if it is the distribution of the time until absorption in a Markov process on the states {1, ..., m, m+1} with generator

[ T   T⁰ ]
[ 0   0  ]
and initial probability vector (α, α_{m+1}), where α is a row m-vector. We assume that the states {1, ..., m} are all transient. Throughout this paper, e denotes a column vector with all components equal to one, the dimension of which is determined by the context. The matrix T of order m is non-singular, with negative diagonal entries and non-negative off-diagonal entries, and satisfies Te + T⁰ = 0. The distribution H(·) is given by H(x) = 1 − α exp(Tx)e, x ≥ 0. We say that H(·) follows a PH(α, T) distribution.
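For reference, the PH c.d.f. H(x) = 1 − α exp(Tx)e is a one-line computation with a matrix exponential; the 2-phase representation below is arbitrary, chosen only to exercise the formula.

```python
import numpy as np
from scipy.linalg import expm

def ph_cdf(alpha, T, x):
    # H(x) = 1 - alpha * exp(T x) * e for a PH(alpha, T) distribution.
    e = np.ones(T.shape[0])
    return 1.0 - alpha @ expm(T * x) @ e

def ph_mean(alpha, T):
    # Mean absorption time: -alpha * T^{-1} * e.
    e = np.ones(T.shape[0])
    return -alpha @ np.linalg.solve(T, e)

alpha = np.array([1.0, 0.0])
T = np.array([[-3.0, 3.0],
              [0.0, -2.0]])          # Erlang-like 2-phase generator
print(ph_cdf(alpha, T, 1.0), ph_mean(alpha, T))
```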
Assumption 1. We denote by X_i the lifetime of the unit after completing (i−1) imperfect repairs. We assume that this random variable X_i follows a phase-type distribution with representation (α(i), T(i)) and with m(i) phases, i = 1, 2, ..., k+1. We denote by Y_i, i = 1, 2, ..., k, the imperfect repair time after the ith failure; this random variable follows a phase-type distribution with representation (β(i), S(i)) and with
n(i) phases. We have: F_i(x) = 1 − α(i)·exp(T(i)·x)·e, G_i(x) = 1 − β(i)·exp(S(i)·x)·e, where e has the appropriate dimension in each case.

Assumption 2. We assume that F_{i+1} is stochastically smaller than F_i, i = 1, ..., k, and G_{i+1} is stochastically larger than G_i, i = 1, ..., k−1.

Assumption 3. When the system completes k imperfect repairs it is perfectly repaired. The time for completing a perfect repair follows a PH-distribution with representation (γ, L), of order n.

Assumption 4. All the involved times are independent.
3 Stationary regime. Reliability measures
3.1 The generator
The methodology we use requires introducing the structure of states. We will say that the system occupies the macro-state i, i = 1, ..., k+1, if it is operational and has completed (i−1) imperfect repairs; it occupies the macro-state i_R if it is being imperfectly repaired after the ith failure, i_R = 1_R, ..., k_R; and it occupies the macro-state (k+1)_R when it is being perfectly repaired.
The generator of the Markov process governing the system is the block matrix

Q =
[ T(1)  T⁰(1)β(1)                                    ]
[        S(1)  S⁰(1)α(2)                             ]
[               T(2)  T⁰(2)β(2)                      ]
[                      ...                           ]
[                           T(k+1)  T⁰(k+1)γ         ]
[ L⁰α(1)                                    L        ]

with zero blocks elsewhere.
The blocks that form this generator can be interpreted in terms of the transitions between the macro-states. For example, the block that represents the transition 1 → 1_R is given by T⁰(1)β(1), since the change occurs when the system undergoes a failure, with absorption vector T⁰(1), and enters repair, with initial vector β(1). Reasoning similarly, the rest of the generator is constructed.
The Markov process that governs the system is positive recurrent, so the system reaches a stationary distribution. We denote by w the stationary probability vector, the solution of wQ = 0 subject to the normalization condition we = 1. It can be written according to the blocks defined by the macro-states:

w = (π(1), π(1_R), π(2), π(2_R), ..., π(k+1), π((k+1)_R)).

Operating, the components of the stationary vector result in:

π(i) = −α(i)T⁻¹(i) / D, i = 1, ..., k+1;   π(i_R) = −β(i)S⁻¹(i) / D, i = 1, ..., k;   π((k+1)_R) = −γL⁻¹ / D,

where D = Σ_{i=1}^{k+1} μ_i + Σ_{i=1}^{k} μ_{iR} + μ_{(k+1)R}, with μ_i = −α(i)T⁻¹(i)e, i = 1, ..., k+1, μ_{iR} = −β(i)S⁻¹(i)e, i = 1, ..., k, and μ_{(k+1)R} = −γL⁻¹e.
3.2 Performance measures
We now give expressions for some well-known performance measures: the availability, the rate of occurrence of failures (ROCOF) and the reliability in an interval of time. The availability is the probability that the system is up. It is denoted by A and results in:

A = Σ_{i=1}^{k+1} π(i)e,
since π(i)e = π(1)T⁰(1)μ_i, i = 1, ..., k+1.
A measure of the deterioration is the rate of occurrence of failures (or repairs), that is, the mean number of failures per unit time undergone by the system. This quantity is

ν = Σ_{i=1}^{k+1} π(i)T⁰(i).
In emergency systems, the reliability in an interval is a useful quantity. It represents the probability that the system will be operational during a period of time. Let R(z) be the probability that, in the long run, the system does not fail in an interval of length z. To calculate this quantity, we require the system to remain in an operational macro-state throughout the interval. Then we have:

R(z) = Σ_{i=1}^{k+1} π(i) exp(T(i)z) e.
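Numerically, once the blocks are fixed, the stationary vector and these measures follow from a single linear solve. A sketch for k = 1 with made-up 2-phase blocks (the structure of the computation, not the paper's data):

```python
import numpy as np
from scipy.linalg import expm, null_space

# k = 1: macro-states 1, 1_R, 2, 2_R with hypothetical 2-phase blocks.
a1 = np.array([1.0, 0.0]); T1 = np.array([[-3.0, 3.0], [0.0, -2.0]])
b1 = np.array([1.0, 0.0]); S1 = np.array([[-4.0, 4.0], [0.0, -4.0]])
a2 = a1; T2 = 1.25 * T1                      # degraded unit: faster failure
g  = np.array([1.0, 0.0]); L  = np.array([[-5.0, 5.0], [0.0, -5.0]])

def col(T): return -T @ np.ones(T.shape[0])  # absorption vector T0

Z = np.zeros((2, 2))
Q = np.block([
    [T1, np.outer(col(T1), b1), Z, Z],
    [Z, S1, np.outer(col(S1), a2), Z],
    [Z, Z, T2, np.outer(col(T2), g)],
    [np.outer(col(L), a1), Z, Z, L],
])

w = null_space(Q.T)[:, 0]
w = w / w.sum()                               # stationary vector: wQ = 0, we = 1
pi1, pi2 = w[0:2], w[4:6]                     # operational macro-states 1 and 2

A = pi1.sum() + pi2.sum()                     # availability
nu = pi1 @ col(T1) + pi2 @ col(T2)            # rate of occurrence of failures
z = 5.0
R = pi1 @ expm(T1 * z) @ np.ones(2) + pi2 @ expm(T2 * z) @ np.ones(2)
print(A, nu, R)
```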
4 Numerical application
Consider a system that is imperfectly repaired up to 3 times before its perfect repair, so k = 3. The successive operational times after repair form a geometric process with deterioration factor a = 1.25; that is, the mean operational time after an imperfect repair is reduced by 20%. While operational, the system goes through three phases, and a fourth indicates failure. Thus the operational time after the (i−1)st repair follows a PH-distribution with representation (α, a^{i−1}T), i = 1, 2, 3, 4, that is, α(i) = α = (1, 0, 0), T(i) = a^{i−1}T. The time of the ith imperfect repair follows a PH(β, b^{i−1}S), with b = 0.75; that is, the mean repair time is enlarged by 25% after every repair, with β(i) = β = (1, 0, 0), S(i) = b^{i−1}S, i = 1, 2, 3. The perfect repair time follows a PH(γ, L), with γ = (1, 0, 0). The matrices T, S, L are:
3×3 generators with entries −3, 3; −27, 27; −15, 15 on their diagonals and superdiagonals.
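The geometric-process scaling of the representations is mechanical; the helper below shows it for a generic 3-phase generator (the matrix here is an arbitrary placeholder, not the paper's T).

```python
import numpy as np

def scaled_rep(T, factor, i):
    # Geometric process: T(i) = factor**(i-1) * T.  factor > 1 shortens
    # lifetimes (deterioration); factor < 1 lengthens repairs.
    return factor ** (i - 1) * T

def ph_mean(alpha, T):
    return -alpha @ np.linalg.solve(T, np.ones(T.shape[0]))

alpha = np.array([1.0, 0.0, 0.0])
T = np.array([[-3.0, 3.0, 0.0],
              [0.0, -3.0, 3.0],
              [0.0, 0.0, -3.0]])      # placeholder operational generator

for i in range(1, 5):
    print(i, ph_mean(alpha, scaled_rep(T, 1.25, i)))
    # means shrink by the factor 1/1.25 = 0.8 at each step, i.e., by 20%
```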
For this model we calculate the availability, the rate of occurrence of failures, and the reliability in the interval [t, t+5] for large t. Moreover, the measures are calculated for k = 0, 1, 2 and a comparison is established.

k | A      | ν      | R(5)
0 | 0.8488 | 0.4630 | 0.0093

Table 1. Reliability measures for the model.

The results in the table above illustrate that the highest availability is obtained for the model when k = 1.
Acknowledgement: This work was partially supported by the Ministerio de Educación y Ciencia, Spain, under grant MTM2004-03672.
References
[Biswas and Sarkar, 2000] A. Biswas and J. Sarkar. Availability of a system maintained through several imperfect repairs before a replacement or a perfect repair. Statist. Probab. Lett. 50, 105-114, 2000.
[Lim and Lie, 2000] T.J. Lim and C.H. Lie. Analysis of system reliability with dependent repair modes. IEEE Trans. Reliab., 49(2), 153-162, 2000.
[Neuts et al., 2000] M.F. Neuts et al. Repairable models with operating and repair times governed by phase type distributions. Adv. Appl. Probab. 34, 468-479, 2000.
[Pérez-Ocón and Montoro-Cazorla, 2004] R. Pérez-Ocón and D. Montoro-Cazorla. Transient analysis of a repairable system, using phase-type distributions and geometric processes. IEEE Trans. Reliab., 53(2), 185-192, 2004.
[Pérez-Ocón and Montoro-Cazorla, 2004a] R. Pérez-Ocón and D. Montoro-Cazorla. A multiple system governed by a quasi-birth-and-death process. Reliab. Eng. Syst. Safety, 84, 187-196, 2004.
[Pérez-Ocón and Ruiz-Castro, 2004b] R. Pérez-Ocón and J.E. Ruiz-Castro. Two models for a repairable two-system with phase-type sojourn time distributions. Reliab. Eng. Syst. Safety, 84, 253-260, 2004.
Asymptotically robust algorithms for detection and recognition of signals* Veniamin A. Bogdanovich’ and Aleksey G. Vostretso?
’ Saint-Petersburg Electrotechnical University “LETI” Professor Popov str. 5 , St.Petersburg, 197376, Russia (E-mail:
[email protected]) * Novosibirsk State Technical University K.Marx av. 20, Novosibirsk, 630092, Russia (E-mail:
[email protected])
Abstract. Using the principle of minimax and Bayesian approach the method of synthesis of asymptotically robust (AR) algorithms for detection and recognition of signals with unknown parameters on the background of independent noise and signallike interference is developed. According to this method the asymptotically robust algorithm is sought in the class of algorithms of correlation type with the threshold depending on the observed sampling. For synthesis of a robust algorithm using this class no requirements are imposed on energy parameter of a signal, it is different from the known class of algorithms of M-type. Keywords: nonparametric a priori uncertainty, model of noise distribution, q-point model, finite variance model, principle of minimax, Bayesian approach, asymptotically robust algorithm.
1
Introduction
For the AR algorithm synthesis we use minimax optimality criterion, according to which in some class of expected density of probability distribution (DPD) of noise the maximal asymptotic losses owing to signal omission for fixed level of asymptotic probability of false alarm is minimized. The additive interference is represented as the sum of an independent stationary noise with some scaling parameter LT E (0, a) and a P
signal-like interference 17 = C
with the arbitrary shift parameters
/=I
$, € ( - a , m ) , j = l ,
...)P .
The energy signal ( A / & ) S ( 8 )
A and nonenergy 8 parameters of the being detected are also considered as uncertain. For the parameter A
only the set of its values is established in order to have possibility of construction of AR algorithms which are uniformly asymptotically minimax with respect to this parameter. The parameter 8 is assumed as random but Supported by Russian Foundation for Basic Research (Project No 07-0700070) 82
Asymptotically Robust Algorithms 83
with unknown a priori distribution P ( 8 ) . To overcome uncertainties of the distribution P(8) we use asymptotic Bayesian approach to statistical synthesis. In the case of nonparametric a priori uncertainty the hypothesis Ho about the signal absence and the alternatives H k , k = 1, ...,m , about the presence of some signal are represented, correspondingly, by the families of distributions
{
Pk,n = wn
(xl&e?Y)
= wn[x-ks(e)
I Yl,
wn
(’ I Y) EPo,n,A
‘
(O3.O)
,8 Edk)}
>
(2)
where W is the class of expected DPD of noise, y = (D,4,...,.pP ) is the fixed parameter of the additive interference, d k )is the range of definition of the parameter 8 for the alternatives H,, k = 1, ...,m . The signal S (0) in families (2) is expressed in the form N
(3) j=l
i
1
where S(’) = Sp),...,S r ) ,j
( ’
= 1,...,N
I
is the known orthogonal basis of
the signal, 8 = ( Ql ,. . .,ON ) is a random parameter. In order the energy of the signals
(l/&)e(’),
(l/&)S(’),j
= 1, ..., N
,
i = 1, ...,P , do not tend to infinity with increase of the observed
sampling
size
we
use
the
P, (l/&)lle(i)ll
following
normalizations:
= 1 t’ m = 1,...,P. m
In addition, we assume that the norm ll8ll= 1 t’ 0 E E = IJ
dk).Therefore,
k=l
for all values of the parameter 8 E E the signal
1
(A/&)S@)
II=A
.
(A/&) S (8) has the norm
84
2
Recent Advances in Stochastic Modeling and Data Analysis
Models for a priori uncertainty of noise
Nonparametric a priori uncertainty of noise can be represented by some class W of the densities ~ ( t )of distribution with finite Fisher's information about shift. For the finite Fisher's information the densities w ( t ) are
absolutely
continuous
functions
and
Fisher's
information
m
y$ ( t )w(t)dt , where yw is the logarithmic derivative of the
Z(w) = -m
density w(t) [l]. For synthesis of AR algorithms in the case of independent noise the following models of noise distributions are the most applicable: model with a finite variance, E -pollution model and q -point model. The model with finite variance is represented by the class of DPD of
i
I
m
noise W, = w: J t2w(t)dt= 6,, I( w ) < co , the model of -m
by the class W,
q -point model
-
=
E
-pollution -
(w:w(t)=(l-~)f(t)+~h(t), Z ( f ) < ~ , Z ( h ) < c o and )
-{
I
by the class Wa,q- w: 1 w(t)dt = q, I( w )< co , where a:
all DPDs have zero mean, f is such doubly differentiable DPD that the function -logf(t) is convex, E E (0,1), 6 E (o,co), q E (O,1), a E ( 0 , ~ ) are the parameters of the models. The models W,, W, II Wa,q are mostly investigated. For these models the distributions are found that have minimal Fisher's information about shift and play important role in synthesis of AR algorithm [l]. The distribution density with minimal Fisher's information is called least favourable DPD of noise. For this density we use here and further the notation wo ( t ). In the class W, the least favourable DPD coincides with the density of normal distribution
In the class W, the least favourable DPD is expressed in the form
(1 - ~ ) f ( l)exp[ t k ( t -t
,)I , co
;
(5)
tl I t I t , ;
(l-E)f(t~)exp[-k(l-t2)~ , t*
2
Asymptotically Robust Algorithms 85
where tl
H
t2 are the limits of the range where the inequality holds
l f ’ ( t ) / f ( t ) lI k , k is the parameter related to the value of
E
by the
relationship
If the nominal density f ( t ) = (l/&)
(
exp -t2/2) , then, according to (5)
exp(-t2/2)
, ~ tI( k ,
and (6), the density wo( t )=
In the class Wu,qthe least favourable DPD has the form
wo
where
the
(4 =
I
c cos2 (A/2)
parameters
A
cos2[ A t / ( 2 a ) ] , It1 I a,
and
B
satisfy
the
equations
wo( t )dt = q and the parameter C = cos2(A/2)/( 1+ 2 / B ) .
B = A .tg (A/2), -U
Extending of the models W, and Wu,qis based on the possibility of representation of almost any DPD with zero mean in the form p ( t ) = ( l / c ) w ( t / c ) , where the density w E W, ( W a , q ) .For such representation of DPD it is sufficient to assume that density w ( t )= c p (ct ) m
and to determine the parameter c from the equation
I t 2 p ( t ) d t= e2iT2in
-a2 UO
the case of the model W, or from the equation
I
p ( t ) d t = q in the case of
-UO
the model Wu,q.Owing to arbitrariness of the density p the parameter c for the specified extension of the models W, and WUlqis a priori uncertain. Therefore, the extended models W, and Wu,qare the models of a priori uncertainty of a mixed type and are represented, correspondingly, by the classes of distributions , (r E (O,co), w
E
W, (w E Wu,q)
loss of generality it is possible to assume for the models P, and
that,
86
Recent Advances in Stochastic Modeling and Data Analysis
correspondingly, the parameter S = 1 and the parameter a=l . Further the class Watqfor a=l is denoted for conciseness as Wq, and the class Pa,q as Pq.
3
Statistical synthesis of asymptotically robust algorithms
Initial premises. We synthesize AR algorithms for the following initial premises.
1
(
1B. The basis vectors S(’) = S,(’),...,Sk) satisfy the condition max
max
j=l, , N r = l , ,n
of the signal S(0)
J s ~ ’ ~ _ < n=1,2, o < ~...v .
2B. All distributions of the class W have finite Fisher’s information about shift. 3B. The density w(t19) = w,, ( t - 9), where wo(t- 9) is the least favourable DPD in the class W1, where the density w(t I 9) is continuous and positive with respect to the argument t function differentiable with respect to 9 at the point 9 = 0 ; there exists the neighbourhood U of the point 9 = 0 in which the density w(t I 9) = w ( t )[1+9 y ( t )+ a2S( t ,a)] , m
j S2(t,S)W(t)dt
4B.
For
all
densities
wEW
the
aw( r )= M{W, ( t + z)lw(t)} is differentiable with respect to
function t
in the
neighbourhood of the point r = 0 , moreover the value aw ( r = 0) = 0 and the derivative a; (z = 0 ) =M(vA ( t ) l w ( t ) } where ,
d
v, ( t )= --ln
dt
wo ( t ) .
greater than zero, finite and continuous in the neighbourhood of the point z = 0 for all densities w E W . Definition of A R algorithms. Quality of the asymptotic algorithms p = {pfl,n = 1,2,. ..) we estimate by limiting average losses caused by
w ) = lim n(pn,A, w ) and by the limiting omission of the signal n(q,,A, n-tm
’
For sufficiently general conditions the density wo exists, is absolutely continuous and is the only in the class of distributions with finite Fisher’s information [l].
Asymptotically Robust Algorithms
87
probability of false alarm a (p, w) = lim a (pn,w) , where the losses n+oo
KI (pn,A, y) =
5 nk(p,, ,A, y)
and
the
probability of
false
alarm
k=l
operator of averaging,
nk (pn,A, y )
are average losses in the case of
fulfilment of the hypothesis Hk . We remind that level of false alarm probability (level of the algorithm) is the maximal permissible value a of false alarm probability. Let us choose the class Oa = {p : a (p, w ) I a 'd w E W} of all asymptotic algorithms of the level a . Let us define in this class AR algorithm according to minimax optimality criterion. According to this criterion the algorithm pm E @a is called AR algorithm if it satisfies the condition
where A c ( 0 , ~ )is some set of values of the parameter A , on which the algorithm pm is uniformly optimal according to minimax criterion. In the known publications [2, 31 about robustness it is used for construction of AR algorithm, so called, M-type algorithm .
pr
To construct AR algorithm based on the algorithm p:
it is required
that in both cases, when the signal is present and is absent, the following convergences in probability be ensured:
8 k = y g supTn(x,8,y), k = 1,...,m , are
the estimates of the parameter 0 ,
&ZV)
'y,
d
( t )= --In dt
p ( t ) is the function of inertialess transformation of the
1 n sampling x; .~,,(x,e,y>= - S,(8)r~,
&c
P
r=l
the estimates
ek = ygsupZn(x,8,y), k = 1, ...,m , )
helping to establish specified level of the algorithm.
p ( a ) is the parameter
88
Recent Advances in Stochastic Modeling and Data Analysis
Owing to the fact that for convergences (8) to exist it is required fulfilment of stronger conditions than premises 1B - 5B, therefore, further for construction of AR algorithm we choose A 0 algorithm p; of correlation type with threshold dependent on the observed sampling. Decision functions of the algorithm p; have the form
where the estimates
8k
= argsupT,(x,8, y),
&$’
k
= 1,...,m
, the threshold
dependent on the sampling size is
To construct AR algorithm based on the algorithm with decision functions (9) it is required that in both cases, when the signal is present and is absent, the following convergence in probability take place Cn (x , a , Y ) * P ( a ) J m of
the
W E
w.
(1 1)
Construction of AR algorithm. It can be proved that for large values parameter A the algorithm p t with the parameter
~ ( a= C ) ( a ) / d m is the sought AR algorithm for the class W if premises 1B - 5B are satisfied, thus the following lemma is valid: L e m m a 1 . If premises 1B 3B are satisfied then the algorithm qEo is AR algorithm in the class 6 a ,moreover, optimality of the algorithm with respect to minimax criterion is ensured in this class for all values of the parameter A E (0,a) and any distribution P (0). In the class of all possible asymptotic algorithms the property of the asymptotic optimality of the algorithm qEo with respect to minimax criterion is ensured in limit ~
when the parameter A + co .
Asymptotically Robust Algorithms 89
Note, that when convergence (8) exists the statement of this lemma is also valid for the algorithm . However, optimality of this algorithm
cpg
with respect to minimax criterion is ensured only in the case when the parameter A 2 2np ( a ) / (2 - A ) . For low level of a of the algorithm (large parameter p ( a )) the lower boundary for the parameter too high, that is drawback of the algorithm
A can appear to be
cpg
After applying lemma 1 for constructing AR algorithms for representation of nonparametric a priori uncertainty by the classes of distributions W, , W, and Wqq, we have that AR algorithm is the algorithm Two
with decision functions (9).
For logarithmic derivatives convergence (8) is valid. Therefore, for the classes W, , W, and Wa,s with symmetric DPDs the algorithm can
cpg
also be used as AR algorithm. It can be shown that for specified change of initial premises 1B and 4B the algorithm qEo asymptotically robust in the class &a for symmetric DPD remains robust for the case when no requirement of symmetry is imposed on DPD of noise. Let W, denote the class of symmetric DPDs of noise and W* denote the class of distributions having both symmetric and nonsymmetric DPDs of noise. Let the mean value for the nonsymmetric DPDs be zero. As before, Fisher’s information is considered finite for all DPDs from the class W” . Initial premises 1B and 4B we formulate in the following way: of the signal S(8)
1B‘. The basis vectors S(’) = satisfy the conditions j=I,N r=l,n
4B’. For all densities w E W* the function a, (z) = m { ‘yo ( t + z) (w( t ) } is differentiable with respect to z in the neighbourhood of the point z = 0 , moreover the derivative ah (z
{
I
)) .
= 0) = M ‘y: ( t ) w ( t
The requirement a, (z = 0 ) = 0 is excluded from the initial premises because it can be unfulfilled in the extended class W* of noise distributions. Instead of this requirement the condition S ( j ) = 0 V’
= 1,...,N
is included, owing to the condition it is
n+m
possible to construct AR algorithm in the case of nonsymmetric DPD of noise.
90
Recent Advances in Stochastic Modeling and Data Analysis
It can be proved that the following lemma is valid. L e m m a 2 . Let the conditions of lemma 1 hold for the formulation of premises I B and 4B in the form 1B' and 4B'. Then the algorithm qEo, where wo is the least favourable DPD in the class W,, is
AR algorithm in the set of asymptotic algorithms for the extended class W" . If the condition lim n+m
= O V j = I , ...,N is not satisfied
the algorithms qEo and q g lose the property of asymptotic robustness in the class W* because of appearance of the uncontrolled displacement pw for asymmetric DPD. However, after some modification the given algorithms preserve the property of asymptotic robustness even in this case, but at the expense of some energy losses.
Conclusion To represent nonparametric a priori uncertainty of noise new models are proposed, the models were obtained by extension of the known models: the model of distributions with finite variance and the q-point model. The extension of the models was obtained by introduction of a priori uncertain scaling parameter. Owing to this almost any DPD can be expressed through the distributions of the models by using appropriate scaling parameter. Using the principle of minimax and Bayesian approach the method of synthesis of asymptotically robust algorithms for detection and recognition of signals with unknown parameters on the background of independent noise and signal-like interference is developed.
References Huber P.J. Robust Statistics. John Wiley and Sons. New York. Chichester. Brisbane. Toronto. 1981. Robustness in Statistics/ Edited by R.L. Launer, G.N. Wilkinson. Academic Press. New York. San Francisco. London. 1979. El-Sawy A. H., Vande-Linde V.D. Robust detection of known signal // IEEE Trans. Inform Theory. 1977. Vol. IT-23. pp. 722-727.
Three parameter estimation of the Weibull distribution by order statistics Vaida BartkutC' and Leonidas Sakalauskas' Institute of Mathematics and Informatics, Akademijos 4, Vilnius '(e-mail:
[email protected]),
2(e-mail:
[email protected]) Abstract. In this paper, we consider the estimation of the three-parameter Weibull
distribution. We construct numerical algorithms to estimate the location, scale and shape parameters by maximal likelihood and simplified analytical methods. Convergence of these algorithms is studied theoretically and by computer modeling. Computer modeling results confirm practical applicability of estimates proposed. Recommendations for implementation of the estimates are discussed, too. Results from simulation studies assessing the performance of our proposed method are included. Key words: order statistics, Weibull distribution, maximum likelihood.
1. Introduction Estimation of three-parameter Weibull distribution [Weibull, 195 11 occurs in many real-life problems. This distribution is an important model especially for reliability and maintainability analysis. The Weibull distribution is one of the extreme-value distributions which is used especially in optimality testing of Markov type optimization algorithms, too [Haan, 1981, Zilinskas and Zhigliavsky, 1991, Bartkute and Sakalauskas, 20041. Because of its useful applications, its parameters need to be evaluated precisely, accurately and efficiently. Three-parameter Weibull estimation has been studied by many researchers but an explicit solution is not known yet. Some questions concerning to the estimation of the location, scale and shape parameters of this distribution by both censored and noncensored samples were considered by several authors [Rockette et al, 1974, Lemon, 1975, Hirose, 1991, etc.]. However, iterative computational methods for the three-parameter Weibull distribution estimation are needed in a most cases [Hirose, 1991, Bartolucci, 19991. In this paper, we present some methods for estimating Weibull parameters, namely, translation, scale and shape parameters using order statistics of noncensored sample. We apply some simplifications which enable us to construct reliable and computationally efficient procedures for estimation. 91
92
Recent Advances in Stochastic Modeling and Data Analysis
2. Maximum likelihood and analytical estimation The three-parameter Weibull distribution function is given by:
where c, A and a denote the scale, location and shape parameters, respectively. In this paper we compare analytical and maximum likelihood method for the estimation of these parameters by order statistics of noncensored sample.
2.1 Maximum likelihood estimator This section is concerned with maximum likelihood estimation (MLE) of the three-parameter Weibull distribution. The maximum likelihood methodology is based on large-sample theory and the method might not work well when samples are small or moderate in size. Let m be number of order statistics ~ ( 1 )5 ~ ( 2 2) ... 2 T ( ~ which ) is given
N i.i. random value with distribution function W(.). from the population of Then likelihood function of order statistics can be expressed as follows [Balakrishnan and Cohen, 1990, Zilinskas and Zhigliavsky, 19911:
where w(.)-density function. Now we may write likelihood function for the Weibull distribution which depends on three estimated parameters c, A, a. The general technique for solving for MLE's involves setting partial derivatives of likelihood function logarithm (the derivatives are taken with respect to the unknown parameters) equal to zero and solving the resulting (usually non-linear) equations. Since these equations are nonlinear in the three parameters they can be solved only using nonlinear optimization techniques. However, the standard maximum likelihood method for estimating the parameters of the three-parameter Weibull model can have problems since the regularity conditions are not met; besides, numerical implementation of the method requires of complicated optimization software [Murthy et al., 2004, Blischke, 1974, Zanakis & Kyparisis, 19861. Let propose the modification to simplify the estimation. Hence assume instead asymptotic expansion the next expression is true for some x E [ A , A 61:
+
w(x,a,c, A ) = c(x - A)".
(3)
93
Three Parameter Estimation of the Weibull Distribution
Now let write the likelihood function:
~
l
x-.
m 1 ~..-,qm); ~ A, ~ a))~ =(N-) m). c* , d q m ) -~)a-' -(a-1)dA 1-4qm) - ~ ) a j=lTi)-A
Taking the partial derivative of the log-likelihood function (with respect to each parameter) and setting it equal to 0 we have 6 ,2 estimates:
m
a = m-1
c
2
1n(l+ Pj
(A))
j=l
m
n
C=
where P j
(A)=v(m) V(j ) 1
j=l
A
and
2 , 2 < q(1) is the solution of the equation: 1
j=l
1
(7)
94
Recent Advances in Stochastic Modeling and Data Analysis
Denote the maximum point ymax= arg max F ( y ) . Conditions for Oly
O , O
1 -.
i=l
2
1 I-, m
then equation F ( y )= 0 has the solution, 0 < y*
<
, where the function
F ( y )changes its sign from "-" to "+". ~f
F ( y m a x ) > 0 0 < ymax < 00, and
then the function F(y) has the minimum point 0 < y
< 00.
Thus, if condition (8) not valid then the equation (7) has the solution in which the likelihood function obtains minimum only. Then likelihood function n
has not maximum. At last case we proposed to take A = -00 and a = 00 . If both conditions (8) and (9) are true then we have to take into account that equation has one additional solution corresponding to minimum of likelihood fkction (4). As an initial value for numerical procedures of solving (7) we propose
where
Three Parameter Estimation of the Weibull Distribution
95
Computer modeling results by maximal likelihood approach are given in Section 3.
2.2 Analytical estimator Simple analytical estimate of the shape parameter was proposed by [Haan, 1981, Zilinskas and Zhigliavsky, 19911:
A
a=
where m
+
4 m
00, -
,
+ 0.2.
2.3. Improved analytical estimator However empirical computer simulation shows that the estimator (1 1) is biased and this bias increases when a increases. Better results are obtained when ~ ( 1 )is changed by linear estimate of the location [Balakrishnan N. and Cohen
A., 1990, Zilinskas and Zhigliavsky, 19911:
~ r n , N= ~ ( 1 -ern ) (V(m)-71
)?
(12)
Recent Advances i n Stochastic Modeling and Data Analysis
96
where
c, =
m! After simplification we obtain
1
c, = j=1
Thus more exact estimator is given by the equation:
*
a =
where 6 is taken from (1 1). To study this equation we denote
im\i
In
-
Three Parameter Estimation of the Weibull Distribution 97
1-
1
1-
1 , x=-.
a
j=l Thus, improved estimator must satisfy equation:
*
a = f(z,a*). For the solution of this equation it is convenient to use the simple iteration method: “t+l = f ( z 3 a t L
(15)
where a0 - initial value. Studying of properties of function (13) enable us to
21
make sure that equation
“0
>
1
.
lnkl
(24) has solution if
z>-
j=1 J
. Next, if
and solution of equation (14) exists, then the
98
Recent Advances in Stochastic Modeling and Data Analysis
sequence
(15) converges to the solution
of (14) with
linear rate:
3 Computer modeling We investigate the approach developed for three-parameter Weibull distribution estimation by computer modeling. Random samples from Weibull distribution (c=l, A=O) has been simulated and parameters have been estimated for every sample. Sampling averages of estimates are presented in the Table 1 and Table 2 varying the sample size N, number of order statistics m. Number of random samples M=lOO. In Figure 1 histograms of estimators are depicted.
AE
MLE
IAE
N 1000
I
2.4254142572
I
2.0301502001
I
2.6106745229
10000
I
2.4342467693
I
2.0462364888
I
2.7187719537
20000
2.3773970971
1.9687636383
2.4980881479
50000
2.3979007379
1.9538269253
2.458 1814946
100000
2.3602375824
2.0192535483
2.5933076899
Three Parameter Estimation of the Weibull Distribution
I
m=500, q=100,
I I I
I I I
1000 10000
~~
I I
5.3418326642
~~~~~~
a=5.0, c=l, A=O
I
MLE 5.0369583816
I
AE
3.3977207549 3.4717963621
I I
IAE
4.8583516027 5.0889989867
~
20000
5.2527942058
3.449861 1165
5.0855570765
50000
5.1 191912978
3.3959926626
4.8853179053
100000
5.1535 109997
3.4021267545
4.9607077656
0
99
1
2
3
4
5
6
7
8
9
10
Fig.1. Histogram of ML estimators of a (a=2.5, 100 trials) Computer modelling results show that maximal likelihood and improved analytical estimators (MLE and IAE) tolerably evaluate shape and other parameters of Weibull distribution under appropriate choice of N and m. For small m MLE sometimes does not exists. The probability of non-existence of MLE decreases when m and N increase. For reliable evaluation of large a, more order statistics and large samples would be used. The method described above could be applied for real-life problems, for instance, for optimality testing and termination of random search algorithms.
100 Recent Advances in Stochastic Modeling and Data Analysis
4 Conclusions In this paper we present maximal likelihood (ML) and analytical estimators for the estimation of the Weibull parameters, namely, translation, scale and shape parameters using order statistics of noncensored sample. Simplified method used for MLE enables us to create a simple estimation procedure of the shape parameter by solving one-dimensional equation. Improved analytical Haan estimator has been developed. Computer simulation confirmed that MLE and improved Haan estimator allows us to estimate Weibull distribution parameters with acceptable accuracy.
5. References [Balakrishnan N., Cohen A., 19901. Balakrishnan N., Cohen A. Order Statistics & Inference: Estimation Methods. Academic Press, Inc. [Bartkute V., Sakalauskas L., 20041. Bartkute V., Sakalauskas L. Order statistics for testing optimality in stochastic optimization. // Proceedings of the 7'hInternational Conference ,,Computer Data Analysis and Modeling". Minsk, p. 128-131. [Bartolucci A. A,, Singh K. P., Bartolucci A. D., Bae S., 19991. Bartolucci A. A., Singh K. P., Bartolucci A. D., Bae S. Applying medical survival data to estimate the threeparameter Weibull distribution by the method of probability-weighted moments. Mathematics and Computers in Simulation, Vo. 48, No. 4-6, p. 385 - 392. [Blischke W. R.,1974]. Blischke W. R. On nonregular estimation. 11. Estimation of the Location Parameter of the Gamma and Weibull Distributions. Communications in Statistics, Vo. 3, p. 1109-1129. [Haan L., 19811. Haan L. Estimation of the Minimum of a Function Using Order Statistics. Journal of the American Statistical Association. Vol. 76, No. 374, p. 467-469. [Hirose H., 19911. Hirose H. Percentile point estimation in the three parameter Weibull distribution by the extended maximum likelihood estimate. Computational Statistics &Data Analysis, vol 11, p. 309-33 1. [Lemon G . H., 19751. Lemon G . H. Maximum Likelihood Estimation for the Three Parameter Weibull Distribution based on censored samples. Technometrics, vol. 17, NO. 2, p. 247-254. [Murthy D. N. P, Xie M., Hang R., 20041. Murthy D. N. P, Xie M., Hang R. Weibull Models. Wiley Series in Probability and Statistics, John Wiley & Sons, Inc. New Jersey, USA. [Rockette H., Antle Ch., Klimko L. A.,1974]. Rockette H., Antle Ch., Klimko L. Maximum Likelihood Estimation with the Weibull Model. Journal of the American Statistical Association. Vol. 69, No. 345, p. 246-249. [Zanakis S. H., Kyparisis J., 19861. Zanakis S. H., Kyparisis J. A Review of Maximum Likelihood Estimation Methods for the Three Parameter Weibull Distribution. Journal of Statistical Computation and Sirnulation, Vo.25, p. 53-73. [Zilinskas A., Zhigljavsky A., 19911. Zilinskas A., Zhigljavsky A. Methods ofthe global extreme searching. Nauka, Moscow. (In Russian). [Weibull W., 19511. Weibull W. A statistical distribution function with wide applicability. Appl Mech, 18:293-7.
CHAPTER 3
Insurance
Stochastic models for claims reserving in insurance business Tamas Falukozy', Ildiko Ibolya Vitez', and Mikl6s Arat6l 1
Department of Probability Theory and Statistics Eotvos Lorind University Budapest, Hungary (e-mail: [email protected], [email protected])
Abstract. Insurance companies have to build a reserve for their future payments which is usually done by deterministic methods giving only a point estimate. In this paper two semi-stochastic methods are presented along with a more sophisticated hierarchical Bayesian model containing MCMC technique. These models allow us to determine quantiles and confidence intervals of the reserve which can be more reliable as just a point estimate. A sort of cross-validation technique is also used to test the models. Keywords: insurance, reserving, quantiles, MCMC, cross-validation.
1
Introduction
For an insurance company it is crucial to determine the future payments - generally this amount is called 'the reserve'. This is usually done by using the expected value (EV) principle which means that the reserve gives the safety appropriate to the EV. It is obvious that this is not sufficient enough, therefore insurance companies have to build a solvency capital which provides a safety puffer above the reserves for unexpected cases. In 1999 the European Union decided to reform the regulation of capital requirements for insurance companies, however, substantial work began in 2003 (Solvency I1 project). Although it seems settled that the new regulation will deal simultaneously with the reserves and the solvency capital. The main goal is to provide safety levels for the reserve and for the capital based on stochastic modelling. In this paper the socalled IBNR reserve is examined which refers to the claims that already incurred (occured) but have not been reported to the insurance company yet. There are a lot of different reserves but in non-life business the main part of the ultimate reserve comes from the IBNR and from the outstanding claims, i.e. the claims that already incurred, are also reported to the company, but are not paid yet. Up to the present the determination of the IBNR reserve was usually done by using deterministic methods based upon the so-called run-off triangle. In this triangle the claims of the past are presented and they are separated according to the time period of incurrance and to the delay of report or payment. Throughout this paper the elements of this incremental run-off triangle are denoted with C l , ] . 102
Stochastic Models for Claims Reserving in Insurance Business
103
Most of the deterministic methods uses the cumulative run-off triangle i
Ci,(see example Figure 1).
which elements are D,,, = S=l
(Maximum) Delay of reportfpayment (+1) 1
1
2
...
...
n
Fig. 1. The structure of a cumulative run-off triangle.
There are a lot of different run-off triangles, for example, the time period can be different - we examined daily, monthly, quarterly and annual periods and their influence on the reserve. Our second goal was to develop stochastic reserving methods for IBNR claims according to the main idea of the Solvency I1 project discussed above. Two naiv stochastic models are presented first and then a more sophisticated hierarchical Bayesian model is produced which contains MCMC technique. The results of the different methods are compared and the models are also tested by using some kind of cross-validation method. We have two different data-sets received from two different Hungarian insurance companies. The first one contains household insurance contracts and covers seven development years (from 01.10.1998. to 30.09.2005., with 686 71 1 contracts, 179 618 claims, 1,3 bn HUF claim amount and 10,5 mn HUF outstanding claim amount), the second one consists of motor third party liability (MTPL) contracts and covers a four-year period (from 01.01.2000. to 31.12.2003., with 500 690 contracts, 43 392 claims, 13,5 bn HUF paid claim amount and 3,5 bn HUF outstanding claim amount). These two data-sets are actually of two different kinds. The household insurance claim data-set has a short run-off which means that the claims are reported and paid within a short period of time (approximately one year), while the MTPL data-set has a substantially longer run-off. Considering that one of our aims is to analyze how the period of the run-off triangle influences the ultimate reserve, we constructed four run-off triangles to both data-sets: with daily, monthly, quarterly and yearly periods, all of them include incurred claim numbers.
104 Recent Advances in Stochastic Modeling and Data Analysis
Claim amounts were also examined, moreover, in that case we dealt with both incurred and paid claims - in that context the results can be very different, and the so-called Munich chain-ladder method, which uses both triangles and tries to reduce the gap between the results given by using them separately, was also tested. However, in this paper we discuss only the claim number results, because the claim amount results are far more complicated and discussing and analyzing them would be very spacious.
2
Deterministic and semi-stochastic methods
2.1 The Chain-Ladder technique In this section we start with reviewing one of the most widely used reserving techniques, the chain-ladder (CL) method, then we provide two naive stochastic models, which allow us to determine quantiles of the ultimate reserve. Let Di,, denote the number of claims incurred in period i and reported until period i+j-1 (i=l,...,t, j = l , ...,n-i+l) . If there are claims not reported after n years, we denote the approximated claim number incurred in first period and reported after YE periods by D,,n+. The
Di, j+l Di, j
essential assumption of the CL technique is that the lj(i) := - rates do not depend strongly on i (the period of claim incurrence), they all are approximately I,. To calculate the 1,
= D"n+ rate (the rate of very late ~
D13n claims) we use our experience from previous periods or the outstanding m
claim number. (In the latter case I , = 1+
where
is the
Q,n
outstanding claim number referring the first period.) All the other I, rates are determined by a practical weighting of the real b(i) rates - the advantage of these formulae is that we don't need to calculate the actual i'Ji) rates, we just need the X I , ]claim numbers: 1, =
~ , , J ~ J ~ ~ ~ + ~ ~ , J ~ J- ~4 , , +~1 ~+D2,J+i + ~ +-+Dn-,,,+i ~ ~ + ~ ~ ,- j , =, 1,..., l n~ - 1l . ~ ~ ~ J ~
' '
'1.J Dz,J ".+ 'n-J.1 D1.j + D2,j -+ ...+ Dn-j,j Because of its simplicity and reliability the CL method is a very popular reserving technique, moreover, it can be shown that it is correspondent with some stochastic reserving models (see for example [Verral, 20001).
Stochastic Models for Claims Reserving in Insurance Business
105
2.2 Semi-stochastic methods As mentioned before, our aim is to develop stochastic methods which we can use to provide quantiles of the ultimate reserve. First we introduce two pretty simple methods which poorly use stochastic techniques therefore we call them semi-stochastic methods.
First of all we recall the
J+1 I, (i) := values DI
which we used to
DIJ determine the Z, ratios in the CL method. Now these Z, rates are not the quotients of the sum of the D,,J values in the columnsj andj+Z but they are i.i.d. random variables. Instead of the real distribution hereafter we use the empirical distribution of these variables which is a discrete uniform distribution over the known Z,(i) values. Now, we can follow two different ways. On the one hand we can consider the expected values of these variables and by filling up the lower triangle using these values we get the expected value of the ultimate claim number, e.g., the expected value of the ultimate reserve. By the same way we can also calculate the variances of the variables and that of the ultimate reserve. Considering that we are dealing with a pretty large data-set, we assume that the ultimate reserve has a normal distribution which parameters can be estimated by the way described right now. Once we have the expected value and variance we can easily calculate quantiles of the appropriate normal distribution. This method is denoted by UN (uniform-normal). The other way is to generate a suitable large amount of outcomes of the ultimate reserve using the uniform distributions and then we can look at the empirical quantiles as the quantiles of the ultimate reserve - this method is denoted by UG (uniform-generating). Let us be more specific about the first method. Our basic assumption is that the L, variable has a discrete uniform distribution over the set
zj (i) = Di,j+l
. Therefore
~
its expected value is given
Di,j by E(L,) =
-z*, 1
n-J
n - ~D I=1
so we have to multiply the known claim
DZ,J
numbers in c o l u m j by this value to get the unknown claim numbers in c o l u m j + l . One can note that this expected value is the sum of the h(i) rates with the same weights, i.e., the average of them, so this method exactly follows the so-called ,,link ratios with simple average" method. As mentioned above we assume that the LJ variables are independent, so we
106 Recent Advances in Stochastic Modeling and Data Analysis
c[
Dn-J+l,J p E ( L k ) ]
can calculate the ultimate claim number by
2
e'g'
J=l
. As discussed
the ultimate reserve by
above this value is exactly the same that one can get using the ,,link ratios with simple average" method. But considering the 4 rates as LJ variables we are able to determine the variance of the ultimate claim number which is equal to the variance of the ultimate reserve (because one can get the n
latter as the sum of the first and the known Z Dn - j + l , jvalue). As j=1
assumed
earlier
the
LJ
1
c*, n-J
variables
are
independent,
so
D2
and using E(L:) = n - J I = I DiiJ
we get that the variance of the
ultimate reserve is given by k=J
k=J
having the expected value and variance of the ultimate reserve and assuming a normal distribution we can easily calculate any quantile of that reserve. Let us deal with the second approach. We assume that the LJ variable has a discrete uniform distribution over the set Di,j+l
l j ( i )= -
4,j
. To
get the quantiles of the ultimate
reserve we shall calculate all the possible outcomes (having the same probability), i.e., we shall determine all the possible reserves for each row (each period of incurrence). Considering that the reserve for the row i is -(Ln-i+,.Ln-r+2 Ln-l -1) and the fact that LJ can given by Di,n-i+l ....a
have (n - j ) ! different values we have (n - l)!.(n - 2). .... I! possible outcomes for the ultimate reserve. If we deal with daily periods (which means the run-off triangle for the MTPL data-set has n=1461 rows), it can be seen that it is impossible to calculate all the possible outcomes. Therefore we generate only 1000 outcomes: from the LI, Lz, ..., Ln-l
Stochastic Models for Claims Reserving in Insurance Business
variables
we
take
a
random
sample,
then
calculate
107
the
Dl,n-l+l .(L,-,+,. Ln-,+2. ... Ln-l - 1) reserve for each period i, finally by summarizing these period-reserves we get the ultimate reserve. From these thousand values any quantile of the real ultimate reserve can be estimated.
3
The MCMC method
The methods discussed so far are deterministic or semi-stochastic. Using Markov Chain Monte Car10 technique makes it possible to build up a more sophisticated stochastic reserving method. For the different lengths of period the model is the same, only the parameters and the input are different. We give the description of the model for daily data. We decided to create a one-level model, so the parameters of the prior distributions are fixed numbers chosen in such a way that the prior distributions of the parameters are best fitted to the data. For the calculations we need the runoff triangle of incremental claims, its elements denoted by Cl,,, and we also use the vector e containing the number of contracts in the several days (or months, quarters, years). In this model we assume that the data are Poisson distributed, as we are working with positive integers. The expected value of the claims incurred on day i, reportedj-2 day later is the product of e,, the number of (valid) contracts on day i, X,, the claim intensity of day i per one policy, and 5, the proportion of claims reported j - l days late: E(C,,, ) = elX,Y, , (i, j = 1,...,u ) . Assuming that the claim intensity of day i, X, and the proportion of claims reportedj-l days late, 5 are random variables, we have to determine their prior distributions. XI,...X n are considered to be prior independent and identically Gamma distributed random variables. We also fitted the model with other prior distributions such as normal but this choice seemed to be the best working. Additionally, when examining claim numbers it is widespread to use Poisson distribution for the number of claims and Gamma for the intensity as this way the unconditional distribution of claims incurred on a fixed day is negative binomial. ,.., Y, are prior Beta distributed random variables on the appropriate intervals, determined below. We have the n
constraint X Y , = 1 as we suppose (presume) that all the claims are k=l
reported at most n days late. The parameters of the prior distributions are chosen as follows. Alpha and lambda parameters of the Gamma
108 Recent Advances in Stochastic Modeling and Data Analysis
distribution are calculated in such a way that the expected value is the mean of the X i-s' starting values, and the variance is suitably high to let the data dictate. For the starting values of X , we used the naive estimate:
vectors C,!,,= C, /e,. The prior distribution of ,
Y, (i = 1,..A- 1) is Beta
with parameters a, b where the variance is fairly high and the expected value is greater than the half length of the interval. Thus, we have the following prior distributions:
where notation [x I y ] denotes the conditional density function
f,.,, (xly)
in continuous cases, and the P(X = x I Y = y ) conditional probability in discrete cases, respectively. Providing that the elements C,,, have Poisson distribution with the appropriate parameter, the joint posterior distribution is:
We obtain the posterior distribution of
X,:
Stochastic Models for Claims Reserving in Insurance Business
The posterior density function of
109
is a bit more complicated:
Clearly, the latter is not the density function of a well-known distribution so we have to use Markov Chain Monte Carlo simulation to get the expected values and also the appropriate quantiles. We determine certain quantiles of the reserve in the following way: in each MCMC iteration we generate a subsequent sample of X, and Yj, and also ofC,,, as random sample of Poisson( e,X,Y, ). So in each iteration we have a whole square of claims incurred in the first n days and reported at the most n-I days late fiom which it is easy to calculate the reserve as the sum of the values in the right down triangle. Therefore we performed 1 000 000 iteration of the MCMC method, leaving the first 100 000 out as the burn-in period, using the rest for determining the point-estimate and the quantiles. In each iteration we obtained a subsequent sample of the parameters and using them we generated a Poisson-distributed value with the appropriate expected value for the reserve. This way we obtained the quantiles deriving from these 900 000 values. 4
The results
In the previous sections we introduced the models used to estimate the reserve of the two data-sets, here we present the results for the monthly and the quarterly data - daily periods result a high variance when determining the quantiles and using annual data we have just four or seien rows in the run-off triangle which is insufficient for a correct MCMC algorithm.
Fig. 2. Point estimates of the reserve for the household data-set.
110 Recent Advances in Stochastic Modeling and Data Analysis
Fig. 3. Point estimates of the reserve for the MTPL data-set.
Figures 2 and 3 show the point estimates for the reserve given by the different methods and by using different time periods for the household and MTPL data-set, respectively. Of course we received different values working with the different periods but we also got highly different results using the various methods on the same run-off triangles which is not so obvious. It can be seen that the estimates of each algorithm grow with the length of the time period. As for the different methods it seems that the measures produced by MCMC method are consequently near the chainladder estimates while the two semi-stochastic methods are similar to each other, always giving higher results than the other two - as far as the household data-set is concerned. But considering the MTPL data it is the MCMC method that gives different values which are lower than the estimates of the CL technique while the two simple stochastic methods result very similar values. Therefore it looks like that choosing the best method can be done only according to the data.
Fig. 4. Quantiles of the reserve for the household data-set.
Stochastic Models f o r Claims Reserving i n Insurance Business
111
'i
Fig. 5. Quantiles of the reserve for the MTPL data-set.
For the quantiles we have three methods as from the chain-ladder technique we get only a point estimate. In Figures 4 and 5 the 50, 75, 90, 95% quantiles for the reserve can be seen for the three different methods using different time periods. As previously stated the MCMC results tend to be lower than the others and this is true for all quantiles and for both data-sets. This method differs from the other two in respect of the variation which is basically lower than that of the semi-stochastic methods. As mentioned before, the daily data give always the highest variance, but for the other periods the variance is neither decreasing nor increasing with the length of the time period. 5
Testing
The purpose of this paper is to calculate the reserve for an insurance company's claim numbers. In the previous chapters we introduced various methods of estimation, and of course we are interested in comparing these methods, let alone the fact that our estimates are rather different. A way of testing the goodness is calculating these estimations for just the first part of our data, and the real value of the reserve using the continuation of the data. By comparing the estimates to this last value it turns out how good the different methods are for this sequence of data (a sort of crossvalidation). It is unnecessary to explain how important is to know more about the reliability of these algorithms. The result of our calculations show that even in the simple case of household claims some of the basic procedures (like chain-ladder) fails. (As the MTPL data-set covers only four years, this testing method can barely applied to it.) For the test calculation we used the claims incurred and reported in the first 3 years. Here we present the results received by using monthly periods. The real
112
Recent Advances in Stochastic Modeling and Data Analysis
value for this part of the data is 1788 while the CL method gives only 990 which is a really bad underestimation.
Fig. 6. Quantiles of the reserve for the test data-set compared to the real value.
Figure 6 shows the quantiles of the reserve for this first three years computed by the three stochastic methods compared to the real value. Even if the point estimate is low, the algorithm can be acceptable if the real value of the reserve lies in the [lo%, 90%] confidence interval of our estimations. As discussed above the MCMC method gives the lowest results and has the lowest variance so it is not surprising that all the quantiles given by this model lie under the real value. But the 90% and 95% quantiles of the two semi-stochastic methods exceed the real value (in the case of the UN model also the 75% quantile is higher) so by using these methods it is possible to get reliable estimates for the reserve.
6
Conclusion
As a conclusion we can say that longer time periods result higher reserves, while using more detailed data (shorter periods) lowers the result but also raises the variance and therefore the uncertainty. Concerning the other goal, i.e., to develop stochastic models for claims reserving (it is obvious that the point estimates given by the deterministic methods are not reliable enough), one can see that also the stochastic models can provide very different results. The main question is how to choose between these models. We suggest that one should apply and also test all the methods (the deterministic models too), and then the most appropriate can be taken according to the type of the data set (i.e. shorter or longer run-off) and to the results of testing. In the future we want to develop our methods by using bootstrap technique and compare them with other stochastic models.
Stochastic Models for Claims Reserving in Insurance Business
113
References [de Alba, 2002lE. de Alba. Bayesian estimation of outstanding claim reserves. North American Actuarial Journal, volume 6, number 4, pages 1-20. [de Alba, 2004lE. de Aha. Bayesian claims reserving. Encyclopedia of Actuarial Science, John Wiley and Sons, Ltd., Sussex, UK. [England, 2002lP.D. England. Addendum to “Analytic and bootstrap estimates of prediction errors in claims reserving”. Insurance: Mathematics and Economics, volume 3 1, pages 46 1-466. [England and Verrall, 1999lP.D. England and R.J. Verrall. Analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Mathematics and Economics, volume 25, pages 281-293. [England and Verrall, 2002lP.D. England and R.J. Verrall. Stochastic Claims Reserving in General Insurance. Institute ofActuaries and Faculty ofActuaries, pages 1-76. [Mack, 1994lT. Mack. Which Stochastic Model is Underlying the Chain Ladder Method?. Insurance: Mathematics and Economics, volume 15, pages 133-138. [Mack and Venter, 2000lT. Mack and G. Venter. A comparison of stochastic models that reproduce chain-ladder reserve estimates. Insurance: Mathematics and Economics, volume 26, pages 101-107. [Renshaw and Verrall, 1998lA.E. Renshaw and R.J. Verrall. A stochastic model underlying the chain-ladder technique. British Actuarial Journal, volume 4, number 4, pages 903-923(21). [Verrall, 2000lR.J. Verrall. An investigation into stochastic claims reserving models and the chain-ladder technique. Insurance: Mathematics and Economics, volume 26, pages 91-99. [Gyarmati-Szabo and Markus, 200715. Gyarmati-Szab6 and L. Mhrkus. A Hierarchical Bayesian model to predict belatedly reported claims in insurances. Manuscript, 2007.
Stochastic Risk Capital Model for Insurance Company Gaida Pettere Banking Institution of Higher Education K. Valdemara street 1b, Riga LV 18 19, Latvia e-mail: gaidaB1atnet.h Abstract. The paper is about stochastic modelling risk capital requirements to cover equity risk for an insurance company by using copulas. We have tried to find the best portfolio structure with copula model and simulation using risk measure VAR. The result is compared with equity shock model given in Solvency I1 documents.
Keywords:equity risk, portfolio optimisation, copula model, Value-at-Risk, solvency
1 INTRODUCTION All insurance companies in Europe are starting to follow Solvency I1 requirements. One of the main approaches of Solvency II is solvency capital requirements for insurance companies. The Value-at-Risk has become very important risk measure because it expresses the amount of capital to cover certain amount of losses with some probability. Usually probability of insolvency is chosen arbitrary and remains around 0.5% if we are talking about loss distributions. VAR as risk measure for all insurance risks has become main keystone in evaluating capital requirements under Solvency I1 approach. In the same time it is advised to evaluate equity risk as part of market risk to assume 32% decrease for global equities and 45% decrease for other equities without taking into account diversification of portfolio (see CEIOPS’ materials). Such simplified assumption leads to very high solvency capital requirement for equity risk. Therefore our aim is to show the difference between the mentioned simple assumption and stochastic approach in solvency capital calculations using as risk measure the same VAR measure as for insurance risks. There have been many attempts to evaluate risk capital in recent years (see Panjer (2003), Bagarry (2006), Fabien (2003)) but all of them deal with multivariate Gaussian or elliptical distributions for modelling loss distributions. VAR as a risk measure in asset pricing and portfolio optimization is used very widely in last ten years despite that VAR and expected shortfall (ES)as possible risk measures were already introduced by Markowitz (1959). In last 15 years many publications have appeared about using copulas in different fields of our life. The advantage of copulas is possibility 114
Stochastic Risk Capital Model for Insurance Company
115
to estimate VAR of portfolio in the case of non-Gaussian and non-elliptical distributions of portfolio. Breymann, Dias and Embrechts (2003) have shown how to use copulas when working with high frequency data in finance in the two-dimensional case. Later Embrechts, Hoing and Juri (2003) have found estimators of upper and lower bounds of VARfor portfolio consisting from the equal parts of n equities and have given two-dimensional examples. More widely copula methods in finance are used by Cherubini, Luciano, Vecchiato (2004). They have shown how to simulate multivariate n-dimensional copulas and how to use copula technique in VAR estimation for equity portfolio. But only two-dimensional cases are considered as examples. Embrechts and Hoing (2006) have found how to estimate V&? for ndimensional portfolio which consists of assets in equal amounts using copulas. We have tried to work out model how to find high level VAR of portfolio if it does not consist from the same amount of equities. Necessity for that is discussed by Embrechts and Hoing (2006). At the same time we have tried to find the best proportion of equities in portfolio to have smallest necessary solvency capital in the case when no marginals no portfolio distribution are elliptical.
2 METHODOLOGY We have investigated portfolio consisting of three equities with correlation around 0,5 and less. We put together the portfolio from equities of American Express Co (Axp),Walt Disney Co (DIS) and Hewlett Packard Co (HPQ) with correlation coefficients = 0.32, = 0.53 andp2,3= 0.37, where index 1 is used for rate of returns of American Express Co, index 2 for Walt Disney Co and index 3 for Hewlett Packard Co. We calculated rate of returns by formula:
Ri= r::+Wdqys - 1:
r:
where is the price of the ith security. One quarter (90 days) is used as a time lag because our liabilities are calculated on quarter basis. Rates of return were transformed later (transformed rates of return) to have losses OR right hand side:
xi=1-E;5.
Portfolio is considered consisting from mentioned securities in different proportions:
116 Recent Advances in Stochastic Modeling and Data Analysis
cai .xi 3
xp,t+T =
i=l
where
cai 3
with the condition that
= 1,
i=l
ai is the proportion of the ith security in portfolio
We have used in our investigation two-stage method: at first we estimated the parameters of marginals by using Kolmogorov D (absolute value of difference of empirical and theoretical distribution hnctions), secondly we estimated chosen copula parameters applying numerical method for minimizing sum of squared distance between sample and simulated data. Several classes of multivariate Archimedean copulas: Clayton, Frank, survival, two-parameter copula (Nelsen, 1999) and Gaussian were investigated. Kendall’s Tau was used as the dependence measure. After appropriate copula was found we 50 times simulated 50000 data, constructed portfolio using formula (3) and found its empirical VAR(99.5%). After that we started to investigate the behaviour of VAR to change proportion of securities in portfolio. All calculations have been made using Mathcad and Microsoft Excel software.
3 DISCUSSION AND RESULTS The data under consideration consists of 1165 rates of returns from 01.06.2001 until 2.06.2006 with time lag 90 days for each security calculated by formula (3) and transformed by formula (4). Basic characteristics of all three random variables are presented in the following Table 1 and correlations in Table 2. Table 1. Characteristics of examined random variables
Kurtosis Skewness Sample size ‘Largest(1) Smallest(1)
0.39515 -0.04863 1165 1.37041 0.55841
-0.14765 0.55240 1165 1.43354 0.60120
-0.19225 -0.10848 1165 1.40796 0.29948
hStochastic Risk Capital Model for Insurance Company 117
Kendall’s Tau
Linear correlation coefficient
AXP
1
DIS
HPQ
0.321 1
0.528 0.371 1
1
0.544 1
0.310 0.357 1
First the marginal univariate distributions are examined using families of lognormal and Wald distributions. Random variable X has Wald distribution with parameters p and 2, if the density is of the form (Balakrishnan, Nevzorov, 2003):
We have applied Kolmogorov goodness of fit test to find the best approximations for all three random variables. Used parameters, and the test statistics for comparison with the 5% critical value of Kolmogorov test (0.039845) are shown in next table (Table 3).
Both distributions give good approximations as it is possible to see to compare calculated values of D with the critical value of the Kolmogorov test statistic. Graph of the empirical distribution and both theoretical densities for transformed rate of returns of American Express Co are shown in Figure 1. We have used standard algorithm for Archemedian copula generating described in Frees, Valdez (1998) and later for Clayton, Gumbel and Frank copulas a generalized version by Cherubini, Luciano, Vecchiato (2004): 1. Generate U,, ,..., independent uniform (0,1) random
u, up
numbers.
118 Recent Advances in Stochastic Modeling and Data Analysis
2.
Set XI = 4-'(UI) and co = 0 .
3.
For k = 2, . . . , p , recursively calculate X k as the solution of
where p is the generator of an Archimedean copula, q
-l(k-I)
is the k - 1
order derivative of the inverse generator, Fk- marginal distribution hnction and 'k
0.1
1
= d6(x1>1+ d 4 (
x 2
>I+..*+p[& ( x k
1
0-
Figure 1. Approximation of transformed rate of returns of American Express Co by lognormal and Wald distribution. We have worked out ourselves the mentioned simulation algorithm for Survival copula and for two parameter Copula 4.24 from Nelsen (1999), p. 124. Algorithm for Survival copula is the following: 0 Simulate 3 independent random variables u l , u 2 , U3 from
W O , 1) 0 0
Set XI = e-I(u1) Calculate C1= ln(1- 6 In u l ) Find 1-(1-61n(F2 (x2)))eC'
1-eC1 ~
x2 t root[(1 - 6 In( F2(x2)))e
6
-242-e
,x2]
Stochastic Risk Capital Model f o r Insurance Company 119
Calculate C2 = ln[(l - S In ul)(l - S In u2)] I-(I-JI~(F,(x~)))~'~ 0
Find x3 t root[(l- 8 h(F3(x3)))e
6
The algorithm for Copula 4.24 is: 0 Simulate 3 independent random variables ul, u2, U3 from
W O , 1)
'
0
set xl = e - l ( u 1 ) .
0
Calculate C, = (uI-" -1)'.
0
Calculate C, = (ul-"
- 1)' + (u2-" - I)'.
x3 t root[((c2+ ((F,(x3))-"- 1)"F
1
-1-a
+ 1)"
1
x(c2 +((F,(x3))-"-l)p>-2+p
We have used also the standard algorithm to simulate Gaussian copula (see Cherubini, Luciano, Vecchiato (2004)). We investigated all mentioned copulas (Clayton copula, Frank copula, Survival copula, Copula 4.24 (Nelsen (1999) example 4.24)) with lognormal and Wald distributed marginals. Copula parameter and the copula itself for hrther applying was found by using the fitting measure
cc H
sEE=
H
k=l J = 1
H
(Oi, j , k -,i' i=l
H3
j,k
1'
120 Recent Advances in Stochastic Modeling and Data Analysis
where maximum lengths of each axis are divided into H segments and ff 3 rectangles created, Oi,j,kis observed frequency in each rectangle and Si,j,k is the simulated frequency in each rectangle under the condition that size of the simulated data coincides with data size. Average values of the fitting measure and its standard deviation with the best copula parameter are shown in Table 4. Marginals
Clayton copula
Frank copula
S = 1.4 Lognormal, lognormal, lognormal Wald, Wald' Wald
Survival copula
6 = 7.5
p = 12.256 0 = 0.782 p = 7.486 0 = 0.578
p = 8.470 0 = 0.489 p = 8.903 0 = 0.563
Copula 4.24
S=-O.1
a=5, ~=1.4
p =20.156 0 = 1.109 p = 19.023 0 = 0.989
p = 13.024 0 = 0.566 p = 14.356 0 = 0.675
Gaussian copula
p = 16.34 0 = 0.967
As it is possible to see from Table 3 the best approximation is reached with Clayton copula with Wald marginals. Finally we have simulated 50 000 data 50 times with that copula, have calculated portfolio with different proportions of securities in it and have found average of empirical VAR(99.5%) of portfolio. Results with several proportions of securities are shown in the Table 5. As it is possible to see from mentioned table, it is possible to find optimal structure of portfolio that gives better 99.5% VAR to compare with 32% or 45% decrease. Table 5. Portfolio VAR and corresponding rate of return in comparison with
0,4 0.2
I I
0,2 0.2
I I
0,4 0.4
I I
1.221 1.395
I
I
-0.221 -0.395
I
I
0 0.6
I
I
0.4
0.4
I
I
0.6 0
I
I
1.106
1.201
I
I
-0.106
-0.201
4 CONCLUSIONS Solvency I1 papers claim that no diversification can be taken into account but in the same papers is advised to add all risks with correlation matrix. Like it is possible to see from Table 4 diversification plays large role in solvency capital determination. The negative point is that it is difficult to
Stochastic Risk Capital Model for Insurance Company
121
find appropriate copula because most widely used copulas have only one parameter and it is difficult to fit the distribution to data. Even two-parameter Copula 4.24 did not give better approximation..
REFERENCES 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.
Bagany M. (2006) Economic capital: a plea for the Student copula. 28"' International Congress of Actuaries, http://papers.ica2O06.com/30 14.html. Balakrishan, N., Nevzorov, V. (2003) B. A Primer on Statistical Dism'butions.Wiley, New York. Breymann W., Dias A,, Embrechts P. (2003) Dependence Stsuctures for Multivariate High-Frequency Data I Finance, Quantitative Finance 3(1) 116, m.math.ethz.ch/-baltes/ftp/papers.html. CEIOPS'advice to European Commission in the light of Consultation Paper CP15-20, m.gcactuaries.org, Solvency I1 page. ceiops_qis2_final,www.pcactuaries.org, Solvency I1 page. Cherubini, U., Luciano, E. (2001) Value at risk trade-off and capital allocation with copulas. Econ. Notes, 2001, v01.30,235-256. Cherubini U., Luciano E., Vecchiato W. (2004) Copula Methods in Finance, Wiley, New York. Embrechts P., Hoing A., Juri A.(2003) Using Copulae to bound the valueat-Risk for functions of dependent risks, Finance dt Stochastics 7(2) 145167, www .math.ethz.ch/-baltes/ftp/papers .html Embrechts P., Hoing A. (2006) Extreme VAR scenarios in higher dimensions, www.math.ethz.ch/-baltes/ftp/DaDers.html. Fabien F.(2003) Copula: A New Vision forEconomic Capital and Application to a Four Line of Business Company, www.actuaries.org/ASTIN/Co~lo~uia/Berliil/F Frees, E. W., Valdez, E. A. (1998) Understanding relationships using copulas. North American Actuarial Journal, vol. 2, 1-25. Hurlimann W. (2002) An alternative approach to portfolio selection, in Proceedings of the 12"' international AFIR Colloquium, Cancun, Mexico. Markowitz, H. (1959) Porvolio Selection. Wiley, New York. Micocci M., Masala G. Basktesting value-at-risk estimation with non Gaussian marginals. Nelsen, R B.(1999) An Introduction to Copulas. Springer-Verlag, New York. Panjer H H. (2003) Risk, Solvency and Allocations of Capital in Financial Institutions. Financial Engineering News, 2003, vol. 30, MarcldApril, www.fenews.codfen30.
Measuring demographic uncertainty via actuarial indexes Mariarosaria Coppola' , Eniilia Di Lorenzo', Albiiia Orlando3, and Marileiia Sibillo4
*
University of Naples Federico I1 via Rodin6 22 80134 Napoli, Italy (e-mail: m. coppola0unina. it) University of Naples Federiro I1 via Cintia, Complesso Monte S. Angelo 80126 Naples, Italy (e-mail: diloremiounina. it) CNR, Istituto per le Applicazioni del Calcolo Mauro Picone via P. Castellino 111 801213 Naples, Italy (e-mail: a. orlando0iac. cnr . it) University of Salerno via Ponte Don Melillo 84084 Fisciano (SA), Italy (e-mail: msibilloounisa. it)
Abstract. Aim of the paper is the analysis of the behaviour of risk filters connected to the demographic risk drivers for a portfolio of life annuities. The model, easily suitable to the rase of pension annuities, involves the evolution in time of the mortality rates, taking into account the randomness of the financial scenario. Within this context, the uncertainty in the choice of the deiriograpllic scenario is measured and the analysis is also supported by the VaR sensitivity to this risk source. Keywords: Longevit,y risk, Value at Risk, Risk index.
1 Solvency assessment principles: the importance of the longevity risk 1.1
Basic guidelines on solvency
In a solvency assessnient framework the keystone is represented by the capability of assets to cover obligations. Of course such valuations can be pursued on a run-off basis, i.e. considering only written business, or as an on-going concern, i.e. considering also future new business. As explained by [Cocozzaet al., 2006a], several a.pproaches have been considered in literature and in practice, for instance consistently with a surplus analysis, according t o an actuarial perspective, or with an income valuation. according to an accountant perspective, or with a profit analysis, according t o an ecoiioinic point of
122
Measuring Demographic Uncertainty via Actuarial Indexes
123
view. In any case, whatsever valuation perspect.ive one chooses (that can be aimed, for instance, t o the financial policy of the company or whatever company's policy), it is quite iinportant what follows (cf. [Cocozzaet d., ZOOCia] for a deeper understanding): 0 0
0
identifying the risk drivers affecting solvency, iiieasuring these risk sources, defining the capital requirements apt to cover liabilities.
Categorizing any possible risk must be opportunely addressed t o a specific purpose, ta.king into account several points of view: the shareholder's, the policyholder's, the supervisory a,uthority's ones. Focusing on a run-off basis; according to a. supervisory authority perspective, two risk components basically affect a life insurance system, that is the demographic and the financial risk drivers. Demographic risks are due to the differences between the anticipated mortality rates and the actual death frequencies; analogously financial risks are due to the differences between the actual return on assets (obtained investing premiums or contributions) and the interest rates adopted in the technical bases. Capital requirements can he based on fixed ratio systems or risk-based approaches. As well known, the first procedure, structured on a fixed percentage of a quantity representative of the risk exposure, which determines the solvency margin, does not capture the company's pecularities in terms of risk profile. The risk-ba.sed a.pproa.ch is cha.ra,cterized by connecting ca.pita.1 reqiiirement's to the specific risk profile of t,he insiirance business, that) is (cf. [Cocozzaet al., 2006bl) asset risk, insurance risk, interest ra.te risk, business risk (see RBC requirements developed by NAIC). On the other ha.nd, according to the Ra.sle indica.tions., iiiterna.1 models are realized connecting the siirpliis level t o the company's results and linking capital requirements to ruin probabilities. This approach is framed into a stochastic model involving an overall valuation perspectivc, apt, t,o assess solvency in connection t o the company's activities. In general solvency can be assessed on the basis of the ineqiialit,y:
q . 4 , - Lt 2 0 ) = E ,
E > O
where At and Lt. are; respectively, assets and liabilities at time t . Within a risk mapping procedure, in the following we analyze the impact on the liabilities of the systema.tic deniogra.phic risk, ta,kiiig into a.ccount its intera.ction with the financial risk soi.irces. In this order of ideas the int,ernal risk profile is deeply investigated by means of suitable risk indexes, apt to describe the impact of the selected risk source, as well as additional costs can be estimated with a fixed reliability degree.
124
Recent Advances in Stochastic Modeling and Data Analysis
1.2
An overview on longevity risk: the dynamic aspect of mortality and the projected survival models
The traditional survival models describe the age behaviour of mortality in a wide period, from age 0 t,o the maximum att,ainable age, attributing to the force of mortality, thereby to the probabilities of death, values deriving from period iiiortality observations. Such models assume the mortality trend static in time and manifest itself t o be inadequate in capturing the betterment in the mortality trend, known as the longevity phenomenon, particularly observed beginning from the past century in all industrialized countries (cf. [Pita.cco 20041). The curves of dea,ths reveal to be strongly clia.ra.cterizedby the increasing concent.rat#ionin the inode (recta,ngi~larization)and at the same time by the mode random approaching toward very old ages (expansion), effecting decreasing mortality profiles. In substance, the evolut,ion in time of the number of deaths presents, besides the xcidental deviations from the expected values, systematic deviations too. In actuarial calculations, period-based assuinptioiis for siirvival probabilities mean to represent the future as the past (even if recent past), without taking into account that future death probabilities will he most likely less than the corresponding period-ba,sed ones. In the fair value framework, the question of the correct choice of tools modelling the mortality pattern becomes noda.1 observing that no indications about mortality systematic deviations from the expected values come out from the market, so that values are obtained as marked to model values. Otherwise the longevity bond market is not perfectly developed yet for the incompleteness of the a.ges a.t issue represented in the contra.cts a.nd for the aspect of the incomplete liquidity. As suggested in [Ballottaet al., 2006], in the current valuation approach the deinogmphic phenomenon can be represented a.s the expecta.tion of its best estimate, considering the market neutral with respect to the two aspects, systematic and unsystematic, of the demographic risk. In this basic assumption, the. projec morta.lity tables., founded on models capt'iiring t,he dynamic aspects of the human life, in particular with respect to longevity, coiistitute the tool for fronting the problem. The consequence in the risk mapping is tha.t the demographic risk component results from the combined action of the mortality risk, deriving from the accideiital deviations of the number of deaths from the expected value, and the risk arising froin the use of a. ta.ble not a.dequa.te in representing the future mortality, the demographic m,odel risk. The demographic model risk is studied in the death benefit context in [Olivieri , 20011, in which term a,ssura,nce portfolios are considered paying att,eiition to new deceases. In [Coppola et al., 2006b], the impact, of mortality ta,bles with different degree of projection is analysed with respect to the fair valuation of the mathematical provision in t,he part,icular case of a term assurance with decreasing capital. In that paper the influence of the de-
Measuring Demographic Uncertainty via Actuarial Indexes
125
mographic model risk on the periodic premium duration is practically shown and measured. On the other side, due to their long dura.tion a,nd ma.nifold payments, the actuarial appraisal involving life annuities is indeed a business section heavily affected by the demographic model risk. The choice of a mortality table with an iiiifair degree. of projection can cause remarkable iinderestiination of future costs, including insurers’ or pension fund liabilities or, otherwise, too high amounts constrained for reserves. In light of these considerations, in the life benefit contract framework, the longevity component holds a primary role in the insura.nce business risk ina.na.geiiient a.nd in the solvency a.ssessment. In papers by [Olivieri , 20011 and [Coppolaet al., 20021, the longevity risk is shown to be independent on the portfolio size and the model risk is introduced and measured outlying different inort,ality scenarios. Di Lorenzo and Sibillo (cf. [Di Lorenzoet al., 20021 draw the behaviour of the this risk for the mathematical provisions of a life annuity portfolio as fiiiict,ion of age at issue and portfolio size, picking out a critical age in correspondence of which the projection risk changes its trend. In the same paper some considerations arising from the actuarial functions’ pattern are developed. The relative importance of the two demogra.phic risk coiiipoiients, na.mely the insura.nce risk and the model risk, are coinpared in [Coppolaet d., 2002], together with the contribute of the stochastic interest rate to the global portfolio risk.
2
The mathematical provision fair value in living benefit products
The risk analysis we propose is based on stochastic assumptions both for the financial coinponeiit, due t o the random movements of the interest rates, and for the demographic one. The stochastic framework we will work in is outlined in its evolution in time by the information flow containing the financial and the demographic information formalized in the two probability F’, P’), (0, F”, P”), where F’ and F” a.re the a-a.1gebra.sreferred spa.ces (0, t o the financial and survival information flow respectively; we suppose the independence of the randomness in interest rates on the randomness in mortality rates, as commonly assumed. The two spaces generate the probability space (Q, F, P ) , represented by the filtration { F k } C F , with Fk = FL U 8‘: and {FL} C F’, { F ’ k } C F . The financial market in which the valuation is framed is assumed to be frictionless, with continuous trading, no restrictions on borrowing or short sales, the zero bonds and the stocks being iiifinitely divisible. We consider the case of payments due in case of life at the end of each period to persons belonging t o a.n initia.1 group of c coeval individuals aged 2. If w ( t , j ) is the market value at t,iine t of one monetary unit! due at time j and X j is the stochastic cash flow at time j , we can write the fair value at, t,iine t of the stochastic stream of loss from t to the coiitract,ual ending time:
126 Recent Advances in Stochastic Modeling and Data Analysis
In formula (1) Nj indicates the number of claims at time j in a portfolio of hoinogeneous policies, coinciding with the iiuinber of survivors at that time, in the case of life annuities. In (1) the generic element of the trading strategy replicating the portfolio flow at time j is constituted by N J X , units of unitary zero coupon bond issued in t and maturing in J with the following price (see [Coppola. et al., 2006bl):
The operator E in formula (1) denotes t,he expected value under a risk neutral probability measure hypotheses, deriving by the completeness of the market. This assumption, acceptable in relation to the financial aspect of the operation, is less realistic for its demographic component. As suggested in [Ba.llotta.et al., 20061, the expected va.lue under a. risk neutra.1 proba.bility measure can be calculated using the most opportune probability measure for the demographic component, taking into account both the systematic and the unsystematic mortality risks. In the case of an immediate life annuity portfolio, following [Coppola et al., 2006a], having iiidicated by t p , the probability that the individual aged z survives at the age 2 t , we write:
+
3
Risk indexes in living benefit contract fair valuation
Measuring the impact of the demographic model risk on a fiiiaiicial undertaking involving life benefit payments means quantifying its variability due to t,he raidomness in the. choice of the mortality table, the effects of the other two risk components (stochastic interest rates and random deviations of mortality) having been averaged out (cf . [Coppolaet ul., 20021). Referring to the case considered in section 2, we measure the impact of the demographic model risk ( D M R M ) on the value V, calculated in formula (3), proposing the. following expression (see [Coppola et al., 2006aI)
DMRM
= Va~[IE[v,lKtl] = c ’ V a r [ E [ x Xjjp,jp,.+ttJ(t,
j)lKt]].
(4)
j>t
having estimated by the conditioning on h’t the randomness in the choice of the survival function used to determine the survival probabilities.
127
Measuring Demographic Uncertainty via Actuarial Indexes
Numerical applications
4
First of all we estimate the risk index DMRM in forinula (4),in the case of a portfolio of c = 1000 immediate unitary life annuities, a,ccording to a demographic scenario represented by the mortality rates involved by 0
0 0
the Italian inale life tables S I M 2002, the Weibul model with parameters cy = 85.2 and y = 9.15, the Lee-Ca,rter model a.pplied to the 1ta.lia.nda.ta. (referred to the Ita.lian popillation from 1947 to 1999 and from age 0 to 109, where 109 is the limiting age).
with reliability degrees, respectively, 0.2, 0.3, 0.5. The first table is the less projected one a.nd the third ta.ble the most projected. The fiiiancia.1sceimrio is represented by a CIK process for the interest rates, adjusted on daily average yields on fixed interest securities over the period January 1999- July 2006. t=5
t-10
t=l5
t=20
t=25
t=40
t=50
z = 40 984,8022 1,078,248 1,542,4113 1,984,756 2,480,825 3,156,176 1,278,737 0 x = 65 1,198,026 1,429,125 1,285,527 1,200,000 410,178.8 0.01886
Table 1. Projection Risk -(Model risk) estimation, reliability degrees: 0.2, 0.3, 0.5
Consistently to the results obtained in [Di Loreiizod al., 20021, t.he values in Table 1 show that the projection risk, for every fixed age at issue, generally increases with the time of the reserve valuat.ion, until the age reached at that time assumes a certain (high) value (depending on the pecularities of the selected tables). This valuation time is higher the younger the insured is. The results obtained in Table 1 reflect the reliability degree assigned t o each survival model, as the results in Table 2 confirm, where the probabilities given t o each survival table are, respectively, 0.2, 0.6, 0.2. t=5
t-10
t=15
t=20
t-25
t=40
t=50
z = 40 616,076.6 745,656.3 952,733,3 1,217,554 1,516,759 1,951,206 800,480.4
z = 65 733,151.5 874,301.6 790,432.4
741,963 253,750.6
0.278133
0
Table 2. Projection Risk -(Model risk) estimation, reliability degrees: 0.2, O.G, 0.2
It is interesting t o observe tha.t for the insured a.ged LL: = 40, the risk index, capturing the projection uncert,aint,y, increases until a valuation t,iine greater than the time in which 5 = 65 reaches his hit, since the results incorporate the behaviour of t,he Lee-Carter survival probabilities, where t,he survival dynamic in time reflects the starting age. In confirmation of this, we observe in
128 Recent Advances in Stochastic Modeling and D ata Analysis
Table 3 the behavioiir of t,he va,lues, corresponding t,o the ages z = 40, z = 60 and z = 65, using the Weibull model with parameters ( a = 83.5;g = 8), ( a = 85.32; 8 = 9.15), ( a = 87.0; ,/? = 10.45) and reliability degree, respectively, 0.2, 0.6, 0.2. In fact in this case the three reserves corresponding t o the ages z = 40, z = 60 and z = 65 begin t o decrease when the corresponding insured is between the ages 65 and 70. The behaviour for older ages is due t o the intera.ctions between z a,nd t rela.ted to the projected probability t p , and the residual duration of the contract (cf. also [Di Lorenzoet al., 20021).
t=5 t-10 t=15 t=20 t=25 t=30 t=40 t=50 z = 40 303,495.2 478,901.2 493,899.7 547,731.1 552,953, 1 480376.3 159,494.5 4,576.932 z = 60 421,733.1 406,077.3 279,729 138,056.7 37,788.14 18.73209814 12.1783 0 z = 65 188.557.7 139,OG7.2 118, GG7.9 32.894.78 3.1G0.081 0.223 0.048717 0
Table 3. Projection Risk -(Model risk) estimation, Weibul model
The following tables reporte the quantile reserve values obtained by means of a siiriulatioii procedure developed in [Orlandoet al.. 20061. Here, in order t o have a better approximation of each simulated path for the interest rate, we consider a weekly sample interval. t=5 t-10 t=15 t=20 t=25 t=40 t=50 SIM2002 21,278.48 18,516.65 15,891.98 13,495.11 10,737.45 4,649.763 1,759.868 Weibul 22.218.74 19.370.08 16.637.88 14.086.61 11.225.09 5.021.374 1.818.434
Table 4. Quantile Reserve. Confidence level=95%, x=40 c= 1000
t=40 t=50 t=5 t=lO t=15 t=20 t=25 SIM2002 21,285.61 18,522.31 15,896.32 13,498.28 10,739.62 4,650.199 1,759.966 Weibul 22,226.47 19,376.21 16,642.60 14,090.07 11,227.46 5,021.893 1,818.523 LeeCarter 23.816.98 21,210.78 18,839.69 16,726.03 14,199.61 10,524.92 4,698,106 Table 5. Quantile Reserve. Confidence level=99%, x=40 c=1000
We see that, if the projection degree increases, the quantile reserve for both the considered confidence levels increases. This trend is stronger in the case of the Lee-Carter table and when the t,iine of valuation is high. Finally, all the numerical results shows that, in a solvency perspective, it is
Measuring Demographic Uncertainty via Actuarial Indexes
129
opport,iiiie to take into account context,iially different kinds of risk indexes. in particular referred t o the uncertainty in t h e choice of survival tables a n d to t h e quantification of t h e fiuancial position for different tables. In fact, t h e risk index i n (4)provides a measure of t h e overall effect of t h e uncertainty i n the choice of t h e deiriographic scenario. whilst t h e evolution i n time of t h e quantile reserves is a concrete indicator, in terms of fiiiancial results, of the a.bove choice.
References [Ballottaet al., 2OOG]L. Ballotta and S. Haberinan The fair valuation problem of guaranteed annuit,y options: The stochastic niortality environment case. Insurance: Mathematics and Economics, 38, 2006. [Cocozzaet al., 200Ga,]R. Cocozza and E. Di Lorenzo Solvency of life insura.nce companies: methodological issues. Journal of ilctuarial Practice, 13:81-101, 2006. [Cocozzaet al., 200GblR. Cocozza, E. Di Lorenzo A . Orlando and M. Sibillo In ProThe VaR of the Mathematical Provisions: critical issues. ceedin.gs of the In.tem,ation,a.l Conjeren.ceof Actuni-ies, 2:95-102, 2004. 1ittp://www.papers.ica200G.coiii/Papiers/2002/2002.pdf, 2OOG. [Coppolaet ul., 2002lM. Coppola, E. Di Lorenzo and h l . Sibillo Further Remarks on Risk Sources Measuring in the case of a Life Annuity Portfolio. Journal of Actuarial Practice, 10:229-242, 2002. [Coppolaet al., 2003]M. Coppola, E. Di Lorenzo and M. Sibillo Stochastic a.nalysis in life office managenlent: application to large annuity portfolios. .4pplied Stochustic Models in Business u71d In&ustry, 19:31-42, 2003. [Coppolaet al., 2OOGa]M. Coppola, V. DAniato, E. Di Lorenzo, M. Sibillo Risk measurement and fair valuation in the life insurance field. InEcople: from tradition t,o complexinty, Ca.pri, 2006. [Coppolaet al., 200Fb]M. Coppola, V. DAniato, M. Sibillo Fair value and demograpliic aspects of the insured loan. InProcee&inys of 1 Uth Intel,national Congress on Insurance: Mathematics and Economics Leuven, http://www. kuleuven. be/ime2U06/abstract.php ?zd=80. [Di Lorenzoet al., 2002lE. Di Lorenzo and M. Sihillo Longevit,y risk: measurement and application perspectives. InProceedings of the I 1 Conference in Actuarial Scie71ce, Sarnos, 2002. [Olivieri , 2001lA.M. Olivieri Uncertainty in niortality projections: an actuarial perspective. Insurance: Mathematics and Economics, 29:231-245, 2001. [Olivieriet ak., 2003lA.M. Olivieri and E. Pitacco Solvency requirements for pension annuities. Journal of Pension Economics and Finance, 2: 127-154, 2003. [Orlandoet ul., 20061A. Orlando and M. Politano Further remarks on risk profiles for insurance participating policies. InProceedings of the Ma4F Conference, Salerno, 2006. [Pitacco , 2004lE. Pitacco Sun~icralm.odels in a dgnnmic context: a surtieg. Insurance: Mathematics and Economics, 35(2):279-298,2004,
Location as risk factor Spatial analysis of an insurance data-set Ildikb VitBz' Department of Probability Theory and Statistics Eotvos L o r h d University P b m h y P. stny. 1/C Budapest, Hungary (e-mail: vildikoOcs.elte.hu) Abstract. Our aim was to examine the territorial dependence of risk for household insurances. Besides the classical risk factors such as type of wall, type of building, etc., we consider the location associated to each contract. A Markov random field model seems to be appropriate to describe the spatial effect. Basically there are two ways of fitting the model; we fit a GLM to the counts of claims with the classical risk factors and regarding their effects as fixed we fit the spatial model. Alternatively we can estimate the effects of all covariates (including location) jointly. Although this latter approach may seem to be more accurate, its high complexity and computational demands makes it unfeasible in our case. To overcome the disadvantages of the distinct estimation of the classical and the spatial risk factors proceed as follows: use fist a GLM for the non-spatial covariates, and then fit the spatial model by MCMC. Refit next the GLM with keeping the obtained spatial effect fixed and afterwards refit the spatial model, too. Iterate this procedure several times. We achieve much better fit by performing eight iterations. Keywords: GLM, insurance, Markov Chain Monte Car10 (MCMC), Markov
random field, Spatial statistics.
1
Introduction - Risk models with spatial components
Numerous models are known in the literature for estimating the spatial effect in various problems such as disease mapping, deliquency, or number of accidents [Aratb et al. (2004)], [Gilks, W. R. et al.]. Alongside the spatial effect, however, other influential variables may exist, and they can be quite different depending on the subject of study. For example age and sex are important factors in the case of a motor TPL insurance, while they are insignificant for household insurances. In the latter case type of wall and type of building seem to be important parameters of the contracting partner affecting the number of claims. Besides these more usual (hereinafter: classical) variables location can be an important risk factor worth including into the model. Basically, there are two ways of dealing with the classical and the spatial risk factors; we can estimate the classical ones first, and then fixing their values as if they were known, estimate the spatial effect, or we can estimate all effects simultaneously. Both methods work better than the ones ignoring territorial dependence when the considered phenomenon does contain a 130
Location as Risk Factor
131
spatial effect, but as the usual critic of the first method goes it is not reasonable to cope with the different variables separately, simultaneous estimation is more accurate. Though this latter concept seems to be more desireable but the realisation can be hampered. These models are usually several- level hierarchical Bayes models as spatial effects are generally choosen to be random variables with some prior distribution and using MCMC algorithm for estimating the parameters is often necessary. In this case too many parameters result in cumbersome full conditionals and enormous running time. To avoid this difficulty we suggest the following. Fit first a generalized linear model to the classical risk factors then keeping the received parameters fixed estimate the spatial effect. Return then to the classical risk factors and refit the GLM keeping now the spatial effect fixed, and then move on to the spatial estimation with the newly fixed GLM parameters. By iterating these steps much better predictions can be obtained and the disadvantages of the distinct estimation of the classical and the spatial risk factors can largely be eliminated. The paper is organized as follows. Section 2 describes our model. Section 3 presents the MCMC implementation. The results of the parameter-estimation are presented in Section 4. Section 5 tests the goodness of model choice by comparing the results from our model with ones obtained from some alternatives.
2
Model construction
Selecting the explanatory variables in the generalized linear model we found that: "time spent in risk", "type of building" (4 types), "type of wall" (4 types), "type of roof' (6 types), "type of tarif" (4categories), and "population size of the locality" (10 groups), (all of them are factor-type variables in the GLM) are the significant classical risk factors. As our aim is to analyse the effect of location we include the region of the contract into our model. We build up a hierarchical Bayes model, because this way we can incorporate our prior belief about the structure of the regional effects. At the lowest level of the hierarchy we suppose the counts of claims to comply with the Poisson distribution: yi N P d s ~ m ( t.iEi . A,<) where yi denotes the observed number of claims of client i during the insured period, Ei is the expected number of claims per day based on the non-spatial variables, ti is the number of exposure days i.e. the days spent in risk, Xk = exp(uk), where u k is the relative risk of region k, and ri denotes the index of the region that contract i belongs to. We used a generalized linear model for approximating the effect of the classical risk factors. Using the received parameters we calculated the Ei values of each client:
132
Recent Advances in Stochastic Modeling and Data Analysis
where @buizding is the vector of the effects of the different types of building, and bi denotes the type of building belonging to client i. The other notations can be interpreted similarly. The prior distribution of the relative risks of the regions - denoted by {ui, i = 1 , 2 , . . . ,m } , m being the number of regions - is given by a Markov Random Field model. To create a neighbourhoodsystem we regard two regions as neighbours if the distance between their centers is less than 35 kms. The choice of the distance ensures that all regions have at least one neighbour. Would it happen that no claims incur or no data of claims is available in a region without neighbour the estimated relative risk for that region would be zero, which is clearly a nonsense. The same situation arises if we use uncorrelated spatial effects. So, working with correlated regions has the advantage that neighbouring regions influence the relative risk of each other - which is a sensible assumption. In the Markov Random Field model the relative risk of region a is normally distributed with the averaged relative risk of the neighbouring regions as expected value and with variance which is an overall variance parameter u2 divided by the number of neighbours:
where the hyperparameter c7 - denoting the strength of correlation - is a random variable with inverse Gamma(a,X)prior distribution, 61 is the set of the neighbours of region i. The symbol [zly]denotes the conditional density function fx(zlY = y ) in continuous cases, and the conditional probability P ( X = zlY = y ) in discrete cases The joint prior distribution of the parameters ui is:
Using Bayes theorem we determine the joint posterior distribution of all parameters given the count of claims of the n clients:
-n n
[14,E, 4YJ
[Yi IE,141 . [QbI.I.[
i=l
and deduce the posterior distributions of the parameters. For the relative risk of the regions we receive the following:
CjEbi u j . The posterior distribution of the inverse of where Ui denotes the hyperparameter is:
Location as Risk Factor
133
Where K is a 168x168-dimension matrix, with elements: k,,, =I 6, I in the main diagonal, and I C , , ~ = -1 if region r and s are adjacent, otherwise k,.,8 = 0. Clearly the formula obtained for the posterior distribution of the relative risk is not the density function of a well-known distrinution so we use Markov Chain Monte Car10 simulation to get the expected values. Once we have the spatial effects we reestimate the effects of the classical factors fitting GLM, and then refit the spatial model, etc..
3
Implementation of the model
In our algorithm the first step is to fit Generalized Linear Model for the classical risk factors. For this we use software R: glm(y
-
of fset(log(t))+buiZding+wall+tari f +roof +pop, family = Poisson),
Using of fset(log(t)) guarantees that the coefficient of log(t) is 1, according to our prior assumption. This way we receive a first estimate of parameters
P. The second step is to estimate the relative risk of the regions using MCMC algorithm. Denoting the logarithm of the Poisson parameter of client i by vi we get the following expression:
Grouping the non-spatial parameters and the time together we can regard the expression ti = ti . exp(P0 p;rilding ptall /3rOof PpoP) Pi as a modified variable of time. Using this variable we get a simpler expression for ui: vi = log(t!,) u,;
+
+
+ &aTif +
+
+
In this step distinguishing the contracts just by the regions we can lower the dimension. Let’s summarize the modified version of times for each region and also the number of claims for each region. This way we can work with these vectors of only m elements: t l , y.: For determining the expected value of vector u we generate random samples from its density function and get an estimate by their mean. Using vector t: we receive a simpler formula for it:
In the Metropolis-Hastings algorithm we update the elements of vector u simultaneously using normal distribution as proposal distribution. In each
134
Recent Advances in Stochastic Modeling and Data Analysis
B is updated by a Gibbs kernel. After a burn-in period of 100 000 steps 800 000 further steps seemes to be enough to get the estimates.
step
In our process we iterated these two steps. Returning to the parameters taking the spatial effect into consideration. We refit the Generalized Linear Model the same way as we described before, modifying only the vector of times spent in risk with the effect of regions: ti = ti . ezp(ur,). Eigth steps of iteration seemes to be enough to get good estimates.
p we get new estimates for them
4 Data and results We had a household insurance data-set containing 686 711 contracts. With the PO constant and the hyperparameter B we have to estimate the value of 200 parameters. Our main purpose was to study the spatial effect in household insurances, we plotted the relative risks in Picture l.,and the joint effect of the population and the regions in Picture 2. In the first picture light grey denotes the regions with negative relative risk, the centers of regions with positive relative risk are coloured darker. In picture 2. lighter grey means smaller relative risk. A coloured version of the figures can be found on webpage www.cs.elte.hu/ vildiko.
17
18
19
25
21
22
srbQ
Fig. 1. Relative risk of the regions
For B we received 0.35, which means that there is quote strong correlation between the neighbouring regions.
Location as Risk Factor
135
Fig. 2. Joint effect of the population and the region
5
Conclusion: Model performance and comparison
Assessing the goodness of the model is not obvious. Of course we want to mimic the real process and so our estimation is good if the model gives similar predictions to the real values. At the same time we choose to use Markov Random Field because we want the regions located near each other to have similar relative risk. Thus when estimating the relative risk of a region we take the neighbouring regions into consideration as well, so we can not expect a perfect fit. It is difficult to decide how to compare the different models and we did not find a standard concept in the literature either. One possible method is to calculate the normalized error of the estimate for each region:
where @i is the expected value of the number of claims of contract i. based on our model. Summarizing the square of these values we get: m
H =x h q , i=l
This is a kind of distance between the estimated and the real value, so the less this number is the better the fit is. In Table 1. we can see the values received for the different models. The first column concerns to the model ignoring the spatial effect, the second column is the measure of fit after just one iteration, finally in the third column we can see the measure of fit after 8 iterations.
I
I
[Number of iterations1 0 1 18 H 13561581541
Table 1. Measure of fit depending on the number of iterations
136
Recent Advances in Stochastic Modeling and Data Analysis
It became clear that spatiality does have its effect on this household insurance and it is worth including this effect in the model. We can see that by iterating the steps we can receive better fit.
6
Acknowledgement
The author expresses her gratitude to Miklos Arato for the guidance and the continuous attention he payed to her work.
References [Arat6 et al. (2004)]Arat6 N. M., I. L. Dryden, C.C. Taylor (2004). Hierarchical Bayesian modelling of spatial age-dependent mortality. [Denuit and Lang (2004)]Denuit, M, Lang, S. 2004: Non-life rate-making with Bayesian GAMs, Insurance: Mathematics and Economics, 35,627-647. [Dimakos and Frigessi di Rattalma (2002)]Dimakos,X.K., Frigessi di Rattalma, A., 2002: Bayesian premium rating with latent structure, Scandinavian Actuarial Journal, (2002), 162-184. [Gilks, W. R. et al.]Gilks, W. R., S. Richardson, D. J. Spiegelhalter, Markov Chain Monte Garlo in practice, Chapman and Hall, London. [Miirkus L. et al. (1989)]Mkkus, L., Arat6, M.: Geographical Dependence of Insurance Risk, Proceedings of the 11th Annual Conference of the International Association for Mathematical Geology, CD-Rom 1-6. [MolliB, A. (1996)]MolliB, A. Bayesian mapping of disease, in W. R. Gilks, S. Richardson, D. J. Spiegelhalter, Markov Chain Monte Carlo in practice, Chapman and Hall, London, pp. 359-376. [Robert, C. P., G. Casella (1999)]Robert, C. P., G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, London
A Hierarchical Bayesian model t o predict belatedly reported claims in insurances JBnos Gyarmati-Szab6l and L h z l 6 MBrkus' Eotvos LorLnd University Dept. Probability Theory and Statistics Pdzminy P. stny. I / C H-1117 Budapest, Hungary (e-mails: [email protected], [email protected]) Abstract. Latent, that is Incurred But Not Reported (IBNR) claims influence heavily the calculation of t h e reserves of an insurer, necessitating an accurate esti-
mation of such claims. The highly diverse estimations of the latent claim amount produced by the traditional estimation methods (chain-ladder, etc.) underline the need for more sophisticated modelling. We are aimed at predicting the number of latent claims, not yet reported. This means the continuation the so called run-off triangle by filling in the lower triangle of the delayed claims matrix. In order to do this the dynamics of claims occurrence and reporting tendency is specified in a hierarchical Bayesian model. The complexity of the model building requires an algorithmic estimation method, that we carry out along the lines of the Bayesian paradigm using the MCMC technique. The predictive strength of the model against the future disclosed claims is analysed by cross validation. Simulations serve to check model stability. Bootstrap methods are also available as we have full record of the individual claims at our disposal. Those methods are used for assessing the variability of the estimated structural parameters. Keywords: hierarchical Bayesian model, IBNR claims, insurance, MCMC.
1
Introduction
The financial balance of an insurance company may be seriously endangered by latent claims that in fact incurred but, for the time being, are not known for the company, as they have not yet been reported (IBNR claims). To guarantee secure operation, the insurer accummulates reserves, a part of which is meant to cover the IBNR claims. Consequently, it is of capital importance in the insurance business t o predict the number and amount of such claims correctly. The traditional estimation methods (chain-ladder, etc.) based on numerical computations notably lack sophisticated stochastic modelling, and produce highly diverse estimations [2]. This provides a clear motivation for us t o build up a stochastic model t o estimate the number of IBNR claims. Then, taking into account the usual claim amount (dependent, in general, upon the delay time), the necessary reserve can be estimated by simulation. The analysed monthly data of IBNR claims is organized into the so called monthly Run-off Triangle (ROT).It is 137
138
Recent Advances in Stochastic Modeling and Data Analysis
a matrix, where only the triangle above the semi-diagonal is filled in with values. The t-th element in the s-th row contains the number of claims occurred in the s-th month and reported t months later. We suggest a hierarchical Bayesian model for predicting the values below the semi-diagonal that correspond to the latent claims, not yet known. On the first level of model hierarchy the s , t elements of the run-off triangle are supposed to be distributed as Poisson with an intensity parameter On the second level the X,,t intensities are decomposed into the product X,,t = a, . ,& of a timedependent gamma-distributed mixing variable a, for the occurrence, and a decreasing trend plus noise & for the delay time series. The resulted interdependent intensities create the probabilistic dependence of the entries of the run-off triangle. We consider the time dependent parameters of the gamma mixing variable as a trend seasonal component. According to the Bayesian setup on the third level of hierarchy we put priors on the coefficients as well as on the frequencies in the trigonometric polynomial representation of the seasonal component. The complexity of the model requires an algorithmic estimation method, based on the MCMC technique. From the application point of view the most important aspect of model adequacy is the accuracy of the predictions of latent claims revealing in future, that we analyse by cross validation. Simulations serve to check for model fit and stability. Bootstrap methods are also at hand to test the stability of the estimated parameters as beyond the ROT we have the full record of the individual claims at our disposal. Note that bootstrapping otherwise would not be possible for the accumulated counts of the belatedly reported claims, as the the cells in the runoff triangle does not have identical distribution.
+
2
Construction of the Hierarchical Bayesian model
According to our model, outlined in the introduction, the cells of the ROT are realisations of an ensemble of conditionally Poisson distributed variables, whose interdependent intensity parameters are the sole source for probabilistic interdependence. In other words the distribution of the cells are conditionally independent, given the Poisson intensities. The assumption that the counting process of claims numbers in a specified contract has independent and stationary increments provides the heuristic background to this assertion. However, the intensities of those processes vary with contracts. Were the intensities known, the aggregated number of claims appearing in the cells of the ROT would also obey Poisson law. As the intensities are not known we suppose them changing randomly according to a gamma distributed variable. This way the cell values are gamma mixed compound Poisson variables, so we end up with negative binomial distributions. For the description of the joint distribution of the cells the dependence upon occurrence and reporting of the gamma distributed intensity parameter has to be specified. In our conjecture, the occurrence time influences the shape parameter of the gamma
Hierarchical Bayesian Model to Predict Belatedly Reported Claims
139
distribution, whereas the reporting tendency appears as a scaling factor. This assumption is in good harmony with reality, as on the one hand in different months/seasons different causes for claim occurrence get greater emphasis changing the character of the distribution. On the other hand when twice as many claims occur in a given month then, most likely, roughly twice as many will be reported with a given delay, pointing to the scaling character of the reporting tendency. In line with this conjecture the A,,t random intensity parameters are decomposed into the product of two components as X,,t = a, .,i3t with a T(6,, p ) distributed a , and a positive scaling constant ,Ot dependent on occurrence time s and delay-in-reporting time t , respectively. The time-dependent 6, shape parameter of the a, component consists of an exponentially decreasing trend plus seasonal fluctuations representing the claim-occurrence dynamics. To preserve positivity of 6, set
6s -- ul
+ ,az+a3's
+
,c",=,
Ck C O S ( 2 T . S ' @ k ) + ~ 3 k = 1 Sk
,
sin(2T.s'@k).
s=1...54 (1) were ( a l ,a2, u3, c1, c2, c g , s1, s2,s3, @ I , Q 2 , Q 3 , p ) unknown model-parameters and @k denote the frequencies corresponding to the seasonality. Parameterising the gamma distribution this way the seasonality and trend affects both the expected value and the variance. A careful study of the periodograms of ROT columns justifies the choice of just three different frequencies. Reversible jump MCMC would allow for setting the number of frequencies as a further structural parameter but the necessary computations would become much more cumbersome, and the added increase in precision is dubious a t the very least. The change of the scaling factor pt in (reporting-)time produces a positive polynomially decreasing trend in t. We specify the pt component as ,Lit = bl b2 . f ( t ) where f ( t ) is the normalised column average in the ROT, decreasing roughly as a third order polynomial and ( b l , b 2 ) are unknown positive coefficients. Since ( b l ) affects all values in the ROT, i.e. acts as an overall scale, and p is the scale parameter of the a, component, for the sake of unique parameterisation we must set p = 1. For the same reason the coefficient of the seasonal component is chosen to be equal t o 1. These choices guarantee the positivity of the intensity parameter At,,.
+
2.1
The MCMC based estimation for the parameters
Our aim is to estimate the structural parameter vector 0 = ( u ~ , u ~ , u ~ , c s~l ,,s2, c ~s 3, , c@~l r ,@ 2 , @ 3 b, l , b z ) on the basis of the ob-
+
~ E ~ of the ROT, where S := { s , tls t 5 55, s = served { X s , t } ~ s , telements 1,.. . ,54, t = 1, ,54}. Though it is possible to give a formula for the likelihood, the high complexity of the model makes it unfeasible to maximize. In the absence of a random component in the the Poisson intensity i.e. when A,,t = a, . pt and the dynamical structure of a, is given by the deterministic function 6, alone it is possible t o derive the maximum likelihood estimator. The procedure in ' ' '
140 Recent Advances in Stochastic Modeling and Data Analysis
this simplified case has very much in common with the median-polish kriging technique of spatial statistics. Even the computation of the full conditionals is hopeless. So our only option seems to be an algorithmic MCMC estimation with metropolis hastings steps for all the parameters. To implement the MCMC estimation, one needs to sample from the posterior distribution
f(Ol{XS,d)
f ( { X S , t }I@) f (o)
Here the first term is the likelihood, defined as
where N B denoting the distribution function of the negative binomial distribution. The second term f ( @ ) is the joint prior distribution of the structural parameters, defined below. The prior distributions are ( ~ l r a 2 r a 3 r C l , C 2 r C 3 , S l , s 2 , ~N(O, 3 ) 2)
Beta(abkr&.), i ( h b2) r ( Q g k 47,)
@i
N
1
1
=
1 '" ,
1
c = dzUg(o&l,
...,g
3
k = 1,2
where N stands for the normal distribution, while Beta denotes the beta and r the gamma distribution and the priors for the different parameters are independent. The hyperparameters ( a ( i ) , &, ag,, As, ); i = 1, . . . 9.,.I= 1 , 2 ; k = 1,.. . , 3 are determined in advance in a non-informative manner and they are not updated. E.g. the beta distribution is fairly flat except of the ends of the intervals where it diminishes reflecting that the observed sample does not provide information on very high and very low frequencies. We now have everything at hand to calculate the likelihood, and prior ratios for the MCMC. The proposed values of ( a l ,a2, a3, q ,c2,c3,sl,s2,s3) are sampled from a normal distribution centered at the current value of the chain with some fixed (small) standard deviation. As the normal distribution is symmetric, calculation of the acceptance probability in the Metropolis-Hastings step is very simple: 1
where
51 = al, 5 2 = a2, x2 = a3, xq = c1, x5 = c2, X 6 = c3, 5 7 = s1,
'
xs = s2,
xg = s3.
Turning to the frequencies they have values in [0,27r]so the priors of @1,2,3 should be concentrated onto the [0,1] interval. The proposed replacement values @;,2,3 of the @1,2,3 frequencies are created by the strictly monotonous
Hierarchical Bayesian Model t o Predict Belatedly Reported Claims
141
inverse-logistic transformation (x H -) of a normal random walk updating similar to the previous one. An advantage of this choice is that the proposal ratio is 1 again. The acceptance probabilities are
where
x 1 = @I, 5 2 = @ 2 , 2 3 = @3.
Values for updating
(bl , b2)
are proposed by letting log
[-i1
(k), log ( k )be
$1. Having thus 1 for proposal ratio sampled uniformly on the interval the acceptance probabilities turn out to be
where X I = bl, 5 2 = b2.
3 3.1
Application to motor insurance data The Data
We have third party liability motor insurance data at our disposal, registered with monthly frequency for 54 months in the period of January 2000 - June 2004. It contains the number of claims and the time-delay between occurrence and reporting for each contract. We discussed the concept of the run-off triangle previously. We create the ROT for this data, with known values above the semi-diagonal of a 54x54 quadratic matrix and fit our model by using the described MCMC technique. According to the usual (e.g. Geweke [l])tests ( for further methods checking convergence see [3] and references therein), the chain reaches stationarity after the first 20.000 updates that we regard as burn-in. Reliable estimation of the structural parameters have been achieved within 100.000 iteration steps. 3.2
Confirming model fit
It is important to ascertain that the model describes well the observed phenomenon. For this purpose we first compare simulations from the fitted model with the observed sample. Our model provides a straightforward way to simulate the values of the ROT, and that is just what we do 100 times. Then we compare the first four columns in the simulated and observed ROT, just by visual inspection. Figure 1 displays both the observed and simulated values for the first two columns. Then we simulate 10000 ROTS, average the cells and show
142 Recent Advances in Stochastic Modeling and Data Analysis
g b r r 7 - l 0
10
20
30
40
50
0
10
0
10
20
30
20
30
40
50
40
50
secod column
f i ~mlurnn l
40
50
0
10
20
30
second column
Fig. 1. View of the simulated and observed 1st and 2nd coluni~isof the ItoT
the means and the 95% confidence bounds in comparison with the observed values. We can conclude by the figure that our estimation captures the main characteristic of the triangle. The descriptive statistics given in table 1 strengthen our claim that we got rather accurate estimations for the first four column.
Table 1. Descriptive statistics of observed and simulated valucs i n the ROT
Of course these observations are not enough by far to ascertain us about the appropriateness of our method to predict the unknown values in the ROT.
Hierarchical Bayesian Model to Predict Belatedly Reported Claims
143
Therefore, in what follows we discuss further methods in order to get more quantitative information on the goodness and stability of the model fitting.
3.3
Cross-validation and bootstrapping
One of the aspects the performance of our model has to be analysed is the predictive strength. Obviously, that is of vital importance from the application point of view. A commonly accepted method for assessing the predictive strength is cross-validation. In its implementation we disregard of the data in the triangles rows and columns after the 42 months (in both occurrence and reporting). That is we cut the ROT at its 42nd row and 42nd column. Consider this "new" object as a 42x42 ROTwith unknown values under the diagonal and proceed with our modelling. Predict the lower triangle from the smaller model consider only those cells for which observations are available from the original data and compare the predicted values with observed ones. Because of the varying distributions of the cells direct comparison is not very informative so we compare the sum of predicted and observed claims counts for the cells in question. The sum of predicted claims turns out to be 493 whereas the sum of observed is 485 which is rather good, but it does not reflect the variability among cells. In what follows we assess variability by bootstrapping. Remark here that the length of the truncation is chosen in accordance with the observable periodicities. Of course we can use other truncated ROT to carry out cross-validation, and we in fact analysed the 36 and 48 months cuts as well, with similar results. In general any effort for resampling straight from the ROT encounters difficulties because, containing accumulated claims counts, every cell has its own unique and unknown distribution, so cells are irreplaceable. To overcome this difficulty, one has to resample from the contracts and the associated claims counts. For this a contract by contract full record of claims and their reporting is needed that, fortunately, we have at our disposal. So, we can generate bootstrap claims and delays and then create a new simulated run-off triangle as the bootstrap sample of the ROT.This way we can get information about the sensitivity on sample variability of the estimated model parameters. We can amalgamate the previous two methods. Take a bootstrap sample from the run-off triangle, cut it into eg. a 42x42 one, predict the lower triangle and compare the predicted values with the known bootstrap ones. By this process we can measure the predictive strength of the model including the uncertainty inherent from parameter estimation. We present the result of the amalgamated methods obtained from ten bootstrap samples with the cut at 42 rows and columns. Figure 2 displays for ten bootstrap samples the sum of the observed and predicted values under the semi-diagonal. The importance of this quantity is obvious as it is the overall number of latent claims.
Fig. 2. The sum of estimated values plotted against observed ones in cross-validation for ten bootstrap samples
4 Discussion
In this paper we introduced a new stochastic model to estimate the number of IBNR claims and presented an MCMC-based method for estimating its parameters. It may be reasonable to alter the dynamic structure of occurrence into an autoregressive model, possibly driven by gamma-distributed noise; we intend to carry out research in this direction. Clearly, the most important characteristic of the latent claims from the insurer's point of view is their overall amount. There exist methods for estimating the individual claim amount depending on the delay in reporting. By combining our prediction of the unknown values of the ROT with these methods by simulation, it is possible to obtain an estimate of the overall amount of claims.
5 Acknowledgement
The authors wish to thank Miklós Arató for the useful discussions on the topic and his valuable remarks on our work. This research was partially supported by the Hungarian National Research Fund OTKA, grant No. T047086.
References
[1] Geweke, J., 1992. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bayesian Statistics 4. Oxford University Press, Oxford, pp. 169-193.
[2] T. Faluközy, I.I. Vitéz, M. Arató: Stochastic models for claims reserving in insurance business.
[3] Salaheddine El Adlouni, Anne-Catherine Favre, Bernard Bobée: Comparison of methodologies to assess the convergence of Markov chain Monte Carlo methods. Computational Statistics and Data Analysis, Vol. 50 (2006), 2685-2701.
CHAPTER 4
Stochastic Modeling for Healthcare Management
Non-homogeneous Markov Models for Performance Monitoring of Healthcare
Sally McClean¹, Lalit Garg¹, Brian Meenan¹, and Peter Millard²
¹ University of Ulster, Northern Ireland, UK (e-mail: si.mcclean@ulster.ac.uk)
² University of Westminster, London, UK (e-mail: phmillard@tiscali.co.uk)
Abstract. Markov chain modelling has previously been used for hospital and community care systems, where the states in hospital care are described as phases, such as acute, rehabilitation, or long-stay; likewise, social care in the community may be modelled using phases such as dependent, convalescent, or nursing home. This approach allows us to adopt a unified approach to health and community care modelling and management, rather than focusing on the improvement of part of the system to the possible detriment of other components. We here extend this approach to show how the non-homogeneous Markov framework can be used to extract various metrics of interest. In particular, we use time-dependent covariates to obtain the mean and variance of the number of spells spent by a patient in hospital and in the community, and the expected total lengths of time in hospital and in the community.
Keywords: Non-homogeneous Markov Models, Healthcare Performance Monitoring.
1 Introduction
The cost of healthcare is increasing and, in addition, since there are escalating proportions of elderly people, the problem of their care is becoming increasingly important. A systems approach to healthcare planning is necessary to facilitate understanding of the process and to develop a holistic method for the management, monitoring and performance measurement of healthcare systems. Healthcare planning should therefore include care in the community as well as care in hospital; otherwise policies may lead to an improvement in hospital care at the expense of other components of the system. Hospital patients may be thought of as progressing through phases such as acute care, assessment, diagnosis, rehabilitation and long-stay care. Similarly, once patients are discharged to the community they progress through phases such as dependent, convalescent, or nursing home. Such processes may be modelled using phase-type distributions [Faddy and McClean, 1999], which describe the time to absorption of a finite Markov chain in continuous time, when there is a single absorbing state and the stochastic process starts in a
transient state. In addition, covariates may be incorporated into the models, thus further increasing their ability to describe complex healthcare processes. In this paper we model stay in hospital and stay in the community as two separate phase-type distributions, with transient states being phases of care in hospital and the community respectively, and death being an absorbing state. Transitions can occur from all hospital phases to the first community phase, representing discharge, and from all community phases to the first hospital phase, representing admission. Transitions may also occur from all transient states to the absorbing state, death. A non-homogeneous Markov representation is used to incorporate time-dependent covariates, thus improving the realism of the model.
2 The Model
We have previously [Faddy and McClean, 1999, Marshall and McClean, 2003] and [McClean and Millard, 2007] modelled the movement of patients through a hospital using a Coxian phase-type model, where we consider a system of n + 1 states (or phases) and a Markov stochastic process {X(t); t ≥ 0} defined according to the transition probabilities:

$$P\{X(t+\delta t) = i+1 \mid X(t) = i\} = \lambda_i\,\delta t + o(\delta t), \tag{1}$$

for i = 1, 2, ..., n − 1, and:

$$P\{X(t+\delta t) = n+1 \mid X(t) = i\} = \mu_i\,\delta t + o(\delta t), \tag{2}$$

for i = 1, 2, ..., n. Here λ₁, λ₂, ..., λ_{n−1} describe sequential transitions between hospital phases 1, 2, ..., n and μ₁, μ₂, ..., μ_n describe transitions from phases 1, 2, ..., n to phase n + 1 (Figure 1). If λ₁, λ₂, ..., λ_{n−1} and (at least) μ_n are all positive, then phases 1, 2, ..., n are transient and phase n + 1 (death and discharge from hospital) is absorbing. Writing the vector:

$$\mathbf{p} = (1\ 0\ 0\ \ldots\ 0\ 0), \tag{3}$$

and matrix

$$Q = \begin{pmatrix} -(\lambda_1+\mu_1) & \lambda_1 & 0 & \cdots & 0 \\ 0 & -(\lambda_2+\mu_2) & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -(\lambda_{n-1}+\mu_{n-1}) & \lambda_{n-1} \\ 0 & 0 & \cdots & 0 & -\mu_n \end{pmatrix}, \tag{4}$$

then the time spent in the transient phases, after starting in phase 1, before absorption has probability density function:

$$f(t) = \mathbf{p}\exp(Qt)\,\mathbf{q} \tag{5}$$
where:

$$\mathbf{q} = (\mu_1, \mu_2, \ldots, \mu_{n-1}, \mu_n)^T. \tag{6}$$

Given data on lengths of stay, the parameters λ₁, λ₂, ..., λ_{n−1} and μ₁, μ₂, ..., μ_n can be estimated by maximum likelihood using the form of the density (5); starting with n = 1 phase, n can be increased until an adequate fit is obtained.

Fig. 1. The Coxian Phase-type Model

Such an approach may also be used to model social care in the community [Xie et al., 2005] and transitions between hospital and community components of the system [Taylor et al., 1998], [Faddy and McClean, 1999] and [Faddy and McClean, 2005], where we define m additional community phases, with α₁, α₂, ..., α_{m−1} describing sequential transitions between community phases 1, 2, ..., m and β₁, β₂, ..., β_m describing transitions from phases 1, 2, ..., m to state n + m + 1 (death). In addition, we represent transitions from hospital phase i to the first community phase by νᵢ : i = 1, ..., n and transitions from community phase i to the first hospital phase by γᵢ : i = 1, ..., m. The whole system (hospital plus community) may then be represented by the matrix:
$$Q = \begin{pmatrix}
-(\lambda_1{+}\mu_1{+}\nu_1) & \lambda_1 & & & \nu_1 & & & \\
 & -(\lambda_2{+}\mu_2{+}\nu_2) & \ddots & & \nu_2 & & & \\
 & & \ddots & \lambda_{n-1} & \vdots & & & \\
 & & & -(\mu_n{+}\nu_n) & \nu_n & & & \\
\gamma_1 & & & & -(\alpha_1{+}\beta_1{+}\gamma_1) & \alpha_1 & & \\
\vdots & & & & & \ddots & \ddots & \\
\gamma_{m-1} & & & & & & -(\alpha_{m-1}{+}\beta_{m-1}{+}\gamma_{m-1}) & \alpha_{m-1} \\
\gamma_m & & & & & & & -(\beta_m{+}\gamma_m)
\end{pmatrix}$$

where all blank entries are zero and λ_n = α_m = 0. Then the time spent in the transient phases, having started in phase 1 (admission to hospital), until absorption (death) has probability density function as before, where now:

$$\mathbf{q} = (\mu_1, \mu_2, \ldots, \mu_{n-1}, \mu_n, \beta_1, \beta_2, \ldots, \beta_{m-1}, \beta_m)^T. \tag{7}$$
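As a computational aside, the combined transient generator and the density (5) with absorption vector (7) can be assembled directly from the rates. The sketch below is a minimal illustration, assuming the rate-vector conventions above (λ_n = α_m = 0); all numerical rates in the usage example are invented, not fitted values.

```python
import numpy as np
from scipy.linalg import expm

def combined_generator(lam, mu, nu, alpha, beta, gamma):
    """Transient generator for n hospital phases followed by m community
    phases, as in the matrix above (requires lam[n-1] = alpha[m-1] = 0)."""
    n, m = len(mu), len(beta)
    Q = np.zeros((n + m, n + m))
    for i in range(n):                    # hospital phases
        Q[i, i] = -(lam[i] + mu[i] + nu[i])
        if i < n - 1:
            Q[i, i + 1] = lam[i]          # next hospital phase
        Q[i, n] = nu[i]                   # discharge to community phase 1
    for j in range(m):                    # community phases
        r = n + j
        Q[r, r] = -(alpha[j] + beta[j] + gamma[j])
        if j < m - 1:
            Q[r, r + 1] = alpha[j]        # next community phase
        Q[r, 0] = gamma[j]                # re-admission to hospital phase 1
    return Q

def density(t, Q, mu, beta):
    """f(t) = p exp(Qt) q with q from eq. (7); absorption is death."""
    p = np.eye(len(Q))[0]                 # start in hospital phase 1
    q = np.concatenate([mu, beta])
    return p @ expm(Q * t) @ q

# Example 1 shape: four hospital and three community phases (illustrative rates)
mu, beta = [.01, .01, .02, .02], [.005, .005, .01]
Q = combined_generator(lam=[.1, .08, .05, 0], mu=mu, nu=[.03, .03, .04, .05],
                       alpha=[.02, .02, 0], beta=beta, gamma=[.01, .02, .02])
print(density(30.0, Q, mu, beta))
```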
We now define the matrix A, corresponding to the transient states of the embedded Markov chain, representing the next transition between states of the continuous-time model presented above. Here A = {a_ij} where:

a_ij = Prob(next transition is to state j | currently in state i),

so that each non-zero entry of A is the corresponding transition rate divided by the total rate of leaving the current state; for example, for hospital phase i, a_{i,i+1} = λᵢ/(λᵢ + μᵢ + νᵢ) and the entry for discharge to the first community phase is νᵢ/(λᵢ + μᵢ + νᵢ).

Then the expected number of entries to state j, given initially in state i, where N = {n_ij}, is given by:

$$N = (I - A)^{-1}. \tag{8}$$
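Equation (8) is straightforward to evaluate numerically once a transient generator is available. The sketch below, again only an illustration, derives the jump matrix A from a transient generator Q (such as the one assembled above) and inverts I − A.

```python
import numpy as np

def embedded_chain(Q):
    """Jump matrix A = {a_ij} of the embedded Markov chain: for transient
    states, a_ij = q_ij / (-q_ii) for j != i and a_ii = 0. Row sums are
    below one because the diagonal of Q also carries the absorption rate."""
    rates = -np.diag(Q)                    # total exit rate of each state
    A = Q / rates[:, None]
    np.fill_diagonal(A, 0.0)
    return A

def expected_entries(Q):
    """N = (I - A)^{-1}, eq. (8): N[i, j] is the expected number of entries
    to transient state j for a process started in transient state i."""
    A = embedded_chain(Q)
    return np.linalg.inv(np.eye(len(Q)) - A)

# N[0, j] then gives the expected entries to state j for a patient
# admitted to hospital (state 1).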
Formulae for the corresponding variances and expected total times spent in hospital and the community [Iosifescu, 1980] can similarly be derived and were presented in our previous paper [McClean et al., 2006]. An example of such a system is presented in Example 1.
Example 1. Four hospital states and three community states: This model has been previously fitted to data [Faddy and McClean, 2005, Millard, 1991] and is illustrated in Figure 2. The data were described in [Millard, 1991].
3 The Non-homogeneous Markov model
An important aspect of such an approach is the incorporation of covariates into the models so that we can take account of significant heterogeneity between patients, caused, for example, by differences due to gender, age, year or socio-economic factors. Such differences may be partly addressed by stratification, where we separately model groups of patients with different covariates, e.g. we may fit different models for male and female patients. However, time-dependent covariates, such as age, require a different approach. In such cases we now develop a time-heterogeneous Markov model where the parameters (of the matrices Q and A) are updated every time a patient makes a transition; the update time can be represented by the mean time to make the corresponding transition from hospital to community or vice versa. This can be achieved by having the transition rates λᵢ, μᵢ, αᵢ and βᵢ depend log-linearly on covariates; such dependency parameters have previously been estimated by maximum likelihood [Faddy and McClean, 1999].

Fig. 2. The Health and Social Care Markov System

Dependence on covariates X = (X₁ X₂ ... X_p)ᵀ can thus be incorporated into the model by having the transition rates λᵢ, μᵢ, νᵢ, αᵢ, βᵢ and γᵢ take the form:

$$\exp(a + \mathbf{b}^T X) \tag{9}$$

with coefficient parameters a and b estimated for each of the transition rates [Faddy and McClean, 1999]. For the data analysed here, there are two time-dependent covariates: x₁ = patient's age at admission (to hospital or community care) and x₂ = year of admission. Data were also available on the different events that terminated the patient's periods of care: for hospital these were discharge or death, and for community they were re-admission to hospital or death. Given this information and a fitted phase-type distribution for the preceding period of care, probabilities B_ij for event j from phase i can be obtained by conditional maximum likelihood [Faddy and McClean, 2005]. In this way, estimation of these B_ij is carried out after estimation of the parameters λᵢ, μᵢ, νᵢ, αᵢ, βᵢ and γᵢ of the phase-type distributions of the time in hospital and community care, respectively. Covariate dependence of the B_ij probabilities on X = (X₁ X₂ ... X_p)ᵀ can be included by putting each B_ij in a form analogous to (9),
and estimating parameters a and b for each B_ij (i = 1, 2, ..., number of phases, and j = 1, 2, ..., number of events) [Faddy and McClean, 2005]. In order to implement time-dependence in the covariates, we update the transition matrix every time there is a discharge or admission to hospital. The
parameters λᵢ, μᵢ, νᵢ, αᵢ, βᵢ and γᵢ are then recalculated with the newly updated age and updated year as covariates. Using these updated values of the parameters, we recalculate the matrix A. Also, for each admission (or re-admission) we calculate the expected total time spent in hospital, and for each discharge we calculate the expected total time spent in the community. This expected total time is then used to update the age and the year after each admission (or re-admission) to hospital and each discharge to the community. For each spell in hospital, we then calculate the probability π_d of the spell ending with discharge to the community and the probability π_r of the community spell ending with re-admission to hospital, given by:

$$\pi_d = (1\ 0\ 0\ \ldots\ 0)(I - A_1)^{-1}\mathbf{b}_1 \tag{10}$$

$$\pi_r = (1\ 0\ 0\ \ldots\ 0)(I - A_2)^{-1}\mathbf{b}_2 \tag{11}$$

Here A₁ and A₂ are the sub-matrices of A corresponding to transitions within the hospital phases and within the community phases respectively, and b₁ and b₂ are the column vectors of one-step transition probabilities from the hospital phases to the first community phase and from the community phases to the first hospital phase. The probability of a patient, initially admitted to hospital, being discharged and surviving to eventual re-admission to hospital, is then:

$$\pi_{dr} = \pi_d\,\pi_r. \tag{12}$$
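A hedged sketch of how the covariate update and the probabilities (10) and (11) might be computed follows: `rate` implements the log-linear form (9), and `spell_end_probabilities` evaluates π_d and π_r from the sub-blocks A₁, A₂ and column vectors b₁, b₂ described above. The function names and the division of responsibilities are assumptions of the sketch.

```python
import numpy as np

def rate(a, b, x):
    """Log-linear covariate dependence of a transition rate, eq. (9)."""
    return np.exp(a + np.dot(b, x))

def spell_end_probabilities(A1, b1, A2, b2):
    """pi_d and pi_r: the probability that a hospital (community) spell ends
    in discharge (re-admission), eqs. (10)-(11), starting from phase 1."""
    e1 = np.zeros(len(A1)); e1[0] = 1.0
    e2 = np.zeros(len(A2)); e2[0] = 1.0
    pi_d = e1 @ np.linalg.inv(np.eye(len(A1)) - A1) @ b1
    pi_r = e2 @ np.linalg.inv(np.eye(len(A2)) - A2) @ b2
    return pi_d, pi_r

# In the time-dependent scheme, the rates (and hence A1, b1, A2, b2) are
# rebuilt with updated age and year after every admission or discharge.
```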
In the case of the 7 transient state model presented in Example 1, we therefore obtain:
$$\pi_d = \frac{\nu_1}{\lambda_1{+}\mu_1{+}\nu_1} + \frac{\lambda_1}{\lambda_1{+}\mu_1{+}\nu_1}\cdot\frac{\nu_2}{\lambda_2{+}\mu_2{+}\nu_2} + \frac{\lambda_1}{\lambda_1{+}\mu_1{+}\nu_1}\cdot\frac{\lambda_2}{\lambda_2{+}\mu_2{+}\nu_2}\cdot\frac{\nu_3}{\lambda_3{+}\mu_3{+}\nu_3} + \frac{\lambda_1}{\lambda_1{+}\mu_1{+}\nu_1}\cdot\frac{\lambda_2}{\lambda_2{+}\mu_2{+}\nu_2}\cdot\frac{\lambda_3}{\lambda_3{+}\mu_3{+}\nu_3}\cdot\frac{\nu_4}{\mu_4{+}\nu_4}$$
$$= a_{15} + a_{12}a_{25} + a_{12}a_{23}a_{35} + a_{12}a_{23}a_{34}a_{45}$$

and

$$\pi_r = \frac{\gamma_1}{\alpha_1{+}\beta_1{+}\gamma_1} + \frac{\alpha_1}{\alpha_1{+}\beta_1{+}\gamma_1}\cdot\frac{\gamma_2}{\alpha_2{+}\beta_2{+}\gamma_2} + \frac{\alpha_1}{\alpha_1{+}\beta_1{+}\gamma_1}\cdot\frac{\alpha_2}{\alpha_2{+}\beta_2{+}\gamma_2}\cdot\frac{\gamma_3}{\beta_3{+}\gamma_3}$$
$$= a_{51} + a_{56}a_{61} + a_{56}a_{67}a_{71}$$
We then calculate the expected (mean) number of admissions/re-admissions to hospital,

$$\mathrm{Ad}_{mean} = 1 + \pi_s^{(1)} + \pi_s^{(1)}\pi_s^{(2)} + \cdots + \pi_s^{(1)}\pi_s^{(2)}\cdots\pi_s^{(n)} + \cdots$$
and the expected (mean) number of discharges to the community,

$$\mathrm{Dis}_{mean} = \pi_d^{(1)} + \pi_s^{(1)}\pi_d^{(2)} + \cdots + \pi_s^{(1)}\pi_s^{(2)}\cdots\pi_s^{(n-1)}\pi_d^{(n)} + \cdots$$

where π_s^{(n)} is the probability of surviving both the nth hospital and community spells and π_d^{(n)} is the probability of discharge from the nth hospital spell. Next we calculate the expected (mean) time in hospital,

$$\mathrm{TH}_{mean} = \mathrm{TT}_1^{(1)} + \pi_s^{(1)}\,\mathrm{TT}_1^{(2)} + \cdots + \pi_s^{(1)}\cdots\pi_s^{(n-1)}\,\mathrm{TT}_1^{(n)} + \cdots$$
Similarly, the expected (mean) time in the community after the first admission to hospital is

$$\mathrm{TC}_{mean} = \pi_d^{(1)}\,\mathrm{TT}_2^{(1)} + \pi_s^{(1)}\pi_d^{(2)}\,\mathrm{TT}_2^{(2)} + \cdots + \pi_s^{(1)}\pi_s^{(2)}\cdots\pi_s^{(n-1)}\pi_d^{(n)}\,\mathrm{TT}_2^{(n)} + \cdots$$

where π_r^{(n)} is the probability of re-admission to hospital after the nth community spell, TT₁^{(n)} is the expected time in hospital at the nth admission and TT₂^{(n)} is the expected time in the community after the nth discharge from hospital.
Also, the variance of the number of admissions to hospital is given by

$$V_{hos} = \sum_i \left\{(i - \mathrm{Ad}_{mean})^2\,\pi_s^{(1)}\pi_s^{(2)}\cdots\pi_s^{(i-1)}\bigl(1-\pi_s^{(i)}\bigr)\right\}$$

where i is the admission number, i = 1, 2, 3, .... Similarly, the variance of the number of discharges to the community is given by

$$V_{dis} = \sum_i \left\{(i - \mathrm{Dis}_{mean})^2\,\pi_s^{(1)}\pi_s^{(2)}\cdots\pi_s^{(i-1)}\,\pi_d^{(i)}\right\}$$
We note here that all the above computations are terminated once the probability of surviving the previous spells becomes sufficiently small; a sketch of this computation is given below.
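The following Python fragment codes one reading of the mean and variance formulas above, terminating once the cumulative survival probability falls below a tolerance. Here `pi_s` is a user-supplied function n ↦ π_s^{(n)} and is an assumption of the sketch, not part of the original text.

```python
def count_moments(pi_s, tol=1e-8, max_spells=10_000):
    """Mean and variance of the number of hospital admissions, taking
    P(exactly n admissions) = pi_s(1)...pi_s(n-1) * (1 - pi_s(n)).
    Summation stops once the survival probability drops below tol."""
    mean = second = 0.0
    survive = 1.0
    for n in range(1, max_spells + 1):
        p_n = survive * (1.0 - pi_s(n))   # exactly n admissions
        mean += n * p_n
        second += n * n * p_n
        survive *= pi_s(n)
        if survive < tol:
            break
    return mean, second - mean ** 2

# e.g. a constant 30% chance of surviving each hospital/community cycle
print(count_moments(lambda n: 0.3))       # mean ~ 1 / (1 - 0.3) = 1.43
```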
4 Results
We now obtain results for Example 1, using parameters estimated from [Faddy and McClean, 2005]. The corresponding results are presented in Table 1. We here compare our previous results [McClean et al., 2006] with the results from our new approach, which incorporates the time-dependent covariates age and year. From Table 1 we can see that incorporating time-
Table 1. Results for Example 1
dependent covariates slightly increases the mean and variance of the number of admissions to hospital and the number of discharges. There is a substantial
decrease in the number of days spent in hospital and in the community respectively, corresponding to higher death (absorbing) probabilities in all cases. This is to be expected, as older patients are more likely to die during a spell in hospital or back in the community than younger patients. Our new non-homogeneous Markov model is therefore likely to be more realistic than our previous approach.
5 Conclusions and Further Work
We have described an extension to previous work that allows us to compute key performance measures for the whole patient care system, including both hospital and community components. Such an approach is particularly important for assessing the effectiveness of geriatric care systems, which typically include significant components of both hospital and community care. By including time-dependent covariates, such as age and year, and utilizing a corresponding non-homogeneous Markov model, we are able to develop a more realistic model that describes key metrics.
References
[Faddy and McClean, 1999] M.J. Faddy and S.I. McClean. Analysing data on lengths of stay of hospital patients using phase-type distributions. Applied Stochastic Models in Business and Industry, 15:311-317, 1999.
[Faddy and McClean, 2005] M.J. Faddy and S.I. McClean. Markov Chain Modelling for Geriatric Patient Care. Methods of Information in Medicine, pages 369-373, 2005.
[Iosifescu, 1980] M. Iosifescu. Finite Markov Processes and their Applications. Wiley, 1980.
[Marshall and McClean, 2003] A.H. Marshall and S.I. McClean. Conditional phase-type distributions for modelling patient length of stay in hospital. International Transactions in Operational Research, 10(6):565-576, 2003.
[McClean and Millard, 2007] S. McClean and P.H. Millard. Where to Treat the Older Patient? Can Markov Models Help us Better Understand the Relationship Between Hospital and Community Care? Journal of the Operational Research Society, 58(2):255-261, 2007.
[McClean et al., 2006] S.I. McClean, M.J. Faddy, and P.H. Millard. Using Markov models to assess the performance of a health and community care system. In Proceedings of the IEEE Symposium on Computer-Based Medical Systems (CBMS), pages 777-782, 2006.
[Millard, 1991] P.H. Millard. Throughput in a department of geriatric medicine: a problem of time, space and behaviour. Health Trends, 24:20-24, 1991.
[Taylor et al., 1998] G.J. Taylor, S.I. McClean, and P.H. Millard. Continuous-time Markov Models for Geriatric Patient Behaviour. Applied Stochastic Models and Data Analysis, 13:315-323, 1998.
[Xie et al., 2005] H. Xie, T.J. Chaussalet, and P.H. Millard. A continuous time Markov model for the length of stay of elderly people in institutional long-term care. Journal of the Royal Statistical Society Series A (Statistics in Society), 168(1):51-61, 2005.
Patient Activity in Hospital using Discrete Conditional Phase-type (DC-Ph) Models
Adele H. Marshall, Louise Burns, and Barry Shaw
Centre for Statistical Science and Operational Research (CenSSOR), David Bates Building, Queen's University Belfast, BT7 1NN, Northern Ireland, UK (e-mail: a.h.marshall@qub.ac.uk)
Abstract. Previous research introduced Conditional Phase-type distributions (C-Ph), consisting of a Coxian phase-type distribution conditioned on a Bayesian network (BN), where the phase-type distribution represents a continuous survival variable, the duration of time until a particular event occurs, and the BN represents a network of inter-related variables. The C-Ph model has proved to be a suitable technique for modeling patient duration of stay in hospital characterized by the patient's characteristics on admission to hospital. This paper expands upon this technique to form a family of DC-Ph models, of which the C-Ph is a member, and applies the models to modeling patient stay in hospital based on a set of inter-related discrete covariates.
Keywords: Coxian phase-type distributions, length of stay, Bayesian networks, survival.
1 Introduction
The Conditional Phase-type model (C-Ph) was previously developed to represent a continuous survival distribution conditioned on a Bayesian network [Jensen, 2001] of inter-related variables [Marshall and McClean, 2003]. Incorporated into the model is the Coxian phase-type distribution [Neuts, 1981], a special type of Markov model that considers the survival distribution as consisting of phases in a process which commence in a transient state and move through successive states until finally terminating in a single absorbing state. This process component of the model is supported by a network of variables which together represent any influence that other useful data may have on the survival distribution. The research presented in this paper introduces the Discrete Conditional Phase-type models (DC-Ph) as an expansion of the C-Ph distribution. The C-Ph model was initially applied to the healthcare domain to represent the duration of stay of elderly patients in hospital based on patient information. This then provided potential for using the model as a management tool for identifying patients who were most likely to have an extreme length of stay in hospital and, as a consequence, block the facilities or hospital bed to others. The DC-Ph will be demonstrated using similar data, this time for waiting times in an Accident and Emergency Department or Emergency Room. This is a topic in need
of investigation, particularly as it currently generates a huge amount of media attention.
2 The DC-Ph model
Discrete Conditional Phase-type models (DC-Ph) are a family of models capable of representing a skewed survival distribution as a Process component preceded by a set of related discrete variables that may be referred to as the Causal or Conditional component. The models possess the following characteristics:

(i) a Conditional Component comprising a graphical structure that captures the nature of the data by representing the various interrelationships between discrete variables;

(ii) a Process Component consisting of a Coxian phase-type distribution that represents the survival distribution as the time to absorption of a finite Markov chain in continuous time, when there is a single absorbing state and the stochastic process starts in a transient state.

Fig. 1. Illustration of the DC-Ph models

Figure 1 illustrates the general form of the DC-Ph model, comprising the two components previously described. The second component, the Coxian phase-type distribution, is conditioned on the graphical model of variables in the first, conditional component. There are many kinds of graphical model that could represent the conditional component; this paper focuses on the following: a Bayesian network, and a naïve Bayes classifier.
2.1 Bayesian network component (C-Ph model)
The DC-Ph model uses a Coxian phase-type distribution conditioned on a Bayesian network (BN). The nodes in the network represent the variables, while the arcs indicate dependence and independence relationships between the variables [Jensen, 2001]. Conditional probability tables are associated with each node, and these parameters, along with the structure of the network, can be used to calculate any inference queries. Figure 2 illustrates the general form of the DC-Ph model with the Bayesian network as the conditional component.
Fig. 2. DC-Ph model with Bayesian network component (C-Ph Model).
2.2 Naïve Bayes classifier component
The second of the two DC-Ph models uses a Coxian phase-type distribution conditioned on a naïve Bayes classifier. The naïve Bayes classification technique is based on Bayes' theorem: the posterior probability of an event occurring is calculated using some attributes or features in the data, and classification is performed by selecting the event with the highest probability as the class to which the member of the data set should belong. As its name suggests, the naïve Bayes method [Korb and Nicholson, 2004], also referred to as 'Idiot's Bayes', is easy to use and interpret, as highlighted by inspection of Figure 3, which illustrates the general form of the DC-Ph model where the conditional component is a naïve Bayes classifier.

Fig. 3. DC-Ph model with naïve Bayes classifier component.
However, even though the naïve Bayes classification model is simple in nature, in many cases it can still manage to outperform more sophisticated classification techniques. As such, it is considered to be particularly suited to situations where the dimensionality of the inputs is high. Naïve Bayes reduces a high-dimensional density estimation task to one-dimensional kernel density estimation. Furthermore, the main criticism of the naïve Bayes classification method, that it assumes independence between input features, does not seem to greatly affect the posterior probabilities, thus leaving the classification task unaffected.
3 Accident and Emergency Patients
The UK National Health Service (NHS) has received a large amount of media attention concerning its provision of care to the general public. Under particular scrutiny are the queues in which patients wait at Accident and Emergency (A&E) departments or Emergency Rooms, and the resulting trolley waits they accrue. Upon receipt of their treatment, A&E patients either leave hospital and return home or they become an emergency admission requiring further medical care or attention during a stay in hospital. The trolley waits are the times that the emergency admission patients spend waiting on a hospital trolley, from the clinician's decision to admit (DTA) until they are allocated a hospital bed. The UK Government has now placed targets on these trolley waiting times in order to monitor the efficiency of hospitals, with the view of motivating healthcare managers to improve the service of healthcare to the patient [NHS, 2004]. The DC-Ph models are applied to data taken from the NIRAES (Northern Ireland Regional Accident and Emergency System) database. The data set contains all new arrivals at a busy UK A&E department over a 12-month period in 2005/06. In total, there are records for all 52,928 new patients presenting at the A&E department; the data thus do not include patients who have been called back for review. The patient information recorded includes patient age, sex, arrival method, departure method, incident type, assigned priority code, an indicator variable on whether a patient should be admitted to hospital, that is, whether a decision to admit (DTA) was made or not, and an associated trolley waiting time for all DTA patients.
4 Results
The focus of this research is to represent the trolley waits based on patient information using the two DC-Ph models previously discussed. The time that patients spend waiting for a hospital bed is considered by the process component of the model, the Coxian phase-type distribution, whereas the patient information and how it inter-relates is considered by the first component of the model, which takes the form of either a Bayesian network or a naïve Bayes classifier. The results are discussed as follows.
4.1 A&E Bayesian network DC-Ph model
Figure 4 illustrates the fitted model for patient trolley waiting times, where the first component in the model represents the network of patient characteristics and how they inter-relate to determine the patient outcome of whether they receive a decision to be admitted to hospital. The second component then captures the waiting time for patients placed on trolleys awaiting admission to a specific hospital ward. Those patients who are DTA, that is, those requiring a hospital stay, are modeled using the second component of the model, the Coxian phase-type distribution, described by the following probability density function of T, the random variable for trolley waiting time:

$$f(t) = \mathbf{p}\exp\{Qt\}\,\mathbf{q}^T \tag{1}$$

where μᵢ represents the rate of movement out of phase i, and λᵢ represents the rate of movement from phase i to phase i+1, with the following values:

μ₁ = 0.000033, μ₂ = 0.000015, μ₃ = 0.009892, μ₄ = 0.001957,
λ₁ = 0.0152, λ₂ = 0.0151, λ₃ = 0.051.    (5)
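For reference, the fitted four-phase density can be evaluated numerically. In the sketch below the parameter values are transcribed from the typographically damaged source above and should be treated as indicative rather than authoritative.

```python
import numpy as np
from scipy.linalg import expm

# Parameter values as read from the source; treat as illustrative.
lam = [0.0152, 0.0151, 0.051]                  # lambda_1..lambda_3
mu = [0.000033, 0.000015, 0.009892, 0.001957]  # mu_1..mu_4

n = len(mu)
Q = np.diag([-(l + m) for l, m in zip(lam + [0.0], mu)])
for i, l in enumerate(lam):
    Q[i, i + 1] = l                            # sequential phase transitions
p = np.eye(n)[0]                               # start in phase 1
q = np.array(mu)                               # absorption rates

# density of the trolley waiting time T at t = 60 minutes, eq. (1)
print(p @ expm(Q * 60.0) @ q)
```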
The diagrammatic representation in Figure 4 highlights some interesting relationships in the data; for example, the age group of a patient has an influence on the type of incident, arrival mode and 'DTA?', the outcome of the patient. This was also evident from preliminary data analysis, which highlighted teenagers as the age-group category with the smallest proportion (6%) of patients being admitted to hospital, while 60-year-olds and over make up 54% of all DTA patients; in particular, patients aged 80 to 89 years form the single largest group of DTA patients (19%). This is expected, as the older patients would be considered more dependent on others, potentially having other medical complaints which could lead to more complications and an overall more likely decision to be admitted to hospital.
Fig. 4. A&E Bayesian network DC-Ph model.
In addition to age group, the variables incident type, arrival mode and priority level are considered to have influence on the outcome 'DTA?'. Incident type refers to the kind of incident in which the patient was involved, such as 'road traffic accident', 'non-trauma case' or 'home accident'. A high proportion of DTA patients were admitted as non-trauma cases, whereas those patients with No DTA tended to be in A&E due to non-trauma, a home accident or an incident that happened in a public place. Arrival mode refers to the form of transport taken to get to the A&E department; for instance 'ambulance arrival', 'private transport' or 'public transport' are possible. The DTA patients, as expected, generally arrived at the A&E department via ambulance or private transport. This seems reasonable, as one would expect those patients who are admitted to hospital for further care to be the more severe, and thus more urgent, cases, and so more likely to be the ambulance or private transport cases. Priority level is a variable assigned to the patient, by hospital staff, on arrival at A&E. There are five different levels, starting with code red or level 1, which is the most severe, where the patient requires immediate resuscitation; level 2 refers to patients considered to be very urgent; and so on through to the least severe, level 5, classed as non-urgent. The patients who receive a DTA mainly comprised the urgent cases (coded as level 3), while those patients who have No DTA tend to be less severe cases, coded as level 4, referred to as standard. This variable is considered a very useful indicator in predicting the patients' outcome of whether they receive DTA or No DTA. Indeed, the priority variable could be viewed as a proxy for many other covariates, in addition to capturing the value of the hospital staff member's intuition and experience in assessing patients on first inspection.
4.2 A&E naïve Bayes classification DC-Ph model
Figure 5 illustrates the fitted model for patient trolley waiting times, where the first component in the model represents the naïve Bayes classification of patient characteristics and how they relate to determine the patient outcome of whether they are admitted to hospital, and the second component represents waiting time as detailed in equations (1)-(5). The diagrammatic representation of the model highlights the naïve Bayes classifier using the variables priority code and arrival mode to predict whether a new patient will require admission or not, that is, whether they receive a decision to admit (DTA).
Fig. 5. A&E naïve Bayes classifier component DC-Ph model.
The naïve Bayes classifier produces a table representing the association between each node in the classification structure and its classification. The outcome, 'DTA?', of whether a patient should be admitted or not is based on the two patient features, the patient's priority code and admission mode. For example, a patient with a priority code of 1 (high priority) and admission mode of ambulance is predicted to be a DTA, which again is what would be expected. This relationship between priority code and admission mode was also included in the previous Bayesian network model; however, on comparison of the two techniques, the naïve Bayes component is simpler and easier to understand than that of the previous model. The Bayesian network model does, however, permit the use of the network of inter-relationships between variables and thus provides a lot more information regarding associations in the model.
5 Conclusion
This paper introduces a family of DC-Ph models which expand upon previous research on the C-Ph distribution. The technique considers a model consisting of
two components: the conditional component, represented by a graphical structure, and the process component, represented by a Coxian phase-type distribution. This paper examines two forms of the DC-Ph model in terms of having a graphical structure consisting of either a Bayesian network or a naïve Bayes classification component. Previous research on the C-Ph model was applied to the healthcare domain with a particular focus on representing the duration of stay of elderly patients in hospital based on patient information. This then provided potential for using the model as a management tool for identifying patients who were most likely to have an extreme length of stay in hospital and, as a consequence, block the facilities or hospital bed to others. The DC-Ph models are demonstrated using similar data, this time for waiting times in an Accident and Emergency Department or Emergency Room, where patient information, known on arrival at an A&E department, is used to predict the future outcome ('DTA?') of the patients and their associated trolley wait. The model may be used to identify those patients at risk of experiencing a long trolley wait, so that something can be arranged to alleviate this problem and prevent the situation occurring. Alternatively, the model has the potential of acting as a management support tool where 'what if?' scenarios can be considered and their consequences for the system modeled in advance, to highlight benefits, potential problems and further requirements that will improve and monitor the efficiency of the hospital system. This paper considered two forms of DC-Ph model, which involved the use of a Bayesian network or alternatively a naïve Bayes classification component as the graphical structure in the conditional component of the model. Alternatively, other forms of graphical model could be investigated to identify whether any potential exists to represent the association between the discrete variables in a different form. For instance, further work could be carried out to investigate whether it may be worth including clustering algorithms to form groups of variables in the conditional component of the model, or alternatively to consider some form of principal component analysis (PCA) and use the PCA components in the conditional component.
References
[Jensen, 2001] F.V. Jensen. Bayesian Networks and Decision Graphs. Springer, 2001.
[Korb and Nicholson, 2004] K.B. Korb and A.E. Nicholson. Bayesian Artificial Intelligence. Chapman and Hall, 2004.
[Marshall and McClean, 2003] A.H. Marshall and S.I. McClean. Conditional Phase-Type Distributions for Modelling Patient Length of Stay in Hospital. International Transactions in Operational Research, 10(6):565-576, 2003.
[Neuts, 1981] M.F. Neuts. Matrix-Geometric Solutions in Stochastic Models - An Algorithmic Approach. Johns Hopkins University Press, 1981.
[NHS, 2004] NHS Information Authority, www.connectingforhealth.nhs.uk, 2004.
Identifying the heterogeneity of patients in an Accident and Emergency department using a Bayesian classification model
Louise Burns and Adele H. Marshall
Centre for Statistical Science and Operational Research (CenSSOR), David Bates Building, Queen's University Belfast, BT7 1NN, Northern Ireland (e-mail: l.burns@qub.ac.uk)
Abstract. This paper considers the analysis of waiting times in a hospital Accident and Emergency department, in particular the waiting time from the clinician's decision to admit until actual ward admission. A model is developed which employs a naïve Bayes classifier for the identification of patients who will require admission to hospital and thus experience such a waiting time. Such waiting times are found to be adequately represented by a lognormal model. Potential exists to expand such a model to include patient covariates.
Keywords: Accident and Emergency, Waiting times, Naïve Bayes Classification.
Introduction
The UK National Health Service (NHS) is faced with many challenges when considering the flow of patients through the hospital service. In particular, there are problems associated with patients leaving the hospital, and more recently with patients requiring admission into the hospital. Delayed discharges prohibit patients leaving the hospital, causing an accumulation of patients occupying beds. In turn, beds are unavailable for newly arriving patients, thus causing what is known as bed blocking [BBC News, September 2003]. To address this problem the NHS launched a major drive to try to reduce patients' total length of stay (LoS) in hospital, including the establishment of a discharge plan on arrival and the development of patient pathways [NHS Plan, 2000]. This resulted in hospitals being fined for delayed discharges [Community Care Bill, 2004]. Over the last 15 years there has been a wide range of statistical models developed to accurately represent patterns of LoS and to assist healthcare officials in planning patient management. One such statistical model is the phase-type distribution for the representation of LoS data. The phase-type distribution describes the time to absorption of a finite Markov chain in continuous time, where there is a single absorbing state and the stochastic process starts in a transient state [Neuts, 1981]. Coxian phase-type distributions [Cox, 1955] are a special sub-class of phase-type distributions, where the transient states are ordered with one single absorbing phase. The process begins in the first stage and either sequentially moves through the transient phases or can, at any time, enter the absorbing phase. Faddy and McClean (1999) investigated fitting the Coxian phase-type distribution to LoS data for a group of geriatric patients in hospital. As
an extension to modelling patient LoS using Coxian phase-type distributions, the Conditional phase-type (C-Ph) model uses Coxian phase-type distributions conditioned on a Bayesian network [Marshall and McClean, 2003]. This allows patient attributes/covariates to be involved when modelling LoS. Essentially, the Coxian phase-type distribution models the continuous survival data, while the Bayesian network allows the inclusion of statistical graphical models which provide a framework for describing and evaluating probabilities when there is a network of inter-related variables representing causality.
The problem of getting into hospital has become an increasing worry for the NHS. Typically the Accident and Emergency (A&E) department exposes most of the problems in this area, where problems getting into hospital are characterised by long waits from the clinician's decision to admit (DTA) until actual admission to a hospital ward [NHS Information Authority, 2004]. This period of waiting is known as the experienced trolley waiting time. Both the number of trolley waits and the actual trolley waiting time are of concern to those trying to improve the running of A&E departments, the objective being to reduce both. In extreme cases, high numbers of trolley waits can lead to the refusal of further emergency admissions, as was the case in Belfast City Hospital, March 2005 [BBC News, March 2005]. There are two current targets concerning trolley waiting times in Northern Ireland:
1. 75% of patients should be admitted to a ward within 2 hours of DTA.
2. No patient should have a trolley wait greater than 12 hours.
However, the unpredictable nature of emergency arrivals at A&E can make patient management a difficult task. For example, it is impossible to forecast public emergencies causing an influx of emergency admissions. Focus on improving A&E services has become a key task for the NHS as the number of emergency admissions continues to rise each year. In Northern Ireland the total number of attendances at A&E departments rose by 2.4% from 1998 to 2004 [DHSSPS, 2005]. Of all patients presenting at A&E, only a proportion will require admission to hospital following a DTA (referred to as DTA patients), while the other patients can be sent home or referred to their general practitioner (No DTA patients). It is only DTA patients who will experience a trolley waiting time. This paper will present a modified version of the C-Ph model discussed above, and apply the representation to patient trolley waiting time in A&E.
Model
The model presented in this paper consists of two independent components, as illustrated in Figure 1.

Figure 1. Outline of model structure (classification technique; trolley waiting times)
The first component employs the use of Bayesian classification to act as a prediction tool. The Bayesian approach to classification considers a set of features or attributes, observable on each object, described by a feature vector. Bayesian classifiers then assign the most likely class to a given example according to the features exhibited by the feature vector [Rish, 2001]. If $X_i$ denotes the random variables to be used as features and lower-case $x_i$ denotes the values of these random variables, then $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ describes the feature vector while $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ describes the outcomes for any particular example. Let $C$ denote the class of an example, where $C$ can take one of $m$ values $c \in \{0, \ldots, m-1\}$. Then applying Bayes' rule to give the posterior probability of class membership yields

$$P(C = i \mid \mathbf{X} = \mathbf{x}) = \frac{P(\mathbf{X} = \mathbf{x} \mid C = i)\,P(C = i)}{P(\mathbf{X} = \mathbf{x})} \tag{1}$$

where $P(\mathbf{X} = \mathbf{x})$ can be ignored as it is a constant.
165
The n a k e Bayes classifier greatly simplifies learning by assuming that features are independent given class. Despite this unrealistic independence assumption the nai've Bayes classifier is surprisingly effective in practice since its classification decision may often be correct even if probability estimates are inaccurate [Rish, 20011. From this assumption of feature independence it follows that
and P ( c = i) can be calculated from the data. The nai've Bayes classifier may then be facilitated to calculate the posterior probability of belonging to each class, choosing the class with the highest probability. That is, for a given example with feature vector x, assign class if and only if
ci
Nai've Bayes has proven effective in text classification, medical diagnosis and computer performance management, among many other applications [Domingos and Pazzani, 1997; Hellerstein et aZ,20001. In this model the aim of classification will be to categorise newly arriving patients according to whether the patient will eventually have a DTA or no DTA, based on information on the patient which is available shortly after they arrive at A&E. The ability to identify patients who will require admission to the hospital at this earlier stage could be extremely useful. It will give bed managers in the hospital more time to make beds available for admissions, rather than this process only beginning when the doctor makes a DTA. This in turn should lead to a reduction in the recorded trolley waiting time for patients. The first component of the model allows the recognition of the two streams of patients who arrive at A&E. Those with no DTA will not experience trolley waiting whereas those with a DTA will have a trolley waiting time, which may then be modelled using the second component, which addresses the skewed distribution.
The Data The model is demonstrated using data taken from the NIRAES (Northern Ireland Regional Accident and Emergency System) database, and is based on all new arrivals at a busy A&E department, over the period 01/04/05 to 31/03/06. The dataset contains records for 52,928 new patients presenting at
166
Recent Advances in Stochastic Modeling and Data Analysis
the A&E department, that is, it does not include patients who have been called back for review. Fifty three patient variables include patient age, sex, arrival method to A&E, departure method, incident type, assigned priority code and whether a DTA was made or not. Also recorded were datedtimes of various A&E activities such as arrival to A&E, assessment by nurse, examination by doctor, time of departure and, if applicable, time of DTA and time to ward. This facilitated the calculation of an associated trolley waiting time for all DTA patients. Of the 52,928 arrivals, 12,838 (24%) were DTA patients while 40090 (76%) were not. The proportions of males to females presenting were approximately even, with 51.1% males to 48.9% females. On inspection of age groups of patients (in groups of ten years) it was observed that teenagers were the largest single group of patients presenting (14% of all arrivals). However, when considering only the DTA patients, it was found that teenagers were actually the smallest group (6% of DTA patients) while 60 year olds and over make up 54% of all DTA patients. Patients aged 80 to 89 years are the single largest group in the DTA patients (19%).
The Bayes Classifier In order to build the na’ive Bayes classifier, it is necessary to consider which variables would be most important for inclusion in the model with re ard to
(x2j
their influence on the decision to admit or not. Chi-square tests were performed on several variables and those indicating the strongest association with DTA were considered. The results of the X 2 tests, shown in Table 1, indicate that there are three patient features which could be used for classifying newly presenting patients as patients who will have a DTA or patients who won’t have a DTA.
Table 1. Chi-square tests of association with the decision to admit

Variable             χ²     P-Value
Priority Code (PC)   8036   <0.001
Arrival Method       6877   <0.001
Age Group            5281   <0.001
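A sketch of the kind of screening behind Table 1 is given below, assuming per-patient lists of a categorical feature and the DTA flag; the function and variable names are illustrative, not from the original analysis.

```python
import numpy as np
from scipy.stats import chi2_contingency

def screen_feature(feature, dta):
    """Chi-square test of association between a categorical patient
    feature and the DTA outcome; returns the statistic and p-value."""
    levels = sorted(set(feature))
    table = np.array([[sum(f == lvl and d == out
                           for f, d in zip(feature, dta))
                       for out in (True, False)]
                      for lvl in levels])
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p

# e.g. screen_feature(priority_codes, dta_flags) for each candidate feature
```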
Patient priority code (PC) is an indicator which is assigned by the nurse at the assessment stage. The PC ranges from 1 to 5 (1 being the most urgent and 5 the least) and indicates the order of priority in which patients should be seen based on the severity of their symptoms. These are colour coded in order (1 to 5): red, orange, yellow, green and blue.
The arrival method at A&E for each patient was recorded as one of the following: ambulance, with a police escort, by private transport, by public transport or by walking. Finally, the patient age category was also found to be significantly associated with the decision to admit or not. Age categories were grouped in ten-year groups up to ninety years, then ninety years plus. These three variables, PC, arrival method and patient age group, all produced highly significant results at the 99% confidence level, indicating that each is strongly associated with the decision to admit or not. For the construction of the classifier, the original dataset was randomly split into a training set for learning the model (containing 70% of the original data) and a test dataset (the remaining 30%) for testing the accuracy of the classifier. Using different combinations of these features, seven different naïve Bayes classifiers were constructed and tested.
Table 2. Performance of the naïve Bayes classifiers

Features used to build the classifier | Overall % of decisions matched correctly | Of patients with real DTA, % matched correctly | Of patients with real No DTA, % matched correctly
PC | 77.5% | 12.8% | 98.1%
Arrival Method | 77.6% | 49.1% | 88.6%
Age Group & Arrival Method | 74.4% | 5.9% | -
PC & Arrival Method | 91.2% | 72.3% | -
PC, Age Group & Arrival Method | - | - | -

The performance of the classifiers was assessed by analysing the outcomes from the test set and comparing these with the real decisions recorded. Three areas were assessed: how many decisions the classifiers predicted correctly overall; considering only the subset of patients with real DTA, how many the classifier correctly predicted as DTA; and, considering only the subset of patients with No DTA, how many of these the classifier predicted correctly.
The results can be seen in Table 2, which shows that the least accurate classifier was built from the age and arrival variables. Overall accuracy was 74.4%; however, when it came to identifying the DTA patients correctly, this classifier only worked for 5.9% of cases. Ultimately, it is the clinician who applies their expert knowledge in making a DTA; therefore, without somehow considering the clinician's views, exact classification is limited. This is somewhat reflected in the best classifier, which uses PC and arrival method as explanatory features. Of all cases it predicted 91.2% of decisions correctly. More importantly, it performed the best in correctly identifying the DTA patients (72.3%), which is important for modelling those with waiting times. The classifications according to this model are given in Table 3.

Table 3. Naïve Bayes classifier results to identify patients as DTA or No DTA according to their arrival method and priority code
Arrival Method   PC=1     PC=2   PC=3   PC=4     PC=5
Ambulance        DTA      -      -      No DTA   No DTA
Police Escort    -        -      -      No DTA   No DTA
Private          -        -      -      No DTA   No DTA
Public           -        -      -      No DTA   No DTA
Walking          -        -      -      No DTA   No DTA
The results of this Bayes classifier with only two features can be followed in a simple table. With the addition of more features and thus extra layers in a table, results can be more easily illustrated in a decision tree structure.
Representing trolley waiting time
The second component of the model represents the final patient trolley waiting time. On preliminary analysis it was found that approximately a quarter (24%) of patients had a zero-time trolley wait; in other words, they left A&E immediately and were transferred to a hospital ward. The maximum trolley wait recorded is 3255 minutes (approximately 2.25 days) and the median trolley waiting time is 60 minutes (Figure 2).
The distribution of the trolley waiting times follows a typical positively skewed distribution, with most patients having a fairly short waiting time but a few patients experiencing extremely long waits. Assessment against the NHS targets focused on trolley waiting times of less than 2 hours and over 12 hours. In the case of the current study, 64% of patients had trolley waiting times of less than 2 hours (target 75%), while approximately 9% of patients were waiting over 12 hours (target 0%). This hospital therefore does not meet the required NHS targets and would be considered unsatisfactory. The trolley waiting times are modelled using the following lognormal distribution:
$$f(t) = \frac{1}{1.15\,t\sqrt{2\pi}}\,\exp\!\left(-\frac{(\log t - 1.39)^2}{2.66}\right)$$
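The quoted density corresponds to log-scale parameters μ = 1.39 and σ = 1.15 (2σ² ≈ 2.65, matching the 2.66 in the display up to rounding). A hedged sketch of evaluating and refitting it, assuming `waits` is an array of waiting times in the same units as the original fit:

```python
import numpy as np

def lognormal_pdf(t, mu=1.39, sigma=1.15):
    """The fitted lognormal density quoted above:
    f(t) = exp(-(log t - mu)^2 / (2 sigma^2)) / (sigma t sqrt(2 pi))."""
    return np.exp(-(np.log(t) - mu) ** 2 / (2 * sigma ** 2)) \
           / (sigma * t * np.sqrt(2 * np.pi))

def fit_lognormal(waits):
    """Moment estimates on the log scale; zero waits must be excluded
    (or shifted) since log(0) is undefined, consistent with the model
    missing the peak at zero noted below."""
    logs = np.log(waits[waits > 0])
    return logs.mean(), logs.std()

print(lognormal_pdf(60.0))   # density at the median wait of 60 minutes
```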
It can be seen from Figure 2 that this distribution adequately captures the shape of the data, even though it does not manage to depict the large peak at the beginning, owing to the large frequency of zero waits. This initial lognormal model allows us to describe the distribution of the trolley waiting times reasonably well. To build an improved model for trolley waiting times it will be useful to incorporate patient covariates which are shown to be associated with them. For example, results from the dataset indicate a strong association between patient age group and waiting time (p-value < 0.0001): as patient age group increases, waiting time significantly increases. The development of further models which account for age and other influencing variables will improve the modelling ability.
Figure 2. Fitting a lognormal model to patient trolley waiting times
Conclusion
This paper presents the first steps in establishing a clear model for A&E trolley waiting times. Firstly, a naïve Bayes classifier was developed to identify the two streams of patients entering A&E; the optimal classifier was constructed using patient priority code and arrival method as defining features. For those patients with a DTA it was found that a lognormal distribution reasonably fitted the associated trolley waiting time. Future work will be concerned with further modelling of the trolley waiting time. Influencing patient attributes, for example the age of the patient, will be considered and introduced to produce a more accurate model. Also, the use of phase-type distributions will be explored to see if these specialised distributions are a suitable tool for modelling trolley waiting times. By modelling the waiting process, it should be possible to gain a greater understanding of the trolley-wait problem and the factors which contribute to it.
References
[BBC News, March 2005] available at http://news.bbc.co.uk/1/hi/northern_ireland/4310133.stm. Accessed on 11/12/2006.
[BBC News, September 2003] available at http://news.bbc.co.uk/1/hi/health/3115560.stm. Accessed on 14/12/2006.
[Community Care Bill, 2004] available at http://www.parliament.uk/commons/lib/research/rp2002/rp02-066.pdf. Accessed on 12/12/2006.
[Cox, 1955] D.R. Cox. A use of complex probabilities in the theory of stochastic processes. Proceedings of the Cambridge Philosophical Society, 51:313-319, 1955.
[DHSSPS, 2005] available at http://www.dhsspsni.gov.uk/key-facts-99-05.pdf. Accessed on 10/12/2006.
[Domingos and Pazzani, 1997] P. Domingos and M. Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103-130, 1997.
[Faddy and McClean, 1999] M. Faddy and S.I. McClean. Analysing data on length of stay of hospital patients using phase-type distributions. Applied Stochastic Models in Business and Industry, 15:311-317, 1999.
[Hellerstein et al., 2000] J. Hellerstein, T. Jayram and I. Rish. Recognising end-user transactions in performance management. In Proceedings of AAAI-2000, pages 596-602, 2000.
[Marshall and McClean, 2003] A.H. Marshall and S.I. McClean. Conditional phase-type distributions for modelling patient length of stay in hospital. International Transactions in Operational Research, 10:565-576, 2003.
[Neuts, 1981] M.F. Neuts. Matrix Geometric Solutions in Stochastic Models. Johns Hopkins University Press, Baltimore, Maryland, 1981.
[NHS Plan, 2000] available on www.dh.gov.uk. Accessed on 12/12/2006.
[NHS Information Authority, 2004] available at ?searchterm=trolley+wait. Accessed on 14/12/2006.
[Rish, 2001] I. Rish. An empirical study of the naïve Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001.
[Rish et al., 2001] I. Rish, J. Hellerstein and T. Jayram. An analysis of data characteristics that affect naïve Bayes performance. Technical Report RC21993, IBM T.J. Watson Research Centre, 2001.
Modelling the total time spent in an Accident and Emergency department and the associated costs
Barry Shaw and Adele H. Marshall
Centre for Statistical Sciences and Operational Research, Queen's University of Belfast, Belfast, Northern Ireland, BT7 1NN (e-mail: barry.shaw@qub.ac.uk)
Abstract. The aim of this paper is to analyse the total length of time spent by a group of patients in an Accident and Emergency (A&E) department using a multi-stage Markov model. A patient's pathway through A&E consists of a sequence of stages, such as triage, examination, and a decision of whether to admit to hospital or not. Using Coxian phase-type distributions, this paper models these stages and illustrates the difference in the distribution of the time spent in A&E for those patients who are admitted to hospital and those patients who are not. A theoretical approach to modelling the costs accumulated by a group of patients in A&E is also presented. The data analysed refer to the time spent by 53,213 patients in the A&E department of a hospital in Northern Ireland over a one-year period.
Keywords: A&E department, costs, Markov model, patient flow, waiting time.
1 Introduction
Accident and Emergency (A&E) departments provide the main passageway to a hospital for emergency inpatients and are one of the principal contacts with hospital services for many members of the public. Over 14 million patients use A&E services in the United Kingdom (UK) each year [Department of Health, 2003]. In Northern Ireland this number is approximately 700,000 [DHSSPSNI, 2005]. The total time spent in A&E is used as the key measurement of a department's performance because it is this that is of concern to those attending the department, and it has clinical implications. Analysing the total time spent in A&E gives a more robust indicator of the performance of the whole system, rather than measuring performance based on just one part. The importance of the total time spent in A&E, from arrival to admission into hospital or discharge, is emphasised by the UK government's target that 98% of patients should spend no more than 4 hours in any A&E department [Department of Health, 2000]. In Northern Ireland a similar target of 95% is being implemented by March 2008. The aim of this paper is to analyse the total length of time spent in an A&E department using a multi-stage Markov model. This is an appropriate approach since the patients in an A&E department proceed through a series of
172
Modelling Total Tame Spent an A&E and Associated Costs
173
different stages relating to arrival, assessnzent: examination, decision to admit ( D T A )to hospital, and admittance or discharge. It is increasingly recognised that by more accurately inodelling the flow of patients in hospital, policy rnakers may understand better the case-mix of patients being admitted. Identifying this heterogeneity and the associated length of stay is important in allowing health care managers to understand the syst,ein act,ivit,y. Different, stat’istical techniques have been used to model the flow of patients through various other hospital departments, see for example [Harrison and Millard, 19911, [Gorunescu e t al., 20021 and [Faddy and McClean, 19991. The latter of these used a Coxian phase-type distribution to analyse the flow of’ patients through a geriatric ward, where the phases of the Markov model could be thought of as the stages of care experienced by a patient. Further work expanded this idea by incorporating costs into a three-stage model [McClean and Millard, 19981 and a generalised 2-stage model (Shaw and Marshall, 20061, enabling the costs to be estimated for a group of patients’ overall duration of stay. These techniques have previously been applied to a geriatric ward, however from analysing the distribution of the total length of time spent in an A&E department, a similar approach could be adopted here. The patients admitted into A&E may be considered as two sets - those who are admitted to hospital ( D T A = y) and those who are not admitted t o hospital ( D T A = n) and are discharged home or referred to their GP. This paper aims t o illustrates two things: (1) the use of the Coxian phase-type distribution as a modelling technique for the flow of patients through an A&E department, and (2) a theoretical approach t o model the expected cost for a group of patients’ accumulated duration of time in an A&E department, given appropriate costing data. The remainder of the paper is constructed as follows. Section 2 provides the background of Coxian phase-type distributions, section 3 illustrates an application using data from the A&E department of a hospital in Northern Ireland, while section 4 discusses the methodology for analysing the costs of a group of patient’s duration of time in A&E. Section 5 concludes the paper.
2
The Coxian phase-type distribution
A Coxian phase-type distribution [Cox, 19551 describes the time to absorption of a finite Markov chain ill continuous time, where there is one absorbing state (or phase) and the process starts in the first of k transient states, see Figure 1. The process sequentially moves through the ordered transient states, with the choice of departing into the absorbing state at any time. The parameters of the Coxian phase-type distribution, the X,’s and pz’s,estimated from the data, describe the transition rates through the ordered transient states and the transition rates from the transient states t o the absorbing state, respectively. Due to the highly skewed nature of the total length of time spent in an A&E department, the Coxian phase-type distribution provides a robust
174 Recent Advances in Stochastic Modeling and Data Analysis A1
1
-
A2
2
-
3
----> k
method for modelling the variable of time to absorption. The probability density function (pdf) of the random time variablc T, representing the time until absorption, is given by
where Q, a sub-matrix of transition rates rcstricted t o the transient phases, is of the form
0 0
0 0
Q=
p is a vector of probabilities defining the initial transient phases, given by p= (l00...00)
(3)
and q is the transpose of a 1 x k vector of transition rates from transient phases to the absorbing phase, given by
As well a s having been previoiisly iised to model patient, flow, t,he Coxiaii phase-type distribution has also been used to model the length of treatment times of patients in a suicide study [Faddy, 19941, the telephonc service times for customers of an Israeli bank [Ishlay, 20021 and the incubation time of AIDS [Aalen, 19951. Such applications requirc the estimation of the parameters of the Coxian phase-type distribution from some observational data.
Modelling Total Time Spent in A&E and Associated Costs
175
One technique for doing this is maximum likelihood estimation. This paper uses MATLAB t o implement an optimization function that maximises the log-likelihood for a I-phase distribution, then a 2-phase distribution and so on. By ta.king into coiisidera.tion both the fit of the model to the da.ta. and the complexity of the model, the minimum Bayesian Information Criterion is used to decide the most appropriate number of phases. The corresponding parameter estimates of the distribution are also obtained.
3
Application
The data refers t o the total length of time spent by 53,213 patients in the A&E department of a hospital in Northern Ireland between April 2005 and March 2006. Depending on the severity of the admittee’s injury or illness, they will pass through a sequence of stages in the department, before either being admitted into hospital or discharged, see Figure 2. On arrival t o A&E,
Arrival
Examination
Fig. 2. Patient pathway through an A&E department
a patient may wait for a period of time before seeing a triage nurse for initial assessment. After this they may ‘queue’ for a further time period before they are given a further examination by a doctor. The next stage is the DTA. If a patient is admitted t o hospital, they can wait in the A&E department for a further period of time before proceeding to the ward. Of the 53,213 patients analysed in our data set, 12,917 (24%) are admitted into a hospital ward. These patients have a mean (median) time in A&E of 372 (237) minutes. If a patient is not admitted to hospital they are discharged home or referred t o
176 Recent Advances in Stochastic Modeling and Data Analysis
their GP. These patients have a mean (median) time in A&E of 129 (100) minutes. A Mann-Whitney test carried out to aiialyse the iiiflueiice of D T A on the total time spent in A&E produces a p-valuc < 0.001. The results of fitting DTA
No. of phases
BIC score
Y
4
175,680
n
3
464,267
Fitted parameters @1=0.00004, @~=0.00001,@3=0.01569, @4=0.00207, x^1=0.01152, &=0.01484, &=0.00893. @1=0.00011, @z=O.01080, @3=0.23641, X^I =0.01013. x^2=0.02722.
Table 1. Results of fitting the Coxian phase-type distribution to those patients categorised as DTA = y and DTA = n.
separate Coxian phase-type distributions t o the total time for D T A = y and D T A = n patients are shown in Table 1. The Bayesian Information Criterion, BIG' = -2 x LL nln(N), was used t o determine the most suitable number of phases for ea.ch fit. L L represents the log-likelihood va.lue for ea.ch fit, n represents the number of parameters of the Coxian phase-type distribution and N represents the number of data cases. The most suitable number of phases for the time spent in A&E of those patients who were admitted t o hospital is four. For those patients who were not admitted to hospital, a four phase fit, produced a better BIC score t,liaii that for three phases, however, after several fits one or more of the parameters were zero. This changes the distribution being fitted, and so it was decided to use three phases. The visual fit between the three phases and four phases was near negligible. Figures 3 and 4 show the graphical fit of the four phase arid three phase distributioiis to the D T A = y and D T A = n data, respectively, truncated at 1000 minutes. Although no attempt is made here to physical interpret the fitted phases of the Coxian phase-type distributions, it is note worthy that the number of fitted phases found from the data correspond to the number of physical stages illustrated in Figure 2. For those patients admitted t o hospital ( D T A = y), their pathway consists of four components. For those patients discharged or referred ( D T A = n ) , their pathway consists of three components. Previously, using data sets describiiig the flow of patients through other hospital wards, a physical int,erpret,atioii has been attached to t,he fitted phases of the model [McClean et al., 20051. For example, in geriatric medicine, collaboration with geriatricians has helped label the phases of a 3-phase distribution as acute-, medium- and long-term care. By attaching financial costs to these phases, or stages of care, a model was developed enabling the estimation of the costs for the duration of time spent in care by a group of patients [McClean and Millard, 19981. Further collaborative
+
Modelling Total Tame Spent in A & E and Associated Costs
177
0.0045
0.004
1
1 -data
-f(t)
[4 phases]
time (minutes)
Fig. 3. Fitted 4-phase Coxian phase-type distribution for the time spent in A&E of those patients admitted to hospital.
between modellers, A&E clinicians, and financial administrators would offer a similar model for the application discussed in this paper. The following section briefly discusses the theoretical approach that could be adopted for modelling the expected cost for a group of patients' accumulated duration of time in A&E.
4
Modelling costs in an A&E department
By considering the number of patients in each phase i, n, , the cost per patient per time unit in phase i, c,, and the continuous random variable T, representing the time a patient spends in phase i, the aim is to derive the moment generating function (MGF) of the total costs for all patients in A&E, T N . The costs, c, are phase dependent, but time homogeneous. If D,, represents the total cost per patient that leaves phase j , given they started in phase z , then the MGF of D,, is given by
178 Recent Advances in Stochastic Modeling and Data Analysis 0.008
0.007
0.006
0.005
r ’?
0.004
Q
0.003
0.002
0.001
0
1
46
91
136 181 226 271 316 361 406 451 496 541 586 631 676 721 766 811 856 901 946 991
time (minutes)
Fig. 4. Fitted 3-phase Coxian phase-type distribution for the time spent in A&E of those patients discharged or referred.
where X i and pi are the parameters of the Coxian phase-type distribution. If the discrete random variable Z,iJ,where 1 5 i 5 k and i 5 j 5 k , represents the number of subjects who leave A&E from phase j , given that they started in phase i, then the following MGF can be derived
where p i j is the probability of leaving phase j , given the patient started in phase i. Assuming that TN is a random variable representing the total cost for all patients while in A&E, then using (5) and (6) the MGF of T N ,M T ~ , is given by
The expected future total cost, C=E(TN), for a group of patients’ accumulated duration of time in an A&E department is then derived by taking the
Modelling Total T i m e Spent an A&E and Associated Costs
179
first moment of (7) and substituting z=O [Shaw, ZOOS]. The approach outlined here offers a possible technique that may be used for an A&E department. Considering, for example, those patients in section 3 who, having spent a period of time in A&E, are admitted into hospital. Assuming, through manipulation of available financial data, that the cost per patient per minute in each of the four stages of care was Eu, Eb, Ec and L d , respectively. These costs would be representative of the different stages of care a patient may undergo while in A&E. Then using ( 7 ) ,C is given by
By altering the parameters of the model, policy changes, such as opening an additional examination room or changing the number of staff in certain parts of the A&E department, could be investigated as potential benefits of such a model t o hospital managers.
5
Conclusion and further work
The importance of analysing and modelling the total length of time patients spend in an A&E department can only be highlighted by the recent targets introduced by the U.K. government. Modelling the flow of pat.ients t,hroiigli the system is important, allowing policy makers to test possible changes before implementing them on a real department. This paper has illustrated how the Coxian phase-type distribution may be utilised t o model the total length of time spent by a group of patients in an A&E department. Although no physical interpretation was attached to the phases of both distributions, it was signifimnt to note tha.t the number of fitted phases corresponds to the number of stages a patient proceeds through while attending A&E. Both distributions provide good fits to the empirical data. The last section of this paper illustrates a possible approach that could be taken for modelling the costs for a group of patients in A&E if appropriate financial dat'a was obtained. This is one avenue that could be considered as future work. A costing model would give hospital administrators and clinicians the ability to foresee the economic and non-economic consequences of any theoretical changes made to a health care system before implementing the changes on a real A&E department.
180
Recent Advances in Stochastic Modeling and D a t a Analysis
References [Aalen, 199510. Aalen. Phase type distributions in survival analysis. Scandavian Journal of Statistics, 22:447-463, 1995. [Cox,1955lD.R. Cox. A use of complex probabilities in the theory of stochastic processes. PTOC.Camb. Phil. Soc., 51:313-319, 1955. [Department of Health, 2000]Department of Health. The nhs plan. 2000. Hospital activity statis[Department of Health, 2003JDepartment of Health. tics. Available: www.performance.doh.gov.uk/hospitalactivity/data-requests, 2003. [DHSSPSNI, 2005lDHSSPSNI. Northern ireland hospital statistics 1999/2000 to 2004/2005 key facts. Available: http://www.dhsspsni.gov.uk/key-facts-9905.pdf, 2005. [Faddy and McClean, 19991M. Faddy and S.I. McClean. Analysing data on length of stay of hospital patients using phase-type distributions. Applied Stochastic Models in Business and Industry, 15:311-317, 1999. [Faddy, 1994lM. Faddy. Examples of fitting strii.rt,iired phase-type distributions. Applied stochastic models and data analysis, 10:247-255, 1994. [Gorunescu et al., 2002]Gorunescu, -, and et al. A queueing model for bedoccupancy management and planning of hospitals. Journal of the Operational Research Society, 53:19-24, 2002. [Harrison and Millard, 19911G. Harrison and P.H. Millard. Balancing acute and long term care: the mathematics of throughput in departments of geriatric medicine. Meth Inform Med, 30:221-228, 1991. [Ishlay, 2002]E. Ishlay. Fitting phase-type distributions to data from a telephone call center. Research Thesis: Israel Institute of Technology, 2002. [McClean and Millard, 1998]S.I. McClean and P.H. Millard. A three compartment model of the patient flows in a geriatric department: a decision support approach. Health Care Management Science, 1 (2):159-163, 1998. [McClean et al., 2005]S.I. McClean, -, and et al. Markov model-based clustering for efficient patient c x e . 18th IEEE Symposium on Computer-Based Medical Systems, pages 467-472, 2005. [Shaw and Marshall, 2006lB. Shaw and A.H. Marshall. Modeling the health care costs of geriatric inpatients. IEEE Transactions on Information Technology in Biomedicine, 10 (3):526-532, 2006. [Shaw, 2006]B. Shaw. An extended bayesian network approach to model the health care costs of patient spells in hospital. PhD Thesis: Queen’s University Belfast, 2006.
CHAPTER 5 Markov and Semi Markov Models
Periodicity of the Perturbed Non-Homogeneous Markov System M. A. Symeonaki' and P.-C. G. Vassiliou2 Panteion Univeristy Department of Social Politics, 136 Syggrou Av., 17671 Athens, Greece (e-mail: msymeon(0unipi.gr ,) Aristoteleion University of Thessaloniki, Department of Mathematics 54006, Thessaloniki, Greece (e-mail: v a s i l i o u h a t h .auth.gr) Abstract. In this paper the periodicity of a perturbed non homogeneous Markov system (P-NHMS) is studied. More specifically, the concept of a periodic P-NHMS is introduced, when the sequence of the total transition matrices {Q(t)}Eodoes not converge, but oscillates among several different matrices, which are rather close to a periodic matrix Q, with period equal to d. It is proved that under this more realistic assumption, the sequence of the relative population structures {q(t)}zo splits into d subsequences that converge in norm, as t -+ 00. Moreover, the asymptotic variability of the system is examined using the vector of means, variances and covariances of the state sizes, p ( t ) . More specifically, we prove that the vector p ( t ) also splits into d subsequences that converge, as t + 00, and we give the limits in an elegant closed analytic form. Keywords: Markov processes, Markov systems, Perturbation theory.
1
Introduction
Consider a NHMS as introduced in [Vassiliou, 19821. The main purpose of this paper is to provide the concept of a periodic P-NHMS and to study the asymptotic behavior and variability of the system. This is an effort to provide a general framework for a number of periodic systems in manpower planning, where the sequence {Q(t)},00=, does not converge as t -+ co, but lies rather close to a stochastic matrix Q, whose period is equal to d, for each t. Applications for periodic systems and a fair number of conclusions concerning a NHMS when the embedded Markov chain is periodic are given in [Georgiou and Vassiliou, 19921, [Tsaklides and Vassiliou, 19881 and [Tsaklides and Vassiliou, 19921. Perturbed Markov chains are studied in [Meyer and Shoaf, 19801, [Meyer, 19801 and [Meyer, 19941, where the sensitivity of Markov chains in changes in the transition probabilities is studied. In [Vassiliou and Symeonaki, 19971 and [Vassiliou and Symeonaki, 19991 the concept of a P-NHMS both in discrete and continuous time is presented, 182
Periodicity of Perturbed Non-Homogeneous Markov System
183
in order to examine the sensitivity of a NHMS. In this paper it is assumed that the embedded non homogeneous Markov chain {Q(t)}& is of the form: Q(t) = Q - E q ( i ) , W = 0,1,2, ..., where &,(i) is randomly selected from a finite set of matrices Eq. More specifically, we assume that for each t , the matrix P(t) is of the form:
P(t) = P - E P ( i ) , t = 0,1,2, ...
(1)
with:
P(t)l' I 1', P1' I 1', P 2 0, P(t) 2 0 where 1' = [l,1,...,11' and the matrix EP(i) is randomly selected from the &p(2),...,EP(w)} according to the probabilities: finite set &I = {L$,(l),
prob{&p(t)= & p ( i ) } = ci
> 0, V i = 1,2, ...,V.
(2)
Assume moreover that for each t , the vector p,(t) is of the form: p,(t) = po - ~ ~ ( where i ) , p,(t)l' = 1
(3)
where the vector ~ , ( i is ) a perturbation vector for the vector p,(t) and is ~ E0(m)} ( 2 ) , according randomly selected from the finite set &, = { ~ ~ ( 1 ) , ~..., to the probabilities:
p r o b { ~ , ( t )= ~ , ( i )= } c,i > 0,Vi = 1,2, ..., m.
(4)
For the sequence of loss probabilities we have that:
prob{Ek+l(t) = ~ k + l ( i ) } = ci
> 0, f o r
i = 1 , 2 , ...,w
(6)
Consider now the embedded non homogeneous Markov chain {Q(t)}Z",o. Then, 'dt, Q(t) is of the form:
Eq(4
= &P(S)
+ P;+,&o(j) + &+l(S)Po- .;+l(+o(.d
(8)
for t = 0,1,2, ... and let that prob{&,(t) = E q ( i ) } = cqi > 0, for i = 1,2, ...,m w and Q - E q ( i ) 2 0. We suppose moreover, that the matrix Q is d-periodic, i.e. d is the least positive integer, such that Qdfl = Q. Assuming that c1,..., Cd-1 are the cyclic subclasses of Q , let Qi denote the submatrix of Q corresponding to the transition probabilities from Ci-1 to Ci. We also
c,,
184
Recent Advances in Stochastic Modeling and Data Analysis
assume that for h = 1,2, .... m w , the matrix &,(h) do not have nonzero elements, where the matrix Q has zero elerrierits. More specifically the following condition is required: if ~
# 0, then q i j # 0, 'di,j E S
i j
Then:
E [ Q ( t )= ] Q h
-
gq 2 0
xy:l
where &, = &,(h)prob{Q(t) = Q - &,(h)}. The system we have just described is called a periodic perturbed non homogeneous Marlcov system. The following Lemma is now proved:
Lemma 1. The matrix E[Q(t)]is a stochastic, d - periodic matrix. Proof. Following the same steps as in Theorem 3.1 in [Vassiliou and Symeonaki, 19991 we have that E [ Q ( t ) is ] a stochastic matrix. Due to the fact that the matrix &,(i) has zero elements exactly where the matrix Q has zero elements, we have that the matrix &,(i) has the following form, for each i = 1 , 2 , .... mu:
0 0
Eq0(i)
0
&,(i) =
0 0 Eqd&-l(i) 0 [ .
1.
0 ... &ql(i) 0 . . . . 0 . . ' &qd&2(i) 0 ". 0 0 " '
(9)
We do not exclude the fact that E q j ( i ) = 0 for some j. Thus, the ma-
2,
= Q - E q j ( h ) }= will also have the above trix C;I"="lEqj(h)prob{Qj(t) form, since p r o b { Q j ( t ) = Q - E q j ( h ) }> 0. Considering the fact that Q is h
d-periodic we conclude that E [ Q ( t ) = ] Q - &, is also a d-periodic matrix. and E [ Q ( t ) ]are respectively of the following Consequently, the matrices form: I 0 r,, ..' 0 0 0 E,, . . . 0 . . . . . . E, =
2,
h
,o
h
h
0 t
and
i-P d - 1
0 0
0 0
Eqd-2 0
_ ' '
_ ' '
Periodicity of Perturbed Non-Homogeneous Markou System 185 A
0 0
Qo - &qo
0
0 0.
= [Qiocd-I
... ... ...
0 Q i - 6 1
0 0
Qi - 6 d - 2
...
'
where:
h=l
In the following lemma the limit of the matrix E [ Q ( t ) ]td as t estimated.
+
00
is
Lemma 2. Let a periodic - P - N H M S and let Q be a d- periodic stochastic matrix. T h e n matrix B = limt+m{EIQ(t)]}td i s the diagonal matrix given by:
where kxi
7=2
7=2 s=o
i--T+2
m=d-r+2
s=i
k=O
and A! is the generalized group inverse ([Meyer, 19801) of matrix Ai = I Qi, for every i = 0,1,2, ...,d - 1. -d
Proof. According to Lemma 1, the matrices Qd and {1!3[Q(t)]}~ = Q (Q - c $ ) ~are given respectively by the following relations:
and
-
--
where Xi = QiQi+l . . . Qd-lQo. . . Qi-l and Xi = Q i Q i + l . are regular stochastic matrices. Now it is true that:
--
A
QiQi+i = (Qi - &q;)(Q,+i
A
- &,,+I
I = QiQi+l
-
-
. .Q d - l Q , . . h
-G I
-
. Qi-1
=
186 Recent Advances in Stochastic Modeling and Data Analysis A
A
+
A
-
where Ei, = QiEqi+l EqiQi+l. Repeating this process will yield:
-
QiQi+l.
-
. .Q d - i Q ,
&ij
-
&id-l
=
{
= QiQi+l. . . Qd-l&qo
. . .Q d - i Q o
-
h
+
+
A
=
d-1
A
-
if i A
+j
-
A
A
>j
Id
&ij+lQj-d+i
i
we have that:
if i + j > d
-
and E,, = Eqo, Q d =
&qiQi+l
i+l
+
-L
Now if i
&id-l-iQo.
Qi+j Q i Q . . . . Qi+j-lgqi+j Q i Q i + l . . . Q , . . . Qj-d-l+i&qj-d+i %+l
1 then Ei1 = QiEqi+l s < 0. Then:
If j
+
A
A
where
A
= QiQi+i
i--7
- -
Q,, Q ,
= I if
i-1
According to Lemma 1, the matrses %i are regular, stochastic matrices, 'dz = 0 , 1 , ..., d - 1.. The matrices E i d - 1 are therefore perturbation matrices b'i = 0 , 1 , ...,d - 1. Thus, from Theorem 4.1 in [Meyer, 19801, we have that limt,, = ni(I & - l A % ) p land consequently 6 is given by Relation (12).
Ei
+
We now give the following theorem:
Theorem 1. Let a P-NHMS like the one we have j u s t described and let that Q be a d-periodic stochastic matrix. Let also that the system i s expanding, i.e. T ( t )2 T ( t - 1 ) and that limt,, = 0. T h e n the sequence of the relative structures { q ( t ) } E 0 splits into d subsequences, with limits:
$#
m=O
where: ,s
= lim 12103
c
n-l
k=O
+
A T ( k d m) T(nd+m)
and B is given by relation (12). Proof. It is true that in a P-NHMS the following equation holds:
E[Q(t)l= E[P(t)1+ E [ P k + i ( t ) l E [ P ~ ( t ) ] .
(14)
Periodicity of Perturbed Non-Homogeneous Markov System
187
Moreover it is true that:
+
Without loss of generality we assume that t = n d Relation (15) can be written in the following form:
r-l
C
u=o
+
AT(nd u) T(nd+u) (Q -
T
(0 5 T 5 d
-
1). Then
zq)r-u-l.
We now prove that the second term of (16) converges, when t -+co. Let:
From Proposition 3 in [Tsaklides and Vassiliou, 19881 we conclude that: lim
t-m
lim
t-w
II(Q
,.
- €4) (n-k)d+(r-m-1) - B(Q -
gq)T-m-lI I = O 11
II(Q-gq)("-k))df(r-m -B - l()Q - g q ) d + r - m - l = O
i f r > m + l , and if
T
< m + l . (18)
IS . convergent 'dm = 0,1, ..., d - 1 since Now, the sequence it is limited from above and is a monotonically increasing sequence of time. Let the limits of the sequence be:
sm = lim n-w
c
n-l
k=O
AT(kd T(nd
+ m)
+ ?-) .
188 Recent Advances in Stochastic Modeling and Data Analysis
It is true, using also Relations (17) and (18) that: n-1
AT(kd + m, (Q - gq)(n-k)d+(T-m-l) T(nd+r)
lim
n-00
k=O
if r > m + l if r < m + l
'
From Relations (17) and (19) we have that:
m=r
m=O
We now consider the third part of Equation (16), for which it is: r-l
AT(nd +u)
IIC u=o T ( n d + u )
(Q - ~ q ) r - u - l l l
'-' A T ( n d + u )
I
u=o
T(nd
+r )
which converges to 0 due to the hypothesis. Moreover, since T ( t )+ M, the first part of (16) converges to 0. Hence: d-1
C
q,(m) = ;lEq(t)= {p, - E [ E , ( ~ ) ] } B (Q - I^q)dfT-mfl. m=O
We now study the asymptotic variability of the system using the vector of means, variances and covariances p ( t ) . More specifically, let:
P ( t ) = (E"l(t)I, E"2(t)l,
...I
E"k(t)ll
COV(N1( t ) Nl , ( t ) )C, O V ( N l ( t ) , N2 ( t ) I)'", COV(N1( t )I Nk ( t ) I)COV(N2( t ) Nl , (t)), COV(N2(t),N 2 ( t ) ) ,
"')
cov(Nz(t),N k ( t ) ) ,..., COV(Nk(t),N k ( t ) ) ) .
(22)
Then the following theorem holds:
Theorem 2. Let a P-NHMS like the one we have just described and let that Q be a d-periodic stochastic matrix. It is also assumed that the system is expanding, i.e. T ( t ) T ( t - 1 ) and that limt,, T ( t )= T . Then the sequence of the vectors { p ( t ) } E osplits into d subsequences, with limits:
>
Pr+I(M) = t-oo lim P ( t d r+ 1
+ + 1) = P(O)E[ ?-
VQ0,?I
d-1
C w r n ~ [ ~ o ( m ) ~+~ [C~ q w~ m, ~~ I[ ~ o ( m ) l ~ [ ~ q ~ ,
+{T - T(O)I{
m=O
m=r+2
Periodicity of Perturbed Non-Homogeneous Markov System
189
b'r = 0,1,2, ..., d - 1, where:
and limn+m E [V,(n,d
+ v, nd + r)] =
if r > v - 1
E [vqT,u] if T < 'u - 1 .
References [Georgiou and Vassiliou, 1992lA. Georgiou and P.-C. G. Vassiliou. Periodicity of asymptotically attainable structures in non-homogeneous markov systems. Lin. Alg. Appl., pages 137-174, 1992. [Meyer and Shoaf, 198OlC. D. Meyer and J. M. Shoaf. Updating finite markov chains by using techniques of group matrix inversions. J. Statis. Comp. Sam., pages 163-181, 1980. [Meyer, 198OlC. D. Meyer. The condition of a finite markov chain and perturbation bounds fot the limiting probabilities. Siam J. Alg. Disc. Meth., pages 273-283, 1980. [Meyer, 1994lC. D. Meyer. Senstivity of the stationary distribution of a markov chain. Siam J. Mat. Anl. Appl., pages 715-728, 1994. [Tsaklides and Vassiliou, 1988lG. Tsaklides and P.-C. G. Vassiliou. Asymptotic periodicity of the variances and covariances in non-homogeneous markov systems. J. Appl. Prob., pages 21-33, 1988. [Tsaklides and Vassiliou, 1992lG. Tsaklides and P.-C. G. Vassiliou. Periodicity of infinite products of matrices with some negative elements and row sums equal to one. Lin. Alg. Appl., pages 175-196, 1992. [Vassiliou and Symeonaki, 1997lP.-C. G. Vassiliou and M. A. Symeonaki. The perturbed nonhomogeneous markov system in continuous time. Appl. Stoc. Mod. and Data Analysis, pages 207-216, 1997. [Vassiliou and Symeonaki, 1999lP.-C. G. Vassiliou and M. A. Symeonaki. The perturbed nonhomogeneous markov system. Linear Algebra and Applications, pages 319-332, 1999. [Vassiliou, 1982lP.4. G. Vassiliou. Asymptotic behavior of markov systems. J. Appl. Prob., pages 851-857, 1982.
On the moments of the state sizes of the discrete time homogeneous Markov system with a finite state capacity G. Vasiliadis and G. Tsaklidis Department of Mathematics Aristotle University of Thessaloniki 54124, Thessaloniki, Greece (e-mail: gvasilhath.auth.gr, tsaklidiamath.auth.gr)
Abstract. In the present paper we study the evolution of a discrete-time homogeneous Markov system (HMS) with a finite state capacity. In order to examine the variability of the state sizes, their moments are evaluated for any time point, and recursive formulae for their computation are derived. Also a recursive formula is provided for the moments of the overflow size due to the finite state's capacity. The p.d.f. of the overflow size follows directly by means of the moments. Keywords: Stochastic population systems, discrete-time homogeneous hlarkov models, Markov systems.
1
Introduction
Consider a discrete-time Homogeneous Markov System (HMS) with state space S = 1,2,. . . ,k. Symbolize by t , t = 0,1,. . . , the time variable. Every member of the HMS may be in one and only one of the states 1 , 2 , .. . , k at some time point t according to some characteristic, i.e., the states of the HMS are exclusive and exhaustive. The members of the system can be human populations, biological organisms, etc. Let p i j denote the one-step conditional (time independent) transition probability for a member of the HMS of moving from state i to state j, and P = ( p i j ) the respective k x k transition matrix. The population structure of the HMS at time t is given by the (column) state vector n(t) = (n1(t),nz(t),' . . , n k ( t ) Y , where ni(t)stands for the number of the members at state i, i = 1 , 2 , . . . , k , and the superscript denotes transposition of the respective vector (or matrix). Moreover, we will denote by nij(t),i , j = 1 , 2 , . . . , k , the number of the members of the HMS who are moving from state i to j in the time interval [t,t l ) , and by ni(t) the vector of the numbers of transitions from state i to the states 1 , 2 , . . . ,k , in the time interval [t,t l ) , i.e.,
+
+
190
Moments of State Sizes of Discrete Time Homogeneous Markov System
191
for i = 1 , 2 , . . . , k , t = 0 , 1 , . . .. Then ni(t)/ni(t)
-
Multinomial ( n i ( t ) ; p i l , p i z , .. . ,pile),
for i = 1,2, . . . , k , t = 0,1, . . . ([Bartholomew, 19821). In the literature, such Markov systems, are called homogeneous (HMS) if P = ( p i j ) , or non-homogeneous (NHMS) if P = ( p i j ( t ) ) . They appear in manpower planning ([Bartholomew, 19821, [Gani, 19631, [Tsaklidis, 19941, [Vassiliou, 19821, [Vassiliou, 19971), in demography ([Bartholomew, 19823), biology, etc. For example, a HMS can be used in order to describe the evolution of an university system ([Gani, 1963]), the patients’ flows and costs in a hospital ([McClean et al., 19981, [Taylor et d.,2000]), the pollution of a biological system ([Patoucheas and Stamou, 1993]), etc. The evolution of a Markov system is usually examined by studying the evolution of the state vectors n(t), t = 0 , 1 , . . . , and their expectations, variances and covariances. In particular their asymptotic behaviour is of central importance in the study of the system. Results concerning the convergence of the expectations of the state vectors n(t), t = 0 , 1 , . . ., can be found for instance in [Kipouridis and Tsaklidis, 20011 and [Taylor et al., 20001, while in [Tsaklidis and Soldatos, 20031 the trajectories of the expectations of the r.v.’s n(t),t = 0,1,. . ., are examined in order to interpret a HhlS as an elastic solid. Let the size of the HMS be equal to N E N+. Note that nj(t 1) = k CiZ1 n i j ( t ) ,and consequently the probability distribution function (p.d.f.) of nj(t+l) is the convolution of the p.d.f.’s of n i j ( t ) , i = 1 , 2 , . . . , k . Since it is not convenient to deal with the convolution, only the means, variances and covariances of the r.v.’s nj(t)as well as their asymptotic distributions have been evaluated so far ([Bartholomew, 19821, [Vassiliou, 19821, [Vassiliou, 19971) etc). For this reason it is very useful to investigate the distributions of the r.v.’s n j ( t )by means of their moments.
+
2
Moments of the r.v.’s ni(t)
We consider the case of a homogeneous Markov system the s - th, s E { 1,2,. . . , k } , state of which has finite capacity c. Conventionally we assume that c E N. We will denote this Markov system by HMSIc,. We assume that the overflowing members leave the system. Let e ( t ) symbolize the number of the members who are overflowing at time t and Nt the size of HMS/c, at time t. Moreover, we will denote by m i @ ) the number of the members who decide k to move to state i in the time interval [t-1, t ) ,that is m i ( t ) = Cj=l nji(t--1). In what follows, we will use a new vector product denoted by ” x ’I: If x is a (column) vector then xT x xT is a row vector, that can be derived from the Kronecker product xT 8 xTIby replacing the products appearing in xT @ xT with the respective factorials. For example, if x = ( 2 1 , Q ) ~ since , XT 8 x T = ( 2 : , 2 1 2 2 , 2 2 2 1 , 2 2 2) ,
Recent Advances in Stochastic Modeling and Data Analysis
192
then XT
x
XT
= (xl(2) , X (1) I z2 (1),x2 (1)21 (1), x W 2 ),
that is XT x XT =
(Zl(X1
- 1),2122,X2X1,22(22 -
1)).
Moreover, we will use the following proposition ([Vasiliadis and Tsaklidis, 2007bl).
Proposition 1. Let a HMS/c, with transition matrix P = (pij) . Then: I . The r - t h factorial moment E[n?)(t)]of the r.v. n,(t), s E { 1 , 2 , . . . , k } , is given by
2. The mixed factorial moments of the state sizes ni(t) are given by
where ri E
N,i = 1 , 2 , . . . ,k .
By means of Proposition 1 the factorial and mixed factorial moments of the r.v.'s ni(t),i = 1 , 2 , . . . ,k , can be computed using the following theorem:
Theorem 1. Let a discrete-time HMS/c, with transition matrix P . Then
I. E [ n T ( t )x . . . x n T ( t ) ]= E [ m T ( t )x . . . x mT(t)] - eT(t)= P r
-
= E [ n T ( t- 1) x
/
Y
r
. . . x nT(t- I)]? 8 . ~8 .$1 "
\
\
-
eT(t)
4
r
r
2. E [ n T ( t )x . . . x nT(t)] = (nT(o) x . . . x n T ( 0 ) ) ( P t @. 8 . ~ '- ) -
cg:,
€+ !
- j)(Pj8
T
\
Y
T
8 Pj),
/-
r
Moments of State Sizes of Discrete T i m e Homogeneous Markov System
193
where mT(t)= ( m l ( t )ma(t), , . . . , m k ( t ) ) ,and e T ( t )as a 1x kT vector, whose element at the same place where in the vector E[rnT(t)x . . . x mT(t)] appears
equal to 0. Proof. 1. Taking into account that ([Vasiliadis and Tsaklidis, 2007al)
and Proposition 1, we derive the desired result. 2. By means of the first part of the theorem we have that
E [ n T ( t )x . . . x nT(t)] = E[nT(t- 1 ) x . . . x nT(t- 1 ) ] ( PC3 . . . C3 P ) - e T ( t ) P
\
-4
r
3
T
J-
r
The factorial moments of the overflow size
In order to evaluate the factorial moments of the state sizes, given in Theorem 1,the computation of the factorial moments of the overflow size and the mixed factorial moments E [ e ( r * ) ( t ) n('"(t)]are needed. For this purpose we denote by et the total overflow till time t . Then the following theorems hold.
n!=l %#S
Theorem 2. Let a HMS/c, with transition matrix P = ( p i j ) , i ,j = I, 2,. . . , k . The r - t h factorial moment of the r.v. e ( t ) , at any time t , i s given by
194 Recent Advances in Stochastic Modeling and Data Analysis
Proof. The r - th factorial moment of the r.v. e ( t ) / e t - l , is given by No-et-l-c
C
~ [ e ( ' ) ( t ) / e , - l= ]
x ( ' ) P ( e ( t )= x / e t - l ) .
(1)
x=o
The probability density functions (p.d.f.'s) of m i ( t ) / e t - l , i = 1,2,. . . ,k, satisfy the relation ([Vasiliadis and Tsaklidis, 2007al)
1 P [mi@) = n/et-l] = n!
C
j=O
(-l)j
T E[m,!"fj'(t)].
(2)
From (2) we get
P ( e ( t )= x / e t - l )
= P(m,(t) = c
+ x/et-l)
Then, from ( 1 ) and (3), we deduce
In order to evaluate the mixed factorial moments of the r.v.'s e ( t ) ,ni(t), i = 1,2,. . . , k , i # s, the next proposition is needed. Proposition 2. Let a discrete r.w. x = ( X I ,x2,.. . ,xk),where X I + x2 -I. . . + XI, = N , N E N. Then, the joint distribution of X satisfies the relation
+ + +x k =
Proof. For a discrete r.v. x = ( X I ,x2,.. . ,x k ) , with x1 x2 . . . N , N E N, the mixed factorial moments E [ n t = l X i T i ) ] where , ri E r1 7-2 . . . rk = N , are given by
+ + +
n
N and
k
=
Z!'")P(x1
XI+
= 2 1 , x2 = Z2,.
. . ,XI, = 2 k ) .
...+X & = N Z=l
+ + +
Since r1 7-2 . . . rk = N , all but one of the summands appearing in the latter sum are equal to 0; only the term arising for 2 1 = T I , x2 = 7-2, . . . ,xk = rk is not equal to 0. Thus, we have = T I ,Xz = T z , . ..,Xk = T k ) .
Moments of State Sizes of Discrete Time Homogeneous Markov System 195 Hence
Now we provide the following theorem. Theorem 3. Let a HMS/c, with transition matrix P = ( p i j ) , i , j = 1 , 2 , . . . ,k . T h e n the mixed factorial moments E [ e ( T s ) ( t ) n ! =niTi)(t)], l ri E M, i = a#s 1 , 2 , . . . ,k , at any time t , are given by No -c No - I -C
k
k
where the summation CZiis extended over all xi, i = 1 , 2 , . . . , k , i # s, which satisfy the condition X I . . . x k = NO- j - c - 1.
+ +
Proof. We have that
or
i=l i#s
. P ( m l ( t )= X l , m 2 ( t ) = 2 2 , . . . ,m,(t) = c
+ 1,. . . ,mk(t)= x k / e t - I ) ,
where the summation CZiis extended over all xi, i = 1 , 2 , . . . , k , i which satisfy the condition X I . . . x k = NO - et-l - c - 1. Then, from Proposition 2 we get
+ +
(4)
# s,
196 Recent Advances in Stochastic Modeling and Data Analysis
Hence k
k
i=l
i=1 a#*
i#s
Nn-c
k
j=O
i=l i#a
and by ( 5 ) , we get the desired result. The p.d.f. of the total overflow et can be evaluated using the following theorem.
Theorem 4. Let a HMS/c,. The p.d.5
of
et satisfies the recursive relation
for 1 5 x 5 NO- c, and Nn-c x=1
Proof. For 1 5 x 5 No - c we have
+
P(et = x/et-l = 1) = P ( e ( t ) et-1 = x/et-l = I ) = P ( e ( t )= II: - Z/et-l = 1 ) = P ( m , ( t ) = c z - I/et-l = I ) .
+
We get from (2) that
1 ( c + x - Z)!
No-Z-c-x+Z
j=o
For z = 0, we derive No-c
P(et = 0) = 1 -
C P(et
j=1
=j ) .
Moments of State Sizes of Discrete Time Homogeneous Markov System
4
197
Concluding remarks
T h e evaluation of t h e moments of t h e state sizes becomes very useful in real applications, since they determine completely the state sizes’ distributions. The problems treated in sections 2-3 can be examined if capacities are considered for all the states of t h e system. In this case t h e overflowing members can be assumed to enter a new (nominal) state, say k 1. Then, new transition probabilities, and t h e respective transition matrix, shall be computed. It is very interesting t o notice that the transitions between t h e states of this new system are not independent. Thus t h e results given in t h e literature for the classical HMSs cannot be generalized directly for the HMS with capacities. Nevertheless, the evaluation of t h e moments of t h e state sizes is still a very useful tool for t h e determination of the distributions of t h e state sizes and the examination of t h e evolution of t h e system.
+
References [Bartholomew, 1982lD.J. Bartholomew. Stochastic Models for Social Processes, volume 3rd edn. J. Wiley, New York, 1982. [Gani, 196315. Gani. Formulae for projecting enrolments and degrees awarded in universities. J. R. Statist. SOC.A., 126:400-409, 1963. [Kipouridis and Tsaklidis, 200111. Kipouridis and G. Tsaklidis. The size order of the state vector of discrete-time homogeneous markov system. J. Appl. Prob., 38~357-368,2001. [McClean et al., 1998lS.I. McClean, -, and et al. Using a markov reward model to estimate spend-down costs for a geriatric department. J. Operat. Res. SOC., 10:1021-1025, 1998. [Patoucheas and Stamou, 1993lP.D. Patoucheas and G. Stamou. Non-homogeneous markovian models in ecological modelling: a study of the zoobenthos dynamics in thermaikos gulf, greece. Ecological Modelling, 66:197-215, 1993. [Taylor et al., 2000lG.J. Taylor, -, and et al. Stochastic model of geriatric patient bed occupancy behaviour. J. R. Statist. SOC.,163 (1):39-48, 2000. [Tsaklidis and Soldatos, 2003lG. Tsaklidis and K.P. Soldatos. Modelling of continuous time homogeneous markov system with fixed size as elastic solid. Appl. Math. Modell., 27:877-887, 2003. [Tsaklidis, 1994lG. Tsaklidis. The evolution of the attainable structures of a homcgeneous markov system with fixed size. J. Appl. Prob., 31:348-361, 1994. [Vasiliadis and Tsaklidis, 2007alG. Vasiliadis and G. Tsaklidis. On the distributions of the state sizes of discrete time homogeneous markov systems. Methodology and Computing in Applied Probability (Under publication), 2007a. [Vasiliadis and Tsaklidis, 2007bl G. Vasiliadis and G. Tsaklidis. The discrete-time homogeneous markov system with a finite state capacity. Proceedings of the 20-th Pan. Statist. Conference, Nicosia, 2007b. [Vassiliou, 1982lP.-C.G. Vassiliou. Asymptotic behaviour of markov systems. J. Appl. Prob., 19:851-857, 1982. [Vassiliou, 1997]P.-C.G. Vassiliou. The evolution of the theory of non-homogeneous markov systems. Appl. Stoch. Models Data Anal., 13, no. 3-4:159-176, 1997.
Copulas and goodness of fit tests PA1 Rakonczai’ and AndrAs Zemplkni’ Eotvos Lor6nd University Department of Probability Theory and Statistics Budapest, Hungary (e-mail: pauloQmath. elte .hu) Eotvos LorAnd University Department of Probability Theory and Statistics Budapest, Hungary (e-mail: zempleni@math. elte.hu) Abstract. There are more and more recent copula models aimed at describing the behavior of multivariate data sets. However, no effective methods are known for checking the validity of these models, especially for the case of higher dimensions. Our approach is based on the multivariate probability integral transformation of the joint distribution, which reduces the multivariate problem to one dimension. We compare the above goodness of fit tests to those, which are based on the copula density function. We present the background of the methods as well as simulations for their power. Keywords: copulas, goodness of fit test, probability integral transformation.
1
Introduction
In the last decades the question of multivariate modeling became also tractable, by the vast number of recorded data and the powerful computing equipment readily available. However, the methodology has not always been kept pace with the available resources: one can easily fit multivariate models by one software or another, but there is not always a suitable method at hand for checking the goodness of the fit. Copulas, simple yet powerful tools for modeling, ensuring the separation of marginal modeling and dependence questions, have been re-invented in the 1990s and their use has been expanded rapidly since then. One natural area of their applications is in the financial mathematics, where they are often used t o model the dependence structure between assets or losses, stock indices and so on. The paper is organized as follows. In Section 2 we recall the definition of the Archimedian copula and outline its most relevant properties. In section 3 we demonstrate two completely different approach for investigating the goodness of fit. In Section 4 we apply the presented methods for real financial data sets and compare their power. We show possible extensions of the wellknown bivariate methods t o higher dimensions. In this aspect the open-source R package played a leading role, see http://www.rproject.org/ for its description and the available packages 198
Copulas and Goodness of Fit Tests
2
199
d-dimensional Archimedian copulas
In the following we present the basic elements of copula theory for the class of the d-variate Archimedian copulas, which is the extension of the very popular notion of bivariate Archimedian copulas, so it possesses similarly favorable characteristics. Let us consider a copula generator function: $Q(u): [0,1] + [O,oo],which is continuous and strictly decreasing with $(1) = 0. Then a d-variate Archimedian copula function is
The d-copula inherits the beneficial properties of its bivariate ancestor, however it has a limitation that for a fixed family $6, there are only a few parameters to capture the full dependence structure. Since all the d - 1 dimen(1,u2, ...,U d ) = sional margins of an Archimedian copula are identical: ... = c,@(u~, ...,~ d - 1 ~ 1=) qj,’(~f:; +o(ui)),it assumes a certain symmetry among the coordinates. Anyway, since the main aim of this paper is to introduce some appropriate methods for checking a given model’s validity and not to develop involved copula models, the Archimedian families are pretty eligible. Indeed in the course of the next sections we deal mostly with the Clayton copula family, but we emphasize that the presented methods can be adapted for any Archimedian models in the same way. The generator function of the Clayton copula is given by $ Q ( u= ) u-’ - 1, hence +il(t) = ( t + l ) - i . The Clayton d-copula function, also known as Cook and Johnson’s family, is given by:
c,,
with 9 > 0. Simulations can be performed by general methods, such as the conditional sampling, which can be computed quite easily with the help of the derivatives of the function $ i l ( t ) , for details see [l]. Beyond the simulation, another relevant question is the parameter estimation. An easy method for the bivariate case is based on Kendall’s T [cite:] In the general, d-dimensional case one may use the by the form of 8 = so-called maximum pseudo-likelihood method (see [5]), based on the copula density function:
&.
200
3
Recent Advances in Stochastic Modeling and Data Analysis
Goodness of fit t e s t s
In this section we discuss the goodness of fit statistics in two subsections, first the tests related to the cumulative distribution function and then those based on the probability density function. 3.1
GOF statistics based on PIT
Let a random vector X = ( X I ,...,X,) possess a continuous d-variate copula model c = (CQ) with unknown margins F‘1, ..., Fd and (X11,...,& I ) , ..., ( X I n r...,X d n ) , n 2 2 a random sample from X. Let the distribution function of the probability integral transformation V = H ( X ) be denoted by
K ( e , t )= P(H(X) 5 t ) = p(CB(Fl(xl), ...>F l ( X d ) )5 t ) .
(4)
In the case of the Archimedian copula family (4) can be computed as follows
( t ~Clayton . copulas this can be given as where f i ( 0 , t ) = $ q 5 ~ 1 ( z ) / z = ~ oFor
nii’,(m+
where q(e, i, rn) = $9). Define the empirical version of K as
where Ein = x ; = , l ( X 1 k 5 xli,...,Xdk 5 Xdi). The test statistics we propose for checking the goodness of fit is based on the comparison of the parametric estimate K ( & , t ) of K ( Q , t )with its empirical counterpart K n ( t ) (for further details see [4]). Known tests for the bivariate case use some continuous functional of the Kendall’s process n,(t) = fi(K(&,t) - K n ( t ) )such as Sn = J i ( r ~ . ~ ( t ) )and ~ d tTn = supost
where
Weighted deviations
is a n appropriately fine division of the interval ( 0 , l ) .
Copulas and Goodness of Fat Tests
3.2
201
Kernel-based GOF statistics
An other prevailing approach is based on the smoothed empirical copula density. For each index i, define the d-dimensional vectors
Yi
=
(Fl(Xi,l), ..., Fd(Xi,d)) and Yn,i= (Fn,l(Xi,l), ..., Fn,d(Xi,d)),
denoting by Fn,kthe empirical k-th marginal cdf of X. Obviously, the copula C is the cdf of Y and its empirical version is
i=l k = I
We assume that Y has a density c, so its kernel estimator at point u is
where r is a d-dimensional kernel and h = h ( n )is a usual bandwidth sequence. - nhd
2
sr2
k=l
(cn(uk) - c ( u k , ~ n ) ) ~ C(ukr
0,)'
(10)
Under the appropriate conditions (see [a]): If the chosen copula is the true one then T tends to the x2 distribution with m degrees of freedom. Theoretically it is very comfortable, because it provides a distribution free testing method. But we have to mention that the conditions even for fitting a proper non-parametric model are far from minimal (adequate kernel structure and bandwidth). Beyond that choosing a grid for evaluating the statistic can be cruel as well. So in practice we propose the following test statistics m
In the modified statistic (ll),the coefficients of the sum were omitted and the squared deviations were weighted by the copula density. This is a logical choice, as thus we concentrate on points which are more frequent under our model. Critical values can be calculated by Monte Carlo simulations similarly to the previous subsection.
4
Application for financial data sets
This section illustrates the implementation of the described goodness of fit procedures. We investigate three stock indices namely BUX (Budapest), WIG (Warsaw) and PX (Prague). The data are the daily closing prices
202
Recent Advances in Stochastic Modeling and Data Analysis Empirical 3-copula Kendall’s tau = 0.754
Filted Claylon 3-copula Theta = 6.131
BUX
EUX
BUX-WIG-PX
Simulationsfrom Clayton 3-copula
0.0
0.2
0.4
0.6
0.6
1.0
Deviations
0.0
0.2
0.4
0.6
0.8
1.0
Fig. 1. Dependence structure for the whole data set
recorded from 02/01/1997 to 02/01/2007, i.e. around 2500 observations (as there are observations on working days only). Since our aim is to capture the dependence structure, we transformed the data set into the unit cube with the help of the empirical univariate margins, and attempted to fit a 3-dimensional Clayton model. Since the empirical copula has upper tail dependence and the Clayton copula has lower tail dependence, in the first step it is practical to ”turn the empirical copula upside down”. This can be seen in the upper left panel of Figure 1. Next to the empirical data we can see a simulation from the fitted model, so it is clear that the Clayton model can not mimic the given structure properly. The lower graphs report the deviation.between the given and the estimated data set. In the first panel the estimated K ( & , t ) function is given, together with its 95 percent confidence bounds from a Monte Carlo simulation with 1000 repetitions, compared to the empirical K,(t) function. The second graph emphasizes the deviations between the graphs. We see that the the observed data are fairly far from our model. This phenomena is proven by the test procedures, since none of our statistics accepts our model. Indeed the observed test statistics are 5-20 times higher then the maximum from 1000 simulations (see Table 1). However, if we omit the extremes from our data set by considering instead of the unit cube just the observations falling into [0.125;0.8613, then no tests among the proposed ones rejects the fit (Table 2 gives the details).
Copulas and Goodness of Fit Tests 203 Sirnulation summary
I s1 I
s 2
I
I
s3
s 4
Min. 10.0672 I 0.00019 I 0.00058 IO.0019 1st Qu 0.1228 0.00064 0.00157 0.0068 Median 0.1455 0.00092 0.00211 0.0092 Mean I 0.1499 I 0.00104 0.00226 0.0103 3rd Qu.1 0.1707 I0.00129 10.00275 10.0127 Max. 10.3255 I0.00531 I 0.00829 I10.0415 I 1 I I I I Obs. ~1.3787~0.06991~0.16377~0.7183~ Table 1. Simulated and observed statistics for the whole data set
Empirical 3-copula Kendall’s tau = 0.349
Fitted Clayton 3-cOPula Theta = 1.072
BUX
BUX BUX-WIG-PX
Simulationsfrom Clayton 3-copula
0.0
0.2
0.4
0.6
0.0
1.0
Deviations
0.0
0.2
0.4
0.6
0.0
1.0
Fig. 2. Dependence structure for that part of the data set, where the extremes were removed
In the case of the kernel-based approach there are more difficulties. The value of the proposed T statistic depend strongly on the kernel estimation of the copula. In the current case we set h = 0.05 as bandwidth and evaluated the estimated models on a 100 x 100 grid. This is reported in Figure 3 for the BUX-WIG indices. In the left panel there is the estimated kernel density for the observations, next t o that in the middle panel the estimated kernel density for a simulated data set from the model, with the given sample size and in the last one the ”true” values of the model’s density.
204
Recent Advances in Stochastic Modeling and Data Analysis Simulation summary
I s1 I
s 2
I
s 3
I s4
Min. 10.108210.00055 IO.0011 II 0.003 1st Qu 10.21471 0.00221 I 0.0054 10.0192 Median10.27361 0.00361 I 0.0085 10.0341
Table 2. Simulated and observed statistics for that part of the data set, where the extremes were removed Kernel density of Observallon?l
Kernel density Of Simulallans
Theoreticaldensity
Fig. 3. Contour plots of the densities for the whole data
As in the previous 3-dimensional analysis, next we investigated the subset data ( observations falling into [0.125; 0.8612 ) as well, in which case we noticed a better fit. This can be seen in Figure 4, where more similarity between the observations and the estimated model is detected. We performed a simulation study for both of the two cases and found that simulated T statistics can not detect the deviations so effectively as in S statistics (the observed sample corresponds to the 0.858 quantile in the case of the whole data set, and t o the 0.324-quantile for the subset). It is also clear, however that the model fits better t o the subsample.
5
Conclusions
We have shown that the statistics] based on the probability integral transform, gave reasonable results even for our moderate sample size. We are
Copulas and Goodness of Kernel density Of Obsewationb
Kernel densihl of Simulations
Fit Tests
205
Theoretics1 density
Fig.4. Contour plots of the densities for that part of the data set, where the extremes were removed
about to undertake further investigations about the sensitivity of the statistics and the proposed weight functions (see [3],[6]and [7]). The presented analysis about the kernel-based methods was more preliminary. A lot of work has still to be done until one gets a clear picture about the properties of these statistics for real, relatively small data sets. A crucial question is the choice of the points U k as well as the exact form of the most appropriate statistics for a given type of problems.
References [llcherubini, U. Luciano, E. and Vecchiato, W. (2004) ”Copula methods in Finance”, WileyFinance, West Sussex, England. [2]Fermanian, J.D. (2005) ”Goodness of fit tests for copulas”, J. Multivariate Anal, 95, 119-152. [3]Fouweather,T. Rakonczai, P. and Zemplbni, A. (2007) ” Anderson-Darling type goodness-of-fit tests with extreme value applications”, (in preparation) [4]Genest, C. Quessy, J.-F. and Rbmillard, B. (2006) ”Goodnes-of-fit Procedures for Copula Models Based on the Integral Probability Transformation”, Scandinavian J . of Statistics, 33, 337-366. [5]Genest, C. and Favre, A.-C. (2006) ”Everything you always wanted to know about copula modeling but were afraid t o ask”, Journal of Hydrologic Engineering. [GIRakonczai, P. Bozs6, D. and Zemplbni, A. (2005) ”Goodness of fit in extreme value analysis and for copulas”, Morgan Stanley Conference on Quantitative and Mathematical Finance, Budapest, Hungary. [7]ZemplBni, A., Bozs6, D. and Rakonczai,.P. (2006) ”High dimensional copulas for simulating and testing Extreme Value Models”, XXVI European Meeting of Statisticians, Torun, Poland.
Discrete time semi-Markov models with fuzzy state space Aleka A. Papadopouloul and George M. Tsaklidis' Department of Mathematics Aristotle University of Thessaloniki, Thessaloniki 54124, Greece (email: apapadommath. auth. gr, [email protected])
Abstract. In the present paper, the classical semi-Markov model in discrete time is examined under the assumption of a fuzzy state space. The definition of a semi-Markov model with fuzzy states is provided. Basic equations for the interval transition probabilities are given for the homogeneous and non homogeneous case. The definitions and results for the fuzzy model are provided by means of the basic parameters of the classical semi-Markov model. Keywords: fuzzy states, semi-Markov process, discrete time.
1 Introduction In this paper we extend the classical semi-Markov model by assuming a fuzzy state space. We examine its behaviour through the concept of probability of fuzzy events. Important theoretical results and applications for semiMarkov models can be found in Cinlar (1969,1975,1975), Iosifescu-Manu (1972), Teugels (1976), Pyke and Schaufele (1964), Keilson (1969,1971), Mclean and Neuts (1967), Howard (1971), McClean (1980,1986), Janssen and De Dominicis (1984), Janssen (1986) and in Janssen and Limnios (1999). The non homogeneous semi-Markov system in discrete time was examined in Vassiliou and Papadopoulou (1992), and the asymptotic behaviour of the same model was studied in Papadopoulou and Vassiliou (1994). Fuzzy states occur essentially in two cases. First, when the states of the system cannot be precisely measured and thus the states used t o model the system are intrinsically fuzzy. In the second case the actual states can be exactly measured and are observable but the number of states is too large and thus the decisions cannot be practically associated with the exact states of the system. In these situations the decisions are associated with fuzzy states which can be defined as fuzzy sets on the original non fuzzy state space of the system (Bhattacharyya (1998)). Fuzzy set theory was introduced by Zadeh (1965,1968) to describe situations with unsharp boundaries, partial information or vagueness such as in natural language. In fuzzy set theory subsets of L?are replaced by functions f : L?--f [O, 11. In section 2 basic equations of a semi Markov model are given. In section 3 the definition of a semi Markov model with fuzzy states 206
Discrete Time Semi-Markov Models with Fuzzy State Space 207 and the basic equations for the interval transition probabilities are given. All definitions and results for the fuzzy model are provided by means of the basic parameters of the classical semi-Markov model.
2
The semi-Markov model
We consider a population which is stratified into a set of states according t o various characteristics and we denote by S = {1,2,. . . ,M } the set of states assumed t o be exclusive and exhaustive, so that each member of the system may be in one and only one state at any given time. Time t is considered to be a discrete parameter and the state of the system at any given time could be , . . . ,N ~ ( t ) l 'where , Ni(t) described by the state wccfor N(t) = "1 ( t ) Nz(t), is the expected number of members of the system in the i-th state at time t . The expected number of members of the system at time t is denoted by T ( t ) ,and the expected number of leavers at time t by N ~ + l ( t )We . assume that T ( t ) = T , i.e., the total number of leavers equals the total number of recruits for every t , and that the individual transitions between the states occur according to a non homogeneous semi-Markov chain (emhdded nmn homogeneous semi-hfa.rkov cham). In this respect we denote by F(t),"=,the sequence of matrices, whose (i,j ) -th element is the probability of a member of the system t o make its next transition to state j , given that it entered state i at time t. Let also p ~ + l ( t be ) the Mxl loss vecfor, whose i-th element is the probability of leaving the system from i, given that the entrance in state i occurs at time t , and po(t)the Mxl vector whose j-th element is the probability of entering the system in state j as a replacement of a member who entered his last state at time t . Every member entering the system holds a particular membership which moves within the states with the members (see also Bartholomew (1982), Vassiliou and Papadopoulou (1992), Vassiliou et al. (1990)). Since the size of the system is constant, when a member decides to leave the system, then the empty membership is taken by a new recruit who behaves like the former one. Denote by P(t) the matrix
Obviously P(t) is a stochastic matrix whose (i,j)-th element is the probability that a membership of the system which entered state i at time t makes its next transition t o state j . However before entering state j , the membership 'holds' for a time in state i. Holding times for the memberships are described by the holding time mass functions h,, (m) which express the probabilities that a membership which entered state i at its last transition holds m time units in i before its next transition, given that state j has been selected. Also, denote the interval transition probabilities by q i j (n,s) = prob{that a membership which entered state i at time s will be in state j after n time units},
208
Recent Advances in Stochastic Modeling and Data Analysis
with corresponding probability matrix $(TI., and Papadopoulou (1992) we have that
s) =
{qij(n,s)}. From Vassiliou
n
Q(n,s ) => W(n, S )
+ C C ( Sm, ) Q ( n
-
m, s
+ m),
(1)
m= 1
where 0 C ( s ,m) is the Hadamard product of the matrices P(s),H(m), i.e., C ( s ,m ) = P(s)oH(m),where H(m) = { h i j ( m ) } 'W(% s ) = 10{C:==7L+l C(S,m)U> 0 U is a matrix with all its elements equal to 1. For the homogeneous case equation (1)becomes
n
Q ( n ) => w ( n )
+ 1C ( m ) Q ( n
-
m),
(2)
m=l
where
C ( m ) is the Hadamard product of the matrices P, H(m), i.e., C ( m ) = PoH(m) 'W(n) = IO{c:=,+1 C(m)U>. Also from F'rom Papadopoulou and Vassiliou (1994) we get that the limit of Q(n,s ) } equals 0
where 0 P' = limn+mP7L, P = lzms-,mP(s), assuming that P is an irreducible and regular stochastic matrix 0 W = Cz=, nW(n) and W is a diagonal matrix whose i-th element is the mean waiting time in i when s -+ co and W(n) = lim,,,W(s,n).
3
The homogeneous semi-Markov model with fuzzy states
Let SF = { F I ,F 2 , .. . , F N } be the fuzzy state space. In most cases the number N of fuzzy states is much smaller than the number M of states of the initial semi-Markov process. Then the probability that the process is in the fuzzy state F,. (r = 1 , 2...,N ) equals M
Prob(F,) = -ypFT(i)Pl.ob(z),
(4)
i= 1
where 0
p ~ ? , (is. )the membership function of the fuzzy event F,., 0
5 p ~ , ( i )5 1
Discrete Time Semi-Markov Models with Fuzzy State Space 209
cr=l
for every i and N p F , ( i ) = 1. The membership function p ~ , . ( .describes ) the grade of membership of the fuzzy state F, to the crispy state i. 0 Prob(i) is the probability that the process is in state i of the initial state space. Now as described above and taking into account Bhattacharyya (1998), the fuzzy transition probabzlztaes of the process can he defined as: P F ~ F=, prob{the process moves to fuzzy state Fj at its next transition / the process entered fuzzy state Fi at its previous transition} = prob{X,, = Fj/Xp = Fi}. Then from (4)and the latter definition we have
In what follows we will define the corc matrix of the fuzzy semi-Markov process. Let us define as: Cpi,(m) = prob{the process holds in Fi m time units and makes its next transition to Fj / the process entered Fi at its previous transition}.
Lemma 1 It is true that
Proof We have that C F , F , ( ~=)prob{the process holds in Fi m time units and makes its next transition to Fj , the process entered Fi at its previous transition}/prob{ the process entered Fi at its previous transition} = ELl prob{the process holds m time units in state r, makes its next transition to s, entered state r at its previous transition}.
c,"=,
M p F i ( T ) p F j ( s ) / c r = lP o r p F i (T)
-
c::,
P"+Fi
(r)
Using similar reasoning as in Bhattacharyya (1998), the transition proba~ , the probabilities of the holding times C F , F , ( ~for ) the bilities P F ~ and fuzzy process can he interpreted in matrix notation by means of the corresponding matrices of the initial process. We denote: PF = { p F i F j }
210
Recent Advances in Stochastic Modeling and Data Analysis
MF = { p : } , where p$ = p F j ( i ) , i = 1 , 2...M , j = 1,2, ...N Pois an NxN diagonal matrix with its (2, i)-th element equal to porpp, (r)]-' 0 D is an MxM diagonal matrix with its (i,i)-th element equal to poi Then it c m be easily seen that
[EL,
0
pF = D M ~ P , P M ~ ,
(7)
and then the k-step transition matrix (PF)' is given by
(PF)' = D M k P o ( P M ~ D M k P o.)..( P M F D M ~ P , ) P M F ,
(8)
where the number of the parenthesis is k-1 terms. In order to simplify notation we adopt Einstein's notation to get
(PMFDMk Wij = P o j P i , ~ s ~ , s ~ j s
(9)
Under mild conditions concerning the initial probabilities poj and the matrix MF, the product PMpDMkP, turns out to be a primitive matrix. For example, this is the case if puspjs and poi are non-null. Then (PF)', k = 1,2, ..., becomes a fully regular (stochastic) matrix. Similarly, the fuzzy core matrix is of the form
C ~ ( r n .=) DMkP,C(m)MF.
(10)
In the following lemmas we will prove that PF is a stochastic matrix and CF,F, are probability functions, as in the classical process. (a)
Lemma 2 The transition matrix of the fuzzy model is stochastic, i.e.,
Discrete Time Semi-Markov Models with Fuzzy State Space
211
For the probability functions CF,F,(.) yields: Lemma 3
It is true that
Proof We have that
Now, since
C:=, h,.,(m)
= 1 for every s, then
Last, following probabilistic arguments it can be proved that the interval transition probabilities for the fuzzy process are given by ri
Q F ( ~=') W F ( ~+)
C C F ( ~ ) Q F- (m), ~
(13)
ni= 1
where >WF(n) =
Io{cz=TL+l DMkPoC(m)MFU).
Similarly, the above equation for the non homogeneous model can be proved to be of the form
References l.Bhattacharyya, M (1998): Fuzzy Markovian decision process. Fuzzy scts and sustems, 99, 273-282. P.Cinlar, E (1969): Markov renewal theory. Adv. Appl. Prob., 1, 123-187. d 21, 7273.Cinlar, E (1975): Markov renewal theory: a survey. M u ~ ~ u y e , n s tSLL., 752. 4.Cinlar, E (1975): Introductzon t o stochastzc processes., Prentice-Hall, Englewood Cliffs, NJ. B.Howard, R.A. (1971)' Dynanzzc Probabzlzstzc systems. Wiley, Chichester.
212
Recent Advances in Stochastic Modeling and Data Analysis
6.Iosifescu - Manu, A . (1972): Non homogeneous semi-Markov processes. StudLasz Cercetuan A4atematice, 24, 529-533. 7.Janssen, J . (1986): Senti-Markov models: Theory and Applications. ed. J. Janssen, Plenum Press, New York. 8.Janssen, J. and R. De Dominics (1984): Finite non homogeneous semi-Markov processes: Theoretical and computational aspects. Iiwur.utLcc: M u ~ ~ ~ ~ J ~ u ~ L and Economics, 3, 157-165. 9.Janssen, J . and N. Limnios (1999): Semi-Marko~~ models and Applications. J. Janssen and N. Limnios Eds, Kluwer Academic Publishers, Dordrecht. lO.Keilson, J (1969): On the matrix renewal function for Markov renewal processes. Aim. Malh. Slulrsl., 40, 1901-1907. ll.Keilson, J (1971): A process with chain dependent growth rate. Markov Part 11: the ruin and ergodic problems. Adv. Appl. Prob., 3, 315-338. la.McClean, S.I. (1980): A semi-Markovian model for a multigrade population. .I. , 846-852. Appl. P r ~ b . 17, lS.McClean, S.I. (1986): Semi-Markov models for Manpower planning. In SemiMarkov models: Theory and Applications, 283-300. Plenum Press, New York. 14.Mclean, R. A. and M. F. Neuts (1967): The integral of a step function defined on a semi-Markov process. Sia7n. .I. A p i ~ l .Math.., 15, 726-737. 15.Papadopoulou, A.A. & P.-C.G. Vassiliou (1994): Asymptotic behavior of non homogeneous semi-Markov systems. Lznear Algebra and Its Applications, 21 0, 153-198. 16.Pyke, R. and R. A. Schaufele (1964): Limit theorem for Markov renewal process. Ann. Math. Statist., 55, 1746-1764. 17.Teugels J.L. (1976): A bibliography on semi-Markov processes. J . CoTrip. A p p l . Math., 2, 125-144. 18.Vassiliou, P.-C.G. and A.A. Papadopoulou (1992): Non homogeneous semiMarkov systems and maintainability of the state sizes. .I. Appl. Proh., 2.9, 5 1S534. 19,Vassiliou, P.-C.G. , A. Georgiou and N. Tsantas (1990): Control of asymptotic variability in non homogeneous Markov systems. J . Appl. Prob., 27, 756-766. 20.Zadeh, L.A. (1965): Fuzzy sets. Information and Control, 8, 338-353. 21.Zadeh, L.A. (1968): Probability measures of fuzzy events. .J. Math. An.al. A p p l . , 23, 421-427.
An application of the theory of semi-Markov processes in simulation Sonia Malefaki and George Iliopoulos Department of Statistics and Insurance Science University of Piraeus 80 Karaoli & Dimitriou str., 18534 Piraeus, Greece (e-mail: { smalef ak ,geh}Qunipi .gr) Abstract. Importance Sampling (IS) is a well-known Monte Carlo method which is used in order to estimate expectations with respect to a target distribution 7r, using a sample from another distribution g and weighting properly the output. Here, we consider IS from a different point of view. By considering the weights as sojourn times until the next jump, we associate a jump process with the weighted sample. Under certain conditions, the associated jump process is an ergodic semiMarkov process with stationary distribution 7 i . Besides its theoretical interest, the proposed point of view has also interesting applications. Working along the lines of the above approach, we are allowed to run more convenient Markov Chain Monte Carlo algorithms. This can prove to be very useful when applied in conjunction with a discretization of the state space. Keywords: Importance sampling, properly weighted samples, Markov chain Monte Carlo, semi-Markov process, limit distribution, discretization.
1 Introduction One of the most common and difficult t o handle problems in computational stastistics and especially in Bayesian analysis is the estimation of integrals of the form
.i, (X, B(X)), E,(h) :=
h(z)T(dz),
a probability distribution T and a function for a measurable space h E C1(n). For this purpose, many Monte Carlo (MC) and Markov chain Monte Carlo (MCMC) methods have been developed. A well-documented introduction t o this topic is presented by Robert and Casella [Robert and Casella, 19991 and Gilks et al. [Gilks et al., 19961. Importance sampling (IS) [Marshall, 19561 is one of' the most popular and well-known MC methods for handling succesfully such problems. The basic idea behind IS is that instead of sampling directly from the target distribution T , a sample ( 5 1 , . . . , 2), is generated from another distribution g which is easy t o sample from and with support at least the same as T . Then, E,(h) is estimated by
213
214
Recent Advances in Stochastic Modeling and Data Analysis
where w(zi) := r ( z i ) / g ( z i )which , is called importance weight. The most frequently used estimator is LAs since it can be used in more general settings, such as cases where T is known up to a multiplicative constant. Moreover, it can be mentioned here that the assumption of independent samples is not so crucial. IS estimators converge to E,(h) even if the sequence of z's forms a Harris ergodic MC, due to the Ergodic theorem. On the other hand, since all draws are from the proposal distribution g, IS seems at a first glance to fail obtaining samples from T . This makes IS not to be a proper method for the cases when the aim is to estimate features of the distribution that cannot be expressed as expectations, such as quantiles. Recently, it has been proven in [Malefaki and Iliopoulos, 20071 that under certain conditions the g-sample converges in a sense to the target distribution T . By considering the importance weights as sojourn times until the next jump, a jump process is associated with the weighted sample. In the case that the original sample sequence forms an ergodic Markov chain, the associated jump process is an ergodic semi-Markov process with stationary distribution T . From this point of view IS does not differ much from MCMC schemes, in that it exhibits convergence to the target distribution as well. Some of the well-known MCMC procedures are special cases of the above mentioned jump processes (e.g. the Metropolis-Hastings algorithm, see Subsection 3.1). This is also true for general properly weighted samples with respect to a target distribution r. Behind the theoretical interest of the above approach, it can be used in order to facilitate MCMC approximates. Combining the above approach with the discretization of the state space, one is allowed to run a more convenient MCMC algorithm with a diEerent target distribution and then weight properly the obtained output. This paper is organised as follows: In Section 2, we give some basic definitions and the main results of the paper. In Section 3, Metropolis-Hastings is connected with the context of jump processes. Moreover, we give a toy example to illustrate the method and then this method is applied to a benchmark example in Bayesian analysis. Finally, we conclude this paper by providing a short discussion.
2
Main results
The concept of a properly weighted sample has been introduced by Liu and Chen [Liu and Chen, 19981 as a generalisation of the standard IS method. An equivalent and more convenient definition is
Definition 1. [Liu, 20011 A set of weighted random samples called proper with respect to T if
for some positive constant
K,
where
Xi
-
g.
(Xi, [i)lcicn is
An Application of Theory of Semi-Markov Processes in Simulation
215
In the sequel, Malefaki and Iliopoulos [Malefaki and Iliopoulos, 20071 associated a jump process with any infiiiite weighted random sequence in the following sense.
Definition 2. Consider a weighted sequence (X,, strictly positive weights. Define So = 0, S, =
xrzi
Nt
:= sup{n :
s, < t } ,
where the ['s are ti,n 3 1, and let
[n)nE~+,
t 3 0.
(x)tao
Then, the stochastic process Y = defined by yt := X N ~t ,3 0 , will be called the jump process associated with the weighted sequence ( X n ,En),Ez+. T h e definition ensures that the process Y has right continuous sample paths which also have left hand limits. However, if the support of En's is a subset of N = {1,2,. . .}, we will consider the process Y only for t E Z + , i.e. we set Y = (Yo,Y1,Yz,.. .). If this is the case, limits of quantities related t o Yt should be suitably interpreted.
Proposition 1. Assume that the sequence X = (X,),€Z+ is a homogeneous Harris ergodic Marlcow chain with state space ( X ,B ( X ) ) having an invariant depends solely on X , with probability distribution g and the distribution of E{EnIXn = x} = K W ( Z ) = m-(Ic)/g(x)for some K > 0 . Then, for the jump process ( y t ) t ) o associated with the weighted sequence (X,, E n ) n E ~ + , it holds that lim P{yt E A } = 7r(A), V A E B ( X ) .
cn
tTm
Proof. T h e result follows from the standard theory of semi-Markov processes [Limnios and OpriSan, 20011. Under the above assumptions, Y is a n ergodic semi-Markov process with embedded Markov chain X and respective sojourn times ( E n ) n E Z + . Thus,
as is claimed. Setting deterministically
En = w(X,),
we have the following:
Corollary 1. If (Xn),€z+ forms a Harris ergodic Marlcov chain with stationary distribution g , then the jump process associated with the weighted sequence (X,, W ( X , ) ) , ~ Z +has 7r as limit distribution. Any sequence of independent g-distributed random variables trivially forms a n ergodic Markov chain with stationary distribution g. Thus, Corollary 1 covers also the original importance weighted sequence. The requirement that the distribution of (, depends only on x, seems rather restrictive. However, zn could be a block of specific size allowing En
216
Recent Advances in Stochastic Modeling and Data Analysis
to depend on more than one term of the original sequence. (Note that the standard definition of a semi-Markov process allows the sojourn time of X , depending on both X , and X,+1.) As already mentioned in the Introduction, some of the well-known sampling schemes are special cases of this context. Moreover, another potential application of Proposition 1 is the following: Let 7r be a target distribution of which some full conditional distributions are difficiilt, to sample from. Instead of hybridizing Gibbs sampler using suitable Metropolis steps, one can replace 7r by another target distribution g of which all full conditional distributions are easily handled, run the Gibbs sampler and fiimlly weight the output. The weighted sample will be the realization of a converging .jump process (see Section 3).
3
Examples
In this section we connect Metropolis-Hastings (MH) alorithm with the context of jump processes. Moreover, we give a simple example of how one can facilitate MCMC algorithms by discritization of the state space followed by proper weighting of the sample, so that the associated jump process converges to the target distribution. 3.1
The Metropolis-Hastings algorithm
Consider an arbitrary MH algorithm [Metropolis et al., 1953; Hastings, 19701 with target distribution 7r and proposal q ( , l . ) , that is, at time t 1 given yt = y, draw 2 q ( z l y ) and set yt+l = z with probability
+
-
or X+l = y with probability 1 - a ( y , 2). Although it is well-known that the algorithm defines a reversible Markov chain with stationary distribution 7 r , let us consider it from a different point of view. + a Markov chain with transition density Let X = ( X n ) n E ~be
(Notice that this is exactly the density of the accepted states of the above MH algorithm.) It can be easily verified that g ( z & - 1 ) sat)isfies the detailed balance condition g ( z i - 1 ) g ( x i l z i - l ) = g ( z i ) g ( z i - 1I z i ) , where g ( z ) x J min(7r(x)q(zlx),7 r ( z ) q ( z / z ) } p ( d z ) .This function, when normalized, results in a probability density function, hence it is the stationary distribution of the Markov chain X . Weight now xi by ti drawn from the geometric distribution with success probability J a(zi,z)q(zlzi)p(dz). Since
E{tlzi} =
{.f ~ ( z iz ),& I Z i ) p ( d z ) } - '
x T(zi)/g(zi),
(1)
A n Application of Theory of Semi-Markov Processes in Simulation 217 the sequence ( X n r ( n ) n E is ~ +properly weighted with respect to 7r. It is immediately seen that the associated jump process is the original MH output (Y,)tGz+which is known t o be a pure Markov chain (rather than a general discrete time semi-Markov process). The above analysis suggests that we are allowed t o use any distribution (beyond the geometric) for the weights provided that (1) is sat,isfied. In particular, if & m ~ ( z i=) 7r(zt)/.q(zi)is chosen, the variance of estimators of certain expectations of interest will be minimized. However, direct calculation of this importance weights is in general computationally demanding or even infeasible making such a task hard to accomplish. Moreover, the geometric distribution comes out naturally since each simulation from g(.l.) automatically generates the corresponding geometric weight. 3.2
A toy example
-
Consider a random variable X Beta(2, 2), 0 5 z 5 1. The density function of X, up t o the normalizing constant is: 7r(z) 0;
z(1- z).
We discretize the interval [0, 11 into m = 10 equal length bins and choose as target distribution the g(x) 0: q m l ( 1 - " [ m ] ) .
+
where x [ m ~= ( [ m z ] 0.5) /m. In order t o sample from g , first draw a bin from the discrete distribution
-
+
and then simulate U U ( 0 , l ) and set z = ( j u l ) / m . In the above scheme the importance weights are w ( z ) = 7 r ( L c ) / g ( z )and according t o the Proposition 1the jump process associated with the weighted output converges t o 7r. Fig.l shows the histogram and the convergence of the weighted mean 2;' = C~="=,(zi).i/C~="=,(zz) with m = 10, computed from a n output of 10000 updates after a burn-in period of 1000 iterations t o the mean E ( X ) . In this connection, it may be remarked that this method does not require a large m in order t o converge. Moreover, we note the fast convergence of 2;'. Finally, we can point out that the convergence of 2;' is not worse than the corresponding one of the sample mean of an iid sample from the target distribution, as we can see in Fig.1.
3.3
~
Dugongs dataset
The proposed method has also been applied t o the well known dugongs dataset which was originally analyzed by Ratkowsky [Ratkowsky, 19831. This
218
Recent Advances in Stochastic Modeling and Data Analysis
m
0.65~ 0.6 0.55
Fig. 1. Histogram of the weighted sample ( x i ,w ( ~ ~ ) ) ~ ~ and ~ ~ convergence ~ o o o o of line) and of the sample mean of an iid sample from the target distribution (dashed line).
?ks with m = 10 (solid
particular dataset is among the standard examples which is used by many authors in order to illustrate and compare several sampling techniques. The data consist of length (y) and age (z) measurements for n = 27 dugongs captured near Townsville, Queensland. Carlin and Gelfand [Carlin and Gelfand, 19911 modelled the data using a nonlinear growth curve with no inflection point and an asymptote as z tends to infinity. Specifically, they assumed that
yi-N(wwhere N , , ~ , I -> 0 and 0 < y prior for the parameters:
-
" 0 , 7,1)1(. > O ) , P
with ra = (a3Bl
Yl.)
70
=
O p , 7 - 1 ) ,
i = l,...,n,
< 1. We consider the following relatively vague
-
N(0,T,l)1(3 > O), y
- U ( 0 ,1),
7
N
G(kl k ) ,
and k = lop3. The posterior distribution of 6, =
is
Sampling from the full conditional (posterior) distributions of a , 0 (truncated normal) and 7 (gamma) is a straightforward task but this is not the case for y. Instead of using a Metropolis step, we can adopt the following strategy. We choose a different target distribution, namely, g(0ldata) by discretizing the sample space of y into m equal length bins, with y being uniformly distributed within each bin. The form of the new target distribution is
A n Application of Theory of Semi-Markou Processes in Simulation 219
where 7jm]is the point that the maximum of r(0Idata) with respect to y,is achieved in each bin. In the above scheme the importance weights are
w(e)
c(
n(0 I data) g(0ldata) ’
These points are selected in order for the weighted sample mean t o have finitme variance. Sampling from the full conditional distribution of y is now a n easy task: one can first draw a bin from the discrete distribution
-
and then simulate U U ( 0 , l )and set y = ( j + u - l ) / m . According to Proposition 1, the jump process associated with the weighted output converges t o rr. Fig.2 shows the histogram and the convergence of the weighted mean = CZ, w(&)yi/ Cy=lw(&),with m = 20, computed from the output of 10000 updates after a burn-in period of 1000 iterations to the posterior mean E{yldata}. The graphs for the rest of the parameters are similar. An interesting feature arising from the above weighted scheme is that the autocorrelations almost vanish (see Fig.2). Hence, the standard errors of the estimates of the parameters can be calculated easily. The decrement in autocorrelations is also similar t o all the parameters of the model.
+A”
0.6
0.7
0.8
1.
L
0.7
0.65
0.3
1.
0.75
0.75
0.5
0.5
0.25.
0.25 11111,
Fig. 2. Histogram of the weighted sample (yi, z u ( O i ) ) l ~ i ~ l o o oconvergence ~, of 9;” and the autocorrelatons of y and weighted y with m = 20 for the dugongs dataset.
220
4
Recent Advances in Stochastic Modeling and Data Analysis
Discussion
T h e a i m of this paper is to stress o u t that t h e proper weighting of a Markov chain’s o u t p u t can be used i n order to obtain samples from t h e target distribution. T h i s is accomplished by considering i t from a different point of view, namely, associating an appropriate weakly convergent j u m p process to the weighted sample. Man y well known simulation schemes, including the MH algorithm, fall in this context. Moreover, contrary to w ha t is thought, this is t h e case for t h e s t an d ar d IS o u t p u t . Hence, IS c a n also be used in order to obtain (approximate) samples from t h e t ar g et distribution. Besides its theoretical interest, t h e benefit of the proposed point of view is significant if i t is applied i n conjunction with the discretization of t h e state space i n order to facilitate M C M C algorithms (as in Subsection 3.3).
References [Carlin and Gelfand, 1991lB.P. Carlin and A.E. Gelfand. An iterative Monte Carlo method for nonconjugate Bayesian analysis. Statistics & Computing, 1:119128, 1991. [Gilks et al., 1996lW.R. Gilks, -, and et al. Markov Chain Monte Carlo in practice. Chapman & Hall, New York, 1996. [Hastings, 197OlW. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97-109, 1970. [Liu, 2001 1J.S. Liu. Monte Carlo Strategies in Scientific Computing. SpringerVerlag, New York, 2001. [Liu and Chen, 1998lJ.S. Liu and R. Chen. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93: 10321044, 1998. [Limnios and Oprigan, 20011N. Limuios and G. OpriSan. Semi-Markov processes and reliability. Statistics for industry and technology, Birkhauser, 2001. [Malefaki and Iliopoulos, 2007]S. Malefaki and G. Iliopoulos. On convergence of properly weighted samples t o the target distribution. Journal of Statistical Planning and Inference, (to appear). [Marshall, 1956lA.W. Marshall. The use of multi-stage sampling schemes in Monte Carlo computations. In M.A. Meyer, editor, Symposium on Monte Carlo methods, pages 123-140, Wiley, New York, 1956. [Metropolis et al., 1953lN. Metropolis, -, and et al. Equations of state calculations by fast computing machines. The Journal of Chemical Physics, 21:1087-1091, 1953. [Ratkowsky, 1983lD. Ratkowsky. Nonlinear regression modeling. Marcel Dekker, New York, 1983. [Robert and Casella, 1999lC.P. Robert and G. Casella. Monte Carlo statistical methods. Springer-Verlag, New York, 1999.
On a numerical approximation method of evaluating the interval transition probabilities of semi-Markov models Dimitrios Bitziadis', George Tsaklidis2and Aleka Papadopoulou2
' Technological Institute of Thessaloniki P.O. Box 14561 54 10 1 Thessaloniki, Greece (e-mail: BitziadisDOgiraeusbank.gr) Department of Mathematics Faculty of Science Aristotle University of Thessaloniki 54124 Thessaloniki, Greece (e-mail: [email protected]) Abstract. For the classical semi-Markov model, either time homogeneous or nonhomogeneous, an examination of the convergence of the interval transition probabilities Pi,(s,t) as t-m is presented using an approximation method provided by [R. De Dominics and R. Manca 19841. Especially, we examine the dependence of the accuracy of the respective numerical method on the various values of the step h, in finding the transition interval probabilities, and we investigate the complexity of this algorithm.
Keywords: Semi-Markov process, Non homogeneity, Numerical methods.
1. Introduction
In what follows denote by E={ 1,2,...,n} the state space of the Markov model and by t the (continuous-) time parameter. The states 1,2,...,n are exclusive and exhaustive, i.e., every member of the system may be in one and only one state at some time point t. Also denote by P,, i,j =1,2, ...,n, the probability that a member who entered state i on its last transition will enter state j on its next transition. The transitions between the states occur according to either a homogeneous or a non-homogeneous semi-Markov chain (the embedded semi-Markov chain). Let P={P,} be the matrix of transition probabilities and G(s,t)={G,(s,t)} the matrix of the distributions of the holding times, i.e., G,(s,t)=prob{the holding time in state i is less than or equal to t-s, given that the entrance in i occurred at time s and next transition will take place in state j}. 221
222
Recent Advances in Stochastic Modeling and Data Analysis
The core matrix of the semi-Markov process is Q(s,t) = { Q,(s,t)}, with Q(U) = POG(s,t) = {P,GkJ(s,t)}, where the symbol “0” stands for the Hadamard product. Finally, we denote S(s,t)=diag{S,(s,t)}, where S,(s,t) = CQJ(s,t)= prob{ the holding time in state i is less than or equal J
to t-s, given that the entrance in i occurred at time s } The aspects that have been studied in discrete and continuous time semi-Markov models, as outlined above, are the asymptotic behaviour, stability, variability, attainability etc. Semi Markov models arise in physics, actuarial work, biometry and manpower planning. Basic results can be found in [McClean, 1978,1980,1986], [Mehlman, 19791, [Bartholomew, 19861, [Howard, 19711, [Janssen and Limnios, 19991, [R. De Dominics and R. Manca, 19841, [Papadopoulou and Vassiliou, 19941 and [Vassiliou and Papadopoulou, 19921. The most characteristic features of a semi-Markov model are related to the interval transition probabilities of the model. A recursive formula for the interval transition probabilities is 1
P,(s,t>=4J(l-s,(4)
+
Cjp,(u,W(ak(S,4)
f
(1)
k s
Relation (1) can be approximated numerically by the recursive relation 1
c,s,j,f= q,s,j,t+
Cc&,k,he,h,j,t k h=s
9
(R. De Dominics and R. Manca (1984)), where:
The step of the approximation is equal to 1, i.e., the calculations in (2) are carried out only for h = s, s+l, s+2,. ..t Q,s,J,, =q,J(l-,S,,s,,) is the approximation of the probability that the process holds in i without any transition in the time interval (s, t ) S,,s,,=cQ,,,,,, is the approximation of the probability that the J
chain moves fiom state i into any other state before time t, given that the entrance in i took place at time s =q,s,J,tq,J is the approximation of the i,j element of the core
a,,,,,,
matrix of the semi-Markov process y,s,k,h is the difference Q,s,k,h -Q,s,k,h-,for the time interval (h-1,h) if t>s+l,which interprets an approximation of the p.d.f. c,&)
Numerical Approximation of Semi-Markou Models
223
if t=s+l, and y,b,j,b =O. The recursive formula (2) in matrix notation becomes
y,s,k,b =Q,s,k,b
I
K,t = D,,l
+ cy,bpb,t 3
(3)
b=S
with initial conditions P,,, = D,,,= I and Vb,b=0, where the i,j-th element of P,,, equals & J , t , the i,j-th element of D,,, equals L?s,J,l, and the i,j-th element of V,,hequals y,s,J,b. ARer some manipulations, formula ( 3 ) can be written in the form UP=D, where U is an upper block triangular matrix with the (i,i)-th block equal to the unity matrix and the (i,j)-th block (ifj) equal to -y,J, P is a block upper triangular matrix with the (i,i)-th block equal to the unity matrix and the (ij)-th block (ifj) equal to p,,, , D is a block triangular matrix with the (i,i)-th block equal to the unity matrix and the (i,j)-th block (ifj) equal to D,,, . It is interesting to investigate the convergence of ( 3 ) numerically. Using a simulation program created in PowerBilder we examine, for various values of the step h, the dependence of the accuracy of the numerical method based on ( 3 ) in order to find the transition interval probabilities. We treat both classical cases, i.e., when the holding times depend on the time of entrance into some state or not (non-homogeneous and homogeneous case respectively). Finally, we investigate the complexity of the algorithm provided by ( 3 ) . 2. Homogeneous case
We applied the approximation method provided by (3), for the example of the homogeneous semi-Markov system given in Papadopoulou and Vassiliou (1999), where
9/15 5/15
1/15
1/10 8/10 1/10 1/20 4/20 15/20
2e-2t 3e-3~ 4e4' 2e-2' e-' 3e-3' 3e-3' 2e-2t e-'
224
Recent Advances in Stochastic Modeling and Data Analysis
Then the core matrix Q(s,t)= POG(s,t) and the matrix D(s,t) can be evaluated easily for every s and t. We carried out the simulation using relation (3), with iterations 1 to 50 for s and t, and step-size h equal to 1, 0.5, 0.3, 0.2 and 0.1. We used double precision floating-point variables, with 15 digits of accuracy and range fiom 2.2E-308 to 1.7E+308. The results concerning the computation of the transition probabilities, via (3), are given below. Result analysis for the evaluation of the transition probabilities step h=l, time interval [30,50], execution time 2 min [1,1] = 0,13621 [1,2] = 0,58660 [1,3] = 0,27719 [2,1] = 0,13617 [2,2] = 0,58662 [2,3] = 0,27721 [3,1] = 0,13503 [3,2] = 0,58026 [3,3] = 0,28471 step h=0.5, time interval [30,50], execution time 6 rnin [1,1] =0,11662 [1,2]=0,59928 [1,3] =0,28410 [2,1] = 0,11663 [2,2] = 0,59941 [2,3] = 0,28396 [3,1] = 0,11629 [3,2] = 0,59664 [3,3] = 0,28707 step h=0.3, time interval [29.8,49.6], execution time 20 rnin [1,1] = 0,10738 [1,2] = 0,60561 [1,3] = 0,28698 [2,1] = 0,10739 [2,2] = 0,60574 [2,3] = 0,28683 [3,1] = 0,10718 [3,2] = 0,60371 [3,3] = 0,28907 step h=0.2, time interval [30,50], execution time 60 rnin [1,1] = 0,10254 [1,2] = 0,60902 [1,3] = 0,28851 [2,1] = 0,10255 [2,2] = 0,60913 [2,3] = 0,28839 [3,1] = 0,10240 [3,2] = 0,60756 [3,3] = 0,29011 step h=0.1, time interval [30,50], execution time 180 rnin [1,1] = 0,09755 [1,2] = 0,61227 [1,3] = 0,28989 [2,1] = 0,09756 [2,2] = 0,61238 [2,3] = 0,28977 [3,1] = 0,09745 [3,2] = 0,61109 [3,3] = 0,29117 Theoretical values of the transition probabilities [ 1,3] = 0,2968 [ 1,1] = 0,0942 [ 1,2] = 0,609 [2,1] = 0,0942 [2,2] = 0,609 [2,3] = 0,2968 [3,1] = 0,0942 [3,2] = 0,609 [3,3] = 0,2968
Numerical Approximation of Semi-Markov Models
225
As an example, we present for h=0.1 the trajectories of the interval transition probabilities P,2(30,30+t), P22(30,30+t) and P32(3O730+t), t=1,2,..,20. Convergence-graph of the three second column elements of the interval transition probabilitv matrix with step h=O. 1
Convergence-graph of the three third column elements of the interval transition probability matrix with step h=O. 1
I
I
C2DD' 0,100~ 4
2 4 6 8 10 11 14 I 6 18 20 22 24 26 B 30
Time
I 4 $ 8 10 12 I k 1% C I 20 2 14 2d 18
Time
Conclusion: There are significant estimation errors if h=l, 0.5, 0.3 and 0.2. Even for h=0.1 there are estimation errors appearing at the third, second and third decimal place for the first, second and third columns of the transition matrix respectively. Thus, the step-size h must be small enough - in connection with the estimation accuracy needed. On the other hand the execution time becomes considerable large; in our case, for h=O.1 it becomes three times more than for h=0.2. 3. Non-homogeneous case
Let the transition matrix be the one used in the homogeneous case, and the non-homogeneous matrix of the holding times be as follows:
226
Recent Advances in Stochastic Modeling and Data Analysis
The results concerning the computation of the transition probabilities by means of (3) are given below. step h=l, time interval [30,50], execution time 2 min and 30 sec [1,1] = 0.13553 [1,2] = 0.58724 [1,3] = 0.27724 [2,1] = 0.13548 [2,2] = 0.58729 [2,3] = 0.27723 [3,1] = 0.13427 [3,2] = 0.58036 [3,3] = 0.28537 step h=0.5, time interval [30,50], execution time 7 min and 10 sec [1,1] =0.11616 [1,2]=0.59970 [1,3] =0.28415 [2,1] = 0.1 1617 [2,2] = 0.59985 [2,3] = 0.28398 [3,1] = 0.1 1579 [3,2] = 0.59672 [3,3] = 0.28749 step h=0.3, time interval [30,50], execution time 23 min and 30 sec [1,1] = 0.11707 [1,2] = 0.60590 [1,3] = 0.28699 [2,1] = 0.10708 [2,2] = 0.60605 [2,3] = 0.28683 [3,1] = 0.10684 [3,2] = 0,60372 [3,3] = 0.28939 step h=0.2, time interval [30,50], execution time 68 min [1,1] = 0.10233 [1,2] = 0.60923 [1,3] = 0.28852 [2,1] = 0.10234 [2,2] = 0.60936 [2,3] = 0.28837 [3,1] = 0.10217 [3,2] = 0.60754 [3,3] = 0.29037 simulation step h=0.1, time interval [30,50], execution time 190 min [1,1] = 0.09744 [1,2] = 0.61241 [1,3] =0.28987 [2,1] = 0.09745 [2,2] = 0.61253 [2,3] = 0.28974
Numerical Approximation of Semi-Markov Models
227
[3,1] = 0.09732 [3,2] = 0.61 103 [3,3] = 0.29138 theoretical values of the transition probabilities [ 1,3] = 0,2968 [ 1,1] = 0,0942 [ 1,2] = 0,609 [2,1] = 0,0942 [2,2] = 0,609 [2,3] = 0,2968 [3,1] = 0,0942 [3,2] = 0,609 [3,3] = 0,2968 Conclusion: In the same manner as in the homogeneous case, a small h is needed in order to achieve accurate values of the transition probabilities. Thus, considerable errors are observed for h=l, 0.5, 0.3 and 0.2. For h=O. 1, the errors appear at the third, second and third decimal digit for the first, second and third column of the transition matrix respectively; in that sense the transition probabilities are close to the theoretical values if h10.1. The execution time (190 min) for h=O. 1 is almost three times more than for h=0.2.
4. On the algorithmic complexity 1. Discrete case. Evaluation of Pst, by means of (3), needs N3 multiplications and (N-1)N2 additions for the computation of Vs,hPh,t. Thus the time complexity of computing Pst is OW3); it can be reduced, for instance using Strassen's algorithm to become O(n22807) (but the error bound is weaker than that of the traditional algorithm). We notice that the same conclusion holds if the method provided by Vassiliou and Papadopoulou (1992) is used, since the respective equation applied, is of the same structure as (3).
2. Continuous time. For the algorithm given in De Dominics and Manca (1984) a small-enough step-size h shall be applied in order to achieve the desired accuracy for the computation of the transition probabilities. We note that especially for the evaluation of the asymptotic probabilities a formula in closed analytical form is given by Vassiliou and Papadopoulou (1992), where matrix inversion is needed. Using Gaussian elimination the time complexity of matrix inversion is O(N3). The time needed for the computation of the asymptotic transition probabilities by means of (3) in that case, is larger than the time needed using the direct evaluation of the asymptotic probabilities by means of the formula using matrix inversion, as carried out -for instance- by MATLAB or Mathematics etc.
228
Recent Advances in Stochastic Modeling and Data Analysis
References
[Bartholomew, 19861 D. J. Bartholomew. Social applications of semiIn Semi Markov models: Theory and Markov processes. Applications. New York:Plenum Press, 463474,1986. [De Dominics and Manca, 1984lDe Dominics R. and R. Manca. An algorithmic approach to non homogeneous Semi Markov processes Commun. Statist. - Simula. Computa., 13(6), 823-838, 1984. [Howard, 19711A. R. Howard. Dynamic probabilistic systems Chichester: Wiley, 1971. [Jansen and De Dominics, 19841 J. Jansen 'and R. De Dominics. An algorithmic approach to non homogeneous Semi Markov processes. Insurance:Math. Econ. 3: 157-165, 1984. [Janssen and Limnios, 1999lJ. N. Janssen and N. Limnios eds. Semi Markov Models and Applications. Vol.15, 23-32. Dordrecht Kluwer Academic Publishers, 1999. [McClean, 19781s. I. McClean. Continuous time stochastic models for a multigrade population. J. Appl. Prob. 15: 26-32, 1978. [McClean, 19801s. I. McClean. A semi Markovian model for a multigrade population. J.Appl. Prob. 17: 846-852, 1980. [McClean, 19861s. I. McClean. Semi Markov models for manpower planning. In Semi Markov models: Theory and Applications. New York:Plenum Press, 283-300, 1986. [Mehlman, 19791 A. Mehlman. Semi Markovian manpower models in continuous time. J. Appl. Prob. 16: 416-422, 1979. [Papadopoulou and Vassiliou, 1994lA. Papadopoulou and P.-C. G. Vassiliou. Asymptotic behavior of non homogeneous semi -Markov processes. Linear Algebra Appl. 210: 153-198, 1994. [Vassiliou and Papadopoulou, 19921 P.-C. G. Vassiliou and A. Papadopoulou. Non homogeneous semi -Markov systems and maintainability ofthe state sizes J. Appl. Prob. 29: 519-534, 1992.
Markov property of the solution of the generalized stochastic equations KHALDI Khaled Univdrsitd dc Boumcrdcs FacullE des Sciences DBpartement de MathCmatiques BOUMERDES 35000
ALGERIA (email: k k h a l d i h b b . dz) Abstract. Some models of probabilities are described by generalised stochastic equations. These models lead to the resolution of boundary problems for random distributions (generalized equations). We are interested in the equation Lz = f in S C IRd where L is a linear operator, f is a random distribution and to the class of boundary conditions on the frontier I’ = a s in ordcr t o dcfinc for thc corresponding boundary conditions. The resolution of boundary problems for random distributions lead to the Markov property for the solution of these equations. AMS Classif : Primary 60G12; Secondary 47A15 Keywords: Linear operator, stochastic equation, stochastic distribution, Markov property.
Some models of probabilities are descripted by generalised stochatic equations. These models (like that prediction) lead to the resolution of boundary problems for random distributions (generalizedequations). We are interested in the equation Lx = f in S c IRd where L is an linear operator, f is a random distribution and to the class of boundary conditions on the frontier r = dS in order to define for the corresponding boundary conditions. The resolution of boundary problems for random distributions lead to the Markov property for the solution of these equations. The boundary problems for the linear equations of the type:
Lx ( t )= f ( t ), t E s1
(1)
where L is a linear operator , lead to search the distribution x (t), t E T , in T 2 E d , containing the domain T C E d , such as 2
( t )= Y ( t ), t
E T\S2
(2)
where T\S is the additional of S in T. So that the probleme admits a (only) solution, the distributions f ( t )and v(t) as the solution x(t) have to belong to a certain class which we shall describe more low. The solution x(t) is bound to the operator L verifiying the equation:
L[ ( t )= [* ( t ), t E T3 229
(3)
230
Recent Advances in Stochastic Modeling and Data Analysis
where ( ( t )and (* ( t )are distributions with values in Hilbert's space H . By distribution with values in H , we understands a linear continuous application: € : cp E c r (TI ((3: €1 E H4 (4)
-
In first, we are interested in the properties of the solution { ( t )of the boundary problem (3) that we prolong in W' = C r (T)space of distributions u = (cp, u),cp E C r (T) . The distribution u E C r (T) is,
we define for a some set S ( S C T) the space
The solution E ( t )of the equation (3) for a local operator L possesses the following markov property: for everything S C T with border I' = dS, the . projection of the space H+ (T\S) in H+ ( S ) coincides with H+ (r) One notices that an equation of the type (3) with L = l*l and E* ( t ) = 1'17 ( t )occupies an important place in the theory of the stochastic differential equations of the type L ( ( t ) = t E T,where 1 is a differential linear operator, 71 (t)a distribution in H (Hilbert's space) called "white noise" in t E T. For every u E C r (T) , the distribution
L u = ( v , L u ) , cpECoOO(T) is a positive linear application u --+ (cp, L
Lu : 07
(7)
with 11 cp 11 = (cp, L u ) ; and one considers the espace de Hilbert's W,completed of C r (T) by the scalar product
(u, v) = ( u l L v ) ,u, E C r ( T ) 8
(8)
By using (8) ,one corresponds to every v E W the distribution Lv :
Lv
=
(cp,Lv) = (cp,v), cp E c r (T)9
(9)
and one introduces W* (set of distributions Lv prolonged by continuance on W ) : Lv = (ul Lv)= (u, v) 10 (10) It is evident that W' is the dual of W and exactely W * is the set of functional linear on W deescribed by (9), with
Markov Property of Solution of Generalized Stochastic Equations
11 Lv 11
231
= ( L v ,Lvu)i = 11 cp 11 = l S U p p (cp, v)
The equation (1)est undderstood inthe sense
f
(cp, Lx) = (9, ), cp E
c,- (S)11
(11)
aand the boundary conditions (2) as
( u , x )=
(U,V),
u E W * ,Suppu C T\S12
(12)
For the search for the solution of the problem (1) - (2), one considers the Hilbert's space H defining the isometric application [* : u E W
-+ (u,
t*)E H13
(13)
defined by ( 3 ) . Thc application (4)maybc idcritificd with thc isomctric application
E
:u E
W'
-+
(u, [) =
(L-lu, [*) E H14
(14)
We define H (S)and H* (S)respectively by H (S)= ((cp, E ) , cp E C r ( S ) ) and H* (S) = ( ( p , t*),(cp E C r (S))).It is evident that H (T) = H* (2"). Alterward, one deEne H = H ( T ). That is to say q } the scalar product of [ and 7 E H . One has so
{c,
The condition ( T L , t )I(cp, <*) , (cp E C r (S)) ,for an opened set S C: T is equivalent in Suppu _C T\S. By indicated by H* (S)' = H* ( T )8 H' (S),onehas H+ (T\S) = H" (S)Iwhere H+ (T\S) = { ( u , < ) ,u E W * ,Suppu C T\S}
et H; (T\S) = H (S)I where H; (T\S) = { ( v : E ) , v E W, Suppu S T\S} because {(v, 1 (% 01 = (v,4 = (u, v) 16 (16)
r*>
that one deducts of (15) by passage on the limite of cp -+ v E W.
Theorem 1. The unique solution x E W of the problem (1) - (2) is given by 2=
(u, x ) = (9,
f) + (17r+z,v)
where U: is the operator of projection on H+ (I1) and g the solution of the equation L*g = x - n;x, Suppu 3.
232
Recent Advances in Stochastic Modeling and Data Analysis
Proof.
AS
H =H
( s )CB~H (3) = H ( s )c~ ~H+ (r)@ H (T\s)',
x E H can be written under the shape x = X I +
every
+ 2 3 where 2 1 , 2 2 and 2 3
22
are the orthogonal projections of x on the sub-spaces H (S)l,H+ (Z') and H (T\S)I . One has then (u, x) = (u, XI) (u, 22) (u, x3), x E H .As
+
+
(q
(u, x) = (Lu,(L*)-'x)= (L*Lu,z)= (u,( L * L ) - ' x ) , one has (u, 23) =
(f, (L*)-' x;()because V y E L"H (T\S)l, (u, y ) = (Lu,g ) , g E H (T\S)l, g
=
(L*)-' y . It remains to show that (Lu,g) = (f,g) , g E H (T\S)l. The equation LX = f means (cp, LX) = (cp, f) , cp E CF (s)or LX - f E H ( T \ s ) ~ . As a consequence (Lx- f , g) = 0, g E H (T\S)l and so any solution of the x) = (u,2 2 ) (f,(L*)-' x3) or equation Lx = f can be represented by (u,
+
(u, x) = (u,2 2 )
+ (f,g) where g = (L*)-l (x- 2 2 ) with 2 2 = Tl:x.
Theorem 2. The equation Lx ( t )= 0 , t E s17
(17)
where L is a local ogrator local with the boundary conditions
(u, X) = (21,
U)
,
E W*, Suppu
C r18
(18)
possesses a unique solution x E W in S
Proof. As one say that the solution is written by
(m
x) = (9, f) + U) It remains to check that all solution x E W with null boundary conditions is egal to 0 in S. The condition (18) give (2, €*) IH*(Sl), S1 = S and 0 = (u,z) = (u,Lx), = { ( u . <), (x,<*)}, S u m C r mean that I (x,[*) IH+( r ). The dbromposition H = H @ H+ ( r )@ H or (u, x) = 2
{(Ti,
(),
(Z)(*)}
= (21,
= 0, suppu
7
(s>
c s:
Theorem 3. The equation (3) with a local operator L possesses the markov property. I
I
Proof. As H+ (T)= H * ( S U (T\s>) where H* (S U (T\S)) = H* ( S )@ H* (T\S) direct sum of the two sub-spaces H* (S) and H* (T\S) because vcp E cM , (su (T\S)), 9 = 9 1 + 9 2 , cp1 E cM , (S) and cpz E c,- p\q, which, as elements of W are orthogonal: (91,c p ~ )= (cpl,Lcp2)= 0. One has consequently {(cpl, [*) , ( 9 2 , [*)} = (cpl, 9 2 ) = 0. We have so the orthogoI nal decomposition H = H @I H+ (r) @ H (T\S)l in which H* ( S )@
(s>
H * (I') = H* (T\S)l = H+ (S) and H+ (r)@ H* (T\S) = H * (S)l = H+ (T\S) . By indicating by LI+ (S) and D+ (T\q the orthogonal projections on H+ ( S ) and H+ (T\S) respectively, one has IT+ ( S )H+ (T\S) = H+ (T\S) lf+ ( S ) = 11, (I'). What means that the distribution ( ( t ) possesses, in the espace of Hilbert, the markov
.
H
Markou Property of Solution of Generalized Stochastic Equations 233 References [l]G. KALLIANPUR and U. MANDREKAR. The markov property for gcncrdizcd gniissian random ficlds. Ann. Inst. Fourier, 1974, V.2, N2, p.143-167. [2]K.KHALD1. Boundary problems for local limited operators in Hilbert’s spaces. Stochastics processes , mathematical statistics and applications. University of Moscou. 1989, p.31-33. [3] K.KHALD1. Boundary problems for stochastics equations and stochastics markov ficlds. Proc. 3nd. International Conference on Applied Mathematics and Engineering Sciences. CIMASI 2000. Casablanca. 2000. [4] P. LEVY. A special problem of gaussian random functions. Proc. 3nd Berkeley Symp. Math. Stat. Prob. 1956, V.2, p.133-175. 131 Y . ROZANOV. Sorric bouiidary problciiis for pour diffcrciitial gciicralized equations. Math. Zametki, 1987, T.l, N.l, p.llO-118. 161 Y . ROZANOV. Boudaiy problcrris for stochastic partial diffacritial equations. Universitat Bielfeld. Forschungszentrum Bielfeld Bochum - Stochastic. 1987, N.23. [7] Y. ROZANOV. Stochastics markov fields. Naouka. Moscow. 1981.
Partially Markov models and unsupervised segmentation of semi-Markov chains hidden with long dependence noise JerGme Lapuyade-Lahorgueand Wojciech Pieczynski GET/INT, CITI Department, CNRS UMR 5 157 9, rue Charles Fourier, 91000 Evry, France
Abstract. The hidden Markov chain (HMC) model is a couple of random sequences ( X , Y ) , in which X is an unobservable Markov chain, and Y is its observable noisy version. Classically, the distribution p(ylx) is simple enough to ensure the Markovianity of p ( x l y ) , that enables one to use different Bayesian restoration techniques. HMC model has recently been extended to “triplet Markov chain” (TMC) model, which is obtained by adding a third chain U and considering the Markovianity of the triplet T = ( X , U , Y ) . When U is not too complex, X can still be recovered from Y . In particular, a semi-Markov hidden chain is a particular TMC. Otherwise, the recent triplet partially Markov chain (TPMC) is a triplet T = ( X , U , Y ) such that p ( x , u ( y )is a Markov distribution, which still allows one to recover X from Y . The aim of this paper is to introduce, using a particular TPMC, semi-Markov chains hidden with long dependence noise. The general iterative conditional estimation (ICE) method is then used to estimate the model parameters, and the interest of the new model in unsupervised data segmentation is validated through experiments. Keywords: hidden Markov chains, triplet Markov chains, unsupervised segmentation, image segmentation, iterative conditional estimation.
1. Introduction Let X = (Xn),sn6N and Y = ( < ) , s n s N two stochastic processes, where X is hidden and Y is observable. In the whole paper, each X n takes its values in a finite set of classes i2 = {u,,. ..,w,} and each takes its values in R . The problem of estimating X from Y , which occurs in numerous applications, can be solved with Bayesian methods once one has chosen some accurate distribution p(x,y ) for 2 = (X, Y ) . The hidden Markov chain (HMC) model is the simplest and most well known model. Its applications cover numerous fields, and can be seen in recent books or general papers [Koski, 20011, [CappC et al., 20051. [Ephraim and Merhav, 20021. However, it is insufficient in some situations and thus it has been extended to “hidden semi-Markov chains” models [Faisan et al., 20051, [Guedon, 20031, [Moore and Savic, 234
Partially Markov Models and Unsupervised Segmentation
235
20041, [Yu and Kobayashi, 20031. Otherwise, a long dependence noise does exist in some situations [Doukhan et al., 20031, but can not be taken into account in the classical HMC. The aim of this paper is to propose a new model in which the hidden chain is a Markov one, and in which the noise is a long dependence one. On the one hand, we exploit the fact, already mentioned in [Pieczynski and Desbouvries, 20051 that an HSMC is a particular “triplet Markov chain” (TMC, [Pieczynski et al., 20021, [Ait-elFquih and Desbouvries, 20061). On the other hand, we exploit the ideas proposed in [Pieczynski, 20041. We also propose a parameter estimation method of “iterative conditional estimation” (ICE) kind [Fjortoft et al., 20031, and show that the new model can be of non negligible interest in unsupervised data segmentation.
2. Triplet Markov chains and hidden semi-Markov chains
(x)lsncN
Let X = (X,1)ILn6N and Y = be two stochastic processes mentioned above. The problem of estimating X = x from Y = y can be solved once the marginal posterior distributions p ( x , Iy) are calculable. Let us consider an taking its values in a finite state space A = {l,...,L } and auxiliary process such that ( X , U ) is a Markov chain whose distribution, given by
The chain X
is then called “semi-Markov chain”. If we consider
N
p(ylx) = n p ( y n l x n ) the , triplet ( X , U , Y ) is the classical “hidden semi“=l
Markov chain” (HSMC). In this paper, we consider a more sophisticated noise distribution p ( y l x ) , which is a “long dependence” one. To introduce it, let us consider stationary Gaussian process Y = (q,...,Y N ). It will be called “long-dependence’’ if its covariance function y ( k ) = E(<<_,) - E ( % ) ( x - , )
-
is such that there exist a ~]0,1] and C for which y ( k ) Ck-“ when Ikl- > 00 .
The new model T = (X, U’,U 2 Y , ) we propose is the following. (X, U’) is a semi-Markov chain defined by (l), with A’ = {l,..., L l }. The chain U 2 is such that each U,: takes its values in A’ = (1,..., L,] and, at each
y1
= 1 , ..., N ,the
236
Recent Advances in Stochastic Modeling and Data Analysis
variable U: designates the number k I L, of previous indices n - 1 , ..., n - k such that xn = x,,-i =.,. = x , , - ~ and , x,, # x , ~ _ .~ Therefore _, we can say that U,: is the exact past sojourn time in xn, while U: is, according to (l), a minimal future time sojourn in xn (in our model, u: = 1 does not imply that P ( X , += ~ X , , ~ X ,= ~ ) 0). We can note that when xn+,= x,, , u:,,~= U: -1 u:+, > 1) and u,:+, = u,:
(if
+ 1. Otherwise, as the noise is a long dependence one,
the distribution p(ylx) is not a Markov one. In the model we propose, the distribution of y,,, conditional on X,,, = x,,+,, U:+, = k , and Y,= yi , ..., = y,, , depends on X,,+,= xn+,and Yn-k = yn-k, ..., q, = y , . Therefore, for
<
y , x,) is Gaussian with the each class x,~, the distribution p(yn-u:,yn-u:+17...,
-
mean vector M::+') = ( M x ,,....,MJ
and the variance-covariance matrix
"?+I tlmeS
r : : + l )
such that
(r:!+l))z,, = 0;" (li - j [-t1)-"" .
Finally, the distribution of the new model T = ( X , U L , U z , Y )we propose is defined by
P ( X d , U 2 , Y ) = P ~ ~ , ~ P ~ ~ : ~ x~ , ~ P ~ ~ ~ / ~ , ~ P ~ Y N-l
4,) P ( d +Ix,,+I ,
x nP(X,,+IIX?!,
7
4
Mu,:+,
,,=I
1%
,Xn+i7
d
>P(Y,+iI%+,4+l ,Y" 1 7
where:
1%
$4= 4"(Xa+,) if 4,> 1 and P(X,,+I1%1if 4 = 1 ; P<4+, IX,,, 4,) = d,,;-l(4+,) if 4,'1 and P ( 4 + Ix,8+1 I 1if ,I. = 1; P<4+,IX", x,l+l4 = d*n*+, ( u k ifx, = %+I 4 < Lz and 4 if x,, P(X,,,
2
>
3
>
>
2
(u,:+l)
f
X,,+I or
u: = L, ; = P(Yn+,IXn+1) if P0.'n+llX"+l,4+12Yn)
4+1
= 0 3 and P(Y,+IIX +. i.Y,, -,:+,+,'...'u.,
if
u:+] 2 1.
Moreover,
in
the
last
relation
the
Gaussian
vector
P(Y"-";+,+], Y,1-a;+,+2 ,..., Y,,, lxn+l 1 verifies long memory condition above.
It is then possible to show that for observed Y = y , the distribution p(x7u1,uzly)is a Markov distribution and, as ( X , U i 7 U z )is finite, the
Partially Markov Models and Unsupervised Segmentation
237
I
classical “Backward” and “Forward” calculations give p(x,,,u,:, u,: y ) , which
I
I
p(x, ,u,: ,u: y ) used in Bayesian MPM segmentation.
gives p(x,,y ) = u: .ti:
3. Parameter estimation with ICE The “Iterative Conditional Estimation” (ICE) method we use in this paper is based on the following principle [Fj0rtoft et al., 20031. Let 6 = (el,...,Om) be the vector of all real parameters defining the distribution p ( t ) of a TMC
T = ( X , U , Y ) , and let &t) be an estimator of 6 defined from the complete data t = (x,u,y) . ICE is an iterative method consisting on: (i) initialize 6’ ; (ii) compute
@,“+I
= J?[~~(X,U,Y)IY = y,@)
for the components Oj for which
this computation is workable; (iii) for other components 6, , simulate (xp,up), ..., (xp,up) according to p(x,uIy,Bq) and put 6:+’=
S(xp,up,y) +...+ B(xp,u,q, y )
.i
We see that ICE is applicable under very slight two hypotheses: existence of an estimator
6(t) from
the complete data, and the ability of simulating
( X , U ) according to p(x,uly). The first hypothesis is not really a constraint because if we are not able to estimate 6 from complete data (x,u,y), there is no point in searching an estimator from incomplete ones y . The second hypothesis is always verified for any TMC T = ( X , U , Y ) ; in fact, p(x,uly) is a Markov chain distribution. In order to detail how ICE is working let us specify the different parameters and the possibility of their estimation from the complete data T = ( X ,U’ ,U z,Y ) . First, the distribution p ( x , ,u : ) and the transitions of the Markov chain ( X ,U ’ ) are defined by p ( x , ,u :, x,, ui) which we propose to
, ) by the classical counting estimator (the estimate from T = ( X ,U’,U 2 Y function I is defined by I ( a = b) = 1 if a = b ,and 0 otherwise)
238
Recent Advances in Stochastic Modeling and Data Analysis
Second, the “noise parameters” are the means M x mthe , variances c2rn , and the long dependence parameters arn,which we propose to estimate from T = ( X , U ’ , U 2 , Y )by
Concerning (2), then the computation of (ii) in ICE is possible and gives: 1 N-I P ( y + ’ ) ( x 1 , ~ : , x 2 , ~ : ) = - ~ ~ (=XI&,: X , , =u:,xn+,=x2,u,;+l= u : Jy,Q,), N -1 where p(x, = = U : , X , , + ~ = X ~ , U : += ~ u: Iy,t9,) are computed using the forward-backward algorithm. Concerning the noise parameters, the conditional expectation is not computable and we have to use (iii). In experiments below the initialization is obtained from the segmentation by the classical k-means method, and we take 1 = 1.
4. Experiments The new “hidden semi-Markov chains with long dependence noise” (HSMCLDN) model generalizes, on the one hand, the classical “hidden semi-Markov chains” (HSMC) and, on the other hand, the “hidden Markov chains with long dependence noise” (HMC-LDN), which are a particular case of HSMCLDN such that X is a Markov chain. The aim of this section is to test the interest of these two generalizations in unsupervised data segmentation framework. To illustrate the results we will use images of size N = 128 x 128 . Such a bidimensional set of pixels is transformed into a mono-dimensional set using a Hilbert-Peano scan [Fjortoft et al., 20031, which gives a mono-dimensional chain. Such a representation is quite pleasant because it allows one to appreciate visually the degree of the noise, and also the quality difference between two segmentation results. We present three series of results. In the first series, the data suit SHMC and the question is whether using the new more complex HSMC-LDN does not degrade the results. The second series is devoted to the converse problem:
Partially Markov Models and Unsupervised Segmentation
239
when data suit HSMC-LDN, how do SHMC and HMC-LDN work? Finally, in the third series we use data produced by non one of the three models. Let (X,Ul,Y)be a classical SHMC, with I, =10, means equal respectively to 1 and 2, and variances equal to 20. The distribution of p(u,,tl\xn^,u,, = 1) is uniform x
on
A1,
and
p(xn,xn+l ua = 1) = 0.4995
for
xa=xn+l,
x
P( n^ ^i "„ =1) = 0.0005 for xn*xn^. The obtained realisation Y = y, presented in Fig.l, is then segmented by three methods. The first one is the MPM method based on true parameters; thus the result is the reference one. The second method is the MPM unsupervised method based on the classical HSMC and ICE, while the third method is the MPM unsupervised method based on the new HSMC-LDN model, with £2 = 50, and the related ICE. The aims of this experiment are, on the one hand, to show the robustness of the HSMC-LDN model and on the other hand, to see how the new model manages the independent noise. According to the results presented in Fig.l, we see that the new model gives comparable results. This is due to the good behaviour of the parameter estimation method; in fact, The estimates of means are 1.01, 2.04 for HSMC and 0.98, 1.97 for HSMC-LDN. The estimates of variances are 19.81, 20.71 for HSMC and 19.84, 20.46 for HSMC-LDN. Finally, the estimates of a are, for HSMC-LDN, 15.28 and 5.95.
X =x
MPM 3.74%
HSMC 4.55%
HSMC-LDN 4.57%
Fig. 1 Segmentation of the HSMC model according to three methods. Let us now describe the second series of experiments. Here, we aim to segment a HSMC-LDN which is neither a particular SHMC nor a particular HMC- LDN. The the semi-Markov chain (X,U}) is the same as above. For the noise, the means are respectively equal to 1 and 2, the variance is equal to 1 and a = 0.5 . According to Fig. 2 we see that neither HSMC nor HMC-LDN can compete with HSMC-LDN when data suit the latter. The difference in error ratios is very large, which means that HSMC-LDN is a really significative extension of both HMC-LDN and HSMC.
240
Recent Advances in Stochastic Modeling and Data Analysis
X=x
HSMC 31.15%
HMC-LDN 21.53%
HSMC-LDN 3.16%
Fig. 2. Three unsupervised segmentations of data simulated according to HSMC-LDN In this second example, the estimates of means are 0.97, 2.44 for HSMC, 1.08, 2.22 for HMC-LDN, and 0.98, 1.97 for HSMC-LDN. The estimates of variances are 0.59, 0.56 for HSMC, 0.83, 0.78 for HMC-LDN, and 0.96, 0.93 for HSMC-LDN. Finally, the estimates of a are 0.69, 0.72 for HMC-LDN and 0.62, 0.61 for HSMC-LDN. Finally, we consider a hand-written image X = x presented in Fig. 3. The means of the noise are respectively equal to 1 and 2 , whereas the common variance is equal to 1. As above, we are segmenting Y = y by using the three methods SHMC, HMC-LM and SHMC-LM. As above, we consider that Z, = 10 for the semi-markovianity and L, = 50.
X=x
Y= y
HSMC 24.70%
HMC-LDN 18.27 %
HSMC-LDN 6.31 %
Fig. 3. Three unsupervised segmentations of hand written X = x noisy with a long dependence noise. The estimates of means are 0.75, 2.26 for HSMC, 0.91, 2.04 for HMC-LDN, and 0.99, 1.99 for HSMC-LDN. The estimates of variances are 0.69, 0.66 for HSMC, 0.92, 0.93 for HMC-LDN, and 1.01, 1.02 for HSMC-LDN. Finally, the estimates of a are 1.07, 1.07 for HMC-LDN and 0.93, 0.92 for HSMCLDN.
5. Conclusion As a general conclusion we can say, accordingly to different experiments results, that the new semi-Markov chain hidden with long dependence noise model proposed in this paper turns out to be of interest, when unsupervised segmentation is concerned, with respect to classical simpler models.
Partially Markov Models and Unsupervised Segmentation
241
References [CappC et al., 20051 0. CappC, E. Moulines, T. Ryden, Inference in hidden Markov models, Springer, Series in Statistics, 2005. [Doukhan et al., 2003],] P. Doukhan, G. Oppenheim, and M. S. Taqqu, editors, Theory and Applications of Long-Range Dependence, Birhhauser, 2003. [Ephraim and Merhav, 20021 Y. Ephraim and N. Merhav, Hidden Markov processes, IEEE Trans. on Information Theory, Vol. 48, No. 6, pp. 1518-1569,2002. [Faisan et al., 20051 S. Faisan, L. Thoraval, J.-P. Armspach, M.-N. MetzLutz , F. Heitz, Unsupervised learning and mapping of active brain functional MRI signals based on hidden semi-Markov event sequence models, IEEE Trans. On Medical Imaging, Vol. 24, No. 2, pp. 263-276, 2005. [Fj0rtoft et al., 20031 R. Fjerrtoft, Y. Delignon, W. Pieczynski, M. Sigelle, and F. Tupin, Unsupervised segmentation of radar images using hidden Markov chains and hidden Markov random fields, IEEE Trans. on Geoscience and Remote Sensing, Vol. 41, No. 3, pp. 675-686,2003. [Guedon, 20031 Y. Guedon, Estimating Hidden Semi-Markov Chains from discrete Sequences, Journal of Computational and Graphical Statistics, Vol. 12, No. 3, pp. 604-639, Sept. 2003. [Koski, 20011 T. Koski, Hidden Markov models for bioinformatics, Kluwer Academic Publishers, 2001. [Moore and Savic, 20041 M. D. Moore and M. I. Savic, Speech reconstruction using a generalized HSMM (GHSMM), Digital Signal Processing, Vol. 14, NO. 1, pp. 37-53,2004. [Pieczynski et al., 20021 W. Pieczynski, C. Hulard, and T. Veit, Triplet Markov Chains in hidden signal restoration, SPIE s International Symposium on Remote Sensing, September 22-27, Crete, Greece, 2002. [Pieczynski, 20041 W. Pieczynski, Triplet Partially Markov Chains and Tree, 2nd International Symposium on ImageNideo Communications over fixed and mobile networks (ISIVC’O4), Brest, France, 7-9 July, 2004. [Pieczynski and Desbouvries, 20051 W. Pieczynski and F. Desbouvries, On triplet Markov chains, International Symposium on Applied Stochastic Models and Data Analysis, (ASMDA 2005), Brest, France, May 2005. [Yu and Kobayashi, 20031 S.-Z. Yu and H. Kobayashi, A hidden semiMarkov model with missing data and multiple observation sequences for mobility tracking, Signal Processing, Vol. 83, No. 2, pp. 235-250,2003.
CHAPTER 6 Parametricmon-Parametric
Independent x: distributed in the limit components of some chi-squared tests Vassilly Voinov', Mikhail Nikulin2 and Natalie Pya' Kazakhstan Institute of Management, Economics and Strategic Research 050010 Almaty, Kazakhstan
(e-mail: voinovv0kimep. kz, pyaQkimep.kz) EA 2961, Statistique Mathematique Universite Bordeaux 2 Bordeaux, France (e-mail: nikouQsm.u-bordeaux2.fr)
Abstract. Non-parametric and parametric explicit decompositions of the classical Pearson, Pearson-Fisher, Hsuan-Robson-Mirvaliev and other tests on a sum of asymptotically independent chi-squared random variables with one degree of freedom in case of non-equiprobable cells are discussed. The parametric decompositions can be used for constructing more powerful tests, and can be considered as alternative proofs of limit theorems for some chi-squared type goodness-of-fit statistics. Keywords: Chi-squared tests, Decomposition of chi-squared tests, Modified chisquared tests, Power of goodness-of-fit tests.
1
Introduction
Possibly Ronald Fisher ([Fisher, 19251) was the first noted that "in some cases it is possible t o separate the contributions to x2 made by the individual degrees of freedom, and so t o test the separate components of a discrepancy". In 1951 Lancaster ([Lancaster, 19511, see also [Gilula and Krieger , 19831) used the partition of x2 t o investigate the interactions of all orders in contingency tables. Cochran ([Cochran , 19541) wrote "that the usual x2 tests are often insensitive, and do not indicate significant results when the null hypothesis is actually false" and suggested to "use a single degree of freedom, or a group of degrees of freedom, from the total x2", t o get more powerful and appropriate tests. This idea was used implicitly by [McCulloch, 19851, [Voinov et al, 2007 (a)],and [Voinov et all 2007 (b)]. In this note we consider two types of decompositions of X2-type statistics: a non-parametric decomposition of [Anderson, 19941 (see also [Anderson, 19961 and [Boero et a1 , 20041) for equiprobable cells and a non-parametric, and parametric decompositions in case of non-equiprobable cells based on ideas of [Mirvaliev, 20011. Throughout the paper vectors and matrices are boldfaced. 243
244
Recent Advances in Stochastic Modeling and Data Analysis
Anderson's non-parametric decomposition
2
[Anderson, 19941 suggested a decomposition of the classical non-parametric goodness-of-fit statistic in case of equiprobable classes to get additional information about the nature of departures from the hypothesized distribution. A theoretical derivation of that decomposition has been done by [Boero et a1 , 20041. Consider briefly this theory. In case of equiprobable intervals the Pearson's sum would be
Let x be a r x 1 vector with components N,!"' and p be r x 1 vector with I eeT/r](x - p ) / ( n / r ) ,where e components n / r , then X: = (x - P ) ~ [ is r x 1 vector of ones. There exists a ( r - 1) x 1 matrix A such that AAT = I,ATA = [I - eeT/r]. Under the transformation y = A(x - p ) the Pearson's sum can be written down as X: = y T y / ( n / r ) where , r -1 components y : / ( n / r ) are independently distributed in the limit under Ho. [Boero et a1 , 20041 proposed the following way for constructing the matrix
x:
A. Let H be the Hadamard matrix with orthogonal columns, i.e. HTH = rI. For r being a power of 2 and using the basic Hadamard matrix
')
Hz=('1-1
one can construct matrices of higher order as follows: H4 = Hz C3 Ha, H8 = H4 @ Hz, and so forth. The matrix A is then extracted from the partition
H = ($A). For r = 4, e.g., after rearranging the rows, one has
1-1
1-1
With this matrix the transformed vector y for four equiprobable cells will be y = A(x - p ) or, explicitly, - n/4)+ y=2
l(
( ~ 4 ~ n/4) ) - ( ~ 4 -~n/4) ) ( ~ 1 %- n/4) ) -
(N,(") - 4 4 ) - ( N p- 4 4 )
-
-
( N p- n/4) + ( N p- 4 4 )
( N p - 4 4 ) - ( N p- 4 4 ) + ( N p- 4 4 ) - (ivy - 4 4 )
)
.
(3)
Thus, the Pearson's sum is decomposed as follows:
x; = [(N,(")
-
n/4)
+ (Ni") - n/4)
-
n
( N i n )- n/4) - ( N p ) - n/4)I2+
Independent
xl Distributed
in the Limit Components
245
all terms in ( 4 ) being independent x: distributed in the limit and may be used separately from each other. From ( 4 ) one sees that the first term is sensitive to location, the second one to scale, and the rest one to skewness. Our investigation shows that first and third components of ( 4 ) possess, e.g., no power when testing for logistic null distribution with mean zero and variance one against symmetric alternatives (normal, triangular, and double-exponential, all with mean zero and variance one). At the same time the second component of ( 4 ) gives power, which is on average 1.5 times more than that of the total x2 sum. The decomposition analogous to ( 4 ) can be done also for T = 8. In that case a term sensitive to kurtosis will appear, but it is difficult to relate the remaining three components to characteristics of the distribution [Boero et a1 , 20041.
3
A transformation of a random vector
= (21,..., ZT)Tbe a random vector such that E Z = 0, ( d i j ) , i,j = 1, ..., T , the rank of D being k 5 T . Denote
Let Z
Z(i) = (21,
..., zi)*,
E(ZZT) = D
=
Di = E ( Z ( i l Z $ ) ) ,
d,(j) = C O V ( ~ Z~ (, j ) ) , d(i)j= C O V ( Z (Z~j)),,
i ,j = 1,2, ..., k .
Consider the vector 6 ( t )= (61, ..., 6t)T with components
Assume Z(o) = 0, ID01 = 1. Theorem 1. Let 6 ( t )be a vector with components ( 5 ) , then
E4t) = 0,
{
E d(t)dT,)}= I t ,
(6)
where I t is the identity matrix of order t , t = 1, ...,k . A Proof. Using the well-known formula = IA//D - BA-~CJ one lBDl sees that 6i defined by (5) can be represented as
246
Recent Advances in Stochastic Modeling and Data Analysis
The first equality in (6) follows immediately from (7). To prove the second equality we have to show that components of are normalized. Indeed =
-
W
{
{
E 2, - ~ ~ z ~ l ) D ~ ~ l27 ~ (-z~ -~l ~) }l )
1-
~ ; ~ l ~z ( z - l )
{
6,,- d ~ z - l ) D ~ ~ l d ( z -=l )1.z }By the same lines it can be shown ID%( are uncorrelated, i.e. E6,6, = 0, i # j . that components of Theorem 2. The following identity holds
11 6 ( t ) [I2= 6: + * . . + 6:
= Z$)DtlZ(t),
t = 1,...,k .
(8)
Proof. If t = 1, the theorem holds. Suppose that the above factorization holds up to t = k - 1 inclusive and prove it for t = k . Since
then using block matrix inversion we get
Z&)DilZ(k)= Z&-l)DL?lZ(lc-i)+ (&t . (.i. - d&g-l)DL?lZ(k-i)) = b:
-
d~(k-1)D~?ld(k-i)ic)
+ . . . + di-1 + 6; = IlS(lc)112,
which completes the proof. Thus, the linear transformation defined by (7) diagonalizes the quadratic form Z;)D<'Z(,) of (8). Corollary 1. If a vector Z is distributed normally with parameters (0, D), then components of 6 ( t ) are independent standard normal random 112 possesses the chi-squared distribuvariables and the quadratic form tion with t degrees of freedom (t = 1,..., k ) . Remark 1. If a matrix B is p.d., it can be uniquely represented as B = UUT,where U is a real-valued upper triangular matrix with positive diagonal elements. Theorems 1 and 2 permit to modify this Cholesky decomposition as follows. If the rank of D equals k, k < r , then we may interchange columns of the non-negative definite covariance matrix D such that the first k columns will be linearly independent. From Theorems 1 and 2 we have Lemma 1. Let a r x r matrix D be n.n.d. of rank k , then
where a T x k matrix R = (RkiO) with Rk being a lower triangular matrix with elements
rii =
Jm,-riid;i-I) rij =
(DT-il)j ,
(9)
where j = 1,...,i - 1, i # j , i 2 2, i = 1,...,k . In this formula (Ai)j denotes j - s column of the leading sub-matrix of order i x i of the matrix A. Lemma's proof follows from Theorems 1 and 2 and from the representation
6 ( k ) = RZ(T).
Independent
x; Distributed
in the Limit Components
247
4 Components of the classical and modified chi-squared stat ist ics 4.1
A decomposition of Pearson's sum for non-equiprobable cells
Let components of a vector Z be Zi = (n~i)-~/'(N:"'- n p i ) , i = 1, ...,T , then under Ho the vector Z is normally distributed in the limit with mean zero and the covariance matrix D = I,-qqT of rank r-1, q = (p:", . . . , P : / ' ) ~ It is easily verified that if Dk = Ik - qkqz, then k
IDkI
=
1-
ZPi,
D i l = Ik
+
i=l
where qk = (p;'21...,pL'z)*l get elements of b - 1 as Tii =
k
= 1, ...,r
-
(
1-
k
ZPi )
-1
qkqr,
(10)
i=l
1. Substituting (10) into (9) we
d r n ,
Thus, from this and 6(,-1) = RZ(,), where R = (R,-l:O), it follows that
Corollary 2. The following decomposition for the Pearson's sum holds
x: = ZTZ = 6; + 6; + . . . + 6,"-1 , where 6: under Ho are independent distributed in the limit as xf.Note that such a decomposition is not unique. Remark 2. By the same lines it can be shown that asymptotically equivalent Dzhaparidze-Nikulin test ([Dzhaparidze and Nikulin , 19921, see also [Van Der Vaart, 19981)
U: (8,)
= V(,lT
(8,) [I - Bn(B:Bn)-B:]V(")(8n)
where V(n)(e,)is a vector with components vi"'(8,) = (n~i(8,))-'/'(N/~)npi(8,)), i = 1, ...,r , 8, being any 6-consistent estimate of an unknown s-dimentional parameter 8 = (01,..., OS)*, and Pearson-Fisher test (see [Greenwood and Nikulin, 19963)
x:(e,)= v(")(e,)v(e,), where 8, is an estimate obtained by minimizing the Pearson's sum or asymptotically equivalent one, possess in the limit the following decompositions:
248
Recent Advances in Stochastic Modeling and Data Analysis
+ + x:
u;(e,)= b:(8,) + . . ._+~5;-~-~(8,) and X:(O,) = &:(On) . . . b?-s-l(On) redistributed random spectively, where Sf (&) and Sf (&) are independent variables. [Rayner, 20021 basing on the Neyman's smooth goodness-of-fit score test proposed a technique for constructing distributed in the limit components of the Pearson-Fisher statistic in the presence of unknown nuisance parameters. Unfortunately, this technique on the contrary of our approach does not provide those components explicitly. In the next Section a decomposition of the well-known Rao-Robson-Nikulin modified test ([Drost , 19881, [Van Der Vaart, 19981) is considered.
xy
4.2
A decomposition of the Rao-Robson-Nikulin statistic
In this case
DI, = Ik - q k q c
-
BkJ-'B:,
k
=
1, ..., T
-
1,
(12)
where matrices B and J are defined, e.g., in [Nikulin and Voinov, 20061 and [Greenwood and Nikulin, 19961. If Rank(B) = s, then
IDkl
=
k
(
1-
CPi) IJ
-
Jkl/lJI
and
where Mk = I k
+q k q z / ( l
Using Lemma 1 one gets
k
p i ) and Jk = BzMkBk,
-
~ i= i
-
1,
i
-
1.
#j,
i = 1, ...,r - 1,
J-,
+ biJ-lBT-1) (D;-I)j,
rz.j . - T 2% . . (fiqT-1 where j = 1, ...,i
k = 1, ..., T
i=l
i
2 2, (DY-l1)jis the j - s
(14)
column of DYIl de-
fined by (13) and bi is the i-th row of B. With R = (&-1iO) and maximum likelihood estimates (MLEs) of all matrices involved we finally obtain
qv-l)(8,)
(15)
= RVI;; ( e n ) ,
8, is the MLE of 8 . Thus, we have the Theorem 3. Under the proper regularity conditions ([Nikulin, 19731 and [Moore and Spruill, 19751) the expansion
where
Y,"(8,) = ST(8,)
+ . . . + S,"-,(8,)
of the well-known Rao-Robson-Nikulin statistic Y: (8,) ([Greenwood and Nikulin, 19961) holds and in the limit under Ho statistics i = 1, ..., r - 1, are distributed independently as and the statistic Y,"(8,) is distributed as
@(en),
XZ-~.
x:
Independent
x:
249
Distributed in the Limit Components
A decomposition of the Hsuan-Robson-Mirvaliev statistic
4.3
If the moment type estimator (MME) 8, of 8 is used, then the limit covariance matrix of standardized frequencies will be ([Mirvaliev, 20011)
D = I,
+ (c
- cv-lcT
- qqT
-
BK-'v)v-~(c - B K - ~ v ) ~ .
Elements of matrices C,K, and V are defined, e.g., in [Voinov et al, 2007 (a)]. If Dk = Ik - qkq$ - CkV-'C; ( C , - BkK-'V)V-l(Ck - BkK-lV)T, then
2%)/ v -
+
k
IDkI
= (1 -
D ; ~= A,
-
vkllLkl//v12,
A ~ ( c ,- B,K-~v)L;'(c,
where
+ MkCk(V
A, = Mk L, = v
+ (c,
-
-
-
B,K-~v)*A~,
(16)
Vk)-lC$Mk,
B , K - ~ v ) ~ A , ( c ,- B ~ K - ' v ) ,
d-,
Consider the matrix R,-1 with elements ~ i i= 1,...,~ - 1 , r i j= - ~ ~ ~ ( d ~ ~ ~ - ~ ) ) ~ ( D ~ ~=~1),..., j ,i-1, w h e ri ef j j , (d:(i-l))T= -(fiqT-l+ciV-lCT-l
ci being the i-s row of C and
-(ci-biK-lV)V-'(Ci-i
(DF21)jis the j - s
i i
=
2 2,
-Bi-~K-lv)~),
column of DF21 defined by
(16). From Lemma 1 and the transformation 6,-1(an) = RV/:;(6J,), where
R = (R,-l:O), with MMEs of all matrices involved we get the Theorem 4. Under the proper regularity conditions ([Hsuan and Robson , 19761) the expansion
Y,"(e,)=sf(en)+.-+b:-l(a,)
(a,)
of the Hsuan-Robson-Mirvaliev statistic Y$ ([Mirvaliev, 20011) holds and i = 1,..., T - 1, are distributed in the limit under Ho statistics independently as and the statistic Y;(6,) is distributed as
xf
&(a,),
References [Anderson, 1994lG. Anderson. Simple tests of distributional form. Journal of Econometrics, 62:265-276, 1994. [Anderson, 1996lG. Anderson. Nonparametric Tests of Stochastic Dominance in Income Distributions. Econometrica, 64:1183-1193, 1996. [Boero et a1 , 2004lG. Boero, J. Smith, K.F. Wallis. Decomposition of Pearson's chi-squared test. Journal of Econometrics, 123:189-193, 2004.
250
Recent Advances in Stochastic Modeling and Data Analysis
[Cochran , 1954lW.G. Cochran. Some Methods for Strengthening the Common x2 Tests. Biometrics, 10:417-451, 1954. [Drost , 1988lF. Drost. Asymptotics f o r generalized chi-square goodness-of-fit tests. Amsterdam: Center for Mathematics and Computer Sciences, CWI Tracts,V.48, 1988. [Dzhaparidze and Nikulin , 1992lK.O. Dzhaparidze, and M.S. Nikulin. On evaluation of statistics of chi-squared type tests. In Problems of the Theory of Probability Distributions, St. Petersburg: Nauka, 12:59-90, 1992. [Fisher, 1925lR.A. Fisher. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd, 1925. [Gilula and Krieger , 198312. Gilula and A.M. Krieger. The Decomposability and Monotonicity of Pearson’s Chi-square for Collapsed Contingency Tables with Applications. JASA, 78:176-180, 1983. [Greenwood and Nikulin, 1996lP.S. Greenwood and M.S. Nikulin. A guide to chisquared testing. New York: John Wiley and Sons, 1996. [Hsuan and Robson , 1976lA. Hsuan, D.S. Robson. The x2 goodness-of-fit tests with moment type estimators. Commun.Statist. Theory and Methods, A5: 15091519, 1976. [Lancaster, 1951lH.O. Lancaster. Complex Contingency Tables Treated by the Partition of x2 . J. Royal Statist. Society, B13:242-249, 1951. [McCulloch, 1985lCh.E. McCulloch. Relationships among some chi-squared goodness of fit statistics. Commun. Statist. Theory and Methods, 14:593403, 1985. [Mirvaliev, 2001lM. Mirvaliev. An investigation of generalized chi-squared type statistics. Doctoral thesis, Academy of Science of the Republic of Uzbekistan. Tashkent, 2001. [Moore and Spruill, 1975lD.S. Moore, M.C. Spruill. Unified large-sample theory of general chi-squared statistics for tests of fit. Annals of Statistics, 3:599-616, 1975. [Nikulin, 1973lM.S. Nikulin. Chi-square test for continuous distributions. Theory of Probability and its Applications, 18:638-639, 1973 (in Russian). [Nikulin and Voinov, 2006lM. Nikulin, V. Voinov. Chi-squared testing. In S. Kotz, N. Balakrishnan, C.B. Read, B. Vidakovich, editors, Encyclopedia of Statistical Sciences , 2nd Edition, volume 2 , John Wiley and Sons, Hoboken, New Jersey, pages 912-921, 2006. [Rayner, 2002lG.D. Rayner. Components of the Pearson-Fisher Chi-squared statistic. J . of Applied Math. and Decision Sciences, 6(4):241-254, 2002. [Van Der Vaart, 1998lA.W. Van Der Vaart. Asymptotic Statistics. Cambridge: Cambridge University Press, 1998. [Voinov et all 2007 (a)]V. Voinov, R. Alloyarova, N. Pya. Recent achievements in modified chi-squared testing. In F. Vonta, M. Nikulin, N. Limnios, and C. Huber, editors, Statistical Models and Methods for Biomedical and Technical Systems, Birkhauser, Boston, Ch.18:249-265, 2007. [Voinov et all 2007 (b)]V. Voinov, R. Alloyarova, N. Pya. A modified chi-squared goodness-of-fit test for the three-parameter Weibull distribution and its applications in reliability. In C. Huber, N. Limnios, M. Mesbah, and M. Nikulin, editors, Mathematical Methods for Reliability, Survival Analysis and Quality of Life, HERMES, London, Ch.13:193-206, 2007.
Parametric Conditional Mean and Variance Testing with Censored Data Wenceslao GonzBlez Manteiga’
, C6dric Heuchenne’,
and Cesar SBnchez
Sellerol
*
Departamento de Estatistica e 1.0. Universidad de Santiago de Compostela Campus Sur. 15706 Santiago de Compostela, Spain (e-mail: uencesQzmat .usc.es , cse1leroQusc. es) HEC-Ecole ge gestion de I’Universitb de Likge Universite de Liege 7 Boulevard du Rectorat, 4000 Liege, Belgique (e-mail: C .HeuchenneQulg .ac .be)
Abstract. Suppose the random vector (X, Y ) satisfies the heteroscedastic regres) Var(Y1.) and E sion model Y = m ( X ) c.(X)&,where m(.) = E(YI.), u z ( ( -= (with mean zero and variance one) is independent of X. The response Y is subject to random right censoring and the covariate X is completely observed. New goodness-of-fit testing procedures for m and uz(.)are proposed. They are based on a modified integrated regression function technique which uses the method of [Heuchenne and Van Keilegom, 2006b] to construct new versions of functions of the data points. Asymptotic representations of the processes are obtained and weak convergence to gaussian processes is deduced. Keywords: Goodness-of-fit tests, Kernel method, Least squares estimation, Nonparametric regression, Right censoring.
+
1
Introduction
We consider the following heteroscedastic model
Y
=m(X)
+ cT(X)E,
(1)
E is independent of X (one-dimensional), m . ( X ) = E [ Y I X ]and o2((X)= V a r [ Y I X ] Suppose . also that Y is subject to random right censoring, i.e. instead of observing Y, we only observe ( Z , A ) ,where 2 = min(Y,C), A = I ( Y 5 C) and the random variable C represents the censoring time, which is independent of Y, conditionally on X . Let ( y Z , Ci,X i , Zi, Ai) (i = 1,.. . , n) be n independent copies of (Y, C, X , 2,A ) . The aim of this paper is to test the hypothesis
where
Ho : !P E M versus H1 : 9 $ M , where
M
m(.)or .(.)
= {!P* : 29 E 0 ) is a class of parametric functions,
and 0
c RD. 251
(2)
9(.) is either
252
Recent Advances in Stochastic Modeling and Data Analysis
The approach used in this paper was introduced by [Stute, 19971 and is based on an estimator of the integrated function !P(.),
I ( x )=
1:
S(.z)dE&),
where F x ( x ) = P ( X 5 x). Following the lines of [Stute, 19971, the corresponding integrated process is given by n
I P ( x ) = n-l'z
($(XZ,K) - P ! (Xi)) I
5 x) ,
( X Z
(3)
i=l
using the fact that I ( z ) = E [ l { x l z ) $ ( X ,Y)], where E [ $ ( X ,Y)IX] = ! P ( X ) . Therefore] $ ( X , Y ) = Y or (Y - m ( X ) ) ' and may depend on a vector of parameters according to the required test. When censored data are present, extensions of methods proposed by [Heuchenne and Van Keilegom, 2006a], [Heuchenne and Van Keilegom, 2006bl are used to estimate the parameters of !P(.) (possibly $(.;)) and replace censored $(.;) by artificial versions which can be considered as uncensored. Although a number of goodness-of-fit tests exists for the regression function with censored data (see by example [Pardo-Fernhdez et al., 2007]), few results are obtained for the conditional variance and especially for a function to test which is nonlinear instead of polynomial. [Stute et al., 20001 developped a goodness-of-fit test for censored nonlinear regression but it suffers from restrictive assumptions. This is due to the use of the bivariate KaplanMeier estimator of [Stute, 19931. It assumes that (1) Y and C are independent (unconditionally on X ) and that (2) P ( Y 5 CIX, Y) = P ( Y 5 CIY), which is satisfied when e.g. C is independent of X. Both assumptions are often violated in practice. The paper is organized as follows. In the next section, the testing procedure is described in detail and Section 3 summarizes the main asymptotic results, including the weak convergence of the proposed process (the extension of I P ( z ) to censored data) to a Gaussian process. The proofs of those main results can be obtained on request to the authors.
2
Notations and description of the method
The idea of the proposed method consists of first estimating the unknown .) due to censored observations, and second insert those sofunctions obtained artificial functions into the classical process (3). Define
Parametric Conditional Mean and Variance Testing with Censored Data
253
under the null hypothesis (!Pok(.) = @(.) if HO is true). The index k indicates to which test corresponds the new data point $)"*(Xi,Zi, Ai). Indeed, 1. for k = 0, $'(Xi, y Z ) = yZ corresponding to a goodness-of-fit test for the conditional mean m, 2. for k = 1, Q1(Xi,yZ) = ( y Z - mo,(Xi))' corresponding to a goodnessof-fit test for the conditional variance a 2 ,assuming that the conditional mean has a known parametric form (and the true vector of parameters is defined by Bo), 3. for k = 2, Q 2 ( X i ,y Z ) = ( y Z - m(Xi))' corresponding to a goodness-of-fit test for the conditional variance u 2 , not assuming any parametric form for the conditional mean m. Hence, we can work in the sequel with the variable @* ( X i , Zi, &) instead y Z ) . In order to estimate $I)"* ( X i ,Zi, Ai) for a censored obserof with $)"(Xi, vation, we first need to introduce a number of notations. Let ma(.) be any location function and no(.)be any scale function, meaning that m o ( x ) = T ( F ( . I z ) )and ao(x)= S ( F ( . I x ) )for some functionals T and s that satisfy T(FaY+b('(x)) = a T ( F y ( . I x ) ) b and s(Fay+b('lx))= a S ( F y ( . l x ) ) ,for all a 1 0 and b E R (here Fay+b(.lz)denotes the conditional distribution of aY b given X = x). Let EO = (Y - m o ( X ) ) / a o ( X ) . Then, it can be easily seen that if model (1) holds (i.e. E is independent of X ) , then EO is also independent of X . Define F ( y l z ) = P(Y 5 ylz), &(z) = P ( X 5 z), F:(y) = P(EO 5 y), for Eo = (2- m o ( X ) ) / a o ( X )we , denote Hz(y) = P(Eo 5 y), and for Co = ( C - m o ( X ) ) / a o ( X )G:(y) , = P(Co5 y). Rx = [xo,211 denotes the compact support of the variable X . We have
+
+
$)"*(Xi,Zi, Ai) = $)"(Xi,yZ)Ai
;J $"Xi, m O ( x i+ ) .O(XZ)Y)
+
1 - F,o(E,o)
@YY) (1 - Ail,
k = 0,1,2, for the following choices of ma and 'a :
where F-'(slz) = inf{y;F(ylx) 2 s} is the quantile function of Y given x and J ( s ) is a given score function satisfying So1 J ( s ) ds = 1. When J ( s ) is chosen appropriately (namely put to zero in the right tail, there where the quantile function cannot be estimated in a consistent way due to the right
254
Recent Advances in Stochastic Modeling and Data Analysis
censoring), m o ( z )and oo(z)can be estimated consistently. The distribution F(ylz) in (4) is replaced by the [Beran, 19811 estimator, defined by (in the case of no ties) :
where
K is a kernel function and {a,} a bandwidth sequence. Therefore,
my(.) =
1 1
P ( S l . ) J ( S )
ds,
0
P(.)=
@;-1(Sl.)2J(S)
ds - 7i202(.)
0
estimate m o ( z )and oo2(x).Next,
denotes the [Kaplan and Meier, 19581-type estimator of F," (in the case of no ties), where @ = (Zi - 7i2°(Xi))/r?o(Xi),,!?'pi, is the i-th order statistic of
Ey, . . . , E: and A,i) is the corresponding censoring indicator. This estimator has been studied in detail by [Van Keilegom and Akritas, 19991. This leads to the following estimators for $ k ( X i ,Y,)(lc= 0 , l ) :
I , @ ( X ~Zi, , Oi)= YGi
= Y,Ai
(8)
Parametric Conditional Mean and Variance Testing with Censored Data
255
F ( y ) = 1) for any distribution F . Truncations by T in the above integrals and denominators are due to right censoring (however, when TF," 5 7@, T can be chosen arbitrarily close to r q ) . In G p ( X i , Z i , A i ) , 190 is then replaced by its estimator obtained in [Heuchenne and Van Keilegom, ZOOSb], while m(.)in G p ( X i ,Zi,Ai) is replaced by a nonparametric estimator, say f?q-(Xz), developed, by example, in [Heuchenne and Van Keilegom, 2005a], [Heuchenne and Van Keilegom, 2005bl. In the particular case of estimating E(Y,IX = x), the estimator of [Heuchenne and Van Keilegom, 2005bI is very easy to understand. In fact, it reduces to a weighted average of estimated of yi, where E ( r * I X = z) = E(Y,IX = x), for i = artificial versions 2,. . . , n. Here also, the lost information due to censoring is in some way completed using model (1). That leads to
r*
- mO(x))/ao(x). Therefore, we will define where E,", = (Zi
c n
751*(x) =
Wi(5,an)Y;Z*
i=l
for
- f ? ~ ~ ( z ) ) / i ? ~and ( x )wi(x, un),i = 1,.. . ,n, are the local where EYx = (Zi polynomial weights. Finally, the functions & Y ( X i ,Zi,Ai) (resp P8;,( X i ) ) ,k = 0,1,2, replace $ ( X i , y Z ) (resp @(Xi)),i = 1,.. . ,n, in (3) for which we define n
C[4k(xil zi,~
dzk := argminokEek
i- @ ) 8k(~i)12,
(12)
i=l
as estimators for the parameters describing M I , = {Po, : dI, E @ I , } ( @ I , is a , is a positive integer and @8(.) is either m,j(-)or compact subset of R D kDk uz(.),the tested parametric variance), the class of parametric functions corresponding to the goodness-of-fit test k , k = 0,1,2. Since 4 g ( X i ,Zi,&) = &?(Xi, Zi, Ai,.9T0)= ( y Z - mozo(Xi))2*,i = 1,.. . , n , we will use in the
4%(19T~),
sequel & p ( X i ,Zi,Ai, dTo) = k = 0 , 1 , 2 (especially to develop the proofs). In order to focus on the primary issues, we assume the existence of a well-defined minimizer for (12). Solutions for those problems can be obtained using an (iterative) procedure for nonlinear minimization problems,
256
Recent Advances in Stochastic Modeling and Data Analysis
like e.g. a Newton-Raphson procedure. Therefore, we consider the following expression n
ICPk(z) = n-'/'
~ ( G & ( I ~~ Y~, T~( X)i ) ) I ( X i5 z), k = 0,1,2. -
nk
(13)
i=l
More precisely, we propose a Kolmogorov-Smirnov type statistic
and a Cramer-von Mises type statistic
where @x(.)is the empirical distribution of the X-values. The null hypothesis (2) is rejected for large values of the test statistics. As it is clear from the definitions of Gk((29zo), ! P o ~ k ( X iand ) for k = 0,1,2, expression (13) is actually estimating
19zk
n
n-'/'z($k(eT) P ~ ~ ( x ~ ) )5I z), ( x ~k = 0,1,2, -
(14)
i=l
where +o* T i - Y& = KAi
(15)
Parametric Conditional Mean and Variance Testing with Censored Data
257
Remark 2.1 (Test with known parametric variance) In the case k = 0, we test a parametric form for the conditional mean without assuming any parametric form for the conditional variance. We could consider such a parametric form introducing it at the denominator of each term of (13) for k = 0. This would be equivalent to define + ( X ,Y ) = Y / a o ( X )for some 8. An estimator for the vector of parameters 0 could be obtained by example using (12) for k = 2 and the analytic form of the corresponding test statistics would be straightforward.
3 Asymptotic results We start by developing an asymptotic representation for the expression (13) under the null hypothesis and where the remaining term is op(n-l/’) uniformly in z. This will allow us to obtain the weak convergence of the process I C P ( z ) . Finally, the asymptotic distributions of the proposed test statistics are obtained. The assumptions, notations and proofs of the results below can be provided by the authors.
Theorem 1. . Under some assumptions and the null hypothesis Ho, n
n
xz(xi, ni,8 0T
= n-’
T
7
ek
) + Rn(z)
i=l
f o r some specific functions x z ( X i , Zi, Ai, SUP{IR,(Z)~;Z
E
e:, eT), k = 0 , 1 , 2 , where
R ~ =)op(n-l/’)
Theorem 2. Consider the assumptions of Theorem 1. Then, under the null hypothesis Ho, the process n
C(Gg(fizo) -
P ~nkT ( x ~ ) ) I I(z),x ~ k = 0,1,2, z E R X , i=l converges weakly to a centered gaussian process W k ( x ) with covariance function
I c p k ( Z ) = np1/’
T
T
e l ) x z ’ ( X z ,zi,ni,eo , e k 11. Corollary 1. Consider the assumptions of Theorem 1. Then, under the null hypothesis Ha, Co.(Wk(z),Wk(z’))= E [ X z ( X z ,zz, nil
258
Recent Advances in Stochastic Modeling and Data Analysis
References [Beran, 1981lR. Beran. Nonparametric regression with randomly censored survival data. Technical Report, Univ. California, Berkeley. [Heuchenne and Van Keilegom, 2006a]C. Heuchenne and I. Van Keilegom. Polyncmial regression with censored data based on preliminary nonparametric estimation. To appear in Annals of the Institute of Mathematical Statistics. [Heuchenne and Van Keilegom, 2006b]C. Heuchenne and I. Van Keilegom. Nonlinear regression with censored data To appear in Technometrics. [Heuchenne and Van Keilegom, 2005a]C. Heuchenne and I. Van Keilegom. Estimation in nonparametric location-scale regression models with censored data. Conditionally accepted by Annals of the Institute of Mathematical Statistics. [Heuchenne and Van Keilegom, 2005blC. Heuchenne and I. Van Keilegom. Mean preservation in Conditionally accepted by Journal of Multivariate Analysis. [Kaplan and Meier, 1958lE.L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53:457-481, 1958. [Pardc-Fernkndez et al., 20071J.C. Pardc-Fernkndez, -, and et al. Goodness-of-fit tests for parametric models in censored regression. To appear in Canadian Journal of Statistics. [Stute, 1993lW. Stute. Consistent estimation under random censorship when ccvariables are present. Journal of Multivariate Analysis, 45:89-103, 1993. [Stute, 1997lW. Stute. Nonparametric model checks for regression. Annals of Statistics, 25:613-641, 1997. [Stute et al., 2OOO]W. Stute, -, and et al. Nonparametric Model Checks In Censored Regression. Communications in Statistics. Theory and Methods, 29:1611-1629, 2000. [Van Keilegom and Akritas, 199911. Van Keilegom and M.G. Akritas. Transfer of tail Annals of Statistics, 27:1745-1784, 1993.
Synthetic data based nonparametric testing of parametric mean-regression models with censored data Olivier Lopez’ and Valentin Patilea2 Crest-Ensai and Irmar Campus de Ker Lann Rue Blaise Pascal BP37203 35172 Bruz Cedex, France (e-mail: lopezaensai .f r) Crest-Ensai Campus de Ker Lann Rue Blaise Pascal BP37203 35172 Bruz Cedex, France (e-mail: p a t i l e a a e n s a i .fr)
Abstract. We develop a kernel smoothing based test of a parametric mean-regression model against a nonparametric alternative when the response variable is rightcensored. The new test statistic is inspired by the synthetic data approach for estimating the parameters of a (non)linear regression model under censoring. The asymptotic critical values of our tests are given by the quantiles of the standard normal law. The test is consistent against any fixed alternative, against local Pitman alternatives and uniformly over alternatives in Holder classes of functions of known regularity. Keywords: Nonparametric test, Synthetic data, Right censoring.
1 Introduction Parametric mean-regression models, in particular the linear model, are valuable tools for exploring the relationship between a response and a set of explanatory variables (covariates). However, in survival analysis such models are overshadowed by the fashionable proportional hazard models and the accelerated failure time models where one imposes a form for the conditional law of the response given the covariates. Even though mean-regression models involve weaker assumptions on the conditional law of the responses, the popularity of the parametric mean-regressions with censored data greatly suffers from the difficulty tto perform st,at,istical inference when not all responses are available. This paper’s main purpose focuses on a further step in the statistical inference for parametric mean-regression models under right censoring, that is nonparametric lack-of-fit testing. Checking the adequacy of a parametric regression function against a purely nonparametric alternative has received
259
260
Recent Advances in Stochastic Modeling and Data Analysis
a large amount of attention in the non-censored case and several approaches have been proposed. See, among many others, [Hardle and Mammen, 19931, [Zheng, 19961, [Stute, 19971, [Dette, 19991, [Horowitz and Spokoiny, 20011, or [Guerre and Lavergne, 20051, and the references therein. But for censored data, these approaches are not directly applicable. To our knowledge, very few solutions for nonparametric regression checks with right-censored responses have been proposed. Following the approach of [Stute, 19971, test based on an empirical process marked by weighted residuals was introduced in [Stute et al., 20001, the role of the weights being t o account for censoring. The limit of their marked empirical process is a rather complicated centered Gaussian process and therefore the implementation of the test requires numerical calculations. [SBnchez-Sellero et al., 20051 reconsidered this type of test and provided a complete proof of its asymptotic level. However, for technical reasons, [SBnchez-Sellero et al., 20051 drop some observations in the right tail of the response variable and therefore the resulting test i s no longer omnibus. Moreover, neither [Stute et al., 20001 nor [SAnchez-Seller0 et al., 20051 studied the consistency of the test against a sequence of alternatives approaching the null hypothesis. In this paper, we consider an adaptation of the kernel-based test statistic studied by [Zheng, 19961 t o regression with right-censored responses. Closely related test statistics can be found also in [Hardle and Mammen, 19931, in [Horowitz and Spokoiny, 20011, or in [Guerre and Lavergne, 20051. Under suitable conditions, the test statistic converges in law to a standard normal when the model is correct.
2
Model and assumptions
+
Consider the model Y = m ( X ) E , where Y E R, X E RP, E ( E I X ) = 0 almost surely ( a s . ) , and m (.) is an unknown function. In presence of random right censoring, the response Y is not always available. Instead of ( Y , X ) , one observes a random sample from ( T ,6, X ) with
where C is the “censoring” random variable, and 1~ denotes the indicator function of the set A . In our setting, the variable X is not subject to censoring and is fully observed. We want to check whether the regression function m (.) belongs to a parametric family
M where
=
{f (0, .) : B
E 8
c Rd}
(2)
f is a known function. Our null hypothesis then writes Ho
: 380,
E ( Y I X ) = f ( B o , X )u . s . ,
(3)
while the alternative is P [E( Y I X ) = f ( B , X ) ] 5 c for every B E 0 and some c < 1. For testing Ho, first we need to estimate 190.
Nonparametric Regression Checks with Censored Data 2.1
261
Estimating (non)linear regressions w i t h censored data
Since the observed variable T does not have the same conditional expectation as Y , classical techniques for estimating parametric (non)linear regression models like M must be adapted t o account for censorship. One of the proposed approaches is the synthetic data (SD) procedure. In this approach one replaces the variable T with some transformation of the data Y*, a transformation which preserves the conditional expectation of Y . Several transformations have been proposed, see for instance [Leurgans, 19871, [Zheng, 19871. In the following, we will restrain ourselves to the transformation first proposed by [Koul et al., 19811, that is
y* =
6T
(4)
l-G(T-)'
where G ( t ) = P (C 5 t ) . The following assumptions will be used throughout this paper to ensure that E ( Y * 1 X ) = E ( Y 1 X ) for Y * defined i n (4). A s s u m p t i o n 1. Y and C are independent. Assumption 2. P ( Y 5 C I X , Y ) = P ( Y 5 C I Y ) . These assumptions are quite common in the survival analysis literature when covariates are present. Assumpt,ion 1 is an usual identification condit,ion when working with the Kaplan-Meier estimator. [Stute, 19931, pages 462-3, provides a detailed discussion on Assumption 2. Notice that Assumption 2 is flexible enough to allow for a dependeiice between X and C. Moreover, Assumptions 1 and 2 imply the following general property: for any integrable
m,XI,
Unfortunately, one cannot compute the transformation (4) when the function G is unknown. Therefore, given the i.i.d. observations ( T l , J l , X 1 ),..., (Tn,6,, X,), [Koul et al., 19811 proposed t o replace G with its Kaplan-Meier estimate
(6) and to compute
y.*=
6iTt 1 - G(Ti-)'
i = 1,..., n.
Next, [Koul e t al., 19811 proposed t o estimate 00 by
esD that minimizes
(7)
Recent Advances in Stochastic Modeling and Data Analysis
262
esU
over 0. They obtained the consistency of and the asymptotic normality of f i ( e S D- 0 0 ) in the particular case of a linear regression model. [Delecroix et al., 20061 generalized these results to more general functions f (0,
.I
3
Nonparametric test procedures under censoring
Testing the adequacy of model M is equivalent to testing
300,
Q (00) = 0
where
Q (0) = E [U(0) E [U (0) I XI g (X)l
(9)
U (0) = Y - f (0, X) and g denotes the density of X that is assumed t o exist. The choice of g avoids handling denominators close t o zero. When the responses are not censored, one may estimate Q (00) by the kernel-based estimator
where 8 is an estimator of 00 such that 6 - 00 = O p ( n - l / ’ ) , U i ( 0 ) = Y , - f (0, Xi),K is some p-dimensional kernel function, h denotes the bandwidth and for z E R P , K h ( z ) = K ( z / h ) . See [Zheng, 19961. See also [Horowitz and Spokoiny, 20011 or [Guerre and Lavergne, 20051. Using a consistent estimate V: of the asymptotic variance of nhP/2Qn(e), the smoothing based test statistic with non-censored responses is
Under the null hypothesis the statistic behaves asymptotically as a standard iiorinal and therefore the rioiiparainetric test is defined a s
‘‘ Reject
H0 when TZc 2
”,
Z I - ~
where zlTn is the ( 1 - a ) t h quantile of the standard normal law. As an estimate V,”,one could use either
or
with $(z) a nonparametric estimator of u2 (z) = V u r ( ~I X = z). The former choice for @ : is simpler but is likely to decrease the power of the test because the squares of the estimated residuals of the parametric model produce an upward biased estimate of o2 (z) under the alternative hypothesis.
Nonparametric Regression Checks with Censored Data
263
A test statistic in the case of right-censored responses
3.1
In the following, the observations are ( T I &,XI), , ..., (T,, S,,X,), a random sample from ( T , S ; X ) .In the spirit of the SD approach, consider the estimated synthetic responses p;, ..., Y; obtained from formula (7). Now, the is ) analogue of ~ , ( i
where
i = iSD and
are the estimated synthetic residuals. The statistic
0,””(0) estimates
Q S D ( 0 )= E [Us” (0) E [Us” (0) I X ]g (X)]
(16)
with U s D (0) = 6T [l - G (T-)]-’ - f (0. X ) . By (5), if Assumptions 1 and 2 hold then the null hypothesis is equivalent t o QsD (0,) = 0. Therefore Q,””(e) can serve t,o build our first test, statistic.
[c:”] of the asymptotic variance of 2
Now, given a consistent estimates
nhP/2Q:D(8),we propose the following test statistic:
The corresponding omnibus test is
‘‘ Reject Ho
when T:”
2
”
To estimate the variance of nhp/’Q,””(8) we consider
4
Asymptotic analysis
The most difficult part of t,he st,itdy of our tests is the investigation of the properties of Q:D (0). This quadratic form is difficult to analyze even when Ho holds true and 0 is equal to 00, since it does not rely on i.i.d. quantities Ui, as the quadratic form (10) does. In fact, due t o the presence of G in (15), each U:D (0,) depend on the whole sample. Then, a key point is to
264
Recent Advances in Stochastic Modeling and Data Analysis
show that under Ha, in some sense, Q,””(J) is asymptotically equivalent to the “idea1”quadratic form
where
and Q r L s ( O o ) can be done like in the nonThe asymptotic study of Q:”(Oo) censored case. Therefore, the asymptotic level of our tests will be obtained as a consequence of the equivalence result and using techniques for kernel-based tests in the i.i.d. case. A similar equivalence result deduced under fixed or moving alternatives will serve for studying the asymptotic consistency of our tests. 4.1
Assumptions
We list here some assumptions needed for the convergence of our test.
Assumption 3. The test procedure using the unknown Y,*converges. See, for instance, [Horowitz and Spokoiny. 20011 for a set of sufficient conditions. Assumption 4. If x = (zl, ..., xp),let K (z) = I? (zl) ...I? ( z p )where ?j is a symmetric continuous density of bounded variation on R. Assumption 5. The bandwidth h belongs to a range NFln = [h,i,, h,,,], n 2 1, such that h,,, + 0 and nhzin + 03. Assumption 6. Let H ( t ) = P(T 5 t ) , F ( t ) = P ( Y 5 t ) . Let C(y) = [l-G(t)]-’ [ 1 - H ( t ) ] - l d F ( t ) .The following integrability condition holds,
s:i
for some
E
> 0.
Assumption 6 has t o be connected with the integrability condition in [Stute, 19951 and [Stute, 19961, which permits to obtain a central limit theorem for Kaplan-Meier integrals. In our case, we were forced to introduce an extra E > 0, which makes our Assumption slightly more restrictive, but permits to extend the i.i.d. representations of [Stute, 19951 t o kernel smoothing. 4.2
Behavior of the tests under the null hypothesis
The following theorem gives an asymptotic representation of the test statistics under Ha stated in (3).
T : ”
Nonparametric Regression Checks with Censored Data
265
Theorem 1. Under Assumptions 1 to 6, under Ho,
an probability, where 2 2 [ V 3 6 o ) ] = n ( n- 1 ) h P
c[u,””(eo,]z[u;”(Bo,]2
K2 (Xi
t#j
Moreover, under H o ,
Corollary 1. Under Assumptions 1 to ti, the test defined in the p,revious section has asymptotic level a . 4.3
Behavior of the tests under the alternatives
Consider a sequence of measurable functions X,(z), of alternatives
n 2 1, and the sequence
We study consistency against three types of alternatives: fixed alteriiatives, Pitman local alternatives, and smooth alternatives. The first case is obtained when the function A, is the same function for each n . Pitman alternatives consist in considering X,(z) = r,X(z), where X is a fixed function, aid r, a deterministic sequence tending to zero. The last case we consider is the consistency uniformly over alternatives that belong to a Holder smoothness class and vanish as n + co.The regularity s of the Holder class is supposed known. It can be shown that a representation similar to that of Theorem 1 holds under the alternatives. Hence our test will have the same property as the test in the uncensored case. In particular, it will be consistent against any fixed alterndive and against Pitman alternatives with rn decreasing to zero slower than n-1/2h-P/4. For the smooth alternatives, the fastest rate at which A, can approach to zero while permitting consistent testing uniformly over the Holder class of functions considered is arbitrarily close to the optimal rate of testing n - 2 s / ( 4 s + P ) , provided that s > 5p/4.
References [Delecroix et al., 2006lM. Delecroix, -, and et al. Nonlinear censored regression using synthetic data. Working Paper No. 2006-1 0, CREST-INSEE, 2006.
266
Recent Advances in Stochastic Modeling and Data Analysis
[Dette, 19991H. Dette. A consistent test for the functional form of a regression based on a difference of variance estimators. Ann. Statist. 27:1012-1040, 1999. [Fan and Huang, 2001]J. Fan and L.S. Huang. Goodness-of-fit tests for parametric regression models. J. Amer. Statist. Assoc., 96:640-652, 2001. [Guerre and Lavergne, 2005lE. Guerre and P. Lavergne. Data-driven rate-optimal specification testing in regression models. Ann. Statist., 33:840-870, 2005. [HBdle and Mammen, 1993lW. Hardle and E. Mammen. Comparing nonparametric versus parametric regression fits. Ann. Statist., 21:1296-1947, 1993. [Hjort, 199OlN.L. Hjort. Goodness of fit test in models for life history data based on cumulative hazard rates. Ann. Statist. 18:1221-1258, 1990. [Horowitz and Spokoiny, 2001lJ.L. Horowitz and V.G. Spokoiny. An adaptive, rateoptimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica, 69:599-631, 2001. [Koul et al., 1981lH. Koul, -, and et al. Regression analysis with randomly right censored data. Ann. Statist., 9:1276-1288, 1981. [Leurgans, 19871s. Leurgans. Linear models, random censoring and synthetic data. Biometrika, 74:301-309, 1987. [SBncliez-Sellero et al., 20051C. Srincliez-Sellero, -, and et al. Uniform representation of product-limit integrals with applications. Scand. J . Statist., 32:563-581, 2005. [Stute, 19931W. Stute. Consistent estimation under random censorship when CDvariables are present. J . Multivariate Anal., 45:89-103, 1993. [Stute, 1995lW. Stute. The central limit theorem under random censorship. Ann. Statist., 23:422-439, 1995. [Stute, 1996lW. Stute. Distributional convergence under random censorship when covariables are present. Scand. J. Statist., 23:461-471, 1996. [Stute, 1997lW. Stute. Nonparametric models checks for regression. Ann. Statist., 25:6 13-641, 1997. [Stute et al., 2OOO]W. Stute, -, and et al. Nonparametric model checks in censored regression. Comm. Statist. Theory Methods, 29:1611-1629, 2000. [Zheng, 198712. Zheng. A class of estimators of the parameters in linear regression with censored data. Acta Mathematicae Applicatae Sinica, 3:231-241, 1987. [Zheng, 19961J.X. Zheng. A consistent test of functional form via nonparametric estimation techniques. J . Econometrics, 75:263-289, 1996.
CHAPTER 7
Dynamical Systems / Forecasting
Application of the single index model for forecasting of the inland conveyances Eugene Kopytov’, Diana Santalova’ 1
Transport and Telecommunication Institute 1 Lomonosov Str., LV-1019, Riga, Latvia (e-mail: kopitovOtsi.1~) * Faculty of Transport and Mechanical Engineering Riga Technical University 1 Kalku Str., LV-1658, Riga, Latvia (e-mail: [email protected]) Abstract. In this paper the regression models used for description and forecasting of the inland rail passenger conveyances of the regions of Latvia were considered. Two estimation approaches were compared: the classical linear regression model and the single index model. Various tests for hypothesis of explanatory variables insignificance and model correctness have been lead, and the cross-validation approach has been carried out as well. The analysis has shown obvious preference of the single index model. Keywords: Passenger conveyances, Forecasting, Regression model, Single index model
1
Introduction
Comprehensive planning of transport company activity demands presence of set of the models that adequately describe functioning of the railway and application of various mathematical methods for forecasting of rail passenger conveyances. The models considered in the given paper allow analyzing passenger flows and forecasting the demand for passenger transportation along some particular sectors of the railway on the whole territory of Latvia [Kopytov and Demidovs, 20061. In the course of the suggested research linear regression model and the single index model (SIM) have been received. The object of consideration is inland rail passenger conveyances in the regions of Latvia In the present research different statistical information about passenger transportation and social-economic development of Latvian regions in the period of 2000 - 2003 is used [Latvijas dzelzcejS 2003, Statistical Yearbook of Latvia 20031. The calculations have been made in the packages STATISTICA 6.0 and MathCad 12. Principal aim of the research is to construct the linear model and the single index model, and then to choose from them one, which gives the better forecasts ~
268
Application of Single Index Model for Forecasting of Inland Conveyances
269
of conveyances. For that the described below cross-validation approach is used. We have used the residual sum of squares RSS for comparing the elaborating models [Srivastava 20021. Especially for the single index model the series of experiments is carried out with the aim to determine the optimal value of bandwidth h [Hardle et al. 2004, Santalova 20061. The paper is organized as follow. First of all the used regression models are considered from theoretical point of view, then the used experimental data are described. After that we consider the suggested group models for conveyances forecasting and results of carried out estimation and comparing analysis of these models. 2
Description of the Models
In present research all investigated models are group models [Andronov et al., 19831. In other words we are able to forecast inland rail passenger conveyances for all considered regions using the same models. With respect to used mathematical model we consider linear regression models and semiparametric regression models. The linear regression model can be represented as follows:
where
xi
= (1
PT =(Po xi,, ...
...
Pd)
is a vector of unknown coefficients,
is a vector of values of independent variables in i-th
observation. One of semiparametric models is used as well model [Hardle et al. 20041:
-
the single index regression
where g(.) is an unknown link function of one dimensional variable and
zi = P T x i is called an index. Here we assume only that unknown function m(x) is a smooth function. As g ( . ) function the kerneZ function usually is considered [Hardle et al. 20041. Therefore we need to estimate the unknown coefficients vector p and the
270
Recent Advances in Stochastic Modeling and Data Analysis
link function g ( . ) . For the latter, the Nadaraya-Watson kernel estimator can be applied:
i=l
where zi = (x -
p
is the value of index for the i -th observation, Y, is the
value of dependent variable for the i-th observation and Kh(-)is so called kernel function. As K , (.) we use the Gaussian function
where h is a bandwidth. The unknown parameter vector criterion:
p
is estimated by use of the least squares
For that we use the gradient method. The corresponding gradient is the following:
Application of Single Index Model f o r Forecasting of Inland Conveyances
271
and
is the first order derivative of the Gaussian kernel. We are able to compare single index models by the residual sum of squares RSS only. We calculate the residual sum of squares as follow:
where n is the number of observations, d is the number of estimated coefficients, is observed value and g ( x i ) is estimated value.
3
Creation the Models for the Inland Rail Passenger Conveyances Forecasting
As an example that shows an opportunity of the suggested models, research of influence of social and economic parameters of Latvian regions development on demand of passenger transport service is carried out. All the needed data have been obtained from the reports [Latvijas dzelzcelS 2003, Statistical Yearbook of Latvia 20031. For the experiments statistical information about 28 Latvian regions (districts and cities) in the period of 2000 - 2003 have been chosen. First of all, the forecasted variable is the inland rail passenger conveyances, expressed in hundreds of passengers. Let us denote it by Y. The considered explanatory factors, or predictors, are [Kopytov and Demidovs, 20061: xl - the population density; x 2 - the number of enterprises per a unit of territory; x3-the number of enterprises per 1000 residents; x4-the density of the unemployed population; x5 - the number of schools per a unit of territory; x6 - the number of buses per a unit of territory; x7- the number of buses per 1000 residents; x8 -the number of railway stations.
Now we describe the investigated regression models. The first model is the simple linear regression model (2) and the second model is the single index model (3). The dependent variables Y(1) in the linear model and Y(2) in the SIM are inland rail passenger conveyances. The explanatory variables are all eight mentioned above. So, we have two regression models and our task is to estimate the unknown coefficients β for the models, to compare the suggested models and to prove the preference of the semiparametric model. Taking into account the fact that the absolute preference of the SIM in the case of data smoothing has been shown in [Andronov et al. 2006, Santalova 2006], let us demonstrate only the obtained results for smoothing. We obtain the best variant of the linear model by using the Backward Stepwise mode of the Statistica 6.0 package. In the course of the calculations the number of prediction variables was decreased from 8 to 5. In the end, only the five most significant predictors are included in this model, namely x2, x3, x4, x5 and x8. RSS for this model is 42 430, the coefficient R² is equal to 0.48 and the calculated value of the Fisher criterion is 16.58. The theoretical value of the Fisher criterion for 5 and 89 degrees of freedom and level of significance α = 5% is 2.32. So, this model is adequate. The equation for the estimated linear model is:

    Ŷ(1)(x) = −255 − 82·x2 + 16·x3 − 34·x4 + 13033·x5 + 18·x8.   (10)

Now we discuss the results of estimation of the investigated single index model. Estimation of the coefficients β has been carried out with different bandwidths. Note that in the single index model only the most significant predictors are included, i.e. x2, x3, x4, x5 and x8. This restriction can also be explained by the computational limitations of the MathCad 12 package. The best chosen single index model with h = 0.5 and RSS = 8 877 can be written as:
    Ŷ(2)(x) = Σ_{i=1}^n K_h(β̂^T x − z_i) Y_i / Σ_{i=1}^n K_h(β̂^T x − z_i),

where the vector of estimated coefficients is β̂^T = (12  265  78  0.3  0.1). So, we can conclude that the chosen single index model with h = 0.5 in the case of smoothing gives more precise results than the linear model in the sense of RSS.
4 Cross-Validation Analysis
Now we consider the suggested models from another point of view. We have used the cross-validation approach. That means we estimate the unknown coefficients β for the models on the basis of statistical data from the period 2000-2002. Then, using the obtained estimates of β, we forecast the conveyances for 2003 and compare these forecasted conveyances with the real ones, i.e. we calculate RSS for both models. The optimal value of bandwidth h is found for the single index model as well. Table 1 contains the estimates of β for the considered linear regression model. The coefficient R² is equal to 0.42, the Fisher criterion is 10.29, and the residual sum of squares is 79 029. So, the investigated linear model is adequate in the case of cross-validation as well. The observed conveyances and the corresponding forecasts for the tested period are displayed in Figure 1.
Table 1. The estimates of β for the linear model

    Variable  | Estimate  | Std.Err. | t(70) | p-level
    Intercept | -223,30   | 121,94   | -1,83 | 0,071314
    x2        | -74,35    | 29,50    | -2,52 | 0,014008
    x3        | 14,24     | 9,43     | 1,51  | 0,135366
    x4        | -27,89    | 4,55     | -6,13 | 0,000000
    x5        | 11258,10  | 2229,49  | 5,05  | 0,000003
    x8        | 15,37     | 9,47     | 1,64  | 0,104623
Fig. 1. Forecasting by the linear model

Now let us analyze the single index model in detail. We begin with the choice of the bandwidth size. Our task is to find the optimal value of bandwidth h0 that gives a minimal value of RSS (see [Hardle et al. 2004]). A series of experiments was performed and the different estimates of β and values of RSS depending on various h were obtained as well. The corresponding results for the analysed single index model are shown in Table 2. We can see that all the estimates of β differ from each other depending on h, even though they were obtained from the same initial value of β. The values of RSS corresponding to various h are presented in Table 3. Thus, the best result for RSS is achieved for h0 = 2. As expected, the residual sum of squares increases when h is larger or smaller than the optimal value of h. The conveyances forecasted by the single index model with h0 = 2, as well as the observed conveyances, are shown in Figure 2.
Table 2. The estimates of β for the SIM

    Coefficient | h = 10  | h = 5  | h = 3 | h = 2 | h = 1
    x2          |  6253   |  2462  |  121  |  -38  |  -6
    x3          | -15700  |  7730  | 3177  | 2555  |  330
    x4          | 62 860  | 35 180 |  148  |  180  |  12
    x5          |   619   |    3   |  0.7  | 0.05  |  0.2
    x8          | 25 750  | 13 380 | 3 110 | 1 561 |  0.1
Table 3. The values of RSS for the SIM

    Bandwidth h | 10     | 5      | 3      | 2      | 1
    RSS         | 79 453 | 46 194 | 50 454 | 40 992 | 166 415
Obviously, the forecasted values are very close to the observed values in almost all the observations. Moreover, the SIM shows more precise forecasts than the linear one for regions with large conveyances.
Figure 2. Forecasting by the single index model

The values of RSS for both investigated models in the cases of smoothing and cross-validation analysis are collected in Table 4. As we can see, the values of RSS for the SIM are smaller in all the considered cases.
Table 4. The values of RSS

                 | Smoothing | Cross-Validation
    Linear Model | 42 430    | 79 029
    SIM          |  8 877    | 40 992
Conclusions

In the course of the suggested research several regression models, which allow evaluating the influence of the main social-economic factors on the volumes of passenger transportation by railway transport in the regions of Latvia, have
been obtained. Two group models were compared: the classical linear regression model and the single index model. Various tests of hypotheses on the insignificance of explanatory variables and on model correctness have been performed, and the cross-validation approach has been applied as well. The results of the analysis show the obvious preference of the single index model; in other words, the semiparametric approach has given better results than the classical parametric approach in the cases of both smoothing and forecasting.
References

[Andronov et al., 1983] A. Andronov et al. Forecasting of Passenger Conveyances on the Air Transport. Transport, Moscow, 1983 (in Russian).
[Andronov et al., 2006] A. Andronov, C. Zhukovskaya and D. Santalova. On Mathematical Models for Analysis and Forecasting of the European Union Countries Conveyances. In RTU zinātniskie raksti, Informācijas tehnoloģijas un vadības zinātne, 2006 (in print).
[Hardle et al., 2004] W. Härdle et al. Nonparametric and Semiparametric Models. Springer-Verlag, Berlin, 2004.
[Kopytov and Demidovs, 2006] E. Kopytov and V. Demidovs. Virtual Anticipatory Models in Decision Support System of Railway Transportation. In Daniel M. Dubois, editor, International Journal of Computing Anticipatory Systems. Published by CHAOS, Institute of Mathematics, University of Liege, 2006.
[Latvijas dzelzceļš, 2003] Annual Report of State Joint-Stock Company "Latvijas dzelzceļš" for 2003. Published by SJSC "Latvijas dzelzceļš", Riga, 2004 (in Latvian).
[Santalova 2006] D. Santalova. Forecasting of Railway Freight Conveyances in EU Countries on the Base of the Single Index Model. In Proceedings of RelStat'06, Riga, 2006 (in print).
[Srivastava 2002] M.S. Srivastava. Methods of Multivariate Statistics. John Wiley & Sons Inc, New York, 2002.
[Statistical Yearbook of Latvia, 2003] Statistical Yearbook of Latvia 2003. Published by Central Statistical Bureau of Latvia, Riga, 2003.
Development and Application of Mathematical Models for Internet Access Technology Substitution

Apostolos N. Giovanis¹ and Christos H. Skiadas²

¹ University of Teesside, Dept. of Business Studies, 8 Pellinis St. & 107 Patission Ave., 112 51 Athens, Greece
² Technical University of Crete, Dept. of Production Engineering & Management, 73 100 Chania, Greece
Abstract. Technological change is a dynamic and multidimensional phenomenon involving substitution and diffusion of technologies. In this process new technologies enter the market while the old ones are displaced. Various models have been constructed to explore the principles of technology substitution. In general there are two approaches to tackling the problem. The first approach uses the replacing factor as the measure of technology substitution, assuming that it is a function of the market share captured by the technology and/or the time elapsed since the technology's introduction into the economic system. The second approach models the competing technologies' penetration separately, where their intrinsic and competitive effects are explicitly represented in the relevant models. The purpose of this paper is to review the existing modeling approaches to technology substitution and analyze the factors affecting the interrelationship between the competing technologies. Finally, the models under review are used to describe the substitution of the dial-up by the broadband technology for internet connectivity purposes, and the models' fitting and forecasting performance is illustrated.

Keywords: Substitution Models, Technology Marketing, Internet Access Technologies.
Introduction

Rai (1999) defined technology substitution as the process by which an innovation is replaced partially or completely by another in terms of its market share over a period of time. The replacement of the old technology may be instantaneous or it may take a long time. The new technology may seem evolutionary or revolutionary depending upon the take-over time period, and each successive generation may find a new niche market by creating new customers. Both the new and the old technology in the market influence the penetration of each other. The final result of this battle depends on the attacker's and the defender's strengths and weaknesses. The study of new technologies' diffusion is an object of interest for researchers of market development and planners of technology development at the firm, regional and national levels. Considering the problems of
reaching relevant data on technological development, the majority of the modeling approaches focus on the time pattern of technology development, considering the market shares of competing technologies. The aim is the development of easy-to-handle models of technology substitution able to support the activities of explaining and forecasting the competing products' penetration levels. Ryan and Gross (1943) note two important features of this process: the very wide range of rates at which different innovations spread over time, and the general S-shaped diffusion of most innovations. Broadly speaking, the diffusion process consists of an introductory phase, when the diffusion rate is not very high, followed by a growth phase of relatively quick penetration. The third phase is called the maturity phase, in which the market share of the technology reaches its maximum value. In the fourth phase, the market share of that technology declines, caused by the emergence of a new, better technology. History has shown that the new technology emerges in the second phase of the old technology (Marchetti and Nakicenovic, 1979). The current paper refers to models with two competing technologies. In the first part of the paper, the most popular technology substitution models are presented and commented on. The models under investigation are separated into two categories: those which use the technology replacement factor as the dependent variable and those in which both the intrinsic and competitive effects are present in the relevant models. In the second section, the models are used to represent the substitution of dial-up by broadband technology using monthly data from the UK. The models' fitting and forecasting performance is illustrated and several conclusions are provided.
1 Review of Technology Substitution Models

Most approaches aiming to describe technology substitution are based on an analogy to the dissemination of information or to epidemic processes (diffusion of disease through contagion and/or external influence). This paper deals with two types of technology substitution models. The first approach uses the replacing factor as the measure of technology substitution, assuming that it is a function of the market share captured by the technology and/or the time elapsed since the technology's introduction into the economic system (Fisher-Pry type models). The second approach models the competing technologies' penetration separately, where their intrinsic and competitive effects are explicitly represented in the relevant models using appropriate parameters in the respective mathematical forms (Lotka-Volterra type models).

1.1 Fisher-Pry Type of Models
The first group includes the models which assume that, as the total population remains constant, the evolution of competing technologies is normalized by considering only the relevant market share in the total population. To analyze such cases, technology substitution models have been proposed by many researchers including Fisher-Pry, Blackman and Rai. They studied substitution on the basis of measuring the relative market share of the old versus the new technology competing in the market. The model proposed by Fisher-Pry (1971), which is the most widely used model to describe technology substitution processes, is based on the following assumptions: the substitution process is competitive; once substitution has progressed as far as a few percent, it will proceed until a complete takeover takes place; and the rate of substitution is proportional to the remaining market potential. It may be described by the following mathematical formula: df/dt = b·f·(1 − f), which on integration yields ln(f/(1 − f)) = a + b·t. Here f denotes the market share of the new technology, (1 − f) denotes the market share of the old technology, b is the intrinsic diffusion rate of the new technology and a represents the constant of integration, whose value is given by the initial fraction. Recently a class of new substitution models has been proposed by Rai (1999) to describe the substitution process. These models relax the third hypothesis of the Fisher-Pry model, assuming that the replacement factor is nonlinear in nature, and are based on the argument that influence is a function of both market share and time. The mathematical formulas describing these models, along with the Fisher-Pry model, are given below:
    Fisher-Pry:  df/dt = b·f·(1 − f)

    Parabolic:   df/dt = (a + b·t)·f·(1 − f)

    Power:       ln(f/(1 − f)) = a + b·t^(1+c)

    Exponential
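For concreteness, the original Fisher-Pry model can be calibrated by ordinary least squares on the logit of the market share, since ln(f/(1 − f)) is linear in t. The following Python sketch does this on synthetic data; the series and parameter values are illustrative assumptions, not the UK data used later in the paper.

    import numpy as np

    # synthetic market share following a Fisher-Pry (logistic) substitution path
    t = np.arange(48, dtype=float)                      # months
    a_true, b_true = -3.0, 0.12
    f = 1.0 / (1.0 + np.exp(-(a_true + b_true * t)))
    noise = 0.01 * np.random.default_rng(1).normal(size=t.size)
    f_obs = np.clip(f + noise, 1e-4, 1 - 1e-4)

    # ln(f/(1-f)) = a + b t  ->  fit a straight line to the logit
    logit = np.log(f_obs / (1.0 - f_obs))
    b_hat, a_hat = np.polyfit(t, logit, 1)
    print("a =", a_hat, "b =", b_hat)

    # forecast the new technology's share 12 months ahead
    t_new = np.arange(48, 60)
    f_forecast = 1.0 / (1.0 + np.exp(-(a_hat + b_hat * t_new)))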
1.2 Lotka-Volterra Type of Models
The basis of the second group of technology substitution models is the traditional diffusion laws, such as the logistic and Gompertzian ones. This group of models relaxes all the hypotheses posed earlier for the Fisher-Pry
type of models, since several forms of technology coexistence in the economic environment can be represented by them. The first model in this group is the well-known Lotka-Volterra model (LVM), which is based on logistic growth. Logistic growth has been popularly used to model intraspecific (within one species) competition. The LVM models interspecific (between species) competition, since it includes the effects of intraspecific competition but adds the competitive effect of another species. The interaction between two species (or technologies) can be expressed in general terms via the LVM. When two species, X and Y, are competing together in the same environment, the LVM equations become:

    dX/dt = X·(a_X − b_X·X − c_XY·Y),
    dY/dt = Y·(a_Y − b_Y·Y − c_YX·X).   (2)
This system of equations contains all the fundamental parameters that impact the rate of growth of both species. Namely, a_i, i = X, Y, is the logistic parameter for species i when it is living alone, and b_i, i = X, Y, is the limitation parameter of the niche capacity related to the niche size for species i. The values of c_XY and c_YX are related to the overlap. In other words, they represent the sales that one technology may lose due to the fact that the other technology won one. The system of equations (2) formulates a measurement of one's ability to attack, counterattack or retreat, as the case may be. The discrete form of equation (2) has been worked out by Leslie (1957) as follows:

    X(t+1) = α_X·X(t) / (1 + β_X·X(t) + γ_XY·Y(t)),
    Y(t+1) = α_Y·Y(t) / (1 + β_Y·Y(t) + γ_YX·X(t)),

where α_i = e^{a_i}. The discrete form of the LVM has been used by Modis (1997) and Hwang et al. (2005) in order to examine competition phenomena in technology marketing and financial markets, respectively. Assuming that X is the incumbent and Y the attacker, A represents the attacker's advantage and D the defender's
counterattack (Farrell, 1993). Smitalova and Sujan (1991) distinguished and labeled six types of competition according to the signs of the two coupling parameters involved in the LVM, which are given in Table 1.
Table 1: Competition Types based on LVM

    c_XY | c_YX | Type of Interaction
     -   |  -   | Pure competition
     +   |  +   | Mutualism/symbiosis
     +   |  0   | Commensalism/Parasitism
     -   |  0   | Amensalism
     0   |  0   | Neutralism
     +   |  -   | Predator - Prey
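A short Python sketch of the discrete Lotka-Volterra competition map may make the dynamics concrete. The parameter values below are illustrative assumptions chosen to produce a predator-prey-like substitution (c_XY > 0, c_YX < 0), not the estimates reported later in Table 4.

    import numpy as np

    def discrete_lvm(x0, y0, a, b, c, steps):
        # Leslie-type discretization: X(t+1) = e^{aX} X / (1 + bX X + cXY Y), etc.
        x, y = [x0], [y0]
        for _ in range(steps):
            xt, yt = x[-1], y[-1]
            x.append(np.exp(a[0]) * xt / (1 + b[0] * xt + c[0] * yt))
            y.append(np.exp(a[1]) * yt / (1 + b[1] * yt + c[1] * xt))
        return np.array(x), np.array(y)

    # incumbent X (dial-up-like) and attacker Y (broadband-like)
    x, y = discrete_lvm(0.9, 0.05,
                        a=(0.05, 0.40),     # intrinsic growth parameters
                        b=(0.10, 0.10),     # intraspecific limitation
                        c=(0.30, -0.05),    # c_XY > 0, c_YX < 0: predator-prey
                        steps=60)
    print(x[-1], y[-1])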
The second model in this group is called the GCM and is based on the Gompertz growth model. The Gompertz equation can be written as:

    dX/dt = X·(a − c·ln X),

where F is the saturation level of the diffusion process and c is the diffusion rate. A discrete form of the Gompertz equation can be written as (Dennis and Taper, 1994):

    X(t+1) = X(t)·exp(a − c·ln(X(t))).   (9)

In equation (9) the carrying capacity corresponding to the process saturation level is given by F = exp(a/c). The value of the constant a is determined by the size of the population. Assuming that two species, X and Y, are competing together in the same environment, the equations for the Gompertz Competition Model (GCM) are a direct extension of equation (9) and are given by the following formulas:

    X(t+1) = X(t)·exp(a1 − c1·ln(X(t)) + n·ln(Y(t))),
    Y(t+1) = Y(t)·exp(a2 − c2·ln(Y(t)) + m·ln(X(t))),   (10)
where c_i denotes the diffusion rate of technology i assuming no competition, a_i denotes the limitation parameter of the niche capacity related to the niche size of technology i, and n, m denote the interaction effects with the other technology. Environmental stochasticity can be introduced as follows:
    X(t+1) = X(t)·exp(a1 − c1·ln(X(t)) + n·ln(Y(t)) + w1·Z(t)),
    Y(t+1) = Y(t)·exp(a2 − c2·ln(Y(t)) + m·ln(X(t)) + w2·Z(t)),   (11)

where w1 and w2 are positive constants and Z(t) is normally distributed with mean 0 and variance 1. The two forms of equation (11) can be transformed to a logarithmic scale as follows:

    ln(X(t+1)) = ln(X(t)) + a1 − c1·ln(X(t)) + n·ln(Y(t)) + w1·Z(t),
    ln(Y(t+1)) = ln(Y(t)) + a2 − c2·ln(Y(t)) + m·ln(X(t)) + w2·Z(t).   (12)

Barreto (2003) distinguished and labeled four types of competition according to the values of the model's parameters involved in the GCM, which are given in Table 2.
Table 2: Competition Types based on GCM

    Condition         | Type of Interaction
    1/m < a1/a2 > n   | Technology X wins; technology Y is wiped out
    1/m > a1/a2 < n   | Technology Y wins; technology X is wiped out
    1/m > a1/a2 > n   | Both technologies coexist in stable equilibrium
    1/m < a1/a2 < n   | Both technologies coexist in unstable equilibrium
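The log-scale form (12) is convenient for simulation and for estimation by regression, since ln X(t+1) is linear in ln X(t) and ln Y(t). The following Python sketch simulates the stochastic GCM; all parameter values are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(42)
    T = 60
    lx, ly = np.empty(T), np.empty(T)
    lx[0], ly[0] = np.log(0.9), np.log(0.05)

    a1, c1, n_ = 0.02, 0.10, 0.03     # incumbent X
    a2, c2, m_ = 0.15, 0.10, -0.02    # attacker Y
    w1, w2 = 0.01, 0.01               # noise intensities

    for t in range(T - 1):
        z = rng.normal()              # common environmental shock Z(t)
        lx[t + 1] = lx[t] + a1 - c1 * lx[t] + n_ * ly[t] + w1 * z
        ly[t + 1] = ly[t] + a2 - c2 * ly[t] + m_ * lx[t] + w2 * z

    X, Y = np.exp(lx), np.exp(ly)
    print(X[-1], Y[-1])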
2 Empirical Analysis

The six models under review will be used to explain the substitution mechanism of dial-up by the broadband technology for internet connectivity purposes. Dial-up is the first internet access service; it uses a standard modem over the regular copper wiring, with a usual modem speed of 56K, though actual speeds vary greatly. Broadband is a higher-speed transmission of data over a connection that is "always-on" (upload speeds: 384 Kbps; download speeds: up to 24 Mbps). This means a broadband connection is online and ready to go 24 hours a day. Digital Subscriber Line (DSL) and cable modems are two examples of always-on technologies. The available data concern the monthly market shares of the two technologies in the UK from Jan 2001 until Dec 2005. The data for the two technologies' market shares were retrieved from http://www.statistics.gov.uk. The market shares of the two technologies and the relevant replacement factors are given in Figure 1. It can be seen that the market share of broadband overtook that of dial-up during the second quarter of 2005 and that the replacement factor is not a linear function of time. The first 48 months will be used for the models' calibration while the last 12 months will be used to compare their forecasting performance. The forecasting time-horizon is long enough considering the dynamism of the broadband technology.
For the Fisher-Pry type of models the dependent variable is the replacement factor, and the models' calibration outcomes are given in Table 3. All the models' parameters are statistically significant.
Figure 1: Internet Access Technologies Market Shares in UK (Dial-up MS, Broadband MS, Replacement Factor)
From the results it is obvious that the Exponential model presents the best fitting performance, followed by the Power and Parabolic models. The original Fisher-Pry model failed to explain the substitution process effectively, since the technology replacement factor is not a linear function of time.

Table 3: Fitting Performance of the Fisher-Pry Type of Models
The calibration statistics for the LVM and GCM are given in Table 4. The IV/2SLS method (Pindyck and Rubinfield, 1981) was used for the models' parameter estimation. The method showed quick convergence for both substitution models. Both models' parameters are statistically significant and support the predator-prey type of competition for this technology substitution process, since for the LVM c_XY and c_YX have different signs (c_XY > 0 and c_YX < 0), while for the GCM the ratio a1/a2 is greater than 1/m and less than n.
Table 4: Fitting Performance of the LVM and GCM (estimated coefficients and standard errors for both models; BB: Broadband, DU: Dial-up)
It can be seen that the LVM produces better results than the GCM, since the respective model's Sum of Squared Errors (SSE) is lower for both technologies. The resulting values of c_XY and c_YX show that every time Broadband gains a sale, Dial-up loses 0.026 of one, marking the rise of a revolutionary technology. The 12-month-ahead forecasting performance of all six technology substitution models is given in Table 5.
Table 5: Forecasting Performance of the Six Reviewed Models (forecast SSE for Dial-up and Broadband)
From Table 5 it can be seen that overall the LVM produces better results for both technologies, while the Power extension of the Fisher-Pry model predicts the Broadband penetration better. Both the calibration and forecasting performance of the six reviewed models are illustrated in Figure 2.
Figure 2: Fitting & Forecasting Performance of Reviewed Models
Conclusions

In this paper six models were presented, expressing the early and the latest approaches to representing the technology substitution process. The early modeling approaches use the technology replacement function as the substitution measure, which depends on the market share of the technology and/or the time elapsed since the new technology's launch. The latest modeling approaches measure the market share of the competing technologies with respect to their intrinsic diffusion rates as well as the competition effects. The models were applied to the dial-up technology
substitution by the broadband technology in the UK. The latest modeling approaches are more useful for understanding the how and when of the substitution mechanism, while the early ones are simpler and easier to apply in real conditions. The outcome of the empirical analysis is that the Lotka-Volterra Model shows overall greater explanatory and forecasting capabilities in representing substitution processes, since it is able to represent several types of competition.
References

Barreto, L. S. A Gompertzian Discrete Model for Tree Competition, Silva Lusitana, 11(1):77-89, 2003.
Dennis, B. and Taper, M. L. Density Dependence in Time Series Observations of Natural Populations: Estimation and Testing, Ecological Monographs, 64:205-224, 1994.
Farrell, C. Survival of the Fittest Technologies, New Scientist, 137:35-39, 1993.
Fisher, J. S. and Pry, H. A Simple Substitution Model for Technology Change, Technological Forecasting and Social Change, 2:75-88, 1971.
Hwang, S., Lee, S. and Oh, H. A Dynamic Competition Analysis of Stock Markets, Journal of Emerging Market Finance, 4:1-25, 2005.
Leslie, P. H. A Stochastic Model for Studying the Properties of Certain Biological Systems by Numerical Methods, Biometrika, 45:16-31, 1957.
Marchetti, C. and Nakicenovic, N. The Dynamics of Energy Systems and the Logistic Substitution Model, RR-79, IIASA, Laxenburg, Austria, 1979.
Modis, T. Genetic Reengineering of Corporations, Technological Forecasting and Social Change, 56:107-118, 1997.
Pindyck and Rubinfield. Econometric Models and Economic Forecasts, McGraw-Hill, 1981.
Rai, L. P. Appropriate Models for Technology Substitution, Journal of Scientific and Industrial Research, 58:14-18, 1999.
Ryan, B. and Gross, N. The Diffusion of Hybrid Seed Corn in Two Iowa Communities, Rural Sociology, 7:15-24, 1943.
Smitalova, K. and Sujan, S. A Mathematical Treatment of Dynamical Models in Biological Science, Ellis Horwood, West Sussex, England, 1991.
Exploring and Simulating Chaotic Advection: A Difference Equations Approach

C. H. Skiadas
Technical University of Crete, Chania, Crete, Greece

Abstract: This paper explores the chaotic properties of an advection system expressed in difference equations form. First, Aref's blinking vortex system is examined. Then several new directions are explored related to the sink problem (one central sink, two symmetric sinks, an eccentric sink and others). Chaotic forms with or without space contraction are presented, analyzed and simulated. Several chaotic objects are formulated, especially when special rotation angles or a complex sine rotation angle are introduced in the rotation-translation difference equations. Very interesting chaotic forms arise when elliptic rotation-translation equations are applied. The simulated chaotic images and attractors express several vortex-like forms arising in various situations, especially in fluid dynamics.

Keywords: Chaotic advection, The sink problem, Aref system, Rotation-translation equations, Rotation angle, Vortex, Vortex flow, Chaotic simulation.
1. Introduction

Questions addressed when dealing with chaotic advection go back to the nineteenth century and the development of hydrodynamics, especially the introduction of the Navier-Stokes equations (Claude Navier, 1821 and George Stokes, 1845). The vortex flow case and the related forms, including vortex lines and filaments, vortex rings, vortex pairs and vortex systems, can be found in the classical book by Horace Lamb, first published in 1879 [6]. However, the formulation of a theory that partially explains the vortex problem and gives results that coincide with real life situations is only a matter of recent years, along with the use of computer experiments. The introduction of terms like chaotic advection and the blinking vortex system came only in recent decades, in order to define and analyze specific vortex flow cases. In most cases the problem setting and solution followed the differential equations approach, which mostly was directed at solving a boundary value problem of a Navier-Stokes equation formulation. A few interesting cases are based on a difference equations analogue, aiming to explain the vortex flow problem simply and in more detail. However, the formulation and analysis of vortex flow problems by using the difference equations analogue can be very useful in several cases if a systematic study is applied. In this paper we follow the difference equations methodology by introducing rotation-translation difference equations and a non-linear rotation angle along with a space contraction parameter in order to study chaotic advection problems. The interconnections between the difference and the differential equations cases are also studied in specific cases.
2. The Sink Problem

2.1. Central sink

Consider a circular bath with a sink in the center at (x, y) = (0, 0). The water inside the bath is rotating counterclockwise. A colored fluid is injected at the periphery of the bath. Find the shape of the fluid filaments if the sink is open. Geometrically the problem is that of rotation with contraction following a parameter b < 1. The rotation-translation model is applied with the translation parameter a = 0. The equations of flow are:

    x_{t+1} = b(x_t cos φ_t − y_t sin φ_t),
    y_{t+1} = b(x_t sin φ_t + y_t cos φ_t).

The contraction in the radial direction (r = √(x² + y²)) is found from the last relation and the equations of flow:

    r_{t+1} = √(x_{t+1}² + y_{t+1}²) = b√(x_t² + y_t²) = b·r_t.

The rotation angle is assumed to follow a function of the form

    φ_t = c + d/r_t².

The space contraction is given by estimating the Jacobian of the flow, J = b². When b < 1 a particle moves from the periphery of the bath to the sink at the center of coordinates following spirals, as illustrated in Figure 1. The parameters selected are b = 0.85, c = 0, d = 0.4 and the initial point is at (x, y) = (1, 0). When the same case is simulated for particles entering from the periphery of the rotating system at times t = 0, 1, 2, ..., Figure 2 results. The spiral forms start from the periphery and are directed toward the central sink. It is also interesting that while the spiraling flow continues, colored concentric circles appear. These circles have smaller diameter or disappear as the rotation parameter d gets smaller. The parameter b also influences the spiral. Figure 3 illustrates an advection case for parameters b = 0.95, c = 0 and d = 0.01.
Figure 1. Spiral particle paths
Figure 2. Spiral forms directed to the sink
Figure 3. Spiral formation toward a central sink
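A direct Python sketch of this central-sink map reproduces the spiral paths. The iteration below uses the paper's parameters b = 0.85, c = 0, d = 0.4 and initial point (1, 0); the number of steps is an illustrative assumption.

    import numpy as np

    def sink_map(x, y, b=0.85, c=0.0, d=0.4):
        # rotation with contraction: angle phi_t = c + d / r_t^2
        r2 = x * x + y * y
        phi = c + d / r2
        return (b * (x * np.cos(phi) - y * np.sin(phi)),
                b * (x * np.sin(phi) + y * np.cos(phi)))

    x, y = 1.0, 0.0
    path = [(x, y)]
    for _ in range(60):
        x, y = sink_map(x, y)
        path.append((x, y))
    print(path[-1])   # the particle spirals into the sink at the origin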
2.2. The contraction process

From the above rotation-contraction equations and the very simple relation r_{t+1} = b·r_t it follows that the radial contraction is

    Δr_t = r_{t+1} − r_t = −(1 − b)·r_t.

The differential equation for the contraction process is found by observing that

    dr/dt ≈ Δr/Δt = (r_{t+1} − r_t) / ((t+1) − t) = −(1 − b)·r.

The resulting differential equation expressing the radial speed, ṙ = −(1 − b)·r, is solved to give

    r = r_0 e^{−(1−b)t},

where r_0 is the initial radius. As the equation for the rotation angle was given earlier, the movement is fully explained. The paths spiral toward the center. When the movement covers a full circle the new radius will be

    r = r_0 e^{−(1−b)·2π/φ}.
3. Eccentric Sink

In the following, the case of a circular bath with an eccentric sink is analyzed. The sink is located at (x, y) = (a, 0). The equations of flow are:

    x_{t+1} = b((x_t − a) cos φ_t − y_t sin φ_t),
    y_{t+1} = b((x_t − a) sin φ_t + y_t cos φ_t).

The rotation angle is assumed to follow an equation of the form

    φ_t = c + d/r_t²,

where r_t = √((x_t − a)² + y_t²). The limit argument

    (x_{t+1}, y_{t+1}) = (x_t, y_t) = (x, y)

will give the relation

    x² + y² = b²((x − a)² + y²),

or after transformation
    (x + ab²/(1 − b²))² + y² = (ab/(1 − b²))².   (5)

This is the equation of a circle with radius R = ab/(1 − b²) centered at (x, y) = (−ab²/(1 − b²), 0).
The flow is not symmetric. The colored fluid starting from the outer periphery of the bath approaches the sink within a few time periods, as illustrated in Figure 4. The parameters selected are a = 0.15, b = 0.85, c = 0 and d = 0.1. To simplify the process it is assumed that the colored fluid is introduced simultaneously at the periphery of the bath. Then gradually the circular form of the original colored line is transformed into a chaotic attractor located at the sink's center (x, y) = (a, 0). The attractor is quite stable in form and location. Changes are possible by changing the parameter values. The attractor also appears even if the colored particles are introduced into a small region of the bath, as presented in Figure 5. The colored particles are introduced in a square region (0.1 × 0.1) at the right end of the bath at (x, y) = (1, 0). The parameters selected are a = 0.15, b = 0.85, c = 0 and d = 0.8. As the vortex parameter d is higher than in the previous case, the chaotic attractor appears at the 6th time step of the process. The attractor is also larger than in the previous case.
Figure 4. Chaotic attractor in eccentric sink
Figure 5. Chaotic attractor in eccentric sink
4. Two Symmetric Sinks

4.1. Aref's blinking vortex system

Chaotic mixing in open flows is usually modeled by using the 'blinking vortex-sink system' invented by Aref, 1983 [1], 1984 [2] and Aref and Balachandar, 1986 [3]. Aref's system models the out-flow from a large bath tub with two sinks that are opened in an alternating manner, so that chaotic mixing takes place in the course of the process. To model the velocity field due to a sink we assume the superposition of the potential flows of a point sink and a point vortex. If z = x + iy is the complex coordinate in the plane of flow, the complex potential for a sinking vortex point is

    w(z) = −(Q + iK) ln|z − z_s|,

where z_s = (±a, 0), 2πQ is the sink strength and 2πK the vortex strength. The imaginary part of w(z) is the stream function

    Ψ = −K ln r − Qφ,

and the streamlines are logarithmic spirals defined by the function

    φ = −(K/Q) ln r + const.

The differential equations of motion in polar coordinates are

    ṙ = −Q/r,    φ̇ = K/r²,

and their solutions are

    r = √(r_0² − 2Qt)   and   φ = φ_0 − (K/Q) ln(r/r_0).

The flow of the system is fully characterized by the non-dimensional sink strength

    η = QT/a²

and the ratio of vortex to sink strength

    ξ = K/Q,

where T is the flow period and a is the distance of each sink from the center of coordinates. As indicated in the literature (Károlyi and Tél, 1997 [5], Károlyi et al., 2002 [4]), chaotic flow appears for parameter values η = 0.5 or larger and ξ = 10. More precisely, particles injected into the flow are attracted within a few time periods into a specific region (the attractor) of the flow system. Several studies have appeared in recent years investigating the phenomenon theoretically and experimentally. The theoretical studies also include simulations using large grids (1000×1000) and numerical solution of the general equations of flow. These studies suggest that the attractors are time periodic, in accordance with the time periodicity of the flow. However, if only one sink is used a stable attractor can be present, at least theoretically and according to simulation experiments, as presented above. This is modeled by investigating the geometry of the flow. First of all, Aref's blinking vortex system is applied in a rotating fluid. We select a counter-clockwise rotation. The symmetric sinks are located at (x, y) = (−a, 0) and (x, y) = (a, 0) and the time period is T = 1. According to this system the flow is not stationary and there are jumps in the velocity field at each half period T/2. In other words, a particle located at (−a, 0) appears at (a, 0) the next time period, as illustrated in Figure 6.
Figure 6. The two symmetric sinks model

The modeling we propose is to analyze a discrete time system, like Aref's system, by using the theory of difference equations and discrete systems. This looks more convenient and much simpler considering the geometry of this system. The model we seek must be a rotation-translation one with a parameter b < 1 expressing the gradual shortening of the radius r, which leads the particles to follow logarithmic spiral trajectories around the sinking vortices. Following the above theory, a rotation-translation model of this type is expressed by the difference equation:

    z_{t+1} = a + b(z_t − z_s) e^{iφ_t},

with z_s = −a. The above complex equation can be written as

    x_{t+1} + i y_{t+1} = a + b[(x_t + a) + i y_t](cos φ_t + i sin φ_t).

The system of iterative difference relations for x and y is obtained by equating the real and the imaginary parts on both sides of the last complex formula:

    x_{t+1} = a + b[(x_t + a) cos φ_t − y_t sin φ_t],
    y_{t+1} = b[(x_t + a) sin φ_t + y_t cos φ_t].

If a particle is located at position (x, y) = (−a, 0), the next point after time t = 1 will be located at (x_1, y_1) = (a, 0). The next problem is to define the form of the function of the angle φ. From the original differential equations of flow, the differential equation for φ is

    φ̇ = ηξ/r²,   where r = √((x + a)² + y²).

As the value of Δt is equal to the periodic time T = 1, the last equation for φ yields

    φ = ηξ/r².

Now it is very easy to find that for Aref's blinking vortex flow the value of c = π in order to have a half-cycle rotation from one sink to the other. The last equation for φ is also written as

    φ = c + ηξ/r² = c + d/r²,

where d = ηξ is the vortex strength. For the experiments presented in the literature η = 0.5, ξ = 10 and thus d = 5. However, the chaotic region is
(x,y) = ( a + 2ab cos(@),2absin(@)), where @ = d/(4a2).
The two main vortex forms can be separated when the parameter d expressing the vortex strength is relatively small. Such a case is presented in the next Figure 8. The parameter d = 1 while the other parameters remain the same with the previous example. The attractor is now completely separated into two chaotic vortex forms (attractors).
Figure 7. Chaotic attractor in the two-sink problem
Figure 8. Two distinct vortex forms (d = 1)
Another idea is to give high values to the parameter d expressing the vortex strength. The selection of the value d = 2π for the vortex strength parameter leads to a more complicated vortex form, as presented in Figure 9. There are three equilibrium points for times t = 1, 2, 3. The first of these points is the center of the right-hand-side sink.
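The rotation-translation map above is straightforward to iterate. A minimal Python sketch for the two-sink case follows, using the paper's parameters a = 1, b = 0.8, c = π, d = 3; the number of particles, the injection circle and the number of steps are illustrative assumptions.

    import numpy as np

    a, b, c, d = 1.0, 0.8, np.pi, 3.0

    def blink_step(x, y):
        # rotation-translation map around the sink at (-a, 0), then jump to (a, 0)
        r2 = (x + a)**2 + y**2
        phi = c + d / r2
        xn = a + b * ((x + a) * np.cos(phi) - y * np.sin(phi))
        yn = b * ((x + a) * np.sin(phi) + y * np.cos(phi))
        return xn, yn

    # particles injected on a circle of radius 3 around the origin
    theta = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
    x, y = 3 * np.cos(theta), 3 * np.sin(theta)
    for _ in range(20):
        x, y = blink_step(x, y)
    print(x[:3], y[:3])   # (x, y) now trace the chaotic attractor of Figure 7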
Figure 9. The chaotic attractor with strong vortex strength parameter d = 2π

5. Conclusions

In this paper we examined the chaotic properties of chaotic advection systems starting from the classical Aref's blinking vortex system. The study followed a difference equations methodology which is, in several cases, simpler and more instructive than the differential equations analogue. We analyzed and applied a rotation-translation set of difference equations with a dynamical non-linear rotation angle. The resulting chaotic images and attractors express several vortex-like forms arising in various situations, especially in fluid dynamics.
References

[1] H. Aref. Integrable, chaotic, and turbulent vortex motion in two-dimensional flows. Ann. Rev. Fluid Mech., 15:345, 1983.
[2] H. Aref. Stirring by chaotic advection. J. Fluid Mech., 143:1, 1984.
[3] H. Aref and S. Balachandar. Chaotic advection in a Stokes flow. Phys. Fluids, 29:3515-3521, 1986.
[4] G. Károlyi, I. Scheuring, and T. Czárán. Metabolic network dynamics in open chaotic flow. Chaos, 12(2):460-469, 2002.
[5] G. Károlyi and T. Tél. Chaotic tracer scattering and fractal basin boundaries in a blinking vortex-sink system. Physics Reports, 290:125-147, 1997.
[6] H. Lamb. Hydrodynamics. Cambridge University Press, Cambridge, 1879.
CHAPTER 8
Modeling and Stochastic Modeling
Likelihood Ratio Tests and Applications in 2D Lognormal Diffusions*

Ramón Gutiérrez¹, Concepción Roldán², Ramón Gutiérrez-Sánchez¹, and José M. Angulo¹

¹ Department of Statistics and Operations Research, University of Granada, Campus de Fuentenueva, s/n, E-18071 Granada, Spain (e-mail: rgjaimez@ugr.es, ramongs@ugr.es, jmangulo@ugr.es)
² Department of Statistics and Operations Research, University of Jaén, Las Lagunillas, s/n, E-23071 Jaén, Spain (e-mail: iroldan@ujaen.es)
Abstract. In a previous work by the authors, maximum likelihood estimators (MLEs) were obtained for the drift and diffusion coefficients characterizing 2D lognormal diffusion models involving exogenous factors affecting the drift term. Such models are well known to describe properly the behaviour of real phenomena of interest, for instance, in geophysical and environmental studies. The present paper provides the distribution of these MLEs, the Fisher information matrix, and the solution to some likelihood ratio tests of interest for hypotheses on the parameters weighting the relative effect of the exogenous factors.

Keywords: Diffusion Random Field, Likelihood Ratio Test, Lognormal Diffusion Process, Exogenous Factor.
1 Introduction

The usefulness of diffusion random fields in describing, for example, economic or environmental phenomena, has led to significant developments, particularly regarding inferential aspects. In that respect, building on the theoretical foundations for 2D diffusions given in [Nualart, 1983], we considered 2D lognormal diffusions involving exogenous factors affecting the drift term. We obtained maximum likelihood estimators (MLEs) for the drift and diffusion coefficients, which characterize these diffusions under certain conditions (see [Gutiérrez et al., 2005]). Using these MLEs, we developed techniques for estimation, prediction and conditional simulation of 2D lognormal diffusions in [Gutiérrez et al., 2007]. In this paper, the results obtained previously are completed with the derivation of the distribution of the MLEs and the Fisher information matrix.

This work has been partially supported by projects MTM2005-09209 and MTM2005-08597 of the DGI, Ministerio de Educación y Ciencia, and projects FQM-2271 and FQM-990 of the Andalusian CICYE, Spain.
However, the main interest of this work is to solve some likelihood ratio tests for the parameters weighting the effect of the exogenous factors, which are relevant for possible applications. The contents are organized as follows. First, the 2D lognormal random field model is introduced. Second, estimation of the drift and diffusion coefficients based on a discrete finite set of data is given. Next, the distribution of the MLEs and the Fisher information matrix are calculated. Finally, some likelihood ratio tests for the parameters involved in the formulation of the drift term are solved.
2 Lognormal Diffusion Random Fields

[Nualart, 1983] considered a class of two-parameter random fields which are diffusions on each coordinate and satisfy a particular Markov property related to the partial ordering in R²₊. Using this theory, we introduced a 2D lognormal diffusion random field as follows (see [Gutiérrez and Roldán, 2007]). Let {X(z) : z = (s,t) ∈ I = [0,S] × [0,T] ⊂ R²₊} be a positive-valued Markov random field, defined on a probability space (Ω, A, P), where X(0,0) is assumed to be constant or a lognormal random variable with E[ln X(0,0)] = φ₀ and Var(ln X(0,0)) = σ₀². The distribution of the random field is determined by the following transition probabilities:
    p(B, (s+h, t+k) | (x₁, x, x₂), z) = P[X(s+h, t+k) ∈ B | X(s, t+k) = x₁, X(z) = x, X(s+h, t) = x₂],

where z = (s,t) ∈ I, h, k > 0, (x₁, x, x₂) ∈ R³₊, and B is a Borel subset. We suppose that the transition densities exist and are given by
    g(y, (s+h, t+k) | (x₁, x, x₂), z) = (1 / (y √(2π σ²_{z;h,k}))) exp{ −(ln y − ln(x₁x₂/x) − m_{z;h,k})² / (2 σ²_{z;h,k}) },

for y ∈ R₊, with

    m_{z;h,k} = ∫_s^{s+h} ∫_t^{t+k} ã(σ, τ) dσ dτ,
    σ²_{z;h,k} = ∫_s^{s+h} ∫_t^{t+k} B̃(σ, τ) dσ dτ,
and ã, B̃ being continuous functions on I. Under these conditions we can assert that {X(z) : z ∈ I} is a lognormal diffusion random field. The associated one-parameter drift and diffusion coefficients are given by

    a₁(z)x := (ã₁(z) + B̃₁(z)/2) x,    B₁(z)x² := B̃₁(z) x²,
    a₂(z)x := (ã₂(z) + B̃₂(z)/2) x,    B₂(z)x² := B̃₂(z) x²,
for all z = (s,t) ∈ I, x ∈ R₊. The random field {Y(z) : z ∈ I} defined as Y(z) = ln X(z) is then a Gaussian diffusion random field, with ã and B̃ being, respectively, the drift and diffusion coefficients, and ã₁, ã₂, B̃₁ and B̃₂ being the corresponding one-parameter drift and diffusion coefficients. Furthermore, if z, z′ ∈ I, z = (s,t), z′ = (s′,t′), then

    m_Y(z) := E[Y(z)] = φ₀ + ∫₀^s ∫₀^t ã(σ, τ) dσ dτ,
    σ²_Y(z) := Var(Y(z)) = σ₀² + ∫₀^s ∫₀^t B̃(σ, τ) dσ dτ,
    σ_Y(z, z′) := Cov(Y(z), Y(z′)) = σ²_Y(z ∧ z′),

where we write z ∧ z′ for (s ∧ s′, t ∧ t′), with '∧' denoting the minimum. Henceforth we will assume that the conditions usually considered for estimation of the drift and diffusion coefficients in the one-parameter case hold; that is, P[ln X(0,0) = φ₀] = 1 (i.e. σ₀² = 0) and σ²_Y(z) = Bst, z = (s,t) ∈ I.
3 Inference in the 2D Lognormal Diffusion Model

Let {X(z) : z ∈ I} be a lognormal diffusion random field. Data X = (X(z₁), ..., X(z_n))^t are assumed to be observed at known spatial locations z₁ = (s₁,t₁), z₂ = (s₂,t₂), ..., z_n = (s_n,t_n) ∈ I. Let x = (x₁, x₂, ..., x_n)^t be a sample. Let us consider the log-transformed n-dimensional random vector Y = (Y(z₁), Y(z₂), ..., Y(z_n))^t = (ln X(z₁), ln X(z₂), ..., ln X(z_n))^t = ln X, and the log-transformed sample y = (y₁, y₂, ..., y_n)^t = ln x.

3.1 MLEs for the Drift and Diffusion Coefficients Using Exogenous Factors

Suppose that the drift coefficient ã of Y is a linear combination of several known functions {h₁(z), ..., h_p(z) : z ∈ I}, with real coefficients φ₁, ..., φ_p:

    ã(z) = Σ_{α=1}^p φ_α h_α(z),   z ∈ I.
Defining, for z = (s,t) ∈ I,

    f_α(z) = ∫₀^s ∫₀^t h_α(σ, τ) dσ dτ,   α = 1, ..., p,

the mean of Y is given by

    m_Y(z) = φ₀ + Σ_{α=1}^p φ_α f_α(z).

Thus, denoting F = (f₀, f₁, ..., f_p), with f_α = (f_α(z₁), f_α(z₂), ..., f_α(z_n))^t for α = 0, 1, ..., p, and φ = (φ₀, φ₁, ..., φ_p)^t, we have

    m_Y = φ₀f₀ + φ₁f₁ + ... + φ_pf_p = Fφ.
Let us write

    Σ_Y = B·M,

where M is the n × n matrix with entries M_{ij} = (s_i ∧ s_j)(t_i ∧ t_j), so that its diagonal elements are s_i t_i.
With this notation, the MLEs for the drift and diffusion coefficients are, respectively,
    φ* = (φ₀*, φ₁*, ..., φ_p*)^t = (F^t M⁻¹ F)⁻¹ F^t M⁻¹ ln x   (1)

and

    B* = (1/n) (ln x − m_Y*)^t M⁻¹ (ln x − m_Y*),   (2)
where m_Y* = Fφ* (see [Gutiérrez et al., 2005]).
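These two estimators are generalized least squares quantities and can be computed directly. The following Python sketch evaluates (1) and (2) for data simulated on a grid; the grid, the exogenous function and all parameter values are illustrative assumptions.

    import numpy as np

    # observation locations z_i = (s_i, t_i) on a small grid
    s, t = np.meshgrid(np.linspace(0.2, 1.0, 5), np.linspace(0.2, 1.0, 5))
    s, t = s.ravel(), t.ravel()
    n = s.size

    # covariance structure: Sigma_Y = B * M, M_ij = min(s_i,s_j) * min(t_i,t_j)
    M = np.minimum.outer(s, s) * np.minimum.outer(t, t)

    # design matrix F = (f_0, f_1) with f_0 = 1 and one exogenous term f_1(z) = s*t
    F = np.column_stack([np.ones(n), s * t])

    # simulate Y = ln X ~ N(F phi, B M) for known phi, B
    rng = np.random.default_rng(7)
    phi_true, B_true = np.array([1.0, 0.5]), 0.2
    L = np.linalg.cholesky(B_true * M)
    y = F @ phi_true + L @ rng.normal(size=n)

    Mi = np.linalg.inv(M)
    phi_star = np.linalg.solve(F.T @ Mi @ F, F.T @ Mi @ y)   # eq. (1)
    resid = y - F @ phi_star
    B_star = (resid @ Mi @ resid) / n                        # eq. (2)
    print(phi_star, B_star)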
Remark. In many practical applications, a polynomial trend provides a suitable representation for the drift surface,

    m(z) = Σ_{0 ≤ k+l ≤ r} φ_{kl} s^k t^l,   z = (s,t),

for some appropriate choice of r.

3.2 Distribution of the MLEs φ* and B*
The expected value of the MLE φ* is

    E[φ*] = (F^t M⁻¹ F)⁻¹ F^t M⁻¹ E[ln X] = (F^t M⁻¹ F)⁻¹ F^t M⁻¹ E[Y] = (F^t M⁻¹ F)⁻¹ F^t M⁻¹ Fφ = φ,

and then φ* is unbiased. Taking into account that Y ~ N(Fφ, BM), it is clear that the distributions of the estimators are given by

    φ* ~ N(φ, B (F^t M⁻¹ F)⁻¹)   (3)

and

    nB*/B ~ χ²_{n−p−1}.   (4)

The last estimator is biased. Therefore, we consider the following transformation:

    B** = (n/(n−p−1)) B* = (1/(n−p−1)) (ln X − Fφ*)^t M⁻¹ (ln X − Fφ*),   (5)

which is an unbiased estimator of B. In addition, B* and φ* are independent and
    Var(B**) = (n/(n−p−1))² Var(B*) = 2B²/(n−p−1).

Therefore, the covariance matrix of φ* and B** is

    [ B (F^t M⁻¹ F)⁻¹      0
      0^t                  2B²/(n−p−1) ],

with 0^t = (0, ..., 0) of length p+1.
3.3 Fisher Information Matrix

We first calculate

    ∂ln L/∂φ = (1/B) F^t M⁻¹ (ln X − Fφ),

    ∂ln L/∂B = −n/(2B) + (1/(2B²)) (ln X − Fφ)^t M⁻¹ (ln X − Fφ),

    ∂²ln L/∂B² = n/(2B²) − (1/B³) (ln X − Fφ)^t M⁻¹ (ln X − Fφ),

and

    E[∂²ln L/∂B²] = n/(2B²) − 2(n+p−1)/(2B²) = −(n+2p−2)/(2B²).

Therefore, the Fisher information matrix is

    I(φ, B) = [ (1/B) F^t M⁻¹ F      0
                0^t                  (n+2p−2)/(2B²) ].
4 Parametric Hypothesis Testing

We split the vector φ = (φ₀, φ₁, ..., φ_p)^t as follows:

    φ = (φ₁^t, φ₂^t)^t,

where φ₁ is p₁ × 1 and φ₂ is p₂ × 1, with p₁ + p₂ = p + 1. We are interested in testing the hypothesis

    H₀ : φ₁ = φ̃₁,

where φ̃₁ is a p₁ × 1 fixed vector, versus H₁ : φ₁ ≠ φ̃₁. The total parameter region and the region associated with the null hypothesis are, respectively, Ω = {(φ, B) : φ ∈ R^{p+1}, B > 0} and ω = {(φ, B) ∈ Ω : φ₁ = φ̃₁}.
Under these hypotheses, the likelihood is maximized over Ω and over ω, and the likelihood ratio statistic Λ for testing H₀ is the ratio of the two maximized likelihoods. For obtaining the distribution of this statistic, we first denote A = F^t M⁻¹ F and C = F^t M⁻¹ ln X and consider the partitions of A and C conformal with the split of φ into (φ₁, φ₂). On the other hand, we can write

    ln x − Fφ = (ln x − Fφ*_Ω) + (Fφ*_Ω − Fφ).

Using the previous notation, the likelihood ratio statistic is finally written in terms of the quadratic forms of these two components,
where the quadratic form in (φ*₁Ω − φ̃₁) is distributed as W₁(p₁, B) and is distributed independently of the residual quadratic form (ln x − Fφ*_Ω)^t M⁻¹ (ln x − Fφ*_Ω) (see [Anderson, 2003]). This means that the distribution of Λ^{2/n} is the same as the distribution of

    U / (U + V),

where U and V are independent random variables with distributions W₁(n − p − 1, B) and W₁(p₁, B), respectively.

4.1 Some interesting contrasts
If we consider the null hypothesis

    H₀ : φ₁ = 0,   with 0^t = (0, ..., 0),

then we have

    ((1 − Λ^{2/n}) / Λ^{2/n}) · ((n − p − 1) / p₁) ~ F_{p₁, n−p−1}.
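In practice this contrast reduces to an F test comparing the restricted and unrestricted generalized least squares fits. The Python sketch below, a continuation of the earlier simulation sketch, computes the statistic and its p-value; the partition of φ and the use of scipy.stats.f are illustrative assumptions.

    import numpy as np
    from scipy.stats import f as f_dist

    def gls_rss(F, Mi, y):
        # residual quadratic form of the GLS fit y ~ F phi with weight M^{-1}
        phi = np.linalg.solve(F.T @ Mi @ F, F.T @ Mi @ y)
        r = y - F @ phi
        return r @ Mi @ r

    def lr_f_test(F, Mi, y, keep):
        # H0: the coefficients outside `keep` (the phi_1 block) are zero
        n, q = F.shape                       # q = p + 1
        p1 = q - len(keep)
        rss_full = gls_rss(F, Mi, y)
        rss_null = gls_rss(F[:, keep], Mi, y)
        stat = ((rss_null - rss_full) / p1) / (rss_full / (n - q))
        return stat, f_dist.sf(stat, p1, n - q)

    # example: test whether the exogenous coefficient is zero,
    # reusing F, Mi and y from the previous sketch
    # stat, pval = lr_f_test(F, Mi, y, keep=[0])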
5 Conclusions

Considering a 2D lognormal diffusion model, in this paper we have calculated the distribution of the MLEs of the drift and diffusion coefficients and the Fisher information matrix, and solved some likelihood ratio tests for hypotheses on the parameters weighting the relative effect of exogenous factors affecting the drift. The results obtained are important for real applications; in particular, for prediction and conditional simulation following the techniques described in [Gutiérrez et al., 2007].
References

[Anderson, 2003] T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley & Sons, New Jersey, 3rd edition, 2003.
[Gutiérrez and Roldán, 2007] R. Gutiérrez and C. Roldán. Some analytical and statistical aspects related to 2D lognormal diffusion random fields. Scientiae Mathematicae Japonicae Journal Online, e-2007:341-360, 2007.
[Gutiérrez et al., 2005] R. Gutiérrez, C. Roldán, R. Gutiérrez-Sánchez, and J.M. Angulo. Estimation and prediction of a 2D lognormal diffusion random field. Stochastic Environmental Research and Risk Assessment, 19:258-265, 2005.
[Gutiérrez et al., 2007] R. Gutiérrez, C. Roldán, R. Gutiérrez-Sánchez, and J.M. Angulo. Prediction and conditional simulation of a 2D lognormal diffusion random field. Methodology and Computing in Applied Probability, DOI 10.1007/s11009-007-9029-3, 2007.
[Nualart, 1983] D. Nualart. Two-parameter diffusion processes and martingales. Stochastic Processes and their Applications, 15:31-57, 1983.
Cartographical Modeling as a Statistical Method for Monitoring of the Spatial Behaviour of Population

Irina Pribytkova¹

¹ Institute of Sociology of the National Academy of Sciences of Ukraine, Shovkovichnaja str., 12, Kiev, Ukraine

Abstract. Cartographical modeling belongs to the system of common scientific methods we use in the search for new knowledge and its proving. The study of spatial relations is based on a map providing the most complete description and comprehension of any territorial problems. A map gives new information of a higher order on mapped phenomena which is hidden in the initial figures. This new information, obtained due to generalization of statistics, is of particular value to scientific research and practical needs. The process of generalization results in the discovery of cartographical structures forming a certain system. Analysis of these structures enables the revelation of spatial regularities in the disposition, proportion, combination and dynamics of sociodemographic and socioeconomical processes and phenomena. Besides, cartographical modeling provides the transition from discrete to continuous knowledge. This is the only method to obtain a continuous picture of spatially unbroken phenomena on the basis of discrete factual information (Aslanikashvili A., 1974). The importance of the uninterrupted knowledge contained in the cartographical model is conditioned not only by its possibility to reveal the changes of the investigated process or phenomenon "from place to place" but also by its potential to bring to light significant spatial relations between them and other social and natural processes and phenomena represented in the given model (map). The new knowledge obtained in the course of modeling serves as a basis for working out management decisions. The comparison of identical models for a few years in succession gives us a notion of the nature and rate of changes and development of spatial structures. Cartographical modeling may be regarded as one modification of latent structure analysis, which pursues the object of revealing and distinguishing latent groups of population with peculiar social organization, material and cultural consumption, goals, preferences and behaviour. The permanent observation of current statistical information over a long time creates the necessary grounds for the organization of a data base. The collection of statistical data, their standardization and the compiling of series of relevant maps are integral parts of monitoring as a system of supervision and control over the processes of spatial behaviour of population.
The scientific programme of monitoring also includes the working out of prognoses concerning eventual changes in the course of the spatial self-organization of people, providing it with the necessary information about possible unfavourable consequences, and appraisals of regulation decisions and their efficiency. The present paper contains an analysis of the spatial behaviour of the rural population in Ukraine since the seventies, carried out by means of cartographical modeling of statistical data in the monitoring regime.

Keywords: Cartographical Modeling, spatial relations, map, factual information, continuous knowledge, latent structure analysis, statistical information, permanent observation, supervision, control.
1 The common conception of the method

Cartographical modeling belongs to the system of common scientific methods used in the search for new knowledge and its proving. The study of spatial relations is based on a map providing the most complete description and comprehension of any territorial problems. Many researchers perceive maps only as illustrations for text or figures, having no independent scientific value. Meanwhile the map is none other than an information system, a channel for transmitting spatial information. The cartographical language can be regarded as a peculiar sign system: cartographical images are the primary means of transferring the information. Giving rise to visual notions, cartographical images enable one to obtain an integral perception of spatial structures. When analyzing the figures in tables, the thought and attention of the researcher are distracted from the general to the particular. Cartographical images, as a means of transmitting spatial relations, prove to be immeasurably stronger than algebraic symbols, because the object mode of thinking is more effective than the formal one. The cartographical language has many positive qualities. It is universal and easily overcomes all speech barriers. It has a laconic and capacious character and enables one to express judgments in a lapidary form. And finally, it has a two-dimensional nature. All these properties of the map language enlarge to a considerable extent the information capacities of a cartographical model. Just this two-dimensional character of the cartographical linguistic system affords the possibilities for the investigation of spatial relations. A map gives new information of a higher order on mapped phenomena which is hidden in the initial figures. This new information, obtained due to generalization of statistics, is of particular value to scientific research and practical needs. The process of generalization results in the discovery of cartographical structures forming a certain system. Analysis of these structures enables the revelation of spatial regularities in the disposition, proportion, combination and dynamics of sociodemographic and socioeconomical processes and phenomena.
2 The cognition properties of the cartographical model

The cartographical model represents a scientific abstraction obtained as a consequence of the generalization of the concrete properties of the studied objects. The abstract character of the generalized cartographical model A. Berliant defines as one of the main positive qualities of a map: applying to its contents, the researcher can observe not only separate or systematized factual statistics but also an integral spatial image. He meets with a "system of notions" appearing in the course of map composition and fixed in the cartographical model and its legend (Berliant, p. 19). Besides, cartographical modeling provides the transition from discrete to continuous knowledge. This is the only method to obtain a continuous picture of spatially unbroken phenomena on the basis of discrete factual information (Aslanikashvili A., 1974). The importance of the uninterrupted knowledge contained in the cartographical model is conditioned not only by its possibility to reveal the changes of the investigated process or phenomenon "from place to place" but also by its potential to bring to light significant spatial relations between them and other social and natural processes and phenomena represented in the given model (map). The new knowledge obtained in the course of modeling serves as a basis for working out management decisions.
3 The methodological premises
It is expedient from the methodological point of view to study spatial structures and the regularities of their development on the basis of cartographical analysis of small administrative units - rural administrative regions and, within their bounds, village councils. Regions and provinces are too large territorial units for such an investigation: cartographical modeling on the scale of provinces permits only a macrodivision of the territory into districts by the examined attribute and reveals only the levels of its intensity in the spatial aspect. The preparation of initial data is a preliminary stage of cartographical modeling. The relevant methods are defined by the goals and tasks of modeling in each concrete case. The method of data grouping, for example, is used as a basic principle when composing the statistical maps known as cartograms, which are instrumental in revealing regularities in the spatial distribution of mapped attributes. The grouping of statistical data is carried out simultaneously by geographical attribute and by the size of the indices. The correct choice of the value intervals of the statistical indices used in the grouping procedure is an especially important stage in working out the cartogram.
The precision of the spatial model depends on the size of these intervals. It is expedient to select them by the method of consecutive approximation. On the one hand, such an order helps to avoid excessive, unnecessary detail in the picture of the spatial structure of the modeled process or phenomenon and, on the other hand, ensures an adequate representation of its essential features. There is no need to use any standardized scale of gradation or purely mechanical statistical procedures. In each concrete case, when defining the limits of the intervals, one must take into consideration not only the range of the value distribution proper but also the necessity of showing the existing differences in the spatial structure of the investigated phenomena (processes, structures).
4 Cartographical modeling of latent structures
Cartographical modeling may be regarded as a modification of latent structure analysis, whose object is to reveal and distinguish latent groups of the population with a peculiar social organization, material and cultural consumption, goals, preferences and behaviour. The analysis of a latent structure starts with an appraisal of the empirical data and the formulation of a hypothesis on the presence of a few definite groups of population forming the latent structure. The proposed hypothesis is then verified statistically on the basis of factual data. The model of latent structure tests the presence of the postulated groups, but deeper penetration into the essence of the problem calls for additional information. The revelation of latent structures as an instrument of analysis may also be of use when studying population attitudes towards different problems, for the statistical interpretation of regional distinctions in the structure of people's consumption, for explaining the intensity of population movement within urban and rural areas, for the estimation of life conditions inside cities and their suburbs, etc. The data for the study of population behaviour or the measurement of social structure parameters may be obtained in the course of sociological surveys of public opinion or from current statistical returns; the latter source is preferable. A cartographical model worked out on the basis of current statistics on the size and structure of the rural population enables the revelation of a system of regions with a specific socio-group organization of their inhabitants. In other words, this model confirms the existence of latent groups of population in the countryside and indicates their localization in space.
5 Monitoring of the spatial behaviour of the population
The permanent observation of current statistical information over a long time creates the necessary prerequisites for organizing a data base. The collection of statistical data, their standardization and the compiling of series of relevant maps are integral parts of monitoring as a system of supervision and control over the processes of spatial behaviour of the population. The statistical data appropriate for the analysis should meet the demands of the highest possible spatial detail, uniformity, simultaneity, authenticity, continuity and comparability. The infringement of at least one of these demands lowers the cognitive value of the initial information and yields probable rather than true knowledge of the research subject. The scientific programme of monitoring also includes working out prognoses of eventual changes in the course of the spatial self-organization of people, providing it with the necessary information on possible unfavorable consequences, and appraising regulation decisions and their efficiency. The revealing of regularities in the behaviour of the rural population of Ukraine in the process of spatial self-organization is based on the analysis of statistics for the last forty years. In so far as the territorial movement of population is closely bound up with changes of place and character of labour (in other words, it is the movement of mainly able-bodied contingents), only the rural population of working age was chosen as the object of modeling. The cartographical models of the spatial disposition of the rural population of able-bodied age in Ukraine give a clear view of the geographical location of rural inhabitants and their concentration or dispersion in definite regions of the countryside. The comparison of identical models for a few years in succession gives a notion of the nature and rate of the changes and development of spatial structures and discovers important spatial relations between the movements of rural inhabitants and urbanization in Ukraine. We come to the conclusion that the concentration of the urban and the concentration of the rural population within the bounds of Ukraine are two sides of one and the same process of urbanization. Towns and cities perform in this process the role of peculiar nuclei of crystallization for new socio-spatial structures of rural resettlement. The large cities, with strong economic potential and diverse functions, have the greatest influence on the level of concentration of rural able-bodied inhabitants and on the area of their location in the suburbs. At the same time, one can observe the rise of separate hotbeds, and then of whole zones, where the dispersion of the rural population and destructive demographic processes advance at high speed. The deep changes in the structure of the spatial self-organization of the rural population take place over a long period of time. The main point of this process lies in the permanent deepening of its territorial polarization.
The demographic consequences of the spatial self-organization of country people are highly various and closely tied; they are evinced first of all in the different types of dynamics and age structure of the rural population in the regions of concentration and dispersion. A rise in the number of rural inhabitants of able-bodied age and an increase in the number of large villages in the suburbs are accompanied by an improvement of the demographic structure of the rural population in these areas. The social and territorial mobility of village inhabitants gains in strength and scope, and the structure of employment changes for the better. At the same time, destructive demographic processes are observed in the rural areas of dispersion: reproductive activity is reduced, mortality rises rapidly, life expectancy at birth grows shorter, and the age structure of the rural population gets worse and worse.
*** In conclusion, we would like to remark that the spatial self-organization of rural inhabitants within Ukraine is an ongoing process. A further deepening of the disproportion in the age structure of the rural population in the areas of dispersion, a lessening of their number and a decrease of the labour potential of the countryside should be expected.
References
[Aslanikashvili, 1974] A. F. Aslanikashvili. Metacartography. Basic Problems. Tbilisi, Metsniereba, 1974 (Russian).
[Berliant, 1978] A. M. Berliant. Cartographical Method of Research. Moscow, Moscow State University Publishing House, 1978.
[Garner, 1967] B. J. Garner. Models of Urban Geography and Dislocation of Populated Area. Edited by Richard J. Chorley and Peter Haggett, London, 1967.
[Et al., 1973] Mathematical Methods in Social Sciences. Moscow, Progress, 1973 (Russian).
[Osipov and Andreyev, 1977] G. V. Osipov and E. P. Andreyev. Methods of Measurement in Sociology. Moscow, Nauka, 1977 (Russian).
[Stouffer, 1940] S. A. Stouffer. Intervening Opportunities: A Theory Relating Mobility and Distance. The American Sociological Review, 1940, v. 5, No. 6, p. 845-867.
[Ullman, 1956] Edward L. Ullman. The Role of Transportation and the Bases for Interaction. In Man's Role in Changing the Face of the Earth. University of Chicago Press, Chicago, 1956.
[Zablotsky, 1975] G. A. Zablotsky. Estimation of Social Conditions of Population Settlement and Mathematical Simulation of Urban Development. In: Social Base for Urban Development. Moscow, 1975 (Russian).
Learning and Inference in Switching Conditionally Heteroscedastic Factor Models Using Variational Methods
Mohamed Saidane and Christian Lavergne
Department of Mathematics, University of Montpellier II, 34095 Montpellier, France
(e-mail: {saidane, Christian.Lavergne}@math.univ-montp2.fr)
Abstract. A data-driven approach for modeling volatility dynamics and co-movements in financial markets is introduced. Special emphasis is given to multivariate conditionally heteroscedastic factor models in which the volatilities of the latent factors depend on their past values, and the parameters are driven by regime switching in a latent state variable. We propose an innovative indirect estimation method based on the generalized EM algorithm principle combined with a structured variational approach that can handle models with large cross-sectional dimensions. Extensive Monte Carlo simulations and preliminary experiments with financial data show promising results.
Keywords: Factor Models, HMM, Conditional Heteroscedasticity, EM Algorithm, Variational Approximation.
1 Introduction
In the financial econometric literature, factor models have been developed and used widely in the area of asset pricing, as an alternative to the Capital Asset Pricing Model (CAPM), since the early 1960s. In this context, factor models have been used as a parsimonious means of describing the covariance matrix of returns since the single-index model of [Sharpe, 1963]. Traditionally, these issues were considered in a static framework, but recently the emphasis has shifted toward inter-temporal asset pricing models in which agents' decisions are based on the distribution of returns conditional on the available information, which is obviously changing. It is now generally accepted that asset returns are heteroscedastic. A large body of empirical research (e.g., [Schwert, 1989]) reports convincing evidence that the volatility of asset returns is time-varying. Yet the literature that combines multi-factor models of asset returns with a time-varying covariance matrix is relatively small. This is probably due to the econometric challenges associated with estimating and testing such models. Papers that examine a linear factor model of asset returns in which the covariance matrix of returns is heteroscedastic include [Jones, 2001] and [Demos and Sentana, 1988] and their references.
This paper extends the different models proposed in the above literature to a multi-state model by allowing for model transitions that are governed by a Markov chain on a set of possible models describing the different states of volatility. The switching conditionally heteroscedastic latent factor model is presented in Section 2, followed by a description of a maximum likelihood estimation procedure based on the Expectation-Maximisation (EM) principle combined with a structured variational learning approach in Section 3, the empirical results in Sections 4 and 5, and a summary in Section 6.
2 The Switching Factor Model
This specification supposes that excess returns depend both on unobservable factors that are common across the multivariate time series, and on unobservable regimes that describe the different states of volatility.
2.1 Dynamic Factor Structure
Let y_t denote the q-vector of excess asset returns and f_t the k-vector of latent factor shocks in period t. In our switching conditionally heteroscedastic factor model, the realized excess return on an asset is the sum of its expected return, k systematic shocks and an idiosyncratic shock. In matrix notation, the switching factor model for the excess return vector is:
y_t = \theta_{S_t} + X_{S_t} f_{S_t} + \varepsilon_{S_t}

f_{S_t} = H_{S_t}^{1/2} f_t^*, \quad \text{where } f_t^* \sim N(0, I_k) \text{ and } \varepsilon_{S_t} \sim N(0, \Psi_{S_t})

S_t \sim P(S_t = j \mid S_{t-1} = i), \quad t = 1, \ldots, n, \; i, j = 1, \ldots, m
where S_t is a hidden Markov chain with transition probabilities P(S_t = j \mid S_{t-1} = i), indicating the state or regime at date t. In an unspecified state S_t = j, \theta_j is the (q x 1) mean vector, f_t the (k x 1) vector of unobserved common factors, \varepsilon_{jt} the (q x 1) vector of idiosyncratic noises, X_j the (q x k) factor loading matrix, \Psi_j the (q x q) diagonal and positive definite matrix of idiosyncratic variances, and H_t the (k x k) diagonal and positive definite matrix whose elements are the variances of the common factors, presumed time-varying with parameters that change according to the regime. In particular, we suppose that the variances of the common factors follow univariate Generalized Quadratic Autoregressive Conditionally Heteroscedastic processes GQARCH(1,1); the l-th diagonal element of the matrix H_t under an unspecified regime S_t = j is given by

h_{lt} = \omega_j^l + \gamma_j^l f_{l,t-1} + \alpha_j^l f_{l,t-1}^2 + \delta_j^l h_{l,t-1}, \quad l = 1, \ldots, k.
To guarantee the positivity of the conditional common variances and covariance stationarity, we impose the constraints \omega_j^l, \alpha_j^l, \delta_j^l > 0, (\gamma_j^l)^2 \le 4 \omega_j^l \alpha_j^l and \alpha_j^l + \delta_j^l < 1, for all j, l. For model identification we suppose that q \ge k and rank(X_j) = k for all j. We suppose also that the common and idiosyncratic factors are uncorrelated, and that f_t and \varepsilon_{t'} are mutually independent for all t, t'.
2.2 A State-Space Representation
The state-space representation of our model, with continuous state variable f_t, is given by:

y_t = \theta_{S_t} + X_{S_t} f_{S_t} + \varepsilon_{S_t}   [Measurement Equation]
f_{S_t} = 0 \cdot f_{S_{t-1}} + f_{S_t}   [Transition Equation]

where \varepsilon_{S_t} \mid \mathcal{V}_{1:t-1} \sim N(0, \Psi_{S_t}) and f_{S_t} \mid \mathcal{V}_{1:t-1} \sim N(0, H_{S_t}). The information set available at time t is denoted by \mathcal{V}_{1:t-1} = \{Y_{1:t-1}, F_{1:t-1}, S_{1:t-1}\}, where Y_{1:T} = \{y_1, \ldots, y_T\}, F_{1:T} = \{f_1, \ldots, f_T\} and S_{1:T} = \{S_1, \ldots, S_T\}. In a given state S_t = j, the prediction equations are given by:
E(f_{t+1} \mid \mathcal{V}_{1:t}) = f_{t+1/t}^j = 0, \quad \forall j = 1, \ldots, m

Var(f_{l,t+1} \mid \mathcal{V}_{1:t}) = h_{l,t+1/t}^j = \omega_j^l + \gamma_j^l f_{l,t/t} + \alpha_j^l \bigl[ f_{l,t/t}^2 + h_{l,t/t}^j \bigr] + \delta_j^l h_{l,t/t-1}^j

where h_{l,t/t}^j is the l-th diagonal element of H_{t/t}^j. The updating equations are:
f_{t+1/t+1}^j = H_{t+1/t}^j X_j' \Sigma_{t+1/t}^{j\,-1} (y_{t+1} - \theta_j)

H_{t+1/t+1}^j = H_{t+1/t}^j - H_{t+1/t}^j X_j' \Sigma_{t+1/t}^{j\,-1} X_j H_{t+1/t}^j

where \Sigma_{t+1/t}^j = X_j H_{t+1/t}^j X_j' + \Psi_j. Importantly, given the degenerate nature of the transition equation, smoothing is unnecessary in this case, so that f_{t/n}^j = f_{t/t}^j and H_{t/n}^j = H_{t/t}^j.
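A minimal Python sketch of one prediction/update cycle for a fixed regime j follows; the helper name and argument layout are ours, and the code is only an illustration of the recursions above.

```python
import numpy as np

def filter_step(y_next, theta_j, X_j, Psi_j, omega_j, gamma_j, alpha_j, delta_j,
                f_filt, h_filt, h_pred_prev):
    """One prediction/update step of the quasi-optimal filter of Section 2.2
    for regime j; the GQARCH coefficient vectors have one entry per factor."""
    # Prediction: the transition is degenerate, so E(f_{t+1}|V_{1:t}) = 0 and
    # only the GQARCH factor variances need to be propagated.
    h_pred = (omega_j + gamma_j * f_filt
              + alpha_j * (f_filt ** 2 + h_filt) + delta_j * h_pred_prev)
    H_pred = np.diag(h_pred)
    # Update with the new observation y_{t+1}.
    Sigma = X_j @ H_pred @ X_j.T + Psi_j            # Sigma_{t+1|t}
    gain = H_pred @ X_j.T @ np.linalg.inv(Sigma)
    f_new = gain @ (y_next - theta_j)
    H_new = H_pred - gain @ X_j @ H_pred
    return f_new, np.diag(H_new).copy(), h_pred
```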
3 Learning
An efficient learning algorithm for the parameters of our switching factor model can be derived by generalizing the Expectation-Maximization (EM) algorithm [Dempster et al., 1977]. EM alternates between optimizing a distribution over the hidden states (the E-step) and optimizing the parameters given the distribution over hidden states (the M-step). Any distribution over the complete sequence of hidden states, Q(S, F), can be used to define a lower bound, B, on the log-probability of the observed data:

\log P(Y \mid \Theta) \ge \sum_{S} \int Q(S, F) \log \frac{P(S, F, Y \mid \Theta)}{Q(S, F)}\, dF = B(Q, \Theta)   (3)
where \Theta denotes the parameters of the model and we have made use of Jensen's inequality to establish (3). The E-step holds the parameters fixed and sets Q to be the posterior distribution over the hidden states given the parameters: Q(S, F) = P(S, F \mid Y, \Theta). This maximizes B with respect to the distribution, turning the lower bound into an equality, which can easily be seen by substitution. The M-step holds the distribution fixed and computes the parameters that maximize B for that distribution. Given the change in the parameters produced by the M-step, the distribution produced by the previous E-step is typically no longer optimal, so the whole procedure must be iterated. Unfortunately, the exact E-step for our factor model is intractable, because the posterior probability of the real-valued states is a Gaussian mixture with m^n terms. In order to derive an efficient learning algorithm for this system, we relax the EM algorithm by approximating the posterior probability of the hidden states. The basic idea is that, since expectations with respect to P are intractable, rather than setting Q(S, F) = P(S, F \mid Y) in the E-step, a tractable distribution Q is used to approximate P. The difference between the bound B and the log-likelihood is given by the Kullback-Leibler (KL) divergence between Q and P:

KL(Q \| P) = \sum_{S} \int Q(S, F) \log \frac{Q(S, F)}{P(S, F \mid Y)}\, dF   (4)
While there are many possible approximations to the posterior distribution of the hidden variables that one could use for learning and inference in switching factor models, we focus on the following:

Q(S, F) = \frac{1}{Z_Q}\, \psi(S_1) \prod_{t=2}^{n} \psi(S_{t-1}, S_t) \prod_{j=1}^{m} \Bigl[ \psi(f_1^j) \prod_{t=2}^{n} \psi(f_{t-1}^j, f_t^j) \Bigr]   (5)
where the \psi are unnormalized probabilities, which we will call potential functions and define soon, and Z_Q is a normalization constant ensuring that Q integrates to one. The terms involving the switch variables S_t define a discrete
Markov chain, and the terms involving the state vectors f_t define m uncoupled factor models. As in mean-field approximations, we have removed the stochastic coupling between the chains that results from the fact that the observation at time t depends on all the hidden variables at time t. However, we retain the coupling between the hidden variables at successive time steps, since these couplings can be handled exactly using the forward-backward and Kalman smoothing recursions. The discrete switching process is defined by

\psi(S_1 = j) = P(S_1 = j)\, q_1^{(j)}, \qquad \psi(S_{t-1}, S_t = j) = P(S_t = j \mid S_{t-1})\, q_t^{(j)}   (6), (7)
where the q_t^{(j)} are variational parameters of the Q distribution. These parameters scale the probabilities of each of the states of the switch variable at each time step, so that q_t^{(j)} plays exactly the same role as the observation probability p(y_t \mid S_t = j) would play in a regular hidden Markov model. The uncoupled factor models in the approximation Q are also defined by potential functions which are related to probabilities in the original system. These potentials are the prior and transition probabilities for f_t multiplied by a factor that changes these potentials to try to account for the data:
\psi(f_1^j) = p(f_1 \mid S_1 = j)\, \bigl[ p(y_1 \mid f_1, S_1 = j) \bigr]^{\xi_1^{(j)}}   (8)

\psi(f_{t-1}^j, f_t^j) = p(f_t \mid f_{t-1}, S_t = j)\, \bigl[ p(y_t \mid f_t, S_t = j) \bigr]^{\xi_t^{(j)}}   (9)
where the [t’j’ are variational parameters of Q. The vector Et plays a role very similar to the switch variable S t . Each component [t‘j’ can range between 0 and 1. When [,(j’ = 0 the posterior probability of fl under Q does not depend on the observation at time t . When $’ = 1, the posterior probability of fj under Q includes a term which assumes that factor model j generated y t . We call Et‘j’ the responsibility assigned to factor model j for the observation vector y t . 3.1
3.1 The Variational Fixed Point Equations
To maximize the lower bound on the log-likelihood, KL(Q \| P) is minimized with respect to the variational parameters \xi_t^{(j)} and q_t^{(j)} separately for each sequence of observations. For convenience we will express the probability density P in the log domain, through its associated energy function or hamiltonian, \mathcal{H}. The probability density is related to the hamiltonian through the usual Boltzmann distribution (at a temperature of 1), P(\cdot) = \frac{1}{Z} \exp\{-\mathcal{H}(\cdot)\}, where Z is a normalization constant required so that P integrates to unity. We then similarly express the approximating distribution Q through its hamiltonian \mathcal{H}_Q.
Comparing ‘HQ with 3-1 we see that the interaction between the SF’and the f j variables has been eliminated, while introducing two sets of variational parameters.’ In order to obtain the approximation Q which maximizes the lower bound on the log-likelihood, we minimize the Kullback-Leibler divergence as a function of these variational parameters,
KL(QIIP) = EQ [‘H - ‘HQ] - log ZQ
+ log
(10)
where EQ denotes expectation over the approximating distribution Q. Both Q and P define distributions in the exponential family. As a consequence, the zeros of the derivatives of K L with respect to the variational parameters can be obtained simply by equating derivatives of lE~(7-l)and EQ(’HQ)with respect to corresponding sufficient statistics St),f l and h$ where Ri = EQ[f{fi’] - EQ[fl]IEQ[fl]’is the covariance of f l under Q. The fixed-point equations for q p ) and ti’) are given by:2
To compute \xi_t^{(j)} it is necessary to sum Q over all the S_\tau variables not including S_t. This can be done efficiently using the forward-backward algorithm on the switch state variables, with q_t^{(j)} playing exactly the same role as an observation probability associated with each setting of the switch variable. To compute q_t^{(j)} it is necessary to calculate the expectations of f_t^j and f_t^j f_t^{j\prime} under Q. These expectations can be computed efficiently using the Kalman smoothing algorithm on each state-space approximation of the factor model, where for model j at time t the data is weighted by the responsibilities \xi_t^{(j)}.
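As an illustration of the forward-backward half of this alternation, the following self-contained Python sketch computes Q(S_t = j) from the variational parameters q_t^{(j)}; the function name is ours, and the Kalman-smoothing half of the loop is assumed to be supplied separately.

```python
import numpy as np

def forward_backward(q, pi, P):
    """Scaled forward-backward pass over the switch variables: q[t, j] plays
    the role of an observation probability, pi is the initial distribution
    and P the transition matrix. Returns Q(S_t = j), i.e. the
    responsibilities xi_t^(j)."""
    n, m = q.shape
    alpha = np.zeros((n, m))
    beta = np.ones((n, m))
    c = np.zeros(n)                            # scaling constants
    alpha[0] = pi * q[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, n):                      # forward recursion
        alpha[t] = (alpha[t - 1] @ P) * q[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(n - 2, -1, -1):             # backward recursion
        beta[t] = (P @ (q[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```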
3.2 The EM Training
The first maximization step computes the model parameters that optimize the expectation of the complete log-likelihood, which is a function of the expectations of the hidden variables. For our factor model, the parameters \theta_j, X_j and \Psi_j can be computed analytically; this is a weighted version of the re-estimation equations for factor models without regime switching. (Here S_t^{(j)} = 1 if the switch state is in state j, and 0 otherwise; the fixed-point equations are satisfied when \xi_t^{(j)} = E_Q[S_t^{(j)}], and using the fact that E_Q[S_t^{(j)}] = Q(S_t = j) we obtain the fixed-point equation for \xi_t^{(j)}.) Similarly,
the re-estimation equations for the switch process, \pi_j and p_{ij}, are analogous to the Baum-Welch update rules for HMMs [Lavergne and Saidane, 2006]. In a second maximization step, given the new values of \pi_j, p_{ij}, \theta_j, X_j and \Psi_j, the parameters \phi_j = \{\omega_j, \gamma_j, \alpha_j, \delta_j\}, for j = 1, \ldots, m, can be updated by maximizing the observed log-likelihood function L^*, where \Sigma_{t/t-1}^j = X_j H_{t/t-1}^j X_j' + \Psi_j and H_{t/t-1}^j is the expectation of H_t conditional on \mathcal{V}_{1:t-1}, obtained via the quasi-optimal version of the Kalman filter (Section 2.2). However, for the implementation of the optimization algorithm it is necessary to identify the optimal sequence of the hidden Markovian states, which can be carried out using the variational parameters or an approximated version of the Viterbi algorithm [Saidane and Lavergne, 2006]. Once this sequence is known, on each segment of data the function L^* is maximized through the fmincon optimization Matlab function.
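A rough Python analogue of this constrained maximization, with scipy's SLSQP playing the role of fmincon, is sketched below. The log-likelihood evaluator loglik_fn is an assumed helper (it would run the quasi-optimal filter over the data segment), and \omega_j is fixed to one as in the experiments of Section 4.

```python
import numpy as np
from scipy.optimize import minimize

def mstep_gqarch(y, seg, loglik_fn, phi0, j):
    """Second M-step for regime j (illustrative): on the data segment `seg`
    assigned to regime j, maximize the observed log-likelihood over the
    GQARCH parameters phi = (gamma_j, alpha_j, delta_j), omega_j = 1."""
    obj = lambda phi: -loglik_fn(y[seg], phi, j)
    # positivity and covariance-stationarity constraints of Section 2.1
    cons = ({'type': 'ineq', 'fun': lambda p: p[1]},                   # alpha > 0
            {'type': 'ineq', 'fun': lambda p: p[2]},                   # delta > 0
            {'type': 'ineq', 'fun': lambda p: 4 * p[1] - p[0] ** 2},   # gamma^2 <= 4*omega*alpha
            {'type': 'ineq', 'fun': lambda p: 1 - p[1] - p[2]})        # alpha + delta < 1
    return minimize(obj, phi0, constraints=cons, method='SLSQP').x
```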
4 Monte Carlo Experiments
The example used for the simulation experiments has q = 6 observable variables and only one conditionally heteroscedastic common factor. We consider the case of a two-state model with initial state S_1 = 1 and transition probabilities p_{11} = 0.95, p_{12} = 0.05, p_{21} = 0.02 and p_{22} = 0.98. The hypothesis that the state changes with the stock market's reaction to events implies that the properties of the considered hidden chain change over time: as time increases, the state index increases, decreases or stays the same. Here, the constant term of the GQARCH specification, \omega_j, is fixed to one for j = 1, 2. The iterations of the EM algorithm stop when the relative change in the likelihood function between two subsequent iterations is smaller than a threshold value of 10^{-4}. The initial parameters for the EM algorithm were obtained by randomly perturbing the true parameter values by up to 45% of their true values. In this experiment we try to estimate the parameters of the model and to investigate the asymptotic distribution of these estimates. With this intention, we generated 100 sequences using the true model parameters given in Table 1. Thereafter, our estimation procedure was carried out on each of these sequences. Table 1 shows the average and standard deviation of the estimates. The results indicate that the estimation method works well: the sample means are very close to the true values, and the standard deviations are small. To investigate the asymptotic distribution of the estimates \hat{\Theta}, we have used the [Shapiro and Francia, 1972] statistic in order to test the univariate
Table 1. Simulation with n = 800.
         \theta              X                 diag(\Psi)        \phi
State 1  0.9348 (0.0938)   2.0752 (0.0411)   0.9993 (0.0477)   0.0838 (0.0905)
         [1.0000]          [2.0000]          [1.0000]          [0.1000]
         1.9423 (0.0923)   2.0147 (0.0432)   1.0016 (0.0504)   0.3054 (0.0517)
         [2.0000]          [2.0000]          [1.0000]          [0.3000]
         0.9243 (0.0815)   2.0424 (0.0291)   0.9881 (0.0566)   0.3847 (0.0374)
         [1.0000]          [2.0000]          [1.0000]          [0.4000]
         1.9618 (0.0802)   1.9813 (0.0336)   0.9956 (0.0567)
         [2.0000]          [2.0000]          [1.0000]
         0.9355 (0.0956)   2.0417 (0.0313)   0.9984 (0.0610)
         [1.0000]          [2.0000]          [1.0000]
         1.9466 (0.0894)   1.9785 (0.0382)   1.0006 (0.0508)
         [2.0000]          [2.0000]          [1.0000]
State 2  1.9694 (0.1536)   3.0654 (0.0336)   1.9976 (0.0997)   0.1867 (0.0906)
         [2.0000]          [3.0000]          [2.0000]          [0.2000]
         2.9428 (0.1512)   2.9816 (0.0322)   2.0039 (0.0982)   0.2011 (0.0388)
         [3.0000]          [3.0000]          [2.0000]          [0.2000]
         1.9353 (0.1474)   2.9819 (0.0307)   1.9785 (0.1106)   0.5907 (0.0292)
         [2.0000]          [3.0000]          [2.0000]          [0.6000]
         2.9362 (0.1488)   3.0274 (0.0296)   1.9776 (0.1117)
         [3.0000]          [3.0000]          [2.0000]
         1.9386 (0.1577)   3.0296 (0.0269)   2.0034 (0.1074)
         [2.0000]          [3.0000]          [2.0000]
         2.9534 (0.1528)   3.0105 (0.0284)   1.9984 (0.1127)
         [3.0000]          [3.0000]          [2.0000]

Entries are averages over the 100 replications, with standard deviations in parentheses and true parameter values in square brackets.
normality of each component of \hat{\Theta}. All the results show that the Shapiro-Francia test fails to reject the null hypothesis (that the \hat{\Theta}_i are a random sample from N(\mu, \sigma), with \mu and \sigma unknown) at the significance level \alpha = 5%. In this experiment we have also studied the behavior of the estimates when the size of the sequence n increases. With this intention, we generated sequences of observations of sizes n = 800, 1000, 1200 and 1500 using the parameters given in Table 1 (with a hundred replications per simulation). Thereafter, we used the Kullback-Leibler divergence [Juang and Rabiner, 1985] to measure the distance between the estimates \hat{\Theta} and the true parameters \Theta_0. For each value of n, the estimation procedure was carried out a hundred times, and the KL distances between each of the hundred estimators and the true parameter were evaluated on a new sequence, independent of the first hundred sequences used to obtain the estimators. All the results show a general decrease in the average and spread of the distances with increasing n. This implies an increasing accuracy and stability of the estimators as n increases.
5 Financial Data
The switching factor model is now applied to modeling the interrelationships between the currencies of eight European countries during the currency crisis of 1992-1993. With this intention, we analysed a dataset of weekly average returns of closing spot prices for eight currencies relative to the US dollar in price notations. The time series considered here are the French Franc (FRF), Swiss Franc (CHF), Italian Lira (ITL), German Mark (DEM), Belgian Franc (BEF), Spanish Peseta (ESP), Swedish Krona (SEK), and British Pound (GBP) from 07/17/1985 to 01/22/1997 (600 observations).
Fig. 1. Volatility of the FRF, DEM and GBP series and the variational parameters of the three hidden states.
Using the model selection criteria AIC and BIC, the results show that k = 2 and m = 3 is strongly favored. Broadly, our results show that the model is capable of accurately detecting abrupt changes in the time series structure and, in particular, the severe disruption caused by the violent storm which hit the European currency markets in September and October 1992. We can clearly see from Figure 1 that the third model is responsible for the high volatility segments, the second model is mainly responsible for the time period before September 1992, and the first for the lower volatility segments after 1993. The average duration of stay in the first regime is about 31.88 months, versus 89.38 in the second and 28.73 in the third. Our results also show that all the correlations between the European currencies decreased just after August 1992. This is the effect of financial contagion, which can be defined as a significant increase or decrease in the co-movement of financial prices experienced by a group of countries following a crisis elsewhere. Finally, the Ljung-Box statistic for the serial correlation of the squared residuals does not reject the null hypothesis of uncorrelated squared residuals. Hence, all the covariances or correlations between the different exchange rate returns are explained by the common and specific factors.
6 Conclusion
The paper has developed a novel solution to the problem of modeling conditionally heteroscedastic financial time series subject to Markov switching within a multivariate framework. This new specification takes into account, simultaneously, the usual changing behavior of the common volatility due to common economic forces, as well as sudden discrete shifts in common and idiosyncratic volatilities that can be due to sudden abnormal events. Our Monte Carlo simulations have shown promising results for the proposed algorithm, especially for segmentation and tracking tasks. An interesting direction for further research is the generalization of this model to one where the state transition probabilities are not homogeneous in time. The study of such models would provide a further step in the extension of hidden Markov models to probabilistic factor analysis and allow for further flexibility in financial applications, where accurate on-line predictions of the time-varying covariance matrices are very useful for dynamic asset allocation, active portfolio management and the analysis of option prices.
References
[Demos and Sentana, 1988] A. Demos and E. Sentana. An EM algorithm for conditionally heteroscedastic factor models. Journal of Business & Economic Statistics, 16:357-361, 1988.
[Dempster et al., 1977] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.
[Jones, 2001] C.S. Jones. Extracting factors from heteroskedastic asset returns. Journal of Financial Economics, 62:293-325, 2001.
[Juang and Rabiner, 1985] B.H. Juang and L.R. Rabiner. A probabilistic distance measure for hidden Markov models. AT&T Technical Journal, 64:391-408, 1985.
[Lavergne and Saidane, 2006] C. Lavergne and M. Saidane. Hidden Markov models for conditionally heteroscedastic financial time series. Research Report, INRIA Rhone-Alpes, 2006.
[Saidane and Lavergne, 2006] M. Saidane and C. Lavergne. Learning and inference in mixed-state conditionally heteroscedastic factor models using Viterbi approximation. In Vassil N. Alexandrov, G. Dick van Albada, Peter M. A. Sloot, and Jack Dongarra, editors, Lecture Notes in Computer Science, Part IV, volume 3994, pages 372-379, 2006.
[Schwert, 1989] G.W. Schwert. Why does stock market volatility change over time? Journal of Finance, 44:1115-1153, 1989.
[Shapiro and Francia, 1972] S.S. Shapiro and R.S. Francia. An approximate analysis of variance test for normality. Journal of the American Statistical Association, 67:215-216, 1972.
[Sharpe, 1963] W.F. Sharpe. A simplified model for portfolio analysis. Management Science, 9:277-293, 1963.
Correlation tests based NARMAX data smoother validation
Li Feng Zhang, Quan Min Zhu* and Ashley Longden
* Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Frenchay Campus, Coldharbour Lane, Bristol, BS16 1QY, UK (e-mail: [email protected])
Abstract. In 2006, a set of new correlation functions named combined ODACF and ODCCF was proposed for detecting nonlinear correlations. In the present study, a new validation method based on these nonlinear correlation tests is proposed to check the quality of NARMAX data smoothers without detailed prior knowledge of the actual noise. A simulation example is implemented to demonstrate the effectiveness and efficiency of the new method.
Keywords: Nonlinear modeling, Data smoothing, NARMAX, Nonlinear correlation tests.
1 Introduction
Nonlinear autoregressive moving average model with exogenous inputs (NARMAX) data smoothing is a relatively simple noise reduction method for nonlinear input/output systems (Aguirre et al., 1996). A problem in real applications of the NARMAX data smoother is that detailed prior knowledge of the underlying systems and signals is usually not available, so that it is difficult to determine whether the noise has been properly removed from the measured data. Usually, the signal-to-noise ratio (SNR) is used as a criterion: the iterative identification procedure is stopped when a specified SNR is met. However, in most practical situations the actual SNR is unknown, and the estimated SNR cannot be used to precisely check whether the data has been smoothed adequately. To overcome this problem, correlation-test-based model validation methods have been used to validate various smoothers (Lee, 2002; Mendes, 2002). In 2006, a set of first-order correlation functions named combined omni-directional auto-correlation function (ODACF) and omni-directional cross-correlation function (ODCCF) was developed to detect nonlinear associations. These new correlation functions have since been applied widely and effectively to validate various nonlinear models. Compared to higher-order correlation tests, the new procedure provides concise expressions, efficient detection power, and fewer validation plots.
In this study a new validation method is proposed, based on the combined ODACF and ODCCF, to check the goodness of NARMAX smoothers. Compared to the model validation approaches, the new method includes a direct correlation test between the residuals and both negative and positive lagged outputs, to meet the special need of validating the NARMAX smoother. Numerical studies are given to demonstrate that the new methodology can be applied to directly and effectively validate NARMAX model based data smoothers.
This study is organised as follows. In Sections 2 and 3 the NARMAX smoother is briefly introduced and the new correlation tests based smoother validation method is proposed. In Section 4 a simulation example of the Lorenz system is employed to demonstrate NARMAX model based data smoothing and the new smoother validation method. Conclusions are given in Section 5.
2 NARMAX modelling based data smoothing
The NARMAX smoother is a global nonlinear smoother which iteratively identifies a NARMAX model in terms of both positive and negative lagged outputs, inputs and residuals. It is estimated directly from the data and no prior knowledge is assumed (Mendes, 2002). Since the smoother contains terms related to the future, a much better prediction can be attained. In this section, the structure of the NARMAX smoother and the iterative fitting procedure are briefly introduced. Firstly, the general form of the NARMAX smoother is presented as follows:

\hat{y}_{(i)}(t) = \sum_{l=1}^{m} \theta_l\, p_l(t) + \xi_{(i)}(t)
where i is the iteration index and \xi(t) denotes the residual. P_{(i)} = [p_1, p_2, \ldots, p_m]^T and \Theta_{(i)} = [\theta_1, \theta_2, \ldots, \theta_m]^T denote the model term and parameter vectors of the i-th iteration. p_l(t) denotes a linear or nonlinear term such as \hat{y}_{(i-1)}(t-1), \hat{y}_{(i-1)}(t+1), \xi_{(i-1)}(t-1) or \hat{y}_{(i-1)}(t-2)\,\xi_{(i-1)}(t-3), and \theta_l denotes the corresponding parameter. NARMAX smoothing is an iterative procedure and can be summarised as follows.
1. At the first iteration, let \hat{y}_{(0)}(t) denote the original noisy outputs. A NARX model is defined and estimated to produce the first smoothed data set and residuals, giving

\hat{y}_{(1)}(t) = F_{(1)}\bigl(\hat{y}_{(0)}(t \pm 1), \ldots, u(t), \ldots\bigr), \qquad \xi_{(1)}(t) = \hat{y}_{(0)}(t) - \hat{y}_{(1)}(t).
2. At the second iteration, the first smoothed data set is used to replace the original outputs. The residual sequence obtained from the first iteration is used as a new regressor of the smoother, and new terms are added to the model. Consequently, a new NARMAX model structure is determined. The new smoother is then estimated to approximate the first smoothed data set, and a new residual sequence is produced:

\hat{y}_{(2)}(t) = F_{(2)}\bigl(\hat{y}_{(1)}(t \pm 1), \ldots, u(t), \ldots, \xi_{(1)}(t-1), \ldots\bigr), \qquad \xi_{(2)}(t) = \hat{y}_{(1)}(t) - \hat{y}_{(2)}(t).
3. The iterative procedure is then continued. The smoother F_{(i)}(\cdot) with i \ge 3 is iteratively estimated to obtain the smoothed data set \hat{y}_{(i)}(t):

\hat{y}_{(i)}(t) = F_{(i)}\bigl(\hat{y}_{(i-1)}(t \pm 1), \ldots, u(t), \ldots, \xi_{(i-1)}(t-1), \ldots\bigr), \qquad \xi_{(i)}(t) = \hat{y}_{(i-1)}(t) - \hat{y}_{(i)}(t).
i is increased by one until the residuals converge and the target of the smoothing, such as a specified signal-to-noise ratio (SNR), is met, or the correlation tests based validation is passed. For time series systems, the input terms u(t) are removed from the smoother, which is then called a NARMA smoother (Lee, 2002; Mendes, 2002).
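The following toy Python sketch mirrors the structure of this iteration only: it regresses the current smoothed series, by plain linear least squares, on past and future lags of itself plus lagged residuals, whereas the actual method selects nonlinear NARMAX terms. All names are ours.

```python
import numpy as np

def narma_smooth(y, n_iter=5, lag=2):
    """Toy iterative NARMA-style smoother: each pass refits the current
    smoothed series on its own past AND future lags plus lagged residuals.
    At the first pass the residuals are zero, so the fit is effectively an
    (N)AR model, as in step 1 of the procedure above."""
    y_s = y.astype(float).copy()
    resid = np.zeros_like(y_s)
    n = len(y_s)
    for _ in range(n_iter):
        rows = range(lag, n - lag)
        # regressors: y_s(t-lag..t-1), y_s(t+1..t+lag), resid(t-lag..t-1)
        X = np.array([np.r_[y_s[t - lag:t], y_s[t + 1:t + lag + 1],
                            resid[t - lag:t]] for t in rows])
        target = y_s[lag:n - lag]
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        fitted = X @ coef
        resid = np.zeros_like(y_s)
        resid[lag:n - lag] = target - fitted
        y_s = y_s.copy()
        y_s[lag:n - lag] = fitted        # smoothed series for the next pass
    return y_s, resid
```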
3 Validation Algorithm for NARMAX smoother
Firstly, the combined ODACF and ODCCF based model validation is briefly introduced. Consider any two data sequences \xi(t) and u(t). The omni-directional cross-correlation functions (ODCCFs) are formulated as follows. Let

\alpha(t) = |u'(t)|, \qquad \beta(t) = |\xi'(t)|,

where the prime ' denotes that the mean level has been removed from the corresponding data sequence, e.g.

u'(t) = u(t) - \frac{1}{N} \sum_{t=1}^{N} u(t).
The results obtained from the ODCCFs are combined to constitute a much more condensed formulation, named the combined ODCCF \rho_{u\xi}(\tau), to provide a better illustration of the detected correlations (Zhang, Zhu and Longden, 2007): the component ODCCFs are the normalized cross-correlation functions between the pairs formed from u'(t), \alpha(t), \xi'(t) and \beta(t), and at each lag \tau the combined ODCCF takes the maximum of their absolute values. For the special case u(t) = \xi(t), the functions (6) and (7) are called the combined ODACF. In the study of Zhang et al., theoretical proofs and simulation studies show that the new correlation tests provide an enhanced nonlinear correlation detection power. For a valid identified model, the residuals should be reduced to a white noise sequence and be uncorrelated with delayed residuals, inputs and outputs. The model validity tests, hence, are formulated as follows:

\rho_{\xi\xi}(\tau) = 0, \; \tau > 0; \qquad \rho_{u\xi}(\tau) = 0, \; \forall \tau.   (8)
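A small Python sketch of one plausible implementation of the combined ODCCF follows (our reading of the construction: four component correlations between the mean-removed sequences and their absolute values, combined by a maximum at each lag; names are ours).

```python
import numpy as np

def _ncc(x, y, tau):
    """Normalized cross-correlation between x(t) and y(t + tau), tau >= 0."""
    x = x - x.mean()
    y = y - y.mean()
    n = len(x)
    num = np.sum(x[:n - tau] * y[tau:]) if tau > 0 else np.sum(x * y)
    den = np.sqrt(np.sum(x * x) * np.sum(y * y))
    return num / den

def combined_odccf(u, xi, max_lag=20):
    """Combined ODCCF rho_{u xi}(tau): correlate u', alpha = |u'|, xi' and
    beta = |xi'| pairwise and keep, at each lag, the largest absolute
    component value. With u = xi this gives the combined ODACF."""
    up, xp = u - u.mean(), xi - xi.mean()
    a, b = np.abs(up), np.abs(xp)
    pairs = [(up, xp), (up, b), (a, xp), (a, b)]
    return np.array([max(abs(_ncc(p, q, tau)) for p, q in pairs)
                     for tau in range(max_lag + 1)])
```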
The correlation test based model validation approaches can be used to provide effective solutions for validating the NARMAX smoother (Aguirre, 1996; Lee, 2002; Mendes, 2002). The above method, therefore, can also be applied to this kind of issue. However, there is a special feature of data smoothing: future outputs are available and are used as important regressors during the smoothing procedure, so that it is essentially different from system identification. The model validation methods are therefore still insufficient to meet the strict requirements of data smoothing, and a NARMAX smoother validation method needs to be proposed. For data smoothing, consider a properly smoothed output sequence \hat{y}_{(k)}(t), which should include the entire predictable information from the noisy data. The overall residuals are computed by subtracting \hat{y}_{(k)}(t) from the measured outputs y(t) as
\xi(t) = y(t) - \hat{y}_{(k)}(t) = y(t) - F\bigl(y^{\pm}(t), u^{-}(t)\bigr)   (9)

where y^{\pm}(t) denotes the positive and negative lagged outputs and u^{-}(t) the lagged inputs. Based on this concept, a set of new correlation tests is proposed in this study to validate NARMAX smoothers. If the smoother is adequate, \xi(t) will be an adequate estimate of the actual noise. This means that the overall residual sequence is the unpredictable part of the original data, and it should be uncorrelated with all the regressors of the smoother. In other words, \xi(t) should be uncorrelated with y^{\pm}(t), u^{-}(t) and \xi^{-}(t). The combined ODACF and ODCCF tests for an adequate smoother are derived as follows:
\rho_{\xi\xi}(\tau) = 0, \; \tau > 0; \qquad \rho_{u\xi}(\tau) = 0, \; \forall \tau; \qquad \rho_{y\xi}(\tau) = 0, \; \forall \tau.   (10)
Compared to (8), (10) includes a new test \rho_{y\xi}(\tau) to detect the correlation between the residuals and both past and future outputs, since both positive and negative lagged outputs are used as important regressors in the NARMAX smoother. If the residuals do not include any considerable predictable information, all the correlation functions will lie inside the confidence interval.
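Building on the sketch above, the tests in (10) can be checked against the usual 95% confidence band of \pm 1.96/\sqrt{N} for N samples; `combined_odccf` is the function sketched earlier and the remaining names are ours.

```python
import numpy as np

def validate_smoother(y, u, xi, max_lag=20):
    """Checks the three conditions in (10) for the overall residuals xi(t):
    an auto-test on xi, a cross-test against the input u, and cross-tests
    against both past and future outputs y (the latter obtained by swapping
    the arguments). Returns True/False per test."""
    bound = 1.96 / np.sqrt(len(xi))
    tests = {'xi-xi (tau > 0)': combined_odccf(xi, xi, max_lag)[1:],
             'u-xi':            combined_odccf(u, xi, max_lag),
             'y(past)-xi':      combined_odccf(y, xi, max_lag),
             'y(future)-xi':    combined_odccf(xi, y, max_lag)}
    return {name: bool(np.all(vals <= bound)) for name, vals in tests.items()}
```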
4 Simulation study
In this section, the new NARMAX smoother validation procedure is demonstrated by simulation studies. First consider the Lorenz equations, represented as follows:
\dot{x}_1 = \eta_1 (x_2 - x_1)
\dot{x}_2 = \eta_2 x_1 - x_2 - x_1 x_3   (11)
\dot{x}_3 = x_1 x_2 - \eta_3 x_3
The parameters were set as \eta_1 = 10, \eta_2 = 28 and \eta_3 = 3. The initial values of the Lorenz system were set as x_1 = -5, x_2 = -6 and x_3 = 2. System (11) was investigated on the interval \pi/10 \le t \le 10\pi, and a fourth-order Runge-Kutta algorithm with an integration interval of \pi/3000 s was applied to simulate the system. A sampling interval of T_s = \pi/300 s was chosen and data sequences of length 3000 were generated. Subsequently, x_1(t) was observed as the output of the system, and normally distributed white noise e(t) with zero mean and variance 2 was used to corrupt x_1(t). The measured output y(t) is defined as

y(t) = x_1(t) + e(t).   (12)
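This simulation setup can be reproduced with a short self-contained Python script (classical fourth-order Runge-Kutta; names are ours):

```python
import numpy as np

def lorenz(x, eta=(10.0, 28.0, 3.0)):
    """Right-hand side of (11)."""
    x1, x2, x3 = x
    return np.array([eta[0] * (x2 - x1),
                     eta[1] * x1 - x2 - x1 * x3,
                     x1 * x2 - eta[2] * x3])

def rk4(f, x0, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration."""
    xs = np.empty((n_steps + 1, len(x0)))
    xs[0] = x0
    for i in range(n_steps):
        x = xs[i]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        xs[i + 1] = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return xs

dt = np.pi / 3000                 # integration interval
sub = 10                          # keep every 10th point: T_s = pi/300
x = rk4(lorenz, np.array([-5.0, -6.0, 2.0]), dt, 3000 * sub)[::sub][:3000]
rng = np.random.default_rng(1)
y = x[:, 0] + rng.normal(0.0, np.sqrt(2.0), size=3000)   # (12): variance 2
```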
Figure 1 shows the noise-free and measured outputs. Figure 2 shows the phase portrait plots for x_1(t) and y(t).
Fig. 1. The time series plots of the noise-free outputs and the measured outputs
Fig. 2. The phase portrait plots of the noise-free outputs and the measured outputs

Since (11) is not excited by any input signal, NARMA smoothers are used to reduce the noise in the raw data. To demonstrate the new validity tests, two candidate smoothers, (13) and (14), are employed in this study.
After the iterative fitting procedure, the parameters were determined and the smoothed data were produced. Figure 3 shows the phase portrait plots for the two smoothers; compared with the phase portrait plot of x_1(t) shown in Figure 2, smoother (13) provides much better noise reduction.
Fig. 3. The phase portrait plots of the smoothed data sets for the two smoothers
Figure 4 shows the results obtained using the new validation method. It clearly suggests that (13) can be used to adequately smooth the raw data, while (14) is invalid.
Fig. 4. The results obtained using the new validity tests for the two smoothers
5 Conclusions
In the present study, a new validation method has been proposed, based on the combined ODACF and ODCCF tests, to check the quality of NARMAX data smoothers. Compared to other approaches, the new method provides enhanced detection power and a compact, visual illustration. It is believed that the method can be applied to a wider range of nonlinear data smoothers. Simulation studies are implemented to demonstrate the new method.
References
Billings, S. A. and Zhu, Q. M., 1994, Nonlinear model validation using correlation tests, International Journal of Control, 60, 1107-1120.
Billings, S. A. and Voon, W. S. F., 1986, Correlation based model validity tests for nonlinear models, International Journal of Control, 44, 235-244.
Zhang, L. F., Zhu, Q. M., and Longden, A., 2007, A set of novel correlation tests for nonlinear system variables, International Journal of Systems Science, 38, 47-60.
Zhu, Q. M., Zhang, L. F., and Longden, A., 2006, Development of omni-directional correlation functions for nonlinear model validation, Automatica, accepted for publication.
Aguirre, L. A., Mendes, E. M., and Billings, S. A., 1996, Smoothing data with local instabilities for the identification of chaotic systems, International Journal of Control, 63, 483-505.
Mendes, E. M., 2002, Multivariable nonlinear smoothers as an aid for identifying models from chaotic data, Proceedings of the American Control Conference, Anchorage, 8-10 May, 951-956.
Lee, K. L. and Billings, S. A., 2002, The effects of noise reduction on the prediction accuracy of time series, International Journal of Systems Science, 15, 1207-1216.
Kernel based confidence intervals for survival function estimation
Dimitrios I. Bagkavos(1), Aglaia Kalamatianou(2), and Dimitrios Ioannides(3)
(1) Accenture Marketing Sciences, 246 Kifissias Str., 15231 Athens, Greece (e-mail: [email protected])
(2) Department of Sociology, Panteion University, 136 Leof. Syngrou, 176 75 Athens, Greece (e-mail: [email protected])
(3) Department of Economics, University of Macedonia, 136 Egnatia Ave., Thessaloniki, Greece (e-mail: [email protected])
Abstract. Based on an existing kernel survival function estimate that admits censored data, we develop confidence intervals to help assess the validity of the estimate. Practical issues of estimation are discussed and the developments are then applied to a real data set. The results are analyzed and discussed further.
Keywords: Censored data, confidence intervals, kernel estimate, survival function.
1 Introduction
[Kalamatianou and McClean, 2003] modeled the distribution of the duration of undergraduate studies in a Greek university using survival analysis techniques. In particular, their nonparametric estimation is based on the Kaplan-Meier estimator, [Kaplan and Meier, 1958], which provides a step function as an estimate of the true survival function. [Kulasekera et al., 2001] proved that a smoothed version of the Kaplan-Meier curve is more efficient in terms of its asymptotic properties. This, together with the work of [Marron and Padgett, 1987], which extends kernel estimates to censored data situations, motivated the development of a smooth, kernel based, survival function estimate in [Bagkavos and Kalamatianou, 2006a]. Given that in practice it is desirable to have some assurance about the validity of the model, here we develop and implement confidence intervals that admit censored data and use them to verify the estimate of the survival function in [Bagkavos and Kalamatianou, 2006a]. However, the proposed confidence intervals can be applied to any type of lifetime data and can easily be adapted to similar, kernel based, survival function estimators. For
a full description of the data together with motivation and potential applications of such a study see [Kalamatianou and McClean, 2003]. See also [Simonoff, 1996] for an introduction to kernel density estimation. The rest of the paper is organized as follows: in Section 2 we give the estimate of the survival function and an automatic method to implement it in practice. Using this as a vehicle, we develop confidence intervals to verify the estimation of the true curve. A brief description of the data, the application of the theory of Section 2 and the results are discussed in Section 3.
2 Estimation
Suppose we have a sample X_1^o, X_2^o, \ldots, X_n^o of i.i.d. survival times, censored at the right by i.i.d. random variables U_1, U_2, \ldots, U_n which are independent of the X_i^o's. Let f be the common probability density and F the distribution function of the X_i^o's. Denote by H the distribution function of the U_i's. Typically the randomly right censored observed data are denoted by the pairs (X_i, \Delta_i), i = 1, 2, \ldots, n, with X_i = \min\{X_i^o, U_i\} and \Delta_i = 1_{\{X_i^o \le U_i\}}, where 1_{\{\cdot\}} is the indicator random variable of the event \{\cdot\}. An estimate of the unknown pdf f can be defined as

\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i}{\hat{H}^*(X_i)}\, K_h(x - X_i)   (1)
where K, called the kernel, is a function that integrates to 1, h, called the bandwidth, is the amount of smoothing applied to the estimator, and \hat{H}^* is an estimate of 1 - H, typically taken to be the Kaplan-Meier estimator, slightly modified in order to avoid division by zero, i.e.

\hat{H}^*(x) = \prod_{i:\, Z_i \le x} \left( \frac{n - i + 1}{n - i + 2} \right)^{1 - \Delta_i}
with (Z_i, \Delta_i) being the ordered (X_i, \Delta_i), i = 1, \ldots, n. The estimator \hat{f}(x) has been widely discussed in the literature. For motivation for using this estimate see [Marron and Padgett, 1987]; also see the references therein for an overview. Practical implementation of \hat{f}(x) requires selection of the kernel K and the bandwidth h. Of the two, selection of the smoothing parameter is of greater importance, as it affects the asymptotic properties of the estimate and its visual performance, e.g. [Wand and Jones, 1995], page 13. An estimate of the survival function S(x) = 1 - F(x) can be obtained by integrating \hat{f}(x), i.e. \hat{S}_n(x) = 1 - \hat{F}(x), where
\hat{F}(x) = \int_{-\infty}^{x} B(u)\, du

with B(\cdot) a boundary-corrected version of \hat{f}(\cdot), where p = x/h and

a_l(p) = \int_{-1}^{p} u^l K(u)\, du, \quad l = 1, 2, 3.
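A minimal Python sketch of the estimator (1), using the modified Kaplan-Meier weights reconstructed above and an Epanechnikov kernel, is given below (names are ours; the boundary correction B is omitted here):

```python
import numpy as np

def km_weights(z, delta):
    """Modified Kaplan-Meier estimate of 1 - H at the ordered data, using the
    (n-i+1)/(n-i+2) factors quoted above, which avoid division by zero."""
    n = len(z)
    order = np.argsort(z)
    z, delta = np.asarray(z)[order], np.asarray(delta, float)[order]
    i = np.arange(1, n + 1)
    factors = ((n - i + 1) / (n - i + 2)) ** (1 - delta)
    surv = np.cumprod(factors)           # H*-hat at each ordered Z_i
    return z, delta, surv

def f_hat(x, z, delta, h):
    """Censored-data kernel density estimate (1), Epanechnikov kernel."""
    zs, ds, surv = km_weights(z, delta)
    x = np.atleast_1d(np.asarray(x, float))
    u = (x[:, None] - zs[None, :]) / h
    K = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    return (ds / surv * K).sum(axis=1) / (len(zs) * h)
```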
This extra modification is used because kernel estimates such as \hat{f}(x) are typically biased more than usual at the boundaries of the region of estimation; see [Simonoff, 1996], page 49 for details. The purpose of using a function such as B(x) is to eliminate endpoint effects and restore the bias at the boundaries to the level of the bias in the interior. Note that \int B(x)\, dx = 1 and B(x) > 0, so that B is a proper density. Bandwidth selection is typically done by choosing the h which minimizes some error criterion. Here we use the least squares cross-validation method, developed for the case of distribution function estimation by [Sarda, 1993]. We employ the method to minimize the Integrated Squared Error (ISE) for the reasons exhibited in [Sarda, 1993], the most important among which is that such a procedure produces an estimate with minimized squared error across the entire region of estimation. In detail, the objective is to choose the h which minimizes the ISE of \hat{F}(x). This is accomplished by minimizing

CV(h) = \sum_{i=1}^{n} \bigl\{ F_n(X_i) - \hat{F}_i(X_i) \bigr\}^2
where F_n is the empirical distribution function and \hat{F}_i(x) is computed from the definition of \hat{F}(x) by leaving out X_i. Minimization of CV(h) is typically done by a grid search for h in the interval n^{-1/5}\hat{\sigma}/4 < h < 3 n^{-1/5}\hat{\sigma}/2, where \hat{\sigma} is the sample standard deviation, extending the interval if the minimum is at either endpoint. After the best point is found, a possible improvement would be a quasi-Newton approach. Now, in order to derive confidence intervals, we first need the asymptotic properties of \hat{S}_n(x), as well as its asymptotic distribution. Let
\mu_i(K) = \int_{-\infty}^{\infty} u^i K(u)\, du \quad \text{and} \quad R(f) = \int f^2(u)\, du.

Let K_h(\cdot) = h^{-1} K(\cdot / h). The following conditions are assumed throughout:
1. S(x) has two derivatives and S''(x) is bounded and uniformly continuous.
2. For l = 0, 1, 2, K^{(l)} is bounded and absolutely integrable with finite second moments.
3. R(K) < +\infty and \mu_2(K) < +\infty.
4. \mu_0(K) = 1 and \mu_i(K) = 0 for every positive odd integer i.
5. There exists small enough h such that K_h(y - x)/(1 - F(y)) is uniformly bounded for |y - x| > M, for any M > 0.
Conditions 1-4 are satisfied by virtually all kernels in use in practice. Condition 5 essentially means that there should be enough censored data at the right end of the estimation region for the asymptotics to apply. We have the following two theorems, which are proved in the appendix.
Theorem 1. Let n \to +\infty, h \to 0 and nh \to \infty. Then

E\{\hat{S}_n(x)\} - S(x) = S''(x)\, \frac{h^2}{2}\, \mu_2(K) + O(h^4).
Theorem 2. Under conditions 1-4,

\sqrt{nh}\, \bigl( \hat{S}_n(x) - E\{\hat{S}_n(x)\} \bigr) \xrightarrow{d} N\!\left( 0,\; \frac{f(x) R(W)}{1 - H(x)} \right).

Then, a 100(1-\alpha)% confidence interval for S(x) is

\hat{S}_n(x) - \frac{h^2}{2}\, S''(x)\, \mu_2(K) \pm z_{1-\alpha/2} \sqrt{ \frac{f(x) R(W)}{nh\,(1 - H(x))} }.

As the confidence interval above uses quantities that are unknown in practice, we use \hat{S}_n(x), -\hat{f}'(x), \hat{f}(x) and \hat{H}^*(x) instead of S(x), S''(x), f(x) and 1 - H(x), respectively. This yields

\hat{S}_n(x) + \frac{h^2}{2}\, \hat{f}'(x)\, \mu_2(K) \pm z_{1-\alpha/2} \sqrt{ \frac{\hat{f}(x) R(W)}{nh\, \hat{H}^*(x)} }.   (3)
In the next section we apply the estimator \hat{S}_n(x), together with appropriate parametric estimators, to the total population as well as to the subgroups of men and women, and we discuss the results.
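As a worked illustration, a plug-in band of the form (3) can be computed as follows. This is a sketch under the stated notation: \mu_2(K) = 1/5 for the Epanechnikov kernel, while the value of R(W) must be supplied for the kernel pair (W, K) actually used, and all names are ours.

```python
import numpy as np

def survival_ci(S_hat, f_vals, fprime_vals, H_star, h, n, RW, mu2=0.2, z=1.96):
    """Pointwise band of the form (3): a bias-corrected centre plus/minus a
    normal quantile times the plug-in standard error. H_star is the modified
    Kaplan-Meier estimate of 1 - H from Section 2; all inputs are arrays over
    the evaluation grid."""
    centre = S_hat + 0.5 * h ** 2 * fprime_vals * mu2
    half_width = z * np.sqrt(f_vals * RW / (n * h * H_star))
    return centre - half_width, centre + half_width
```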
3 Results
In this section we apply the theory of the previous section to a real-life dataset. The available sample is 10,313 observations which represent study times, in months, of students who entered any department of a certain Greek university from the academic year 1983-84 until 1992-93. The minimum study time is 46 months, without any upper limit for graduation; however, the maximum study time in this dataset is 162 months. The main variable of interest is duration of studies. The follow-up period lasted until the end of February 1997. This gives a different number of possible graduation opportunities for each cohort of entrants. At the end of the follow-up period not all students had obtained their degrees, which results in right censored data. Duration of studies for each student is calculated as the period between first enrolment and graduation. For censored observations, the study period is calculated by subtracting the date of entry from the date of the end of the follow-up period. The available data are also examined according to sex, which is also recorded, because this factor may affect duration of studies. On the contrary, age is not considered an important factor in this type of study, as the vast majority of students are of about the same age. The objective is to estimate the distribution of the duration of studies of the total population as well as that of the subgroups of men and women. This is done by using the estimator \hat{S}_n(x) three times: once for the full dataset, once for the subset of male students and once for the subset of female students. As basis for the kernel W we use the Epanechnikov kernel,
K(x) = \frac{3}{4}\,(1 - x^2), \qquad -1 \le x \le 1.
At the boundaries, i.e. in [46, 46 + h) and (162 - h, 162], and with the purpose of eliminating boundary bias, the kernel takes the form K_p(x) = K(x)(a + bx), -1 \le x \le p, where

a = \frac{64\,(2 - 4p + 6p^2 - 3p^3)}{(1 + p)^4\,(19 - 18p + 3p^2)}, \qquad b = \frac{240\,(1 - p)^2}{(1 + p)^4\,(19 - 18p + 3p^2)}.   (4)
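The coefficients in (4) are easy to evaluate numerically; the following short sketch (names are ours) also shows the sanity check that for p = 1 the correction vanishes (a = 1, b = 0), so the interior kernel is recovered.

```python
def epanechnikov_boundary(x, p):
    """Boundary-corrected Epanechnikov kernel K_p(x) = K(x)(a + b x) with
    the coefficients of (4); for p = 1 this reduces to the ordinary kernel."""
    denom = (1 + p) ** 4 * (19 - 18 * p + 3 * p ** 2)
    a = 64 * (2 - 4 * p + 6 * p ** 2 - 3 * p ** 3) / denom
    b = 240 * (1 - p) ** 2 / denom
    K = 0.75 * (1 - x ** 2)
    return K * (a + b * x) if -1 <= x <= p else 0.0
```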
The kernel W results from integration of K (or K_p in the boundary). Motivation for the use of the Epanechnikov kernel as a basis for W comes from its optimality properties, described in [Wand and Jones, 1995]. In all cases bandwidth selection was done by minimizing CV(h) as described in Section 2. For comparison, we use the parametric estimates developed for the same data set in [Kalamatianou and McClean, 2003]. These are
S_t(x) = 0.2680 + 0.6344\, e^{-0.05471(x-46)} for the total population,

S_w(x) = 0.2401 + 0.6566\, e^{-0.06198(x-46)} for the women subgroup, and

S_m(x) = 0.3081 + 0.6034\, e^{-0.04398(x-46)} for the male subgroup. In all cases x > 46; for x \le 46 all estimates are defined to be equal to 1. All three parametric estimates have the same functional form, which in general can be written as

S(x) = \gamma_0 + \gamma_1\, e^{-\lambda(x-46)}, \qquad x > 46,
Sm(x)= 0.3081 + 0.6034e-0.04398("~46) for the male subgroup. In all cases z > 46. In the opposite case, i.e. x < 46, all estimates are defined to be equal to 1. All three parametric estimates have the same functional form, which in general can be written as
and which allows an easy interpretation of the parameters. In particular, y0 is the number of individuals that never experience the event of interest which is graduation. Also, the factor yq is the percentage of individuals that graduate after the 46th month, according to an exponential model with parameter A.
h
Nonparametnc estimate Parametric estimate
Men Parametnc Men Nonparametnc Women Parametnc Women NonDaramelnc
........
,
I
40
60
80
1W
120
---
-
I
I40
160
Duration 01studies in rnonlh~
(a) Kernel and parametric estimates for the total population. (b) Parametric and nonparametric estimates for the subgroups of men and women.
Fig. 1. Parametric and nonparametric estimates of the survival function for the total population and separately for men and women subgroups.
The main conclusion from Figures 1(a) and 1(b) is that kernel estimators, such as the survival function estimate employed in this study, give quite similar results to the parametric methods, in the sense that the inference is the same. See [Bagkavos and Kalamatianou, 2006b] for a quantitative comparison of these parametric and nonparametric estimates.
(a) Confidence interval for the total population. (b) Confidence intervals for the subgroups of men and women.
Fig. 2. Confidence intervals for the total population and the subgroups of men and women (solid line: survival function estimate; the accompanying lines mark the left and right confidence limits).
In Figures 2(a) and 2(b) we utilize (3) and construct confidence intervals, using confidence level \alpha = 5%, for the estimates of the total population as well as the subgroup estimates of men and women. The pictures verify the estimation of \hat{S}_n(x) in all cases: for all three groups we examine, the estimated curve lies constantly within the confidence intervals. In this sense the inference from this estimation, as exhibited in [Bagkavos and Kalamatianou, 2006b], remains valid.
A Proofs of Theorems 1 and 2
To prove Theorem 1, in order to derive the asymptotic properties of $S_n(x)$, from Lemma 1 in [Kim et al., 2005] we have that
$$\mathrm{Var}\{\hat p(x)\} = \mathbb{E}\{\hat p^2(x)\} - \left(\mathbb{E}\{\hat p(x)\}\right)^2. \qquad (6)$$
For the bias, setting $x - y = sh$ in (5), expanding $F(x - sh)$ in a Taylor series around $x$, and using Condition 4 and the definition of $S_n(x)$, the result follows. For the variance we use the fact that $\mathrm{Var}\{S_n(x)\} = \mathrm{Var}\{\hat p(x)\}$. Now, setting $x - y = th$ and using a Taylor expansion around $x$,
$$\frac{1}{h}\int W^2\!\left(\frac{x - y}{h}\right)\frac{dH(y)}{1 - H(y)} = \frac{R(W)}{1 - H(x)} + O\!\left(h n^{-1}\right). \qquad (7)$$
By (5), (6), (7) and the fact that asymptotically $n^{-1}S^2(x)$ is negligible, the result follows. For the proof of Theorem 2, first note that it is equivalent to prove the asymptotic normality of the suitably normalized estimator. Then the result follows easily by utilizing the findings of Theorem 1, together with the fact that $R(W)$ and $\mu_2(K)$ are finite, on applying the Lyapunov theorem.
References
[Bagkavos and Kalamatianou, 2006a] D. Bagkavos and A. Kalamatianou. Analysis of duration of studies data. In I. Vonta, editor, Proceedings of the International Conference on Statistical Methods for Biomedical and Technical Systems, volume 1, pages 207-212, 2006.
[Bagkavos and Kalamatianou, 2006b] D. Bagkavos and A. Kalamatianou. Analysis of duration of studies data by kernel methods. Communications in Dependability and Quality Management, submitted for publication, 2006.
[Kalamatianou and McClean, 2003] A. Kalamatianou and S. McClean. The perpetual student: Modelling duration of undergraduate studies based on lifetime-type educational data. Lifetime Data Analysis, 9:311-330, 2003.
[Kaplan and Meier, 1958] E.L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53:457-481, 1958.
[Kim et al., 2005] C. Kim, W. Bae, H. Choi, and B. Park. Nonparametric hazard function estimation using the Kaplan-Meier estimator. Journal of Nonparametric Statistics, 17:937-948, 2005.
[Kulasekera et al., 2001] K.B. Kulasekera, C. Williams, M. Coffin, and A. Manatunga. Smooth estimation of the reliability function. Lifetime Data Analysis, 7:415-433, 2001.
[Marron and Padgett, 1987] J.S. Marron and W.J. Padgett. Asymptotically optimal bandwidth selection for kernel density estimators from randomly right censored samples. Annals of Statistics, 15:1520-1535, 1987.
[Sarda, 1993] P. Sarda. Smoothing parameter selection for smooth distribution functions. Journal of Statistical Planning and Inference, 35:65-75, 1993.
[Simonoff, 1996] J. Simonoff. Smoothing Methods in Statistics. Springer, New York, 1996.
[Wand and Jones, 1995] M. Wand and M.C. Jones. Kernel Smoothing, volume 60. Chapman and Hall, London, 1995.
Chaotic Data Analysis and Hybrid Modeling for Biomedical Applications
Wlodzimierz Klonowski1, Robert Stepien1, Marek Darowski2, Maciej Kozarski2
1 GBAF, Medical Research Center, Polish Academy of Sciences (IMDiK PAN), 5 Pawinskiego Str., 02-106 Warsaw, Poland (e-mail: wklon@gbaf.eu; roberts@gbaf.eu)
2 Institute of Biocybernetics and Biomedical Engineering, Polish Academy of Sciences (IBIB PAN), 4 Trojdena Str., 02-109 Warsaw, Poland (e-mail: mdar@ibib.waw.pl; [email protected])
Abstract. We propose two methods of analysis of chaotic processes to be applied in sensory analysis. These methods may be used off-line in clinics, e.g. for analysis of biosignals registered during sleep, or implemented in new sensor systems, e.g. for drivers' vigilance monitoring in real time; they may also be applied in a new type of hybrid models of circulatory and respiratory systems.
Keywords: analysis of chaotic processes, biosignal analysis, sensory analysis, hybrid modeling
We have developed two methods, based on a chaos-theoretical approach and fractal analysis, that are very useful in processing data obtained from new (nano)sensors [Klonowski, 2006], in particular in biomedical applications when combined with new hybrid models.
The first method is based on symbolic dynamics. We differentiate the signal numerically and represent the signal's derivative as a series of two symbols, with the digit 1 corresponding to a non-negative and the digit 0 to a negative value of the derivative. This series is then subdivided into sequences called words, each consisting of I consecutive symbols. The series of words is further analyzed using a moving window technique. Namely, we calculate the number of occurrences, L, of some specific word in each window, so obtaining a distribution of the chosen word in the series encoding the whole EEG signal under consideration. The most interesting words are those consisting only of 0's (or only of 1's). Depending on the chosen word length, I, and the sampling frequency, f_s, this method enables the detection, in the spectrum of the analyzed signal, of waves with frequencies smaller than a certain value f. Simple considerations show that f = f_s/(2I). For example, if f_s = 128 Hz and I = 8 then f = 8 Hz, i.e. the method will in this case detect waves of frequencies smaller than 8 Hz; the greater L{8x0}, the number of occurrences of the word consisting of 8 consecutive symbols 0, the more slow-wave components (those with frequencies smaller than 8 Hz) are present in the analyzed EEG signal. The appearance of slow waves may be related to the wakefulness-sleep transition and to a decreasing vigilance level, as may be observed on the EEG of a person falling asleep (cf. Fig. 1).
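A minimal sketch of the word-counting step, under the conventions described above (the function and variable names are ours, not from the paper): the sign of the derivative is encoded as 0/1 symbols, each moving window is cut into non-overlapping words of length I, and the all-zero word is counted.

    import numpy as np

    def slow_wave_counts(signal, I=8, window=512, step=256):
        # 1 for a non-negative derivative, 0 for a negative one
        symbols = (np.diff(signal) >= 0).astype(int)
        counts = []
        for start in range(0, len(symbols) - window + 1, step):
            w = symbols[start:start + window]
            # cut the window into non-overlapping words of length I
            words = w[: (len(w) // I) * I].reshape(-1, I)
            # L{I x 0}: number of words consisting of I consecutive zeros
            counts.append(int(np.sum(words.sum(axis=1) == 0)))
        return counts

    # with f_s = 128 Hz and I = 8 this is sensitive to components below 8 Hz
    fs = 128
    t = np.arange(0, 30, 1.0 / fs)
    toy_eeg = np.sin(2 * np.pi * 4 * t) + 0.3 * np.random.randn(len(t))
    print(slow_wave_counts(toy_eeg))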
The second method is based on the fractal dimension of a single EEG signal. It was demonstrated that Higuchi's fractal dimension, Df, of an EEG signal shows significant individual differences, but the trend is always the same: Df decreases from wakefulness to sleep stage 1; artifacts only increase Df. First the program analyzes the driver's EEG 'at start', calculates the mean Df for the driver in wakefulness and takes it as the 'reference value', Dp. Then, if during driving Df decreases over several subsequent epochs and reaches a certain threshold below Dp, showing that the driver may be falling asleep, the program activates an alarm; Df then increases, showing that the driver's vigilance has increased again. The threshold that activates the alarm may also be tuned individually, relative to the driver's Dp. An example of applying this method to EEG data recorded in a real-car driver's vigilance-monitoring experiment at CERTH (Thessaloniki, Greece; data of June 22, 2005, channel C3) is shown in Fig. 2.
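Higuchi's fractal dimension is straightforward to compute. The following is a sketch of one common formulation (ours, not the authors' implementation): Df is estimated as the slope of log L(k) against log(1/k), where L(k) is the mean normalized curve length at delay k. For a noise-like wakeful epoch the value is close to 2 and it decreases towards sleep.

    import numpy as np

    def higuchi_fd(x, kmax=8):
        # average normalized curve length L(k) over the k possible offsets m
        x = np.asarray(x, dtype=float)
        n = len(x)
        log_k, log_L = [], []
        for k in range(1, kmax + 1):
            lengths = []
            for m in range(k):
                idx = np.arange(m, n, k)
                if len(idx) < 2:
                    continue
                # curve length of the subsampled series, normalized for delay k
                lm = np.sum(np.abs(np.diff(x[idx]))) * (n - 1) / ((len(idx) - 1) * k)
                lengths.append(lm / k)
            log_k.append(np.log(1.0 / k))
            log_L.append(np.log(np.mean(lengths)))
        # Df is the slope of log L(k) versus log(1/k)
        return np.polyfit(log_k, log_L, 1)[0]

    print(higuchi_fd(np.random.randn(1024)))   # white noise: close to 2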
Fig. 1. Multichannel EEG recording of a person falling asleep (electrode channels versus time t [s]).
Fig. 2. Sleep stages (wake, stages 1-4, REM) and the fractal dimension Df as functions of time.
These nonlinear methods of signal analysis may also be applied in combination with hybrid models of circulatory and respiratory systems, a new type of model developed at IBIB PAN for the testing and development of prosthetic devices and for the planning of medical procedures [Ferrari et al., 2003]. In the case of physical models, one has to build a different model for each specific application; such models are usually expensive, while their accuracy and flexibility are rather poor. Numerical models are much cheaper, accurate and flexible, but their applicability is limited. Hybrid models are combinations of numerical and physical models: first a numerical model is developed, and then some of its parts are transformed into a physical model. Any part of the numerical model
can be replaced by a physical section (hydraulic, pneumatic, or electrical) and two interfaces. The advantages of such a solution are evident: the physical part of the model is minimized and reduced to the very essentials of the specific application. On the other hand, the numerical part can easily be replaced or modified to reproduce the remaining parts of the whole modeled system, e.g. the circulatory system. By using the lumped parameter method, the exchange between the numerical and the physical sections can be limited to one variable. In our model the interfaces are based on impedance converters (Fig. 3).
Fig. 3. Hybrid model interfaces based on impedance converters.

Acknowledgements
This work was partially supported by IBIB PAN statutory funding. GBAF acknowledges support by the European Union FP6 IP SENSATION under grant IST-507231 and by SPUB of the Polish Ministry of Education and Science. M. Darowski acknowledges support of the Foundation of Polish Science. GBAF acknowledges obtaining data on vigilance monitoring in real driving conditions from CERTH, Thessaloniki, Greece, Partner in the SENSATION Project. Calculations of fractal dimension were done by Ms. E. Olejarczyk.
References
[Klonowski, 2006] W. Klonowski: Application of nonlinear dynamics in biosignal analysis, Proc. SPIE, vol. 5915, pp. 335-344, 2006.
[Ferrari et al., 2003] G. Ferrari, M. Kozarski, C. De Lazzari, K. Gorczynska, R. Mimmo, M. Guaragno, G. Tosti, and M. Darowski: Modelling of cardiovascular system: development of a hybrid (numerical-physical) model, Int. J. Art. Organs, vol. 26, pp. 1104-1114, 2003.
Stochastic Fractal Interpolation Function and its Applications to Fractal Analysis of Normal and Pathological Body Temperature Graphs of Children
Anna Soós
Babes-Bolyai University, Faculty of Mathematics and Computer Science, M. Kogalniceanu str. No. 1, 400084 Cluj Napoca, Romania (e-mail: [email protected])
Abstract. In this article we prove existence and uniqueness conditions for stochastic fractal interpolation functions. As an application we study normal and pathological human body temperature control.
Keywords: fractal function, stochastic fractal interpolation, fractal dimension.
1 Introduction
The notion of a fractal interpolation function was introduced by Barnsley in [1]. A function $f : I \to \mathbb{R}$, where $I$ is a real closed interval, is called by Barnsley a fractal function if the Hausdorff dimension of its graph is noninteger. In this paper we introduce such continuous fractal functions with interpolation properties, together with their random version. Functions of this type describe not only profiles of mountain ranges, tops of clouds and horizons over forests, but also temperatures in flames as a function of time, electroencephalograph pen traces, and even the minute-by-minute stock market index. In the application, the experimental data, recruited from healthy children and from clinical records of children with febrile infectious diseases, were processed using fractal interpolation. The conclusions are: the human thermal control system is probably a fractal; the values of the fractal dimensions were different for healthy and sick children; fractal analysis could be used as a diagnostic tool in febrile diseases.
2 Fractal interpolation function

Let $X$ be an interval and let $\Phi_i : X \to X_i$ be a given collection of $N$ bijections such that
$$\{X_i = \Phi_i(X) \mid i \in \{1, \ldots, N\}\}$$
is a partition of $X$, i.e.
$$\bigcup_{i=1}^{N} X_i = X \quad \text{and} \quad \mathrm{int}(X_i) \cap \mathrm{int}(X_j) = \emptyset \quad \text{for } i \neq j.$$
For $g_i : X_i \to Y$, $i \in \{1, \ldots, N\}$, define $\sqcup_i g_i : X \to Y$ by
$$(\sqcup_i g_i)(x) = g_j(x) \quad \text{for } x \in X_j.$$
Assume that mappings $S_i : X \times Y \to Y$ with $S_i(x, \cdot) \in \mathrm{Lip}^{<1}(Y)$, $x \in X$, are given, $i \in \{1, \ldots, N\}$. Define
$$Sf = \sqcup_i S_i\left(\Phi_i^{-1}, f \circ \Phi_i^{-1}\right).$$
We say $f$ is a selfsimilar fractal function if $Sf = f$.

Let $\{x_0, \ldots, x_N\}$ be a set of $N + 1$ distinct points in $X$ and let $\{y_0, \ldots, y_N\}$ be a set of points in $Y \subset \mathbb{R}$. The collection $\Gamma := \{(x_0, y_0), \ldots, (x_N, y_N)\}$ is called a set of interpolation points in $X \times Y$. A fractal function $f$ is said to have the interpolation property with respect to $\Gamma$ if $f(x_j) = y_j$ for all $j = 0, 1, \ldots, N$. Denote
$$C^*(X, Y) := \{f \in C(X, Y) \mid f(x_j) = y_j, \ j \in \{0, \ldots, N\}\}.$$
In [1] Barnsley proved the following result:

Theorem 1. Let $\Gamma$ be a set of interpolation points and let $S$ be a scaling law for functions. Suppose
$$S_i(x_0, y_0) = y_{i-1}, \qquad S_i(x_N, y_N) = y_i$$
for all $i \in \{1, \ldots, N\}$ and $\lambda_\infty := \max_i r_i < 1$. Then there exists a unique selfsimilar function $f^* \in C^*(X, Y)$, and the fractal dimension of its graph is the unique real solution $FD$ of the equation
$$\sum_{i=1}^{N} |r_i|\, a_i^{FD - 1} = 1,$$
where $a_i$ denotes the contraction ratio of $\Phi_i$.
Example 1. Let $I := [a, b]$ and let $\Gamma := \{(x_i, y_i)\} \subset I \times \mathbb{R}$ be given, with $a := x_0$, $b := x_N$. Suppose $\Phi_i : I \to I_i$, $\Phi_i(x) := a_i x + d_i$, where $a_i, d_i \in \mathbb{R}$, $i \in \{1, \ldots, N\}$. Denote
$$C(I) := \{f \in C(I, I) \mid f(x_i) = y_i, \ i \in \{0, \ldots, N\}\}.$$
Let $S_i : I \times \mathbb{R} \to \mathbb{R}$ be defined by
$$S_i(x, y) := c_i x + r_i y + e_i$$
for $i \in \{1, \ldots, N\}$, $x \in I$. If $|r_i| < 1$ is given, we can compute $a_i, c_i, d_i, e_i$ from the conditions $\Phi_i(x_0) = x_{i-1}$, $\Phi_i(x_N) = x_i$, $S_i(x_0, y_0) = y_{i-1}$ and $S_i(x_N, y_N) = y_i$. We have
$$a_i = \frac{x_i - x_{i-1}}{x_N - x_0}, \qquad d_i = \frac{x_N x_{i-1} - x_0 x_i}{x_N - x_0},$$
$$c_i = \frac{y_i - y_{i-1} - r_i (y_N - y_0)}{x_N - x_0}, \qquad e_i = \frac{x_N y_{i-1} - x_0 y_i - r_i (x_N y_0 - x_0 y_N)}{x_N - x_0}.$$
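A small numerical sketch of Example 1 (our own illustration, using the coefficient formulas above): the affine maps are computed from the interpolation points, and points on the graph of $f^*$ are generated by random iteration of $W_i(x, y) = (\Phi_i(x), S_i(x, y))$.

    import numpy as np

    def fif_maps(xs, ys, r):
        # Phi_i(x) = a_i x + d_i and S_i(x, y) = c_i x + r_i y + e_i
        x0, xN, y0, yN = xs[0], xs[-1], ys[0], ys[-1]
        maps = []
        for i in range(1, len(xs)):
            a = (xs[i] - xs[i - 1]) / (xN - x0)
            d = (xN * xs[i - 1] - x0 * xs[i]) / (xN - x0)
            c = (ys[i] - ys[i - 1] - r[i - 1] * (yN - y0)) / (xN - x0)
            e = (xN * ys[i - 1] - x0 * ys[i]
                 - r[i - 1] * (xN * y0 - x0 * yN)) / (xN - x0)
            maps.append((a, d, c, e, r[i - 1]))
        return maps

    def fif_graph(maps, n_iter=20000, seed=0):
        # random iteration ("chaos game") converging to the graph of f*
        rng = np.random.default_rng(seed)
        x, y, pts = 0.0, 0.0, []
        for _ in range(n_iter):
            a, d, c, e, ri = maps[rng.integers(len(maps))]
            x, y = a * x + d, c * x + ri * y + e   # both use the previous x
            pts.append((x, y))
        return np.array(pts)

    maps = fif_maps([0.0, 0.25, 0.5, 0.75, 1.0],
                    [0.0, 0.6, 0.2, 0.8, 0.4], r=[0.3, -0.3, 0.3, -0.3])
    pts = fif_graph(maps)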
3 Random fractal interpolation function
Next we consider the random version of the above construction. The random scaling law $S = (S_1, \ldots, S_N)$ is a random variable whose values are scaling laws. We write $\mathcal{S} = \mathrm{dist}\, S$ for the probability distribution determined by $S$ and $\stackrel{d}{=}$ for equality in distribution. Let $(f_t)_{t \in X}$ be a stochastic process or a random function with state space $Y$. The random function $Sf$ is defined up to probability distribution by
$$Sf = \sqcup_i S_i\left(\Phi_i^{-1}, f^{(i)} \circ \Phi_i^{-1}\right),$$
where $S, f^{(1)}, \ldots, f^{(N)}$ are independent of one another and $f^{(i)} \stackrel{d}{=} f$, for $i \in \{1, \ldots, N\}$. We say $f$ is a selfsimilar random fractal function if $Sf \stackrel{d}{=} f$.
Let $\Gamma := \{(x_i, y_i)\} \subset X \times Y$ be a set of interpolation points in $X \times Y$. A random fractal function $f$ is said to have the interpolation property with respect to $\Gamma$ if $f(x_i) = y_i$ a.s. for all $i \in \{0, 1, \ldots, N\}$. Let $\Phi_i : X \to X$ be contractive Lipschitz maps such that $\Phi_i(x_0) = x_{i-1}$ and $\Phi_i(x_N) = x_i$ for all $i \in \{1, \ldots, N\}$. Let $S$ be a random scaling law defined by $S_i : X \times Y \to Y$ such that $S_i(x, \cdot) \in \mathrm{Lip}^{<1}(Y)$ for all $x \in X$ and
$$S_i(x_0, y_0) = y_{i-1} \quad \text{a.s.} \qquad \text{and} \qquad S_i(x_N, y_N) = y_i \quad \text{a.s.}$$
for all $i \in \{1, \ldots, N\}$. Denote
$$C_\omega(X, Y) := \{f : \Omega \times X \to Y \mid f \text{ continuous a.s.}\}$$
and
$$C_\omega^*(X, Y) := \{g \in C_\omega(X, Y) \mid g(x_i) = y_i \ \text{a.s.}, \ i \in \{0, \ldots, N\}\}.$$
Let
$$\mathbb{L}_\infty := \{g : \Omega \times X \to Y \mid \operatorname*{ess\,sup}_\omega \operatorname*{ess\,sup}_x d_Y(g^\omega(x), a) < \infty\}$$
for some $a \in Y$. For $f, g \in \mathbb{L}_\infty$ we define
$$d_\infty^*(f, g) := \operatorname*{ess\,sup}_\omega d_\infty(f^\omega, g^\omega),$$
where $d_\infty(f, g) = \operatorname*{ess\,sup}_x d(f(x), g(x))$.
(1)
2
for some a E X , then there exists the selfsimilar random function f * E
C : ( X , Y ) . Moreover, f * is unique up to probability distribution. Proof.: One can check that (L,, d,*) is a complete metric space. Next we show that S : IL, -+ IL, is a contraction map with contraction constant A,. Using the Lipschitz property of Si we have d L ( S f , S g ) = esssupd,(SfW,SgW) = W
= e s s s u p e s s s u p d c ( ~ i ~ i ( ~ fPW i( 2l) , o ~ ; ' ( z ) ) , W
X
uisi(@;7(2),gw(i)
< ess sup(r;ess
-
W
0
@ i l ( z ) )5 )
sup d y ( f " ( 2 ) X
(z), g w ( i ) ( z ) )5) X,d&
( f ,g ) .
Then there exists $f^*$ with $Sf^* = f^*$. For the uniqueness of $f^*$ we define, as in [5], a metric on the set $\mathcal{L}_\infty$ of probability distributions of members of $\mathbb{L}_\infty$ by
$$d_\infty^{**}(F, G) := \min\{\, d_\infty^*(f, g) \mid \mathrm{dist}\, f = F,\ \mathrm{dist}\, g = G \,\}.$$
Then $(\mathcal{L}_\infty, d_\infty^{**})$ is a complete metric space and $S$ is a contraction map. To see this, choose $f^{(i)} \stackrel{d}{=} F$ and $g^{(i)} \stackrel{d}{=} G$ such that the $(f^{(i)}, g^{(i)})$ are independent of one another and such that
$$d_\infty^{**}(F, G) = d_\infty^*(f^{(i)}, g^{(i)}).$$
Choose $(S_1, \ldots, S_N) \stackrel{d}{=} S$ independent of the $(f^{(i)}, g^{(i)})$. Since
$$d_\infty^*(Sf^{(i)}, Sg^{(i)}) \le \lambda_\infty\, d_\infty^*(f^{(i)}, g^{(i)}),$$
it follows that
$$d_\infty^{**}(SF, SG) \le \lambda_\infty\, d_\infty^{**}(F, G).$$
Then there exists $f^* \in C_\omega^*(X, Y)$ which satisfies $Sf^* \stackrel{d}{=} f^*$. We have to prove that $f^*(x_i) = y_i$ a.s. for all $i \in \{1, \ldots, N\}$. Indeed,
$$f^*(x_i) = (Sf^*)(x_i) = \sqcup_j S_j\big(\Phi_j^{-1}(x_i), f^* \circ \Phi_j^{-1}(x_i)\big) = S_i\big(x_N, f^*(x_N)\big) = y_i \quad \text{a.s.}$$

Remark. a) If $X = I$, $Y = \mathbb{R}$ and $S_i(x, y) = S_i(y)$, then we recover the Corollary of Theorem 6 in [5]. b) $f^*(X)$ is the selfsimilar random set $K^*$ of Theorem 2 in [5]. c) The graph of the random function is a selfsimilar random set.
IR and N >
9. The interpolation set is
r := { ( x i , y i ) E [o, 11 x q o = xo < 2 6 < ... < xN Suppose @i
whece ai,di E R,
: X +Xi,
@ i ( x ):= aix
i E (2, ..., N } . Let Si : X x Y &(x, y)
:= cix
= 1).
+di,
+
Y defined by
+ riy + ei
for i E {l,...,N},x E I where ri is a random variabge such that A, := ess supwmaxi ri < 1. W e can compute ai, ci, di, ei b y the conditions @i(x,) = xi-1, @ i ( x ~ =xi ) and
Si(xo,yo)= yi-1,
S ~ ( Z N , Y= N yi )
a.s.
for all $i \in \{1, \ldots, N\}$. Let $W_i : X \times Y \to X_i \times Y$ be defined by
$$W_i(x, y) = (\Phi_i(x), S_i(x, y)), \qquad i = 1, \ldots, N.$$
Using the random scaling law $W := (W_1, \ldots, W_N)$, for any $K_0 \subset X \times Y$ one defines a sequence of random sets
$$K_n = W K_{n-1} = \bigcup_{i=1}^{N} W_i K_{n-1} = W^n(K_0).$$
Then
$$\operatorname*{ess\,sup}_\omega d_H\big(W^n(K_0), \mathrm{graph}\, f^*\big) \to 0$$
as $n \to \infty$, where $d_H$ denotes the Hausdorff distance.
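The random construction can be sketched along the same lines as the deterministic one (this is our own illustration, not the paper's code, and it reuses the coefficient formulas of the earlier sketch): at every iteration a fresh vertical scaling $r_i$ is drawn, and $c_i$, $e_i$ are recomputed so that the endpoint conditions $S_i(x_0, y_0) = y_{i-1}$ and $S_i(x_N, y_N) = y_i$ hold almost surely.

    import numpy as np

    def random_fif_graph(xs, ys, n_iter=20000, r_lo=0.1, r_hi=0.4, seed=1):
        # random scaling law: r_i is redrawn at every application of W_i
        rng = np.random.default_rng(seed)
        x0, xN, y0, yN = xs[0], xs[-1], ys[0], ys[-1]
        x, y, pts = 0.0, 0.0, []
        for _ in range(n_iter):
            i = int(rng.integers(1, len(xs)))
            ri = rng.uniform(r_lo, r_hi)      # keeps ess sup max_i r_i < 1
            a = (xs[i] - xs[i - 1]) / (xN - x0)
            d = (xN * xs[i - 1] - x0 * xs[i]) / (xN - x0)
            c = (ys[i] - ys[i - 1] - ri * (yN - y0)) / (xN - x0)
            e = (xN * ys[i - 1] - x0 * ys[i] - ri * (xN * y0 - x0 * yN)) / (xN - x0)
            x, y = a * x + d, c * x + ri * y + e
            pts.append((x, y))
        return np.array(pts)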
4 Applications
As an application, a fractal analysis of normal and pathological body temperature graphs of children is given. The human body temperature is controlled by a complex biocybernetic system ([8], [9], [10]). The dynamics of the body temperature is non-linear in time; therefore, as with other physiological graphs, e.g. EEG, ECG, EMG, its graph cannot be described and modelled in terms of Euclidean geometry. Because the electrophysiological graphs mentioned above possess a proven fractal structure, we supposed the body temperature graphs to have the same feature; in this case we could obtain valuable information by using fractal tools in the analysis of these graphs. The main objectives of this study were: 1) to study the fractal features of physiological body temperature graphs; 2) to compare the values of the fractal dimension (FD) of normal temperature graphs and of febrile graphs registered during febrile diseases of known origin; the goal of this comparison was to look for correlations between the type of the febrile infection's pathomechanism and the FD value of the corresponding graphs. Material and methods: first we studied physiological body temperature graphs using a series of time scales; thereafter we compared the FD values of normal and pathological temperature graphs recorded from children. Results: 1) applying a range of time scales (1 to 12 hours) to the temperature data recorded hourly over 24 hours shows that the precise determination of the FD value requires a larger amount of experimental data (a continuous record); 2) the mean values and dispersion of the FD values of the febrile graphs differ from those of the normal graphs (Figs. 1, 2 and 3). Discussion: 1) the pathophysiological mechanisms of the febrile reaction are not completely clarified yet; there appears to exist a relationship between the type and quantity of the endogenous pyrogens and the dynamics
of the febrile reaction. The results of this study raise the idea of an interdependency between the pathophysiological mechanisms of some febrile diseases of infectious origin and the FD value of the febrile graphs. To confirm this hypothesis, more accurate determinations of the body temperature values are needed; in case of confirmation, the methods of fractal analysis could be extended to other febrile diseases, including fever of unknown origin. 2) The FD values of the temperature graphs can quantify the dynamics of the system which controls the body temperature.
Fig. 1. Bacterial
Fig. 2. Healthy
Fig. 3. Virus
References
1. M.F. Barnsley: Fractal functions and interpolation, Constructive Approximation, 2 (1986), 303-329.
2. M.F. Barnsley: Fractals Everywhere, Academic Press, 1993.
3. J.B. Bassingthwaighte, L.S. Liebovitch, B.J. West: Fractal Physiology, Oxford University Press, New York, Oxford, 1994.
4. J.E. Hutchinson: Fractals and Self Similarity, Indiana University Mathematics Journal, 30 (1981), no. 5, 713-747.
5. J.E. Hutchinson, L. Rüschendorf: Selfsimilar Fractals and Selfsimilar Random Fractals, Progress in Probability, 46 (2000), 109-123.
6. J. Kolumbán, A. Soós: Invariant sets of random variables in complete metric spaces, Studia Univ. "Babes-Bolyai", Mathematica, XLVII, 3 (2001), 49-66.
7. P.R. Massopust: Fractal functions, fractal surfaces and wavelets, 1994.
8. M. Székely, A.A. Romanovsky: Pyretic and antipyretic signals within and without fever: a possible interplay. Med. Hypotheses 1998 Mar; 50(3): 213-218.
9. M. Székely, Z. Szelényi: A kísérletes láz patomechanizmusa I. Prostaglandinok szerepe a központi regulációs változásokban. EME Orv. Ertesito, 1994; 66: 119-121.
10. Z. Szelényi, M. Székely: A kísérletes láz patomechanizmusa II. A cholecystokinin-oktapeptid lehetséges szerepe a láz kialakulásában. EME Orv. Ertesito, 1994; 66: 123-127.
A Modeling Approach to Life Table Data Sets
Christos H. Skiadas1 and Charilaos Skiadas2
1 Technical University of Crete, University Campus, 73100 Chania, Crete, Greece (e-mail: skiadas@ermes.tuc.gr)
2 Hanover College, Hanover, Indiana, USA (e-mail: skiadas@hanover.edu)
Abstract. A modeling approach to Life Table Data sets is proposed. The method is based on a stochastic methodology and the derived first exit time probability density function. The Health State Function of a population is modeled as the mean value of the health states of the individuals. The form suggested here for the health state function is relatively simple compared to previous attempts, but the application results are quite promising. The proposed model is a three-parameter model, and it is compared to the three-parameter Weibull model. Both models are applied to the Life Table Data for males and females in Greece from 1992 to 2000. The results indicate that the proposed model fits the data better than the Weibull model. The methodology for the model building and the proposed model could be used in several cases in population studies in biology, ecology and other fields.
Keywords: Life Table Data, Health State Function, Stochastic modelling, Hitting time density function, Gompertz model, Weibull model.
1 Introduction

Many attempts have been made during the last centuries to model life table data and the inevitable decay process of an original population over time. The most important for this study are the model proposed by Gompertz to express the law of human mortality, and the Weibull model, expressing the failure of items in a set of products. The latter is also a flexible model for the distribution function of the number of deaths in a population. The task of this study is to propose a simple three-parameter model that can be applied to life table data, based on a stochastic theory presented in previous studies. In those studies more complicated models were proposed and applied; however, these models were quite cumbersome to handle and apply. Furthermore, as they have several parameters, it is not possible to test their fitting ability relative to the simpler three-parameter models in use, such as the Gompertz and Weibull models.
2 Model Analysis
The Gompertz model, proposed by Benjamin Gompertz (1825) and analysed by other researchers, long ago, in the direction needed for this study (Winsor, 1932), has the form of the following density function:
$$g(t) = k\, e^{\theta t} \exp\left(-b\, e^{\theta t}\right).$$
The model is left-skewed and is not easy to apply to life table data over the whole data range. Instead, it is a good model for the mortality data from an age close to 30 years up to a maximum level of the death rate, usually between 70 and 80 years depending on the data set used (males or females) (Haybittle, 1998). The Weibull model was proposed by Waloddi Weibull (1951). The probability density function of this model has the form:
$$f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}.$$
Usually this is considered a 2-parameter model. However, when applying the model to data, as is the case in this study, an extra parameter is present. Then the model takes the following 3-parameter form:
$$f(x; k, \lambda, c) = c\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k},$$
where the parameter $c$ is determined when fitting the model to the data.
3 The Proposed Model
In previous studies a general model was proposed, based on the theory of the hitting time of a stochastic process at a barrier located at a distance $a^*$ from the horizontal axis, which expresses the age of the individual (Janssen and Skiadas, 1995; Skiadas, 2006b,a). The stochastic variable expresses the health state of the individual, whereas the mean value of the process expresses the health state of the population. For the case of the state of human health, this model has the following probability density function (Skiadas, 2006b,a):
$$g(t) = k^*\, \frac{a^* - H(t)}{\sigma\sqrt{2\pi t^3}} \exp\left(-\frac{\left(a^* - H(t)\right)^2}{2\sigma^2 t}\right), \qquad (1)$$
where $k^*$ is a normalisation constant defined by the formula
$$\int_0^\infty g(t)\,dt = 1,$$
where $H(t)$ is the health state function and $\sigma$ is the variance. Although the concepts of health and of the health state have been defined and used in our societies from the very beginning, a mathematical analytic form has not been obtained. It is common knowledge that the health state decreases over time, and also that frequent and sudden changes appear, thus leading to the acceptance that the health state follows a stochastic path during time. Regarding the mean value of the health of a population, denoted by $H(t)$ in equation (1), this must be expressed by a function of unknown form which is decreasing, at least for large values of the age $t$. The development and application of the Gompertzian theory over the last two centuries indicated that a rapidly decreasing function may express the state of human health in the last years of human life very well (Haybittle, 1998). Instead, in the middle period of human life, the mean health state could be expressed by a slowly decreasing function of time. There remain the first years after birth, when the state of human health is at relatively lower levels. The modeling of this early period, along with the remaining lifetime, was done in previous studies by introducing a quite heavy model for the health state function. The application of this model to life table data was successful; however, it was not possible to compare it to simpler models, such as the Gompertz and Weibull models, due to its larger number of parameters. Also, that model covered the total life period including the first stages of human life, something that is not possible with the simpler models. Application of this model shows that the form of the health state function for large ages is mainly flat and slowly decreasing. Another very important point is that when we expand the unknown health state function $H(t)$ in a Taylor series, the rapidly decreasing part of the function can be expressed simply by a term having a large exponent. This means that a function of the following form could be a simple but quite good approximation to the real situation:
$$H(t) = c - (\ell t)^b,$$
where $c$, $\ell$ and $b$ are positive parameters. For $b > 0$, $H(t)$ is a decreasing function; especially for $b \gg 1$, $H(t)$ is rapidly decreasing. Figure 1 illustrates this case for various values of the parameter $b$ ($c = 20$, $\ell = 0.03$ and $b = 3, 4, 5$). Introducing the above value of $H(t)$ in equation (1), the following form results:
$$g(t) = k\, (\ell t)^{-3/2} \exp\left(-\frac{\left(a - (\ell t)^b\right)^2}{2\ell t}\right),$$
where $a = c - a^*$, $k$ is the new integration constant, and the variance $\sigma$ is included in the parameters $k$, $a$ and $\ell$. Without loss of generality, in many applications it can be assumed that $\sigma = 1$.
Fig. 1. Health state functions of the form $H(t) = c - (\ell t)^b$.
Figure 2 illustrates the above case for various values of the parameter $b$ ($c = 20$, $\ell = 0.03$ and $b = 3, 4, 5$). It is clear that the higher the value of the parameter $b$, the faster the rate of decrease and the sharper the density function.
Fig. 2. The density function for a health state function of the form $H(t) = c - (\ell t)^b$.
4 Applications to Life Table Data and Comparisons
Depending on whether the life table data are for male or female populations, a different value of $b$ is used ($b = 4$ for males and $b = 5$ for females). The following tables show the results of fitting the life table data for Greece from 1992 to 2000 to our model, using a nonlinear least squares fit, and comparing with the corresponding fit of the Weibull distribution. For the purposes of the fit, the parameters are set in a slightly different way. The density function of the proposed model is
$$g(t; k, \ell, a, b) = k\, (\ell t)^{-3/2} \exp\left(-\frac{\left(a - (\ell t)^b\right)^2}{2\ell t}\right),$$
and the corresponding equation for the Weibull model is
$$g(t; c, B, k) = c\, (tB)^{k-1} \exp\left(-(tB)^k\right).$$
Table 1 summarises the non-linear regression results for Greece (males) from 1992 to 2000. The proposed monomial model shows better fitting behavior than the Weibull model for all nine years studied. Table 2 shows the corresponding results for females. Another interesting point is to find the time $T$ at which the value of the health state function $H(t)$ becomes zero. This is achieved when $(\ell T)^b = a$, that is $T = a^{1/b}/\ell$.
For males ($b = 4$) the estimated value is $T = 82.1707$ in 1992 and $T = 82.69$ in 2000, an increase of $t = 0.51929$, approximately half a year, over the 9-year period. For females the estimated value is $T = 85.37205$ in 1992 and $T = 86.61715$ in 2000, an increase of $t = 1.2451$ over the 9-year period. Using our estimates for the parameters of the model, we may construct a graph illustrating the health state function for males and females in a specific year; Figure 3 shows the health state functions for males and females in the year 2000. As expected from the above theory, in both cases there is a relatively stable period, represented by the flat part of the curve, for the years up to 40 for females and up to 30 for males; a gradually decreasing period follows. The health state function shows higher values for females than for males. Figures 4 and 5 illustrate the raw data for Greece 2000 for females and males respectively, along with the curves of the proposed model and the Weibull model. In both cases the fitting is quite good. The relatively higher sum of squared errors for males is due to the sharp form of the data in the range of the maximum death rate, as illustrated in Figure 5; for males the main part of the error term is due to fluctuations around the range of the maximum death rate, as well as to the deaths at ages 18-28 years, which are mostly due to accidents and other causes.
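A sketch of the fitting procedure follows. It is our own illustration: the density parametrizations are the reconstructions given above, the data are synthetic placeholders for the actual life table, and the helper names are hypothetical. Both densities are fitted by non-linear least squares with scipy.optimize.curve_fit, and the sums of squared residuals are compared.

    import numpy as np
    from scipy.optimize import curve_fit

    B_EXP = 4   # b = 4 for males, b = 5 for females

    def g_proposed(t, k, l, a):
        # proposed density (as reconstructed above), b held fixed at B_EXP
        return k * (l * t) ** -1.5 * np.exp(-(a - (l * t) ** B_EXP) ** 2 / (2 * l * t))

    def g_weibull(t, c, k, B):
        return c * (t * B) ** (k - 1) * np.exp(-(t * B) ** k)

    ages = np.arange(1.0, 101.0)
    # placeholder "deaths" data generated from the model plus small noise
    rng = np.random.default_rng(0)
    deaths = g_proposed(ages, 9.34, 0.025, 18.3) + 0.0005 * rng.standard_normal(len(ages))

    p1, _ = curve_fit(g_proposed, ages, deaths, p0=(9.0, 0.025, 18.0), maxfev=10000)
    p2, _ = curve_fit(g_weibull, ages, deaths, p0=(8.5, 7.2, 0.012), maxfev=10000)
    ssr1 = np.sum((deaths - g_proposed(ages, *p1)) ** 2)
    ssr2 = np.sum((deaths - g_weibull(ages, *p2)) ** 2)
    print(ssr1, ssr2)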
Table 1. Fit comparison for Greece, Males.

         Proposed Model Fit                        Weibull Fit
Year     k        l        a         SS            c        k        B        SS
1992     9.49813  0.02523  18.47361  3.95820       8.66475  7.41596  0.01236  5.26180
1993     9.51720  0.02523  18.47850  3.91172       8.68521  7.42218  0.01235  5.18750
1995     9.35083  0.02508  18.33543  4.61074       8.54554  7.29311  0.01232  5.76867
1996     9.20313  0.02500  18.15040  3.74582       8.42971  7.15397  0.01232  4.62984
1997     9.25545  0.02503  18.19395  3.48055       8.47724  7.18893  0.01233  4.24332
1998     9.20220  0.02494  18.13499  3.75199       8.42760  7.11416  0.01230  4.28701
1999     9.26440  0.02500  18.20473  3.27281       8.47366  7.15850  0.01231  3.64226
2000     9.33965  0.02501  18.29502  3.72047       8.52884  7.20542  0.01230  3.95039
Table 2. Fit comparison for Greece, Females.

         Proposed Model Fit                        Weibull Fit
Year     k         l        a         SS           c         k         B        SS
1992     10.50017  0.02149  20.76597  3.09603      11.42873  10.36549  0.01181  4.30740
1999     10.33880  0.02118  20.66995  2.00799      11.23670  10.14668  0.01166  3.22056
Recent Advances i n Stochastic Modeling and Data Analysis
20
40
60
100
80
t
Fig. 3. Estimated health state functions for males and females, Greece 2000.
range of the maximum death rate, as is illustrated in Figure 5 . For males the main part of the error term is due t o fluctuations around the range of the maximum death, as well as the deaths at ages 18-28 years, which are mostly due to accidents and other reasons.
0
Actual Values Proposed Fit Weibull
w -
0
-
0
20
40
60
80
age
Fig. 4. Fit, Females, Greece, 2000
100
Fig. 5. Fit, Males, Greece, 2000 (actual values, proposed model fit, and Weibull fit).
A comparative view of the sums of squared residuals (SSR) is given in Figures 6 and 7, where the SSR of the proposed model are plotted against the SSR of the Weibull model. The results show an almost uniform superiority of the proposed model in the case of females. In the males case the proposed model shows lower SSR over the whole tested time period; however, the difference tends to shorten over time. The results regarding the SSR of the proposed model suggest that a model with a varying parameter $b$ could give better results. We can also test our assumption regarding the values of the parameter $b$ selected for males ($b = 4$) and females ($b = 5$): the fitted parameter is close to or higher than $b = 5$ for females, whereas for males values both larger and smaller than $b = 4$ appear; the value is, however, close to $b = 4$ for the years 1996 to 2000.

Fig. 6. Comparison of the sum of square residuals for the two models, Males.

Fig. 7. Comparison of the sum of square residuals for the two models, Females.
5 Conclusions
In this paper we proposed and applied a 3-parameter model to express the distribution function of the number of deaths of a population. The model was applied to the Greek life table data from 1992 to 2000, and it was tested and compared with the Weibull model. The comparative results were quite promising, indicating a better fit for the proposed model than for the commonly used Weibull model. The good application results of the proposed model strengthen the underlying theoretical assumptions of the stochastic theory used. The modeling of life table data sets based on the hitting time theory and the resulting
probability density function seems to be a quite promising new direction of research.
Bibliography
B. Gompertz. On the nature of the function expressive of the law of human mortality. Philosophical Transactions of the Royal Society, 36:513-585, 1825.
J. L. Haybittle. The use of the Gompertz function to relate changes in life expectancy to the standardized mortality ratio. International Journal of Epidemiology, 27:885-889, 1998.
J. Janssen and C. H. Skiadas. Dynamic modelling of life-table data. Applied Stochastic Models and Data Analysis, 11(1):35-49, 1995.
C. H. Skiadas. Dynamic modeling of Greek life table data. In F. Vonta, M. Nikulin, N. Limnios, and C. Huber-Carol, editors, Statistical Methods for Biomedical and Technical Systems, pages 449-454. Birkhauser, 2006.
C. H. Skiadas. Stochastic modeling of Greek life table data. Communications in Dependability and Quality Management, 2006.
W. Weibull. A statistical distribution function with wide applicability. J. Appl. Mech., 18:293-297, 1951.
C. P. Winsor. The Gompertz curve as a growth curve. Proceedings of the National Academy of Sciences, 18(1):1-7, 1932.
An extended quadratic health state function and the related density function for life table data
Charilaos Skiadas1, George Matalliotakis2, and Christos H. Skiadas2
1 Hanover College, Indiana, USA (e-mail: [email protected])
2 Technical University of Crete, Chania, Crete, Greece (e-mail: [email protected] and [email protected])
Abstract. In previous papers a dynamic model expressing human life table data by using the first-passage-time theory for a stochastic process was formulated, and a model for the health state function of a population was proposed and applied to the data of some countries. In this paper we propose a quadratic and an extended quadratic form for the health state function, and we introduce this form into the density function, derived by the first-passage-time theory for a stochastic process, for the number of annual deaths of a population. The Health State Function H(t) is assumed to be close to zero at the time of birth, then to increase to a maximum health level, and gradually to decrease to a zero level at the time of death. The form of the density function also includes the high level of deaths occurring in the first years of childhood. Another very interesting feature of the extended quadratic health state function is that the resulting density function fits the raw life table data provided by the Bureau of the Census quite reasonably. The use of a simple quadratic model and an extended quadratic model provides the researcher with a quite good explanatory tool, with implications for pension funds and option theory.
Keywords: Health state function, Life table data, density function, first-passage-time theory, stochastic process, pension funds, option theory.
1. Introduction
In previous papers [2, 4, 5, 6] a dynamic model expressing human life table data by using the first-passage-time theory for a stochastic process was formulated. After the pioneering work of Siegert (1951) [3] regarding analytic solutions of the first-passage-time probability problem, a quite extensive bibliography appeared; the aim was either to derive analytic solutions or, when this is not possible, to approximate the first-passage-time density function under consideration satisfactorily. The first passage time from an absorbing barrier located at distance a from the origin was analytically described in previous papers [2, 4, 5, 6], where the main task was to formulate a probability density function called the first passage time density function. A model for the health state function of a population was also proposed, and it was applied to the data of France, Belgium and Greece.
2. Methods
In this paper we propose a quadratic and an extended quadratic form for the health state function, and we introduce this form into the density function, derived by the first-passage-time theory for a stochastic process, for the number of annual deaths of a population. The extended model is applied to the mortality data of Greece. A stochastic simulation is also performed for the Health State Function. In previous studies we introduced the concept of the health state of a population, modeled by a continuous-time stochastic process
$$S = (S(t),\ t \ge 0),$$
where the random variable $S(t)$ represents the health state of an individual at time $t$. The event "death" is defined as the time when this health state hits, for the first time, a minimal health level called $a$. Accordingly, the life duration of the individual is the value of the hitting time $T$ of the set $(0, a)$ by the process $S$.
3. The stochastic model and the related parameters
This is a continuous-time stochastic model with constant $\sigma$, provided by the stochastic differential equation
$$dS(t) = \mu(t)\,dt + \sigma\,dW(t),$$
where $S(t)$ is a stochastic variable expressing the state of health of an individual, $\mu(t)$ is a function of time expressing the infinitesimal mean development of the state of human health, here termed the deterioration function [1] according to the analysis given by B. Gompertz as early as 1825, $\sigma$ is the infinitesimal variance of human health, assumed constant in the proposed model, and $W(t)$ is the standard Wiener process. $S(t)$ follows from direct integration of the above stochastic differential equation. According to its definition, $\mu(t)$ must be related to the function expressing the mean state of human health, $H(t)$, by
$$\mu(t) = \frac{dH(t)}{dt}.$$
We assume that the mean value $H(t)$, the mean state of health during the lifetime, follows a path of the general form illustrated in Figure 1. There is a rapid improvement of the state of human health after birth, when $H$
is close to a low value, and then there follows a period of slow improvement and decline.
Figure 1. Health State Function and Deterioration Function $\mu(t) = dH(t)/dt$.
4. The hitting time density function
The main assumption regarding the state of human health is that it follows a stochastic process expressed by $S(t)$, and that the end of the lifetime is reached when the stochastic variable $S(t)$ arrives at a minimum level of health state, denoted here by $a$. In terms of first passage time theory, this level $a$ is expressed by a single barrier located at a distance $a$ from the origin. This is illustrated in Figure 2, where the paths of two individuals are shown along with the health state function $H(t)$.
Figure 2. Health State Function and Stochastic Paths (age in years).
Consequently, the density function $g(t)$ expressing the distribution of the first passage over the barrier $a$ is exactly the probability density function providing the number of deaths between $t$ and $t + dt$, where $t$ is the age of the individuals. For the case of the state of human health studied here, this function has the following form:
$$g(t) = \frac{k\left(a - H(t)\right)}{\sigma\sqrt{2\pi t^3}} \exp\left(-\frac{\left(a - H(t)\right)^2}{2\sigma^2 t}\right),$$
where $k$ is a normalisation constant defined by the formula
$$\int_0^\infty g(t)\,dt = 1.$$
5. A Quadratic Health State Function
Without significant loss of generality we may adopt the following simplifications: the barrier $a$, (1) outside the exponent, is included in the integration constant, and (2) inside the exponent, is included in the health state function. We also assume $\sigma = 1$. Another useful transformation is to replace the parameter $k$ by $k^*$, where $k = k^*\sqrt{2\pi}$. Thus the resulting density function becomes:
$$g(t) = k^*\, t^{-3/2} \exp\left[-\frac{H(t)^2}{2t}\right].$$
t H ( t ) = bt(1 - -) T Where T is a critical age to be specified and b is the health state parameter. The resulting form of the density function is given by
364
Recent Advances in Stochastic ~ o ~ e l z nand g Data Analysis
Obviously, the maximum level of the density function is achieved when $t = T$, where $T$ is the age with the maximum death rate. We already know that this level is higher than, but close to, 80 years for women and less than 80 for men. The quadratic form of the health state function has several advantages. First, it is quite simple and immediately provides a maximum level at the age $t = T/2$; this level equals $H_{max} = bT/4$, that is, the maximum health state, and is proportional to the year $T$ at which the maximum death rate is achieved. The quadratic Health State Function is easily simulated, and the results are illustrated in Figure 3. The parameter is $T = 80$ years, giving a maximum health state level at $t = T/2 = 40$ years, a very reasonable result. The maximum level of the health state function is $H_{max} = bT/4 = 20b$.
Figure 3. The Quadratic Health State Function (age in years).

The resulting density function for a quadratic health state function is illustrated in Figure 4.
Figure 4. The density function of the Quadratic Model (age on the horizontal axis).

We select the maximum death rate at $t = T = 80$ years. The form of the density function then also captures the high level of deaths occurring in the first years of childhood.
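The stochastic simulation mentioned above is easy to reproduce in outline. The sketch below is our own (parameter values are illustrative, not the authors'): it integrates $dS = \mu(t)\,dt + \sigma\,dW$ with $\mu(t) = H'(t)$ for the quadratic $H(t) = bt(1 - t/T)$, starting slightly above the barrier, and records the first passage below the barrier; starting near the barrier also reproduces the elevated mortality of the first years.

    import numpy as np

    def first_passage_times(n_paths=2000, b=0.5, T=80.0, a=0.0, s0=0.5,
                            sigma=1.0, dt=0.1, t_max=120.0, seed=0):
        # Monte Carlo first-exit times for dS = H'(t) dt + sigma dW
        rng = np.random.default_rng(seed)
        times = []
        for _ in range(n_paths):
            s, t = s0, 0.0
            while t < t_max:
                mu = b * (1.0 - 2.0 * t / T)      # H'(t) for H(t) = b t (1 - t/T)
                s += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
                t += dt
                if s <= a:                        # barrier hit: "death" at age t
                    times.append(t)
                    break
        return np.array(times)

    hits = first_passage_times()
    # a histogram of `hits` approximates the density g(t) of the quadratic model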
6. The Extended Quadratic Health State Function
However, the quadratic form assumes a large number of deaths during the first period of life, and we therefore propose an extended form, assuming a health state function of the form
$$H(t) = \sqrt{t}\left(c - (\ell t)^b\right) \qquad \text{or} \qquad H(t) = \sqrt{\ell t}\left(c - (\ell t)^b\right),$$
where $c$ is a parameter and $\ell = 1/T$. The improvement here is that the first term of the previous quadratic function is placed under a square root, enabling a better capture of the first years of human life. The second quadratic term now has a varying
exponent $b$, giving more flexibility to the model and enabling it to capture effectively the medium and last periods of the human lifetime. Then the density function will be
$$g(t) = k\, (\ell t)^{-3/2} \exp\left[-\frac{(\ell t)\left(c - (\ell t)^b\right)^2}{2t}\right].$$
An additional very interesting feature of the extended quadratic health state function is that the resulting density function can express quite reasonably the raw life table data provided by the Bureau of the Census. Among other features of the model, the low mortality level between 1 and 45 years is very well expressed, as are the mortality rates for higher ages. Table I and the diagrams below (Figures 5 and 6) present the result of fitting the extended model to Greek life table data for males and females, 2000. The Health State Function is also illustrated in the same figures.
Table I. Greece, 2000: fitted parameters of the extended quadratic model.

           Parameters                                MSE
           b      l        c        k
Males      1.8    0.0580   16.690   323.61           3.684
Females    2.5    0.0391   20.648   253.62           3.413
Figure 5. Fitting the Extended Quadratic model. Greece 2000, males.

The parameters of the last formula are estimated using a non-linear regression technique: the parameters of $g = g(t)$ are estimated by an iterative direct non-linear least squares method that minimizes the sum of squared errors. The results show that even this relatively simple model (the extended quadratic) can capture the full shape of the raw data series for Greece, including the first period of the lifetime, and gives good results in the region of the maximum death rate.
Figure 6. Fitting the Extended Quadratic model. Greece 2000, females (actual data, fitted data, and the health state function, scaled).
7. Conclusion
Of course, the results of the more complicated model used in previous works are more accurate, but the use of a simple quadratic model provides the researcher with a quite good explanatory tool, especially while the proposed theory is not yet widely spread. Practitioners find the handling of the complete theory, including stochastic and deterministic modelling, the first hitting time theory, stochastic simulation and nonlinear regression analysis, quite difficult. However, the resulting extended quadratic density function, and the establishment of a health state function based on the strongest data basis we have in use, are good achievements towards validating and using the proposed theory and method.
References
1. B. Gompertz, On the Nature of the Function Expressive of the Law of Human Mortality, Philosophical Transactions of the Royal Society, 36, 513-585 (1825).
2. J. Janssen and C.H. Skiadas. Dynamic modelling of life-table data. Applied Stochastic Models and Data Analysis, 11(1):35-49, 1995.
3. A.J.F. Siegert, On the First Passage Time Probability Problem, Physical Review, 81, 617-623 (1951).
4. C.H. Skiadas. Stochastic modeling of Greek life table data. Communications in Dependability and Quality Management, 9(3):14-21, 2006.
5. C.H. Skiadas. Dynamic Modeling of Greek Life Table Data. In F. Vonta, M. Nikulin, N. Limnios, and C. Huber-Carol, editors, Statistical Methods for Biomedical and Technical Systems (BIOSTAT 2006, May 29-June 1, 2006, Limassol, Cyprus), pages 449-454, 2006.
6. C.H. Skiadas. The Development of the Expected Life Time in Greece During Last Decades: An Application of a Dynamic Model of Life Table Data. 4th Conference in Actuarial Science & Finance on Samos, September 2006.
CHAPTER 9
Statistical Applications in Socioeconomic Problems
Dumping influence on a non iterative dynamics
Cécile Hardouin
SAMOS-Matisse - CES, Université Paris 1, 90 rue de Tolbiac, 75634 Paris Cedex 13, France (e-mail: hardouin@univ-paris1.fr)
Abstract. We consider n agents displayed on S choosing, one by one, a standard A or B according to a local assignment rule. There are no asymptotics in space or in time, since the scan of the network is unique. We study the final behaviour by simulations. The main goal of this work is to evaluate the effect of an initial
dumping on the final configuration. Keywords: Adoption dynamics, Cooperative systems, Dumping.
1 Introduction

This paper explores the diffusion of technological innovations in a simple and realistic framework: agents choosing between competing technologies. Many empirical or theoretical works study the agents' behaviour in the process of adopting a standard, and several modellings have been proposed: Markov chains, cellular automata, Gibbs fields... We consider here a unique and non-reversible choice for each agent, but we examine various choice procedures, in which the previous decisions are of more or less significance. More precisely, let $S$ be a finite spatial set, $S = \{1, 2, \ldots, N\}^2$, with $n = N^2$ sites; we can choose for $S$ the two-dimensional torus, and we assume that the neighbourhood system is the four nearest neighbours system. Other sets $S$ and other neighbourhood systems can be conceived but, fundamentally, this does not change the qualitative nature of our results. If $A$ is a subset of $S$, we denote by $\partial A = \{i \in S,\ i \notin A \text{ and } \exists j \in A \text{ s.t. } i \text{ and } j \text{ are neighbours}\}$ the neighbourhood of $A$, and $\partial i = \partial\{i\}$. For all the dynamics we study in this work, the agents make a choice between two standards A and B. The choice is made individually, one by one, according to a sequential assignment rule. When this choice depends on the local context, we say that there is spatial coordination, the spatial dependency being positive if there is cooperation between the agents and negative in case of competition. A scan of $S$ is a tour of all the sites. When the scans are repeated indefinitely, we get the well-known Gibbs sampler, and it is possible to characterize the probability distribution of the limit configurations. When the dynamics is synchronous (all the agents make their decision simultaneously), there is still ergodicity, but it is difficult to make the limit distribution explicit (see [10], and [9] for a full description).
Our context here is different: we consider a non-iterative dynamics with a unique scan of $S$. In this case we do not know the final configuration, since there are no asymptotics in space or in time; besides, obviously, the final configuration is linked to the initial one. We propose to study this situation empirically, in the case of an initial occurrence of standards A at a given rate $r$.
2 Non-iterative dynamics and dumping

Let us denote by $y_i \in E = \{-1, +1, 0\}$ the state of site $i$, where $+1$ means that A is chosen, $-1$ that B is chosen, and $0$ that the choice has not yet been made. We want to study the effect of an initial contamination, or dumping, on the final configuration. An initial contamination rate $r$ ($r \in [0, 1]$) means that $[nr]$ agents have A ($[x]$ denotes the integer part of $x$): the initial layout is therefore composed of $[nr]$ sites $+1$ randomly distributed on $S$, the other sites being "non-occupied", with assignment $0$. Then these sites are visited one by one, in a random order, and each newly visited site is credited with $+1$ or $-1$ according to a local assignment rule. This rule is the same for all the agents and is based on the possibly previous choices of the neighbouring sites. Obviously, the final configuration depends on the initial dumping rate $r$. We consider the three following assignment rules:
1. The strong majority choice: the agent chooses the majority standard adopted by his neighbours. In case of a tie, or if there are no occupied neighbour sites, he chooses A (resp. B) with probability $T$ (resp. $1 - T$).
2. The weak majority choice: if the number of occupied neighbour sites is less than or equal to 2, the agent chooses A (resp. B) with probability $T$ (resp. $1 - T$). If the number of occupied neighbour sites is greater than or equal to 3, the agent follows the strong majority rule.
3. The probabilistic Ising-type choice: if the 4 neighbour sites are non-occupied, the agent chooses A (resp. B) with probability $T$ (resp. $1 - T$). Otherwise, denoting by $y_{\partial i}$ the configuration of the 4 nearest neighbours of site $i$, he chooses A with probability
$$P\left(y_i = +1 \mid y_{\partial i}\right) = \frac{\exp\left(\beta \sum_{j \in \partial i} y_j\right)}{\exp\left(\beta \sum_{j \in \partial i} y_j\right) + \exp\left(-\beta \sum_{j \in \partial i} y_j\right)}.$$
$\beta$ is a parameter of spatial coordination; there is cooperation if $\beta > 0$, while $\beta < 0$ leads to competition. When $\beta \to +\infty$, the Ising rule is similar to the strong majority rule. For simplicity, we fix $T = 0.5$.
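The dynamics is simple to simulate. The sketch below is our own illustration of the rules as described (function names are hypothetical): it dumps standard A on a fraction r of the N x N torus and then performs one unique random scan, for the strong majority rule and the Ising-type rule; in the Ising branch, an empty neighbourhood gives probability 0.5 = T, consistent with the tie-break above.

    import numpy as np

    def one_scan(N=64, r=0.2, T=0.5, rule="strong", beta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        n = N * N
        y = np.zeros((N, N), dtype=int)              # 0: choice not yet made
        dumped = rng.choice(n, size=int(n * r), replace=False)
        y[np.unravel_index(dumped, (N, N))] = 1      # initial A sites (+1)
        free = np.setdiff1d(np.arange(n), dumped)
        rng.shuffle(free)
        for site in free:                            # unique scan, random order
            i, j = divmod(int(site), N)
            s = (y[(i - 1) % N, j] + y[(i + 1) % N, j]
                 + y[i, (j - 1) % N] + y[i, (j + 1) % N])
            if rule == "strong":
                if s != 0:
                    y[i, j] = 1 if s > 0 else -1
                else:                                # tie or empty neighbourhood
                    y[i, j] = 1 if rng.random() < T else -1
            else:                                    # probabilistic Ising-type rule
                p = np.exp(beta * s) / (np.exp(beta * s) + np.exp(-beta * s))
                y[i, j] = 1 if rng.random() < p else -1
        return y

    y = one_scan(r=0.2, rule="strong")
    print((y == 1).mean())                           # final frequency of A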
Our experimental study is the following: we start with an initial rate $r = 2\%$ and increase $r$ up to 99%. For each value, and for each assignment rule, we simulate 400 realizations on the square torus of size $N = 64$ ($n = 4096$). For the Ising rule we have chosen $\beta = 0.5$ (weak spatial coordination), $\beta = 1$, which corresponds to the beginning of aggregation, and $\beta = 3$, which leads to clusters. On the basis of the simulations, we present estimates of important characteristics of the resulting configurations: the final proportion of standard A, spatial correlations, clustering indexes, and connectedness measurements. These features show the influence of both the assignment rule and the initial rate on the final layouts. Finally, we give analytic results for the distribution of the number of occupied neighbour sites under the null hypothesis $H_0$ that there is neither cooperation nor competition; these results allow testing $H_0$.
3 Empirical study of the final layout

3.1 The final frequency of standard A

For each value of $r$ and each assignment rule, we get a sample of 400 final frequencies of standard A. The final frequency is estimated by the mean $\hat\pi_A(r)$.
Fig. 1. Final frequency of standards A as a function of the initial rate, for the five assignment rules (weak; Ising beta = 0.5; Ising beta = 1; Ising beta = 3; strong).
Figure 1 shows the evolution $r \mapsto \hat\pi_A(r)$ for the five assignment rules. As expected, the increase is stronger when the rule strengthens local cooperation. Thus, for the strong majority rule with an initial dumping rate of 20%, standard A will occupy 90% of the sites; moreover, if the 80% of sites initially non-occupied had been assigned equally to A and B, the final rate of A would have been 20% + 40% = 60%; the difference between the two cases is
30%. We can see that the dumping amplifies the initial bias, intensifying the choice of A along the adoption process. In the case of a 10% dumping rate, the difference equals 23% (the final rate of A being 78%). We can see that the dumping influence is more important for small values of $r$. We can also compare the different rules: the curves for $\beta = 3$ and the strong majority rule coincide, while the increase for the weak rule is slower than in the Ising $\beta = 0.5$ case. Finally, we plotted the histograms of $\hat\pi_A(r)$ against $r$ and the assignment rule. When $r$ is small, we observe a Gaussian shape, which is confirmed by a test. On the contrary, when the rate is greater than a critical rate $r_c$, the Gaussian hypothesis is rejected; in fact, the final proportion of A is then very close to 1, with a quasi-null dispersion. The threshold $r_c$ depends on the rule: it is about 40% for the strong majority rule and Ising $\beta = 3$, and 50% for Ising $\beta = 1$. The main interest of the Gaussian feature is that it allows building a confidence interval for the final proportion of standards A.
3.2 Spatial correlations
The analytic expression of the spatial correlation is not explicit, but we can easily obtain its characteristics via a Monte Carlo method. In this way we calculate the correlations at distance 1, distance $\sqrt{2}$ and distance 2, and finally the one based on the 8 nearest neighbours, denoted respectively $\rho_1$, $\rho_{\sqrt{2}}$, $\rho_2$ and $\rho_{8v}$, and this for each rule. Comparing the different correlations for a same rule, we observe $\rho_1 \ge \rho_{8v} \ge \rho_{\sqrt{2}} \ge \rho_2$; besides, the correlations all decay towards zero, but more quickly when the neighbourhood is close, and they are all equal from $r = 85\%$ on. Comparing the rules, the different correlations all have similar behaviours, so we present only $\rho_1$. The main point is that the dumping's influence is more important for weak levels of the initial rate, producing high correlation.
w - ’ c,=l,, YzYa%--82
in We present on Figure 2 the evolution of p1 = 1-y* function of T for the different assignment rules. We take the agreement p1 = 0 when the field is constant with zero variance. Whatever the rule is, the correlation is positive and decreases to zero. We can observe a kind of hierarchy between the rules: for a dumping rate less than 50%, the spatial correlation is more important for the rules which inforces the choice of A. Then the decrease is faster for “strong” rules: the correlation equals 0 for T = 75% in the case of the strong majority rule while 0 is reached for T = 0.95 in the case of the weak rule.
Dumping Influence o n N o n Iterative Dynamics
375
We also see that when the dumping rate is 50%, all the correlations are equal. We now compare the obtained features issued from configurations (C) with the case of layouts resulting from a random uniform distribution (CO) of the same final rates of sites A and B . The spatial characteristics of this new field (CO)are of course different and we compare them with those obtained from (C). For instance, Figure 2 show the different behaviours of p1 for fields (C) and ((30).In the uniform case, the correlation is always close to zero (about can be negative. So it is clear that spatial correlation is a good criterion to distinguish the fields, a positive correlation more than 0.002 corresponding to type (C).
I
.
.
.
0
10
2(1
yl
. I0
,
.
w
m
. 70
,
w
. (0
I
Irn
l 10
-#."b#t..--A
.
In
.
m
.
,
I)
-..",a
,a
,
rn
.
.
M
I0
.
m
.
rn
I
Irn
I
Fig. 2. left: correlation pl for fields (C); right: correlation pl for fields (CO) associated to fields (C); rules: x : weak ; 0 : Ising /3 = 0.5 ; o : Ising /3 = 1 ; 0 : Ising
p =3 ;*
3.3
: strong.
Spatial clustering measurements
Figure 3 below present an example of a realization of fields (C) and ((20). As expected we observe specific cooperative textures in the images (C). We propose here several indexes evaluating this spatial feature. Two clustering indexes Let us definite the absolirte cluster zndez I A as the number of edges joining neighbour sites which are together A, normalized by the total number of sites A.
IA
=
2 c i e s xi
where xi =
'
is equal to 1 if the site i has been assigned by A, and 0 else.
376
Recent Advances in Stochastic Modeling and Data Analysis
Fig. 3. king assignment rule /3 = 1, with initial rate T = 0.05; final A proportion: 63.04; I R = 1.2856 Left: pl = 0.4829, I A = 1.6177, ncc = 9, mcc = 286.89, mazcc = 2553. Right: pl = -0.0033, I A = 1.2583, ncc = 61, mcc = 42.33, maxcc = 2465.
The relative cluster zndez I R is defined as the ratio of the absolute cluster indexes of fields (C) and associated ((20).
Figure 4 show these indexes’ evolution according t o the initial rate T of standards A, for the different assignment rules and for the two types of fields. Concerning the absolute cluster index, its evolution is similar for fields (‘C) and associated ((20). It is the initial rate which allows t o distinguish the fields. When it is more than 50%, the curves are identical, while the values I A ( C ) are much more important than those of IA(C0) in case of small rates T . The smaller T is, the more important is the difference between the two fields, and this whatever the assignment rule. The threshold rate categorizing the fields is varying with the rule. For instance, it is T = 20% for the strong majority rule and 30% for the Ising ,6 = 0.5 rule. The graph of the relative cluster index (Figure ??, Right) confirms the previous remark. The decrease of I R is faster for a rule enforcing standard A. Finally, we conclude that the absolute cluster index is a good criterion t o determine fields issued from a choice procedure if we know that the initial dumping rate is low valued. Connectedness indexes A topological parameter which well characterizes clustering is a connectedness measure of standards A in the final configuration. The images obtained from simulations show that, for fields (C) issued form a choice procedure, a clustering organization of sites A appears. Moreover, the clusters become less numerous but wider when T increases. For the corresponding fields (C) with same final number of sites A but randomly uniformly dispatched, we get many and small clusters.
Dumping Influence o n N o n Iterative Dynamics
a
ID
B
s
40
m
M
,I)
M
377
8)
_r-r.."b..ll(.d.d.l
Fig. 4. Left: absolute cluster index for field (C);
Middle: absolute cluster index for field (C0)associated; Right : Te~atiwec~usteTindex; Rules : x : weak;O : Ising /3 = 0.5 ; o : king /3 = 1 ; 0 : Ising p = 3 ; * : strong. We propose t o calculate three connectedness indexes, for fields (C) and associated ((30); ncc is the number of connected components (of sites A); mcc is the mean size of these components, and maxcc is the size of the largest one.
Fig. 5 . mean number of connected components for fields (C)(left)and (Co)(right): Ising p = 0.5 ; o : Ising /3 = 1 ; 0 : Ising /3 = 3 ; * : strong
z : weak;O :
We show in Figure 5 the evolution of ncc according to the different assignment rules for fields (C) and corresponding ((20).Once again we observe a hierarchy between the rules. More interesting is the comparison of fields (C)
378
Recent Advances in Stochastic Modeling and Data Analysis
and (C0);their behaviour seems to be similar but the scale is different and when the initial contamination rate is low, we can clearly distinguish fields of type (C) and ((30).For each rule, the number of connected components is much more important for fields (CO)with random spreading. From an initial threshold rate which depends on the rules but not exceeding 30%, thcn the number ncc is similar for both fields. We turn t o the average size of connected components, defining the size of a component by the number of sites which lay inside. The evolution of mcc is given in Figure 6 for each rule and for fields (C) and (CC,).On the contrary of the previous index ncc, it is difficult to visually distinguish between the two behaviours, since the ordinates scale is very large. Therefore we plot for a single rule the evolution of ncc for the two fields; we give for example the case of the Ising rule with /3 = 1 in Figure 6, Right. We get the same behaviour for the other rules. The curves join and overlay for T 2 30%. For small values of I-, the two curves appear to be not so different and we could think that mcc is a bad criterion t o distinguish the fields; in fact, the ordinate scale is still large and for instance, 7=2% corresponds t o 144 for the field (C) and 13.8 for the associated ( C O ) that , is mcc(C0) is more than 8 times mcc(C). We conclude that mcc is useful t o determine the types of final configurations in the case of small rates of initial contamination. Besides, the effect of dumping is more important for small values of I-. Finally, we have calculated the size of the largest connected component maxcc. The previous comments apply again, even more significantly, since the increase of maxcc is faster than the one of mcc. However, the evolution curves are quickly identical, and we can clearly see the dumping effect only for the smallest values of the initial rate of standards A; for instance T < 10% in the case of the Ising rule with p = 1, see Figure 7 .
4
Distribution of the number of occupied neighbour sites
We can achieve some probabilities calculus on this non asymptotical framework. We consider a fix site visited on the scan tour. Whatever the assignment rule is, the choice of the agent depends on the number of his neighbours who have already make their decision. We give here the distribution of this number of “occupied” neighbours at the moment. We consider the lattice S with n sites such that cach site has the same number of neighbours v. We assume that at k = 0, n, sites are occupied by standard A (n, = nT); then at each time, a free site is randomly visited and becomes occupied. There are n - n, successive settings. For an arbitrary but fix site, we define the random variable Y k by the number of occupied
Dumping Influence o n Non Iterative Dynamics
.m/
,
,
'
379
I
Fig.6. Average size of the connected components for the fields (C)(left) and (Co)(middle);rules : x : wealc;O : king p = 0.5 ; o : Ising ,B = 1 ; 0 : king /3 = 3 ; * : strong Right: Average size of the connected components for the Ising rule with /3 = 1 : ( C ) : * ; (CO) : 1m
,
,
,
,
,
,
,
,
,
Fig. 7. king rule with p = 1; maxcc(C): * ;maxcc(Co) neighbours at time k . We suppose n, < n, n >> 2v in the set (max(0, n, k - n v),1,.., min(v, k - 1
+
Then we recognize that
+
Yk
+ 1; Y k takes its values + n,)} . We get
follows the hypcrgcometric distribution.
We can add two results. Let us define the events: Ak : "the site j is occupied at exactly time k"
380
Recent Advances in Stochastic Modeling and Data Analysis
“The site j is already occupied at time k” The index j does not appear in these probabilities since the scan is random. We get Bk :
We deduce P ( B k ) =
*.
Let us then explicit the mean probability P1 of the occupied neighbours during the course.
c p(yk l ) = L c c n ( k + ‘) n ( n
n--12,
p1 = 1 n-n7
-
k=l CL(n-l-u)!
Ct
k-l+n,--l
CtCn-1-V
n-n,
k=l
Ck-l+n, n.-1
”-1-1
n-nr
(n-n,).(m-l)!
Let us denote
Pi =
?l-nT
=
-
nT
-
k=l s=l u=o N,=n - n, = n(1 - r ) the number
N,”
-x
N,
1
(n- 1)(n-2). ..( n - u) ’ N,
IC
-
nT
N,
N,
1-T
1 and u - 1 being fix, we get:
).
of non initialized sites.
r1 I ( L + L - L u-1-1 )n
k=l s=l
-
u=O
(1-L-L) N,
N,
0
If the initial rate r = 0, then Po’s limit is
&.Else, we obtain by suc-
1
cessive integrating
+ n-cc
CL(l - T ) ”
c
&(&)l-k(v~~~~~k)!
k=O
which is
again
This formula is still valid for r = 0. It is interesting to know if Z L , ” >,1~ to see the dumping’s effect against non initial contamination. Without loss of generality, we can assume that u is even. If T 2 $ and 1 5 i, then Zi,”,, 5 1. In fact (1 - ~ ) ” ( & ) j 5 [r(1 - T ) ] ” ’ ~ 5 (1/4)”/’ which implies
(i)”C1 C:+l
ZL,~ 5, ~
j=O
5 (f)”
”12
c C:+l
j=O
can take values less or more 1 (but 5 u
5 1 since
+ 1).
U
P
C C:+l
= 2”. Else,
ZL,~,,
j=O
References 1.Besag J., 1974, Spatial interaction and the statistical analysis of lattice systems, JRSS B, 36, 192-236 2.5. Besag and P.A.P. Moran, On the estimation and testing of spatial interaction for Gaussian lattice processes, Biometrika, 62, 3, 555-562, 1975
Dumping Influence on Non Iterative Dynamics
381
3.Bikhchandani S., Hirshleifer D., Welch I., 1992, A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of political economy, Vol. 100 n5 4.Bikhchandani S., Hirshleifer D., Welch I., 1998, Learning from the behaviour of others: conformity, fads, and informational cascades. Journal of economic perpectives Vol 12 n3 5.Chopard B., Dupuis A,, Masselot A., Luthi P., 2002, Cellular automata and lattice Boltzmann techniques : an approach to model and simulate complex systems. Adv. Complex systems 5 Vol 2-3, 103-246. 6.A.D. Cliff and J.K. Ord, Spatial nutocorrelation, Pion, Londres, 1981, 2d. Ed. 7.Cox J.T., 1989, Coalescing random walks and voter model consensus times on the torus Z d , Ann. Proba., Vol. 17,n4,1333 - 1366 8.Galam S. 1997, Rational group decision making: a random field ising model a t T=O. Physica A 238. 66-80. 9.Geman D., 1990, R m d o m fields and inverse problem in. im.agin,g, Ecole de Saint Flour, Lect. Notes in Maths. n1427, Springer 10.Guyon X. et Hardouin C., 2001, Standards adoption dynamics and tests for non spatial coordination, Lecture Notes in Statistics 159, 39-56, Springer ll.Orl6an A., 1998, Informational influences and the ambivalence of imitation. Lesourne & Orlean Eds. Advances in self organization and evolutionary economics. 39-56. Economica. 12.Lesourne & Orlean Eds. Advances in self organization and evolutionary economics : Introduction. 1-7. Economica.
Firm turnover and labor productivity growth in the Italian mechanical sector Luigi Grossi’, Giorgio Gozzi’
’ University of Verona
Facolt5 di Economia Via dell’artigliere, 19 37129 Verona, Italy (e-mail: luiai.arossiiir?.univr.it) * University ofparma Facolt’i,di Economia Via Kennedy, 6 43100 Parma, Italy (e-mail: giorgio.eozzi(ir).unipr.it) Abstract. This paper provides estimates of labor productivity growth in Italian mechanical sector during the period 1997-2002 analyzing its determinants with particular attention to the role played by turnover (exits and entrants) and company size. Data come from the longitudinal company accounts database of Research Centre of Unioncamere.The analysis put in evidence the global slowdown of productivity and the positive impact on productivity growth given by gross firm turnover. Transition matrices reveal a relevant degree of persistence in the period and a positive relation of this persistence with firm size. We apply the decomposition of [Baily et a/., 19961 to analyse the role of turnover: survivors account in negative of overall growth, on the contrary, the contribution of entrants and exits is positive and very high. Keywords: firm turnover, labor productivity growth, longitudinal data, transition matrices.
1
Introduction
Traditional productivity studies have typically used aggregate - andor industry level data to study the sources and patterns of productivity growth. However, theoretical studies of industrial organization have suggested that aggregate productivity growth typically stems from behavior at the firm - or plant - level. In fact, these theories have shown that plant-level dynamics due to plant heterogeneity is one of the most important factors in the evolution of industrial productivity ([Ericson and Pakes, 19951; [Olley and Pakes, 19961; [Baldwin and Gu, 20061). Economic literature about global productivity is extremely rich and an extensive survey is provided in [Hulten, 20001. In this paper we will focus on labor productivity (LP from now on), which is only one component of Total Factor Productivity (TFP). The fact that LP varies widely between plants and companies is well-known ([Bartelsman and Doms, 20001; [Bottazzi et a/., 20021). This observation give raises to a number of questions. Performance in any one year is to some extent a matter of luck, so both the good and bad performer of a 382
F i n n Turnover and Labor Productivity G r o w t h 383 particular year will tend to appear averaged in a different one. So how much of the variation is due to purely transitory factors and does it therefore diminish over time? Second, what role does the process of competition play? Does the spur of competition cause laggard firms to improve? Or does it simply cause them to be eliminated by liquidation or take over? Up to now, it has been difficult to analyze these questions in a rigorous way because of a lack of adequate data covering companies of each size in the whole Italian economy (not just manufacturing). Recent years have seen a relevant increase in studies on productivity. This is partly due to rising availability of longitudinal micro-level data (LMD).This paper is a preliminary attempt to give an answer to the previous questions, using the large data base of company accounts constructed by Research Center of Unioncamere. In our study we investigate the distribution of LP in an important Italian manufacturing sector: the sector Dk29. Available data from the Unioncamere Database (UD) allow to compute only LP measured by value added per employee. Our analysis puts in evidence the positive impact of firm demography in raising the overall efficiency of the mechanical sector. In the period 1997-2002turnover caused in fact an increase of LP because companies with productivity above the mean entered and companies with productivity below the mean exited. Notwithstanding, the impact of the incumbent companies on productivity growth was not positive as well, but on the contrary their average productivity made worse. This is an unexpected result because it seems sensible to think that competition could raise productivity in incumbent firms encouraging firms to innovate, to reduce costs and to improve the organisation of production. Transition matrices computed over the examined period show that companies enter with different intrinsic productivity and maintain more or less the same position in the distribution of the sector: the diagonal cells of the transition matrices show that a large fraction of companies in a productivity size class remaining in the same class six years later. This means that the mechanical sector evidences a permanent heterogeneity among companies. A decomposition of productivity growth shows a negative reallocation effect between companies. This can be interpreted as reflecting the creative destruction processes: for the companies of the mechanical sector the destruction has been creative, the turnover has contributed to the productivity growth even if partly smoothed because of a negative reallocation effect growth. The remainder of this paper is organized as follow: section 2 describes the variables used to compute the LP, how to deflate them, the labor productivity dynamics. In section 3 we analyze the dynamics of LP during the period 19972002 using transition matrices. Section 4 reports the decomposition of LP with respect to different groups of companies defined according productivity and employment growth and then according firm size. The final section contains some comments on results and final remarks.
384
Recent Advances in Stochastic Modeling and Data Analysis
2
Data, measurement procedure and labor productivity
Data for our research is taken from the administrative file of company accounts of Cerved, which is the largest and most accurate database of company accounts in Italy , suitably processed from the statistical point of view by Unioncamere. The data contained in the Cerved file has been integrated with the data of REA information system file by Research Centre Unioncamere. For more detailed information about the data, see [Ganugi et a/.,20051. The source of data in our research is the universe of companies of the Italian mechanical sector (Branch Dk29, Construction of machines and mechanical devices) in operation in the period 1997-2002 . As it is known, the Italian mechanical sector represents - both for Added Value and Exports - one of the strategic sectors of the Country . The UD can be considered an “open” database because it contains firms which enter the set after transformation of their juridical type (individual firms or partner-ship becoming companies) companies which follows an opposite procedure and consequently exit from the set, and firms which enter and exit in the same period (mergers, wind-up, bankruptcy proceedings). Each company has a unique identification number. We identify entry and exit by linking successive years together. If a unit has a new identification number, it has entered. If a previous number has disappeared, then it has exited and if the number is the same, the unit is a survivor. In order to limit the possible impact of measurement problems, it was decided to use definitions of continuing, entering and exiting firms on the basis of three (rather than the usual two) time periods. Consider an establishment observed in t. If it is present in t-1 but absent in t+l it is an exitor. Likewise, if it was absent in t-1 but present in t+l it is an entrant. However, it is also possible to be absent in t1 and f+l, i.e. an entrant that exits after one year. This latter category is thus both an entrant and an exitor. Consequently, the number of establishments in t consist of stayers plus entrants plus exitors less one-year-only establishments. LP estimates are derived as the ratio of a measure of output and inputs. Output can be measured in different ways and this can lead to different estimates of productivity growth. Basic measures of output are added value, gross output and sales. Company account data allow us to measure LP either by value added per employee (LPAV), by gross production (output) (LPGO) per employee, or by sales per employee (LPSA). In order to make our computations comparable to previous study about productivity dynamics in firms panel data, we focused on LPAV. To take the inflationary dynamics into account, it is necessary deflating the nominal Added Value at the industry level (all companies in an industry use the same deflators). Deflation of this variable involves double deflation because the volume change of added value combines the volume change of gross output and intermediate inputs. We deflated the added value with an industry implicit price deflator of double deflation. For each company group we computed the above cited deflated productivity index called LPAV95. Weighted means for each year are reported in Table 1
Firm Turnover and Labor Productivity Growth 385 Year 1997 1998 1999 2000 2001 2002 Weighted Mean
All 44626 44497 45131 46289 47274 461 I6
Survivors 45 120 45742 45965 47033 46849 44600
45684
45897
Stayers
Entrants
44943 45439 46958 47522
38283 41401 39458 50590 63250
46248
48298
Exitors 40402 39201 39592 4 1446 44478
42145
Table 1. Labor uroductivitv in terms of deflated Added Value for different types of companies (eurolire 1995 per employee)
In the whole period (1997-2002) the LPAV of all companies is increased of 3.34. These findings are much lower to those given in previous OECD studies [Barnes et al., 20011 where, for the Italian manufacturing sector, the growth rate of LPAV was 27.8% in the period 1985-1990 and 20.9% in the period 19871992. Thus, the growth of LP in Italian mechanical companies showed a strong slowdown.
3
Productivity growth analysis through transition matrices
Table 2 shows a 6 year transition matrix for LPAV95. Each diagonal cell shows the fraction of companies in the quantile remaining in the same quantile 6 years later. The figures in the top row and first column show the 20th, 40th etc. productivity quantiles, with the 20th being the bottom productivity quantile and the 100th being the top. The quantile positions of entrants and exitors are also shown. Consider the top left cell of the matrix. This shows that 23% of companies in the lowest productivity quantile were still in the lowest quantile after 6 years. Working along the row, the next cell shows that 13.4% of companies in the lowest quantile (20th) in 1997 moved up into the second quantile (40th) in the last year. Note (column entitled 100%) that only 4% of companies move into the top quantile. The column headed Exitors shows that 48.3% of companies from bottom quantile exited. A number of interesting features emerge from this analysis. First, the diagonal elements are the highest elements, indicating that between 22% and 32% of companies stay in the same quantile for these six year gap. Second, much fewer establishments move quantiles; the off diagonal elements are all smaller than the diagonal elements. It is also interesting to note that, excluding Exitors and Entrants, the upper triangular part of the three matrices contains always lower values of the corresponding cells of the lower triangular part. This means that, in the considered period, survivors reduces on average the productivity as it is pointed out in Table 1.
386
Recent Advances in Stochastic Modeling and Data Analysis
auantilesof LPSA (Constant Prices= 1995) in 1997
First (20%) ~, Second(40%) Third (60%) Fourth(80%)
quantiles of LPAV95 (constant prices = 1995) in 2002 First Second Third Fourth Fifth (20%) (40%) (60%) (80%) (100%) Exitors 23.0% 13.4% 6.4% 5.0% 4.0% 48.3% 13.4% 22.7% 17.5% 8.9% 4.2% 33.4% 6.5% 17.6% 22.2% 15.5% 5.8% 32.3% 4.4% 8.8% 17.9% 27.1% 12.5% 29.3%
Total 100% 100% 100% 100%
Fifth(100%) Entrants
3.6% 27.8%
100% 100%
5.5% 18.4%
8.3% 18.9% 16.1% 14.3%
32.0% 23.5%
31.7%
Table 2. LP transition matrix, all companies (LPAV95)
Third, the majority of companies who exit are from the lowest quantile. Finally the last row considers the entry rates. The first column show the fraction of companies in the first quantile, who entered at some point over the period (about 28%). The fractions are rather evenly spread between 1997 and 2002, while companies that enter spread over many parts of the distribution. The distribution present a bimodal u-shaped distribution: the majority enter in the first and the last quantile. In reality, between 1997 and 2002 the mean and the median of LP distribution increase if we consider the group “all companies” and decrease regarding “survivors”.
4
Decomposing productivity growth
Suppose we divide manufacturing establishments up into a number of groups. We wish to know what is the contribution of each group to overall productivity growth over a given period. A decomposition close to the one used by Baily et al. (1996) will be employed . Let Y,, be total generic output (added value, gross output or sales) and EiI be employment in group j at time t. Then LP in group j is defined as LP,l = Y,, I E,, ,j = 1,2,...,k t = 0,...,T . It can be shown [Gozzi et al., 20051 that the productivity growth of the sector from 0 to T can be decomposed as follows: Aggregate productivity growth = Within group effect + Reallocation levels effect + Reallocation growth effect. The within effect capture the gain in aggregate productivity coming from within companies productivity growth weighted by initial output share. The reallocation level effect captures the gain in aggregate LP coming from the expanding market of high productivity companies, or from low-productivity companies’ shrinking share weighted by initial shares while the reallocation growth effect captures the gain in aggregate LP coming from high productivity growth companies’ expanding shares or from low-productivity growth companies’ shrinking shares. With reference to the time series of all companies in the period 1997-2002 for sector DK29, the real output (1995 prices) is increased, yearly on average over
Firm Turnover and Labor Productivity Growth 387
the six years, by 1.11% in terms of added value. In the same period employment rose by 0.42% and productivity by 0.67% (see Table 3). Exits accounted for 28% of employment in 1997 and productivity in these companies was 3.9% lower than those of the survivors. So, the disappearance of these companies would certainly have raised productivity. Apart from exits, several other changes were going on at the same time. First, productivity is decreased in survivors by -0.23%. Exitors were replaced by new entrants which accounted for 17.5% in 2002 employment. These new companies had higher productivity than the survivors as attained by 2002, and the percentage gap is very larger than that between exitors and survivors in 1997. Such increase covers the reduction of productivity of the survivors: globally, companies operating in 2002 (survivors plus entrants) present a positive growth of productivity with respect to the total companies operating in I997 (survivors plus exitors) . Concluding, in the period 1997-2002 turnover caused an increase of LP because companies with productivity above the mean are entered and companies with productivity below the mean are exit. Unfortunately the impact of the incumbent companies on productivity growth was not positive as well, but on the contrary their average productivity made worse. This is against any sensible expectation because, one would expect competition to raise productivity in incumbent firms. It encourages firms to innovate by reducing slack, putting downward pressure on costs and providing incentives for the efficient organization of production.
Exits/ entrant Survivor Total
Share empl. 1997
Share
empl.
Product ivity
Product ivity
2002
1997
2002
Euro
Euro
%
%
1995
1995
28.3 71.7 100.0
17.5 82.5 100.0
43376 45122 44627
53277 44600 46116
Annualized growth of Value Emplo Produ added pent ctivi tY %
-4.53 3.25 1.11
%
%
-7.40 3.52 0.42
4.56 -0.23 0.67
Table 3. Productivity (LPAV95) and employment: survivors versus exits and entrants
We apply the decomposition of productivity growth to analyse the role of turnover. Table 4 shows the decomposition of productivity growth between survivors on the hand and exitdentrants on the other. From the last column, we can see that survivors account in negative of overall growth. On the contrary, the contribution of entrants and exits is positive. These findings are very lower than those reported for productivity in Italy by [Scarpetta et a/., 20021 in previous periods for the same sector using OECD data: +3.5 in 1987-92, +4.7% in 199297. The within effect gives a positive contribution 65% while the reallocation effects (level plus growth) in the period 1997-2002 show a contribution to the LP growth larger than one third. Globally, the total reallocation effect has a negative contribution which comes from a mix of a positive and fair level effect and a consistent negative growth effect. What does a negative reallocation contribution
388
Recent Advances in Stochastic Modeling and Data Analysis
to LP growth suggest? Negative reallocation contributions are often interpreted in the literature as reflecting the creative destruction processes while the within effects are interpreted as reflecting more traditional sources of productivity growth (the average firms become more productive with advancing technology). However, rather than being alternatives, these effects (within vs. reallocation) may be closely related. 1997-2002
Within effect %
Exits/ Entrants Survivors Total
1.26 -0.17 1.09
Reallocati on effect: level % 0.06 0.02 0.08
Reallocation effect: growth % -0.41 -0.10 -0.51
Total contribution &
0.91 -0.24 0.67
Table 4.Decomposition of productivity growth, 1997-2002:survivors versus exits and entrants
5
Conclusion
In this paper productivity growth in Italian mechanical sector (Dk29) since 1997 to 2002 has been analyzed. The database is represented by the UD which includes the universe of companies of Italy for the same period. The choice has been to construct a measures of LP based on Value added which is the most used balance item in productivity literature and allows direct comparisons. The results obtained through our analysis are rich and put in evidence different aspects of this sector of Italian economy. 1) The survivors have the considerable loss of LP which can be explained by a conjoint effect operating intensely in a relevant group of unsuccessful upsizers companies (about 40%): the improvement of employment and the impossibility to improve proportionally the levels of output have produced the worst negative rates of productivity growth. How much this can be considered a cyclical or structural aspect of Italian economy it can not be decided in this paper. What can be surely affirmed is the increasing relevance of this phenomenon in 2002 respect to 1997 (see, [Gozzi e t a / . ,20051) 2) Entrants are characterized by relevant higher LP than the survivors. Perhaps it is in consequence of their better performance that allowed them to enter. At the opposite exitors have lower LP than survivors. An high degree of turbulence within the sector can then be considered efficient because it involves respectively the exit and the entrance of low and high productivity units. From a policy perspective, this result is in favor of the necessity to curb the costs of entrance and exit of firms in the sector. 3) Transition matrices reveal a considerable persistence in the level of LP. A percentage of firms which goes from 22% to 32% maintains the same level of productivity in 2002 with respect to 1997. Furthermore, the degree of persistence increases considerably with the LP level.
Firm Turnover and Labor Productivity Growth 389 4) The LP growth of the universe (“all companies”) is decomposed in 3 components. The important role of turnover emerges very clearly, net entry gives a strong positive contribute to the productivity growth in the period 1997-2002. 5) A further general observation is linked to the necessity to take into account the business cycle which, in 2000-2002, was definitely negative. In this period, incumbent companies (which in the paper were called “survivors”) could have been the major looser in terms of productivity during a negative phase of the business cycle, because, being more structured, have a larger fraction of unused plants and employees. O n the contrary, smaller companies are more laborintensive, flexible and adaptive. This could be one possible interpretation of larger productivity of new entry companies. Some results obtained splitting the period 1997-2002 in two sub-periods (1997-2000: expansive phase, 2000-2002: recovery phase) can support this conclusions. In 1997-2000 the annualized L P growth of incumbents was +2.07. In 2000-2002 L P o f incumbents decreased yearly on average much more than in the entire period 1997-2002 (1.68%). 6) W e conclude, that more detailed research is needed to unravel the complex dynamic forces that determined LP growth.
References [Baily et al., 19961 M. N. Baily, E. J. Bartelsman, and J. Haltiwanger. Downsizing and productivity growth: myth or reality?, in Mayes, D. G., (ed.), Sources ofProductivity Growth, Cambridge, Cambridge University Press, 1996. paldwin and G y 20061 J. R. Baldwin and W. Gu. Plant Turnover and Productivity Growth in Canadian Manufacturing.Industrial and Corporate Change, 15: 417-465,2006. [Barnes et al., 20011 M. Barnes, J. Haskel and M. Maliranta. The Sources of Productivity Growth: Microlevel Evidence for the OECD. OECD working paper DSTI/EAS/IND/SWP/AH(200 1)14,2001. [Bartelsman and Doms, 20001 E. J. Bartelsman and M. Doms. Understanding Productivity: Lessons fiom Longitudinal Microdata. Journal of Economic Literature, 38: 569-594,2000, [Bottazzi et al., 20021 G. Bottazzi, E. Cefis, G . Dosi. Corporate Growth and Industrial Structure. Some Evidence from the Italian Manufacturing Industry. Industrial and Corporate Change, 11: 705-723, 2002. @%cson and Pakes, 19951R. Ericson and A. Pakes. Markow perfect industry dynamics: a framework for empirical analysis. Review ofEconomic Studies, 62: 53-82, 1995. [Ganugi et al., 20051 P. Ganugi, L. Grossi, G. Gozzi. Testing Gibrat’s law in italian macro-regions: analysis on a panel of mechanical companies. Statistical Methods and Applications, 14: 101-126, 2005. [Gozzi et d.,20051 G. Gozzi, L. Grossi, P. Ganugi, C. Gagliardi. Size, growth and productivity dynamics in italian mechanical fums. Proceedings of Monitoring ha&, 2005. [Hulten, 20001 R.C. Hulten, Total factor productivity: a short biography, NBER Working Paper No 7471. [Olley and Pakes, 19961 G. T. Olley and A. Pakes. The dynamics of productivity in the telecommunications equipment industry, Econometrica, 64: 1263-1297, 1996. [Scarpetta et al., 20021 S. Scarpetta, P. Hemmings, T. Tressel and J. Woo, The role of policy and institutions for productivity and firm dynamics: evidence from micro and industry data. OECD Economics Department Working Papers 329, OECD Economics Department, 2002.
Continuous Sampling Plan under an acceptance cost of linear form Nicolas Farmakis and Mavroudis Eleftheriou Aristotle University of Thessaloniki Department of Mathematics, Statistics and Operational Research Sector 54124 Thessaloniki, Greece (e-mail: f armakisOmath. auth.gr and e-mail: melef thQgmail .corn) Abstract. This paper examines the economic performance of CSP-1, under specified outgoing quality limit (AOQL) and a realistic assumption of linearly variable acceptance cost. A mathematical programming model is developed t o determine the unique combination of the plan's parameters (i", f") for which minimum total cost, per item produced, is achieved. Extended sensitivity analysis explores the behavior of the proposed model and validates its satisfying adaptation to real quality control conditions. Keywords: CSP-1, economic model, acceptance cost.
1. Introduction Continuous sampling plans were introduced by Dodge (1943) and ever since have prevailed in the quality control procedures of continuous flows of discrete products. The CSP-1 involves the alternation of periods of 100% inspection and sampling, and assumes that the manufacturing process is statistically controlled, meaning that the probability of producing a non-conforming item is constant and equal to p . The modern approach to continuous sampling plans is characterized by economic objectives. The cost minimization of a plan's implementation turned to be the state of the art in the research area of modern industry. The basic costs involved in the implementation of a continuous sampling plan are the inspection cost, the replacement cost and the acceptance cost. Inspection cost expresses the labor cost of the inspector and the cost of using the inspection equipment and can be considered constant (per unit)during a quality control process. The replacement cost reflects the cost of replacing an item found non-conforming and can also be considered constant. The acceptance cost expresses the cost of not inspecting a non conforming item during the sampling phase and is related to the cost of the loss of the customer's goodwill. It can be argued that the last cost is very susceptible to slight changes in the plan's parameters or to the general conditions of the process. In this paper, the economic design of CSP-1 under linear acceptance cost is presented. A modification t o the Cassady et al's (2000) model allows formulating a mathematical programming model and determining the parameters values that ensure minimization of the total expected cost per item produced. 390
Continuous Sampling Plan 391
2. The economic model of CSP-1 under linear acceptance cost To avoid unnecessary complexity, the proposed economic design of continuous sampling plans will be examined only for the case of the simpler one, CSP-1, as it was introduced by Dodge (1943). This economic design is based mainly on the findings of Cassady et a1 (2000). Defining the inspection cycle as the portion of the production flow constituted by one phase of 100% inspection and one sampling phase (Figure l ) , Cassady et a1 (2000) claimed that the total expected cost of implementing CSP-1 per unit produced during one inspection cycle is dependent on the following three costs: 0
0
0
the fixed cost (cs) of inspecting an item. Let Cs be the random variable expressing the inspection cost per item produced. It is easily derived that E(C,) = c,AFI(p), where AFI(p) is the long run average portion of items inspected. the fixed cost (c,) of replacing an item, that has been found t o be nonconforming. This cost includes the cost of producing the replacement item, the inventory and obsolescence costs associated with storing the replacement items, and the costs of reworking or disposing of the found non-conforming item. Let C, be the random variable expressing the replacement cost per item produced. Obviously, E(C,)=c,AFI(p)p. the cost of accepting (without inspection) a non-conforming item during the sampling phase. This cost includes warranty costs, service costs, liability costs and the costs of the loss of the customer’s goodwill. Let C, be the random variable expressing the acceptance cost per item produced. Then E(Ca)=ca [l-AFI(p)]p.
Fig. 1. The inspection cycle of CSP-I.
Experience and common sense support the following relationship between the three costs [Cassady et al., 20001: cs
I cr 5 ca
(1)
The values of c, and c, are usually fixed and basically dependent on the kind of products checked, the special conditions of quality control imposed by the product features or the consumer’s demands and the structure of the production line. The value of c, however cannot be considered constant because it is composed by additional fluctuating costs reflecting the variable consumer’s loss of goodwill (dependent on the quantity of defective items
392
Recent Advances in Stochastic Modeling and Data Analysis
encountered), the cost of withdrawing defective items from the market and the cost of preserving the market's tolerance (dependent on general financial conditions). Therefore the economic design of a continuous sampling plan would have been more realistic if it involved a variable acceptance cost. This economic design is expressed through the following assumptions which are also considered fundamental for formulating a mathematical programming model:
0
0
0 0
The per item inspection and replacement costs are considered constant. the cost of accepting a non-conforming item found during the sampling phase is linearly proportional to the average number of items (per inspection cycle) not inspected in the sampling phase, the production process is under statistical control (the probability of an item to be non-conforming is constant and equals p ) , the inspection is perfect, non-conforming items are repaired or replaced.
From Cassady [Cassady et al., 20001, the total expected cost per item produced is then given by the sum:
If we define u as the expected number of items inspected during the phase of 100% inspection, and v as the expected number of items passing during the sampling phase of an inspection cycle, then it can be derived [Duncan, 19861: that u and IJ for the case of CSP-1 are:
u=-
1 - 4% PqZ
v=-
1 fP
where q=l-p. Thus AFI(p) is easily computed:
Substituting the AFI (p) from (5) into (2), we take the total expected cost per item produced:
However according t o the second assumption, the acceptance cost is variable and defined as:
Continuous Sampling Plan 393
+
c, = X p ( 1 - f)wp (7) where X is the constant of the per item acceptance cost composed of a fixed portion and p is the constant of the per item acceptance cost composed of a variable portion. Substituting equation (7) into (6), we take the following analytical form of the total expected cost per item produced, given that the acceptance cost is not constant.
E ( C ) = C s f 2 + C T f 2 P + (1 - f)P(Ii((l - f ) P + Xf)
+
(8)
f(qZ(1 - f) f ) After acquiring the analytical expression of the total expected cost, the determination of the CSP-1 parameters that provide both a specified AOQL and minimization of the total expected cost is mathematically and computationally feasible. Modifying Cassady et al's model, the nonlinear mathematical programming model arising is as follows:
Minimize
subject to max (AOQ) = AOQL
WPll
(9)
i 2 O,integer,and 0 5 f 5 1 where AOQ is the average outgoing quality and AOQL is the specified by the consumer maximum value of AOQ. Dodge [Dodge, 19431 derived that: PI =
iAOQL
+1
i+l (1- p$+l
f = zAOQL . + (1 - pl)i+l
(11)
where p1 is the value of p where AOQL is achieved. Therefore, equation (11) gives the value off (for a given value of i) for which AOQL is reached. Thus the mathematical programming model (9) reduces to:
Minimize
E ( C )=
Csf2
+ G f 2 P + (1 - f ) 2 P q i P + f(1- f ) W i X f ( q i ( 1 f ) + f) -
subject to (1 -
f= ZAOQL
iACWf+l)i+l
+ (1 - iAoQL+l)i+l i+l
394
Recent Advances in Stochastic Modeling and Data Analysis
i 2 0 , integer, and 0 5 f 5 1 For given values of i, and using the constraint of the model (12) to determine (z*, f ? can be found that ensures minimum E(C).
f, a unique combination of
3. Realistic values for the parameters of the linear model - Numerical Example The effect of the C, (in eq. 7) on the configuration of the total cost is substantial, as will be demonstrated in the extended sensitivity analysis of the next paragraph. Consequently, the choice of these parameters should be thoroughly tested as well as their adaptation to real conditions of quality control. However, experience has led us to criteria usually adequate to limit the number of probable values of these parameters. More specifically, as a rule of thump, the values of X and p should ensure that: 0
0
the value of c, is greater than the inspection and replacing costs, i.e. c, > c,, c,, as stated in (l), the value of c, is at most five times the replacing cost, i.e. c, 5 5c,.
For example, for c,=l, cT=20, AOQL=O.l% and p=0.25%, a combination of X and p that ensures relation (1) and at the same time total cost is minimized, is (X,p) = (1,lO). For this example, the minimum cost is achieved for (i, fl=(551, 0.277) and the acceptance cost per unit produced is 27.043. 4. Sensitivity analysis
4.1. The effect of the incoming fraction of defective items ( p ) Table 1 presents the behavior of the minimum E(C) for CSP-1, for specific values of the parameters and varying p . It is easily perceptible that: 0 0 0
0
i* decreases firmly as p increases. f* increases firmly as p increases. AFI increases with p . This finding was expected because as p increases it is more likely that 100% inspection will be conducted. E(C) is increasing with p , while the acceptance cost c, decreases.
4.2 The effect of the variable portion of the acceptance cost ( p ) . Table 2 depicts the behavior of i* and E(C) as the variable portion of the acceptance cost ( p ) increases. It is easily observed that i*decreases and E(C) increases slightly[]. 4.3. The effect of the screening cost c, Table 3 provides the intuition that as the screening cost c, increases, the i* increases, f* decreases gradually and E(C) is also rapidly increasing. 4.4. The effect of the replacement cost c, The last but not least economic factor whose effect on the economic performance of CSP-1 should be examined is the replacement cost c,. As shown
Continuous Sampling Plan 395 P
0.0020 0.0021 0.0022 0.0023 0.0024 0.0025 0.0026 0.0027 0.0028 0.0029 0.0030 0.0031 0.0032 0.0033 0.0034 0.0035 0.0036 0.0037 0.0038 0.0039 0.0040
i
f
752 0.1871 709 0.2031 670 0.2190 633 0.2354 600 0.2514 569 0.2675 54 1 0.2832 515 0.2987 49 1 0.3139 469 0.3287 449 0.3429 430 0.3570 413 0.3703 397 0.3833 382 0.3960 368 0.4083 355 0.4202 343 0.4315 332 0.4422 321 0.4533 311 0.4636 -
AFI 0.5091 0.5308 0.5508 0.5695 0.5867 0.6028 0.6176 0.6315 0.6445 0.6565 0.6678 0.6785 0.6884 0.6978 0.7066 0.7149 0.7228 0.7302 0.7373 0.7440 0.7504
E(C) 0.5646 0.5850 0.6042 0.6224 0.6395 0.6556 0.6709 0.6853 0.6990 0.7119 0.7242 0.7359 0.7470 0.7576 0.7678 0.7775 0.7868 0.7958 0.8044 0.8127 0.8207
Ca
35.7599 32.3935 29.5326 26.9783 24.8253 22.9053 21.2529 19.7854 18.4859 17.3396 16.3336 15.4089 14.6065 13.8723 13.2021 12.5920 12.0385 11.5386 11.0895 10.6489 10.2557
Table 1. Economical results for CSP-1 under different values of incoming quality p (X=l, p=8, c,=l, c,=20, AOQL=O.l%).
in table 4 the i* gradually increases with c, while f* decreases slowly. The total expected cost per item produced E(C) increases slowly with c,. 5 . Conclusions In this paper an extended modification of Cassady et al’s (2000) model is proposed attempting to minimize the total expected cost per unit produced, under: 0 a specified average outgoing quality limit (AOQL) and 0 the more realistic assumption that the acceptance cost c , varies linearly with the not inspected, non-conforming items during sampling phase. Extended sensitivity analysis has been conducted t o scrutinize the effect of a fluctuating set of parameters on the economic performance of CSP-1. Much research has to be done to thoroughly examine the economic performance of continuous sampling plans, under more complex cost conditions. The case of non linearly variable cost promises an optimized adaptation to real quality control conditions.
396
Recent Advances in Stochastic Modeling and Data Analysis P i 1 650 2 636 3 623 4 611 5 600 6 589 7 579 8 569 9 560 10 551 11 543 12 535 13 528 14 520 15 513 16 507 17 500 18 494 19 488 20 482
f 0.2277 0.2341 0.2401 0.2459 0.2514 0.2569 0.2622 0.2675 0.2724 0.2774 0.2820 0.2866 0.2908 0.2956 0.2999 0.3036 0.3081 0.3119 0.3159 0.3198
AFI 0.6000 0.6002 0.6005 0.6008 0.6012 0.6017 0.6022 0.6028 0.6033 0.6040 0.6046 0.6053 0.6059 0.6067 0.6074 0.6080 0.6088 0.6096 0.6103 0.6111
E(C) 0.6345 0.6378 0.6410 0.6441 0.6471 0.6500 0.6529 0.6556 0.6583 0.6609 0.6635 0.6660 0.6684 0.6708 0.6731 0.6754 0.6776 0.6798 0.6819 0.6840
ca 4.3916 7.5450 10.4927 13.2651 15.8908 18.3497 20.6998 22.9053 25.0366 27.0429 29.0066 30.8630 32.7059 34.3598 36.0159 37.6929 39.1801 40.7024 42.1518 43.5296
Table 2. Economical results for CSP-1 under different values of p ( X = l , c,=l, c,=20, AOQL=O. l%,p=O.25%). cs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
i 569 610 626 635 641 645 647 649 651 652 654 654 655 656 657
f 0.2675 0.2464 0.2387 0.2345 0.2318 0.2300 0.2290 0.2282 0.2273 0.2268 0.2259 0.2259 0.2255 0.2250 0.2246
E(C) AFI 0.6028 0.6556 0.6009 1.2572 0.6004 1.8578 0.6003 2.4581 0.6002 3.0583 0.6001 3.6585 0.6000 4.2586 0.6000 4.8586 0.6000 5.4587 0.6000 6.0587 0.6000 6.6588 0.6000 7.2588 0.6000 7.8588 0.6000 8.4589 0.6000 9.0589
ca 22.9053 25.4657 26.5120 27.1126 27.5178 27.7902 27.9270 28.0643 28.2020 28.2710 28.4094 28.4094 28.4787 28.5482 28.6178
Table 3. Economical results for CSP-1 under different values of screening cost cs ( X = l , p=8, cr=2O, AOQL=O.l%, p=0.25%).
Continuous Sampling Plan 397 i Cr -
15 16 17 18 19 20 21 22 23 24 25
568 569 569 569 569 569 569 570 570 570 570
f 0.2681 0.2675 0.2675 0.2675 0.2675 0.2675 0.2675 0.2670 0.2670 0.2670 0.2670
AFI 0.6028 0.6028 0.6028 0.6028 0.6028 0.6028 0.6028 0.6027 0.6027 0.6027 0.6027
E(C) 0.6481 0.6496 0.6511 0.6526 0.6541 0.6556 0.6572 0.6587 0.6602 0.6617 0.6632
Ca
22.8450 22.9053 22.9053 22.9053 22.9053 22.9053 22.9053 22.9658 22.9658 22.9658 22.9658
Table 4. Economical results for CSP-1 under different values of replacement cost cr ( X = l , p=8, c,=l, AOQL=O.l%, p=0.25%).
References [Balamurali and Jun, 2006]S. Balamurali and C.H. Jun. Average outgoing quality of csp-c continuous sampling plan under short run production processes. Journal of Applied Statistics, 33:139-154, 2006. [Blackwell, 1977lR. Blackwell. The effect of short run production on csp-1. Technometrics, 19:259-263, 1977. [Cassady et al., 2OOO]C. Cassady, -, and et. al. Demonstrating deming’s kp rule using an economic model of the csp-1. Quality Engineering, 12:327-334, 2000. [Dodge, 1943lH. F. Dodge. Inspection plan for continuous production. The Annals of the Mathematical Statistics, 14:264-279, 1943. [Duncan, 1986lA.J. Duncan. Quality Control and Applied Statistics. Irwin, 1986. [Farmakis and Eleftheriou, 2005lN. Farmakis and M. Eleftheriou. Continuous sampling plans for industrial quality control. In Proceedings of 18th Pan-Hellenic Conference of Statistics, 2005. [Farmakis and Eleftheriou, 2006alN Farmakis and M. Eleftheriou. The effect of speech disorders on the quality of life. Communications i n Dependability and Quality Management, 9:98-104, 2006. [Farmakis and Eleftheriou, 2006blN. Farmakis and M. Eleftheriou. A new continuous sampling plan without statistical control. In Proceedings of 18th PanHellenic Conference of Statistics, 2006. [Farmakis, 2002lN. Farmakis. Introduction to Sampling. Thessaloniki, 2002. [Kandasamy and Govindaraju, 2OOO]C. Kandasamy and K. Govindaraju. Design of generalized csp-c continuous sampling plan. Journal of Applied Statistics, 27:829-841, 2000. [Murphy, 1959lR.B. Murphy. Stopping rules with csp-1 sampling inspection plans. Industrial Quality Control, 16:lO-16, 1959. [Stephens, 1990lR.B Stephens. How to perform Continuous Sampling (CSP), volume 2. American Society for Quality Control, Milwaukee, 1990.
A Dynamic Programming Model of a Machine Tool in Flexible Manufacturing Bernard F. Lamond Universite Lava1 Faculte des sciences de l’administration Quebec (Quebec), Canada G1K 7P4 (e-mail: Bernard. LamondOf sa.ulava1.ca) Abstract. We present a dynamic programming model for the optimal operation of a flexible machine tool, t o find a sequence of cutting speeds for the successive tools used in a cutting operation, in order to minimize expected makespan. The expected tool life decreases with cutting speed and each tool takes a setup time to install. We compare the optimal dynamic policies with well-known static policies. Keywords: Stochastic dynamic programming, Tool life model, Flexible machine.
1
Introduction
We revisit the problem of selecting the cutting speed of a flexible machine tool that was previously addressed in [Lamond and Sodhi, 19971 for deterministic tool life and in [Lamond and Sodhi, 20061 for stochastic tool life. T h e economic life of a cutting tool varies with speed according t o Taylor’s classic relation, and there is a setup time whenever a tool is loaded by the machine operator, so the total processing time is the sum of cutting plus setup times. We seek a decision rule for choosing the cutting speed of each tool according t o the remaining cutting distance, so as t o minimize total processing time. Previous optimization models assumed a constant cutting speed for every tool used to complete a part type. Here, we use dynamic programming t o show that such a static policy is optimal for a deterministic tool life model but not for a stochastic tool life model, in which case a dynamic policy is better, except for the exponential distribution and without tool magazine.
2
Deterministic tool wear
As in [Lamond and Sodhi, 19971, we relate the nominal tool life t t o the cutting speed v using the empirical formula of [Taylor, 19061:
v/v, = ( t r / t ) 7 =+ t
= t, (v,/w)
111)
,
(1)
where t, is the nominal tool life at some reference speed v, and 77 is a given constant such that 0 < 77 < 1. Assuming deterministic tool wear, the tool life t is known with certainty. Hence during its economic life, a tool will execute a n amount of cutting that is equal to the distance 398
A Dynamic Programming Model of a Machine Tool 399 Y = vt = Y r ( v r / v ) a, (2) where y, = v,t, and a = (1 - ~ J ) / Q . Conversely, for a single tool to cut exactly a distance y during its economic life, the cutting speed and time are
v = v, (yr/y)lla
and t = t, (y/yr)l'(l-q).
(3)
Lemma 1. The cutting speed v and time t given by eq. (3) minimize the cutting time for a given distance y while using n o more than one tool. Proof. Any 6 > v is not feasible because the cutting distance 5 < y, but any 6 < v is not optimal because the cutting time t^ = y/6 > y/v = t. 0
Lemma 2. For cutting a distance y using k > 1 tools, the optimal cutting speed is given by eq. (3) with y replaced by y l k , with optimal cutting time
Proof. For i = 1 , 2 , . . . , k, let yi be the distance cut by the ith tool. Using Lemma 1 and eq. ( 3 ) , we want to minimize t = t, (y: . . . y l ) /y: subject to the constraint y1 + . .. yk = y, where y = 1 / ( 1 - q ) > 1. The 1st-order necessary conditions for optimality imply y1 = . . . = Y k = y/k. 17 Now suppose there is a setup time S each time the machine operator loads a tool manually, and let V(x) be the optimal processing time for cutting a distance x of a given part type. Then V(x) satisfies the dynamic program
+ +
+
~ ( x =) min
OlY5X
{s + t , (y/yr)l'(l-~~)
(5)
where y is the distance to be cut by t,he first tool, and with V(0) = 0.
Theorem 1. There is an increasing sequence
{XO,X I ,x2,.. .}
,
with
k = l I 2 , ...,
(6)
and xo = 0 , such that a n optimal decision rule f o r the dynamic program of eq. (5) is to use exactly k tools when the cutting distance x is in the interval %k-1 < x 5 x k , and the associated optimal processing time is 1
V(X) = k S
+kl/* t ,
(5)
lI(1-q) '
(7)
Proof. This follows from lemmas 1 and 2. Adding k setup times to eq. ( 4 ) gives eq. ( 7 ) . Equating the latter for k and k 1 and solving for x gives the critical points in eq. (6). Convexity of z-l/* implies xk < x k + l . o According to Theorem 1, if the cutting distance x E ( x k - 1 , xk] then the optimal cutting speed is given by eq. ( 3 ) with y = x / k , so that k tools are used, each tool cutting a distance x l k during its economic life. Hence the optimal decision rule is completely specified by the critical points x l , x2, . . ..
+
400
Recent Advances in Stochastic Modeling and Data Analysis
Now it is convenient to define the optimal cutting speed 5 of the continuous relaxation model, where the integrality of tool setups is neglected, as in [Boothroyd and Knight, 19891 and [Lamond and Sodhi, 19971:
where the lst, term is the cutting time and the 2nd term approximates the number of tools with the cutting time divided by the tool life given in eq. (1). But the processing time is larger than the continous relaxation, so we have V(x) 2 z/[(1 - 77)5], where the right hand side was obtained by replacing v by 6 in eq. (8). We remark that the economic tool life at cutting speed 6 is equal to as and, thus, the cutting distance of a tool is y = aS6.
Lemma 3. T h e cutting speed 5 i s optimal for x = ky, k = 1 , 2 , . .., with optimal processing t i m e equal t o continuous relaxation: V(k6) = Icy/[( 1-77)6].
Proof. Let us minimize the difference between the optimal processing time and the continuous relaxation, when k tools are used. From eq. (7), we have

min_{x>0} { V(x) − x/[(1 − η)v̄] } = min_{x>0} { kS + k^(−1/a) t_r (x/y_r)^(1/(1−η)) − x/[(1 − η)v̄] }.

Taking the 1st derivative with respect to x, equating to 0 and solving, we get x = k y_r (v_r/v̄)^a = kȳ, the last equality following from eq. (2). Replacing x by kȳ in eq. (7) gives V(kȳ) = kS + k t_r (ȳ/y_r)^(1/(1−η)), but eq. (3) indicates the last term is equal to k times the tool life at cutting speed v̄, hence V(kȳ) = k(1 + a)S, and the result follows. □
Theorem 2. For each k = 1, 2, ..., we have kȳ ≤ x_k ≤ (k + 1/2)ȳ.
Proof. First, we use eq. (8) to get ȳ = aSv̄ = a^(1−η) y_r (S/t_r)^(1−η) and next, using eq. (6), we see that the above inequalities are equivalent to inequalities involving only a and k. We will show only the first inequality, the other proof being similar. Letting κ = k + 1/2, multiplying both sides by η κ^(1/(1−η)) and simplifying, we get

1/(aκ) ≤ (1 − 1/(2κ))^(−1/a) − (1 + 1/(2κ))^(−1/a).

Now for γ > 0, one can show easily (using a Taylor series expansion, for instance) that (1 − z)^(−γ) − (1 + z)^(−γ) ≥ 2γz. With γ = 1/a and z = 1/(2κ), this implies the right-hand side above is ≥ 1/(aκ), and the result follows. □

We now remark that, at the critical distance x_k, the optimal cutting speed jumps from a lower limit v̲_k to an upper limit v̄_k, where

v̲_k = v_r (k y_r/x_k)^(1/a)  and  v̄_k = v_r ((k+1) y_r/x_k)^(1/a).
These limits get closer when x increases, so the cutting speed v̄ of the continuous relaxation is nearly optimal for large jobs using many tools.

Corollary 1. For k = 1, 2, ..., the optimal cutting speed limits satisfy v̲_k ≤ v̄ ≤ v̄_k and v̄_k/v̲_k = (1 + 1/k)^(1/a).
Proof. The latter equality follows from eq. (3) because, by definition of x_k, the optimal cutting distance jumps down from x_k/k to x_k/(k+1). The fact that the speed limits include v̄ follows from Theorem 2. □
[Figure: three panels against total cutting distance (0–600 m) — difference between optimal processing time and continuous relaxation; optimal cutting speed of first tool; optimal cutting distance of first tool]

Fig. 1. Optimal processing time and cutting speed for deterministic model
Figure 1 illustrates these properties for a part type described in §5. The 1st chart shows the optimal processing time minus the lower bound from the continuous relaxation (solid line), and the actual processing time of always using speed v̄ minus the lower bound (dotted line). The 2nd chart shows the optimal cutting speed (solid line) and v̄ (dotted line). The 3rd chart shows the optimal cutting distance per tool. The circles indicate the critical distances x_k where the optimal solution switches from k to k+1 cutting tools, and the squares indicate the distances kȳ where v̄ is optimal.
3 Stochastic tool wear
To specify a stochastic tool life model, it is convenient to define a random variable W such that E[W] = 1 and Var(W) = ε², where ε is the (fixed) coefficient of variation of the tool life. Then at cutting speed v, the random
tool life is equal to tW and the corresponding cutting distance is equal to yW, where t and y are given respectively by eq. (1) and eq. (2). These assumptions are consistent with the experimental findings of [Wager and Barash, 1971]. In the static model of [Lamond and Sodhi, 2006], where a given (fixed) distance x has to be cut using the same cutting speed v for every tool, the successive tool lives are iid random variables tW_1, tW_2, ..., and similarly for the corresponding distances yW_1, yW_2, ..., therefore the expected number of tool setups is equal to w̃(x/y), where w̃(θ) is the renewal function of a renewal process with interarrival times {W_1, W_2, ...}. A convenient expression for w̃(θ) is given in [Lamond and Sodhi, 2006, eq. (15)] for the Erlang distribution, and w̃(θ) = 1 + θ for exponential tool life. The expected processing time for cutting a part type is the sum of the cutting time plus the expected setup time: TP(x, y) = x/v + S w̃(x/y). Under exponential tool life, we have
TP(x, y) = x/v + S (1 + x/y),

so the optimal cutting speed is equal to v̄ of eq. (8), as in the continuous relaxation, with ȳ = aSv̄, giving the optimal processing time

TP*(x) = S + x/[(1 − η)v̄].   (9)
For distributions other than exponential, the renewal function is not so simple and the optimal cutting speed must be found by a numerical method. We now investigate whether a dynamic policy can decrease further the expected processing time by selecting the cutting speed of a tool as a function of the remaining distance. The optimal expected processing time V(x), for cutting a distance x starting with a fresh tool, satisfies the dynamic program

V(x) = min_{y>0} { S + E[(t/y) min(x, yW)] + E[V(x − yW)] },   (10)

with V(x) = 0 for all x < 0. In eq. (10), the first two terms inside the brackets are respectively the setup and cutting times of the current tool, while the third term is the expected processing time required to complete the part type after the current tool is worn out. Let G(u, y) and g(u, y) denote respectively the cumulative distribution function (cdf) and probability density function (pdf) of U = yW. Then

φ(x, y) = E[(t/y) min(x, yW)] = (t/y) [ ∫_0^x u g(u, y) du + x(1 − G(x, y)) ]   (11)

is the expected cutting time of the current tool and eq. (10) becomes
V(x) = min_{y>0} { S + φ(x, y) + ∫_0^x V(x − u) g(u, y) du },  x ≥ 0.   (12)

Moreover, if G(0, y) = 0, as we assume, then V(0) = S.
When U has an Erlang distribution with shape parameter r, then its scale parameter is λ = r/y, and the expected cutting time is given by

φ(x, y) = (t/y) [ y G_{r+1}(x) + x(1 − G_r(x)) ],   (13)

where G_m denotes the Erlang cdf with shape parameter m and rate λ. When U has an exponential distribution (r = 1), eq. (13) reduces further to

φ(x, y) = t (1 − e^{−x/y}).   (14)
Theorem 3. When the tool life has an exponential distribution, the optimal solution of the dynamic program of eq. (12) is the static policy y = ȳ.
Proof. Suppose V(u) = TP*(u) for 0 ≤ u < x. Then substituting eq. (9), eq. (14) and the exponential pdf in eq. (12), and integrating by parts, we get V(x) = min_{y>0} { E(x, y) − F(x, y) e^{−x/y} }. The 1st-order necessary condition for optimality, E_y − [F_y − xF/y²] e^{−x/y} = 0, is straightforward to verify, with both terms equal to zero for y = ȳ. □

When the tool life distribution is not exponential, there is no reason why the static policy of [Lamond and Sodhi, 2006] should be optimal, and the optimal dynamic policy y(x) and expected processing time V(x) have to be evaluated by a numerical method. First, we discretize the state variable x: given an upper bound x̄ and a (large enough) integer n, we use a finite grid of points x = ih for i = 0, ..., n and a step size h = x̄/n. We start with i = 0 because V(0) = S, and there are n major iterations, where the discrete optimal value V(ih) and action y(ih) are found at the ith iteration. Next, we replace ∫_0^x V(x − u) g(u, y) du by a discrete approximation
Q(i, h, y) = Σ_{j=1}^{i} V((i − j)h) p(j, h, y),   (15)
p(j, h, y) = G(jh, y) − G((j − 1)h, y),  j = 1, 2, ...,   (16)
with p(0, h, y) = 0 and, when U has an Erlang distribution,

G(u, y) = 1 − Σ_{m=0}^{r−1} e^{−λu} (λu)^m / m!,  λ = r/y.   (17)

Finally, we replace eq. (12) by V(ih) = min_{y>0} { S + φ(ih, y) + Q(i, h, y) }. This stochastic dynamic programming (SDP) method is described in Figure 2.
4 Machine with tool magazine
In flexible manufacturing systems, machines are often equipped with a tool magazine. This permits having a number of different tools immediately available for processing, without incurring a setup time. Moreover, extra tools
Step 1. Choose the number of states n and the upper bound x̄, and specify an error tolerance ε > 0 for the line search.
Step 2. Let h = x̄/n and V(0) = S.
Step 3. For i = 1, ..., n do: (major step to compute V(ih))
Step 4. For k = 1, 2, ... repeat: (line search for optimal action)
Step 5. Determine candidate action y_k.
Step 6. Compute φ(ih, y_k) using eq. (13).
Step 7. Set p(0, h, y_k) = 0 and G(0, y_k) = 0.
Step 8. For j = 1, ..., i do: (inner loop for discrete probabilities)
Step 9. Compute G(jh, y_k) using eq. (17).
Step 10. Compute p(j, h, y_k) using eq. (16).
Step 11. End loop on j.
Step 12. Compute Q(i, h, y_k) using eq. (15).
Step 13. Compute T(i, h, y_k) = S + φ(ih, y_k) + Q(i, h, y_k).
Step 14. End loop on k when convergence criterion satisfied at k*.
Step 15. Set y(ih) = y_{k*} and V(ih) = T(i, h, y_{k*}), where y_{k*} minimizes T(i, h, y_k).
Step 16. End loop on i.
Fig. 2. Discrete SDP procedure for a machine without tool magazine
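The procedure of Figure 2 is straightforward to implement. The following Python sketch is our illustration, not the authors' code; the Erlang shape r = 3 is an assumption, the tool data follow Section 5, and the line search of Steps 4–14 is replaced by a crude grid over candidate actions.

```python
import math

# Discrete SDP procedure of Fig. 2 (sketch, illustration only).
S, v_r, t_r, eta, r = 100.0, 1.0, 5.0, 0.38, 3
y_r = v_r * t_r
gamma = 1.0 / (1.0 - eta)

def erlang_cdf(u, y):
    """G(u, y): Erlang cdf with shape r and rate lam = r/y, as in eq. (17)."""
    lam = r / y
    return 1.0 - sum(math.exp(-lam * u) * (lam * u) ** m / math.factorial(m)
                     for m in range(r))

def phi(x, y, steps=200):
    """Expected cutting time of the current tool, eq. (11):
    (t/y) * E[min(x, yW)], with E[min(x, U)] = integral of (1 - G) over (0, x)."""
    du = x / steps
    em = sum((1.0 - erlang_cdf((j + 0.5) * du, y)) * du for j in range(steps))
    t = t_r * (y / y_r) ** gamma            # economic tool life, eq. (3)
    return t / y * em

def sdp(x_max=200.0, n=50, actions=40):
    """Value function V(ih) on a grid, following Steps 1-16 of Fig. 2."""
    h = x_max / n
    V = [S] + [0.0] * n                     # V(0) = S
    ys = [x_max * (k + 1) / actions for k in range(actions)]
    for i in range(1, n + 1):
        best = float("inf")
        for y in ys:
            p = [erlang_cdf(j * h, y) - erlang_cdf((j - 1) * h, y)
                 for j in range(1, i + 1)]                       # eq. (16)
            Q = sum(V[i - j] * pj for j, pj in enumerate(p, 1))  # eq. (15)
            best = min(best, S + phi(i * h, y) + Q)
        V[i] = best
    return V
```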
are manually inserted one by one in the machine when needed, and thus each incurs a setup time S. Let L be the number of tool slots in the magazine. The special case L = 0 corresponds to a machine with no tool magazine, as in the previous sections, and for which all tools require a setup time. Let V_ℓ(x) be the optimal processing time for cutting a distance x when there are ℓ fresh tools in the magazine. In the deterministic model, the static policy of [Lamond and Sodhi, 1997, Table 4] is optimal. According to this policy and Theorem 1, the number of tools to use for cutting a distance x is equal to max{k, ℓ} if x_{k−1} < x ≤ x_k. Then V_0(x) = V(x) as in eq. (7) and
V_ℓ(x) = max(0, k − ℓ) S + max{k, ℓ}^(−1/a) t_r (x/y_r)^(1/(1−η)),  ℓ ≥ 1.   (18)
But in the stochastic model, V_0(x) = V(x) satisfies eq. (12) and, for ℓ ≥ 1,

V_ℓ(x) = min_{y>0} { φ(x, y) + ∫_0^x V_{ℓ−1}(x − u) g(u, y) du }.

A simple adaptation of the algorithm in Figure 2 can be used to compute a discrete approximation V_ℓ(ih), for ℓ = 1, ..., L, as shown in Figure 3.
5 Numerical illustration

The optimal dynamic policy is illustrated in Figure 4 with the 4th part type of [Lamond and Sodhi, 2006, Table 2].
Step 3. For i = 1, ..., n do: (major step to compute V_ℓ(ih))
Step 12. Compute Q(i, h, y_k) using eq. (15) with V replaced by V_{ℓ−1}.
Step 13. Compute T(i, h, y_k) = φ(ih, y_k) + Q(i, h, y_k).
Fig. 3. Modified steps of SDP procedure for a machine with a tool magazine

[Figure: two panels, Erlang and Exponential tool life — expected cutting distance of the first tool against the distance to cut x, 0–500]

Fig. 4. Optimal dynamic policy (expected cutting distance y(x) of 1st tool)
The tool characteristics are S = 100 seconds, v_r = 1 m/s, η = 0.38 and v_r t_r = 5. For cutting 550 meters with 3 tools in the magazine, the expected makespan of a dynamic policy is 977 seconds compared to 984 seconds for a static policy. The relative improvement seems more important for small jobs.
6 Acknowledgements
This research was supported in part by the Natural Sciences and Engineering Research Council of Canada, under Grant 0105560. Ousman Assani's help with some of the computations is also acknowledged.
References
[Boothroyd and Knight, 1989] G. Boothroyd and W. Knight. Fundamentals of Machining and Machine Tools. Marcel Dekker, New York, 1989.
[Lamond and Sodhi, 1997] B. F. Lamond and M. S. Sodhi. Using tool life models to minimize processing time on a flexible machine. IIE Trans., 29:611-621, 1997.
[Lamond and Sodhi, 2006] B. F. Lamond and M. S. Sodhi. Minimizing the expected processing time on a flexible machine with random tool lives. IIE Trans., 38:1-11, 2006.
[Taylor, 1906] F. W. Taylor. On the art of cutting metals. Trans. ASME, 28:31-350, 1906.
[Wager and Barash, 1971] A. B. Wager and M. Barash. Study of the distribution of the life of HSS tools. Trans. ASME, 93:1044-1050, 1971.
Particle filter-based real-time estimation and prediction of traffic conditions
Jacques Sau¹, Nour-Eddin El Faouzi², Anis Ben Aissa², and Olivier de Mouzon²

¹ LMFA, University Claude Bernard Lyon 1, 43, Boulevard du 11 novembre 1918, 69622 Villeurbanne Cedex, France (e-mail: jacques.sau@univ-lyon1.fr)
² Transport and Traffic Engineering Laboratory, INRETS, LICIT, laboratoire d'ingénierie circulation transports, 69675 Bron Cedex, France; ENTPE, LICIT, laboratoire d'ingénierie circulation transports, 69518 Vaulx-en-Velin, France (e-mail: elfaouzi@inrets.fr, anis.ben-aissa@inrets.fr, olivier.de-mouzon@inrets.fr)

Abstract. Real-time estimation and short-term prediction of traffic conditions is one of the major concerns of traffic managers and ITS-oriented systems. Model-based methods now appear as very promising ways to reach this purpose. Such methods are already used in process control (Kalman filtering, Luenberger observers). In the application presented in this paper, due to the high non-linearity of the traffic models, the particle filter (PF) approach is applied in combination with the well-known first order macroscopic traffic model. Not only shall we show that travel time prediction is successfully realized, but also that we are able to estimate, in real time, the motorway traffic conditions, even on points with no measurement facilities, having, in a way, designed a virtual sensor.
Keywords: Real-time traffic estimation, Bayesian Monte Carlo, travel time prediction.
1 Introduction
With the rapid deployment of Intelligent Transportation Systems (ITS), users are more and more informed about traffic conditions and special events during their travel. It has been recognized that the full benefits of ITS cannot be realized without an ability to anticipate traffic conditions in the short term ([Sussman, 2000]). Hence, short-term prediction of the traffic state could play a key role in various ITS applications, such as Advanced Traffic Management Systems (ATMS), Dynamic Vehicle Navigation Systems (DVNS) and Advanced Travel Information Systems (ATIS). The purpose of predicting short-term traffic conditions is to forecast traffic flow variables such as traffic volume, travel speed, or travel time in the range from 5 to 30 minutes ahead. For travel time estimation and prediction, various methodologies and techniques have been explored in developing short-term prediction models. Due to the non-linearity, complexity and uncertainty of the contributing factors of the traffic conditions in the traffic system, forecasting methods based on deterministic models cannot meet the accuracy needed by ITS applications. To overcome this limitation, many non-deterministic forecasting
models have largely improved the accuracy of forecasting. Such a model can be well adapted to different locations and easily transplanted, e.g. [Vlahogianni et al., 2004]. In this scope, our research effort focused on developing a stochastic traffic modeling framework that enables travel time estimation and prediction with high reliability. The basic assumptions, in our application, are first that the traffic can be modeled, and second that, at any time, the traffic state is completely characterized by a finite number of quantities called the state vector of the system, from which any interesting quantity can be obtained. The dynamic evolution of the system is modeled following the well-known macroscopic traffic model LWR (Lighthill-Whitham-Richards, [Lighthill and Whitham, 1955], [Richards, 1956]), which gives the evolution of the state vector X_t at every time step. An observation equation gives the relation between the measured quantities y_t and the state vector X_t. The problem then consists of estimating the state vector X_t using the observations y_t. Since the state equation is highly non-linear, this task will be carried out using the particle filter approach, known also as the Bayesian Sequential Monte Carlo method ([Doucet et al., 2001], [Chen, 2003], [Doucet, 1998]).
2 Modeling framework

2.1 Traffic model in brief
A macroscopic approach is based on a hydrodynamic analogy describing the behaviour of the traffic flow. The first model of traffic flow was introduced concurrently by [Lighthill and Whitham, 1955] and [Richards, 1956]. The use of this macroscopic model for simulating and predicting future traffic conditions along the roadway implies a space-time discretization. One of the best first order discretization schemes for the computation of the entropy solution (see [Velan and Florian, 2002]) of the LWR is the Godunov scheme ([Godunov, 1959], [Lebacque, 1996]). Hence, the motorway is discretized into n cells (Figure 1).
[Figure: motorway discretized into cells i−1, i, i+1; k_i is the density in cell i [veh/km], q_i is the exit flow of cell i [veh/min]]

Fig. 1. Space Discretization.

[Figure: flow-density relation q = Q_i(k), with critical density k_c and maximum density k_max]

Fig. 2. Fundamental Diagram.
The basic hypothesis, in the LWR model, is the existence of a quasi-stationary relation q = Q_i(k) between flow and density in a given cell i. This relation takes the general typical shape depicted in Figure 2 ([Greenshields, 1935]), where q_max is the maximum possible flow in the cell,
k_c is the critical density of the cell, and k_max is the maximum possible density of the cell. The state equation is then built as follows:
• Conservation equation: k_i^{t+1} = k_i^t + (Δt_N/δ_i)(q_{i−1}^t − q_i^t), where δ_i is the length of cell i.
• Supply-demand equations:
  Supply of cell i+1 (to its upstream): Σ(k_{i+1}^t) = Q_{i+1}(max(k_{i+1}^t, k_{i+1}^c)).
  Demand of cell i (to its downstream): Δ(k_i^t) = Q_i(min(k_i^t, k_i^c)).
  Resulting flow leaving cell i: q_i^t = min(Σ(k_{i+1}^t), Δ(k_i^t)).
The measured variables are flows and/or densities on some specific cells. The observation equation is then written:
y_t = C X_t,

where C is the observation matrix, with as many lines as measured variables. However, for a numerical application, one must distinguish the observation time step Δt_O from the numerical time step Δt_N used for the numerical solution of the state equation. The observation time step is an on-field constraint, typically 6 min in our application. The numerical time step is constrained by the Courant-Friedrichs-Lewy (CFL) condition in order to obtain a stable numerical scheme ([Godunov, 1959], [Lebacque, 1996]). The numerical time step will always be such that Δt_O is a multiple of Δt_N, i.e. Δt_O = N_O Δt_N, with N_O chosen so that the CFL condition will always be satisfied everywhere.
With the state vector defined as X_t = (k_1, k_2, ..., k_n; q_0, q_1, ..., q_n)^T, the dynamic system will be written in terms of the observation time step:

X_{t+1} = F(X_t, u_t)
y_t = C X_t

Therefore, N_O numerical time steps are needed for an observation time step. The function F(X_t, u_t) then implicitly contains N_O numerical Godunov time steps.
In this system, the inputs are the traffic conditions at the boundaries: u_t = (u_1, u_2)^T, where u_1 (resp. u_2) is the upstream demand (resp. downstream supply).

2.2 State vector estimation
In the previous section, a traffic model has been described. The aim is then to estimate the state vector using the observations. The traffic model is highly non-linear, therefore the Kalman filter can hardly be used. The Sequential Monte Carlo or particle filter approach provides the solution to deal with such a case. Monte Carlo methods are widely used to simulate the dynamics of complex systems. This approach will be used in the present work (see e.g. [Ben Aissa et al., 2006], [Mihaylova and Boel, 2004] and [Doucet et al., 2001] for a detailed description of the application of SMC to the traffic application, or [Arulampalam et al., 2002] and [Doucet et al., 2001] for the theory of SMC). In general, the importance sampling method is used, as it is in fact difficult to sample directly from the posterior density P(X_t | y_{0:t}, u_{0:t−1}). However, in this first application, since the measurements directly concern components of the state vector, we considered the partial Gaussian state-space case; see [Doucet et al., 2001] and [Doucet, 1998] for a detailed description of this case.
3 Application to Travel Time Prediction

3.1 Data and Network Site

The application has been performed on a French motorway section, owned by the "APRR" private company, from 'Mâcon Sud' to 'Belleville'. This route is approximately 21 km long. Two traffic detectors are located in this route. The section between the detectors is 15 km long. A road accident occurred between the traffic detectors, causing congestion. This site is very interesting due to the available data: in addition to detector traffic data, data from the toll collection system was also available for some vehicles which both enter the motorway before the first detector and leave it after the last one. This data provides individual travel times (magnetic toll stamps). Once filtered, it constitutes a reference experienced travel time. Therefore, predicted travel times coming from the Bayesian Monte Carlo state vector reconstruction can be validated by these independent measurements issued from what we can call probe vehicles. The section between the detectors has been discretized in 50 cells of 300 m long. The observation time step of the detector data base is 6 min. This is the time step for the re-estimation of the Monte Carlo Bayesian procedure.
3.2 Model calibration
The model parameters, i.e. critical density and maximum flow, have been determined from independent data. Data from a different - but similar - motorway section have been retrieved. These data cover traffic conditions from free flow to heavily saturated, but with no perturbing events like accidents or other incidents. Comparison between the measured flows and the corresponding ones given by our model in the same conditions allows the best estimation of the model parameters. A criterion is needed for this task. The relevant criterion is the minimization of the joint entropy (or maximization of mutual information) between measures and model (see [Li and Vitanyi, 1997] and [Guiasu, 1977]). The minimization of this positive and parameter-dependent quantity gives the optimal choice for the parameter values. As an illustration, the optimal maximum flow and critical density for a three-lane motorway stretch are found to be: k_c = 0.12 veh/m and q_max = 100 veh/min.

3.3 State vector estimation and Travel time prediction
Once the traffic model is established, Sequential Monte Carlo Bayesian estimation of the state vector of the system is performed. After the model calibration, and in order to fulfil the CFL condition, a numerical time step of 0.143 min has been chosen, leading to N_O = 42 numerical time steps for an observation one. First, we act as if the traffic takes place in normal conditions. The diagnosis then comes from the gap between on-field measures and measures rebuilt by the Bayesian procedure. Figure 3 shows the measured and the re-estimated flows at the downstream end of the section. Most of the time, the re-estimated curve is close to the measured one. However, two kinds of events can be observed on the graph. First, clear breakdowns of the sensor can be seen: in these cases, the returned value is zero. The measure is correctly rebuilt by the Bayesian procedure.
Fig. 3. Downstream flows with normal traffic conditions hypothesis.
Fig. 4. Downstream flow taking into account the accident occurrence.
Second, in the 2:00 pm - 3:00 pm slot an anomaly is clearly occurring in the motorway traffic observation. It can be due to a sensor default like a drift, or a traffic incident. With no other information, sensor default is the hypothesis, and the measure is rebuilt. In fact an accident occurred, was reported by the patrol data and lasted one hour. This accident is modelled as restricted values of the critical density and maximum flow during the accident duration at the accident location, which corresponds to cell number 25. In this cell, free velocity and maximum flow have been respectively divided empirically by 4 and 9, which correspond to reasonable values. With these new physical conditions the comparison between the downstream measured flow and the re-estimated one is shown in Figure 4. The agreement is now good, including the growth of flow which follows the release of vehicles at the end of the accident. A global view of the spatio-temporal traffic concentration and flow in the section, as estimated by the Bayesian procedure, is depicted in Figure 5. The concentration and flow dependence on abscissa and time are surfaces in a 3-D diagram. The traffic jam provoked by the accident is seen in the concentration graph, with a decrease of concentration downstream of the jam. At the same time, flow is almost zero downstream of the accident because no vehicles are travelling, and in the jam because of very low speed. The sudden rise of flow just after the jam is also very well reproduced. The maximum length of the traffic jam can be evaluated, which gives a value of around 2 km, in comparison to the value of 2.2 km reported in the patrol data base.
[Figure: (a) Concentration, (b) Flow]

Fig. 5. Global view of concentration and flow in space and time in the motorway section.
The estimated time evolution of concentration and flow in cell number 23, just upstream of the accident and where no direct measurement is performed, are depicted in Figure 6. The rise of concentration, which reaches almost the maximum value, and the decrease of flow during the traffic jam caused by the accident appear distinctly. During this jam, the vehicle velocity can be calculated using concentration and flow values, and is around 3 km/h. The estimated density and flow in a cell without direct measurement then behave like a virtual sensor. The predicted travel time is shown in Figure 7. It is the travel time predicted for vehicles entering the section. Indeed, at current time t we have estimated the history of past state vectors. We can then, using the traffic model, perform a prediction of travel time for vehicles entering the section. As we said above, for this particular motorway section, toll gates exist before and after the section. Therefore, a real-world measurement of the travel time experienced by the vehicles can be calculated from the toll gate data base. A comparison between this real measurement and the prediction can be performed and is depicted in Figure 7. A very good agreement is clearly seen, showing that estimations of densities and flows based on a traffic model provide relevant travel time predictions.
Fig. 6. Estimated concentration (veh/m) and flow (veh/min) in cell 23.

Fig. 7. Predicted travel time for vehicles entering the section.

4 Conclusion
In this paper, we have proposed an iterative stochastic approach to capture dynamically the spatio-temporal behaviour of traffic flow for the purpose of short-term travel time prediction. The basic hypothesis is the ability to model the traffic evolution by a state equation. Therefore, the travel time estimation and prediction problem can be successfully reformulated and solved as a sequential estimation process, using a Monte Carlo procedure. Second, the calibration process was performed using the maximization of mutual information principle. Finally, the obtained results pointed out the predictive capabilities of the underlying estimation process, and its benefit for real-time travel time prediction.
Acknowledgements
The authors wish to thank the APRR motorway company for providing the real-world data used in this research.
References
[Arulampalam et al., 2002] M. Arulampalam, S. Maskell, N. Gordon and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. on Signal Processing, 50(2):174-188, 2002.
[Ben Aissa et al., 2006] A. Ben Aissa, J. Sau, N.-E. El Faouzi and O. de Mouzon. Sequential Monte Carlo traffic estimation for Intelligent Transportation Systems: motorway travel time prediction application. 2nd ISTS, Lausanne, 2006.
[Chen, 2003] Z. Chen. Bayesian filtering: From Kalman filters to particle filters, and beyond. Adaptive Systems Lab., McMaster University, Hamilton, ON, Canada, 2003.
[Doucet, 1998] A. Doucet. On sequential simulation-based methods for Bayesian filtering. Technical Report CUED/F-INFENG/TR 310, Department of Engineering, Cambridge University, 1998.
[Doucet et al., 2001] A. Doucet, N. de Freitas and N. J. Gordon. Sequential Monte Carlo Methods in Practice, Springer, N.Y., 2001.
[Godunov, 1959] S. K. Godunov. A difference scheme for numerical computation of discontinuous solution of equations of fluid dynamics. Math. Sbornik, 47(89):271-306, 1959.
[Greenshields, 1935] B. D. Greenshields. A study in highway capacity. Proceedings of the Highway Research Board, Vol. 14, pp. 448-477, 1935.
[Guiasu, 1977] S. Guiasu. Information Theory with Applications, McGraw-Hill, New York, 1977.
[Lebacque, 1996] J.-P. Lebacque. The Godunov scheme and what it means for the first order traffic flow models. Proceedings of the International Symposium of Traffic Flow Theory, J.-B. Lesort (Ed.), pp. 647-677, Lyon, France, 1996.
[Li and Vitanyi, 1997] M. Li and P. Vitanyi. An Introduction to Kolmogorov Complexity and its Applications, Springer, 1997.
[Lighthill and Whitham, 1955] M. J. Lighthill and G. B. Whitham. On kinematic waves II: A theory of traffic flow on long crowded roads. Proc. Royal Soc. London, 229(A), pp. 317-345, 1955.
[Mihaylova and Boel, 2004] L. Mihaylova and R. Boel. A particle filter for freeway traffic estimation. Proceedings of the 43rd IEEE CDC Conference, Nassau, Bahamas, 2004.
[Richards, 1956] P. I. Richards. Shock-waves on the highway. Op. Res. 4, pp. 42-51, 1956.
[Sussman, 2000] J. Sussman. Introduction to Transportation Problems. Artech House, Norwood, Massachusetts, 2000.
[Velan and Florian, 2002] S. Velan and M. Florian. A note on the entropy solutions of the hydrodynamic model of traffic flow. Transportation Science, 36(4):435-446, 2002.
[Vlahogianni et al., 2004] E. I. Vlahogianni, J. C. Golias and M. G. Karlaftis. Short-term traffic forecasting: Overview of objectives and methods. Transport Reviews, 24, pp. 533-557, 2004.
Probability of trend prediction of exchange rate by ANFIS
George S. Atsalakis¹, Christos H. Skiadas¹, Ilias Braimis¹

¹ Technical University of Crete, University Campus, 73100 Chania, Crete, Greece (e-mail: atsalakis@ermes.tuc.gr)

Abstract. Modelling human behaviour in the exchange rate market has always been an important challenge for researchers. Financial markets are influenced by many economical, political and even psychological factors, so it is very difficult to forecast the movement of future values. Many traditional methods were used to help forecasting short-term foreign exchange rates. In their effort to achieve better results, many researchers have started to use soft computing techniques over the last years. In this paper a neuro-fuzzy model is presented. The model uses time series data of daily quotes of the euro/dollar exchange rate in order to calculate the probability of correctly predicting the trend of the exchange rate. The data is divided into training data, checking data and testing data. The model is trained using the training data and then the testing data is used for model validation.
Keywords: Neuro-fuzzy, exchange rate forecasting, time series forecasting.
1 Introduction
The difficulty in predicting exchange rates has been a long-standing problem in the international financial sector. This is mainly because of the uncertainty and volatility that hinder the efforts for an informed prediction. The reason for this volatility is that the currency exchange markets are influenced by many economical, political and even psychological factors. But because of the great profits that someone can gain from investing in this field, many researchers have tried to solve this problem. As a result, many methods were used in order to predict the currency exchange rate. At first, heuristic methods were used, such as moving averages [Brown, 1963] and exponential smoothing and adaptive exponential smoothing [Triggs, 1967]. Later on, many researchers started to use the autoregressive-integrated-moving-average (ARIMA) models [Box and Jenkins, 1970], which have been widely used since the early 1980s. In the mid 1980s, researchers focused on the volatility of foreign exchange rates. As a result, the Autoregressive Conditional Heteroskedasticity (ARCH) model was proposed by Engle (1982) in order to predict short-term volatility. Finally, Chien and Leung (2003) developed a Bayesian vector error correction model for forecasting exchange rates.

However, the traditional statistical techniques for forecasting currency exchange rates do not have satisfactory results. Yao and Tan (2000) argued that classical time series analysis, based on the theory of stationary stochastic processes, does not perform satisfactorily on economic time series. This is because economic data are not simple autoregressive-integrated-moving-average processes, they cannot be described by simple linear structural models, and they are not simple white noise or random walks. In the last decade, with the rapid advancement of computer technologies and the growing popularity of artificial intelligence, researchers and practitioners have become more capable of adopting artificial neural networks in financial forecasting. Although most artificial neural network models share a common goal of performing functional mapping, different network architectures vary greatly in their ability to handle different types of problems. Such models can be found in: Chen A.-S. et al. (2004), El Shazly M.R. et al. (1997), El Shazly M.R. et al. (1999), Leung M.T. et al. (2000), Li L. et al. (2004), Lisi F. et al. (1999), Makridakis S. (1979), Qi M. et al. (2003) and Poddig T. et al. (1996).

Throughout this paper a neuro-fuzzy model is presented in order to calculate the probability of the trend prediction of the exchange rate. The paper is organized as follows: first of all, some general information about ANFIS is given, and then there is a description of the model. The results using the data are presented. Finally a conclusion is made after taking into account all the issues discussed and raised in this study.
2 Theoretical approach of ANFIS
A neuro-fuzzy system is defined as a combination of Artificial Neural Networks (ANN) and Fuzzy Inference Systems (FIS) in such a way that a neural network learning algorithm is used to determine the parameters of the FIS [Jang, 1993]. The Adaptive Neural Fuzzy Inference System (ANFIS) is a system that belongs to the neuro-fuzzy category. Functionally, there are almost no constraints on the node functions of an adaptive network except piecewise differentiability. Structurally, the only limitation of the network configuration is that it should be of feedforward type. Due to this minimal restriction, the adaptive network's applications are immediate and immense in various areas. In this section, we present a class of adaptive networks which are functionally equivalent to fuzzy inference systems.
The fuzzy reasoning mechanism that ANFIS uses is presented in Figure 1.
[Figure: two-rule Sugeno fuzzy reasoning, z_1 = p_1 x + q_1 y + r_1, z_2 = p_2 x + q_2 y + r_2, combined by weighted average]

Fig. 1. Fuzzy reasoning [Jang, 1995]
[Figure: five-layer ANFIS network, Layer 1 through Layer 5]

Fig. 2. ANFIS architecture [Jang, 1997]

For simplicity, it is assumed that the fuzzy inference system under consideration has two inputs x and y and one output z, and that the rule base contains two fuzzy if-then rules of Takagi and Sugeno's type:
Rule 1: If x is A_1 and y is B_1 then f_1 = p_1 x + q_1 y + r_1
Rule 2: If x is A_2 and y is B_2 then f_2 = p_2 x + q_2 y + r_2
The ANFIS architecture is shown in Figure 2. The node functions in the same layer are of the same function family, as described below:
Layer 1: Every node i in this layer is a square node with a node function:

O_i^1(x) = μ_{A_i}(x),

where x is the input to node i and A_i is the linguistic label (small, large, etc.) associated with this node function. In other words, O_i^1 is the membership function of A_i and it specifies the degree to which the given x satisfies the quantifier A_i. Usually μ_{A_i}(x) is chosen to be bell-shaped with maximum equal to 1 and minimum equal to 0, such as the generalized bell function:

μ_{A_i}(x) = 1 / ( 1 + [ ((x − c_i)/a_i)² ]^{b_i} ),

or the Gaussian function:

μ_{A_i}(x) = exp( −((x − c_i)/a_i)² ),

where {a_i, b_i, c_i} is the parameter set. As the values of these parameters change, the bell-shaped functions vary accordingly, thus exhibiting various forms of membership function on the linguistic label A_i. Parameters in this layer are referred to as premise parameters.

Layer 2: Every node in this layer is a circle node labelled Π, which multiplies the incoming signals and sends the product out.

Layer 3: Every node in this layer is a circle node labelled N. The i-th node calculates the ratio of the i-th rule's firing strength to the sum of all rules' firing strengths:
w̄_i = w_i / (w_1 + w_2),  i = 1, 2.

For convenience, the outputs of this layer will be called normalized firing strengths.

Layer 4: Every node i in this layer is a square node with a node function

O_i^4(x) = w̄_i f_i = w̄_i (p_i x + q_i y + r_i),

where w̄_i is the output of layer 3 and {p_i, q_i, r_i} is the parameter set. Parameters in this layer will be referred to as consequent parameters.
Layer 5: The single node in this layer is a circle node labelled Σ that computes the overall output as the summation of all incoming signals, i.e.,

O_1^5(x) = overall output = Σ_i w̄_i f_i.
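Putting layers 1–5 together for the two-input, two-rule system above gives the following minimal Python sketch (ours, not the authors' code; the parameter values in the usage line are arbitrary illustrations):

```python
import numpy as np

# Forward pass of the two-rule ANFIS described above (sketch).
def gbellmf(x, a, b, c):
    """Generalized bell membership function of Layer 1."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_output(x, y, premise, consequent):
    """premise: (a,b,c) triples for A1, A2, B1, B2;
    consequent: (p,q,r) per rule."""
    mu_A = [gbellmf(x, *premise[0]), gbellmf(x, *premise[1])]   # Layer 1
    mu_B = [gbellmf(y, *premise[2]), gbellmf(y, *premise[3])]
    w = [mu_A[0] * mu_B[0], mu_A[1] * mu_B[1]]                  # Layer 2
    wbar = [wi / sum(w) for wi in w]                            # Layer 3
    f = [p * x + q * y + r for (p, q, r) in consequent]         # Layer 4
    return sum(wb * fi for wb, fi in zip(wbar, f))              # Layer 5

z = anfis_output(0.1, -0.2,
                 premise=[(1, 2, -1), (1, 2, 1), (1, 2, -1), (1, 2, 1)],
                 consequent=[(0.5, 0.4, 0.1), (-0.3, 0.2, 0.0)])
```

In ANFIS training, the premise parameters are tuned by gradient descent while the consequent parameters are estimated by least squares.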
Considering that the number of all parameters is a function of both the number of inputs and the number of membership functions, the number of all rules can be defined as

Rule_n = Π_{j=1}^{In_n} MF_j,

and if premispar_n is the number of all parameters which are necessary for the membership functions, then the number of all parameters is defined as

param_n = premispar_n + Rule_n (In_n + 1),

where In_n is the number of inputs and MF_j the number of membership functions of input j.
Some papers that used the ANFIS model for time series forecasting in finance have been presented in the literature [Atsalakis, 2005a,b,c], [Ucenic, 2005].
3 Model description
In this study, an Adaptive Neural Fuzzy Inference System (ANFIS) model is used to forecast the trend of the exchange rate of euro and dollar one step ahead. Fuzzy inference systems using neural networks were proposed in order to avoid the weak points of fuzzy logic. The biggest advantage is that they can use the neural networks' learning capability and can avoid the rule matching time of an inference engine in the traditional fuzzy logic system. The model has three inputs and one output, and the forecasting value is given by the following equation:

y(t+1) = f( y(t), y(t−1), y(t−2) ).

The data used in this model concern daily quotes of the euro/usd foreign exchange rate, which are displayed as time series. A number of 1355 daily observations are used, of which the first 1067 observations are used to train the model and 269 to check the model. As we can see in the figure below, the model gives very low training and checking errors, and the step size was initially set to 0.1.
[Figure: training and checking error curves, and the step size]

Fig. 3. Training-checking error and step size

Three membership functions have been used for each input. The membership functions used are of the triangular type. The formula of such a function is:
μ_A(x) =
  0,                x ≤ a
  (x − a)/(b − a),  a ≤ x ≤ b
  (c − x)/(c − b),  b ≤ x ≤ c
  0,                c ≤ x,

where the parameters a and c locate the "feet" of the triangle and the parameter b locates the peak. The initial and final membership functions are presented in Figures 4 and 5.
[Figure: initial membership functions on the three inputs, over the range −0.3 to 0.5]

Fig. 4. Initial membership functions
[Figure: final membership functions on the three inputs, over the range −0.3 to 0.5]

Fig. 5. Final membership functions

4 Results
This paragraph presents the results obtained by using the ANFIS model. The figure below shows the predicted trend of the currency exchange rate against the actual one.
[Figure: actual values and ANFIS prediction over time]
Fig. 6. Actual exchange rate trend and ANFIS prediction trend

As can be seen in Figure 6, the ANFIS model performs very well and it can successfully follow the direction of the change in the exchange rate movement. To see this more clearly, we count how many times the actual close rate has positive and negative values, do the same for the ANFIS close rate, and then calculate the percentage of correct ANFIS predictions. Following this procedure, the ANFIS model has a 62.79% probability of predicting the trend of the currency exchange rate successfully.
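A sketch of this hit-rate computation, under our reading of the procedure (comparing the signs of actual and predicted one-step changes):

```python
import numpy as np

# Directional (trend) hit rate of a forecaster (sketch).
def trend_hit_rate(actual, predicted):
    """Fraction of points where the predicted change has the same sign
    as the actual change of the exchange rate."""
    da = np.sign(np.diff(actual))
    dp = np.sign(np.diff(predicted))
    return float(np.mean(da == dp))
```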
5 Conclusion
ANFIS, when properly configured, provides pattern-matching capabilities that can be used for predicting the up and down fluctuations of the exchange rate. The model demonstrates the potential of neuro-fuzzy modelling in the arena of financial prediction. The results can be used by various trading strategies in order to verify the returns. Any system that can predict the trend more than 50% of the time can be profitable. The approximate reliability of the neuro-fuzzy predictor, being 63%, is extremely important, despite the fact that the correct direction of the price provides no evidence of the magnitude of the movement.
References
[Atsalakis and Ucenic, 2005a] G. Atsalakis, C. Ucenic. Time series prediction of water consumption using a neuro-fuzzy (ANFIS) approach, International Conference on Water Economics, Statistics and Finance, Rethymno, Greece, (2005a).
[Atsalakis et al., 2005b] G. Atsalakis, C. Ucenic, G. Plokamakis. Forecasting of electricity demand using a neuro-fuzzy (ANFIS) approach, International Conference on NHIBE, Corfu, Greece, (2005b).
[Atsalakis et al., 2005c] G. Atsalakis, C. Ucenic, and C. Skiadas. Time series prediction of the Greek manufacturing index for the non-metallic minerals sector using a neuro-fuzzy approach (ANFIS), International Symposium on Applied Stochastic Models and Data Analysis, Brest, France, (2005c).
[Box and Jenkins, 1970] G. Box and G. Jenkins. Time Series Analysis, Forecasting and Control. San Francisco, CA: Holden-Day, (1970).
[Brown, 1963] R. Brown. Smoothing, Forecasting and Prediction. Englewood Cliffs, NJ: Prentice Hall, (1963).
[Chen and Leung, 2004] A. Chen and M.T. Leung. Regression neural network for error correction in foreign exchange forecasting and trading. Computers & Operations Research, 31, 1049-1068, (2004).
[Chien and Leung, 2003] A. J. Chien and M. T. Leung. A Bayesian vector error correction model for forecasting exchange rates. Computers & Operations Research, 30, 887-900, (2003).
[El Shazly and Hassan, 1997] M.R. El Shazly and E. Hassan. Comparing the forecasting performance of neural networks and forward exchange rates. Journal of Multinational Financial Management 7, 345-356, (1997).
[El Shazly and Hassan, 1999] M. R. El Shazly and E. Hassan. Forecasting currency prices using a genetically evolved neural network architecture. International Review of Financial Analysis 8, 67-82, (1999).
[Engle, 1982] R.F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987-1007, (1982).
[Jang, 1993] J-S. Jang. ANFIS: Adaptive-Network-Based Fuzzy Inference Systems. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 23(3): 665-685, (1993).
[Jang et al., 1997] J-S. Jang, C-T. Sun and E. Mizutani. Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence, Prentice Hall, (1997).
[Leung, 2000] M.T. Leung et al. Forecasting exchange rates using general regression neural networks. Computers & Operations Research 27, 1093-1110, (2000).
[Li, 2004] L. Li, W. Pang, M.D. Trout. Forecasting short-term exchange rates: a recurrent neural network approach. Neural Networks in Business Forecasting, G. Peter Zhang (Ed.), 195-212, (2004).
[Lisi and Schiavo, 1999] F. Lisi and R.A. Schiavo. A comparison between neural networks and chaotic models for exchange rate prediction. Computational Statistics & Data Analysis 30, 87-102, (1999).
[Makridakis and Hibon, 1979] S. Makridakis and M. Hibon. Accuracy of forecasting: an empirical investigation. Journal of the Royal Statistical Society, A, 142, 97-145, (1979).
[Qi and Wu, 2003] M. Qi and Y. Wu. Nonlinear prediction of exchange rates with monetary fundamentals. Journal of Empirical Finance 10, 623-640, (2003).
[Poddig and Rehkugler, 1996] T. Poddig and H. Rehkugler. A 'world' model of integrated financial markets using artificial neural networks. Neurocomputing 10, 251-273, (1996).
[Triggs, 1967] D. Triggs. Exponential smoothing with adaptive response rate. Operations Research Quarterly, 18, 53-59, (1967).
[Ucenic and Atsalakis, 2005] C. Ucenic, G. Atsalakis. Forecasting the profit for the Greek non-metallic sector using a neuro-fuzzy approach (ANFIS), Proceedings of the 4th International Conference on the Management of Technological Changes, Technical University of Crete, Chania, Greece, (2005).
[Yao and Tan, 2000] J. Yao and C.L. Tan. A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34, 79-98, (2000).
The Organizational Structure of Greek Libraries: The State of the Art and the Perspective of Team Working
Anthi Katsirikou
University of Piraeus Library (e-mail: anthi@unipi.gr)
Abstract
The paper examines the organizational structure of Greek libraries in the context of continuing change. It seeks the influence of information technology and the new complex procedures that are necessary. Management turns to a less vertical hierarchy, transferring decision-making and responsibilities to the lower levels. This means that library staff has to participate in decision making, problem solving, risk undertaking and innovation transferring. The more active role of the human resources of the libraries ensures the effectiveness and efficiency of the management of library change. Consequently, team working has been characterized as the most effective working model.
Keywords: Library Management, Team Working, Decision-Making, Change Management, Innovation transfer
Introduction
The research used questionnaires, which were posted, faxed or emailed to the Greek libraries, in order to examine the staff role in the organization and in decision-making. We sent 133 questionnaires and received 109 answers, of which we used 96, as the others were not fully answered. 49% of the respondents were library directors and decision makers and 51% of them were members of the staff. The method of analysis is Q-Analysis, which reveals the interrelations between the parts of a system and the way the data are connected. (1)
The team-working in libraries
There are many definitions and descriptions of the team (Lurey and Raisinghani, 2001, Soete, 1998: 29, Katzenbach and Smith, 1993, Buchholz and Roth, 1987: 15, Herlau and Darso, 1994, Bluck, 1996: 8, Soete, 1998: 32 and 72). A team is a small group of people (typically fewer than twenty) with complementary skills committed to a common purpose and set of specific performance goals. Its members are committed to working with each other to achieve the team's purpose and hold each other fully and jointly accountable for the team's results (2). It is obvious that the team is not a sum of individuals who simply cooperate. Teams cannot exist for long without a
performance-driven purpose to both nourish and justify the team's continuing existence. In any situation requiring the real-time combination of multiple skills, experiences, and judgments, a team inevitably gets better results than a collection of individuals operating within confined job roles and responsibilities. Teams are more flexible than larger organizational groupings because they can be deployed, refocused, and disbanded, usually in ways that enhance rather than disrupt more permanent structures and processes. Teams are more productive than groups that have no clear performance objectives because their members are committed to deliver tangible performance results. Teams and performance are an unbeatable combination (2). The benefits of the organization of work in teams are detailed in the bibliography (Birchall and Lyons, 1995: 138, Baldwin and Migneault, 1996: 39, Darso, 2001: 111, Bluck, 1996: 1). A variety of reasons forced libraries to significant changes, from a traditional structure to a more modern and flexible one. Some of the reasons are (3):
1. The influence of information technology and the new complex skills that are necessary.
2. Libraries, like service enterprises, have to turn more to a client-centred organizational model than to a collection-centred one.
3. Management turns to a less vertical hierarchy, transferring decision-making and responsibilities to the lower levels.
4. Consequently, team working has been characterized as the most effective working model.
The goal of the survey is the comparative study on:
✓ The organizational points between the different kinds of libraries.
✓ The organizational structure of the libraries.
✓ The process of change.
✓ The staff attitude on changes.
✓ The role of the staff on decision-making.
✓ The innovation transfer.
✓ The undertaking of risk.
The influence between technology and changes is examined by the first two questions. The majority of the responses, more than 90%, supports the relation among those factors.
[Diagrams 1 and 2: survey responses (98%) on who proposes and who decides the changes]
new services to users were Other Central Adm Members of Staff All Staff Head of Library
0%
5%
10%
15%
20%
25%
30%
35%
[Diagram 4: The new services to users were decided by — Other, Central Administration, Members of Staff, All Staff, Head of Library (0%-35%)]
Accordingly, the majority of organizational changes have been proposed and decided by the administration of the library, while the staff is not absent. It is important to note that staff takes more responsibility in the organizational structures of the library and the internal procedures than in the services (diagrams 5 and 6).
[Diagram 5: The organizational changes were proposed by — Members of Staff, All Staff, Head of Library (0%-35%)]

[Diagram 6: The organizational changes were decided by — Members of Staff, All Staff, Head of Library (0%-35%)]
[Table: who proposed and decided the new services and the organizational changes, by kind of library (Public, Central Academic, Departmental Academic and Special Libraries); the answers distinguish Head of Library, All Staff, Members of Staff and Central Administration]
Remarks
✓ Library staff can propose changes to the work context, but they cannot decide on them, especially on the services, except for a low percentage.
✓ On the contrary, they feel free to propose and decide on the organizational changes.
✓ Public and central academic libraries show independence from the central administration. Their staffs are the protagonists of the evolution.
✓ Legislation affects the range of the autonomy of every kind of library.
The attitude of the staff to the changes
Continuing education is the sine qua non condition for libraries and librarians. That includes not only training in new technology, but also the new services, the new organizational changes etc. This is important in order to achieve continuing change. The results of the survey show that 44.3% of the staff adapted to the changes through self-study and 30.5% attended seminars. Self-study declares the personal interest of every professional, while the seminars declare more the interest of the organization. The training helped staff to realize their abilities (34.7%), their knowledge (29.7%) and their preferences (28.7%) (diagrams 7, 8). The question "what staff realized by training" is important, because individual skills are essential to the constitution of a team. Especially technical and functional skills are the base of the achievement of its goals. On the other side, the power of teams is that they are vehicles for personal learning and development. Their performance focus helps teams to identify skill gaps and the specific development needs of team members to fill them. Every member of library staff can apply his/her knowledge, use his/her experience, discover his/her talents and reveal his/her competences.
[Diagram 7: How did staff adapt to new changes? — self-study, following seminars]
Examining the staff attitude to the changes: they face changes as a challenge (50.5%), accept them after training (43.9%), feel insecurity (4.7%) or embarrassment (0.9%). Staff members face positively the changes and the procedures of change. On the other hand, after the changes, staff take initiatives (43.4%), participate in decision making (25.4%), manage routines (15.6%) and apply innovation (15.6%) (diagrams 9, 10).
[Diagram 8: The training helped staff to realize their — preferences, skills and competencies, knowledge, talents (0%-35%)]
[Diagrams 9 and 10: What staff do after the organizational changes — innovative applications, initiatives, decision making, routines (0%-50%)]
The organization of the work
The work organization is a key factor because the library structure defines the individual roles, identities, peer groups and overall hierarchy of the staff. The library culture defines the values, attitudes and behavioural norms of the same staff. The structure of a library may be more or less rigid or flexible, depending on the management style. If the library recognizes its new identity as a team, it motivates staff to participate in decision-making, in the setting of goals, and in shaping the changes. A team approach should improve communication throughout the organization. Teams should be aware of how their decisions affect others, and so will cooperate with them to achieve mutually acceptable solutions, rather than acting arbitrarily on their own. Team management can produce innovation and creativity (5). The majority of libraries (57%) answered that their organization is horizontal hierarchy and team working, while 43% responded that their organization is vertical hierarchy (diagram 11).
[Diagram 11: The organization of work — traditional organization of work (43%), group work-horizontal hierarchy (57%)]
Conclusion
What follows are the results for every kind of library.
Public Libraries
✓ Lifelong learning is based mainly on self-study,
✓ Staff is open to changes,
✓ Staff is involved in initiatives and decision making, and
✓ They work in a horizontal hierarchy.
Central Academic Libraries
✓ Staff and director mainly propose and decide.
✓ Lifelong learning is based mainly on seminars and self-study,
✓ Staff is open to changes,
✓ Staff is involved in initiatives and decision making, and
✓ They work in a horizontal hierarchy.
Departmental Academic Libraries
✓ Staff and central administration mainly propose and decide.
✓ Lifelong learning is based mainly on special training and self-study,
✓ Staff is open to changes,
✓ Staff apply innovation but manage routines too, and
✓ They work in a horizontal hierarchy.
Special Libraries
✓ Staff and the director mainly propose.
✓ Central administration mainly decides.
✓ Lifelong learning is based mainly on self-study,
✓ Staff is open to changes,
✓ Staff has a part in decision making and takes initiatives, and
✓ They work in a traditional hierarchy.
Libraries are experienced in organizational structures. Some of them create working teams, as the profit organizations do; others decrease the traditional hierarchies, moving to horizontal ones. The culture of the organization,
and its support of change or of the traditional structure, affects the model's success. So the new library model will collapse without staff participation. In spite of technology, it is humans who achieve effectiveness and efficiency, that is, leadership, cooperation and a common concept. (4)
Greek libraries are experienced in the process of change. This attitude declares their capability and their willingness to face change as a challenge, but they are still far from creativity procedures. Creativity is the way that helps administration and staff to realize their talents, but this realization demands the application of special methods and tools, which develop creativity.
References
1. Makios, Th. (1990): Statistical analysis of a Greek paper mill using Q-Analysis. Thesis, supervisor C. H. Skiadas. Chania: Technical University of Crete, Dept. of Production Engineering and Management. (in Greek)
2. Katzenbach, J. R. and Smith, D. K. (1993): The wisdom of teams: creating the high-performance organization. Boston: Harvard Business School Press.
3. Brophy, P. (2000): The Academic Library. London: Library Association.
4. Saunders, L. M. (1999): The human element in the virtual library. Library Trends, 47(4): 771-787.
5. Drucker, Peter (1996): Post-Capitalist Society (in Greek). Athens: Gutenberg.
6. Bluck, Robert (1996): Team management. London: Library Association.
Bibliography 1. 2. 3. 4. 5. 6. 7.
Baldwin, D. A. and Migneault, R. L. (1996): Humanistic Management by Teamwork. Englewood, Colorado, Libraries Unlimited Inc. Birchall, D. and Lyons, L (1995): Creating tomorrow s organization. Pitman. Buchholz, Rh. D. and Roth, Th. (1987): Creating the High Performance Team. Edited by Karen Hess. Wiley. Darso, Lotte (2001): Innovation in the Making. Denmark, Samfundslitteratur. Herlau, H. and Darso, L. (1994): The Kubus System and Innovative (HighPerformance) Teams. London, Uwin Workshop. Lurey, J. S. et a1 (2001): An empirical study of best practices in virtual teams. Information and Management, 38(8). Soete, G. J . (1998): Use of Teams in ARL Libraries: a SPEC Kit. Washington, Association of Research Libraries.
CHAPTER 10
Sampling and Optimization Problems
Applicability of Importance Sampling to Coupled Molecular React ions Werner Sandmann University of Bamberg Dep. Information Systems and Applied Computer Science Feldkirchenstr. 21 96045 Bamberg, Germany (e-mail: Werner. sandmannQwiai.mi-bamberg . de) Abstract. Importance Sampling is a variance reduction technique possessing the potential of zero-variance estimators in its optimal case. It has been successfully applied in a variety of settings ranging from Monte Carlo methods for static models to simulations of complex dynamical systems governed by stochastic processes. We demonstrate the applicability of Importance Sampling to the simulation of coupled molecular rextions constituting biological or genetic networks. This fills a gap between great efforts spent on enhanced trajectory generation and the largely neglected issue of reduced variance among trajectories in the context of biological and genetic networks. Keywords: Stochastic Simulation, Importance Sampling, Molecular Reactions.
1
Introduction
As a result of systems' complexity and the huge amount of data that is nowadays available] mathematical modeling and analysis of biological and genetic systems is a n emerging area of growing importance. Molecular reactions are the basic building blocks of living systems and essentially biological or genetic networks are formed by coupled molecular reactions. In chemical terminology the fundamental rule of a molecular reaction is given by a stoichiometry Sm1
Sml
+
' ' '
sm,SmP
--+
sm,+lSm,+l
++
' ' '
SrneSmp
(1)
where s,, , . . . , smY E N are stoichiometric coeficients, are called reactants] S,,,, , . . . , S,, are called products and both reactants and products are molecular species. Such a chemical equation expresses that the left hand side of the arrow can be transformed t o the right hand side of the arrow. Complex chemical processes are given by sets of such reactions. The stoichiometry thus defines which molecular species may react t o result in a certain product and how many molecules are involved in a reaction. The temporal behavior is expressed by assigned reaction rates. Several mathematical model approaches reflecting different (but related) viewpoints exist for coupled molecular reactions, and the exact meaning of the reaction rates depends on the chosen model type. A comprehensive treatment of modeling approaches can be found in [Bower and Bolouri, 20011.
with r,! E
N, r 5 !,
S,, , . . . , S,,
434
Applicability of Importance Sampling to Coupled Molecular Reactions
435
In the stochastic approach that we adopt in this paper the system state is given by the number of molecules of each species and the transient (time dependent) state probabilities are given by the so-called chemical Master equation. The underlying stochastic process is a Markov jump process and in fact the chemical Master equation is equivalent to the Kolmogorov differential equations. Since direct solution of the chemical Master equation is often analytically intractable, stochastic simulation is in widespread use to analyze systems of coupled molecular reactions, which in its crude version is well-known in the according community as the Gillespie algorithm [Gillespie, 19771. However, stochastic simulation is inherently costly and besides suffers from the random nature of simulation results. While several attempts to enhance trajectory generation for specifically structured systems have been reported, e.g. [Gillespie, 20011, [Rao and Arkin, 20031, [Rathinam et al., 20031, [Cao et al., 20051, no essential efforts have been spent to reduce the variance among trajectories. We aim at filling this gap by applying Importance Sampling, a well-known classical variance reduction technique, to the simulation of coupled molecular reactions. The remainder of the paper is organized as follows. Section 2 briefly exposes the stochastic approach to modeling coupled molecular reactions and its relation to Markov processes. In Section 3 the general measure theoretical Importance Sampling setting is given from which the formulae for the application to coupled molecular reactions are derived. Then feasible ways of applying Importance Sampling in this specific setting are investigated. Finally, Section 4 concludes the paper.
2
Stochastic Modeling of Coupled Molecular Reactions
Stochastic interpretations of chemically reacting systems can be traced back to the 1960s [McQuarrie, 19671. A formulation on a physical basis has been provided in [Gillespie, 19761, [Gillespie, 19771 and later on rigorously derived in [Gillespie, 19921. The basic assumptions are that the system is well stirred and thermally equilibrated, meaning that a well stirred mixture of d E N+ molecular species S l, . . . , S d inside some fixed volume interact at constant temperature. The system state at any time t 2 0 is a discrete random vector X ( t ) = ( X , ( t ) ,. . . , X d ( t ) ) ,where for each species Sk, k E { I , . . . , d } and t 2 0 a discrete random variable X k ( t ) describes the number of molecules of species sk present at time t . The set S C N d of all possible system states constitutes the system’s state space. The conditional transient (time dependent) probability that the system is in state z E S at time t , given that the system starts in an initial state zoE S at time t o , is denoted by p @ ) ( z ):= p ( t ) ( z / z ot o, )
=P
( X ( t )= 2 1 X ( t 0 ) = 20).
(2)
The system state changes due to chemical reactions between molecules of some species. These reactions can be decomposed into unidirectional reaction
436
Recent Advances in Stochastic Modeling and Data Analysis
channels R1,. . . , RM such that each reaction channel takes the form (1). The reaction rate of each Rm,m E (1,. . . , M } is given by a well defined function a,, called the propensity function of reaction channel R,, where a,(z)dt is the conditional probability that a reaction of type R, occurs in the infinitesimal time interval [t,t d t ) , given that the system is in state z at time t . That is a,(z)dt = P ( R , occurs in [t,t d t ) I X ( t ) = z) . Given that the system starts in an initial state zo E S at time t o , the temporal evolution of the system is expressed by the chemical master equation (CME)
+
+
where urn = (v,~, . . . , V m d ) is a state change vector and V m k , k E { 1,.. . ,d} denotes the change of molecules of species Sk due to a reaction of type R,. The reaction rates a, are time-independent since the probability that a reaction occurs within a specific time interval only depends on the length of this interval and not on the interval endpoints. Thus, given a current system state, the next state in the system’s time evolution only depends on this current system state and neither on the specific time nor on the history of reactions that led to the current state. Hence, the time evolution of the system is mathematically described by a stochastic process ( X ( t ) ) t l owith d-dimensional state space S C N d , and due to the just stated independence of time and history this stochastic process is a discrete-state Markov process, a Markov jump process, or a continuous-time Markov chain (CTMC). 2.1
Equivalence of CME and Kolmogorov Differential Equations
Terminology and notation in the theory of CTMCs is usually rather different from that used to express the CME. Therefore, we briefly explain how they correspond to each other. The multidimensional discrete state space can be mapped to the set N of nonnegative integers, i.e. each state 2 E S is uniquely assigned to an integer i E (1,. . . , IS]}. The probability that a transition from state i E N to state j E N occurs within a time interval of length h 2 0 is denoted by p i j ( h ) , and correspondingly P ( h ) = ( p i j ( h ) ) i , j E is ~ a stochastic matrix, where P(0) equals the unit matrix I, since no state transitions occur within a time interval of length zero. It is well known (cf. [Bremaud, 19991, [Van Kampen, 19921) that a CTMC is uniquely defined by an initial probability distribution and a transition rate matrix, also referred to as infinitesimal generator matrix, Q = ( q i j ) i , j G N consisting of transition rates qij where Q is the derivative at 0 of the matrix function h H P ( h ) . The relation of each P ( h ) to Q and an explanation for the term infinitesimal generator matrix is given by P ( h ) = exp(hQ). In that way Q generates the the transition probability matrices by a matrix exponential function which is basically defined as an infinite power series. Hence, all information on transition probabilities is covered by the single matrix Q. In terms of P and
Applicability of Importance Sampling to Coupled Molecular Reactions 437
Q the Kolrnogorov forward differential equations, the Kolmo,gorov backward differential equations, and the Kolmogorov global differential equations can be expressed by (from left to right: forward, backward, global)
a
at
= P(t)Q,
a
-atP ( t )
= QP(t),
a p ( t ) = P ( t )Q ,
at
(4)
where p ( t ) denotes the vector of the transient state probabilities corresponding to (2). Explicitly writing the Kolmogorov global differential equations in terms of the coefficients and some algebra yields
Now, the equivalence of the CME and the Kolmogorov differential equations can be easily seen by interpreting i E N as the number assigned to state z E S, i.e. pkt) = ~ ( ~ ) ( qzi j) ,= a,(z) if j is the number assigned to state z w,, and qji = a,(z - v,) if j is the number assigned to state z - v,.
+
2.2
Stochastic Simulation
The essential part of any simulation is to imitate the system under consideration. Consequently, simulation of coupled molecular reactions consists of generating trajectories of a CTMC. With the terminology used in the derivation of the CME this is celebrated (though nothing else than a crude direct generation of trajectories, which is known a t the latest since the 1950s) as the Gillespie algorithm [Gillespie, 19771 in the biochemical literature: Init t := to und z := zo repeat M 1. Compute all a,(z) and a.,(z):= a,(z) 2. Generate two random numbers u1, u2, uniformly distributed on ( 0 , l ) 3. Generate time T to next reaction: T = - ln(ul)/ao(z) 4. Determine reaction type: m = min{k : al(z)+.. .+ak(z) > u 2 a o ( z ) } 5. Set t := t T ; z := z w, 6. Store/Collect/Handle Data until ”terminating condition”
+
+
An equivalent version using a different interpretation of the CTMC dynamics is due to [Gibson and Bruck, 20001.
3
Importance Sampling
Importance Sampling is a variance reduction technique that makes use of a change of measure. The original system is simulated under a different
438
Recent Advances in Stochastic Modeling and Data Analysis
probability measure, and the systematically biased results are weighted by a correcting factor, the likelihood ratio, to yield unbiased estimates. In a general measure theoretic setting, Importance Sampling is based on the Radon-Nikodym theorem, and all applications of Importance Sampling can be derived from this setting. Consider two probability measures P and P* on a measurable space (0,A ) ,where P is absolutely continuous with respect to P * , which means that for all A E A, P * ( A )= 0 + P ( A ) = 0. Then, the Radon-Nikodym theorem guarantees that the Radon-Nikodym derivative L = dP/dP* exists, and that
VA E A
:
P ( A )=
L
L(w)dP*.
(6)
In the context of Importance Sampling the probability measure P" is called the Importance Sampling measure, and L is referred to as the likelihood ratio. The basic property exploited by Importance Sampling is that expectations with respect to P are identical to expectations with respect to P* when weighting by the likelihood ratio. Let L be a version of the likelihood ratio and Y a random variable on (0,A). Then
E p [ Y ]=
J' Y(w)dP
=
J'
Y(w)L(w)dP*= Ep- [ Y L ] .
(7)
Using a different density or probability distribution/measure is called a change of measure, and it is the essential part and the art of Importance Sampling to perform this change of measure such that more accurate estimates can be achieved. Many early applications of Importance Sampling can be found in [Hammersley and Handscomb, 19641. The framework for stochastic processes, which is of special interest in our setting of coupled molecular reactions has been given in [Glynn and Iglehart, 19891. 3.1
Application to Coupled Molecular Reactions
Applying Importance Sampling to coupled molecular reactions first of all requires the distribution or density, respectively, of reaction paths. The discrete state of the system changes due to molecular reactions. Let t l < t z < . . . denote the successive time instants a t which reactions occur, and Rm, the reaction type that occurs a t time t,, where m, E (1,.. . , M } . Define T, := t % + 4, ~ the time between the i-th and the (i 1)-th reaction. Hence, state z ( t z ) is reached due t o the i-th reaction Rmz at time t , and remains unchanged for a sojourn time of r, after which the (i 1)-th reaction R,%+,occurs a t time tz+l and changes the state to z(t,+l).Hence, the time evolution of the system is completely described by the sequence of states and corresponding sojourn times, and in compact form (z(to),ro), ( z ( t l ) , r 1 ) ,( z ( t z ) , r z ) ., . describes a trajectory. For a trajectory up to the R-th reaction, considering the Markovian property implying exponentially distributed sojourn times, the reaction
+ +
Applicability of Importance Sampling to Coupled Molecular Reactions
439
path density is given by R
dto)(zo)
arn,-1
’
( 4 t i - l ) ) exp ( Q O ( z ( t i - 1 ) ) T i - l )
1
(8)
i=l
+
where ao(x(ti-1)):= a l ( z ( t i - 1 ) ) ...ah.i(z(ti-l))just as in the Gillespie algorithm. Now, in order to perform an Importance Sampling simulation, we need to change the underlying probability measure, which is in the case of coupled molecular reactions determined by the propensity functions. The requirement of absolute continuity leaves us a great freedom in how to change the measure. It is only necessary that all reaction paths that are possible (have positive probability) under the original measure remain possible with Importance Sampling. That means each measure on the sample path space that meets the aforementioned can be considered, even non-Markovian models are allowed as long as they assign positive probabilities to all possible reaction paths. Nevertheless, we should avoid a large increase in trajectory generation efforts compared to the original measure. Thus, obviously the most natural change of measure is to remain in the Markovian world and the easiest way is to simply change the original propensity functions to ” Importance Sampling propensity functions” aQ such that for all m E { 1, . . . , M } a t ( x ) = 0 + a m ( z )= 0, z E S , or equivalently, starting with the original propensity functions, a,(z) > 0 + a & ( z )> 0, z E S. Importance Sampling then generates trajectories according to the changed propensity functions and multiplies the results with the likelihood ratio to get unbiased estimates for the original system. Trajectory generation is thereby performed as before, e.g. by the Gillespie algorithm, where now the changed propensity functions are used, yielding a sequence of states with according sojourn times and reaction path density as in (8). Thus the likelihood ratio becomes
where we have kept the initial distribution unchanged. Rewriting this likelihood ratio yields
which shows that the likelihood ratio can be efficiently computed in course of the trajectory generation without much extra computational effort by successively updating its value after each reaction. In particular, the unbiased number of molecules can be obtained a t any time. Although naturally arising the change of measure as described above may be too restrictive. In cases where more flexibility is needed, it is possible
440
Recent Advances in Stochastic Modeling and Data Analysis
to use a different change of measure in ea,ch simulation step or propensity functions that depend on the number of already occured reactions (corresponding t o a nonhomogeneous model) or the history of the just executed , for simulation steps. Formally, define functions P g ) ( z ( t o ) ., . . , z ( t r ) ) where . . . , z ( t T ) )> 0. Then the all m E (1,. . . , M } : a,(z(t,)) > 0 + ,Og)(z(to), reaction path density under Importance Sampling is R
i=l
and the corresponding likelihood ratio (leaving the initial distribution unchanged) becomes
which can be easily updated after each reaction in course of the simulation. 3.2
Further Issues
Now we are done with demonstrating the applicability of Importance Sampling to coupled molecular reactions in that we have given a framework and general rules. An issue that remains open is to concretize these change of measure rules, i.e. how to change the propensity functions in order t o achieve variance reduction in practice. The functions af or &), respectively, must be chosen dependent on the specific model under consideration. In fact, this is an art of Importance Sampling, and a large body of literature exists on change of measure guidelines for specific model classes, e.g. in the context of rare event simulation, a review of which is far beyond the scope of the present paper. The reader is referred to, e.g. [Heidelberger, 19951, [Bucklew, 20041, [Sandmann, 20071. Stiff systems are of particular interest and difficulty in analyzing coupled molecular reactions. In stiff systems reaction rates differ in orders of magnitude, which arises because reactions occur on multiple time scales meaning that some reactions are much slower than others and occur significantly rarer. Thus, even the generation of one single trajectory becomes very computer time demanding. In this setting Importance Sampling can result in both accelerated trajectory generation and reduced variance among the trajectories.
4
Conclusions
We have shown how t o apply Importance Sampling to stochastic simulations of coupled molecular reactions. General conditions and different feasible ways
Applicability of Importance Sampling to Coupled Molecular Reactions
441
to perform the change of measure have been given, all of which render efficient computation of the involved likelihood ratios possible. Further research includes the s t u d y of specific change of measure strategies a n d its application to a variety of models. In particular, excessive case studies a r e required to demonstrate t h e efficiency gains achieved by Importance Sampling.
References [Bower and Bolouri, 2001lJ.M. Bower and H. Bolouri, editors. Computational Modeling of Genetic and Biochemical Networks. MIT Press, Cambridge, MA, 2001. [Bremaud, 1999lP. Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York, 1999. [Bucklew, 2004lJ.A. Bucklew. Introduction to Rare Event Simulation. Springer, New York, 2004. [Cao et al., 2005lY. Cao, -, and et al. Multiscale stochastic simulation algorithm with stochastic partial equilibrium assumption for chemically reacting systems. Journal of Computational Physzcs, 206:395-411, 2005. [Gibson and Bruck, 2000lM.A. Gibson and J. Bruck. Efficient exact stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry, A 104:1876-1889, 2000. [Gillespie, 1976lD.T. Gillespie. A general method for numerically simulating the time evolution of coupled chemical reactions. Journal of Computational Physics, 22:403-434, 1976. [Gillespie, 1977lD.T. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 71(25):2340-2361, 1977. [Gillespie, 1992lD.T. Gillespie. A rigorous derivation of the chemical master equation. Physica A , 188:404-425, 1992. [Gillespie, 20011D.T. Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics, 115:1716-1732, 2001. [Glynn and Iglehart, 1989lP.W. Glynn and D.L. Iglehart. Importance sampling for stochastic simulations. Management Science, 35:1367-1392, 1989. [Hammersley and Handscomb, 1964lJ.M. Hammersley and D.C. Handscomb. Monte Carlo Methods. Methuen, London, 1964. [Heidelberger, 1995lP. Heidelberger. Fast simulation of rare events in queueing and reliability models. A C M Transactions on Modeling and Computer Simulation, 5( 1):43-85, 1995. [McQuarrie, 1967lD.A. McQuarrie. Stochastic approach t o chemical kinetics. Journal of Applied Probability, 4:413-478, 1967. [Rao and Arkin, 2003]C.V. Rao and A.P. Arkin. Stochastic chemical kinetics and the quasi-steady-state assumption: Application t o the Gillespie algorithm. Journal of Chemical Physics, 118:4999-5010, 2003. [Rathinam et al., 2003lM. Rathinam, -, and et al. Stiffness in stochastic chemically reacting systems: The implicit tau-leaping method. Journal of Chemical Physics, 119:12784-12794, 2003. [Sandmann, 20071W. Sandmann. Efficiency of importance sampling estimators. Journal of Simulation, 2007. To appear. [Van Kampen, 1992lN.G. Van Kampen. Stochastic Processes in Physics and Chemistry. Elsevier, North-Holland, 1992.
Bispectrum estimation for a continuous-time stationary process from a random sampling Karim Benhenni and Mustapha Rachdi Universitt. de Grenoble Laboratoire Jean Kuntzmann UFR SHS, BP. 47, 38040 Grenoble Cedex 09, France (e-mail: Mustapha.RachdiQupmf-grenoble .fr and Karim.BenhenniQupmf-grenoble.fr) Abstract. We propose an asymptotically unbiased and consistent estimate of the bispectrum of a stationary continuous-time process X = { X ( t ) } t E wThe . estimate is constructed from observations obtained by a random sampling of the time by { X ( T ~ ) where } ~ ~ { ~ ~, k is }a sequence ~ ~ ~of real random variables, generated from a Poisson counting process. Moreover, we establish the asymptotic normality of the constructed estimate. Keywords: Periodogram, Cumulants, Quadratic-mean consistency, Bispectral density, Point process.
1 Introduction The idea of constructing the Fourier transforms of high order cumulants was suggested by Kolmogorov, and polyspectra were introduced in [Shiryaev, 19621. In [Brillinger, 19651 and [Brillinger and Rosenblatt, 19671 authors gave a comprehensive treatment of the theoretical properties of polyspectra, and have discussed also the estimation of polyspectra from sample records (these estimation procedures are based on a generalization of the window technique a.pplied to products of the finite Fourier tmnsform of the da.ta.). Bispectra was discussed in [Tukey, 19591 and [Akaike, 19661, and an application of the bispectral analysis t o the study of ocean waves is given in [Hasselmann et al., 19631, to tides in [Cartwright, 19681, and to turbulence in [Lii et al., 19761 and in [Helland et al., 19791. These Fourier transforms -or rather, the Fourier transforms of the corresponding high order cumulants- are called polyspectra, and a.re defined forimlly as follows. Let { X ( t ) } t E abe a weakly stationary process up t o order k , and let the real , . . , S k - 1 ) denotes the joint cumulant of order k of the set number C i k ) ( s ls2;. ofrandomvariables { X ( t ) , X ( t + s l ) ., . . , X ( t + s k - 1 ) } , i.e. Cik)(s1,s2,. . . ,S k - 1 ) is the coefficient of (21,. . . , &) in the expansion of the cumulant generating function I E ( Z ~., . . z k )
+ a X ( t + s1) + . . . + a X ( t +
= In (E{exp ( z l X - ( t )
sk-I))})
(note that, by the stationarity condition, CLk)(s1,.. . , S k - 1 ) does not depend on t ) . 442
Bispectrum Estimation for a Continuous-Time Stationary Process 443 The outstanding property of polyspectra is that all polyspectra of order higher than two vanish when { X ( t ) } t E Ris a Gaussian process. This follows immediately from the well known property that all joint cumulants of order higher than two vanish for multivariate gaussian distributions. Hence, the bispectrum, trispectrum, and all higher order polyspectra are identically zero if {X(t)>,,R is Gaussian, and these higher order spectra may thus be regarded as measures of the departure of the process from Gaussianity. The main aim of this paper is then t o construct an estimate of the bispectrum of continuous-time stochastic process from a random sampling and t o study its asymptotic properties, namely its mean-square convergence and its asymptotic normality. In Section 2, we give some preliminaries on the time-sampling technique adopted here. Section 3 is concerned with the construction of the bispectrum estimate and the main results of its asymptotic properties. The last section is devoted t o the proofs.
2
Preliminaries
Let X = { X ( t ) } t Ebe ~ a 4th order stationary process, with mean zero and continuous integrable covariance function. From [Karr, 19911, there exists a counting process N,independent of X , which is associated to a sequence {7k}kEZ of random variables taking their values in R. The process N is defined by: N ( A , w ) = 1.4 for ( A , w ) E f 3 ~x 0 , where f 3 is ~ the real borelean a-algebra and N ( A , w ) is the number of T ~ ' Sbelonging to A . We assume that, for every set .4 in BR, the random variable N(.4) has a Poisson distribution P ( A ( A ) ) ,where A ( A ) = B p ( A ) and p is the Lebesgue measure on R and 3 denotes the mean intensity which is assumed t o be known. We consider the sample process Z = {Z(t)},,, constructed from the sequence { X ( T , ) } , ~and ~ the counting process N ( t )as follows.
xkEZ
Definition 1 The sample process 2 zs defined by: Z ( A ) = JA .X(t)N(dt)= Ck,ZX ( T k ) n A ( T k ) = CrkEA x('%)rV A E The process 2 is also called zncrement-process and can be written as: Z ( t ) = Jot X ( S ) or in the differential representation: d Z ( t ) = X ( t ) cLN(t), which proves that 2 is a stationary process and that its covariance function R , is such that: R z ( d u ) = R x ( u ) (Bb(L1) 02)du.
+
Denote, respectively, by d$!) and r$f) the spectral densities of the process X of the process X and the increment process 2. If R, and its Fourier transform FR, are absolutely integrable then dp) exists, is bounded, uniformly continuous and is given by &'(A)
= ,B2&'(A)
+ g3
R,(O), VA E R
Recent Advances in Stochastic Modeling and Data Analysis
444
q5p)
Thus, the estimate of ’4 : can be deduced from the estimate of and R,(O) (Cf [Lii and Helland, 19821, [Lii and Masry, 19941, [Monsan and Rachdi, 19991 and [Gallego and Ruiz, 20001).
Bispectrum estimation and its asymptotic properties
3 Let
denote by
US
d c ) and
q5F) the respective bispectrum of X and
+ t ) , x ( u 2 + t ) ,~ ( t )=)B { ~ ( u l t+)~ dCg)(u~ u2) , = cum {d2(u1 + t ) , dZ(uz + t ) , d Z ( t ) }
c ? ) ( u i j u2) = cum {x1(u1 and
then q5!$
and
2.Set
( u+z t ) ~ ( t ) )
4L3’ are defined by
and
Equation (1) is the relationship which allows to estimate data as follows. Given the observations {X(T~)};::), T the number of points
4:’
from discrete
> 0, where n/(T) is
{ T ~ Cfalling } in [O, TI, we estimate the bispectrum 4:)
by
estimating the bispectrum of 2,the function d~ and the constant CF’(0,O). 7 3 ) dZ,, 7 3 ) and QT the respective estimates of For this, let us denote by dx,T, d c ) , 4L3) and dj. Consider the three dimensional spectral window W and , verify the following assumptions. the bandwidth b ~ which h
Bispectrum Estimation for a Continuous-Time Stationary Process 445 Assumptions 1 (i) W E L1 n Lm(R3) i s a positive function such that:
I
I
and ~ w ( u l ,au, uz,-ul-~z) < - C ( 1 + 11(u1>u2)112)-2--E, are two positive real numbers, and II(ul,uz)112 denotes the euclidean norm of (u1,uz).
(ii) IW(ul,uz, -ul-u2)1 j = 1, 2 where C and
E
(izi) (bT)TEWas a sequence of positive real numbers such that: br
Inorder to construct
$7Z3, T),
+ +W
G
and - 4 0 as T T
+ +W
weset: W T ( U ~ , U ~= ,U b~W(bTui,bT’[Lz.,bTu3) ~)
The periodogram (empirical estimate of
4L3)) is defined by h
h
IT(A1,A,)
=
; ~ ^ ~ , T ( x ~ ) ~ ~ , T ( x ~ ) ~ ~ z,T ~ (2- x 1 I, (~T)~T
(2)
h
where d z , T ( X ) = C::’ X ( T ~exp(-i ) X Q ) , is the finite Fourier transform of the observations {X ( i - k ) } f l T ) . In order to obtain a consistent estimate of $f’, we smooth 1, by WT, which yields to: h
c +W
P;L(X~,
~
2 =)
1 (E)
WT
(A,
- wi,~2 -
w j ; -XI - ~2 - w t + j )
IT
(ui,w j )
i,j=-m
where w j = 2 ~ j / T for j E
Z denotes the Fourier
frequencies.
Assumptions 2 R, and Ci3’ are absolutely integrable and for all k = 1,.. . , 5
where u = ( u , ~ ..,.. , u , k ) and llw,lll = k-th derivative of 4, ( u ) .
k Cj=l Iu,~/,
where
~P’(u,)denotes the
We have that under Assumptions 2, the measure dCg) is integrable and
from which we infer the following properties that are useful t o establish the asymptotic behavior of JL3i(X,, A,) (Cf Theorem 1). For this aim, the following proposition gives-the asymptotic behavior of the bias and the covariance of the periodogram I T .
446
Recent Advances in Stochastic Modeling and Data Analysis
Proposition 1 Let X I ,
Xp..
p1 and pp be any real numbers, then
E{TT(Xl,A,)}
=
+g)(A1> + XZ)
where 0 ( l / T ) is uniform in XI and
+
Xp.
(l/T)
Moreover, the covariance as
+
where X3 = -(XI Xz), p3 = -(pI p z ) , S3 is the set of all permutations of {1,2,3} and &(A) = sin(TX/2)/(TX/2).
Theorem 1 If both assumptions 1 and 2 are satisfied, then the bias and the covariance of are given by
$2;
qF!;(Xl,X2)}
In order to estimate the function
h
= &)(Xl,XZ)
bT +O(+
we propose the following estimate:
$J,
zz:)
where DX,=(X)= exp(-iXTj) X ’ ( T ~ ) .Then, the asymptotic behavior of this estimate is studied in the following proposition.
Proposition 2 Under Assumptions 1 and 2, we have (3)
For the term Ci3’(0, O),!?/(27r)2, we propose the following estimate:
for which the asymptotic properties are given in the following proposition.
Bispectrum Estimation for a Continuous- Time Stationary Process 447 Proposition 3 Under Assumptions 1 and 2, we have ,-. 1 B E{ETT)= ---c(~)(o,o) and var{CTj = o (-) T
(27r)Z
Finally, in order to estimate the bispectrum of estimate,
X,we propose
the following
and its asymptotic properties are given in the following theorems.
Theorem 2 Under Assum.ptions 1 and 2, th,e bispectmim estim,ate defined b y equation (4) is asymptotically mean square convergent. Moreover, its asymptotic integrated mean squared error i s given by:
In the following theorem, we establish the asymptotic distribution of the bispectrum estimate.
Theorem 3 For an integer k 2 3, suppose that X is stationary up to order k and the kth order cumulant of X is absolutely integrable. Let ( A ~ , J , A z , ~ ) , . . . , (AI,~, A+.) be r couples of real numbers. Then under assumptions of Theorem 2, the standardized variables
are jointly asymptotically normally distributed with mean 0 and covariances may be obtained easily from Theorem 1 and Propositions 2 and 3.
4 Proofs In order t o save space in this paper, the proofs of the previous results were summarized. For more details, the readers are advised to contact the authors. Proof of Proposition 1. Let us begin by computing the bias of TT. for this aim, by (2), we have that E{TT(AI,AZ)}
=
1
(2n)2T ( r o , ( 0 ) ( 2 n ) 2 0 f ’ ( A 1 , A Z +0(1)) ) =+f)(Al,AZ)
where 0(1/T) is bounded uniformly in
A1
and Az, and where
+0(1/T),
448
Recent Advances in Stochastic Modeling and Data Analysis
where the sum {l,. . . , 2 k } .
Let
cyi =
C,, is extended
over all the indecomposable partitions of
X i for i = 1, 2, 3 and cq = pi--3 for i = 3, 4, 5. Then
From [Brillinger, 19691 and (5), we get T - l
COV{TT
(XI
j
XZ),
%(PI
j
PZ)}
where v is an indecomposable partition, I / : is a subpartition such that Iv:l = (vi(- 1 where Ivi( denotes the number of elements of J / C , and is the spectral density of order lvil of 2. We have that DT and for lvil 5 6 are bounded, then the maximum order of magnitude of the quantity (6) is 0 ( T L P 3That ) . is, for 1 = 1 the order of magnitude is 0(T2), for 1 = 2 the order of magnitude is O(T-’) and for 1 = 3 the partitions are composed of the pairs {XI, p p ( l ) } {, A l , p p ( 2 ) } {AI, , pp(3)} where p E S3,and if 1 > 3 then there exists a unique set in the partition and thus the cumulant vanish. Thus
4p)
dp)
1. By some simple computations, from Proposition 1,
We have t o study both series ( 7 ) and (8). In order t o establish the asymptotic behavior of (7), we consider the finite sum (for h/l E N)
$2 z,,=-M
WT(X1
-w,,Xz
-“,,-A1
-
X z -w(t+3))&
( 3 )(“t,”,)
Bispectrum Estimation for a Continuous- Time Stationary Process
Xg(-bTXi
449
(9)
- ~ T W M - ~ , - ~ T X Z - ~ T W M - - ~ )
+
+
where y denotes the function y(z,y) = ~ ~ ' ( x / O T XI,:y/bT &). On the other hand, we have that: when A j , B j , hj and Nj be real numbers such that Aj 5 B j , hj > 0 and Nj = B j - A j / h j for j = 1,.. . k , then N1-1
fihi i=l
Nk-1
nl=0
d v ) dv + R , (10)
g ( A l +nlhl,...,Ak+,nkh,k) =
t..
Jn:=liAc,Bi)
nk=O
where IRI 5 C Cf=lhj with C is a positive constant uniform in k . Thus, by applying ( l o ) , the term (9) becomes 2 ( M + l ) b T n -bT
J:
Z(M+l)bT
7
bTA2
W(Ul,'UZ, +-bT 1 M b TT
XI
+o
-'u1--.u2)y(~u1;u2)d~u1LE~uz
L Z +M- bbT A x Z
since R verifies IRl 5 47rC b T / T . Thus, when M
+
+m, (9) becomes
On the other hand, the series (8) is less than
thus
Since 4i3) is bounded, W is absolutely integrable and bT + +m, the result is obtained by a simple application of the dominated convergence theorem. Now, we are concerned with the consistency of $L3&. For this, from Proposition 1, the general term of the covariance cov is:
3
wTb1 - W v l r P z
-drzrp3 -
3
A~),$gi(pl,p~)}
1
U I r 3 ) 6 ( C w , ~ ) 6 ( ~ ~ ~ , ) I I T G ( d s , , ) ~ E~ Vt) )(UI~.)N Z=1
z=1
z=1
which may be written as the sum of two terms and we have t o study the asymptotic behavior of each one.
450
Recent Advances in Stochastic Modeling and Data Analysis 0
The first term of (11) is:
i=l
i=l
i=l
Notice that this sum contains no more than ( 5 - 1) variables, but that we consider these ( 5 - 1 ) variables since at the end of the calculations we extend the sum t o all the terms. Thus (12) may be written as
Moreover, by applying (10) t o the general term of the sum (13), we get W ( ~ l , u 2 , u 3 ) W ( v l , v 2 , v 3 ) b ( u+l u 2 + u 3 ) 6 ( v 1 + v 2
+US)
where yzC { x 1 , x 2 , x 3 , - 1 1 l 1 - 1 1 2 , - 1 1 3 } and rl, c { U 1 , ~ 2 , ~ 3 , - - 2 r i , - v 2 , - ' u 3 ) analogous t o v, element of the partition v , S,, = CzEy, x, S,,, = x, and 7,'c yzsuch that Ir:l = Iyz(- 1, and q: is defined in a similar way. Remark that, if 1 > 3, the studied term vanishes since there exists an element of the partition which is a single set. But if 1 5 3, the main terms are given for 1 = 3. We conclude that the first member of the general term is:
zzE1),
0
The second term of (11) may be bounded by:
Bispectrum Estimation for a Continuous-Time Stationary Process
andasabove, weapply (10) t o I W ( s , y , - s - y ) W ( t , z , - t - z ) l , to
J-1:
IW(Ul,U2,
-u1
-
u2)W(v1,v2, -01
- v2)l d U l
d.u2 d V l
dV2
451
whichleads
+0
(3
Thus the second term behaves as o (b$/T). Since W is symmetric and absolutely integrable, the result is then obtained by a simple application of the dominated convergence theorem.
Proof of Proposition 2. From (lo), we have by a classical calculation that
As W is absolutely integrable and $J is bounded, the result (3) is obtained by a simple application of the dominated convergence theorem. On the other hand, from (lo ), we have
452
Recent Advances in Stochastic Modeling and Data Analysis
which completes the proof.
Proof of Proposition 3. The average of
eTis such that
eT
Hence, is an unbiased estimate of C~’(0,0)3/(27r)2 For the variance, we have that
where f(.)
+ + 5 R; ( 0 )R x ( u )+ 2 R; ( u )
= 3Rx(0)Cy)(0,0,0) +3Rx(0)Cy)(u,u,u)l O R $ ( O ) C ~ ) ( u , u , O )
+9 C$) (,u,O)C$)(,u,,u)
Then, under Assumptions 2,we obtain the result.
Proof of Theorem 2. The asymptotic behavior of the bias of the estimate is directly obtained by writing
To compute the cross-covariance of the estimate properties:
~
C
o
~
{
~
~
~
(
~
l
,
~
~
Thus, we deduce that:
)
,
~
we use the following
T
}
(g).
=~ o’ =
0
(
~
)
0
(
~
)
($)
c o v { ~ ~ j T ( ~ l , ~ z ) , ~ ~ , )=T0 (~l,~2)}
which completes the proof of the theorem.
Proof of Theorem 3. Notice that, for space reasons we omitted all the technical details which could be obtained from the authors. In another hand, the reader can refer t o the papers [Lii and Masry, 19941 and [Monsan and Rachdi, 19991 and follow the same steps in order to prove the asymptotic normality of the estimate by using the cumulants theory.
Bispectrum E s t i m a t i o n f o r a Continuous- T i m e Stationary Process
453
References [Akaike, 1966lH. Akaike. Note on higher order spectra. Ann. Inst. Stat. Math., 18:123-126, 1966. [Brillinger and Rosenblatt, 1967lD.R. Brillinger and M. Rosenblatt. Computation and interpretation of k-th order spectra. Spectral Anal. T i m e Ser., Proc. advanced Semin. Madison, pages 189-232, 1967. [Brillinger, 1965lD.R. Brillinger. An introduction to polyspectra. Ann. Math. Stat., 36, 1965. (Brillinger, 1969]D.R. Brillinger. Asymptotic properties of spectral estimates of second order. Biometrika, 56:375-390, 1969. [Cartwright, 19681r). Cartwright. A unified analysis of tides and surges round north and east britain. Philos. Trans. R. SOC.London A , 263:l-55, 1968. [Gallego and Ruiz, 2000lA. Gallego and D. P. Ruiz. Ar-modelling bispectrum estimation: a comparison of existing methods and new contributions. COMPELthe international journal of Computation and Mathematics in Electrical and Electronic engennering, 19(1):48-69, 2000. [Hasselmann et al., 1963lK. Hasselmann, W. Munk, and G. MacDonald. Bispectra of ocean waves. Tzme series analysis, M . Rosenblatt, Ed- New York : John Wiley, 1963. [Helland et al., 1979lK.N. Helland, K.S Lii, and M Rosenblatt. Blspectra and energy transfer m grad-generated turbulence. I n Developments in Statistics, P . R. Kmshnalah, Ed., Academic Press, New Yo&, 2:123-155., 1979. [Karr, 19911A. F. Karr. Point processes and their statistical inference (Second Edition Revised and Expanded). Marcel Dekker, inc, 1991. [Lii and Helland, 1982lK. S. Lii and K. N. Helland. Cross-bispectrum computation and variance estimation. A C M Transactions on Mathematical Software, 7(3):284-294, 1982. [Lii and Masry, 1994lK. S. Lii and E. Masry. Spectral estimation of continuoustime stationary processes from random sampling. Stochastic processes and their applications, 52:39-64, 1994. [Lii et al., 1976lK.S Lii, M. Rosenblatt, and C.W. Van Atta. Bispectral measurements in turbulence. J . Fluid Mech., 77(1):45-62, 1976. [Monsan and Rachdi, 1999lV. Monsan and M. Rachdi. Asymptotic properties of padic spectral estimates of second order. Journal of Combinatorics, Information d System Sciences, 24(2):113-142, 1999. [Shiryaev, 1962lA.N. Shiryaev. Some problems in the spectral theory of higherorder moments. i. (english. russian original). Theor. Probab. A p p l . , translation f r o m Teor. Veroyatn. Primen. 5, 293-313 (1960), 5, 1962. [Tukey, 1959lJ.W. Tukey. An introduction to the measurement of spectra. Probability and Statistics, H. Cramr Vol.:300-330, 1959.
Search via Probability Algorithm for Engineering Optimization Problems Thong Nguyen Huu', Hao Tran Van' 'Mathematics-Informatics department University of Pedagogy 280, An Duong Vuong, Ho Chi Minh city, Viet Nam Fax: 084-8398946 (E-mail: [email protected]) Abstract. This paper proposes a new numerical optimization technique, Search via Probability (SP) algorithm, for single-objective optimization problems. The SP algorithm uses probabilities to control the process of searching for optimal solutions. We calculate probabilities of the appearance of a better solution than the current one on each iteration, and on the performance of SP algorithm we create good conditions for its appearance. We test this approach by implementing the SP algorithm on some test single-objective
optimization problems, and we find very stable results. Key works: Numerical optimization, Probability, Stochastic, Random, Algorithm.
1 Introduction There are many algorithms, traditional computation or evolutionary computation, for single-objective optimization problems. Almost all focus on the determination of positions neighbouring an optimal solution and handle constraints based on violated constraints. We suggest a new stochastic approach to find optimal solutions, Search via Probability (SP) algorithm. The SP algorithm uses probabilities to control the process of searching for optimal solutions. We calculate probabilities of the appearance of a better solution than the current one on each iteration, and on the performance of SP algorithm, we create good conditions for its appearance. There are three problems: 0 To evaluate objective function, the role of left digits is more important than the role of right digits of a decided variable. We calculate the changing probabilities of digits of decided variables for searching an optimal solution the first time. 0 Based on the relation of decided variables in the formulas of constrains and objective function we select k variables (Ilkln) to change their values instead of selecting all n variables on each iteration. 454
SP Algorithm for Engineering Optimization Problems 0
455
Because we can't calculate exactly the number of iterations of a stochastic algorithm for searching an optimal solution the first time on each performance. We use unfixed number of iterations, which has more chance to find an optimal solution the first time with necessary number of iterations.
The model of single-objective optimization problem We consider a model of single-objective optimization problem as follows: Minimize subject to
f (x)
where
a, I x , $b,,a,,b, ~ R , i = ..., l , n.
g, (x) I 0 ( j = 1,. ..,r)
3 Probabilities of changes and selecting vaIues of a digit We suppose that every decided variable x, ( l l i l n ) has m digits that are listed from left to right x , ~ ,x12,..., x,, (x,, is an integer and 0lxlJ<9, llj<m). We calculate changing probabilities of digits which can find better values than the current ones on each iteration. 3.1 Probabilities of changes According to the feasible region of problem which is narrow or large; we have two cases as follows: Case 1: The feasible region of problem is large; therefore the right digit depends on the next left digit for evaluating objective function. We consider the j-th digit xI1of variable x,.Let A, be an event that the value of xIJcan be changed and e be the probability of event AJ. A A , ) = 9, (1 5 j 5 m)
We find values of digits of a variable from left digits to right digits one by one. If the value of left digit x,k (k=l, 2, ..., j-1) is not worse than the previous one, we have to fix left digits and change the value of j-th digit to find a new value, and we hope that it may be a better value than the current one of an optimal solution. The event of changing the value of 1-th digit: A, The event of changing the value of 2-th digit: A,A, Generally,-the event of changing the value ofj-th digit: A, 4 ...A,-, A, (1 5 j 5 m) After a certain number of iterations of the algorithm, we hope that these events occur one after the other. Hence we have the product of these events: (4 x A , u )... .A,IA, and the probability of this event is: ~
G4
ra..
456
Recent Advances in Stochastic Modeling and Data Analysis -
__
~
P(A~.A~4
. . . A , - ~ A , )= 41 .(I- q1)qZ.. .(1 - qI)(l-- qz 1.. .(I
= 41(1 - q l ) l - l q 2 ( 1
-
-
q,-I)q,
9 x 2 . .q,-,(l . - 4,-&,
Because these events are independent from one another, the value of the probability is maximum if 1 1 4, =:,q* . . , .= J J-I
I
.,.4/-, =-,q, 2
=1
We have average probabilities of changes 1 1 1 p =-(1+-+ ...+,) ' j 2 J
(I<j
Example: m=7, p=(0.37, 0.41, 0.46, 0.52, 0.61, 0.75, 1) Case 2: Some problems have many constraints so that their feasible regions are very small or narrow. To find a feasible solution, at first many left digits have to find si
Probabilities of finding next digits.
With m=7, we have two set of changing probabilities as follows: Set ofprobabilities I: (0.37,0.41, 0.46, 0.52,0.61, 0.75, 1) Set of probabilities 11: (0.57, 0.64,0.71,0.79,0.86, 0.93, 1) Remarks: The changing probabilities of digits of a variable increase from left to right. This means that left digits are more stable than right digits, and right digits change more than left digits. In other words, the role of left digit x,, is more important than the role of right digit x , ~ (+l l~j l m - I ) for evaluating objective function. First we use Set of probabilities 11 to find values of left digits, and then we use Set of probabilities I to find values of right digits. However, we do not know when the values of left digits can be found, therefore we mix probability I with probability I1 on each iteration. And the price we have to pay for this is that after the values of left digits are found, they are still calculated repeatedly! According to statistics of many experiments, the best thing is to use probability I in the ratio 60%-70% and probability 11 in the ratio 40%-30%.
SP Algorrthm for Engrneerzng Optzmzzatron Problems 457
3.2 Probabilities for selecting values of a digit Consider j-th digit with changing probability p,, let rl, r2 and r3 be probabilities of events below: rl: probability of choosing a random integer number between 0 and 9 for j-th digit r2: probability ofj-th digit incremented by one or a certain value (+I, ...,+ 5). r3: probability ofj-th digit decremented by one or a certain value (-1, ...,-5). with two probabilities I-pj and pj+l: Now we consider two digits a, and If the value of a, is not worse than the previous one, we have the probability so that a,,, can find a better value than the current one of an optimal solution as follow: 1 10
1
1
100
100
r, -+ r, -+ r,-
Because of r1+r2+r3=l,this probability is maximum if rl=l, r2=r3=0. If the value of a, is worse than the previous one, we have the probability so that a, and a,,] can find better values than the current ones of an optimal solution as follows: 1
1
50 + r, -+ r3100 100
Because of rl+r2+r3=l,this probability is maximum if rl=O, r2=r3=o.5 The average probabilities rl, r2 and r3 of both two cases: rl=0.5, r2=r3=0.25
4 Selecting k variables ( l l k l n ) to change their values On each iteration, if we select all of n variables to change their values, the ability of finding a better solution than the current one may be very small. Therefore we select k variables ( K k i n ) to change their values, and after a number of iterations the algorithm has more chance to find a better solution than the current one.
5 The Random Search via Probability Algorithm The main idea of SP algorithm is that decided variables of problem are separated into discrete digits, and then they are changed with the guide of probabilities and combined to a new solution. We suppose that a solution of problem has n decided variables, every variable has m=7 digits, M is a number of inside iterations of an outside iteration. SP algorithm is described with general steps as follows: S 1. Select a random feasible solution x, let Fx=f(x). s2. Loop=O; S3. Let P= (pl, p2,. .., p,,,), if (probability of a random event is 30%) then P=PI else P=PII. S4. If (probability of a random event is 50%) then <select a random number L from 1 to 5, and set p,=O (i=l, ...,L)>
458
Recent Advances in Stochastic Modeling and Data Analysis
Search v i a Probability Algorithm S5. Let y’x; select k variables (l<&n) of solution y and symbol yi (Isilk). Let yij be j-th digit (1 <j<m) of variable y, (1Silk), the technique for changing value via probability ofj-th digit is described as follows: For i=l to k do Begin yi=o; For j=1 to m-1 do If (probability of a random event is p,) then If (probability of a random event is rl ) then yi= yi+lO’”*random(lO); Else If (probability of a random event is rz ) then y,= y, +lo’-’*( xij -1); Else yi= yi +lo’-’*( xi, +l); Else y,= y, +10 1-j* yi= yi + 10’-”’ *random( 10); If (yibi) then y,=bi; End; S6. If y is an infeasible solution then return S3. S7. Let Fy=f(y). If Fy , it means that we fix the values of L left digits to increase the speed of finding the values of m-L right digits, because the right digits vary more than the left digits.
6 Examples Using PC, Celeron CPU 2.20GHz, Borland C++ 3.1. Select value to parameter M=l00000. We performed 30 independent runs for each example. The results for all test problems are reported in Tables. 6.1. Minimization of the Weight of a Speed Reducer
S P Algorithm for Engineering Optimization Problems Minimize f(x) = O.7854xIx:(3.3333x: 7.4777(x,3 + x:,
459
+14.9334x3 -43.0934)-1.508xl(x,2 +x:)+
+ 0.7854(x4x,2+x,x:)
subject to g, (x> =
27 ~
xlx22x3
397.5 - 1 20, g,(x) = ~x1x;x:
5x2 1 5 0, g*(x) = -XI
g,,(x)
1 . 5 +1.9 ~~ =
-1 I 0, g,,(x) =
1.93~: 120,g3(x)=-1 2 0, XZX3X64
X
1 5 0, gg(x) = 1 - 1I 0, 12x2 1 . 1 +1.9 ~~
1.93~: -1 5 0 , g4(x) = ____- 1 I 0,
x4 x5 x2x3x74 where 2.62x,I3.6,O.7Ix2<0.8, 172x,I28, 7 . 3 I ~ ~ I 8 . 3 , 7.3 5 x, I 8.3, 2.9 2 x6 23.9, 5.0 I x7 15.5.
For solving this problem, Ray et al. used an Optimization Algorithm Based on the Simulation of Social Behaviour, Mezura-Montes et al. used a Simple Evolutionary Algorithm, Akhtar et al. used a Socio-Behavioral approach. Table I : ComDarison of results for SDeed Reducer Desirm. SP algorithm [Ray et al., [Mezura-Montes [Akhtar et 20031 3.500000 0.700000 17.000000 7.300000 7.715321 3.350215 5.286655 -0.073915280397873318 -0.197998527141949127 -0.499172447764996807 -0.904643902796802735 -0.000000298998887224 -0.000000303397127444 -0.70250OOoooO~13 -0.0000000oooO0000063 -0.583333333333333259 -0.05 1325684931506944 -0.0000000648061 17507 2994.47151499891152OO
3.5000068 1 0.70000001 17.00000000 7.32760205 7.71532175 3.35026702 5.28665450 -0.07391711 -0.1980001 1 -0.49350137 -0.90464384 -0.00000064 -0.00000002 -0.70250000 -0.00000 193 -0.58333253 -0.05488856 -0.00000023 2994.73051820 2994.74424 1(*)
et al., 20031 3.506163 0.70083 1 17 7.460 181 7.962143 3.362900 5.308949 -0.077734 -0.201305 -0.474119 -0.897068 -0.01 1021 -0.012500 -0.702147 -0.000573 -0.583095 -0.069144 -0.027920 3025.005127 3025.00569(*)
al., 20021 3.503 122 0.700006 17 7.549126 7.859330 3.365576 5.289773 -0.075548 -0.1994 13 -0.456175 -0.899442 -0.013213 -0.001 740 -0.702497 -0.00 1738 -0.582608 -0.079580 -0.017887 3008.08
460
Recent Advances i n Stochastic Modeling and Data Analysis
(*) These statistics were checked by using our test program.
6.2. The problem of Himmelblau and two approximate problems Problem description [Hock et al., 19811: Minimize f ( x ) = 5.3578547 x i
+ 0.8356891 x l x 5 + 37.293239 x, - 40792 .I41
subject to
g,(x)
= 85.334407
+ 0.0056858 TI + T
2 ~ ,- ~0.0022053 , x3x5,
+ 0.0071317 x 2 x 5 + 0.0029955 X,X, +0.0021813 x i , g 3 ( x )= 9.300961 + 0.0047026 x3x5+ 0.0012547 xIx3+ 0.0019085 x3x4, 0 5 g , ( x ) I 9 2 , 9 0 5 g , ( x ) I 110,20 5 g 3 ( x )5 2 5 , where 78 I x1 5 102, 33 I x p 5 45, 27 I x 3 5 4 5 , 27 5 x4 5 45, 27 I x5 5 45. g , ( x ) = 80.51249
a) Test Problem 1: problem of Himmelblau (the correct version) [Himmelblau, 19721
Where = x 2 x 5 and T, = 0.0006262 for Test Probleml. The best known solution for Test Problem 1 is f* = -30665,538. For solving Test Problem 1, Gen et al. used Genetic Algorithms, Homaifar et al. used Constrained Optimization via Genetic Algorithms, Himmelblau used Nonlinear Programming. Table 2: ( bmparison of results )r Test Problem 1. SP algorithm
g3W f(X)
I
78.000000 33.000000 29.995257 45.000000 36.77581 1 91.9999996895 98.8404999879 20.0000000787 -30665.538482843880
I
[Gen et al., 19971 8 1.490000 34.090000 3 1.240000 42.200000 34.370000 91.781860 99.318806 20.060409 -30183.575622
[Homaifar et ul., 19941 78.000000 33.000000 29.995000 45.000000 36.776000 92.000043
98.840510 19.999935 I
-30665.608767
I
[Himmelblau, 19721 78.620000 33.440000 3 1.070000 44.180000 35.220000 91.792731 98.892932 20.13 1578 -30373.948730
violated.
b) Test Problem 2: A problem of Himmelblau (the incorrect version) [Coello, 20001
Where = x2x5and T2 = 0.00026 for Test Problem 2. For solving Test Problem 2, Coello used a Self-Adaptive Penalty Approach for Engineering Optimization Problems.
SP Algorithm for Engineering Optimization Problems 461
SP algorithm 78.000000 33.000000 27.070998 45.000000 44.969240 91.9999995850 100.4047838179 20.0000000290 -3 1025.5601491484
[Coello, 20001 78.0495 33.0070 27.0810 45.0000 44.9400 91.997635 100.407857 20.00191 1 -3 1020.859
Test Problem 3: A problem approximates to the problem of Himmelblau. Where q =x2x3 and r, =0.00026 for Test Problem 3 . The best known solution for Test Problem 3 is unknown [Hock et al., 19811. SP algorithm found many the best solutions with 6 digits after decimal point for Test Problem 3 . These the best solutions differ in values of their variable x2. Here 4 examples are: Table 4: SP algorithm’s results for Test Problem 3 . C)
Solution 1 78.000000 44.966946 27.061094 45.000000 44.999999 90.480297435 1 107.0474283569 20.0000000693 -3 1026.4276379629
Solution 2 43.129272 27.06 1094 45.000000 44.999999 90.197545621 1 106.0282993926 20.0000000693 -31026.4276379629
Solution 3 38.830046 27.06 1094 45.000000 44.999999 89.5360496482 103.6440540 147 20.0000000693 -31026.4276379629
Solution 4 78.000000 36.108773 27.061094 45.000000 44.999999 89.1173437857 102.1349026995 20.0000000693 -3 1026.4276379629
7. CONCLUSIONS In this paper, we proposed a new approach for single-objective optimization problems, Search via Probability algorithm. The SP algorithm used probabilities to control the process of searching for an optimal solution. We calculated the probabilities of the appearance of a better solution than the current one, and on each iteration of the performance of SP algorithm, we created good conditions for its appearance. The idea of SP algorithm was based on essential remarks as follows: The role of left digits was more important than the role of right digit for evaluating objective function. We calculated probabilities for searching better values than the current ones of digits from left digits to right digits of
462
Recent Advances in Stochastic Modeling and Data Analysis
every variable. Decided variables of problem were separated into discrete digits, and then they were changed with the guide of probabilities and combined to a new solution. The complexity of SP algorithm of a problem was not based on the type of expressions in the objective function or constraints (linear or nonlinear), but on the relation of decided variables in the formulas of object function or constraints; therefore if there were k independent variables ( I l E n ) , it would be sufficient to find a better solution than the current one, we only needed to select k variables to change their values on each iteration. We could not calculate exactly a number of iterations for searching an optimal solution the first time because SP algorithm was a stochastic algorithm; therefore we used unfixed number of iterations which has more chance to find an optimal solution the first time with necessary number of iterations. We tested this approach by implementing the SP algorithm on some test single-objective optimization problems, and we found very stable results. However if the feasible region of problem was very small or narrow, SP algorithm could not generate a random feasible solution or it found slowly the values of right digits of variables. We were researching these drawbacks with many various approaches of SP algorithm. We were also applying SP algorithm for solving multiobjective optimization and discrete optimization problems.
Solving the Capacitated Single Allocation Hub Location Problem Using Genetic Algorithm

Zorica Stanimirovic
Faculty of Mathematics, University of Belgrade
Studentski trg 16/IV, 11000 Belgrade, Serbia
(e-mail: zoricast@matf.bg.ac.yu)

Abstract. The aim of this study is to present a Genetic Algorithm (GA) for solving the Capacitated Single Allocation Hub Location Problem (CSAHLP) and to demonstrate its robustness and effectiveness for solving this problem. The appropriate objective function corrects infeasible individuals to feasible ones. These corrections are frequent in the initial population, so that in the subsequent generations the genetic operators only slightly violate the feasibility of individuals and the necessary corrections are rare. The solutions of the proposed GA method are compared with the best solutions presented in the literature by considering various problem sizes of the AP data set.
Keywords: Evolutionary computation, genetic algorithms, capacitated hub location problems, discrete location.
1 Problem formulation
Hub networks are often used in transportation and telecommunication networks where traffic (such as mail, telecommunication packets or airline passengers) must be transported from an origin to a destination point, but where it is expensive or impractical to use direct origin-destination links. In order to facilitate the consolidation of traffic, hubs can be used as intermediate switching points. When designing a hub network, different types of constraints may be involved. For example, the number of hubs to be located may be fixed to p, the allocation of non-hub nodes may be either to a single hub (single allocation scheme) or to multiple hubs (multiple allocation scheme), and different types of capacity restrictions on hubs may be assumed, as well as fixed costs of establishing hubs. An exhaustive survey of hub location problems and their classification can be found in [Campbell et al., 2002]. The goal of the CSAHLP is to locate a set of hub nodes and to allocate the non-hub nodes to hubs from the chosen set so as to minimize the sum of the transportation costs between origin-destination pairs and the fixed costs of establishing hubs. The CSAHLP also assumes a single allocation scheme and a limited amount of flow collected at each hub. The problem is NP-complete, since its subproblem, the Uncapacitated Single Allocation Hub Location Problem (USAHLP), is proved to be NP-hard in [Kara and Tansel, 1998].
In the literature, the CSAHLP has only been considered by Ernst and Krishnamoorthy in [Ernst and Krishnamoorthy, 1999]. The authors presented a new mixed integer LP formulation for the CSAHLP and described two heuristic algorithms for its solution based on simulated annealing (SA) and random descent (RDH). The upper bounds obtained by the heuristics are used in an LP-based branch-and-bound method, which provides optimal solutions for smaller size problem instances with n <= 50 nodes. For realistically sized problems, n = 100, 200, that could not be solved exactly, the proposed RDH and SA heuristics provided solutions in a reasonable amount of computer time. The formulation of the CSAHLP from [Ernst and Krishnamoorthy, 1999], which is used in this study, has the following notation: C_{ij} = the distance between nodes i and j (in the metric sense); W_{ij} = the amount of flow (number of units of flow) between an origin node i and a destination node j; \Gamma_k = the collection capacity of hub k; F_k = the cost of establishing hub k; O_i = the amount of flow that departs from node i, i.e. O_i = \sum_j W_{ij}; D_j = the amount of flow that is distributed to node j, i.e. D_j = \sum_i W_{ij}; \chi, \alpha, \delta = parameters that reflect the unit rates (costs) for collection (origin-hub), transfer (hub-hub), and distribution (hub-destination), respectively. The decision variables Z_{ij} \in \{0,1\} have value 1 if node i is allocated to a hub node j, and 0 otherwise (Z_{kk} = 1 implies that node k is a hub), while Y_{kl}^{i} represents the amount of flow that originates from node i, is collected at hub k and is distributed via hub l. Using the notation mentioned above, the problem can be written as:
\min \sum_{i=1}^{n} \sum_{k=1}^{n} C_{ik} Z_{ik} (\chi O_i + \delta D_i) + \sum_{i=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} \alpha C_{kl} Y_{kl}^{i} + \sum_{k=1}^{n} F_k Z_{kk}    (1)

with constraints:

\sum_{k=1}^{n} Z_{ik} = 1    for every i = 1, ..., n    (2)

Z_{ik} \le Z_{kk}    for every i, k = 1, ..., n    (3)

\sum_{l=1}^{n} Y_{kl}^{i} - \sum_{l=1}^{n} Y_{lk}^{i} = O_i Z_{ik} - \sum_{j=1}^{n} W_{ij} Z_{jk}    for every i, k = 1, ..., n    (4)

\sum_{i=1}^{n} O_i Z_{ik} \le \Gamma_k Z_{kk}    for every k = 1, ..., n    (5)

Y_{kl}^{i} \ge 0    for every i, k, l = 1, ..., n    (6)

Z_{ik} \in \{0, 1\}    for every i, k = 1, ..., n    (7)
The objective function (1) minimizes the sum of the transportation costs between all origin-destination pairs via hub nodes and the fixed costs of locating the set of hubs. Constraint (2) states that each node is allocated to exactly one hub,
while constraint (3) enforces that flow is only sent via open hubs, preventing direct transmission between non-hub nodes. Constraint (4) represents the flow conservation equality in the network. The amount of flow collected at a hub is limited by (5). Finally, constraints (6) and (7) specify the variables Y_{kl}^{i} and Z_{ik} to be non-negative and binary, respectively.
2 Proposed GA
Since each hub problem has its own specific structure (objective function, decision variables and constraints), there is no general solution approach for solving all hub problems, or even a smaller group of them. A few additional constraints or a slight modification of the problem structure can substantially change the computational behavior of the designed solution approach. Exact methods cannot provide solutions for large-scale hub location problems, which arise in practice, in a reasonable amount of time. Therefore, genetic algorithms, as robust heuristic methods (see [Back et al., 2000]), are very promising approaches for solving hub location problems. Some successful applications of the GA to hub location problems can be found in the literature: [Abdinnour-Helm, 1998], [Topcuoglu et al., 2005], [Kratica et al., 2006], etc.
Representation of individuals: The genetic code of an individual consists of n genes, each referring to one network node. The first bit in each gene takes value 1 if the current node is a located hub, 0 if not. Considering these bit values, the array of opened hub facilities is formed. The remaining bits of the gene refer to the hub that is assigned to the current node. Hub nodes are assigned to themselves. For each non-hub node, the array of located hub facilities is created and arranged in non-decreasing order of their distances from the current node. This strategy, named "nearest neighbour ordering", ensures that "closer" hubs have higher priority than "distant" ones when assigning them to non-hub nodes. For example, the genetic code 00|10|10|02|10 corresponds to the following solution: the first bits in each gene (0, 1, 1, 0, 1) denote the established hubs (nodes 1, 2 and 4), while the remaining bits of the genes (0, 0, 0, 2, 0) show the assignments. Non-hub node 0 is assigned to its closest established hub, while node 3 is assigned to the third hub from the corresponding array of established hubs. Hub nodes 1, 2 and 4 are obviously assigned to themselves.
Objective function: The indices of established hubs are obtained from the first bits of each gene. For each non-hub node, the array of established hubs is arranged in non-decreasing order with respect to the distances from the current node. The index of the hub that is assigned to the current non-hub node is obtained from the remaining part of the gene (if its value is r, the r-th hub is taken from the previously arranged array). Arranging the array of established hubs is performed for each individual in every generation.
After the assigning procedure described above has been performed, the objective value is evaluated simply by summing the origin-hub, hub-hub and hub-destination distances, multiplied by the flows and the corresponding parameters \chi, \alpha, \delta, and by adding the sum of the fixed costs of the established hubs. It may happen that a non-hub node is allocated to a hub whose remaining capacity is not enough to satisfy the node's demand. In this case, the next hub from the array of established hubs for the current node that satisfies the capacity constraint is taken. If there is no such hub, we consider the individual infeasible by setting its fitness to 0. This case is very rare in practice, and it usually happens if the sum of the hub capacities is less than the overall flow in the network. So, infeasible individuals (in the sense of insufficient hub capacities) will be generated in the initial population with very small probability. The applied strategy of correcting individuals with insufficient capacities to feasible ones may slightly affect the quality of the GA solution, but it preserves the diversity of the genetic material. If the infeasible individuals were left in the population, they might become dominant in the following generations and the algorithm might provide no solution or finish in a local optimum. If the incorrect individuals were excluded from the population, the possibility of premature convergence would rapidly increase.
Genetic operators: The GA method uses an improvement of standard tournament selection, named fine-grained tournament selection (FGTS). It is used in cases when the average tournament size F_tour is desired to be fractional (see [Filipovic, 2003]). In this GA implementation F_tour = 5.4. After a pair of individuals is selected, a modified one-point crossover operator is applied to them, producing two offspring. A bit position i (crossover point) is randomly chosen in the genetic code, and whole genes are exchanged starting from the gene that contains the chosen crossover point. Crossover is performed with the rate p_cross = 0.85, which means that around 85% of the individuals take part in producing offspring. Offspring generated by the crossover operator are subject to a modified two-level mutation with frozen bits (see [Kratica et al., 2006]). The basic mutation rates used in the GA implementation are: 0.4/n for the bit on the first position in a gene, 0.1/n for the bit on the second position in a gene, while the following bits have repeatedly two times smaller mutation rates (0.05/n, 0.025/n, ...). The appearance of frozen bits during the GA generations may significantly increase the possibility of premature convergence ([Kratica et al., 2006]). Therefore, compared to the basic mutation rates, frozen bits are mutated with a 2.5 times higher rate (1.0/n instead of 0.4/n) if they are positioned at the first place of the gene, and with a 1.5 times higher rate (0.075/n, 0.0375/n, ...) otherwise. Note that smaller values of the mutation rate and frozen factor are used for the remaining part of the gene, because it is important to have many zeros there (each zero corresponds to the closest hub facility for a particular non-hub node).
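As an illustration of the representation and the capacity-aware assignment just described, the following sketch decodes one individual; it is not the author's C implementation, and it assumes (hypothetically) that each gene is already split into a hub bit and an assignment rank, and that the distance, capacity and demand arrays are given.

```python
def decode(genes, dist, capacity, demand):
    """Sketch: genes[i] = (hub_bit, rank). Hubs serve themselves; a non-hub
    node i takes the rank-th established hub in non-decreasing distance
    order, moving on to the next hubs if the remaining capacity is too small."""
    hubs = [i for i, (bit, _) in enumerate(genes) if bit == 1]
    if not hubs:
        return None                                          # no hub established
    remaining = {h: capacity[h] - demand[h] for h in hubs}   # hubs collect their own flow (an assumption)
    assign = {h: h for h in hubs}
    for i, (bit, rank) in enumerate(genes):
        if bit:
            continue
        ordered = sorted(hubs, key=lambda h: dist[i][h])     # "nearest neighbour ordering"
        for h in ordered[rank % len(ordered):]:              # start from the rank-th hub
            if remaining[h] >= demand[i]:
                remaining[h] -= demand[i]
                assign[i] = h
                break
        else:
            return None   # infeasible individual: its fitness is set to 0
    return assign
```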
Other GA characteristics: The initial population contains 150 individuals. Each individual of the initial population is generated with the following strategy: the first bit in each gene takes value 1 with probability 5/n; the second bit in each gene is generated with probability 2.5/n, and the following bits take value 1 with two times smaller probability than the previous ones (1.25/n, 0.625/n, ...). Since "closer" hubs for each non-hub node are favoured, it is desirable that the second segment of the initial genetic code contains many zeros. One third of the population is replaced in every generation, while the best 100 individuals pass directly into the next generation. These elite individuals preserve the highly fitted genes of the population; their objective values are calculated only in the first generation. The infeasible individuals (in the sense of insufficient hub capacities) in the initial population are corrected to be feasible. The applied genetic operators preserve their feasibility, so infeasible individuals do not appear in the following generations. If an individual with the same genetic code appears again in the population, its objective value is set to zero, which prevents it from entering the next generation. The appearance of individuals with the same objective value but different genetic codes is limited to a constant value of 40. The described strategy helps to preserve the diversity of the genetic material and to keep the algorithm away from local optima. The running time of the GA is also improved by a caching technique (see [Kratica, 1999]). The number of cached objective values is limited to N_cache = 5000 in the GA implementation.
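A minimal sketch of the described initialization of one gene (first bit with probability 5/n, second with 2.5/n, then halving) follows; the helper names are ours, not the paper's.

```python
import random

def init_gene(n, n_bits):
    """Generate one gene: bit 0 is the hub bit (probability 5/n of being 1);
    the assignment bits follow with probabilities 2.5/n, 1.25/n, ... so the
    assignment part is mostly zeros (zero = closest established hub)."""
    p, gene = 5.0 / n, []
    for _ in range(n_bits):
        gene.append(1 if random.random() < p else 0)
        p /= 2.0
    return gene

# e.g. an initial population of 150 individuals for a network of n = 100 nodes
population = [[init_gene(100, 5) for _ in range(100)] for _ in range(150)]
```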
3 Computational results
The proposed GA approach was tested on the AP hub instances [Beasley, 1996], with n <= 200 nodes and parameters \chi = 3, \alpha = 0.25 and \delta = 2. Two types of capacities and fixed costs on the nodes are assumed: tight (T) and loose (L), which gives four types of problems, LL, LT, TL and TT, for each problem size n. The GA was coded in the C programming language and run on an AMD Athlon K7 1.33GHz with 256 MB of RAM. On each AP instance, the GA method was run 20 times. The maximal number of GA generations, N_max = 5000, is used as a stopping criterion. The GA also terminates if the best individual or the best objective value remains unchanged through N_rep = 2000 successive generations. On all instances that were tested, these stopping criteria allowed the GA to converge to high-quality solutions. The results of the proposed GA on the smaller AP instances with n <= 50 nodes are presented in Table 1, while Table 2 contains the results on the larger AP instances, n = 100, 200. The columns of Table 1 and Table 2 contain the following data: the instance's dimension, fixed cost and capacity type; the optimal solution for the current instance (only in Table 1), obtained with the LP-based branch and bound method (BnB) [Ernst and Krishnamoorthy, 1999] (for instance 50TT
Inst. | Opt.sol. (BnB) | Best.sol. (GA) | t[s] | t_tot[s] | gen | agap[%] | sigma[%] | eval | cache[%]
10LL | 224250.055 | opt | 0.019 | 0.940 | 2032 | 0.128 | 0.574 | 35547.6 | 65.1
10LT | 250992.262 | opt | 0.064 | 1.105 | 2125 | 0.000 | 0.000 | 49641.4 | 53.3
10TL | 263399.943 | opt | 0.127 | 1.067 | 2255 | 0.120 | 0.446 | 47350.6 | 58.3
10TT | 263399.943 | opt | 0.028 | 0.967 | 2051 | 0.046 | 0.137 | 45390.3 | 55.8
20LL | 234690.963 | opt | 0.043 | 2.169 | 2035 | 0.000 | 0.000 | 66000.7 | 35.3
20LT | 253517.395 | opt | 0.456 | 2.406 | 2461 | 0.074 | 0.227 | 75465.8 | 37.9
20TL | 271128.176 | opt | 0.451 | 2.565 | 2438 | 0.000 | 0.000 | 78903.6 | 35.6
20TT | 296035.402 | opt | 0.211 | 2.160 | 2205 | 0.297 | 0.415 | 53469.3 | 51.5
25LL | 238977.95 | opt | 0.402 | 3.066 | 2288 | 0.779 | 1.221 | 76972.7 | 32.9
25LT | 276372.5 | opt | 0.127 | 2.969 | 2077 | 0.980 | 0.562 | 67922.3 | 34.8
25TL | 310317.64 | opt | 0.251 | 3.077 | 2174 | 2.247 | 2.300 | 72906 | 33
25TT | 348369.15 | opt | 0.548 | 3.003 | 2425 | 0.845 | 0.526 | 64927.9 | 46.5
40LL | 241955.71 | opt | 0.226 | 5.266 | 2082 | 0.082 | 0.214 | 73973.1 | 29.1
40LT | 272218.32 | opt | 1.15 | 6.463 | 2394 | 3.538 | 2.686 | 76834.2 | 35.9
40TL | 298919.01 | opt | 0.399 | 5.619 | 2146 | 0.000 | 0.000 | 78009 | 27.5
40TT | 354874.10 | 356509.86 | 0.842 | 5.750 | 2321 | 4.249 | 3.113 | 71647.6 | 38.3
50LL | 238520.59 | opt | 0.521 | 7.448 | 2139 | 0.267 | 0.652 | 76537.9 | 28.6
50LT | 272897.49 | opt | 1.066 | 8.534 | 2269 | 0.518 | 1.009 | 80235.6 | 29.5
50TL | 319015.77 | opt | 1.875 | 8.866 | 2526 | 0.720 | 0.934 | 92312.4 | 26.8
50TT | 417440.99* | 422794.56 | 4.348 | 10.981 | 3100 | 2.793 | 1.591 | 110612.1 | 28.7

Table 1. GA results on smaller AP instances
only the best value of the BnB method is presented, since BnB found no optimal solution in this case); the best value of GA (Best.sol), with the mark opt in cases when GA reached an optimal solution; the average time t (in seconds) needed to detect the best GA value; the total time t_tot (in seconds) needed for finishing the GA; the average number of generations gen; the average percentage gap

agap = \frac{1}{20} \sum_{i=1}^{20} gap_i,

where gap_i = 100 \, \frac{sol_i - Opt.sol}{Opt.sol} is evaluated with respect to the optimal solution Opt.sol, or with respect to the best-known solution Best.sol, i.e. gap_i = 100 \, \frac{sol_i - Best.sol}{Best.sol}, in cases where no optimal solution is found (sol_i represents the GA solution obtained in the i-th execution); the standard deviation of the gap

\sigma = \sqrt{\frac{1}{20} \sum_{i=1}^{20} (gap_i - agap)^2}

(in percent); the average number of evaluations eval; and the savings achieved by the caching technique, cache (in percent).
The GA concept cannot prove optimality, and an adequate finishing criterion that would fine-tune solution quality does not exist. Therefore, as column t_tot in Table 1 and Table 2 shows, our algorithm runs for an additional t_tot - t time (until the finishing criterion is satisfied), although it has already reached the optimal (best) solution. As can be seen from Table 1, the proposed GA approach quickly reaches all optimal solutions on AP instances with n <= 50 nodes in less than
Inst. | Best.sol. | t[s] | t_tot[s] | gen | agap[%] | sigma[%] | eval | cache[%]
100LL | 246713.97 | 5.300 | 56.052 | 5520 | 1.071 | 0.914 | 193743.9 | 29.3
100LT | 256207.52 | 29.020 | 81.965 | 7814 | 3.194 | 1.791 | 279284.7 | 27.5
100TL | 364515.24 | 15.184 | 66.293 | 6484 | 0.773 | 2.158 | 246441.4 | 23.8
100TT | 475156.75 | 24.164 | 75.963 | 7298 | 4.391 | 4.442 | 270017.9 | 25.8
200LL | 241992.97 | 168.966 | 424.517 | 8295 | 0.698 | 1.323 | 340043 | 18
200LT | 270202.25 | 142.640 | 410.405 | 7588 | 1.864 | 2.227 | 302818.1 | 20.2
200TL | 273443.81 | 80.872 | 325.878 | 6621 | 3.63 | 2.96 | 246438 | 25.2
200TT | 291830.66 | 195.174 | 427.568 | 8968 | 0.81 | 0.8 | 349444.5 | 22.1

Table 2. GA results on larger AP instances
1.875 seconds. The exception is the instance 40TT, where the GA has a 4.249% average gap from the optimal solution. For instance 50TT, where no optimal solution is known in advance, the GA has a 2.793% average gap from the best known solution obtained by the BnB method. In Table 3, a comparison of the results for eight large AP instances obtained by the proposed GA method and the RDH and SA heuristics [Ernst and Krishnamoorthy, 1999] is presented.

Inst. | Best sol. of RDH or SA | DEC Alpha 200 MHz (sec) | GA Best sol. | AMD 1.33 GHz (sec) | The best method
100LL | 246713.97 | 18.55 | 246713.97 | 56.052 | same
100LT | 256638.38 | 24.71 | 256207.52 | 81.965 | GA
100TL | 362950.09 | 30.16 | 364515.24 | 66.293 | SA
100TT | 474680.32 | 34.83 | 475156.75 | 75.963 | SA
200LL | 241992.97 | 136.01 | 241992.97 | 424.517 | same
200LT | 268894.41 | 437.21 | 270202.25 | 410.405 | RDH
200TL | 273443.81 | 195.28 | 273443.81 | 325.878 | same
200TT | 292754.97 | - | 291830.66 | - | GA

Table 3. Comparisons on large AP instances

The second column of Table 3 contains the better solution of the two heuristics SA and RDH for the current AP instance. The third column shows the DEC 3000/700 (200MHz) computational time in which the SA/RDH heuristic obtained the corresponding solution. The next two columns present the best GA solution and the corresponding AMD (1.33GHz) computational time. The last column specifies which of the three heuristic methods gave the best overall solution for the current AP instance. As can be seen from Table 3, the proposed GA gave a better solution than the other two heuristics in two cases, the SA also provided a better solution on two AP instances, while the RDH was better than the other methods in one case. In the remaining three cases all three heuristics gave the same solution.
4 Conclusions

In this paper, a new genetic algorithm has been introduced for the CSAHLP. The applied objective function ensures that infeasible individuals do not appear in the generations of the GA. Arranging the located hubs in non-decreasing order of their distances for each non-hub node directs the GA to promising search regions. By using mutation with frozen bits, the diversity of the genetic material has been increased. For the same reason, the number of individuals with the same objective function value but different genetic codes is limited. A caching technique additionally improves the computational performance of the GA implementation. Computational experiments reveal that the performance of the proposed GA is quite satisfactory. For smaller AP instances, the GA is highly effective in finding high-quality solutions that match the optimal ones in a very short time. Its performance on large AP instances, with regard to both solution quality and computation time, shows the potential of this algorithm as a useful metaheuristic for solving the CSAHLP and other capacitated hub problems, as well as more complex hub location models. Future work could also concentrate on the parallelization of the GA and its hybridization with exact methods.
References
[Abdinnour-Helm, 1998] S. Abdinnour-Helm. A hybrid heuristic for the uncapacitated hub location problem. European Journal of Operational Research, 106:489-499, 1998.
[Back et al., 2000] T. Back et al. Basic algorithms and operators. In Evolutionary Computation 1, 2000.
[Beasley, 1996] J.E. Beasley. Obtaining test problems via Internet. Journal of Global Optimization, 8:429-433, 1996.
[Campbell et al., 2002] J.F. Campbell et al. Hub location problems. In Z. Drezner and H. Hamacher, editors, Facility Location: Applications and Theory, pages 373-407, 2002.
[Ernst and Krishnamoorthy, 1999] A.T. Ernst and M. Krishnamoorthy. Solution algorithms for the capacitated single allocation hub location problem. Annals of Operations Research, 86:141-159, 1999.
[Filipovic, 2003] V. Filipovic. Fine-grained tournament selection operator in genetic algorithms. Computing and Informatics, 22:143-161, 2003.
[Kara and Tansel, 1998] B.Y. Kara and B.C. Tansel. On the allocation phase of the p-hub location problem. Technical Report, Dept. of Industrial Engineering, 1998.
[Kratica et al., 2006] J. Kratica et al. Two genetic algorithms for solving the uncapacitated single allocation p-hub median problem. European Journal of Operational Research (to be published), 2006.
[Kratica, 1999] J. Kratica. Improving performances of the genetic algorithm by caching. Computers and Artificial Intelligence, 18:271-283, 1999.
[Topcuoglu et al., 2005] H. Topcuoglu et al. Solving the uncapacitated hub location problem using genetic algorithms. Computers and Operations Research, 32:967-984, 2005.
CHAPTER 11
Data Mining and Applications
Robust refinement of initial prototypes for partitioning-based clustering algorithms

Sami Ayramo, Tommi Karkkainen, and Kirsi Majava
University of Jyvaskyla
P.O. Box 35 (Agora), FIN-40014 Jyvaskyla, Finland
(e-mail: sami.ayramo@mit.jyu.fi)
Abstract. Non-uniqueness of solutions and sensitivity to erroneous data are common problems in large-scale data clustering tasks. In order to avoid poor-quality solutions with partitioning-based clustering methods, robust estimates (that are highly insensitive to erroneous data values) are needed, and the initial cluster prototypes should be determined properly. In this paper, a robust density-estimation initialization method that exploits the spatial median estimate in the prototype update is presented. Besides being insensitive to noise and outliers, the new method is also computationally comparable with other traditional methods. The methods are compared by numerical experiments on a set of synthetic and real-world data sets. Conclusions and a discussion of the results are given.
Keywords: clustering, initialization, robust statistics, data mining.
1 Introduction
Clustering is a descriptive data analysis technique, which is undoubtedly one of the core methods in data mining (DM) and knowledge discovery from databases (KDD) [Hand et al., 2001; Tan et al., 2005]. Roughly speaking, clustering methods can be classified into partitioning and hierarchical methods. Partitioning-based methods are very simple to implement and usually require less memory than hierarchical methods. Clustering is a challenging task by itself because of the computational complexity of the problem. Major problems of partitioning-based data clustering are the non-uniqueness of solutions, the tendency to empty clusters, and sensitivity to erroneous and incomplete observations. The non-uniqueness problem follows from the non-convex shape of clustering cost functions. Classical methods such as K-means [MacQueen, 1967] are often trapped in one of the locally optimal solutions, because they are in fact local-search methods. An exhaustive search for the globally optimal partition is also impractical, since the number of different partitions is huge even for small data sets. In order to avoid sub-optimal solutions, empty clusters, etc., local-search algorithms should be initialized carefully. In addition to the improved quality of solutions, an intelligent initialization method may reduce the number of clustering iterations on the full data. For completeness, the disturbing
effects of noise, outliers, or missing values might be reduced by using statistically robust methods [Karkkainen and Ayramo, 2004]. A common assumption is that robustness means increased computation time, but in this paper it will be shown that this is not always the case. In this paper, a robust cluster initialization method is presented. It is based on the so-called clustering refinement principle [Bradley and Fayyad, 1998]. Thus far, the refinement principle has been applied only using non-robust estimates (e.g., the sample mean). Through the numerical experiments, we show that the robust estimates can provide significant advantages over the classical clustering methods. Throughout the paper, we denote by (x)_i the i-th component of a vector x \in \mathbb{R}^p. Without parentheses, x_i represents one element in a set of vectors. The l_q-norm of a vector x is given by

\|x\|_q = \left( \sum_{i=1}^{p} |(x)_i|^q \right)^{1/q}.
2 Robust clustering method
Let us first consider the following family of optimization problems [Karkkainen and Heikkola, 2004]:

\min_{u \in \mathbb{R}^p} \sum_{i=1}^{n} \|x_i - u\|_q^{\alpha}.    (1)

Obviously, q = \alpha = 2 leads to the well-known least-squares problem, whose minimizing solution is the sample mean, whereas q = 2, \alpha = 1 yields the problem of the spatial median. The spatial median (a.k.a. Fermat-Weber point) is the minimizing point for the (weighted) sum of the Euclidean distances to n destination points in \mathbb{R}^p [Huber, 1981]. From the point of view of statistics, the spatial median is a robust multivariate M-estimate of location. The breakdown point is the smallest fraction of contamination that can have an infinite influence on an estimator. The breakdown point of the spatial median is 50%. It is also an orthogonally equivariant, but not affine equivariant, estimate, which means that the estimate remains invariant under any rotation of the data, but scale changes in the variables may not cause the corresponding effect in the estimate. In the univariate case, the spatial median coincides with the coordinate-wise median (that is, q = \alpha = 1). From the computational point of view the spatial median is problematic, because the case u = x_i for any i \in \{1, ..., n\} leads to an ill-defined gradient. This yields challenges for its reliable computation. As pointed out, e.g., in [Karkkainen and Heikkola, 2004], the problem of the spatial median is a non-smooth optimization problem, which means that it cannot be treated by using the classical (C^1) differential calculus.
In this paper, the SOR (successive over-relaxation)-accelerated iterative Weiszfeld algorithm is utilized for the robust prototype estimation [Karkkainen and Ayramo, 2005]. The basic iteration is based on the smoothed "\varepsilon-distorted" problem formulation, whose first-order necessary conditions for the stationary point are given by:

\sum_{i=1}^{n} \frac{u - x_i}{\sqrt{\|u - x_i\|_2^2 + \varepsilon}} = 0.    (2)

The next candidate v can be solved from the "linearized" equation

\sum_{i=1}^{n} \alpha_i^t (v - x_i) = 0,    (3)

where \alpha_i^t defines explicit weights from the denominator of (2):

\alpha_i^t = \frac{1}{\sqrt{\|u^t - x_i\|_2^2 + \varepsilon}}.

In the acceleration step, the next candidate solution u^{t+1} is obtained by applying an SOR-type step-size factor to v as follows:

u^{t+1} = u^t + \omega (v - u^t), \quad \omega \in [0, 2],    (4)

where \omega is the step-size factor, (v - u^t) is the search direction, and v is obtained from (3). The algorithm terminates at the given stopping criterion: \|u^{t+1} - u^t\|_\infty < tol. The algorithmic formulation can be straightforwardly generalized to deal with missing data by projecting all computational operations to the existing data values. Based on the general vector norms and (1), a generalized K-estimates clustering problem is defined by

\min \sum_{i=1}^{n} \|x_i - m_{(c)_i}\|_q^{\alpha} \quad \text{subject to } (c)_i \in \{1, \ldots, K\} \text{ for all } i = 1, \ldots, n,    (5)

where c is a code vector, which represents the assignments of the data points to the clusters, and m_{(c)_i} is the prototype estimate (e.g., the sample mean for K-means) of the cluster to which data point x_i is assigned. Here the choice q = 2, \alpha = 1 leads to the problem of K-spatial-medians (later referred to as K-spatmed). The K-spatmed clustering method itself follows the same expectation-maximization strategy as the K-means method. Instead of the sample mean, the spatial median points are approximated with the SOR algorithm to estimate the most representative points of the data clusters. Hence, the computational complexity of K-spatmed is O(npK t_EM t_SOR), where t_EM is the number of clustering iterations and t_SOR is the number of SOR iterations.
3 Initialization of clustering algorithms
Cluster initialization methods can be divided into three classes: random, distance optimization, and density estimation [He et al., 2004]. Multiple repetitions from random points is often considered the de facto method [Bradley and Fayyad, 1998]. Despite the minimal initialization cost it provides, the consequence might be a significantly increased number of full-data clustering iterations for the failures. Therefore, this is not supposed to be a feasible strategy for large DM data sets. Here this strategy is referred to as RND. Katsavounidis et al. [Katsavounidis et al., 1994] present an expeditious and parameter-free initialization method that is based on the distance optimization principle. The method will be referred to here as KKZ. After choosing the vector with the maximum norm (e.g., Euclidean) as the first prototype, KKZ iteratively searches for and selects the points that are most distant from the already selected ones. The computational complexity of KKZ is an attractive O(Knp), since only one distance computation is needed for each non-prototype point at each iteration. The clustering refinement is a density-estimation method, which builds on the following main steps [Bradley and Fayyad, 1998]:

1. Draw m sub-samples {X_1, ..., X_m} from the data X.
2. Cluster each sub-sample (empty clusters are not allowed) into K groups. The obtained prototypes {C_1, ..., C_m} serve as a refined data set X' (of size mK x p).
3. Cluster the data X' m times using the prototypes {C_1, ..., C_m}, in turn, as the initial points.
4. The initial prototypes are the ones that produce the smallest distortion on the refined data set X'.
The number and size of the sub-samples must be determined by the user. The results in [Bradley and Fayyad, 1998] indicate that the quality of the K-means results can be improved by starting from the refined initial points. To the knowledge of the authors, the clustering refinement has not been applied using robust clustering. Now, we propose a new combination of the clustering refinement and a robust clustering method [Karkkainen and Ayramo, 2004]. The fast SOR-accelerated spatial median estimator (cf. (4)) enables us to build a robust clustering method with efficient initialization and missing-data treatment. Thus, K-spatmed, including the fast approximation of the spatial median, replaces the K-means algorithm in the clustering refinement. From now on, we refer to this robust refinement aggregate as ROB and to the original K-means-based method as REF. ROB should be more tolerant to noise and gross errors than the original REF; especially on small sub-samples, gross errors may have a critical effect on the initial prototypes without robust methods. Each step of the method can
easily be generalized to handle incomplete data sets using all the available data [Little and Rubin, 1987; Ayramo, 2006]. The computational complexity of the ROB initialization is O(K p n_sos n_sam t_EM t_SOR), where n_sos is the size and n_sam the number of the sub-samples.
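The refinement aggregate can be sketched as follows; the cluster callback and its signature are a hypothetical stand-in for a concrete K-means (giving REF) or K-spatmed (giving ROB) routine.

```python
import numpy as np

def refined_init(X, K, cluster, n_sam=10, frac=0.1, rng=None):
    """Sketch of the clustering refinement: cluster(data, init) is assumed
    to return (prototypes, distortion). Plugging in K-means gives REF,
    plugging in K-spatmed (SOR-based spatial medians) gives ROB."""
    rng = rng or np.random.default_rng()
    n = len(X)
    candidates = []
    for _ in range(n_sam):                                   # steps 1-2: cluster the sub-samples
        sub = X[rng.choice(n, size=max(K, int(frac * n)), replace=False)]
        C, _ = cluster(sub, init=sub[rng.choice(len(sub), size=K, replace=False)])
        candidates.append(C)
    refined = np.vstack(candidates)                          # the refined data set (mK x p)
    scored = [cluster(refined, init=C) for C in candidates]  # steps 3-4: pick the best start
    return min(scored, key=lambda cd: cd[1])[0]
```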
4 Numerical experiments
Precision and computational efficiency of the K-means and K-spatmed methods (abbreviated to K-m and K-s, respectively, in Tables 1 and 2) using RND, KKZ, REF, and the proposed ROB initialization are compared on synthetic data sets. The smallest sum of distances from the obtained prototypes to the true cluster centers (one-to-one mapping) is used as a measure of cluster quality. We also study the classification performance on four different kinds of real-world data sets: Iris, Dermatology (derma), Pen-Based Recognition of Handwritten Digits (pendigits), and Optical Recognition of Handwritten Digits (optdigits). All the data sets are available at the UCI machine learning repository [Newman et al., 1998]. The overall purity of each clustering is compared to the given classification. The cluster purity measures the extent to which a cluster contains objects of a single class [Tan et al., 2005]. Let n_i be the number of objects in cluster i and n_ij the number of objects of class j in cluster i. The probability that a member of cluster i belongs to class j is then p_{ij} = n_{ij}/n_i, and the purity of cluster i is p_i = \max_j p_{ij}. Consequently, the total purity of a clustering is

purity = \sum_{i=1}^{K} \frac{n_i}{n} p_i.
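For concreteness, the purity computation reads as follows, a direct transcription of the formula with NumPy arrays of cluster labels and true classes.

```python
import numpy as np

def total_purity(cluster_labels, true_classes):
    """purity = sum_i (n_i / n) * max_j (n_ij / n_i) = sum_i max_j n_ij / n."""
    n, acc = len(cluster_labels), 0.0
    for c in np.unique(cluster_labels):
        members = true_classes[cluster_labels == c]        # objects in cluster c
        _, counts = np.unique(members, return_counts=True) # class counts n_ij
        acc += counts.max() / n
    return acc
```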
The major part of the experiments were performed using MATLAB (6.5.1 SP1) on a conventional laptop computer with an Intel Pentium 1.4GHz, 512MB of RAM and Windows XP (SP2). The experiments for the optdigits and pendigits data sets were performed using MATLAB (7.1.0 SP3) installed on an HP ProLiant DL585 server with four AMD Opteron 885 (2.6GHz) dual-core processors, 64GB of memory, and 64-bit x86-64 Fedora Core 4. In all the experiments, excluding the two digit data sets, all attributes were normalized using the robust median absolute deviation scale estimate: MAD = \mathrm{median}\{|x_i - M|\}_{i=1}^{n}, where M = \mathrm{median}\{x_i\}_{i=1}^{n}. Missing values in the derma data were handled by projecting all computational operations to the existing data values. For the SOR method, the acceleration factor was 1.5 and the stopping criterion was tol = 10^{-3} for the synthetic data, with a smaller tolerance for the real data. Ten sub-samples of size 10% of the total data size were used in the REF and ROB tests (except for the pendigits and optdigits data, where the size of the sub-samples was 20%).
4.1 Experiments on synthetic data
Fig. 1. Clean (left) and noisy (right) synthetic test data sets with K = 5.
In order to compare the performance of the initialization methods, six bivariate data sets were generated. The data sets vary from quite simple ones, with small numbers of error-free symmetric Gaussian clusters, to more problematic ones containing a higher number of more dispersed (heavy-tailed) and slightly overlapping clusters. In Fig. 1, two examples of the synthetic test data sets are given. On the left, test data K5D2gauss represents the clean Gaussian test data sets. For K5D2gauss, each cluster contains 100 points that were drawn from the bivariate normal distribution with variance 1.0. On the right, test data K5D2noisy represents the noisy test data sets, where 20% of the Gaussian data generated for K5D2gauss were interchanged with points from the bivariate Laplace distribution with dispersion parameter values 3 and 4 for the x- and y-axis, respectively. Regarding the naming of the synthetic data sets, we remark that KkDd expresses the number of clusters and dimensions, and the last part "gauss"/"noisy" expresses whether or not noise is generated. The size of a cluster is 50 in K3D2gauss and K3D2noisy and 100 in K5D2gauss and K5D2noisy. All the experimental results are summaries of 100 trials. All the synthetic data results are reported in Table 1. Overall, the results indicate that the average error of the K-means prototypes can be reduced more by the robust cluster refinement ROB than by the K-means-based REF initialization when compared to the random initialization. The REF initialization seems to overcome the random initialization RND mainly on the non-contaminated data sets, but it appears to lead to nearly equal or even worse K-means clustering results in the presence of noise. For K5D2noisy neither method attains any satisfactory clustering result. As expected, RND provides faster results than REF, but the average number of K-means iterations is higher.
DATA | | KKZ+K-s | RND+K-m | REF+K-m | ROB+K-m | ROB+K-s
K3D2gauss | error/mean | 0.33 | 2.17 | 0.36 | 0.45 | 0.37
 | error/min | 0.33 | 0.36(73) | 0.36(100) | 0.36(98) | 0.33(99)
 | time/mean | 0.05 | 0.08 | 0.23 | 0.23 | 0.22
 | iter/mean | 5.0 | 5.5 | 3.2 | 2.9 | 2.3
K3D2noisy | error/mean | 11.04 | 2.23 | 4.22 | 1.69 | 1.64
 | error/min | 11.04 | 0.61(46) | 0.61(31) | 0.61(68) | 0.36(88)
 | time/mean | 0.04 | 0.06 | 0.21 | 0.20 | 0.20
 | iter/mean | 3 | 5.2 | 2.8 | 2.4 | 2.5
K5D2gauss | error/mean | 0.20 | 1.50 | 0.58 | 0.34 | 0.20
 | error/min | 0.20 | 0.21(39) | 0.21(33) | 0.21(39) | 0.19(36)
 | time/mean | 2.23 | 0.61 | 0.94 | 0.91 | 0.95
 | iter/mean | 5 | 11.9 | 5.6 | 5.0 | 6.2
K5D2noisy | error/mean | 8.62 | 8.54 | 8.55 | 8.67 | 7.13
 | error/min | 8.62 | 3.56(1) | 8.12(1) | 5.69(1) | 0.40(22)
 | time/mean | 0.61 | 0.75 | 1.06 | 1.07 | 0.93
 | iter/mean | 12 | 15.0 | 7.3 | 8.8 | 5.6
K9D2gauss | error/mean | 6.23 | 3.70 | 0.40 | 0.50 | 0.61
 | error/min | 6.23 | 0.08(13) | 0.08(86) | 0.08(87) | 0.13(86)
 | time/mean | 1.56 | 2.06 | 2.96 | 2.76 | 2.83
 | iter/mean | 9 | 13.8 | 4.7 | 4.3 | 4.8
K9D2noisy | error/mean | 9.50 | 3.53 | 3.75 | 2.50 | 2.23
 | error/min | 9.50 | 0.49(29) | 0.49(31) | 0.49(58) | 0.15(33)
 | time/mean | 2.25 | 2.39 | 4.08 | 3.43 | 3.15
 | iter/mean | 15 | 16.2 | 12.5 | 8.8 | 7.0

Table 1. Clustering errors, total CPU time and final cluster iterations on the synthetic data sets. In parentheses are given the numbers of trials in which the minimum error for a particular method was obtained.
However, it seems that the speed advantage of RND over REF is somewhat lost with increasing data size and number of clusters. When comparing the refinement methods to each other, the results show that ROB causes K-means to converge to optima of better quality than REF. A clear advantage over REF with respect to the clustering error was achieved on four out of the six data sets. The results also show that the robust initialization significantly increases the chance of obtaining the smallest error rate with K-means. The best error of a method on a particular test data set was obtained in 34%, 47%, and 59% of the test runs by RND, REF and ROB, respectively. The best errors were acceptable in all cases but K5D2noisy, which turned out to be cumbersome data for every method. A significant observation is that the use of the robust initialization methods does not seem to increase the running time of the complete clustering process. In five out of the six test cases the average running time of the ROB-initialized clustering process was shorter than or equal to that of REF. On average, REF plus K-means needed approximately 10% more time for convergence on the synthetic data sets than ROB plus K-means. The shortened running times of the ROB-initialized clustering follow from the smaller numbers of clustering iterations. We also investigated whether further enhancements in accuracy and/or running time were attainable by a combination of robust initialization and robust
Table 2. Cluster purity, CPU time and final cluster iterations on the real-world data sets (Iris, derma, pendigits and optdigits) for the methods KKZ+K-s, RND+K-m, REF+K-m, ROB+K-m and ROB+K-s; the rows report purity/mean, purity/min, purity/max(#>99%), time/mean and iter/mean for each data set. The numbers in parentheses express the fraction of the trials in which the purity was higher than 99% of the best value for a particular method.
clustering (ROB+K-s in Table 1). This produced very good results, as the composition outperformed the K-means configurations both in quality and in running time. The average running time was almost four percent shorter for ROB+K-spatmed than for ROB+K-means because of the smaller numbers of clustering iterations. On five data sets the K-spatmed clustering produced smaller average and minimum errors than K-means. It even managed to find a number of good clusters for the cumbersome case K5D2noisy. Using K-spatmed for the final clustering, the ROB initialization was also compared to the KKZ method. KKZ leads to good results only on the two non-contaminated data sets with three and five clusters. It seems that KKZ is very sensitive to noise and outliers; in that case the robust clustering method cannot converge to satisfactory solutions.

4.2 Speed-up and classification error on real data
On the real data sets, we find that REF yielded only small enhancements with respect to RND (see Table 2). On the derma and optdigits data sets the REF initialization actually prevented the K-means algorithm from obtaining the best classifications, which were however obtained by RND. Interestingly, the average time to convergence on the optdigits data is shorter for REF+K-means than for RND+K-means. For the K-means clustering, the ROB initialization outperformed REF on all but the Iris data, on which the methods did not have significant differences. Especially on the large digit data sets, ROB seems to provide K-means with better initial points than REF. This can be seen
Fig. 2. Scalability with respect to the number of clusters on optdigits data.
both in the increased cluster purities and in the reduced running times. It is interesting that ROB outperforms RND even in average running time on the optdigits data. Fig. 2 shows that the robust methods do not differ from REF in scalability with respect to the number of clusters either. No remarkable further improvements were achieved by running the K-spatmed method from the robust initial points, but on the pendigits data the average cluster purity was slightly improved. The equal performance may indicate that the data sets used are relatively free of noise and outliers. KKZ seems to perform better on the real data sets than on the synthetic ones. On the larger digit data sets the KKZ-initialized clustering seems to be comparable with ROB even in running time when K-spatmed is used.
5 Conclusions
In this paper, we have presented a statistically robust initialization method that helps the classic K-means method to improved clustering performance. On most of the data sets, the best results are obtained by the robust initialization. Interestingly, using the fast iterative SOR method for the computation of the robust spatial median estimate, the whole clustering process becomes comparable or even better, not only in quality but also in running time and scalability, with respect to the sample-mean-based method. This is an important result, since the common assumption is that robustness can be obtained only at the cost of increased computation. Although it is robust against the worst solutions, a problem that occurred with the robust initialization
was that in some cases it prevents finding the best results that can be found, for example, by the random initialization. In order to find out more concrete advantages of the robust methods for classification problems, more contaminated data sets should be tested.
Acknowledgement This work was supported by the National Technology Agency of Finland under the project Production2010 (Dnro 3199/31/05).
References
[Ayramo, 2006] S. Ayramo. Knowledge Mining Using Robust Clustering. PhD thesis, University of Jyvaskyla, 2006.
[Bradley and Fayyad, 1998] P.S. Bradley and U.M. Fayyad. Refining initial points for K-means clustering. In Proceedings of the 15th International Conference on Machine Learning, pages 91-99. Morgan Kaufmann, San Francisco, CA, 1998.
[Hand et al., 2001] D. Hand et al. Principles of Data Mining. MIT Press, 2001.
[He et al., 2004] J. He et al. Initialization of cluster refinement algorithms: A review and comparative study. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2004. IEEE.
[Huber, 1981] P.J. Huber. Robust Statistics. John Wiley & Sons, 1981.
[Karkkainen and Ayramo, 2004] T. Karkkainen and S. Ayramo. Robust clustering methods for incomplete and erroneous data. In Proceedings of the Fifth Conference on Data Mining, pages 101-112. WIT Press, 2004.
[Karkkainen and Ayramo, 2005] T. Karkkainen and S. Ayramo. On computation of spatial median for robust data mining. In R. Schilling, W. Haase, J. Periaux, and H. Baier, editors, Proceedings of the EUROGEN 2005 Conference. FLM, TU Munich, Germany, 2005.
[Karkkainen and Heikkola, 2004] T. Karkkainen and E. Heikkola. Robust formulations for training multilayer perceptrons. Neural Computation, 16(4):837-862, 2004.
[Katsavounidis et al., 1994] I. Katsavounidis et al. A new initialization technique for generalized Lloyd iteration. Signal Processing Letters, 1(10):144-146, 1994.
[Little and Rubin, 1987] R.J.A. Little and D.B. Rubin. Statistical Analysis with Missing Data. John Wiley & Sons, 1987.
[MacQueen, 1967] J. MacQueen. Some methods for classification and analysis of multivariate observations. In L.M. Le Cam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 281-297. University of California Press, 1967.
[Newman et al., 1998] D.J. Newman et al. UCI repository of machine learning databases, 1998.
[Tan et al., 2005] P.-N. Tan et al. Introduction to Data Mining. Addison-Wesley, 2005.
Effects of malingering in self-report measures: A scenario analysis approach

Massimiliano Pastore^1, Luigi Lombardi^2, and Francesca Mereu^3
^1 Dipartimento di Psicologia dello Sviluppo e della Socializzazione, Universita di Padova, Via Venezia 8, I-35131 Padova, Italy (e-mail: massimiliano.pastore@unipd.it)
^2 Dipartimento di Scienze della Cognizione e della Formazione, Universita di Trento, Via Matteo del Ben 5, I-38068 Rovereto (TN), Italy (e-mail: luigi.lombardi@unitn.it)
^3 Dipartimento di Psicologia, Universita di Cagliari, Via Is Mirrionis 1, I-09123 Cagliari, Italy (e-mail: franceschina.mereu@tiscali.it)

Abstract. In many psychological questionnaires (i.e., personnel selection surveys and diagnostic tests) the collected samples often include fraudulent records. This confronts the researcher with the crucial problem of biases yielded by the usage of standard statistical models. In this paper we generalize a recent combinatorial perturbation procedure, called SGR (Sample Generation by Replacements; [Lombardi et al., 2004]), to the analysis of structured malingering scenarios for dichotomous data. Combinatorial aspects of the approach are discussed and an application to a simple data set from the drug addiction domain is presented. Finally, the close relationships with Monte Carlo simulation studies are explored.
Keywords: Sample Generation by Replacements, fraudulent data, scenario analysis.
1 Introduction
In some circumstances social desirability biases may drastically limit the validity of self-report measures. In general, faking and demand characteristics represent serious threats to the psychometric validity of both social competence tests and self-report measures of socially undesirable behaviors. In particular, possible fake data confront the researcher with the problem of evaluating the effect of malingering responses on the final statistical results. It is worth mentioning that even in the presence of simple undirected (uniform) malingered data the answer to this problem is not necessarily obvious, since even a random perturbation of the data constitutes biased information which decreases the efficiency of parameter estimates and weakens the accuracy of statistical results.
A case of particular empirical interest is the situation in which a researcher wants to evaluate the impact of structured malingered data in testing a given target model. For example, within a simple dichotomous scenario, we might be interested in studying how the result of an exact Fisher's test applied to a 2 x 2 contingency table varies as a function of different malingering scenarios. A fake-scenario analysis is a methodology for analyzing observed data by considering hypothetical malingering processes, and may be regarded as an additional analysis a researcher can run to broaden the sources of information she/he is interested in. Therefore, fake-scenario analysis is supposed to allow improved decision-making by allowing a more complete consideration of outcomes and their eventual implications. In this paper we propose a simple combinatorial procedure for treating structured fake data in a dichotomous setting. The new procedure extends a recent data generating procedure called SGR (Sample Generation by Replacements, [Lombardi et al., 2004]) developed to provide a perturbation model and a sampling procedure to generate structured collections of perturbations. Section 2 of this paper will first outline the basic principles of the new replacement approach. Section 3 will then present an illustrative application of the SGR approach to the analysis of a small data set on the use of ecstasy in an adolescent population. Finally, Section 4 will discuss the relation of the SGR method to Monte Carlo simulation studies. At the end of that section some possible extensions of the SGR approach are also outlined.
2 The method of replacements
Our procedure implements a combinatorial method that can be applied to discrete data with a restricted number of values (e.g., dichotomous or Likert-type scales) and consists of two different components:

1. a perturbation model,
2. a sampling procedure to generate perturbed samples from a given real data set.

2.1 Basic elements
In many social and psychological surveys the resulting dataset often includes incomplete records (missing data) and/or fake records (fake data). In particular, as regards the dichotomous fake-data problem, we think of the dataset as being represented by a collection of pairs d = {(g_i, y_i) : i = 1, ..., I}, where g_i is a group variable with g_i = k denoting that individual i belongs to group k (k = 1, ..., K), and y_i is a Boolean response variable where y_i = 1 means that individual i gives an affirmative answer to a possibly sensitive target question Q. We may assume that a certain portion of the response vector y is actually fake data. The fake portion y^f of y, together with the uncorrupted
Effects of Malingering in Self-Report Measures
485
portion y” of y, constitutes the full data set, that is to say y = y,fUy”. The exact fake-portion yf of y is assumed to be an unknown parameter and only the number 0 < N 5 I of fake data points in y is supposed to be known. The general idea is the following: in order to analyze the data and provide an uncertainty analysis of some statistic of interest we replace some portions y1,. . . ,Y H of y, each of which contains exactly N elements, with new components yyj.. . , yk such that y;I = l h - Y h for all h = 1 , .. . , H.In the SGR approach these new components are generated from an appropriate population, and, therefore, the complete datasets y;, . . . , y> (with y;L = y;I U yz; h = 1,.. . , H ) , are analyzed. We call the data array y;L and yL the hthperturbed array of y and the hth-replaced portion of y, respectively. Finally, it is worth mentioning that within a dichotomous scenario each perturbed array y i represents a node of the I dimensional Boolean hypercube (0,l)’ having y as its origin. In the Boolean hypercube each perturbed array y i has the same Hamming distance
d(Yl,Y) =
c
IY;L,i - Y i l = N
i
from the original data array y .
2.2 Malingering scenarios
Several malingering scenarios can be proposed according to both the typology of the investigated groups and the sensitivity of the self-report measure. The most elementary malingering scenario can be described by means of the principle of indifference. This principle reflects the fact that in the absence of further knowledge all entries in y are assumed to be equally likely in the process of faking (across the K different groups). In other words, it assumes a random malingering model compatible with uniformly random fake data. In contrast, the availability of external knowledge about the faking process may suggest the modeling of more complex scenarios. For example, in personnel selection some subjects are likely to fake a personality questionnaire to match the ideal candidate's profile (positive impression management or fake-good process, [Ballenger et al., 2001]). Similarly, in the administration of diagnostic tests individuals often attempt to malinger posttraumatic stress disorder (PTSD) in order to secure financial gain and/or treatment, or to avoid being charged with a crime (fake-bad process; [Elhai et al., 2001]). In these latter scenarios it could be reasonable to consider a conditional replacement model, where the conditioning is a function of response polarity (e.g., negative for fake-good and positive for fake-bad). In general we may define a K x 2 probability matrix P = [p_{kj}], where p_{k1} (resp. p_{k2}) denotes the probability that each of the N fake data points in y is associated with an affirmative (resp. negative) response of an individual belonging to group k (k = 1, ..., K). We impose that \sum_{k} \sum_{j} p_{kj} = 1. So, for example, to simulate a uniform random malingering model we set p_{kj} = 1/(2K) (\forall k, \forall j). The system

M = (d, P, N, (y^*_h, h = 1, ..., H))
defines the formal representation of the malingering scenario. M is said to be consistent if the replaced portion y: of y is stochastically consistent with both d and P for all h = 1,.. . , H , otherwise M is inconsistent. Notice that P induces a constrained random path y = XI,x2,.. , X N = yi of length N on the Boolean hypercube (0, l}'. The random path starts from node y and continues through the nodes of the Boolean hypercube by steps satisfying the following constraint
d(X,+l,Y) = d(x,,y)
+ 1,
n
=
1,.. . , N - 1
(1)
where node x, represents a transition node in the path which is governed by the probability matrix P. A malingering scenario M is inconsistent with respect to a final node X N whenever it does not exist a random path linking y and X N according to the probability matrix P and the full data set d. For example, suppose that K = 2, N = 10, and p l l = 1.0 (that is to say, each of the 10 fake data points in y is associated with probability 1.0 to an affirmative response of an individual belonging to the first of the two groups, k = 1). Moreover, suppose also that in the original data d, if gi = 1 then y i = 0 which means that all subjects in the first group gave a negative response. It is straightforward to verify that the above malingering scenario is stochastically inconsistent. Finally, let T be a statistical test and let t = T(d) be its value when the statistic is computed using the original data set d. The main goal of a replacement analysis is the evaluation of some properties of T under the perturbed sample space
generated according to a consistent malingering scenario M. Alternatively, we may also consider the evaluation of T across the nodes of the random walk y = x_1, ..., x_N = y*_h (∀h = 1, ..., H).
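To make the replacement step concrete, the following minimal sketch (ours, not the authors' code) generates one perturbed array y*_h from a dichotomous data vector: it draws N entries to flip, with each entry's selection probability derived from the (group, response-polarity) cells of P, so that the result lies at Hamming distance N from y. The function name, the group encoding, and the uniform spreading of a cell's probability mass over its entries are our assumptions.

```python
import numpy as np

def perturb(y, g, P, N, rng=None):
    """Draw one perturbed array at Hamming distance N from the 0/1 vector y.

    y : (I,) array of 0/1 responses
    g : (I,) array of group labels in {1, ..., K}
    P : (K, 2) probability matrix; P[k-1, 0] is the mass on flipping an
        affirmative (1) response of group k, P[k-1, 1] a negative (0) one.
    """
    rng = rng or np.random.default_rng()
    w = np.zeros(len(y), dtype=float)
    for k in range(P.shape[0]):
        for j, val in enumerate((1, 0)):        # j=0: affirmative, j=1: negative
            cell = (g == k + 1) & (y == val)
            if cell.any():
                # spread the cell's probability mass uniformly over its entries
                w[cell] = P[k, j] / cell.sum()
    if np.count_nonzero(w) < N:
        raise ValueError("inconsistent scenario: no random path of length N")
    idx = rng.choice(len(y), size=N, replace=False, p=w / w.sum())
    y_star = y.copy()
    y_star[idx] = 1 - y_star[idx]               # each flip adds 1 to d(., y)
    return y_star

# Uniform scenario with K = 2 groups: p_kj = 1/4 for all cells.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
g = np.array([1, 1, 1, 1, 2, 2, 2, 2])
print(perturb(y, g, np.full((2, 2), 0.25), N=3))
```

Repeating the call H times yields the family of perturbed arrays over which the statistic T can then be evaluated.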
3 Empirical data example
In this exploratory study we tested the new procedure on a small data-set from a study in the substance abuse domain. The current section is divided into three subsections: the first introduces the empirical data set and the statistical test; the second discusses the use of malingering scenarios to generate a family of perturbed datasets; and the third evaluates the statistical test with respect to the malingering scenarios.
3.1 Original dataset and test statistic
We illustrate the entire procedure using data on the interrelation between gender and ecstasy use. Participants were 22 undergraduate students from a high school in the Sardinia district (Italy). Ages ranged from 18 to 26, with a mean of 22.09 and a standard deviation of 2.15. Data were collected using a single item selected from a survey regarding the use of alcohol and other drugs by adolescents. In particular, the item consisted of a self-report measure of annual ecstasy use, represented by the following question: 'Have you used ecstasy in the last 12 months?' For the purposes of this analysis, annual ecstasy use was considered a dichotomous outcome (at least once = 1 / none = 0). A contingency table summarizing the data is reported below (Table 1). The resulting (22 x 2) data matrix d was subjected to an exact Fisher's test to evaluate the association between gender and ecstasy use. The test statistic was not significant (p = 0.476).
Table 1. Contingency table for data d.
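For readers who want to reproduce the test step, SciPy's fisher_exact suffices; the cell counts below are illustrative only, so the example will not reproduce the published p = 0.476.

```python
from scipy.stats import fisher_exact

# Rows: gender; columns: annual ecstasy use (no, yes).
# Illustrative counts for n = 22 subjects, not the study's actual table.
table = [[10, 2],
         [9, 1]]
odds_ratio, p_value = fisher_exact(table)   # two-sided by default
print(f"Fisher's exact test: p = {p_value:.3f}")
```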
3.2 Modeling malingering scenarios
In the following analyses we supposed that there were no more than a total of nine fake responses in the observed sample (approximately 50% of the sample), and according to this hypothesis we defined three different malingering scenarios:
1. (M1) An undirected uniform malingering model: p_kj = 1/4 (∀k = 1,2, ∀j = 1,2).
2. (M2) An oriented and gender-symmetric malingering model assigning positive probabilities of faking to negative answers only: p_k2 = 1/2 (∀k = 1,2).
3. (M3) An oriented, but gender-asymmetric, malingering model such that p_12 = .60 (males) and p_22 = .40 (females).
In order to evaluate the uncertainty of the statistical test, we generated a family of H = 3000 different perturbed matrices with exactly N replacements, in accordance with the procedure described in Section 2.2. The three scenarios are based on three different probability matrices, each of which represents a different malingering process. According to M1, a simple uniform random malingering model is implemented. It reflects the absence
of further knowledge about the process of faking governing the transition between the original data set and the final perturbed array. Unlike M1, the oriented malingering scenario M2 subsumes a different psychological process: in particular, M2 models the generation of fake-good responses. Finally, M3 also models a fake-good-type process, but unlike M2 it assumes that the probability of faking is asymmetric in the two groups, with more fraudulent responses in the male group than in the female one.
Fig. 1. Exact Fisher's test probabilities as a function of replacement (bubble plots for Scenarios 1-3).
3.3 Results
Figure 1 shows the exact Fisher's test probabilities as a function of N (the number of assumed fraudulent points in the original sample y) for the three malingering scenarios. In its basic form, a large value of the test probability is evidence in favor of the null hypothesis of independence between sex and ecstasy use. A larger circle in the bubble plot indicates a larger size of the equivalence class of perturbed arrays associated with the same contingency table. Figure 2 shows the proportion of significant exact Fisher's tests as a function of N for the three malingering scenarios. The pattern associated with M1 showed that the uniform random malingering scenario was in general less sensitive to replacements than both the oriented malingering scenario (M2) and the asymmetric malingering scenario (M3), the latter being clearly the most sensitive to the number of replacements.
Fig. 2. Proportion of significant exact Fisher's tests as a function of the number of replacements (1-9). Vertical segments represent 95% confidence intervals.
4 Concluding remarks
The reader may have already noticed some similarities between the approach proposed here and standard Monte Carlo experiments, for example the idea of generating new data sets. However, the two approaches are substantially different. Usually a Monte Carlo experiment uses a hypothesized model to generate new data under various conditions (e.g. [Robert and Casella, 2004]); the simulated data are then used to evaluate some characteristics of the model. This, of course, implies that the distribution of the random component in the assumed model must be known, and it must be possible to generate pseudorandom samples from that distribution under the conditions planned by the researcher. Instead of using the hypothesized model structure to generate simulated data sets, our approach uses the original data sample to generate a new family of data sets. In particular, these new data sets are obtained by adding structured perturbations to the original data set. The availability of external knowledge about the faking process may suggest the modeling of highly structured malingering scenarios. In these more complex scenarios it could be reasonable to consider conditional replacement models, where the conditioning is a function of some response polarity (e.g., fake-good or fake-bad). In the latter case, each new sample represents an alternative malingering scenario which is directly derived from both the original sample
and the assumed malingering model. Next, the result of a target criterion can be compared with the ones obtained from the perturbed samples. Several possible extensions of our approach may be considered. In the present paper, under the assumption of different malingering scenarios, a very simple SGR model has been proposed as a model for Boolean data. However, the current approach can be straightforwardly extended to categorical data as well as to continuous data. In particular, an SGR model for continuous data would imply a different kind of metric, for example either the city-block distance (L1) or the standard Euclidean distance (L2). These new extensions would enlarge the general replacement schema by adding more complex constraints with which we could provide more structured perturbed scenarios.
References
[Ballenger et al., 2001] J.F. Ballenger, A. Caldwell-Andrews, and R.A. Baer. Effects of positive impression management on the NEO Personality Inventory-Revised in a clinical population. Psychological Assessment, pages 254-260, 2001.
[Elhai et al., 2001] J.D. Elhai, S.N. Gold, A.H. Sellers, and W.I. Dorfman. The detection of malingered posttraumatic stress disorder with MMPI-2 fake bad indices. Assessment, pages 221-236, 2001.
[Lombardi et al., 2004] L. Lombardi, M. Pastore, and M. Nucci. Evaluating uncertainty of model acceptability in empirical applications: A replacement approach. In K. van Montfort, J. Oud, and A. Satorra, editors, Recent Developments on Structural Equation Models, pages 69-82. Kluwer, Dordrecht (NL), 2004.
[Robert and Casella, 2004] C.P. Robert and G. Casella. Monte Carlo Statistical Methods (second edition). Springer-Verlag, New York, 2004.
The Effect of Agreement Expected by Chance on some 2x2 Agreement Indices
Teresa Rivas-Moya
University of Málaga, Campus de Teatinos, 29071 Málaga, Spain (email: [email protected])
Abstract. Distinct coefficients can give the degree of agreement between two raters. Some of them, such as π, κ, p and AC1, use the same concept of proportion of observed agreement, but they use different measures of Agreement Expected by Chance (Pc).
This study analyses the effect of Pc on agreement indices when two raters classify subjects in two categories. Several examples are shown in which the degree of agreement and the respective Pc are calculated. These show that if the same number of observed agreements goes from being equally distributed - in both categories - to being concentrated in only one category: (1) π and κ decrease and their respective Pc increases, (2) p and its Pc remain constant, and (3) AC1 increases
and its Pc decreases. Key Words: Reliability, Agreement, Marginal Symmetry, Agreement Expected by Chance
1. Introduction
Several indices have been defined to analyse the degree of agreement between two raters when they classify a group of n subjects in two categories. κ (Cohen, 1960) is one of the most frequently used indices:

$$\kappa = \frac{P_o - P_c}{1 - P_c}$$

where Po = Σ(n_ii / n) is the proportion of observed agreement and Pc = Σ(n_i. n_.i / n²) is a specific measure of the
proportion of Agreement Expected by Chance. Different authors have analysed, from different perspectives, the importance of the influence of Marginal Asymmetry (MAS) - or Marginal Symmetry (MS) - and Pc on κ. Based on the fact that Pc is obtained from the Marginal Frequency Distributions (MFD), the study of the influence of MAS on κ has led some authors to define other indices to analyse the agreement. Examples are π (Scott, 1955), p (Brennan and Prediger, 1981) and AC1 (Gwet, 2001). These indices maintain the κ expression, i.e., equal Po but different Pc.
Gwet (2002a) describes the conceptual limitations of κ and π in 2x2 tables, when these indices are compared with AC1. He concludes that (1) in many cases κ and π can give erroneous results, especially when the sum of the marginal proportions is very different from 'one', and (2) AC1 gives more stable results. Gwet (2002b, p. 9) also concludes that κ and π are more affected by the global propensity of positive classification - which he defines as
P+ = (p1. + p.1)/2 - than by the differences in the Marginal Proportions; κ and π yield reasonable results when P+ is close to 0.5.
In general, MAS and Pc influence κ. MAS does not always have great influence on Pc. A study of how to analyse the influence of MAS on the κ value is shown in Rivas (2005) and Rivas and Gonzalez (2007). As part of these studies several examples are shown, such as MAS in observed agreements having no influence on the Pc or κ values. However, the same number of observed disagreements, not equally distributed over the two categories, has hardly any influence on Pc, while it has a varying influence on the MAS and κ values in the examples shown. This paper analyses several examples in which (1) total observed agreement (Po) and the disagreements are constant, and (2) the number of agreements in each category is different for each example. Across the examples, the agreements go from being equally distributed - in both categories - to being concentrated in only one category. In all the examples MS is analysed. The results obtained with different indices of agreement (κ, π, p, AC1) and their respective Pc are shown and discussed.
2. Method
Based on the classification given in Table 1, the different indices (κ, π, p, AC1) and their respective Pc are shown in Table 2.

                        Rater B
                        Category 1   Category 2   Total Row
Rater A   Category 1    n11          n12          n1.
          Category 2    n21          n22          n2.
Total Column            n.1          n.2          n

Table 1. Classification of two raters in two Categories
Index   Agreement Expected by Chance (Pc)
π       Pc = ((n1. + n.1)/(2n))² + ((n2. + n.2)/(2n))²
κ       Pc = (n1. n.1 + n2. n.2)/n²
p       Pc = 1/k, with k the number of categories
AC1     P* = (n1. + n.1)/(2n);  Pc = 2 P*(1 - P*)

Table 2. Index and Agreement Expected by Chance
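The formulas of Table 2 translate directly into a few lines of code. The sketch below is our illustration (function and variable names are ours); applied to Example 1 of the next section, it returns π ≈ κ ≈ p ≈ AC1 ≈ 0.667, with the Pc values reported later in Table 5.

```python
def agreement_indices(n11, n12, n21, n22):
    """Po, Pc and the four chance-corrected indices for a 2x2 table."""
    n = n11 + n12 + n21 + n22
    po = (n11 + n22) / n                       # proportion of observed agreement
    n1_, n2_ = n11 + n12, n21 + n22            # row totals (rater A)
    n_1, n_2 = n11 + n21, n12 + n22            # column totals (rater B)

    pc_pi = ((n1_ + n_1) / (2 * n)) ** 2 + ((n2_ + n_2) / (2 * n)) ** 2
    pc_kappa = (n1_ * n_1 + n2_ * n_2) / n ** 2
    pc_p = 1 / 2                               # 1/k with k = 2 categories
    p_star = (n1_ + n_1) / (2 * n)
    pc_ac1 = 2 * p_star * (1 - p_star)

    def corrected(pc):                         # common (Po - Pc)/(1 - Pc) form
        return (po - pc) / (1 - pc)

    return {"Po": po,
            "pi": (corrected(pc_pi), pc_pi),
            "kappa": (corrected(pc_kappa), pc_kappa),
            "p": (corrected(pc_p), pc_p),
            "AC1": (corrected(pc_ac1), pc_ac1)}

print(agreement_indices(25, 4, 6, 25))   # Example 1 below
print(agreement_indices(49, 4, 6, 1))    # Example 4 below
```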
3. Data
Table 3 shows four classification tables in which the proportion of (1) total observed agreement (Po) and total observed disagreement are equal, and (2) agreement in each category is different in each example. In Examples 1-4 (Ex. 1-4), two raters (A, B) classify 60 subjects into one of two categories. The total number of agreements between raters, n11 + n22 (50), is distributed differently (25, 25), (40, 10), (45, 5), (49, 1) over the two cells associated with the agreements. The total number of disagreements, n12 + n21 (10), is distributed equally over both categories (4, 6) in the two cells associated with the disagreements (see Table 3).
Example 1
                        Rater B
                        Category 1   Category 2   Total Row
Rater A   Category 1    25           4            29
          Category 2    6            25           31
Total Column            31           29           60

Example 2
                        Rater B
                        Category 1   Category 2   Total Row
Rater A   Category 1    40           4            44
          Category 2    6            10           16
Total Column            46           14           60
Example 3
                        Rater B
                        Category 1   Category 2   Total Row
Rater A   Category 1    45           4            49
          Category 2    6            5            11
Total Column            51           9            60

Example 4
                        Rater B
                        Category 1   Category 2   Total Row
Rater A   Category 1    49           4            53
          Category 2    6            1            7
Total Column            55           5            60

Table 3. 2x2 Tables, Examples 1-4
4. Results
The results of the study of MAS are obtained based on Uebersax (2003) and Rivas and Gonzalez (2007); they are shown in Table 4. Columns 3-4 and 5-6 show the absolute frequency and marginal proportion distributions. On a descriptive level, MS is shown in Figure 1.

Fig 1. Plot of Marginal Proportion Distribution (Ex. 1-4)
Column 7 in Table 4 shows the McNemar statistic and its statistical significance when the hypothesis of MS is tested. There are no significant differences between raters when they classify subjects in both categories in Ex. 1-4 (p = 0.527). Thus, there is MS.
Ex.  Category   Frequency (Rater A, Rater B)   Proportion (Rater A, Rater B)   χ² (p)
1    1          29, 31                         0.483, 0.517                    0.400 (0.527)
     2          31, 29                         0.517, 0.483
2    1          44, 46                         0.733, 0.767                    0.400 (0.527)
     2          16, 14                         0.267, 0.233
3    1          49, 51                         0.817, 0.850                    0.400 (0.527)
     2          11, 9                          0.183, 0.150
4    1          53, 55                         0.883, 0.917                    0.400 (0.527)
     2          7, 5                           0.117, 0.083

Table 4. Marginal Proportion and Frequency Distribution and McNemar Test

Neither observed agreement nor observed disagreement influences MAS. However, observed agreement and/or disagreement influences the Pc values.
Study of Po, Pc and κ, π, p, AC1. The Po and Pc values for each agreement index in Ex. 1-4 are shown in Table 5.

Agreement Index   Ex. 1    Ex. 2    Ex. 3    Ex. 4
Po                0.8333   0.8333   0.8333   0.8333
π                 0.6666   0.555    0.3999   0.0739
Pc (π)            0.5000   0.625    0.7222   0.8200
κ                 0.6670   0.5562   0.4010   0.0769
Pc (κ)            0.4994   0.6244   0.7217   0.8194
p                 0.6666   0.6666   0.6666   0.6666
Pc (p)            0.5000   0.5000   0.5000   0.5000
AC1               0.6666   0.7339   0.7692   0.7967
Pc (AC1)          0.5000   0.3750   0.2778   0.1800

Table 5. Po, Pc and agreement indices
Ex. 1-4 show (1) the same Po = 0.8333; (2) Pc is similar for π (rows 3-4, Table 5) and κ (rows 5-6, Table 5). This is shown in Figures 2 and 3.
Fig. 2: Plot of κ and Pc
Fig. 3: Plot of π and Pc
(3) p and its Pc (rows 7-8) remain constant. In addition, in all of these examples, independently of the MFD, Pc = 0.5 and p = 0.6666, as can be seen in Figure 4. (4) AC1 increases and Pc decreases when Po tends to be concentrated in one of the two categories (rows 9-10, Table 5), as can be seen in Figure 5.
Fig. 4: Plot of p and Pc
Fig. 5: Plot of AC1 and Pc
5. Discussion
There is no single definition of Pc. MAS and Pc have influence on agreement indices. However, if there is MS, it is still possible for agreement indices to assume different values, due only to their different Pc. Therefore, a study of MAS and Pc and their influence on the agreement index is always recommended.
In Ex. 1-4, MS is satisfied and only Pc influences the different indices κ, π, p, AC1, which show different values. If the same number of observed agreements goes from being equally distributed - in both categories - to being concentrated in only one category: (1) π and κ assume similar values; π and κ decrease and their respective Pc increases; (2) p and its Pc remain constant; Pc assumes a constant value which depends on the number of categories, and not on the number of subjects assigned to each category; and (3) the reverse is the case for AC1 in relation to π and κ: AC1 increases and its Pc decreases. Other examples can be given of the influence of MAS on Pc and the influence of both on agreement indices; in such cases, results would be different. Therefore, in practice it is recommended that not only the value of the agreement index be reported, but also the study of MAS and the Pc associated with the agreement index. In this way, more detailed information regarding observed agreement and agreement expected by chance will be given.
References
[Brennan and Prediger, 1981] R.L. Brennan and D.J. Prediger. Coefficient kappa: Some uses, misuses and alternatives. Educational and Psychological Measurement, 41: 687-699, 1981.
[Cohen, 1960] J. Cohen. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20: 37-46, 1960.
[Gwet, 2001] K. Gwet. Handbook of Inter-Rater Reliability. Gaithersburg: Stataxis, 2001.
[Gwet, 2002a] K. Gwet. Kappa statistic is not satisfactory for assessing the extent of agreement between raters. Statistical Methods for Inter-Rater Reliability Assessment, 1: 1-6, 2002a.
[Gwet, 2002b] K. Gwet. Inter-rater reliability: Dependency on trait prevalence and marginal homogeneity. Statistical Methods for Inter-Rater Reliability Assessment, 2: 1-9, 2002b.
[Rivas, 2005] T. Rivas. The 2x2 Kappa Coefficient and the Condition of Symmetry of Marginal Distributions. Paper presented at the 8th European Conference on Psychological Assessment, Budapest, September 2005.
[Rivas and Gonzalez, 2007] T. Rivas and M-J. Gonzalez. The effect of asymmetry on the 2x2 kappa coefficient: Its application to the study of learning disabilities. Learning Disabilities: A Contemporary Journal, 5: 59-76, 2007.
[Scott, 1955] W.A. Scott. Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19: 321-325, 1955.
[Uebersax, 2003] J. Uebersax. McNemar tests of marginal homogeneity. In J. Uebersax, editor, Statistical Methods for Rater Agreement, 2003. Available at http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Qualitative Indicators of Libraries' Services and Management of Resources: Methodologies of Analysis and Strategic Planning
Aristeidis Meletiou
Technical University of Crete, 73100 Chania, Greece ([email protected])
Abstract. In recent years the amount of data concerning libraries' management has been increasing rapidly, which makes the analysis that supports decision-making about service improvement very difficult. This paper describes a new methodology for the quality analysis of Greek libraries. The goal is to recognise the current state of resource management and to achieve effective and efficient decision-making and strategic planning. The study examines library operation, the services offered to users, the management of the library's internal procedures, human resources, and cost-effective budget allocation. The efficient use and analysis of all types of data involved in a library's operation, the use of evaluation indexes for the quality of library services, and the combination of their results yield very important conclusions about a library's structure, and this supports efficient strategic planning in any situation. The term "structure" of a library includes not only the internal organization of the services offered to users, but also the organization of all the procedures in the library, the efficient management of human resources, and the efficient planning and management of budget allocation. This paper describes ways of mining data and knowledge from the large amount of operational data used in the services and organization of modern libraries. The purpose is to extract useful conclusions from the data mining procedure that will improve all offered services and the efficient organization and operation of modern libraries (e.g. improvement of the collection's material, improvement of services for specific types of users, strategic planning of efficient budget allocation for material acquisition). Implementations of this methodology refer to the evaluation of a library's collection in connection with the material's subject categories, the departments served, the type of resources, the type of users and their study interests. Another implementation of the method refers to the effects that can appear in case of a disproportionate fluctuation of one part of the library's collection (e.g. growth of electronic resources in relation to printed resources, or growth of one subject category in comparison with others). These effects sometimes influence not only the library's internal organization but also the services offered to users and the staff's practice and knowledge of new activities.
Keywords: Human Resources Management, Data Mining, Evaluation Indexes, Strategic Planning
1. Introduction
It is a fact that the amount of information stored daily in the databases used in a library is increasing rapidly. It is therefore obvious that the volume of functional data (data that are used or can be used) existing in a library is huge and growing continuously. At the same time we should not forget that data (which may refer to the same object) are usually stored in more than one database, which immediately shows the degree of heterogeneity of the information in question (many databases with different elements each, often concerning the same object). The problem therefore lies in concentrating this information in such a way as to allow easy and efficient analysis, aiming at the extraction of useful conclusions that lead to decision-making (e.g. administrative, economic). The technology that allows this is data and knowledge mining from databases. To give a more thorough definition, data and knowledge mining technology (Kao, Chang and Lin 2003) is a process through which implied knowledge is discovered in large databases. This technology has the ability to reveal hidden relations, hidden correlations, hidden models and tendencies in data stored in traditional ways in large databases; in other words, it reveals relations between data that exist but are not obvious and clear (Han and Fu 1999; Hirota and Pedrycz 1999). This article describes this process and proposes methodologies for exploiting techniques applied in areas other than libraries (e.g. economic, commercial, medical, technological) on data that concern libraries and the sectors in which they are active.
2. General description of the process of knowledge exploration
In this section we present a general description of the process of knowledge exploration from data, with a short description of the different steps that compose it.
Picture 1: The procedure of Data and Knowledge Mining (data choice, cleaning, enrichment, coding, knowledge mining, presentation of results, with feedback between the stages)
As the model shows (Picture 1) (Dunham 2004), the process of knowledge exploration is separated into six stages. Even if one may suppose that these stages are independent, in practice the process can turn back one or more steps. For example, in the "coding" phase it may be realised that "data cleaning" was not performed satisfactorily and the system turns two levels back, or at the stage of "application of knowledge-exploration models" new data may be discovered that have to be used to enrich the already existing data. In the "data choice" stage we collect the functional data required for the mining and knowledge-discovery process. These data can come from many different and heterogeneous data sources (databases, information files, etc.). In the next stage the data are "filtered" to remove unnecessary information, data errors and pollution, phenomena that considerably influence the process of exploring new information and often lead to wrong results. Erroneous data can be corrected or removed, while incomplete data can be estimated and supplemented. To filter out data pollution, knowledge-exploration systems usually use data mining and pattern recognition techniques, or simpler solutions such as deleting records that carry little information, checking attribute ranges, and replacing invalid values with null. We want to emphasize that in the process of knowledge exploration it is not a problem to remove a very small percentage of polluted records, as the final conclusions will not be influenced considerably. On the contrary, using these records can cause errors in our conclusions.
In any case, in many knowledge-exploration settings we are also interested in the pollution itself, for example the percentage of users who refused to give their real personal information. In the "enrichment" stage the exploration data are enriched with "external data". These data can supplement the existing ones, and their purpose is to contribute to a more efficient data analysis and to better results (e.g. accurate information on some category of analysed objects, statistical measurements on small samples of given data, information that was not imported at the initial stage and is considered important). In the stage of "coding and transformation of data" certain transformations take place that make the knowledge-exploration process more efficient. Data coming from different sources need to be converted to a common form for further analysis (e.g. fields with different names that describe the same information). Also, some data need to be coded or converted into useful forms (e.g. transforming date of birth into age, dividing financial sums by 1000, transforming "yes/no" values into "1/0"). Another option is data reduction, in order to reduce the number of useful and valid data values that will be taken into consideration. In the next stage the data are explored by applying models and techniques of knowledge mining. Through them the information related to the correlation of the data is obtained, from which the useful information of the whole process is produced. Indicatively, knowledge resulting from the application of these models and techniques can be:
- Association rules: the extraction of rules and information connecting various objects of interest to the user (data miner), for example "60% of the customers who buy car magazines also buy sports magazines" (see the toy sketch at the end of this section).
- Classification: the classification of data based on the values of specific parameters, for example classification of cars based on their consumption or on the number of kilometres.
- Clustering: data are categorized into a small number of groups with certain common characteristics, for example categorization of the ways a product is promoted depending on the customs and incomes of populations in different regions.
- Trend and deviation analysis: the discovery of the main trends and deviations in all analysed data, for example discovery of stocks that are more or less profitable (compared with the mean) on the Stock Exchange.
- Pattern analysis: the discovery of concrete patterns in the data, determined by the observer (data miner), for example discovery of the most popular paths a visitor follows in a web site.
In the last stage the results produced in the previous stage are presented to the observer in a way that helps him grasp their real meaning and interpret them effectively. The purpose is to help him in decision-making procedures.
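As a toy illustration of the association-rules item above (the transactions and the rule are invented for the example), support and confidence can be computed directly from transaction records:

```python
# Toy association-rule check: "users who borrow car magazines also borrow
# sports magazines". Each transaction lists the categories one user borrowed.
transactions = [{"cars", "sports"}, {"cars", "sports", "news"},
                {"cars"}, {"news"}, {"sports", "news"}]

antecedent, consequent = {"cars"}, {"sports"}
n_ante = sum(antecedent <= t for t in transactions)
n_both = sum((antecedent | consequent) <= t for t in transactions)

support = n_both / len(transactions)          # 2/5 = 0.40
confidence = n_both / n_ante                  # 2/3, i.e. about 0.67
print(f"support={support:.2f}, confidence={confidence:.2f}")
```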
How the results are presented is very important, because their usefulness mainly depends on this presentation. For this purpose various visualization strategies (optical presentation of data) are used (e.g. graphic representations such as pies, histograms, 3D presentations), as it is a fact that "one picture is worth a thousand words".
3. The process of knowledge mining on data related to libraries
If we want to apply the procedure described above to libraries, we can restate it focused on their character and needs. The purpose is to show clearly how knowledge-mining methods are used in libraries.
3.1. Choice of data to be imported and analysed
First we have to decide which data (among those stored in the corresponding databases of the library) will be imported into the data and knowledge mining process. Describing the operational data that can be used in this process, and specifically in the data selection stage, the most important are:
3.1.1. Data from the OPAC
Data collected here relate to the type of books (title, category/thematic collection) the user is searching for. Note that these data are not related to the books that the user finally borrows, as he may search for many books before borrowing one of them. Typical values that could be selected for analysis are: date of search, book's ID, book's title and the category to which it belongs. All these data can easily be found in the query log files of the OPAC.
3.1.2. Circulation data for users and data about material's (printed or electronic) usage
Data collected here relate to the identity, category and department of the user, which books he borrowed, how long he kept the material, how many times a book was borrowed in a specific period, how many times the book was not available for loan and was tagged for reservation, which and how many books were "late" (the user did not return them in time), how many loans each user made, and how many "in-time" and "late" book returns each user made. Typical values that could be selected for analysis are: book's ID, ID and category of user (e.g. undergraduate, postgraduate, professor), user's department, number of in-time and late returns per user, how many and which books he borrowed, the book's subject, the number of loans for each book,
the number of reservations made for each book, and how many times each book was returned "in-time" or "late". All these data can easily be found in the log files of the circulation computer system. For electronic material (e-books), typical values that could be selected for analysis are the ID, title and subject of the book, the ID and category of the user (e.g. undergraduate, postgraduate, professor) who read it, and the user's department. All these data can easily be found through the computer system serving the electronic books (e.g. log files from the corresponding e-book servers).
3.1.3. Log files from Web Servers
Data collected here relate to the paths followed by a user in one visit to the library's website. If we name the web pages of the site (e.g. A, B, C, D, ...), for each user's visit we obtain the precise path followed (e.g. if from page A he moves to C, then to D and then to B, the precise path of his visit can be written as ACDB). Using these data we can find the most popular destinations and the most popular paths in a website. All these data can be found in the log files of the library's Web Servers, and as a result we obtain a table with one column in which each line indicates the path of one visit to the library's website.
3.1.4. Collection's journals
Data collected here relate to the journals (printed or electronic) composing the library's journal collection. Typical values that could be selected for analysis are: journal's title, who ordered it (person, department), its cost, its type (printed, electronic), its subject, the supplier, and the period during which it belongs (or belonged) to the collection (e.g. for 2 years, after which it was discontinued). Another important additional value could be a number indicating the journal's usage (how many times it was used for study by a user). For printed material this value could be obtained from the OPAC (if journal loans are allowed) or from a form kept in the journal which every user fills in. If the journal is electronic, this value can be obtained from the log files of the Web Servers hosting the e-journals.
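A minimal sketch of the path analysis described in 3.1.3, assuming each visit has already been reduced to a string of page labels such as 'ACDB' (the data below are invented):

```python
from collections import Counter

# One string per visit, each letter a page, as described above (toy data).
visits = ["ACDB", "AC", "ABCE", "ADC", "AEC", "ACDB"]

path_counts = Counter(visits)
page_counts = Counter(page for v in visits for page in v)

print(path_counts.most_common(3))   # most popular full paths
print(page_counts.most_common(3))   # most popular individual pages
```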
3.1.5. Interlibrary loan
Data collected here relate to interlibrary loans of material, such as the group of users who asked for the loan (e.g. students, professors), the material's source (where the material was found, i.e. the supplier), how much time was required to receive the material, and the acquisition cost. Typical values that could be selected for analysis are: ID and category of user (e.g. undergraduate, postgraduate, professor), user's department, the supplier from whom the material was requested, the time (e.g. in days) that passed until receipt, and the final cost.
By using and comparing the time required to receive the material from the supplier, we can define a very useful indicator that shows the supplier's convenience. Moreover, the value "cost / time" of receipt indicates globally the quality of the supplier from whom we request the loaned material. This indicator can be applied not only in the interlibrary loan procedure (to characterize the supplier from whom we request material) but also to books, journals or any other material that a library acquires from suppliers. In this case, the time a supplier needs to finalize an order and send the material (that is, how long after it was ordered it will be received by the library), in connection with the total cost of the material, indicates quantitatively the quality (and the convenience) of the supplier. This fact can lead to decisions like "we do not select supplier A for books in the sector of Mechanics but we select supplier B because it is more convenient", by comparing the cost/time values for the Mechanics books acquired from both suppliers in the past. So we define a new indicator, named the "Supplier's Convenience Indicator", which reflects the response time of a supplier to an order the library places for a type of material it wants to acquire. This indicator comes from the ratio "response time of the supplier / final cost of material" and means the following: if supplier A offers the material at cost X in time T1 and supplier B the same material at cost Y in time T2, then we define the indicator as L = T1 * (X/Y). If L < T2, then supplier A is "more convenient" than B; if L > T2, then supplier B is "more convenient" than A. According to this indicator, we present a simple example: suppose that supplier A delivers to the library a type of material (e.g. a book) at cost 50€ two months after the initial order, and supplier B the same material at cost 40€ after three months. Applying the defined indicator with T1 = 2 months, T2 = 3 months, X = 50€ and Y = 40€ gives L = T1 * (X/Y) = 2 * (50/40) = 2.5. This means that supplier A is "more convenient" than supplier B (a small code sketch of this comparison is given after Section 3.1.7).

3.1.6. Cost of material
Data collected here relate to the cost of material (printed or non-printed), such as the cost of books, journals (printed, electronic), audiovisual material, electronic subscriptions and electronic books (e-books). Typical values that could be selected for analysis are: title of the material (e.g. title of a book), type of material (e.g. electronic book), date of acquisition, its supplier and its subject.

3.1.7. Parameters received from the Institution
These parameters are received from the Institution (e.g. the Central Administration Office of the University). Typical values are: total number of the University's members per category (e.g. professors, undergraduate students, postgraduate students) and per department, number of departments in the University, number and type of
courses of each department, and the subjects of each department (e.g. Physics, Mathematics, Ancient Greek, Latin), which also relate to the subjects of the library's collection. In fact, all the above parameters are often taken as criteria in the decision-making procedure.
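As announced at the end of Section 3.1.5, here is a minimal sketch of the supplier comparison; the function name is ours and the figures reuse the worked example above.

```python
def more_convenient(t1, x, t2, y):
    """Compare suppliers A (time t1, cost x) and B (time t2, cost y).

    L = t1 * (x / y); A is "more convenient" when L < t2, B when L > t2.
    """
    L = t1 * (x / y)
    return "A" if L < t2 else "B" if L > t2 else "tie"

# Example: A delivers at 50 EUR in 2 months, B at 40 EUR in 3 months.
print(more_convenient(2, 50, 3, 40))   # L = 2.5 < 3, so supplier A
```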
3.1.8. Users' Questionnaires
The data collected here are results from questionnaires completed by the library's users and inserted into a database (e.g. Excel or Access). They are usually related to users' satisfaction with a service provided by the library; a quantitative value is thus reported to express this satisfaction. Typical values that could be selected for analysis are: name of the service, rating of the degree of satisfaction with the service (e.g. satisfied, very satisfied), percentage or number of users in each rating (e.g. 30% (or 120 undergraduate students) answered that they are very satisfied), category to which the user belongs (e.g. postgraduate students), user's department, user's date of birth, user's year of study (for students).
3.2. Cleaning (filtration) of data
In the next stage the data must be "cleaned", to avoid using unnecessary or "polluted" data that could change and influence our results. The most frequent sources of data pollution are:
- Duplication of records in the database because of typing errors, or a change of a variable's value in only one database and not in all databases where this variable is stored (e.g. a user's address was not changed in all the databases in which it was inserted).
- A user's refusal to provide all the necessary information about himself, or insertion of wrong information, for example a wrong address or name.
- Insertion of polluted elements in some fields of the database (for example values of the form 1111111).
It is therefore necessary to correct or remove these data, so that the final data are clean of erroneous information that has no particular meaning and could influence the validity of the results.
3.3. Enrichment of data
In the "data enrichment" stage further information about the used data can be inserted, for example answers from users' questionnaires or indicators from library associations.
3.4. Data Coding
In the "data coding" stage we can convert the data into a specific form: for example, all dates into a single format (e.g. 30/12/2006), sex (man/woman) into a specific form (e.g. M for man, W for woman), and answers from questionnaires about users' satisfaction with a library service into a specific form (e.g. 5: very satisfied, 1: not satisfied at all).
This means that we substitute the data values with one specific form wherever more than two different forms exist.
3.5. Knowledge mining - Application of models / techniques
In this stage we apply methods, models and techniques of knowledge mining in order to obtain the final information. These methods require special types of data structures and special types of algorithmic approaches (Dunham 2004). The framework of this paper does not include a detailed analysis of these knowledge-mining methods; it simply refers to some of the most usual ones. The purpose of all these methods is the creation of a model based on the inserted data; that is, the modelling process creates the model by searching in the data. The methods and techniques that are widely used are:
- Classification of data into categories (groups) according to some characteristic. Here algorithms are applied based on statistics (regression, Bayesian categorisation), on distance (each element belonging to a category can be considered closer to elements of the same category than to elements of other categories), on decision trees (creation of a tree in order to model the categorisation process), on neural networks, on rules (e.g. if-then rules), and on combinational techniques.
- Clustering, which is similar to classification (in both cases the data are organised into groups). The basic difference is that the groups in clustering are not pre-defined but arise through the process. Essentially, clustering is achieved by finding similarities between data according to some of their characteristics. Here hierarchical (agglomerative) algorithms are applied, special algorithms for dynamic (continuously changing) databases, and algorithms for categorical (non-numerical) characteristics.
- Creation of correlation (association) rules, where the data are analysed in such a way that correlations between them are produced by examining their concrete characteristics (e.g. a correlation that may exist between a book's loan time and the user's category). Here algorithms of sampling, partitioning and data parallelism are applied.
The methods described above are applied through the software used for the knowledge-mining process. The detailed way they operate is too complicated to describe in this article. Examples of such software are "Clementine" (SPSS) and "Darwin" (Oracle), while new products in this particular sector continuously appear on the market. From all this data processing and analysis, information can result that is not obvious at the beginning, but only after the correlation of the processed data.
This means that information was "hidden" in the initial data, and the application of mining techniques and methods was needed for it to be revealed. We therefore report, indicatively, some information that can result from the data described in Section 3.1 and that presents very useful knowledge to the observer: From the analysis of users' "loan data" and of data about "material's (printed or electronic) usage" we can find correlations between characteristics related to "the number of reservations" and "the number of loans" of a book. So we can speak of a "factor of reservations related to the number of loans over a period", which shows the degree of direct availability of the book. We therefore define the indicator of "direct availability" of a book as follows: Indicator of Direct Availability (DAD) = (number of applications for reservation) / (number of loans). The larger this indicator, the higher the demand for the book title. E.g. if over a period of one year "book A" was borrowed 10 times and in the same period 5 applications for reservation were made, this book was not available 5 times when it was asked for, and for this reason reservations were made. Therefore the DAD indicator is 5/10 = 0.5. If we increase the number of copies of this book, the indicator will improve, as there will not be so many reservations because the title will be more available. Using the "direct availability" indicator we can set a threshold: if the indicator exceeds this threshold, it is good practice to increase the copies of this book in order to increase its availability to the users (a small code sketch of this rule is given at the end of this section). At the same time we can find information about groups of users that borrow specific categories of books (or other material) more frequently (e.g. second-year students of department A borrow books related to one specific subject more frequently than students of other departments). From the analysis of "data from log files of Web Servers" we can draw direct conclusions about the most popular pages of the website and about the most popular paths followed by users during a visit. This can help improve the overall planning of the website. E.g. if we realise that some of the most popular paths are "A,B,C,E", "A,D,C" and "A,E,C", we can create a direct link from page A to page C in order to make access between pages A and C easier and faster, as we found that they are two very popular pages. Finally, one more sector where knowledge-mining technology can be applied, after the data processing described above, concerns decision-making about the economic planning and budget allocation of a library across the individual academic departments. This task is very important and complicated. The most important parameters in this task are (Graves 1974; Kao, Chang and Lin 2003): (1) the size of the department, (2) the number of students, (3) the cost of library material, (4) the appropriateness of the collection in relation to the subjects studied in the department, (5) the number and type of
the department's courses, (6) the research activities of the department, (7) the rate of spending of the budget given to the department in past years, and (8) the circulation statistics. Techniques for the analysis of these parameters have already been proposed (such as the ABAMDM algorithm (Acquisition Budget Allocation Model via Data Mining); Graves 1974; Kao, Chang and Lin 2003) concerning budget allocation for material acquisition in libraries that serve many departments. Therefore, by applying knowledge-mining techniques to the above data, we can export results that help decision-making in budget allocation tasks.
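Returning to the direct-availability indicator (DAD) defined earlier in this section, here is a minimal sketch of the flagging rule; the threshold and the circulation records are invented for the example.

```python
def direct_availability(reservations, loans):
    """DAD = reservations / loans over the same period (higher = less available)."""
    return reservations / loans if loans else float("inf")

# Toy circulation records: (book, loans, reservation requests) over one year.
books = [("book A", 10, 5), ("book B", 20, 2), ("book C", 4, 6)]

THRESHOLD = 0.4   # assumed policy value, not from the paper
for title, loans, reservations in books:
    dad = direct_availability(reservations, loans)
    flag = "buy more copies" if dad > THRESHOLD else "ok"
    print(f"{title}: DAD = {dad:.2f} -> {flag}")
```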
3.6. Presentation of results - decision making
Software tools related to knowledge mining have many options for presenting the results of the process in such a way that a lot of useful and detailed information can be obtained. The final user of these systems is presented with screens containing graphic representations, tables, histograms and many types of visualized information, capable of giving him a very good and effective report of the results (e.g. graphical representations of the data and their correlations).
4. Conclusions
This article underlines the imperative need for data processing in the library sector, with the purpose of obtaining useful information. It describes the process of knowledge exploration and the steps followed in order to obtain useful information (which is initially not obvious) from the data. It also presents the categories of data used in libraries that can be processed using knowledge-mining models. It was described how the general process and steps followed in knowledge mining can be applied to cases related to libraries, and the usefulness of the resulting information was emphasized through examples. This means that knowledge mining on libraries' data is very important for obtaining useful and not initially obvious information, and for the decision-making and strategic planning that lead to the improvement of organization and offered services.
References
1. Dunham, Margaret H. 2002. Data Mining: Introductory and Advanced Topics. Prentice Hall.
2. Greaves, F.L., Jr. 1974. The allocation formula as a form of book fund management in selected state-supported academic libraries. Florida State University, unpublished doctoral dissertation.
3. Han, J. and Fu, Y. Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge and Data Engineering 11(5) (1999): 798-805.
4. Hirota, K. and Pedrycz, W. Fuzzy computing for data mining. Proceedings of the IEEE 87(9) (1999): 1575-1600.
5. Kao, S.-C., Chang, H.-C. and Lin, C.-H. Decision support for the academic library acquisition budget allocation via circulation database mining. Information Processing and Management 39 (2003): 133-147.
6. Kim, Hyunki. 2005. Developing Semantic Digital Libraries Using Data Mining Techniques. UMI Dissertation Services.
CHAPTER 12 Clustering and Classification
Languages Similarity: Measuring and Testing
Liviu P. Dinu and Denis Enachescu
University of Bucharest, Faculty of Mathematics and Computer Science, 14 Academiei, 010014, Bucharest, Romania (e-mail: ldinu@funinf.cs.unibuc.ro, denaches@fmi.unibuc.ro)
Abstract. To investigate the similarity of natural languages, we use the following motivation: when a listener hears a language for the first time, it is plausible that he can distinguish and individualize syllables; due to this fact, he is able to say to which language or to which family of languages the language he hears is similar. In order to investigate the above hypothesis more rigorously, a statistical analysis of the common syllables excerpted from the representative vocabularies of seven Romance languages is presented.
Keywords: Romance languages similarity, statistical approach.
1 Introduction
How many languages are spoken in the world? There is a generally accepted estimation that there are over 6000 languages in the world today [Grimes, 2000; Maxwell and Amith, 2005] (according to one count, 6703 separate languages were spoken in the world in 1996: 1000 in the Americas, 2011 in Africa, 225 in Europe, 2165 in Asia, and 1320 in the Pacific, including Australia). Natural languages have a life cycle similar to any living organism: they are born (the first of them probably since the tower of Babel), then they have an adulthood period (in many cases this stage is closely related to the sociological, cultural, economic and military development of the community which speaks the respective language) and, unfortunately, many of them die or arrive on the verge of extinction. A common prediction is that during the XXI-th century more than half of these languages will disappear (see the Linguistic Society of America at http://www.lsadc.org). A language is endangered (the common syntagma used to say that a language is likely to become extinct in the near future) for various reasons. The most abrupt cause of language extinction is outright genocide: for example, when European invaders exterminated the Tasmanians in the early 19th century, an unknown number of languages became extinct. A similar situation appears when the community is pressured to give up its language and even its ethnic and cultural identity. However, not only political or military situations determine the endangerment or even the extinction of a language, but the economic or cultural environment too. When a community is put under pressure to integrate into a larger
or more powerful group, many languages fall out of use and are replaced by others that are more widely used in the region or nation, such as English in the U.S. or Spanish in Mexico. Many other languages are no longer being learned by new generations of children or by new adult speakers; these languages will become extinct when their last speaker dies. For example, in the Yupik Eskimo communities in Alaska, where just 20 years ago all of the children spoke Yupik, today the youngest speakers of Yupik in some of these communities are in their 20s, and the children speak only English. Not only some exotic languages are endangered nowadays, but also some European languages like Scots Gaelic, Irish, Frisian, Provencal, Basque, and some Romance languages spoken by a few thousand people. Ancient Greek and Latin are in a slightly different situation: they were not abruptly replaced by other languages, but Ancient Greek slowly evolved into Modern Greek, and Latin slowly evolved into modern Italian, Spanish, French, Romanian, and other languages. When a community loses its language, it often loses a great deal of its cultural identity at the same time: this ranges from prayers, myths, ceremonies, poetry, oratory, humor, and ways of speaking to children, to terms for habits, behaviors, and emotions. Much is lost from a scientific point of view as well when a language disappears. A people's history is passed down through its language, so when the language disappears, it may take with it important information about the early history of the community. Languages can be preserved not only by continuing to be spoken, but also by being written down and described. Some of the ancient languages (including the well-known Latin, Classical Greek and Sanskrit) survived because they and their grammars have been preserved in written form, which researchers are able to study. In contrast, the complete loss of other languages is due to their lack of documentation. Thus, because so many languages are in danger of disappearing, linguists and related scientists are trying to learn as much about them as possible. Researchers make videotapes, audiotapes, and written records of language; they analyze the vocabulary and the rules of the language and write dictionaries and grammars. Naturally, not all language documentation is equal. If we cannot know a lot about all languages, then we should at least know a lot about a few languages, and a little about a lot of languages. So the tendency to group natural languages is a natural and necessary one. The problem of clustering natural languages and determining the similarity relation between different languages (belonging or not to the same class) is one of the most ancient concerns of linguists, and has resulted in numerous linguistic atlases. Although the similarity of natural languages is in principle a rather vague notion, the linguistic literature is full of claims classifying natural languages (belonging or not to the same category) as being more or less similar. A better knowledge of the similarity of languages would help us solve some current problems, such as automatic translation, the acquisition of a new language, etc.
These are some of the reasons which led us to investigate the similarity of natural languages. In the following we provide a new approach to this problem. Before that, we recall some of the main approaches to the language similarity problem (cf. [Homola and Kubon, 2006]).
2 On the similarity of languages
Most of the claims related to the closeness of two or more languages are in some cases the result of a detailed comparative examination of lexical and/or syntactic properties of the languages in question; in some cases they are based on a very subjective opinion of the author; and in many other cases they reflect the application of some mathematical formula to textual data. The last case often yields a confusing answer, because the notion of language similarity can easily be confused with the notion of text similarity. This is a general trend and is quite understandable, given the existing plethora of mathematical methods for measuring text similarity. However, the notion of text similarity and, by extension, the notion of linguistic similarity, is in a situation similar to that of St. Augustine's notion of "time". At the most recent meeting of the Association for Computational Linguistics (ACL-COLING, Sydney, July 2006), an entire workshop was dedicated to linguistic similarity (Workshop on Linguistic Distances, eds. John Nerbonne and Erhard Hinrichs), and one of the motivations of this workshop was: "In many theoretical and applied areas of computational linguistics researchers operate with a notion of linguistic distance or, conversely, linguistic similarity... While many CL areas make frequent use of such notions, it has received little focused attention, an honorable exception being Lebart and Rajman (2000)... We assume that there is always a 'hidden variable' in the similarity relation, so that we should always speak of similarity with respect to some property, and we suspect that there is such a plethora of measures in part because researchers are often inexplicit on this point..." On the other hand, the approach based on applying a mathematical formula to textual data is too concentrated on the surface similarity of word forms and thus may not properly reflect the similarity of languages. In [Homola and Kubon, 2006] a solid criticism of this approach can be found. Homola and Kubon give many word forms that exist in different languages but with different meanings in each of them. Moreover, they create (more or less) syntactically correct sentences in English containing only Czech word forms, or a similar sentence in Czech containing only English word forms. In [Homola and Kubon, 2006] four types of language similarity are presented (typological, morphological, syntactical and lexical), and each of them is shortly described.
3
Measuring and testing
To investigate the similarity of natural languages, we have decided to use the following motivation [Dinu and Dinu, 2005]: when a listener hears a language for the first time, it is difficult to believe that he is able to distinguish types, syntactic constructions or even words. In fact, it is more probable that he can distinguish and individualize syllables; due to this fact, he is able to say to which language or to which family of languages the language he hears is similar. To measure the similarity of Romance languages, in [Dinu and Dinu, 2005] the following strategy was used: the representative vocabularies of seven Romance languages (Latin, Romanian, Italian, Spanish, Catalan, French and Portuguese) (Sala, 1988) were syllabified. For each vocabulary a ranking of syllables was constructed: the most frequent syllable of the vocabulary was placed in the first position, the next most frequent syllable in the second position, and so on. Then each of the seven Romance languages was compared to the other six (using rank distance [Dinu, 2003]), each comparison producing a graphic as a result. Some quantitative aspects of the corpus used are presented in this volume in [Dinu and Enăchescu, 2007]. In the following we will use the same corpus, but a different strategy will be used to investigate the similarity of Romance languages. In the statistical approach we consider the seven Romance languages as random variables and the common syllables in the representative vocabularies of the languages as cases. The sample size is 165 (i.e. the number of common syllables) and each case is represented by a row-vector with seven components. Each component contains the absolute frequency of the syllable in the corresponding language (i.e. the number of occurrences of the syllable in the representative vocabulary of the language). Since most of the common syllables are situated in the first part of the rankings of their languages, their contribution to the general corpus is an important one (cf. previous section); so, an analysis based on this sample is a robust one. Descriptive statistics of the data, also graphically depicted, are presented in this volume in [Dinu and Enăchescu, 2007]. In order to investigate the above observations more rigorously, we consider nonparametric (rank-order) correlation analysis. In this setting we compute the Spearman R and the Kendall Tau correlation matrices (Table 1 and Table 2 respectively, and Figure 1 for a matrix scatterplot). Spearman R can be thought of as a regular Pearson product-moment correlation coefficient (Pearson r), that is, in terms of proportion of variability accounted for, except that Spearman R is computed from the ranks of the common syllables. Kendall Tau is defined as
T = (#agreements - #disagreements)/total number of pairs
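As an illustration of how these rank-order correlation matrices can be computed, the following Python sketch (our own, not part of the original study; the frequency table is randomly generated rather than the real 165-syllable corpus) uses SciPy's spearmanr and kendalltau:

import numpy as np
from scipy.stats import spearmanr, kendalltau

langs = ["Latin", "Italian", "Spanish", "Catalan", "Portuguese", "Romanian", "French"]
rng = np.random.default_rng(0)
freq = rng.integers(1, 200, size=(165, len(langs)))   # hypothetical frequencies

# Spearman R matrix: Pearson correlation computed on the ranks of the syllables.
spearman_R, _ = spearmanr(freq)

# Kendall Tau matrix, computed pair by pair.
k = len(langs)
kendall_T = np.eye(k)
for i in range(k):
    for j in range(i + 1, k):
        tau, _ = kendalltau(freq[:, i], freq[:, j])
        kendall_T[i, j] = kendall_T[j, i] = tau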
Spearman Rank Order Correlation (common syllables)

The variable   Italian    Spanish    Catalan    Portuguese   Romanian   French
Latin          0.713064   0.694946   0.625236   0.612548     0.729924   0.728573

Table 1. Spearman rank correlation matrix of the data; MD pairwise deleted; marked correlations are significant at p < .05
Kendall Tau and Spearman R imply different interpretations: while Spearman R can be thought of as the regular Pearson product-moment correlation coefficient as computed from ranks, Kendall Tau rather represents a probability. Specifically, it is the difference between the probability that the observed data are in the same order for the two variables and the probability that the observed data are in a different order for the two variables. The statistical significance of the correlation coefficients in Table 1 and Table 2 is p = 0.05. The statistical significance of a result is an estimated measure of the reliability of the result. Specifically, the p-level represents the probability of error involved in accepting our observed result as valid, that is, as "representative of the whole population". Figure 1 represents
Kendall Tau Correlation (common syllables)

The variable   Latin      Italian    Spanish    Catalan    Portuguese   Romanian   French
Latin          1          0.532908   0.530130   0.474593   0.458219     0.560656   0.560255
Italian        0.532908   1          0.676644   0.600123   0.653891     0.573227   0.523457
Spanish        0.530130   0.676644   1          0.652790   0.753544     0.577982   0.542500
Catalan        0.474593   0.600123   0.652790   1          0.686252     0.511912   0.499679
Portuguese     0.458219   0.653891   0.753544   0.686252   1            0.483845   0.498522
Romanian       0.560656   0.573227   0.577982   0.511912   0.483845     1          0.504332
French         0.560255   0.523457   0.542500   0.499679   0.498522     0.504332   1

Table 2. Kendall Tau correlation matrix of the data; MD pairwise deleted; marked correlations are significant at p < .05
the scatterplots (i.e. each syllable is represented by a point whose coordinates are given by the observed frequencies in the pair of considered languages) of all possible pairs of variables, with the corresponding regression line (i.e. the line which minimizes the sum of squared distances from each point to it). Histograms (i.e. the height of each bar is proportional to the frequency of the corresponding syllable) of each variable are shown along the diagonal of the scatterplot matrix.

Fig. 1. The scatterplot matrix of the data

Looking down Table 1 and Table 2 we observe: the three most Spearman R correlated language pairs are Spanish with Portuguese (R=0.90), Catalan with Portuguese (R=0.86) and Italian with Spanish (R=0.84). At the opposite end we find Portuguese with French (R=0.65) and Portuguese with Latin (R=0.61); the Kendall Tau coefficients follow the same trend as the Spearman R coefficients, except for the least correlated languages, which are, in this case, Romanian and Portuguese (T=0.48). The above approaches are pairwise; what is really required in our situation is something that is common to all the variables and can be used as a 'score' of a language. From a geometrical point of view, a line or lines (factor axes) that would pass through the centroid of the cloud of points in
the multidimensional space are required. The technique that can accomplish this is Principal Components Analysis (PCA). In PCA, basically, the factor axes are thought of as the best fit for the cloud of points in the vector space of variables (in our case included in R^165) according to the least squares criterion. Mathematically speaking, the objective is to obtain a set of orthogonal vectors, where each vector generates a straight line in R^165 with the least squares property. These vectors are called the factor axes and are further used for computing the factor coordinates of the points-variables in R^165. Projection of the original variables onto the factor space F2, generated by the first two factors (the first factor is extracted so as to capture the variance to the maximum extent; the second factor is extracted so as to capture the remaining variance to the maximum extent, and so on), can reveal the hidden differences among variables (see Enăchescu, 2003 for additional details). It may be noted that the orientation of the factorial axes is arbitrary, because they are computed modulo the sign; this fact doesn't alter the form of the cloud and hence the distances between the points. Also, PCA emphasizes only the linear relationships between the variables: a small absolute correlation coefficient between two variables means that they are linearly uncorrelated, while a nonlinear relation can still exist. Table 3 and Figure 2 present the results, carried out via the correlation matrix (of the Pearson r coefficients) and not the covariance matrix, in order to avoid the factors being affected by the differences in the amount of variability of the active variables (i.e. the original variables).

Factor-variable correlations based on core (common syllables)

The variable   Factor 1    Factor 2    Factor 3    Factor 4    Factor 5    Factor 6
Italian        -0.785808   -0.340291    0.488528   -0.161586   -0.043449   -0.007056
Spanish        -0.910395    0.306901   -0.005354    0.016540    0.122519   -0.248353
Catalan        -0.890627    0.223028   -0.050216    0.093504   -0.340807    0.028466
Portuguese     -0.861347    0.404738   -0.036941   -0.181776    0.160498    0.184670
Romanian       -0.870011   -0.281133   -0.000613    0.377891    0.124036    0.076535
French         -0.749121   -0.488114    0.403059   -0.191634   -0.026729   -0.025842
*Latin         -0.655729   -0.565100    0.295482   -0.074997   -0.166205    0.030853

Table 3. A PCA of the data. Factor-variable correlations based on core (common syllables); active and supplementary variables; *supplementary variable.
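A minimal sketch of this principal component computation (our own illustration, under the assumption that the analysis is run on the correlation matrix of the six active languages, with Latin correlated afterwards with the factor scores as a supplementary variable):

import numpy as np

rng = np.random.default_rng(1)
freq = rng.random((165, 7))                  # hypothetical data; column 0 = Latin
latin, active = freq[:, 0], freq[:, 1:]

# Eigendecomposition of the correlation matrix of the active variables.
R = np.corrcoef(active, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# Factor-variable correlations (loadings) of the active variables.
loadings = eigvec * np.sqrt(eigval)

# Factor scores, and the supplementary variable correlated with them.
Z = (active - active.mean(0)) / active.std(0)
scores = Z @ eigvec
latin_row = [np.corrcoef(latin, scores[:, f])[0, 1] for f in range(6)]

print("variance explained by F1 and F2: %.2f%%" % (100 * eigval[:2].sum() / eigval.sum()))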
The factor space F2 (Figure 2) explains 84.49% of the whole variance and suggests three correlation-clusters: (1) Portuguese, Spanish and Catalan; (2) Romanian and Italian; (3) French. The Latin variable is set as a supplementary
variable (i.e. a variable not used to compute the factor axes, projected onto the vector subspace generated by the factors). Analyzing the results in Table 3, we observe that Factor 1 explains 91% of the variability of the Spanish variable, while Factor 2 is most correlated with the French variable (r=-0.49). Hence from 'the Spanish point of view' the Catalan, Portuguese and Romanian languages are very similar (highly correlated), while from 'the French point of view' Portuguese and Italian are similar. Finally, it may be noted that all the languages are anti-correlated with the first factor, while for the second factor Portuguese, Spanish and Catalan are positively correlated but Romanian, Italian, French and Latin are negatively correlated.

Fig. 2. The Principal Components Analysis
4
Conclusions
Over time, different methods for comparing natural languages have been proposed. We saw in Section 2 that these methods can vary a lot. Since the similarity of natural languages is an important problem (with applications in automatic translation, language acquisition, problems of endangered languages, etc.) but, on the other hand, a vague notion, it is important to have more results coming from different approaches. The more similar the results coming from different points of view, the more robust the conclusions are. In this paper we have investigated the similarity of Romance languages based on rankings of syllables from the main vocabulary of each language, using two complementary approaches: one based on rank distance and the other on statistics. The conclusions are very similar to each other. They are also in concordance with other approaches (e.g. Dinu and Dinu, 2005), bringing a plus of rigor and statistical significance.
Acknowledgements The first author is indebted to MEdC-ANCS, which supported this research.
References
[Dinu, 2003]Dinu, L.P. On the classification and aggregation of hierarchies with different constitutive elements. Fundamenta Informaticae, 55, 1, 39-50, 2003.
[Dinu and Dinu, 2005]Dinu, A., Dinu, L.P. On the Syllabic Similarities of Romance Languages. A. Gelbukh (ed.): CICLing 2005, Lecture Notes in Computer Science, Volume 3406, pp. 785-789, 2005.
[Dinu and Enăchescu, 2007]L.P. Dinu and D. Enăchescu. On clustering Romance languages. In this volume.
[Enăchescu, 2003]Enăchescu, D. Statistical Techniques for Data Mining. Univ. of Bucharest Press, Bucharest, 2003.
[Grimes, 2000]Grimes, B.F. Ethnologue: Languages of the World. 14th ed. Summer Institute of Linguistics, Dallas, 2000.
[Homola and Kubon, 2006]Homola, P. and Kubon, V. A Structural Similarity Measure. Proceedings of the Workshop on Linguistic Distances, pages 91-99, Sydney, July 2006.
[Lebart and Rajman, 2000]Lebart, L. and M. Rajman. Computing Similarity. In R. Dale et al. (eds.) Handbook of NLP. Dekker: Basel, 2000.
[Maxwell and Amith, 2005]Maxwell, M. and J.D. Amith. Language documentation: the Nahuatl Grammar. A. Gelbukh (ed.): CICLing 2005, Lecture Notes in Computer Science, Volume 3406, pp. 474-485, 2005.
[Li et al., 2004]M. Li, X. Chen, X. Li, B. Ma, and Paul M.B. Vitanyi. The Similarity Metric. IEEE Transactions on Information Theory, 50(12), 3250-3264, 2004.
[Nerbonne and Hinrichs, 2006]Nerbonne, J. and Hinrichs, E. (eds.) Linguistic Distances. Proceedings of the Workshop on Linguistic Distances, pages 1-6, Sydney, July 2006.
[Woodbury]Woodbury, A.C. What Is an Endangered Language? Linguistic Society of America, Washington. Also available at http://www.lsadc.org/info/pdf-files/Endangered-Languages.pdf
[Ziegler, 2000]Ziegler, A. Word Length in Romance Languages. A Complemental Contribution. Journal of Quantitative Linguistics, 7, 1, 65-68, 2000.
On clustering Romance languages

Liviu P. Dinu and Denis Enăchescu

University of Bucharest, Faculty of Mathematics and Computer Science, 14 Academiei, 010014, Bucharest, Romania (e-mail: ldinu@funinf.cs.unibuc.ro, denaches@fmi.unibuc.ro)

Abstract. If the grouping of languages into linguistic families is generally accepted, the relations between the languages belonging to the same family periodically attract the researchers' attention. We investigate the similarity of Romance languages based on the syllables excerpted from the representative vocabularies of seven Romance languages (Latin, Romanian, Italian, Spanish, Catalan, French and Portuguese). In the statistical approach we consider the seven Romance languages as random variables and the common syllables in the representative vocabularies of the languages as cases. Descriptive statistics of the data are given and also graphically depicted as a box and whisker plot. The purpose of our approach is, given these data, to find out whether the Romance languages form "natural" clusters that can be labelled in a meaningful manner. To answer this question we perform a joining analysis (tree clustering, hierarchical clustering) on these data. In this setting, every language (i.e. variable) represents a singleton cluster. At each of the N - 1 steps (N = 7 being the number of Romance languages taken into consideration) the closest two (least dissimilar) clusters are merged into a single cluster, producing one less cluster at the next higher level. We also display the dendrogram obtained by clustering the Romance languages using the nearest-neighbor technique.

Keywords: Romance languages clustering, syllable.
1
Introduction
In a metaphoric way, we can say that natural languages are like musical pieces. All musical pieces are similar, but some are more similar than others. A human expert, comparing different pieces of music with the aim of clustering them (forming musical genres), will generally look for certain specific similarities. These characteristics are typically related to the instrumentation, rhythmic structure, and harmonic content of the music. Musical genres are labels created and used by humans for categorizing and describing the vast universe of music. Musical genres have no strict definitions and boundaries, as they arise through a complex interaction between the public, marketing, historical, and cultural factors. The parallel with natural languages is immediate. Natural languages are grouped into linguistic families, and there are rarely situations where a language is spoken only in a bounded area. Languages interact, and during this interaction they borrow words, syntagmas or even more complex structures (related not only to lexicon or morphology but to syntax too). The parallel between natural
languages and music is, in some cases, even more obvious: for example, Chinese has tones, and in many situations the meaning of a word is given by its intonation; Welsh is known for its predilection to form symmetric phrases, with a strong melodicity. The attempts to automate the grouping of pieces into genres or of natural languages into linguistic families have followed a line inspired by the methodology of the human experts [Tzanetakis and Cook, 2002]. In music, generally speaking, they take a file containing a piece of music and extract from it various specific numerical features, related to pitch, rhythm, harmony, etc. Following such a methodology, [Cilibrasi et al.] showed that Haydn is more similar to Mozart or Bach than to Metallica or Miles Davis. However, if the differences between genres are reasonably well separated, the similarities inside the same genre lead in many cases to long debates ("Haydn is just like Mozart - no, he's not!"). A similar debate appears in natural languages too. If the grouping of languages into linguistic families is generally accepted, the relations between the languages belonging to the same family periodically attract the researchers' attention. A possible explanation is that the similarity of natural languages is a fairly vague notion, in spite of the fact that the linguistic literature abounds in claims of classification of natural languages. Most of the claims related to the closeness of two or more languages are in some cases the result of a detailed comparative examination of lexical and/or syntactic properties of the languages in question; in some cases they are based on a very subjective opinion of the author; in many other cases they reflect the application of some mathematical formula to textual data. The last case yields in many situations a confused answer, because the notion of language similarity can be easily confused with the notion of text similarity. This is quite understandable, due to the existing plethora of mathematical methods for measuring text similarity. On the other hand, this approach is too concentrated on the surface similarity of word forms and thus may not properly reflect the similarity of languages. In [Homola and Kubon, 2006] a solid criticism of this approach can be found. Homola and Kubon give many examples of word forms belonging to different languages but with different meanings in each of them. Moreover, they create (more or less) syntactically correct sentences in English containing only Czech word forms, or similar sentences in Czech containing only English word forms. In [Homola and Kubon, 2006] four types of language similarity (typological, morphological, syntactical and lexical) are presented, and each of them is shortly described. In the following we propose an approach based on the metaphor between music sounds and languages. Thus, we claim that when a human expert hears a language for the first time, he classifies this new language by thinking of what group of languages its sounds resemble. But what does the expert register? In our opinion, it is difficult to believe that he is able to distinguish syntactic
constructions or even words. In fact, it is more plausible that he can distinguish and individualize syllables; due to this fact, he is able to say to which language or family of languages the language he heard is similar. Based on this supposition, we will further investigate the similarity of Romance languages based on the syllables excerpted from the representative vocabularies of seven Romance languages (Latin, Romanian, Italian, Spanish, Catalan, French and Portuguese) (Sala, 1988).
2
Syllables and the similarity
To measure the similarity of Romance languages, a similar motivation was used in [Dinu and Dinu, 2005]. There, the following strategy was employed: the representative vocabularies of seven Romance languages (Latin, Romanian, Italian, Spanish, Catalan, French and Portuguese) (Sala, 1988) were syllabified. For each vocabulary a ranking of syllables was constructed: the most frequent syllable of the vocabulary was placed in the first position, the next most frequent syllable in the second position, and so on. Then each of the seven Romance languages was compared to the other six (using rank distance [Dinu, 2003]), each comparison producing a graphic as a result. In Table 1 we present the number of distinct syllables (types) and the number of all syllables (tokens) in every language analyzed. The frequency of the syllables in every language is not uniformly distributed.
Table 1. The percentage covered by the first syllables
Table 1 shows that the syllables are distributed according to a principle of the minimum-effort type; thus, a relatively small number of distinct syllables covers a large part of the corpus of all the analyzed syllables. Generally, the first 300 syllables (ordered according to their frequency) cover over 80% (even 90% for some languages) of all the existent syllables. After this level, the percentage increases slowly. In the following we will use the same corpus, but a different strategy will be used to investigate the similarity of Romance languages.
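This coverage claim is easy to check computationally; a short sketch (on a hypothetical token list, not the published vocabularies) of the cumulative coverage of the most frequent syllable types:

from collections import Counter

def coverage(tokens, top=300):
    # Fraction of all syllable tokens covered by the `top` most frequent types.
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(top))
    return covered / sum(counts.values())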
Recent Advances in Stochastic Modeling and Data Analysis
524
3
Clustering Romance languages
In the statistical approach we consider the seven Romance languages as random variables and the common syllables in the representative vocabularies of the languages as cases. The sample size is 165 (i.e. the number of common syllables) and each case is represented by a row-vector with seven components. Each component contains the absolute frequency of the syllable in the corresponding language (i.e. the number of occurrences of the syllable in the representative vocabulary of the language). Since most of the common syllables are situated in the first part of the rankings of their languages, their contribution to the general corpus is an important one (cf. previous section); so, an analysis based on this sample is a robust one. Descriptive statistics of the data are given in Table 2 and also graphically depicted in Figure 1 as a box and whisker plot.
Box & Whisker Plot 180 160 140 120
100
80 60 40
20 0
"20 -40
I '
-60 -80
-100 -120
Latin
Spanish
Italian
Portuguese
Catalan
French
Romanian
Fig. 1. Box-Whisker-Plot
It is easy to observe that:
- according to Zipf's principle of minimum effort, the Latin language has, on average, the smallest syllable frequencies (mean 16.48). At the opposite corner we find the Italian language, with a sample mean of 36.25. With respect to the mean we can consider three clusters: French and Latin, with means between 10 and 19; Catalan, Romanian and Portuguese, with means between 20 and 29; and Spanish and Italian, with means between 30 and 39;
- with respect to the variability, the minimum sample standard deviation (Std.) is reached by the French language, with 20.18, and the maximum by the Italian language, with 66.25.
Table 2. Descriptive statistics of the data
The purpose of our approach is, given these data, to find out whether the Romance languages form "natural" clusters that can be labelled in a meaningful manner. To answer this question we perform a joining analysis (tree clustering, hierarchical clustering) on these data. In this setting, every language (i.e. variable) represents a singleton cluster. At each of the N - 1 steps (N = 7 being the number of Romance languages taken into consideration) the closest two (least dissimilar) clusters are merged into a single cluster, producing one less cluster at the next higher level. Therefore, a measure of dissimilarity between two clusters (groups of languages) must be defined. Let G and H represent two such groups. The dissimilarity d(G, H) between G and H is computed from the set of pairwise observation dissimilarities d_{ii′}, where one member of the pair, i, is in G and the other, i′, is in H. We choose, in this case, the dissimilarity

d_{ii′} = 1 - r²_{ii′},    (1)

where r_{ii′} is the Pearson linear correlation coefficient between variable-language i and variable-language i′. Table 3 gives the dissimilarities between the Romance languages.
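A sketch of this dissimilarity computation (our own illustration; `freq` is the cases-by-languages frequency matrix):

import numpy as np

def dissimilarity_matrix(freq):
    # d[i, i'] = 1 - r[i, i']**2, with r the Pearson correlation between
    # variable-language i and variable-language i' (columns of freq).
    r = np.corrcoef(freq, rowvar=False)
    return 1.0 - r ** 2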
Table 3. The dissimilarities between Romance languages
Single linkage (SL) agglomerative clustering takes the intergroup dissimilarity to be that of the closest (least dissimilar) pair
d_SL(G, H) = min_{i ∈ G, i′ ∈ H} d_{ii′}
This is also often called the nearest-neighbor technique (see Enăchescu, 2003 for additional details). The most important result to consider in a hierarchical clustering analysis is the hierarchical tree. Figure 2 displays the dendrogram obtained by clustering the Romance languages using the nearest-neighbor technique. A non-graphical presentation of the dendrogram is given in Table 4. The amalgamation schedule table gives the sets of languages and their linkage distances (the leftmost column of the table).
Table 4. The linkage rule used to cluster the Romance languages
The graph of the amalgamation schedule displayed in Figure 3 is very useful in suggesting a cutoff for the tree diagram. In the tree diagram, as one moves to the right (increasing the linkage distance), larger and larger clusters are formed, of greater and greater within-cluster diversity. If this plot shows a clear plateau, it means that many clusters were formed at essentially the same linkage distance. That distance may be the optimal cut-off when deciding how many clusters to retain (and interpret).
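The whole single-linkage procedure can be reproduced with SciPy; a sketch (our own, on hypothetical data rather than the syllable corpus) that builds the dissimilarity matrix, the amalgamation schedule and a cut of the tree:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

langs = ["Latin", "Italian", "Spanish", "Catalan", "Portuguese", "Romanian", "French"]
rng = np.random.default_rng(2)
freq = rng.random((165, 7))                     # hypothetical data

D = 1.0 - np.corrcoef(freq, rowvar=False) ** 2  # dissimilarity (1)
np.fill_diagonal(D, 0.0)

# Single linkage on the condensed distances; each row of `tree` is one
# amalgamation step (the third column holds the linkage distance).
tree = linkage(squareform(D, checks=False), method="single")

# Cutting the tree at a chosen linkage distance yields the clusters.
labels = fcluster(tree, t=0.2, criterion="distance")
print(dict(zip(langs, labels)))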
Fig. 2. Dendrogram from agglomerative hierarchical clustering with single linkage of the Romance languages
Analyzing the results, it seems reasonable to cut off the tree at a linkage distance between 0.151 and 0.286 and hence to obtain four clusters of Romance languages: Spanish, Portuguese and Catalan; Latin and Italian; Romanian; French.
4
Conclusions
In this paper we have investigated the capacity of Romance languages to form natural clusters based on syllabic similarity. The conclusions are in concordance with other approaches ([Benedetto et al., 2002], [Dinu and Dinu, 2005], [Li et al., 2004]), bringing a plus of rigor and statistical significance.
Acknowledgements The first author is indebted to MEdC-ANCS which supported this research.
Fig. 3. The graph of the amalgamation schedule
References
[Benedetto et al., 2002]D. Benedetto, E. Caglioti, and V. Loreto. Language trees and zipping. Phys. Review Lett., 88(4), 2002.
[Cilibrasi et al.]Cilibrasi, R., Vitanyi, P. and R. de Wolf. Algorithmic Clustering of Music.
[Dinu, 2003]Dinu, L.P. On the classification and aggregation of hierarchies with different constitutive elements. Fundamenta Informaticae, 55, 1, 39-50, 2003.
[Dinu and Dinu, 2005]Dinu, A. and Dinu, L.P. On the Syllabic Similarities of Romance Languages. A. Gelbukh (ed.): CICLing 2005, Lecture Notes in Computer Science, Volume 3406, pp. 785-789, 2005.
[Enăchescu, 2003]Enăchescu, D. Statistical Techniques for Data Mining. Univ. of Bucharest Press, Bucharest, 2003.
[Homola and Kubon, 2006]Homola, P. and Kubon, V. A Structural Similarity Measure. Proceedings of the Workshop on Linguistic Distances, pages 91-99, Sydney, July 2006.
[Lebart and Rajman, 2000]Lebart, L. and M. Rajman. Computing Similarity. In R. Dale et al. (eds.) Handbook of NLP. Dekker: Basel, 2000.
[Li et al., 2004]M. Li, X. Chen, X. Li, B. Ma, and Paul M.B. Vitanyi. The Similarity Metric. IEEE Transactions on Information Theory, 50(12), 3250-3264, 2004.
[Tzanetakis and Cook, 2002]Tzanetakis, G. and P. Cook. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293-302, 2002.
[Ziegler, 2000]Ziegler, A. Word Length in Romance Languages. A Complemental Contribution. Journal of Quantitative Linguistics, 7, 1, 65-68, 2000.
[Zipf, 1932]Zipf, G.K. Selected Studies of the Principle of Relative Frequencies in Language. Cambridge, Mass., 1932.
A clustering method associated pretopological concepts and k-means algorithm

T.V. Le, N. Kabachi, and M. Lamure

Université Claude Bernard Lyon 1, Laboratoire d'InfoRmatique en Image et Systèmes d'information, Groupe MAZD, bâtiment Jean Braconnier, 43, boulevard du 11 novembre 1918, 69622 Villeurbanne Cedex, France (e-mail: than-van.le@univ-lyon1.fr, nadia.kabachi@univ-lyon1.fr, michel.lamure@univ-lyon1.fr)

Abstract. The aim of this work is to define a clustering method starting from the pretopological results related to the minimal closed subset concepts, which provide us with a view of the relations between groups in the structure of the data; we then consider this result as a pre-treatment for some classical clustering algorithms. In particular, the k-means philosophy is adopted for its remarkable benefits. Thus we propose a new clustering method with two processes, a structuring process and a clustering one. This method allows us to: obtain a data clustering for both categorical and numeric data; avoid having to determine the number of clusters a priori; and attain well-shaped clusters whose shapes are not influenced by the existence of outliers.

Keywords: pretopology, pseudoclosure, minimal closed subsets, k-means, clustering.
1
Introduction
Clustering, or unsupervised classification, consists in partitioning different objects into groups such that two objects are more closely related according to some criteria if they belong to the same group, and quite different otherwise [Gordon, 1999], [Jain and Dubes, 1988], [Burges, 1998]. Clustering applies in many fields, such as finding groups of customer behaviours in marketing or identifying groups of patient behaviours in an emergency service. Among the existing algorithms and methods of clustering presented in [Celeux et al., 1989], [Raymond and Han, 1994], [Gupta et al., 1999], and in [Kaufman and Rousseeuw, 1990], the k-means philosophy is frequently used [Picard, 2001], [He et al., 2005], [Tran, 2006], [Chen, 2006]. However, it cannot be used with categorical data, the number of clusters must be predetermined, and possible outliers can influence the shapes of the classes. For overcoming these difficulties, we present a new clustering method based both on pretopological concepts and the k-means algorithm, functioning in two steps:
The first step is a structuring process. Pretopology [Belmandt, 1993], [Lamure, 1987] provides us with concepts for structuring any data set, even if the data set is not endowed with a metric. The basic concept is that of the pseudoclosure, which leads to the definition of closed subsets and minimal closed subsets. These minimal closed subsets are the basis for analyzing how a set is structured and permit the definition of an algorithm for analyzing the way of structuring [Bonnevay and Largeron, 2002], [Le and Lamure, 2006]. The second step is the partitioning process. From the previous step, the number k of clusters is directly determined, as well as the germs of the clusters. The central point is that the number of clusters and their determination are fully determined by the structure of the data set; we do not have to use more or less adequate procedures to determine them. The paper is subdivided into three sections after this introduction. Section 2 proposes the basic concepts of pretopology which are needed to understand our method. Section 3 presents the extension of the k-means algorithm with the structuring process as a pre-treatment, and Section 4 provides concluding remarks on the proposed method.
2
Pretopology concepts
Pretopology is an extension of topology which differs by the non-idempotence of the pseudoclosure function (which justifies the terminology: the pseudoclosure is not a closure in a topological sense). Applications of pretopology are numerous and various; in particular, structural analysis and clustering have already been well explored [Emptoz, 1983], [Nicoloyannis, 1988], [Hashom, 1982]. For the following, it is important to note that the non-idempotence of the pseudoclosure function is a fundamental property for our purpose of following up the process of structuring a data set.
2.1
Pseudoclosure
Definition 1. Given a non-empty set E, a function a(·) from P(E) into P(E) is called a pseudoclosure if and only if ∀A ∈ P(E), a(∅) = ∅ and a(A) ⊇ A. Then (E, a) is said to be a pretopological space.

According to the properties of a(·), we obtain more or less complex pretopological spaces, from the most general spaces to topological spaces. Pretopological spaces of type V are the most interesting case. In that case, a(·) fulfills the following property:

∀A ∈ P(E), ∀B ∈ P(E), A ⊂ B ⇒ a(A) ⊂ a(B)
2.2
Pretopology and binary relationships
Suppose we have a family {R_i}_{i=1,...,n} of binary reflexive relationships on a finite set E; it is possible to define a V pretopological space from the family {R_i}_{i=1,...,n}.

Let us consider, for any x in E, B_i(x) defined by:
- B_i(x) = {y ∈ E | x R_i y} ∪ {x}
We call D(x) the family of the neighborhoods of x, defined by:
- D(x) = {V ∈ P(E) | ∀i, B_i(x) ⊂ V}
Then, given the family D(x) for any x in E, the pseudoclosure a(·) is defined by:
- ∀A ∈ P(E), a(A) = {x ∈ E | ∀V ∈ D(x), V ∩ A ≠ ∅}, or equivalently:
- ∀A ∈ P(E), a(A) = {x ∈ E | ∀i, B_i(x) ∩ A ≠ ∅}
Proposition 1. Given a family {R_i}_{i=1,...,n} of reflexive binary relationships on a finite set E, the pretopological space (E, a) defined by using the pseudoclosure a(·) as described above is a V one.

We can see here the interest of V pretopological spaces: suppose we have no numeric data to represent the members of E in a subset of R^n, but only opinions of experts who can express similarities between members of E by different relationships according to different points of view; pretopology, by means of V pretopological spaces, enables us to structure the set E and then offers a possibility for clustering E.

2.3
Minimal closed subsets
Definition 2. Let (E, a) be a pretopological space. A subset A of E is said to be a closed subset if and only if a(A) = A.

Definition 3. Given a pretopological space (E, a), for any subset A of E we can consider the whole family of closed subsets of E which contain A and determine the smallest element of that family for inclusion. If it exists, that element is called the closure of A and is denoted F(A).

Proposition 2. In any pretopological space of type V, given a subset A of E, the closure of A always exists.

Given a finite set E, the closure F(A) can be calculated by using the following property, which is also useful for calculating a distance between elements (it will be detailed in the germ function):
3k < I E J , F ( A )= a k ( A ) = a(ak-' ( A ) ) We denote 5e the family of elementary closed subsets the set of closures of each singleton { x } of P(E). So in a '13 pretopological space, we get: - V x E E , 3F, : closure o f{.} - 5e = {F,lx E E }
532
Recent Advances in Stochastic Modeling and Data Analysis
Definition 4. F is called a minimal closed subset if and only if F is a minimal element for inclusion in Fe.
3
The pre-treatment method
The k-means algorithm was proposed by Forgy in 1965 [Gordon, 1999]. It is one of the most frequently used clustering algorithms due to its efficiency and simplicity. It provides the possibility of classifying objects into k groups by a priori selecting k points which are used as germs to determine the k clusters. Thus, the k-means algorithm works as follows:
1. Select k initial objects M_s, called centres of the k clusters.
2. Assign each object O to the cluster C_s of centre M_s such that dist(O, M_s) is minimal.
3. Compute the new centre M_s of each cluster C_s.
4. Go to step 2 until the objects in every cluster no longer change.

One of the limits of this algorithm is that the number of clusters must be predetermined, and a process for selecting the initial centres must be performed in one way or another. By means of the pretopological concept of minimal closed subsets, we propose a structuring process which enables us to know:
1. How many clusters must be searched for, based on the information provided by the structuring process?
2. What objects of the data set can be used as germs for initializing the k-means algorithm?

Thus, we are able to start the k-means algorithm from information provided by the data structure itself and not by using more or less artificial processes.
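For reference, the algorithm above in executable form (our own Python sketch; the germs are passed in explicitly, which is precisely the role played by the structuring process described next):

import numpy as np

def kmeans(X, germs, max_iter=100):
    # Plain k-means; `germs` are the k initial centres (step 1).
    centres = np.asarray(germs, dtype=float)
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centre as the mean of its cluster.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(len(centres))])
        # Step 4: stop when the centres (hence the assignments) stabilize.
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres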
3.1
Structuring process
5e = {F’ei}i=l,,n, a set of elementary closed subset of E , { F m j } j = l , , ma set of minimal one with n = 15e1,rn = 15ml.
Let us call
5m
=
The structuring process can be described as follows:
- Given the data set E and a family of reflexive binary relationships R_i, define the pseudoclosure a(·) as recalled in Section 2.
- Determine the family of elementary closed subsets of E.
- Determine the family of minimal closed subsets of E.
Given the pseudoclosure a(·), we search for Fe in a finite set E by the following function:

Fe = ∅;
for all x ∈ E do {
    F_x = a({x});
    while (a(F_x) ≠ F_x) F_x = a(F_x);
    if (F_x ∉ Fe) Fe = Fe ∪ {F_x};
}

Then, we are able to determine the minimal closed subsets Fm by using the following function, noting that we only need to extract the minimal closed subsets from Fe:
Fm = ∅;
while (Fe ≠ ∅) {
    choose F ∈ Fe;
    Fe = Fe - {F};
    minimal = true;
    A = Fe;
    while ((A ≠ ∅) && (minimal)) {
        choose G ∈ A;
        if (G ⊂ F) minimal = false;
        else if (F ⊂ G) Fe = Fe - {G};
        A = A - {G};
    }
    if ((minimal == true) && (F ∉ Fm)) Fm = Fm ∪ {F};
}

Example: we study the toxic diffusion between 16 geographical areas E = {p_1, ..., p_16}. The family of reflexive binary relationships reduces to a single relationship: p_i R p_j (p_j pollutes p_i) if the distance between p_i and p_j is less than a given positive threshold T and p_j is at least as high in 3D space as p_i. We thus get the following table, which gives for any p_i the set of related p_j, the successive pseudoclosures a^k(p_i) of p_i, and the closure F_{p_i} of p_i.
p_i   related p_j   a(p_i)      a^2(p_i)          ...   F(p_i)
2     2,3           1,2,3       1,2,3             ...   1,2,3
3     2,3           1,2,3       1,2,3             ...   1,2,3
4     4,5,7         4,5         4,5               ...   4,5*
5     4,5,6         4,5         4,5               ...   4,5*
6     6             5,6         4,5,6             ...   4,5,6
7     7             6,7         5,6,7             ...   4,5,6,7
8     8,9           8,9         8,9,10            ...   8,9,10*
9     8,9,10        8,9,10      8,9,10            ...   8,9,10*
10    9,10,11       8,9,10      8,9,10            ...   8,9,10*
11    11,12         10,11       10,11             ...   8,9,10,11
12    12,13,15      11,12,13    10,11,12,13,14    ...   8,9,10,11,12,13,14
13    12,13,16      12,13,14    11,12,13,14       ...   8,9,10,11,12,13,14
14    13,14         14          14                ...   14*
15    15            12,15       11,12,13,15       ...   8,9,10,11,12,13,14,15
16    16            13,16       12,13,14,16       ...   8,9,10,11,12,13,14,16

Table 1. Relationship and pseudoclosure data
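The two structuring functions above translate directly into executable code; the following Python sketch (our own, on a small toy relation rather than the 16-area example) computes the pseudoclosure, the elementary closed subsets and the minimal ones:

def pseudoclosure(A, B):
    # a(A) = {x in E | B(x) ∩ A ≠ ∅}, with B(x) the neighbourhood base of x.
    return frozenset(x for x in B if B[x] & A)

def closure(x, B):
    F = frozenset({x})
    while True:
        nxt = pseudoclosure(F, B)
        if nxt == F:
            return F
        F = nxt

def minimal_closed_subsets(B):
    Fe = {closure(x, B) for x in B}              # elementary closed subsets
    return {F for F in Fe if not any(G < F for G in Fe)}

# Toy relation on a 5-element set: B[x] = {y | x R y} ∪ {x}.
B = {x: frozenset(s) for x, s in
     {1: {1}, 2: {1, 2, 3}, 3: {2, 3}, 4: {4, 5}, 5: {4, 5}}.items()}
print(minimal_closed_subsets(B))                 # {2, 3} and {4, 5}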
By performing the minimal closed subset algorithm, we get the family of minimal closed subsets. In this example, we get the following minimal closed subsets: {1}, {4,5}, {8,9,10}, {14}. This example illustrates how we can interpret a minimal closed subset. If we consider {4,5}, we have a set of two points which cannot be linked to any other point of E by the relationship, and {4,5} is the smallest such subset based on the two points 4 and 5. So we can interpret {4,5} as a strong pattern of the structure defined on E by means of the pretopology. Thus, by definition, both the family of minimal closed subsets and the elementary closed subsets characterize the structure underlying the data set E. So, the number of minimal closed subsets is a quite important parameter: it gives us the number of clusters to use in the k-means algorithm (here four clusters). Moreover, the initial centres for starting the k-means process must be searched for in the minimal closed subsets.

3.2
Determining the initial centres
Many possibilities can be used to determine the initial centres. We propose to determine them as follows. Two possibilities can occur for the minimal closed subset of a point x:

1. Fm({x}) = {x}; then x is a centre for a cluster.
2. Fm({x}) = {x_1, x_2, ..., x_p}; we have to decide which x_i to select in Fm({x}). For that, we calculate ||a(x_i)||, for i = 1, ..., p, and we select x_o such that ||a(x_o)|| = Max(||a(x_i)||), for i = 1, ..., p. This means we select the point x_o in Fm({x}) which has the greatest number of points in its pseudoclosure; x_o is then a point which is strongly linked to other points via the relationships R_i.

In the case where more than one such x_o exists, we can adopt two strategies. The first one, quite simple, consists in randomly drawing one x_o. The second one returns to the data and leads to selecting an x_o in a dense area of Fm({x}). This implies that we are able to compute a distance between points of E. To remain coherent with the structure induced by the family of relationships R_i, this distance can be computed in the same way as the well-known Hausdorff distance: given two subsets A and B of E, we compute the distance δ(A, B) by determining k_0 = min(min{k | A ⊂ a^k(B)}, ∞) and k_1 = min(min{k | B ⊂ a^k(A)}, ∞), and then δ(A, B) = min(k_0, k_1). In the case where A and B are reduced to single elements x and y, we get the distance δ(x, y). We call this distance the pseudoclosure distance. By using the above procedure, we obtain the following set M of initial centres: M = {{1}, {4}, {9}, {14}}, and the distance table between the elements of E and the initial centres is given hereunder. With the data of the previous example, we obtain the following distance table.
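A sketch of the pseudoclosure distance (our own illustration, reusing the pseudoclosure function from the earlier sketch): each set is expanded by repeated pseudoclosure until it covers the other, or stabilizes, and the smaller step count is kept.

import math

def pseudo_distance(A, C, B):
    # δ(A, C) = min(k0, k1), with k0 = min{k | A ⊆ a^k(C)} (∞ if never reached)
    # and k1 defined symmetrically; B is the neighbourhood-base map.
    def steps(src, target):
        k, cur = 0, frozenset(src)
        while not target <= cur:
            nxt = pseudoclosure(cur, B)
            if nxt == cur:
                return math.inf                  # target is never covered
            cur, k = nxt, k + 1
        return k
    return min(steps(C, A), steps(A, C))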
It is then possible to use the classical k-means algorithm in conjunction with the pseudoclosure distance. The data of the example lead us to the final partition of E: {1,2,3}, {4,5,6,7}, {8,9,10}, {11,12,13,14,15,16}.
4
Conclusion
This paper presents a new clustering method which combines a minimal closed subset algorithm based on pretopology with the k-means algorithm. Pretopology provides us with an approach to analyze the population structure, but it does not provide a partition. However, it helps us to find automatically the number k of clusters and the k centroids for k-means clustering from the result of the minimal closed subsets algorithm. Thus, the number of iterations of the k-means algorithm is reduced, because the minimal closed subsets are considered as base groups in the structure. The pseudoclosure distance constructed from the family of relationships contributes one more benefit to this method: this distance can be used to examine the similarity of both numeric and categorical data. Finally, our method is a successful combination of pretopology and the k-means algorithm.
References
[Belmandt, 1993]Z. Belmandt. Manuel de Prétopologie et ses Applications. Édition Hermès, 1993.
[Bonnevay and Largeron, 2002]S. Bonnevay and C. Largeron. A pretopological approach for structural analysis. In Information Sciences, volume 144, pages 169-185, September 2002.
[Burges, 1998]Christopher J.C. Burges. A tutorial on support vector machines for pattern recognition. In Data Mining and Knowledge Discovery, pages 121-167, 1998.
[Celeux et al., 1989]G. Celeux, E. Diday, G. Govaert, Y. Lechevallier, and H. Ralambondrainy. Classification Automatique des Données. DUNOD informatique, 1989.
[Chen, 2006]H. Chen. On k-median clustering in high dimensions. Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms, pages 1177-1185, 2006.
[Emptoz, 1983]Hubert Emptoz. Modèle Prétopologique pour la Reconnaissance des Formes. Applications en Neurophysiologie. PhD thesis, Université Lyon 1, 1983.
[Gordon, 1999]A.D. Gordon. Classification, 2nd Edition. Chapman, 1999.
[Gupta et al., 1999]S.K. Gupta, K. Sambasiva Rao, and Vasudha Bhatnagar. K-means clustering algorithm for categorical attributes. In Data Warehousing and Knowledge Discovery, pages 203-208, 1999.
[Hashom, 1982]Abdul Amier Hashom. Plus Proches Voisins et Classification Automatique. Application des Données Industrielles. PhD thesis, L'Institut National des Sciences Appliquées de Lyon, November 1982.
[He et al., 2005]X. He, D. Cai, H. Liu, and J. Han. Image clustering with tensor representation. Proceedings of the 13th annual ACM international conference on Multimedia, November 2005.
[Jain and Dubes, 1988]Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice-Hall, Inc, 1988.
[Kaufman and Rousseeuw, 1990]L. Kaufman and P.J. Rousseeuw. Finding Groups in Data: An introduction to cluster analysis. WILEY-Interscience, 1990.
[Lamure, 1987]Michel Lamure. Contribution à l'analyse des espaces abstraits: application aux images digitales. PhD thesis, Université Lyon 1, May 1987.
[Le and Lamure, 2006]V. Le and M. Lamure. A pretopological approach for clustering. Knowledge Extraction and Modeling Workshop, 2006.
[Nicoloyannis, 1988]Nicolas Nicoloyannis. Structures Prétopologiques et Classification Automatique: le logiciel DEMON. PhD thesis, Université Lyon 1, June 1988.
[Picard, 2001]Pascal Picard. Classification sur des données hétérogènes. Master's thesis, Université de la Réunion, 2001.
[Raymond and Han, 1994]T. Ng Raymond and Jiawei Han. Efficient and effective clustering methods for spatial data mining. In Proceedings of the 20th International Conference on Very Large Databases, pages 144-155, Santiago, Chile, 1994.
[Tran, 2006]Nguyen Minh Thu Tran. Analyse d'une base de graphes issus d'un simulateur en intelligence en essaim. Master's thesis, Université de Nantes, 2006.
Alternatives to the estimation of the functional multinomial regression model

Manuel Escabias¹, Ana M. Aguilera², and Mariano J. Valderrama¹

¹ Universidad de Granada. Dep. Estadística e I.O. Facultad de Farmacia. Campus de Cartuja. 18071 Granada, Spain (e-mail: M. Escabias: escabias@ugr.es. M.J. Valderrama: valderrama@ugr.es)
² Universidad de Granada. Dep. Estadística e I.O. Facultad de Ciencias. Av. Severo Ochoa. 18071 Granada, Spain (e-mail: aaguilera@ugr.es)
Abstract. Functional logistic regression is one of the methods that have raised great interest in the emerging statistical field of functional data analysis, and particularly in that of functional regression analysis when the predictor is functional and the response is binary. The aim of this paper is to generalize the solutions exposed in the literature for the different problems that arise in the functional logit model (such as multicollinearity) to the multinomial case, where the response variable has a finite set of categories bigger than two.

Keywords: Functional data, functional logistic regression, functional multinomial regression.
1 Introduction

The functional logistic regression model (FLR) is the most used method to explain a binary variable in terms of a functional predictor related to it, as can be seen in many applications in different fields such as epidemiology or medicine (Ratcliffe et al. (2002)). The natural generalization of the functional logit model is the functional multinomial regression (FMR) model, where the response variable has a finite set of categories and the predictor is a functional variable. Different attempts have been made in the literature to formulate and estimate this model (see for example Besse et al. (2005)). In this paper we propose a different approach, comparing different methods of estimation based on the approximation of the functional predictor and the parameter functions in a finite space generated by a basis of functions, which turns the functional model into a multiple one. The estimation so obtained will be improved by developing a principal component approach and selecting the principal components to be retained in the model according to their ability to provide the best possible estimation of the parameter functions.
2
The functional multinomial model
In order to formulate the functional multinomial model, let us consider a functional predictor {X(t) : t ∈ T} and a categorical response variable Y with S categories {Y_1, Y_2, ..., Y_S} associated to it. Then, given a sample of observations of the functional predictor x_1(t), ..., x_n(t), the sample of observations of the response associated to them is a set of n vectors y_1, ..., y_n of dimension S of the form y_i = (y_i1, ..., y_iS)′, such that

y_is = 1 if Y_s is observed for X(t) = x_i(t), and y_is = 0 otherwise,

and the model expresses the responses in terms of the functional predictor as y_is = π_is + ε_is, with

π_is = P[Y = Y_s | X(t) = x_i(t)],

α_s and β_s(t) a set of parameters and parameter functions to estimate (α_S = 0, β_S(t) = 0), and ε_is independent and centered errors with variance π_is(1 - π_is). As in the logit case, this model can be expressed linearly in terms of the parameter functions by considering one of the categories of the response as reference and defining the logit transformations as l_is = L_s(x_i(t)) = log[π_is / π_iS]. Then, the multinomial model can be expressed as
l_is = α_s + ∫_T x_i(t) β_s(t) dt,   s = 1, ..., S - 1.
There exist different ways of defining the logit transformations (see Agresti (1996) for a detailed explanation). The linear expression of the multinomial model leads to an interpretation of the parameter functions in terms of odds ratios, similar to the one proposed by Escabias et al. (2005) for the FLR model. In this sense, if we consider an increment of a functional observation according to a function x*(t) = Δx(t) = x(t) + h(t), the difference between logit transformations is, on the one hand, the integral of the product of the parameter function multiplied by the function h(t):
L_s(x*(t)) - L_s(x(t)) = ∫_T h(t) β_s(t) dt,   s = 1, ..., S - 1,    (1)
and on the other hand the logarithm of the odds of the response Y = Y_s against the last category Y = Y_S,
so the exponential of the integral (1) is the odds ratio of each response against the last one. As in the FLR model, it is impossible to estimate the parameter functions by the usual methods of least squares or maximum likelihood (see Ramsay and Silverman (2005)). Moreover, it is impossible to observe the functional predictor x(t) continuously. As in the logit model, we can give an estimation of the parameter functions if we consider that the functional variable belongs to a finite space generated by a basis of functions, and the parameter functions too (see for example James (2002), Ratcliffe et al. (2002) or Escabias et al. (2005)):

x_i(t) = a_i′ Φ(t),   β_s(t) = β_s′ Φ(t),
with Φ(t) = (φ_1(t), ..., φ_p(t))′ a vector of basis functions that generate the space where x(t) belongs, and a_i = (a_i1, ..., a_ip)′ and β_s = (β_s1, ..., β_sp)′ the vectors of basis coefficients of the sample curves and parameter functions respectively. So the functional model turns into a multiple one, given by

l_is = α_s + ∫_T x_i(t) β_s(t) dt = α_s + a_i′ Ψ β_s,   s = 1, ..., S - 1, i = 1, ..., n,

with Ψ = (ψ_uv) being the p × p matrix of inner products between basis
functions. In matrix form, the vector of logit transformations corresponding to each category, L_s = (l_1s, ..., l_ns)′, can be expressed as
L_s = α_s 1 + A Ψ β_s,   s = 1, ..., S - 1,
and more generally the matrix of logit transformations L = (l_is) as

L = 1_n α′ + A Ψ P,

with P the matrix that has as columns the parameter functions' basis coefficients, and 1_n and 1_S vectors of ones of dimension n and S respectively. Prior to estimating the parameters of this multiple model (the parameter functions' basis coefficients), it is necessary to obtain the sample curves' basis coefficients. Due to the impossibility of observing the sample curves in a continuous way, the usual information available from them consists of discrete observations at a set of knots, x(t_k) = (x(t_1), ..., x(t_k))′, which allows the basis coefficients to be obtained by different methods, such as interpolation, as was the case in Escabias et al. (2005) for the logit model, or least squares approximation, as in Escabias et al. (2004).
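A sketch of this basis-expansion estimation (our own illustration with simulated curves; note that scikit-learn fits the symmetric softmax parameterization rather than the baseline-category logits used above, so it serves only to show the mechanics of the A Ψ design matrix):

import numpy as np
from scipy.interpolate import BSpline
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, k, p, deg = 100, 50, 8, 3
t = np.linspace(0.0, 1.0, k)
X = np.cumsum(rng.normal(size=(n, k)), axis=1)   # discretely observed curves
y = rng.integers(0, 3, size=n)                   # S = 3 response categories

# Cubic B-spline basis with p functions on [0, 1].
knots = np.concatenate((np.zeros(4), np.linspace(0, 1, p - 2)[1:-1], np.ones(4)))
basis = [BSpline(knots, np.eye(p)[j], deg) for j in range(p)]

# Least squares basis coefficients a_i of each sample curve.
Phi = np.column_stack([b(t) for b in basis])
A = np.linalg.lstsq(Phi, X.T, rcond=None)[0].T

# Psi: p x p inner products between basis functions (rectangle rule).
grid = np.linspace(0, 1, 501)
G = np.column_stack([b(grid) for b in basis])
Psi = (G.T @ G) * (grid[1] - grid[0])

# Multinomial logit on the reduced design matrix A Psi.
model = LogisticRegression(max_iter=1000).fit(A @ Psi, y)
beta_coefs = model.coef_                         # basis coefficients per category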
3
Principal component approach
The maximum likelihood estimation of the parameter functions of the functional multinomial regression model obtained by the approach in the previous section is affected by high multicollinearity, which makes the variances of the estimated parameter functions increase in an artificial way, as happened in the FLR model (see Escabias et al. (2004) for the FLR model or Hosmer and Lemeshow (2000) for the multiple logit regression model). As there, we propose to avoid the multicollinearity problem by using as covariates of the multiple multinomial regression model a set of principal components of the design matrix AΨ. Let Z be the matrix of principal components of AΨ and V the one that has as columns the eigenvectors of the covariance matrix of AΨ. Then the multiple multinomial model can be equivalently expressed in terms of all the principal components as
L_s = α_s 1 + A Ψ β_s = α_s 1 + Z V′ β_s = α_s 1 + Z γ_s,
and we can give an estimation of the parameters of the model (the coordinates of β_s(t)) through the estimation of this one,
which is the same as the one obtained by using the original AΨ matrix. We then propose to approximate these parameter functions by using a reduced set of principal components. There are different criteria in the literature for selecting principal components in regression methods (see Aucott et al. (2000) or Foucart (2000)). Escabias et al. (2004) compared, in the functional logit model, the classical one, which consists of including principal components in the model in the order given by explained variability, with the one of including them in the order given by a stepwise method based on a conditional likelihood ratio test. The latter method proved to be the better of the two, providing a more accurate parameter function estimation with a smaller number of principal components. In this work we will also compare these two methods for the functional multinomial model. The model will be tested on different simulated examples and applications with real data.
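A sketch of the stepwise inclusion criterion (our own simplified illustration: forward selection of principal-component columns by a conditional likelihood ratio test; Z holds the principal components of AΨ and y the response codes 0, ..., S-1):

import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression

def stepwise_pcs(Z, y, alpha=0.05):
    S = len(np.unique(y))
    def loglik(cols):
        if not cols:
            p = np.bincount(y) / len(y)          # intercept-only model
            return np.sum(np.log(p[y]))
        m = LogisticRegression(C=1e6, max_iter=1000).fit(Z[:, cols], y)
        probs = m.predict_proba(Z[:, cols])
        return np.sum(np.log(probs[np.arange(len(y)), y]))
    selected, remaining = [], list(range(Z.shape[1]))
    ll = loglik(selected)
    while remaining:
        ll_new, best = max((loglik(selected + [j]), j) for j in remaining)
        # Each added component contributes about S - 1 free parameters.
        if chi2.sf(2.0 * (ll_new - ll), df=S - 1) >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
        ll = ll_new
    return selected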
4
Acknowledgements
This research has been funded by the MTM2004-05992 project from the Spanish Ministry of Science and Technology.
References
1. Agresti, A. An Introduction to Categorical Data Analysis. Wiley, New York, 1996.
2. Aucott, L.S., Garthwaite, P.H. and Curral, J. Regression methods for high dimensional multicollinear data. Communications in Statistics: Computation and Simulation 2000; 29(4):1021-1037.
3. Besse, P.C., Cardot, H., Faivre, R. and Goulard, M. Statistical modelling of functional data. Applied Stochastic Models in Business and Industry 2005; 21:165-173.
4. Escabias, M., Aguilera, A.M. and Valderrama, M.J. Principal component estimation of functional logistic regression: discussion of two different approaches. Journal of Nonparametric Statistics 2004; 16(3-4):365-384.
5. Escabias, M., Aguilera, A.M. and Valderrama, M.J. Modelling environmental data by functional principal component logistic regression. Environmetrics 2005; 16(1):95-107.
6. Foucart, T. A decision rule for discarding principal components in regression. Journal of Statistical Planning and Inference 2000; 89:187-195.
7. Hosmer, D.W. and Lemeshow, S. Applied Logistic Regression, second edition. Wiley: New York, 2000.
8. James, G.M. Generalized linear models with functional predictors. Journal of the Royal Statistical Society, Series B 2002; 64(3):411-432.
9. Ramsay, J.O. and Silverman, B.W. Functional Data Analysis, second edition. Springer-Verlag: New York, 2005.
10. Ratcliffe, S.J., Leader, L.R. and Heller, G.Z. Functional data analysis with application to periodically stimulated foetal heart rate data. II: functional logistic regression. Statistics in Medicine 2002; 21(8):1115-1127.
A GARCH-based method for clustering of financial time series: International stock markets evidence

Jorge Caiado
School of Business Administration, Polytechnic Institute of Setúbal, Campus do IPS, Estefanilha, 2914-503 Setúbal, Portugal (e-mail: [email protected])

Nuno Crato
School of Economics and Business, Technical University of Lisbon, Rua do Quelhas, 6, 1200-781 Lisboa, Portugal (e-mail: [email protected])
July 28, 2007

Abstract. In this paper, we introduce a volatility-based method for clustering analysis of financial time series. Using generalized autoregressive conditional heteroskedasticity (GARCH) models, we estimate the distances between the stock return volatilities. The proposed method uses the volatility behavior of the time series and solves the problem of different lengths. As an illustrative example, we investigate the similarities among major international stock markets using daily return series with different sample sizes from 1966 to 2006. The data were divided into two sample periods: previous and subsequent to the terrorist attack of September 11, 2001. From the cluster analysis in the period before 9-11, most European markets, the United States and Canada appear close together, and most Asian/Pacific markets and the South/Middle American markets appear in a distinct cluster. After 9-11, the European stock markets have become more homogeneous, and the North American markets, Japan and Australia seem to come closer.

Keywords: Cluster analysis; GARCH model; International stock markets; Volatility.
1 Introduction

The general problem in clustering financial time series is the separation of a set of time series data into groups or clusters, with the property that series in the same group have a similar stochastic dependence structure and series in other groups are quite distinct. To perform cluster analysis of time series, we have to define a relevant measure of distance between the time series in a
data set. The stochastic behavior of most financial time series renders the usual methodologies used to measure the distance between different stock returns inappropriate. Mantegna (1999), Bonanno, Lillo and Mantegna (2001), among others, used the Pearson correlation coefficient as a similarity measure for a pair of stock returns. They computed a k x k matrix, where k is the number of stocks, with the k(k-1)/2 different pairs of correlation coefficients, and used the metric

$$ d_{COR}(x, y) = \sqrt{2\,(1 - \rho_{xy})}, \qquad (1) $$

where $\rho_{xy}$ is the correlation coefficient between the stock returns of the series x and y. Although this metric can be useful to ascertain the structure of stock return movements, it has three important limitations: (i) it does not use the information about the autocorrelation structure of each stock return; (ii) it does not take into account the information about the return volatilities; and (iii) it cannot be used for comparing and grouping stocks with unequal sample sizes. In this paper, we present a method for clustering analysis of financial time series without these drawbacks. First, we introduce a distance measure based on generalized autoregressive conditional heteroskedasticity (GARCH) models estimated on the stock returns. We then investigate whether major international stock markets have similar volatility behavior. Previous studies have investigated the comovements of international equity returns by using mean correlations (see Longin and Solnik, 1995, Karolyi and Stulz, 1996, Mei and Ammer, 1996, Ramchand and Susmel, 1998, Ball and Torous, 2000, and Morana and Beltratti, 2006), cointegration (see Arshanapalli and Doukas, 1993, Bessler and Yang, 2003, Syriopoulos, 2004, and Tahai, Rutledge and Karim, 2004), common factor analysis (see Engle and Susmel, 1993, and Hui, 2005), and other approaches. However, the problem of identifying similarities or dissimilarities in stock returns does not seem to have been sufficiently explored in the empirical finance literature using cluster analysis. The remainder of the paper is organized as follows. In Section 2, we introduce the parametric distance-based method for clustering of financial time series. In Section 3, we describe the data set used in this paper. In Section 4, we present the cluster analysis evidence for the empirical results. The final section summarizes the paper.
2 GARCH-feature based distance
Many of the recent financial time series theories are concerned with the conditional variance, or volatility, of a process. Volatility is a measure of the intensity of unpredictable changes in asset returns, so we can think of volatility as a random variable that follows a stochastic process. The task of any volatility model is to describe the historical pattern of volatility and possibly use this to forecast future volatility. Engle (1982) introduced the autoregressive conditional heteroskedasticity or ARCH(q) model, assuming that the conditional variance depends on past volatility measured as a linear function
of past squared values. The need for a long lag length q and the non-negativity conditions imposed on the ARCH parameters led Bollerslev (1986) to propose a more parsimonious parameter structure, the GARCH(p, q) model, defined by

$$ \sigma_t^2 = c + \sum_{j=1}^{p} a_j \sigma_{t-j}^2 + \sum_{i=1}^{q} b_i \varepsilon_{t-i}^2. $$

In most applications, the simple GARCH(1,1) model has been found to provide a good representation of a wide variety of volatility processes, as discussed in Bollerslev, Chou and Kroner (1992). We now introduce a parametric approach for clustering of financial time series using the information in the estimated GARCH parameters. Suppose we fit a GARCH(1,1) model to both return series $r_x$ and $r_y$. Let $\hat{L}_x = (\hat{\alpha}_x, \hat{\beta}_x)$ and $\hat{L}_y = (\hat{\alpha}_y, \hat{\beta}_y)$ be the vectors of the estimated ARCH and GARCH parameters, and $\hat{V}_x$ and $\hat{V}_y$ the estimated covariance matrices, respectively. Building upon the work of Caiado, Crato and Peña (2007), a measure of distance between the volatilities of the return series $r_{t,x}$ and $r_{t,y}$ can be defined by

$$ d_{GARCH}(x, y) = (\hat{L}_x - \hat{L}_y)' \hat{V}^{-1} (\hat{L}_x - \hat{L}_y), \qquad (2) $$

where $\hat{V} = \hat{V}_x + \hat{V}_y$. It is straightforward to show that this measure satisfies all the usual properties of a metric except the triangle inequality: (i) $d(x,y) \geq 0$; (ii) $d(x,y) = 0$ if $x = y$; and (iii) $d(x,y) = d(y,x)$. The advantages of this measure over other distance-based methods are that it conveys the whole stochastic structure of the conditional variance of a process and that it solves the problem of comparing time series of unequal length. We should also note that the proposed distance measure can easily be extended to larger GARCH models and to other types of volatility models.
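To make the construction concrete, the following minimal sketch computes the distance (2) from two return series. The paper does not name any software, so the use of the third-party Python package `arch`, and every name in the snippet, is an assumption made here for illustration only.

```python
# Sketch: GARCH(1,1)-based distance of equation (2).  Assumes the `arch`
# package; returns are expected in percent for numerical stability.
import numpy as np
from arch import arch_model

def garch_distance(rx, ry):
    """rx, ry: 1-D arrays of (possibly unequal-length) percent returns."""
    fitted = []
    for r in (rx, ry):
        res = arch_model(r, vol="GARCH", p=1, q=1).fit(disp="off")
        names = ["alpha[1]", "beta[1]"]              # (ARCH, GARCH) estimates
        theta = res.params[names].values
        V = res.param_cov.loc[names, names].values   # their covariance matrix
        fitted.append((theta, V))
    (tx, Vx), (ty, Vy) = fitted
    diff = tx - ty
    return float(diff @ np.linalg.inv(Vx + Vy) @ diff)   # equation (2)
```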
3 Data description
We consider daily index returns for 27 international stock markets from the Americas (Brazil, Argentina, Mexico, United States and Canada), from Asia/Pacific (India, Hong-Kong, Indonesia, Malaysia, Korea, Japan, Singapore, Taiwan, and Australia), from Europe (Netherlands, Austria, Belgium, France, Germany, United Kingdom, Spain, Italy, Sweden, Norway, and Switzerland), and from the Middle East (Egypt and Israel), as reported in Table 1. These data were obtained from Yahoo Finance (http://finance.yahoo.com) and correspond to the adjusted close prices. Table 2 contains the GARCH(1,1) estimates used to compute the volatility-based metric defined in (2). The sum of the ARCH and GARCH coefficients quantifies the persistence of shocks to volatility. A value of unity indicates a unit root in the conditional variance (see Engle and Bollerslev, 1986). The ARCH test is the Lagrange multiplier test for ARCH effects in the residuals (see Engle, 1982). Q (Q²) is the Ljung-Box test statistic for serial correlation in the residuals (squared residuals). In the GARCH models, all estimated coefficients are significant at conventional levels and have the appropriate signs. The shock persistences to volatility are close to one for all the markets. For Malaysia and Egypt, the sum of the ARCH and GARCH estimates is slightly higher than 1. The diagnostic tests show that the models for all the stock markets perform well
Table 1: Daily indices of international stock markets

Stock market                   Country                Period       Sample size
New York Stock Exchange        United States (US)     1966 - 2006        10259
TSX Venture Exchange           Canada (CAN)           2000 - 2006         1716
Sao Paulo Stock Exchange       Brazil (BRA)           1993 - 2006         3329
Buenos Aires Stock Exchange    Argentina (ARG)        1996 - 2006         2473
Mexico Stock Exchange          Mexico (MEX)           1991 - 2006         3721
Bombay Stock Exchange          India (IND)            1997 - 2006         2293
Hong Kong Stock Exchange       Hong-Kong (HK)         1987 - 2006         4890
Jakarta Stock Exchange         Indonesia (INDO)       1997 - 2006         2233
Kuala Lumpur Stock Exchange    Malaysia (MAL)         1993 - 2006         3165
Korea Stock Exchange           Korea (KOR)            1997 - 2006         2277
Japan Stock Exchange           Japan (JAP)            1984 - 2006         5602
Singapore Stock Exchange       Singapore (SING)       1987 - 2006         4692
Taiwan Stock Exchange          Taiwan (TAI)           1997 - 2006         2277
Australian Stock Exchange      Australia (AUST)       1984 - 2006         5607
Amsterdam Stock Exchange       Netherlands (NET)      1992 - 2006         3557
Vienna Stock Exchange          Austria (AUS)          1992 - 2006         3437
Brussels Stock Exchange        Belgium (BEL)          1991 - 2006         3899
Paris Stock Exchange           France (FRA)           1990 - 2006         4185
Xetra Stock Exchange           Germany (GER)          1990 - 2006         4000
London Stock Exchange          United Kingdom (UK)    1984 - 2006         5687
Madrid Stock Exchange          Spain (SPA)            1993 - 2006         3321
Milan Stock Exchange           Italy (ITA)            2000 - 2006         1752
Stockholm Stock Exchange       Sweden (SWE)           2001 - 2006         1452
Oslo Stock Exchange            Norway (NOR)           2001 - 2006         1429
Swiss Stock Exchange           Switzerland (SWI)      1990 - 2006         4001
Egypt Stock Exchange           Egypt (EGY)            1997 - 2006         1815
Tel Aviv Stock Exchange        Israel (ISR)           1997 - 2006         1853
in terms of the variance equation, except for Brazil, the United Kingdom, Hong-Kong, and Mexico, which show evidence of ARCH effects in the fitted residuals.
4 Cluster analysis
To investigate the affinity between the major international stock markets, we perform a cluster analysis of the time series of daily stock-market indices using all available data for the sample periods before and after the terrorist attack of September 11, 2001. For each data set, we compute a distance matrix with k(k-1)/2 different pairs using the GARCH-based method. Then, by using dendrogram and multidimensional scaling techniques (see, for instance, Johnson and Wichern, 1992) based on the computed distances, we display clusters for the return series.
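A minimal sketch of this step is given below (an illustration, not the authors' code): from the k x k distance matrix D, a complete-linkage tree and a two-dimensional configuration are obtained. Note that scikit-learn's metric MDS is used here as a stand-in for the principal coordinates analysis shown in the figures.

```python
# Sketch: clustering and 2-D scaling from a precomputed distance matrix D.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform
from sklearn.manifold import MDS

def cluster_markets(D, labels):
    """D: symmetric (k, k) GARCH-based distance matrix; labels: k names."""
    Z = linkage(squareform(D, checks=False), method="complete")
    tree = dendrogram(Z, labels=labels, no_plot=True)   # cluster structure
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(D)       # 2-D map of markets
    return Z, tree, coords
```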
Table 2: Estimates for the international stock-market volatilities based on the GARCH(1,1) model

Market          ARCH     GARCH    Persistence  Q(20)     Q²(20)    LM(20)
United States   0.08017  0.90451  0.98468      230.40*    16.86     16.84
Canada          0.05254  0.94309  0.99563       19.83     10.30      9.47
Brazil          0.11041  0.87385  0.98426       79.23*    32.56**   32.01**
Argentina       0.11909  0.85635  0.97544       44.47*    17.06     17.43
Mexico          0.11416  0.86893  0.98309      122.55*    39.34*    38.88*
India           0.11926  0.84775  0.96701       69.44*    15.99     15.72
Hong-Kong       0.13476  0.84615  0.98091       95.12*   170.64*   181.59*
Indonesia       0.13141  0.84919  0.98060      117.62*    18.37     17.73
Malaysia        0.11713  0.88651  1.00364      130.43*    21.39     21.92
Korea           0.07183  0.92705  0.99888       32.91**    8.42      7.94
Japan           0.12237  0.87537  0.99774       32.57**   15.80     15.85
Singapore       0.15544  0.80763  0.96307      104.46*     3.46      3.50
Taiwan          0.07212  0.92181  0.99393       22.86     28.96     27.91
Australia       0.22299  0.69768  0.92067       91.10*     7.79      7.80
Netherlands     0.09020  0.90293  0.99313       31.95**   28.36     29.59
Austria         0.09491  0.86437  0.95928       72.41*    16.23     16.55
Belgium         0.10615  0.86154  0.96769       72.74*     4.30      4.24
France          0.07682  0.90647  0.98329       24.58     15.40     15.69
Germany         0.07827  0.90359  0.98186       27.53      3.10      3.02
United Kingdom  0.09030  0.89146  0.98176       35.20**   58.11*    56.61*
Spain           0.09379  0.89372  0.98751       30.24     14.16     13.35
Italy           0.08154  0.91051  0.99205       17.83     25.34     24.59
Sweden          0.09499  0.89237  0.98736       18.39     18.24     18.37
Norway          0.12810  0.80014  0.92824       29.03     19.02     19.09
Switzerland     0.12178  0.83409  0.95587       28.78      4.19      4.17
Egypt           0.18812  0.85540  1.03352      101.34*    10.84     10.37
Israel          0.09684  0.81474  0.91158       25.23     16.57     16.19

* (**) Significant at the 1% (5%) level.
4.1 Before September 11, 2001
Figure 1 presents the map of distances across international stock markets using the 2-dimensional GARCH-based scaling, together with the dendrogram by the complete linkage algorithm, from which the clusters of markets can be identified. We found that all the markets are nearly at the same first coordinate except Australia, the United States and Canada. Looking at the second coordinate, the major European markets appear close together, the South/Middle American markets are at the same position, and some Asian/Pacific markets are at the same location. From the dendrogram, we can split the index returns into three distinct clusters: Cluster 1 = (FRA, ITA, AUS, GER, NET, KOR, US, BEL, SPA, UK, CAN); Cluster 2 = (IND, SWI, TAI, ISR, HK, INDO, ARG, SING, BRA, JAP, MAL, MEX); and Cluster 3 = (AUST, EGY). Cluster 1 includes eight of the major European markets (France, Germany, Italy, United Kingdom, Netherlands, Spain, Austria and Belgium), the North American countries (United States and Canada) and Korea. Cluster 2 includes the South/Middle American markets (Brazil, Mexico, and Argentina), seven of the major Asian/Pacific markets (Japan, Taiwan, Malaysia, Hong-Kong, India, Indonesia, and Singapore), Switzerland and Israel. Cluster 3 groups the outliers Australia and Egypt.
4.2 After September 11, 2001
Figure 2 shows the distances across stock markets in the sample period from September 11, 2001 to 2006. Most developed countries (United States, Canada, Australia, Germany and Japan) appear close to each other, and close to the European countries United Kingdom, France, Spain, Netherlands and Italy. Looking at the dendrogram, we found three quite reasonable clusters: Cluster 1 includes eight European countries (Germany, France, Spain, Netherlands, United Kingdom, Switzerland, Belgium and Sweden), Japan, Singapore, Korea, Israel and Argentina; Cluster 2 includes the United States, Canada, Australia, Italy, Taiwan, Hong-Kong, Egypt and Brazil; and Cluster 3 includes Austria, Norway, Malaysia, India, Indonesia and Mexico.
5 Conclusions
In this paper, we introduced a volatility-based method for the comparison of financial time series, and we investigated the similarities among major international stock-market returns. The proposed method takes into account the stochastic volatility dependence of the processes and solves the problem of classifying time series of unequal length. We performed a cluster analysis for daily stock index returns with unequal sample sizes from 1966 to 2006. In our empirical study, we found that the persistence estimates are very similar for all stock markets except Australia, which makes it hard to identify dissimilarities among the stock market volatilities.
Figure 1: Distances across stock markets for the period before 9-11.
(a) Principal coordinates analysis; (b) Dendrogram by complete linkage.
Figure 2: Distances across stock markets for the period after 9-11
' NET * KOR
.SING *SPA
'? ISR JAP c GER
.
=AN
AUSI
- us .ITA .
EGY
.BRA
.
TAI
.
MEX
.
HH
-2,L
I
I
1
1 3
(a) Principal coordinates analysis
1 (b) Dendrogram by complete linkage
However, using the GARCH-feature based method for the period before September 11, 2001, we found three distinct clusters. One cluster is formed by most European countries, the United States, Canada and Korea. The second is formed by the South/Middle American markets (Brazil, Argentina, and Mexico), the major Asian/Pacific markets (Japan, Taiwan, Hong-Kong, India, Malaysia, Indonesia, and Singapore), Israel and Switzerland. The third is formed by Australia and Egypt. The results are slightly different in the sample period after the terrorist attacks. The European countries seem to have become more homogeneous after 9-11, in part due to euro area market integration, and the United States, Canada, Australia and Japan markets tend to cluster together.

Acknowledgment: This research was supported by a grant from the Fundação para a Ciência e a Tecnologia (FEDER/POCI 2010).
References

[1] Arshanapalli, B. and Doukas, J. (1993). "International stock market linkages: Evidence from the pre and post-October 1987 period", Journal of Banking & Finance, 17, 193-208.
[2] Ball, C. and Torous, W. (2000). "Stochastic correlation across international stock markets", Journal of Empirical Finance, 7, 373-388.
[3] Bessler, D. and Yang, J. (2003). "The structure of interdependence in international stock markets", Journal of International Money and Finance, 22, 261-287.
[4] Bollerslev, T. (1986). "Generalized autoregressive conditional heteroskedasticity", Journal of Econometrics, 31, 307-327.
[5] Bollerslev, T., Chou, R. and Kroner, K. (1992). "ARCH modeling in finance", Journal of Econometrics, 52, 5-59.
[6] Bonanno, G., Lillo, F. and Mantegna, R. N. (2001). "High-frequency cross-correlation in a set of stocks", Quantitative Finance, 1, 96-104.
[7] Caiado, J., Crato, N. and Peña, D. (2007). "Comparison of time series with unequal lengths", manuscript.
[8] Engle, R. (1982). "Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation", Econometrica, 50, 987-1008.
[9] Engle, R. and Bollerslev, T. (1986). "Modelling the persistence of conditional variances", Econometric Reviews, 5, 1-50.
[10] Hui, T. (2005). "Portfolio diversification: a factor analysis approach", Applied Financial Economics, 15, 821-834.
[11] Johnson, R. A. and Wichern, D. W. (1992). Applied Multivariate Statistical Analysis, 3rd Ed., Englewood Cliffs, Prentice-Hall.
[12] Karolyi, G. and Stulz, R. (1996). "Why do markets move together? An investigation of U.S.-Japan return comovements", The Journal of Finance, 51, 951-986.
[13] Longin, F. and Solnik, B. (1995). "Is the correlation in international equity returns constant: 1960-1990?", Journal of International Money and Finance, 14, 3-26.
[14] Mantegna, R. N. (1999). "Hierarchical structure in financial markets", The European Physical Journal B, 11, 193-197.
[15] Mei, J. and Ammer, J. (1996). "Measuring international economic linkages with stock market data", The Journal of Finance, 51, 1743-1763.
[16] Morana, C. and Beltratti, A. (2006). "Comovements in international stock markets", Journal of International Financial Markets, Institutions and Money, in press.
[17] Ramchand, L. and Susmel, R. (1998). "Volatility and cross correlation across major stock markets", Journal of Empirical Finance, 5, 397-416.
[18] Syriopoulos, T. (2004). "International portfolio diversification to Central European stock markets", Applied Financial Economics, 14, 1253-1268.
[19] Tahai, A., Rutledge, R. and Karim, K. (2004). "An examination of financial integration for the group of seven (G7) industrialized countries using an I(2) cointegration model", Applied Financial Economics, 14, 327-335.
CHAPTER 13
Applications of Data Analysis
Reliability Problems and Longevity Analysis

A.I. Michalski
Institute of Control Sciences RAS, Moscow, Russian Federation

Abstract. The article describes problems in the mathematical description of longevity and ageing processes in living organisms. These problems are similar to the problems considered in reliability theory but need the development of additional methods to account for the specificity of living organisms. The article describes methods for the assessment of the heterogeneity phenomenon in a population, for the analysis of stress experiments, and for the modeling of survival in a changing environment. Results of the analysis of longevity experiments with the worms C. elegans and the Mediterranean fruit flies Ceratitis capitata are presented. Application of the new methods and results obtained in biodemography can be profitable in the investigation of technical systems under nonstandard and extreme conditions.
Introduction

Many differences exist, and many features are in common, between technical systems and living organisms. From the reliability theory point of view, death is an event after which an organism cannot perform its living functions. On the other hand, the decline in the functioning performance of a technical device is often referred to as the ageing of the equipment. In mathematical terms, the failure of a device or the death of an organism is described by the distribution of the failure-free duration of work. For living organisms this distribution is represented by the survival function S(x), which equals the probability that the life span exceeds a given value x. The intensity of death occurrence, which is the analogue of the failure rate in reliability, is the main characteristic in longevity analysis and is called the mortality $\mu(x)$. The one-to-one relationship

$$ \mu(x) = -\frac{d}{dx} S(x) \Big/ S(x) $$

makes them equivalent, but they have different meanings in applications. The survival function S(x) gives the probability of good operation and is an essential feature in the construction of technical systems. The mortality $\mu(x)$ reflects the chances of dying conditional on surviving till age x and characterizes the work of the organism's life support systems. This is the reason why mortality is of prime interest in investigations of living systems. In addition, mortality helps to formalize the complex phenomenon of ageing. The article describes different problems and approaches arising in the investigation of longevity and ageing in living organisms and their relations
with traditional problems in reliability theory. The first section is devoted to the consideration of common features between reliability and longevity. Some formal models from reliability theory which describe the ageing process are presented. The second section describes heterogeneity analysis and its importance in the analysis of population statistics. Stress experiments with C. elegans worms are described in section three. In this section the hormesis effect (the positive influence of stress) is described and modeled. The fourth section presents a consideration of the resource allocation problem in relation to longevity experiments with Ceratitis capitata flies living under changing feeding regimes.
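As a concrete illustration of the relationship between mortality and survival introduced above (a worked example added here; it is not in the original), inverting the relation gives the survival functions corresponding to the two laws discussed in the next section:

```latex
% From \mu(x) = -S'(x)/S(x) it follows that
% S(x) = \exp\big(-\int_0^x \mu(t)\,dt\big).
% Under the Gompertz law \mu(x) = a e^{bx}:
S(x) = \exp\!\left(-\frac{a}{b}\left(e^{bx}-1\right)\right),
% and under the Weibull law \mu(x) = x^a:
S(x) = \exp\!\left(-\frac{x^{a+1}}{a+1}\right).
```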
1. Reliability and longevity

Many attempts have been made to link the longevity of a living organism to the reliability of the systems from which it is built (Grodzinsky et al., 1987; Koltover, 1997). It is a principal question whether it is possible to build an aging organism from non-aging elements, and how to explain the fact that the longevity of living organisms and the duration of work of technical systems follow different laws. In Gavrilov and Gavrilova (2001, 2004) these problems are investigated as a reliability problem in highly redundant systems. The separate components (technical elements or cells in a living organism) are supposed to have constant failure rates, which means that they are not aging. The separate components are combined in blocks which duplicate each other for reliability. The blocks are combined in such a manner that a failure in one block means failure of the entire system or death of the organism. Formal consideration of such a "parallel-sequential" scheme of connection shows that the failure rate of the system changes with time. In other words, in the case of a high level of redundancy, the composition of "non-aging" elements results in an "aging" system. This is a principal conclusion, showing that the aging process is probably related to the redundancy of the life supporting systems. Gavrilov and Gavrilova (2001, 2004) proposed an explanation of why the failure rate in technical systems can be approximated by a power function of the duration of work (Weibull model), while the mortality rate in living organisms is approximated by an exponential function of age (Gompertz model). A technical system is composed of elements which were tested to operate from the very beginning without failure. In such a system the failure rate is approximated by the Weibull law $\mu(x) = x^a$. A different situation is observed in living organisms, which are not designed but developed. In the development phase a living organism produces "elements" (cells, genetic complexes, etc.) which have not been tested and can even be malfunctioning. This is observed in the form of pathologies at birth. Formally, this is the case of a system containing a proportion of failed elements at the very beginning. If the proportion of such elements is a random value with a Poisson distribution, then the failure rate is approximated by the Gompertz law $\mu(x) = e^{bx}$. The Gompertz model approximates mortality starting from the age of 40 years (Yashin et al., 2002). This means that by the age of 40 years the human organism
accumulates so many defects that they start to make a significant impact in comparison with external mortality. Another property of the described model is the deceleration of mortality at advanced ages. This is a natural consequence of the proposition that a living organism is composed of non-aging elements. Deceleration of mortality was observed in insects by Curtsinger et al. (1982) and Carey et al. (1992). In humans after 80 years of age the mortality curve goes lower than the exponential mortality curve, tending to a stable level (Thatcher et al., 1998; 1999). The last observation became possible only in the second half of the XX century, when in economically developed countries the number of people older than 80 and 100 years increased significantly. Deceleration of the failure rate has not been observed in technical systems because long-operating systems are usually removed from exploitation for the sake of reliability. A special experiment motivated by biological and demographic observations was conducted at the Max Planck Institute for Demographic Research (Finkelstein, 2005). The failure times of 750 miniature electric lamps were registered during 250 hours. The corresponding empirical failure rate curve first demonstrated approximately power-law growth. Then a deceleration started, which was followed by a decrease to a level lower than the initial failure rate. The presented examples demonstrate the essential relation between the processes of ageing and reliability. The well developed mathematical methods of reliability analysis in technical systems help to formulate constructive hypotheses about the mechanisms of ageing and longevity in living organisms. Observations of aging phenomena in turn set new problems in reliability theory, extending its application. An example of such a problem is the reliability analysis of mass production, related to variability in the properties of elements (heterogeneity), which corresponds to individual genetic and ontogenetic variability in living organisms. Reliability analysis of systems under changing and extreme conditions is analogous to the analysis of living organism longevity under stress. Resource allocation between the functions of task performance and operation support is analogous to resource allocation between reproduction and survival in a living organism.
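The following small simulation (our illustration, with arbitrary parameter values) reproduces the qualitative conclusion of the redundancy argument: a parallel block of "non-aging" exponential components exhibits a failure rate that rises with age and then decelerates, even though no single component ages.

```python
# Sketch: empirical failure rate of a parallel block of n components with
# constant individual failure rate lam; the block fails only when all of
# its components have failed.
import numpy as np

rng = np.random.default_rng(1)
n, lam, n_sims = 5, 1.0, 200_000
lifetimes = rng.exponential(1.0 / lam, size=(n_sims, n)).max(axis=1)

grid = np.linspace(0.1, 4.0, 40)
surv = np.array([(lifetimes > t).mean() for t in grid])
hazard = -np.gradient(np.log(surv), grid)   # empirical failure rate
print(np.round(hazard[::10], 3))            # rises, then levels off near lam
```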
2. Heterogeneity

The model described in Gavrilov and Gavrilova (2001, 2004) captures some principles which link reliability theory with longevity in living organisms. This model is too "mechanistic" to capture the phenomenon of longevity increase under a mild stress, the hormesis effect (Southam and Erlich, 1943; Michalski et al., 2001; Kaiser, 2003), the mortality deceleration in cohorts at older ages (Carey et al., 1992), or the decrease of the failure rate to a level lower than the initial one after long operation of an electronic device (Finkelstein, 2005). The idea of stochastic heterogeneity (Keyfitz and Littman, 1979) applied to living organisms or technical devices is fruitful in the explanation of such phenomena. Heterogeneity is effectively applied to the description of total and cause-specific mortality dynamics in the human
population (Vaupel et al., 1979; Vaupel and Yashin, 1985) and to model the reaction of laboratory animals to stress (Yashin et al., 2002). Application of heterogeneity to technical systems is considered in Follmann and Goldberg (1988). The main idea of heterogeneity is that the age-specific mortality observed in a population can be dramatically different from the individual age-specific mortality, which is considered as a notion for the intensity of transition to the deceased state. When the individual intensity of transition to the deceased state at age x depends on a factor z, the probability to survive till age x is

$$ S(x, z) = \exp\left( -\int_0^x \mu(t, z)\,dt \right). $$

A heterogeneous population is composed of individuals with different values of the factor z, distributed in accordance with a function P(z). The probability to survive till age x for an individual in such a population is

$$ \bar{S}(x) = \int S(x, z)\,dP(z), $$

with mortality given by the expression

$$ \bar{\mu}(x) = -\frac{d}{dx}\bar{S}(x) \Big/ \bar{S}(x) = \int_z \mu(x, z)\,dP(z \mid x), $$

where P(z | x) is the distribution of the factor z among survivors till age x. From this, the mortality observed in a heterogeneous population at age x can be written as the conditional expectation of the individual mortality among survivors till this age:

$$ \bar{\mu}(x) = E(\mu \mid x). \qquad (1) $$

Equation (1) links the individual mortality and the mortality observed in the population. Differentiating with respect to x, we obtain

$$ \frac{d}{dx}\bar{\mu}(x) = E\left( \frac{d\mu}{dx} \,\Big|\, x \right) - \sigma^2(\mu \mid x). \qquad (2) $$

If the conditional variance $\sigma^2(\mu \mid x)$ is large, then the mortality observed in a heterogeneous population can decrease with age even though the individual mortality increases with age for every individual. An important case is the proportional hazard model $\mu(x, z) = z\,\mu_0(x)$, with z the frailty factor and $\mu_0(x)$ the background mortality. From (1) and (2) it follows that

$$ \bar{\mu}(x) = \mu_0(x)\,E(z \mid x). $$

For constant background mortality we obtain

$$ \frac{d}{dx}\bar{\mu}(x) = -\mu_0^2\,\sigma^2(z \mid x) < 0, $$

which means that "non-aging" members compose a population with decreasing mortality, which can be interpreted as "anti-aging". It is easy to see that in the general case of the proportional hazard model the relationship

$$ \frac{d}{dx}\,\frac{\bar{\mu}(x)}{\mu_0(x)} = -\mu_0(x)\,\sigma^2(z \mid x) < 0 $$

is valid, which means that the mortality observed at age x decreases in relation to the background mortality at a rate proportional to the variance of frailty among survivors till age x. The resulting level of the mortality observed in the population can be lower than the starting level $\bar{\mu}(0)$. This very effect was observed in the experiment with long-operating electric lamps (Finkelstein, 2005).
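A short numerical illustration of this selection effect (not from the cited works; gamma-distributed frailty and these parameter values are chosen here only because they admit a closed form):

```python
# Sketch: observed mortality in a gamma-frailty population with constant
# individual hazard mu(x, z) = z * mu0.  For z ~ Gamma(shape k, scale s),
# E(z | x) = k*s / (1 + s*mu0*x), so the observed hazard declines with age
# although no individual ever "ages".
import numpy as np

mu0, k, s = 1.0, 2.0, 0.5          # E(z) = k*s = 1 at birth
x = np.linspace(0.0, 5.0, 6)
mu_bar = mu0 * k * s / (1.0 + s * mu0 * x)
print(np.round(mu_bar, 3))         # 1.0, 0.667, 0.5, ... -> decreasing
```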
3. Stress experiments

Special experiments with different levels of stress were conducted on laboratory animals to investigate survival in a heterogeneous population. It was expected that frail organisms would die first under unfavorable conditions and that the mean life expectancy among survivors of the stress would increase. In many experiments it was found that the maximum increase in life expectancy was observed under such low levels of stress that no premature death was observed. This is evidence of a stimulating effect of low levels of stress on survival, which is known as hormesis (Kaiser, 2003).

Hormesis effect

Survival in a heterogeneous population in the presence of the hormesis effect was modeled in Yashin et al. (2002). Worms C. elegans at the fifth day of life were heated at a temperature of 35°C during different periods of time, and then the survivors were transferred to a medium with the normal temperature of 20°C. The numbers of dead worms were counted during heating and daily after it. Table 1 presents the results of the experiments and the estimates of life expectancy and its mean square error under different durations of heating.

Heating duration (hours)        0     1     2     4     6     8    10    12
Number of worms               137   100   152   133   164   152   200   178
Number of dead worms
  during heating                0     0     0     1     1    14    63   121
Life expectancy (days)       16.6  18.2  17.6  14.6   6.8   4.2   1.8   0.8
Mean square error             0.4   0.6   0.4   0.6   0.4   0.2   0.1   0.1

Table 1. Longevity in C. elegans after different periods of heating.
The results obtained demonstrate an increase of mean life expectancy after moderate (one-hour) heating. All worms survived this period of heating, which means absence of selection by frailty and presence of the hormesis effect. To explain the observed effect it was supposed in Yashin et al. (2002) that the investigated group of worms was a composition of frail, normal and robust animals. This is a case of discrete heterogeneity. The temperature and the duration of heating determined the proportions of animals in these three groups. Under moderate stress the proportion of robust animals increases. After long heating the development of degenerative processes starts and leads to an increase in the proportion of frail animals. Such redistribution can be explained by the stimulating effect of stress, which activates adaptation mechanisms in the living organism. These mechanisms operate effectively only in a limited range and are unable to cope with the negative influence of strong stress.
Triggering defense

Survival in C. elegans after heating stress is an example of a system in which reliability decreases not gradually as a result of aging but in a stepwise mode (Michalski et al., 2001; Michalski and Yashin, 2003). The defense is most effective at the beginning of life, and then the defensive mechanisms lose their effectiveness. Figure 1 shows the empirical survival curves for the control group of worms without heating and for the worms heated during 1 hour as described above.
Figure 1. Proportion of surviving worms in the control group (o) and after heating during 1 hour (+).

From the figure one can see that the positive effect of heating at the 5th day of life is observed only after the 20th day and vanishes at the end of life. After the 20th day of life the defense switches from the high level to the lower one. In Michalski and Yashin (2003) it is shown that such a switch is a result of
the nonlinearity of the differential equations which describe the production and consumption of protective and harmful substances. At birth an organism occupies a steady state with a high level of protection. A parametric drift of the system with age moves this steady state into an unstable region, and in a short time the system reaches a new stable state with a lower level of protection. Heating at the beginning of life increases the amount of protective substances in the organism in the form of heat-shock proteins. This prolongs the time spent in the state with a high level of protection and explains why worms heated during one hour survive better than the control group at ages 21-27 days.
4. Changing environment and resource allocation

Any system is designed to fulfill two tasks: to perform the operation for which it is designed and to keep operating. Under limited resources these two tasks are competing, and an effective strategy for resource allocation has to be applied. For living organisms the competing tasks for resource allocation are reproduction and longevity. The optimal relationship between these functions depends on the specificity of the organism and on environmental conditions. Investigation of such dependencies will clarify the biological mechanisms responsible for life support and longevity. Implementation of effective strategies for resource allocation in technical systems will increase their functional performance. An effective way to modulate resource allocation in flies is dietary restriction. It was found that under a sugar diet without protein flies lay a smaller number of eggs and live longer than under a full protein diet (Carey et al., 1998). The effect on longevity of a stochastic switch between the sugar diet and the full protein diet was investigated in Carey et al. (2005). The flies were given protein and switched back to sugar with different probabilities, and reproduction and longevity were measured. It was observed that both reproduction and longevity take their maximal values for a probability of finding protein equal to 0.2 and a probability of losing protein equal to 0.5. The observed phenomenon was modeled using a Markov chain model with states corresponding to the protein diet, the sugar diet, and two unobserved transition states corresponding to the sugar-protein change and to the protein-sugar change (Michalski et al., 2006). Age-specific mortalities on the protein and sugar diets were estimated using the Gompertz-Makeham model on survival data under the corresponding treatments. Mortality and the probabilities of transition from the change states were estimated on survival data under different treatments. The estimates show that in males as well as in females the change from the sugar to the protein diet leads to a drop in mortality almost to zero. Later, mortality increases as a result of the transition to the 'protein diet' state. The change from the protein to the sugar diet is associated with lowered mortality, which increases when the transition to the 'sugar diet' state is made. The lowering of mortality as the result of the diet change explains the experimentally observed increase in mean life span
under stochastic feeding in comparison with a fixed diet. The biological background of such mortality lowering is unclear and needs further experimental investigation.
Conclusion

Many problems in longevity investigation are similar to problems in technical systems reliability. The specificity of a living organism is that it develops using not perfectly reliable elements, while a technical system is designed from elements selected for high reliability. From this follow the different laws for mortality in living organisms and for the failure rate in technical systems, including population heterogeneity. The common mathematical description allows one to consider the process of aging in a living organism as an example of the behavior of a technical system under specific conditions, and to apply the methods for the investigation of aging and longevity to the investigation and projection of technical systems reliability. The experience gained in the mathematical modeling of aging and longevity allows one to formulate new problems and approaches which are topical in reliability theory. These are problems related to heterogeneity, functioning under stress, and resource allocation in a changing environment. Application of the principles and mechanisms used by living organisms for survival can increase the efficiency of the functioning of technical systems.
References

1. Grodzinsky D.M., Voitenko V.P., Kutlahmedov U.A., Koltover V.K., 1987. Reliability and ageing of biological systems. Kiev: Naukova dumka (in Russian).
2. Koltover V.K., 1997. Reliability concept as a trend in biophysics of ageing. J. Theor. Biol. 184: 157-163.
3. Gavrilov L.A. and Gavrilova N.S., 2001. The reliability theory of aging and longevity. J. Theor. Biol. 213: 527-545.
4. Gavrilov L.A. and Gavrilova N.S., 2004. Why we fall apart. Engineering's reliability theory explains human aging. IEEE Spectrum 41: 30-35.
5. Yashin A.I., Begun A.S., Boiko S.I., Ukraintseva S.V. and Oeppen J., 2002. New age patterns of survival improvement in Sweden: do they characterize changes in individual aging? Mech Ageing Dev 123: 637-647.
6. Carey J.R., Liedo P., Orozco D., Vaupel J.W., 1992. Slowing of mortality rates at older ages in large medfly cohorts. Science 258: 457-461.
7. Thatcher A.R., Kannisto V., Vaupel J.W., 1998. The force of mortality at ages 80 to 120. Odense, Denmark: Odense University Press.
8. Thatcher A.R., 1999. The long-term pattern of adult mortality and the highest attained age. J. R. Statist. Soc. A 162: 5-43.
9. Michalski A.I., Johnson T.E., Cypser J.R., Yashin A.I., 2001. Heating stress patterns in Caenorhabditis elegans longevity and survivorship. Biogerontology 2: 35-44.
10. Southam C.M., Erlich J., 1943. Effects of extract of western red-cedar heartwood on certain wood decaying fungi in culture. Phytopathology 33: 517-524.
11. Kaiser J., 2003. Hormesis: Sipping from a poisoned chalice. Science 302: 376-379.
12. Finkelstein M.S., 2005. On some reliability approaches to human aging. International Journal of Reliability, Quality and Safety Engineering 12: 337-346.
13. Keyfitz N., Littman G., 1979. Mortality in a heterogeneous population. Population Studies 33: 333-342.
14. Vaupel J.W., Manton K.G., Stallard E., 1979. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16: 439-454.
15. Vaupel J.W., Yashin A.I., 1985. Heterogeneity's ruses: some surprising effects of selection on population dynamics. Am. Stat. 39: 176-182.
16. Yashin A.I., Cypser J.W., Johnson T.E., Michalski A.I., Boyko S.I., Novoseltsev V.N., 2002. Heat shock changes the heterogeneity distribution in populations of Caenorhabditis elegans: Does it tell us anything about the biological mechanism of stress response? J Gerontol A Biol Sci Med Sci 57: B83-B92.
17. Follmann D.A., Goldberg M.S., 1988. Distinguishing heterogeneity from decreasing hazard rates. Technometrics 30: 389-396.
18. Michalski A.I., Yashin A.I., 2003. Biological regulation and longevity. Control Science 3: 61-65.
19. Yashin A.I., 1985. Dynamics in survival analysis: conditional Gaussian property versus Cameron-Martin formula. In Statistics and Control of Stochastic Processes (N.V. Krylov, R.Sh. Liptser and A.A. Novikov, eds.), 466-475. Springer, New York.
20. Carey J.R., Liedo P., Muller H.-G., Wang J.-L., Vaupel J.W., 1998. Dual modes of aging in Mediterranean fruit fly females. Science 281: 996-998.
21. Carey J.R., Liedo P., Muller H.-G., Wang J.-L., Zhang Y., Harshman L., 2005. Stochastic dietary restriction using a Markov-chain feeding protocol elicits complex, life history response in medflies. Aging Cell 4: 31-39.
22. Michalski A.I., Carey J.R., Yashin A.I., 2006. Surviving in changing environment. The 1st Rostock European Exploratory Workshop on Aging and Longevity in Wild Medfly Populations. Uspekhi Gerontologii (Advances in Gerontology) 18: 125-136.
Statistical Analysis on Mobile Applications among City People: A Case of Bangkok, Thailand

Pakavadi Sirirangsi
Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn University
Phyathai, Bangkok 10330, Thailand
(e-mail: [email protected])
Abstract. Undoubtedly, the distinguished increase in the volume of mobile applications nowadays may have some impact on the mobile IT-business. It is therefore of interest to examine the mobile applications among city people in order to obtain noticeable ideas worthwhile for mobile IT-business development. For the investigation of mobile applications, sample data are collected in the in-bound districts of Bangkok by interviewing with questionnaires, based on stratification and proportional allocation. As for the data analysis, descriptions of the interesting variables in the four main groups of mobile applications are made first. The first category consists of basic applications, whereas the second one is composed of sending and receiving messages. The applications available for people's activities are categorized in the third one. The fourth represents communication technology and device transfer. To make inferences about the true characteristics of the variables of interest, estimations are performed and subsequently followed by hypothesis tests with appropriate parametric and nonparametric procedures. Regarding the associations between variables concerning mobile-application types and mobile-user characteristics, together with their attitudes toward mobile technology, the association level and significance test are determined by Cramer's and Goodman-and-Kruskal-tau statistics. Then, regression is employed to investigate the relationship between an interesting response variable and independent variables. In the case of a binary response variable, logistic regression is applied.

Keywords: Mobile Applications, City People.
1 Introduction
As is generally known, mobiles in the modern days have been developed so far that people's daily activities can be proceeded with very conveniently via mobiles, in the sense that multipurpose applications can be made on one mobile only, instead of a separate device being necessary for each activity. Consequently, life in such a busy city is now much facilitated by modern mobiles, which offer so many more functions that the people's burden is lessened in terms of money spent on buying devices, the space needed for keeping them, and the time spent operating them. As the applications on mobiles have been growing very rapidly, particularly in such a central city, mobile industries may be affected by a very large expansion in the volume of mobiles sold, so that production and sales policies will have to be developed in time. Furthermore, many
enterprises may be faced with highly competitive products and sales of mobiles in terms of fashionable features and pricing. This study, therefore, is expected to bring about some findings that would constitute noticeable ideas worth taking into account by enterprises in order to set development plans and policies concerning the information technology related to modern mobile production and sales. It is then aimed to investigate what applications on mobiles have been made among the city people at the present time, and how the mobile applications are performed while spending their lifetime in the city. In addition, the city people's characteristics are also determined in order to obtain some noticeable ideas worth considering for setting mobile IT-business development plans and policies.
2 Data coverage and collection
The mobile users in the in-bound districts of the city of Bangkok are taken into consideration, while all districts of in-bound Bangkok have been stratified according to the system set by the Bangkok Metropolitan Office. Then 8 of all the in-bound districts are selected based on the criteria that the selected ones can be good representatives and can cover all main areas of in-bound Bangkok. As for collecting the data for the study, an estimation of the sample size is performed first, and a sample of 1380 is obtained by a procedure based on stratification and proportional allocation, taking into account the population size of each district together with the proportion of 0.9 representing mobile users found in the pilot survey. Subsequently, the sample to be selected from each of the eight selected districts is proportionally estimated from the 1380 estimated before. The sample data are then collected independently and non-purposively by interviewing with questionnaires.
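A sketch of the proportional allocation step is shown below; the district population figures are placeholders invented for illustration, not the study's actual data.

```python
# Sketch (hypothetical numbers): proportional allocation of the overall
# sample of n = 1380 across the eight selected districts.
populations = {                     # placeholder district populations
    "district_1": 120_000, "district_2": 95_000, "district_3": 150_000,
    "district_4": 80_000, "district_5": 110_000, "district_6": 70_000,
    "district_7": 130_000, "district_8": 90_000,
}
n_total = 1380
total_pop = sum(populations.values())
allocation = {d: round(n_total * p / total_pop) for d, p in populations.items()}
print(allocation, sum(allocation.values()))   # rounding may shift the total by 1
```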
3 Data analysis and findings

3.1 Applications on mobiles: descriptions
According to the evidence from the study, the applications on mobiles described by all sampled respondents who spend their lifetime in the city can be categorized into 4 main groups: basic applications, sending and receiving messages, applications relating to people's activities, and applications concerning communication technology and device transfer.
Basic applications
For the first group mentioned, the applications Phone Book, Games, Alarm-Clock, Calendar, Calculator, Organizer, Vibration-System, Caller-
Ring, Speed-Dial, Voice-Dial, Schedule Recording, Voice Recording and Speaker Phone are found to be the 13 basic applications on mobiles usually made by city people. Evidently, among the total of 4054 responses on the volume of applications per week, Caller Ring appears as the most popular one, at 32.8 percent, followed by Phone Book at 14.2 percent. Interestingly, Games is found popular at 8.1 percent, representing the third rank. However, Voice Recording and Speaker Phone are found to be the two least popular ones, at 4.5 and 3.4 percent respectively.
Fig. 1. Applications on mobiles.
Sending and receiving messages

It is found in the study that the applications concerning sending and receiving messages via mobiles are practically made in the forms of SMS, MMS and EMS. Distinctively, among the total of 435 answers, SMS appears as the most popular one at the rate of 51.3 percent. MMS comes next, representing 36.1 percent, followed by EMS, found at 12.6 percent, which appears as the lowest one.
Fig. 2. Sending/receiving messages.
Internet on mobiles
As Internet may be regarded as one of the most necessary and popular E-activities so far, it is worthwhile considering how city people make use of the various Internet-related functions offered by the system on mobiles. Plausibly, all responses concerning Internet applications on mobiles can be organized into 12 types: E-Mail, Pay via mobiles, Chat, Music, making Transactions, Games, Data Search, News, Ticket Reservations, Download, Movies and Shopping. It is found that, among the total volume of 6352 uses per week by all respondents, Pay for expenses via mobiles represents the highest of all 12 applications mentioned, with 10.5 percent, while Ticket Reservations appears as the lowest one at the level of 4.5 percent. In connection with this, making Transactions by mobiles represents the next one after the function used for expense payment, at the 10.1 percent level. Noticeably, Games and Chat appear as the third and the fourth in descending rank, with 10.0 and 9.9 percent respectively. As for the others, Figure 3 illustrates the comparative volume of the various applications.
Fig. 3. Internet on mobiles.
Games on mobiles

Games on mobiles by age
It appears, as generally expected, that those who belong to the highest group of mobile-game players are the young ones. The age group between 24 and 29 years appears as the highest one, with 23.1 percent among all 329 responses. The next runner-up is the group of young people who are 18 to 23 years of age. The group of those whose ages range from 30 to 35 years appears as the third in rank, at the level of 16.1 percent. Apart from these,
the others do not indicate a highly significant level of mobile game-play. To illustrate the comparative proportions of mobile-game play across all the different age groups, the following graph is constructed and shown below.
Fig. 4. Games on mobiles by age.

Camera on mobiles
Camera on mobiles by occupation

Additionally, city people's occupation is found to be another significant factor influencing the use of the camera function available on mobiles. It is found that the business employee group is the biggest one, at the 36.3 percent level. The group of students and the group of business owners are found to be the second and third in rank, with 19.2 and 15.1 percent respectively. The graph shown below may help indicate the comparatively different proportions of the camera-function application.
Fig. 5. Camera on mobiles by occupation.
Bluetooth on mobiles by IT background

It is of interest to examine, in this study, how city people's IT background is associated with their Bluetooth application via mobiles. It is found that those with an IT background represent 74.3 percent of all 580 responding users, whereas the ones without an IT background represent 25.7 percent. The two comparative groups are shown in the following graph.
Fig. 6. Bluetooth by IT background.

Wireless application
Wireless application by IT background

Among all 455 responses from those who actually try wireless application, it appears that 71.0 percent are those whose background is related to Information Technology, whereas the ones in the other group, with no IT background, represent the 29.0 percent level. The two comparative groups are illustrated in the graph shown below.
Fig. 7. Wireless applications by IT background.
3.2 Inferential analysis
Firstly, it is of interest to make inference about the true mean hours of Internet use on mobiles per day for two groups: those whose educational attainment is at least a Bachelor degree, and those with lower than first-degree education. The estimated means are calculated as 2.32 and 2.68 respectively. Further, the interval estimates for the true mean hours of Internet use per day, based on the usual t-based confidence interval, are 2.45 ≤ μ ≤ 2.92 and 1.97 ≤ μ ≤ 2.67 for the lower and at-least-first-degree groups respectively. It may then reasonably be suspected that the mean hours of Internet use on mobiles by the group with at least first-degree education are at least 2 hours per day. The t-test is used to test the hypotheses, and it is found that H0: μ ≥ 2 cannot be rejected at the 0.05 significance level. Accordingly, it can be concluded that the true mean hours spent on the Internet by the group with at least a first degree are at least two hours, at the 0.05 level of significance. Comparatively, inference is made about the difference between the two true mean hours spent on mobile-game play by the male and female groups. The two estimated mean hours are 3.66 and 4.23 for males and females respectively. Subsequently, the hypotheses H0: μ_male ≥ μ_female versus Ha: μ_male < μ_female are tested, and the conclusion is that the mean hours spent on mobile games by the male group are significantly less than those for females, at the 0.05 significance level. As for making inference about the true proportion of those of at least 25 years of age who use flash memory cards, the hypotheses H0: P ≥ 0.5 versus Ha: P < 0.5 are tested. It evidently appears that H0 can be rejected at the 0.05 significance level, so the proportion of those of at least 25 years of age who use flash memory cards is significantly less than 0.5. Concerning sending SMS via mobiles among the various age groups, the Kruskal-Wallis test is applied to test whether or not the true SMS volumes sent per day by all 9 different age groups are equal, as it is also found that the variances among the age groups differ significantly at the 0.05 significance level. As a result, the average SMS amounts sent per day by the different age groups are found to be significantly different, at the 0.05 significance level. In addition, it is of interest to consider how the average volume of
SMS sent by three different groups, defined as those whose educational attainment is lower than, equal to, or higher than a first degree, is associated with education. The association between the SMS volume and the educational groups mentioned is determined by Cramer's statistic,
which is calculated as 0.393, signifying the association level, with a 0.000 p-value for the significance test. In the case of the association between the amount of mobile Internet use per day and occupations, classified as Government and Non-Government groups, the Goodman and Kruskal tau coefficient is calculated as 0.022, representing a very low association level, with a p-value of 0.002 for the significance test.
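For illustration (not part of the paper), Cramer's V can be computed from a contingency table as follows; the counts below are invented.

```python
# Sketch: Cramer's V for the association between SMS-volume categories
# (rows) and education groups (columns), with made-up counts.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[40, 25, 10],
                  [30, 50, 35],
                  [10, 30, 60]])
chi2, p_value, dof, _ = chi2_contingency(table)
n = table.sum()
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))   # Cramer's V
print(round(float(v), 3), p_value)
```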
To examine how the volume of mobile chat per week depends on age, monthly income, number of mobiles owned, mobile price, monthly expense and chat time-period, multiple regression is applied. It is then found that the chat volume per week significantly depends only on the chat time-period, at the 0.05 significance level with a 0.000 p-value. The estimated regression may be shown as follows:

Y (chat volume per week) = 0.784 + 0.056 × (chat time-period).

In addition, logistic regression is taken into account to determine whether or not flash memory card use, which may be regarded as an interesting dichotomous variable, depends on city people's IT background, education level, working for the government or not, monthly income and memory size. It is found that working for the government or not is the only independent variable which significantly affects flash memory use, with a 0.004 p-value at the 0.05 significance level. The equation obtained may be described as follows:

Y (using flash memory card / not using) = 1.752 + 2.872 × (working for government).
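A sketch of such a logistic regression fit is given below (illustrative only; the paper does not name its software, and the simulated data merely mimic the reported coefficients).

```python
# Sketch: logistic regression of a binary "uses flash memory card" response
# on candidate predictors, using statsmodels (an assumption).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
works_for_gov = rng.integers(0, 2, n)
income = rng.normal(25.0, 8.0, n)                 # monthly income (thousands)
logit = 1.752 + 2.872 * works_for_gov             # mimics the reported fit
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

X = sm.add_constant(np.column_stack([works_for_gov, income]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params, fit.pvalues)   # non-significant terms would be dropped
```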
4 Conclusions

Concerning mobile applications among the city people of Bangkok, the evidence from the sample data indicates practical mobile applications which can be categorized into 4 groups: basic applications, sending and receiving messages, applications relating to people's activities, and applications concerning communication technology and device transfer. Among all 13 types of basic applications found, caller ring is the most popular one, at 32.8 percent, followed by phone book with 14.2 percent. For the message-application group, SMS, MMS and EMS are evidently found as practical mobile applications for sending and receiving messages. Additionally, SMS appears as the highest one with 51.3 percent, whereas MMS is the next with a lower share of 36.0 percent. Concerning the
applications relating to people's activities via mobiles, among all the applications responded, which cover camera, video, radio, MP3, games and Internet, it is found that MP3 appears as the most favoured mobile application, at 23.0 percent, followed by games, found as the next most popular one at the rate of 18.1. Then, for the applications concerning data transfer between devices via Bluetooth, communication via Infrared, connection between devices by the data link function, using flash memory cards and wireless application, it appears that Bluetooth is the most popular one at 29.6 percent, followed by wireless, which represents 23.2 percent. Further, the evidence resulting from the inferential analysis indicates that the mean hours spent on Internet applications on mobiles by city people with at least bachelor-degree education are at least 2 hours. Interestingly, it appears that females spend more average hours on games on mobiles than males, at the 0.05 significance level. Relating to flash-memory-card use, the proportion of city people of at least 25 years of age who use flash memory cards is found to be less than 0.5. Additionally, by investigation of the association between the variables of interest, it is found that SMS on mobiles is associated with people's educational attainment at a rather low level. In the case of mobile Internet application, the association between the volume of Internet applications and city people's occupations, defined as government and non-government, is found at a very low level as well, at the 0.05 level of significance. The evidence from the regression analysis indicates that the volume of chat via mobiles per week is dependent on people's time period for chat only. For a binary response variable, by logistic regression, using a flash memory card is found to be related to government/non-government work only, at the 0.05 significance level. In conclusion, as modern technology has been continuously developed, mobile applications among city people are evidently found to have grown very rapidly. It is hoped that the findings of this study will provide ideas beneficial for mobile IT-business development, and possibly bring about high technology which remarkably facilitates mobile applications in all practices.
References [Agresti, 1984lAnalysis of Ordinal Categorical Data. John Wiley & Sons Inc. Canada, 1984. [Agresti, 1990]Categorical Data Analysis. John Wiley & Sons Inc. Canada, 1990. [Albright, Winston and Zappe, 2004lData Analysis & Decision Making with Microsoft Excel. Brooks/Cole Publishing Company. U.S.A, 2004. [Daniel, 1999lApplied Nonparametric Statistics. PWS-KENT Publishing Company. Boston, 1990.
Pollution sources detect ion via principal component analysis and rotation Marie Chavent', Herv6 Gukgan2, Vanessa Kuentz', Brigitte Patouillel, and J6rbme S a r a c ~ o ' , ~ Institut de MathCmatiques de Bordeaux (FR CNRS 2254) Universit6 Bordeaux 1 351 Cows de la, lib6ration 33405 Talence, France (e-mail: chaventamath. u-bordeaux1 .f r) ARCANECENBG Le Haut Vigneau BP 120, 33175 Gradignan Cedex, France (e-mail: arcaneacenbg . in2p3. f r) GREThA, Universitd Montesquieu - Bordeaux IV Avenue LCon Duguit 33608 PESSAC Cedex (e-mail: Jerome. Saracco@u-bordeaux4. f r )
Abstract. Air pollution is a widely preoccupation which needs the development of control strategies. To reach this goal, pollution sources have to be precisely identified. Principal component analysis is a possible response to this problem. Indeed this factorial method enables to detect sources, that is to have a qualitative description of them. In this work, techniques of rotation are a useful help for the association of variables with factors. We highlight the fact that the rotation must be applied to the standardized principal components, so as to keep good interpretation properties. This methodology has then been applied to a problem of air pollution on a french site. Keywords: Factor Analysis, Rotation, Pollution data.
1
Introduction
It is of great importance to identify air pollution sources in the development of air quality control strategies. Receptor modeling, using measurements of aerosol chemical composition a t a sample site, is often a reliable way t o provide information regarding source characteristics [Hopke, 19911. Some multivariate receptor models are based on the analysis of the correlations between measured concentrations of chemical species, assuming that highly correlated compounds come from the same source. One commonly used multivariate receptor model is Principal Component Analysis (PCA) [Jolliffe, 20021. PCA extracts the principal components accounting for the majority of variance of the data that are then qualitatively 571
572
Recent Advances in Stochastic Modeling and Data Analysis
interpreted as possible sources. In a second step a rotation can be used to facilitate the qualitative interpretation of the principal components. However, to keep good interpretation properties, the rotation must not directly be applied t o the principal components. In Factor Analysis techniques, rotations are usually used and well defined. We will show that the rotation has: to be applied t o the standardized principal components. We apply this methodology t o air pollution sources detection. We present some results obtained in the framework of the project PRIMEQUAL (Projet de Recherche Iiiterorganisnie pour une MEilleure QUalitC de I’Air ii l’kchelle Locale) of the french Ministry of Ecology. The following three steps process has been implemented: collecting PM2.5 (particles that are 2.5 microns or less in diameter, also called fine particles) with sequential fine particle samplers on a french urban site, measuring of the chemical composition with PIXE (Particle Induced X-ray Emission) method, and finally applying a PCA with rotation to identify the sources.
PCA and Factor Analysis
2
Notations. We consider a numerical data matrix X = (x:),~,~ where n objects are described on p < .n variables d , . . . ,xp. Let X = be the
(?i).,L,p
j-
x3--2’
standardized data matrix: 2 , wit,h %j and s f the sample mean and the sample standard deviation of 2 2 . Let R be the sample correlation matrix of xl., . . . ,xp: R = X ‘ M X where M = $I.rLwith .m = 12 or ‘n- 1 depending on the choice of the denominator of s j . The correlation matrix can also be written R = 2’2with 2 = A d l / ’ X . Let us denote by 7- 5 p the rank of 2 and consider the singular value decomposition of 2:
z= uA1/2V’
(1)
where:
0
is the (,r,,r) diagonal matrix of the ‘r nonnull eigenvahes X k , k = 1, ..., ‘r ordered from largest to smallest. of the matrix 2’2(or ZZ’), U is the ( n , , ~orthonormal ) matrix of the T eigenvectors u’, k = 1, ..., T of 22’associated wit,h t,he first, ‘r eigenvalues. V is the ( p , T ) orthonormal matrix of the T eigenvectors d ,k = 1: .... T of 2’2= R associated with the first T eigenvalues.
From the singular value decomposition of 2,we deduce the following decomposition of 2: X = M-1/2UA1/2V1 (2)
Pollution Sources Detection via PCA and Rotation
573
An overview of PCA and Factor Analysis. Let q 5 T . PCA and Factor Analysis operate by writing the standardized matrix as:
X-= G, Bh + E,
(3)
where G, is a (,n,q)matrix corresponding to the factors or principal components, whereas the ( p , q ) matrix B , provides information that relates the components t o the original variables xl,.. . ,xp. The ( n , p ) matrix E, is the rest of the approximation of
X, = 2 and then E,
X by 2, = G,Bb. Note that if q = 'r, we have
= 0.
In PCA, when q = r , equation ( 2 ) is written:
2 = !pV'
(4)
with 9 = M p 1 / 2 U A 1 / 2 . The columns of !p, called the principal component scores matrix, are the r principal components g k = f i & u k l k = 1, ..., T . Since U and V are orthonormal, we have $ k = X u k for k = 1, ...,T and = Xk. In other terms, the kth principal component g k is a linear combination of the p columns of X. The coefficients of uk are called the principal component scoring coefficients. In pract,ice, the user ret,ains only t-he first q < T eigenvalues of A , and the corresponding approximation of 2 is then:
2, = !p& where P !, and V, are the matrices !p and V reduced to their first q columns. In Factor Analysis (with the PCA estimation method), when q = equation (2) is written:
X
= F.4'
T,
(5)
with F = M P 1 j 2 U and A = V A 1 f 2 . The columns of F , called the factor scores matrix, are the T factors f k = f i u k , k = 1,...,T . The matrix ii = ( u $ ) ~ is ,the ~ loading matrix, also called fact,or pattern mat,rix. The coefficient a,$ is equal to the correlation between the variable d and the kth factor f k . It is also equal to the correlation between x j and the kth principal component d l k . k Since U and V are orthonormal, we have f = S 2)for k = 1, ... ,'r and 6 var( f k , = 1. Note that the kth factor f li is a linear combination of the p columns of X . The coefficients of 2 are called the factors scoring 6 coefficients. Moreover, one can observe that 'f = Then the factors f k can also be seen as the standardized principal components, and the factor
&.
574
Recent Advances in Stochastic Modeling and Data Analysis scoring coefficients are the standardized principal components scoring coefficients. When the user retains only the first q < T eigenvalues of A , the corresponding approximation of X is then:
where F, and A, are the matrices F and A reduced to their first q columns.
3
On the good use of rotation
Let T be an orthogonal transformation matrix, TT’ = T‘T = I,, corresponding to an orthogonal rotation of the q axes in a p-dimensional space. Applying directly this orthogonal transformation t o the principal components obtained by PCA gives:
x,= !PJ(V,T)’. The q rotated principal components are the q columns of the matrix &, = PYT.These rotated principal components are no more mutually orthogonal and then they are no longer principal components. For this reason, the rotation must be applied to the standardized principal components:
Zq= F,T(A,T)’ The y rotated standardized principal components (factors) are the q columns = F,T. The rotated standardized principal components of the matrix have the property to be mutually orthogonal and of variance equal t o 1. In order t o be able to interpret the q rotated factors, it is important to remark that the coefficients c?; of the matrix -4,= A,T are the correlations
Fy
-k
between the rotated factors f, and the variables 2 3 . From a practical point of view, the orthogonal transformation matrix T is then defined in order to construct a matrix A, such that each variable xj is -k
clearly correlated t o one of the rotated factor f, (that is c?; close t o 1) and then not correla,ted to the others rota.ted fa,ctors (tha.t is E$* close to 0 for k* # k ) . The most popular rotation technique is varimax. It seeks rotated loadings that maximize the variance of the squared loadings in each column of A,.
4 Application to air pollution sources detection In air pollution receptor modeling, the ( n , p ) data matrix X consists in the measurements of p chemical species in n samples of fine particulates. In this
Pollution Sources Detection via PCA and Rotation 575 application, n = 61 samples of PM2.5 have been collected with sequential fine particle samplers by AIRAQ' in the urban french site of Anglet, every twelve hours, in december 2005. The concentrations in ng 7 7 1 - ~ of p = 16 chemical compounds (A1203, Si02, P, S04, C1, K, Ca, Ti, Mn, Fe203, Ni, Cu, Zn, Br, Pb, C-Org) have been measured with the PIXE method by ARCANECENBG'. The coefficient, xi is then the concentration of the j t h chemical compound in the ith sample. In order to identify the sources of fine particulate emission in the samples, we have applied a PCA to the concentration matrix X , followed by an orthogonal rotation. We have then associated groups of correlated chemical compounds t o air pollution sources. We give here some results obtained with this methodology . The loading matrix 25 obtained after a varimax rotation of the standardized principal components (factors estimated by PCA in the Factor Analysis model) of the matrix X is given in Table 1.
X"
h3
x" x5
A1203 0.981 0.087 -0.042 0.070 -0.038 Si02 0.979 0.012 -0.055 0.104 -0.074 P 0.972 0.090 -0.017 0.071 -0.092 SO4 -0.028 0.765 0.247 0.180 -0.345 C1 -0.153 -0.274 -0.136 -0.181 0.879 K 0.597 0.716 0.111 0.233 0.031 Ca 0.608 0.091 -0.113 0.560 0.272 Mn -0.279 0.119 0.604 0.582 -0.238 Fe203 0.198 0.282 0.289 0.848 -0.112 CU 0.213 0.359 0.161 0.816 -0.149 Zn -0.029 0.053 0.977 0.129 -0.044 Br 0.490 0.615 0.097 0.281 0.392 Pb 0.004 0.163 0.969 0.126 -0.054 C-Ora -0.018 0.893 0.021 0.222 -0.160
Table 1. Correlations between the chemical compounds and the rotated standardized principal components (factors).
The loading matrix can be used to associate, if possible, sources to the rotated factors. Indeed we observe for each factor the strongly correlated - 3 coumpounds. For instance Zn and P b are strongly correlated t o f5 . Because Zn and P b are known t o have industrial origin, this rotated factor is associated t o the industrial pollution source. In the same way the element R6seau de surveillance de la qualit6 de l'air en Aquitaiiie Atelier Rkgional de Caractkrisation par Analyse Nuclaire Elkmentaire - Centre d'Etudes Nuclkaires de Bordeaux Gradignan
576
Recent Advances in Stochastic Modeling and Data Analysis -5
C1 is strongly correlated to f5 , which is then associated with sea salt pollution. Possible associat,ions between t,he five rotated factors and five pollution sources are given in Table 2 . Factor1 Soil dust Factor2 Combustion Factor3 Industrv Factor4 Vehicle
LLFl Table 2. Factor-source associations
In order to confirm t,hese associations we have confronted the rotatred factors with external parameters such as meteorological data (temperatures and wind directions) and the periodicity nightlday of the sampling. The of the rotated factors are the columns of the matrix p . . The coefficient matrix p5 represents a “relative” contribution of the source k t o the sample i . Fig. 1 gives for instance the evolution of the relative contribution of the ”4 source associated with fs . The night samples have been distinguished from the day ones, which enables t o notice that the contribution of this source is stronger during the day than at night. It is then a confirination that this source corresponds to vehicle pollution.
f:
I:
Fig. 1. Evolution of the Factor4 associated to cars pollution
In the same way, Fig. 2 gives the evolution of the relative contribution “ 2
of the source associated with fs . We notice an increase in the contribution of this source at the middle of the sampling period, which corresponds t o a decrease in the temperature measured on the sampling site (see Fig. 3). This is a confirination that this source corresponds to combustioil and heatiiigs pollution. To conclude we can say that the identification of the sources by PCA is only a first step of a more difficult work which consists in quantifying the
Pollution Sources Detection via PCA and R o t a t i o n I
41
577
I1
Fig. 2. Evolution of the Factor2 associated to heatings pollution
Fig. 3. Evolution of temperatures
sources. Although, it is important to discover the sources, the real problem is to define, in percentage of total fine dust mass, the quantity of each source.
References [Hopke, 1991lP.K. Hopke. Receptor Modeling for Air Quality Management. Elsevier, Amsterdam, 1991. [Jolliffe, 2002lI.T. Jolliffe. Principal Component Analysis. Springer Verlag, New York, 2002.
Option pricing and generalized statistics: density matrix approach Petr Jizba Institute for Theoretical Physics Freie Universitat Berlin Arnimallee 14, D-14195 Berlin, Germany (e-mail: j izbaophysik.fu-berlin.de) Abstract. I present a path integral formulation of the density matrix for RBnyi and Tsallis-Havrda-Charvit statistics. As a practical application I derive the associated option pricing formula generalizing the Black-Scholes analysis for Gaussian stock fluctuations. Perturbation expansion around the Black-Scholes formula is performed and properties of the ensuing expansion are discussed. Keywords: Black-Scholes formula, Generalized statistics, Density matrix.
1
Introduction
During the last two decades, the Boltzmann-Gibbs statistical mechanics undergone an important conceptual shift. While successful in describing stationary systems characterized by ergodicity or metric transitivity, it fails to depict statistical properties of many realistic systems in biology, astrophysics, non-linear dynamics, etc. It is symptomatic of such cases that one tries to find refuge in a new paradigm known as generalized statistics. The notion generalized statistics refers t o processes with broad distributions for which the usual Central Limit Theorem (CLT) is inapplicable. Theoretical justification for such distributions is provided by the generalized Central Limit Theorem (GCLT) erected in mid 30’s by P. LQvy [Ldvy, 19541, [Feller, 19661. Under the wings of GCLT fall, for instance, information-theoretic systems of RQnyi [Rdnyi, 19761, non-extensive Tsallis-Havrda-Charv6t (THC) systems [Havrda and Charvbt, 19671, [Tsallis, 20071, Levy flights [LQvy,19541, [Mantegna e t al., 19941, etc. Phenomena obeying generalized statistics are very diverse including fractional diffusion processes [Gorenflo e t al., 19991, multiparticle hadronic systems [Wilk and Wlodarczyk, 20001, multifractals [Jizba e t al., 20041, systems with long-time memories [Tsallis, 20071, or stockmarket returns [Kleinert, 20061. The purpose of this paper is to introduce a new class of option-pricing models in which the issue of the generalized statistics is addressed. Generalized statistics can enter the stage in two distinct ways. Firstly, one may try t o use directly a generalized-statistics distribution for asset price fluctuations. This is however not without consequences. Because the variance is infinite, the volatility is ill defined, and one cannot formulate the dynamics of the underlying stock returns in terms of the geometric Brownian motion. 578
Option Pricing and Generalized Statistics 579 The corresponding generalizations of the pricing strategy inevitably produces new classes of stochastic processes, e.g., processes with a statistical feedback [Borland, 20021, processes with risky options [Bouchaud and Potters, 20001, etc. Relevance of such pricing mechanisms is often difficult to judge as they critically depend on a concrete situation at hand. Second possibility is to start directly with the generalized-statistics density operator which, as a rule, is formulated in the phase-space with the help of Hamiltonians. There one may hope that the ensemble technique of statistical physics is a useful framework in a finance modeling. In usual option-pricing models this is indeed the case. Above expectation is further reinforced by the observation [Brody and Hughston, 20061 that any Hamiltonian process is a martingale. The corresponding time-compounded density matrix constructed thereof then represents the desired measure of stock fluctuations at given time t. Although the density matrix and density operator are connected via Fourier transform, the density matrix need not be heavy tailed even if the density operator is. The advantage of the second approach is in that most of the mathematical machinery valid in the Black-Scholes (BS) analysis is valid also here. In particular, one still has drift independent options, and puts and calls obey the put-call parity relation. On the other hand, the desired features such as the fat tails and peaked middles of asset fluctuations are present for sufficiently long times. The original heavy tailed density operator is then imprinted in the fact that the time-compounded density matrix has a semi-heavy tail (usually liner exponential law combined with a power law). The latter ensures that ensuing option pricing formula can noticeably differ from the BS solution for rather long expiry dates (days or even weeks) despite the validity of CLT. Such a formulation could be pertinent, e.g., in short maturity options. It will be this second approach that I will utilize here. My actual focus will be on a specific subclass of systems with power-law tail density operators. The subclass in question is represented by the maximal-entropy (MaxEnt) density operators resulting from R6nyi/THC information entropies. The paper is organized as follows. In Section 2 I present essentials for both R6nyi and THC statistics that will be needed in the main body of the paper. With the help of Schwinger’s trick I lay down in Section 3 the path-integral representation of the density matrix corresponding to R6nyi/THC statistics. In Section 4 I formulate an ensuing generalized option pricing formula. To the leading order in the large expiry date and/or small escort parameter I recover the BS formula. Finally, Section 5 is devoted to Conclusions.
2 Some fundamentals of Rhyi’s and THC statistics A useful conceptual frame allowing to generate important classes of distributions is based on information entropies. Information entropies generally represent measures of uncertainty inherent in a distribution describing a given statistical or information-theoretical system. Central r61e of information en-
580 Recent Advances in Stochastic Modeling and Data Analysis
tropies is in that they serve as inference functionals whose extremalization subject to certain constraints, yields the MaxEnt distribution. I shall focus here on two entropies, first on Rknyi's entropy defined as s i R ) = - l O1g C p ; , q>o, (1) 1-q and second on the THC entropy that is defined as
The discrete distribution P = { p i } is usually associated with a discrete set of elementary events. In the limit q --+ 1, the two entropies coincide with each other, both reducing to the Shannon-Gibbs entropy
s= - c
p i logpi.
(3)
i
Thus the parameter S 3 q - 1 quantifies the deviation from the BoltzmannGibbs statistics or from Shannonian information theory. It is well recognized by now [Jaynes, 19571 that within the context of Shannonian information theory the laws of equilibrium statistical mechanics can be viewed as inferences based entirely on prior information that is given in terms of expected values of energy, energy and number of particles, etc. For the sake of simplicity I shall consider here the analog of canonical ensembles, where the prior information is characterized by a fixed energy expectation value. The corresponding MaxEnt distributions for SiR)and SiTHC)can be obtained by extremizing the associated inference functionals
where a and P are Lagrange multipliers. The subscript r labels two conceptually different approaches. In information theory one uses the linear mean
while in non-extensive thermostatistics one utilizes a non-linear mean
The distribution Pi(q) is called escort or zooming distribution. Simple analysis shows [Bashkirov and Sukhanov, 20021 that
Option Pricing and Generalized Statistics
581
Here = P / q and AEi = Ei- (HjT. The same result holds also for THC entropy provided P H P / p: and H p: . Generalized distributions of the form (7) are known as Tsallis distributions and they appear in numerous statistical applications [Tsallis, 20071. An important feature of Tsallis distributions is that they are invariant under uniform shifts E of the energy spectrum. So one can always work directly with Ei rather than AEi.
xi
p/ xi
3 Path-integral representation with Tsallis distribution For the P ( 2 )= {p:”} distribution the un-normalized density operator reads
(8) where we have initially allowed for the above-discussed freedom by introducing an arbitrary energy shift parameter E . Later we set E = 0. We now observe that the density matrix (8) can be rewritten with the help of Schwinger’s trick as a superposition of Boltzmann distributions
This allows to write the density matrix P(2brz,;P(E)) superposition of path integrals
E (2bjb(P(E))1xa) as a
where P ( s ) = ~ P ( E ) sand ,
f,,v(P)
=
r(v)pwpv-le--cLP
is the normalized Gamma probability density function (GPDF) [Feller, 19661. A profile of GPDF is shown in Fig. 1. Density matrix for the Tsallis distribution can be thus viewed as the superposition of Gibbsian density matrices having their temperatures smeared with GPDF. In the limit q -+ 1+
and hence the density matrix (10) approaches the canonical density matrix of the Gibbs-Boltzmann statistics of the inverse temperature v / p = P(0). The reader may easily check that result analogous to (10) holds also for distributions P(’). In such a case 0 < q < 1, 6 = 1 - q and ,O H
p.
582
Recent Advances in Stochastic Modeling and Data Analysis
Fig. 1. GPDF for various values of p and v. Note that the shape is characterized by v (a shape parameter) while the profile is characterized by p (a rate parameter). More precisely] the skewness is 2/& and the variance is u / p 2 .
4
Option pricing formula and generalized statistics
To start with, I consider a continuously tradable stock and assume that minute stock fluctuations (I take a minute as our time unit) are described by the stationary Tsallis’ density operator (8). I now analyze the corresponding modifications to the BS formula that are inflicted by the use of RQnyi/THC statistics. To this end I note that the compounded effect of stock fluctuations over period of time t is described by the Tsallis’ density operator
The corresponding time-compounded density function
(t = t b
- ta
Note that
> 0) can be conveniently
P(zb,t b ;
z,
rewritten in the path-integral form
t,) fulfils the Chapman-Kolmogorov relation
J-W
This composition rule is a direct consequence of the semigroup property of
,c~(P)~. For Gaussian stock fluctuations with tb-ta > 0 the riskfree martingale measure density has the form [Kleinert, 20061
Option Pricing and Generalized Statistics
583
where the variance v = o2 corresponds to the square of the volatility and r w = r,, a 2 / 2 represents the riskless constant-interest rate. By choosing the Hamiltonian in (15) to be H = p 2 / 2 + p r z w / v I can write
+
with A x = 2 6 - x a - rw(tb - ta). The measure density (17) is depicted in Fig. 2. For IAxI >> 1 the asymptotic behavior of (17) is
Ax
Fig. 2. Normalized measure density (17); for various values of p and 6 at fixed expiration time and ( u 2 ) (left), for different expiration times at fixed 6 and p (right). I set the riskless interest rate TW to be 12%.
which reveals the semi-fat tail. It should be noted that the exponential suppression ensures that all momenta are finite. This fact is a key in the proof that (17) is a riskfree martingale measure density [Kleinert, 20061. The reader may also notice that the mean of v with respect to f t , , t / 6 is l/bp = p(0). So in the context of the measure (17) we have that p(0) = (0')). By having the martingale measure, I can calculate the option price at an arbitrary earlier time ta through the evolution equation as
Im 03
o ( x a ,t a ) =
dxb
o ( Z b , tb)P(xb,tb; xa, t a )
.
(19)
The value of the option at its expiry date t b is given by the difference between the underlying stock price s b = S(tb) and the strike price K , i.e.,
Options with the terminal condition (20) are known as call options. The (3 function ensures that the owner of the option will exercise his right to buy
584
Recent Advances in Stochastic Modeling and Data Analysis
the stock only if he profits, i.e., only when s b - K is positive. Should I have started with the Gaussian fluctuations, the prescription (20) together with (17) would yield the BS pricing formula [Bouchaud and Potters, 20001:
O(~')(Z,, t,)
= ~ ( t , ) ~ ( y +-) e-rw(tb-ta)KQ(Y-1,
(21)
where @(y) is the cumulative normal distribution and Ylt =
log[S(ta)/K]
+ (rW *
$ 0 2 )( t b
-ta)
(22)
The corresponding generalization of the BS formula that takes into account the Rknyi/THC statistics can be obtained by utilizing P(zb,t b ; x,, t,) instead of P(M~rW)(z:b,tb;2,,ta). Using the BS formula (21), I have
The superscript v reminds that the variance c2 appearing in (22) should be replaced the variable u. As the integration in (23) acts only on the @(y*) parts of O~B')(x,,t a ) ,I can write the option price in the form
where Nl and N2 represent a non-Gaussian smearing of @(y+) and Q(y-), respectively. The integration in (23) can be done explicitly, yielding
Symbol PV stands for the principal part associated to the pole p = 0. With the help of the Mellin inverse transform we have [Jizba and Kleinert, 20071
Here
1Fl
is the confluent hypergeometric function and (z)k
=
r(,+ + 2) r(z)
Option Pricing and Generalized Statistics
585
is the Pochhammer’s symbol. In case when S + O+ and/or t + 0;) one can use the asymptotic behavior ( t / S ) , (t/S). and neglect the summation on the third line of (26) due to strongly suppressive term 1 / ( 1 ) 2 n + t , 6 . In this case N1 boils down to --f
as expected. Here ( y + ( 0 2 ) )3 y+(P(O)). Calculations of the N2 term give an identical result [Jizba and Kleinert, 20071, provided y+ ++ y-. The asymptotic behavior (27) implies that in the 6 O+ and/or t + 0;) limit one regains the BS formula. This should not be surprising. The 6 -+ O+ case does not bring any volatility smearout and thus it yields Gaussian stock fluctuations. The t m case must yield the BS formula due to CLT. There is yet another interesting situation, namely case when A M 0, i.e., when S(t,) FZ Ee-TW(tb-ta). Using that $‘I(...,..., -A M 0 ) FZ 1 and omitting the last two sums in (26) I obtain that both Nl and N2 approach @((Y+))~A=o and @ ( ( y - ) ) I ~ = orespectively, , implying that O(z,, t a ) O(BS)(xa,ta)/~ Options = ~ . with A = 0 are known as at-the-money-forward options. So whenever the option is at-the-money-forward I regain back the BS formula. It should be stressed that many transactions in the over-thecounter markets are quoted and executed at or near at-the-money-forward. Let us finally notice that should I have started with put options, i.e. options with the terminal condition ---f
--f
then I would have obtained the pricing equation in the form
This shows that the above option-pricing model fulfills important consistency condition known as the put-call parity [Bouchaud and Potters, 20001
oP(za, t a )= oC(za, t a )- s(t,)+ Ke-TW(tb-ta) .
(30)
Here O p and Oc represent the put and call option, respectively.
5
Conclusions and outlooks
This article presents a new application of Tsallis’ density operator in the European-style option pricing. The main result is the generalized BS pricing formula that takes into account such features as the volatility fluctuations and semi-heavy tails of asset price fluctuations, yet it still preserves many
586
Recent Advances in Stochastic Modeling and Data Analysis
attractive features of t h e original BS model, including martingale density measure and put-call parity. Last but not least, above analysis also provides a new interpretational frame for Tsallis’ escort parameter q. I n option pricing models where stock fluctuations are directly fitted with Tsallis distribution [Borland, 20021, the q parameter is a free parameter without a n inherent meaning. In contrast, in the case when the density operator is Tsallis’ one, it can be rigorously shown [Jizba and Kleinert, 20071 t h a t q represents t h e characteristic time t* bellow which t h e Gaussian treatment of stock fluctuations is inadequate.
Acknowledgments I would like to gratefully acknowledge discussions with Prof. H. Kleinert (FU) and Dr. P. Haener of Nomura International Plc. I also acknowledge a financial support from the Doppler Institute in Prague and from the Ministry of Education of the Czech Republic (research plan no. MSM210000018).
References [LBvy, 1954lP. LBvy. Thkorie de 1’ Addition des Variables AlBatoires. GauthierVillars, Paris, 1954. [Feller, 1966lW. Feller. An Introduction to Probability Theory and Its Applications, V01.2. John Wiley, London, 1966. [RBnyi, 1976lA. RBnyi. Selected Papers of Alfred RBnyi, V01.2. AkadBmia Kiado, Budapest, 1976 [Jizba et al., 2004lP. Jizba and T. Arimitsu. The world according to Renyi: Thermodynamics of multifractal systems. Annals of Physics, 312: 17-57, 2004. [Havrda and Char&, 1967lJ.H. Havrda and F. Charvit. Quantification Method of Classification Processes: Concept of Structrual a-Entropy. Kybernatika, 3:30-35, 1967. [Tsallis, 2007]C. Tsallis. h t t p : / / t s a l l i s . c a t . cbpf .br/biblio. htm. [Mantegna et al., 1994lR.N. Mantegna, -, and et al.. Stochastic Process with Ultraslow Convergence to a Gaussian: The Truncated LCvy Flight. Physical Review Letters, 73:2946-2949, 1994. [Gorenflo et al., 19991R. Gorenflo, -, and et al. Discrete random walk models for symmetric LBvy-Feller diffusion processes. Physica A, 269:79-89, 1999. [Wilk and Wlodarczyk, 2000lG. Wilk and Z. Wlodarczyk. Interpretation of the Nonextensivity Parameter q in Some Applications of Tsallis Statistics and LCvy Distributions. Physical Review Letters, 84:2770-2773, 2000. [Kleinert, 2006lH. Kleinert. Path Integrals in Quantum Mechanics, Statistics, Polymer Physics and Financial Markets, 4th ed. World Scientific, Singapore, 2006 [Borland, 2002lL. Borland. Option Pricing Formulas Based on a Non-Gaussian Stock Price Model. Physical Review Letters, 89:098701-098704, 2002. [Bouchaud and Potters, 2OOOlJ.-P. Bouchaud and M. Potters. Theory of Financial Risks. Cambridge University Press, Cambridge, 2000.
Option Pricing and Generalized Statistics
587
[Brody and Hughston, 2006]D.C. Brody and L.P. Hughston. Quantum noise and stochastic reduction. Journal of Physics A: Mathematical and General, 39:833876, 2006. [Jaynes, 1957lE.T. Jaynes. Information Theory and Statistical Mechanics. Physical Review, 106:620-630, 1957. [Beck et al., 1993lC. Beck and I?. Schlogl. Thermodynamics of chaotic systems: An introduction. Cambridge University Press, Cambridge, 1993. [Bashkirov and Sukhanov, 2002lA.G. Bashkirov and A.D. Sukhanov. The distribution function for a subsystem experiencing temperature fluctuations. Journal of Experimental and Theoretical Physics, 95:440-446 , 2002. [Jizba and Kleinert, 2007lP. Jizba and H. Kleinert. Option pricing and generalized statistics. FU Berlin Preprint, 2007.
CHAPTER 14
Miscellaneous
Inference for Alternating Time Series Ursula U. Miiller' , Anton Schick" a n d Wolfgang Wefelii~eyer~ Department of Statistics Texas A&M University College Station, TX 77843-3143, USA (e-niail: u s c h i a s t a t . tamu. edu) Department of Mathematical Sciences Binghaniton University Bingliamton, NY 13902-6000, USA (e-mail: antonamath . b i n g h a t o n . edu) Mathematisclies Institut Universitat zu Koln Weyertal 86-90 50931 Kiiln, Germany (e-niail: wef elmamath. uni-koeln. de)
Abstract. Suppose we observe a time series that alternates between different autoregressive processes. We give conditions under which it has a stationary version, derive a characterization of efficient estimators for differenthble funct,ionals of Llie model, and use it to construct efficient estilrrators for the autoregressioir pa.ra.rneters and the innovation distributions. We also study the cases of equal autoregression parameters and of equal innovation densities. Keywords: Autoregression, Local asymptotic norniality, Seniiparanietric model, Efficiency, Adaptivit,y.
1 Introduction By a n alternating AR(1) process of period in we mean a time series X h , = 0 , 1 , . . . , t h a t alternates periodically between m possibly different AR(1)
t
processes, x j i n + k = '8kXjnzfk-l
$- &j n, + k,
j = 0, 1 , .. . ,
k
=
1:. . . , 77%.
(1)
where t h e iniiova.tioiis ~ t t ,E N, a.re iridependent with iiiea,ii zero a n d fiiiite variances, arid &j,+k has a positive density j ' k . T h e n t h e m-dimeiisioiial process Xj = (X+l),,t+l,. . . . Xj,)T, j E N; is a hoinogcneous Markov chain. 11s transitioii density froin to X, = x = ( X I , . . . ,: c , , ) ~ depeiids only on t h e last component, of Xj-1, say X O , and is givcn by
xjPl
n nz
( x 0 . x )H
fk(5li
- '8kZk-1).
k=l
Note t h a t a n alternating AR( 1) process is not a multivariate autoregressive process, which would require a represeiita.tion X, = OX,- I cJ for a matrix 0 a n d i.i.d. vectors ~ j .
+
589
590
Recent Advances in Stochastic Modeling and Data Analysis
If we replace A-jTJl+kP1in (1) by its autoregressive representation and iterate this m - 1 times, we arrive at the representation &m+k
= 7(6)X[j-l)m+k
+ 'Ijrn+k,
j 6 N,
(2)
where
n
7(8)=
c
nz-1
nL
6 = (61,. . . .flm.)T,
6k,
=
Vjrn+k
k=l
%t(6)c,jni+k-t,
t=O
and, setting 6,s= d t if s = t mod m, 1-1
s=o
In pa.rticular, ~ o ( 6 = ) 1 and ~ , j ~ , = ~ (~6j () 6 )The . illnovations 7 j T J L + k , j E N, in (2) are independent with positive density. Hence for each k = 1,.. . , m the subseries X j m + k , j = 0 , 1 , . . . ., is AR(1) a i d an irreducible and a.periodic Ma.rkov cha.in, a.nd positive Ha.rris recurrent if a.nd only if 1~(.19)1< 1. In particular, we do not need IOkl < 1 for all k . We obtain that the m-dimensional Markov chain Xj, j E N,is irreducible and aperiodic, and < 1. In t,his case we also have positive Harris recurrent, if and only if I~(6)l infinke-order moving average repre.sentatioiis 02
Xjrn+k =
j 0.1.. . , C7kt(6)Ejm+k-t, =
.
k
=
1 ; .. . ,m.
(3)
t=O
In the following sections we derive efficient estiina.tors for submodels of alteriia.ting AR( 1) processes. We trea.t depeiideiicies between the autoregression pa.ra.meters a.nd also consider the cases of equal a.utoregression pa.ra.meters aiid of equal iiniovatioii densities. In Section 2 we give coiiditions under which the alternating AR.(1) model is locally asyinpt,ot,ically normal, and characterize efficient, estiinat,ors of vector-valued fiiiict,ioiials. In Section 3 we construct efficient. e.st,imat,orsfor t,he autoregressioiz parameters and the innovation distributions. Section 4 considers submodels with equal innovation densities.
2
Characterization of efficient estimators
In order to describe possible depeildellcies between the aiil.oregression parameters .81,. . . , O m , we reparainetrize t,hem a s follows. Let p 5 ni and .4c RP open, let 6 : A + R", and set 19k = 6 , ( ~ )for Q E -4.Set f = (fl,. . . . , f J , L ) T . Our model is semiparainetric; its distribution is determined by ( Q , f ) . Fix Q E A with ~ T ( S ( Q < ))~ 1. Assume that 6 : A + Rn' has continuous partial derivatives at Q , and write & for the m x p matrix of partial
Inference for Alternating T i m e Series
591
derivatives and I& for its k-th row. Assume that & is of full rank. Fix innovation densities 1 1 , . . . , f n L . Assume that the fk are absolutely continuous with a x . derivative and finite Fisher information J k = E [ C ; ( & k ) ] ,where Pk = - f L / f k . Introduce perturba.tions Q , ~= e .n-’/’t with t E Rp, and f k n u k ( x ) = f ( ~ ) ( l,np1/’~uk(x))with u.k in the space Uk of bounded inea~ )0]and E [ E ~ * U ~=( 0. E ~These ) ] two surable fiinctioiis such that E [ u I ; ( E= conditions guarant.ee that f k n u , is a mean zero probability density for n sufficiently large. The transition density froin A-j,,,+k-l = xk-1 to A-jrn+k = xk is f k ( x k - 9 k x k - l ) . The perturbed txansitioil density
fk +
+
(zk-l,xk)
fknuh(xk
-dk(@r,t)xk-l)
is Helliiiger differentiable with derivative (xk-1,
zk)
&txk-lek(Zk
- Gkxk-1)
+uk(xk
- 19k.k-1).
Here and in the following we write 29 for 2 9 ( ~ ) . Set U = U1 x . ’ ’ x U,,,, u = ( ~ 1 , .. . aiid f,, = ( f l n U ,.,. . , fl,i.,u,,,)T. Suppose we observe Xo,Xi,. . . ,Xn. Let P , a.nd Pn.tudenote their joint la.ws uiider ( Q , f ) a.nd (e.,lt,fnU), respectively. Followiiig [Koul a i d Schick, 19971, who treat nonalternating autoregression, we obtaiii locul usymnptotic norina1,ity
where D is the diagonal matrix with entries E I X ? ] J 1 ., . . ., EIXzl]J,,L. Here we have used tha.t X O ! ~ ( E .~. .) ,,X,,~-~C,(E,),U~(E~), . . . , u ~ ( E , ) a.re uncorrelated. We can now characterize efficient estimators as follows, iising resiilts originally due t,o HAjek aiid LeCam, for which we re.fe.r t,o Section 3.3 of the monograph [Bickel et al., 19981. Let U k denote t,he closure of UI; in L?(fk) and set U = u1 x . . . x The squared norm of (t.,u) on t,he right-hand side of (4) determines how difficult it is, asymptotically, to distinguish between ( ~ , fand ) (ent,f,,).It defines an inner product on Rp x U. A realvalued functional p of ( Q , f ) is called diflerentiuble at (@, f ) with gradient (tqp; uq)E Rp x U if
unL.
n’”((cp(@,t,fau)
-
p(Q, f ) ) + t;.lsTD&t
+
m
E [ u ~ I , ( E ~ ) w ( (E5 ~ ) )] k=l
for a.11 ( t j u )E
Iwp
x U . An estiimtor @ of p is called regulur a.t (elf)with
limit L if
nl/’(@ - (cp(prLt, f,,))
+L
under P,ltu, (t,u) E Rp x U.
(6)
592
Recent Advances in Stochastic Modeling and Data Analysis
The convolution theorem of Hgjek and LeCam says that L is distributed as the convolution of some random variable with a normal random variable N that has inean 0 arid variance
t p T D & t v+
c ,n
E[,L2,,(Ek)].
k=l
This justifies calling @ eficient if L is distributed as N . An estimator @ of cp is called asymptotically linear at ( Q , f ) with influence function g if g E L2(P1) with E(g(Xo.,XI)IXo)= 0 a.nd n
nl/'(+ - p(Q,f ) ) = n-11' ~ g ( q - l ) J j +) O P ( 1 ) . ,j = 1
It follows from t,he convolution t(he0re.mt,hat. an estimator @ is regular and efficient if and only if it is asymptotically linear with eficient influence function
c m
g(.o,x)
=
(&Av2k-1lk(2k
-
8k.k-1)
+Uvk(Zk
-
8kZk-1)).
k=l
+
The iiiner product in (5) decoiiiposes into n i 1 iiiner products on IW" and u1,.. . , Urn. This implies that the gradient of a functional p of e only is the same for each subniodel in which some or all of t,he fr; are known. Hence asymptotically we cannot, estimat,e p bett,er in these subniodels. In t'his sense, fiinctioiials p ( ~ )are adaptiije wit,h respect to f . Similarly, fiinct.ionals of fk are adaptive with respect to the other parameters. For a q-dimensional f u n c t i o d cp = (cpl. . . . , cpy)T of ( Q , f ) , differentiability is understood componentwise. For an estimator @ of cp, asymptotic linearity is also understood componentwise, and regularity is defined as in (S), now with L a q-dimensional random vector. It is then coiivenient to write the gra.dient of cp a.s a. iria.trix (Tq, I],) whose S-th row is the gra.dient of cps; so differentiability (5) reads
c in
nl'%(Qnt,fn")
-
cp(Q,f))
+
r,lPTn4t +
E"Jq,-k(Ek)Uk(Ek)]
(7)
k=l
for all (t,u) E IWP x U. The convolution thcoreiii then says that L is distribut,ed as the convoliit.ion of some random vect,or with a normal randoin vector N that has mean vector 0 and covariance matrix
F i d l y , (p is regu1a.r a.nd efficient if a.nd oiily if it is asymptotically 1iiiea.rwith (q-dimensional) eficient influence fu,nct%o*n
Inference for Alternating Time Series
3
593
Construction of efficient estimators
Autoregression parameters. Suppose we want to estimate e. By adap) e is obtained from (7) as (Tq,0) tivity, the gradient of the functioiial c p ( ~ = with Tq solving
nl”(elLt- e) = t T
so Tq = (4
T,8
=
.T
D h t t E R;
Oh)-‘, a.nd the efficient iiiflueiice function .T
g(z0,x) = (79 D.Q)-l
c m
is
.T
79,Zk-lek(Sk
-
9kZk-1)
k l . T
Hence the asymptotic va.ria.nce of an efficient estiiimtor is (8 D & ) - l . Followiiig [Koul and Schick, 19971, we caii construct an efficient estimator of e, with this iiifluexice fuiictioii, by the Newtori-Ruphson procedure. This is a one-st,e.piinproveineiit of a root-rz consistent. iilitial estimator. As initial est,imator of e we can t,ake e.g. the least s p m w ~ s t i m m t o r5, the ininiiiiiim in e of n
J=1 k
m
l
i.e. a solution of the inartiiigalc estimating eyuatioii
An efficient estimator is then
-~
Here we have estimated !k by 2, = - f k / , f k wit.11 f k an appropriate keriie.1 estimat,or of f k based on residiials Ejm+k = Xjm+k - 2 9 k ( G ) X j n l + k - 1 for j = 1 , .. . ,n, and we have est,iniated D by pliiggiiig in empirical estimators for yk = E[X,?]and J k ,
A special case i s the alternating AR(1) model (1) with equal autoregression parameters 191 = ’ . = 8 f n = 0 . This is described by the reparainetrizatioii 8(S)= ( 0 , .. . with I9 playing the role of e. Then & = (1... . l)T aiid ~
594
Recent Advances in Stochastic Modeling and Data Analysis
.T
= Do = E [ X I ] J 1 + . (8) reduces to
19 D h
+ E[X:,]Jm, and the efficient influence function
c in
g(z0,x) = DO1 Zk-l&(Zk - O Y k - 1 ) . k=l
Hence the asymptotic variance of an efficient estimator is DO‘. An initial estimator for 19 is the least squares estimator
t=1
t=l
and an efficient estimator is the one-step iniprovei-nent
with Ct = Xf - SA’-, and Do = ; Y , j l
+ ” . + jin,,3,,,
Innovation distributions. Suppose we want t,o estimate a linear func) ] h , ( z ) f k ( z )dx of the innovat,ion dist,ribution, tional p ( f k ) = E [ h ( ~ k = By ada,pt,ivit,y.,t,he gradient, of p is obt,ained from (5) as where h E L?(fk). (O,u,) with uqi = 0 for i # k and uqI; solving n’”(~(fk.nu,) -
so
u,k
p ( f k ) )= E [ h ( & k ) U k ( & k ) ]= E [ U p k ( E k ) ? L k ( E k ) ] , uk E
is the projection of h onto
uk;
uk,
Hence the efficient influence fiinction is g(Z0. x) =
h(zk - 0 k “ k - 1 )
-
E [ h ( & k )-] EIEkh(Ek)l
(”k
-
OkICk--1),
EEI
and the asymptotic variance of an efficient estimator is
An efficient, estimator, with influence fiinct,ion g? is
This requires that, h, or fk is snfficienLly smooth. For appropriat,e assumptions we refer to [Schick and Wefelmeyer, 20021. An alt,ernat,iveto t,he above additive correction of an empirical estimator are weighted empirical estimators Cy==, wjh(Ejrm+k)., where the random weights w j are chosen such that wjij,,,+k = 0 ; see [Owen, 20011 and [Muller et al., 20051.
c,”=,
Inference for Alternating Time Series
4
595
Equal innovation densities
Suppose that the innovation distributions are known to be equal, fl = ’ ’ = f m = f ,say. As in Section 2. assume that f is absolutely continous with finite where & = -f’/f. Introduce perturbations Fisher information J = E[P’(E)], fnzL(z) = f(z)(l n-’/’u(z)) with 2~ in the space U of bounded measurable functions such tha.t E[u(E)] = 0 and E [ E u ( B )= ] 0. Then local asymptotic norinality (4) reduces to
+
1
T
.
nz -E[u’(E)] o P ( l ) , 2
~.
- -t 6 D16t 2
+
-
where D1 is the diagonal matrix with entries E [ X : ] J ,. . . , E I X . i l ] J . Let denot,e the closiire. of U in L ? ( f ) . A real-valued fiinctional 9 of ( e ,f ) is differen,tia.bZe at ( e l f )witahgradred (tv.,u q ) E RP x U if
u
n1/’(cp(ent,f T r U )
-
~ ( ef)) .,
+
t:hTDlht
+~ ~ ~ E [ Z L ~ ( E ) U ( E ) ]
for all ( t , u ) E IWP x U . The factor ,772 is there because we count the observations in blocks of length 772. The inner product on the right-hand side decomposes into two inner products on RP and on Hence functioiials of Q or f a.re a.da.ptive with respect to the other pamineter.
u.
Autoregression parameters. reduces to
The efficient influence function (8) of
e
e is again obtained as one-step iinproveinent of a root-n consistent initial estimator
An efficient, est,iinator 6 of
e,
c m
6 = e + (h(e)Tm(e))-’
h~(e)X,~,ll+k-le(~.711,+k)
k=l
-p/f
Here we caii estimate e by 2 = with f a kernel estimator based 011 all residua.ls c j m , + k = Xjm+k - i ? k ( e ) X j m + k - l for j = 1 , . . . % I Ia.nd k = 1 , . . . ;m, arid we can estimate J by
596
Recent Advances in Stochastic Modeling and Data Analysis
Innovation distribution. Suppose we want t o estimate a linear functional cp(f) = E [ ~ ( Efor ) ]h E L?(f). By adaptivity, the gradient of p is ( 0 , ~with ~ ) up E U solving n"'(((p(fnU) - ( ( p ( f ) ) = E[h(z)u(~)]= m . E [ v . , ( ~ ) u ( ~ )8], ,E U ; so mu., is t h e projectmionof h onto
u,
muv((.)= h((.)- E [ h ( & ) ]-
E [ ~(E h )] (. E[E?]
~
Hence t h e efficient influence function is
a n d t h e asymptotic variance of an efficient estimator is
A n efficient estimator is obtained, similarly
in Sect,ion 3 . as
Of course, if both t h e autoregression pamineters a.iid the innova.tion densities are equal, ,191 = . . . = 9,, = '8aiid f l = . . . = fn,, = f, theii t h e alternating AR( 1) inodel reduces t o t h e usual AR( 1) iriodel X t = r9Xt-l E ~ where t h e Et axe independent with density j', a,nd t-he sainple size is 42m.
+
Acknowledgment. Anton Schick was support,ed in part, by NSV Grant DMS 0405791.
References [Bickel et al., 1998]P.J. Birkel, C.A.J. Klaa.ssen, Y . Ritov, and J.A. Wellner. Efficient u7~dAduptive Estirriution for SerrvLpururriet7ic Models. Springer, New York, 1998. [Koul and Scliick, 1997lH.L. Koul and A. Scliick. Efficieiit estimation in nonlinear autoregressive time series models. Bernoulli, 3:247-277, 1997'. [Muller et ul., 2005]U.U Muller, A . Schick, and W. Wefelmeyer. Weighted residualbased density estimators for nonliiiear autoregressive models. Statistica Sinica, 15:177-195, 2005. [Owen, 2001lA.B. Owen. Empivical Likelihood. Chapman & Hall, London, 2001. [Schick and Wefelmeyer, 2002]A. Schick and W. Wefelnieyer. Estimating the innovation distribution in nonlinear autoregressive models. ilnnuls of the Institute of Statistical Mathematics, 54:245-260, 2002.
,
Estimation of the moving-average operator in a Hilbert space CBline T u r b i l l ~ n ’ ~Jean-Marie ~~~, Marion” and Besnik Pumo3 Universitk Pierre et Marie Curie Laboratoire de Statistique Thkorique et Appliquee 175 rue du Chevaleret 75013 Paris , France (e-mail: cturbillon(0yahoo. f r) 2
Institut de Mathkmatiques Appliquees Centre de Recherche et Etudes en Applications Mathematiques 44-46 rue Rabelais - BP 10808 49008 Angers Cedex 01, France 3
Institut National d’Horticulture UMR SAGAH 2 Rue Lenbtre 49045 Angers Cedex 01, France Abstract. We propose a recursive method for estimating an Hilbert Moving Average process of order 1. Under some assumptions this problem is equivalent to resolution of a non linear equation in an Hilbert space. Our method consist in resolving a non-linear equation recursively, from functional analysis results, and determine a recursive sequence of operators which converges to the solution of our equation. Keywords: Linear processes in Hilbert space, Moving Average Hilbertian, Asymptotic convergence, Recursive method of Riesz-Nagy, Estimation.
1
Introduction
Let [ = ([t.,t E R) be a real valued continuous time process observed on a finite time interval [O,T].By dividing EX into time intervals of equal length 6 we can msociate to this real valued process a functional valued process in some abstract space by setting:
This approach is often used in order to transform a continuous time process in a discrete time process with values in some functional space. For a review on statistical analysis of functional d a t a we refer to Ramsey and Silverman [Ramsey and Silverman, 19973. In this paper we consider t h e p r e diction problem of the process t on a n entire time-interval [T,T 61, or
+
597
598
Recent Advances in Stochastic Modeling and Data Analysis
equivalently the prediction of X.rL+lknowing X I , .. . ,X,. The pioneering work in this direction is due to Bosq in [Bosq, 19911 which introduce the Hilbert autoregressive process, denoted ARH. Using autoregressive representation many authors have proposed and studied various models in order to obtain better predictions: [Besse and Cardot, 19961, [hlourid, 20021, [Antoniadis and Sapatinas, 20031, [Rachedi, 20051, [Marion and Pumo, 20041, [Guillas and Damon, 20051, [Mas and Pumo, to appear]. In [Bosq, 20001 the author gives a complete presentation of general linear process in Banach spaces ( L P B ) . In particular the chapter 7 gives conditions for the existence and invertibility of L P B (see also [Merlevede, 19961). Unfortunately there is little known on estimation problem of the unknown parameters for general L P B processes. In this paper we consider the estimation problem of the moving average processs with values in a Hilbert space H, denoted M A H , which is new up t o the authors' knowledge. Let H be a separable Hilbert space with norm
II.II
and scalar product
< ., . > and C the space of bounded linear operators from H to H . A H valued process X = ( X n ,n E Z)is a moving average process of order 1, we will write, M A H ( 1 ) in the following, if it can be presented as
x,
= En
+ &(&,-I)
(1)
where E = ( E ~n, E Z)is a strong H-white noise, P E L and Elle(~~-~)ll > 0. Recall that a strong H-white noise is a sequence E ~ Sof iid H-random variables such that 0 < E J J E<~cc JJ and ? E ( E ; )= 0). We consider in this paper the estimation problem of the unknown operator 4' based on observations XI,. . . ,X,of the MAH(1) process. Notations and assumptions
Let C and D be the covariance and cross-covariance operators of X given by
C(.)= E < X O , .> X o and D ( . ) = E < X O , . > X I ,
and the covariance operator of
E
defined as
CE(.)= E
< E O , . > EO.
We will use the following assumptions H1 : H2 : H3 : H4 :
3jo 2 1 : llej''l\&< l., EI/Xol14< m., & and CEcommute, ( v j , j 2 1) eigenvectors of C, are known,
The estimation of P is based on the equation
C'D
-
ec-t D = 0
Estimation of Moving-Average Operator in Hilbert Space
599
The estimation of l is a difficult problem because this equation is not linear, and the fact of being in an infinite-dimensional space implies problems with non invertibility and unboundedness operators. Our approach consists in solving (2) by a recursive way, based on a method proposed by Riesz and Nagy [Riesz and Sz-Nagy, 19551 to solve a simple second degree equation whose terms are operators. In Section 2 we briefly discuss the Riesz and Nagy method for resolving an equation of operators like A2 - A + R = 0 and then propose an iterative method t o solve an equation as A 2 R - A + R = 0. In Section 3 we rewrite the initial equation (2) in order to apply the Riesz-Nagy method, determine an estimator of! and finally give the a convergence result for our estimator.
2
Resolution of second-degree equations in the space of operators
In this section, we give general results concerning equations of second degree for operators. At first, we present the Riesz and Nagy method, on which our result are based. Secondly, we extend this way of resolution t o propose results more adapted to solve our equation (2). 2.1
Riesz and Nagy method
Riesz and Nagy present an iterative method to solve an equation as
A’ - 2A + R
= 0,
(3)
with R a positive and symmetrical operator such that R 5 I H (where I H is the identity operator of H , and inequality R 5 I H is equivalent to say that I H - R is a positive operator). To resolve (3), they first rewrite it like A = -1( R 2
+ A’),
end then, show that the solution of (3) is the limit of the sequence (AT.,r 2 O), defined recursively by
{;:+l:;(R+A;)
r 2 0
This result follows from [Vigier, 19461, who establish that “All monotone, bounded sequence of symmetrical operators converges to a symmetrical operator”. For more details concerning Riesz and Nagy demonstration, we refer to [Riesz and Sz-Nagy, 19551.
600
Recent Advances in Stochastic Modeling and Data Analysis
2.2
General results
First at all, we give a useful lemma concerning positive, symmetrical operators obtained by an iterative procedure. 1
Lemma 21 Let R be a continuous operator from H to H, such that ((RII< -, 2 and ( A T r, 2 0 ) an operators sequence defined by
{A A,+, o ==OA;R+ R. Then, each operator of the sequence ( A T , r2 0 ) satisfies
Vr20
llA,ll < 1.
The second result furnishes a resolution method for an equation of the form
A ~R A + R = 0,
(4)
1 with R a positive, symmetrical operator such that 1 IRI I < -, and the solution 2 A of this equation is a symmetrical operator. Proposition 21 Let R be a positive, symmetrical operator, which satisfies 1 (lRII < 2,and let (A4,,72 0 ) be a sequence of operators given by
This sequence converges to a symmetrical operator A , which is the solution + R = 0. Moreover, one gets the following bound of the equation A 2 R - A
3 Estimation of l One recalls that our problem is to estimate the coefficient operator 47 of a MAH(1) process satisfies (l),where C is the solution of the equation
Unfortunately, although the equations (2) and (4) are almost similar, one of our previous result can't be directly applied to M.4H(1). That's why, in this section, firstly we expose our method to rewrite equation (2) as (4). Secondly, we define and estimate the autocorrelation operator of M A H ( l ) ,needed for the estimation of e, the latter is described in a final part.
Estimation of Moving-Average Operator in Hzlbert Space
601
3.1 The principle of the method
In the order to write the equation (2) in the same form as (4),the best would be to multiply (2) by C-l. However, the inversion of this operator is not necessarily possible. That's why, we first deal with properties of operator C with the aim of establishing conditions for existence of C-l. The spectral decomposition of C is given by
j=1
with (cj, wj), j 2 1, is a complete sequence of eigenelernents of C. To ensure the inversibility of C, we make the assumption
H5 : P ( < X0,uj >= 0) = 0,
for all j
2 1.
Consequently, one gets
Thus, we may define an operator p, defined on C ( H ) ,by setting p = U C - l , and the equation (2) may be written under the form D'p
-
a + p = 0.
( 7)
Observe that, if one replaces in this equation e by A , and p by R one obtains the equation A 2 R - A + R = 0 , which solution is determined by Proposition 21. However, Proposition 21 resolve this last equation by a recursive method depending on R. Therefore, by analogy, resolution of (7) needs to construct a sequence depending on p . That's why, in the next subsection we define and study an estimator of p. 3.2
Estimation of p
Let us recall that the estimation of autocorrelation operator p had been introduced by Bosq for H-valued autoregressive process of order (1) [Bosq, 19911. First of all, we give the following notations n : number of observations X I ,X ? , ' . . X , ,, k , : an integer such that k , _< n, and such that k,,
-
00, when n + infty, Hk,, : the subspace of H generated by (211,. ,vk,,} the k , eigenvectors, supposed to be known, associated to the k , greater eigenvalues of C, &: the orthogonal projection operator on HA.,&. I
Recent Advances in Stochastic Modeling and Data Analysis
602
Under assumption H3, projection on pi,,Ok,,
Hk,,
- ek,,
of the equation (7) yields
+ Pk,,
= 0.
(8)
Now, in order to estimate pk,, we define natural estimators C, and D, of respectively C and D , by setting
and 00
D, =
Edjn< u j , . > u j
with
dj, =
12 -
j=1
Since, assumption H5 implies that of pk,, may be written as
1
,--l
~
1 Z=1
< Xi,uj >< X i + l , u j >
c, is invertible over H k , , , an estimator
Now that j n , k , , is defined, we can give a result deals with uniform convergence.
Proposition 31 If assumptions H l , H2, H3, H4 Hilbert-Schmidt operator, then
provided lim- nc'n -(Loy.n)" 3.3
, H5
hold, and p is an
> 0 for some a > 2.
Main Results
Firstly, we indicate a result concerning the asymptotic behavior of the estimator of $\ell$, denoted $\hat\ell_{r,n,k_n}$, for a fixed $n$ and $r \to \infty$. Secondly, we give the main result of this paper: the behavior of $\hat\ell_{r,n,k_n}$ when $r \to \infty$ and $n \to \infty$.

Behavior of the estimator of $\ell$ for $n$ fixed and $r \to \infty$. Note that, by replacing $\ell_{k_n}$ by $A$ and $\rho_{k_n}$ by $R$, equation (8) coincides with $A^2R - A + R = 0$; we will therefore make adequate assumptions in order to apply Proposition 2.1. Thus, one considers

H6: $\rho$ is a symmetric, positive operator such that $\|\rho\| < \frac{1}{2}$,

and the set
$$\Omega^* = \left\{ \omega_0 : \exists N(\omega_0) \text{ such that for all } n \ge N(\omega_0) \text{ one gets } \|\hat\rho_{n,k_n}\| < \tfrac{1}{2} \right\}.$$
Proposition 3.2 For every $\omega_0 \in \Omega^*$ and $n \ge N(\omega_0)$, one defines an operator sequence $(\hat\ell_{r,n,k_n})_{r \ge 0}$. For a sufficiently large $n$, for $r \to \infty$, this sequence converges almost surely to the solution $\hat\ell_{n,k_n}$ of the equation
$$\hat\ell_{n,k_n}^2\, \hat\rho_{n,k_n} - \hat\ell_{n,k_n} + \hat\rho_{n,k_n} = 0.$$
Moreover, we get a bound; thus one obtains, for a fixed $n$ and $r \to \infty$,
$$\|\hat\ell_{r+1,n,k_n} - \hat\ell_{n,k_n}\| \to 0 \quad \text{a.s.}$$

Asymptotic behavior of $\hat\ell_{r,n,k_n}$. The next statement deals with the behavior of $\hat\ell_{r,n,k_n}$ when $r$ and $n$ tend to $\infty$.
Theorem 3.1 If assumptions H1 to H6 hold and if $\rho$ is a Hilbert-Schmidt operator, then, for $r, n, k_n \to \infty$,
$$\|\hat\ell_{r,n,k_n} - \ell\| \to 0 \quad \text{a.s.},$$
provided $\lim n\, c_{k_n}^{8} / (\log n)^{\alpha} > 0$ for some $\alpha > 2$.
References
[Antoniadis and Sapatinas, 2003] A. Antoniadis and T. Sapatinas. Wavelet methods for continuous-time prediction using Hilbert-valued autoregressive processes. J. Multivariate Anal., 87(1):133-158, 2003.
[Besse and Cardot, 1996] P. Besse and H. Cardot. Approximation spline de la prévision d'un processus fonctionnel autorégressif d'ordre 1. Canadian Journal of Statistics, 24:467-487, 1996.
[Bosq, 1991] D. Bosq. Modelization, non-parametric estimation and prediction for continuous time processes. Nato Asi Series C, 335:509-529, 1991.
[Bosq, 2000] D. Bosq. Linear Processes in Function Spaces. Springer Verlag, New York, 2000.
[Guillas and Damon, 2005] S. Guillas and J. Damon. Estimation and simulation of autoregressive Hilbertian processes with exogenous variables. Statistical Inference for Stochastic Processes, 8:185-204, 2005.
[Marion and Pumo, 2004] J.M. Marion and B. Pumo. Comparaison des modèles ARH(1) et ARHD(1) sur des données physiologiques. Annales de l'ISUP, 48:29-38, 2004.
[Mas and Pumo, to appear] A. Mas and B. Pumo. The ARHD model. Journal of Statistical Planning and Inference, 137:538-553, to appear.
[Merlevède, 1996] F. Merlevède. Processus linéaires hilbertiens: inversibilité, théorèmes limites, estimation et prévision. Ph.D. Thesis, University of Paris 6, 1996.
[Mourid, 2002] T. Mourid. Estimation and prediction of functional autoregressive processes. Statistics, 36:125-138, 2002.
[Rachedi, 2005] F. Rachedi. Estimateurs cribles des processus autorégressifs banachiques. Ph.D. Thesis, University of Paris 6, France, 2005.
[Ramsay and Silverman, 1997] J. Ramsay and B.W. Silverman. Functional Data Analysis. Springer Verlag, New York, 1997.
[Riesz and Sz.-Nagy, 1955] F. Riesz and B. Sz.-Nagy. Leçons d'analyse fonctionnelle. Gauthier-Villars, Paris, 1955.
[Vigier, 1946] J.P. Vigier. Étude sur les suites infinies d'opérateurs hermitiens. Ph.D. Thesis, Genève, 1946.
Monte Carlo Observer for a Stochastic Model of Bioreactors
Marc Joannides¹, Irène Larramendy-Valverde¹, and Vivien Rossi²
¹ Institut de Mathématiques et Modélisation de Montpellier (I3M), UMR 5149 CNRS, Place Eugène Bataillon, 34095 Montpellier cedex 5, France (e-mail: {Marc.Joannides, Irene.Larramendy}@univ-montp2.fr)
² Laboratoire de Biostatistique, Institut Universitaire de Recherche Clinique, 641 avenue du Doyen Gaston Giraud, 34093 Montpellier cedex 5, France (e-mail: Vivien.Rossi@iurc.montp.inserm.fr)
Abstract. This paper proposes a (stochastic) Langevin-type formulation to model the continuous-time evolution of the state of a biological reactor. We adapt the classical technique of the asymptotic observer, commonly used in the deterministic case, to design a Monte Carlo procedure for the estimation of an unobserved reactant. We illustrate the relevance of this approach by numerical simulations.
Keywords: Biochemical Processes, Stochastic Differential Equations, Monte Carlo, Observer.
1 Introduction
We are interested in monitoring the state of a biological reactor, which is basically a tank in which microscopic living organisms consume a nutrient. Monitoring the process is often a critical issue in many industrial applications, as it is a first necessary step towards its control. However, only a few of the components of the state (the instantaneous composition of the reactor) are measured by sensors. In particular, the concentrations of the biomass are generally not available on line, yet they are of the greatest importance for the process control. This problem led to the design of observers as software sensors, which are estimators of the unobserved components based on the available measurements. The reconstructed state is then used in the control process as if it were completely observed. A review of the commonly used techniques can be found in [Bastin and Dochain, 1990]. Observers found in the literature can be divided into three distinct classes:
- The observers obtained from the general theory (Kalman-like and nonlinear observers) exploit all the knowledge given by the model, including
the kinetics part. However, modeling the biological kinetics of the reaction is a difficult task, so the model used by the observers could differ significantly from reality. This results in a (possibly important) estimation bias.
- The asymptotic observers make use of a specific feature of the bioprocess models, related to the notion of reaction invariant. The idea is to design an observer for the total mass of the components involved in the biological process, and then to reconstruct the whole state with this observer and the measured components. This approach circumvents the knowledge of the kinetics, but its rate of convergence highly depends on the operating conditions.
- Observers that lie somewhere in between these two classes are based on a partial knowledge of the kinetics. They use a parametric model of the reaction kinetics and attempt to estimate the parameters together with the state itself.
All of these approaches have been implemented for various industrial applications ([Dochain, 2003]). It should be noted that, except for the Kalman filter, these techniques were mainly developed in a completely deterministic context. Uncertainties in the modelling are accounted for only through varying parameters, and performance in the presence of noisy inputs or measurements is evaluated by numerical simulation. The attempts to tackle these problems are very few. We should mention the interval observer of [Rapaport and Dochain, 2005], which uses the notion of cooperativity to produce bounds for the asymptotic observer when the dynamics and the input are uncertain. On the stochastic side, [Rossi and Vila, 2005] proposed a formulation as a filtering problem. Stochastic terms are introduced in the dynamics, the measurements and the initial condition. The object of interest is then the conditional probability law of the whole state given noisy measurements of some components. The model considered was obtained by adding a discrete-time white noise to the deterministic model. The approach presented in the present paper consists in modeling the uncertainties of the dynamics by a stochastic differential system, in a way that is consistent with the notion of invariant. We then design a set of asymptotic observers which is used, together with the observed components, to approximate the probability law of the unobserved ones. State estimation, variances, bounds and confidence regions are obtained from this Monte-Carlo approximation. Recall first the classical model obtained from the mass-balance principle, for a continuous stirred tank reactor:
where the substrate $S$ is consumed by a biomass $B$ with yield coefficient $c_1$, the biological reaction being represented by $S \to B$. Here $b_t$ and $s_t$ denote the concentrations of the biomass $B$ and the substrate $S$ respectively, $D$ is the dilution rate, $r(\cdot)$ the reaction kinetics and $s^{in}$ the substrate concentration in the inlet. Many forms for the reaction rate have been proposed in the literature. The most commonly used is the Monod model.
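To make the mass-balance model concrete, here is a minimal Python sketch of the deterministic dynamics (1); the Monod form $r(b, s) = \mu_{max}\, s\, b/(K + s)$ and all parameter names are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical parameters (the paper's Table 1 lists the actual values).
mu_max, K = 0.33, 5.0         # assumed Monod growth parameters
D, s_in, c1 = 0.05, 5.0, 2.0  # dilution rate, inlet substrate, yield

def monod_rate(b, s):
    # Assumed Monod kinetics: r(b, s) = mu_max * s * b / (K + s).
    return mu_max * s * b / (K + s)

def drift(b, s):
    # Mass-balance drift: the reaction produces biomass and consumes
    # substrate (with yield c1); dilution exchanges with the inlet.
    r = monod_rate(b, s)
    db = r - D * b
    ds = -c1 * r - D * s + D * s_in
    return db, ds

def simulate(b0, s0, dt=0.01, n_steps=10_000):
    traj = np.empty((n_steps, 2))
    b, s = b0, s0
    for k in range(n_steps):
        db, ds = drift(b, s)
        b, s = b + dt * db, s + dt * ds
        traj[k] = b, s
    return traj
```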
2 Stochastic model
The deterministic dynamics (1) is the sum of three vector fields representing the action of three sources of variation (the biological reaction itself, the flow out and the inlet), each of which is subject to random disturbances. Therefore, if noise terms are to be introduced, they should affect the three different directions independently. The random nature of biochemical reactions at the molecular scale has been mentioned and studied by many authors, see [Gillespie, 2000] or [El Samad et al., 2005]. At a macroscopic scale, [Kurtz, 1978] modelled the overall effect of these individual reactions on the global concentrations by an additive noise term of variance proportional to the reaction kinetics (or propensity function) $r$. In this context, the state $(B_t, S_t)$ is then a Markov process satisfying the chemical Langevin equation,
where $W_t$ denotes a Wiener process. Notice that the drift and the diffusion coefficients act in the same direction. On another side, the flows in and out may also be more precisely described by a stochastic dynamics. Indeed, the dilution rate $D$ actually denotes the average value, ignoring the inhomogeneity of the medium. Adding (formally) independent white noises to $D$ in the flows in and out, we get a stochastic dynamics for the exchange with the environment:
$$-(D + \dot W_t^{out})\binom{b_t}{s_t} + (D + \dot W_t^{in})\binom{0}{s^{in}}.$$
Putting all this together, we get our stochastic model (2).
We observe that the nonlinearity in model (2) lies only in the reaction kinetics. We therefore define the total mass process $Z_t = B_t + S_t/c_1$ and note that this quantity remains unchanged through the biological reaction. The dynamics of $Z_t$, obtained from (2), is then linear:
$$dZ_t = -D\,(Z_t - Z^{in})\,dt + Z_t\,dW_t^{out} + Z^{in}\,dW_t^{in}. \qquad (3)$$
It is worth noting that this linear SDE has an explicit solution ([Klebaner, 1998]), where
$$U_t = \exp\{-(D + \tfrac{1}{2})\,t + W_t^{out}\}.$$
Finally, notice that the linear change of coordinates $(b, s) \mapsto (b + s/c_1, s)$ gives the equivalent model $(Z_t, S_t)$, where $Z_t$ has a linear dynamics which is independent of the reaction kinetics $r(\cdot)$. This feature can be exploited to design efficient simulation algorithms. In the next section, we will use it to produce an estimation of the unknown concentration $B_t$, based on the completely observed concentration $S_t$.
3 Monte-Carlo approximation
A possible adaptation of the asymptotic observer approach to this stochastic model could be done as follows:
- generate initial conditions for a set of $N$ independent asymptotic observers, thereafter named particles;
- let each particle evolve independently according to the dynamics (3), up to time $t$;
- deduce a set of observers $\hat B_t$ using the observed component $S_t$.
Let $Q_t$ denote the law of the mass-balance process $Z_t$. First of all, to approximate the initial condition $Q_0$, we begin by simulating an approximation of the initial condition $P_0$, the law of $(B_0, S_0)$. Let $\{(b_0^i, s_0^i),\ i = 1, \ldots, N\}$ be an $N$-sample distributed according to $P_0$. We define the Monte-Carlo approximation of $P_0$ by
$$\hat P_0^N = \frac{1}{N}\sum_{i=1}^{N} \delta_{(b_0^i, s_0^i)}.$$
It follows from the law of large numbers that, if $N$ is large enough, then $\hat P_0^N$ will be a good approximation of $P_0$, in the sense that, for each bounded measurable $\phi$,
We then deduce the Monte-Carlo approximation of $Q_0$ by
$$\hat Q_0^N = \frac{1}{N}\sum_{i=1}^{N} \delta_{z_0^i}, \quad\text{where}\quad z_0^i = b_0^i + s_0^i/c_1.$$
We then generate $N$ independent solutions of (3), starting from the $N$ initial conditions $(z_0^i)$. Denoting by $\{z_t^i,\ i = 1, \ldots, N\}$ this set of solutions, we define the empirical measure $\hat Q_t^N = \frac{1}{N}\sum_{i=1}^{N} \delta_{z_t^i}$.
Back to the original model. Finally, we translate this empirical law $\hat Q_t^N$, making use of the observation $S_t$. Let $\{b_t^i = z_t^i - S_t/c_1,\ i = 1, \ldots, N\}$ and define
$$\hat P_t^N = \frac{1}{N}\sum_{i=1}^{N} \delta_{b_t^i}.$$
The unobserved component $B_t$ is estimated by the average
$$\hat B_t = \frac{1}{N}\sum_{i=1}^{N} b_t^i.$$
Notice that $\hat P_t^N$ provides us with other useful statistics like the variance or the mode.
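The whole procedure can be summarized in a short Python sketch; the Euler-Maruyama discretization of (3), the unit noise intensities and the invariant $Z = B + S/c_1$ are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_observer(b0, s0, s_obs, D, s_in, c1, dt, n_steps):
    """Monte-Carlo asymptotic observer for the unobserved biomass B_t.

    b0, s0 : N-samples from the initial law P_0 (numpy arrays);
    s_obs  : observed substrate concentration S_t at the final time.
    Returns the particle cloud {b_t^i} and its empirical mean.
    """
    z = b0 + s0 / c1              # particles of the mass-balance process
    z_in = s_in / c1
    sqdt = np.sqrt(dt)
    for _ in range(n_steps):      # Euler-Maruyama on the linear SDE (3)
        dw_out = sqdt * rng.standard_normal(z.shape)
        dw_in = sqdt * rng.standard_normal(z.shape)
        z += -D * (z - z_in) * dt + z * dw_out + z_in * dw_in
    b = z - s_obs / c1            # translate back using the observation
    return b, b.mean()
```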
4 Numerical results
We first illustrate the behaviour of the stochastic model through numerical simulation, using a Monod model for the growth rate and a constant influent concentration $s^{in}$. The following table shows the values of the parameters:
Table 1. Parameter values: 2; 0.33 h^-1; 5 g/l; 0.05 h^-1; 5 g/l.
We use the Euler-Maruyama scheme to simulate the solutions of all the SDEs involved, see [Kloeden and Platen, 1992]. Fig. 1 shows a typical trajectory of the stochastic system starting from the equilibrium of the corresponding deterministic model (2.054, 0.0393).
Fig. 1. System evolution starting from equilibrium (trajectories of $B_t$ and $Z_t$).
As expected, the diffusion coefficient is predominant, so that the system keeps oscillating in a neighbourhood of the equilibrium state. Fig. 2 shows a trajectory of the same system initialized with a value far from the equilibrium. We observe the predominance of the drift coefficient, which draws the system near the equilibrium, which is the right behaviour. Reasonable changes in the parameters do not affect this global picture. The performance of the Monte-Carlo asymptotic observer is illustrated by Fig. 3 and Fig. 4, which use the observation $s_t$ from Fig. 1 and Fig. 2 respectively. We represent the particle cloud $\{b_t^i\}_{i=1}^N$ together with its density estimation. The true state is spotted by the vertical line. Observe that, since the particle cloud moves according to the general dynamics, it follows closely the true value. Indeed, each particle is an individual asymptotic observer, i.e. a potential state of the unobserved component. Therefore, the estimated density is, in some sense, a summary of our knowledge of this component based on the observed part.
5 Conclusion
We have established the relevance of a continuous-time stochastic modelization in biotechnology, one that is able to take advantage of the existing know-how in biology and optimization. We have successfully shown the feasibility of this approach by numerical simulations.
Fig. 2. System evolution starting far from equilibrium.
Fig. 3. Monte Carlo observer - equilibrium.
Fig. 4. Monte Carlo observer - far from equilibrium.
At least two directions for future investigations are the generalization to higher dimensional models ($p$ reactions involving $q$ reactants) and a more realistic treatment of observations. Indeed, we have supposed here that one reactant was observed without noise, in the manner of optimizers, whereas this is clearly not the case in practical situations. Particle filters would surely be more appropriate for the case of noisy observations.
References
[Bastin and Dochain, 1990] G. Bastin and D. Dochain. On-line estimation and adaptive control of bioreactors. Elsevier, Amsterdam, 1990.
[Dochain, 2003] D. Dochain. State and parameter estimation in chemical and biochemical processes: a tutorial. Journal of Process Control, 13(8):801-818, December 2003.
[El Samad et al., 2005] Hana El Samad, Mustafa Khammash, Linda Petzold, and Dan Gillespie. Stochastic modelling of gene regulatory networks. Int. J. Robust Nonlinear Control, 15(15):691-711, 2005.
[Gillespie, 2000] Dan T. Gillespie. The chemical Langevin equation. J. Chem. Phys., 113:297-306, 2000.
[Klebaner, 1998] Fima C. Klebaner. Introduction to Stochastic Calculus with Applications. Imperial College Press, London, September 1998.
[Kloeden and Platen, 1992] P.E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations, volume 23 of Applications of Mathematics. Springer Verlag, New York, 1992.
[Kurtz, 1978] Thomas G. Kurtz. Strong approximation theorems for density dependent Markov chains. Stochastic Processes Appl., 6:223-240, 1978.
[Rapaport and Dochain, 2005] A. Rapaport and D. Dochain. Interval observers for biochemical processes with uncertain kinetics and inputs. Math. Biosci., 193(2):235-253, January 2005.
[Rossi and Vila, 2005] Vivien Rossi and Jean-Pierre Vila. Filtrage de bioprocédé de dépollution. Approche par convolution particulaire. e-STA, 2(1), 2005.
Monte Carlo studies of optimal stopping domains for American knock out options
Robin Lundgren
Mälardalen University, Department of Mathematics and Physics, Box 883, 721 23 Västerås, Sweden (e-mail: robin.lundgren@mdh.se)
Abstract. In this paper, generalized barrier options of American type in discrete time are studied. Instead of a barrier, a domain of knock-out type is considered. To find the optimal time of exercising the contract, or stopping a Markov price process, an optimal stopping domain can be constructed. To determine the optimal stopping domain, Monte Carlo simulation is used. Probabilities of classification errors when determining the structure of the optimal stopping domain are analyzed.
Keywords: American Option, Knock out option, Monte Carlo Simulation, Optimal Stopping.
1 Introduction
American options give us the possibility to exercise them at any moment of time up to maturity. An optimal stopping domain for American type options is a domain such that, if the underlying price process enters it, then the option should be exercised in order to get the maximal payoff. Studies on optimal stopping for Markov type processes were done in [Snell, 1952], [Chow et al., 1971], [Shiryaev, 1978], [Peskir and Shiryaev, 2006]; optimal stopping for American options in [Jacka, 1991], [Kukush and Silvestrov, 1999], [Kukush and Silvestrov, 2004], [Jonsson et al., 2004], [Jonsson, 2005] and [Jonsson et al., 2005]. In this paper, the theory of the optimal stopping problem for discrete-time knock-out American options is considered. Preceding works about barrier options are [Boyle and Lau, 1994], [Baldi et al., 1999] and [Broadie et al., 1997]. The use of Monte Carlo simulation in finance was introduced by [Boyle, 1977], extended in [Boyle et al., 1997] and summarized in [Glasserman, 2004]. We develop an algorithm for generating the optimal stopping domain for American type knock-out options. Monte Carlo simulation is used to determine the structure of the optimal stopping domain. This paper includes a short description of the considered model, then some illustrative examples and a description of the algorithm, and finally an analysis of the algorithm where we look at the probabilities of making classification errors.
2 The model
In this paper we consider the price process to be a geometric random walk with multiplicative log-normal increments, $S_n = S_{n-1} e^{\mu + \sigma W_n}$, $n = 1, 2, \ldots$, where $\mu \ge 0$ denotes the daily drift of the stock, $\sigma > 0$ denotes the daily volatility of the stock, and $W_n$, $n = 1, \ldots, N$, are independent standard normal random variables, where $N$ is the expiration date. A payoff function is a measurable function $g_n(x) : [0, \infty) \to [0, \infty)$. We assume the following model condition:
where $R_n = r_0 + r_1 + \cdots + r_n$, $R_0 = 0$, and $r_n$ is the risk-free interest rate between moments $n$ and $n+1$. Introduce a knock-out domain $H$ and define the rules such that, if the underlying process enters the domain, the contract becomes worthless. The knock-out domain $H$ has the following structure: $H = \{H_0, H_1, \ldots, H_N\}$. We introduce the random time $\tau_H$, the first time of entering the knock-out domain, as $\tau_H = \min\{n \ge 0 : S_n \in H_n\}$. Our goal is to maximize the expected gain
$$\Phi_g(\tau) = E\, e^{-R_\tau} g_\tau(S_\tau)\,\chi(\tau < \tau_H) \qquad (2)$$
for all Markov moments $0 \le \tau \le N$, where $\chi(\tau < \tau_H)$ is the indicator of the set $\{\tau < \tau_H\}$. The time we are looking for is the optimal stopping time $\tau_{opt}$, i.e. the one maximizing (2). The optimal stopping time for American knock-out options has a structure similar to that for ordinary American options. Define the operator $T_n$ acting on a non-negative measurable function $f(x)$ as
$$T_n f(x) = E\, e^{-r_{n+1}} f(A_{n+1}(x, Y_{n+1}))\,\chi(A_{n+1}(x, Y_{n+1}) \notin H_{n+1})\,\chi(x \notin H_n).$$
Let $w_0(x) = g_N(x)\,\chi(x \notin H_N)$, and for $n = N-1, N-2, \ldots, 0$ the recursion $w_{N-n}(x) = \max\{g_n(x),\, T_n w_{N-n-1}(x)\}\,\chi(x \notin H_n)$ gives two sets: the stopping domain $\Gamma_n = \{x \in \mathbb{R}_+ : g_n(x) = w_{N-n}(x)\}$, and the continuation domain $\Gamma_n^c = \{x \in \mathbb{R}_+ : g_n(x) < w_{N-n}(x)\}$. The two sets are disjoint. By introducing the two-dimensional Markov process $Z_n = (S_n, T_n)$, where $T_n = \sum_{k=0}^{n} \chi(S_k \notin H_k)$, and then using the general results of the optimal stopping theory of random processes in [Shiryaev, 1978], the following theorem about the optimal stopping time can be proved.
Theorem 1. The optimal stopping time maximizing (3) is given by
$$\tau_{opt} = \min\{0 \le n \le N : S_n \in \Gamma_n\} \qquad (4)$$
and
$$\Phi_g(\tau_{opt}) = w_N(S_0). \qquad (5)$$
This theorem is used to construct and analyze the algorithm for finding optimal stopping domains.
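For illustration, the price model and the knock-out rule translate into a few lines of Python; the interval representation of the sets $H_n$ is an assumption made for this example, not the paper's data structure.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_path(s0, mu, sigma, N):
    # Geometric random walk: S_n = S_{n-1} * exp(mu + sigma * W_n).
    increments = mu + sigma * rng.standard_normal(N)
    return s0 * np.exp(np.concatenate(([0.0], np.cumsum(increments))))

def first_knockout(path, H):
    # H[n] = (low, high): entering this interval kills the contract.
    for n, s in enumerate(path):
        low, high = H[n]
        if low <= s <= high:
            return n            # tau_H: first entry into the domain
    return len(path)            # never knocked out
```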
3 Examples of optimal stopping domains
As an example, let us consider a constant yearly interest rate of 1%; the underlying process has the daily parameters $\mu = 0.0$, $\sigma = 0.05$. A put option with payoff function $g_n(x) = (K - x)_+$ is used with $K = 35$, and we use $M = 100000$ simulations. The first example is a knock-out domain of down-and-out type, constant in time at the level 20. In figure 1, it is seen that the boundary of the optimal stopping domain is not monotonic, as it is in the case of standard American put options.
Fig. 1. To the left: the structure of the knock-out domain. To the right: the corresponding optimal stopping domain.
In figure 2, it is seen that if a small strip knock-out domain with $32.5 \le s \le 33.5$ and $25 \le n \le 35$ is introduced, then the boundary of the optimal stopping domain has a multi-threshold structure at some moments of time.
In figure 3, we instead consider a non standard payoff function? a piecewise
616
Recent Advances in Stochastic Modeling and Data Analysis
linear payoff function given by g,(z) = max(K2 - z , a l ( K ~- z),O), where we use K2 = 35, K1 = 15 and a1 = 3.5. It is then seen that the boundary of optimal stopping domain has even more unpredictable multi-threshold structure.
e= ..y1
2 21
e
5
o
I
f
B
2
5
m
J
s
(
n
I
f
y
I
I
,
O
,
5
2
0
6
m
n
a
1
0
C
Y
I
Days
Days
Fig. 3. To the left: The structure of knock out domain. To the right: The corresponding optimal stopping domain.
More examples of the structure of optimal stopping domains for standard, quadratic, piecewise linear and other types of payoff functions and knock-out domains will be presented in [Lundgren, 2007].
4 The Monte Carlo algorithm
First, define an upper and a lower boundary for the stock prices. Denote by $s_u$ the upper level of stock prices and by $s_l$ the lower level. Then define all prices by $s_{n,j} = s_l + j\Delta$, $j = 0, 1, \ldots, J$, where $s_{n,j}$ is the price at moment $n$ with level $j$. Note that $s_{n,0} = s_l$; define $\Delta$ in such a way that $s_{n,J} = s_u$, where $n = 0, 1, \ldots, N$ denotes each moment of time. We thus define a grid in time and stock prices with discrete points $(n, s_{n,j})$. In this study of optimal stopping domains we start from the expiration date $N$ and, for each point $(n, s_{n,j})$ on the grid, compare the profit made by exercising the option at time $n$ with that of keeping the option until time $n+1$. We investigate, for each moment $n = 0, 1, \ldots, N$, whether the stock price is in the knock-out domain, $s_{n,j} \in H_n$, and also whether the stock price between two adjacent points $(n, s_{n,j})$ and $(n, s_{n,j+1})$ belongs to the stopping domain or not. That is done by constructing an interval
$$I_{n,j} = [\,s_{n,j} - \Delta/2,\ s_{n,j} + \Delta/2\,)$$
and if $I_{n,j} \cap \Gamma_n \ne \emptyset$ we say that $s_{n,j} \in \Gamma_n$. Also note that, since the option matures at $n = N$, all prices at that moment belong to the optimal stopping domain, that is, $\Gamma_N = [s_l, s_u]$. We use backward induction and next consider the moment $n = N-1$; at this moment we have the choice to exercise the option
at the moment $n = N-1$ or to keep holding the contract until the moment $n = N$. So for each $j$ we make the comparison: if $g(s_{N-1,j}) \ge T w_0(s_{N-1,j})$, then $I_{N-1,j} \subset \Gamma_{N-1}$. So for the moment $n = N-1$ the optimal stopping domain is given by
$$\Gamma_{N-1} = \bigcup_{j:\ g(s_{N-1,j}) \ge T w_0(s_{N-1,j})} I_{N-1,j}.$$
Monte Carlo simulation is used to determine $T w_0(s_{N-1,j})$. For $M$ independent simulations the continuation profit will be given by the sample average $\hat T^{(M)} w_0(s_{N-1,j})$,
and the approximated optimal stopping domain will be given by
$$\hat\Gamma_{N-1} = \bigcup_{j:\ g(s_{N-1,j}) \ge \hat T^{(M)} w_0(s_{N-1,j})} I_{N-1,j}. \qquad (6)$$
For the moment $n = N-2$ and each $j$ the comparison $g(s_{N-2,j}) \ge \hat T \hat w_1(s_{N-2,j})$ is made, and then $I_{N-2,j} \subset \hat\Gamma_{N-2}$. For the moment $n = N-2$ the optimal stopping domain is given as in (6). At the moment when we determine the continuation profit, we also need to consider the possibility of entering the stopping domain and having an early exercise at the moment $n = N-1$, that is, $s^{(i)}_{N-1} \in \hat\Gamma_{N-1}$. We also need to consider the possibility of entering the knock-out domain and getting a zero payoff. The continuation profit is determined by
$$\hat T^{(M)}_{N-2}\, g(s_{N-2,j}) = \frac{1}{M}\sum_{i=1}^{M}\Big( e^{-r} g(s^{(i)}_{N-1})\,\chi(s^{(i)}_{N-1} \in \hat\Gamma_{N-1})\,\chi(s^{(i)}_{N-1} \notin H_{N-1}) + e^{-2r} g(s^{(i)}_{N})\,\chi(s^{(i)}_{N-1} \notin \hat\Gamma_{N-1})\,\chi(s^{(i)}_{N-1} \notin H_{N-1})\,\chi(s^{(i)}_{N} \notin H_{N}) \Big).$$
For every moment $n = 0, 1, \ldots, N-3$ we have to determine the continuation profit $T \hat w_{N-n-1}(s_{n,j})$ for every stock price $s_{n,j}$, $j = 0, 1, \ldots, J$, and use the fact that we know the structure of the optimal stopping domain for each moment of time $n+1, n+2, \ldots, N-1, N$ when making the estimates.
5 Classification errors
When we study the algorithm, we study the probabilities of classification errors. We have two types of classification errors. First, we have the classification error when the algorithm indicates that the stock price $s_{n,j}$ belongs to the optimal stopping domain, $g(s_{n,j}) \ge \hat T^{(M)} g(s_{n,j})$, while it instead belongs to the continuation domain, $g(s_{n,j}) < T \hat w_{N-n-1}(s_{n,j})$. From the central limit theorem we get that the probability $\hat p_{n,j}$ of this error is expressed through the standard normal distribution function,
where the standard deviation of the estimate is given by $\sigma_{n,j}/\sqrt{M}$ and $\sigma_{n,j}$ is the standard deviation of one component. The second type of classification error we can make is when the algorithm indicates that the price does not belong to the optimal stopping domain, $g(s_{n,j}) < \hat T^{(M)} g(s_{n,j})$, but the stock price actually does, $g(s_{n,j}) \ge T \hat w_{N-n-1}(s_{n,j})$; by the central limit theorem, the probability of making this kind of classification error has the same normal form.
So the probability of making such an error is the same as for the first type of error. We use the second moment of $\hat T^{(M)} g(s_{n,j})$ to get a good estimate of $\sigma_{n,j}$. Define the estimate of the second moment at the moment $n = N-1$ as
$$\hat T^{(M),2}_{N-1} = \frac{1}{M}\sum_{i=1}^{M} e^{-2r} g^2(s^{(i)}_{N,j})\,\chi(s^{(i)}_{N,j} \notin H_N).$$
For the moments $n = 0, 1, \ldots, N-2$ the formulas are similar. Then the estimate of the variance is $\hat\sigma^2_{n,j} = \hat T^{(M),2}_n g(s_{n,j}) - (\hat T^{(M)} g(s_{n,j}))^2$. Let $\bar{Ec}_{n,j}$ be the optimal continuation profit. Then define the measure $L_{n,j}$ and the dimensionless measure of the variance $d^2_{n,j}$ as
$$L_{n,j} = \frac{\bar{Ec}_{n,j} - g(s_{n,j})}{g(s_{n,j})}, \qquad d^2_{n,j} = \frac{\hat\sigma^2_{n,j}}{g^2(s_{n,j})}.$$
Note that if $L_{n,j} \le 0$ the optimal strategy is to exercise. Then the probability of making a classification error is given by a standard normal tail probability expressed in terms of $L_{n,j}$ and $d^2_{n,j}$.
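The CLT argument above translates into a one-line computation; the following Python sketch is illustrative and uses the generic form $\Phi(-\sqrt{M}\,|g - \hat c|/\sigma)$ implied by the description, not a formula quoted from the paper.

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal distribution function Phi(x).
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def classification_error(g, c_hat, sd_one, M):
    """CLT-based probability that the Monte Carlo continuation estimate
    (mean c_hat, one-component std sd_one, M simulations) falls on the
    wrong side of the payoff g, misclassifying the grid point."""
    return normal_cdf(-sqrt(M) * abs(g - c_hat) / sd_one)
```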
Let us consider the case when $n = N-1$, because, when the optimal stopping domain is used for estimating the price of the option contract, the price is more sensitive to perturbations of the stopping domain close to maturity. We now use Monte Carlo simulation to determine estimates of the values defined above. We consider a yearly interest rate $r = 4\%$, a strike price $K = 100$, $\Delta = 1.0$ and a barrier at $H_N = 85$. We use $M = 10^7$ simulations for an underlying process with a yearly drift $\mu = 0.0$ and yearly volatility $\sigma = 0.24$. It is seen in Table 1 that when $s_{N-1,j} = 97$ the measure $L_{N-1,j}$ is negative for the first time. From this we can conclude that the strategy of exercising the option at that moment is more profitable than continuing to hold the option until moment $N$. We also see that the variance increases as the underlying price approaches the barrier. We also see that the continuation profit for $s_{N-1,j} = 86$ is less than when $s_{N-1,j} = 87$; this is because of the large probability of crossing the barrier. When studying Table 2, it is seen that the probability of making a classification error has its greatest value around $s_{N-1,j} = 97$, and from Table 1 it is known that this is the boundary of the stopping domain.
s_{N-1,j}  g(s_{N-1,j})  H_N    T̂ŵ(s_{N-1,j})  σ̂²_{N-1,j}  L_{N-1,j}   d²_{N-1,j}
99.00      1.00          85.00  1.21216        1.37511      0.21216    1.37511
98.00      2.00          85.00  2.03980        1.85832      0.01990    0.46458
97.00      3.00          85.00  2.98171        2.06349     -0.00610    0.22928
96.00      4.00          85.00  3.96526        2.09163     -0.00869    0.13073
95.00      5.00          85.00  4.95841        2.06295     -0.00832    0.08252
94.00      6.00          85.00  5.95165        2.01901     -0.00806    0.05608
93.00      7.00          85.00  6.94492        1.97731     -0.00787    0.04035
92.00      8.00          85.00  7.93844        1.93460     -0.00770    0.03023
91.00      9.00          85.00  8.93252        1.89355     -0.00750    0.02338
90.00      10.00         85.00  9.92549        1.85841     -0.00745    0.01858
89.00      11.00         85.00  10.90246       1.93013     -0.00887    0.01595
88.00      12.00         85.00  11.74688       3.17346     -0.02109    0.02204
87.00      13.00         85.00  11.94892       10.83933    -0.08085    0.06414
86.00      14.00         85.00  10.46743       32.00194    -0.25233    0.16328

Table 1. Estimated values of the expected continuation profit T̂ŵ(s_{N-1,j}), the variance σ̂²_{N-1,j}, and the measures L_{N-1,j}, d²_{N-1,j}. Note that the measure L_{N-1,j} changes sign when entering the optimal stopping domain.
Table 2. The probability of classification error p̂_{N-1,j} for M = 5·10^4, 10^5, 10^6, 10^7 and 10^8 simulations; note that the greatest values occur near the boundary of the optimal stopping domain.
6 Conclusions
This paper presents the results of an experimental study of the structure of optimal stopping domains for American knock-out options. The optimal stopping time is given by the first hitting moment of the optimal stopping domain (4); the structure of the optimal stopping domain is given by the recursion (5). Optimal stopping domains can possess a complex, sometimes unpredictable multi-threshold structure, determined by the payoff functions and knock-out domains. From Table 2 it is seen that the probability of making classification errors has its greatest value near the boundary of the optimal stopping domain and near maturity.
References
[Baldi et al., 1999] P. Baldi, L. Caramellino, and M.G. Iovino. Pricing general barrier options: A numerical approach using sharp large deviations. Mathematical Finance, 9(4):293-322, 1999.
[Boyle and Lau, 1994] P.P. Boyle and S.H. Lau. Bumping up against the barrier with the binomial method. The Journal of Derivatives, 1:6-14, 1994.
[Boyle et al., 1997] P.P. Boyle, M. Broadie, and P. Glasserman. Monte Carlo methods for security pricing. Journal of Economic Dynamics and Control, 21:1267-1321, 1997.
[Boyle, 1977] P.P. Boyle. Options: A Monte Carlo approach. Journal of Financial Economics, 4:323-338, 1977.
[Broadie et al., 1997] M. Broadie, P. Glasserman, and S. Kou. A continuity correction for discrete barrier options. Mathematical Finance, 7(4):325-348, 1997.
[Chow et al., 1971] Y.S. Chow, H. Robbins, and D. Siegmund. The Theory of Optimal Stopping. Houghton Mifflin Comp., 1971.
[Glasserman, 2004] P. Glasserman. Monte Carlo Methods in Financial Engineering. Springer, 2004.
[Jacka, 1991] S.D. Jacka. Optimal stopping and the American put. Mathematical Finance, 1(2):1-14, 1991.
[Jonsson et al., 2004] H. Jonsson, A.G. Kukush, and D.S. Silvestrov. Threshold structure of optimal stopping strategies for American type options. I. Teor. Imovirnost. ta Matem. Statyst., 71:82-92, 2004.
[Jonsson et al., 2005] H. Jonsson, A.G. Kukush, and D.S. Silvestrov. Threshold structure of optimal stopping strategies for American type options. II. Teor. Imovirnost. ta Matem. Statyst., 72:42-53, 2005.
[Jonsson, 2005] H. Jonsson. Optimal Stopping Domains and Reward Functions for Discrete Time American Type Options. PhD thesis, Mälardalen University, 2005.
[Kukush and Silvestrov, 1999] A.G. Kukush and D.S. Silvestrov. Optimal stopping strategies for American type options with discrete and continuous time. Theory of Stochastic Processes, 5(21)(1-2):71-79, 1999.
[Kukush and Silvestrov, 2004] A.G. Kukush and D.S. Silvestrov. Optimal pricing for American type options with discrete time. Theory of Stochastic Processes, 10(26)(1-2):72-96, 2004.
[Lundgren, 2007] R. Lundgren. Structure of optimal stopping domains for American put options with knock out domains. Technical report, Department of Mathematics and Physics, Mälardalen University, 2007.
[Peskir and Shiryaev, 2006] G. Peskir and A.N. Shiryaev. Optimal Stopping and Free-Boundary Problems. Birkhäuser, 1st edition, 2006.
[Shiryaev, 1978] A.N. Shiryaev. Optimal Stopping Rules. Springer Verlag, 1978.
[Snell, 1952] J.L. Snell. Applications of martingale system theorems. Transactions of the American Mathematical Society, 2(73):293-312, 1952.
SONAR Image Denoising Using a Bayesian Approach in the Wavelet Domain
Sorin Moga¹ and Alexandru Isar²
¹ GET / ENST Bretagne, TAMCIC / CNRS UMR 2872, Technopôle Brest-Iroise, CS 83818 - 29238 Brest Cedex 3, France (e-mail: sorin.moga@enst-bretagne.fr)
² "Politehnica" University of Timisoara, Electronics and Telecommunications Faculty, Communications Dept., Timisoara, Romania (e-mail: [email protected])
Abstract. The performance of image denoising algorithms using the Dual Tree Complex Wavelet Transform, DT CWT, followed by a local adaptive bishrink filter can be improved by reducing the sensitivity of that filter to the local marginal variance of the wavelet coefficients. In this paper, a solution for the sensitivity reduction based on enhanced diversity is proposed.
Keywords: wavelet transform, Bishrink Filter.
1 Introduction
During acquisition, SONAR images are corrupted by multiplicative noise (speckle). The aim of an image-denoising algorithm is then to reduce the noise level while preserving the image features. Such a system must achieve a strong noise reduction in the homogeneous regions (which represent the majority of a SONAR image) and the preservation of the details of the scene in the other regions. There is a great diversity of estimators used as denoising systems. A possible classification criterion for these systems takes into account the theory underlying each one. In this respect, there are two categories of denoising systems: those based on wavelet theory and the others. In fact, David Donoho introduced the word denoising in association with the wavelet theory, [Donoho and Johnstone, 1994]. From the first category, taking into account their performance, we must mention the denoising systems proposed in [Foi et al., 2007] and [Walessa and Datcu, 2000]. The denoising system proposed in [Foi et al., 2007] is based on the shape-adaptive DCT (SA-DCT) transform, which can be computed on a support of arbitrary shape. The SA-DCT is used in conjunction with the Anisotropic Local Polynomial Approximation (LPA) - Intersection of Confidence Intervals (ICI) technique, which defines the shape of the transform
support in a pointwise adaptive manner. Since supports corresponding to different points are in general overlapping, the local estimates are averaged together using adaptive weights that depend on the region's statistics. The denoising system proposed in [Walessa and Datcu, 2000] is a maximum a posteriori (MAP) filter that acts in the spatial domain. It treats regions with different homogeneity degrees differently: these regions can be treated independently with the same MAP filter by choosing between different prior models. The multi-resolution analysis performed by the wavelet transform (WT) has been shown to be a powerful tool to achieve good denoising. In the wavelet domain, the noise is uniformly spread throughout the coefficients, while most of the image information is concentrated in the few largest ones (sparsity of the wavelet representation), [Foucher et al., 2001], [Achim and Kuruoglu, 2005]. The corresponding denoising methods have three steps, [Donoho and Johnstone, 1994]: 1. the computation of the forward WT; 2. the filtering of the wavelet coefficients; 3. the computation of the inverse wavelet transform (IWT) of the result obtained. Numerous WTs can be used to operate these treatments. The first one was the Discrete Wavelet Transform, DWT, [Donoho and Johnstone, 1994]. It has three main disadvantages, [Kingsbury, 2000]: lack of shift invariance, lack of symmetry of the mother wavelets and poor directional selectivity. These disadvantages can be diminished using a complex wavelet transform. In the following, the Dual Tree Complex Wavelet Transform, DT CWT, [Kingsbury, 2000], will be used. This is a redundant WT, with a redundancy of 4. All the WTs have two parameters: the mother wavelets, MW, and the primary resolution, PR (number of iterations). Another appealing particularity of these transforms, stemming from their multiresolution capability, is the interscale dependency of the wavelet coefficients. Numerous nonlinear filter types can be used in the WT domain. A possible classification is based on the nature of the useful component of the image to be processed. Basically, there are two categories of filters: those built supposing that the useful component of the input image is deterministic, and those based on the hypothesis that this component is random. To the first category belong the hard-thresholding filter, [Donoho and Johnstone, 1994], the soft-thresholding filter, [Donoho and Johnstone, 1994], [Luisier et al., 2007], which minimizes the min-max estimation error, and the Efficient SURE-Based Inter-scales Pointwise Thresholding filter [Luisier et al., 2007], which minimizes the Mean Square Error (MSE). Filters obtained by minimizing a Bayesian risk, typically under a quadratic cost function (a delta cost function for maximum a posteriori (MAP) estimation [Foucher et al., 2001], [Sendur and Selesnick, 2002], [Achim and Kuruoglu, 2005], [Gleich and Datcu, 2006]) or the minimum mean squared error (MMSE) esti-
mation [Pizurica and Philips, 2006], [Portilla et al., 2003], belong to the second category. The construction of MAP filters supposes the existence of two
statistical models, one for the useful component of the input image and one for its noise component. The MAP estimation of $w$, realized using the observation $y = w + n$ (where $n$ represents the WT of the noise and $w$ the WT of the useful component of the input image), is given by the following relation, called the MAP filter equation:
$$\hat w(y) = \arg\max_{w} \left[ \ln f_n(y - w) + \ln f_w(w) \right], \qquad (1)$$
where $f_x$ represents the probability density function (pdf) of $x$. Generally, the noise component is supposed to be Gaussian distributed. For the useful component there are many models. We have proved in [Isar et al., 2005] that this distribution changes from scale to scale: for the first iterations of the WT it is a heavy-tailed distribution, and with the increase of the number of iterations it converges to a Gaussian. There are two solutions to deal with this mobility. The first one supposes using a fixed simple model, risking an increasing imprecision across the scales. This way, there is a chance to obtain a closed-form input-output relation for the MAP filter. This is the case of the bishrink filter [Sendur and Selesnick, 2002]. An explicit input-output relation has two advantages: it simplifies the implementation of the filter and it permits the analysis of its sensitivities. The second solution supposes using a generalized model, defining a family of distributions, and identifying the best-fitting element of this family for the distribution of the wavelet coefficients at a given scale. For example, in [Foucher et al., 2001] the family of Pearson's distributions is used, in [Achim and Kuruoglu, 2005] the family of SαS distributions, and in [Gleich and Datcu, 2006] the model of a Gauss-Markov random field. The use of such a generalized model makes the treatment more precise but implies implicit solutions of the MAP filter equation: it can be solved only numerically, demanding more time and memory resources, and the sensitivities of the filter obtained cannot be evaluated. If the pdfs $f_w$ and $f_n$ do not take into account the interscale dependency of the wavelet coefficients, then the MAP filter obtained is called marginal. This paper proposes a new denoising method for images based on the association of the DT CWT with a filter bank composed of different variants of bishrink filters. With the aid of those filters, the diversity is enhanced in the wavelet domain. This gain in diversity allows us to locally correct some distortions produced by the association DT CWT - bishrink filter. In fact, we propose a statistical segmentation of the result obtained by applying the association DT CWT - bishrink, into regions with different homogeneity degrees. Each such region is treated with a different element of the filter bank. Finally, the results obtained this way are averaged together. The second section presents the architecture of the proposed denoising system. The aim of the third section is the presentation of some simulation results.
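The three-step scheme recalled above (forward WT, coefficient filtering, inverse WT) can be sketched in Python with the PyWavelets package; here a plain DWT with a soft universal threshold stands in for the DT CWT plus bishrink combination studied in this paper, so the sketch is only a structural illustration.

```python
import numpy as np
import pywt

def wavelet_denoise(img, sigma_n, wavelet="db4", level=3):
    """Three-step wavelet denoising: forward WT, filtering, inverse WT."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    thr = sigma_n * np.sqrt(2.0 * np.log(img.size))   # universal threshold
    filtered = [coeffs[0]]                            # keep approximation
    for detail in coeffs[1:]:
        filtered.append(tuple(pywt.threshold(d, thr, mode="soft")
                              for d in detail))
    return pywt.waverec2(filtered, wavelet)
```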
2 The proposed denoising method
We consider the denoising of an image $s$ corrupted by additive white Gaussian noise, AWGN, with variance $\sigma_n^2$. In order to exploit the interscale dependency of the wavelet coefficients, let $w_{2k}$ represent the parent of $w_{1k}$. The parent is located at the same geometrical coordinates as the child, but at the successive scale. The problem is formulated in the wavelet domain as $y_{1k} = w_{1k} + n_{1k}$ and $y_{2k} = w_{2k} + n_{2k}$. We can write
$$y_k = w_k + n_k, \qquad (2)$$
where $w_k = (w_{1k}, w_{2k})$, $y_k = (y_{1k}, y_{2k})$ and $n_k = (n_{1k}, n_{2k})$. In [Sendur and Selesnick, 2002] a bivariate Laplace model is proposed for the wavelet coefficients of the useful component of the input image, and a bivariate Gaussian model for the wavelet coefficients of the noise. The MAP estimator derived using these models is
$$\hat w_{1k} = \frac{\left(\sqrt{y_{1k}^2 + y_{2k}^2} - \frac{\sqrt{3}\,\sigma_n^2}{\sigma}\right)_+}{\sqrt{y_{1k}^2 + y_{2k}^2}}\; y_{1k}. \qquad (3)$$
Here $(g)_+$ is defined as
$$(g)_+ = \begin{cases} 0, & \text{if } g < 0 \\ g, & \text{otherwise.} \end{cases} \qquad (4)$$
This estimator, named the bishrink filter, requires the prior knowledge of the noise variance $\sigma_n^2$ and of the marginal variance $\sigma^2$ of the useful component of the input image, for each wavelet coefficient. In [Sendur and Selesnick, 2002], the marginal variance of the $k$th coefficient is estimated using the neighboring coefficients in the region $N(k)$, a square window centered at the $k$th coefficient with window size 7x7. The estimator described by (3) is then named the local adaptive bishrink filter. One of its most important parameters is the marginal variance $\sigma$; the sensitivity of the estimation $\hat w$ with respect to $\hat\sigma$ can be derived from (3).
For the coefficients not annihilated by the thresholding in (3), this sensitivity is inversely proportional to the local degree of homogeneity, measured by the value of $\hat\sigma$. The goal of this paper is to reduce the distortion of zones with different degrees of homogeneity of the image, produced by a denoising system based on the DT CWT and the local adaptive bishrink filter. The solution proposed is the enhancement of the estimation diversity. Two types of DT CWT are computed. For each wavelet coefficient, three variants of the bishrink filter are applied, obtaining six different estimations $\hat w_1$. Averaging these values, a better estimation is obtained. For the wavelet coefficients with higher $\hat\sigma$, a reduced number of variants is applied. This procedure is equivalent to the use of $p$ different denoising systems in the region corresponding to values of $\hat\sigma$ belonging to the interval $I_{7-p}$ (defined later), and the fusion of their results. The architecture of the proposed denoising system is presented in figure 1.
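A minimal Python sketch of the shrinkage rule (3) with a windowed variance estimate is given below; the SciPy-based local averaging and the numerical guard constants are implementation choices of ours, not part of the original system.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def bishrink(y1, y2, sigma_n, sigma):
    """Bivariate shrinkage rule (3): y1 child, y2 parent coefficients."""
    mag = np.sqrt(y1 ** 2 + y2 ** 2)
    thr = np.sqrt(3.0) * sigma_n ** 2 / np.maximum(sigma, 1e-12)
    gain = np.maximum(mag - thr, 0.0) / np.maximum(mag, 1e-12)
    return gain * y1

def local_sigma(y1, sigma_n, win=7):
    # Marginal signal std on a win x win window: a common empirical
    # choice is sigma^2 = max(mean(y1^2) - sigma_n^2, 0).
    m2 = uniform_filter(y1.astype(float) ** 2, size=win)
    return np.sqrt(np.maximum(m2 - sigma_n ** 2, 0.0))
```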
Fig. 1. The architecture of the proposed denoising system.
Six estimates of the wavelet coefficients, $\hat w_{1A}$, $\hat w_{2A}$, $\hat w_{3A}$, $\hat w_{1F}$, $\hat w_{2F}$ and $\hat w_{3F}$, are produced. For each one the IDTCWT is computed, obtaining six estimates $\hat s_{1A}$, $\hat s_{2A}$, $\hat s_{3A}$, $\hat s_{1F}$, $\hat s_{2F}$ and $\hat s_{3F}$. The image is segmented into six classes following the values of the local standard deviations of its pixels. Using the class selectors CS1-CS6, each image $\hat s_{1A}$-$\hat s_{3F}$ is treated in a different manner. The segmentation block, Segm, creates the map of each class (the list of pixel positions belonging to that class). The corresponding class selectors, CSp, use these maps. They pick up the pixels of their input images with positions belonging to the corresponding map, generating the class of those images. CS1 has only one input and generates the first class of the image $\hat s_{2A}$. CS2 has two inputs and generates the second class of the images $\hat s_{2A}$ and $\hat s_{3A}$, and so on. The first class of the final estimate $\hat s$ is identical with the first class of the image $\hat s_{2A}$. The second class of the final result, $\hat s$, is obtained by averaging the pixels belonging to the
3
626
Recent Advances in Stochastic Modeling and Data Analysis
second classes of- the images $ A and $ A and so on. For the last class of the final result, $5, containing uniform zones, all the pixels*belonging to the sixth class of the estimates 21.4, & A , 9 3 A , & F , S ~ and F S 3 F are averaged. The filter F1 is of bishrink type. The construction of the filters F 2 and F 3 correspond to two diversification principles: the estimation of the local standard deviations of the wavelet coefficients and the pdf of those coefficients. There are two kinds of filters used for the computation of the D T CWT: for the first level and for the other levels, [Kingsbury, 20001. The first diversification in figure 1 is realized by the selection of two types of filters for the first level. The first one is selected from the ( 7 , 5)-tap Antonini filters pair (for DT CWT A) and the second one (for D T CWT F) corresponds to the pair of Farras nearly symmetric filters for orthogonal 2-channel perfect reconstruction filter bank, [Abdelnour and Selesnick, 20051. The estimation of CT in [Sendur and Selesnick, 20021 is not very precise for two reasons. First, it is based on the correct assumption that y1 and y2 are modeled as zero mean random variables. But their restrictions to the finite neighborhood N ( k ) are not necessary zero mean random variables. So, is better to estimate first the means in the neighborhood . The second reason of imprecision is the fact that [Sendur and Selesnick, 2002: refers only to one of the two trees of the D T CWT. In the following the detail wavelet coefficients produced by this three will be indexed with re. The detail wavelet coefficients produced by the other tree will be indexed with im. i m f i y , im6i and zm6yare computed in The local parameters each neighborhood N ( k ) . Then the global estimation of the marginal standard deviation obtained by averaging reby and imeyis done. The filter F2 is a variant of the bishrink filter based on this estimation of the marginal standard deviation. In [Isar et al., 20051 is proposed a new variant of bishrink filter, named mixed bishrink filter, that acts for the first three iterations of each DWT like a bishrink filter, for the forth iteration like a local adaptive Wiener filter and for the fifth iteration of each DWT (the last one) like a hard thresholding filter with the threshold equal with 3en. The filter F 3 in figure 1 is a mixed bishrink filter. The image 9 2 . 4 is segmented in classes whose elements have a value of the local variance, 8 2 ,~belonging to one of ap a+ z , 1 6 ~ ~ m a zwhere: ) p = ~a1 r = 0 and six possible intervals I , = ( a p ~ 2 ~ m a7 = 1. The class selector CS, in figure 1, selects the class associated to the interval 17-p. Preliminary tests proved that the six estimates are classified from better to poor in the form: 91.4,$ A , 9 3 A , S ~ F3 2, , and 9 3 ~ from , the pick signal to noise ratio, PSNR, point of view. These tests also suggest the following values for the bounds of the intervals I p : a2 = 0.025, a 3 = 0.05, a4 = 0.075, a5 = 0.1 and a6 = 0.25. The first class of the final result contains some pixels of the image & A . The second class of the final result is obtained by averaging the second classes of two partial results, 6 2 and ~ 6 3 and ~ so on. The sixth class of the final result is obtained by averaging the sixth classes ~ 9 3 ~ . of all the partial results, ,!&A, S ~ A9 ,3 . 4 , & F , 2 2 and
SONAR Image Denoising 627 In the case of SONAR images the algorithm already presented may be completed with a logarithmic input block (because the noise is multiplicative) with an inverse logarithmic output block and with a mean correction procedure.
Fig. 2. The proposed denoising system b) corrects some distortions introduced by the bishrink filter a).
3 Simulation results
An example, highlighting the better performance of the new denoising algorithm in the uniform zones, is given in figure 2 for the image Lena. The original image was perturbed with an AWGN with $\sigma_n = 100$. A region obtained by cropping the image $\hat s_{2A}$ is illustrated in figure 2a). The same region was also extracted from the image $\hat s$ and is illustrated in figure 2b). Analyzing the two pictures in figure 2, it can be observed that some very localized distortions, present in picture a), were corrected in picture b). We also compared the proposed algorithm to other effective systems in the literature, namely the local adaptive bishrink filter in [Sendur and Selesnick, 2002], the denoising system based on the steerable pyramid proposed in [Portilla et al., 2003] and the denoising processor introduced in [Achim and Kuruoglu, 2005]. The comparison was done using two images, Boats and Barbara, having the same size, 512x512 pixels, and the results are presented in Table 1.

4 Conclusion
The results presented illustrate the effectiveness of the proposed algorithm. The comparison made suggests that the new denoising results are competitive with the best wavelet-based results reported in the literature. So, the less precise
statistical model can be locally corrected, obtaining a fast denoising algorithm. One of our future research directions is the formalization of the heuristic choices in this paper. The results obtained for the treatment of SONAR images seem also to be very good.
5 Acknowledgment
The research with the results reported in this paper was realized in the framework of the Programme de recherche franco-roumain Brancusi with the title: Débruitage des images SONAR en utilisant la théorie des ondelettes (SONAR image denoising using wavelet theory).
Table 1. The PSNR values of denoised images for different test images and noise levels ($\sigma_n$): (A) noisy, (B) denoising system in [Portilla et al., 2003], (C) denoising processor in [Achim and Kuruoglu, 2005], (D) local adaptive BISHRINK filter in [Sendur and Selesnick, 2002] and (E) our proposed algorithm.
References
[Abdelnour and Selesnick, 2005] A.F. Abdelnour and I.W. Selesnick. Symmetric nearly shift-invariant tight frame wavelets. IEEE Transactions on Signal Processing, 53(1):231-239, Jan. 2005.
[Achim and Kuruoglu, 2005] A. Achim and E.E. Kuruoglu. Image denoising using bivariate alpha-stable distributions in the complex wavelet domain. IEEE Signal Processing Letters, 12(1):17-20, Jan. 2005.
[Donoho and Johnstone, 1994] D.L. Donoho and I.M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425-455, 1994.
[Foi et al., 2007] A. Foi, V. Katkovnik, and K. Egiazarian. Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. IEEE Transactions on Image Processing, 16(4), April 2007. http://www.cs.tut.fi/~foi/papers/tipsadct_revised-doublecolumn.pdf
[Foucher et al., 2001] S. Foucher, G.B. Benie, and J.-M. Boucher. Multiscale MAP filtering of SAR images. IEEE Transactions on Image Processing, 10(1):49-60, Jan. 2001.
[Gleich and Datcu, 2006] D. Gleich and M. Datcu. Gauss-Markov model for wavelet-based SAR image despeckling. IEEE Signal Processing Letters, 13(6):365-368, June 2006.
[Isar et al., 2005] A. Isar, S. Moga, and X. Lurton. A statistical analysis of the 2D discrete wavelet transform. In J. Janssen and P. Lenca, editors, Proceedings of the XIth International Symposium on Applied Stochastic Models and Data Analysis. ENST Bretagne, May 2005. ISBN 2-908849-15-1.
[Kingsbury, 2000] N. Kingsbury. A dual-tree complex wavelet transform with improved orthogonality and symmetry properties. In Proceedings of the 2000 International Conference on Image Processing, volume 2, pages 375-378, 10-13 Sept. 2000.
[Luisier et al., 2007] F. Luisier, T. Blu, and M. Unser. A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding. IEEE Transactions on Image Processing, 16(3):593-606, March 2007.
[Pizurica and Philips, 2006] A. Pizurica and W. Philips. Estimating the probability of the presence of a signal of interest in multiresolution single- and multiband image denoising. IEEE Transactions on Image Processing, 15(3):654-665, March 2006.
[Portilla et al., 2003] J. Portilla, V. Strela, M.J. Wainwright, and E.P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing, 12(11):1338-1351, Nov. 2003.
[Sendur and Selesnick, 2002] L. Sendur and I.W. Selesnick. Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency. IEEE Transactions on Signal Processing, 50(11):2744-2756, Nov. 2002.
[Walessa and Datcu, 2000] M. Walessa and M. Datcu. Model-based despeckling and information extraction from SAR images. IEEE Transactions on Geoscience and Remote Sensing, 38(5):2258-2269, Sept. 2000.
Performance evaluation of a tandem queueing network
Smail Adjabi¹ and Karima Lagha²
¹ Laboratory LAMOS, University of Bejaia, 06000 Bejaia, Algeria (e-mail: [email protected])
² Laboratory LAMOS, University of Bejaia, 06000 Bejaia, Algeria (e-mail: [email protected])
Abstract. In order to evaluate the characteristics of a tandem queueing network, we propose a study taking into account the qualitative properties of distributions. For this, we consider different bounds (lower and upper bounds) for different classes of nonparametric distributions. These bounds are computed by applying the QNA method (Queueing Network Analyzer). To verify whether the proposed intervals include (contain) the approximate values, we have considered some approximations, such as those corresponding to the KLB (Kramer Langenbach-Belz) and simulation methods. Two algorithms have been constructed for programming the methods, and implemented under the assumption that the inter-arrival distribution of the network is parametric or nonparametric.
Keywords: Queueing networks, nonparametric distribution, simulation.
Introduction
An efficient design of networks cannot be performed without knowing their performance. Performance evaluation allows, for instance, to:
- compare different topologies and protocols of networks according to their application and the service offered;
- anticipate and correct eventual problems related to performance.
There are three techniques for performance evaluation: the data technique (direct measurement), the analytical technique and the simulation technique. In the present work, we consider a queueing network in tandem. A performance evaluation is carried out by using the qualitative properties of distributions for determining lower and upper bounds for the characteristics of the network. These bounds will be computed, after determining the parameters of the distributions characterizing the inner flows, by using the QNA method (Queueing Network Analyzer). In order to verify whether the proposed intervals contain the approximate values, we have applied the KLB (Kramer and Langenbach-Belz) and the simulation methods.
1 The model
We consider the following queueing network:
Fig 1. Open series queueing network.
The studied network is composed of $S$ stations in tandem, each with one server. The customers served at a given station are directed to the next station (in order). For all stations, the discipline considered is first in, first out (FIFO), and the capacities of the stations are assumed to be unlimited. The inter-arrival and service times are independent and distributed according to general distributions, with rates $\lambda$ and $\mu$, respectively. $C_a$ and $C_s$ denote the coefficients of variation of the two processes (arrivals and services). If $P_{ij}$ is the probability that a customer that leaves station $i$ enters station $j$, then
$$P_{ij} = \begin{cases} 1, & \text{if } j = i+1, \text{ for } i = 0, \ldots, S-1 \\ 0, & \text{otherwise,} \end{cases}$$
where $P_{01}$ denotes the probability that a customer who arrives at the system enters station 1.
{
2 Bounds of the characteristics
We consider a queueing system of type $A/GI/1$, where $A$ denotes the inter-arrival times distribution and $GI$ a general distribution of service. If $A$ belongs to a nonparametric class of distributions ($DMRL$, $IMRL$, $NWUE$, $\gamma$-$MRLA$, ...), then the mean waiting time $EW_f$ in the queue of the system is bounded from below and above [Stoyan, 1984]. These approximations can be used for the first station of the network we study. The distribution of the inter-arrival times between stations is not known (it cannot be determined analytically); hence, we approximate it according to its coefficient of variation.
3 Approximations
1. If the coefficient of variation $C_a$ of the inter-arrival times between stations is greater than 1, then the distribution can be approximated by a hyperexponential distribution $H$ (a mixture of exponential distributions), which has the DFR (Decreasing Failure Rate) property. The characteristics corresponding to the system $H/GI/1$ are bounded as in (2) and (3), where $L_{H/GI/1}$ and $EW_{H/GI/1}$ stand for the average number of customers and the mean sojourn time in the system $H/GI/1$ (the sojourn time is the sum of the service duration and the waiting time in the queue of the system).
2. If the coefficient of variation $C_a$ is less than 1, the distribution can be approximated by an Erlang distribution $E_k$ ($C_a^2 = 1/k$, $k \ge 2$). This distribution has the NBUE property, and the characteristics of the system $E_k/GI/1$ are:

$\frac{1}{2(1-\rho)}\left[\rho^2(C_s^2 + 1) + k^{-1} - 1\right] + \rho \;\le\; L_{E_k/GI/1} \;\le\; \frac{1}{2(1-\rho)}\left[\rho^2(C_s^2 + 1) + 2\rho(k^{-1} - 1) + 1 - k^{-1}\right] + \rho, \quad (4)$

$\frac{1}{2\lambda(1-\rho)}\left[\rho^2(C_s^2 + 1) + k^{-1} - 1\right] + \frac{1}{\mu} \;\le\; EW_{E_k/GI/1} \;\le\; \frac{1}{2\lambda(1-\rho)}\left[\rho^2(C_s^2 + 1) + 2\rho(k^{-1} - 1) + 1 - k^{-1}\right] + \frac{1}{\mu}, \quad (5)$
where $L_{E_k/GI/1}$ and $EW_{E_k/GI/1}$ respectively denote the average number of customers and the mean waiting time in the system $E_k/GI/1$. The bounds proposed in this section can be used for approximating the characteristics of the systems inside the network (inner stations). In order to calculate these bounds we need to know the parameters of the entry flows to each station; for this we apply the QNA method.
4 The QNA Method
According to the traffic rate equation, we have

$\lambda_j = \sum_i \lambda_i P_{ij} = \lambda_{j-1} P_{j-1,j} = \lambda_{j-1}, \quad \forall j = 2, \ldots, S.$

Then $\lambda_j = \lambda_{j-1} = \lambda$, $\forall j$, where $\lambda$ is the external rate of arrivals to the network. The departures from a given station of the network are the arrivals to the next station (see Fig. 2).
Fig 2. Parameters of the jth station.

According to the traffic variability formula, we have

$C_{d,j}^2 = C_{a,j+1}^2 = \rho_j^2 C_{s,j}^2 + (1 - \rho_j^2) C_{a,j}^2, \quad j = 0, \ldots, S-1.$

In the case of the network under study, $\rho_j = \rho = \lambda/\mu$ and $C_{s,j}^2 = C_s^2$, $\forall j$. Hence,

$C_{a,j}^2 = \rho^2 C_s^2 + (1 - \rho^2) C_{a,j-1}^2, \quad j = 2, \ldots, S. \quad (6)$

For evaluating the characteristics of the network by using qualitative properties of distributions, we propose two algorithms. These two algorithms use the bounds (2), (3), (4) and (5) for approximating the characteristics of the inner stations. These bounds reflect the qualitative features of the hyper-exponential and Erlang distributions. As an application to the network, two algorithms are constructed. In algorithm 1, we consider a parametric distribution of the external flow ($M$, $D$, $H$, $E$ denote Poisson, Deterministic, Hyper-exponential and Erlang distributions, respectively). In algorithm 2, the external flow of arrivals is characterized by a class of nonparametric distributions. We observe that the difference between the two algorithms lies in the first station.
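A minimal sketch of the variability recursion (6), propagated station by station (Python; the function name and parameter values are illustrative, not taken from the paper):

```python
def qna_ca2(S, rho, cs2, ca2_external):
    """Propagate the squared coefficient of variation of the arrivals
    through S tandem stations using the QNA variability equation (6):
    C_{a,j}^2 = rho^2 * Cs^2 + (1 - rho^2) * C_{a,j-1}^2."""
    ca2 = [ca2_external]
    for _ in range(1, S):
        ca2.append(rho**2 * cs2 + (1 - rho**2) * ca2[-1])
    return ca2

# Example: 5 stations, rho = 0.8, exponential services (Cs^2 = 1) and
# deterministic external arrivals (Ca^2 = 0): Ca^2 converges toward Cs^2 = 1.
print(qna_ca2(5, 0.8, 1.0, 0.0))
```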
In order to verify whether the proposed intervals contain the approximate value of the characteristic, we apply the following methods:
1. KLB Method. Kramer and Langenbach-Belz [Kramer and Langenbach-Belz, 1976] propose the following approximation in the case of the GI/GI/1 system:

$EW_f \approx \frac{\rho\,(C_a^2 + C_s^2)}{2\mu(1-\rho)}\, g(\rho, C_a^2, C_s^2),$ where $g = \exp\left(-\frac{2(1-\rho)(1-C_a^2)^2}{3\rho(C_a^2+C_s^2)}\right)$ if $C_a^2 \le 1$ and $g = \exp\left(-\frac{(1-\rho)(C_a^2-1)}{C_a^2+4C_s^2}\right)$ if $C_a^2 > 1$.
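A minimal sketch of this approximation (Python; the function name and test values are illustrative, and the formula is a hedged transcription of the cited 1976 result):

```python
import math

def klb_wait(rho, ca2, cs2, mu=1.0):
    """Kramer/Langenbach-Belz approximation of the mean waiting time
    in a GI/GI/1 queue, with the refinement factor g depending on
    whether the arrival process is more or less variable than Poisson."""
    if ca2 <= 1.0:
        g = math.exp(-2.0 * (1 - rho) * (1 - ca2) ** 2 / (3 * rho * (ca2 + cs2)))
    else:
        g = math.exp(-(1 - rho) * (ca2 - 1) / (ca2 + 4 * cs2))
    return rho * (ca2 + cs2) / (2 * (1 - rho) * mu) * g

# Sanity check: for M/M/1 (Ca^2 = Cs^2 = 1) the approximation is exact,
# EW_f = rho / (mu - lambda); with rho = 0.8 and mu = 1 this gives 4.0.
print(klb_wait(0.8, 1.0, 1.0))
```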
The average number of customers in the system is then deduced using Little's formula: $L = \lambda EW_f + \rho$.
2. Simulation. The simulation method is used for studying the dynamic behaviour and analyzing the performance of the system; it allows the analysis of complex situations that cannot be solved analytically. The simulation used for our application follows the discrete-event approach.
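A minimal discrete-event sketch of such a simulation (Python; illustrative, not the authors' implementation): waiting times follow the Lindley recursion at each station, and the departures of station j become the arrivals of station j+1.

```python
import random

def simulate_tandem(n_customers=200_000, S=5, lam=0.8, mu=1.0, seed=1):
    """Discrete-event simulation of S single-server FIFO queues in tandem.
    Waiting times follow the Lindley recursion W_n = max(0, D_{n-1} - T_n),
    where T_n is the arrival epoch and D_{n-1} the previous departure.
    Returns the mean waiting time at each station."""
    random.seed(seed)
    # Poisson external arrivals: exponential inter-arrival times of rate lam.
    t, arrivals = 0.0, []
    for _ in range(n_customers):
        t += random.expovariate(lam)
        arrivals.append(t)
    means = []
    for _ in range(S):
        prev_departure, total_wait, departures = 0.0, 0.0, []
        for t_arr in arrivals:
            wait = max(0.0, prev_departure - t_arr)   # Lindley recursion
            service = random.expovariate(mu)           # exp(mu) service time
            prev_departure = t_arr + wait + service    # departure epoch
            departures.append(prev_departure)
            total_wait += wait
        means.append(total_wait / n_customers)
        arrivals = departures   # feed the next station of the series
    return means

# Example: rho = 0.8, so the theoretical M/M/1 wait is rho/(mu - lam) = 4.
print(simulate_tandem())
```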
5 Comments and Interpretation of the results
In our application, we have considered a service rate equal to 1 in all stations (the network is composed of S stations in tandem). Tables 2 and 1 summarize the results obtained by applying the proposed method, for different values of the traffic rate: 0.8, 0.5 and 0.2, corresponding to dense, mean and weak traffic respectively.
Table 2: The results produced by simulation are close to those produced by the KLB method, with a slight difference that sometimes becomes larger, as in the case of the $E_2/H/1$ and $H/H/1$ systems when the traffic is dense ($\rho = 0.8$) and mean ($\rho = 0.5$) respectively. The results given by the KLB and simulation methods belong to the proposed intervals (given by the bounds), which could mean that the latter contain the exact value or the approximate one.
Table 1: We fix a class of nonparametric distributions to which the external flow belongs. The comparison of the two tables is based on the qualitative character of parametric distributions: the Erlang distribution is NBUE, $\gamma$-MRLB with $\gamma = m_a/k$ and $\gamma$-MRLA with $\gamma = m_a$; it is also DMRL. The deterministic distribution is IFR; the hyper-exponential is DFR, hence NWUE and IMRL. We notice that the two tables are nearly similar.
Conclusion

The aim of our study is to evaluate the performance characteristics of an open series queueing network. We have proposed a study taking into consideration the qualitative properties of the flow distributions (internal and external). For this, we have considered different bounds (lower and/or upper) for different classes of non-parametric distributions. These bounds are computed after applying the QNA method (Queueing Network Analyzer) for
determining the internal flow parameters. In order to verify whether the proposed intervals include (contain) the approximate values, we have considered the KLB and simulation methods. Two algorithms have been constructed for programming the method proposed in the present work. Running these procedures for different values of traffic intensity (dense, mean and weak) allowed us to present the different possible cases. The obtained results confirm the validity of this method for our network, since the proposed intervals contain (according to our results) the considered approximate values.
References

[Kramer and Langenbach-Belz, 1976] W. Kramer and M. Langenbach-Belz. Approximate formulae for the delay in queueing system GI/G/1. Proc. 8th Int. Teletraffic Congress (ITC), Melbourne, pages 1-3, 1976.
[Stoyan, 1984] D. Stoyan. Comparison Methods for Queues and Other Stochastic Models. J. Wiley and Sons, New York, 1984.
Table 1. Bounds of the characteristics obtained by algorithm 2
Table 2. Bounds of the characteristics obtained by algorithm 1
Assessment of groundwater quality monitoring network based on information theory

Malgorzata Kucharek¹ and Wiktor Treichel²

¹ Institute of Environmental Engineering Systems, Warsaw University of Technology (e-mail: [email protected])
² Institute of Environmental Engineering Systems, Warsaw University of Technology (e-mail: [email protected])
Abstract. This paper presents a method for the assessment of a groundwater quality monitoring network taking into account the quantity of information which can be delivered to the control system. The study was carried out on the groundwater monitoring network of a post-flotation waste disposal site called "Zelazny Most". The paper presents a pilot study leading to a reorganization of the existing groundwater quality monitoring network. Further, different scenarios for verification of the existing monitoring network are proposed. The density of the monitoring network surrounding the reservoir is also considered. Values of transinformation were used to evaluate the amount of information provided by the groundwater quality monitoring network.
Keywords: Information entropy, transinformation, groundwater quality monitoring network.
1. Introduction
Information theory allows new methods to be applied in the assessment of a groundwater monitoring network. One of the most important goals in designing a monitoring network is deciding how much information should be provided by the monitoring network to the control system. Designing and operating groundwater quality monitoring is a complex and difficult task that also incurs considerable costs. At the beginning, one should decide what kind of parameters must be measured, how often to collect data and where the control points should be situated. All those parameters strongly affect the cost of network operation, but they are also important for the information value of the network. Mostly, the methodology which allows the cost of a monitoring system to be evaluated is well established and easy to apply. The problem appears when we need to find quantitative criteria to assess the quality of a groundwater monitoring network. If we assume that the monitoring network is a signal communication system capable of providing hydrological information, we can use assessment criteria derived from Shannon's information theory [C.E. Shannon, W. Weaver, 1949].
The development of information theory began with the 1949 work of Shannon [C.E. Shannon, W. Weaver, 1949]. He was the first to publish a basic paper on the mathematical theory of information, introducing the notions of entropy and amount of information. The quality of a monitoring network should be evaluated by criteria that allow the following questions to be answered: is the quantity of information which can be obtained from the network sufficient, and does the information correspond to the expectations? So far, in many cases, one can notice large sets of data and huge amounts of work and money spent on groundwater monitoring; in spite of all these efforts, the information about groundwater quality is often rather poor. Some methods related to Shannon's information theory were developed to assess the quality of monitoring networks [N.B. Harmancioglu, N. Alpaslan, 1992], [N.B. Harmancioglu, S.D. Ozkul, 2003], [D.C. Mays, et al, 2002], [Y. Mogheir, V.P. Singh, 2002], [Y. Mogheir, et al, 2003], [L.M. Nunes, et al, 2004]. These works show the application of information theory to water quality monitoring network design. The entropy-based methods are used to recognize the structure of spatial variability and to minimize redundant information. Furthermore, the spatial structure of the groundwater quality data can produce maps permitting a complex verification of the monitoring network, with stress on protection areas at risk. In this article, the main task is to investigate the stochastic dependence between different water quality variables. To this end we use a fundamental criterion derived from information theory, namely the value of transinformation (mutual information). Transinformation measures the amount of information shared between two random variables and can be interpreted as the reduction of uncertainty of X due to the knowledge of variable Y. This index was calculated for 55 locations of the sampling points where concentrations of sodium (Na) and copper (Cu) were measured. Our study involved the groundwater monitoring network serving a reservoir (called "Zelazny Most") which receives post-flotation contaminants originating from copper ore treatment. The reservoir has been classified as one of the world's largest industrial waste disposal sites.
2. Groundwater monitoring network
The object of interest is located in the south-west part of Poland (Fig. 1). The reservoir occupies 1400 ha of surface and the volume of the deposited contaminants reaches 350 mln m³. The waste disposal site was established in 1974 and the annual deposition of post-flotation contaminants is about 27 million tons. Groundwater quality monitoring has been in operation since the beginning of 1980. The monitoring area is divided into three local administrations.
Fig. 1 - Location map of reservoir "Zelazny Most"

The reservoir is surrounded by a safety zone of 500 to 1500 meters. The groundwater monitoring network consists of 278 piezometers. Sampling is not regular: it was discovered that some monitoring points were sampled 3 times per year while others were sampled once every 4 years. The following parameters are collected: color, Calcium (Ca), Cadmium (Cd), Chloride (Cl), Chromium (Cr), Copper (Cu), Iron (Fe), Potassium (K), Magnesium (Mg), Manganese (Mn), Sodium (Na), Nickel (Ni), NH4, Nitrite and Nitrate (NO2, NO3), Lead (Pb), pH, Electrical Conductivity, Sulphate (SO4), hardness, alkalinity and total dissolved solids. In this study, two chemical variables were chosen for further investigation: Cu [mg/dm³] and Na [mg/dm³]. Data for the computational analysis came from the 1996 to 2005 period and concerned 55 points (Fig. 2). The data set included 11 measurements for each of the 55 piezometers, made as follows: 4 measurements made in September from 1996 to 1999, 6 measurements made from 2000 to 2004 (two per year) and one series made in 2005. A few small gaps in the data were interpolated using a linear method.
Fig. 2 - Groundwater monitoring network

The problem of contaminant concentration is systematically increasing because of the increasing amount of waste deposited in the reservoir (Fig. 3, Fig. 4). Environmental monitoring in the "Zelazny Most" area is very important due to the long future exploitation of the reservoir and the real dangers for people and animals living near this area.
Fig. 3 - Cu2+ concentration in some piezometers
Fig. 4 - Na+ concentration in some piezometers (standard: Na+ = 200 mg/dm³)
3. Applied Methodology
The basic term of information theory is the marginal entropy H(X) [Y. Mogheir, V.P. Singh, 2002]. Entropy uses probability distribution functions to measure the randomness or uncertainty of a random variable. In particular, entropy allows one to describe the quantity of information coming from a random variable. If X is a discrete random variable with probability distribution $p(x_n)$, $n = 1, 2, \ldots, N$, then the quantity of information which comes from the observation of X can be calculated as follows:

$H(X) = -\sum_{n=1}^{N} p(x_n) \log_2 p(x_n). \quad (1)$
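A minimal sketch of this computation (Python; the equal-width binning and bin count are illustrative choices, not taken from the paper):

```python
import math
from collections import Counter

def marginal_entropy(samples, bins=10):
    """Estimate H(X) = -sum p(x_n) log2 p(x_n) by discretising the
    samples into equal-width bins and using empirical frequencies."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0   # guard against a constant series
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(marginal_entropy([0.1, 0.4, 0.4, 0.9, 1.2, 1.2, 1.3]))
```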
There are three additional types of entropy measures [Y. Mogheir, V.P. Singh, 2002, Y. Mogheir, et al, 2003]: joint entropy, conditional entropy and transinformation (mutual information), associated with the stochastic dependency between two variables. The joint entropy measures the total information content in both X and Y and is a functional of the joint probability distribution $p(x_m, y_n)$. The total entropy of two independent random variables is equal to the sum of their marginal entropies. Conditional entropy H(X|Y) is a measure of the information content of X which is not contained in the random variable Y; it represents the uncertainty remaining in X when Y is known. The transinformation is interpreted as the reduction of uncertainty in X due to knowledge of variable Y:

$T(X,Y) = \sum_{m}\sum_{n} p(x_m, y_n) \log_2 \frac{p(x_m, y_n)}{p(x_m)\,p(y_n)}, \quad (2)$
where X and Y are two discrete variables defined on the same probability space with probabilities $p(x_m)$ and $p(y_n)$. The transinformation may be expressed as the difference between the total entropy and the joint entropy of the two dependent random variables X and Y:

$T(X,Y) = H(X) + H(Y) - H(X,Y). \quad (3)$
The transinformation for discrete random variables can be computed using a contingency table [Y. Mogheir, et al, 2003]. This table is filled with the frequencies of the values that fall into each possible combination of two categories (e.g. Mogheir et al., 2003). The joint probability $p(x_m, y_n)$ is calculated by dividing the cell count by the total number of measurements recorded in one piezometer.
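A minimal sketch combining (1)-(3) through a contingency table of binned paired measurements (Python; the bin count is an illustrative choice):

```python
import math
from collections import Counter

def transinformation(x, y, bins=5):
    """T(X,Y) = H(X) + H(Y) - H(X,Y), estimated from a contingency
    table of binned paired measurements (e.g. Na and Cu concentrations
    recorded in one piezometer)."""
    def to_bins(v):
        lo, hi = min(v), max(v)
        w = (hi - lo) / bins or 1.0
        return [min(int((u - lo) / w), bins - 1) for u in v]

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(labels).values())

    bx, by = to_bins(x), to_bins(y)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))

# Perfectly dependent series share all their information: T(X,Y) = H(X).
print(transinformation([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12]))
```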
4. Assessment of groundwater quality monitoring network

The transinformation values of sodium and copper for all control points in the monitoring network were obtained using equation (2) by computing the marginal and joint probabilities. Table 1 presents some results.
lp. | Name   | Coordinate x [m] | Coordinate y [m] | Transinformation [bits]
1   | P-10E  | 5708663.73       | 5585916.95       | 0.359
2   | P 11E  | 5708545.24       | 5585893.46       | 0.191
3   | P 12E  | 5708517.14       | 5585891.35       | 0.177
4   | P 16E  | 5709529.46       | 5585947.96       | 0.121
... | ...    | ...              | ...              | ...
53  | Po 71E | 5708917.04       | 5586421.58       | 0.383
54  | Po 91E | 5708855.25       | 5586048.84       | 0.641
55  | Po 9E  | 5708989.40       | 5586229.65       | 0.428
Table 1 - Transinformation values for some measurement points

These results allow us to create a contour map of the transinformation (Fig. 5). This map can be used for assessing the amount of information provided by Na and Cu concentrations in groundwater.
Fig. 5 - Contour map of transinformation
In Figure 6 the transinformation as a function of distance is presented. As we can see, most piezometers are located at distances between 1 m and 1000 m. It can be noticed that in 20 measurement points the information provided by the Na concentration can reduce the uncertainty of Cu by less than 30%. There are also piezometers in which the amount of shared information is significantly higher; the maximum value of the transinformation between Na and Cu equals 70%. In further analysis, some measurement points located from 1 m to 1000 m from each other and characterized by a high value of transinformation should be considered for removal from the monitoring network. There is also a need to investigate the amount of redundant information between all parameters recorded in the groundwater quality monitoring network surrounding this waste disposal site.

Fig. 6 - Transinformation with distance
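A minimal sketch of such a screening rule, reusing the Table 1 entries shown above (Python; the distance and transinformation thresholds are illustrative assumptions, not values from the paper):

```python
import math

def redundancy_candidates(points, t_threshold=0.5, d_max=1000.0):
    """Flag piezometers that lie within d_max metres of another point
    and carry a high transinformation value, i.e. candidates for
    removal from the network. `points` holds tuples of
    (name, x, y, transinformation)."""
    flagged = set()
    for i, (ni, xi, yi, ti) in enumerate(points):
        if ti < t_threshold:
            continue
        for j, (nj, xj, yj, _) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d_max:
                flagged.add(ni)
                break
    return sorted(flagged)

pts = [("P-10E", 5708663.73, 5585916.95, 0.359),
       ("Po 91E", 5708855.25, 5586048.84, 0.641),
       ("Po 9E", 5708989.40, 5586229.65, 0.428)]
print(redundancy_candidates(pts))   # only 'Po 91E' passes both criteria
```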
5. Conclusion
In this paper we show that the information value of a groundwater quality monitoring network can be investigated through the transinformation index, which is derived from Shannon's information theory. Based on this criterion it is possible to provide a complex verification and assessment of the groundwater monitoring network. Moreover, the reduction of several measured parameters in favour of more regular measurements can be considered. This paper presents a pilot study leading to a reorganization of the existing groundwater quality monitoring network. A reliable assessment of the performance of the groundwater quality monitoring network will be of considerable help to environmentalists, whose decisions about abating the environmental impact of such an industrial waste disposal site must be based on credible information.
Acknowledgments

This research was financially supported by the Polish Ministry of Science and Higher Education, project number 1 T09D 010 30. The authors thank KGHM Polska Miedz for providing the data for this study.
References
[N.B. Harmancioglu, N. Alpaslan, 1992] N.B. Harmancioglu, N. Alpaslan - Water quality monitoring network design: A problem of multi-objective decision making, Water Resources Bulletin, 1992, vol. 28, No. 1, pp. 179-192.
[N.B. Harmancioglu, S.D. Ozkul, 2003] N.B. Harmancioglu, S.D. Ozkul - Entropy-based design considerations for water quality monitoring networks, In: N.B. Harmancioglu et al. (Eds.) "Integrated Technologies for Environmental Monitoring and Information Production", Kluwer Academic Publishers, Dordrecht 2003, pp. 119-138.
[D.C. Mays, et al, 2002] D.C. Mays, B.A. Faybishenko, S. Finsterle - Information entropy to measure temporal and spatial complexity of unsaturated flow in heterogeneous media, Water Resources Research, 2002, vol. 38, No. 12, 1313, pp. 49.1-49.11.
[Y. Mogheir, V.P. Singh, 2002] Y. Mogheir, V.P. Singh - Application of information theory to groundwater quality monitoring networks, Water Resources Management, 2002, vol. 16, No. 1, pp. 37-49.
[Y. Mogheir, et al, 2003] Y. Mogheir, J.L.M.P. De Lima, V.P. Singh - Assessment of spatial structure of groundwater quality variables based on the entropy theory, Hydrology and Earth System Sciences, 2003, vol. 7, No. 5, pp. 707-721.
[L.M. Nunes, et al, 2004] L.M. Nunes, M.C. Cunha, L. Ribeiro - Groundwater monitoring network optimization with redundancy reduction, Journal of Water Resources Planning and Management, 2004, vol. 130, No. 1, pp. 33-43.
[C.E. Shannon, W. Weaver, 1949] C.E. Shannon, W. Weaver - The mathematical theory of communication, The University of Illinois Press, Urbana, Illinois, 1949.
Improving type II error rates of multiple testing procedures by use of auxiliary variables - Application to microarray data

Maela Kloareg¹ and David Causeur¹,²

¹ IRMAR, UMR 6625 CNRS, Agrocampus Rennes, Laboratoire de Mathématiques Appliquées, CS 84215, 65 rue de St-Brieuc, 35042 Rennes cedex (e-mail: maela.kloareg@agrocampus-rennes.fr)
² CREST-ENSAI, France (e-mail: david.causeur@agrocampus-rennes.fr)

Abstract. Simultaneous testing of a huge number of hypotheses is a core issue in high-throughput experimental methods. In the central debate about the type I error rate, [Benjamini and Hochberg, 1995] have provided a procedure that controls the now popular False Discovery Rate (FDR). The present paper focuses on the type II error rate. The proposed strategy improves the power by means of moderated test statistics integrating external information available in a double-sampling scheme. The small-sample distribution of the test statistics is provided. Finally, the present method is implemented on transcriptomic data.
Keywords: Auxiliary covariate, Double-sampling, False Discovery Rate, Multiple tests, Non-discovery Rate.
1 Introduction
Although multiple testing issues have been widely discussed in the statistical literature for a long time, novel approaches have emerged in recent years to face situations where the number of tests is especially huge. Simultaneous tests have for instance become one of the core issues in the analysis of gene expressions measured in microarray experiments. Here, the main goal is to identify the genes that show good evidence of being differentially expressed under two conditions (e.g. treatments, genotypes or times in kinetic studies). In the recent discussions about a type I error rate that would yield less conservative decision rules than the traditional Bonferroni strategy, a major innovation has come from [Benjamini and Hochberg, 1995], who define the false discovery rate (FDR) as the proportion of true H0 among the tests for which H0 is rejected. [Benjamini and Hochberg, 1995] also provide a decision rule that is shown by [Benjamini and Yekutieli, 2001] to control the FDR under a large class of positive dependency between the test statistics. Generally, attempts to improve the existing methods involve a better knowledge of the responses' dependency structure. Unfortunately, the high
dimensionality of the data usually prohibits the modelling of the whole set of variables' joint distribution. As mentioned by [Kendziorski et al., 2003], treating variables as independent tends to be less efficient than some Bayesian approaches which take advantage of the shared information between variables. Some authors (see for instance [Lonnstedt and Speed, 2002]) proposed moderated versions of the t-statistic where the variable-specific variance estimator that appears in the denominator is augmented by a constant derived from the data of all variables. In many situations, relating the responses to auxiliary variables can also give insight into the correlation structure of sets of variables. For instance, in the case of transcriptomic data, phenotypic variables, often much easier to obtain than microarray data, can help interpret the correlation structure between gene expressions. Integrating biologically relevant knowledge and gene expressions is still not usual in differential analysis, though more traditional in exploratory data analysis. The aim of our paper is to propose a testing method, based on moderated t-statistics, that integrates external information to improve the power of the usual testing strategies. This external information is supposed to be available in the sample for which the core variables concerned by the tests are measured, but also on additional items for which the responses are not measured. Improving inference in such a double-sampling framework is not new in some areas of statistics. However, such sampling strategies are usually dedicated to improvements of estimation procedures and more rarely to testing issues. In a multivariate regression framework, many papers have dealt with the optimal allocation of the measurements of the outcome and of the auxiliary variable (see [Conniffe, 1985]; [Causeur, 2005]). The starting point of the present paper comes from [Causeur and Husson, 2007] who proposed to adapt the methodology to testing issues. In section 2, some basics about multiple testing are recalled and the impact of a high correlation on the error rates is discussed. Section 3 is dedicated to the definition of our moderated t-statistics in a double-sampling scheme and section 4 addresses the statistical properties of a double-sampling Benjamini-Hochberg procedure. In the talk, the method will be illustrated by microarray data used to select the genes that affect the degree of muscle destructuration in pigs.
2 Simultaneous test of a large number of hypotheses

Let $Y_{ij}^{(k)}$ be the jth replicate, $j = 1, \ldots, n_i^{(k)}$, of the kth variable, $k = 1, \ldots, K$, for the ith level of a factor. Hereafter, the case of a factor with only two levels will be considered. The usual framework is assumed, namely $Y_{ij}^{(k)} \sim N(\mu_i^{(k)}; \sigma_k)$. The main goal is to point out the variables $Y^{(k)}$ for which the null hypothesis $H_0^{(k)}: \mu_1^{(k)} = \mu_2^{(k)}$, $k = 1, \ldots, K$, has to be rejected in favor of the alternative hypothesis $H_1^{(k)}: \mu_1^{(k)} \ne \mu_2^{(k)}$.
2.1
The False Discovery Rate (Benjamini and Hochberg, 1995)
Most of the multiple testing strategies are based on the ranked p-values p l 5 p2 5 . . . 5 p~ of the t-tests used to compare the mean levels of the K variables under both conditions. Basically, procedures rely on the choice of a. cut-off t such that, if p," 5 t , Hi,") is rejected. For ea.ch cut-off t , call & the number of false discoveries (or false positives), namely the number of variables for which H r ) is rejected although it is true. Call also Rt the observable number of variables for which Hi,") is rejected. The False Discovery Rate FDRt for the cut-off t is defined by [Benja.mini a.nd Hochberg, 19951 a s the expected rate of false discoveries among the variables for which the null hypothesis is rejected:
[Benjamini and Hochberg, 19951 suggest t o choose t among the ordered pvalues p k . Suppose first t,hat the number mo of true Hi,")is known. If t = pk, then Rt = k and, assuming the p-values are independently and uniformly distributed, an intuitive estimator of FDRt is given by FDR, = m o p k / k . Now, if k* denotes the largest k such that FTRpk 5 a , then the cut-off is p k ' . Usually, K - mo is negligible with respect t o K which allows the replacement of mo by K in the former procedure. Under a quite general assumption of positive dependency, [Benjamini and Yekutieli, 20011 show that such a procedure controls the FDR at level a. A
2.2
The Non-Discovery Rate
The discussions about simultaneously testing many hypotheses have so far focused on Type I error rate. [Dudoit et al., 20031 have however explored different definitions of the power of a multiple testing st#rategy.Among these definitions, 1 - E(Tt/ml) has been widely used, where ml is the number of true H i k ) and Tt is the number of non-rejected H r ) that should have been rejected (false negatives). Hereafter, NDRt = E(Tt/ml)will be used as the type I1 error rate. 2.3
Impact of a high correlation on the type I1 error rate
The following simulation study is intended t o show the impact of a high correlation on the NDR. First, in 1000 datasets with two groups of n = 10 rows, 100 independent variables are simulated according t o a normal distribution with standard deviation 1 and expectation 0 for half of the variables. For the remaining 50 variables, p p ) - p p ) is set to 1.25. In 1000 other datasets, the same feature is reproduced except that each pair of variables has the same
648
Recent Advances in Stochastic Modeling and Data Analysis Within-block correlation=0.00
Within-block correiation=0.90
I
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0.0 0.1 0.2
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Rate of false negatives
Rate of false negatives
Within-block correiation=0.00
Within-block correlation=0.90
0
'1 xd $ -
a ,
0,000
Rate of false positives
I
t
0.010
#
I
0,020
,
,
,
0.030
,
I
0.040
1
0.050
Rate of false DOSitiVeS
Fig. 1. Distributions of the rates of false negatives and false positives.
intra-group correlation, p = 0.90. For each dataset, a Benjamini-Hochberg procedure is performed with a control of the FDR at level Q! = 0.05 and the rates of false negatives and false positives are calculated. Figure 1, displaying the histograms of the error rates, shows a much larger dispersion of the distribution of the error rates when the variables are highly correlated. In other words, in the case of a high correlation, the type I1 error rate is much more unstable than in the opposite case of independence. Note that, the expectation of the rate of false negatives, namely the NDR, is however, roughly speaking, the same.
3
Testing in the presence of an auxiliary covariate
Although there is no methodological concern considering the case of many auxiliary covariates, the present paper focuses on the situation of only one covariate 2.
Improwing Type II Error Rates of Multiple Testing Procedures b y 649 3.1
Moderated t-statistics
Suppose that measurements Z i j , j = 1,.. . , Ni, of Z are available on a sample ) measured. containing the n ( k )= n y ) + n p )items on which the variable Y ( k is In the following, Zij is assumed to be normally distributed with mean pi and standard deviation CT. Hereafter N = N1 N2 denotes the size of the wider sample. Call Yik)and Zn(k) the n(k)-vectors of the observations of the outcome and the covariate respectively on the sample of size n(‘). The above double-sampling context is a particular case of the general situation described in [Causeur and Husson, 20071 where the test of a General Linear Hypothesis, here H i k ) : p y ) = p p ) against H j k ) : p y ) # p p ) is considered under the assumption of a non-diagonal covariance structure involving 2. If the variance parameters are assumed to be known, the likelihood ratio-test statistic T ( k )resulting from [Causeur and Husson, 20071 can be expressed as follows:
+
$T^{(k)}(\rho_k, \sigma_k, \sigma) = \frac{[\bar Y_1^{(k)} - \bar Y_2^{(k)}] + \rho_k \frac{\sigma_k}{\sigma}\left\{[\bar Z_1 - \bar Z_2] - [\bar z_1 - \bar z_2]\right\}}{\sigma_k \sqrt{(1-\rho_k^2)\left(1/n_1^{(k)} + 1/n_2^{(k)}\right) + \rho_k^2\left(1/N_1 + 1/N_2\right)}}, \quad (1)$

where $\rho_k$ is the intra-condition correlation between $Y^{(k)}$ and Z, $f_i^{(k)} = n_i^{(k)}/N_i$ and $f^{(k)} = n^{(k)}/N$ are respectively the intra-condition and global sampling fractions, and $\bar Z_i$ and $\bar z_i$ are the intra-condition means of Z on the samples of size $N_i$ and $n_i^{(k)}$ respectively. Note that, if $\rho_k = 0$ or if the sampling fractions are 1, $T^{(k)}$ coincides with the usual t-statistic derived on the small sample, which means that no improvement is to be expected from the covariate. The moderated t-statistics, $\tilde T_k = T^{(k)}(\hat\rho_k, \hat\sigma_k, \hat\sigma)$, integrating the measurements of the covariates, are obtained by plugging the ML estimators of the variance parameters, given by [Causeur, 2005], into expression (1).
3.2 Small-sample distribution
In some traditional fields of application of the multiple testing methods, a very small number of replications in each group is rather frequent. Therefore, there is an actual need for a non-asymptotic approximation of the moderated test statistics' distribution. The result given hereafter concerning this distribution is not proved here since it is deduced from [Causeur and Husson, 2007]. Let us define the random variate $T_{n,N}(\rho_k)$ as a ratio involving three mutually independent variates $T_1$, $T_2$ and $T_3$. If $\delta^{(k)} = (\mu_1^{(k)} - \mu_2^{(k)})/\sigma_k$ denotes the standardized difference between the mean levels of the kth response variable in both conditions, $T_1$ is distributed according to a normal distribution with standard deviation 1 and an expectation depending on $\delta^{(k)}$. $T_2$ is distributed according to a $\chi^2_{N_1+N_2-2}$ distribution. Suppose now that B and S are independent random variates, with B following a Beta distribution $B([n_1^{(k)}+n_2^{(k)}-2]/2,\,[N_1+N_2-n_1^{(k)}-n_2^{(k)}]/2)$; then $T_3$ is conditionally distributed, given B and S, as the ratio between a non-central chi-square variable with 1 degree of freedom and non-centrality parameter $[\rho_k^2/(1-\rho_k^2)]BS$, and B. The moderated t-statistic $\tilde T_k$ and $T_{n,N}(\rho_k)$ have the same limiting distributions when n and N are large; moreover, even in small-sample conditions, the distribution of $T_{n,N}(\rho_k)$ approximates the distribution of $\tilde T_k$.
3.3 Power of the double-sampling test
First, let us consider that the variance parameters are known. It is straightforwardly checked that the distribution of $T^{(k)}(\rho_k, \sigma_k, \sigma)$ is then normal with mean $\delta_{n,N}(\rho_k)$, where $\delta^2_{n,N}(\rho_k)$ can be expressed as a convex linear combination of $\delta_n^2 = \delta^2_{n,N}(0)$ and $\delta_N^2 = \delta^2_{n,N}(1)$:

$\delta^2_{n,N}(\rho_k) = (1 - \rho_k^2)\,\delta_n^2 + \rho_k^2\,\delta_N^2. \quad (3)$

Note that $\delta_n$ and $\delta_N$ are the expectations of the test statistics calculated on the samples of size $n^{(k)}$ and N respectively. Therefore, expression (3) implies that the power of the double-sampling test is always larger than the power of the test based on the small sample only (equality holds if $\rho_k = 0$) and always smaller than that of the test that would be based on the sample of size N (equality holds if $\rho_k = 1$). When the variance parameters are no longer assumed to be known, the preceding result remains true asymptotically. The left plot of figure 2 displays the power functions for various values of $\rho_k$ together with the power function of the t-test on the small sample in the case $n_1^{(k)} = n_2^{(k)} = 3$ and $N_1 = N_2 = 20$. On the right plot, showing the same functions with $n_1^{(k)} = n_2^{(k)} = 5$, the difference between the power functions of the single-sampling t-test and the double-sampling t-test for values of $\rho_k$ close to zero is much thinner.

Fig. 2. Power functions for the double-sampling test. On the left plot, $n_1^{(k)} = n_2^{(k)} = 3$ and $N_1 = N_2 = 20$. On the right plot, $n_1^{(k)} = n_2^{(k)} = 5$ and $N_1 = N_2 = 20$.
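A minimal sketch evaluating this known-variance power function from expression (3) (Python; the $\delta_n$ and $\delta_N$ values in the example are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def double_sampling_power(delta_n, delta_N, rho, alpha=0.05):
    """Power of the known-variance double-sampling test: the statistic is
    N(delta, 1) with delta^2 = (1 - rho^2) * delta_n^2 + rho^2 * delta_N^2
    (expression (3)), so power = P(|N(delta, 1)| > z_{1-alpha/2})."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    delta = sqrt((1 - rho**2) * delta_n**2 + rho**2 * delta_N**2)
    return nd.cdf(-z + delta) + nd.cdf(-z - delta)

# Power grows from the small-sample value (rho = 0) to the full-sample
# value (rho = 1):
for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, round(double_sampling_power(1.5, 3.9, rho), 3))
```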
4 Double-sampling Benjamini-Hochberg procedure
The double-sampling multiple testing procedure only differs from the Benjamini-Hochberg approach described in section 2 by the variable-by-variable
tests on which it is based. The following simulation study aims at showing the impact of a relevant auxiliary variable on the power of the procedure. For various values of $\rho$, 1000 datasets are simulated, with two groups of $n_1 = n_2 = 10$ rows and 100 variables, normally distributed with a null difference between the means in both groups for half of the variables and a difference of 1.25 for the second half, standard deviation 1 and an equal intra-group correlation $\rho$ with an auxiliary covariate Z. Z is itself normally distributed with standard deviation 1 and mean 0 in the first group and 2 in the second group. $N_1 = N_2 = 75$ observations of Z are available in each group, among which the $n_1 = n_2 = 10$ rows for which the responses are observed. For each dataset, both the usual Benjamini-Hochberg procedure based on the single-sampling t-tests and the modified procedure based on the double-sampling scheme are performed with a control of the FDR at level $\alpha = 0.05$.
5
Illustration
In the talk, phenotypic variables are used as covariates t o improve the power of the differential analysis in a study of microarray data dedicated to muscle destructuration in pigs.
652
Recent Advances in Stochastic Modeling and D a t a Analysis Within-block carrelation=0.60 Single-sampling
Within-block carrelatian=0.60 Double-sampling
*-
--I
5 : s "gle-rampimg Double sampling
00
02
04
06 P
08
10
00
02 0 4
06 0 8
Rate (11 false negatives
10
00
0.2
04
0.6
08
1.0
Rate 01 false negatlves
Fig.3. Left plot: NDR for various values of p. Right plot: Histograms of the rates of false negatives for the single-sampling and the doublesampling approaches ( p = 0.6)
References

[Benjamini and Hochberg, 1995] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, 57:289-300, 1995.
[Benjamini and Yekutieli, 2001] Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependence. Annals of Statistics, 29:1165-1188, 2001.
[Causeur and Husson, 2007] D. Causeur and F. Husson. Asymptotic distribution of double-sampling tests for general linear hypotheses. Statistics, to appear, 2007.
[Causeur, 2005] D. Causeur. Optimal sampling from concomitant variables for regression problems. Journal of Statistical Planning and Inference, 128:289-301, 2005.
[Conniffe, 1985] D. Conniffe. Estimating regression equations with common explanatory variables but unequal numbers of observations. Journal of Econometrics, 27:179-196, 1985.
[Dudoit et al., 2003] S. Dudoit, J. P. Shaffer, and J. C. Boldrick. Multiple hypothesis testing in microarray experiments. Statistical Science, 18:71-103, 2003.
[Kendziorski et al., 2003] C. Kendziorski, M. Newton, H. Lan, and M. Gould. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine, 22:3899-3914, 2003.
[Lonnstedt and Speed, 2002] I. Lonnstedt and T.P. Speed. Replicated microarray data. Statistica Sinica, 12:31-46, 2002.
Author Index

A
Adjabi, S., 636
Aguilera, A. M., 537
Angulo, J. M., 296
Arató, M., 102
Atsalakis, G. S., 414
Avi, J. R., 35
Äyrämö, S., 473

B
Bagkavos, D. I., 330
Bartkutė, V., 91
Ben Aissa, A., 406
Benhenni, K., 442
Bitziadis, D., 221
Bogdanovich, V. A., 82
Braimis, I., 414
Burns, L., 154, 162

C
Caiado, J., 542
Causeur, D., 645
Chavent, M., 571
Coppola, M., 122
Crato, N., 542
Cuvelier, E., 2

D
Darowski, M., 338
De Mouzon, O., 406
Di Lorenzo, E., 122
Dinu, L. P., 512, 521

E
El Faouzi, N.-E., 406
Eleftheriou, M., 390
Enachescu, D., 512, 521
Escabias, M., 537

F
Fabian, Z., 43
Falukozy, T., 102
Farmakis, N., 390

G
Garg, L., 146
Giovanis, A. N., 277
Gozzi, G., 382
Grossi, L., 382
Guégan, H., 571
Gutiérrez, R., 296
Gutiérrez-Sánchez, R., 296
Gyarmati-Szabó, J., 137

H
Hao, T. V., 454
Hardouin, C., 371
Heuchenne, C., 251

I
Iliopoulos, G., 213
Ioannides, D., 330
Isar, A., 621

J
Jiménez, M. J. O., 35
Jizba, P., 578
Joannides, M., 605

K
Kabachi, N., 529
Kalamatianou, A., 330
Kanniainen, J., 18
Kärkkäinen, T., 473
Katsirikou, A., 423
Khaldi, K., 229
Kloareg, M., 645
Klonowski, W., 338
Kokolakis, G., 68
Kopytov, E., 268
Korngreen, A., 51
Kouvaras, G., 68
Kozarski, M., 338
Kucharek, M., 636
Kuentz, V., 571

L
Lagha, K., 630
Lamond, B. F., 398
Lamure, M., 529
Lapuyade-Lahorgue, J., 234
Larramendy-Valverde, I., 605
Lavergne, C., 312
Le, T. V., 529
Lombardi, L., 483
Longden, A., 322
Lopez, O., 259
Lundgren, R., 613

M
Majava, K., 473
Malefaki, S., 213
Manteiga, W. G., 251
Marion, J.-M., 597
Márkus, L., 137
Marshall, A. H., 154, 162, 172
Matalliotakis, G., 360
McClean, S., 146
Meenan, B., 146
Meletiou, A., 499
Mereu, F., 483
Michalski, A., 553
Millard, P., 146
Miškinis, P., 10
Moga, S., 621
Montoro-Cazorla, D., 76
Müller, U. U., 589

N
Negrea, R., 26
Nikulin, M., 243
Noirhomme-Fraiture, M., 2

O
Orlando, A., 122

P
Papadopoulou, A., 206, 221
Pastore, M., 483
Patouille, B., 571
Peled, N., 51
Pérez-Ocón, R., 76
Pettere, G., 114
Pieczynski, W., 234
Pribytkova, I., 304
Pumo, B., 597
Pya, N., 243

R
Rachdi, M., 442
Rakonczai, P., 198
Rivas-Moya, T., 491
Roldán, C., 296
Rossi, V., 605

S
Sáez-Castillo, A. J., 35
Saidane, M., 312
Sakalauskas, L., 91
Sánchez, A. C., 35
Sandmann, W., 434
Santalova, D., 268
Saracco, J., 571
Sau, J., 406
Schick, A., 589
Segovia, M. C., 76
Sellero, C. S., 251
Shaw, B., 154, 172
Sibillo, M., 122
Sirirangsi, P., 562
Skiadas, C. H., 287, 350, 360, 414
Skiadas, H. C., 350, 360
Soós, A., 342
Stanimirović, Z., 464
Stępień, R., 338
Symeonaki, M. A., 182

T
Tewfik, K., 59
Thong, N. H., 454
Treichel, W., 636
Tsaklidis, G. M., 90, 206, 221
Turbillon, C., 597

V
Valderrama, M. J., 537
Vasiliadis, G., 190
Vassiliou, P.-C. G., 182
Vitéz, I., 102, 130
Voinov, V., 243
Vostretsov, A. G., 82

W
Wefelmeyer, W., 589

Z
Zempléni, A., 198
Zhang, L. F., 322
Zhu, Q. M., 322