This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
{w,u) is u in the real world w. Let O be a weighting function, Q : W— [0, 1] such that V £2. =\. The model of qualitative fuzzy logic with the weighting function is a quaternion M=< W, R, I, Q>, where W is a non-empty set of possible worlds; J? is a binary equivalence relation on W and O is the weight of possible world. Let / be a mapping: I:W* {
(w„ ui) is a formula;
105 (2) If G/, Cw"are qualitative fuzzy logic formulae then ~C W ', G/VG/', G/A G/', Gv,-*Gv" and Gv'<-> Cw" are qualitative fuzzy logic formulae. (3) If Cw(x) is a formula, * is a free variable of Cjx), then (Vx) (Gv(x)), (3x) (Cw(x)) are formulae. (4) All the propositional formulae are generated by applying the above rules with finite times. Remark 2.1 If there is one element in the possible world and « e [0, 1], then the WQFL proposition is a classical fuzzy proposition. So the WQFL is an extension of the classical fuzzy logic. Let 4> (w\ it1) and V (w", u") be atom formulae in WQFL where w'e [w"] R. The valuation of the logical connectives is defined as follows, for a given w e W: (1) V£~ 4>(w',u'))=l-u' (2) FA <*>««')V w(w",u"))=max{u',u"} (3) V£
" can be defined by "~," "V," "A." Definition 2.4 For a given equivalence relation R in W, the truth value of a WQFL formulae Gv is defined as follows:
Let Cw' and Dw»be two formulae of WQFL where w' e [w"] R. The valuation of the logical connectives is defined as follows, for a given wG W; S is the set of individual symbols: (4) FK-CVH-FAC,/); (5) FXClvVDlv..)=max{ P^CV), V,(DW.)}; (6) K/(Cl/AAv..)=min{ FXCW.), FXZV)}; (7) F A C v ' - A ^ FA-CVVZV); (8) ViCw <-> ZV)= Vi{Cw^Dw.) A (£)W.^CW.)); (9) KX( Vx)(C„M)=inf { ^ ( C J x ) ) } ; (10) V,((3x) (Cw(x))=sup{Vl(Cw(x))}
•
xeS
Theorem 2.1 Let
(2)EWtA(CWiADnj)
V g V D ;
CWVDW=DWJACWI;
= (EWtACWi)ADWi,EWiV(CwyDWi)
= (EWk
106 (3) Kk
V (CWIADWJ)-(EWIVCWI)A
(Ewt
V
DW/),
EWt A ( C „ , V ^ H £ , A C W i ) V (E„t ADWj);
(4)cWiAcwrcWi,cwycWi=cWi; (5)CWiAT=CWi,Cwy?=CWi; (6)(CWV
DWj) ACWi =
V)~~CWI
CWi,(CWiADWj)VCwrCWi;
= CWI;
( 8 ) ~ ( C , A ^ ) = ~ C „ V~D W > , ~ ( C „ V D „ ; ) = ~ C „ A~/>„,; (9) (VxXQ, (*)) A (Vx)(DWj (x)) = (Vx)(CWj (x) A DWj (x)); (10) (Bx)(CWi (x)) v (3xXZ\ (x)) = (3x)(CWi (x) v Z^ (x)) ; (11) (VxXC, (x)) v (VxXA., (*)) = (VXXVJO(CW, (*) A DW/ Cv)); (12) pxXQ, (x)) A (3x)(D^ to) = (3x)(3j)(Q: (X) A £>„y (y» Definition 2.5 Let C„, be a PFgFZ. formula and let a e [0, 1]. If there exist Tfla ^
a
f° r a ^ w i e M «
m a
possible world w, then the WQFL formula Cw
is called a -fuzzy true in the possible world [w]R , denoted it by (w, a )-true; Whereas, if there exist STQU ^ a ^or a " w i G H « m t n e possible world w then the WQFL formula Cw is callled a -fuzzy false in the possible world [w]R, denoted as (w, a )-false. The WQFL formula Cw is called a -identically true if for all possible worlds [wi]R in Wthere exist V Q . U . >
a
.
i
The WQFL formula Cw is a -identically false if for all possible worlds [w{]R in Wthere exist V Q . M . ^
a
•
i
Definition 2.6 Two WQFL formulae C n
are said to be equal if w, e
n
[wy]« and V Q M = y1 Q i
and D
U
for all interpretations /, denoted as Cw = Dw .
J
Definition 2.7 R is an equivalence relation on W, w^W, (f(£/)//?={Cw|w,e [wfo.i'e {1,2,...,«}. From above C„, n C„, = ^ and U c^ = (F(U) hold. So
107 (l)If the possible world w £ W is considered a real number on [0,1] and denoted the qualitative fuzzy proposition P(w,u) by wP(u) then the operator fuzzy logic system can be held1'21; n
(2)Assume V[>]. = 1 and w, 5=0 then V(CW)= V[w]. x «. hence the weighting fuzzy logic holds; i=l
(3)Assume fF=[0,l] and w+w^l, so intuitionistic fuzzy logic system can be held where w and u represent true degree and false degree of intuitionistic fuzzy proposition respectively1131. Therefore WQFL is a generalization of classical fuzzy logic and it is more flexible in reasoning. 3. A resolution method Since the resolution principle of classical logic is presented Robinson'141, many methods of fuzzy logic resolution have been discussed'151, such as the resolution method of operator fuzzy logic and the resolution method of intuitionistic fuzzy logic'161 and so on. We will extend the method of fuzzy logic resolution to qualitative fuzzy logic. Definition 3.1 Let
0 has solutions in the interval: (2 — y/E, 2 + i/5), therefore, for c e (0,1], y?(c) — 3 > 0, that is, ai > 3. In addition, inequality
/3) U (4 + 2\/3, +00). Therefore, for c € (0,4 - 2\/3), y(c) - 4 < 0, that is, a\ < 4. - - > e m accordingly. Therefore, Z) is the combination of those variables that explains the greatest amount of variation, and the m-dimensional hyper-plane is the m-dimensional subspace which retains the maximum amount of information about the input data set [9, 10]. The steps involved in PCA are as follows: (1) Normalize the original data set to avoid the much variance of variables and the difference of the measurement units. *j=(* s -*/) / C T ', (' = l,2,"-,/?;y' = l,2,--,n) where xtj is the original data of the /* variable and/ h class, and xt and ai are the sample mean and standard deviation of j * index, respectively. (2) According to the standardized data set (**) , calculate the correlation matrix R = (riJ)pxp, where n *=i
135 Define 0(c) = | (2(c + 4) - Vc 2 + 8c + 4). Inequality 0(c) - 3 > 0 has solutions in the set: (—oo, 2 - \/5) U ( 2 + \ / 5 , +OO). Therefore, for c G (0,1], 0(c) - 3 < 0, that is, a2 < 3. Consequently, for c € (0,4 - 2%/3) and a £ (ai,4], it follows from (7) that
2)L
(Uj=iPj\)-
For
1(2 - a)(l - 2e)| > 1 and a > ax > 3, that is,
fy ^ 0 (i = 1, 2,..., L), and 1 - 2z 2 < 0, one has det[£>F(y 2 )] ^ 0. Let xo = y 2 , x i = y j , for |(2 — a ) ( l — 2e)\ > 1, a G (oi,4] and c £ (0,4 - 2%/3), one has that F ( x i _ i ) = x, and d e t D F ( x i _ 1 ) / 0 for i = 1,2, and moreover, F m ( x 0 ) = x*, xo / 2:' with m — 2. Thus, condition (2) of Theorem 1 is also satisfied. Consequently, CML (2) is chaotic in the sense of Li-Yorke. The proof is completed. 4. S i m u l a t i o n According to Theorem 2, e, a and L are assumed as 0.95, 4 and 32, respectively. The Lyapunov exponents of the CML with these parameters are plot in Fig. 1. It is shown that all the Lyapunov exponents are positive. Thus, the CML is chaotic. The state phase, i.e., the values of the state variables of all sites , is plot in Fig. 2. State variables distribute uniformly in the interval [0,1], which is desirable for applying CML in cryptography. 5. Conclusions A rigorous proof of Li-Yorke chaos in a spatiotemporal chaotic system has been presented in this paper. Meanwhile, a sufficient condition for chaos in the spatiotemporal CML was also derived, which gives a criteria of constructing chaotic CMLs for their applications in some cases where chaos is benefit. 6. A c k n o w l e d g e This research was support by the Germany/Hong Kong Joint Research Scheme (9050180). References 1. K. Kaneko(ecL), Theory and Application of Coupled Map Lattices 1, (1993).
136
5
10
15
F i g u r e 1.
• • : • ; 5
25
30
Lyapunov exponents
= : : ! • : 10
Figure 2.
2. 3. 4. 5. 6.
20
15
' ; : • 20
: ;
i
: : •
25
30
State phase
K. Kaneko, Phys. D34, 1 (1989) K. Kaneko, Phys. Lett. A119, 397 (1987) F.H. Willeboordse and K. Kaneko, Phys. Rev. Lett. 73, 533 (1994) K. Kaneko, Prog. Theor. Phys. Suppl. 99, 263 (1989) H. Shibata, Phys. A292, 182(2001)
137 7. H.P. Lu and S.H. Wang and X.W. Li and G.N. Tang and J.Y. Kuang and W.P. Ye and G. Ku,Chaos 13, 617(2004) 8. P. Li and Z. Li and W.A. Halang and G.R. Chen, Phys. Lett. A, accepted in 2005. 9. P. Li and Z. Li and W.A. Halang and G.R. Chen, Int. J. Bifurcation and Chaos, 16, 2006. 10. P. Li and Z. Li and W.A. Halang and G.R. Chen, Chaos, Solitons and Fractals, accepted in 2005. 11. T.Y. Li and J.A. Yorke, Amer. Math. Monthly 82, 985(1975) 12. F.R. Marotto, J. Math. Analysis and Appl. 63 199(1978) 13. H. Luekepohl, Handbook of Matrices 1998 14. C.P. Li and G.R. Chen, Chaos, Solitons and Fractals 18 69(2003) 15. Y. Shi and G. Chen, Science in China Ser. A Mathematics 34 595(2004) 16. Y. Shi and P. Yu, Dynamics of Continuous, Discrete and Impulsive Systems in press (2005)
ON THE PROBABILITY A N D R A N D O M VARIABLES
ON
IF E V E N T S
B. RIECAN. * Matej Bel University, Tajovskeho 40 SK-97401 Banska Bystrica Mathematical Institute of Slovak Academy of Sciences Stefdnikova 49, SK-81473 Bratislava E-mail: [email protected]
One of the important problems of the theory of IF sets in the creation of probability theory. Recently the family J7 of IF-events was embedded in an MV-algebra, hence many results of the probability theory on MV-algebras can be applied. Of course, the mentioned MV-algebra has some special properties. The aim of the paper is the description of the basic notion of the probability theory on the special MV-algebra.
1.
Introduction
F i r s t recall some basic definitions. B y a n I F - set ([1]) we consider a pair
A = {^A, I>A) of functions fiA, VA : fl —> [0,1] such that VA + VA
< 1-
HA is called a membership function of A, VA a nonmembership function of A. If (Q,,S,P) is a probability space and HA^A are S-measurable, then A is called and IF - event and the probability of A is defined axiomatically (see Definition 2.1). Denote by T the set of all IF-events on fixed probability space (£2, S, P). After some constructive definitions of the notion of probability on F ([5], [6], [15]) a descriptive characterization was given ([8]) and then an axiomatic definition was presented ([9]). Now it is known the general form of all probabilities on the set T ([10]). In Section 2 we show that any probability on T can be described by the help of a probability on the Lukasiewicz triangle ([3], [7]). •WORK PARTIALLY SUPPORTED BY GRANT VEGA 2/2005/02
138
139 The second important notion of the probability theory is the notion of a random variable = a measurable function, hence such / : fl —> R that f~1{A) e. S for any Borel set A € B(R). Following the probability theory on MV-algebras instead of the notion of a random variable we use the notion of an observable as a morphism x : B(R) —> T ([4], [13], [14]). In Section 3 we describe all IF observables in T and prove that to any IF observable there exists their joint observable. 2. Probability Denote by T the family of all IF - events, and by J the family of all compact intervals. In the following definition we shall assume that [a, b] + [c, d] = [a + c, b + d] and [an, bn] / [a, b] if an / a,b„ f b. On the other hand we define (ftA, VA) © (ftByVB) = (l*A © fiB, "A 0 VB) {HA,VB)Q{\XB,VB)
=
{HAQ>IJ-B,VA®VB)
where f®9
= m i n ( / + g, 1)
/ © 3 = max(/ +
fl-l,0).
Moreover {VAn,VAn)
/
{VA,VA)
means fJ-An /
HA,VA„
\
vA.
Definition 2 . 1 . ([9]). An IF - probability on T is a mapping V : T -> J satisfying the following conditions: (i)
P((0,1)) = [0,0])7>((1>0)) = [1,1];
(it)
V{{HA,VA))+V{(IIB,VB))
V({HA,UA))
for any (nA,i/A),
© (HB,UB))
+ V((HA,VA)
= ©
( ^ B , ^ B ) £ T\ («0
(»An,VAn)
/
(»A,VA)
V{{»An,VAn))/V{{llA,VA))-
=*
(»B,VB))
140 In [10] it was proved that to any probability V : T' —• J there exist a,(3 e R,0
= [(1 -a)
f HAdP + a [(1
( 1 - / 3 ) / fiAdP + 0
-
vA)dP,
f(l-uA)dP}.
Moreover, in [12] an MV-algebra M was constructed such that T can be embedded to M. Of course, the aim of the paper is to show that the probability theory on T can be realized without using of MAs a laboratory for T the Lukasiewicz triangle can serve ([3], [7], [11]) A = {(«, v); u, v e R, 0 < u, 0 < v, u + v < 1} endowed with the ordering ( u i . U l ) < (U2,V2)
<^=> Ui < U2,Vi > V2,
and two operations ©, © (ui,Vl)
© (U2,V2) = («1 ®U2,V!
(ui,Vi)0(U2,V2)
= (lil QU2,V!
G>V2) ®V2)
where as before s © t — min(s + t, 1), s © £ = max(s + £ — 1,0). Definition 2.2. Probability on A is any function p : A —> J such that the following properties are satisfied: (i)
p((0,l)) = [0,0] ( P ((l,0)) = [l,l]; (ii)
p({ui,vi))+p({u2,v2))
p({ui,vi))@(u2,v2))+p((u1,vi)0(u2,V2)) for any (wi.wj), {u2,v2) e A; (Hi)
(un,vn)
P((Un,Vn))
/
(u,v) =$> Sp((u,v)).
=
141 If (fi,S, P) is a probability space, and A = ((IA,VA)
—> .T7 we put
I M / M . ^ A ) ) = ( / M^d-P, / i^d-P). Then evidently i>:T -^ A. T h e o r e m 2.3. ^4 mapping V : T' —> J is a probability if and only if there exists a probability p : A —> J such that P = p o tp. Proof. By Theorem of [10] every probability V : T—> J has the form V{(MA,
"A)) = [(1 -a)
f liAdP + a [ (1 Jn Jn
( 1 - / 3 ) / nAdP + 0 Jn Jn
uA)dP,
f{l-vA)dP].
By Theorem 2 of [7] the function p : A —> J given by p{u, v) = [(1 - a)u + Q ( 1 - v), (1 - /?)K + /3(1 - v)] is a probability measure, and evidently P ° ^((AM, »A)) = PW>((/M, VA))) =
= p([ iiAdP, f vAdP) = P((IJ.A,VA))Jn Jn On the other hand, if p : A —* J is any probability, then by Theorem 2 of [7] there exist a, (5 such that p(u, v) = [(1 - a)u + a{\ - v), (1 - 0)u + /3(1 - v)}. Then 7> -> J defined by V{(HA,VA))
= P(TP{{VA,"A)))
=
= p( / / ^ d P , / ^ d P ) = Jn Vn = [(1 - a ) / /MdP + Q ( 1 - / Jn Jn
(1-/3) / iiAdP + P(l-
Jn is a probability by Theorem of [10].
vAdP),
I vAdP)\
Jn
142 3.
Observable
As we have mentioned yet instead of measurable functions / : ft —> R one can consider observables B(R) —> S,A >—> f~1(A). Generalizing this approach we define the notion of IF-observable. Definition 3.1. An IF-observable is a mapping x : B(R) —> J (B(R) being the family of all Borel subsets of R) satisfying the following properties: (t)
x(fl) = ( l n , O n ) ;
(ii) A,BeB(R),AnB
=
x{A) ©x{B) = (0, l),x(AUB) (Hi)
A„ /
A=>
= x(A) ©x(B);
x{A„) /
x(A).
Definition 3.2. The joint IF observable of IF observables x,y : B(R) —> T is a mapping h : B(R2) —> T satisfying the following conditions h(R2) = (ln.On);
(0
(ii) A,B£B(R2),Ar)B
= ®=>
x(A) © h{B) = (0,1), h(A UB)= (Hi)
h(A) © h(B);
An/A^h(An)/h(A).
(iv)
h(C x D) =
for any C,£> G B(fl). (Here (f,g).(h,k)
=
x(C).y(D) (f.h,g.k).)
T h e o r e m 3.3. To any two IF observables x,y : B(R) —» J" iftere exists t/jeir joint ZF observable. Proof. Put x(vl) = (z b (A), 1 - x*(A)),y(B) for fixed u> € fl
= ( ^ ( B ) , 1 - y»(B)), and
At(>l) = x b (yl)( W ),Al(^) = x»(A)(u;), K t ( 5 ) = ^(J5)M,K»,(B) = y»(B)(a,). Then A^,AJ[,,K^,K^ : B(fl) -> [0,1] are probability measures. For C e B(R2) define h(C) =
(h\C),l-h»(C)),
143 where h\C){u)
= (\l
x
KI)(0,
ft'(C)M
= (A«,XK«,)(C)
First we must prove that /^(CO./i^C) : fi —> [0,1] are measurable. If C = A x B, then ftb(^l x B)( W ) = (Xl x , £ ) ( A x B) = = Aj,(A)./£(B) = x\A)(co).y\B)(uj)
=
= ^(A).^(B)M. Since xi(A),y]'(B) are 5-measurable, ft^A x B) : fi -» [0,1] is <Smeasurable as the product of two 5-measurable functions. Since /C = {C € i ? ^ 2 ) ; /i b (C) : fi -> [0,1] is <S-measurable} is a q — cr-algebra containing the family C = {A x B; A € B{R),B e B(R)}, i.e. £ is closed under difference of own subsets and countable unions of disjoint sets. Therefore K, contains the smallest q — cr-algebra over C and it coincides with the cr-algebra B(R2) of all two-dimensional Borel sets ([14], 1.1). Since K. D B(R2), we obtain that hb(C) is <S-measurable for any C e B(R2). The proof for /i»(C) is the same. For to prove that h(C) e T it is necessary to show that ht?{C) + 1 - h*{C) < 1, hence h\C) < 0(C). Of course, we know xb(A) < x*(A), y\B) < yi(B), hence A^ < Aj,, «£ < «£, for any w € fi. We have / ^ ( C ) M = A^ x K ^ ( C ) =
= f nl{Cu)d\l{u) < JR
< f Kl(C")d\i(u) < JR
< [ Kl(Cnd\l(u) JR
=
h\C)(uj).
144 4.
Conclusion
T h e r e is k n o w n a m e t h o d how t o o b t a i n new results of t h e probability t h e o r y on I F events by t h e corresponding results of t h e t h e o r y of M V algebras. In this p a p e r we have shown t h a t t h e m a i n notions of t h e p r o b ability t h e o r y c a n b e described in t e r m s of I F events only. It gives some b e t t e r possibilities for direct applications of t h e probability t h e o r y o n I F events a n d also for image processing p r o b l e m s .
References 1. Atanassov, K.: Intuitionistic Fuzzy Sets: Theory and Applications. Physica Verlag, New York (1999). 2. Cignoli, R., D'Ottaviano, I.M.L., Mundici, D.: Foundations of Many - Valued Reasoning. Kluwer, Dordrecht (2000). 3. Deschrijver, G. - Cornelis, Ch. - Kerre, E.E. Triangle and square: a comparison, Proceedings of the Tenth International Conference IPMU, Perugia, Italy 2004, 1389-1395 (2004). 4. Dvurecenskij, A., Pulmannova, S.: New Trends in Quantum Structures. Kluwer, Dordrecht (2000). 5. Gerstenkorn, T., Manko, J.: Probabilities of intuitionistic fuzzy events. In: Issues in Intelligent Systems: Paradigms (O. Hryniewicz et al. eds.). EXIT, Warszawa, 63 - 68, (2005). 6. Grzegorzewski, P. - Mrowka, E.: Probability of intuitionistic fuzzy events. In: Soft Methods in Probability, Statistics and data Analysis (P. Grzegorzewski et al. eds.). Physica Verlag, New York, 105 - 115, (2002). 7. Lendelova, K., Riecan, B.: Probability on triangle and square. IPMU'2006, Paris, to appear. 8. Riecan, B.: A descriptive definition of the probability on intuitionistic fuzzy sets. In: Proc. EUSFLAT'2003 (Wagenecht, M. and Hampet, R eds.), ZittauGoerlitz Univ. Appl. Sci, 263 - 266, (2003). 9. Riecan, B.: Representation of probabilities on IFS events. Advances in Soft Computing, Soft Methodology and Random Information Systems (M.LopezDiaz et. al. eds). Springer, Berlin, 243 - 246 (2004). 10. Riecan, B.: On a problem of Radko Mesiar: general form of IF - probabilities. Accepted to Fuzzy Sets and Systems. 11. Riecan, B.: On the entropy on the Lukasiewicz square. Joint EUSFLAT LFA 2005, Barcelona, September 7 - 9, 330 - 333, (2005). 12. Riecan, B.: On the entropy of IF dynamical systems. Issues in the representation and Processing of Uncertain and Imprecise Information, EXIT, Warszawa, 328 - 336 (2005). 13. Riecan, B. - Mundici, D.: Probability on MV-algebras. In: Handbook on Measure Theory (E.Pap ed.). Elsevier, Amsterdam (2002).
145 14. Riecan, B. - Neubrunn, T,: Integral, Measure, and Ordering. Kluwer, Dordrecht (1997). 15. Schmidt, E., Kacprzyk, J.: Probability of intuitionistic fuzzy events and their applications in decision making. Proc. EUSFLAT'99, Palma de Malorca, 457 - 460, (1999).
ANOTHER APPROACH TO TEST THE RELIABILITY OF A MODEL FOR CALCULATING FUZZY PROBABILITIES*
C H O N G F U HUANG
D O N G Y U N JIA
Institute of Disaster and Public Security College of Resources Science and Technology, Beijing Normal University Beijing 100875, China. E-mail: [email protected]
In this paper, we suggest a new approach to test the reliability of the interiorouter-set model for calculating fuzzy probabilities. With a sample drawn from a population, we use the model to obtain a possibility distribution on a probability universe with respect to a histogram interval. Then, with N samples drawn from the same population, we obtain N histogram estimates that an event occurs in the same interval. Because the distribution constructed by the histogram estimates is similar to the possibility distribution, according to the consistency principle of possibility/probability, we infer that the model is basically reliable.
1. Introduction It is impossible to precisely estimate a probability distribution of a population with a sample when the probability distribution function of the population is continuous. Using a fuzzy model to deal with the given sample, such as the interior-outer-set model (IOSM)4, we can obtain a fuzzy probability distribution. It is very important to test if the model is reliable. Executing some computer simulation experiments, we have demonstrated5 the reliability of IOSM in terms of the fuzzy expected value. In other words, the demonstration is available only for comparing expected values of a fuzzy probability distribution and a classical probability distribution. Plentiful information in a fuzzy probability distribution has not played an important role in the demonstration. The purpose of this paper is to propose a new approach to test the reliability of IOSM for calculating a possibility-probability distribution (PPD) 6 . * Project Supported by National Natural Science Foundation of China, No. 40371002, and the China-Flanders project BIL 011sll05 entitled "Intelligent systems for data mining and information processing"
146
147 It could be extended to test other models for calculating fuzzy probabilities. Notions related to fuzzy probabilities are introduced in section 2. In section 3, we give a brief survey of the interior-outer-set model. The new approach to test IOSM's reliability is described in section 4. A numerical simulations is shown in sections 5.
2. B a s i c Terminologies 2.1. Uncertainty,
probability
and
possibility
"Uncertainty" has a broad semantic content. When we use the dictionary again to examine these various meanings, two major types of uncertainty emerge quite naturally. They are well captured by the terms "vagueness" and " ambiguity". The concept of uncertainty is closely connected with the concept of information. The amount of information obtained by the action may be measured by the reduction of uncertainty that results from the action. "Probability" is a mathematical measure of the possibility of the event occurring as the result of an experiment. Probability theory is capable of conceptualizing only one type of uncertainty: conflict. Axioms of probability theory do not allow any imprecision in characterizing situations under uncertainty, be it imprecision in the form of nonspecificity or vagueness. "Possibility" is a mathematical measure of the possibility of the object being as a typical object. Possibility theory is capable of conceptualizing another uncertainty: nonspecificity (lack of informativeness). Whenever a basic probability assignment function in the DempsterShafer theory 1 0 induces a nested family of focal elements, we obtain a special belief measure, which is called a necessity measure, and the corresponding special plausibility measure, which is called a possibility measure. The only body of evidence they share consists of one focal element that is a singleton. Since the additivity axiom of probability theory is replaced with the maximum axiom of possibility theory, which guarantees the nested structure of focal elements in possibility theory, the two theories are complementary. Probability is suitable for characterizing the number of persons that are expected to ride in a particular car each day. Possibility theory, on the other hand, is suitable for characterizing the number of persons that can ride in that car at any one time. Since the physical characteristics of a person (such as size or weight) are intrinsically vague, it is not realistic to describe the situation by a sharp distinction between possible and impossible instances.
148 Possibility theory can be formulated not only in terms of consonant bodies of evidence within the Dempster-Shafer theory, but also in terms of fuzzy sets where possibility distributions are in a one-to-one correspondence with fuzzy sets, it is also meaningful to characterize possibility distributions by their degrees of fuzziness. In this paper, we refer to this interpretation of possibility theory as a fuzzy-set interpretation. Klir and Harmanec 8 examined bridge between standard possibility theory 1 2 and probability theory and studied transformations between these two theories. Possibility distribution r and probability distribution p are said to be consistent if it holds that for any event u Prob{u) < Poss{u),
(1)
where Prob denotes the probability measure corresponding to p and Poss denotes the possibility measure corresponding to r. 2.2. Fuzzy
probability
The theory of fuzzy probability was born into a fuzzy community where several researchers started thinking about the probability of a fuzzy event 11 . Gert de Cooman has presented a sound and deep approach 3 to vague probability. Cooman's model follows an approach to modelling uncertainty that was pioneered by Ramsey 9 . The present model also has formal connections with Zadehs fuzzy probabilities 1 3 ' 1 4 , although Cooman believes his model to be fundamentally different, since it has a clear behavioural interpretation, and a calculus that is very different from the one suggested by Zadeh. Engineers refer to fuzzy probability as imprecise probability due to that they haven't any basic probability assignment function in many cases. In this paper, we refer to this interpretation of fuzzy probability as the numerical probability that is a fuzzy quantity defined on the unit interval [0, 1]. The sum of these probabilities is not 1 by the rules of standard fuzzy arithmetic. And, fuzzy probability is characterized by a possibility distribution of probability. 2.3. Possibility-probability
distribution
To avoid any confusion, we restrict ourselves here to study the fuzzy probability that can be represented by a possibility-probability distribution. Definition 1 Let (Q, A, P ) be a probability space, and P be the universe of discourse of probability. Let irx(p) be the possibility that an event occurs
with probability p. Then,
n n ,p = {7rx(p) | x e n , p e ? } is called a possibility-probability
distribution
(2)
(PPD).
It is important to note that, in Definition 1, P is employed to represent a probability measure defining a probability space, and P the universe of discourse of probability. The P P D is a model of the second-order uncertainty 1 along with the first-order uncertainty they may form hierarchical models 2 . 2.4.
Histogram
estimate
Histogram is a model to estimate the probability distribution of an event occurring in some intervals. Let X = {x\,X2, • • • ,xn} be a given sample drawn from a population with P D F p{x). Given an origin XQ and a bin width h, we define the bins of the histogram to be the intervals [xo + mh,Xo + (m + l)/i) for positive and negative integers m. The intervals are chosen to be closed on the left and open on the right. p{x £ Ij) = —(number of xi in the same bin as x),
(3)
is called a histogram estimate (HE) of p{x). 3. Interior-Outer-Set M o d e l Interior-outer-set model (ISOM) is a hybrid model that consists of information distribution method 7 and possibility inference 12 . ISOM is suggested to calculating, with a sample X = {xi,x^, • • • ,xn}, a P P D defined on I x P, where, I = {h,h,
•••Jm)
(4)
and P = {pfc|* = 0Il,---,n} = { 0 , ^ > - - - > l } .
(5)
Let Uj be the midpoint of intervals Ij, A = u3•+1 — ttj, j = 1,2, • • • , j' — 1. Let _ / 1~\0,
9iJ
| Xi - Uj | / A , if | xt - Uj | < A; if \xi-uj\>A.
, . W
Where qtj is called the information gain of that observation Xi distributes to controlling points Uj. It is apparent that for samples within
the intervals Ij, the one with smallest value of q have the highest probability of leaving its interior interval and drift to a neighbor interval. On the contrary, out of interval Ij samples with highest value of q have highest probability of getting in this interval. Let Qj be the list of complemented membership degrees with respect to the information gains, from observations within interval Ij, and let Q t be the list of membership degrees with respect to the information gains, from observations outside interval. Furthermore let sort f [Qj) order the elements of the list according to ascending magnitude and similarly, let sort i (Qj') order the elements of the list according to their descending magnitude. When | Qj | = rij, we can use the formula to calculate a PPD as follows, which is called IOSM. 1st (smallest) element of QJ, if p = 0; 2nd element of Qj, if p = A;
TT/.,
(P)
Last (largest) element of Q , if p ifp=2i; 1, C i 1st (largest) element of Qj i f p = ^n,+2. T 2nd element of Qf, •
Tl
(7)
+ •
if p =
Last element of Q^,
if p = 1.
Then, from a given sample we can obtain a PPD. 4. Description of the New Approach Consider the following case; In an experiment group there are three researchers: (1) A computer scientist who draws N + 1 samples X\, X2, •••, X^, Xiv+i. A sample X consists of n random numbers Xi,i = 1,2, •• • ,n and n is not large. The computer scientist knows that the samples are drawn from a population with density p(x),x 6 R; (2) A statistician who is good at extracting statistical laws through Histogram Method. He does not know where the N samples Xi,X2, • • • ,XN comes from. Studying a sample, he obtain an estimate p(x) to estimate p{x). (3) A fuzzy engineer who is interested in calculating a PPD with a fuzzy model. He also does not know where the sample X^+i comes from. Analyzing the sample, the engineer gives a PPD, i.e., UQ,P to estimate p(x).
151 Let / be an interval with respect to a HE. Vz € / . There is no loss in generality when we supposed that the statistician obtains an estimate pj (x) resulted from sample Xj. Let u>k — Pk{x), the computer scientist obtains a sample: W = {wi,w2,---
,WN}
(8)
Employing a reasonable mathematical statistics method (again histogram model is the simplest method), with W the computer scientist can obtain an estimate f(p) of a probability distribution with respect to probability values in P. When N is large, f(x) is a quality function depicting the scattered field to estimate p(x) with a sample. For the same interval / , there is no loss in generality when we supposed that the fuzzy engineer obtains a possibility distribution irx(p) as a fuzzy probability of t h a t x occurs in / . According to the consistency principle of possibility/probability shown in Eq.(l), Prob could be used to infer if Poss is basically reasonable. Therefore, if irx(p) is similar to / ( p ) , it is natural for the computer scientist to infer that the model to calculate the P P D is basically reliable. The approach to test a model with many HEs is called HistogramCovering Approach. To test a specific model, we might employ a specific form of the approach to accomplish our task.
5. A N u m e r i c a l Simulation E x p e r i m e n t In this section, we employ Histogram-Covering Approach to test the reliability of IOSM for fuzzy probabilities. For the intelligibility we suppose t h a t the samples are drawn from normal distribution AT(6.86,0.372 2 ). Firstly, running Program 2 in paper [5], a generator of random numbers, with MU=6.86, SIGMA=0.372, N = l l and SEED=82495, we obtain 11 random numbers: X = {xi,x2,--,xu} = {0.91,6.59,6.31,6.50,7.03,6.49,7.27,7.13,6.72,7.42,6.34}.
(9)
Employing IOSM, we obtain a P P D : h
h h
P0 / 1.00 0.06 0.04 \ 1.00
Pi 0.41 0.09 0.19 0.45
P2 0.35 0.10 0.19 0.19
P3 0.10 0.29 0.39 0.00
P4 0.09 0.35 0.45 0.00
P5 P6 0.00 0.00 0.41 1.00 1.00 0.29 0.00 0.00
VI 0.00 0.39 0.06 0.00
P8 0.00 0.19 0.00 0.00
P9 0.00 0.04 0.00 0.00
P10 Pll 0.00 0.00\ 0.00 0.00 (10) 0.00 0.00 0.00 0.00 )
152 where h = [5.65,6.25), h = [6.25,6.85), h = [6.85,7.45), h = [7.45,8.05), Pt = i / H . * = 0 , l , - " , 1 1 . Secondly, running the same generator with MU=6.86, SIGMA=0.372, N = l l , and 90 SEEDs, respectively, we obtain 90 samples, X\, Xi, • • • , Xgo. For example, with SEED=876905, we obtain: X3 = {7.14,6.98,6.83,7.00,7.34,6.47,7.65,6.99,6.71, 7.47,6.26}. Employing Histogram Method, with the same intervals used in Eq.(10) and Xk, k = 1,2, • • • ,90, we obtain 90 HEs p\(x £ Ij),p2{x € Ij), • • • ,£90(2: £ Ij),j = 1,2,3,4. Thirdly, for an interval Ij, we obtain 90 estimate values of probability p(x € Ij), forming a sample Wiy For example, for I^ = [6.25,6.85), we obtain: Wi2 = {wi,w2,--,w 9 0 } = {0.55,0.55,0.36,0.45,0.09,0.45,0.64,0.45,0.55,0.27,0.27,0.55,0.64, 0.36,0.36,0.73,0.36,0.45,0.36,0.27,0.64,0.27,0.45,0.36,0.36,0.55, 0.64,0.45,0.45,0.64,0.55,0.27,0.45,0.36,0.45,0.55,0.45,0.36,0.36, 0.45,0.27,0.45,0.45,0.45,0.18,0.36,0.36,0.27,0.73,0.55,0.36,0.45, 0.55,0.64,0.64,0.27,0.55,0.36,0.64,0.45,0.36,0.36,0.36,0.55,0.36, 0.36,0.18,0.27,0.27,0.55,0.55,0.64,0.55,0.45,0.45,0.45,0.55,0.09, 0.27,0.36,0.55,0.45,0.45,0.73,0.45,0.36,0.18,0.36,0.36,0.36} Then, employing the method of information distribution 7 ,with samples Wijtj = 1,2,3,4 and controlling points pt = t/ll,t = 0 , 1 , • • • , 1 1 , we obtain an estimate F(p) shown in E q . ( l l ) . PO Pi V2 h ( 1-00 0.40 0.14 F(p) = I2 0.00 0.08 0.12 h 0.00 0.05 0.19 h \ 1-00 0.45 0.17
P3 0.02 0.46 0.38 0.06
Pi 0.02 1.00 0.90 0.02
P5 P6 0.00 0.00 0.92 0.67 1.00 0.90 0.00 0.00
P7 0.00 0.38 0.71 0.00
P8 0.00 0.12 0.05 0.00
P9 0.00 0.00 0.10 0.00
PlO Pll 0.00 0.00 \ 0.00 0.00 (11) 0.00 0.00 0.00 0.00/
where Ij,j = 1,2,3,4 and pt,t = 0, !,••• ,11 are the same as ones in Eq.(10). Finally, comparing Eq.(10) and E q . ( l l ) , we know that I I ^ p is similar to F(p). According to the consistency principle 8 , we know that IOSM is basically reliable with respect to this computer simulation experiment. Executing a lot of computer simulation experiments with different seed numbers, sample sizes, populations, respectively, we have results showing that IOSM is basically reliable.
153 Acknowledgment The work on this paper was done in Key Laboratory of Environmental Change and Natural Disaster, The Ministry of Education of China. References 1. Gert de Cooman, Possibilistic previsions, in: Proceedings of the 7th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based System, Paris, France, 1998, Vol.1, pp. 2-9. 2. Gert de Cooman, Lower desirability functions: a convenient imprecise hierarchical uncertainty model, in: Proceedings of the 1th International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 1999, pp. 111-120 3. G.de Cooman, A behavioural model for vague probability assessments, Fuzzy Sets and Systems, 154(3), (2005), 305-358. 4. C.F. Huang, An application of calculated fuzzy risk, Information Sciences, 142(1), (2002), 37-56. 5. C.F. Huang, A demonstration of reliability of the interior-outer-set model, International Journal of General Systems, 33(2-3), (2004), 205-222. 6. C.F. Huang, C. Moraga, A fuzzy risk model and its matrix algorithm, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(4), (2002), 347-362. 7. C.F. Huang, and Y. Shi, Towards Efficient Fuzzy Information Processing— Using the Principle of Information Diffusion, Physica-Verlag, Heidelberg, 2002. 8. G.J. Klir, D. Harmanec, On some bridges to possibility theory, in: Gert de Cooman, D Ruan, E.E. Kerre (Eds.), Foundations and Applications of Possibility Theory, World Scientific, Singapore, 1995, pp. 3-19. 9. F.P. Ramsey, Truth and probability, in: R.B. Braithwaite (Ed.), The Foundations ofMathematics, Routledge & Kegan Paul, London, 1931, pp. 156-198. 10. G. Shafer, A Mathematical Theory of Evidence, Princeton Univ. Press, Princeton, NJ, 1976. 11. L.A. Zadeh, Probability measures of fuzzy events, Journal of Mathematics Analysis and Applications, 23(1), (1968), 421-427. 12. L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1(1) (1978) 3-28. 13. L.A. Zadeh, Fuzzy probabilities, Information Processing Management, 20(3), (1984), 363-372. 14. L.A. Zadeh, Toward a perception-based theory of probabilistic reasoning with imprecise probabilities, Journal of Statistical Planning and Inference, 105(1), (2002), 233-264.
A NOVEL GAUSSIAN PROCESSES MODEL FOR REGRESSION AND PREDICTION YATONG ZHOU f Dept. Information and Communication Engineering, Xi 'an Jiaotong University, Xianning Road XVan, 710049, P. R. China TAIYI ZHANG Dept. Information and Communication Engineering, Xi 'an Jiaotong University, Xianning Road Xi'an, 710049, P. R. China ZHAOGAN LU Dept, Information and Communication Engineering, Xi 'an Jiaotong University, Xianning Road Xi'an, 710049, P. R. China A novel multi-scale Gaussian processes (MGP) model is proposed for regression and prediction. Motivated by the ideas of multi-scale representations in the wavelet theory, in the new model a Gaussian process is represented at a scale by a linear basis that is composed of a scale function and its different translations. Finally the distribution of the targets can be obtained at different scales. Compared with the standard GP model, the MGP model can control its complexity conveniently just by adjusting the scale parameter. So it can trade-off the generalization ability and the empirical risk rapidly. Experiments show that the performance of MGP is significantly better than GP if appropriate scales are chosen.
1. Introduction In this paper we consider the regression as a problem of finding a desired dependence using a limited number of samples. Once such dependence has been accurately estimated, it can be used for prediction. One goal of the prediction is the prediction accuracy for future data, also known as the generalization ability. And another goal is a lower empirical risk that measures the discrepancy between the true and the estimated targets for the given samples. But these two goals are contradictory if the given sample size is finite. In order to trade-off the * Work partially supported by grant 90207012 of the China National Science Foundation.
154
155 generalization ability and the empirical risk, any model for learning from finite samples needs to have some complexity control. A typical example is the VCtheory that provides a general framework for complexity control called Structure Risk Minimization (SRM) [1]. A Gaussian process is a stochastic process whose joint distribution is a Gaussian. The Gaussian processes are a recent development for solving regression problems, though they have a longer history in spatial statistics [2]. We call a Gaussian process a GP model when it is integrated into a regression modeling. The GP model has emerged as one of the most popular regression and prediction tools [3]. This is perhaps because of its impressive generalization ability over a range of applications. A GP model is specified by a mean and a covariance function. For simplicity, we will only consider GP model with zero mean. Once the covariance function is fixed, it is easy to carry out regression and prediction. Usually the covariance function contains an undetermined parameter G, the GP model controls its complexity by using conjugate gradient method to find the maximum likelihood values of 8. However the evaluation of the gradient of the likelihood requires the inversion of the covariance matrix. Hence calculating gradients becomes time consuming for a large sample size. Additionally, covariance functions most frequently used in the GP model are Gaussian functions. It can be proven that employing this kind of covariance functions is equivalent to representing a Gaussian process by a set of Gaussian radial basis functions centered at different points [4]. But this set of basis functions are not complete, some times the representation is only an approximation. Motivated by the ideas of multi-scale representations in the wavelet theory [5], we represent a Gaussian process at a scale by a linear basis that is composed of a scale function and its different translations, and then a novel multi-scale Gaussian processes (MGP) model is proposed. Comparing to the standard GP model, the new model enjoys three advantages. Firstly, it can control the complexity conveniently just by adjusting the scale parameter. As a consequence it can trade-off the generalization ability and the empirical risk rapidly. Secondly, the scale function and its translations is a complete basis, so the Gaussian processes can be accurately represented at a scale. And thirdly, experiments show that the performance of MGP is significantly better than GP if appropriate scales are chosen. The remainder of this paper is organized as follows. In Sec.2 we review the standard GP model. Following that, in Sec.3 a novel MGP model is proposed.
156 Experiments are reported in Sec.4 where we make comparisons between the two models. Finally some conclusions are presented in Sec. 5. 2. A review on GP model In the following we concentrate to the regression problem assuming that the value of the target function /(x) is generated from an underlying / ( x ) corrupted by Gaussian noise f(x) with mean 0 and variance erj /(i) = / ( x ) +
ff(x).
(1)
Now given a collection of N training samples Z? = {(xn,/n),tt = l,---,Af] , we would like to construct an estimate / ( x ) to the true function / ( x ) which can serve as a reasonable approximation. For convenience, we define sample vectors XN =(xux2,---xN) and corresponding target vector tN =(t[,t2,---tN) . The empirical risk that measures the discrepancy between the true and the estimated targets is defined by
In this section we give a review on the standard GP model and the details may be found in Ref. [4]. The GP model places a Gaussian process prior directly on / ( x ) and /(x). Now assume / ( x ) be generated by a fixed basis functions with a random weight vector w = (w,, w2, • • • wH )
/«=2>/A(4
(2)
In terms of Eq. (1) and (2) we can infer that t„~N(0,C„),
(3)
where CN is a covariance matrix. The next question to address is how to predict/(x JV+ ,) given the test sample x w+l . Let /N+1 denote the target of xN+1, the inference is simple since the joint density P(tN+t,tN) is a Gaussian, the predictive distribution P(tN^ | tN) is also a Gaussian. The distribution P(tNt] 11^) can be obtained by using Baye's rule. After that, it's mean is regarded as / ( x N + 1 ) . To use conjugate gradient method to find the maximum likelihood values of 9 we need to calculate the log likelihood Tand its derivatives. The partial derivatives of T with respect to 0 can be expressed analytically by
51739 = -Q.5Trace[C^ -(dCN /de)\ + 0.5tTNC~lj (dCNld9)C^tN. 3.
Proposed MGP model
One of the areas of investigation in multi-scale analysis has been the emerging theory of multi-scale representations of signals and wavelet transforms [5]. These theories lead naturally to the investigation of multi-scale representations of a stochastic process. Basseville etc. have outlined a mathematical framework for the multi-scale modeling and analysis of stochastic processes [6], [7]. Differ from Basseville's work, the MPG model seeks a multi-scale representation as following
/,(*)=Lw<'Vx)
(4)
where R w is an N x Hj matrix with the element RJ* =
(5)
The matrix C)/ gets the following form C # = ( o M ) 2 R M R M r +
(6)
where I is a unit matrix with the rank N. The («,«') entry of C^' is given by C. (x„,x„,) = alSn, +(a^)2 ^M2'J^
-kW~'*<
~k)>
<7)
k
where 8„„, =1 if n ~ ri and 0 otherwise. tin
Now our task is to infer fj(xNt]), the prediction value of the test sample xN+1 at the scale j . Let t^\ denote the target of %N+] at the scale j . In terms of the assumption that /• (x) is a Gaussian process, it is derived that ^ ,
|t„)ccexp
(8)
2 ( ^ where the mean and variance are given by
)2
158
and k w is the sub block of C(^+1). As anticipated, the Eq. (8) gives the distribution of target t^\x at the scale j . Finally we let ) = tu)
f (x The interval
is called the error bar. It is a confidence interval of 4+i that represents how uncertain we are about the prediction at xw+1 assuming the model is correct. 4. Experiments 4.1. One dimensional toy example Nabney has illustrated the operation of the GP model for regression using 10 training samples drawn from a sine function / ( x ) = sin2;rx with the noise variance a] =1 [8]. Fig. 1(b) shows his results. Similarly, we implement the MGP model for solving the above regression problem. The results at different scales 7(y' = 0,-l,-2,-3) are illustrated in Fig. l(c)-(f). The MGP model involves a numerical calculation of covariance function (7). The function ^ we chosen is the scale function of the Daubechies wavelet with order 10 (DB10). It is smooth and compactly supported. We define the average width of the error bars (A WEB) A
M
where er^ is the error bar of the m-th point which is picked uniformly from the interval of x. AWEB is a meaningful measure of generalization ability because it represents the uncertainty of prediction averagely whereas the generalization ability reflects the prediction accuracy of the model. Assuming the parameter M 50 we obtain the AWEB at different scales in Fig. 1. Performance comparisons between the two models in terms of the empirical risk and the AWEB can be observed in the Table 1.
159
4
;
\
4
\
ft
ft'i
ft*
1V>
*>
\
0*
J 08
s
SS
!
&
0?
H
tt«
(a)
i*
t
1*
«
fiS
B*
(b) 1.
i
8»
(c)
y**-
1
s.
1
V
\ !
> \
/
J.
I
11
\
t
(d)
(e)
(f)
Figure 1, The one dimensional toy example: regression and prediction (solid line) with error bars (dot lines) are presented in the figure, (a) 10 samples (circles) and sine function (dashed line), (b) the GP model, (c)-(0 the MGP model with (c)j=0, (d)j=-l, (e) j= -2 and (f) j= -3
A quick inspection of Fig. 1 and Table 1 shows the scale j plays a key role in controlling the complexity of the MGP model. When j is too large (j = 0), AWEB is small whereas the empirical risk is high. It means that an underfitting has happened. The complexity of MGP model is too low to exhibit the intrinsic characteristics of the estimated function. However, AWEB is large whereas the empirical risk is low when j is too small (j = -3). It means that the complexity of MGP model is so high that an overfitting has happened. Only if appropriate j is chosen (y' = -l or j = -2), i.e., an appropriate complexity is chosen, can good generalization ability and low empirical risk be achieved at the same time. 4.2. Two dimensional toy example The training set are constructed by generating 81 samples randomly on the [8,8]x[-8,8] square and the targets is calculated by Eq. (1), where f(x) is drawn from N(0, 0.25) and /(x) = /(x(l),x,2)) = ((^l))2-(^2))2jsin(o.5^l)) . The test set comes from generating 289 samples on a 17 by 17 grid over the square. The results of two models are shown in Fig. 2 and Table 1. Under the conditions of y' = 2, the performance of MGP is significantly better than that of GP. However, an overfitting and underfitting has happened when 7 = 0 and 7 = 3 respectively.
160
4*^ (c)
Figure 2. The two dimensional toy example: (a) shows the function from which 81 noisy samples are generated and the noisy data points in relation to the function. The offset of each datum due to noise is shown as a dashed line, (b) the results of GP. (c)-(f) the result of MGP with (c) j=0, (d) j=l, (e)j=2and(f)j=3. Table 1. Performance comparisons between the two models in two toy examples. In the table the symbols ER represents empirical risk. Example ID toy
Perform.
GP model
ER*100 AWEB*100
0.11 28.82
ER*10 AWEB*10
18.92 5.17
2D toy
j=-3 0.03 204.23 j=0 2.28 30.76
MGP model j=-2 j=-l 0.06 0.07 44.81 19.53 j=2 j=l 4.53 5.19 4.01 2.68
j=0 6.48 12.80 j=3 22.76 1.23
4.3. Real-word prediction problem Can the good results for the MGP model on toy examples carry over to realworld data? To answer this question, experiment has been performed on the laser data. We use the laser data to illustrate the error bar in making predictions. The laser data has been used in the famous Santa Fe Time Series Prediction Competition. A total of 1000 points are used as the training sample and 100 following points are used for prediction. Fig. 3 plots the predictions and the error bar. The predictions of the MGP model (j = l) match the targets very well except on the region [1065, 1080], Reasonably the model can provide larger error bars for these predictions.
161
,_ , A , v
-"
\
t '
'11te
•••'•"•'•• E n o t
ioS
m5~~~lm
io»
itoo
''rW>
km
1020
./
•41
M h\ I
c
-~
""•'•*••'-'•
V
H\\V
MHO ."" ieeo
\
IOSO
' itoo
Figure 3. Graphs of our predictions on laser data. In the left graph laser data (dashed line), prediction (solid line) and empirical risk (dotted line) are presented. In the right graph the error bar (solid line) and the empirical risk divided by 50 (dotted line) are presented.
5. Conclusions In this work we have proposed a MGP model for function regression and prediction. AWEB is used to act as a meaningful measure of the generalization ability. Experiments indicate that the MGP model can control the complexity conveniently just by adjusting the scale parameter. Additionally, its performance is significantly better than the GP model if appropriate scales are chosen. Future work could include the automatically relevance determination (ARD) of the MGP model. References 1. V. Vapnik, The Nature of Statistical Learning Theory. 112 (1998). 2. C. K. I. Williams, Learning and Inference in Graphical Models. 11 599 (1998). 3. P. Sollich and A. Halees, Neural Computation. 14, 1393 (2002). 4. C. K. I.Williams, Machine Learning. 40, 77 (2000). 5. S. Mallat, IEEE Trans. PAMI. 11, 674 (1989). 6. M. Basseville, A. Basseville and K. C. Chou, IEEE Trans. Information Theory. 38 766 (1992). 7. K. C. Chou, S. A. Golden and A. S. Willsky, Proc. IEEE Int. Conf. ASSP. 1, 1709(1991). 8. I. T. Nabney, NETLAB: Algorithms for Pattern Recognition. 369 (2001).
ON PCA ERROR OF SUBJECT CLASSIFICATION* LIHUA FENG Department of Geography, Zhejiang Normal University, Jinhua 321004, China E-mail: fenglh@zjnu. en FUSHENG HU, LI WAN School of Water Resources and Environment, China University Beijing 100083, China
ofGeosciences,
Since subjective chose could cause the loss of valuable original information, statistics method is employed to deal with multi-variable problem. After normalization, original variables are reduced to several independent synthetic variables on which evaluation is based. Principal Component Analysis provides a good example for this. But the real data calculation shows that the result of Principal Component Analysis is not always complied with the real situation. Sometimes, it can be totally messed up. Some problems exist regarding to this classification. They are as follows: (1) The discrimination ability of PCA is limited, (2) For those samples with big variables, PCA losses its ability of discrimination, (3) When the value of variable increases, on the contrary, class level decreases, (4) The same samples, while different classifications, (5) Variables change a lot, while classification keeps unchanged, (6) While variables change arbitrarily, there are only two different classifications, (7) The position change of variables causes the change of classification, (8) The change of a variable causes the change of the classification. These problems are caused by the nature of Principal Component Analysis itself.
1. Introduction In subject classification, Principal Component Analysis (PCA) is implemented as an objective and practical validation method [1, 2, 3]. It has been used to simplify the high dimensional problems while retaining as much as possible of the information present in the data set [4]. Meanwhile, the weight assigned to each variable that takes into account is obtained objectively to avoid subjective judgment. Therefore it is widely used in finding the synthetic factors, sorting samples, classification, etc. It is especially useful in insect classification, flora classification, environmental quality classification, deposit classification, geology sample classification, etc [5, 6], However, the real data calculation * This work was supported by Zhejiang Provincial Science and Technology Foundation of China (No. 2006C23066).
162
163 shows that sometimes the result of PCA is not complied with the real situation, and problems exist regarding to this classification. These problems will be discussed in this paper. 2. PCA Theory and Method PCA has in practice been used to reduce the dimensionality of problems while retaining as much as possible of the information present in data set, i.e., by linear transformation, PCA reduces dimensionality by extracting the smallest number of synthetic and uncorrelated components that account for most of the variation in original correlated multivariate data and summarizes the data with little loss of information. It simplifies the problem by catching the main features of it [7, 8]. Assume original variables are X\, x2, •••, xp, and the new synthetic variables obtained by PCA are zu z2, •••, zm, which are the linear combination of the original variables x\, x2> ••\xp (m
(3) Compute the characteristic value and characteristic vector of R. According to characteristic equation \R - Al\ = 0, find the eigen values A, of the matrix and sort them in the order of decreasing magnitude A, > X1 > • • • > Xp. At the same time, find the corresponding eigen vector u\, u2, •••, up. They are orthonormal and called principal axes.
164 (4) Compute contribution rate em=XllYJXl and cumulative contribution
rate E.^Aj/f.*, ,=1
•
(=1
(5) Compute principal component z„ = X 2 M / X y=l 1=1
(6) Synthetic analysis. In order to retain as much of the variability in data as possible, what is the necessary accuracy of an m-dimensional system to substitute the original system? This can be estimated by calculating the cumulative contribution rate Em . Usually the minimum m with Em >85% (m< p ) is chosen. Once m is determined, instead of working on all the original variables x\, x2, • • •, xp, we could simply analyze those m principal components. From above analysis, it can be seen that retaining more principal components not only increases calculation, but also decreases the effect of the main principal component. Therefore, it is important to choose several representative principal components from existing principal components [11, 12]. Since subjective chose could cause the loss of valuable original information, statistics method is employed to deal with multi-variable problem. After normalization, original variables are reduced to several independent synthetic variables on which evaluation is based. Principal Component Analysis provides a good example for this [3]. For simplicity, Table 1 lists 5 samples y, with the same 8 variables x:. According to above steps in principal component analysis, first, we normalize 8 variables of 5 samples to get normalized data set (x'j) (i = 1,2,• • • ,8; j = 1,2, • • • ,5). Then we calculate the correlation matrix R = (ru)M and principal components zm . According to the criteria of minimum m (Em> 85%), top 3 principal components are chosen. From this, the data structure is simplified. Based on the product of principal components z\, z2, and z3 and the correspondent weights e,, 3
e2 and e-j, (synthetic principal component Z = ^emzm ), the final synthetic m=l
principal component Z of each sample is sorted as Table 2. According to the synthetic principal component Z of Table 2, the samples are classified as 5 classes. Class I ( Z < - 3 ) , class II ( - 3 < Z < - 1 ) , class III (-1
165 Table 1. 8 variables X, of 5 samples yt Sample
x,
x2
X,
x<
xs
X6
x,
xa
y>
2
2
2
2
2
2
2
2
y2 y> y. y>
4
4
4
4
4
4
4
4
6
6
6
6
6
6
6
6
8
8
8
8
8
8
8
8
10
10
10
10
10
10
10
10
Table 2. Synthetic principal components of each sample and its classification. Sample
Z
Sort
Class
y,
-4
1
I
yi
-2
2
y, y< y>
0
3
n m
2
4
IV
4
5
V
3. Error Discussion of PC A Classification The result of real data calculation shows that some problems exist regarding PCA classification. They are: 3.1. The Discrimination Ability of PCA is Limited In Table 1, let the variables of sample^ are y3 = (38, 2, 2, 2, 2, 2, 2, 2). The synthetic principal components of 5 samples are Z = (-2.33, -0.87, 2.33, 2.04, 3.50). The synthetic principal component Z3 of y3 is the same as Zt (Z3 = Z\ = -2.33), although y\ and _y3 are different samples (x3) = 38, xu = 2). Thus, it is concluded that the discrimination ability of PCA is limited.
166 3.2. For Those Samples with Big Variables, PCA Losses its Ability of Discrimination In Table 1, let >>, =(1000,2,2,2,2,2,2,2), herexn is large. The result synthetic principal components are Z= (-3.85, -1.62, 0.10, 1.82, 3.55). yx still belongs to class I (Z, = -3,85). PCA here is not suitable to those samples with very big variable. 3.3. When the Value of Variable Increases, on the Contrary, Class Level Decreases In Table 1, assume y3 = (100, 100, 100, 100, 2, 2, 2, 2). Comparing with yx, the values of variables of y3 increase (x3\=x32 = *33 = xi4 = 100), and the class level should increase accordingly. But the result synthetic principal components are Z = (-1.03, -0.19, -2.59, 1.49, 2.33), the level of the class it belongs to is lower than that ofy\ (Z3 = -2.59
167 3.6. While Variables Change Arbitrarily, There Are Only Two Different Classifications In Table 1, let y\ =y2=yi=y4 = (2,2,2,2,2,2,2,2) In y5, when variable *5,>0.2, the output of synthetic principal components is Z = (-1.41, -1.41, -1.41, -1.41, 5.66). When variable * 5 ,<0.2, the output of synthetic principal components is Z = (1.41, 1.41, 1.41, 1.41, -5.66). In this example, it is showed that while variables of ys changed arbitrarily, there are only two different classifications. 3.7. The Position Change of Variables Causes the Change of Classification Assume the variables of five samples are as follows: yt = (1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8) ^ = (11,12,3,4,5,6,7,8) ^ = (11,12,13,4,5,6,7,8) ^4 = (H, 12, 13, 14,5,6,7,8) ^ = (11,12,13,14,15,6,7,8) The output of synthetic principal components is Z= (3.44, -0.64, -0.87, -0.94, 1.00). Of y\, if we exchange x\\ =1.1 and xn =2.2, the output of synthetic principal components becomes Z= (-3.25, -0.09, 0.57, 1.17, 1.62). The change of synthetic principal components is tremendous {Z\ = 3.44 —> -3.25), and the correspondent classification of vi changes from class V to class I. Thus we concluded that the position change of variables could cause the change of classification. 3.8. The Change of a Variable Causes the Change of the Classification Let variables of 5 samples are: j>i = ( l l , 12, 13, 14, 15,6,7,8) ^ = ^ = ^ = (1,2,3,4,5,6,7,8) ys = (\. 1,2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1) The output of synthetic principal components is Z= (3.28, -0.90, -0.90, -0.90, 0.59). Of.Fi, if *j5 = 15 is changed to x]s = 5, we have Z= (-1.39, -0.53, -0.53, 0.53, 2.99) (Z\ =3.28 changes to Z, = -1.39). As the result, the classification of Vi changes from class V to class II. This indicates that the change of a variable causes the change of classification.
168 4. Conclusion PCA is a simple and useful classification technique. By linear transformation with minimal information loss, PCA reduces multidimensional variables to a small set of synthetic variables while retain the maximum information of the original data set. Thus, PCA simplifies the data structure and the weights are also obtained objectively. But the real data calculation shows that the results of PCA are not always comply with the real situation, and sometimes this method may fail completely. These drawbacks are associated with the nature of PCA itself. The study of these problems will further our understanding of PCA. As to how to improve, it requires further study. References 1.
X. Li and J. A. Ye. Application of principal component analysis and Cellular Automata in policy-making of space and urban simulation. Science in China (series D), 31(8), 683-690(2001). 2. R. E. Ren and H. W. Wang. Data Analysis of Multivariate Statistic. Publishing House of National Defense Industry, 92-110 (1997). 3. C. Zhang and B. G. Yang. Basis of Quantitative Geography. Publishing House of Higher Education, 145-159(1993). 4. S. Bermejo and J. Cabestany. Oriented principal component analysis for large margin classifiers. Neural Networks, 14(10), 1447-1461 (2001). 5. X. P. Wang. The principal component analysis method of the water quality assessment in rivers. Application of Statistics and Management, 20(4), 49-52 (2001). 6. T. X. Cheng, H. G. Wu and X. H. Sun. A method of tender evaluation based on the PCA. Systems Engineering— Theory & Practice, 20(2), 118-121 (2000). 7. R. J. Bolton, D. J. Hand and A. R. Webb. Projection techniques for nonlinear principal component analysis. Statistics and Computing, 13(3), 267-276 (2003). 8. M. Bilodeau and P. Duchesne. Principal component analysis from the multivariate familial correlation matrix. Journal of Multivariate Analysis, 82(2), 457-470 (2002). 9. A. Pedro and D. Silva. Discarding variables in a principal component analysis: algorithms for all-subsets comparisons. Computational Statistics, 17(2), 251-271 (2002). 10. B. B. Li and E. B. Martin. On principal component analysis in LI August 2002. Computational Statistics & Data Analysis, 40(3), 471-474 (2002). 11. P. Giordani and H. A. L. Kiers. Principal Component Analysis of symmetric fuzzy data. Computational statistics & data analysis, 45(3), 519-548 (2004). 12. M. K. Shukla, R. Lai and M. Ebinger. Principal component analysis for predicting corn biomass and grain yields. Soil Science, 169(3), 215-224 (2004).
OPTIMIZED ALGORITHM OF DISCOVERING FUNCTIONAL DEPENDENCIES WITH DEGREES OF SATISFACTION* QIANG WEI", GUOQING CHEN School of Economics and Management,
Tsinghua University, Beijing 100084, China
In order to tolerate partial truth due to imprecise or incomplete data that may often exist in massive databases, or due to a very tiny insignificance of tuple differences in a huge volume of data, the notion of functional dependency with degree of satisfaction, denoted as (FD)d, has been proposed in [5], along with Armstrong-like properties and the concept of minimal set. This paper discusses and presents several optimization strategies and inference properties for discovering the minimal set of (FD)d and incorporates them into the corresponding algorithm so as to improve the computational efficiency.
1. Introduction Data mining is one of the important and interesting fields in computer science and computational intelligence, and is used to discover hidden, novel and potentially useful knowledge to support decisions. This paper concentrates on a particular type of association knowledge, called functional dependency, in relational databases, which are categorized as a mainstream of data models in current research and applications. For two collections X and Y of data attributes, a functional dependency (FD) X—»Y means that X values uniquely determine Y values. An example of X—»Y is (Student#, Course#)-»Grade, meaning that the value of grade can be uniquely determined by a given value of Student# and a given value of Course#. However, FDs are not explicitly known or are hidden, and therefore need to be discovered. This partly stems from the fact that several decades of IT applications had resulted in a large number of databases that were constructed and maintained in which useful and interesting FDs might have already been hidden. Since the 1990s, an increasing effort has been devoted to mining FDs [3, 5-10].
* Partly supported by the National Natural Science Foundation of China (79925001/70231010), the Tsinghua Research Center of Contemporary Management and the Bilateral Scientific and Technological Cooperation between China and Flanders. 1 Corresponding author: [email protected].
169
170 Formally, let 9?(II; I2, ..., In) be an n-ary relation scheme on domains D 1; D2 Dn with Dom(Ij) = Dj, X and Y be subsets of the attribute set I = {I1; I2,..., I n }, i.e., X, Y c I, and R be a relation of scheme 91, R c DixD2x...xDn. X functionally determines Y (or Y is functionally dependent on X), denoted by X->Y, if and only if VR e 5R, Vt, f e R, if t(X) = t'(X) then t(Y) = t'(Y), where t and t' are tuples of R, and t(X), t'(X), t(Y) and t'(Y) are values of t and t' for X and Y respectively [1-2, 4]. It is important to note that functional dependency possesses several desirable properties, including so-called Armstrong axioms that constitute a FD inference system [1,4]. In discovering functional dependencies, there still exist some open problems. First, in large existent databases, noises often pertain, such as conflicts, nulls, and errors that may result from, for instance, inaccurate data entry, transformation or updates. Apparently, by definition, FDs do not tolerate such noisy or disturbing data. Second, even without noisy data, sometimes a partial truth of a FD may still make sense. For instance, "a FD almost holds in a database" expresses a sort of partial knowledge, meaning that the FD satisfies the relational databases of concern to a large extent. Third, in developing corresponding mining methods, FD inference is desirable but still needs to be further investigated. That is, deriving a FD by inference from discovered FDs without scanning the database may help improve the computational efficiency of the mining process. For example, if both A-»B and B—>C satisfy a relational database, and if A-»C could be inferred directly, then the effort in scanning the database for checking whether A-»C holds can be saved. In 2002, Huhtala et al [3] consider using the concept of approximate dependency to deal with so-called error tuples. In the mean time, Wei and Chen [5] presented a notion of functional dependency with degree of satisfaction (FDd: (X-»Y)0) to reflect the semantic that equal Y values correspond to equal X values at a certain degree (a). Moreover, in 2002, Wei and Chen presented the Armstrong-like inference rules, along with an inference system, based on which the minimal set of (FDs)d has been proposed [5]. Furthermore, a fuzzy relation matrix-based algorithm has been analyzed to perform transitivity-type FD inference. Accordingly, the algorithm for mining (FDs)d, called MFDD, has been provided, which can discover the minimal set of (FDs)d efficiently. In this paper, we will further investigate some important properties of (FD)a, and present two strategies to optimize MFDD. The paper is organized as follows, some preliminaries will be reviewed in Section 2. Section 3 will discuss how to improve the sub-algorithm of computing the degree of satisfaction of a (FD)d. Moreover, some further inference rules will be discussed in Section 4. Accordingly, the optimized algorithm of discovering minimal set of (FDs)d will be presented in Section 5. An illustrative example will be provided in Section 6.
171 2. Preliminaries Definition 1: Let 5R(Ii, I2, ..., In) be a relation scheme on domains D b D2, ..., Dn, X , Y c I , and R be a relation of 5R(I), R c D!xD2x.. .xDn, where tuples tj, tj e R and tj * tj. Then Y is called to functionally depend on X for a tuple pair (tj, tj), denoted as (tMD(X->Y), if tj(X) = tj(X) then tj(Y) = t/Y). It can easily be seen that the FD for a tuple pair could be represented in terms of the truth value, TRUTH(ti,tj)(X->Y), where if ti(X)=tj(X) and ti(Y) * tj(Y), then TRUTH(ti, tj)(X-»Y) = 0; otherwise 1. Subsequently, FD for relation R can be defined in terms of degree of satisfaction. Definition 2: Let 9?(I) be a relation scheme, X, Y c I, and R be a relation of
J^li
, NTP
where NTP represents the number of tuple pairs in R and equals n(n-l)/2. Then given a minimum satisfaction threshold 0, 0 < 0 < 1, if TRUTHR(X—»Y) > 0, then X—>Y is called a satisfied functional dependency. For the sake of convenience, a (FD)d X->Y with TRUTHR(X-»Y) = a is denoted as (X—>Y)a. Moreover, some properties could be derived. Let R be a relation on 5R(I) a n d X , Y , Z c I, we have: Al: If Y c X, then TRUTHR(X-> Y) = 1. A2: If TRUTHR(X->Y) = a, then TRUTHR(XZ->YZ) > a, 0 < a < 1. A3: If TRUTHR(X-»Y) = a and TRUTHR(Y->Z) = p, then TRUTHR(X->Z) > a+p-1. A4: If TRUTHR(X->Y) = a, then TRUTHR(Y-»Z) > 1 - a. The first three properties are similar to the three classical Armstrong inference rules, except for A3 in that it guarantees a lower-bound TRUTH value for a transitive (FD)d that could be inferred without scanning database. Moreover, A4 is important to guarantee that invalid values less than 0 will not be generated in transitive inference. Based on the Al, A2 and A3, the Armstrong-like inference system could be defined, as well as the 0-equivalence and minimal set of (FDs)d. Accordingly, an algorithm called MFDD based on fuzzy relation matrix operation has been proposed, by which the minimal set of satisfied (FDs)d could be discovered efficiently. For details, please refer to [5],
172 However, the algorithm could be further optimized on two aspects. First, the algorithm of computing the degree of satisfaction of a certain (FD)d could be further optimized, which will be discussed in Section 3. Second, we will further investigate the four inference rules, especially including A4, and see how to use them to a greater extent for efficiency purposes. 3. Optimized Sub-Algorithm of Computing the Degree of Satisfaction As presented in [5], the sub-algorithm of computing the degree of satisfaction of a certain (FD)d is very direct and easy based on Definitions 1 and 2. For example, given A—>B, the process is to scanning all the tuples and comparing each pair of tuples. Suppose the number of tuples is n, the computational complexity of the sub-algorithm is n(n-l)/2 = 0(n 2 ). However, it could be found that, if and only if t,(A) = (j(A) and tj(B) * tj(B), then TRUTH(tUj)(A->B) = 0, else TRUTH(tii tj)(A->B) = 1. Thus, the number of tuple pairs whose TRUTH value is 1 is equal to n(n-l)/2 subtracted by the number of tuple pairs whose TRUTH value is 0. In this situation, we can focus on different groups with identical A values, since the TRUTH value of A-»B on a certain tuple pairs with different A values will be definitely 1. In brief, only the tuple pairs with identical A values and different B values are worth considering. Given A—>B, the process of computing the degree of satisfaction is as follows. First, categorize the n tuples into k groups, each of which has an identical A value, denoted as (Aj)-group, 1 < / < k. Second, in each A value group, only the tuple pairs with different B values will result in 0 TRUTH value. Then, we can categorize the tuples in an (Aj)-group into /, sub-groups, each of which has an identical B value, denoted as (Ai; Bj)-group, 1 < j < /,. Accordingly, the number of tuples in each (Ai; Bj)-group, denoted as n,y, could be counted. Third, the TRUTH value of A—>B could be computed with the following function:
TRUTHM->B) = \-Y,
Z (H,y,x„,2)/(„(«-l)/2)-
MiikMjfJiil, j\*h
The optimized sub-algorithm is listed in Table 1. The computational complexity of the algorithm contains two parts. The first part is to categorize all the n tuples into groups according to A and B values. The second part is to count the number of tuples in each groups. In part one, if all the tuples could be categorized into a A groups, and each A group could be further categorized into b B groups. Clearly, there always exists a x b < n. So the computational complexity of part one is no more than 0(a/2xb/2xn) = 0(abn/4) < 0(« 2 /4). In
173 part two, the computational complexity is 0(axb(b-l)/2) < 0(n{b-\)l2). Then the total computational complexity is 0(abn/4+ab(b-l)/2), which is less than the computational complexity 0(«(n-l)/2) of the original algorithm in most situations. In the worst situation, where a=\ and b=n, the computational complexity of optimized algorithm is 0(«/2+n(n-l)/2) which is a little higher than the original algorithm. Fortunately, however, the worst situation could be rarely seen, which represents that all the tuples in the databases have identical A values and identical B values. So generally, the optimized sub-algorithm of computing the degree of satisfaction is more efficient than the original sub-algorithm. This optimization is quite important and will improve the performance of the whole algorithm, since the computation of degree of satisfaction is the basic operation. Table 1. Optimized Sub-Algorithm Degree_Satisfaction(A-»B) N[][] = 0; // Initiate a two-dimension array to store the number in each group. SELECT COUNT(t) FROM R GROUP BY A AND B INTO N[a][b] Non_Truth_Number = 0; FORp=lTOa { FOR q = 1 TO b { FORr = q+nOb { Non_Truth_Number = Non_Truth_Number + N[p][?]xN[/)][r]; } } } Deg_FD = 1 - NonTruthNumber / ((« - 1) x n 12);
4.
(FDs)d Inference Rules
In [5], Al, A2 and A3 have been considered partially to improve the mining efficiency. In this paper, Al, A2 and A3 will be further considered, while A4 will also be utilized. Since these important properties could be further deducted and incorporated into the MFDD algoritiim to improve the efficiency. Based on Al, A2, A3 and A4, some important inference rules could be derived, denoted as Dl, D2, D3 and D4. Dl: Let R be a relation on
174 Based on the above inference rules, some optimized strategies could be inferred and incorporated into the algorithm. Let R be a relation on 31(1), X, Y c I and given any Zn(XuY) = 0 , and given a threshold 0, we have: Strategy 1: if X->Y is satisfied for R, then XZ-»Y is satisfied. Strategy 2: if X-»Y is not satisfied for R, then X->YZ is not satisfied yet. Strategy 3: if TRUTHR(X->Y) < 1 - 6, then for any Z c I, Y->Z is satisfied R. In mining the minimal set of satisfied (FDs)d, Strategies 1 and 3 could be utilized as inference strategies, while Strategy 2 could be regarded as filtering strategy. In [5], only Strategies 1 and 2 have been incorporated into MFDD algorithm. In Section 5, Strategy 3 will be further incorporated as inference strategy to optimize the MFDD algorithm. 5. Optimized MFDD Algorithm The optimized MFDD algorithm is listed in Table 2. Table 2. The Optimized MFDD Algorithm SC_FP = 0 ; IN_FP = 0 ; DIS_FP = 0 ; CA_FP = 0 ; f = {X->Y with a and flag 10 < a < 1, flag = 0, 1,2 or 3, X c I, Y = Ij, 1 < j < m} CA_F,= {f | X = Ij, Y = Ii, 1 G) {f.flag = 1 and Fp = Fp u {f}; } ELSE {f.flag = 2;} + IF (f.a < 1 - G) + { FOR ALL f e F p A N D f . X = f.Y + { f.flag = 3; }} // Inferred satisfied according Strategy 3. F p =Fp®(Fi)°'"'; //Please refer to [5]. FOR ALL f e Fp { IF (f.a > G) {f.flag = 3;} ELSE {f.degree = 0;}} } SC_FP = {f s Fp | f.flag = 1}; DIS_FP = {f € Fp | f.flag = 2};IN_FP = {f e Fp | f.flag = 3}; CA_Fp+, = Generate_Candidate(DIS_Fp); // Please refer to [5]. DIS_FP = 0 ; p ++; } M_F = u , 5llSpSC_Fk; MULTI_FD = Generate_Multi(M_F); F At = M_F u MULTI_FD;
The line marked with "*" are the optimized sub-algorithm of computing the degree of satisfaction. The three lines marked with "+" are the process incorporated with Strategy 3. The process is as follows. If an (FD)d f is found such that its TRUTH value a is no more than 1 - 9 . Then scan the current Fp, and mark all the f e F p with antecedent equal to the consequent of f with flag =
175 3, which means these (FDs)d could be inferred satisfied without scanning the database. The analysis on computational complexity of this optimization process contains two aspects. First, this process is very efficient if there exist any (FD)d, e.g., A->B, with TRUTH value no more than 1 - 0, then all the B->L., 1 < j < m, could be inferred satisfied without any database scanning. Second, whether this process could take effect highly relates to the values of attributes in databases, depending on number of the tuples with different B values and identical A values. 6. A Small Example Suppose a database as listed in Table 3. Notice there is a null value in Location. Table 3. An Example of Database ID 1 2 3 4 5
Department CS IS CS CS CS
Location
.......
#
.
Building 2 Building 2 Building 2 Building 2
Given 0 = 60%, then according to original MFDD algorithm, it could be discovered that the minimal set of satisfied (FDs)d MF = {(ID—»Deparment)i o, (Department—>Location)0.7, (Location—»Department)0.7}, all these 3 (FDs)d could be derived only by scanning database. The set of inferred satisfied (FDs)d rN_F={ID->Location}, and the set of scanned dissatisfied (FDs)d DIS_F={(Department-»ID)o.4, (Location->ID)0.4}. So totally 5 (FDs)d could be determined whether they are satisfied or not by scanning the database. With the optimized MFDD algorithm, the minimal set is die same, however, since it could be scanned that the TRUTH value of Department-»ID is 0.4, which is no more than 0.4 = 1 - 60%, which means that ID->X, X could represent any of Department or Location, will be definitely satisfied according to Theorem 3. Then ID-»Location and ID->Department could be inferred satisfied without scanning the database. So finally, only 4 (FDs)d could be determined whether they are satisfied or not by scanning the database, which could save more time than original MFDD algorithm. 7. Concluding Remarks In this paper, we have further discussed the functional dependency with degree of satisfaction, which could tolerate noisy data and express partial
176 knowledge. Moreover, in order to further improve the performance of the discovering algorithm, this paper has focused on two aspects. First, a group operation based sub-algorithm of computing the degree of satisfaction has been proposed, which can improve the efficiency of basic operation of discovering (FDs)a. Second, some inference rules based on the Al, A2, A3 and A4 properties have been analyzed, especially for A4. Accordingly, the original MFDD algorithm has been further optimized with Strategy 3, based on which some satisfied (FDs)d could be inferred by some highly dissatisfied (FDs)d without scanning the database. The example also illustrates the ideas. It is worth mentioning that in computing the degree of satisfaction of X-»Y, all the tuple pairs will be considered according to Definition 2 (including the tuple pairs with different X values), which is consistent with the notion of functional dependency in classical relational data models. References 1. Chen GQ. Fuzzy logic in data modeling: Semantics, constraints and database design. Boston, MA: Kluwer Academic Publishers; 1998. 2. Codd EF, A Relational Model for Large Shared Data Banks. Communications of the ACM 1970,13(6): 377-387. 3. Huhtala, Y.; Karkkainen, J.; Porkka, P.; & Toivonen, H., 1998. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. Proc. 14th Int. Conf. on Data Engineering, IEEE Computer Society Press. 4. Ullman, Jeffrey D. Principles of Database and Knowledge-Based Systems. Maryland, Computer Sciences Press Inc., 1988. 5. Wei, Q, Chen, GQ, Efficient Discovery of Functional Dependencies with Degrees of Satisfaction, J. of Intelligent Systems, Vol. 19, 1089-1110, 2004. 6. Baudinet M, Chomicki J, Wolper P. Constraint-generating dependencies. J Comput Syst Sci 1999; 59(1):94-115. 7. Bell S, Brockhausen P. Discovery of data dependencies in relational databases. LS-8 Report 14. University of Dortmund, Germany; 1995. 8. Wyss C, Giannella C, Robertson E. FastFDs: A heuristic-driven depth-first algorithm for mining functional dependencies from relation instances. Technical Report 551, CS Department, Indiana University, July 2001. 9. Castellanos M, Saltor F. Extraction of data dependencies. Report LSI-93-2-R. Barcelona: University of Catalonia; 1993. 10. Flach PA, Savnik I. Database dependency discovery: A machine learning approach. AI Commun 1999;12(3): 139-160.
FROM A N A L O G Y REASONING TO INSTANCES B A S E D LEARNING*
PAN W U M I N G College of Computer
Science, Sichuan University, Chengdu 610065, E-mail: pan.wumingQgmail.com
P.R.China
LI T I A N R U I Department
of Mathematics,
Southwest Jiaotong University, Chengdu 610031, P. R. China Belgian Nuclear Research Centre(SCK»CEN), 2400 Mol, Belgium E-mail: [email protected]
The principle of local structure mapping in analogy reasoning is introduced and applied to the problem of learning from data. A conceptually straightforward approach is presented for the classification problem on real-valued data. An experiment on Iris data set is provided to illustrate that the principle of local structure mapping can be an effective mechanism and viewpoint for tasks of learning from data as well as analogy reasoning.
1. I n t r o d u c t i o n Most learning algorithms based on feature vectors of real-valued and discrete-valued numbers, and in all cases there has been a natural measure of distance between such vectors. For example, the fc-nearest neighbor algorithm 1 assumes all instances correspond to points in the n-dimensional space. The nearest neighbour is normally calculated using a distance measure such as the Euclidean distance. Most learning methods address problems of this sort — two input vectors sufficiently "close" lead to similar outputs. The principle that similar inputs lead to similar outputs has been widely applied in the algorithms of learning from data as well as rules-based reasoning. In this paper, we indicate that this principle can be extended to a more general principle, the local structure mapping principle, which "The work was supported by the national natural science foundation of china (60474022) and the natural science foundation of sichuan province, china (05jy029-021-2).
177
178 is manifested in analogy reasoning. The local structure mapping principle is still not applied to data analysis directly. In this paper, we attempt to design new learning algorithms in terms of this principle. The remaining of the paper is organized as follows. In section 2, we discuss the general representation of structures. We investigate how the local structure mapping principle can be used in analogy reasoning in Session 3. In Section 4, a learning algorithm is presented with reference to the local structure mapping principle intuitively, and experiments on Iris data set are provided. Section 5 concludes the research work of this paper. 2. The Representation of Structures A structure is defined as S = {S, 61,62, • • •) , where S is the underlying set and 6i,62,--- are operators or relations which satisfy some properties or constraints. A local structure SL = (S1,6[,6'2,---) of S is a structure such that S' C S, and 6[, 5'2, • • • are the operators or relations restricted on 5". A structure mapping M : Si ——> S2 is a function between structures which hold the rules of the operators or relations of these structures, such as homomorphisms of algebras. For example, lattice is a structure. Let L be a lattice, then L has a underlying set L together with two binary operations V (join) and A (meet), and the operations of L satisfy commutative laws and other three kinds of laws. 3. Structures mapping in Analogy Reasoning Analogy is a process of finding and using correspondences between concepts, and plays a fundamental and ubiquitous role in human cognition2,3'7. Analogy is also used as a machine learning and automated deduction method in AI and other application domain4,5. Analogy facilitates better understanding of old knowledge and the formation and inference of new knowledge. Understanding analogy requires several processes include retrieval, mapping and application6. Retrieval is a process to find the sources adapted for the target. Mapping is to find the consistence between source problem and target problem and to map concepts from base to target domain. Application is to apply correspondence of representation elements from base to target domain to describe and explain new topics. As a reasoning method, analogy contains a deep logic problem3. However, formal rules haven't been found so far for analogy reasoning. Owen suggested that analogy mapping (called analogy matching) can be expressed as a set of positional associations between symbols in logical terms 4 , i.e. consist of elements of form as
179 follows, ((symboli,positioni),
(symbol,positiori2))
(1)
where symboli has positioni in statementi and symbol has positioni in statement?,. A position in (1) is represented as a sequence of successive argument position of a statement. Local structure mapping here is the mapping of sequence structures. There are still other ways of local structure mappings. If we think that the analogy is between two logic sentences, then they may be associated with algebra structures. And if we think the statements have a syntax structure, then these sentences may be parsed to tree structures, then there may be mappings between tree structure. Chatterjee and Campbell proposed an reasoning method called knowledge interpolation with setting number on knowledge to build order structure on knowledge 8 . Knowledge Interpolation is a kind of local structure mappings of order structures. 4. Classification on Real-valued D a t a by Local Structure Mapping In analogy reasoning, local structure mapping uses the structure information to predict new situations according to reference knowledge. Suppose a classification problem involves non-numeric data. For instance, descriptions are discrete and without any natural notion of measures. A learning algorithm used for this kind of data is decision trees 1 . Even they may have structure information, decision trees algorithms do not use it. However, if we can build structure on these data, the local structure mapping may be a more appropriate approach to learn from these kinds of data. We consider the standard classification problem, where Xi,i — 1,2, ••• ,n are real valued properties and C is the class variable having m possible classes Cj, j = 1 , 2 , . . . , m, where Cj is a nominal data. We will build structure on n-dimension real-valued training data set. Then the set of all labels can be viewed as a trivial discrete structure which only contains a unary relation just contains all labels. When a new query instance is given, we can add this instance to training data and re-build the structure. To predict the most appropriate label of the query instance, we map the local structure of the query instance to the discrete structure of the label set. Because the elements of the label set have no relations to each other, every label is naturally a single point structure. If we have chosen the local structure around the query instance d, we can map it to a label which represent a one point structure. For there are m labels, we must
map it m-times. Now we must determine which label is most appropriate to the query instance. For each label Cj, we count the number of instance correctly mapped to label Cj, we can use A (Cj, d) to denote this number. So evidently if some Ck have the biggest A (Ck, d), Ck may be most appropriate label to the query instance d. A example is show in Fig.l, where most appropriate label to d is C\. If there are many kinds of local structures, we must consider the effect of the mapping of all local structures. The ultimate process of classification is in the following section. d^d^.d^
are labeled C\; d5 is labeled C2;dt is labeled C3.
a) local structure around a? mapping to Cj
b)
local structure around d mapping to C2
c) local structure around a? mapping to C3
Figure 1. An example of local structure around query instance mapping to all three class labels.
4 . 1 . Selecting
the Structure
for
Training
Data
For each Xi, we consider the order structure P j = (Xi, ^ ) , where ^ is the order relation on domain of Xi, the set of real numbers. The product of two order structures P = (P, ^ ) and P ' = (P', ^ ' ) is also an order structure P x P ' = ( P x P',^PxP,), where ( P x P') is the Cartesian product of P and P', and (2:1,2:2) ^ P x P ' (i/i,2/2) iff £1^2/1 and x2 ^'2/2- The order structure is dual. Let P * = (Xit ^*) be the dual order structure of P j = (Xi, ^ ) , then for any a, b G Xi, a^b iff b ^ * a. Therefore, let P i and P2 be the order structures on Xi and X2, there are four order structures, P i XP2, P i x P 2 , P i x P j and P * x P ^ , on Xx x X2- Similarly, there are 2" order structures o n X j x l 2 x ••• x Xn: Mj = P f x P f x • • • x P£" = (Xi x X2 x ... x Xn,
^j)
(2)
where j = 1, 2, • • • , 2 n , and • _ U ~ 1) mod 2i - (j - 1) mod 2 ~ 2( i ~ 1 )
Ji
(3)
181 and for each i e {1,2, • • • , n } , P ? = P i = (Xu » , P j = P J = (Xu >*). If we also write ^ as ^ ° and ^ * as ^ ' , then (xi,x2, • • • , x„) > j (2/1,2/2, • • • , j/ n ) iff Xi^jiyi , i = 1,2, ••• , n . Let the collection D of data cases D\, D2, • •., Di be the training data sets. The data item D^ has the form (k, xf, a;*,...,x*, C fc ), where fc is the key label of the instance for identifying each sample instance uniquely, Ck € { C i , C 2 , . • • , C m } and xk belongs to the domain of the predictive variable Xi. We also use dk to denote (k,xk,xk,.. •, x£) and vk = ( x ^ x * , . . . , x * ) , k k k k then D/i; is also written as (d , C ) or (k, v ,C ). We call D x the instances set of D . However, the partial order structures M j can not be reduced to the corresponding structures on the set D x directly. Suppose we can define relation P'j on the set D x = {d 1 ,d 2 , • • • , d ' } such that for any ds,dr £ D x , ds >j d r if and only if vs >j vr.
(4)
There may be duplicate vectors in D x , but they are associated with different instances. If fci 7^ k^ and x / = xt2,i = 1,2, • • • , n, then we have dkl *=j-d*2 and d*2£=$ d A l , but dkl and d*2 are not the same elements in D x This breaks the antisymmetry of partial order. When we classify a new query instance, the duplicate vectors of it in D x must be considered because they are the nearest samples of it. Hence we want to built some structures on D x such that the nature of structures M j are still exist and the duplicate vectors are also have appropriate connections in these structures. A finite partial order structure P = (P, >) can be represented by a Hasse diagram in which two elements a\ and 1x2 are connected if and only if one is a cover of another. Definition 4 . 1 . Let HJ 3 * = ( D x , ^ , ) be a structure that we call the Hasse structure of the instances set D x such that for any ds,dr e D x , dsPjdr if and only if vs is a cover of vr according to partial order relation ^j or vs = vr. A finite partial order set is equivalent to it's corresponding Hasse diagram, so that the Hasse structure of a data set is essentially a partial order structure except that any element p and all duplicate elements of p are apart but "equal" t o each other. 4.2. Determining the Local Structures Related to Training Data
for an
Instance
In a finite partial order structure, the local structure of an element p of it is the structure on the set just containing all it's adjacent elements in
182 Hasse diagram and p itself. One can extend the local structure to include the local structures of all it's original elements. For a Hasse structure the local structure of an element p may intuitively be the set just containing all it's adjacent elements and p itself. When a new query instance is input, it's local structures is retrieved from training data and will be used to classify the new query instance. Suppose the query instance is d = (I + 1,x\,X2, •. •,xn) — (I + 1, v) where I + 1 is given as the key label, the local structure of d in Hasse structure H£>xu{d} =
^Dx
y
^
^^
is d e f i n e d
ag
Loc, (DJC, d) = (loCj ( D x , d), Pj) , where
IOCJ
(Dx,d)
s
s
(5)
s
= {d \dPjd or d'Pjd, d &DXU {d} } .
Proposition 4.1. IOCJ (Dx,(i) = locin_j ( D x , d ) . 4.3. Approximating New Instances' on Their Local Structures
Target Values
Based
There are 2" local structures of the query instance d in D x U {d}. Definition 4.2. Let D be the training data set, C;, i = 1,2, • • • , m, be the class labels, and Xj (Cj, d) be the number of the instances in IOCJ (Dx, d) — {d} whose class label is Q , the support of assigning the query instance d to class d in the partial order structure M j , j = 1, 2, • • • , 2™, is defined as 3
"
#_of-totalJnstances_inJoCj(Dx,d) — 1
Definition 4.3. For class labels Ct, i = 1,2,•••,m, let X(d,d)
=
2"
J2 Xj(Ci,d), the support of assigning the query instance d to class C,, according to data set D, is defined as support (Citd) = -n ^ ^ . J2 (#.of.totaLinstancesJnJocj (Dx,d) — 1)
(7)
Corollary 4.1. For j = 1,2, • • • , 2 n , we have support j (Ci, d) = support 2 n_j (C*, d)
(8)
Proposition 4.2. For j — 1,2, • • • ,2™, we have m
m
YJ support^ (Ci,d) = I, y ^ support (Cj, d) = 1
(9)
The conceptually straightforward approach to predict the class label of d is to assign the query instance d the class label Cd such that support (Cd,d) is the maximum of all support (d,d), i — 1,2, • • • , m. We use LSM(D,d) to denote this predicted class label of d, then LSM(D,d) is obtained from the following equation LSM(D,d)=
argmax C i 6{C 1 ,C 2
4.4. Experiment
support {Cud)
(10)
Cm}
Study
We consider the classification problem of Iris Plants Database. We use 50 instances as prototype samples, in which there are 16 instances labelled Iris Setosa, 17 instances labelled Iris Versicolour and 17 instances labelled Iris Virginica, and 50 instances (15 instances labelled Iris Setosa, 17 instances labelled Iris Versicolour and 18 instance labelled Iris Virginica) for experiment. There are only two classification errors( 96% recognition rate). A fraction of the experiment results containing 2 classification errors are shown on Table 1. From Table 1, for each testing instance the class label with highest support is prominently higher than the supports of other two classes. The last two columns are two misclassified instances. For one the support of correct class label Versicolor (C2) is 0.36 and the highest support is that of Virginica (C3) -0.40, both are higher than the support of Setosa (Ci) -0.24. For another the support of correct class label Versicolor (C2) is 0.38 and the highest support is that of Virginica (C3) -0.43, both are higher than the support of Setosa (Ci) -0.19. For these two misclassified instances the support of the correct class label is very close to the highest support, and prominently higher than the support of another class label. These results illustrate that the presented classification method is robust. Table 1.
A fraction of the testing results on Iris Data, the testing results of 10 instances
d)
0
0
0
0.14
0.11
0.14
0.19
0
0.24
support(C2, d)
0
0.47
0.70
0.60
0.69
0.08
0.12
0.20
0.36
0.38
support(Ci,d)
1.00
0.53
0.30
0.26
0.20
0.78
0.69
0.80
0.40
0.43
Testing result
c3 c3
c3 c3
Ci
Ci
Ci
Cz
Ci
C3
C3
d
Ci
Ci
Ci
Cz
C3
d
Ci
Ci
support(Ci,
Correct class
0.19
5. Conclusion In this paper, we introduced the local structure mapping principle in analogy reasoning, then we applied the principle to instance-based learning problem, and proposed the basic learning algorithm that can be an alternative algorithm for multi-instances based learning tasks. The local structure mapping principle also can applied to data mining tasks other than classification. We haven't discussed the computational complexity of computing LSM(D,d) by (10) yet. One may think that computational complexity is very high for computing LSM(D,d). For prototype feature vectors of n-dimensions, we must consider 2" local structures of the query instance d, hence the computational complexity seems to grow exponentially with the dimensionality of the feature space. But the exponential complexity is only the guise of the problem, we can reduce the computational complexity by building of some data structure on the sample data before classifying new instances. This will be presented in our future papers. And the further study of local structure mapping principle used in knowledge discovery from data will be our future work.
References 1. Richard O. Duda, Petor E. Hart and David G. Stork: Pattern Classification. John Wiley & Sons (2001) 2. Holyoak, K. J., & Thagard, P. R.: Mental leaps: analogy in creative thought. Cambridge, MA: MIT Press (1995) 3. Salvucci D. Anderson J.: Integrating analogical mapping and general problem solving: the path-mapping theory. Cognitive Science 25 (2001) 67-110 4. Stephen Owen: Analogy for Automated Reasoning. Academic Press (1990) 5. Charles Dierbach and Daniel L. Chester: Abstraction Concept mapping: a foundation model for analogical reasoning. Computational Intelligence 13 (1997) 33-81 6. Arthur B. Markman: Constraints on analogical Inference. Cognitive Science 21 (1997) 373-418 7. Michael A. Arbib, Eds, The handbook of brain theory and neural networks. MIT Press (2003) 8. Nilardri C , Campbell J.: Knowledge Interpolation: A simple approach to rapid symbolic reasoning, Computers and Artificial Intelligence 17 (1998) 517551 9. Jocob E., and Joseph O. Eds.: Handbook of Discrete and Computational Geometry. CRC Press LLC (1997)
A KIND OF WEAK RATIO RULES FOR FORECASTING UPPER BOUND* QING WEI, BAOQING JIANG 1 , KUN WU Institute of Data and Knowledge Engineering, Henan University Kaifeng, Henan, 475001,China weiqing@henu. edu. en WEI WANG School of Electrical Eng., Southwest Jiaotong University Chengdu 610031, Sichuan, China This paper deals with the problem of a kind of weak ratio rules for forecasting upper bound, namely upper bound weak ratio rules. Upper bound weak ratio rules is parallel to Jiang's weak ratio rules and has such a reasoning meaning, that if the spending by a customer on bread is 2, then that on butter is at most 3. By discussing the mathematical model of upper bound weak ratio rules problem, we come to the conclusion that upper bound weak ratio rules are also a generalization of Boolean association rules and that every upper bound weak ratio rule is supported by a Boolean association rule. We propose an algorithm for mining an important subset of upper bound weak ratio rules and construct an upper bound weak ratio rule uncertainty reasoning method. Finally an example is given to show how to apply upper bound weak ratio rules to reconstructing lost data, to forecasting and to detecting outliers.
1. Introduction As proposed in paper [l], Weak Ratio Rules can be used to reconstruct lost data. For example, Suppose Butter: Bread 3= 5: 4 (1) be a weak ratio rule that we obtained, that is to say, if the spending on bread is $4, then according to the reasoning meaning of weak ratio rules, the lost data— the spending on butter— is $5 at least. In fact, the WRR method in paper [l] is a way to forecast the lower bound of lost values, so the Weak Ratio Rule'11 is called Lower Bound Weak Ratio Rule (LBWRR for short), and * This work is supported by the National Natural Science Foundation of China (60474022) and the Natural Science Foundation of Henan Province (G2002026, 200510475028). Corresponding author, email: [email protected].
f
185
correspondingly, the WRR method' ] is called LBWRR method here. While in this paper, an Upper Bound Weak Ratio Rule (UBWRR for short) is dealt with, such as Butter: Bread^7:4, (2) which means if the spending on bread is $4, then that on butter is at most $7. In this way, an UBWRR method is proposed. By LBWRR method, we can forecast the lower bound of the lost value; on the contrary, by UBWRR method we can forecast the upper bound of the lost value. Thus, by averaging lower bound and upper bound of the lost value, an average value can be obtained. Therefore, the accuracy of forecasting the lost data will be improved. This paper consists of four main sections: problem statement, mining algorithm, uncertainty reasoning and application. 2. Problem Statement Let D denote a PQTD[1] (pure quantitative transactional database). Let / denote the set of all items but T the set of all transactions. D(t, f)( also £>,('))> a t t n e rth row and the ;th column of A refers to the amount spent by a customer on item i in transaction t. Let A be a nonnegative real-valued function on /. We use supp^ to denote the set {x|*e /, A(x) > 0}, which is called the support set of A. If suppA = {x\, X2,..., xp), then we express/i as A(xi)/xi +A(x2)/x2 +...+A(xp)/xp or AQa) | A(x2) | XI
X2
|
A(») Xp
Let A,B be nonnegative real-valued functions on / and supp^4 O suppS = 0 ; We say that a transaction t supports A if Z),(/) > 0 for any / e supp/1. Let support_count(A) denote the number of transactions supporting A. We say a transaction / supports (A =>B)ift supports A,B and there exists a e (0, + oo), such a D,(i) ^ A{i) for any /esupp/i and B(f) ^ a D,{j) for anyy esupp5. Let support_count(A => B) denote the number of transactions supporting (A => B). Given minimum support threshold ms and minimum confidence threshold mc, if support_count(A => B) ;> ^g
m
"
support_count(A=> B) > mc support_co unt(A) then {A => B; ms; mc),simply (A => B) or A => B, which is called an Upper Bound Weak Ratio Rule (UBWRR for short). If the values in D are integer 0 or 1, A(t)=\ for any ie suppA and B(j)=\ for any j G supp5, then (A => B) is a upper bound weak ratio rule if and only if (supp^ => supp5) is a Boolean association rule(BAR for short in this paper).
So, we come to the conclusion that UBWRR problem is a generalization of BAR problem. We can prove that if (A => B; ms; mc) is a UBWRR then (supp^ => suppB; ms; mc) is a BAR. We can say that (supp/f => suppi?; ms; mc) is the support rule of (A => B; ms; mc), or (A =>£; ms; mc)\s supported by (supp/1 => suppS; ms; mc). If (A :=> B; ms; mc) is an UBWRR, we can get the following proposition "If the spending on item i by a customer is A(i) for any i G suppA then there is mc possibility that the spending by the customer on item j is at most B(J) for any jesuppB" which is called the reasoning meaning of (A =>B; ms; mc). 3. Mining Algorithm If a PQTD D, a minimum support threshold ms, a minimum confidence threshold mc and a Boolean Associate Rule(BAR) (X=$> Y ) is Given, then the set, Ruw (£>; ms; mcj(=> Y ) , of all UBWRRs supported by (X => Y) is determined. The following Algorithm3.1 give a method of finding Qm(D; ms; mc; X=> Y). An element in QUW(D; ms; mc; X=> Y) is called quasi-minimal UBWRR supported by (X=> Y). Without special statement, concepts in this paper will follow that of paper[i]. Algorithm 3.1:UBWRR, finding all quasi-minimal UBWRR Input: PQTD D, minimum support threshold ms, minimum confidence threshold mc andaBAR(X=>Y)X= {*,,x2, ••• ^P} Y={yx^2, ••• Jq}Output: QUW(D; ms; mcj(=> Y), the set of all quasi- minimal UBWRRs supported by (X=> Y). Method: (1) 17]:= record count in D; (2) Scan all the transactions t ofD, if there exists element x,of X cause D,(;CJ) = 0, then delete transaction t. support_count{X) := record count of left transactions; (3) SCLB:= max(ms * \T], mc * support_count{X)); (4) Scan all the transactions t, if there exists element^ of Y cause Dfyj) = 0, then delete transaction t; (5) For any x,GZ(i=l,2,...,p), sort the set Vx, :={D,(xJ\t supports^ U 7} a s { ^ ° , al°,..., a j } }, cause fl<°< o, (0 <...< a^,i=l,2,...p; (6) For any y^eY (/'= 1,2,... ,q), sort the set Vyt := {D,(yj)\t supports I U K ) a s {b(J\ W\..., b[J) }, cause b(0J)> & 0) >...> b[j) 7=1,2,...,q, m :=/> + q. (7) For any transaction t, calculate the matrix M(D,(Y)/D,{X)) :
188 Dt(y,)/Dt(x:)Dt(y2)/Dt(x,)
••• Di(yq)/D,(x,)
Dt(y,)/Dt(x2)Dt(yi)/Dt(xi)
••• Dt(yq)/Dt(x2)
KDt(yi)/Dt(xP)Dt(y2)/Dt(xP)
••• Dt(yq)/Dt(xp)^
(8) posMUB-0 ; (9) getmax(< >); (10) Build Quw(£>; ms; mc-J(=> Y) by posMUB. The following procedure is to find the maximal elements of a lower segment set. The procedure adopts deep-first algorithm, whose speed is quicker than that of breadth-first algorithm. Theoretically, for the UBWRR model, the minimal elements of an upper segment set should be got. However, by keeping the procedure's frame fixed for the portability of the program, but just overturning the data in the position lattice in Algorithm 3.1, we can prove that the effect is the same as that of rewrite the procedure. procedure getmax(h : a vector of natural numbers) (I) U ~{t\t e.posMUB; and the d(h)-prefix, oft, is greater than h}; ( 2 ) i f t / * 0 then (3) b0 := the biggest of (d(h) + l)th component of vectors in U (4) else b0 ~ 0; (5)*:=A„+1; (6) while b^ndm+l and IsUBWRRPos(
B(yO/A(x,)B(y2)/A(xi)
•••
B(yq)/A(x,f
B(y,)/A(x2)B(yi)/A(x2)
•••
B(yq)/A(X2)
•••
B(yq)/A(xP)J
yB(yi)/A(xP)B(y2)/A(xP)
(3) support_count(A => B) := 0; (4) Scan all the transactions t, if the values in matrix (M(B(Y)/A(X)M(D,(Y)/Dt(X)) axe all nonnegative, then add 1 to support_count(A => B); (5) if support_count(A-=$ B) 5s SCLB then (6) return TRUE (7) else return FALSE; Example 3.1: Consider the PQTD D in Table 1. Table 1. The PQTD in example 3.1 XI 10 1 10 5 2
T, T2 T3 T4 T5
X2 20 2 20 10 4
Yl 30 3 40 15 4
Let ms =1/3; mc = 2/3, X = {xhx2}; Y = 0>,}; then (X=>Y) is a BAR. We use Algorithm 3.1 to mine QUW(A' ms; mc;X=>Y), and the results can be got as follows: posMUB={<3,3,\>,<2,2,2>,<0,0,4>}, Qm(D; ms; mc;X=> Y) = {r,: (10/il+20/i2+30/i3), r2:(5/il+10/i2+15/i3), r3: (l/il+2/i2+3/i3)}. However, if we use Algorithm 3.1 in paper[i], the results are posM= {<3,3,0>,<2,2,1 >,< 1,1,2>,<0,0,3>}, Mqq(D; ms; mc;X^> Y) = {r,: (l/il+2/i2+3/i3), r2: (2/il+4/i2+4/i3), r3: (5/il+10/i2+15/i3), r4: (10/il+20/i2+30/i3)} 4. Uncertainty reasoning Using the reasoning meaning of UBWRR, we can reach UBWRR uncertainty reasoning method: rule 1: A, => B, rule n: A„ => B„ fact: P conclusion: Qu In this model, Ah Bh P, Qu are nonnegative real-valued functions on /, and suppP = suppAk = X, supp£* = Y, for any k =1,2...,«; Vy el,
QJJV
A > 0,
J y
*
jiY
190 The UBWRR uncertainty reasoning method has an intuitive meaning: If facts are Given as follows: UBWRRs (Ak=>Bk; ms; mc); k =l,2,...,n, and the fact that " spending on item /' by a customer is P(f) for any i e X\ then there is mc probability that the spending on item j by the customer is at most A«V^O-))forany;eY5. Application The LBWRR111 can be applied to data cleaning, data forecasting and outlier detecting. The reconstructed data underlined in Table 2 are shown in Table 3[1], Table 2. Original data
Rec# 1 2 3 4 5 6 7 8 9 10 11
11 149.3 161.2 171.5 175.5 180.8 190.7 202.1 212.4 226.1 231.9 239.0
12 4.2 4.1 3.1 3.1 1.1 2.2 2.1 5.6 5.0 5.1 0.7
13 108.1 114.8 123.2 126.9 132.1 137.7 146.0 154.1 162.3 164.3 167.6
14 15.9 16.4 19.0 19.1 18.8 20.4 22.7 26.5 28.1 27.6 26.3
Table 3. Reconstructed data by LBWRR Rec# 1 5 7 9 11
11 143.51 180.8 202.1 226.1 239.0
12 4.2 3.08 2.1 5.0 0.7
13 108.1 132.1 146.0 162.3 167.77
14 15.9 18.8 12.87 24.19 26.3
which showed that the reconstructed data by LBWRR are much closer to the original data than other methods. To ensure clear understanding of the new approach, the approach in section V of paper[l] is called LBWRR method here. Parallel to LBWRR method, our new approach for reconstructing lost data — the UBWRR method is as follows: 1. For any record t (corresponding a transaction) with the known data (lossless data) and the lost data, let A" be the set, in which all items corresponding with known data on row / are not UB outliers, and let Y be the set, in which all items corresponding with data on row t are lost. 2. Let ms, mc be 0.5 or smaller than 0.5 to ensure that (X=> Y ) is a BAR. 3. Mining QuW(£>; ms; mc-JC=> Y) by Algorithm 3.1.
191 4. Let the quasi-minimal UBWRRs mined in step 3 be the rules in the UBWRR uncertainty reasoning method in Section IV, and let 2_, x*x (D^x)/x) be the fact P. By applying the UBWRR uncertainty reasoning method, a conclusion Qu can be reached. 5. Let the values in conclusion Qu be the reconstructed values of/ on items of Y. According to the LBWRR method above, we can reconstruct the underlined data in Table 2. The results are shown in Table 4. By integrating UBWRR method with LBWRR method, a WRR method is proposed as follows: 1. Let Lva/=the reconstructed values by LBWRR method; 2. Let t/va/=the reconstructed values by UBWRR method; 3. Let Aval=(Lval+Uval)/2. Table 4. Reconstructed data by UBWRR Rec# I 5 7 9 II
II 161.48 180.8 202.1 226.1 239.0
12 4.2 3.068 2.1 5.0 0.7
13 108.1 132.1 146.0 162.79 172.08
14 15.9 18.8 227 24.87 26.3
By the WRR method above, the new values can be got, which are much closer to the original. The final results are shown as in Table 5. Table 5. Final results by WRR method ~Rec# I 5 7 9 II
II 152.50 180.8 202.1 226.1 239.0
12 4.2 107 2.1 5.0 0.7
13 108.1 132.1 146.0 162.54 169.92
14 15.9 18.8 17.78 24.53 26.3
Wang' ] discussed the root-mean-square error of ratio rule method and that of column average method. The RMS is defined as
V
'=' v'=i
Where dij is the reconstructed value, and dy is the lost original value. By the WRR method above, the RMS= 1.15. Results show that the WRR method is more exact than Jiang's LBWRRm(RMS=1.84),Flip Korn's ratio rule131 (RMS=2.04) and column average method (RMS=8.77).
192 6. Conclusion Parallel to the LBWRR[1I,we have proposed a new association relation (called upper bound weak ratio rules, UBWRR for short). Contrary to LBWRR's lower-bound-guess function, the UBWRR can guess the upper bound in pure quantitative transactional database. The UBWRR problem is also a generalization of BAR problem. Through discussing the mathematical model of UBWRR, we come to the conclusion that every UBWRR can induce a BAR as its support rule. We present an algorithm for mining an important subset of all UBWRRs supported by a given BAR. By the reasoning meaning of UBWRR, we propose an UBWRR uncertainty reasoning method. The UBWRR can be applied to reconstructing lost data, to forecasting and to detecting outlier. The WRR method is based on the average of reconstructed values' lower bound by LBWRR and upper bound by UBWRR, the final reconstructed value by WRR method is limited in a smaller range, which is much more exact than the simplex LBWRR method. Experiments demonstrate that the reconstructed data by the WRR method are much closer to the original data than that by Jiang's LBWRR method'11, and by Flip Korn's ratio rule' 3 ', or by column average method. References 1.
2. 3.
4. 5. 6. 7.
Baoqing Jiang, Yang Xu, Qing Wei and Kun Wu. Weak Ratio Rules between Nonnegative Real-valued Data in Transactional Database. The IEEE International Conference on Granular Computing, Beijing, Chinajuly 25-27, 2005.pp.488-491. S. Guillaume, A. Khenchaf and H. Briand, Generalizing Association Rules to Ordinal Rules, In The Conference on Information Quality (IQ2000), (MIT, Boston, MA, 2000), 268-282. Flip Korn, Alexandras Labrinidis, Yannis Kotidis, and Christos Faloutsos, Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining, In Proc. of the 24th International Conference on Very Large Data Bases (VLDB), pages 582-593, New York, USA, August 1998. A. Marcus, J.I. Maletic and K. Lin, Ordinal Association Rules for Error Identification in Data Sets, CIKM, pages 589-591, 2001. R. Srikant and R. Agrawal, Mining quantitative association rules in large relational tables, In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data (SIG-MOD'96), pages 1-12. Montreal, Cannada, June 1996. Qingyi Wang, Donghui Shi, Bing He and Qingsh-eng Cai, Research on Linear Association Rules, Mini-Micro System, Vol. 22 No. 11 Nov. 2001. (Chinese) Y. Xu, D. Ruan, K. Qin and J. Liu, Lattice-Valued Logic, Springer- Verlag, 2003.
C O M B I N I N G VALIDITY I N D E X E S A N D MULTI-OBJECTIVE OPTIMIZATION B A S E D CLUSTERING
TANSEL OZYER1'3 AND REDA ALHAJJ1-2 Dept.
of Computer
Science, University of Calgary, Calgary, Alberta, Canada { ozyer, alhajj} @cpsc. ucalgary. ca Dept. of Computer Science, Global University, Beirut, Lebanon TOBB Economics and Technology University, Dept. of Computer Eng, Ankara, Turkey
In this study, we present a clustering approach that automatically determines the number of clusters before starting the actual clustering process. This is achieved by first running a multi-objective genetic algorithm on a sample of the given dataset to find the set of alternative solutions for a given range. Then, we apply cluster validity indexes to find the most appropriate number of clusters. Finally, we run CURE to do the actual clustering by feeding the determined number of clusters as input. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
1. Introduction Clustering is the process of classifying a given set of objects into groups by taking into account two main criteria: objects in each group should be homogeneous and the different groups should be separate. To do this, it is necessary first to decide on the characteristics or attributes based on which to cluster the objects because the same set of objects may produce different clustering based on different combinations of the attributes. This is an application oriented decision. For instance, people may be classified based on any combination of age, sex, nationality, etc. It is necessary to point out that clustering is different from classification. While the latter is a supervised process, the former is unsupervised. In classification, it is necessary first to specify the classes and label the data to train the system to be able to classify new coming data objects. Clustering has several practical applications in biology, finance, webuser analysis, etc. Thus, several research groups developed different clustering algorithms, such as k-means 6 , PAM 5 , CURE 3 and ROCK 4 . However, most of the existing clustering approaches require the number of clusters to 193
194 be specified before the clustering process can start. Others are application oriented and find the clustering based on density analysis without having the number of clusters pre-specified. Identifying knowing the number of clusters as a major requirement for successful clustering and realizing that it is not realistic to have it known apriori in general, we started a project that utilizes multi-objective optimization to achieve better clustering. The motivation is that clustering is not a single objective process. Rather, the intention of clustering is to find homogeneous instance groups and separate the clusters as much as possible to clarify the distinction between them 2 . We tested different combinations of parameters and algorithms to be combined into the multi-objective optimization process. The process by itself delivers multiple solutions and hence it is necessary to rank them and decide on the most appropriate solution for the analyzed dataset. For this purpose, we utilize some of the major indexes already developed for cluster validity analysis. We have successfully tested different forms of the objectives and achieved promising results on different datasets 8 » 7 ' 11 ' 10 . Further, we realized, by testing on datasets from different domains, that not every index gives good result for every dataset. So, we discovered that the more indexes we use the better results we obtain because we decide on the best solution as the one favored by the majority of the indexes. The work described in this paper integrates multi-objectivity and CURE clustering. We use multi-objective optimization to find the alternative clustering, validity analysis to find the most appropriate number of clusters and CURE to do the final clustering based on the determined number of clusters. As compared to the previous stages of this project, in the first trial described in 8>7'11, we used the number of clusters and homogeneity as the two main objectives with the whole dataset as input. In the second stage described in 10 , we homogeneity and separateness as the objectives. We also used iterative approach to find the most natural clustering for each particular number of clusters within a prespecified range. It gives better results than the first approach, but suffers scalability problem. So, in the work described in this paper, we use three objective functions, namely the number of clusters, separateness and homogeneity. Also, we use a sample representative of the dataset and CURE for the final clustering. The conducted experiments demonstrate the applicability and effectiveness of the proposed approach. The rest of the paper is as follows. The proposed system is described in Section 2. Test results are reported in Section 3. Section 4 is conclusions.
195 2. The Clustering Process The clustering process applied in this study consists of three main phases. The first phase applies multi-objective genetic algorithm to find the alternative clustering results; the second phase applies different validity indexes to find the most appropriate number of clusters; and the third phase applies CURE to obtain the final clustering.
2.1. Finding Alternative
Solutions
During the first phase, we first generate a sample of the dataset. This is done in a way to speedup the multi-objective genetic algorithm (GA) process, which is known to be slow. Considering the sample a good representative of the original dataset, we run the GA with three main objectives, minimize the number of clusters, maximize the homogeneity within each cluster and maximize the heterogeneity between the clusters. We tested two alternatives for the heterogeneity or separateness, namely average linkage and average to centroid linkage. For homogeneity we used the total within cluster variation. For the separateness we used the following inter-cluster separability formulas, where C and D are candidate clusters. Average Linkage : d(C, D) =
Average t o Centroid:
Yl
|(?|
-=-7
d x
i1)
( ^v)
^2 d{x, vD) + ^
d(y, vc)
y€D
xSC
(2) For the homogeneity we used the following intra-cluster distance formula:
TWCV = J2 J2 Xld- j ^ Y E n=ld=l
fc=l
k
SF
™
(3)
d=l
where Xi, X2,.. ,XN are the N objects, Xnd denotes feature d of pattern Xn (n= 1 to N). SFkd is the sum of the d—th features of all the patterns in cluster fc(Gfc) and Z& denotes the number of patterns in cluster fc(Gfc)and SFkd is:
SFkd = y2^
Xnd
(d = 1,2, ...£>).
(4)
196 T h e GA process requires specifying a set of parameters, including the coding scheme for t h e individuals, the number of individuals in the population, t h e fitness, cross-over, mutation, and termination criteria. T h e termination criteria may be specified as a threshold on t h e progress achieved between different populations obtained during consecutive runs of t h e algorithm or as a maximum number of iterations t o be reached in case the first condition fails. Each individual in t h e population is represented by a chromosome of length n, where n is the number of d a t a points in t h e sample to be analyzed. Every gene is represented by an allele, where allele i is t h e corresponding cluster of instance i. In other words, each allele in t h e chromosome takes a value from the set {1, 2, . . . , K}, where K is t h e maximum number of clusters for which we t r y t o get an optimal partitioning. In this study, we choose K to be y/n. However, in case the validity analysis step favors y/n, then K is incremented by 5 in a repetitive process with the G A and validity analysis reapplied until the favored number of clusters is smaller t h a n t h e current value of K. T h e employed GA process works as follows. T h e current generation is assigned t o zero and a population with the specified number of chromosomes P is created. This is done by using the ordered initialization as follows. In round order, each allele took values 1 t o K in order, and then those allele value assignments are shuffled within t h e chromosome randomly by processing t h e random pairwise swap operation inside the chromosome. This way, we can avoid generating illegal strings, i.e., we avoid having some clusters without any pattern in t h e string. One-point crossover operator is applied on randomly chosen two chromosomes t o generate new chromosomes. Crossover is carried out with probability pc. To decide on candidate chromosomes t h a t will survive to t h e next generation, t h e selection process considers t h e optimization of the three objectives total within-cluster variation fitness value, separateness and number of clusters. Only the best P chromosomes are kept in t h e population for the next iteration. T h e aim of mutation is t o introduce new genetic material in an existing chromosome. T h e mutation operator replaces each gene value an by an' with respect to the probability distribution; for n = 1 , . . . , N. an' is a cluster number randomly selected from {1, . . . , K} with probability
197 distribution {pi,P2,- • • ,PK} defined as: p-d(X„,c)
(5)
ft = -T J2 e-d(Xn,c3) 3= 1
where i G [l..k] and d(Xn, C^) denotes Euclidean distance between pattern Xn and the centroid Ck of the k—th cluster; pi represents the probability interval of mutating gene assigned to cluster i (e.g., Roulette Wheel). Finally, if the maximum number of generations is reached, or the prespecified threshold is satisfied then exit; otherwise the next generation is produced.
2.2. Deciding
on Number
of
Clusters
The result obtained from the previous step is a set of alternative solutions. Each solution satisfies the criteria employed by the multi-objective optimization process. So, we need to decide on the most appropriate solution from the set of alternatives, we used the clustering validation schema described in 1. As a result, we apply the following validity indexes: scott, friedman, ratkowsky, calinski, rubin, Hubert, db, ssi, dunn and silhouette.
2.3. Applying
CURE for Actual
clustering
After the validity indexes suggests the most appropriate value for the number of clusters k, it is used as input to the CURE clustering algorithm to decide on the actual clustering of the whole dataset. The process of CURE can be summarized as follows. Starting with individual values as individual clusters, at each step the closest pair of clusters are merged to form a new cluster. This is repeated until only k clusters are left. As a result, individuals in the database are distributed into k clusters. The input parameters to this CURE are: The input data set D containing |Devalues in n-dimensional space, where \D\ is the number of values in the database and n is the number of attributes. (1) The desired number of clusters k (2) Starting with individual values as individual clusters, at each step the closest pair of clusters are merged to form a new cluster. The process is repeated until only k clusters are left.
198 3. E x p e r i m e n t s and R e s u l t s We conducted our experiments on Intel 4, 2.00 GHz C P U , 512 M B RAM running Windows X P Dell P C . T h e proposed process has been implemented based on t h e integrated version of GAlib ( C + + Library of Genetic Algorithm Components 2.4.6) 9 and NSGA-II source code (Compiled with g + + ) . Necessary or needed p a r t s have been (re)implemented for t h e multiobjective case. T h e approach and t h e utilized cluster validity algorithms have been conducted by using t h e cclust and fpc packages of t h e R Project for Statistical Computing 1 2 . We have run our implementation 20 times for the tested d a t a set with parameters: population size—100; tournament size during the increment the no of clusters^ approximately t h e noof items/5 (20% of t h e entire d a t a set); p(crossover) for t h e selection=.9; we tried single and two point crossover in order; single point crossover gave better results; p(mutation)= 0.05 and for t h e mutation itself the allele number is not changed randomly but with respect t o Equation 5. We used I R I S in t h e evaluation process: a classification of the iris plant in different species having 4 features, 150 examples and 3 classes, 50 instances each, without missing values. We executed t h e clustering process for different combinations of t h e homogeneity and separateness choices. Our termination criteria is chosen as t h e average of the population when each objective is no more minimized. We use 50 as t h e sample from the dataset; and hence \f%0 = 7 is used by t h e first step of the process as the upper limit for t h e candidate number of clusters t o test.
Table 1. 2 3 4 5 6 7
avg.silwidth 0.702591569 0.732155823 0.607591569 0.573189899 0.533915125 0.513803139
- Results of Average Linkage
hubertgamma 0.088402459 0.101366423 0.050971161 -0.058402459 -0.042051062 -0.040205191
dunn 0.007012415 0.011056519 0.00204998 0.004012415 0.003535411 0.003419531
w b . ratio 0.081916363 0.08141178 0.09017462 0.095016363 0.082071856 0.082062071
Reported in Tables 1, 2, 3, and 4 contain the results obtained by applying the different validity indexes on t h e outcome from t h e multi-objective optimization process. In average linkage, t h e four indexes in Tables 1 report t h e right optimal number of clusters as 3; and all of the four indexes report 2 as t h e next candidate number of clusters. Note t h a t 2 is accepted as a candidate solution because two of t h e Iris classes are very similar and
199 Table 2. calinski 1693 -234 527 257 1712 257
2 3 4 5 6 7
db 0.19 0.18 0.31 0.30 0.29 0.28
Table 3. 2 3 4 5 6 7
avg.silwidth 0.608993661 0.65936671 0.563333035 0.548893601 0.509930437 0.489900031 Table 4. calinski -3 1762 502 254 242 236
- Results of Average Linkage and TWCV ratkowsky 0.11 0.92 0.04 0.03 0.04 0.045
scott 105.16 185.37 49.82 20.27 114.11 14.24
Friedman 16.42 12.45 1.33 1.01 1.61 1.18
rubin -64.65 -9.10 -5.46 46.39 46.85 48.77
ssi 1.07 0.99 0.70 0.57 0.86 0.69
- results of Average to Centroid Linkage hubertgamma 0.200449112 0.362859 0.19052666 0.004449414 -0.100814738 -0.178814111
dunn 0.03711746 0.009238841 0.001631388 0.001814346 0.002227689 0.002020080
w b . ratio 0.097762442 0.100437027 0.09785464 0.098762442 0.098922179 0.099977143
- Results of Average to Centroid Linkage and TWCV db 0.29 0.18 0.31 0.42 0.59 0.98
ratkowsky 0.042 0.05 0.04 0.03 0.026 0.021
scott 113.78 181.17 53.77 18.90 18.76 17.29
Friedman 16.09 11.95 5.88 3.92 1.70 1.56
rubin -42.10 -7.65 -6.68 -6.54 -6.35 -6.23
ssi 0.06 0.12 0.07 0.06 0.08 0.09
some researchers report them as single class. As the convex clustering validity indexes reported in Table 2 are concerned, both 2 and 3 are reported as possible numbers of clusters, with the reported second candidate number of clusters as 3 and 2, respectively; except calinski index. The same applies to the analysis of the results using average to centroid linkage; both 2 and 3 are reported as either first or second candidate possible number of clusters as reported in Tables 3 and 4. 4. Conclusions Clustering is unsupervised process to classify instances from a given dataset into classes. It is in general required that an expert provides in advance the intended number of clusters in addition to other parameters. However, this is not possible especially as the dataset to be analyzed becomes more challenging with increased features. Knowing this, we developed an approach that utilizes multi-objective GA and validity indexes to report the most ap-
200 propriate number of clusters. This process is applied on a sample representative subset of t h e original dataset. After getting t h e number of clusters, we feed it as input to C U R E , which effectively clusters the original dataset. We demonstrated using t h e IRIS dataset t h a t t h e proposed approach works effectively and produces t h e intended clustering result. Currently, we are considering partitioning a dataset into different disjoint subsets; then cluster each subset alone; and at the end consider each cluster as a single point and cluster all t h e clusters to obtain the final clustering. This will help in handling t h e scalability problem t h e best.
References 1. E. Dimitriadou, S. Dolnicar, and A. Weingessel. An examination of indexes for determining the number of clusters in binary data sets. Psychometrika, 67(1):137-160, March 2002. 2. J. Grabmeier and A. Rudolph. Techniques of cluster algorithms in data mining. Data Mining and Knowledge Discovery, 6:303-360, 2003. 3. S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. In Proc. of ACM SIGMOD, pages 73-84, 1998. 4. S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Proc. of IEEE ICDE, pages 512-521, 1999. 5. L. Kaufman and P.L. Rouseeauw. Finding group in data: An introduction to cluster analysis. John Wiley & Sons., New York, 1990. 6. A. Likas, N. Vlassis, and J. Verbeek. The global k-means clustering algorithm. Technical Report IAS-UVA-01-02., Computer Science Institute, University of Amsterdam, Netherlands, February 2001. 7. Y. Liu, T. Ozyer, R. Alhajj, and K. Barker. Cluster validity analysis of alternative solutions from multi-objective optimization. Proc. of SIAM DM, 2005. 8. Y. Liu, T. Ozyer, R. Alhajj, and K. Barker. Validity analysis of multiple clustering results on the pareto-optimal front. European Journal of Informatica, 29(1), 2005. 9. Massachusetts Institute of Technology and Matthew Wall. GAlib Documentation. MIT, USA, 2005. 10. T. Ozyer and R. Alhajj. Effective clustering by iterative approach. In Proc. of ISCIS. Springer-Verlag LNCS, 2005. 11. T. Ozyer, Y. Liu, R. Alhajj, and K. Barker. Multi-objective genetic algorithm based clustering approach and its application to gene expression data. Proc. of AD VIS, 2004. 12. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2004.
A METHOD FOR REDUCING LINGUISTIC TERMS IN SENSORY EVALUATION USING PRINCD7LE OF ROUGH SET THEORY* XIAOHONG LIU College of management, Southwest University for nationalities, Chengdu, Sichuan, 610041, China XIANYI ZENG LUDOVIC KOEHL The ENSAIT Textile Institute^ rue de I ,Ermitage,F-59100 Roubaix France YANG XU College of science, Southwest Jiaotong University, Sichuan Chengdu,
610041,China
This paper presents a method for reducing linguistic terms in sensory evaluation using the principle of rough set theory. Using this method, inconsistent and insensitive evaluation terms are removed and then the related sensory evaluation work can be simplified. The effectiveness of this method has been validated through an example in fabric hand evaluation.
1. Introduction Initially, sensory evaluation or sensory analysis was developed for studying the reactions to certain characteristics of food products. In today's industrial companies, especially in manufacturing fields, sensory evaluation has been widely used. It also concerns other specialized areas such as risk evaluation, investment evaluation, human resource evaluation and safety evaluation and so on. This concept is defined as follows (Stone and Sidel, 2004). Sensory evaluation is a scientific discipline used to evoke, measure, analyze and interpret reactions to chose characteristics ofproducts or materials as they are perceived by the senses of sight, smell, taste, touch and hearing.
* This work is partially supported by National Natural Science Foundation of China (Grant No. 60474022).
201
In the field of sensory evaluation, a linguistics term is used to describe one attribute of the related product quality or the consumer's preference related to this attribute (Zeng, 2004). The number of linguistic terms directly affects the result and the cost of sensory evaluation. This paper presents a new method for reducing linguistic terms in sensory evaluation using rough set theory. Using this method, inconsistent and insensitive evaluation linguistic terms can be removed according to sensory data provided by of experts while relevant evaluation terms can be preserved in the data based for the future evaluation work. Our strategy is composed of two steps. At the first step, the linguistic terms are divided into two sets according to the result of consistency for identical samples. At the second step, the linguistic terms are divided into two sets according to the sensitivity for different samples. The preserved relevant terms can be obtained using operations on theses sets and they correspond to the lower approximation of the rough set of the initial evaluation terms. The basic principle of our method is briefly presented as follows. First, we express the sensory evaluation system using a weighted knowledge table, i.e. S= in which U denotes a set of evaluation targets (U= {1,2,..., n}) and A is a set of linguistic terms each describing one attribute of the products of interest. Next, we calculate the index of individual consistency of evaluation terms for identical samples and the index of individual sensibility of evaluation terms for different samples. For the j-th term of the i-th product, these two performance indices are denoted as r? and r* respectively with r°,r' e[0,l]Next, we calculate the index of aggregated consistency and index of aggregated sensitivity of linguistic terms, denoted as pc and p* according to the following relations: p c = £
W
V >p ' z ^
w
y ,
c s p pP .
e[0,1]
• The whole set of
linguistic evaluation terms A can be then divided into two subsets satisfying the conditions pc > Ec and pc <Ect respectively. These subsets are denoted as V.c ,V,C • In the same way, the set of evaluation terms A can also be divided into two subsets V.'.V,' according to the conditions p" >s" and „» <E\. In this situation, 1
1
* j
J
H
i
J
we consider the result of yc n Vt" as lower approximation of the rough set of A. The linguistic terms belonging to the lower approximation should be preserved as relevant terms. More details are given in the following sections. 2. A knowledge table of linguistic terms in sensory evaluation Linguistic terms play an important role in the course of sensory evaluation. For example, in the field of fabric hand sensory evluation, the linguistic evaluation terms include soft, smooth, pleasant, etc. They describe different attributes
related to the product quality or consumer's preference related to these attributes. In practice, the efficiency and the cost of evaluation are strongly related to the number of linguistic terms. In each sensory evaluation, it is necessary to adopt relevant terms and remove irrelevant terms. The flow chart of evaluation terms generation is shown as Figure 1. Start
if Original terms No Test
Yes Practical terms Application
End Figurel. Flow chart of evaluation terms generation In Figure 1, we present the procedure for reducing linguistic evaluation terms using rough set theory. The sensory evaluation carried out by the i-th panel can be expressed using a weighted knowledge table mentioned in Section 1. Its formal representation is shown in table 1. Table 1. A knowledge table of linguistic evaluation terms Evaluation terms N. Producte>« ^ j \
A,
Am
1
Rn
Rln
n
Rml
Kmn
In table 1, A (j = \,...,m) denotes the j-th linguistic term, and R the evaluation result on the j-th evaluation term for the k-th product, Rk e {l,2,...,r}, i
r the maximal evaluation score used for the j-th term, and wj e [0,1], V w = 1 >
w and t are the weights of the i-th panel and the number of all panels respectively. In practice, there are too many terms in one system of sensory evaluation after the procedure of "brain storming", which generates an exhaustive list of evaluation terms. Therefore, we need to remove the inconsistent and insensitive evaluation terms according to the results of evaluation on all terms. However, if more than two panellists or experts give different evaluation results, one big problem is how to aggregate these results in a simple and suitable way. 3. Computing the indices of consistency and sensibility for evaluation terms In order to improve the efficiency in sensory evaluation and reduce the number of evaluation terms reasonably, we compute the indices of individual consistency and individual sensitivity of evaluation terms respectively. In a sensory evaluation, the consistency of evaluation terms is considered as the degree of resemblance of evaluation scores given by different experts when evaluating identical products. The sensitivity of evaluation terms is considered as the variation of evaluation scores when evaluating different products. Formally, these two indices are denoted as r c and r*. They represent the consistency and the sensibility of the j-th evaluation term for the i-th product respectively. We have rc.,r* e[0,l]- Their definitions are given as follows. Let V be the set of evaluation terms, V: Rr
•[0,1]
for the same product i, 1
d=0 0
for different products,
(1)
d = dm
0 —
d =0 m 0
1
d = dm
(2)
In eq.(l) and eq.(2), d is determined as follows: 1
(3)
where xt denotes the result of evaluation of the l-th panel, and x, e{l,2,...,r},3t = £ x . .
dm=max{dk}
(4)
k
k and dmare determined according to the maximal value (r) of the evaluation scores for the j-th evaluation term, for example, if r - 2, then k = 0,1, d0 = 0, dm=d],rij=\ orO. We suppose that evaluation scores of evaluation terms are 1,2,3 r, and the value of dm varies with r. Some results are shown as table 2. Table 2. The results of d
r dm
2 0.5
3 1.33
4 2.25
5 4.8
6 7.5
7 10.29
8 14
9 17.14
10 22.5
m
4. A method for reducing linguistic terms in sensory evaluation Let pc and p" be the index of aggregated consistency and aggregated sensitivity of the j-th evaluation term for all panels respectively, and p< = y
W
V ,p'=Y
W
V pc,p) e [0,1], wc and w* are the weights of the
individual consistency and individual sensibility of the i-th panel respectively, t is the number of existing panels. The j-th evaluation term does not play an important role in sensory evaluation only when the two following conditions are satisfied. In the first condition, the sensory evaluation results for the j-th term and the k-th product provided by different experts are quite different. In the second condition, the evaluation results for the j-th term and all products are very similar and concentrated. For the other cases, we remove the j-th term according to the values of pc and p". In rough set theory (Pawlak 1991), an approximation space can be denoted as A =, where U and R represent the domain of discussion and an elementary set in A respectively. For any element xeA , let [x]R be the equivalence class of R containing x. For any set X defined on the domain U, it can be characterized using two sets (the upper approximation and the lower approximation in A). The corresponding definitions are given below.
Aupp(X) = {xeU,[x]RnX*t\
(5)
Ahw(X) = \xeU,[x]lt
Al 7 5 6 7 7 6
A2 1 1 1 1 2 2
A3 1 1 1 2 2 1
A4 1 2 1 2 2 1
A5 3 2 2 3 3 2
A6 1 1 1 2 2 1
A7 1 1 1 2 1 2
A8 7 6 5 7 7 6
A9 3 2 3 2 3 3
A10 1 1 1 2 2 1
All 6 4 5 6 5 5
7
5
1
1
1
1
1
1
6
2
1
5
Another example of fabric hand evaluation is shown in Table 4. In this example, aggregated evaluation results of one expert panel are given for 18 different products and all the terms. Table 4. Sensory data on fabric hand evaluation Code sample 4-1 10-1 14-1 14-2
Al
A2
A3
A4
A5
A6
A7
A8
A9
A10
All
7
10
1
1
3
1
1
7
3
1
5
1 1 1
1 1 2
2 1 1
1 1 1
1 1 5
6 3 2
3 1 1
1 1 1
6 3 2
7 6 4
10 10 8
16-1 16-2
6 4
9 9
22-1 22-2 24-1
6 4 5
8 7 ~~8~
1 1 1 ~1 1 1 1
1
1 1
I I 2 1 1 1
2
1 1
4 5
1
2 ~2~
1
4 4
1
1 ~ i ~ 3 ~ 1 ~ 1 1 4 2 1 1 j ~ 1 ~~3~ 1 1
]
]
1
2__J
1
3 ~ 2 3
24-2
4
8
1
26-1
3
4
2
3
26-2
3
6
2
4
1 2
28-1 28-2
3 2
5 5
2 2
5 5
1 3 ~T~ 1 1 3 1 1
32-1 ~
2
~2~
3
~6~
32-2
2
3
4
6
34-1 34-2
1 1 5 7 1 5__J 1 1 5 1 | l | l | 6 | 7 | l [ 5 | l [ l | l | 5 | l
2
1 4
1
2
1
1
2
\
1
1
1
1
1 1
5
1
1
1
1
1 ~ 2 1 2
1~ 1
~1~
3
~ I
1
4
1
1
Using the equations (1) and (2), we calculate the values of rc and rs • The corresponding results are shown as table 5 (from table 3.) and table 6 (from table 4,) respectively. Table 5. The result of r° V
Grade D Dm rc
Al
A2
7 0.81 10.29 0.92
2 0.24 0.5 0.52
A3 2
A4 2
A5 3
0.24 0.5 0.52
0.29 0.5 0.42
0.57 1.33 0.57
A6 2 0.24 0.5 0.52
A7
A8
A9
A10
All
2 0.24 0.5 0.52
7 0.57 10.29 0.94
3 0.29 1.33 0.78
2 0.24 0.5 0.52
6 0.48 7.5 0.94
According to the values of rc shown in Table 5, we can see that the •J
evaluation terms can be ranked according to their values of sensitivity. Using the symbols of » and >- to denote the order of consistency of evaluation for all linguistic terms, i.e. likeness and preferment, and the obtained ranking result is AgwA 11 >-A 1 >-A9>-A 5 >-A2«A3«A6»A7«A 1 o>-A4 Table 6. The result of rs Al
A2
A3
A4
A5
7
10
6
7
3
3.75
9.53
2.35
5.28
0.26
2.5
1.44
3.56
10.29
22.5
7.5
10.29
1.33
4.8
4.8
10.29
0.42
0.31
0.51
0.17
0.52
0.25
0.35
Grade d dm
0.36
A6 5
A7 5
A8 7
A9
A10
All
3
5
6
0.47
2.03
2.47
1.33
4.8
7.5
0.35
0.42
0.33
According to the values of r* shown in Table 6, the evaluation terms can be ranked according to the values of sensitivity. We obtain A6>- A,;- A 2 «Aio>- A]>- A 8 »A 9 >- An>- A3>- A 7 VA 5 We determine the final linguistic terms in this experiment according to the result of table 5 and table 6. For given the values of the thresholds, we divide the
208 result of table 5 and table 6 into two subsets respectively, and they are denoted as V{ , Vc, V}< and Vs. Because of K,c > Vc and V{ >• Vs, set V* and Vts to build rough sets, i.e. V° u K,'5 and F^ n V* are upper approximation and lower approximation of rough sets. We use the lower approximation of rough sets to select the linguistic terms In this application, the values of the thresholds are given by experts. We obtain E° = 0.52 and E". - 0.35 respectively. With these two thresholds, we divide the whole set of the evaluation terms into 4 subsets using the method presented in Section 4. '\ = \A],A2,AJ,A5,A6,A1,As,A9,Aw,A]iy,y2 =v4 4 / "\ =\A\>A2,A4,A6,Ag,Ag,A]0i,v2 = \A3,A5,A1,Auj We obtain then the relevant evaluation terms preserved for future evaluation work by calculating the intersection between the subset of consistency and the subset of sensitivity. K, OK, =\Ai,A2,A6,Ai,A9,A\0j In this application, the relevant evaluation terms that should be preserved are: soft, smooth, compact, pleasant, fresh and dense. 6. Conclusion This paper discussed a method for reducing the number of evaluation linguistic terms using rough set theory. This method has been validated using a real application on fabric hand evaluation. In this application, we obtain the following results: for the consumers, only three to six terms are needed for characterizing the market behaviour but for experts determining the quality of products and product design, the number of terms needed is more than ten. Reference 1. G.B.Dijksterhuis, Multivariate data analysis in sensory and consumer science, Food & nutrition press, Inc. Trumbull, Connecticut 06611 USA. (1997) 2. H. Stone and J.L Sidel, Sensory Evaluation Practices (Third Edition). Academic Press, Inc., San Diego, CA. (2004) 3. Z.Pawlak, Rough sets, theoretical aspects of reasoning about data, Dordrecht, Holland: Klumer Academic Publisher groups. (1991) 4. X.Y.Zeng and Y.S.Ding. An introduction to intelligent evaluation, Journal of Donghua University. 3, 1-4 (2004)
T H E SPECIFICITY OF N E U R A L N E T W O R K S IN E X T R A C T I N G RULES FROM DATA
MARTIN H O L E N A Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod voddrenskou vezi 2, CZ-18207 Praha 8, e-mail: martin@cs. cas. cz
The paper provides a survey of methods for logical rules extraction from data, and draws attention to the specificity of rules extraction by means of artificial neural networks. The importance of rules extraction from data for real-world applications was illustrated on a case study with EEG data, in which 5 rules extraction methods were used, including one ANN-based method.
1.
Introduction
The wide spectrum of nowadays existing data mining methods entails a wide spectrum of various formal reperesentations for expressing the extracted knowledge, e.g., association and decision rules, classification hierarchies, clusters, regression functions, probability networks. Most of those representations are specific for certain classes of methods, for example classification hierarchies for classification, clusters for cluster analysis, regression functions for linear and nonlinear regression. There is, however, one important exception to that general characterization - knowledge representation with sentences of some formal logic. Such sentences, usually called rules, are used to express the knowledge from data in many methods, and those methods rely on various principles. The main objective of this paper is to provide a short survey of rules extraction methods and to point out the specificity of methods based on artificial neural networks (ANNs). Five rules extraction methods, including one ANN-based method, are illustrated on a case study with EEG data.
2.
O v e r v i e w of R u l e s E x t r a c t i o n from D a t a
The probably earliest method aimed specifically at the extraction of logical rules from data is the General unary hypotheses automaton (Guha), developed in the seventies within the framework of observational logic, which is 209
a Boolean logic extended with generalized quantifiers . In Guha actually only sentences of the form (~ x)(ip(x),ip(x)) are extracted, in simplified notation tp ~ tp, with Boolean predicates tp and tp, and with a binary generalized quantifier ~ . Moreover, nearly all generalized quantifiers encountered in the existing implementations of Guha have been inspired by statistical estimation methods (e.g. —>c, the founded implication with threshold c £ (0,1]) and hypotheses testing methods (e.g. —>[., the likely implication with threshold c and significance level a G (0,1), or ~ £ , the Fisher quantifier with significance level a). The closest relatives of Guha are various methods extracting from data association rules, i.e., Boolean implications valid with at least a given confidence c and supported by at least a given proportion s of data 2 ' 3 . A closer look reveals that such an association rule is actually a Guha sentence ip -> c tp, provided s = ^f- 4 . For the extraction of association rules, there is no harm if the antecedents of two or more implications overlap. On the contrary, a substantial part of data typically corresponds to the antecedents of several association rules simultaneously. However, rulesets with overlapping antecedents are undesirable if used for classification, and especially if used for decision making. In such situations, other methods need to be employed, leading to sets of Boolean implications without overlapping antecedents. Rules from such sets are called decision rules. The most important representatives of methods for the extraction of decision rules from data are AQ 5 , CN2 6 , and a large group of methods known as decision trees 7 , s . Their name refers to the fact that the extracted rulesets have a hierarchical structure, due to which they can be easily visualized as tree graphs. Somewhere between association rules and decision rules are fuzzy decision rules, which are implications of some fuzzy logic. Although they too are used for decision making and classification, it is in general not possible to avoid overlapping antecedents in a fuzzy logic. The best known methods extracting fuzzy decision rules are ANFIS 9 and NEFCLASS 1 0 . Inductive logic programming (ILP) consists basically in constructing an intensional definition of a relation from tuples known to belong or not to belong to it, while other relations can be used as background knowledge in the induced definition 11>12. Rules extraction with genetic algorithms (GA) is nowadays the probably most deeply elaborated GA application in data mining 13 - 14 . In that application, GA are used to optimize some quantifiable property or weighted combination of quantifiable properties of the extracted ruleset, e.g., its ac-
211 curacy, completeness, or some measure of its novelty or interestingness.
3.
R u l e s E x t r a c t i o n w i t h Artificial N e u r a l N e t w o r k s
All the rules extraction methods mentioned so far share one important common feature - rules are obtained directly from the input data, the knowledge contained in them is immediately expressed with sentences of some formal logic, without using any additional knowledge representation. Nevertheless, this is not a universal feature of all rules extraction methods, it does not pertain to one important class of such methods - methods for the extraction of rules from data by means of artificial neural networks. Actually, already the mapping computed by the network incorporates knowledge transferred t o the ANN from the data used for training, knowledge about implications that certain values of network inputs have for the values of its outputs. That knowledge is captured in the ANN architecture, and especially in a multidimensional parameter vector, which together with the architecture uniquely determines the computed mapping. It is this distributed knowledge representation that accounts for the excellent approximation properties of multilayer perceptrons. For humans, however, it is not as easily comprehensible as logical rules. T h a t is why methods for rules extraction from trained neural networks and from the approximations they compute have been developed since the late eighties. Up to now, already several dozens such methods exist (cf. the survey papers 15 > 16 ). They differ in a number of aspects, the most important among which are: expressive power of the extracted rules (Boolean and fuzzy rules), relationship between rules and network architecture, computational complexity of the method, its universality both with respect to acceptable kinds of neural networks, and with respect to acceptable kinds of inputs, as well as accuracy, fidelity and completeness of the rules. Nevertheless, all those methods have the common feature that they employ not only those input-output pairs that have been employed already for network training, but also additional pairs, obtained through the mapping computed by the network. Some methods actually employ only pairs obtained through that mapping, and do not need the original training pairs any more. T h a t feature sometimes allows ANN-based methods to find rules that can not be found with other methods. Hence, the distributed knowledge representation by means of the network architecture and the parameter vector is always an intermediate knowledge representation in ANN-based methods.
4.
A Case S t u d y w i t h E E G D a t a
In collaboration of neurophysiologists and transportation scientists, a research into EEG signals corresponding to somnolence has been performed at the Czech Technical University Prague. Its ultimate objective is to provide an empirical knowledge base for a system automatically detecting impaired vigilance, a cause of severe traffic accidents. A survey of that research and a description of the collected data have been presented in 17 . During data preprocessing, Gabor spectral analysis has been performed for EEG signals measured in 35 healthy volunteers and corresponding always to three vigilance levels - full vigilance, mental activity, and somnolence caused by sleep deprivation. The knowledge about the specificity of individual kinds of the signals was primarily obtained through visual inspection of the EEG records and of the corresponding spectrograms by
Table 1. Example rules extracted from the EEG spectra of 2 electrodes O and T with the methods (i.)-(v.) above. In the table, the abbreviations Of and Tj with / being an integer between 1 and 14 stand for the value of the spectrum from the respective electrode for the frequency / Hz
rule Oio 6 (6,30) & Tj e (3, 7.5) & T3 e (2,4) -> vigilance OioG (6,30)&T! 6 (1,5) —> vigilance Oio e (6,30) & T6 € <0,1.5)
extracted
specializes a rule
with method
extracted with method
Guha ->o. 65i (u, _^F —*o.oi Guha -+b.65,o.i.
Guha -+o.9, AQ,
~>0.01 Guha -+£oi
—> vigilance Oio G (0.5,1.5) k Ou e (3,7.5) & Til £ (2,15) —> mental activity Oi2e(l,3)&T3e(2,3)
Guha ->b.65,o.i> F ^O.Ol Guha -»o.65,0.1
—» somnolence 0 i € (1.5,11.5) k02e
(2,13)&
0 3 6 (0.4,7.5) & O12 e (3,11.5) k Ti € (2,9.5) & T 3 e (0,4.5) k T n e (0.5,6.5) - • vigilance
ANN
CART Guha —*o.9, AQ Guha -» 0 .9, -*o.65,o.i> CN2 Guha -> 0 .9, CART Guha -» 0 .9, CN2
expert physiologists. In addition, to 14 frequencies 1-14 Hz of EEG spectra from two selected electrodes, also 5 particular methods for rules extraction from data were applied: (i) the method Guha, in particular the generalized quantifiers —>o.9, "^o.65,o.i a n d ^ a o i ' m t n e s o called LISP-Miner implementation by the Laboratory of Intelligent Systems and Programming of the University of Economy in Prague; (ii) the version AQ21 of the AQ method, in an implementation by the Machine Learning and Inference Laboratory of the George Mason University in Fairfax; (iii) the CN2 method, implemented anew for this study; (iv) the Classification and Regression Trees (CART), one of the classical decision trees methods, proposed 1984 by Breiman et al. 7 , in their implementation in the Matlab Statistics Toolbox; (v) a method for the extraction of Boolean rules from data by means of piecewise-linear neural networks 18 , in an implementation by the author, making use of the system Rebup by the Machine Learning Research Center of the Queensland University of Technology. Several example rules extracted by those methods from the EEG data are given in Table 1. To better illustrate how different or similar are rules extracted with the individual methods, the table lists not only the methods with which the rules have been extracted, but also methods with which their generalizations have been extracted. 5.
Conclusion
The paper recalled the importance of methods for the extraction of logical rules from d a t a in data mining. To this end, a survey of most common methods of that kind was given, and a case study with EEG data was briefly sketched, in which 5 rules extraction methods were used. Moreover, the specificity of methods based on artificial neural networks was pointed out. T h a t specificity consists in the fact that the mapping computed by the network is always inserted as an intermediate knowledge representation between the input data and the extracted rules. This sometimes allows those methods to find rules that can not be found with other rules extraction methods, as also the reported case study has shown. Acknowledgments This research has been supported by the Czech Ministry for Education grant ME701, "Building Neuroinformation Bases, and Extracting Knowledge from them", and by the Institutional Research Plan AVOZ10300504.
214
References 1. P. Hajek and T. Havranek. Mechanizing Hypothesis Formation. Springer Verlag, Berlin, 1978. 2. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast discovery of association rules. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307-328. AAAI Press, Menlo Park, 1996. 3. M.J. Zaki, S. Parathasarathy, M. Ogihara, and W. Li. New parallel algorithms for fast discovery of association rules. Data Mining and Knowledge Discovery, 1:343-373, 1997. 4. P. Hajek and M. Holena. Formal logics of discovery and hypothesis formation by machine. Theoretical Computer Science, 292:345-357, 2003. 5. R.S. Michalski and K.A. Kaufman. Learning patterns in noisy data. In G. Paliouras, V. Karkaletsis, and C.D. Spyropoulos, editors, Machine Learning and Its Applications. Lecture Notes in Computer Science 2049, pages 22-38. Springer Verlag, New York, 2001. 6. P. Clark and R. Boswell. Rule induction with cn2: Some recent improvements. In Y. Kodratoff, editor, Machine Learning - EWSL-91. Lecture Notes in Computer Science 482, pages 151-163. Springer Verlag, New York, 1991. 7. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984. 8. J. Quinlan. C4-5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco, 1992. 9. J.S.R. Jang and C.T. Sun. Neuro-fuzzy modeling and control. The Proceedings of the IEEE, 83:378-406, 1995. 10. D. Nauck. Fuzzy data analysis with NEFCLASS. International Journal of Approximate Reasoning, 32:103-130, 2002. 11. L. De Raedt. Interactive Theory Revision: An Inductive Logic Programming Approach. Academic Press, London, 1992. 12. S. Muggleton. Inductive Logic Programming. Academic Press, London, 1992. 13. A.A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer Verlag, Berlin, 2002. 14. M.L. Wong and K.S. Leung. Data Mining Using Grammar Based Genetic Programming and Applications. Kluwer Academic Publishers, Dordrecht, 2000. 15. A.B. Tickle, R. Andrews, M. Golea, and J. Diederich. The truth will come to light: Directions and challenges in extracting rules from trained artificial neural networks. IEEE Transactions on Neural Networks, 9:1057-1068, 1998. 16. S. Mitra and Y. Hayashi. Neuro-fuzzy rule generation: Survey in soft computing framework. IEEE Transactions on Neural Networks, 11:748-768, 2000. 17. J. Faber, M. Novak, P. Svoboda, and V. Tatarinov. Electrical brain wave analysis during hypnagogium. Neural Network World, 13:41-54, 2003. 18. M. Holena. Extraction of logical rules from data by means of piecewise-linear neural networks. In Proceedings of the 5th International Conference on Discovery Science, pages 192-205. Springer Verlag, Berlin, 2002.
STABLE NEURAL ARCHITECTURE OF DYNAMIC NEURAL UNITS WITH ADAPTIVE TIME DELAYS IVO BUKOVSKY, JIRI BILA, Department of Instrumentation and Control Engineering, Czech Technical University, Technicka 4, Prague, 166 07, Czech Republic MADAN M. GUPTA Intelligent System Research Laboratory, Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Drive Saskatoon SK, S7N 5A9, CANADA The paper introduces the concept of continuous-time dynamic neural units with adaptable input and state variable time delays (TmDNU - Time Delay Neural Units). Two types of TmDNUs are proposed as they introduce adaptable time delays either into the neural inputs or both the neural inputs and the neural unit state variable. Robust capabilities of TmDNU for time delay identification and approximation of linear systems with dynamics of higher orders is shown for standalone single-input TmDNUs with linear neural output function (somatic operation). A simple dynamic BackPropagation learning algorithm is shown for continuoustime adaptation of the time delay parameters. The units also represent elements for building novel artificial neural network architectures.
1. Introduction Through simple principles, this article introduces the concept and basic types of linear dynamic neural units with adaptive time delays which we call TmDNU Time-Delay Neural Units. Contrary to conventional Tapped-Delay Neural Networks (TDNN) [1] where time delays are implemented within inter-neuron inter-layer feedback connections, the concept of TmDNU consists in time delays implemented within the neural units themselves. Further, it will be shown that neural weights of TmDNU are well adaptable by a simple dynamic BackPropagation (for dynamic BP based upon classical minimization of quadratic error function (performance index) [1] [2]. Examples of implementation of two basic types of TmDNU as applied to system identification and to higher order system approximation will be shown. TmDNUs as standalone working neural units with linear synaptic operation
215
216 (activating function) represent simple adaptive mechanisms for linear approximation (identification) of dynamic systems which optionally include time delays, and thus can also serve for identification of time delays instead of seeking for Pade approximation. When used in a network mode, TmDNUs, especially TmDNU-Type 2 constitutes a novel type of artificial dynamic neural networks (DNN) that could be called Time-Delay Neural Networks (TmDNN). In particular, a "super stable" designs of internal neural architectures for both TmDNU - Type 1 and TmDNU Type 2, are proposed as they prevent neural unit from converging toward values resulting in instability of standalone TmDNUs and learning algorithm.
ith TmDNU- Type 2
"l(0 ^2l\ •
Td=™ii
f(V,X0,W,)
dx(i) dt »
T
d=™$2
u„(t) ',
T
2
Xi=M>A2
Figure 1: 1 Linear Time-Delay Neural Unit - (Type 1 for W4=0), (Type 2 for Wa, ^ 0) with adaptable time delays represented by weights W3J on its input and W4 in state variable x. For simplicity, the letter /' indexing the neural unit instance is omitted as for indexing the neural parameters on the picture.
2. Dynamic Neural Units with Time Delays: TmDNU, and TmDNU2 The nature of the (dynamic) Time-Delay Neural Units (TmDNU) originates from linear dynamic neural units (Pineda, Hopfield, Cohen-Grossberg) [1] [3] [4]. TmDNU can be viewed as an adaptive mechanism capable of approximation of a dynamic system in form of a linear differential equation. The analogy with a differential equation indicates that we are going to deal with continuous dynamic neural units (DNUs), which are working in continuous-time where the fastest sampling period of a whole neural architecture is practically determined by the capabilities of a particularly used numerical method. We will classify TmDNUs into
217 two major types. Namely, TmDNU-Type 1 (w4=0) [2] and TmDNU-Type 2 (H>4^0) (Figure 1) will be introduced in this work. In case of TmDNUs, their architecture corresponds to the structure of the first order differential equation with time delayed input and optionally with time-delayed state variable. dx{t), -^-(
2 w
/
2 v-> + 7 min) + * ( ' - w 4 )=ZuW2j
2 ' » / ' ~W3j
)
(J)
Let's consider modification of single input DNU [1] given in Figure 1 where, w\, wi, wy (J~-/•••"), and w4 are neural weights, n is the number of neural inputs, the square root of w^ represents the time delay of/* input the square root of w4 represents the time delay of internal state variable x(t), constant x0 is a minimum time constant of the unit (analogy to neural bias), u(t) is vector of neural inputs, )( ) is somatic (transfer) function, y{f) is the neural output, and s is the Laplace operator. To assure the maximum (super) stability of TmDNU, function / ( ) from Figure 1 has been designed as follows f(y,wux0)
= v— w,
(2) +x0
The somatic operation ()){) of a neuron will be kept as linear. For in-this-way designed dynamic neural architecture (Figure 1), the neural bias X0 represents the minimal time constant, so it can be assigned *0^*min>0-
(3)
Researchers dealing with control engineering applications may sometimes deal with linear dynamic systems containing not only the input delays but also time-delayed state variables, where the introduction of time delay, here denoted as w42, into state variable results in the increase of capability to better approximate higher order dynamic systems, and thus it might potentially result in more robust dynamic neural networks with even less number of neural parameters and less complicated structure suitable for a given problem. We propose the above introduced dynamic structure Eq. (1) to be the basis of the both types of Time Delayed Neural Units that we also denote as TmDNU, (w4=0) [2] and TmDNU2 (w 4 #)).
218 3.
Dynamic BackPropagation for Time-Delay Neural Units
Dynamic neural units with time delays (TmDNU - Time-Delay Neural Unit) may be adapted by very simple dynamic version of the BackPropagation (BP) learning algorithm based on common gradient minimization of performance index (error function) [1]. The well known principle of this supervised learning and the weight adaptation is recalled in Eq. (4) (5) Eq. (5) then represents a simple dynamic version of BP which can be used to adapt also the neural weights representing time delays both on inputs and delay of the internal state variable of the proposed TmDNUs. + Aw,(*)
aw,
(4)
d0(QL-i\dGTmDNU(s)u^
•He(i)-
dx(t)
{
(5)
dwj
where jU a n d /J denotes learning rates related as (1 = 2 JU , u{t) denotes neural (and system) inputs, x(t) is state variable of the neural unit, y(t) is neural output and yr(i) is real system output, <J){) is neural output function (somatic operation of a neuron), w,- are neural weights, e(t) is an actual output error, GTmDNu (?) is the transfer function of a dynamic system represented by the internal dynamic structure of T m D N U , S is the Laplace operator, L~ denotes the inverse Laplace transformation, and t is the continuous parameter of time. In case of T m D N U (Figure 1) with a single input u{t), the neural output can be expressed using continuous-time transfer function as
y(Q =
r-i 2 (s) • U(s)}) = L
w2 -e 2 ,~
-{w,)2-s •U(s)
\ „ . o-K)2'*
(6)
where e is the Euler's number and
<j>{x)=x
=fle(t)L
A
-
— — • Y(s)
(7)
219 The realization of Eq. (7) is then depicted in Figure 2 bellow.
REAL SYSTEM «(/) TmDNU, 2w4y(t-w4
) 2w 4
&
(wl
s
+*min)
w/t/V Figure 2: The mechanism for generating the neural weight increment AW4 of neural weight W4 which represents the adaptable time-delay of TmDNU-Type 2 (Figure 1). Besides the purposely designed stable internal dynamic structure of both TmDNU) and T111DNU2, no special measures about stability of the units had to be taken to assure the stability of learning algorithm except the appropriate choice of learning rate and initial conditions. The choice of initial neural weights for TmDNU-Type 2, Eq. (1), should follow the condition
w4 w
\
2
71
+*min * 2 -
(8)
which is the stability condition for such a class of dynamic systems [5]. Recall, the TmDNUs are focused as standalone neural units in this work, and only the simplest and pure learning rule (the dynamic BackPropagation) is shown.
220
4. Approximating Capabilities of Time Delay Neural Units- Type 2 (TmDNU2) TmDNU-Type 2 for Approximation - Adaptation
— y(t) is neural output from TmDNU — yr(t)\s output from identified plant u(t) is input signal into the plant and TmDNU
U(t) -0.5
Figure 3: Detail of adaptation of TmDNU-Type 2 with linear output function ^>( ) . The neural unit performs approximation of a dynamic system (9) with time-delayed both input and state variable.
As an example, the application of TmDNU demonstrates its capabilities to approximate time-delayed systems and higher order dynamic systems by a simple dynamic system which both types of TmDNU represent. In this experimental part, the 'real' system-to-be-approximated is chosen as a linear plant of 10th order with the transfer function G(*) =
(2s+ 1)10
(9)
where S is the Laplace operator. In fact, the system (9) has been selected for it can be well approximated by system (1) in both time andfrequencydomain [5]. 5. Conclusions Two types of linear (continuous-time) Time Delay Neural Units including adaptable time delays have been proposed in this work. The units are denoted as TmDNU - Type 1 respectively Type 2 (or just TmDNU, respectively TmDNU2). Their standalone single-input modifications with the simplified linear neural output
221 (somatic) operation have been focused in this work in order to demonstrate their capabilities to identify time delays in dynamic systems or to approximate dynamic systems with dynamics of higher orders. TmDNU-Type 1 has been capable of both identifying time delays within linear plants or approximating higher order systems through input delays and can be also useful in applications where Pade approximation would have to be used otherwise. TmDNU-Type 2 has been introduced as an extension of the dynamic structure of TmDNUi where another adaptable time delay is introduced into the state variable of TmDNU; therefore, the approximation capabilities of TmDNU are enhanced. TmDNU-Type 2 for Approximation - Neural Weight Convergence 1
4
4
3
co
2 CM"
5
1 o>"
/ ^
1
i
, |
r
w1 ] i
w3~i
1
! w4| initial weights: w10:=w30 =w40=3 ! i I w2l j
J
1
f
1
i
f
i
1
> e... output error
J
1 1 i
|
|
200
300
400
—>-—i,
0
100
i .-j
500 time [sec]
Figure 4: Example of convergence of neural weights of TmDNU-Type 2 for approximation of system of 10* order (9). Neural weights wj and wt represent the continually adaptable parameters of time-delays
The disadvantage of TmDNUs for identification of time delays and system approximation can be seen in (relatively) longer time of adaptation which can be, however, significantly reduced by the choice of another set of initial conditions or by more sophisticated methods. In parallel, the problem of weight convergence toward local minima of error function is reduced due to the naturally robust approximating capability of TmDNU-Type 2 and can be further eliminated by the choice of another set of initial weights for adaptation. According to our experiments, the basin of initial neural weights for which the units converge to a minimum, which
222 provides the approximation with a sufficient degree of accuracy, is practically large enough. One of the advantages of TmDNUs can be seen in the capability to identify time delays within linear dynamic systems, robust capability to approximate linear dynamic systems with higher order dynamics (e.g. the approximation of 10th order is shown) in continuous-time domain. Further advantages are the simplicity of the learning algorithm, algorithm extendibility to neural units with higher order dynamics including more time delays, and excellent stability during adaptation due to the proposed design of the novel super stable-structure (1) of an artificial dynamic neuron. The novelty can also be seen, contrary to common artificial neural networks handling time delays in discrete time domain, in turning to continuous parameter of time, thus to continuous adaptation of time delays within artificial neural architectures, i.e. to adaptation of continuous transfer functions of dynamic neural units with time delays. Further, a possible implementation of multiple-input TmDNUs with nonlinear somatic operation 0 ( ) (Figure 1) into networks represents novel directions in continuous-time artificial neural networks. References 1. M.M. Gupta, Jin L., and N. Homma : Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory, IEEE Press and Wiley-Interscience, published by John Wiley & Sons, Inc., 2003, ISBN 0471-21948-7 2. Bukovsky, I., Bila, J., Gupta, M , M : Linear Dynamic Neural Units with Time Delay for Identification and Control (in Czech), In: Automation, Vol. 48, No. 10, Prague, Czech Republic, Oct 2005, p. 628-635. ISSN 0005125X 3.
J. Hopfield: Neural Networks and Physical Systems with Emergent Collective Computational Abilities, Proc. Nat. Sci. USA, Vol. 79, pp.25542558 4. F. J. Pineda: Dynamics and Architecture for Neural Computation, J. Complexity, Vol.4, pp.216-245, Sept. 1988 5. Zitek, P. - Vyhlidal, T.: Low Order Time Delay Approximation of Conventional Linear Model. In: 4th MATHMOD Vienna Proc, Vienna 2003, p. 197-204
EVALUATION CHARACTERISTICS FOR MULTILAYER PERCEPTRONS A N D TAKAGI SUGENO MODELS
WOLFGANG KAESTNER, T O M FOERSTER, CORNELIA LINTOW, RAINER HAMPEL Institute
University of Applied Sciences Zittau/Gorlitz of Process Technology, Automation and Measuring Technology Theodor Koerner-Allee 16, 02763 Zittau, Germany E-mail: [email protected]
(IPMj
In this contribution a comparative analysis of the evaluation for the two Soft Computing methods Multilayer Perceptron (MLP) and Takagi Sugeno model (TSM) will be described. Above all the developed characteristics of the linear connection between the input values and output values are to be compared with regard to the model quality. In addition, a correlation with the characteristic of the linear relationship of two characteristics of the process data is derived.
1. Introduction Multilayer Perceptrons [4] and Takagi Sugeno models [3] are well suited to approximate nonlinear process connections. In particular, this concerns processes, which can be described by analytic methods only insufficiently or not at all, since the analytical relationship between input values and output values is not or only poorly known. If an MLP or TS model has been established then it may be hard to describe the quality of such models. A possibility consists in evaluating various error characteristics. Here, the measuring range and stability problems are to be taken into account. Thus relatively small errors around zero can cause a large relative error leading to wrong conclusions. In addition, error values permit only local information about the model's accuracy in each sample point. For mean errors their distribution should be known in order to get a sound interpretation. Hence, for making statements about the global quality of a model several error characteristics should be consired. To do so, evaluation characteristics correlation weights tyxy [1] for the MLP and TS correlations Kxy 2 for the TSM were developed. They indicate
223
the linear connection between individual characteristics of the inputs and outputs of the models (like the empirical correlation coefficient rxy does for databases). Under certain boundary conditions (see section 3), the two evaluation characteristics of the models are directly comparable with the empirical correlation coefficient rxy. 2. Evaluation m e t h o d s 2.1. Analysis
by means
of internal
characteristics
Here we want to show that the global connection between the characteristics of the inputs X_i and the outputs Y_j is mirrored in the weight distribution of the MLP and in the coefficients of the TSM, respectively, thus enabling further analysis. To this aim, characteristics had to be derived in certain analogy to classical data analysis. Finally, we want to draw global conclusions about the linear behavior of the models in the state space in order to get a comparison with the linear connection of the databases. The Pearson correlation coefficient r_XY may serve as such a (dimensionless) measure, ranging in [-1, 1]. Note, however, that r_XY = 0 does not imply independence (the relationship may be strongly nonlinear). For two quadratically integrable variables X and Y the correlation coefficient is given by

r_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)} \cdot \sqrt{\mathrm{Var}(Y)}} .   (1)
If only two series of measurements x_1, x_2, ..., x_n and y_1, y_2, ..., y_n are known, the empirical correlation coefficient is computed by

r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \cdot \sqrt{\sum_i (y_i - \bar{y})^2}} ,   (2)

with

\bar{x} = \frac{1}{n} \sum_i x_i   (3)      and      \bar{y} = \frac{1}{n} \sum_i y_i ,   (4)

the latter being the empirical means of X and Y.
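As an illustration of Eqs. (2)-(4), the following minimal Python sketch (our own illustration, not part of the original paper; the function and variable names are assumptions) computes the empirical correlation coefficient of two measurement series.

# Minimal sketch: empirical correlation coefficient r_xy of Eqs. (2)-(4).
def empirical_correlation(x, y):
    n = len(x)
    x_bar = sum(x) / n                      # Eq. (3)
    y_bar = sum(y) / n                      # Eq. (4)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sx = sum((xi - x_bar) ** 2 for xi in x) ** 0.5
    sy = sum((yi - y_bar) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)                  # Eq. (2)

# Example: a strongly linear relation gives r close to 1.
print(empirical_correlation([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]))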
2.1.1. Weight analysis of the MLP

The connection between process inputs and output values is stored in the architecture of the (trained) MLP by the individual weights w_ij (weight distribution).
Figure 1. Context of error and process characteristics

In Fig. 1 the connection between the weight distribution and the relation of the process characteristics given by the data, as well as their evaluation, is schematically represented. It can be seen that the error characteristics only refer to the outputs, whereas the correlation characteristics describe the process relationships between data and model. That is, the weight correlation provides information about the behavior of the model in the state space. To analyse the weight distribution, so-called weight factors have been introduced, whereby the following simplifications were made:
• Decoupling of the net structure
• Linearization of the transfer function
The former is applied in order to enable an individual analysis of the connection between input X_i and output value Y_j. The network architecture is therefore divided into i x j subnetworks.
Figure 2. Uncoupling of the net by the example of a two-layered MLP
Consider, for example, a Multilayer Perceptron with two layers of trainable weights, one input neuron X, one output neuron Y and two hidden neurons a and b. Hence, we get four connecting weights w_ij, denoted by w_Xa, w_Xb, w_aY and w_bY. The hyperbolic tangent is used as transfer function. The bias values are denoted by B_i. The output of the trained net is calculated through

Y = \tanh(w_{aY} \cdot H_a + w_{bY} \cdot H_b + B_Y)   (5)

with

H_a = \tanh(X \cdot w_{Xa} + B_a) ,   (6)
H_b = \tanh(X \cdot w_{Xb} + B_b) .   (7)
The linearization of the transfer function leads to simple (linear) connections between the input and the output value. For small arguments the transfer function can be linearized by tanh(z) ≈ z. Hence, from (5)-(7) we obtain

Y = w_{aY} (X \cdot w_{Xa} + B_a) + w_{bY} (X \cdot w_{Xb} + B_b) + B_Y ,   (8)

Y = X (w_{Xa} \cdot w_{aY} + w_{Xb} \cdot w_{bY}) + B_a \cdot w_{aY} + B_b \cdot w_{bY} + B_Y  (bias portion) .   (9)

From (9) one can see that the equation consists of a linear part (with respect to the input X) containing the weights w_ij, and a bias part.
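To make the linearization step concrete, the following Python sketch (our own illustration with hypothetical weight values, not taken from the paper) evaluates the small MLP of Eqs. (5)-(7) and its linearized form (9), showing that the two nearly coincide for small inputs.

import math

# Hypothetical trained weights and biases of the 1-2-1 MLP from Eqs. (5)-(7).
w_Xa, w_Xb, w_aY, w_bY = 0.10, -0.20, 0.30, 0.15
B_a, B_b, B_Y = 0.01, -0.02, 0.05

def mlp_output(x):
    """Exact output, Eqs. (5)-(7)."""
    h_a = math.tanh(x * w_Xa + B_a)
    h_b = math.tanh(x * w_Xb + B_b)
    return math.tanh(w_aY * h_a + w_bY * h_b + B_Y)

def linearized_output(x):
    """Linearized output, Eq. (9): linear part in x plus bias portion."""
    slope = w_Xa * w_aY + w_Xb * w_bY
    bias = B_a * w_aY + B_b * w_bY + B_Y
    return slope * x + bias

for x in (-0.1, 0.0, 0.1):
    print(x, mlp_output(x), linearized_output(x))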
The weight measure G_{X_iY_j} indicates the linear connection between input X_i and output Y_j. For an MLP of arbitrary size it holds

G_{X_iY_j} = \sum_{\vec{w}_{X_iY_j}} \; \prod_{w_{kl} \in \vec{w}_{X_iY_j}} w_{kl} ,   (10)

that is, the individual weights w_kl along a vector path \vec{w}_{X_iY_j} between the input X_i and the output Y_j are multiplied and the products are summed up afterwards. There are a_1 · a_2 · ... · a_l vector paths \vec{w}, with a_1 neurons in the first hidden layer, a_2 in the second hidden layer, ..., a_l neurons in hidden layer l. In general, for o inputs and p outputs one gets o · p weight measures G. They are independent of the starting initialization and of training repetitions. For any training repetition we obtained nearly identical weight measures (for similar or identical net quality) even for different weight distributions w_ij. The transfer functions, however, may not vary (otherwise only the signs of the w_ij are preserved). In order to compare or to generalize the weight measures for a process (with various MLP of different architectures) a normalisation is necessary. For this the correlation weight Ψ_{X_iY_j} has been introduced [1]:

Ψ_{X_iY_j} = \frac{G_{X_iY_j}}{\sqrt{\sum_{k=1}^{i} G_{X_kY_j}^2}} .   (11)
A further characteristic is given by the weight measure

ξ_{X_iY_j} = Ψ_{X_iY_j}^2 .   (12)

Obviously,

\sum_{k=1}^{i} ξ_{X_kY_j} = 1 .   (13)
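The following Python sketch (our own, not from the paper) computes the weight measures G, the correlation weights Ψ and the normalized measures ξ of Eqs. (10)-(13) for a feedforward MLP given as a list of weight matrices; the layer layout and the names are assumptions made for illustration.

import numpy as np

def weight_measures(weight_matrices):
    """weight_matrices[l][k, m]: weight from neuron k of layer l to neuron m of layer l+1.
    Returns G (inputs x outputs), Psi and Xi as in Eqs. (10)-(13)."""
    # Summing the products of weights over all paths from input i to output j
    # is exactly the chained matrix product of the weight matrices (Eq. (10)).
    G = weight_matrices[0]
    for W in weight_matrices[1:]:
        G = G @ W
    # Normalisation over all inputs feeding one output (Eq. (11)).
    Psi = G / np.sqrt((G ** 2).sum(axis=0, keepdims=True))
    Xi = Psi ** 2                              # Eq. (12); each column sums to 1 (Eq. (13))
    return G, Psi, Xi

# Example: one input, two hidden neurons, one output (the net of Eqs. (5)-(7)).
W1 = np.array([[0.10, -0.20]])                 # input -> hidden
W2 = np.array([[0.30], [0.15]])                # hidden -> output
print(weight_measures([W1, W2]))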
2.1.2. Cluster analysis of the TSM

For the computation of the Takagi Sugeno model, the clustering procedure based on [3] was used. The cluster algorithm consists of structure identification and parameter identification. In the structure identification the database is classified into the clusters c. In the parameter identification the parameters of the fuzzy rules (one rule per cluster) are determined.
Now we perform the input-output analysis for the TSM in a similar way as for the MLP. In order to regard the connection between the input value X_i and the output value Y_j, the structure of the model is divided here also into i x j "subnetworks". Figure 3 shows the uncoupling for i inputs, j = 1 output value and the corresponding number of clusters c. The uncoupling supplies i subnetworks, which show the connection between the respective input X_i and the output value Y_j.
Figure 3. Uncoupling of the structure of a TSM
The subnetworks can then be described as follows:

Y_j(X_i) = (b_{1i} \cdot X_i + b_{10}) + ... + (b_{ci} \cdot X_i + b_{c0}) .   (14)

Simplified, one can write for (14):

Y_j(X_i) = F_{ij} \cdot X_i + B_j   (15)

with

F_{ij} = b_{1i} + b_{2i} + ... + b_{ci} = \sum_{m=1}^{c} b_{mi} ,   (16)

B_j = b_{10} + b_{20} + ... + b_{c0} = \sum_{m=1}^{c} b_{m0} .   (17)

The F_{ij} in equation (15) are the slope factors over all clusters; they are determined by the coefficients b_{mi} of the clusters c for X_i.
The value B_j summarizes the absolute terms of the linear functions of the individual clusters. For the evaluation the TS correlation K_XY was introduced, which is built from the individual parameters of the clusters. It is computed as

K_{ij} = \frac{F_{ij}}{\sqrt{\sum_{k=1}^{i} F_{kj}^2}} .   (18)

The TS correlation is also a standardized characteristic, which represents the linear connection between X_i and Y_j within the range [-1, 1]. As a further characteristic the TS measure M_{ij} is introduced, which is computed by squaring K_{ij}:

M_{ij} = K_{ij}^2 .   (19)

Again,

\sum_{k=1}^{i} M_{kj} = 1 .   (20)
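Analogously to the MLP case, the TS characteristics of Eqs. (14)-(20) can be computed directly from the consequent coefficients of the clusters. The following Python sketch is our own illustration; the coefficient array layout is an assumption.

import numpy as np

def ts_characteristics(b):
    """b[m, i]: consequent coefficient of cluster m for input X_i,
    with column 0 holding the absolute terms b_m0 (c clusters, i inputs)."""
    B_j = b[:, 0].sum()                     # Eq. (17): sum of the absolute terms
    F = b[:, 1:].sum(axis=0)                # Eq. (16): slope factor per input
    K = F / np.sqrt((F ** 2).sum())         # Eq. (18): TS correlation per input
    M = K ** 2                              # Eq. (19); sums to 1 (Eq. (20))
    return F, B_j, K, M

# Example: two clusters, two inputs (hypothetical coefficients).
b = np.array([[0.5, 1.2, -0.3],
              [0.1, 0.8, -0.1]])
print(ts_characteristics(b))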
3. Comparative analysis

In the Introduction, boundary conditions were mentioned which should be fulfilled for a reasonable comparison of the characteristics r_XY, K_XY and Ψ_XY grasping the linear relationship between inputs and outputs. These conditions shall be discussed in more detail.
(1) As described above, the empirical correlation coefficient r_XY is a measure for the linearity between two series (e.g. of measurements) (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). For the evaluation of multiple stochastic dependencies, partial and multiple correlation coefficients are applied. While the former accounts for each pair of series (neglecting the others), the latter evaluates the relationship of one series with respect to all others. The empirical correlation coefficient, however, ignores these relationships, which of course may be erroneous. On the other hand, it is not possible to determine the influence of individual characteristics on the weight matrices of an MLP or the clusters of a TSM. Thus r_XY is used.
(2) For the computation of r_XY one usually assumes that the given k-dimensional random vector has a k-dimensional normal distribution. However, this may be unrealistic in applications; in such cases r_XY may be unreliable. Contrariwise, for MLP or TSM the distribution
is of no meaning with respect to the state space mapping. Recall that the models interpolate between the given nodes. This takes place also for non-normally distributed databases.

4. Conclusion

The derived characteristics Ψ_XY and K_XY are suitable measures for the linear relationships between the inputs X and outputs Y of MLP and TSM. They possess the same expressiveness as the empirical correlation coefficient r_XY determined from a database. The concordance of these characteristics shows that the interrelationship of the data can be reproduced by the weights w_ij of the MLP and by the coefficients of the TS models, respectively.

References
1. T. Foerster, W. Kaestner: Analyse von Gewichtsstrukturen in Multilayer Perzeptren. Technical Report, IPM, 2003.
2. C. Lintow: Modellierung/Simulation mittels Soft Computing Methoden. Diploma Thesis, IPM, Zittau, 2005.
3. C. Wong, C. Chen: A Clustering-Based Method for Fuzzy Modeling. Tamkang University, Taipei, Taiwan, 1999.
4. A. Zell: Simulation Neuronaler Netze. Addison-Wesley Publishing Company, Bonn, 1994.
RESEARCH ON IMPROVED MULTI-OBJECTIVE PARTICLE SWARM OPTIMIZATION ALGORITHMS

DUO ZHAO*
School of Electrical Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P.R. China

WEIDONG JIN
School of Electrical Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P.R. China
As a novel multi-objective optimization technique, multi-objective particle swarm optimization (MOPSO) has gained much attention and some applications during the past decade. In order to enhance the performance of MOPSO with respect to the diversity and the convergence of the solutions, this paper introduces new methods to update the personal guide and to select the global guide for each swarm member from the particle set and the Pareto front set. In order to validate the proposed method, simulation results and comparisons with several multi-objective evolutionary algorithms and a MOPSO-based algorithm, which are representative of the state-of-the-art in this area, are presented. The article concludes with a discussion of the obtained results as well as ideas for further research.
1. Introduction

Since its inception in 1995, particle swarm optimization (PSO), which mimics the social behavior of a flock of birds or a school of fish in order to guide a swarm of particles towards promising solutions, has gained rapid popularity as a technique for solving single-objective optimization problems [1,2]. In the past, evolutionary algorithms (EAs) have become established as the method of choice for multi-objective optimization problems (MOP) [3]. Recently, researchers have paid more and more attention to PSO for solving MOP. Several multi-objective particle swarm optimization (MOPSO) algorithms have been proposed in the last few years [4],[6],[8]. The MOPSO methods have the property that particles move towards the Pareto-optimal front during the generations. Some of these works mainly focus on the design of novel selection or archiving
mechanisms [7]. But when considering the diversity and the convergence of the solutions, some redefinitions and new methods are needed when extending PSO to MOPSO. For instance, in MOPSO the Pareto-optimal solutions should be used as the global guide for each particle of the swarm [8], and each particle should have a personal guide which represents the best position found by itself. The selection of the global guide and the updating of the personal guide have a great impact on the performance of MOPSO. Thus how to choose the global guide from the Pareto front and how to update the personal guide is a key problem for MOPSO. Several main approaches to maintain the diversity of optimal solutions for MOPSO have been reported: the ε-dominance method [9], the Sigma method [6], the Subswarms method [8], and the Stripes method [4]. In this paper, we present a new method for global guide selection and personal guide updating to maintain the diversity and convergence of MOPSO.

* Work partially supported by grant 60572143 of the China National Science Foundation and by grant 2005A13 of the Southwest Jiaotong University Science Foundation.

2. Multi-objective Optimization

The multi-objective optimization problem can be expressed as:

Minimize  F(x) = [f_1(x), f_2(x), ..., f_m(x)]^T
s.t.      g_j(x) ≥ 0,  j = 1, 2, ..., p,
          h_k(x) = 0,  k = 1, 2, ..., q,   (1)

where m is the number of conflicting objective functions f_i: R^n → R that we want to optimize simultaneously, and x = [x_1, x_2, ..., x_n]^T ∈ X ⊆ R^n is the decision vector belonging to the feasible region X, which is formed by the p inequality and q equality constraint functions.

Definition 1: A decision vector x' ∈ X is Pareto-optimal iff for every x ∈ X: either f_i(x) = f_i(x'), ∀ i ∈ {1, 2, ..., m}, or f_i(x) > f_i(x') for at least one i ∈ {1, 2, ..., m}.

Definition 2: A decision vector x_1 ∈ X is said to dominate x_2 ∈ X (denoted x_1 ≺ x_2) iff:
• x_1 is not worse than x_2, i.e. f_i(x_1) ≤ f_i(x_2) ∀ i ∈ {1, 2, ..., m}, and
• x_1 is strictly better than x_2 in at least one objective, i.e. f_i(x_1) < f_i(x_2) for some i ∈ {1, 2, ..., m}.

Definition 3: The Pareto-optimal set P* is defined as:

P* = {x ∈ X | ¬∃ x' ∈ X, F(x') ≺ F(x)} .   (2)

Definition 4: The Pareto front PF* is defined as:

PF* = {F(x) = [f_1(x), f_2(x), ..., f_m(x)] | x ∈ P*} .   (3)
In a practical problem it is impossible to obtain an explicit description of the line or surface forming the Pareto front. In order to approximate the Pareto front, the usual way is to compute the nondominated points, which together form the front.
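For illustration, a minimal Python sketch of Definition 2 and of the nondominated filtering used to build the Pareto front follows (our own code, assuming minimization of all objectives).

def dominates(f1, f2):
    """True if objective vector f1 dominates f2 (Definition 2, minimization)."""
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))

def nondominated(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Example in a 2-objective space.
pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(nondominated(pts))   # (3.0, 4.0) is dominated by (2.0, 3.0)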
3. Related work

PSO and MOPSO move a swarm of n-dimensional particles through the problem space, in search of the single optimal solution of a single-objective optimization problem or of the Pareto front of a multi-objective optimization problem. Each particle has its own velocity, a memory of the best position it has obtained so far, called personal guide or personal best position (pBest), and the knowledge of the best position achieved by all particles of the swarm, referred to as global guide or global best position (gBest). In PSO, each particle adjusts its velocity for the next position according to its previous velocity, pBest and gBest:

v_{i,t+1}^j = ω v_{i,t}^j + c_1 r_1 (pBest_{i,t}^j − x_{i,t}^j) + c_2 r_2 (gBest_{i,t} − x_{i,t}^j) .   (4)

And the position of the particle is updated as:

x_{i,t+1}^j = x_{i,t}^j + v_{i,t+1}^j ,   (5)

where i = 1, 2, ..., n, j is the index of the particle in the swarm, ω is the inertia weight of the particle, c_1 and c_2 are two positive constants, r_1, r_2 ∈ [0,1] are random values, and t denotes the generation index. Equations (4) and (5) can also be applied to MOPSO, but the definition of the global guide (gBest) has to be revised. In MOPSO, the global guide is no longer a single point but a set of non-dominated solutions. Therefore, the global guide must be selected from the archive, an external population in which the updated set of non-dominated solutions is kept. The personal guide (pBest) of particle j is a memory that keeps the result of comparing the last non-dominated position pBest_t with the new position x_{t+1}. How to select the global guide from the archive and how to perform the comparison for the personal guide have a great impact on the convergence and the diversity of the solutions. This has been studied in [4], [6], [8], [9].
• Laumanns's ε-dominance method [9]: The main principle is the relaxation of dominance. The main reason for using ε-dominance is to keep a certain number of particles in the archive, because the size of the archive depends on the ε value, which reduces the computation time.
• Mostaghim and Teich's Sigma method and Subswarms method [6], [8]: Both concepts are based on clustering; depending on the elite particles in the archive, clusters are created around the elites and thus cover the Pareto front.
• Villalobos-Arias's Stripes method [4]: The main mechanism is based on stripes applied in the objective function space. From the minimum values of each objective function (2 objectives), a line similar to the Pareto-optimal front is formed.
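A compact Python sketch of the canonical update (4)-(5) for one particle is given below (our own illustration; the parameter values are arbitrary).

import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    """One velocity/position update per Eqs. (4)-(5) for an n-dimensional particle."""
    new_v, new_x = [], []
    for i in range(len(x)):
        r1, r2 = random.random(), random.random()
        vi = w * v[i] + c1 * r1 * (p_best[i] - x[i]) + c2 * r2 * (g_best[i] - x[i])
        new_v.append(vi)
        new_x.append(x[i] + vi)
    return new_x, new_v

x, v = [0.5, 0.5], [0.0, 0.0]
x, v = pso_step(x, v, p_best=[0.4, 0.6], g_best=[0.1, 0.9])
print(x, v)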
4. Description of our proposal

We adopt an elitism scheme with an archive that contains the historical nondominated solutions the particles have found so far. The main idea of our method is to add global information to the MOPSO. The global information is the vector of average objective function values (denoted as G_t), calculated over the nondominated solutions in the archive at generation t as follows:

G_t = [g_{1,t}, ..., g_{m,t}] = \left[ \frac{1}{N} \sum_{j=1}^{N} f_1(x_j), \; ..., \; \frac{1}{N} \sum_{j=1}^{N} f_m(x_j) \right] ,   (6)

where j ∈ {1, 2, ..., N} is the index of the archive member.

4.1. Global Guide Selection

We consider the position of each solution in the archive or in the swarm in the two-dimensional objective space. In the Delta method, an angle δ_{A_j} is calculated from each archive member's
objective function values (f_{1,j}, f_{2,j}) and the global information (g_1, g_2) as follows:
δ_{A_j} = \tan^{-1}\!\left( \frac{f_{2,j} - g_2}{f_{1,j} - g_1} \right) ,   (7)

δ_{P_i} = \tan^{-1}\!\left( \frac{f_{2,i} - g_2}{f_{1,i} - g_1} \right) .   (8)
The angle δ_{P_i} of each particle in the swarm is calculated via Equation (8), where i ∈ {1, 2, ..., M} is the index of the particle in the swarm. Figure 1 shows the core idea of finding the global guide among the archive members for each particle in the swarm. First, calculate the angle δ_{A_j}, j ∈ {1, 2, ..., N}, for each member j in the archive. Second, for each particle i in the swarm calculate the differences between the δ_{A_j} and δ_{P_i}, ∀ i ∈ {1, 2, ..., M}. Then, the member j in the archive whose δ_{A_j} has the minimum difference to δ_{P_i} is selected as the global guide for particle i. In the figure it can be seen that the particle distribution in the search space is divided into two main areas, the left-bottom corner and the upper-right area. According to the Delta method, particles in the left-bottom corner are assigned a global guide A_k from the archive members, where |δ_{A_k} − δ_{P_i}| ≤ |δ_{A_j} − δ_{P_i}|, (k ≠ j). This drives the particles directly towards the archive members, in order to form the Pareto-optimal front. Particles in the right-upper area, because of their distribution, are assigned one of the two archive members with maximum or minimum angle. In this way, it is ensured that the particles in the right-upper area are able to explore the ends of the Pareto-optimal front and to overcome the drawbacks of the Sigma method.
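The following Python sketch is our own reading of the Delta selection rule, using the angle form reconstructed in Eqs. (7)-(8) (atan2 is used instead of the arctangent of the quotient for numerical robustness); it is an illustration, not the authors' reference implementation.

import math

def delta_angle(f, g):
    """Angle of a 2-objective point f = (f1, f2) relative to the archive mean g = (g1, g2)."""
    return math.atan2(f[1] - g[1], f[0] - g[0])

def select_global_guide(particle_f, archive_f):
    """Pick the archive member whose angle is closest to the particle's angle (Section 4.1)."""
    N = len(archive_f)
    g = (sum(a[0] for a in archive_f) / N, sum(a[1] for a in archive_f) / N)  # Eq. (6)
    dp = delta_angle(particle_f, g)
    return min(range(N), key=lambda j: abs(delta_angle(archive_f[j], g) - dp))

archive = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
print(select_global_guide((0.6, 0.7), archive))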
Figure 1. Global Guide Selection
Figure 2. Personal Guide Updating
4.2. Personal Guide Updating

The personal guide P_{i,t} of each particle i is a memory which stores the best position that the particle has found so far at generation t. When the particle gains a new position P_{i,t+1} in the next generation, an update should be performed. The personal guide update for each particle is another open problem in MOPSO, as discussed in [8]. In our method, we adopt another scheme for personal guide updating, which is illustrated in Figure 2 (for a 2-objective problem). The archive members' average objective function value G_t is also used in the personal guide updating scheme. There are four possible relations between the personal guide P_{i,t} and G_t; only when the particle position P_{i,t+1} of the next generation falls into the shaded area is the personal guide updated with P_{i,t+1}.

5. Comparison of results

The Delta method has been validated using several test functions taken from the specialized literature [3]. In this paper, we choose three test functions, ZDT1, ZDT2 and ZDT3, described in Table 1. In order to know how competitive our approach is, comparisons against two multi-objective evolutionary algorithms, NSGA-II [5] and ε-MOEA [9], and one multi-objective particle swarm optimization algorithm, ST-MOPSO, which are representative of the state-of-the-art, are presented. In order to allow a quantitative assessment of the performance of these multi-objective algorithms,
we adopted the following two metrics: Inverted Generational Distance (IGD) and Success Counting (SC), which were proposed by Villalobos-Arias et al.

Table 1. Test functions (each 2-objective, 30 parameters, x_i ∈ [0,1], n = 30)

ZDT1:  f_1(x) = x_1;  f_2(x) = g(x) · h(f_1, g);  g(x) = 1 + 9 (Σ_{i=2}^{n} x_i)/(n−1);  h(f_1, g) = 1 − sqrt(f_1/g)
ZDT2:  f_1(x) = x_1;  f_2(x) = g(x) · h(f_1, g);  g(x) = 1 + 9 (Σ_{i=2}^{n} x_i)/(n−1);  h(f_1, g) = 1 − (f_1/g)^2
ZDT3:  f_1(x) = x_1;  f_2(x) = g(x) · h(f_1, g);  g(x) = 1 + 9 (Σ_{i=2}^{n} x_i)/(n−1);  h(f_1, g) = 1 − sqrt(f_1/g) − (f_1/g) · sin(10π f_1)
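The ZDT functions of Table 1 are standard benchmarks; a short Python sketch of their evaluation (our own code) follows.

import math

def zdt(x, variant=1):
    """Evaluate ZDT1/2/3 for a decision vector x with x_i in [0, 1] (Table 1)."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    if variant == 1:
        h = 1.0 - math.sqrt(f1 / g)
    elif variant == 2:
        h = 1.0 - (f1 / g) ** 2
    else:  # ZDT3
        h = 1.0 - math.sqrt(f1 / g) - (f1 / g) * math.sin(10.0 * math.pi * f1)
    return f1, g * h

print(zdt([0.5] + [0.0] * 29, variant=1))   # a point on the ZDT1 Pareto front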
For each test function, 30 independent runs have been performed. Figures 3, 4 and 5 show the graphical results produced by DMOPSO on the test functions ZDT1, ZDT2 and ZDT3, respectively. The true Pareto-optimal fronts of the problems are shown as continuous lines. In order to illustrate the overall performance of the method over the 30 independent runs, all solutions are drawn in the left plot of each figure; the middle and right plots contain the solutions with the worst and the best performance with respect to the IGD metric, respectively.
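The IGD metric is not defined in the paper itself; the sketch below implements the commonly used definition (mean distance from a reference Pareto front to the obtained set) and is therefore only an assumption about the exact variant used.

import math

def igd(reference_front, obtained_set):
    """Inverted Generational Distance: mean distance from each reference point
    to its nearest obtained solution (smaller is better)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sum(min(dist(r, s) for s in obtained_set) for r in reference_front) / len(reference_front)

ref = [(0.0, 1.0), (0.5, 0.29), (1.0, 0.0)]
print(igd(ref, [(0.0, 1.0), (1.0, 0.05)]))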
Figure 3. Pareto fronts obtained by our method for test function ZDT1
Figure 4. Pareto fronts obtained by our method for test function ZDT2
Figure 5. Pareto fronts obtained by our method for test function ZDT3
Tables 2, 3 and 4 show the comparison of results between DMOPSO and the results obtained in [4] for the other three algorithms, considering the previously described metrics for the three test problems. In Table 2, for test function ZDT1, although the performance of DMOPSO for the SC metric is not as good as that of STMOPSO, it is much better than NSGA-II and ε-MOEA. It can also be seen that the average performance of DMOPSO is as good as that of STMOPSO with respect to IGD, which means the solutions obtained by our method are much closer to the true Pareto-optimal front than those of the other methods. In Table 3, DMOPSO shows a better performance for the IGD metric and a better performance for the SC metric; in all 30 independent runs, there was only one run in which the solutions did not cover the Pareto-optimal front entirely. In Table 4, for test function ZDT3, there exist local Pareto-optimal fronts, and the worst SC value is zero, which means that in that run the solutions found by DMOPSO did not reach the Pareto-optimal front at all. This is related to the properties of the test function; the same behavior can be seen for the other three algorithms.

Table 2. Results of the IGD and SC metrics for the ZDT1 test function

IGD        DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       1.98e-05   3.43e-04   1.67e-03   2.04e-03
Worst      6.18e-03   6.70e-04   1.90e-02   2.76e-02
Mean       2.52e-04   4.30e-04   7.95e-03   6.41e-03
St. dev.   6.06e-04   7.39e-05   5.05e-03   5.22e-02
Median     2.32e-04   4.19e-04   6.55e-03   4.95e-03

SC         DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       100        100        2          8
Worst      49         95         0          0
Mean       92.57      99.3       0.3        1.1
St. dev.   6.978      1.208      0.53       1.668
Median     98         100        0          1
Table 3. Results of the IGD and SC metrics for the ZDT2 test function

IGD        DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       4.56e-03   5.11e-02   5.16e-02   5.50e-02
Worst      2.54e-04   1.04e-02   1.68e-02   0.37373
Mean       4.44e-03   1.87e-02   1.28e-02   2.03e-02
St. dev.   1.75e-04   4.39e-04   1.12e-02   5.18e-02
Median     4.56e-03   5.11e-02   5.16e-02   5.50e-02

SC         DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       100        100        0          0
Worst      9          1          0          0
Mean       93.4       75.6       0          0
St. dev.   8.375      41.838     0          0
Median     100        100        0          0
Table 4. Results of the IGD and SC metrics for the ZDT3 test function

IGD        DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       1.05e-04   7.05e-04   2.36e-03   1.63e-02
Worst      2.75e-02   3.97e-02   0.2357     2.31e-02
Mean       7.29e-03   3.69e-03   8.42e-03   6.51e-03
St. dev.   4.27e-03   7.08e-03   3.97e-03   4.45e-03
Median     3.49e-03   2.04e-03   8.04e-03   5.95e-03

SC         DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       100        100        4          4
Worst      0          0          0          0
Mean       65         82.733     0.567      0.667
St. dev.   37.546     25.81      1.165      0.254
Median     93         88.5       0          0
6. Conclusion and future works

This paper has described a method called DMOPSO for multi-objective optimization problems. The core idea is the new method for global guide selection and the personal guide updating scheme in the multi-objective particle swarm optimization algorithm. This new method ensures better performance in terms of diversity and convergence of MOPSO. The proposed technique was tested on several multi-objective test functions and compared against three algorithms. The results show promising improvements over the evolutionary approaches for multi-objective optimization, and almost the same performance as STMOPSO among the MOPSO-based algorithms. This paper mainly considered 2-dimensional multi-objective optimization problems. How to extend the Delta method to higher-dimensional multi-objective optimization problems will be studied in future work. Another aspect that we would like to explore in the future is the application of the DMOPSO method to train suspension control system optimization.

References
1. J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Networks, vol. 4, pp. 1941-1948, (1995).
2. Y. Shi and R. C. Eberhart, "Particle Swarm Optimization: developments, applications and resources," in Proc. Congress on Evol. Comput., Piscataway, NJ, pp. 81-86, (2001).
3. E. Zitzler, K. Deb, and L. Thiele, "Comparison of multi-objective evolutionary algorithms: Empirical results," Evol. Comput., vol. 8, no. 2, pp. 173-195, (2000).
4. M. A. Villalobos-Arias, G. T. Pulido, and C. A. Coello Coello, "A proposal to use stripes to maintain diversity in a multi-objective particle swarm optimizer," in Proc. IEEE Swarm Intelligence Symposium (SIS 2005), pp. 22-29, (2005).
5. Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II," IEEE Trans. Evol. Comput., vol. 6, pp. 182-197, Apr. (2002).
6. S. Mostaghim and J. Teich, "Covering Pareto-optimal fronts by subswarms in multi-objective particle swarm optimization," Evol. Comput., vol. 2, pp. 19-23, June (2004).
7. J. E. Fieldsend, R. M. Everson, and S. Singh, "Using unconstrained elite archives for multi-objective optimization," Evol. Comput., vol. 7, pp. 305-323, June (2003).
8. S. Mostaghim and J. Teich, "Strategies for finding good local guides in multi-objective particle swarm optimization," in IEEE Swarm Intelligence Symposium, pp. 26-33, (2003).
9. Marco Laumanns, Lothar Thiele, Kalyanmoy Deb, and Eckart Zitzler, "Combining Convergence and Diversity in Evolutionary Multi-objective Optimization," Evol. Comput., 10(3):263-282, (2002).
PART 2
Decision Making and Knowledge Discovery
KNOWLEDGE DISCOVERY FOR CUSTOMER CLASSIFICATION ON THE PRINCIPLE OF MAXIMUM PROFIT

CHUANHUA ZENG 1,2
College of Automobile and Transportation Engineering, Xihua University, Chengdu 610039, P.R. China, Phone: +86-028-89829140, E-mail: zchfirst@263.net

YANG XU
Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, P.R. China

WEICHENG XIE 3
College of Electrical & Information Engineering, Xihua University, Chengdu 610039, P.R. China

Managing customers according to their classification is one of the most important strategies in customer relationship management. A new method to classify customers is presented in this paper. Firstly, we find the key background information by simplifying the decision table based on rough set theory; secondly, we determine the profit by analyzing the sales and the costs of customers; and finally, we derive the decision rules on the principle of maximum profit. As such, we can reason out to which class a new customer belongs and select a good way to serve him, thus achieving the optimal economic benefit for enterprises.
1. Introduction

It is essential to identify a customer's class by analyzing his background information. The reason is that as long as we know to which class the customer belongs, we can choose a proper method to serve him, so as to achieve the goal of using the enterprise's resources efficiently. Such a method, on the one hand, improves the service level and attracts more customers; on the other hand, it economizes resources by gaining maximum output with minimum input. Take current methods for instance. The decision tree method, put forward in reference [4], first classifies financial customers by their credibility and contribution, then identifies the relationship between the classification results and the background information through the decision tree method, and finally infers the probability of the new customer's class so as to select a proper
strategy for him. Although the probability is given in reference [4], the exact class of the customer remains uncertain. Besides, the activity-based costing method presented in reference [5] suggests a way to identify the most valuable customers by calculating the activity cost and then selecting a suitable strategy according to that cost. Since we know the background of customers and the profit resulting from different classifications, provided we classify customers according to our experience, two questions arise: can we find the correlation underlying our choices? Are these choices bound to yield maximum profit? Taking these questions into consideration, we may classify a new customer into a certain group. A new method based on rough set theory to classify customers is set forth in this paper. Firstly, we obtain the decision table of the customers' classification. Secondly, we simplify the attributes of the decision table and find the essential background information of customers. Thirdly, we obtain the profit corresponding to each decision by analyzing the relationship between costs and sales. And finally, we derive the classification knowledge under the principle of maximum profit. As such, we can reason out a new customer's class and accordingly achieve the optimal economic benefit for enterprises.

2. The simplification of a disharmonious decision table

The decision table of customers' classification depends on how people adjust the classification of customers. Many factors that may affect the result are taken into account; however, these factors cannot all be included in the condition attributes of the table, so the decision table may not be harmonious. Accordingly, we employ rough set theory to simplify the non-harmonious decision table.

Definition 1 (Decision table [1]). (U, A, F, {d}, {g_d}) is a decision table, where U is the set of objects, U = {x_1, x_2, ..., x_n}; A is the set of condition attributes; d is the decision attribute; F is the set of relations between A and U, i.e. F = {f_j : j ≤ m}, where f_j : U → V_j (j ≤ m) and V_j is the value field of a_j; g_d : U → V_d is the relation between U and d, and V_d is the finite value field of g_d. R_A and R_d are the equivalence relations generated by the condition attribute set A and the decision attribute set {d}, respectively. For any B ⊆ A: U/R_B = {[x]_B : x ∈ U}, U/R_d = {D_1, D_2, ..., D_r}, where [x]_B = {y ∈ U : (x, y) ∈ R_B}, R_B = {(x_i, x_j) : f_l(x_i) = f_l(x_j) (∀ a_l ∈ B)}.
<E B)}.
Let EKDj/[x\B)=
\D;r\[x]B\ ' ' *'
(/Sr),
I L*JB I
£> is the inclusion degree. Let ftB(x) = (D(Di/[x]B),...,D(Dr/[x]B)) (xeU), fiB(x) is the generalized decision distribute function of x. Definition 2 Suppose (JJ,A,F,{d\{gd}) is a decision table, and Be A. As to any \fxeU : if /uB(x) = ftA(x), then B is distribution harmonious decision set. If B is distribution harmonious set and any subset of B is not harmonious set, then B is the distribution simplification. Definition 3 Suppose (U,A,F,{d},{gd}) is a decision table, and UIRA ={q,C2,...,C,}.Let Z>* = {(W / 4 ,bb):// / < W^// B W}, and /t(C,) is the value of C, 's attribute a^. To definition ({«* e ^ : /*(C,) * MCJMC„CJ)
e D*
[A,(C„Cj)*D Then Di(ChCj) is the distribution identifiable attributes set of C, and Cj. A = (D(ChCj),i,j
244 3
R(.n\W)'^(n\'Oj)Pi(Oj\lxYi. 7=1
For a certain customer group described by [x], let τ(x) be a decision rule, i.e. τ(x) ∈ {r_1, r_2, r_3}, and let R be the total expected risk over all decision rules, so that

R = \sum_{[x]} R(τ(x) | [x]) P([x]) .

We want to know which decisions optimize the total risk. According to the Bayes decision procedure and the principle of maximum profit, we get the decision rules as follows:
(1) r_1: [x] → a, if R(r_1 | [x]) ≥ R(r_i | [x]), i = 2, 3;
(2) r_2: [x] → b, if R(r_2 | [x]) ≥ R(r_i | [x]), i = 1, 3;
(3) r_3: [x] → c, if R(r_3 | [x]) ≥ R(r_i | [x]), i = 1, 2.

4. How to calculate the profit

The enterprise classifies its customers into different classes according to their characteristics and selects a different service combination for each class. This leads to a different cost expended on each customer. The activity cost of a customer consists of product cost, selling cost and other costs. The selling cost includes strategy cost and service cost, etc., and the other costs comprise after-sales service cost and management cost, etc. By subtracting the total costs from the purchasing sum of each customer, we get the profit gained from him. Let E_i be the average cost of class ω_i, and I_i be the average purchasing sum of ω_i. We classify customers into classes of different levels according to their contribution to the enterprise. For a customer with state ω_i: if we make decision r_i, then the profit gained from the customer is the average purchasing sum of class ω_i minus the average cost of class ω_i; if we make a decision r_j of a higher level than class ω_i, then the cost of the customer is the average cost of class ω_j, while the average purchasing sum still stays at the level of class ω_i; if we make a decision r_j of a lower level than class ω_i, then the cost of the customer is the average cost of class ω_j, while the average purchasing sum drops to the level of class ω_j because of the limited service. So we get the following formula:
λ(r_j | ω_i) = I_i − E_i   if j = i,
λ(r_j | ω_i) = I_i − E_j   if j < i,
λ(r_j | ω_i) = I_j − E_j   if j > i.
5. Algorithm for classification rule discovery

We get the classification rules by the following steps: (1) obtain the equivalence classes according to the decision attribute; (2) obtain the equivalence classes according to the condition attributes; (3) compute the generalized decision distribution function of each object in the decision table; (4) build the distribution identifiable attribute matrix; (5) obtain the distribution simplification B; (6) derive the decision rules on the principle of maximum profit.

6. An example

Choosing some customer classification information from a certain enterprise, we get a decision table, where U = {x_1, x_2, ..., x_20} is the set of customers and A = {a_1, a_2, a_3, a_4} is the set of condition attributes; here a_1 is the income level of the customer, a_2 is the age range of the customer, a_3 is the consumption level of the customer, and a_4 refers to the frequency of business, while d is the classification result (decision attribute). N means AVERAGE, H means HIGH, L means LOW.

TABLE 1. THE DECISION TABLE
No    a1    a2    a3    a4    d
1     N     H     L     H     a
2     N     H     H     H     a
...
20
Firstly we get the identifiable attribute matrix; then we know that the simplification is {a_1, a_3, a_4}. Classifying the customers according to the simplified condition attributes, we get C_1 = {x_1, x_10}, C_2 = {x_2, x_12, x_18}, ..., C_11 = {x_17}.
The average costs and the average purchasing sums are as follows: I_1 = 8000, I_2 = 4000, I_3 = 1000, E_1 = 2000, E_2 = 1500, E_3 = 500. So λ(r_1 | ω_1) = 6000, and the other values λ(r_j | ω_i) follow from the formula above.
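As an illustration of Sections 3 and 4, the short Python sketch below (our own code, not from the paper) builds the profit values λ(r_j | ω_i) from the stated averages and picks, for a given class-membership distribution, the decision with the maximum expected profit; the example distribution is hypothetical.

I = [8000, 4000, 1000]   # average purchasing sums I_1..I_3
E = [2000, 1500, 500]    # average costs E_1..E_3

def profit(j, i):
    """lambda(r_j | omega_i): profit of decision r_j for a customer in class omega_i."""
    if j == i:
        return I[i] - E[i]
    if j < i:                 # decision of a higher level than the true class
        return I[i] - E[j]
    return I[j] - E[j]        # decision of a lower level than the true class

def best_decision(p):
    """p[i] = P(omega_i | [x]); choose the decision maximizing the expected profit R(r_j | [x])."""
    expected = [sum(profit(j, i) * p[i] for i in range(3)) for j in range(3)]
    return max(range(3), key=lambda j: expected[j]), expected

print(profit(0, 0))                      # 6000, matching lambda(r_1 | omega_1)
print(best_decision([1.0, 0.0, 0.0]))    # a pure omega_1 group -> decision r_1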
Calculating R(r_j | C_i), we get, for example, R(r_1 | C_1) = 6000, R(r_2 | C_1) = 2500, R(r_3 | C_1) = 500. Finally, we get the decision rules on the principle of maximum profit: C_1 → a, C_2 → a, C_3 → a, C_4 → a, C_5 → c, C_6 → c, C_7 → c, C_8 → b, C_9 → b, C_10 → a, C_11 → b, i.e. (a_1, N) ∧ (a_3, L) ∧ (a_4, H) → a, (a_1, N) ∧ (a_3, H) ∧ (a_4, H) → a. When new customers turn up, we may classify them by these rules. Take the first rule as an example: it means that if a customer has an average income and low consumption but is a regular customer of the enterprise, we consider him a member of group "a".

Conclusion

By applying the algorithm put forward in this paper, we can easily obtain the relationship between the background and the classification of customers. We can also reason out the class of a new customer according to the knowledge mined from the decision table. More importantly, such customer management satisfies the principle of maximum profit, so as to achieve the optimal economic benefit for enterprises.

Acknowledgments

This paper is supported by the National Natural Science Foundation of P.R. China (Grant no. 60474022).

References
1. Wenxiu Zhang, Yi Liang and Weizhi Wu. Information System & Knowledge Discovery. Beijing: Publishing House of Science, 2003: 22-56.
2. Wenxiu Zhang, Weizhi Wu and Jiye Liang et al. Rough Set Theory and Its Application. Beijing: Publishing House of Science, 2001: 142-157.
3. Qing Liu. Rough Set and Rough Reasoning. Beijing: Publishing House of Science, 2001.
4. Jian Kang. The Application of Classification Data Mining in Financial Customer Relationship Management. Journal of Beijing Institute of Technology, 2003(23)2: 207-211.
5. Yingfei Liu. The Application of Activity Based Costing of Customer in Customer Relationship Management. Business Research (274).
AN INTEGRATED ANALYSIS METHOD FOR BANK CUSTOMER CLASSIFICATION

JIE ZHANG**, JIE LU*, GUANGQUAN ZHANG*, XIANGBIN YAN*
*Management School, Harbin Institute of Technology, Harbin, 150001, P.R. China {zhangjie; xbyan}@hit.edu.cn
Faculty of IT, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia {jielu; zhangg; zhangjie}@it.uts.edu.au

Customer classification is one of the major tasks in customer relationship management. Customers have both static characteristics and dynamic behavioral features. Applying both kinds of data in a comprehensive analysis can enhance the reasonability of customer classification. In this paper, customer dynamic data is clustered using a hybrid genetic algorithm and then combined with customer static data to give a reasonable customer segmentation by using a neural network technique. A novel classification method which considers both the static and the dynamic data of customers is proposed. Applying the proposed method to a bank's datasets obviously improves the accuracy of customer classification compared with traditional methods where only static data is used.

Keywords: Customer classification, bank, time series data, genetic algorithm
1. Introduction

Classification is an important task in bank customer relationship management (CRM). Credit and behavioral scoring models have been used to classify bank customers [1]. Bank customer management involves both static and dynamic data. The static data is associated with customer demographics, and the dynamic data is associated with customer purchasing behavior, i.e. transaction data. Most studies only use static data to conduct customer segmentation; obviously, this is intuitive and easy to operate [2]. Some studies use dynamic data [3,4] for segmentation. However, because of privacy and update-speed issues, customer demographic data is either difficult to obtain or difficult to keep updated [5]. This paper proposes an integration method using both dynamic and static datasets to identify customer groups without the limiting assumption that customers with similar demographics have the same purchasing behavior [3]. Customer dynamic data is clustered using a hybrid genetic algorithm and then combined with the static data to give an effective customer segmentation by a neural network. The paper is structured as follows. Section 2 describes the framework of the method. In Section 3, the data preparation process is given in detail. Clustering steps using GAs are proposed in Section 4. In Section 5, the final neural network structure is given. Finally, a bank example is used to test the effectiveness of the proposed method.

2. Framework of Bank Customer Classification Method

Customer segmentation is a classification process using one or more customer attributes. The results depend on the selection of attributes and classification methods. The values of the different attributes come from the enterprise databases, including data on the nature and the transactions of customers. We classify these data as static data and dynamic data as follows:
• Static data: It reflects the basic profile of the customers, which is stable for a period of time. Static data is used to describe the natural situation of customers, such as a customer's gender, age, background, income, etc.
• Dynamic data: It changes frequently with time, such as the transaction records of a customer. Such data can effectively reflect customers' behaviors.
This study develops a framework which shows how a method can use both kinds of data for customer classification, shown in Fig. 1.
Fig. 1. A framework of the combined classification method using dynamic and static data

Under this framework, the method is applied in three phases. We use bank customer data to illustrate the method, as presented in Sections 3, 4 and 5.
3. Phase I: Customer Data Preparation

We mainly focus on the preprocessing of the dynamic data. First, the customer transaction data is weighted by profit; the method then obtains the profit-weighted time series. The original and the weighted time series reflect different properties of customers' purchasing behavior, so both time series are used to cluster the buying behavior groups. Second, the method applies descriptive statistical values of the time series to deal with the dynamic data. We only use some common features of the data: the trend, average, variance, kurtosis and skewness of the time series. This reduces the difficulty of data transformation and avoids unsuitable assumptions about the time series. Using this method, transaction data can be of different time periods and durations, so it is easy to extend to other fields. After weighting the transaction data, we obtain two data sets.

4. Phase II: Customer Transaction Data Clustering Using a Hybrid GA

The literature shows that GA performs better in cluster analysis than traditional algorithms [6]. This study uses a hybrid GA combined with a simulated annealing (SA) algorithm to make the GA more effective; the hybrid GA can prevent the GA from premature convergence. The key points of the proposed algorithm are described as follows.

4.1. Coding Method

We first code the cluster centers as the chromosome. So the chromosome is composed of K cluster centers of real numbers; this is not the basic GA coding of a 0/1 string. For the two time series of each customer we have 10 features, so each cluster center has ten genes. The chromosome is thus composed of 10·K real numbers, or 10·K genes, representing the K cluster centers; every ten real numbers represent one cluster center. The initial chromosomes are generated randomly.

4.2. Fitness Function

The K cluster centers are denoted as C_1, C_2, ..., C_K. The total measure of distance within the clusters can be calculated by formula (1):

M(C_1, C_2, ..., C_K) = \sum_{k=1}^{K} \sum_{X_j \in C_k} \| X_j - C_k \| ,   (1)

where X_j is the feature vector of customer X_j and the distance is the Euclidean distance. The fitness function is F = 1/M. The GA is then used to search for the K cluster centers that maximize F.
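A brief Python sketch of the cluster-distance measure M and the fitness F = 1/M of formula (1) follows (our own illustration; assigning each customer to its nearest center is an assumption about how cluster membership is determined).

import numpy as np

def cluster_measure(X, centers):
    """M of formula (1): each feature vector is counted towards its nearest center,
    and the Euclidean distances are summed over all clusters."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, K) distance matrix
    return d.min(axis=1).sum()

def fitness(X, centers):
    return 1.0 / cluster_measure(X, centers)   # F = 1/M, to be maximized by the GA

X = np.random.rand(100, 10)          # 100 customers, 10 time-series features
centers = np.random.rand(6, 10)      # a chromosome encoding K = 6 cluster centers
print(fitness(X, centers))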
4.3. Genetic Operators

(1) Selection: The selection operator used in this study is based on the widely used roulette wheel selection method.
(2) Crossover: We use one crossover point in the chromosomes. If the length of the chromosome is l, an integer is generated randomly in [1, l−1] to act as the crossover point. The genes to the right of the point are exchanged between the parents' chromosomes to generate the new generation.
(3) Mutation: Every chromosome is assigned a very small mutation probability. The mutation position is chosen in the same way as the crossover point. The mutation is performed as in (2):

x' = x + δ (x_max − x)   if δ ≥ 0,
x' = x + δ (x − x_min)   if δ < 0,   (2)

where x is the original value of the mutated gene and x_min, x_max is the range of x. δ is a coefficient associated with the fitness function; it is a random number drawn from the range (−R, R), with R calculated as in formula (3):

R = \frac{M − M_min}{M_max − M_min} ,   (3)

where M is the distance measure of the chromosome's cluster centers calculated by formula (1), and M_max and M_min are the maximum and minimum measures over all chromosomes' cluster centers. This kind of mutation keeps the cluster centers within the range of the data, and prevents well-fitted chromosomes from being destroyed. The algorithm terminates after the planned number of generations; the best individual in the population is the final solution. The termination condition can also be defined by the satisfaction of certain rules, or by the maturity of the population.

5. Phase III: Customer Classification Using Static and Dynamic Features

It is necessary to combine the cluster results of the dynamic data with the customer static attribute data to conduct the segmentation of customers. A back propagation (BP) neural network is used in this study to conduct the final classification. The neural network used here has one hidden layer. The number of input neurons equals the number of static attributes of the customer plus the dynamic attribute obtained from the GA; so the input comprises the basic profile of a customer and the dynamic feature from the GA results. The number of output neurons equals the number of customer groups. The accuracy rate of the classification is calculated as
P = \frac{1}{N} \sum_{i=1}^{N} δ(e_i, \hat{e}_i) ,   (4)
where δ = 1 only if e_i and ê_i are equal, otherwise δ = 0; e_i is the actual customer group label, ê_i is the group label assigned by the BP neural network, and N is the total number of customers.

6. Experiment Data Analysis for Bank Customer Classification

In order to test the classification framework and the proposed algorithm in detail, we use a sample customer dataset of a bank branch in China. The dataset has 1932 records of customers from 2002/1/1 to 2003/12/13. The customers had been classified into a high value group, a middle value group and a low value group. Each customer record has 34 attributes, such as name, gender, age, date of birth, income and education background. In total, 16 attributes are selected as the static inputs in the experiment. All the data is transformed into integer codes. For two-valued attributes we use '1' and '2'; for gender, '1' means male and '2' means female, etc. Other values, such as age, are transformed into ranges. Table 1 shows a part of the data used.

Table 1. A customer static attribute data example (excerpt; columns: customer no, gender, age range, city type, background, marriage, income, house loan, label)

2000317   1 2 2 1 3 ...
2000318   1 1 3 0 1 ...
2000319   2 1 2 2 ...
The bank provides 32 different kinds of financial products and services. As some of them are treated similarly in China (for example, withdrawing money at the counter or at an ATM), 23 kinds of products and services are considered. The profit rates of the different products and services are then calculated. Part of the transaction records of one customer is shown in Table 2. The transaction code indicates the transaction content. From the table, we can see that the time intervals are not equal.

Table 2. A customer dynamic data example

T date      T code   T amount
20030116    CS       -44268
20030120    MC       -559
20030128    F0       -7.6
20030203    CS       -10000
20030205    LN       -39000
20030205    LN       39000
20030206    SA       794.46
20030208    CS       -5000
20030210    CS       78000
We use the method proposed above to cluster the dynamic data. The hybrid GA control parameters are: population N = 50, mutation probability Pm = 0.03, initial temperature T0 = 100, and cooling parameter a = 0.9; the GA parameters are: number of generations E = 100 and crossover probability Pc = 1. All the customers' purchasing behaviors are classified into six groups, and the cluster number is finally assigned to each customer.

From the static customer data, 16 attributes are selected for the final classification. The dynamic data cluster result is then added as one further input to the neural network, and its effect is tested. The results are shown in Table 3. Compared to the classification using only static data, the proposed method obviously improves the classification accuracy.

Table 3. Comparison of classification accuracy rates

                          Combined   Only static data   Ratio
Training accuracy rate    85.3%      73.1%              1.17
Verifying accuracy rate   82.9%      69.4%              1.19
From Table 3 we can see that the combined method improves the accuracy on the training data by 17% and on the verifying data by 19%.

7. Conclusion and Further Study

This paper reports a new classification method combining the static and the dynamic data of customers. The test results show that the method can improve the classification accuracy ratio by nearly 20%. The method applies descriptive statistical features to handle data with different time intervals and time periods. As further study, more features can be included in the clustering process according to the characteristics of an application, such as chaos features, to further improve the classification rate.

References
1. L. C. Thomas, A survey of credit and behavioral scoring: forecasting financial risk of lending to consumers. Intl. J. of Forecasting, 16, 149-172 (2000).
2. K. Hammond, A. S. C. Ehrenberg and G. J. Goodhardt, Market segmentation for competitive brands. Eur. J. of Marketing, 30, 39-49 (1996).
3. C. Y. Tsai and C. C. Chiu, A purchase-based market segmentation methodology, Expert Systems with Applications, 27, 265-276 (2004).
4. N. C. Hsieh, An integrated data mining and behavioral scoring model for analyzing bank customers, Expert Systems with Applications, 27, 623-633 (2004).
5. R. G. Drozdenko and P. D. Drake, Optimal database marketing: Strategy, development, and data mining. London: Sage (2002).
6. C. A. Murthy and N. Chowdhury, In search of optimal clusters using genetic algorithms. Pattern Recognition Letters 17, 825-832 (1996).
TWO STAGE FUZZY CLUSTERING BASED ON KNOWLEDGE DISCOVERY AND ITS APPLICATION *

YEQIAN
School of Finance, Zhejiang University of Finance & Economics, Hangzhou, China

In order to reflect characteristic-type knowledge and mine data in the credit market, a two-stage classification method is adopted and a fuzzy clustering analysis is presented. First of all, the paper carries out attribute normalization of the multiple factors which influence bank credit, computes the fuzzy similarity relation coefficients, sets the threshold level α by considering the competition and social credit risk state in the credit market, and selects borrowers through the transitive closure algorithm. Second, it makes an initial classification of the samples according to the coefficient characteristics of the fuzzy relation. Third, it improves the fuzzy clustering method and its algorithm. Finally, the paper studies a case of credit knowledge mining in the financial market.
1. Introduction

Data mining is the procedure of identifying effective, novel, potentially useful and understandable patterns from large amounts of incomplete, noisy, fuzzy and random data. It adopts related techniques such as machine learning, mathematical statistics, neural networks, databases, pattern recognition, rough sets, fuzzy mathematics, etc. It is applied to classify or predict with model-based data mining, to summarize data, to cluster data, and to discover association rules, sequence patterns, dependence relations or models, etc. According to the maximum attribute rule, or adopting classification by degree of closeness, pattern recognition classifies a sample according to the principle of choosing the closest (degree-of-closeness law) over multiple factors (features). However, it is unnecessary to classify in this way when there are relatively many samples or when the cost of discerning classes is high; it suffices to group according to certain characteristics, which involves another kind of categorization method, namely clustering. Clustering plays an important role in data mining. There are many papers on knowledge discovery; the methods adopted focus on fuzzy methods, neural networks and their combination, for instance, flexible neuro-fuzzy

* This work is supported by Zhejiang University of Finance & Economics (Grant No. YJZ02), and partially supported by the National Science Foundation of China (Grant No. 70571068).
systems,hierarchical neuro-fuzzy systems hybrid, rough-neuro- fuzzy systems are mentioned(D. Rutkowska,2003);Vicen9 Torra (2003) describes fuzzy knowledge based systems and intelligent control on the light of chance discovery,proposes a multi-stage classification algorithm and a multi-expert classifier. Grzegorz Drwal and Marek Sikora (2004) presents system which tries to combine the advantages of rough sets methods and fuzzy sets methods to get better classification. A neuro-fuzzy System for the extraction of knowledge directly from data, and a toolbox developed in the Matlab environment for its implementation is discussed (G. Castellano, etc. 2003). 2. Description of the model and algorithm of its classification 2.1. Clustering analysis on fuzzy relation The basic thought of setting up model with fuzzy clustering is that given sample U determine a level, and classify U by considering level a .Its fuzzy uncertainty is expressed by a . The classification of U is confirmed after confirming a. proposition 1. let/? e F(Ux U) ,witha € ( 0 , 1 ] , R - is cut sets. If R is fuzzy equivalent relation on U, then R- is an equivalent relation on U . So R- can classify U, and then the classification obtained from this is called classification on level of a . The steps related to fussy cluster relation algorithm are as follows: (1) Normalization of the characteristic data (2) Similarity relation coefficient Here we establish fuzzy similar matrix which reflects analogical relation among each target, namely, R = (r )nxn Where r is correlation coefficient between object Ui and U,. There are many kinds of common methods. For example, the coefficient correlation law of index, fuzzy method, etc. Here we give solutions to the multi attributes between two targets with degree of close law, namely, | [(«,,«,) = ( VJUik A Ujk)) A ( 1 - A K . V K ^ ) )
i*j
(3) The classification based on fuzzy similar matrix can adopt three kinds of methods: First, law of weave network. Let R be a similar matrix with a e [0,1], then the procedure of categorizing a level: set matrix Ra , insert the diagonal the corresponding symbol, in the diagonal, replace below "*" with 1 and replace 0 by blank. Regarding "*" position as a joint, proceeds from guide vertical line
and horizontal line to the diagonal and bind them together. The sample elements that bind together while still can connect each other belong to one type; second is turning into tree with maximum; third is transmitting close bag law to make cluster analysis on basis of R ° R . 2.2. Dynamic cluster model based on uncertainty knowledge 2.2.1 Dynamic clustering of fuzzy ISODATA Classification obtained from above-mentioned methods is only a classification of taking extreme values. It is a kind of roughly and comparatively static division.While the law of dynamic cluster allows the mode sample to move from a connection type to another, it is an initial inaccurate division that is improved progressively, and it is a kind of heuristic method, striving to be optimal divided and reducing the calculating. Definition 1. let/?, n be two given straight integers with/?<w, and D = (,) is a fuzzy matrix which satisfies the following conditions: (1) For each k, £ dlk = 1 (2) For each i, ]T dlk > 0 Then D is called (p, «)- Fuzzy division. We denote all (P, n) - Fuzzy division &f (p, n) which can be bridged to be Af. Definition 2. let U= {«,,w2,••-,«,} cz 9?" be limited subset and \eip
;
D = «,)„„„ , C e 5R and C > 1. We define 2
•/(An = I2X)'l|v-*j
(2)
Then J(D, V) is called cluster membership function divided by Fuzzy. Here we provide the concrete cluster algorithm .In accordance with policymaker's intention and adopt fuzzy similar matrix to determine p.Go on to classify " with the simplest way and then get a classification which is regarded as initial Fuzzy classification Dm. We suppose D
e A ; (p,n) \\ J(D
,V
)-J(D
,V )\\<£
.„
then we define V-l)
V,
. *=i
(4)
Here we can obtain j / ( " = {v, ,v2 ,...,v } We go on to revises it and plan to take it place next time in order to have, Uk — vt and meanwhile we let
If for each /, we have
uk*v, t h e n let
rf'"
=•
(6) (7)
triiv-.ii Calculate J(D"'", V"~") and •/(£>'", K1"). If for each chosen £ , there is
\j{D^\V^)-J{D"\V(,))\<£ Then we stop and regard z>(,) and K"' as optimum classification and optimum cluster center. Otherwise we repeat above-mentioned steps. 3. Case Study In a certain period of 2004, it is quite difficult for a credit department of Zhejiang commercial bank to value the 10 borrower applicants from its own borrower applicants the credit grades directly from their public information. But in order to expand its business, the bank hopes to investigate their characteristic state of the 10 borrowing enterprises from the information that the bank has already known and considers to classify them in order to excavate valuable knowledge. By choosing a large number of information of this mode space and carrying on statistical analysis on historical credit materials and data that it owns, the bank examines and excavates five factors that influence the credit decision appraise: Financial statement, project or enterprise production and process technology, market supply and demand state of the products, commercial credit, credit function of management level, If the value of a borrower is positive, which indicates that the borrower is credible and safe. The dealing with ten applicants of this bank and their five respects attribute index can be described briefly as follows: (l)Collect, deal with and make statistical analysis on bank's historical credit materials and data.
(2) By carrying on regular treatment to characteristic datum of borrower, receive mode space which is formed by categorized targets (the samples ) U={w ; ,w 2 ,w 3 ,w 4 ,w 5 ,w 6 ,w 7 ,u 8 ,w 9 ,w 10 }. (3) Calculate the coefficient matrix R of the fuzzy relation between every borrower. Adopt transfer closure algorithm to calculate R ° R . 0.65
(10 × 10 fuzzy equivalence matrix t(R): unit diagonal, off-diagonal similarity coefficients ranging from 0.47 to 0.96.)
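As a hedged illustration of step (3) and of the α-cut classification used below (our own sketch, not the paper's software; R here stands for any symmetric fuzzy similarity matrix with unit diagonal):

```python
import numpy as np

def maxmin_compose(A, B):
    """Max-min composition: (A o B)_ij = max_k min(A_ik, B_kj)."""
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

def transitive_closure(R, tol=1e-12):
    """Repeat R <- R o R until it stops changing, giving the fuzzy equivalence matrix t(R)."""
    while True:
        R2 = maxmin_compose(R, R)
        if np.max(np.abs(R2 - R)) < tol:
            return R2
        R = R2

def alpha_cut_groups(T, alpha):
    """Group sample indices whose similarity in t(R) is >= alpha."""
    n = T.shape[0]
    groups, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        members = [j for j in range(n) if T[i, j] >= alpha]
        seen.update(members)
        groups.append(members)
    return groups
```

Raising α from 0.85 towards 0.96 in `alpha_cut_groups` reproduces the progressively finer groupings described next.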
Classification. Different values of the level α give different numbers of classes and different classifications. At level α = 0.96, u_1 and u_5 belong to one group and the others make up the other group; at level α = 0.95, u_4 and u_8 join the group formed by u_1 and u_5; at level α = 0.92, u_3 and u_6 can be grouped together with the above types; at level α = 0.91, the above groups also absorb u_2 and u_9; at level α = 0.87, u_7 can be added to this group; and at level α = 0.85, u_10 is finally grouped into one type together with the other borrowers. According to the state of the credit market, we finally set the number of types to p = 3, namely: borrowers who are creditable or have security; borrowers whose prestige requires close attention (and who must offer security or a mortgage); and borrowers who should be refused a loan (cancelled by the red line). Setting C = 2 and ε = 0.001, we can compute the optimal (3, 10)-fuzzy partition matrix D:

(3 × 10 fuzzy partition matrix D; each column gives one borrower's membership degrees in the three groups, with 0/1 entries for the crisply assigned borrowers and partial memberships such as 0.523/0.477, 0.187/0.813, 0.088/0.912 and 0.306/0.694 for the others.)
The result shows that borrowers u_1 and u_5 belong to the "credit group"; borrowers u_7 and u_10 belong to the "cancelled by the red line" group; u_3 and u_6 belong to the "paying close attention" group; to a greater degree, u_2, u_4 and u_8 also belong to the "paying close attention" group; and to a much greater degree u_9 belongs to the "cancelled by the red line" group.
4. Conclusions

In the financial theory of information asymmetry, dividing borrowers into groups effectively is a comparatively effective way to reduce credit rationing. In the credit market, however, the data are incomplete and the information is fuzzy and uncertain, so such a complicated system is difficult to handle with classical and traditional theory. This paper has studied the problem of grouping borrowers in the credit market with fuzzy system theory. The case study shows that two-stage fuzzy clustering analysis is a workable approach to clustering a bank's customers.
APPLICATION OF SUPPORT VECTOR MACHINES TO THE MODELLING AND FORECASTING OF INFLATION*

MILAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & MEDIS Nitra, Ltd., Pri Dobrotke 659/81, 949 01 Nitra-Drazovce, Slovak Republic

DUSAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & Faculty of Management Science and Informatics, University of Zilina, 010 26 Zilina, Slovak Republic
In Support Vector Machines (SVM's), a non-linear model is estimated by solving a Quadratic Programming (QP) problem. Based on the work [1], we investigate the quantification of the parameters of an econometric structural model of inflation in the Slovak economy. The theory of the classical Phillips curve [7] is used to specify a structural model of inflation. We fit the models based on the econometric approach to inflation over the period 1993-2003 in the Slovak Republic, and use them as a tool to compare their approximation and forecasting abilities with those obtained using the SVM method. Some methodological contributions are made for SVM implementations in causal econometric modelling. The SVM methodology is extended to economic time series forecasting.
1. Introduction
This contribution considers the econometric modelling of inflation in the Slovak Republic. The main tools, techniques and concepts involved in the econometric modelling of inflation are based on the Phillips concept [7]. According to the Phillips inflation theory the inflation variable is generated on a set of underlying assumptions. In any case, the analysed inflation rates are explained by the behaviour of another variable or a set of variables, in our case by wages and unemployment as independent variables (see [1], [9]). In this paper the resulting SVM's are applied using the ε-insensitive loss function developed by V. Vapnik [11]. We motivate the approach by seeking a function which approximates the mapping from an input domain to the real numbers based on a small subset of training points. The paper is organized as follows.
* This work was supported by grants GACR 402/05/2768 and VEGA 1/2628/05.
The next section will provide a quick overview of the concepts of SVM theory. Section 3 analyses the data, discusses the statistical and SVM estimators, presents the inflation rate values fitted by the classical statistical method and by SVM models, discusses the circumstances under which SV regression outputs are conditioned, and considers the corresponding interpretation of the SV regression results. Section 4 extends the SVM methodology to economic time series forecasting. A section of conclusions closes the paper.

2. Support Vector Machines for Functional Approximation

This section briefly presents a relatively new type of learning machine, the SVM, applied to regression (functional approximation) problems. For details we refer to [2]. The general regression learning task is set as follows. The learning machine is given n training data, from which it attempts to learn the input-output relationship y = f(x), where {x_i, y_i ∈ ℝ^m × ℝ}, i = 1, 2, ..., n, consists of n pairs {(x_i, y_i)}_{i=1}^{n}. Here x_i denotes the i-th input and y_i is the i-th output. The SVM considers regression functions of the form

f(x) = Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x) + b        (1)

where α_i, α_i* are positive real constants (Lagrange multipliers, calculated by solving the Quadratic Programming (QP) problem via the saddle point of the Lagrangian [3]), b is a real constant, and ψ(·,·) is the kernel function. Admissible kernels have the following forms: ψ(x_i, x_j) = (x_i^T x_j + 1)^d (polynomial SVM of degree d), ψ(x_i, x_j) = exp(−θ ||x_i − x_j||²) (radial basis SVM), where θ is a positive real constant, and others (spline, B-spline, etc.). The SV regression approach is based on defining a loss function. There are different error (loss) functions in use, and each one results in a different final model. Next we will use Vapnik's ε-insensitive loss function [11]. Formally, this leads to solving the QP problem [3]. After computing the Lagrange multipliers α_i and α_i*, one obtains the form of (1) [5], i.e.

f(x) = Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x) + b = f(x, w) = w^T x + b        (2)

where w = (w_1, ..., w_n) are weights that are the subject of learning. Finally, b is computed by exploiting the Karush-Kuhn-Tucker (KKT) conditions [3], i.e.

b = y_k − Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x_k) − ε   for α_k ∈ (0, C),        (3)
b = y_k − Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x_k) + ε   for α_k* ∈ (0, C).
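As a hedged illustration of this machinery (our own sketch using scikit-learn's SVR rather than the authors' modified Gunn software, with made-up data and parameter values), an ε-insensitive SV regression with an RBF kernel can be fitted and evaluated roughly as follows:

```python
import numpy as np
from sklearn.svm import SVR

# toy data standing in for the (x_i, y_i) training pairs of Section 2
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 4.0, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)

# epsilon-insensitive SV regression with RBF kernel psi(x_i, x_j) = exp(-gamma ||x_i - x_j||^2)
model = SVR(kernel="rbf", C=10.0, epsilon=0.2, gamma=0.5)
model.fit(X, y)

y_hat = model.predict(X)
print("number of support vectors:", model.support_vectors_.shape[0])
print("R^2 on the training data:", model.score(X, y))
```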
3. Causal Models, Experimenting with Non-linear SV Regression
To study the modelling problem of inflation quantitatively, quarterly data from 1993Q1 to 2003Q4 were collected concerning the consumer price index CPI, aggregate wages W and unemployment U.
Figure 1. Natural logarithm of quarterly inflation from January 1993 to December 2003
Experimenting with linear transfer function models [1], the following reasonable causal model formulation was found:

CPI_t = 0.292 + 0.856 CPI_{t−1}        (4)
A graph of the historical values (CPI_t) and the fitted values of the causal inflation model (4) is presented in Figure 1. If CPI_t exhibits a curvilinear trend, one important approach to generating an appropriate model is to regress CPI_t against time. In Table 1 the SVR results for inflation were also calculated using an alternative time series model expressed by the following SVR form

CPI_t = Σ_{i=1}^{m} w_i ψ_i(x_t) + b        (5)

where, if x_t = (CPI_{t−1}, CPI_{t−2}, ...), Eq. (5) plays the role of the causal model (4), and if x = (1, 2, ..., 43), Eq. (5) is a time series model.
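A minimal sketch of how the two input choices in (5) might be prepared (our own illustration; `cpi` is assumed to be the quarterly log-inflation series as a NumPy array, and the SVR estimator from the previous sketch is reused):

```python
import numpy as np
from sklearn.svm import SVR

def causal_design(cpi, lags=1):
    """x_t = (CPI_{t-1}, ..., CPI_{t-lags}) with target CPI_t (causal form of Eq. (5))."""
    X = np.column_stack([cpi[lags - k - 1:len(cpi) - k - 1] for k in range(lags)])
    y = cpi[lags:]
    return X, y

def time_index_design(cpi):
    """x = (1, 2, ..., T) with target CPI_t (time series form of Eq. (5))."""
    X = np.arange(1, len(cpi) + 1, dtype=float).reshape(-1, 1)
    return X, cpi

cpi = np.log(np.linspace(1.2, 3.2, 44))          # placeholder data: 44 quarters, 1993Q1-2003Q4
X_causal, y_causal = causal_design(cpi, lags=1)
X_time, y_time = time_index_design(cpi)

svr_causal = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X_causal, y_causal)
svr_time = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X_time, y_time)
print(svr_causal.score(X_causal, y_causal), svr_time.score(X_time, y_time))
```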
One crucial design choice is deciding on a kernel. Creating good kernels often requires lateral thinking: many measures of similarity between inputs have been developed in different contexts, and understanding which of them can provide good kernels depends on insight into the application domain. Figure 2 shows SVM learning using various kernels.
Figure 2. Training results for different kernels, loss functions and σ of the SV regression (see Table 1). The original function values (plus points), the estimated functions (full line) and the ε-tube (dotted lines) are shown. Figs. 2a, c, d, e and f correspond to a good choice of the parameters; Fig. 2b corresponds to a bad choice.
In Fig. 2a we have a piecewise-linear approximating function, while in Figs. 2b and 2c we have more complicated approximating functions. Both functions agree with the training points, but they differ in the y values they assign to other x inputs. The functions in Figs. 2d and 2e apparently ignore some of the example points but are good for extrapolation. The true f(x) is unknown, and without further knowledge we have no way to prefer one of them, and hence no way to resolve the design problem of choosing an appropriate kernel in our application. For example, the objective in pattern classification from sample data is to classify and predict new data successfully, while the objective in control applications is to approximate non-linear functions, or to make unknown systems follow the desired response. Table 1 presents the results of finding the proper model by using the quantity R² (the coefficient of determination) for the best approximation of the inflation rate in our application. As shown in Table 1, the "best" R² is 0.9999, for the time series model with the RBF kernel and quadratic loss function. Among the causal models the best R² is 0.9711, obtained with the exponential RBF kernel and the ε-insensitive loss function (standard deviation σ = 0.52). The choice of σ was made in response to the data: in our case, the CPI time series has σ = 0.52. The radial basis function defines a spherical receptive field in the input space and the variance σ² localises it. The results shown in Table 1 were obtained using the ε-insensitive loss function (ε = 0.2), with different kernels and capacity C = 10. We used partly modified software developed by Steve R. Gunn [4] to train the SV regression models. SV regression is a powerful tool for the solution of many economic problems: it can provide extremely accurate approximation of time series, and the solution to the problem is global and unique. However, these approaches have several limitations. In general, as can be seen from the QP formulation, the size of the matrix involved in the QP problem is directly proportional to the number of training data.

Table 1. The SV regression results for different choices of kernel on the training set (1993Q1 to 2003Q4). In the last two columns the approximation and extrapolation performances are analysed. See text for details.

Fig. 2   MODEL             KERNEL        LOSS FUNCTION    R²       RMSE
a        causal (5)        exp. RBF      ε-insensitive    0.9711   0.0456
b        causal (5)        RBF           ε-insensitive    0.8525   0.0090
c        causal (5)        RBF           ε-insensitive    0.9011   0.0497
d        causal (5)        polynomial    ε-insensitive    0.7806   0.0191
e        causal (5)        polynomial    ε-insensitive    0.7860   0.0179
f        time series (5)   RBF           quadratic        0.9999   0.5556
-        dynamic (4)       -             -                0.7762   0.0187
For this reason there are many computing problems in which general quadratic programs become intractable in their memory and time requirements. To solve these problems, many modified versions of SVM's have been introduced. For example, a generalized version of the decomposition strategy is proposed by Osuna et al. [6]; the so-called SVM^light proposed by Thorsten [10] is an implementation of an SVM learner which addresses the problem of large tasks; and finally, in [8] a modified version of SVM's, the so-called least squares SVM's (LS-SVM's), is introduced for classification and non-linear function estimation.

4. Forecasting with SV-regression Models

Unfortunately, the SVM method does not explicitly define how a forecast is determined; the point estimates of the fitted model are simple values without any degree of confidence attached to the results. Despite this fact, point estimates for large data sets can be calculated. The entire data set is partitioned into two distinct data sets: the training data set, i.e. the sample period for analysis, and the validation data set, the time period from the first observation after the end of the sample period to the most recent observation. The parameters C, σ, ε must be tuned as follows. First, an SV machine is estimated on the training set by solving the QP. Second, its performance is evaluated on the validation set. The parameter set with the best performance on the validation set is chosen. With these parameters, the point estimates of the ex-post forecast may be calculated by simply putting the values of the validation vectors x_j^v and the training vectors x_i into the following SV regression

f(x) = Σ_{i=1}^{T} (α_i − α_i*) ψ(x_i, x_j^v) + b,   j = 1, ..., τ        (6)
where T denotes the end of the sample training period, τ is the forecasting horizon, i.e. the number of data points in the validation set, α_i, α_i* are known real constants (Lagrange multipliers), b is a known parameter (bias), x_j^v is a vector of the inputs, f(x) are the point estimates or forecasts of the series y_t predicted at the point x^v = (x_1^v, x_2^v, ..., x_m^v), and ψ(·,·) denotes the admissible kernel function used in the fitting phase of the SV regression model. An obvious limitation to the use of causal models is the requirement that the independent variables must be known at the time the forecast is made. In our case, the new CPI value is correlated with the CPI value one quarter previous. This fact may be used to obtain one-step-ahead forecasts of the CPI value operating on a moving-horizon basis.
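Before turning to the details, a rough sketch of such a moving-horizon, one-step-ahead forecasting loop (our own illustration, assuming a fitted SVR object `svr` from the earlier sketches and a NumPy array `cpi` holding the quarterly series; this is not the authors' code):

```python
import numpy as np

def rolling_one_step_forecasts(svr, cpi, n_train, horizon):
    """One-step-ahead forecasts on a moving horizon: each quarter the newest observed
    CPI value becomes the regressor for the next forecast (cf. Eq. (6))."""
    forecasts = []
    for step in range(horizon):
        t = n_train + step                     # current period T, then T+1, ...
        x_new = cpi[t - 1].reshape(1, 1)       # regressor CPI_T used to forecast CPI_{T+1}
        forecasts.append(svr.predict(x_new)[0])
    return np.asarray(forecasts)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# ex-ante evaluation over, say, the last three quarters of the validation set:
# preds = rolling_one_step_forecasts(svr_causal, cpi, n_train=41, horizon=3)
# print(rmse(cpi[41:44], preds))
```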
Generally, we denote the current period by T and suppose that we will forecast the series y_t in period T + τ (T = n, τ = 1). The forecast of the future observation CPI_{T+1} is generated from Eq. (6) by replacing the vector of the independent variable x^v with CPI_T. As a new observation becomes available, we set the new current period T + 1 to T and compute the next forecast again according to Eq. (6). In situations where the independent variables are mathematical functions of time, the point estimates or forecasts of CPI_{T+τ} are simply the values of Eq. (6) at the points x^v = (T + 1, T + 2, ..., T + τ). As three new observations became available, the ex ante summary forecast statistics (RMSE) could be calculated. The RMSE statistics generated by each SVM model and by the dynamic model (4) are given in Table 1. As illustrated in Table 1, a curve fitted with many parameters follows all fluctuations (the R² values increase) but is poor for extrapolation (the RMSE values increase too, i.e. the forecast accuracy decreases). The model in Fig. 2b gives the best predictions outside the estimation period and clearly dominates the other models. It should be pointed out that we are ranking the seven models within one category of forecast summary statistics. This is not a statistical test between models, but one way of trying to determine subjectively which of the models best generates the data of the inflation process.

5. Conclusion

In this paper we have examined the SVM approach to studying linear and non-linear models on a time series of inflation in the Slovak Republic. To assess approximation abilities we evaluated eight models. Two models are based on causal multiple regression in time series analysis, and six models are based on the Support Vector Machines methodology. Using the available data, a very appropriate econometric model is the regression (4), in which the lagged dependent variable CPI_{t−1} can substitute for the inclusion of other lagged independent variables (W_t, U_t). The benchmarking was performed between traditional statistical approaches and SVMs in regression approximation tasks. The SVM approach was illustrated on the regression function (4), which was developed by statistical tools. As is visually clear from Figure 2, this problem was readily solved by SV regression with excellent approximation. Finally, the paper has made some methodological contributions for SVM implementations in causal econometric modelling and has extended the SVM methodology to economic time series forecasting.
References
1. J. Adamda, M. Marcek, L. Pancikova, Some results in econometric modelling and forecasting of the inflation in Slovak economics, Journal of Economics, 52, No. 9, 1080-1093 (2004).
2. N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press (2000).
3. R. Fletcher, Practical Methods of Optimization, John Wiley and Sons, Chichester and New York (1987).
4. S. R. Gunn, Support Vector Machines for Classification and Regression, Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton (1997).
5. V. Kecman, Learning and Soft Computing, The MIT Press, Cambridge, Massachusetts, London, England (2001).
6. E. Osuna, R. Freund, F. Girosi, An improved training algorithm for support vector machines, in J. Principe, L. Gile, N. Morgan and E. Wilson (eds.), Neural Networks for Signal Processing VII - Proceedings of the 1997 IEEE Workshop, New York (1997).
7. A. W. Phillips, The relation between unemployment and the rate of change of money wages in the United Kingdom, 1861-1957, Economica, November (1958).
8. J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific Pub. Co., Singapore (2002).
9. V. Pankova, Do wages affect inflation in the Czech Republic?, Mathematical Methods in Economics - MME'97, VSB Technical University Ostrava, 156-160 (1997).
10. J. Thorsten, Making large-scale SVM learning practical, in Advances in Kernel Methods - Support Vector Learning, chapter 11, MIT Press (1999).
11. V. Vapnik, The support vector method of function estimation, in Nonlinear Modelling: Advanced Black-Box Techniques, J. A. K. Suykens, J. Vandewalle (eds.), Kluwer Academic Publishers, Boston, 55-85 (1998).
ASSESSING THE RELIABILITY OF COMPLEX NETWORKS: EMPIRICAL MODELS BASED ON MACHINE LEARNING

CLAUDIO M. ROCCO S.
Universidad Central de Venezuela, Facultad de Ingenieria, Caracas, Venezuela. crocco@reacciun.ve

MARCO MUSELLI
Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni, Consiglio Nazionale delle Ricerche, Genova, Italy. marco.muselli@ieiit.cnr.it
Abstract In this paper three models derived using Machine Learning techniques (Support Vector Machines, Decision Trees and Shadow Clustering) are compared for approximating the reliability of real complex networks, such as for water supply, electric power or gas distribution systems or telephone systems, using different reliability criteria.
1. Introduction
In the reliability community, two main categories of evaluation techniques are considered: in the analytical approach the system is analyzed by deriving a closed-form expression for its reliability, for example by determining the minimal cut or path sets. On the other hand, simulation techniques, among which are methods based on Monte Carlo estimation, are adopted when complex operating conditions are considered [1]. This last approach is usually employed to evaluate the reliability of real engineering systems, since analytical approaches are computationally complex (NP-hard). The expected value of a System Function [2] or of an Evaluation Function (EF) [3], depending on the system state x (the vector representing the state of each element), is normally used as a reliability index. In fact, the EF determines whether a specific configuration x corresponds to an operating state or to a failed one [4]. In simulation techniques based on Monte Carlo estimation, the evaluation of the system reliability is performed by: 1) randomly sampling a large number of
states x, 2) applying an appropriate EF to assess whether in each sampled state x the system succeeded or failed, and 3) estimating the expected value of the EF. The definition of the EF depends on the success criterion to be used. For example, to evaluate the connectivity between two nodes, a depth-first search procedure can be used. Other criteria can require more time-consuming procedures. Since in Monte Carlo estimation a large number of EF evaluations must be performed [4], it is convenient to obtain a valid approximation f(x) of the EF using a Machine Learning (ML) technique, and then apply f(x) to assess the system behavior in each sampled state x. Two different ML approaches have been used: predictive methods (e.g. Neural Networks or Support Vector Machines (SVM)), which adopt a black-box device whose functioning is not directly comprehensible, and descriptive methods (e.g. Decision Trees (DT) or Shadow Clustering (SC)), which provide a set of intelligible rules underlying the problem at hand. The rest of the paper is organized as follows: in Sec. 2 the analyzed problem is introduced together with the three machine learning techniques (SVM, DT and SC) considered for approximating the EF; Sec. 3 compares the results obtained by each method for three networks; and Sec. 4 contains the conclusions.

2. The Machine Learning approach to reliability evaluation

It is assumed that system components have two states, operating and failed, coded by the integers 1 and 0, respectively, and that component failures are independent events. The state x_i of the i-th component is defined as [5]:

x_i = 1 (operating state) with probability P_i
x_i = 0 (failed state)    with probability q_i = 1 − P_i

where P_i is the probability of success of component i. The state of a system containing d components is then expressed by a vector x = (x_1, x_2, ..., x_d). To establish whether x is an operating or a failed state for the network, we employ a proper Evaluation Function (EF):

EF(x) = 1 if the system is operating in state x
EF(x) = 0 if the system is failed in state x
Suppose that a sample, called the training set, containing N pairs (x_j, y_j) is available, where y_j = EF(x_j); a Machine Learning (ML) technique can then be used to retrieve a good approximation f(x) of the unknown evaluation function EF(x) of the system at hand.
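As a hedged sketch of the setting just described (our own illustration on a small made-up network, not the authors' code), a two-terminal connectivity EF based on depth-first search and a crude Monte Carlo reliability estimate might look like this:

```python
import random

EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # toy network, components = links
P = [0.9] * len(EDGES)                              # success probability of each link

def ef_connectivity(x, source=0, terminal=3):
    """EF(x): 1 if source and terminal are connected using only operating links."""
    adj = {}
    for (u, v), state in zip(EDGES, x):
        if state:                                   # keep only operating components
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    stack, seen = [source], {source}
    while stack:                                    # depth-first search
        node = stack.pop()
        if node == terminal:
            return 1
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return 0

def monte_carlo_reliability(n_samples=20000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = [1 if rng.random() < p else 0 for p in P]   # sample a system state
        hits += ef_connectivity(x)
    return hits / n_samples                             # estimate of E[EF]

print(monte_carlo_reliability())
```

Sampled pairs (x, EF(x)) of exactly this kind form the training set used by the ML approximators discussed next.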
The behavior of one predictive method, Support Vector Machines (SVM), and of two descriptive methods, Decision Trees (DT) and Shadow Clustering (SC), is analyzed in the following.

2.1. Support Vector Machines [6]

Suppose the values +1 and −1 are adopted for the output y of the evaluation function EF(x) instead of 1 and 0. Denote by S⁺ (resp. S⁻) the convex hull of the points x_j in the training set with corresponding output y_j = +1 (resp. y_j = −1). If S⁺ and S⁻ are linearly separable, we can construct the optimal hyperplane w·x + b = 0, which has maximum distance from these two convex hulls. The vector w and the quantity b, usually referred to as the weight vector and the bias, can be derived by solving the following quadratic programming problem:

min_{w,b} (1/2) w·w
subject to y_j (w·x_j + b) ≥ 1   for every j = 1, ..., N
Once we have found the optimal hyperplane, we simply determine on which side of the decision boundary a given test pattern x lies and assign the corresponding class label, using the function sgn(w·x + b). Equivalently, the weight vector w and the bias b of the optimal hyperplane can be found by searching for the values of the Lagrange multipliers α_j in the Wolfe dual problem. In this case we have w = Σ_j α_j y_j x_j. Only those points which lie closest to the hyperplane have α_j > 0 and contribute to the above sum. These points are called support vectors and capture the essential information about the training set at hand. If the two convex hulls S⁺ and S⁻ are not linearly separable, the optimal hyperplane can still be found by accepting a small number of misclassified points in the training set. A regularization factor C accounts for the trade-off between training error and distance from S⁺ and S⁻. To adopt non-linear separating surfaces between the two classes, we can project the input vectors x_j into another high-dimensional feature space through a proper mapping φ(·). If we employ the Wolfe dual problem to retrieve the optimal hyperplane in the projected space, it is not necessary to know the explicit form of the mapping φ. We only need the inner product φ(x)·φ(x') for every pair of input vectors x, x'; a proper symmetric positive definite kernel function K(x, x') = φ(x)·φ(x') can be used for this purpose.
The need to properly choose the kernel is a limitation of the support vector approach. In general, the SVM with lower complexity should be selected.

2.2. Decision Trees

Decision-tree-based methods represent a non-parametric approach that turns out to be useful in the analysis of large data sets for which complex data structures may be present [7]. A DT uses a divide-and-conquer strategy: it attacks a complex problem by dividing it into simpler sub-problems and recursively applying the same strategy to solve each of these sub-problems [8]. Every node in a DT is associated with a component of the network, whose current state is to be examined. From each node start two branches, corresponding to the two different states of that component. Every terminal node (or leaf node) is associated with a class, determining the network state: operating or failed. Conventionally, the false branch (failed state of the component) is positioned on the left and the true branch (operating state of the component) on the right. DT methods usually exploit heuristics that locally perform a one-step lookahead search; once a decision is taken it is never reconsidered. This hill-climbing search without backtracking is susceptible to the usual risk of converging to locally optimal solutions that are not globally optimal. On the other hand, this strategy allows building decision trees in a computation time that increases linearly with the number of examples [8]. Different algorithms for constructing decision trees essentially follow a common approach, called top-down induction; the basic outline is [9]:
1. If all the examples in the training set belong to one class, then halt.
2. Consider all the possible tests that divide the training set into two or more subsets. Score each test according to how well it splits up the examples.
3. Choose the test that achieves the highest score.
4. Divide the examples into subsets and run this procedure recursively on each subset, considering it as the current training set.

2.3. Shadow Clustering [10-11]

Shadow Clustering (SC) is a rule generation method, based on monotone Boolean function reconstruction, which is able to achieve performances comparable to those of the best classification techniques. The decision function built by SC can be expressed as a collection of intelligible rules in the if-then form, underlying the classification problem. In addition, as a byproduct of the training process, SC is able to determine redundant input variables for the analysis at
hand, thus allowing a significant simplification of the data acquisition process. SC proceeds by grouping together binary strings that belong to the same class and are close to each other according to a proper definition of distance. A basic concept in the procedure followed by SC is the notion of cluster. A cluster is the collection of all the binary strings having the value 1 in a fixed subset of components; as an example, the eight binary strings '01001', '01011', '01101', '11001', '01111', '11011', '11101', '11111' form a cluster since all of them have the value 1 in the second and in the fifth component. The procedure employed by SC consists of the following four steps:
1. Choose at random an example (x_j, y_j) in the training set.
2. Build a cluster of points including x_j and associate that cluster with the class y_j.
3. Remove the example (x_j, y_j) from the training set. If the construction is not complete, go to Step 1.
4. Simplify the set of clusters generated and build the corresponding monotone Boolean function.
An important characteristic of this technique is that the execution of SC does not involve the tuning of any parameter.

3. Example

To evaluate the performance of the methods presented in the previous sections, the three networks shown in Figs. 1-3 have been considered. It is assumed that all links have a reliability of 0.90. For Network 1 [12], it is assumed that each link has a capacity of 100 units. A system failure occurs when the flow at the terminal node t falls below 200 units (a max-flow min-cut algorithm is used to establish the value of the EF). Network 2 [13] has 20 nodes and 30 double links. The goal is to evaluate the connectivity between the source node s and the terminal node t. Finally, Network 3 represents the 52 nodes and 72 double links of the Belgian telephone network [14]. The success criterion used is the all-terminal reliability (defined as the probability that every node of the network can communicate with every other node). In order to apply a classification method it is first necessary to collect a set of examples (x_j, y_j), where y_j = EF(x_j), to be used in the training phase and in the subsequent performance evaluation of the resulting set of rules. To this aim, 50000 system states have been randomly selected without replacement and for each of them the corresponding value of the EF has been retrieved. To analyze how the size of the training set influences the quality of the solution provided by each method, 13 different cases were analyzed, with 1000 to 25000 examples in the training set.
Fig. 1. Network 1 [12]
Fig. 2. Network 2 [13]
Fig. 3. Network 3 [14]

These examples were randomly extracted with uniform probability from the whole collection of 50000 system states; the remaining pairs were then used to test the accuracy of the model produced by the machine learning technique. An average over 30 different choices of the training set for each size value was then performed to obtain statistically relevant results. The performance of each model is evaluated using the standard measures of sensitivity, specificity and accuracy [15]. For reliability evaluation, sensitivity gives the percentage of correctly classified operational states and specificity gives the percentage of correctly classified failed states. Different kernels were tried when generating the SVM model and it was found that the best performance is achieved with a Gaussian radial basis function (GRBF) kernel having parameter 1/(2σ²) = 1/d. Figure 4 shows the comparison of results regarding accuracy, sensitivity and specificity during the testing phase. As expected, the index under study for each model increases with the size of the training set. SC almost always has the best behavior for all the indices. However, for Network 2 the specificity index obtained by DT behaves better. In [16] the previous ML methods are compared in terms of the reliability of the networks. As expected, the best assessment is obtained using SC.
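A small sketch of how these three indices might be computed from a test set (our own illustration; `y_true` are EF values and `y_pred` the model's predictions, both coded 0/1):

```python
def sensitivity_specificity_accuracy(y_true, y_pred):
    """Sensitivity: fraction of correctly classified operating states (1s);
    specificity: fraction of correctly classified failed states (0s)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

print(sensitivity_specificity_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```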
Fig. 4. Performance results during the testing phase for each network.
4. Conclusions
This paper has evaluated the capability of three machine learning techniques (SVM, DT and SC) as approximating tools that can be used to assess the reliability of a complex network. For the three networks studied, the SC procedure seems to be more stable when the three indices are considered simultaneously. It is important to realize that SVM produces a model that cannot be written in the form of a logical sum-of-products involving the system components, whereas DT and SC are able to obtain it, even from a small training set, thus providing information about minimum paths and cuts [16].

References
1. Billinton R., Li W.: Reliability Assessment of Electric Power Systems Using Monte Carlo Methods, Plenum Press, 1994.
2. Dubi A.: Modeling of realistic systems with the Monte Carlo method: A unified system engineering approach, Proceedings of the Annual Reliability and Maintainability Symposium, Tutorial Notes, 2001.
3. Pereira M. V. F., Pinto L. M. V. G.: A new computational tool for composite reliability evaluation, IEEE Power System Engineering Society Summer Meeting, 1991, 91SM443-2.
4. Pohl E. A., Mykyta E. F.: Simulation modeling for reliability analysis, Proceedings of the Annual Reliability and Maintainability Symposium, 2000.
5. Billinton R., Allan R. N.: Reliability Evaluation of Engineering Systems, Concepts and Techniques (second edition), Plenum Press, 1992.
6. Cristianini N., Shawe-Taylor J.: An Introduction to Support Vector Machines, Cambridge University Press, 2000.
7. Breiman L., Friedman J. H., Olshen R. A., Stone C. J.: Classification and Regression Trees, Belmont: Wadsworth, 1994.
8. Portela da Gama J. M.: Combining Classification Algorithms, PhD Thesis, Faculdade de Ciencias da Universidade do Porto, 1999.
9. Quinlan J. R.: C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
10. Muselli M., Quarati A.: Reconstructing positive Boolean functions with Shadow Clustering, in Proceedings of the 17th European Conference on Circuit Theory and Design (ECCTD 2005), Cork, Ireland, 2005.
11. Muselli M.: Switching neural networks: A new connectionist model for classification, Proceedings of the 16th Italian Workshop on Neural Networks, Vietri sul Mare, Italy, 2005.
12. Yoo Y. B., Deo N.: A comparison of algorithms for terminal-pair reliability, IEEE Transactions on Reliability, 37, 1988, 210-215.
13. Chaturvedi S. K., Misra K. B.: An efficient multi-variable algorithm for reliability evaluation of complex systems using path sets, International Journal of Reliability, Quality and Safety Engineering, 3, 2002, 237-259.
14. Manzi E., Labbe M., Latouche G., Maffioli F.: Fishman's sampling plan for computing network reliability, IEEE Transactions on Reliability, R-50, 2001, 41-46.
15. Veropoulos K., Campbell C., Cristianini N.: Controlling the sensitivity of Support Vector Machines, Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 1999, 55-60.
16. Rocco C. M., Muselli M.: Machine learning models for reliability assessment of communication networks, submitted to IEEE Transactions on Neural Networks.
FUZZY TIME SERIES MODELLING BY SCL LEARNING*

MILAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & MEDIS Nitra, Ltd., Pri Dobrotke 659/81, 949 01 Nitra-Drazovce, Slovak Republic

DUSAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & Faculty of Management Science and Informatics, University of Zilina, 010 26 Zilina, Slovak Republic
Based on the works [8] and [16], a fuzzy time series model is proposed and applied to predict a chaotic financial process. The general methodological framework of classical and fuzzy modelling of economic time series is considered. A complete fuzzy time series modelling approach is proposed. To generate fuzzy rules from data, a neural network with Supervised Competitive Learning (SCL)-based product-space clustering is used.
1. Introduction
Much of the literature in the field of fuzzy logic and technology is focused on modelling dynamic processes with linguistic values as their observations (see e.g. [11]). Such a dynamic process is called a fuzzy time series. This type of dynamic process plays a very important role in practical applications. Economic and statistical time series analysis is concerned with the estimation of relationships among groups of variables, each of which is observed at a number of consecutive points in time. The relationships among these variables may be complicated. In particular, the value of each variable may depend on the values taken by many others in several previous time periods. Very often it is difficult to express these dependencies exactly, or no hypothesis for them is known. In such cases more sophisticated approaches are frequently considered. These approaches are based on human expert knowledge and consist of a series of linguistic expressions, each of which takes the form of an 'if ... then ...' fuzzy rule; they are well known under the common name fuzzy controllers. However, an expert is usually unable to describe linguistically the behaviour of economic processes in particular situations.
* This work was supported by grants GACR 402/05/2768 and VEGA 1/2628/05.
Hence, most recent research on the design of fuzzy controllers for deriving linguistically interpreted fuzzy rules has centred on developing automatic methods to build these fuzzy rules using a set of numerical input-output data. The majority of these models and data-driven techniques rely on the use of Takagi-Sugeno type controllers and fuzzy/non-fuzzy neural networks [6], [7], [18], and on clustering/fuzzy-clustering and genetic algorithm approaches [4], [6], [8], [9], [17], [19]. The goal of this paper is to illustrate that two distinct areas, i.e. fuzzy sets theory and computational networks, may be used for economic time series modelling. We show how to use and how to incorporate both fuzzy sets theory and computational networks to determine the fuzzy relational equations. As an application of the proposed method, an estimate of inflation is carried out in this paper. The characterisation of time series is introduced in Section 2. Quantitative modelling methods of time series are presented in Sections 3 and 4. Concluding remarks are offered in Section 5.

2. Conventional and fuzzy time series

Time series models are based on the analysis of a chronological sequence of observations on a particular variable. Typically, in conventional time series analysis, we assume that the generating mechanism is probabilistic and that the observed values {x_1, x_2, ..., x_t, ...} are realisations of stochastic processes {X_1, X_2, ..., X_t, ...}. In contrast to conventional time series, the observations of a fuzzy time series are fuzzy sets (the observations of a conventional time series are real numbers). Song and Chissom [16] give a thorough treatment of these models. They define a fuzzy time series as follows. Let Z_t (t = ..., 1, 2, ...), a subset of ℝ, be the universe of discourse on which fuzzy sets x^i_t (i = 1, 2, ...) are defined, and let X_t be the collection of the x^i_t (i = 1, 2, ...). Then X_t (t = ..., 1, 2, ...) is called a fuzzy time series on Z_t (t = ..., 1, 2, ...).

3. Quantitative time series modelling methods

In practice, there are many time series in which successive observations are dependent. This dependence can be treated here as an observational relation

R_O = {(y_{t−1}, y_t), (y_{t−2}, y_{t−1}), ...} ⊆ Y_{t−1} × Y_t        (1)

where Y_t, Y_{t−1} denote the variables and y_t, y_{t−1}, ... denote the observed values of Y_t and Y_{t−1} respectively.
In most real economic processes it is assumed that there exists a functional structure between Y_{t−1} and Y_t, i.e.

f: Y_{t−1} → Y_t        (2)
belonging to a prespecified class of mappings [5]. In practice many real models of this functional structure are represented by the linear relation

y_t = f(y_{t−1}).        (3)

Suppose that for the observations there exists a fuzzy relation R_O(t, t−1) such that

y^j_t = y^i_{t−1} ∘ R_O(t, t−1),        (4)

where y^j_t ∈ Y_t, y^i_{t−1} ∈ Y_{t−1}, i ∈ I, j ∈ J, I and J are index sets for Y_t and Y_{t−1} respectively, "∘" is the sign for the max-min composition, and R_O(t, t−1) is the fuzzy relation among the observations at times t and t−1. Then Y_t is said to be caused by Y_{t−1} only, i.e.

y^i_{t−1} → y^j_t        (5)
Y_{t−1} → Y_t        (6)

and

Y_t = Y_{t−1} ∘ R(t, t−1),        (7)

where R(t, t−1) denotes the overall relation between Y_t and Y_{t−1}. In the fuzzy relational equation (7) the overall relation R(t, t−1) is calculated as the union of the fuzzy relations R_ij(t, t−1), i.e. R(t, t−1) = ∪_{ij} R_ij(t, t−1), where "∪" is the union operator. In the following we will use Mamdani's method [10] to determine these relations. For simplicity, in the following discussion, we can also
express y^i_{t−1} and y^j_t as the values of the membership functions of the fuzzy sets y^i_{t−1} and y^j_t respectively. Since Eq. (4) is equivalent to the linguistic conditional statement

"if y^i_{t−1} then y^j_t",        (8)

we have R_ij(t, t−1) = y^i_{t−1} × y^j_t, where "×" is the Cartesian product, and therefore

R(t, t−1) = max_{i,j} { min(y^i_{t−1}, y^j_t) }.        (9)
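A small numerical sketch of (9) and of the composition (7) (our own illustration with made-up membership vectors, not the paper's data):

```python
import numpy as np

def mamdani_relation(mu_prev, mu_next):
    """R_ij = min(mu_prev_i, mu_next_j): Cartesian product of two fuzzy sets, cf. Eq. (9)."""
    return np.minimum.outer(mu_prev, mu_next)

def maxmin_compose(mu, R):
    """Forecast memberships: (mu o R)_j = max_i min(mu_i, R_ij), cf. Eq. (7)."""
    return np.max(np.minimum(mu[:, None], R), axis=0)

# membership degrees of the observations y_{t-1} and y_t over seven linguistic values
mu_t_minus_1 = np.array([0.0, 0.2, 0.8, 1.0, 0.3, 0.0, 0.0])
mu_t         = np.array([0.0, 0.0, 0.3, 1.0, 0.7, 0.1, 0.0])

R = mamdani_relation(mu_t_minus_1, mu_t)    # one rule's relation; several rules combine by max (union)
mu_forecast = maxmin_compose(mu_t, R)       # membership vector of the forecast for the next period
print(mu_forecast)
```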
Referring to the above definition by Song and Chissom of a fuzzy time series, in the fuzzy time series model Y_t, Y_{t−1} can be understood as linguistic variables and y^j_t, y^i_{t−1} as the possible linguistic values of Y_t and Y_{t−1} respectively. Equation (7) is called a first-order model of the fuzzy time series Y_t with lag p = 1. This first-order model can be extended to the p-th order model; see [16] for details.

4. Quantitative time series modelling methods

All the above fuzzy time series models can be determined if the fuzzy relations in the particular models are known. Since finding the exact solution of the fuzzy relations is generally very difficult and in practice unrealistic, more sophisticated approaches are considered very frequently. In a fuzzy system, neural networks are a powerful tool for generating fuzzy rules purely from data. Neural networks can adaptively generate the fuzzy rules in a fuzzy system by the SCL-based product-space clustering technique [8]. Next, in a numerical example, we will illustrate and show how to obtain fuzzy rules using fuzzy sets theory and neural networks. Let us consider a simple example. The data set used in this example consists of the 514 monthly inflation rates in the U.S. A graph of the historical values of inflation is presented in Fig. 1. To build a forecast model, the sample period for analysis y_1, ..., y_344
was defined. The following statistical model was specified:

y_t = ξ + φ y_{t−1} + ε_t        (10)
where the variable y_t is explained only by its previous values and ε_t is a white noise disturbance term. Using the Levinson-Durbin algorithm [2], [13], the model (10) is statistically fitted as

y_t = −0.1248 y_{t−1}.        (11)
The fuzzy time series modelling procedure consists of an implementation of several steps.
Figure 1. Natural logarithm of monthly inflation (514 observations).
First, we specified the input and output variables. The input variable x_{t−1} is the lagged first difference of the inflation values {y_t}. The output variable x_t is the first difference of the inflation values {y_t}. The variable ranges are −0.75 ≤ x_t, x_{t−1} ≤ 0.75. These ranges define the universe of discourse within which the data of x_{t−1} and x_t lie, and on which the fuzzy sets have to be specified. Next, we specified the fuzzy-set values of the input and output fuzzy variables. Each fuzzy variable assumed seven fuzzy-set values, as follows: NL: Negative Large, NM: Negative Medium, NS: Negative Small, Z: Zero, PS: Positive Small, PM: Positive Medium, PL: Positive Large. Fuzzy sets contain elements with degrees of membership. Fig. 2 shows the membership function graphs of the fuzzy sets above.
Figure 2. Fuzzy membership functions for each linguistic fuzzy-set value.
The input and output spaces were partitioned into the seven fuzzy sets. From the membership function graphs μ_{t−1}, μ_t in Fig. 2 it can be seen that the seven intervals [−0.75, −0.375], [−0.375, −0.225], [−0.225, −0.075], [−0.075, 0.075], [0.075, 0.225], [0.225, 0.375], [0.375, 0.75] correspond respectively to NL, NM, NS, Z, PS, PM, PL. Next, we specified the fuzzy rule base, i.e. the bank of fuzzy relations. The appendix describes the neural network which uses supervised competitive learning to derive fuzzy rules from data. As shown in Fig. 4(b), the bank contains 5 fuzzy rules. For example, the fuzzy rule of block 34 corresponds to the following fuzzy relation: if x^i_{t−1} = PM then x^j_t = PS. Finally, we determined the output action given the input conditions. We used Mamdani's implication [13]. Following the above principles, we obtained the predicted fuzzy value for the inflation x_t = x_345 = 0.74933. To obtain a simple numerical value in the output universe of discourse, a conversion of the fuzzy output is needed. The simplest defuzzification scheme was used. Following this method, we obtained the predicted value x_345 = −0.15. The remaining forecasts for the ex post forecast period t = 346, 347, ... may be generated similarly.

5. Conclusion

The method may be of real usefulness in practical applications, where the expert usually cannot explain linguistically what control actions the process takes or where there is no knowledge of the process. In principle a neural network can derive this knowledge from data. In practice this is usually necessary. Although the method has been demonstrated in the time series modelling field, it is also suitable for other applications such as data mining systems, information access systems, etc.

Appendix

GENERATING FUZZY RULES BY SCL-BASED PRODUCT-SPACE CLUSTERING

The neural network pictured in Fig. 3 was used to generate structured knowledge of the form "if A, then B" from a set of numerical input-output data. In Section 4 we defined cell edges with the seven intervals of the fuzzy-set values in Fig. 2. The interval −0.75 ≤ x_t, x_{t−1} ≤ 0.75 was partitioned into seven non-uniform subintervals that represent the seven fuzzy-set values NL, NM, NS, Z, PS, PM, and PL assumed by the fuzzy variables x_{t−1} and x_t. The Cartesian product of these subsets defines 7 × 7 = 49 fuzzy cells in the input-output product space ℝ².
As mentioned in [8], these fuzzy cells equal fuzzy rules. Thus there are in total 49 possible rules and thus 49 possible fuzzy relations. We can represent all possible fuzzy rules as a 7-by-7 linguistic matrix (see Fig. 4). The idea is to categorise a given set or distribution of input vectors X_t = (x_{t−1}, x_t), t = 1, 2, ..., 344, into 7 × 7 = 49 classes, and then represent any vector just by the class into which it falls. We used SCL (Supervised Competitive Learning) [10], [14] to train the neural network in Fig. 3. The software was developed at the Institute of Computer Science of the Faculty of Philosophy and Science, Opava. We used 49 synaptic quantization vectors. For each random input sample X_t = (x_{1t}, x_{2t}), the winning vector W_{i'} = (w_{1i'}, w_{2i'}) was updated by the SCL algorithm (the winner is moved towards the sample when their classes agree, and away from it otherwise) according to

w_{1i'} ← w_{1i'} + η (x_{1t} − w_{1i'})
w_{2i'} ← w_{2i'} + η (x_{2t} − w_{2i'})     if X_t is classified correctly,

w_{1i'} ← w_{1i'} − η (x_{1t} − w_{1i'})
w_{2i'} ← w_{2i'} − η (x_{2t} − w_{2i'})     if X_t is classified incorrectly,

where i' is the winning unit, defined by ||W_{i'} − X_t|| ≤ ||W_i − X_t|| for all i, W_i and X_t denote normalized versions of W_i and X_t respectively, and η is the learning coefficient.
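A compact sketch of this SCL update loop (our own illustration with placeholder class labels standing in for the 7 × 7 cell indices; this is not the Opava software):

```python
import numpy as np

def scl_train(X, labels, n_units=49, eta=0.05, epochs=20, seed=0):
    """Supervised competitive learning: move the winning weight vector towards the
    sample if their classes agree, away from it otherwise."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(), X.max(), size=(n_units, X.shape[1]))
    unit_class = np.arange(n_units)              # assumption: unit i represents fuzzy cell i
    for _ in range(epochs):
        for x, c in zip(X, labels):
            winner = np.argmin(np.linalg.norm(W - x, axis=1))   # closest quantization vector
            sign = 1.0 if unit_class[winner] == c else -1.0
            W[winner] += sign * eta * (x - W[winner])
    return W

# X_t = (x_{t-1}, x_t) pairs; labels = index of the fuzzy cell each pair falls into
X = np.random.default_rng(1).uniform(-0.75, 0.75, size=(344, 2))
labels = np.random.default_rng(2).integers(0, 49, size=344)     # placeholder cell labels
W = scl_train(X, labels)
print(W.shape)
```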
Figure 3. The topology of the network for fuzzy rule generation by SCL-based product-space clustering.
Figure 4. Distribution of the input-output data (x_{t−1}, x_t) in the input-output product space X_{t−1} × X_t (a). Bank of the time series modelling system (b).
Supervised Competitive Learning (SCL)-based product-space clustering classified each of the 344 input-output data vectors into 9 of the 49 cells, as shown in Fig. 4(a). Fig. 4(b) shows the fuzzy rule bank. For example, the most frequent rule corresponds to cell 34. From most to least important (frequent), the fuzzy rules are (PM; PS), (PS; PL), (NL; NS), (PS; PL), and (PS; PS).

References
1. G. E. P. Box and G. M. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA (1970).
2. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer-Verlag, New York (1987).
3. B. Carse, T. C. Fogarty and A. Munro, "Evolving fuzzy based controllers using genetic algorithms", Fuzzy Sets and Systems, Vol. 89: 273-293 (1996).
4. M. Delgado, A. F. Gomez-Skarmeta and F. Martin, "A fuzzy clustering-based rapid prototyping for fuzzy rule-based modelling", IEEE Trans. Fuzzy Systems, Vol. 5, No. 2: 223-233 (1997).
5. M. Fedrizzi, M. M. Fedrizzi and W. Ostasiewicz, "Towards fuzzy modelling in economics", Fuzzy Sets and Systems, 54: 259-268 (1993).
6. J. Q. Chen, Z. G. Xi and Z. J. Zhang, "A clustering algorithm for fuzzy model identification", Fuzzy Sets and Systems, Vol. 98: 319-329 (1998).
7. J. S. R. Jang and C. T. Sun, "Neuro-fuzzy modelling and control", Proceedings of the IEEE, Vol. 83, No. 3: 378-406 (1995).
8. B. Kosko, Neural Networks and Fuzzy Systems - A Dynamical Systems Approach to Machine Intelligence, Prentice-Hall International, Inc. (1992).
9. R. Li and Y. Zhang, "Fuzzy logic controller based on genetic algorithms", Fuzzy Sets and Systems, Vol. 83: 1-10 (1996).
10. E. H. Mamdani, "Application of a fuzzy logic to approximate reasoning using linguistic synthesis", IEEE Trans. Comput., 26: 1182-1191 (1997).
11. D. Marcek, "Stock Price Forecasting: Autoregressive Modelling and Fuzzy Neural Network", Mathware & Soft Computing, Vol. 7, No. 2-3: 139-148 (2000).
12. D. C. Montgomery, L. A. Johnson and J. S. Gardiner, Forecasting and Time Series Analysis, McGraw-Hill, Inc. (1990).
13. A. Morettin, "The Levinson algorithm and its applications in time series analysis", International Statistical Review, 52: 83-92 (1984).
14. J. J. Saade, "A defuzzification based new algorithm for the design of Mamdani-type fuzzy controllers", Mathware & Soft Computing, Vol. 7, No. 2-3: 159-173 (2000).
15. P. Siarry and F. Guely, "A genetic algorithm for optimising Takagi-Sugeno fuzzy rule bases", Fuzzy Sets and Systems, Vol. 99: 37-47 (1998).
16. Q. Song and B. S. Chissom, "Fuzzy time series and its models", Fuzzy Sets and Systems, 54: 269-277 (1993).
17. M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to quantitative modelling", IEEE Trans. Fuzzy Systems, Vol. 1, No. 1: 7-31 (1993).
18. H. Takagi and I. Hayashi, "NN-driven fuzzy reasoning", Int. Journal of Approximate Reasoning, Vol. 5, No. 3: 191-212 (1991).
19. Y. S. Tarng, Z. M. Yeh and C. Y. Nian, "Genetic synthesis of fuzzy logic controllers in turning", Fuzzy Sets and Systems, Vol. 83: 301-310 (1996).
INVESTMENT ANALYSIS USING GREY AND FUZZY LOGIC

CENGIZ KAHRAMAN
Department of Industrial Engineering, Istanbul Technical University, 34367 Macka, Istanbul, Turkey

ZIYA ULUKAN
Department of Industrial Engineering, Galatasaray University, 34357 Macka, Istanbul, Turkey
The theory of fuzzy logic founded by Zadeh in 1965 has been proven useful for dealing with uncertain and vague information. The grey theory first proposed by Deng (1982) avoids the inherent defects of conventional statistical methods and requires only a limited amount of data to estimate the behavior of unknown systems. In this paper, we use fuzzy set theory and grey theory to develop an efficient method for predicting the cash flows of an investment. The cash flows obtained are used in a present worth analysis to determine whether the investment is acceptable. Illustrative examples are given.
1. Introduction
Fuzzy theory, originally explored by Zadeh in 1965, describes linguistic fuzzy information using mathematical modeling. Because the existing statistical time series methods could not effectively analyze time series with small amounts of data, fuzzy time series methods were developed. Grey theory, originally developed by Deng (1982), focuses on model uncertainty and information insufficiency in analyzing and understanding systems via research on conditional analysis, prediction and decision making. In the field of information research, deep or light colours represent information that is clear or ambiguous, respectively. Meanwhile, black indicates that the researchers have absolutely no knowledge of the system structure, parameters and characteristics, while white represents information that is completely clear. Colours between black and white indicate systems that are not clear, such as social, economic or weather systems. The grey forecasting model adopts the essential part of grey system theory and has been successfully used in finance, the integrated circuit industry and the market for air travel. The grey forecasting model uses the operations of accumulated generation to build differential equations. It has the characteristic of requiring less data.
Wang (2002) predicts the stock price instantly at any given time. Two problems in predicting stock prices are that (1) there may be a large or small difference between two consecutive sets of data and (2) the volume of stock data is so large that it affects our ability to use it. To solve these problems, Wang (2002) constructs a data mart to reduce the size of the stock data and combines fuzzification techniques with grey theory to develop a fuzzy grey prediction as one of the predicting functions in the system, predicting the possible answer immediately. Lin and Lin (2005) report the use of grey-fuzzy logic based on an orthogonal array for optimizing the electrical discharge machining process with multiple responses. An orthogonal array, grey relational generating, grey relational coefficients, the grey-fuzzy reasoning grade and analysis of variance are applied to study the performance characteristics of the machining process. The machining parameters (pulse on time, duty factor and discharge current) with consideration of multiple responses (electrode wear ratio, material removal rate and surface roughness) are effective. In this paper, the cash flows of an investment will be estimated using grey and fuzzy logic, and these cash flows will be used to calculate the fuzzy present worth. Fuzzy interest rates for the future periods will also be estimated by using grey fuzzy logic and will be used in the analysis. Comparisons of the results with other methodologies and sensitivity analyses are components of this study.

2. Fuzzy and Grey Time Series

2.1. Fuzzy time series

Fuzzy theory, originally explored by Zadeh in 1965, describes linguistic fuzzy information using mathematical modeling. Because the existing statistical time series methods could not effectively analyze time series with small amounts of data, fuzzy time series methods were developed. Song and Chissom (1993a, b) proposed a first-order time-invariant model and a time-variant model of fuzzy time series in 1993. They fuzzified the enrollment at the University of Alabama in 1993 in the first application of fuzzy time series to forecasting. Then, in 1994, they proposed a new fuzzy time series and compared three different defuzzification models. The empirical result showed that the best prediction results are obtained when the neural network method is applied to defuzzify the data (Song & Chissom, 1994). Chen (1996) considered the neural network method too complicated to apply; he therefore presented arithmetic operations instead of the logical max-min composition. The arithmetic operations have a
robust specification and are superior to those applied in Song and Chissom's model. Hwang et al. (1998) defined a fuzzy set for each year, established the fuzzy relationship, and finally forecast the enrollment at the University of Alabama using the relation matrix. Empirical analysis revealed that the average error rate of Hwang's model was smaller than those of Chen and of Song and Chissom. The following steps construct Hwang's fuzzy time series (Hwang et al., 1998):

Step 1. Calculate the variations using the historical data.

Step 2. Separate the universe of discourse U into several even-length intervals. In this step, the universe must first be defined: it includes the minimum number of units (D_min) and the maximum number of units (D_max), according to the known historical data. Based on D_min and D_max, the universe U is defined as [D_min − D_1, D_max + D_2], where D_1 and D_2 are two proper positive numbers. Then U is divided into intervals of equal length.

Step 3. Define the fuzzy time series F(t). The fuzzy time series is expressed as follows:

F(t) = I_{C1}/u_1 + I_{C2}/u_2 + ... + I_{Cm}/u_m        (1)

where the I_{Ci} are the memberships and 0 ≤ I_{Ci} ≤ 1. Thus, the fuzzy sets A_i are expressed as:

A_i = I_{C1}/u_1 + I_{C2}/u_2 + ... + I_{Cm}/u_m        (2)
Step 4. Fuzzify the variations of historical data. This step determines a fuzzy set equivalent to each set of data. If the variation falls within ut; then the degree of each historical datum belongs to each At is determined. Step 5. Calculate the relation matrix R(t): Two variables, which the operation w (w=2, 3, ..., n) is the window base and t is year. The operation matrix is expressed as follows:
F{t-2) O w (<) =
'
11
Fit-!) F(t-w-\)
21 O
'12 On
^22
O
o2m
(3)
O wl °w2 wn The criterion matrix is expressed as follows: C\t) = F\t -\) = \C\,C1,. ..,Cm J where C/ represents "decreases"; C2 represents "increases a little", and Cm represents "increases too much". The
286 relation for changing the degree of period t is thus obtained. The relation matrix R(t) is expressed as follows:
R„(t)=Cj{t)xo;(tl
R(t) =
F(t)-
\
\<j< m
(4)
O, xC
On x C, 0„xC,
0„ x C,
0,xC_
O.xC
0,xC,
O
xC
(5)
/?., /?_
Ma4^Ru,R2i,...,RjMa4K>R2i^R.2l---Max{K'R2^---^J] = k>>- 2 ,...,rj
(6)
Step 6. Defozzify the fuzzified predicted variations in Step 5. The principles of denazification are as follows: 1. If the membership of an output has only one maximum ui ; then select the midpoint of the interval that corresponds to the maximum forecast value. 2. If the membership of an output has one or more consecutive maximum, then select the midpoint of the corresponding conjunct interval as the forecast. 3. If the membership of an output is zero, then no maximum exists. Thus, the predicted degree of change is zero. Step 7. Calculate the outputs. The actual value of change for the preceding year is added to the forecast degree of change, yielding the forecast value for this year. 2.2. Grey forecasting model Grey theory, originally developed by Deng (1982), focuses on model uncertainty and information insufficiency in analyzing and understanding systems via research on conditional analysis, prediction and decision making. In the field of information research, deep or light colours represent information that is clear or ambiguous, respectively. Meanwhile, black indicates that the researchers have absolutely no knowledge of system structure, parameters, and characteristics; while white represents that the information is completely clear. Colours between black and white indicate systems that are not clear, such as social, economic, or weather systems. The grey forecasting model adopts the essential part of the grey system theory and it has been successfully used in finance, integrated circuit industry and the market for air travel (Hsu & Wang, 2002; Hsu, 2003; Hsu & Wen, 1998). The grey forecasting model uses the operations of accumulated
287 generation to build differential equations. Intrinsically speaking, it has the characteristics of requiring less data. The GM(1,1), can be denoted by the function as follows (Hsu, 2001): » be Step 1. Assume an original to series Step 2. A new sequence x(' operation (AGO).
is generated by the accumulated generating
* W , X W = (x^),x^(2),x^{3),...,x^{n)\
where * « ( * ) = J > ( 0 ) W f=i
Step 3. Establish a first-order differential equation, (dx^ldt) + az = u where = U {x {x {x z \k) = ca \k) + {}-a)x \k + \) k = \,2,...,n-\. adenotes dene a horizontal adjument coefficient, and 0 -< a < 1. The selecting criterion of a value is to yield the smallest forecasting error rate (Wen et al., 2000). Step 4. From Step 3, we have
x^(k + \) =
(x^{l)-^ e
U
ak
+—.
(7)
where
6=
a
= {BTB)'XBTY, 1 (2)
l
W B = -Z (3)
1
—
(8)
(9)
fii" (10)
Y = )
Step 5. Inverse accumulated generation operation (IAGO). Because the grey forecasting model is formulated using the data of AGO rather than original data, IAGO can be used to reverse the forecasting value. Namely
x{:\k) = xw{k)-x[l){k-\),
k = 2,3,.-,>
(11)
288 3. Forecasting Net Cash Flows and Fuzzy Present Worth of an Investment In the crisp case, to forecast the future cash flows (revenues and costs), various quantitative forecasting techniques are used. Among these techniques, linear regression analysis and exponential smoothing technique are the frequently used ones. In this paper, both fuzzy time series and grey forecasting model will be used to forecast the net cash flows of an investment. Later these forecasts will be compared and some sensitivity analyses will be made A net cash flow is the difference between total cash receipts (inflows) and total cash disbursements (outflows) for a given period of time. The assumptions we make are the followings: 1. the first cost of the prospective investment will almost be the same as the one of existing investments in the same sector, 2. the same trend in the existing investment will be saved in the prospective investment, 3. very few data are obtainable to forecast the future cash flows. Table 1 shows the net actual values of the reference investment. Table 1. Net actual values of the reference investment Year Net Actual values (x$1000) l
* (O) 0)
2
x®{2)
3
x<°>(3)
*
•
n
* % )
Using the fuzzy time series and grey forecasting technique above, we obtain the forecasted cash flow intervals. The forecasted series of the net cash flows are represented by symmetric triangular fuzzy numbers. Then, using the fuzzy present worth formula in Eq. (12) (Kahraman, 2001), we calculate the fuzzy present worth of the prospective investment.
/„,(#„ )= ftWWfM7)) for i =1,2 and fi(n,r) = (l + r) ". Both
02) F and 7 are positive fuzzy
numbers. Here fx (.) and f2 (.) stand for the left and right representations of the fuzzy numbers, respectively
289 4. Conclusions Fuzzy set theory has been used to develop quantitative forecasting models such as time series analysis and regression analysis, and in qualitative models such as the Delphi method. In these applications, fuzzy set theory provides a language by which indefinite and imprecise demand factors can be captured. The structure of fuzzy forecasting models are often simpler yet more realistic than non-fuzzy models which tend to add layers of complexity when attempting to formulate an imprecise underlying demand structure. When demand is definable only in linguistic terms, fuzzy forecasting models must be used. Cash flows can also be forecasted using fuzzy or grey theory when too few past data exist. Otherwise, crisp statistical techniques should be used. Acknowledgment We are grateful to Galatasaray University Research Foundation for supporting this paper. References 1. C. Goh, R. Law, Modeling and forecasting tourism demand for arrivals with stochastic nonstationary seasonality and intervention, Tourism Management, 23,499-510,(2002). 2. C.H. Wang, Predicting tourism demand using fuzzy time series and hybrid grey theory, Tourism Management,2i, 361-31 A, (2004). 3. C. Kahraman, Capital budgeting techniques using discounted fuzzy cash flows, in Da Ruan, Janusz Kacprzyk and Mario Fedrizzi eds., Soft Computing for Risk Evaluation and Management: Applications in Technology, Environment and Finance, (Physica Verlag, Heidelberg), 375396, (2001). 4. C.L. Hsu, Y.U. Wen, Improved grey prediction models for trans-pacific air passenger market, Transportation Planning and Technology, 22, 87-107, (1998). 5. C. Lim, M. McAleer, Time series forecasts of international travel demand for Australia, Tourism Management, 23, 389-396, (2002). 6. J.C. Wen, K.H. Huang, K.L. Wen, The Study of a in G M ( U ) Model, Journal of the Chinese Institute of Engineers, 23(5), 583-589, (20009. 7. J. Hwang, S.M. Chen, C.H. Lee, Handling forecasting problems using fuzzy time series, Fuzzy Sets and Systems, 100, 217-228, (1998). 8. J.L. Deng, Control problem of grey system, Systems and Control Letters, 1, 288-294, (1982).
290 9. J.L. Lin, C.L. Lin, The use of grey-fuzzy logic for the optimization of the manufacturing process, Journal of Materials Processing Technology, 160, 9-14, (2005). 10. K.H. Huarng, Effective lengths of intervals to improve forecasting in fuzzy time series, Fuzzy Sets and Systems, 123, 387-394, (2001). 11. L.C. Hsu, The comparison of three residual modification model, Journal of the Chinese Grey System Association, 4(2), 97-110, (2001). 12. L.C. Hsu, Applying the grey prediction model for the global integrated circuit industry, Technological Forecasting and Social Change, forthcoming, (2003) 13. L.C. Hsu, C.H. Wang, Grey forecasting the financial ratios, The Journal of Grey System, 14(4), 399^108, (2002) 14. Q. Song, B.S. Chissom, Forecasting enrollments with fuzzy time series,Part I. Fuzzy Sets and Systems, 54, 1-9, (1993a). 15. Q. Song, B.S. Chissom, Fuzzy time series and its models, Fuzzy Sets and Systems, 54, 269-277, (1993b). 16. Q. Song, B.S. Chissom, Forecasting enrollments with fuzzy time series,Part II. Fuzzy Sets and Systems, 62, 1-8, (1994). 17. R. Law, Back-propagation learning in improving the accuracy of neural network-based tourism demand forecasting, Tourism Management, 21, 331340, (2000). 18. R. Law, N. Au, A neural network model to forecast Japanese demand for travel to Hong Kong, Tourism Management, 20, 89-97, (1999). 19. S.M. Chen, Forecasting enrollment based on fuzzy time series, Fuzzy Sets and Systems, 81, 311-319, (1996). 20. Y.F. Wang, Predicting stock price using fuzzy grey prediction system, Expert Systems with Applications.,22, 33-39, (2002). 21. Y.P. Huang, H.C. Chu, Simplifying fuzzy modeling by both gray relational analysis and data transformation methods, Fuzzy Sets and Systems, 104, 183-197, (1999).
AN EXTENDED BRANCH-AND-BOUND ALGORITHM FOR FUZZY LINEAR BILEVEL PROGRAMMING GUANGQUAN ZHANG, JIE LU, THARAM DILLON Faculty of Information Technology University of technology Sydney, POBox 123, Broadway, NSW2007, Australia email: fzhangg, jielu, tharam}@it.uts.edu.au
Abstract: This paper presents an extended Branch-and-Bound algorithm for solving fuzzy linear bilevel programming problems. In a fuzzy bilevel programming model, the leader attempts to optimize his/her fuzzy objective with a consideration of overall satisfaction, and the follower tries to find an optimized strategy, under himself fuzzy objective, according to each of possible decisions made by the leader. This paper first proposes a new solution concept for fuzzy linear bilevel programming. It then presents a fuzzy number based extended Branch-and-bound algorithm for solving fuzzy linear bilevel programming problems. Keywords: Bilevel programming; Branch-and-bound algorithm; Fuzzy sets; Fuzzy optimization; Decision making
1. Introduction Bilevel programming (BLP) has been developed for mainly solving decentralized planning problems with decision makers in a two level organization [1, 2, 10]. Decision maker at the upper level is termed as the leader, and at the lower level, the follower. Each decision maker (leader or follower) tries to optimize their own objective function, but the decision of each level affects the objective value of the other level [6]. Bilevel programming theory and method have been applied with remarkable success in different domains in decision making, for example, decentralized resource planning, electric power market, logistics, civil engineering, chemical engineering and road network management [7, 11], The vast majority of research on BLP has centered on the linear version of the problem, i.e., linear BLP. A set of approaches and algorithms have been well developed such as well known KuhnTucker approach [2] and Branch-and-bound algorithm [3,4], It has been observed that, in most real-world situations, the possible values of parameters in a bilevel programming model can be only imprecisely or ambiguously known and therefore are determined by model builders' understanding of the situations during model establishment process. It would be 291
292 certainly more appropriate to interpret the model builders' understanding of the parameters as fuzzy numerical data which can be represented by means of fuzzy sets [12]. Such a BLP problem in linear version is called a fuzzy linear bilevel programming (FLBLP) problem. The FLBLP problem was well researched by Sakawa et al. [8], Lai [5], and Shih [9]. However, to deal with some limitations of these approaches, this study proposes a new solution concept for FLBLP model. Under the solution concept, an extended Branch-and-Bound algorithm is developed to solve fuzzy linear bilevel programming problems. 2. A Fuzzy Bilevel Programming Model Uncertainty and imprecision are naturally appearing in various decision making including bilevel decision making problems. For example, logistics managers often imprecisely know the values of related constraints and evaluation criteria in making a logistics plan. They can only estimate inventory carrying costs and transportation costs of a particular set of goods. Also, for the evaluation of any alternative facilities, logistics managers can only assign values according to their experience, and these values assigned are often in linguistic terms, such as 'about 100', or 'about two times of the original cost". Obviously, when building a bilevel programming model for a decision problem, the parameters of the objective function or constraints of both the leader and the follower are hard to be set by precise numbers. The normal bilevel programming model which involves these issues is not efficient to express such decision problem as uncertain information and imprecise linguistic expressions are involved. This study therefore develops a FLBLP model as follows. ForxeXczR", yeYaRm, F:XxY -+F*(R), a n d / : it consists of finding a solution to the upper level problem: min F(x,y) = 'clx + dly
XxY-+F*(R), (2.1a)
xeJf
subject to Axx + Bxy-
(2.1b)
where v , for each value of x, is the solution of the lower level problem: min f(x, y) = c2x + d2y
(2.1c)
subject to A2x + B2y
(2.Id)
where cx,c2sF\Rn\
dx,d2eF\Rm\
\ = ^~e^F\R\
B2=fa)
bxeF'(Rp),
SysF'iR).
b2eF*(Rq),
293 In this model whatever in \heF(x,y), the leader's objective function, or in the/(x,j>), the follower's objective function, or their constraints, all parameters are allowed to be a fuzzy value. 3. The solution concept for the fuzzy bilevel programming model This section gives a necessary and sufficient condition for an optimal solution of a FLBLP problem defined by expression (2.1a)-(2.1d) so as to solve this problem. Associated with the FLBLP problem shown in (2.1a)-(2.1d), we now consider the following linear multi-objective multi-follower bilevel programming (LMMBLP) problem: F o r x e l c i ! " , yeY
F: XxY -+F*(R), and f:XxY
min (F(x, y)fx - c]xx + d]xy,
-+F*(R),
X e [0,1]
xeX
(3.1a)
min (F(x, yjfx = cfix + dixy,
A e [0,1]
xsX
subjectto A^xx+ B]xy
+ B]xy
A e[0,l]
mm(f(x,y))x=c2xx
+ d2xy,
Ae[0,l]
mm(f(x,y))x=c2Rxx
+ d2xy,
Ae[0,l]
(3.1b) (3.1c)
yeY
subjectto A2xx + B2xy
c^x,clx,c2x,c2Rx
b2Lx,b2ReR<,
<=k) <=kh
eR",
dlx,dlx,
<=fe) RP™>
d2x,d2R<=Rm,
^x,b^xeRp,
,
Obviously, the two groups of followers shown in (3.1c) are sharing of same variables. Based the model, we give the following theorems and lemmas in order to introduce an extended Branch-and-bound algorithm. Theorem 3.1 [13] Let ( * * , / ) be the solution of the LMMBLP problem (3.1). Then it is also a solution of the FLBLP problem defined by (2.1).
294 Lemma 3.1 [13] If there is (x ,y ) such that cx + dy>cx CQX + dQy>cl;x*+doy*
+ dy ,
and CgX + d^y >CgX* + d*y*, for any (x, y) and
isosceles triangle fuzzy numbers c and d, then c x
i
+ dJLy>c^x*
clx +
+dj[y',
dZy>c$x*+d«y\
for any X e (0,1), where c and d are the centre of "c and d respectively. Theorem 3.2 [13] For* e X c R" , yeYaRm,
If all the fuzzy coefficients
ay, by, ety, 7y, c, and dt have triangle membership functions of the FLBLP problem (2.1). t
/*?(') =
-t + z«
(3.2)
z
z0 - z
4<<
0
where z
denotes 2L-, 6,y, e«, 7y, c, and dj and z are the centre of
z respectively. Then, it is the solution of the problem (2.1) that (x ,y ) e R"xRm
satisfying m(F(x,y)) c min
=c]x + diy,
xeX
mm{F(x,y))o
=c^x + d^y,
xeX
(3.3a)
mm(F(x,y))o
=cl$x + dl%y,
xeX
subject to A)X + Bxy < 6,, (3.3b)
mm{f(x,y))c
= c2x + d2y,
yeY
mm(f(x, yj% = c2\x + d2L0y, yeY
mm(f(x,y)YA=c2*x yeY
(3.3c)
+ d2*y,
295 subject to A2x + B2y
(3.3d)
<*+
+ d]y)+(c]L0x + dlLQy)+(c]Rx + d]Ry)
(3.4a)
subject to A}x + Bxy
Kx
+
Ky
(3.4b)
Ao* + B\oy<[>\o, A2x + B2y
(3.4c)
B2Ry
u{Bx +B}LQ+ < ) + V(B2 + B2L0 + B2R)- w = -(d2 + d2l0 + d2R) «((*, + * , o + * i o ) - U + ^ i o + ^ i o ) « - (B. + K R
L
+ v f e + 62 J + b2 ) - (/f2 + A2 0 + A2
R
)K-(B2+
+ 5io M B2
L
(3 4e)
R
0 +B2
(3.4d)
)y)+wy
=0
x>0,y>0,u>0,v>0,w>0.
(3.4f)
Theorem 3.3 provides a theoretical foundation for extending exist Branchand-bound algorithm to handle the FLBLP problems. We now describe the basic idea of the extended Branch-and-bound algorithm. 4. An extended Branch-and-bound algorithm for FLBLP problems We first write all the inequalities (except of the leader's variables) of (2.1 a)(2.Id) as gj(x,y)>0,i = l,...,p + q + m , and note that complementary slackness simply means ujgj(x,y) = 0 (i = \,...,p + q + m). We suppress the complementary term and solve the resulted linear sub-problem. At each time of iteration the condition (3.4e) is checked. If it is satisfied, the corresponding point is in the inducible region and hence a potential solution to (2.1). Otherwise, a Branch-and-bound scheme is used to implicitly examine all combinations of the complementarities slackness. We give some notations for describing the details of the extended Branch-and-bound algorithm.
296 Let W = {\,...,p + q + m} be the index set for the terms in (3.4e), F be the incumbent upper bound on the objective function of the leader. At the kth level of an search tree we define a subset of indices Wk c W , and a path Pk corresponding to an assignment of either M; = 0 or g, = 0 for ieWk. Now let St={i:iefrk,ul=0) Sk ={i:ieWk,gl
= 0}
S°k={i:itWk}. For / € Sk , the variables uf or g, are free to assume any nonnegative value in the solution of (3.4) with (3.4e) omitted, so complementary slackness will not necessarily be satisfied. By using these notations we give all steps of the extended Branch-to-bound algorithm. Step 1 Step 2 Step 3 Step3.0 Step 3.1
The problem (2.1) is transferred to the problem (3.3) by using Theorem 3.2 The problem (3.3) is transferred to the following linear BLP problem (3.4) by using the method of weighting [7]. To solve the problem (3.4) (initialization) Set k = 0, S£ =>, Sk =(/>, S°k={\ p + q + m), and F = °o . (iteration k) Set u, = 0 for i e Sk and g, = 0 for ieSk . It first attempts to solve (3.4) without (3.4e). If the resultant problem is infeasible, go to Step 3.5; otherwise, put &<-& + l and label the solution (xk, yk,
Step 3.2 Step3.3
uk).
(Fathoming) If F(xk,yk k
(Branching) \fu gi(x
k
)>F,
then go to Step 3.5.
k
,y ) = Q, i = \,...,p + q + m, then go to Step
3.4. Otherwise select / for which w, g,(x ,yk)=A0 is the largest and label it /, . Put S+k <- Sk+ u{i,} , 5,° <- 5t° \{/,} , Sk <- Sk , append /, to Pk , and go to Step 3.1. Step 3.4
(Updating) Let F <- F(xk,
Step 3.5
(Backtracking) If no live node exists, go to Step 3.6. Otherwise branch to the newest live vertex and update Sk ,Sk , Sk and Pk as discussed below. Go back to Step3.1. (Termination) I f F = o o , there is not feasible solution to (2.1a)-
Step 3.6
yk).
297
Step 4
(2.1 d). Otherwise, declare the feasible point associated with F which is the optimal solution to (2.la)-(2.Id). Show the result of problem (2.1).
Some explanations are given for these steps and their working process as follows. After initialization, Step 3.1 is designed to find a new point which is potentially bilevel feasible. If no solution exists, or the solution does not offer an improvement over the incumbent (Step 3.2), the algorithm goes to Step 3.5 and backtracks. Step 3.3 checks the value of ufgj(xk,yk)
to determine if the complementary
slackness conditions are satisfied. In practice, if \ukgA < 10~6 it is considered to be zero. Confirmation indicates that a feasible solution of a bilevel program has been found and at Step 3.4 the upper bound on the leader's objective function is updated. Alternatively, if the complementary slackness conditions are not satisfied, the term with the largest product is used at Step 3.3 to provide a branching variable. Branching is always completed on the Kuhn-Tucker multiplier [2]. At Step 3.5, the backtracking operation is performed. Note that a live node is one associated with a sub-problem that has not yet been fathomed at either Step 3.1 due to infeasibility or at Step 3.2 due to bounding, and whose solution violates at least one complementary slackness condition. To facilitate bookkeeping, the path Pk in the Branch-and-bound tree is represented by a vector, its dimension is the current depth of the tree. The order of the components of Pk is determined by their level in the tree. Indices only appear in Pk if they are in either Sk or Sk with the entries underlined if they are in Sk~ . Because the algorithm always branches on a Kuhn-Tucker multiplier first, backtracking is accomplished by finding the rightmost non-underlined component ifPk, underlining it, and erasing all entries to the right. The erased entries are deleted from Sj~ and added to Sk . 5. Conclusions The issue addressed in this study is how to derive an optimal solution for the upper level's decision making for a bilevel programming problem. This paper proposes a fuzzy number based extended Branch-and-bound algorithm to solve fuzzy linear bilevel programming problems. Further study includes the development of models and approaches for fuzzy bilevel multi-follower
298 programming and fuzzy bilevel multi-objective programming problems. In fuzzy bilevel multi-follower programming, the relationships among these multiple followers will be classified into cooperative and non-cooperative situations. Therefore a set of models and algorithms need to be developed. In fuzzy bilevel multi-objective programming, fuzzy multi-objective programming and fuzzy bilevel programming approaches will be integrated to lead a satisfactory solution for the decision makers. Acknowledgments The work presented in this paper was supported by Australian Research Council (ARC) under discovery grants DP0557154 and DP0559213. References [I] G. Anandalingam and T. Friesz, Hierarchical optimization: An introduction, Annals of Operations Research Vol. 34 (1992), 1-11 [2] J. Bard, Practical Bilevel Optimization: Algorithms and Applications (Kluwer Academic Publishers, Amsterdam, 1998) [3] J. Bard and J. Falk, An explicit solution to the programming problem, Computers and Operations Research 9(1982), 77-100 [4] P. Hansen, B. Jaumard, and G. Savard, New branch-and-bound rules for linear bilevel programming. SIAM Journal on Scientific and Statistical Computing 13(1992), 1194-1217. [5] Y.J. Lai, Hierarchical optimization: A satisfactory solution, Fuzzy Sets and Systems 77(1996), 321-335 [6] T. Miller, T. Friesz and R. Tobin, Heuristic algorithms for delivered price spatially competitive network facility location problems, Annals of Operations Research 34(1992), 177-202. [7] M. Sakawa, Fussy sets and interactive mulitobjective optimization (Plenum Press, New York, 1993). [8] M. Sakawa, I. Nishizaki, and Y. Uemura, Interactive fuzzy programming for multilevel linear programming problems with fuzzy parameters, Fuzzy Sets and Systems 109(2000), 3-19 [9] H.S. Shih, YJ. Lai, and E.S. Lee, Fuzzy approach for multilevel programming problems, Computers and Operations Research, 23(1996) 73-91 [10] H. Stackelberg, The Theory of the Market Economy (Oxford University Press, New York, Oxford, 1952) [II] D. White and G. Anandalingam, A penalty function approach for solving bi-level linear programs, Journal of Global Optimization, 3(1993), 397-419. [12] L. A Zadeh, Fuzzy sets, Inform & Control 8(1965), 338-353 [13]G. Zhang and J. Lu, The definition of optimal solution and an extended Kuhn-Tucker approach for fuzzy linear bilevel programming, The IEEE Computational Intelligence Bulletin 2 (2005), 1-7.
FUZZY MULTI-OBJECTIVE INTERACTIVE GOAL PROGRAMMING APPROACH TO AGGREGATE PRODUCTION PLANNING
TIJEN ERTAY Istanbul Technical University Faculty of Management, Management Engineering Department Macka, 34367, Istanbul, Turkey. Abstract: In this paper, we consider an interactive goal programming approach for fuzzy multi objective linear programming application to aggregate production planning problems. Our aim is to determine the overall degree of decision maker satisfaction with the multiple fuzzy goal values and to give the exactly satisfactory solution results for decision maker in illustrative example.
1. Introduction As known, aggregate planning is concerned with the simultaneous establishment of a firm's production, inventory and employment levels over a finite time horizon. In other words, aggregate production planning (APP) is a medium range capacity planning method that encompasses a time horizon. This plan provides the basic input from which more detailed, product-specific production plans are derived and with which longer term strategic decisions are made. APP is an important upper level planning activity in a production management system, other forms of product family disaggregating plan, material requirements plan all depend on APP in a hierarchical way. APP has attracted considerable interest from academics for a long time. Holt et al. (1960) proposed a continuous timevaried model for an optimal employment, production and inventory policy in a single manufacturer under the assumption of a given sales forecast. Much considerable attention has been directed towards aggregate production problems and different optimization models have been developed. (Charnes and Cooper, 1961; Singhal and Adlakha, 1989; Bergstrom and Smith, 1970) But, in realworld APP problems, the input data or parameters such as demand, resources, cost and the objective function are often imprecise because of being incomplete or unobtainable information. Conventional mathematical programming cannot solve all fuzzy programming problems. Zimmermann (1976) first introduced fuzzy set theory into conventional linear programming. This study considered LP problems with a fuzzy goal and fuzzy constraints. Hintz and Zimmermann 299
300 (1989) proposed an approach based on fuzzy linear programming (FLP) and approximate reasoning to solve APP. Lee(1990) discussed fuzzy aggregate production planning problems with single product type, under the environment of fuzzy objective, fuzzy workforce levels and fuzzy demands in each period. A linear programming model with fuzzy objective and fuzzy constraints is developed, and fuzzy solutions under different levels can be achieved through parametric programming technology. Wang and Fang (2001) presented a novel FLP method for solving the APP problem with multiple objectives where the product price, unit cost to subcontract, workforce level, production capacity and market demands are fuzzy in nature. Fung et al. (2003) proposed a fuzzy multiproduct aggregate production panning (FMAPP) model to cater to different scenarios under various decision making preferences by applying integrated parametric programming, best balance and interactive methods. Wang and Liang (2004) developed a fuzzy multi-objective linear programming model with the piecewise linear membership function to solve multi-product APP decision problems in a fuzzy environment. Tang et al. (2000) focused on a novel approach to modeling multi-product aggregate production planning problems with fuzzy demands and fuzzy capacities. The objective of this study is to minimize the total costs of quadratic production costs and linear inventory holding costs. By means of formulation of fuzzy demand, fuzzy addition and fuzzy equation, the production inventory balance equation in single stage and dynamic balance equation are formulated as soft equations in terms of a degree of truth and interpreted as the levels of satisfaction with production and inventory plan in meeting market demands. This study considers an interactive goal programming approach for fuzzy multi-objective linear programming to solve APP problems. This model considers to minimize production cost, inventory carrying and backordering cost and hire and lay off worker costs.
2.1 Problem Formulation and assumptions The multi product APP problem can be described based on assumptions that a firm produces N type of products to satisfy the market demand in each period t (t=l,2,....T). The decision making problem is related to determine the compromise solution for meeting forecast demand by considering the Model 1 and Model II. All of those in Model II are fuzzy with indefinite goal level while all of the considered objective functions are crisp values in Model I. Besides, all of the considered objective functions are linear. The values of all parameters are definite over the planning time horizon. The forecast demand over time period is either met or backordered, if there is a backorder in a period, this backorder should be procured in the following period. The used nomenclatures are as follows.
D, : forecasted demand for nth product in period t (units) C,
: Cost to hire one worker in period t ($/man - hour)
CL, : Normal production cost per unit for nth product in period t Cy
: Cost to layoff one worker in period t ($/man - hour)
Wh,: Workers hired in period t (man - hour) (decision variable) WL : Workers laid off in period t (man - hour)(decision variable) Qt : Normal time productionof nth product in period t (units)(decision variable) Otn: Overtime production of nth product in period t (units)(decision variable) Cfjj: Overtime production cost per unit of nth product in period t ($/unit) C: : Inventory carrying cost per unit of nth product in period t($/unit) L : Inventory level in period t of nth product (units) (decision variable) C!f : Backorder cost per unit of nth product in period t ($/unit) B[n : Backorder level of nth product in period t (units) (decision variable) a>tn : Hours of worker per unit of nth product in period t (man - hour/unit) /itn : Hours of machine usage per unit of nth product in period t (machine - hour/unit) Wtmax: Maximum worker level available in period t (machine - hour) Mtmax : Maximum machine capacity available in period t (machine - hour) I.n min : Minimum inventory level of nth product in period t (units) B,n
max:
Maximum backorder level available of nth product in period t (units)
Model I formulations are as follows.
302
n=l 1=1
n=l 1=1
1=1
(Minimize Costs of total production) Min&i = £ ^jfL
*/,„ +C* *Bln]
(2)
n=l 1=1
( Minimize Inventory Carrying and Back ordering Costs) Ming3=Jjcr*WH+Cr-*Wl, 1=1
I
(3)
( Minimize Costsof Changes in Vorker levels) subjecto l,n-Bu =l,-,„-B,^+a,+Om- Dm V(,Vn 'tnmin
(4) (5) (6)
V/,Vn Vf.Vn
—'In
B
mmax-Bm N
fJG>,-jQ,-l,n+0,-J+Wfi-Wl,
<7)
n=l
n=l N
V;
^^(Qn+Oj^Wt^
(8)
n=l N
Vf
imax
(9)
n=l
Non- negative decision variables; Qm:0,„;la;Bm;Wh,;Wl, >0;
Vf; Vn
2.2 The Proposed Fuzzy Multi-objective LP Model (Model 11) In this study, multi-objective linear programming model can be converted into the multiple fuzzy objective linear programming models by considering a linear membership function to indicate the fuzzy goals of decision maker. (Bellman and Zadeh, 970). The proposed model in this study is mainly inspired from the model considered by Wang and Liang (2004). The present model differs from Wang and Liang's model, in terms of the considered constraints and objective functions. The our model is not considered maximum machine capacity and warehouse capacity and the escalating factors in each of the costs categories over the next T planning horizon. In this study, the linear membership function considering for each objective function is determined as follows.
fki)=
8
'~~8/» / gi
0
u -gi
slight ft*rf
do)
303 where giamtgf indicate the lover and upper bounds of objective function respectively. The interval \g',,g?\ of goals values can be determined ith according to decision maker's view. If decision maker is not satisfied according to the initial solution, the model should be revised until a suitable solution is obtained. First, the APP problem is solved using crisp multi objective linear programming (MOLP) model. Second, the linear membership function is determined for each value of the obtained objective function. Third, the variable X is determined as the overall degree of decision maker satisfaction with the multiple fuzzy goal values. The value X is maximized. The new constraints for upper and lower values of each objective are added to the above model. The new obtained fuzzy multi objective linear programming problem should be solved. If the new solution is not acceptable according to DM considering initial solution, then the model must be changed as a satisfaction solution is procured. Model II Maxk
(11)
subjecto
(i&-8,W£- -«,0>A Vi l
B
tn~ m~h-lr,
~B,-I.n+Q„+Om-
(12) Dm
Vt.Vn
(13)
'tnmm—'tn
Vf.Vn
(14)
B
V*,Vn
(15)
,nmax^Bln N
N
2>-;.„(Q-;.,+Q-J+Wt- -VV( n=l N
(16)
n=l
J^u
\/t
n=l N
YfiniQn+OjlM,^
V»
(17)
(18)
3. An application of interactive goal programming approach for fuzzy multi-objective linear programming problem 3.1. A Case Study The considered fuzzy model is applied to an illustrative case study. The APP strategy of this case study is related to procure fluctuating demand to be met using inventories, overtime, backorders based on constant work force level over the planning time horizon. The planning horizon time is six months long, there are three types of products in illustrative case study. The considered data for case study are shown in Table 1.
304 Table 1. The data for Illustrative case study Period
Product
1
2
3
4
D
C
C°
cL
c,B„
1
1500
20
30
0.20
2 3 1 2 3 1 2 3 1 2 3
1200 2400 800 3000 2000 4000 3000 2000 2400 5000 3000
26 18 20 26 18 20 26 18 20 26 18
39 27 30 39 27 30 39 27 30 39 27
0.15 0.10 0.20 0.15 0.10 0.20 0.15 0.10 0.20 0.15 0.10
*m
Mm
40
0.010
0.12
200
52 36 40 52 36 40 52 36 40 52 36
0.012 0.014 0.010 0.012 0.014 0.010 0.012 0.014 0.010 0.012 0.014
0.06 0.08 0.12 0.06 0.08 0.12 0.06 0.08 0.12 0.06 0.08
400 300 200 400 300 200 400 300 200 400 300
"nraax
W
M
itran
250
350
400
600
300
700
200
400
The other data are shown as follows: • Initial inventory of each product is 400, 300, and 200 respectively. • End inventory of each product is 200, 150, and 100 respectively. • The initial worker level is 200 man-hours. • The costs related to hiring and layoffs are $8 and $4 per worker-hour, respectively. 3.2 Interactive solution procedure First, APP problem for illustrative case study should be solved according to Model I considering data in Tablel. The initial solutions for each objective function are obtained based on crisp Model 1. The results are g, =596897.4 g2 = 1374.667 g3 =499.933 To determine the linear membership function of each objective function, the upper and lover level of each objective function are decided by asking the decision maker. The fuzzy multi objective linear programming model can be formulated as in Model II. The related model is solved using the LINDO computer software package. The obtained results are given in Table 2. According to decision maker's view, the results can be modified interactively by changing the parameters and membership functions. In this study, essentially, it has been considered membership function's change ranges. This proposed model provides the overall levels of decision maker satisfaction for X values. For example, if X equal to zero, none of the goals are satisfied.
305 Table 2. Optimal Production Plan according to Model II Period 1
Product
e™
Om
1
990 900 2200 2666 3000 2000 2042 3000 3569 0 5150 1530
0 0 0 0 0 0 0 0 0 0 0 0
2 2
3 1
2 3
4
3 1 2 3 1
2 3 X =1.0
g]
=597306.43 g2 = 1606.44
'TN
90 0 0 1957 0 0 2600 0 1569 0 0 0
B
TN
0 0 0 0 0 0 0 0 0 0 0 0
Whj-
WlT
0
148
39
0
0
0
0
23
g3 =999.99
Using Model II to simultaneously minimize total production costs, carrying and backordering costs and costs of changes in labor levels, yields total production cost of $ 597306.43, carrying and backordering costs of $ 1606.44 and cost of changes in labor levels of $ 999.99. In this situation, the upper and lover level of objective functions are as follows: gl(upper): 700.000 $ ; gl(lover)=500.000 $; g2(upper)= 1600 $ ; g2(lover)=1000 $ ; g3(upper)=1000 $ ; g3(lover)=0. In our example, the exactly satisfaction solution results for decision maker are given in below line in Table 2. 4. Conclusion The proposed model yields an efficient compromise solution and the overall level of decision maker satisfaction given determined multiple fuzzy objective values. The major limitations of the proposed Model II concern the assumption made in determining each of the decision parameters. However, the proposed model constitutes a systematic framework that facilities the decision -making process, enabling the decision maker interactively to modify the fuzzy data and related model parameters until a satisfactory solution is found. In our example, the exactly satisfactory solution results for decision maker are given. References l.Holt, C. C. et al., 1960 Planning Production Inventories and Workforce. New Jersey: Prentice Hall 2.Zimmermann H.J., 1976 Description and optimization of fuzzy systems. International Journal of General Systems, 2,209-215.
306 3.Lee Y. Y., 1990 Fuzzy set theory approach to aggregate production planning and inventory control. PhD. Dissertation. Department of IE., Kansas State University. 4.Wang R.C. and Hsiao- Hua Fang, 2001 Aggregate production planning with multiple objectives in a fuzzy environment. European Journal of Operational Research, 133, 521-536. 5.Fung R.Y.K., Tang J., Wang D., 2003 Multi product Aggregate production planning with fuzzy demands and fuzzy capacities. IEEE Transactions on Systems, Man and Cybernatics-Part A: Systems and Humans 33, 3, 302-313. 6.Wang, R.C., Liang T.F., 2004 Application of fuzzy multi objective linear programming to aggregate production planning. Computers and Industrial Engineering, 46, 1, 17-41. 7.Tang, J., D. Wang, R.Y.K. Fung, 2000. Fuzzy formulation for multi product aggregate production planning. Production Planning and Control, 11, 670-676. 8.Bellman R.E., L.A. Zadeh 1970, Decision-making in a fuzzy environment, Management Science, 17, 141-164.
FUZZY LINEAR PROGRAMMING MODEL FOR MULTIATTRIBUTE GROUP DECISION MAKING TO EVALUATE KNOWLEDGE MANAGEMENT PERFORMANCE* Y. ESRA ALBAYRAK Engineering and Technology Faculty.Galatasaray University, Ciragan Cad. No: 36, 34357 Ortakoy, Istanbul Turkey Tel: 90 212 227 44 80 (435) Fax: 90 212 259 55 57
YASEMIN CLAIRE ERENSAL Engineering Faculty, Dogus University, Acibadem Zeamet Sokak 34722 Kadikoy, Istanbul Turkey Tel: 90 216 327 11 04 (1377) Fax: 90 216 326 33 Abstract: In this paper, we develop a linear programming technique for multidimensional analysis of preference (LINMAP) method for solving multiattribute group decision making (MAGDM) problems with preference information on alternatives in fuzzy environment. Our aim is to develop a fuzzy LINMAP model to evaluate and to select of knowledge management (KM) tools. KM decision-making problems are often associated with evaluation of alternative KM tools under multiple objectives and multiple criteria.
1. Introduction In this paper, we investigate the fuzzy linear programming technique (FLP) for multiple attribute group decision making (MAGDM) problems with preference information on alternatives. In multiple attribute decision-making (MADM) problems, a decision maker (DM) is often faced with the problem of selecting, evaluation or ranking alternatives that are characterized by multiple, usually conflicting, attributes [1]. In this paper, to reflect the decision maker's subjective preference information and to determine the weight vector of attributes, the technique for order preference by similarity to ideal solution (TOPSIS) developed by Hwang &Yoon and the linear programming technique for multidimensional analysis of preference (LINMAP) developed by Srinivasan and Shocker [2] are used. LINMAP method is based on pairwise comparisons of alternatives given by decision makers and generates the best compromise alternative as the solution that has the shortest distance to the positive ideal This research has been financially supported by Galatasaray University Research Fund.
307
308 solution. In this paper, according to the concept TOPSIS, we define the fuzzy positive ideal solution (FPIS) and fuzzy negative ideal solution (FNIS). Because organizations operate in different business contexts and drivers of knowledge management are often unique for each company, KM decision-making problems are often associated with evaluation of alternative KM tools under multiple objectives and multiple criteria. We proposed a linear programming technique for multidimensional analysis of preferences under fuzzy environment in evaluating KM tools. The use of fuzzy linear programming (FLP) to KM will be discussed and this approach to KM problems has not been appeared in the literature. The weights are estimated using fuzzy linear programming model based on group consistency and inconsistency indices. Through the proposed methodology in this research, enterprises can reduce the mismatch between the capability and implementation of the KM system, and greatly enhance the effectiveness of implementation of the KM systems. Finally, the developed model is applied to a real case of assisting decision-makers in a leading logistics company in Turkey to illustrate the use of the proposed method, 2. The Basic Model The main focus of this paper is to provide a fuzzy linear programming model [3], for multidimensional analysis of preferences (Fuzzy LINMAP). Consider a MADM problem with n alternatives A.,i = l,2,....n, and m decision attributes (criteria),
c .,j = l,2,....m. xy,
by z> = (*..)
x
component of a decision matrix denoted
,is the rating of alternative At with respect to attribute C. . Let
w=(wr wr ..., w / be the vector of weights, where £ w. = i,w.>o,j = l,2,....m and j=i
J
J
w. denotes the weight of attribute C. [4]. In this methodology, linguistic variables are used to model human judgments. These linguistic variables can be described by triangular fuzzy numbers. x..=ia..,b..,cA [5],[6]. 2.1 Distance between two triangular fuzzy numbers Let m = (m1,m2,w3) and n=(«1,«2»«3) be two triangular fuzzy numbers, then the vertex method is defined to calculate the distance between them as [7], d(m,n) = J - l (m, - n})2 + (m2 - n2)2 + (m3
-n})2\
(1)
309 2.2
Normalization
Suppose the rating of alternative A.(i = l,2,...n) on attribute C.(j = l,2,...m) given by DM P (p = l,2,...P) is xp =iap ,bp ,cpA . A fuzzy multiattribute group decision making problem can be expressed in matrix format ( D p = [ xP. ]
).
c\ap;ap E X , ' = (ap.,bp ,cp. ) , i = l,2,..,n;p r = 1,2,..,P\ A y ij v ij IJ y y --min)ap;ap exp = (ap ,bp ,cp ), i = l,2,...n; p = 1,2 P] ij
U
ij
ij
V
r
'J
I
(2) bmax^min cmax cmax h a v e
als0
game
meaning
In
MADM problems, there are
benefit (B) and cost (C) attributes. Using the linear scale transformation, the various criteria scales are transformed into a comparable scale. forjeB
max
J
mm ,mm mm a . 0. c. J J J
and yp =
for j e c
'ij
J
V V
)
V
(3)
V)
We can obtain the normalized fuzzy decision matrix denoted by Yp. Yp=\yp) p = l,2,..,P; wherey P ={yp ,yp,,yp\ are normalized triangular 6 V V )„xm 'J VvLyiJM yUR) h fuzzy numbers and denote the location of the i' alternative in the /n-dimensional space (criteria). 2.3 Fuzzy group LINMAP Let X
=\xJ,x2...
model
..x i is the fuzzy positive ideal point, i.e., the alternative
location most preferred by the individual, the square of the weighted Euclidean distance between Y.p and x , where x.=(x*L,x.M,x.R)dre
triangular fuzzy
numbers, can be calculated as ~X)L>2
"Srpj^&vL
+
(ym ~*jM>2 +(y0R ~X)R>2]/2
for
ieA
(4) The squared distance s. = d2 is given by S? = E v
i>$4 (5)
p
S can
s=
be
rewritten
using
triangular
fuzzy
l hfjltyvL-*V+<»w-*wt+(yw-xV\ y
ijL~XjL'
'"ijM
"jM/
'KJ,ijR
numbers
x*
as [8],
Suppose that the DM
310 P (p = l,2,...,P)gives the preference
relations
between
alternatives
by
np = {{k,l); Akp Ar k,l = l,2,...,n)]where pp is a preference relation given by the DM P„
s
*--}AdW%•s''lrk
#€
are squared
weighted
Euclidean distances between each pair of alternative {k,l) and the fuzzy positive ideal solution (x ) . For every ordered pair ( t , / ) e Q , the solution would be consistent with the weighted distance model if sf > sf and there is no error attributable to the solution [2]. If sf < S%, (sf -sf) an index (Sf-Sff
gives the error. We define
to measure inconsistency between the ranking of
alternatives and the preferences, i.e., to denote the error of the pair (k,l); (Sf - Sff = 0 if Sf > Sf and (Sf -Sff=Sf-
Sf if Sf < Sf (6)
Then the inconsistency index can be rewritten as, (Sf-Sff
=max[o,Sf-Sf
(7) For all the pairs in Q , the total inconsistency is B"= Y
(Sf-Sff (8)
and the total poorness of fit for the group is
B=ZB" = J: p=\
Y
(sf-sfy
p=\ (A,7)efi
(9) Our objective is to minimize the sum of errors for all pairs in Q Similarly, the total goodness (G) of fit for the group is
G=J:Gp = J: p=l
Y
1
PY (Sf-Sf)
(10)
p=\ (i,7)en
Substituting for B and G from (9) and (10), we get;
I (kj)&np
(Sf-Sff-
S
(Sf-Sff
= G-B = h
(k,i)enp
(11) h is arbitrary positive number. The constraint imposes the condition that the goodness of fit G should be greater the poorness of fit B. Let
311 Z^ = max [0, S? - sf} for each (k,l)e Q." and with z£ > 0, we have z£ > S^ - ^ . The problem of finding the best solution(w,x ) reduces to finding the solution(w,v) [9], which minimizes Eq.(12) subject to the constraints [8]. minimize^ £
Z
Zyf
subject to the constraint I m
P
yfjL -^Mi-yiMi-^R 2 P 3p=l
5> H
-I*.
£ (kJtef
(
«
•*#£ 'IqLyYljM
>
!
x
•
jM
kjM\\ \yljR
(kjjeif
ytiM-^wY^jR '
J=>
^ (k,i)enf
'fw-^
=h
y
kjR
M */-=' ZP>0, m
Iw=;, H J w. >0,
j=lZ...m j=l2...m
Using K = }v.|= fw .x*) we can write as vy.,l =wy.x.., v ... =wy.x... and vjR.„ = wJ .xjR.„ y £ ' /A/ yAf By solving this linear programming, w., v i , v
v (eq. 20) can be obtained and
J* is computed. 3. Application 5./ Evaluation Criteria for the KM tools In order to formulate the multiattribute evaluation model, it is necessary to identify the factors that influence KM practitioners' choice of KM tools. After discussions with four KM consultants and the operations manager, we studied the features of the KM tools provided by vendors, reviewed the literature for selecting software, and identified three essential evaluation criteria to use in selecting the best KM tools: cost, functionality and vendors.
312 3.1.1 Cost Cost is a common factor influencing the purchaser to choose the software [10]. It is the expenditure associated with KMS and includes product, license, training, maintenance and software subscription costs. 3.1.2 Functionality Functionality refers to those features that the KM tool performs and, generally, to how well the software can meet the user's needs and requirements. 3.1.3 Vendor The quality of vendor support and its characteristics are of major importance in the selection of software, such as in [11]. It is also critical for the successful installation and maintenance of the software. 3.2 KM tools (Alternatives) Alternative 1. Knowledger: Knowledger consists of components that support personal KM, team KM, and organizational KM. The benefit of these components is that, through the knowledge portal, it is possible to manage, collaborate, capture and convey information and so forth to the teams or organization. Alternative 2. eRoom; The eRoom software is a digital workplace that allows organizations to quickly assemble a project team, wherever people are located and to manage the collaborative activities that drive the design, development and delivery of their products and services. Alternative 3. Microsoft SharePoint Portal Server; SharePoint Portal Server software is a KM tool that is an end-to-end solution for managing documents, developing custom portals and aggregating content from multiple sources into a single location. The proposed method is currently applied to solve KM tools selection problem and the computational procedure is summarized as follows: Stepl: The experts P (p = 1,2,3) give their preference judgments between alternatives with paired comparisons as n1 = \(l,2),(2,3)}, Q2 = {(1,2),(1,3)}, n3 = {(2,i),(3,2)} i.e., 1 is preferred to 2, 2 is preferred to 3, etc. Step2: The experts use the linguistic rating variables (shown in Table 1) to evaluate the rating of alternatives with respect to each attribute. The data and ratings of all alternatives on every attribute are given by the three experts PrP2,P3 as in Table 2.
313 Table 1 Linguistic variables for the ratings Very Poor (VP) (0, 0.1,0.3) Poor(P) (0.2 0.3,0,4) (0.4 0.5,0.6) Fair(F) Good (G) (0.6 0.7,0.8) Very Good (VG) (0.8 0.9, 1.0) Table 2 Decision information and ratings of the three alternatives Criteria
Alternatives
C, (SxlO3)
Decision Makers Pi Pi Pi 50,000 50,000 50,000 35,000 35,000 35,000 25,000 25,000 25,000 Fair Good VeryG Poor Fair Poor Very G Good Good Fair VeryG Good Good Good VeryG Good Fair Good
A, A2 A, A, A2 Ai A, A2 A3
C2
C3
Step3: Constructing the normalized fuzzy decision matrix Y for expert 1 (using Eqs.(2 ) and (3)) X
X
1
Y
=A2 A
3
(0.5,0.5,0 .5) (0.71,0.71 ,0.71) (1.0,1.0,1 .0)
X
3
2
(0.6,0.77, 1.0) (0.8,1.0,1 .0) (0.2,0.33, 0.5) (0.6,0.77, 1.0) (0.8,1.0,1 .0) (0.6,0.77, 1.0)
We can obtain the normalized decision matrices Y2 and Y3 of the experts P2andP To obtain the best weights and ideal point, taking h = 1.0 and using Yp and w ; =0.284,
fipwe
solve
linear
programming
problem
(Eq.
(12)).
w2= 0.398, w} = 0.318 and
x* =((0.27, 0.27, 0.27), (0.19, 0.20, 0.22), (0.23, 0.24, 0.25)).
Using
Eq.
(6),
the
distances between YP and the positive ideal x* can be obtained. According to distances, the ranking orders of the three alternatives for the three experts are as follows: ForP, :A2pA3pAj ForP 2 : A 3 pA, pA2 ForP 3 : A 3 pA2pA, The group ranking order of all alternatives can be obtained using social choice functions such as Copeland's function [12]. Copeland's function ranks the alternatives in the order of the value of / (x), Copeland score, that is the number of alternatives in alternative set that x has a strict simple majority over, minus the number of alternatives that have strict simple majorities over X . Alternatives
A, A2 A,
Table 3 Copeland's scores Decision Makers P/
Pj
Pj
-1,-1 1,1 1,-1
-1 , 1 -1 ,-1 -1 J
-1,-1 -1,1 1,1
Copeland's scores -4 0 2
314 According to the Copeland's scores, the ranking order of the three alternatives is A3, A2, Aj. The best alternative is A3. 4. Conclusion This paper offers a methodology for analyzing individual and multidimensional preferences with linear programming technique under fuzzy environments. In this paper, a systemic approach is proposed using fuzzy linear programming to evaluate an appropriate KM tool for the organization. To reflect the DM's subjective preference information, a fuzzy LINMAP model is constructed to determine the weight vector of attributes and then to rank the alternatives. The development of a KMS is still relatively new to many organizations. This study has several implications for KM practitioners who intend to evaluate KM tools to build a KMS. Through the proposed methodology in this research, enterprises can reduce the mismatch between the capability and implementation of the KM system, and greatly enhance the effectiveness of implementation of the KMS. References 1. Hwang, K. P. and Yoon, K. P., (1995). Multiple Attribute Decision Making, Sage University Paper, Iowa. 2. Sinnivasan, V., Shocker, A.D., (1973). Linear Programming Techniques for Multidimensional Analysis of Preferences, Psychometrica, 38 (3), 337-369. 3. Hwang, C.-L., and S.-J. Chen, in collaboration with F.P. Hwang (1992). Fuzzy Attribute Decision Making: Methods and Applications. Springer-Verlag, Berlin. 4. Wang, Y., M., Parkan, C, (2005). Multiple Attribute Decision Making Based on Fuzzy Preference Information on Alternatives: Ranking and Weighting, Fuzzy Sets and Systems, 153, 331-346. 5. Van Laarhoven, P.J.M., and W. Pedrycz (1983). A fuzzy extention of Saaty's priority theory. Fuzzy Sets and Systems, 11 (3), 229-241. 6. Zadeh, L.A. (1965). Fuzzy Sets. Information and Control, 8 (3), 338-353. 7. Chen, C.T., (2000). Extensions of the TOPSIS for Group Decision-Making under Fuzzy Environment, Fuzzy Sets and Systems, 114, 1-9. 8. Li, D.,F., Yang, J.,B., (2004). Fuzzy Linear Programming Technique for Multiattribute Group Decision Making in Fuzzy Environments. Information Sciences, 158, 263-275. 9. Fan, Z.PHu, G.F., Xiao, S.H., (2004). A Method for Multiple Attribute DecisionMaking with the Fuzzy Preference Relation on Alternatives, Computers&Industrial Engineering, 46,321-327. 10.Davis,L.,&Williams,G.(1994).Evaluating and selecting simulation software using the analytic hierarachy process. Integrated Manufacturing Systems, 5 (1), 23-32. ll.Byun,D.H.,&Suh,E.H.(1996).A methodology for evaluation EIS software packages. Journal ofEnd User Computing, 8 (21), 31. 12. Hwang, C.,L., Lin, M., J., (1987). Group Decision Making under Multiple Criteria, Springer-Verlag, Berlin.
PRODUCT-MIX DECISION WITH COMPROMISE LP HAVING FUZZY OBJECTIVE FUNCTION COEFFICIENTS (CLPFOFC) SANI SUSANTO* Senior Lecturer, Department of Industrial Engineering, Faculty of Industrial Technology, Parahyangan Catholic University, Jin. Ciumbuleuit 94, Bandung - 40141, Indonesia. PANDIAN VASANT* Research Lecturer, EEE Program, Universiti Teknologi Petronas, 31750 Tronoh.BSI, PerakDR, Malaysia. ARIJIT BHATTACHARYA § Examiner of Patents & Designs, The Patent Office, Bouddhik Sampada Bhawan, CP-2, Sector V, Salt Lake, Kolkata 700 091, West Bengal, India. CENGIZ KAHRAMAN** Professor, Department of Industrial Engineering, Istanbul Technical University, 34367 Macka Besiktas, Istanbul, Turkey. This paper outlines, first, a compromise linear programming (LP) having fuzzy objective function coefficients (CLPFOFC) and thereafter, a real-world industrial problem for product-mix selection involving 29 constraints and 8 variables is solved using CLPFOFC. This problem occurs in production planning management in which a decision-maker (DM) plays a pivotal role in making decision under a highly fuzzy environment. Authors have tried to find a solution that is flexible as well as robust for the DM to make an eclectic decision under real-time fuzzy environment.
1. Introduction The theory of fuzzy LP was developed to tackle imprecise or vague problems using the fundamental concept of artificial intelligence, especially in reasoning and modelling linguistic terms. Conventional mathematical programming techniques fail to solve fuzzy programming problems (Kolman and Beck, 1995). Thus, the CLPFOFC approach is best suited to solve some of real-life problems. E-mail: [email protected] [email protected] [email protected] ' [email protected]
315
316 Some previous attempts to set fuzzy intervals, where coefficients of the criteria are given by intervals, were reported by Bitran (1980), Jiuping (2000) and Sengupta, Pal and Chakroborty (2001). Wang (1997) used triangular MF for LP modelling. A real-life industrial problem for optimal product-mix selection involving 29 constraints and 8 variables has been delineated in this paper. 2. Product-Mix Problem of Chocoman, Inc. The firm Chocoman, Inc. manufactures 8 different kinds of chocolate products. There are 8 raw materials to be mixed in different proportions and 9 processes (facilities) to be utilized having limitations in resources of raw materials. Constraints, viz., product-mix requirement, main product line requirement and lower and upper limit of demand for each product, are imposed by the marketing department. All the above requirements and conditions are fuzzy. The objective is to obtain maximum profit (z) with certain degree of LOS of the DM. 2.1. Fuzzy Objective Coefficients and Non-Fuzzy Constraints The two sets of non-fuzzy constraints are raw material availability, and facility capacity constraints. These constraints are inevitable for each material and facility, based on material consumption, facility usage and resource availability. The decision variables for the product-mix problem are: X] to x8 (viz., milk chocolate of 250g to be produced (in '000); milk chocolate of lOOg (in '000); crunchy chocolate of 250g (in '000); crunchy chocolate of 1 OOg (in '000); chocolate with nuts of 250g (in '000); chocolate with nuts of lOOg (in '000); chocolate candy (in '000 packs); chocolate wafer (in '000 packs)). The following constraints are established by the sales department of Chocoman, Inc.: Product mix requirements: Large-sized products (250g) of each type should not exceed 60% (non fuzzy value) of the small-sized product (lOOg) such that: Constraint-1: xj < 0.6 x2, Constraint-2: X3 < 0.6 X4, and Constraint-3: x5 < 0.6 x6 Main product line requirement: The total sales from candy and wafer products should not exceed 15% (non fuzzy value) of the total revenues from the chocolate bar products, such that: Constraint-4: 400x7 + 150x8 < 0.15(375x, + 150x2 + 400x3 + 160x4 + 420x5 + 175x6) 3. Rest of the identified 29 constraints, i.e., material requirement and facility usages, are given below: Constraint-5 (cocoa usage): 87.5x, + 35x2 + 75x3 + 30x4 + 50x5 + 20x6 + 60x7 + 12x8 < 100000 Constraint-6 (Milk usage): 62.5x, + 25x2 + 50x3 + 20x4 + 50x5 + 20x6 + 30x7 + 12x8 < 120000
Constraint-7 (nuts usage): 0x1 + 0x2 + 37.5x3 + 15x4 + 75x5 + 30x6 + 0x7 + 0x8 ≤ 60000
Constraint-8 (confectionery sugar usage): 100x1 + 40x2 + 87.5x3 + 35x4 + 75x5 + 30x6 + 210x7 + 24x8 ≤ 200000
Constraint-9 (flour usage): 0x1 + 0x2 + 0x3 + 0x4 + 0x5 + 0x6 + 0x7 + 72x8 ≤ 20000
Constraint-10 (aluminium foil usage): 500x1 + 0x2 + 500x3 + 0x4 + 0x5 + 0x6 + 0x7 + 250x8 ≤ 500000
Constraint-11 (paper usage): 450x1 + 0x2 + 450x3 + 0x4 + 450x5 + 0x6 + 0x7 + 0x8 ≤ 500000
Constraint-12 (plastic usage): 60x1 + 120x2 + 60x3 + 120x4 + 60x5 + 120x6 + 1600x7 + 250x8 ≤ 500000
Constraint-13 (cooking facility usage): 0.5x1 + 0.2x2 + 0.425x3 + 0.17x4 + 0.35x5 + 0.14x6 + 0.60x7 + 0.096x8 ≤ 1000
Constraint-14 (mixing facility usage): 0x1 + 0x2 + 0.15x3 + 0.06x4 + 0.25x5 + 0.10x6 + 0x7 + 0x8 ≤ 200
Constraint-15 (forming facility usage): 0.75x1 + 0.3x2 + 0.75x3 + 0.30x4 + 0.75x5 + 0.30x6 + 0.90x7 + 0.36x8 ≤ 1500
Constraint-16 (grinding facility usage): 0x1 + 0x2 + 0.25x3 + 0.10x4 + 0x5 + 0x6 + 0x7 + 0x8 ≤ 200
Constraint-17 (wafer-making facility usage): 0x1 + 0x2 + 0x3 + 0x4 + 0x5 + 0x6 + 0x7 + 0.30x8 ≤ 100
Constraint-18 (cutting facility usage): 0.50x1 + 0.10x2 + 0.10x3 + 0.10x4 + 0.10x5 + 0.10x6 + 0.20x7 + 0x8 ≤ 400
Constraint-19 (packaging-1 facility usage): 0.25x1 + 0x2 + 0.25x3 + 0x4 + 0.25x5 + 0x6 + 0x7 + 0.1x8 ≤ 400
Constraint-20 (packaging-2 facility usage): 0.05x1 + 0.30x2 + 0.05x3 + 0.30x4 + 0.05x5 + 0.30x6 + 2.50x7 + 0.15x8 ≤ 1000
Constraint-21 (labour usage): 0.30x1 + 0.30x2 + 0.30x3 + 0.30x4 + 0.30x5 + 0.30x6 + 2.50x7 + 0.25x8 ≤ 1000
Constraints 22 to 29, which give the demand limits for MC 250, MC 100, CC 250, CC 100, CN 250, CN 100, Candy and Wafer respectively, are x1 ≤ 500, x2 ≤ 800, x3 ≤ 400, x4 ≤ 600, x5 ≤ 300, x6 ≤ 500, x7 ≤ 200 and x8 ≤ 400. The non-negativity constraints are x1, ..., x8 ≥ 0.

3. Algorithm for the CLPFOFC
The CLPFOFC algorithm used to arrive at an eclectic product-mix decision under a fuzzy environment is as follows:
Step 1: Formulation of the crisp linear programming problem;
Step 2: Determination of the type of fuzzy number (e.g., a triangular fuzzy number) to be chosen for each of the objective function coefficients c_j;
Step 3: Defining the objective function coefficient vector c, the lower bound vector c⁻ and the upper bound vector c⁺ of the objective function coefficients;
Step 4: Formulation of the following LP model with multiple objectives for the triangular fuzzy objective:

maximise z = (c⁻x, cx, c⁺x)
subject to Ax ≤, =, ≥ b, x ≥ 0                                         (1)

Step 5: Transforming the problem formulated in Step 4 into the following form:

min z1 = (c - c⁻)x, max z2 = cx, max z3 = (c⁺ - c)x
subject to Ax ≤, =, ≥ b, x ≥ 0                                         (2)

Step 6: Determination of the following set of compromise solutions over X = {x | Ax ≤, =, ≥ b, x ≥ 0}:

z1_min = min_{x∈X} (c - c⁻)x                                           (3)
z1_max = max_{x∈X} (c - c⁻)x                                           (4)
z2_min = min_{x∈X} cx                                                  (5)
z2_max = max_{x∈X} cx                                                  (6)
z3_min = min_{x∈X} (c⁺ - c)x                                           (7)
z3_max = max_{x∈X} (c⁺ - c)x                                           (8)

Step 7: Defining the following set of three MFs:

μ_z1(x) = 1,                                               if (c - c⁻)x ≤ z1_min
μ_z1(x) = (z1_max - (c - c⁻)x)/(z1_max - z1_min),          if z1_min ≤ (c - c⁻)x ≤ z1_max
μ_z1(x) = 0,                                               if (c - c⁻)x ≥ z1_max        (9)

μ_z2(x) = 1,                                               if cx ≥ z2_max
μ_z2(x) = (cx - z2_min)/(z2_max - z2_min),                 if z2_min ≤ cx ≤ z2_max
μ_z2(x) = 0,                                               if cx ≤ z2_min               (10)

μ_z3(x) = 1,                                               if (c⁺ - c)x ≥ z3_max
μ_z3(x) = ((c⁺ - c)x - z3_min)/(z3_max - z3_min),          if z3_min ≤ (c⁺ - c)x ≤ z3_max
μ_z3(x) = 0,                                               if (c⁺ - c)x ≤ z3_min        (11)

Step 8: Defining the following max-min problem:

max_{x∈X} min{μ_z1(x), μ_z2(x), μ_z3(x)}                               (12)

Step 9: Converting the problem of Step 8 into the following compromise LP problem by introducing

α = min{μ_z1(x), μ_z2(x), μ_z3(x)}                                     (13)

Step 10: Obtaining an equivalent compromise solution to Step 8 by solving the following LP problem:

max α                                                                  (14)
subject to
μ_z1(x) ≥ α, μ_z2(x) ≥ α, μ_z3(x) ≥ α, Ax ≤, =, ≥ b                    (15)
or, equivalently,
(c - c⁻)x + α(z1_max - z1_min) ≤ z1_max                                (16)
cx - α(z2_max - z2_min) ≥ z2_min                                       (17)
(c⁺ - c)x - α(z3_max - z3_min) ≥ z3_min                                (18)
0 ≤ α ≤ 1                                                              (19)
x ≥ 0                                                                  (20)
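The steps above can be sketched computationally as follows. This is a minimal illustration that assumes a pure maximisation problem with inequality constraints Ax ≤ b, uses scipy.optimize.linprog rather than the WinQSB® package employed by the authors, and all function and variable names are ours.

```python
# Sketch of Steps 5-10 of the CLPFOFC algorithm for a profit-maximisation LP
# with triangular fuzzy objective coefficients (c_low, c, c_up) and crisp
# constraints A x <= b, x >= 0.  Illustrative only.
import numpy as np
from scipy.optimize import linprog

def compromise_lp(c_low, c, c_up, A, b):
    c_low, c, c_up = map(np.asarray, (c_low, c, c_up))
    objs = [c - c_low, c, c_up - c]                # z1, z2, z3 of Step 5
    n = len(c)

    def extreme(v, sense):
        # linprog minimises, so a maximum is obtained by negating v
        res = linprog(v if sense == "min" else -v, A_ub=A, b_ub=b,
                      bounds=[(0, None)] * n, method="highs")
        return res.fun if sense == "min" else -res.fun

    z_min = [extreme(v, "min") for v in objs]      # Step 6, Eqs. (3), (5), (7)
    z_max = [extreme(v, "max") for v in objs]      # Step 6, Eqs. (4), (6), (8)

    # Step 10: maximise alpha subject to the membership constraints (16)-(18)
    obj = np.zeros(n + 1); obj[-1] = -1.0          # minimise -alpha
    rows = [np.append(r, 0.0) for r in np.asarray(A)]
    rhs = list(b)
    rows.append(np.append(objs[0],  z_max[0] - z_min[0])); rhs.append(z_max[0])
    rows.append(np.append(-objs[1], z_max[1] - z_min[1])); rhs.append(-z_min[1])
    rows.append(np.append(-objs[2], z_max[2] - z_min[2])); rhs.append(-z_min[2])
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * n + [(0, 1)], method="highs")
    return res.x[:n], res.x[n]                     # optimal x and alpha
```

For the Chocoman problem, A and b would be assembled from Constraints 1 to 29 above, and c_low, c, c_up from the lower, modal and upper profit coefficients of the eight products.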
4. Results, Discussions and Conclusion
The WinQSB® software has been used to obtain the results (Table 1). Solving Eqs. (9) to (11), the following values are obtained, respectively: μ_z1(x) = 0.5030, μ_z2(x) = 0.5030 and μ_z3(x) = 0.5039. From the definition of α in Eq. (13) one gets α = min{μ_z1(x), μ_z2(x), μ_z3(x)} = 0.5030. The value of α corresponds to the two MFs μ_z1(x) and μ_z2(x). From these two MFs, the interpretation of the value of α can be obtained through the following steps: (i) from the definitions given in Step 5, the optimal solution in Table 1 yields z1 = 33404.033, z2 = 133899.404 and z3 = 33504.823; (ii) secondly, from the definition one gets μ_z1(x) = 0.5030. This value represents the LOS of the DM achieved by the optimal solution in Table 1. The highest and the lowest LOS are achieved when the difference between the values of cx and c⁻x is 0 and 67215.600, respectively. By linear interpolation, (c - c⁻)x = z1 = 33404.033 corresponds to a LOS of the DM of μ_z1(x) = 0.5030; (iii) thirdly, from the definition, μ_z2(x) = 0.5030. The lowest and the highest LOS are achieved when the value of cx is 0 and 266184.900, respectively. Using linear interpolation, cx = z2 = 133899.404 corresponds to a LOS of the DM of μ_z2(x) = 0.5030.

Table 1. Optimal combination of products from the WinQSB® software (quantities to produce, in '000 units)
MC 250: 46.037    MC 100: 76.728    CC 250: 360.000    CC 100: 600.000
CN 250: 0.000     CN 100: 0.000     Candy: 100.790     Wafer: 0.000
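The quoted membership values can be verified directly by linear interpolation from the figures given above (a small check; the spans 67215.600 and 266184.900 are the ones stated in the text).

```python
# Verification of the interpolated membership (LOS) values quoted above
z1, z1_span = 33404.033, 67215.600    # (c - c_low) x at the optimum and its worst-case span
z2, z2_span = 133899.404, 266184.900  # c x at the optimum and its best-case value
mu_z1 = 1.0 - z1 / z1_span            # decreasing membership of Eq. (9)
mu_z2 = z2 / z2_span                  # increasing membership of Eq. (10)
print(round(mu_z1, 4), round(mu_z2, 4))   # both approximately 0.5030
```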
The solution using the developed fuzzified multi-objective compromise LP algorithm considers the imprecision of the given information. The non-fuzzy solution of Tabucanon (1996) results in an optimal value of z of 266,157 (assuming that the LOS of the DM always remains constant at 100%), while the fuzzy solution of Vasant (2003) using a modified S-curve MF results in an optimal z = 318,000 (for 100% LOS of the DM, which is an ideal assumption) and z = 254,400 (at 50% LOS with a pre-determined vagueness value of 13.8). The CLPFOFC model uses a triangular MF, with an optimal profit z = 266,184 (at 100% LOS of the DM) and z = 133,899 (at 50.30% LOS of the DM). The model presented by Tabucanon (1996) was not flexible enough to incorporate the DM's LOS. Inter alia, Vasant (2003) did not use such a fuzzified compromise model. A further extension of the present model using a suitably designed smooth logistic MF (which is a more realistic assumption) may increase the profit of Chocoman, Inc. by trading off suitably among the decision variables and the other constraints in making the product-mix decision.

References
1. Bitran, G.R., 1980, Linear multiple objective problems with interval coefficients. Management Science 26, 694-706.
2. Jiuping, X., 2000, A kind of fuzzy linear programming problems based on interval-valued fuzzy sets. A Journal of Chinese Universities 15(1), 65-72.
3. Kolman, B. and Beck, R.E., 1995, Elementary Linear Programming with Applications. Academic Press, USA.
4. Sengupta, A., Pal, T.K. and Chakraborty, D., 2001, Interpretation of inequality constraints involving interval coefficients and a solution to interval linear programming. Fuzzy Sets and Systems 119, 129-138.
5. Tabucanon, M.T., 1996, Multi-objective programming for industrial engineers. In: Mathematical Programming for Industrial Engineers. Marcel Dekker, Inc., New York, 487-542.
6. Vasant, P., 2003, Application of fuzzy linear programming in production planning. Fuzzy Optimization and Decision Making 2(3), 229-241.
7. Wang, L.-X., 1997, A Course in Fuzzy Systems and Control. Prentice-Hall Int., London.
MODELING THE SUPPLY CHAIN: A FUZZY LINEAR OPTIMIZATION APPROACH NUFER YASIN ATES Industrial Engineering Department, Istanbul Technical University, 80680 Macka, Istanbul, Turkey SEZI CEVIK Industrial Engineering Department, Istanbul Technical University, 80680 Macka, Istanbul, Turkey A supply chain is a network of suppliers, manufacturing plants, warehouses, and distribution channels organized to acquire raw materials, convert these raw materials to finished products, and distribute these products to customers. Linear programming is a widely used technique for optimizing supply chain decisions. In the crisp case every parameter value is certain, whereas in real life the data are fuzzy rather than crisp. Fuzzy set theory has the capability of modeling problems with vague information. In this paper, a fuzzy optimization model for supply chain problems is developed under vague information. A numerical example is given to show the usability of the fuzzy model.
1. Introduction
A crucial component of the planning activities of a manufacturing firm is the efficient design and operation of its supply chain. A supply chain is a network of suppliers, manufacturing plants, warehouses, and distribution channels organized to acquire raw materials, convert these raw materials to finished products, and distribute these products to customers. Strategic-level supply chain planning involves deciding the configuration of the network, i.e., the number, location, capacity, and technology of the facilities. Tactical-level planning of supply chain operations involves deciding the aggregate quantities and material flows for purchasing, processing, and distribution of products. The strategic configuration of the supply chain is a key factor influencing efficient tactical operations and has a long-lasting impact on the firm. Meanwhile, tactical-level planning, which determines the operational efficiency of the strategic configuration, is also very important and must be handled attentively throughout the operation of the chain. Supply chain management (SCM) is the term used to describe the management of the flow of materials, information, and funds across the entire
supply chain, from suppliers to component producers to final assemblers to distribution (warehouses and retailers), and ultimately to the consumer. In fact, it often includes after-sales service and returns or recycling. In contrast to multi-echelon inventory management, which coordinates inventories at multiple locations, SCM typically involves coordination of information and materials among multiple firms. Supply chain management has generated much interest in recent years for a number of reasons. Many managers now realize that actions taken by one member of the chain can influence the profitability of all others in the chain. Firms are increasingly thinking in terms of competing as part of a supply chain against other supply chains, rather than as a single firm against other individual firms. Also, as firms successfully streamline their own operations, the next opportunity for improvement is through better coordination with their suppliers and customers. The costs of poor coordination can be extremely high (Johnson and Pyke, 1999). Beginning with the seminal work of Geoffrion and Graves (1974) on multicommodity distribution system design, a large number of optimization-based approaches have been proposed for the design of supply chain networks. However, the majority of this research assumes that the operational characteristics of, and hence the design parameters for, the supply chain are deterministic. Unfortunately, critical parameters such as customer demands, prices, and resource capacity are quite uncertain. Moreover, the arrival of regional economic alliances, for instance the Asian Pacific Economic Alliance and the European Union, has prompted many corporations to move more and more towards global supply chains, and therefore to become exposed to risk factors such as exchange rates, reliability of transportation channels, and transfer prices. Unless the supply chain is designed to be robust with respect to the uncertain operating conditions, the impact of operational inefficiencies such as delays and disruptions will be larger than necessary (Goetschalckx et al., 2003). In decision making, especially when a high degree of fuzziness and uncertainty is involved due to imperfections and complications of information processes, the theory of fuzzy sets is one of the best tools for systematically handling uncertainty in decision parameters. The supply chain problem is complex in nature and involves strategic decisions with long-term implications. Much of the information in the decision process is not known with certainty. Because of this, the supply chain problem inherits the characteristics of impreciseness and fuzziness. Fuzzy set theory is employed because of the vagueness and imprecision in the supply chain problem and is used to transform imprecise and vague information about the objective and the constraints into a fuzzy objective and fuzzy constraints (Kumar et al., 2006). Zadeh (1965) suggested the concept of fuzzy sets as one possible way of improving the modeling of vague parameters. Bellman and Zadeh (1970)
suggested a fuzzy programming model for decision making in a fuzzy environment. In this paper, a supply chain problem under incomplete information is solved using fuzzy set theory. In the crisp problem all supplier, production, inventory and market constraints are certain. This is not the real case; the real problem includes vagueness rather than certainty. The technique used for the solution of this problem is fuzzy linear programming. This paper is organized as follows. Section 2 defines the crisp supply chain problem. Section 3 gives the basics of fuzzy linear programming and the fuzzy model of the supply chain problem. A numerical example is demonstrated in Section 4; the example problem is solved for both the crisp and the fuzzy case. Section 5 gives the concluding remarks.

2. Crisp Supply Chain Problem
A crisp logistic problem can be formulated as the following linear programming (LP) model. To simplify the presentation, only one product is considered in this paper. It is also assumed that the costs and qualities of the raw material from the different suppliers are the same, which is why only the transportation costs from the suppliers are considered.

Crisp logistic model:
Minimize  Σ_i Σ_k c_ik x_ik + Σ_k c_k x_k + Σ_k Σ_j c_kj x_kj          (1)
subject to
Ax ≥ b
Σ_k x_kj - D_j ≥ g_j   for each j
all x_ik, x_k, x_kj, D_j, g_j ≥ 0
where x_ik denotes the amount of raw material shipped from location i to plant k, x_k represents the amount of product produced at plant k, x_kj is the amount of product shipped from plant k to market j, c_ik denotes the unit cost of raw material shipped from location i to plant k, c_k represents the unit cost of producing the product at plant k, c_kj is the unit cost of shipping the product from plant k to market j, D_j denotes the demand of market j for this product, and g_j represents the safety stock at marketplace j, which is specified by the decision maker (Yu and Li, 2000). The objective function in Eq. (1) expresses the cumulative cost, which consists of transportation, production and inventory costs. The first constraint group expresses the general constraints on flow balance, workers, materials, funds and other resource requirements in the related locations, plants and markets. The last constraint group expresses the constraints of supply and demand in the markets. In real life, much of the input information related to the supply chain is not known with certainty. For example, how a supplier will respond to a new
design cannot be ascertained. Often, situations are expressed in imprecise terms like 'very poor in late deliveries', 'hardly any rejected quantities', 'the capacity of supplier X is somewhere between 2000 and 2500', etc. Also, the inventory, market and supplier constraints cannot be known in advance. Moreover, decision makers predetermine an interval in their minds for the value of the goal. Such vagueness in the critical information cannot be captured in a deterministic problem, and therefore the optimal results of these deterministic formulations may not serve the real purpose of modeling the problem. For this reason, we have considered the model as a fuzzy model. For this model, it is desired to maximize the overall aspiration level rather than strictly satisfying the constraints.

3. Fuzzy Linear Optimization
Fuzzy mathematical programming is defined as the term that is usually used in operations research, i.e. as an algorithmic approach to solve models of the type

Maximize f(x) such that g_i(x) ≤ 0                                      (2)

Here, a special model of the problem "maximize an objective function subject to constraints", namely the linear programming model, will be considered:

Maximize z = c^T x such that Ax ≤ b, x ≥ 0                              (3)

where c and x are n-vectors, b is an m-vector, and A is an m×n matrix. It is assumed that the decision maker has lower and upper bounds c_i and c̄_i for the attainment of the objectives. The decision maker can establish these aspiration levels himself, or they can be computed as a function of the solution space. The constraints can be hard or soft. If soft, the right-hand side b_j can be exceeded by an amount p_j, which is also at the discretion of the decision maker (Jaroslav, 2001). The membership function of fuzzy objective i, μ_{G_i}(x), should be 0 for aspiration levels equal to or less than c_i, 1 for aspiration levels equal to or greater than c̄_i, and monotonically increasing from 0 to 1 in between, that is,
μ_{G_i}(x) = 0,                           if c_i^T x ≤ c_i
μ_{G_i}(x) = (c_i^T x - c_i)/(c̄_i - c_i), if c_i < c_i^T x < c̄_i
μ_{G_i}(x) = 1,                           if c_i^T x ≥ c̄_i              (4)
The membership function of the fuzzy set representing constraint j, μ_{C_j}(x), should be 0 if the constraint is strongly violated (i.e., if (Ax)_j exceeds b_j + p_j), 1 if it is satisfied in the crisp sense (i.e., if (Ax)_j is equal to or less than b_j), and should decrease monotonically from 1 to 0 over the tolerance interval (b_j, b_j + p_j):

μ_{C_j}(x) = 1,                           if (Ax)_j ≤ b_j
μ_{C_j}(x) = (b_j + p_j - (Ax)_j)/p_j,    if b_j < (Ax)_j ≤ b_j + p_j
μ_{C_j}(x) = 0,                           if (Ax)_j > b_j + p_j          (5)
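As a small illustration of Eqs. (4) and (5), the two membership functions can be coded as piecewise-linear functions (a sketch; the names are ours):

```python
# Piecewise-linear membership functions corresponding to Eqs. (4) and (5)
def mu_objective(cTx, c_lower, c_upper):
    """Degree of attainment of fuzzy objective i with aspiration bounds."""
    if cTx <= c_lower:
        return 0.0
    if cTx >= c_upper:
        return 1.0
    return (cTx - c_lower) / (c_upper - c_lower)

def mu_constraint(Ax_j, b_j, p_j):
    """Degree of satisfaction of soft constraint j with tolerance p_j."""
    if Ax_j <= b_j:
        return 1.0
    if Ax_j >= b_j + p_j:
        return 0.0
    return (b_j + p_j - Ax_j) / p_j
```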
The membership function of the decision set, μ_D(x), is given by

μ_D(x) = min{μ_{G_i}(x), μ_{C_j}(x)},  i = 1, ..., k;  j = 1, ..., m,  for all x ∈ X       (6)
The min-operator is used to model the intersection of the fuzzy sets of objectives and constraints. Since the decision maker wants a crisp decision proposal, the maximizing decision corresponds to the value of x, x_max, that has the highest degree of membership in the decision set:

μ_D(x_max) = max_{x≥0} min{μ_{G_i}(x), μ_{C_j}(x)},  i = 1, ..., k;  j = 1, ..., m          (7)

This problem is equivalent to solving the following crisp LP problem:

Max λ
subject to
(c_i^T x - c_i)/(c̄_i - c_i) ≥ λ,      i = 1, 2, ..., k
(b_j + p_j - (Ax)_j)/p_j ≥ λ,         j = 1, 2, ..., m
x ≥ 0                                                                                       (8)

which can be rewritten as

Max λ
subject to
λ(c̄_i - c_i) - c_i^T x ≤ -c_i,        i = 1, 2, ..., k
λ p_j + (Ax)_j ≤ b_j + p_j,           j = 1, 2, ..., m
x ≥ 0                                                                                       (9)

Now we can convert the crisp LP for the logistics problem into a fuzzy LP problem. Since our problem is a minimization problem, we first convert it into a maximization problem with a slight modification. The results obtained from the strictest crisp case can be used to determine the c and c̄ values. The fuzzy LP model of the logistic problem is given in Eq. (10):
Max λ
subject to
λ(c̄ - c) - ( -Σ_i Σ_k c_ik x_ik - Σ_k c_k x_k - Σ_k Σ_j c_kj x_kj ) ≤ -c
λ p_j + (-Ax)_j ≤ -b_j + p_j
λ p_j - Σ_k x_kj ≤ -(D_j + g_j) + p_j     for each j
all x_ik, x_k, x_kj, D_j, g_j ≥ 0                                                           (10)
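A generic computational sketch of the crisp equivalent (9) is given below. It assumes a maximisation problem max c^T x subject to Ax ≤ b with a single fuzzy objective (aspiration bounds c_low and c_up) and soft constraints with tolerances p_j; the solver and all names are our own choices, not those used by the authors.

```python
# Max-lambda formulation of Eq. (9):
#   max  lambda
#   s.t. lambda*(c_up - c_low) - c^T x <= -c_low
#        lambda*p_j + (A x)_j          <= b_j + p_j     for every soft constraint j
#        0 <= lambda <= 1,  x >= 0
import numpy as np
from scipy.optimize import linprog

def fuzzy_lp(c, A, b, p, c_low, c_up):
    c, A, b, p = map(np.asarray, (c, A, b, p))
    n, m = len(c), len(b)
    obj = np.zeros(n + 1); obj[-1] = -1.0              # minimise -lambda
    rows = [np.append(-c, c_up - c_low)]
    rhs = [-c_low]
    for j in range(m):
        rows.append(np.append(A[j], p[j])); rhs.append(b[j] + p[j])
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * n + [(0, 1)], method="highs")
    return res.x[:n], res.x[n]                         # decision vector x and lambda
```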
4. Numerical Example
In this section, a numerical example is solved to demonstrate the usability of the model and the approach. A supply chain consisting of 3 suppliers, 3 production plants and 4 marketplaces is to be optimized with the parameters shown in Table 1.

Table 1: Parameters of the numerical example

Supplier capacities:        supplier 1 = 81,  supplier 2 = 79,  supplier 3 = 97
Plant capacities:           plant 1 = 95,     plant 2 = 55,     plant 3 = 60
Safety stocks in markets:   market 1 = 32,    market 2 = 40,    market 3 = 59,   market 4 = 27
Demands in markets:         market 1 = 8,     market 2 = 12,    market 3 = 11,   market 4 = 8

Supplier-to-plant transportation costs (c_ik):
  supplier 1:  plant 1 = 2.10,  plant 2 = 3.70,  plant 3 = 2.30
  supplier 2:  plant 1 = 1.90,  plant 2 = 3.80,  plant 3 = 2.10
  supplier 3:  plant 1 = 2.80,  plant 2 = 3.10,  plant 3 = 2.75
Production costs (c_k):  plant 1 = 1.25,  plant 2 = 1.98,  plant 3 = 1.65
Plant-to-market transportation costs (c_kj):
  plant 1:  market 1 = 3.45,  market 2 = 3.20,  market 3 = 3.40,  market 4 = 2.25
  plant 2:  market 1 = 2.35,  market 2 = 2.00,  market 3 = 4.50,  market 4 = 2.60
  plant 3:  market 1 = 3.80,  market 2 = 4.05,  market 3 = 3.65,  market 4 = 1.95
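The crisp model of Section 2 can be set up directly from the data in Table 1. The sketch below does this with scipy.optimize.linprog and assumes a 1:1 raw-material-to-product ratio, which the paper does not state explicitly; it should reproduce an optimal total cost close to the value reported in Table 2.

```python
# Crisp logistic model of Section 2 with the data of Table 1.
# Variable order: x_ik (supplier-to-plant, 3x3), x_k (production, 3),
# x_kj (plant-to-market, 3x4).  A 1:1 raw-material-to-product ratio is assumed.
import numpy as np
from scipy.optimize import linprog

c_ik = np.array([[2.10, 3.70, 2.30],
                 [1.90, 3.80, 2.10],
                 [2.80, 3.10, 2.75]])       # supplier i -> plant k
c_k = np.array([1.25, 1.98, 1.65])          # production cost at plant k
c_kj = np.array([[3.45, 3.20, 3.40, 2.25],
                 [2.35, 2.00, 4.50, 2.60],
                 [3.80, 4.05, 3.65, 1.95]]) # plant k -> market j
S = np.array([81, 79, 97])                  # supplier capacities
P = np.array([95, 55, 60])                  # plant capacities
g = np.array([32, 40, 59, 27])              # safety stocks in markets
D = np.array([8, 12, 11, 8])                # demands in markets

nI, nK, nJ = 3, 3, 4
n = nI * nK + nK + nK * nJ                  # 24 decision variables
ik = lambda i, k: i * nK + k
kk = lambda k: nI * nK + k
kj = lambda k, j: nI * nK + nK + k * nJ + j
cost = np.concatenate([c_ik.ravel(), c_k, c_kj.ravel()])

A_ub, b_ub = [], []
def add(indices, coeffs, rhs):
    row = np.zeros(n); row[indices] = coeffs
    A_ub.append(row); b_ub.append(rhs)

for i in range(nI):                         # supplier capacity
    add([ik(i, k) for k in range(nK)], 1.0, S[i])
for k in range(nK):                         # plant capacity
    add([kk(k)], 1.0, P[k])
for k in range(nK):                         # production <= raw material received
    add([kk(k)] + [ik(i, k) for i in range(nI)], [1.0] + [-1.0] * nI, 0.0)
for k in range(nK):                         # shipments <= production
    add([kj(k, j) for j in range(nJ)] + [kk(k)], [1.0] * nJ + [-1.0], 0.0)
for j in range(nJ):                         # demand plus safety stock must be covered
    add([kj(k, j) for k in range(nK)], -1.0, -(D[j] + g[j]))

res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * n, method="highs")
print(round(res.fun, 3))                    # expected to be close to 1315.45
```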
First, the model is solved with the crisp LP for the strictest case, where all capacity and demand constraints are set at their lowest levels. The solution obtained, shown in Table 2, will be used as a basis for the fuzzy LP by setting the c and c̄ values.

Table 2: Crisp LP solution

Supplier-to-plant shipments (x_ik):
  supplier 1:  plant 1 = 16,  plant 2 = 0,   plant 3 = 47
  supplier 2:  plant 1 = 79,  plant 2 = 0,   plant 3 = 0
  supplier 3:  plant 1 = 0,   plant 2 = 55,  plant 3 = 0
Production quantities (x_k):  plant 1 = 95,  plant 2 = 55,  plant 3 = 47
Plant-to-market shipments (x_kj):
  plant 1:  market 1 = 37,  market 2 = 0,   market 3 = 58,  market 4 = 0
  plant 2:  market 1 = 3,   market 2 = 52,  market 3 = 0,   market 4 = 0
  plant 3:  market 1 = 0,   market 2 = 0,   market 3 = 12,  market 4 = 35
Total cost = 1315.450
Then the same problem is solved for the case where the capacity and demand constraints are fuzzy. In this case, the suppliers' and plants' capacities can be increased by up to 9% to compensate for a 12% increase in demand at all markets, without increasing the total cost by more than 15%. For this fuzzy case, λ is obtained as 0.557 and the solution is given in Table 3.

Table 3: Fuzzy LP solution

Supplier-to-plant shipments (x_ik):
  supplier 1:  plant 1 = 69,  plant 2 = 0,   plant 3 = 0
  supplier 2:  plant 1 = 30,  plant 2 = 0,   plant 3 = 52
  supplier 3:  plant 1 = 0,   plant 2 = 57,  plant 3 = 0
Production quantities (x_k):  plant 1 = 100,  plant 2 = 58,  plant 3 = 52
Plant-to-market shipments (x_kj):
  plant 1:  market 1 = 40,  market 2 = 0,   market 3 = 60,  market 4 = 0
  plant 2:  market 1 = 3,   market 2 = 55,  market 3 = 0,   market 4 = 0
  plant 3:  market 1 = 0,   market 2 = 0,   market 3 = 15,  market 4 = 37
Total cost = 1425.476

5. Conclusion
In this paper, a fuzzy approach to the supply chain problem is developed using fuzzy linear programming. It is proposed first to solve the crisp optimization problem for the strictest case to obtain a basis for setting c and c̄, and then to solve the problem for the fuzzy case, in which the tolerance values are determined by the subjective assessments of the decision makers. The usability of the proposed technique is shown via a numerical example in which the suppliers' capacities, the plants' capacities, and the market demands are fuzzy. Without violating the cost constraint by more than a decision-maker-specified increase, the optimum values of the decision variables are obtained by the proposed method for a supply chain actually operating in a fuzzy environment. The model therefore seems quite useful for supply chain practitioners in the business world.
References
1. Bellman, R.E. and Zadeh, L.A., Decision-making in a fuzzy environment, Management Science, 17, B141-B164, 1970.
2. Geoffrion, A.M. and Graves, G.W., Multicommodity distribution system design by Benders decomposition, Management Science, 20, 822-844, 1974.
3. Jaroslav, R., Soft Computing: Overview and Recent Developments in Fuzzy Optimization, Ostravska univerzita Press, Listopad, 151-176, 2001.
4. Johnson, M.E. and Pyke, D.F., Supply Chain Management, Dartmouth College Press, Hanover, 2-11, 1999.
5. Kumar, M., Vrat, P. and Shankar, R., A fuzzy programming approach for vendor selection problem in a supply chain, International Journal of Production Economics, 101, 273-285, 2006.
6. Goetschalckx, M. et al., A stochastic programming approach for supply chain network design under uncertainty, Alaaddin Workshop, March, Pittsburgh, 2003.
7. Yu, C.S. and Li, H.L., A robust optimization model for stochastic logistic problems, International Journal of Production Economics, 64, 385-397, 2000.
8. Zadeh, L.A., Fuzzy sets, Information and Control, 8, 338-353, 1965.
A FUZZY MULTI-OBJECTIVE EVALUATION MODEL IN SUPPLY CHAIN MANAGEMENT* XIAOBEI LIANG* Shanghai Business School, Shanghai, 200235, China; School of Management, Fudan University, Shanghai, 200433, China XINOHUA LIU School of Business Administration, South China University of Technology, Guangzhou, 510641, China; School of Information Management, Shandong Economic University, Jinan, 250014, China DAOLI ZHU School of Management, Fudan University, Shanghai, 200433, China BINGYONG TANG, HONGWU ZHUANG Glorious Sun School of Business and Management, Dong Hua University, Shanghai, 200051, China Nowadays the selection of suppliers in supply chain management (SCM) is becoming more and more important, and the problem of which evaluation method to use in this area must be addressed. This paper proposes a worst-point-based multi-objective supplier evaluation method. The main idea is as follows: first, n different supplier evaluation criteria are chosen to form an n-dimensional space, in which every supplier's information is a point; then the Euclidean distances between these points are calculated, and the suppliers are ranked by their distance from the best or the worst point.
1. Introduction
Under the supply chain environment, the competition among companies develops from individual firms to whole chains. Within the supply chain, the relationship between individual firms develops into cooperation. The characteristic of this kind of cooperation is that an enterprise strengthens its ability to promote its key business by outsourcing non-core business. When it
This work is supported by grant 05ZRI4091 of the Shanghai Natural Science Foundation. Corresponding author. Tel.: 0086-21-62373937; Fax: 0086-21-62708696. E-mail address: [email protected] (X.B. Liang)
comes to a specific relationship between suppliers and manufacturers, a win-win status can be reached on this cooperative foundation. This kind of cooperation will be further developed in the direction of a cooperative strategy in due course. The supplier's conduct is closely related to the manufacturer's profit. Taking Honda as an example, about 80% of an automobile's cost at Honda is spent on buying parts from suppliers, and the purchase volume from suppliers reaches 6 billion dollars every year; that is to say, 13,000 staff account for only 20% of the cost of all vehicles in the company. From this point of view, Honda implements an important purchasing and management action, the "best partner" project. The development of supplier cooperation is very important to a manufacturer; it is embodied mainly in raising product quality, improving stock levels, shortening lead times, reacting quickly to the market, and better product design [1-7]. Supplier appraisal is one of the basic decisions for a company, and the selection process always involves many criteria. Dickson first studied appraisal criteria systematically, enumerating at least 50 criteria in a single document. Weber (1991) summarized 23 criteria in ref. 8; these 23 criteria are included within Dickson's. Because of the many criteria involved, the essence of supplier appraisal is multi-objective, but there is not much literature on supplier appraisal by multi-objective programming. Satisfying all criteria often produces conflicts between the goals; Weber et al. proposed using multi-objective programming to solve the problem in 1993 [11], and at the same time discussed the balance between different criteria [9-11].

2. A Fuzzy Multi-objective Evaluation Model of Suppliers
The fundamental idea of the model is: first, decide on the evaluation criteria for the suppliers and use them to constitute a set of vectors; second, construct an n-dimensional space in which each supplier is represented by a point, and determine the optimal or the worst supplier; finally, compute the distance of each supplier to the optimal or the worst supplier on the basis of the Euclidean distance, and sort the suppliers according to the value of these distances. The explanations of the variables and symbols are as follows:
X = (x1, x2, ..., xn): the vector of suppliers to be evaluated;
f = (f1, f2, ..., fm): the vector of supplier evaluation criteria;
f(X) = (f1(X), f2(X), ..., fm(X)): the set of supplier evaluation targets.
Suppose the preferences of the manufacturers are completely known; they can be expressed through the target weights ω. Generally speaking, the following models exist.
Model 1:  min z = ω f(X)

According to the literature, some supplier evaluation criteria are "the larger the better", such as on-time delivery rate and quality, while others are "the smaller the better", such as price and return rate. Developing Model 1 accordingly gives Model 2. Assume f⁻ = (f1(X), f2(X), ..., fk(X)) represents the k "smaller the better" criteria, and f⁺ = (f_{k+1}(X), f_{k+2}(X), ..., fm(X)) represents the m-k "larger the better" criteria. Then:

Model 2:  min f⁻(X) = (f1(X), f2(X), ..., fk(X));  max f⁺(X) = (f_{k+1}(X), f_{k+2}(X), ..., fm(X))

In multi-objective decision making, weights are usually employed to convert the multi-objective problem into a single-objective one. This paper follows that approach and uses the relative inferior subordination degree together with a quadratic norm to convert Model 2 into a single-objective problem. The relative inferior subordination degree a_ij = v_i(x_j), i = 1, 2, ..., m, x_j ∈ X, represents the degree of quality of supplier j on criterion i; replacing the raw target values by the relative subordination degree matrix as the decision matrix reduces the effect of differing scales on the decision results:

a_ij = f_i(x_j) / f_i^max,   i = 1, 2, ..., k,        x_j ∈ X
a_ij = f_i^min / f_i(x_j),   i = k+1, k+2, ..., m,    x_j ∈ X

where f_i^max = max{f_i(x1), f_i(x2), ..., f_i(xn)}, i = 1, ..., k, and f_i^min = min{f_i(x1), f_i(x2), ..., f_i(xn)}, i = k+1, k+2, ..., m.
ij mxn
=<Ji.{x.))
i j mxn
=(^.<-
Assuming s(x ) = (co(\-a ),a> (\-a j 1 ly 2 2j
-
2 ),
A
)
n
r\(n-r}]
a (\-a )) m mj
Setting the quadratic normal form of s(x ) to be J 2 m 2 ,. ,2
II '(*,> = z«.(i-«..r) ' \i ||2
Letting d(x ) = * III sO ) , its geometric meaning is the Euclidean J V J H distance of supplier j to the worst supplier. The greater d(x ) is, the most J excellent integrated index. And then one could get:
r^
Model 3:  max d(x_j) = ||s(x_j)||

According to existing theory, the optimal solution of Model 3 is an efficient solution of Model 2; thus the multi-objective problem is changed into a single-objective one.

3. An Example for the Fuzzy Multi-objective Evaluation Model of Suppliers
The key step before applying the supplier evaluation model is to decide on the evaluation criteria. Referring to the research results of Dickson and Weber, and after consulting a few professionals in the relevant field, this model takes preset time (T), procurement cost (C), quality (Q), after-sales service (S) and supply ability (F) as the critical evaluation factors. Since procurement cost is hard to calculate, the procurement price is used here instead; if the company has a sound cost accounting system, it is better to adopt the procurement cost. The supply ability can be computed from the average annual capacity. Quality and after-sales service are targets that are difficult to quantify, so the datum correlation method is employed to obtain them. The supplier set is X = {x1, x2, x3}, and the target weights are

ω = [0.5128, 0.2615, 0.1289, 0.0634, 0.0333],

where f1(X) is the value of the preset-time target, f2(X) the procurement-cost target, f3(X) the quality target, f4(X) the after-sales-service target, and f5(X) the supply-ability target.
Table 1: Target values of the suppliers (preset time T, after-sales service S, procurement cost C, quality Q, supply ability F)

             T    S     C     Q     F
Supplier 1   3    90    100   80    1200
Supplier 2   4    100   120   90    800
Supplier 3   6    80    150   100   1100

Building the model:

min f⁻(X) = (f1(X), f2(X));  max f⁺(X) = (f3(X), f4(X), f5(X))

On the basis of the most inferior principle, the worst point is

(f1, f2, f3, f4, f5) = (6, 150, 80, 80, 800).
The relative subordination degree matrix (rows: criteria f1-f5 = T, C, Q, S, F; columns: suppliers 1-3) is

A = | 0.5    0.67   1    |
    | 0.67   0.8    1    |
    | 1      0.89   0.8  |
    | 0.89   0.8    1    |
    | 0.67   1      0.73 |

Computing the weighted Euclidean distances to the worst point gives the evaluation results in Table 2.

Table 2: Evaluation results of the suppliers on the basis of the most inferior subordination degree

Supplier 1: 0.0733     Supplier 2: 0.0339     Supplier 3: 0.0006
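The worked example can be reproduced with a short script (a sketch; small differences from Table 2 can arise from the rounding of the subordination degrees used in the paper).

```python
# Worst-point supplier evaluation of Section 2 applied to the data of Table 1.
import numpy as np

# Rows: suppliers 1-3; columns ordered as f1..f5 = T, C, Q, S, F
F = np.array([[3.0, 100.0,  80.0,  90.0, 1200.0],
              [4.0, 120.0,  90.0, 100.0,  800.0],
              [6.0, 150.0, 100.0,  80.0, 1100.0]])
w = np.array([0.5128, 0.2615, 0.1289, 0.0634, 0.0333])
smaller_better = [0, 1]          # preset time, procurement cost
larger_better = [2, 3, 4]        # quality, after-sales service, supply ability

A = np.zeros_like(F)
A[:, smaller_better] = F[:, smaller_better] / F[:, smaller_better].max(axis=0)
A[:, larger_better] = F[:, larger_better].min(axis=0) / F[:, larger_better]

# Squared weighted Euclidean distance of each supplier from the worst point
d2 = ((w * (1.0 - A)) ** 2).sum(axis=1)
print(np.round(d2, 4))   # approx. [0.0735, 0.0323, 0.0007]; cf. Table 2
```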
From Table 2 one can see that the optimal supplier is Supplier 1: Supplier 1 is better than Supplier 2, and Supplier 2 is better than Supplier 3.

4. Conclusions
The appraisal of suppliers serves not only to choose promising suppliers with whom to set up strategic partnerships, but also to assess the performance of suppliers with which the company already has business contact. It thus offers guidance for setting up effective incentive mechanisms for suppliers. The appraisal method based on the most unsatisfactory (worst) point presented in this paper is easy to apply, and it may play a practical role in supply chain management.
References
1. Gregory, A. The Road to Integration: Reflections on the Development of Organizational Evaluation: Theory and Practice. Omega, 1996, 24(3): 295-307.
2. Wang, F.K., Hubele, N.F., Lawrence, F.P. Comparison of Three Multivariate Process Capability Indices. Journal of Quality Technology, 2000, 32(3): 263-275.
3. Stoumbos, Z.G. Process capability indices: overview and extensions. Nonlinear Analysis: Real World Applications, 2002, 3(2): 191-210.
4. Quigley, J.M. A Simple Hybrid Model for Estimating Real Estate Price Indexes. Journal of Housing Economics, 1995, (4): 1-12.
5. Eklof, J.A., Westlund, A.H. Customer Satisfaction Index and Its Role in Quality Management. Total Quality Management, 1998, 9(4): 80-86.
6. Hall, M.-J. The American Customer Satisfaction Index. Public Manager, 2002, 31(1): 23-27.
7. Lall, S. Competitiveness Indices and Developing Countries: An Economic Evaluation of the Global Competitiveness Report. World Development, 2001, 29(9): 1501-1525.
8. Weber, C.A., Current, J.R., Desai, A. Non-cooperative negotiation strategies for vendor selection. European Journal of Operational Research, 1991, (108): 208-223.
9. Lin, H.-F., Lee, G.-G. Impact of organizational learning and knowledge management factors on e-business adoption. Management Decision, 2005, 43(2): 171-188.
10. Sameer, K., Thomas, R., Mark, A. Systems thinking, a consilience of values and logic. Human Systems Management, 2005, 24(4): 259-274.
11. Weber, C.A., Current, J.R. A multi-objective approach to vendor selection. European Journal of Operational Research, 1993, (68): 173-184.
EVALUATING RADIO FREQUENCY IDENTIFICATION INVESTMENTS USING FUZZY COGNITIVE MAPS ALP USTUNDAG Department of Industrial Engineering, Istanbul Technical University, Macka 80680, Istanbul, Turkey MEHMET TANYAS Department of Industrial Engineering, Istanbul Technical University, Macka 80680, Istanbul, Turkey RFID (Radio Frequency Identification) is an Auto-ID technology which uses radio waves to automatically identify individual items. Using RFID systems for identifying and tracking objects, it is possible to improve the performance of a supply chain process in terms of operational efficiency, accuracy and security. RFID systems can be implemented at different levels, such as item, case or pallet level, and these various applications create different impacts on supply chain processes. RFID investments are very important strategic decisions, so they require a comprehensive evaluation process in which the tangible and intangible benefits are integrated. Fuzzy cognitive mapping (FCM) is a suitable tool for modeling causal relations in a non-hierarchical manner for RFID investment evaluation. In this paper, the FCM method is used to measure the impact of an RFID investment on a supply chain process.
1. Introduction
Radio Frequency Identification (RFID) is an Auto-ID technology consisting of a microchip with a coiled antenna and a reader. Using radio-frequency waves, data and energy are transferred between a reader and a tag to identify, categorize and track objects. The RFID system consists of tags, readers and a computing infrastructure for storing and analyzing the data received from the reader. Supply chain processes gain many benefits from RFID technology. According to a study by A.T. Kearney, retailers expect benefits in three primary areas: reduced inventory, store/warehouse labor reduction and reduction in out-of-stock items [1]. Using RFID technology, companies can achieve better order fill rates, shorter order lead times, less inventory shrinkage and improved customer service. RFID provides efficiency, accuracy and security in the supply chain.
Corresponding author: Tel.: +0 (212) 293 13 00-2759, Email: [email protected]
So far, in many studies, the benefits of RFID have been explained qualitatively, but there is a lack of quantitative evaluation models of this technology. In this paper, using the fuzzy cognitive mapping approach, we try to build a model which quantifies the impact of RFID technology on a supply chain process.

2. Literature Overview
2.1. RFID Technology
Most published research papers about RFID focus on explaining what the technology is and what it is not. Wu et al. [2] examine the existing challenges that RFID technology is facing, its future development directions and the likely migration paths to realize its promises. In McFarlane's paper on "The Intelligent Product in Manufacturing Control" [3], the basic concepts of RFID and its implications for discrete event control are examined. Sheffi [4] speculates on the possible future adoption of RFID technology considering the innovation cycles of several technologies. Karkkainen [5] discusses the potential of utilizing RFID technology for increasing efficiency in the supply chain of short-shelf-life products. RFID-based system designs for specific applications are also examined in some research papers. Chow et al. [6] propose an RFID-based warehouse resource management system. Ngai et al. [7] propose a system architecture capable of integrating mobile commerce and RFID applications in a container depot. Ni et al. [8] present a location-sensing prototype system that uses RFID technology for locating objects inside buildings. Goodrum et al. [9] propose an RFID-based tool tracking and inventory system capable of storing operation and maintenance (O&M) data on construction job sites. An RFID system implementation can be seen as an IT/IS investment, and there are many studies about IT/IS investment evaluation in the literature. However, it is really difficult to calculate the returns of an RFID system deployment. Patil [10] argues that Discounted Cash Flow (DCF) and Net Present Value (NPV) calculations are too limited a basis on which to make RFID investment decisions because they undervalue returns and focus management attention on short-term cash flow. In our study, we propose a fuzzy cognitive mapping approach to evaluate the impact of RFID on business processes.
2.2. Fuzzy Cognitive Maps (FCM)
A fuzzy cognitive map (FCM) is a method for drawing a graphical representation of a dynamical system, connecting the state variables of the system by links that symbolize cause-and-effect relations. According to Kosko [11], an FCM ties facts and things and processes to values and policies and objectives. An FCM is a non-hierarchic flow graph in which changes to each statement (concept, node) are governed by a series of causal increases and decreases in fuzzy weight values [12]. Given an FCM with a number of concepts C_i, i = 1, ..., n, the value of each concept can be calculated using the following equation:

C_i^{t+1} = f( Σ_{j=1, j≠i}^{n} w_{ji} C_j^{t} + C_i^{t-1} )          (1)
where C_i^{t+1} is the value of concept C_i at step t+1, C_i^{t-1} is its value at step t-1, f(x) is a threshold function, and w_ji is the weighted link from concept C_j to C_i. The threshold function f(x) can be hyperbolic (tanh x) or sigmoid (f(x) = 1/(1 + e^{-x})). If concepts can be negative and their values belong to the interval [-1, 1], the hyperbolic function is used; if the concept value interval is [0, 1], the sigmoid function is used. The initial row vector can be written with the notation {C1, C2, C3, ..., Cn} for n concepts, and the weights of the edges can be written in an n×n matrix W, where each element w_ij gives the weight of the edge from concept C_i to C_j. Iterating the FCM, we either approach a limit cycle or a fixed point. In the literature, several extensions to the FCM method have been proposed, such as rule-based fuzzy cognitive maps, extended FCMs and evolutionary FCMs. FCM models have been used in numerous areas of application such as medicine, political science, international relations, military science, supervisory systems, etc.

3. Cost and Benefits of RFID Deployment
Investment in RFID is of a strategic nature since it is clearly linked to business strategy in the following ways [10]:
• by process innovation;
• by the importance of its adoption in business process reengineering;
• by enabling IT capabilities.
Because the decision to deploy RFID technology in an enterprise is a strategic business decision, not a technology decision, cost-benefit analysis is a key component of this decision. To measure the value of an RFID investment, we have to understand the elements of cost as well as the business- and customer-related benefits comprehensively. The cost of an RFID deployment can be examined in three key areas: hardware, software and services. Hardware costs include the cost of tags, readers, antennas, host computers and network equipment. Software costs include the cost of creating or upgrading middleware and other applications. Service costs include the cost of installation, integration of the various components, training, support and maintenance, and business process engineering. RFID benefits can be broken down into two parts: the first is cost reduction (labor cost reduction, inventory cost reduction, process automation and efficiency improvements), and the second is value creation (e.g. increases in revenue, increases in customer satisfaction due to responsiveness, anti-counterfeiting, etc.) [2]. To identify and track products, RFID systems can be deployed at the item, case or pallet level. RFID can also be used to track and identify capital assets like forklifts, cranes or racks. The various options for RFID deployment create different levels of efficiency, accuracy or security in business processes. As an example, in a distribution center, receiving operations can benefit greatly from RFID. The labor saved is directly related to the type of receiving performed. Since case-level receiving requires more scans, it should benefit more from RFID than pallet-level operations. While receiving and shipping are common areas of interest for RFID deployment, an entirely scan-free operation is the ultimate goal from an efficiency perspective. From the perspective of accuracy, RFID has the ability to provide an inventory tracking mechanism that is not dependent on human-initiated scans. Outbound load confirmation is also a good example of how RFID can improve accuracy. An RFID alternative that reads all case tags on a pallet as it moves through an outbound door would eliminate both the need for the palletizing scan and the error that would occur if the scan did not take place. Security is also an important performance measure in a supply chain process. Since RFID can passively track the movement of an individual object, it can be used in a similar manner to Sensormatic and other loss-prevention technologies to help reduce theft.
4. Fuzzy Cognitive Map to Evaluate the RFID Investment
In our study, we propose a model to examine the impact of an RFID investment in a distribution center. After the RFID investment, the company will have labor and inventory cost reductions, and customer demand (sales) will increase due to the increase in customer satisfaction (Table 1).

Table 1. Costs and benefits of an RFID investment
Investment costs: hardware, software, service
Benefits (+): inventory cost reduction, labor cost reduction, sales increase
Due to automation, labor reductions will be seen in the following areas:
• clerical employees: daily data entry hours spent on transactions and operations;
• material handlers: personnel involved in picking, receiving, put-away, shipping and cycle counting;
• customer service: total personnel involved in customer service activities associated with distribution operations.
Labor will also be reduced due to the decrease in inventory level and in shipping/data entry errors. Using the RFID system, the accuracy and security levels will increase; these can be measured by receiving, shipping and cycle-counting errors, the stock-out ratio and the theft ratio. Customer satisfaction, which can be measured by the number of customer complaints, will increase due to the decrease in order delivery time, and increased satisfaction will trigger customer demand. In our FCM model, seven concepts are determined (Table 2).

Table 2. Description of the concepts in the model
C1: Labor
C2: Accuracy
C3: Security
C4: Inventory Level
C5: Cost
C6: Customer Satisfaction
C7: Customer Demand
The different degrees of influence are shown in Table 3. In the weight matrix (Table 4), the weights between any two concepts are given.

Table 3. Degrees of influence
Negatively very high: -1     Negatively high: -0.8     Negatively medium: -0.6
Negatively low: -0.4         Negatively very low: -0.2  Zero: 0
Positively very low: 0.2     Positively low: 0.4        Positively medium: 0.6
Positively high: 0.8         Positively very high: 1
Table 4. The weight (influence) matrix (entry w_ij is the influence of the row concept C_i on the column concept C_j)

        C1      C2     C3     C4      C5      C6     C7
C1      0.00    0.00   0.00   0.00    0.20    0.00   0.00
C2     -0.40    0.00   0.00  -0.20    0.00    0.20   0.00
C3      0.00    0.00   0.00   0.00   -0.20    0.20   0.00
C4      0.20    0.00   0.00   0.00    0.40    0.00   0.00
C5      0.00    0.00   0.00   0.00    0.00    0.00   0.00
C6      0.00    0.00   0.00   0.00    0.00    0.00   0.80
C7      0.00    0.00   0.00   0.00    0.00    0.00   0.00
The graphical representation of the dynamical system can be seen in figure 1.
Figure 1. Graphical representation of the model (concepts C1-C7 and their causal links)
The initial labor, accuracy and security levels should be determined according to the characteristics of the RFID investment (case or pallet level, etc.) considered by the managers. The expected initial labor level can be determined by considering the workforce (man-hours) that the company no longer needs due to automation. The initial accuracy level can be assessed by considering the expected error reductions, and for the security level the expected reduction in the theft ratio should be taken into account. The model then shows how the labor and inventory levels change under the influence of the accuracy and security values, so the changes in the cost level and in customer demand can be examined. In our example, the initial vector is assumed to be [0.6, 0.8, 0.8, 0, 0, 0, 0]. Using Eq. (1), the concept values [-0.76, 0.3, 0.3, -0.54, -0.95, 0.66, 0.66] are obtained after 14 iterations. The hyperbolic function f(x) = tanh x is used as the threshold function. In our study, as seen from Figure 2, each concept value reaches an equilibrium state. The labor and inventory values decrease for the given accuracy and security levels, and because of the decrease in the labor and inventory levels the cost level also decreases, reaching -0.95 at equilibrium. It is notable that the customer demand value increases from 0 to 0.66.
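The iteration of Eq. (1) can be sketched as below. The weight matrix follows the reconstruction in Table 4, the memory term C^{t-1} is initialised with the initial vector, and since such details are not fully specified the equilibrium reached may differ somewhat from the values quoted above.

```python
# FCM iteration of Eq. (1) with the hyperbolic tangent threshold function.
import numpy as np

concepts = ["Labor", "Accuracy", "Security", "Inventory", "Cost",
            "Satisfaction", "Demand"]
# W[i, j] = influence of concept C_{i+1} on concept C_{j+1} (Table 4)
W = np.zeros((7, 7))
W[1, 0] = -0.40   # accuracy     -> labor
W[3, 0] =  0.20   # inventory    -> labor
W[1, 3] = -0.20   # accuracy     -> inventory
W[0, 4] =  0.20   # labor        -> cost
W[2, 4] = -0.20   # security     -> cost
W[3, 4] =  0.40   # inventory    -> cost
W[1, 5] =  0.20   # accuracy     -> satisfaction
W[2, 5] =  0.20   # security     -> satisfaction
W[5, 6] =  0.80   # satisfaction -> demand

c_prev = c_curr = np.array([0.6, 0.8, 0.8, 0.0, 0.0, 0.0, 0.0])
for _ in range(14):
    c_next = np.tanh(W.T @ c_curr + c_prev)   # Eq. (1) with memory term C^{t-1}
    c_prev, c_curr = c_curr, c_next
print(dict(zip(concepts, np.round(c_curr, 2))))  # compare with the reported equilibrium
```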
Figure 2. Equilibrium of concept values over the iterations (curves shown for Labor, Inventory Level, Cost, Customer Satisfaction and Customer Demand)
5. Conclusion
In this study, we presented an FCM-based evaluation tool for RFID investments. The model helps decision makers to understand the complex relationships among the cost and benefit factors of an RFID investment. It can also be used to perform what-if analyses regarding the results. For example, we can use the model to answer the following questions:
• What is the impact of a sudden increase in the accuracy level on sales and inventory?
• What is the impact of a decreasing inventory level on the labor level?
In further research, a decision support system can be developed which helps managers to decide about RFID investment in a specific business process.

References
1. Heinrich, C., RFID and Beyond, Wiley Publishing, Indianapolis, 2005.
2. Wu, N.C., Nystrom, M.A., Lin, T.R., Yu, H.C., Challenges to global RFID adoption, Technovation, in press (available online 25 October 2005).
3. McFarlane, D., Sarma, S., Chirn, J.L., Wong, C.Y., Ashton, K., Auto-ID systems and intelligent manufacturing control, Engineering Applications of Artificial Intelligence, 16 (2003) 365-376.
4. Sheffi, Y., RFID and the innovation cycle, The International Journal of Logistics Management, 15 (2004) 1-10.
5. Karkkainen, M., Increasing efficiency in the supply chain for short shelf life goods using RFID tagging, International Journal of Retail & Distribution Management, 31 (2003) 529-536.
6. Chow, K.H., Choy, K.L., Lee, W.B., Lau, K.C., Design of a case based resource management system for warehouse operations, Expert Systems with Applications, in press (available online 6 September 2005).
7. Ngai, S.M.T., Cheng, T.C.E., Au, S., Lai, K., Mobile commerce integrated with RFID technology in a container depot, Decision Support Systems, in press (available online 16 June 2005).
8. Ni, L.M., Yunhao, L., Lau, Y.C., Abhishek, P.P., LANDMARC: Indoor Location Sensing Using Active RFID, Wireless Networks, 10 (2004) 701-710.
9. Goodrum, P.M., McLaren, M.A., Durfee, A., The application of active radio frequency identification technology for tool tracking on construction job sites, Automation in Construction, in press (available online 27 July 2005).
10. Patil, M., Investments in RFID: A Real Options Approach, published whitepaper, Patni Computer Systems Ltd, 2004.
11. Kosko, B., Fuzzy Thinking: The New Science of Fuzzy Logic, Flamingo Press/Harper-Collins, London, 1990.
12. Sharif, A.M., Irani, Z., Exploring Fuzzy Cognitive Mapping for IS Evaluation, European Journal of Operational Research, in press (available online 26 August 2005).
ANALYSING SUCCESS CRITERIA FOR ICT PROJECTS
KOEN MILIS EHSAL, European University College Brussels, Centre for external cooperation, Campus Economische Hogeschool, Stormstraat 2, 1000 Brussels
KOEN VANHOOF Transportation Research Institute (IMOB), Universiteit Hasselt, Wetenschapspark 5 bus 6, 3590 Diepenbeek Since the 1960s many authors have accepted the triple constraints (time, cost, specification) as a standard measure of success, and these still appear to be extremely important in evaluating the success of ICT (information and communication technology) projects. However, an ICT project cannot always be seen as a complete success or a complete failure. Moreover, the parties involved may perceive the terms "success" or "failure" differently. A quasi-experiment (gaming) was developed in order to determine the measures of success used by the different parties involved to judge an ICT project. The results of this quasi-experiment were analysed using aggregation theory and validated by probabilistic feature models. In general the figures do not contradict each other. This research indicates that the impact of the triple constraints on the judgement of success is rather small. Other criteria, such as user happiness and financial or commercial success, are far more important. Surprisingly, whether or not a project was able to meet the predefined specifications was of little importance for the appreciation of the project's success. Keywords: Multi-criteria analysis, gaming, project management and scheduling
1. Introduction
In order to lead an ICT project towards high levels of success, a manager should know the criteria by which success is measured (i.e. the success criteria); fulfilling these criteria should be the manager's prime concern. Since the 1960s many authors have accepted the triple constraints (time, cost, specification) as standard success criteria. It is assumed that if a project's completion time exceeds its due date, or expenses overrun the budget, or the outcomes do not satisfy the company's predetermined specifications, the project is a failure (Ingram, 2000; Wright, 1997; Turner, 1993). However, determining whether an ICT project is a success or a failure is far more complex (Belassi & Tukel, 1996). Unlike a construction project, an ICT
project cannot always be seen as completely successful or completely failed (Wateridge, 1998). Moreover, the different parties involved (e.g. management, project team, users, supporters, stakeholders) might perceive the project's success differently (Pinto & Slevin, 1989). Even among individuals of the same party opinions may vary, since every individual has his or her own set of criteria against which the project is measured, and these may be very subjective (Fowler & Walsh, 1999). Furthermore, not every criterion can be measured at the same time. Some criteria can only be assessed long after the termination of the project, such as the financial or commercial success of an ICT implementation (Wateridge, 1996). The aim of this research is to determine the sets of success criteria used by the different parties involved in an ICT project.

2. Research design
In contrast to most studies on the subject, a quantitative approach was selected. The data were gathered using a type of experiment referred to as gaming. The participants of the "game" were asked to rate the success of ICT projects based on information (i.e. project descriptions) provided by the researchers. Seven possible success criteria were selected based on a literature review (Milis & Mercken, 2001). The list of criteria consisted of the triple constraints extended with four other criteria: on time, within budget, to specification, user happiness, project team happiness, management happiness, and financial or commercial success. The selected experts were all well acquainted with ICT projects and were either employees of one of the two large electricity-distributing companies that participated or consultants working for these companies. Based on the roles the different experts fulfilled, they could be classified into four groups: managers, project team members (benefactors), project team members (no benefactors), and end-users. During five consecutive days, the experts received an email with five project descriptions and were asked to judge the projects' success based solely on the information provided. They were asked to reply by email within 24 hours (i.e. before the next set of descriptions arrived) to avoid comparison between answers.
Note that due to the absence of a "control group" and a "calibration measurement", this research approach cannot be classified as an experiment and should therefore be regarded as a quasi-experiment.
They were asked to state whether the project was a success or a failure and to rate its success on a scale from 1 to 100. This resulted in a dataset with 650 binary data points (success or failure) and a dataset with 650 scores.

3. Data Analysis Method
The data are analysed using a technique proposed by Vanhoof et al. (2005) in a customer satisfaction study. The technique evaluates the contribution of the success criteria in a two-stage evaluation process. First the evaluation process is modelled; then, in the second stage, the model is used to determine and quantify the contributions of the criteria.

3.1. Aggregation theory: uninorms
Aggregation operators serve as a tool for combining various scores into one numerical value. An important class of aggregators, called representable uninorms, possess additive generators g : [0,1] -> [-inf, +inf] which define the uninorm via

U(x, y) = g⁻¹(g(x) + g(y))                                                                   (1)

Dombi (1982) showed that if g(x) is the generator function of the uninorm operator, then the displaced function g_a(x) = g(x + a) also possesses the properties of a generator function. The neutral value e then varies accordingly, which allows the formation of uninorm operators with different neutral values from one generator function. The generator function used contains one parameter whose value needs to be determined from the data. Consequently, for every expert evaluation the neutral value can be determined and the individual evaluation function, which is a uninorm, can be constructed. This approach has the advantage of a higher sensitivity to differences between experts.

3.2. Calculating contributions of criteria, based on the full set of project evaluations
The contribution of criterion x_j for expert i can be defined by the following difference:

Contrib(i, j) = E_i(x_1, ..., x_n) - E_i(x_1, ..., x_{j-1}, e_i, x_{j+1}, ..., x_n)          (2)
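A sketch of a representable uninorm with a chosen neutral element, and of the contribution measure of Eq. (2), is shown below; the logistic generator used here is only one possible choice and is not necessarily the one fitted in the study.

```python
# Representable uninorm built from an additive generator, and the
# contribution of a criterion as in Eq. (2).
import math

def make_uninorm(e):
    """Uninorm with neutral element e, generated by g(x) = ln(x/(1-x)) shifted so g(e) = 0."""
    shift = math.log(e / (1.0 - e))
    g = lambda x: math.log(x / (1.0 - x)) - shift
    g_inv = lambda y: 1.0 / (1.0 + math.exp(-(y + shift)))
    def U(scores):
        total = sum(g(min(max(s, 1e-9), 1 - 1e-9)) for s in scores)
        return g_inv(total)
    return U

def contribution(U, scores, j, e):
    """Effect of replacing criterion j by the neutral score e (Eq. (2))."""
    neutral = list(scores); neutral[j] = e
    return U(scores) - U(neutral)

# Example: an expert with neutral value 0.55 rating a project on 7 criteria
U = make_uninorm(0.55)
scores = [0.8, 0.6, 0.4, 0.7, 0.5, 0.9, 0.3]
print([round(contribution(U, scores, j, 0.55), 3) for j in range(7)])
```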
with E_i the uninorm of expert i. In fact, the effect of replacing the criterion score by the neutral score is calculated. This effect can be positive or negative. As a consequence, the histogram of all the contributions for a certain criterion will be bimodal. This histogram is characterized by three numbers: the total average value (called mean), the average value of the positive contributions (called pos) and the average value of the negative contributions (called neg). The results of all evaluations are presented in Table 1.

Table 1: Overall results (per group: mean; neg, pos)

Criterion               Users                 Projectteam           Projectteam           Management
                                              (no benefactors)      (benefactors)
on time                 -0.3; -19.9, 16.5     1.6; -17.5, 16.1      1.8; -18.5, 17.4      0.9; -19.1, 16.6
within budget           -0.5; -18.5, 14.5     0.6; -17.1, 14.1      1.0; -16.9, 14.8      0.2; -16.9, 13.6
to specifications        3.3; -14.7, 15.9     4.6; -12.6, 15.5      4.8; -13.5, 16.7      3.6; -13.0, 14.9
management happiness    -2.6; -21.0, 17.5    -0.7; -18.0, 17.5     -1.0; -18.6, 17.7     -1.9; -19.5, 17.1
projectteam happiness    0.2; -17.2, 14.4     1.5; -15.8, 14.7      1.7; -15.9, 15.2      0.5; -16.3, 13.7
user happiness           0.5; -22.8, 18.7     2.6; -19.2, 20.2      1.8; -21.2, 19.9      0.7; -22.3, 18.7
fin/com success         -4.4; -22.6, 18.9    -2.6; -19.8, 18.1     -3.0; -20.2, 18.4     -3.0; -19.9, 18.4
In order to understand the mean values, the percentages of perceived successful and failing projects are given in Table 2.

Table 2: Global evaluations

Group                          Failing    Mean neg. scores    Successful    Mean pos. scores
Users                          38%        34.9                62%           62.3
Projectteam (no benefactors)   37%        32.8                72%           70.0
Projectteam (benefactors)      36%        36.1                70%           67.9
Management                     32%        30.7                68%           68.0
The mean value should be considered as a total impact measure. Table 1 indicates, for example, that for this set of project descriptions the criterion 'to specifications' has in general a positive impact for the users, while the criterion 'fin/com success' has in general a negative impact. The absolute values of the negative contributions are greater than the absolute values of the positive contributions, which indicates that, in absolute terms, the negative (penalty) effects exceed the positive (reward) effects: the reward for fulfilling a criterion is smaller than the punishment received for not fulfilling it. The span between the positive and negative contributions (= |pos - neg|) provides an insight into the impact of the different criteria on the judgement of the project. A large span implies that fulfilling a criterion contributes largely to the perception of success, while failing to fulfil the criterion contributes to a perceived failure. Consequently, the larger the span, the more impact the criterion has on the judgement of the project. Table 1 indicates, for example, that for the users the spans for the criteria "user happiness" (22.8 + 18.7 = 41.5) and "fin/com success" (22.6 + 18.9 = 41.5) are equal and larger than the spans of the other criteria. Consequently, the impact of both criteria on the judgement of a project is similar, and the large spans indicate that these are the most important criteria for the users. "User happiness", "fin/com success" and "management happiness" are the three most important criteria for all groups examined, although the proportions between the criteria differ. This signifies that the groups involved use similar sets of criteria, but the impact of every criterion in the set differs depending on the group examined. Note that the criteria "to specifications" and "project team happiness" have a low span. Consequently, these criteria can be regarded as of little importance to the judgement of ICT projects. 3.3. Comparing results Table 3 combines the results of a probability matrix decomposition model (Maris, De Boeck & Van Mechelen, 1996) with those of the aggregation model. For every party involved and for every criterion, the median of the PMD model and the positive contribution of the aggregated model are represented. The first indicates the probability that a criterion is perceived as necessary for success, the latter reflects the affirmation power of the criterion.
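As a small illustration of how the three summary numbers and the span could be derived from one criterion's list of per-expert contributions (the function and the closing comment values come from Table 1; nothing else is from the study's data):

```python
def summarise(contributions):
    """Summarise the bimodal histogram of one criterion's contributions:
    overall average ('mean'), average of the positive contributions ('pos'),
    average of the negative contributions ('neg') and the span |pos - neg|."""
    positives = [c for c in contributions if c > 0]
    negatives = [c for c in contributions if c < 0]
    mean = sum(contributions) / len(contributions)
    pos = sum(positives) / len(positives) if positives else 0.0
    neg = sum(negatives) / len(negatives) if negatives else 0.0
    return {"mean": mean, "pos": pos, "neg": neg, "span": abs(pos - neg)}

# e.g. the users' figures for 'user happiness' in Table 1 correspond to
# mean = 0.5, neg = -22.8, pos = 18.7, span = 41.5
```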
Table 3: Comparing the results of the PMD model and the aggregation models

                         Users            Project team -     Project team -     Management
                                          no benefactors     benefactors
Criterion                median  pos      median  pos        median  pos        median  pos
On time                   .45    16.5      .28    16.1        .32    17.4        .53    16.6
Within budget             .18    14.5      .25    14.1        .12    14.8        .06    13.6
To specifications         .06    15.9      .09    15.5        .12    16.7        .06    14.9
Management happiness      .17    17.5      .15    17.5        .12    17.7        .18    17.1
Project team happiness    .04    14.4      .12    14.7        .05    15.2        .06    13.7
User happiness            .24    18.7      .43    20.2        .42    19.9        .50    18.7
Fin/com success           .48    18.9      .28    18.1        .43    18.4        .41    18.4
In general the figures do not contradict each other. Both techniques indicate that "user happiness" and "fin/com success" are the two most important criteria. They have a high median in the PMD model and, at the same time, a large positive contribution, indicating that their impact on the judgement of the project is important. Similarly, the criteria "to specification" and "project team happiness" are the least important factors. The impact of the criteria "management happiness" and "on time" is less clear-cut: depending on the technique used, they take a slightly different place in the ranking of the criteria within the different groups.

4. Conclusions
Since none of the groups examined bases its judgement solely on the triple constraints, fulfilling them does not guarantee that the project is perceived as a success. Moreover, satisfying the predefined specifications appeared to have little impact on the judgement of a project. This clearly demonstrates that other sets of success criteria should be applied. The results of this research indicate that the criteria "on time", "user happiness" and "fin/com success" should be incorporated in any set of criteria developed to evaluate the success of ICT projects. This research confirms that user satisfaction is a prime criterion for the end users. They want to work with the best (not optimum) application. They should
be happy with the project's results. However, in contrast to the literature (see above), this is not their sole criterion: financial or commercial success equally influences their judgment. This indicates that, besides their personal desires, the corporate goals are a user's concern as well. The literature indicates that project team members focus on short-term operational criteria. This could only be confirmed partially: not exceeding the due date (criterion "on time") appeared to be a very important criterion for this group, while the other operational criteria such as "within budget" and "to specifications" have far less impact. Apparently, satisfying users and delivering fin/com success prevails over budgetary constraints and predefined specifications. Note that the emphasis on long-term gains (fin/com success) is more outspoken for the project team benefactors compared to the project team no benefactors, as could be expected based on the literature, since the involvement of the latter ends at the handover of the project. The management focuses on the long-term gains (financial or commercial success). Their company needs to make a profit and every project should contribute. However, the criteria "on time" and "user happiness" appear to be important as well. Possibly this is caused by the fact that the gains an ICT project generates are often not fully tangible.

References
1. Belassi, W., Tukel, O. I., 1996. A new framework for determining critical success/failure factors in projects, International Journal of Project Management, vol. 14, pp. 141-151.
2. Dombi, J., 1982. Basic concepts for the theory of evaluation: The aggregative operator, European Journal of Operational Research, vol. 10, pp. 282-293.
3. Fowler, A., Walsh, M., 1999. Conflicting perceptions of success in an information systems project, International Journal of Project Management, vol. 17, pp. 1-10.
4. Gelfand, A.E., Smith, A.F.M., 1990. Sampling based approaches to calculating marginal densities, Journal of the American Statistical Association, vol. 85.
5. Ingram, G., 2000. The way to enlightened project management, Project Manager Today.
6. Maris, E., De Boeck, P., Van Mechelen, I., 1996. Probability matrix decomposition models, Psychometrika, vol. 61, pp. 7-29.
7. Milis, K., Mercken, R., 2001. Implementing IS/IT technology: success factors, in Proceedings of the 13th International Society for Professional Innovation Management Conference.
8. Pinto, J.K., Slevin, D.P., 1989. Critical success factors in R&D projects, Research Technology Management.
9. Turner, J.R., 1993. The handbook of project-based management, McGraw-Hill.
10. Vanhoof, K., Pauwels, P., Dombi, J., Brijs, T., Wets, G., 2005. Penalty-reward analysis with uninorms: a study of customer (dis)satisfaction, in: Ruan, D., Chen, G., Kerre, E., Wets, G. (Eds.), Intelligent Data Mining. Techniques and Applications, pp. 237-252.
11. Wateridge, J., 1995. IT projects: a basis for success, International Journal of Project Management, vol. 13, pp. 169-172.
12. Wateridge, J., 1996. Delivering successful IS/IT projects: eight key elements from success criteria to review via appropriate management, methodologies and teams, PhD thesis, Henley Management College, Brunel University.
13. Wateridge, J., 1997. Training for IS/IT project managers: a way forward, International Journal of Project Management, vol. 15, pp. 283-288.
14. Wateridge, J., 1998. How can IS/IT projects be measured for success?, International Journal of Project Management, vol. 16, pp. 59-63.
15. Wright, J.N., 1997. Time and budget: the twin imperatives of a project sponsor, International Journal of Project Management, vol. 15, pp. 181-186.
MULTI-ATTRIBUTE COMPARISON OF ERGONOMICS MOBILE PHONE DESIGN BASED ON INFORMATION AXIOM GULCIN YUCEL Department of Industrial Engineering, Istanbul Technical University, Macka, Istanbul 34367, Turkey EMEL AKTAS Department of Industrial Engineering, Istanbul Technical University, Macka, Istanbul 34367, Turkey Axiomatic Design (AD) is a guide for understanding design problems, while establishing a scientific foundation to provide a fundamental basis for the creation of products and processes. The most important concept in axiomatic design is the existence of the design axioms. AD has two design axioms: independence axiom and information axiom. The independence axiom maintains the independence of functional requirements, and information axiom proposes the selection of the best alternative that has minimum information. In this study AD is proposed for multi-attribute comparison of mobile phones regarding their ergonomic design.
1. Introduction Today, mobile phones are not only used for making and receiving calls. Mobile phones now provide functions such as short messaging service, internet connectivity, mobile camera, video recording, etc. Since mobile phone functions have expanded, the use of a mobile phone has become a complex issue and various usability problems have arisen. In order to determine an easy-to-use mobile phone, an analysis of the current models in the market is conducted. Then several selected models are compared according to their ergonomic design and properties. In this study, to make this comparison, the dimensions of an ergonomic mobile phone are first decided, then the sub-factors of these dimensions are listed. Finally, six phones are evaluated using fuzzy AD. The comparison of the mobile phones according to their ergonomic properties is conducted under predetermined physical and mental criteria. Since the comparison of mobile phones regarding ergonomic concerns involves incomplete information, fuzzy AD approaches are exploited. In this paper, a fuzzy multi-attribute axiomatic design approach for the selection of the most ergonomic mobile phone is introduced and the implementation process is represented by a real-world example.
2. Principles of Axiomatic Design Axiomatic Design is a guide for understanding design problems, while establishing a scientific foundation to provide a fundamental basis for the creation of products and processes [1]. The most important concept in axiomatic design is the existence of the design axioms. AD has two design axioms: the Independence Axiom and the Information Axiom [2]. The Information Axiom indicates that the best design is the one with the least information content. In order to apply axiomatic design theory, the information content for a given functional requirement FR_i must first be calculated. The information content I_i is calculated according to the following equation:

I_i = log2(1 / p_i)    (1)

In this formula, p_i is the probability of satisfying FR_i, and it is determined by the design range and the system range. The design range shows what the designer wishes to achieve in terms of tolerance, and the system range shows the system capability. The intersection between the design range and the system range is the region where an acceptable solution exists and is called the common range; p_i is then defined as

p_i = common range / system range    (2)

After obtaining I_i for each FR_i, because there are n FRs, the total information content is the sum of all the individual information contents. If I_i approaches infinity for any FR_i, the system will never work [1]. 3. Ergonomics in Mobile Phone Previous research on ergonomic mobile phone evaluation has been done along two separate lines: a physical approach and a cognitive approach. The physical approach focuses on design elements such as weight, dimension, screen size and arrangement of buttons. The cognitive approach is interested in usability criteria such as learnability, memorability, efficiency, and image. Three usability dimensions are defined in ISO/IEC 9241-11: effectiveness, efficiency and satisfaction. Effectiveness is defined as the accuracy and completeness with which users achieve specific goals. Efficiency concerns the resources expended in relation to the accuracy and completeness with which users achieve specific goals, and satisfaction is the subjective assessment of how pleasurable the product is to use. In addition to these factors, learnability (the ability to reach a reasonable level of performance) and memorability (the ability to remember how to use a product) are defined by Nielsen [3].
Another reference for usability dimensions is SUMI, which provides a usability profile according to five scales: affect, control, efficiency, helpfulness, learnability [4][5]. Also, Han et al. [6] divided usability into two main groups. The first is defined as the performance dimensions that measure user performance. The second group is defined as the image/impression dimensions that measure the user's perception of the image and impression of the product. Moreover, the MPUQ (developed by Ryu) includes new criteria such as pleasurability and specific task performance [5]. In this study, ergonomic features are divided into two aspects: physical and cognitive. These aspects are further classified into sub-factors, which are listed in Table 2. The cognitive aspect's sub-criteria are formed by the common factors in the existing usability questionnaires shown in Table 1. Lai et al. [7] determined three representative image word pairs: Simple-Complex, Handsome-Rustic, Leisure-Formal. Since experts' evaluations involve many factors, these three representative word pairs are used in order to evaluate the image of the mobile phone. Moreover, physical attributes are found to be among the most important features in mobile design in product catalogs.

Table 1. Usability dimensions by usability questionnaires.

Han et al., 2000: 1. Performance dimensions (1.1 Perception/Cognition, 1.2 Memorization/Learnability, 1.3 Control/Action); 2. Image/Impression dimensions (2.1 Basic sense, 2.2 Description of image, 2.3 Evaluative feeling)
Nielsen, 1993: 1. Learnability, 2. Efficiency, 3. Errors, 4. Memorability, 5. Satisfaction
ISO 9241-11: 1. Effectiveness, 2. Efficiency, 3. Satisfaction
SUMI: 1. Affect, 2. Efficiency, 3. Control, 4. Helpfulness, 5. Learnability
MPUQ: 1. Ease of learning and use, 2. Helpfulness and problem solving, 3. Affective aspect and multimedia properties, 4. Commands and minimal memory load, 5. Control and efficiency, 6. Typical tasks for mobile phone
Table 2. Physical and cognitive attributes used in the ergonomic comparison.

Physical attributes: 1. Weight; 2. Dimension; 3. Function button style; 4. Number buttons arrangement; 5. Screen size
Cognitive attributes: 1. Ease of use; 2. Learnability; 3. Image (3.1 Simple-Complex, 3.2 Handsome-Rustic, 3.3 Leisure-Formal)
4. Fuzzy Axiomatic Design Approach In the fuzzy case, there is incomplete information about the system and design ranges, so the available data are fuzzy. All the previous points regarding uncertainty are very important to incorporate into ergonomic studies. The main advantages of using fuzzy sets are not only a gain in precision, but also the reduction of model complexity. There are limits to using crisp values for the evaluation process. First of all, some criteria cannot be measured by crisp values, so in the selection process they are neglected [8][9]. Furthermore, real-world problems are complex and not all of the decision data can be precisely assessed [8]. Humans are unsuccessful in making quantitative predictions, but they have capabilities for qualitative prediction which computers do not have. The use of fuzzy set theory allows us to incorporate unquantifiable, incomplete and non-obtainable information. Since ergonomic mobile phone selection involves incomplete information, fuzzy AD is chosen for the selection process. The system contains five conversion scales and triangular fuzzy numbers. Firstly the experts decide the design range of each alternative for each criterion with the help of linguistic expressions, and then the linguistic expressions are transformed into fuzzy numbers. After that, the common area is found as the intersection area of triangular fuzzy numbers. 5. Evaluation of Mobile Phones Using Fuzzy AD Six mobile phones in the same price range and having typical mobile phone functions are evaluated using fuzzy AD. Two of the selected mobile phones are of the sliding type shown in Figure 1.a, two of them are of the folding type (Figure 1.b) and the last two are of the block type (Figure 1.c). The criteria considered in the ergonomic mobile phone selection are decided and five conversion scales are produced for them. For evaluating the intangible criteria (ease of use, ease of learning, complexity, fashionability, formality, function button style and number buttons arrangement), five triangular fuzzy numbers between 0 and 20 are used. For the tangible criteria (the phones' dimensions, weights and screen sizes), the related features are collected from magazines and product catalogs, and the fuzzy numbers are produced according to these data. The linguistic variables and the fuzzy numbers assigned to these criteria are shown in Table 3. The FRs that should be satisfied by a mobile phone are given below: FR1 = Weight must be light, FR2 = Dimension must be medium, FR3 = Screen size must be large, FR4 = Function button style must be moderate, FR5 = Number buttons arrangement must be irregular, FR6 = Usability must be high, FR7 = Learnability must be very good, FR8 = Appearance must be moderate, FR9 = Fashionability must be handsome, FR10 = Perception of appearance must be moderate.
Fig. 1. Alternative mobile phones' designs.
After the design ranges are decided, the experts produce the system range data, using the linguistic expressions in Table 3. In order to obtain the information content for the alternatives, the common area is calculated. When M_1 = (l_1, m_1, u_1) and M_2 = (l_2, m_2, u_2), and when l_1 <= u_2, d is the ordinate of the highest intersection point D between M_1 and M_2, calculated according to the following formula [10]:

d = (l_1 - u_2) / ((m_2 - u_2) - (m_1 - l_1))

and when l_1 > u_2, the membership value at the intersection is zero.
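The closed-form ordinate above presumes triangular membership functions that cross once. As an alternative sketch, the common area and the resulting information content I = log2(system area / common area) can also be obtained numerically; the system range in the last line is a hypothetical illustration, not one of the paper's phones.

```python
import math

def tfn_membership(x, tfn):
    """Membership degree of x in a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    if x <= l or x >= u:
        return 0.0
    return (x - l) / (m - l) if x <= m else (u - x) / (u - m)

def area(tfn):
    l, _, u = tfn
    return 0.5 * (u - l)                        # triangle of height 1

def common_area(design, system, steps=10000):
    """Numerical area of the intersection of the design- and system-range
    triangular fuzzy numbers (zero when the ranges do not overlap)."""
    lo, hi = max(design[0], system[0]), min(design[2], system[2])
    if hi <= lo:
        return 0.0
    dx = (hi - lo) / steps
    return sum(min(tfn_membership(lo + (i + 0.5) * dx, design),
                   tfn_membership(lo + (i + 0.5) * dx, system))
               for i in range(steps)) * dx

def information_content(design, system):
    """I = log2(system area / common area); infinite when the ranges
    do not intersect at all."""
    ca = common_area(design, system)
    return math.inf if ca == 0.0 else math.log2(area(system) / ca)

# design range 'light' = (80, 100, 120) from Table 3 against a
# hypothetical system range (90, 110, 130)
print(information_content((80, 100, 120), (90, 110, 130)))
```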
Table 3. Triangular fuzzy conversion scales and linguistic variables.

Weight:                     Very light (70,70,80); Light (80,100,120); Medium (110,140,170); Heavy (160,190,220); Very heavy (210,240,240)
Dimension:                  Very small (50,50,70); Small (60,80,110); Medium (90,125,160); Big (140,175,190); Very big (180,200,200)
Screen size:                Very small (65,65,160); Small (120,160,200); Medium (170,240,310); Large (290,350,410); Very large (390,700,700)
Function button style:      Very regular (0,0,6); Regular (4,7,10); Moderate (8,11,14); Irregular (12,15,18); Very irregular (16,20,20)
Number buttons arrangement: Very regular (0,0,6); Regular (4,7,10); Moderate (8,11,14); Irregular (12,15,18); Very irregular (16,20,20)
Usability:                  Very low (0,0,6); Low (4,7,10); Medium (8,11,14); High (12,15,18); Very high (16,20,20)
Learnability:               Poor (0,0,6); Fair (4,7,10); Good (8,11,14); Very good (12,15,18); Excellent (16,20,20)
Appearance:                 Very simple (0,0,6); Simple (4,7,10); Moderate (8,11,14); Complex (12,15,18); Very complex (16,20,20)
Fashionability:             Very rustic (0,0,6); Rustic (4,7,10); Moderate (8,11,14); Handsome (12,15,18); Very handsome (16,20,20)
Perception of appearance:   Very formal (0,0,6); Formal (4,7,10); Moderate (8,11,14); Leisure (12,15,18); Very leisure (16,20,20)
Table 4. System range data for the mobile phones.

Phone  FR1     FR2     FR3         FR4             FR5             FR6        FR7        FR8       FR9       FR10
P1     Light   Small   Large       Moderate        Regular         Very high  Good       Moderate  Handsome  Moderate
P2     Medium  Medium  Very large  Irregular       Regular         Very high  Good       Complex   Handsome  Formal
P3     Light   Medium  Large       Irregular       Very regular    Very high  Fair       Simple    Moderate  Moderate
P4     Light   Small   Large       Irregular       Regular         Very high  Good       Simple    Moderate  Moderate
P5     Medium  Big     Large       Very irregular  Very irregular  Medium     Very good  Complex   Rustic    Leisure
P6     Medium  Medium  Medium      Irregular       Irregular       High       Very good  Moderate  Handsome  Very formal
After the d points are obtained, the common area is calculated, and then the information content can be found. The resulting information contents for all alternatives are listed in Table 5. According to Table 5, the phone with the minimum information content is P1. As the alternative with minimum information content is the best, P1, which is a sliding phone, is selected as the most ergonomic mobile phone for the physical and mental attributes taken together. Considering only the physical aspect, P3, which is a folding phone, is the best, whereas for the cognitive aspect P1 is again found to be the best.
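The selection rule itself is straightforward; the sketch below applies it to the total information contents in the last column of Table 5 (an infinite value means at least one FR cannot be met at all).

```python
import math

# total information contents from the last column of Table 5
totals = {"P1": 12.99, "P2": 27.41, "P3": math.inf,
          "P4": 23.65, "P5": math.inf, "P6": math.inf}

# the information axiom: the alternative with the least (finite) total
# information content is the best design
feasible = {p: v for p, v in totals.items() if math.isfinite(v)}
print(min(feasible, key=feasible.get))   # P1
```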
Table 5. Information content for the alternatives.

Phone  FR1    FR2    FR3    FR4    FR5    Total   FR6    FR7    FR8    FR9    FR10   Total
P1     0      5.02   0      0      5.17   6.19    3.4    3.4    0      0      0      12.99
P2     1.32   0      5.78   3.17   0      14.27   3.4    3.4    5.17   0      3.17   27.41
P3     0      0      0      5.17   0      3.17    1      inf    5.17   5.17   0      inf
P4     •.32   3.02   0      5.17   0      10.51   3.4    3.4    5.17   5.17   0      23.65
P5     •.32   5.61   0      inf    3.4    inf     3.4    0      5.17   inf    3.17   inf
P6     •.32   0      5.29   5.17   0      12.78   5.17   0      0      0      inf    inf
6. Conclusion In this paper, we try to find the most ergonomic mobile phone to produce. For this aim, we use the fuzzy axiomatic design method rather than crisp AD. If we had complete information, crisp AD would be sufficient to solve the decision model. If the data in the decision model have little uncertainty, it is not required to convert the data into a fuzzy format. According to Ross [12], the aim should be matching the model type with the character of the uncertainty exhibited in the problem. According to the uncertainty level, three different types of model can be used. If a system has little uncertainty, a closed-form mathematical expression is the suitable method. For systems with more uncertainty, but for which significant data exist, model-free methods should be used. However, for systems with incomplete, non-obtainable or unquantifiable information, fuzzy modeling provides a way to understand the system [12]. In our case, ergonomic mobile phone selection has many attributes, and these attributes are generally conflicting with each other and measured on different scales. Also, it is difficult to measure the intangible criteria quantitatively. Therefore, the fuzzy AD method is used in this study. In the fuzzy AD method, the experts' opinions about the alternatives' design ranges are obtained through linguistic variables. One advantage of using linguistic variables is that this kind of expression is more intuitive, and it is easier for experts to give their opinions in an ambiguous situation where numerical estimations are hard to obtain. As a result, P1 is found to be the most ergonomic mobile phone from both the physical and the cognitive perspective. Considering only the physical aspect, P3 is the best, while from the cognitive perspective P1 is the best. The AD
method for the selection process has advantages over other multi-attribute decision-making methods. Firstly, the designer may want a criterion to be satisfied within a specified design range rather than at its best possible level; this is not possible when working with other existing models like AHP, fuzzy AHP and scoring models. Also, the AD method rejects an alternative that does not meet the design range of any criterion, while the other methods do not. Finally, the identified dimensions of ergonomic mobile phones can help in developing more ergonomic mobile phones.

References
1. Suh, N.P., Axiomatic Design: Advances and Applications, Oxford University Press, New York (2001).
2. Lai, Y.C., A stochastic model for product development process, Ph.D. Thesis, Iowa State University (2002).
3. Lee, Y.S., Hong, S.W., Smith-Jackson, T.L., Nussbaum, M.A. and Tomioka, K., Systematic evaluation methodology for cell phone user interfaces, Interacting with Computers (2001) 1-22.
4. Kirakowski, J. and Corbett, M., SUMI: The Software Usability Measurement Inventory, British Journal of Educational Technology 24 (1993) 210-212.
5. Ryu, Y.S., Development of usability questionnaires for electronic mobile products and decision making methods, Ph.D. Thesis, Virginia Polytechnic Institute and State University (2005).
6. Han, S.H., Yun, M.H., Kim, K. and Kwahk, J., Evaluation of product usability: development and validation of usability dimensions and design elements based on empirical models, International Journal of Industrial Ergonomics 26 (2000) 477-488.
7. Lai, H.H., Lin, Y.C., Yeh, C.H. and Wei, C.H., User-oriented design for the optimal combination on product design (2005).
8. Kulak, O. and Kahraman, C., Multi-attribute comparison of advanced manufacturing systems using fuzzy vs. crisp axiomatic design approach, International Journal of Production Economics 95(3) (2005) 415-424.
9. Kulak, O. and Kahraman, C., Fuzzy multi-attribute selection among transportation companies using axiomatic design and analytic hierarchy process, Information Sciences 170(2-4) (2005) 191-210.
10. Chang, D.Y., Applications of the extent analysis method on fuzzy AHP, European Journal of Operational Research 95 (1996) 649-655.
11. Zhu, K.J., Jing, Y. and Chang, D.Y., A discussion on extent analysis method and applications of fuzzy AHP, European Journal of Operational Research 116 (1999) 450-456.
12. Ross, T.J., Fuzzy Logic with Engineering Applications (1995).
FACILITY LOCATION SELECTION USING A FUZZY OUTRANKING METHOD IHSAN KAYA Department of Industrial Engineering, Istanbul Technical University Macka 80680, Istanbul, Turkey,[email protected] DIDEM CINAR Department of Industrial Engineering, Istanbul Technical University Macka 80680, Istanbul, Turkey,[email protected] Most decision-making problems deal with uncertain and imprecise data so conventional approaches cannot be effective to find the best solution. To cope with this uncertainty, fuzzy set theory has been developed as an effective mathematical algebra under vague environment. When the system involves human subjectivity, fuzzy algebra provides a mathematical framework for integrating imprecision and vagueness into the decision making models. The main subject of this study is a fuzzy outranking model, and a numerical example of a facility location selection problem with fuzzy data is considered using this model.
1. Introduction The facility location problem, whose optimization is a central area of operations research, is to determine the best region for a facility. Typical applications of facility location include the placement of factories, warehouses, schools, ATMs and proxy servers in content distribution networks on the internet. The selection of facility locations among alternative locations is a decision problem including quantitative and qualitative criteria simultaneously. In this study, we analyze an outranking preference method to generate strategic concepts, evaluate them and select the best ones. During the decision phase, eight main attributes are considered: travel distance, travel cost, political decision, convenience of access, material handling cost, working condition, cost of renting & maintenance and other characteristics. Since the criteria of the determined locations are subjective, we use fuzzy numbers. The rest of this paper is organized as follows. Section 2 provides some approaches to facility location selection methods. In Section 3, the proposed fuzzy outranking method is examined, and it is shown how such a model can assist in analyzing a multi-criteria decision-making problem when the information
available is vague, imprecise and subjective. An application of this model and its discussion is presented in Section 4. Finally, concluding remarks are made in Section 5. 2. Fuzzy Sets Approaches to Facility Location Selection Facility location is one of the most important aspects of logistics. The goal of research in this area is to support decisions regarding building facilities, e.g., plants and warehouses, among a set of possibilities such that all demands can be served. Usually the main objective is minimizing cost or maximizing profit. As a result of a good location decision, a company or an organization can save millions of dollars. Because of the imprecision or vagueness of linguistic assessments, the conventional methods of location selection tend to be less effective. Fuzzy sets have been widely used for the facility location problem in recent years. Certain types of uncertainties are encountered in a variety of areas and fuzzy set theory has proved to be very efficient in considering these. Tzeng and Chen [1] propose a location model which helps to determine the optimal number and sites of fire stations at an international airport, and also assists the relevant authorities in drawing up optimal locations for fire stations. Kuo et al. [2] develop a decision support system to locate a new convenience store; this system is integrated with the analytic hierarchy process using fuzzy set theory. Chen [3, 4] investigates the distribution center location selection problem under a fuzzy environment and proposes a new multiple criteria decision-making method to solve it. Kahraman et al. [5] investigate the solution approaches of four different fuzzy multi-attribute group decision-making methods: a fuzzy model of group decision proposed by Blin, fuzzy synthetic evaluation, Yager's weighted goals method and the fuzzy analytic hierarchy process, applied to facility location problems. Al-bader [6] provides an introduction to the concept of decision-making, presents the prerequisites and the need for a methodology for analyzing certain problems under a fuzzy environment, and deals with rating models under a fuzzy environment with an application to a facility location selection problem. 3. Fuzzy Outranking Method Multi-Attribute Decision Making (MADM) refers to making a selection among given and predetermined alternatives in the presence of multiple, usually conflicting and sometimes interactive attributes [7]. In MADM, the selection attributes are determined, and all alternatives are evaluated by rating these attributes and comparing them with each other. The aggregation phase is followed by an
exploitation phase, which allows the decision maker to obtain a rank ordering, a choice or a sorting among the alternatives [7]. In general, the set A = {a, b, c, ...} is used to denote the alternatives of the multi-criteria decision-making problem. These alternatives are evaluated by n criteria g_1, g_2, ..., g_n. The best alternative in the set A is selected based on the criteria vectors g(k), k in A. In this paper, instead of crisp numbers, the performance ratings are described with triangular fuzzy numbers. Since the available information is very subjective or not sufficient, it is difficult to determine which alternative is the best, and incomparable criteria may need to be tolerated until sufficient information is collected. For modeling the imprecise preference relations between location alternatives, the fuzzy outranking relation proposed by Roy [8] is used. Let P_i(a, b) be the fuzzy preference relation between a and b, where a, b in A, for criterion i. g_i(a) and g_i(b) are the linguistic performances of alternatives a and b according to criterion i and are represented by fuzzy numbers. According to Tseng and Klein [9], the fuzzy preference relationship is given as follows:

P_i(a, b) = (D(a, b) + D(a ∩ b, 0)) / (D(a, 0) + D(b, 0))    (1)
where D(a, b) is the area in which a dominates b; D(a, 0) is the area of a; D(b, 0) is the area of b; and D(a ∩ b, 0) is the intersection area of a and b. Preference relations are thus obtained from the relevant areas under the fuzzy membership functions. The three preference models applied in this study are as follows [9]. 3.1. Pseudo-order preference model The pseudo-order preference model separates the set of alternatives into two sets, a dominance set and a nondominance set. During this discrimination, the relative importance of each criterion is not considered. 3.2. Semi-order preference model When the relative importance of each criterion is predictable, the semi-order preference model is used to identify the nondominance set. 3.3. Complete-preorder preference model The complete-preorder preference model, in which the most promising "best" alternative is selected, is a special type of the pseudo-order preference model. Threshold values are not used, so q_i = p_i = 0, i in C.
The degree of dominance is used to determine the complete-preorder preference model and to rank the set of alternatives in a complete order. 4. Application This example is taken from the study of Al-bader [6]. In Al-bader's study, the problem was solved not only with fuzzy sets but also with crisp sets, and the results were compared. In this study, we use the three preference-ranking methods explained in Section 3 with fuzzy numbers. Problem. An industrial engineering location selection problem is considered. Four alternatives (L1, L2, L3, L4) are compared. The selection of the best facility location is related to eight attributes:
- Travel distance: distance between the location and the market or the other layouts.
- Travel cost.
- Political decision.
- Convenience of access: situation of the ways, etc.
- Material handling cost: handling cost resulting from long distances, long times, etc.
- Working condition: this criterion covers the size, comfort, car parking, etc.
- Cost of renting and maintenance.
- Other characteristics: other factors that may affect the decision on the new facility location, such as climate, social conditions, etc.
Ratings of the location alternatives for each attribute are given in Table 1 and the weights of the attributes are summarized in Table 2. Triangular fuzzy numbers are used for each alternative to model the selection problem (Table 3).

Table 1. Ratings for each location alternative

Attribute                      L1  L2  L3  L4
Travel distance                 9   8   9   7
Travel cost                     8   7   7   8
Political decision              6   7   5   6
Convenience of access           5   6   6   6
Material handling cost          5   5   5   5
Working conditions              4   5   4   4
Cost of renting & maintenance   5   4   5   6
Other characteristics           5   5   5   5
Table 2. Weights and normalized weights for each location attribute

Attribute                      Weight  Normalized weight
Travel distance                   8        0.222
Travel cost                       8        0.222
Political decision                3        0.083
Convenience of access             4        0.111
Material handling cost            2        0.056
Working conditions                4        0.111
Cost of renting & maintenance     3        0.083
Other characteristics             4        0.111
Table 3. Fuzzy ratings for each location alternative

Attribute                      L1        L2        L3        L4
Travel distance                (8,9,10)  (7,8,9)   (8,9,10)  (6,7,8)
Travel cost                    (7,8,9)   (6,7,8)   (6,7,8)   (7,8,9)
Political decision             (5,6,7)   (6,7,8)   (4,5,6)   (5,6,7)
Convenience of access          (4,5,6)   (5,6,7)   (5,6,7)   (5,6,7)
Material handling cost         (4,5,6)   (4,5,6)   (4,5,6)   (4,5,6)
Working conditions             (3,4,5)   (4,5,6)   (3,4,5)   (3,4,5)
Cost of renting & maintenance  (4,5,6)   (3,4,5)   (4,5,6)   (5,6,7)
Other characteristics          (4,5,6)   (4,5,6)   (4,5,6)   (4,5,6)
4.1. Pseudo-order preference model q_i = 0.25 and p_i = 0.85 are applied as thresholds to the location selection problem [8]. The pseudo-order preference model is used to separate the alternatives into a dominance set and a nondominance set. The fuzzy preference relations among the four location alternatives are given in Table 4.

Table 4. Fuzzy preference relations between alternatives for each attribute

Travel distance        L1     L2     L3     L4
L1                     0.50   0.875  0.50   1.00
L2                     0.125  0.50   0.125  0.875
L3                     0.50   0.875  0.50   1.00
L4                     0.00   0.125  0.00   0.50

Travel cost            L1     L2     L3     L4
L1                     0.50   0.875  0.875  0.50
L2                     0.125  0.50   0.50   0.125
L3                     0.125  0.50   0.50   0.125
L4                     0.50   0.875  0.875  0.50

Political decision     L1     L2     L3     L4
L1                     0.50   0.125  0.875  0.50
L2                     0.875  0.50   1.00   0.875
L3                     0.125  0.00   0.50   0.125
L4                     0.50   0.125  0.875  0.50

Convenience of access  L1     L2     L3     L4
L1                     0.50   0.125  0.125  0.125
L2                     0.875  0.50   0.50   0.50
L3                     0.875  0.50   0.50   0.50
L4                     0.875  0.50   0.50   0.50

Material handling cost L1     L2     L3     L4
L1                     0.50   0.50   0.50   0.50
L2                     0.50   0.50   0.50   0.50
L3                     0.50   0.50   0.50   0.50
L4                     0.50   0.50   0.50   0.50

Working conditions     L1     L2     L3     L4
L1                     0.50   0.125  0.50   0.50
L2                     0.875  0.50   0.875  0.875
L3                     0.50   0.125  0.50   0.50
L4                     0.50   0.125  0.50   0.50

Cost of renting & maintenance  L1     L2     L3     L4
L1                             0.50   0.875  0.50   0.125
L2                             0.125  0.50   0.125  0.00
L3                             0.50   0.875  0.50   0.125
L4                             0.875  1.00   0.875  0.50

Other characteristics  L1     L2     L3     L4
L1                     0.50   0.50   0.50   0.50
L2                     0.50   0.50   0.50   0.50
L3                     0.50   0.50   0.50   0.50
L4                     0.50   0.50   0.50   0.50
According to Fig. 1, the dominance and nondominance sets are as follows: S_ND = {2} and S_D = {1, 3, 4}. Since only location 2 (L2) outranks the other alternatives, L2 seems to be the result of this preference model.
Fig. 1. The outranking graph according to the pseudo-order preference model. 4.2. Semi-order preference model Normalized weights are calculated using the weights of the attributes in Table 2. For the semi-order preference model, the threshold q_i (q_i = 0.25) is enough to determine the best result. According to Fig. 2, the dominance and nondominance sets are as follows: S_ND = {1} and S_D = {2, 3, 4}.
Fig. 2. The outranking graph according to the semi-order preference model. Since only location 1 (L1) outranks the other alternatives, it seems to be the result of the semi-order preference model.
4.3. Complete-preorder preference model The weighted preferences and the degrees of dominance are calculated using

MD(a) = Σ_{b in A, b ≠ a} P_w(a, b),

where P_w(a, b) is the weighted preference of a over b, and are shown in Tables 5 and 6.
Table 5. Weighted preference matrix

      L1     L2     L3     L4
L1    0.50   0.58   0.57   0.54
L2    0.42   0.50   0.47   0.53
L3    0.43   0.53   0.50   0.47
L4    0.46   0.47   0.53   0.50

Table 6. Degree of dominance

Alternative   Degree of dominance
L1            1.69
L2            1.42
L3            1.43
L4            1.46
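As a cross-check of Tables 4-6, the sketch below rebuilds the pairwise preferences from the crisp ratings of Table 1, weights them with the normalized weights of Table 2 and sums the off-diagonal entries to obtain the degrees of dominance. Because every rating here is a unit-spread triangular number (x-1, x, x+1), the preference of Eq. (1) collapses to a simple function of the difference of the modal values (0 -> 0.50, ±1 -> 0.875/0.125, ±2 or more -> 1.00/0.00); this shortcut is specific to these data and is not a general implementation of Eq. (1).

```python
RATINGS = {          # crisp ratings of Table 1 (attribute -> [L1, L2, L3, L4])
    "travel distance": [9, 8, 9, 7], "travel cost": [8, 7, 7, 8],
    "political decision": [6, 7, 5, 6], "convenience of access": [5, 6, 6, 6],
    "material handling": [5, 5, 5, 5], "working conditions": [4, 5, 4, 4],
    "renting & maintenance": [5, 4, 5, 6], "other characteristics": [5, 5, 5, 5],
}
WEIGHTS = {          # normalized weights of Table 2
    "travel distance": 0.222, "travel cost": 0.222, "political decision": 0.083,
    "convenience of access": 0.111, "material handling": 0.056,
    "working conditions": 0.111, "renting & maintenance": 0.083,
    "other characteristics": 0.111,
}

def preference(ra, rb):
    """Pairwise preference for unit-spread triangular ratings, following the
    value pattern visible in Table 4."""
    diff = ra - rb
    if diff >= 2:  return 1.0
    if diff == 1:  return 0.875
    if diff == 0:  return 0.5
    if diff == -1: return 0.125
    return 0.0

n = 4
weighted = [[sum(WEIGHTS[att] * preference(r[i], r[j]) for att, r in RATINGS.items())
             for j in range(n)] for i in range(n)]                     # Table 5
dominance = [sum(row[j] for j in range(n) if j != i) for i, row in enumerate(weighted)]
print([round(d, 2) for d in dominance])
# close to Table 6 (1.69, 1.42, 1.43, 1.46), up to small weight-rounding differences
```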
The dominance degree of location 1 is the highest value among the four alternatives, followed by location 4. Since the degrees of locations 2 and 3 are almost equal, there are two possible arrangements, but in both of them the first and second positions do not change. Therefore, the best alternative is location 1. The ordering of the alternatives is shown in Fig. 3.
Fig. 3. The outranking graph according to complete-preorder preference model
5. Conclusion In this study a fuzzy outranking model is proposed and a numerical example of a facility location selection problem with fuzzy data is considered. Three preference models are analyzed for the facility location problem: (1) the pseudo-order preference model, (2) the semi-order preference model, and (3) the complete-preorder preference model. When q_i is chosen as 0.25 and p_i as 0.85, location 2 outranks the other alternatives according to the pseudo-order preference model. However, when the semi-order and complete-preorder preference models are applied, location 1 appears to be the best alternative. The results of the models are summarized in Table 7.
Table 7. Results of the three preference models

Model                               Nondominance set   Dominance set
Pseudo-order preference model       {2}                {1, 3, 4}
Semi-order preference model         {1}                {2, 3, 4}
Complete-preorder preference model  {1}                {2, 3, 4}
In prospective studies, the effect of varying the threshold values (p_i and q_i) on the results can be investigated as a sensitivity analysis; we may examine whether the best selection changes when p_i and q_i are changed. Other fuzzy outranking methods may also be used and compared with the results of this method.

References
1. Tzeng, G.H., Chen, Y.W., Optimal location of airport fire stations: a fuzzy multi-objective programming and revised genetic algorithm approach, Transportation Planning and Technology 23(1) (1999) 37-55.
2. Kuo, R.J., Chi, S.C., Kao, S.S., A decision support system for locating convenience store through fuzzy AHP, Computers & Industrial Engineering 37 (1999) 323-326.
3. Chen, C.T., A fuzzy approach to select the location of the distribution center, Fuzzy Sets and Systems 118 (2001) 65-73.
4. Chen, S.M., Fuzzy group decision-making for evaluating the rate of aggregative risk in software development, Fuzzy Sets and Systems 118 (2001) 75-88.
5. Kahraman, C., Ruan, D., Dogan, I., Fuzzy group decision-making for facility location selection, Information Sciences 157 (2003) 135-153.
6. Al-Bader, N., Certain models for facility location and production planning under fuzzy environment, M.Sc. Thesis, Department of Mechanical and Industrial Engineering, University of Manitoba, Winnipeg, Manitoba.
7. Roubens, M., Fuzzy sets and decision analysis, Fuzzy Sets and Systems 90 (1997) 199-206.
8. Roy, B., Vincke, P.H., Relational systems of preference with one or more pseudo-criteria: some new concepts and results, Management Science 30 (1984) 1323-1335.
9. Güngör, Z., Arıkan, F., A fuzzy outranking method in energy policy planning, Fuzzy Sets and Systems 114 (2000) 115-122.
EVALUATION OF SUPPLIERS' ENVIRONMENTAL MANAGEMENT PERFORMANCES BY A FUZZY COMPROMISE RANKING TECHNIQUE GULCIN BUYUKOZKAN
ORHAN FEYZIOGLU
Department of Industrial Engineering, Galatasaray University, Qragan Caddesi, No:36 Ortakoy, 34357, Istanbul-Turkey Traditionally, when evaluating supplier performance, companies have considered factors such as price, quality, flexibility etc. However, with environmental pressures increasing, many companies have begun to consider environmental issues and the measurement of their suppliers' environmental performance. This paper presents a performance evaluation model based on a multi-criteria decision-making method, known as VIKOR, for measuring the supplier environmental performance. The original VIKOR method has been proposed to identify compromise solutions, by providing a maximum group utility for the majority and a minimum of an individual regret for the opponent. In its actual setting, the method treats exact values for the assessment of the alternatives, which can be quite restrictive with unquantifiable criteria. This will be true especially if the evaluation is made by means of linguistic terms. For this reason we extend the VIKOR method so as to process such data and to provide a more comprehensive evaluation in a fuzzy environment. The extended method is used in a real industrial application.
1. Introduction In the last years "green" movements, institutions and governments have forced many companies to improve their environmental performance. To respond to this growing concern for "green" issues, firms have carried out a great number of environmental programs. Following this, today, the environmentally conscious firms—mainly multinational corporations—are developing "green" programs aimed at organizing their supply value chains according to an ecoefficiency perspective [1]. In particular, pro-active companies are seeking to develop co-operative links with supply chain partners, particularly small and medium sized enterprises, in order to accelerate the diffusion of environmental management initiatives and to design and develop new "green" products [2, 3]. In spite of the growing importance of the supplier's role in the new product development process, there are relatively few models for supplier selection that effectively take environmental performances into account [4, 5, 6, 7]. An effective "green" supplier selection approach should be able to link all the 367
members of a supply chain. Furthermore, the selection decision should be driven by environmental factors as well as by financial and other tangible and intangible performance criteria. As the importance degrees of the evaluation criteria are different, the procedure should also allow prioritization of the criteria used in the evaluation process. For these reasons, this study aims to develop a supplier selection approach that is able to (1) consider environmental issues, (2) incorporate both tangible and intangible performance indicators, (3) handle different weights for each evaluation criterion, (4) handle multiple criteria decision making (MCDM), and (5) provide multiple desirable solutions to the decision maker (DM). The paper is organized as follows. Section 2 gives a brief description of the green supplier evaluation criteria. Section 3 presents the linguistic VIKOR method in a group decision-making context. Section 4 applies the suggested approach to measure the performance of suppliers. Section 5 gives some concluding remarks. 2. Evaluation Criteria for Environmentally Conscious Suppliers Historically, several methodologies have been developed for evaluating, selecting and monitoring potential suppliers that take into account factors dealing with, for example, quality, logistics and cost. However, none of these methodologies has considered the importance of environmental factors, such as life cycle analysis or design for environment, in the decision-making process. In recent years, a number of researchers have begun to identify some relevant criteria. Sarkis [8] grouped environmental criteria such as "design for the environment", "life cycle analysis", "total quality environmental management", "green supply chain" and "ISO 14000 environmental management system requirements", but used them only to evaluate the existing internal company operations for their environmental performance. Focusing on supplier selection, Noci [7] identified four environmental categories including "green competencies", "current environmental efficiency", "supplier's green image" and "net life cycle cost". Enarsson [4] proposed a fishbone-diagram-based instrument, similar to the ones used in quality assessment within companies, for the evaluation of suppliers from an environmental viewpoint; four main factors are identified: "the supplier as a company", "the supplier's processes", "the product itself" and "transportation". By consolidating several studies, Humphreys et al. [5] propose seven environmental categories. The categories "environmental costs (pollutant effects)" and "environmental costs (improvement)" are grouped together under the title "quantitative environmental
369 criteria". The other five categories named "management competencies", "green image", "design for environment", "environmental management systems", and "environmental competencies" are in a separate group termed "qualitative environmental criteria". In a recent work, Kongar [6] introduces environmental consciousness indicators such as "recyclability of goods", "decreased amount of hazardous substances" and "compatibility with health and safety regulations" into the supplier evaluation process. Based on the mentioned studies and the contribution of industrial experts who actually work in the environmental management related departments of three international companies' Turkish branches, the following criteria are to be considered for the assessment of the supplier: (a) environmental management competencies, (b) existing environmental management systems, (c) effort for the "design for environment", (d) effort for the "production for environment", (e) effort for the "logistics for environment", and (f) environmental costs. 3. Fuzzy VIKOR Method in a Group Decision-Making Setting Most common approaches in supplier selection include expert evaluation, principal components analysis, factor analysis, cluster analysis, discriminant analysis, data envelopment analysis, fuzzy logic based evaluation approaches [9]. Supplier selection usually involves comparisons of alternative solutions on the basis of multiple conflicting criteria and hence can be considered as a MCDM problem. One type of MCDM methods is the distance-based techniques like compromise and composite programming seek to find a solution that is close to an ideal solution, or like the Nash cooperative game concept - a solution as far as possible from the worst solution. As a method belonging to the compromise programming category, VIKOR was introduced as an applicable technique to implement within MCDM [10, 11]. The VIKOR method determines the compromise ranking-list and the compromise solution by introducing the multi criteria ranking index based on the particular measure of "closeness" to the "ideal" solution. The compromise solution is a feasible solution, which is the closest to the ideal, and here "compromise" means an agreement established by mutual concessions. With this ability, VIKOR is selected in this work as a suitable method for evaluating suppliers. Meanwhile, the method requires crisp evaluation of alternatives. Owing to the availability and uncertainty of the information, it is not always possible to obtain exact numerical data for decision criteria. Moreover, most evaluators tend to give assessments based on their knowledge, past experience and subjective judgments. As an example, "quality" is a linguistic variable since its
values are linguistic values rather than numerical ones, i.e., poor, fair, good, very good, etc. Fuzzy set theory plays a significant role in dealing with the vagueness of human thought. The approximate reasoning of fuzzy set theory can properly represent linguistic terms [12]. The value of a linguistic variable can be quantified and extended to mathematical operations using fuzzy set theory [13, 14]. As the suppliers' environmental management performance contains hardly quantifiable factors, the VIKOR method is extended in this study with fuzzy logic to process such data and to provide a more comprehensive evaluation. Recently, Opricovic and Tzeng [15] have also suggested using fuzzy logic for the VIKOR method. However, they simply used fuzzy values to define the attributes' ratings and their importance in a first phase; the obtained results are then defuzzified in a second phase to obtain crisp values, which are used as such in the original VIKOR method. Here, we suggest also making use of fuzzy logic in the subsequent phases of the VIKOR method so as not to lose any important information in the mapping process. Let us denote the m alternatives under consideration as a_1, a_2, ..., a_m, and the n evaluation criteria as c_1, c_2, ..., c_n. The suggested procedure is as follows. Step 1. Construct a committee of K experts and identify the alternatives and evaluation criteria. Step 2. Identify the evaluation base, i.e. the linguistic variables used to weight the criteria and rate the alternatives. Step 3. Determine the aggregated fuzzy weight w_i of criterion c_i, i = 1, 2, ..., n, and the aggregated fuzzy rating r_ij of alternative a_j, j = 1, 2, ..., m, under criterion c_i. To achieve this, we use the weighted fuzzy Delphi method [16]. Delphi-Step 1: The K experts are asked to provide their evaluations using the linguistic variables given in Tables 1.a-b, which correspond to the triangular fuzzy numbers r_ij^k and w_i^k. For a triangular fuzzy number (l, m, u), m is the modal (most likely) value. Each expert has a weight lambda_k determined according to his/her degree of experience.

Table 1. Linguistic variables to rate (a) criteria importance and (b) alternatives.

(a) Criteria importance            (b) Alternatives
Very Low (VL)   (0.0, 0.0, 0.3)    Very Poor (VP)   (0.0, 0.0, 0.2)
Low (L)         (0.0, 0.3, 0.5)    Poor (P)         (0.0, 0.2, 0.4)
Medium (M)      (0.2, 0.5, 0.8)    Fair (F)         (0.3, 0.5, 0.7)
High (H)        (0.5, 0.7, 1.0)    Good (G)         (0.6, 0.8, 1.0)
Very High (VH)  (0.7, 1.0, 1.0)    Very Good (VG)   (0.8, 1.0, 1.0)
Delphi-Step 2: First, the weighted averages r_ij of all the r_ij^k and w_i of all the w_i^k are computed as

r_ij = (lambda_1 ⊗ r_ij^1 ⊕ ... ⊕ lambda_K ⊗ r_ij^K) / (lambda_1 + ... + lambda_K),
w_i  = (lambda_1 ⊗ w_i^1 ⊕ ... ⊕ lambda_K ⊗ w_i^K) / (lambda_1 + ... + lambda_K).

Then the deviations between r_ij and each r_ij^k, and between w_i and each w_i^k, are computed with the method presented in Fortemps and Roubens [17] for each expert. Delphi-Step 3: A threshold value is defined, and the evaluation is sent back to the expert if the distance between the weighted average and the expert's evaluation is greater than this value. If the threshold is exceeded, the process loops back to Delphi-Step 2 until no threshold-exceeding value is encountered, i.e. until two successive averages are reasonably close to each other. It is assumed that a distance less than or equal to 0.2 corresponds to two reasonably close fuzzy estimates [18]. Step 4. If the supports of the triangular fuzzy numbers expressing the linguistic variables (Tables 1.a-b) do not belong to the interval [0, 1], scaling is needed to transform them back into this interval. Here, we use a linear scale transformation to obtain comparable numbers. For example, for the ratings of the alternatives we take

r'_ij = (l_ij / u_i*, m_ij / u_i*, u_ij / u_i*),  where r_ij = (l_ij, m_ij, u_ij), u_i* = max_j u_ij, i = 1, 2, ..., n.
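A minimal sketch of the two operations above on triangular fuzzy numbers stored as (l, m, u) triples: the expertise-weighted average of Delphi-Step 2 and the linear scale transformation of Step 4. The linguistic ratings come from Table 1.b; the equal expert weights are an illustrative assumption.

```python
def weighted_average(tfns, lambdas):
    """Delphi-Step 2: (lambda_1*A_1 + ... + lambda_K*A_K) / (lambda_1 + ... + lambda_K),
    computed component-wise on triangular fuzzy numbers (l, m, u)."""
    total = sum(lambdas)
    return tuple(sum(lam * tfn[k] for lam, tfn in zip(lambdas, tfns)) / total
                 for k in range(3))

def normalise(tfn, u_max):
    """Step 4: linear scale transformation, dividing each component by the
    largest upper bound observed for the criterion."""
    return tuple(component / u_max for component in tfn)

# three equally weighted experts rating one alternative as G, VG and F (Table 1.b)
ratings = [(0.6, 0.8, 1.0), (0.8, 1.0, 1.0), (0.3, 0.5, 0.7)]
agg = weighted_average(ratings, [1.0, 1.0, 1.0])
print(agg, normalise(agg, u_max=1.0))
```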
Step 5. Compute the values S_j and R_j, j = 1, 2, ..., m, by the relations

S_j = ⊕_{i=1..n} w_i ⊗ d(1, r_ij)   and   R_j = max_i { w_i ⊗ d(1, r_ij) },

where S_j and R_j are used to formulate the ranking measures of "group utility" and "individual regret" respectively. Here, d(1, r_ij) represents the distance of an alternative's rating from the positive ideal solution 1 = (1, 1, 1), calculated by the area compensation method [17], which has reasonable ordering properties and is computationally easy. Note that the maximum among the w_i ⊗ d(1, r_ij) values corresponds to the rating that is the most distant from 1. Step 6. Compute the values Q_j, j = 1, 2, ..., m, by the relation Q_j = v(S'_j) ⊕ (1 - v)(R'_j), where S'_j and R'_j are the normalized S_j and R_j values obtained with the linear scale transformation. Here, v is introduced as the weight of the "majority of criteria" strategy. The compromise can be selected with "voting by majority" (v > 0.5), with "consensus" (v = 0.5), or with "veto" (v < 0.5). Step 7. The ranking order of the alternatives is determined with the help of the area compensation method. First, the S'_j, R'_j and Q_j values are defuzzified into crisp S'_j, R'_j and Q'_j values. Then the alternatives are ranked by sorting the S'_j, R'_j and Q'_j values in increasing order, as in the original VIKOR method. The result is a set of three ranking lists. The alternative j_1 corresponding to the smallest Q'_j value is proposed as the compromise solution if
C1. The alternative j_1 has an acceptable advantage, in other words Q'_{j2} - Q'_{j1} >= DQ, where DQ = 1/(m - 1) and m is the number of alternatives. C2. The alternative j_1 is stable within the decision-making process, in other words it is also the best ranked according to S'_j and/or R'_j. If one of the above conditions is not satisfied, then a set of compromise solutions is proposed, which consists of: • alternatives j_1 and j_2 if only condition C2 is not satisfied, or • alternatives j_1, j_2, ..., j_k if condition C1 is not satisfied, where j_k is determined by the relation Q'_{jk} - Q'_{j1} < DQ for the maximum k. 4. Application of the Proposed Approach Step 1 & 2. The authors have collaborated with a committee of 3 experts from the investigated company to undertake this study. Since there were no significant differences in the degrees of experience between the experts, all are assumed to be equally important for the decision process. Six different suppliers are evaluated against the six criteria given at the end of Section 2. Step 3. The linguistic terms presented in Tables 1.a-b are used in the assessment process. The aggregated fuzzy weights of the criteria and the aggregated fuzzy ratings of the alternatives are calculated through the weighted sum of the individual evaluations and are shown in Table 2. In this and the subsequent tables, c_i and W_j stand for the labels of criterion i and alternative j respectively. The column named "Weight" contains the criteria importance evaluations.

Table 2. The aggregated fuzzy weights and ratings.

     W1             W2             W3             W4             W5             W6             Weight
c1   (.8, 1, 1)     (.8, 1, 1)     (.1, .3, .5)   (.8, 1, 1)     (.57, .77, .9) (.3, .5, .7)   (.57, .8, 1)
c2   (.2, .4, .6)   (.4, .6, .8)   (.73, .93, 1)  (.6, .8, 1)    (.73, .93, 1)  (.57, .77, .9) (.63, .9, 1)
c3   (.1, .3, .5)   (.57, .77, .9) (.67, .87, 1)  (.73, .93, 1)  (.1, .3, .5)   (0, .2, .4)    (.3, .57, .87)
c4   (.67, .87, 1)  (.5, .7, .9)   (.4, .6, .8)   (.8, 1, 1)     (.4, .6, .8)   (.73, .93, 1)  (0, .3, .5)
c5   (.5, .7, .9)   (.2, .4, .6)   (.67, .87, 1)  (.8, 1, 1)     (.3, .5, .7)   (.1, .3, .5)   (0, .3, .5)
c6   (.1, .3, .5)   (.5, .7, .9)   (.5, .7, .9)   (0, .2, .4)    (.6, .8, 1)    (.5, .7, .9)   (.57, .8, 1)
Table 3. S'_j, R'_j and Q_j values for v = 0.6.

        W1               W2               W3               W4               W5               W6
Q_j     (.17, .29, .38)  (.27, .45, .59)  (.48, .70, .88)  (.12, .19, .24)  (.35, .55, .72)  (.49, .77, 1)
R'_j    (.21, .30, .38)  (.28, .40, .50)  (.57, .80, 1)    (.14, .20, .25)  (.50, .70, .88)  (.57, .80, 1)
S'_j    (.14, .28, .39)  (.26, .48, .64)  (.42, .63, .80)  (.11, .18, .24)  (.26, .45, .61)  (.44, .75, 1)
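Step 6 combines the two normalized measures component-wise on the triangular numbers. The short check below reproduces the first column of Table 3 from its S'_j and R'_j rows with v = 0.6 (small differences are rounding in the printed table).

```python
def fuzzy_Q(S, R, v=0.6):
    """Q = v*S' (+) (1-v)*R', applied component-wise to (l, m, u) triples."""
    return tuple(v * s + (1.0 - v) * r for s, r in zip(S, R))

# supplier W1 in Table 3: S'_1 = (.14, .28, .39), R'_1 = (.21, .30, .38)
print(fuzzy_Q((0.14, 0.28, 0.39), (0.21, 0.30, 0.38)))
# (0.168, 0.288, 0.386); Table 3 reports (.17, .29, .38)
```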
Step 4. As the supports of the fuzzy numbers given in Tables 1.a-b lie in the [0, 1] interval, the obtained results remain in that interval and the scaling step is skipped.
373 Step 5 & 6. The normalized "group utility" measure Sj and "individual regret" measure R'j are calculated for each alternativej = 1, ..., m. Based on the results of these two measures, Qs values are computed by selecting v = 0.6 (Table 3). Step 7. Table 4 gives the defuzzified scores of alternatives computed with area compensation method and their corresponding rankings. Table 4. Ranking of alternatives for v = 0.5.
               S'              R'              Q
Alternatives   Dist.   Rank    Dist.   Rank    Dist.   Rank
W1             0.27    2       0.30    2       0.28    2
W2             0.46    4       0.40    3       0.44    3
W3             0.62    5       0.79    5       0.69    5
W4             0.18    1       0.20    1       0.19    1
W5             0.44    3       0.69    4       0.54    4
W6             0.74    6       0.79    6       0.76    6
It is not possible to declare alternative 4 as the winner. We observe that this alternative satisfies condition C2 but not Ci given that gL, - gL = 0.09 < DQ = 0.2. Since only condition C\ is not satisfied, we propose alternatives 4 and 1 as the set of compromise solutions due to the following inequity: g,3, - gL = 0.25 > DQ = 0.2. The weight v has a central role in the ranking and a sensitivity analysis can be undertaken by systematically setting v to some values between 0 and 1. The results of such an analysis are presented in Table 5. Table 5. Ranking of alternatives for different values of v. V
0.00 0.25 0.50 0.75 1.00
Set of compromise solutions W,, W2, W4 W,,W„ W,,W4 W,,W4 Wi,W4
5. Conclusions This study proposed a fuzzy MCDM framework which a company can consider during their "green" supplier selection process. The approach basically extends the VIKOR method that helps DMs to achieve an acceptable compromise. In the extended method, the importance weights of criteria and the ratings of alternatives are assessed in linguistic terms. By using the suggested approach, the ambiguities involved in the assessment data could be effectively represented and processed to assure a more convincing and effective evaluation process. Although the extended method presented in this paper is applied to the
environmentally conscious supplier evaluation problem, it can also be used to identify acceptable compromises in many supplier evaluation problems. Acknowledgements The authors acknowledge the financial support of the Galatasaray University Research Fund. References 1. J-B. Sheu, Y-H. Chou and C-C. Hu, An integrated logistics operational model for green-supply chain management, Transport. Res. E-Log., 41 (4), 287-313, (2005). 2. G. Biiyukozkan, An analytic approach for strategic analysis of green product development, Proceeding of the 13th International Working Seminar on Production Economics, Innsbruck, Austria, Vol. 2, 87-96, (2004). 3. L. Li and K. Geiser, Environmentally responsible public procurement (ERPP) and its implications for integrated product policy (IPP), J. Clean. Prod, 13 (7), 705-715, (2005). 4. L. Enarsson, Evaluation of suppliers: how to consider the environment, Int. J. Phys. Distr. Logist. Manag., 28 (1), 5-17, (1998). 5. P.K. Humphreys, Y.K. Wong and F.T.S. Chan, Integrating environmental criteria into the supplier selection process, J. Mat. Proc. Tech., 138, 349-356, (2003). 6. E. Kongar, "A comparative study on multiple criteria heuristic approaches for environmentally benign 3PLs selection", Proceeding of the 3rd International Logistics and Supply Chain Congress, Istanbul, 23-24, (2005). 7. G. Noci, Designing 'green' vendor rating systems for the assessment of a supplier's environmental performance, Eur. J. Pur. Supply Manag., 3 (2), 103-114, (1997). 8. J. Sarkis, Evaluating environmentally conscious business practices, Eur. J. Oper. Res., 107, 159-174,(1998). 9. L. de Boer, E. Labro and P. Morlacchi, A review of methods supporting supplier selection, Eur. J. Pur. Supply Manag, 7 (2), 75-89, (2001). 10. S. Opricovic and G.H. Tzeng, Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS, Eur. J. Oper. Res., 156,445^55, (2004). 11. G.H. Tzeng, C.W. Lin and S. Opricovic, Multi-criteria analysis of alternative-fuel buses for public transportation, Energy. Policy, 33, 1373-1383, (2005). 12. L.A. Zadeh, The concept of a linguistic variable and its applications to approximate reasoning, Inform. Sciences, Part I, 8, 199-249; Part II, 8, 301-357; Part III, 9,43-80, (1975). 13. A. ICaufmann and M.M. Gupta, Introduction to fuzzy arithmetic theory and applications, Van Nostrand Reinhold, New York, (1991). 14. L.A. Zadeh, Fuzzy sets, Inform. Control, 8, 338-353, (1965). 15. S. Opricovic and G.H. Tzeng, Defuzzification within a multicriteria decision model, Int J. Uncertain Fuzz., 11, 635-652, (2003). 16. G. Bojadziev and M. Bojadziev, Fuzzy logic for business, finance, and management: advances in fuzzy systems, World Scientific Pub., (1997). 17. P. Fortemps and M. Roubens, Ranking and defuzzification methods based on area compensation, Fuzzy Set. Syst., 82, 319-330, (1996). 18. C.H. Cheng and Y. Lin, Evaluating the best main battle tank using fuzzy decision theory with linguistic criteria evaluation, Eur. J. Oper. Res., 142, 174-186, (2002).
A FUZZY MULTIATTRD3UTE DECISION MAKING MODEL TO EVALUATE KNOWLEDGE BASED HUMAN RESOURCE FLEXD3DLITY PROBLEM* MUJDE EROL GENEVOIS Industrial Engineering Department, Galatasaray University, Ciragan Cad. No: Ortakoy 34357 Istanbul, Turkey Y. ESRA ALBAYRAK Industrial Engineering Department, Galatasaray University, Ciragan Cad. No: Ortakoy 34357 Istanbul, Turkey In multiple attribute decision-making (MADM) problems, a decision maker (DM) is often faced with the problem of selecting or ranking alternatives associated conflicting attributes. In this paper, a MADM with fuzzy pairwise information is used to solve human resource flexibility problem. In many manufacturing systems human resources are the most expensive, but also the most flexible factors. Therefore, the optimal utilization of human resources is an important success factor contributing to long-term competitiveness.
1. Introduction Human resources are one of the main sources of flexibility. In many manufacturing systems human resources are the most expensive, but also the most flexible factors. Therefore, the optimal utilization of human resources is an important success factor contributing to long-term competitiveness. For example annualizing working hours is a tool that provides flexibility to organizations; it enables a firm to adapt production capacity to fluctuations in demand. In multiple attribute decision-making (MADM) problems, a decision maker (DM) is often faced with the problem of selecting or ranking alternatives associated conflicting attributes. In this paper, through the multicriteria decision making (MCDM) analysis, we illustrate the relationships between human resource policies and characteristics of knowledge-based management. The purpose of this paper is to develop a theoretical framework for flexible modes of strategic human resource management within a global knowledge-based view organisation configuration. Using the proposed fuzzy multi-criteria approach, the ambiguities * This work is supported by research foundation of Galatasaray University.
375
376 involved in the assessment data can be effectively represented and processed to assure a more convincing and effective decision-making. Our aim is to combine subjective fuzzy preference information with objective information to derive the relative importance weights of the attributes and to derive a subjective ranking of alternatives from fuzzy pairwise relations. This approach is based on a fuzzy muhicriteria decision making (FMCDM) model to solve the problem of evaluating the flexibility of Knowledge Based Human Resource Flexibility Management System (KBHRFMS) face to internal and external problems. 2. Basic model In this paper the approach FAHP is introduced, with the use of triangular fuzzy numbers for pairwise comparison scale of FAHP, and the use of the extent analysis method for the synthetic extent value S. of the pairwise comparison. By applying the principle of the comparison of fuzzy numbers, V[MJ>M2)=1
if
m/>m2
a n d v\M2>M])=hgt\M1l
M2)=fiM
(d)
(1) the weight vectors with respect to each element under a certain criterion can be represented by d(A.) = minv[s.2LSk), k = l
,n, k*i.
(2)
In the following, first the steps of the extent analysis method on fuzzy AHP are given and then the method is applied to the evaluation of the human resource flexibility modes face the internal and external firm's problems. The overall FAHP approach can be summarized as follows [1]: a. Construct fuzzy pairwise comparison matrices; the decision-maker needs relative fuzzy values of decision criteria and alternatives based on each criteria. b. Solve fuzzy eigenvalues for each matrix; the eigenvector of the matrix is the relative importance of the alternatives and criteria. c. Determine the total weights The triangular fuzzy number and the linguistic variable are the two main concepts used in this paper to assess the preference ratings of linguistic variables, 'importance' and appropriateness'. In order to assess the relative importance of various criteria, an assumed weighting set W =tyeryLow, Low, Medium, High, Very High) has been developed. To evaluate the appropriateness of the alternatives versus various criteria, the decision makers can employ the linguistic rating set S = {Very Poor, Poor, Fair, Good, Very Good}. The triangle fuzzy conversion scale is shown in Table 1. Assume that X = |x;,x ...,x ) is an object set, and u = \i.,u2,...,u ) is a goal set. According to the method of Chang's [2] fuzzy extent analysis, each object is taken and extent analysis is performed for each goal respectively.
377 Table 1. The triangular fuzzy conversion scale Triangular Fuzzy Numbers
Linguistic Values Very Low (VL) Very Poor (VP)
m
Low(L) Poor(P)
U>)
Exactly Equal
(U.1)
Medium (M) Fair(F) High(H) Good(G) Very High (VH) Very Good (VG)
(U5) {3,5,5)
Therefore, m extent analysis values for each object can be obtained, with the following signs: Mi M2 .,Mg.i = l,2 nwhere all the M{ (j = 1,2 m) are triangular fuzzy numbers representing the performance of the object x with regard to each goal „ . By using fuzzy synthetic extent analysis, the value of fuzzy synthetic extent with respect to the i* object x.(i = 1,2,
,n) that
represents the overall performance of the object across all goals can be determined by m
;
j=l
8
,-
i
(3)
I S Mig
i
The degree of possibility of
n m
i=lj=l
M
> M is defined as, (4)
^ 2 ^ ) = sup min\ fiM (x),MM (j) xi.y
and can be equivalently expressed as follows: When a pair (x,y) exists such that x>y and juu/x) = HU2(y), then we have v(Mx>M1)=\- Since W ; and M2 are convex fuzzy numbers we have that V{M2
>MJ)=
V{M. >M2)=l if m >m ,
hgt[M]I M2)= nM (d),
(5)
where d is the ordinate of the highest intersection point D between jUM and HM . When M] =(//,m/M/) and M 2=\f2,m2,u2),
the ordinate ofD is given by :
378 v{M2>_M^hg{Mx
n
*J
=
p _ i L ^ ^
(6)
To compare M\ and M2, we need both the values of V(M, > M2) and V\M . >M,). The degree possibility for a convex fuzzy number to be greater than
k
v(M>MrM2
convex
fuzzy
numbers
Mk)= V\M > M 1}nnd{M > M 2)md
Assume that, /(A^mmv^zsJ
M.(i = 1,2,...,k)can be
and[MZMk)\ = minV[M >M . ) / = 1,2
for k = l,2,...,n;k*i.
,
7-
defined
by k•
Then the weight vector is
given by, w' = (d (A)d'(A)...,d'(A J where A.(i = l,2
n) are n elements. Via
normalization, the normalized weight vectors are, w = [d{A\d[A\..,d\A f where Wis a non fuzzy number. 3. Human Resource Flexibility 3.1.
WhyHRF
Traditional competitive mechanisms have become less effective as competitors meet or copy each other's corporate initiatives [3]. In response, firms constantly search for newer sources of competitive advantage, one of the most important being human resource management (HRM) [4]. Managers have long understood that the structure of work—for example, the duration of the employment contract, die number of hours typically worked, or the method of compensation—affects employees' wages, opportunities for promotion, the likelihood of unemployment, and other labor market outcomes. In recent years, this managers of labor markets have turned their attention to parttime work, on-call work, independent contracting, and other "nonstandard" forms of employment, meaning those in which work is not done "on a fixed schedule—usually full-time—at the employer's place of business, under the employer's control, and with the mutual expectation of continued employment" [5]. Academic response to these increasingly fashionable ways of organizing work has largely been critical, and with good reason: deviations from "standard" employment relationships are often associated with lower hourly wages, fewer benefits, greater risk of unemployment, fewer chances for promotion, and reduced wage growth ([6]; [7]; [8]; [5]).
379 3.2. Human-Resource-based Flexible Criteria 3.2.1. Human Nature (CI). The human resource's qualifications (CI 1) and the relationship type (contract) (CI2) are the mainly criteria which constitute human nature. The human resource's qualifications can be explain by education (CI 11), skills (CI 12), intelligence (CI 13) and motivation (CI 14). The human-firm relationship consists in a variety of flexible staffing arrangements. When the adoption of these different internal and external staffing arrangements is driven by a firm's uncertainty in its demand for labor we refer to them as contingent work arrangements, using the definition in Polivka [9]: "Any job in which an individual does not have an explicit or implicit contract for long-term employment or one in which the minimum hours worked can vary in a nonsystematic manner". We assume that the employer may draw upon three sources of labor: fixed (C121), contingent (C122) and statue-related (C123) sources like part-time workers and temporary contracts to minimize costs of labor and backlogged work. The decision whether to take on temporary workers in lieu of hiring permanent employees is a decision that involves significant risk. 3.2.2. Organization Nature (C2). There are otiier tools that can provide flexibility within organizations: organization structure (C21) and operation organization (C22). Organizations can be structured in two possible ways: in matrix (C211) or in hierarchical (C212). We can find many possibilities in operation organization like overtime (C221), annualizing working hours (C222), equip-working (C223), multi-skilled workforce, variations in the distribution of working time, shift work. By using annualizing working hours (AH), costs due to a lack of capacity can be diminished and, in some cases, eliminated. However, AH often implies a worsening of the staffs working conditions and the need to solve a complicated problem in planning working time. 4. Application of fuzzy AHP (FAHP). 12 evaluation criteria for the hierarchical structure were used in this study. The aim of the evaluation is to construct the better flexibility modes portfolio for deal whit three problems sets, two external and one internal: PI (needs for volume flexibility), P2 (needs for mixte flexibility) and P3 (needs for process flexibility). The fuzzy evaluation matrix relevant to the goal is given in Table 2.
380 Table 2. The fiizzy evaluation matrix M with respect to the goal Qoal
Human
Organizational
Human(Ci)
(1,1,1)
(1/3,1,3)
Organization^
(1/3,1,3)
(1,1,1)
In a similar way, we can construct the fuzzy pairwise (evaluation) comparison matrices for all criteria among all the elements/criteria and subcriteria in the dimensions of the hierarchy system. Via pairwise comparison, the fuzzy evaluation matrix M, which is relevant to the goal, Mi and M2, matrices with respect to Human resource and Organizational perspective respectively, are constructed (See Tables 3 and 4). Table 3. The fuzzy evaluation matrix of sub-criteria, Mi, with respect to Human resource Human resource Qualification(C„)
Qualification (1,1,1)
Contract style (1,3,5)
Contract style (C,2)
(1/5,1/3,1)
(1,1,1)
Table 4. The fuzzy evaluation matrix of sub-criteria, M2, with respect to Organizational Organization
Organization
Operation
The procedure for determining the evaluation criteria weights by FAHP can be summarized as follows: First step: Table 2 (matrix M) gives the fuzzy comparison data of the subcriteria of goal. Construct pairwise comparison matrices by using formula (3), we obtain SCj =(1.3,2,4)®(I I ^
= (0.16,0.5,1.54)
S^ = (l.3, 2 , 4 ) ® ^ , - , — J = (0.16,0.5,1.54)
Using formulas (4) and (5) y(sr is ) = -, ™-)M , = ,.00 c L cn n) (0.5-1.54)-(0.5-0.16)
v[sr (, cn
ZSr ] = 7 °- 1 '-!- 5 4 , = 1.00 c n J (O.S-1.54)-(O.S-0.16)
The normalized weight vector from Table 2 is calculated as, WM = (0.50,0.50)T • In a similar way, according to hierarchical structure, we can obtain the normalized weight vectors for all criteria and sub-criteria, (omitted).
381 Second step: At the second level of the decision procedure, the committee compares alternatives P, P and P under each criteria separetely. The normalized weight vectors of alternatives with respect to sub-criteria are calculated and shown in Table 5. Table 5. The normalized weight vectors of alternatives Criteria Cm C\n Cm Cll4 Cl21 Cl22 Cl23 C211 C212 C221 C222 C223
Pi 0.182 0.333 0.274 0.333 0.182 0.670 0.420 0.274 0.415 0.908 0.415 0.415
Pi 0.409 0.333 0.311 0.333 0.409 0.179 0.315 0.415 0.274 0.046 0.311 0.311
Pi 0.409 0.333 0.415 0.333 0.409 0.152 0.265 0.311 0.311 0.046 0.274 0.274
The overtime tool is used to reduce the needs of the volume flexibility. The requirement of volume flexibility can be increased also with using contingent labor. The matrix structure tool, educated and experienced workers are the main factors to deal with the mixte flexibility problem. For the process flexibility problem the intelligent, educated and experienced labor forces are the most important factors. Finally, adding the weights per alternative multiplied by the weights of the corresponding criteria (Table 6) a final score is obtained for each alternative. Table 6. Main criteria of the goal and final scores
Weight Pi P2
Ps
c,
C2
Alt. vector
0.5 0.326 0.332 0.342
0.5 0.421 0.305 0.274
0.374 0.318 0.308
priority
5. Conclusion In this paper, the complex, multi-criteria nature of appropriate Knowledge Based Human Resource Flexibility Management System decision has been brought out. A multi-criteria methodology, called the fuzzy AHP, has been suggested in this paper for the purpose of considering the opinions of different managers in order to solve the problem of appropriate flexibility management style. It is well known that the modeling of complex human problems into purely qualitative terms limit the effectiveness of decisions. The decision of appropriate knowledge
382 based decision or evaluation flexibility management is very often, vague and uncertain. To address these concerns, the concepts of fuzzy numbers and linguistic variables are used to evaluate the human factors. An example of appropriate flexibility management style selection is presented to illustrate the proposed framework. The effectiveness of human resource factors are depended to their qualifications. Also, the subcriteria contract type is useful to deal with the quantitative variation of demands. The organizational nature is the most important criteria for the flexibility. When the volume and mixte flexibility problems appear, the organizational structure factors are used, and the volume flexibility need is achieved with operation organization tools. The proposed decision algorithm cannot only manipulate the conventional precision-based (non-fuzzy) problem but also help decision-makers to make suitable decision under fuzzy environment. Therefore, by conducting fuzzy or non-fuzzy assessments, the decision-makers can obtain the appropriate portfolio of flexible factors to resolve specific firm's problems. References 1. Triantophyllou, E., and C.T. Lin (1996). Development and Evaluation of Five Fuzzy Multiattribute Decision-Making Methods, /. J. ofApprox. Reasoning, 14, 281-310. 2. Chang, D.Y., (1996). Applications of the Extent Analysis Method on Fuzzy AHP. European Journal of Operational Research, 95, 649-655. 3. Ulrich, D., (1987). Organizational capability as a competitive advantage: human resource professionals as strategic partners. Human Res. Planning 10 4, 169-184. 4. Schuler, R.S., I.C. MacMillan (1984). Gaining competitive advantage through human resource management practices. Human Resource Management 23 3, 241-255. 5. Kalleberg, A.L., C.F. Epstein, B. Reskin, K. Hudson, (2000). Bad jobs in America: standard and nonstandard employment relations and job quality in the United States. American Sociological Review, 65 (2), 256-278. 6. Blank, R., (1994). Social Protection versus Economic Flexibility. University of Chicago Press, Chicago. 7. Ferber, M., and J. Waldfogel, (1998). The Long-term consequences of nontraditional employment. Monthly Labor Review, 3-12. 8. Kalleberg, A.L., (2000). Nonstandard employment relations: part-time, temporary, and contract work. Annual Review ofSociology, 26, 341-394. 9. Polivka, A.E., (1989). On the definition of contingent work. Mon.Lab. Rev., 112, 9/16.10. Corominas, A., A. Lusa and R. Pastor, (2004). Characteristics and classification of the annualised working hours planning problems, /. J. of Services Tech. and Man., 5, 435-447.
FUZZY EVALUATION OF ON THE JOB TRAINING ALTERNATIVES IN INDUSTRIAL COMPANIES GULGUN KAYAKUTLU T Istanbul Technical University, Department of Industrial Engineering, 34367 Macka Istanbul, Turkey GULCIN BUYUKOZKAN Industrial Engineering Department, Galatasaray University, 34357, Ortakoy, Istanbul, Turkey BURCIN CAN METIN, SAMI ERCAN Institute of Analytical Sciences, Istanbul Commerce University, 34378, Eminonii, Istanbul, Turkey This paper presents a combined fuzzy analytic hierarchy process (AHP) and fuzzy goal programming (GP) to determine the preferred compromise solution for Enterprise Resource Planning (ERP) training package selection in terms of value creation by multiple objectives. The problem is formulated to include six primary goals: maximize financial benefits, maximize effective software utilisation, maximize employee satisfaction, minimize cost, minimize training duration and minimize risk of operation. Fuzzy AHP is used to specify judgments about the relative importance of each goal. A case study performed for a small manufacturing company running ERP software training is included to demonstrate the effectiveness of the proposed model.
1. Introduction Enterprises consider creating a learning organization essential for long time survival. Since the initiation of the concept by Argyrs and Schon [1], and prominence of The Fifth Discipline by Senge [2], improving compatibility in learning is observed to depend on perceived need, learning mechanisms, learning processes and resource allocation. [3]. Enterprise Resource Planning (ERP) training has become a critical part of organizational learning which presents benefits of integrated information, if successfully implemented. Most of the small companies are not in rush to implement ERP because of the inadequacy for the rigorous training and development requirements besides budget [4]. For this reason, this paper suggests an analytic framework for * contact author: kavakutlu(aiitu.edu.tr.
383
effective ERP training package selection. Since ERP training package selection problem is multi objective in nature, the goal programming (GP) approach may be applicable. In addition, ERP training package implementation decision faces many constraints, some of these are related to organizations' internal policy and externally imposed system requirements. In such decision making situations, high degree of fuzziness and uncertainties are involved in the data set. Fuzzy set theory [5] provides a framework for handling the uncertainties of this type. For this reason, the ERP training packages selection problem has been formulated in this study as a fuzzy GP problem and fuzzy analytic hierarchy process (AHP) is also used to specify judgments about the relative importance of each goal in terms of its contribution to the achievement of the overall goal [6]. The concepts, the model, the case study and conclusions will be handled in the following sections. 2. Fuzzy goal programming GP is a powerful multi criteria decision making introduced by Charnes and Cooper [7], Difficulty of defining priorities for goals arises by subjectivity, that can be overcome by fuzzy set theory defined by Narashimhan [8], improved by Hannan [9], Ignizio [10], Tiwari et al. [11], and Mohammed [12]. Formulation of fuzzy GP shows an exception of assigning aspiration levels allowing integration of different decision makers' views. Zimmerman [13] defined two sets of membership functions: for the goals and the constraints. The maximum of discrepancies on expected goal values are to be minimized, hence, Flavell's solution [14] leads to expression of K fuzzy goals and L fuzzy constraints as in (1), where dk are the deviations in goal k and dj are the deviation in constraint / [15]. max X s.t X<\-^Zk'dCkX)
A,l-1^)
* = 1,2 / = 1>2,...,L
K
CD
d
l
Q<X<\ and X>0
3. ERP training package selection model development The selection of ERP training package includes five main steps: identifying the decision variables, describing the goals, formulating the objective function, weighting the importance of the goals by fuzzy AHP and solving the fuzzy GP using fuzzy objective function and the constraints. The first three steps, which
are the model development, are summarized as follows. 3.1. Decision variables Decision variables are xit with n alternatives /= 1, 2, .., «,; x, = 1 if training package i is selected, and zero otherwise. The coefficients are as follows: 6, : Benefits expected from training package i ; c, : Cost associated with training package ;' ; r, : Risk associated with training package i ; est : Employee satisfaction expected from thr training package i; /;: Estimated completion time for training package /'; oc,: Software operation capability proposed by package /. Positive and negative deviations are associated with each goal as in benefit represented by db+ and db. 3.2. Goals Maximize Financial Benefits (FB): Total benefit expected from the training package is to be maximized, to the expected value BEN. Y.biXi+dl -d+ =BEN '=1 (2) Maximize Operation Capability (OC): Contributions of the software training in operations by improving the employee skills is to be maximised to CAPAB. ZocjXj +d~c -dgC = CAPAB M
(3)
Maximize Employee Satisfaction (ES): Maximum satisfaction of the employees anticipated as PERF, will increase the motivation and efficiency. n Z eSjXj + des - des = PERF (4)
i=\
Minimize Risk (R): The risk representing the likelihood of failure in learning or teaching activities is to be minimised. Yet, the real objective is to maximize benefits in the presence of risks; thus, the risk-related objective function includes benefits in addition to risks with a target BEN* denoting the expected benefit given in the presence of risk. Z nbjXi + d; -d, = BENR <=> (5) Minimize Cost (C): We include the costs of resources in our model as an objective to be minimized to fit the budgeted expenses denoted by BDG. ^CjXj+d;
z=i
-d+
=BDG
(6)
Minimize Training Duration (T): Duration of training is to be minimised with a minimum expectation of 2 days, the reality of ERP training. "
-
"EtjXi +dt
+
-dt
=2
'=•
(7) 3.3. The objective function The objective function will attempt to minimize the sum of deviations associated with the constraints in the model as follows: Mm
Z = Pb{d^)+P0C{doc)+Pes{d~esYPr{dt)+Pc{dcYPt[dt)
(8)
4. A case study A small size steel cabinet manufacturer is chosen while it was in process of selecting the ERP software based on training packages proposed by four vendors. Priority of goals is determined by using Chang's fuzzy AHP method [16] with the improvement proposed by Zhu et al. [17] due to effectiveness and simplicity. The goals' importance degrees are given in Table 1. Levels of data sets using fuzzy GP are shown in Table 2 where employee satisfaction factor is subjectively scored on a scale of 0-100, the risk factor subjectively scored on a scale of 0-10, and OC is in a range of 50-90 % with an aspiration level at 80%. Table 1. The relative importance of the goals Training selection objectives
Weights
Financial Benefits (Wb)
0.28
Operation Capability (w„c)
0.22
Training duration (wt)
0.15
Risk(wr)
0.10
Cost(wc)
0.15
Employee Satisfaction (wes)
0.10
Table 2. Data set for fuzzy GP model
Alt. 1 Alt. 2 Alt. 3 Alt. 4
FB ($ 000 /year) AL=135 LT=100 AL=140 LT=120 AL=125 LT=75 AL=150 LT=100
OC
ES
R
AL=80 LT=50 AL=80 LT=50 AL=80 LT=50 AL=80 LT=50
AL=100 LT=90 AL=100 LT=75 AL=100 LT=50 AL=100 LT=80
AL=1 UT=3 AL=2 UT=4 AL=3 UT=5 AL=2 UT=6
C ($ per day per person) AL=1000 UT=1750 AL=1250 UT=1500 AL=750 UT=1250 AL=500 UT=1000
AL is Aspiration Level; LT is Lower Tolerance and UT is Upper Tolerance.
T (in days) AL=28 UT=32 AL=16 UT=35 AL=24 UT=45 AL=12 UT=25
By using the obtained objectives' weights and data set given in Table 2, we formulate the ERP training package selection problem as follows1 min Z = 0.28 ( ^ " + dbj ~ + d^ ~ + dbf~)
+ 0.22(doc>" + dOCi " + 0Cj ~ + d^ ")
+ 0-\o(deSl-+deSi- + des-+des-)+
OAslfi^+d^+d^+d^) + o.io(^++dr;+dr;+dr+)
+Q.\s{dc;+dc;+dc;+dc;)
(3.86-0.028x1 ,)+rf 6 ~ -db+
=1
s.t. M2\.
{2.67-0M3x2i)+d
~-dOCi+=l '
'
+
(l0-0.1x 3 1 )+rf < a i --rf
=1 (9)
(0.5x 4 1 -0.5)+c? r ~-J -"41 •
'
+
=1
r
' +
(0.002x5l-2)+dc~-dc
=1
-"51 •
(0.25x61 -l)+d, -"61 • d
b~>db+>doc i
i
'
>doc >des~>des + i
J
i
i
+
~-d,
=2
'
>di~~,di+,dc~,dc+,dr~~,dr+,xi/>0 i
J
i
J
J
i
J
Following the proposed procedure, we obtained the results of the fuzzy GP model using the software LINDO that all the deviations are null. Based on the results, the ERP training package 4 was selected and the decision makers were satisfied by this selection. 5. Concluding remarks Due to conflicting nature of the multiple objectives and vagueness in the information related to the parameters of the decision variables, the deterministic techniques are unsuitable to obtain an effective solution. The combined fuzzy AHP and fuzzy GP approach formulated in this paper is then extremely useful for solving the ERP training package selection problem when the goals are not clearly stated. The formulation can effectively handle the vagueness and imprecision in the statement of the objectives. The proposed formulation has also the advantages that any commercially available software such as LINDO may be used for solving it. FGP model constraints are given here only for j=l because of the page limits.
References 1.
C. Argyris and D. Schon, Organisational Learning: A Theory of Action Perspective, Addison Wesley, N.Y, N.Y. (1978). 2. P. Senge, The Fifth Discipline: The Art and Practice of Organisational Learning, Doubleday, New York, N.Y. (1990). 3. D. Whittington and T. Dewar, A strategic approach to organizational Learning, Industrial and Commercial Training, 36 (7), 265-268 (2004). 4. J.R. Muscatello, M.H. Small and I.J. Chen, Implementing enterprise resource planning (ERP) systems in small and midsize manufacturing firms, International Journal of Operations and Production Management, 23(8), 850-871 (2003). 5. L.A. Zadeh, Fuzzy Sets, Information and Control, 8, 338-353 (1965). 6. C. Kahraman, and G. BUyukozkan, A Combined Fuzzy AHP and Fuzzy Goal Programming Approach For Effective Six-Sigma Project Selection, Proceedings of UthlFSA World Congress, Volume III, 28-31 July 2005, Beijing, China. 7. A. Charnes and W.W. Cooper, Management Models and Industrial Applications of Linear Programming, John Wiley and Sons, New York, (1961). 8. R. Narasimhan, Goal Programming in A Fuzzy Environment, Decision Science, 13, 331-336(1982). 9. E.L. Hannan, On fuzzy goal programming, Decision Sciences, 12,522-531 (1981). 10. J.P. Ignizio, On the rediscovery of fuzzy goal programming, Decision Sciences, 13, 331-336(1982). 11. R.N. Tiwari, S.. Dharmar and J.R. Rao, Fuzzy goal programming- an additive model, Fuzzy Sets and Systems, 24,27-34 (1987). 12. R.H. Mohammed, The relationship between goal programming and fuzzy programming, Fuzzy Sets and Systems, 89,215-222 (1997). 13. H.J. Zimmerman, Fuzzy programming and linear programming with several objective functions, Fuzzy Sets and Systems, 1, 45-55 (1978). 14. R.B. Flavell, A new goal programming formulation, Omega, The International Journal of Management Science, 4, 731-732 (1976). 15. C.-C. Lin, A weighted max-min model for fuzzy goal programming, Fuzzy Sets and Systems, 142,407-420 (2004). 16. D-Y. Chang, Applications of the extent analysis method on fuzzy AHP, European Journal of Operational Research, 95, 649-655 (1996). 17. K-J. Zhu, Y. Jing and D-Y. Chang, A discussion on extent analysis method and applications of fuzzy AHP, European Journal of Operational Research, 116, 450456(1999).
A STUDY OF FUZZY ANALYTIC HIERARCHY PROCESS: AN APPLICATION IN MEDIA SECTOR*
MELISA OZYOL Galatasaray University, Industrial Engineering Department, Ciragan Cad. No;36 Ortakoy 34357 Besiktas /Istanbul TURKEY Y. ESRA ALBAYRAK Galatasaray University, Industrial Engineering Department, Ciragan Cad. No:36 Ortakoy 34357 Besiktas /Istanbul TURKEY The media sector is a sector which encompasses the creation, modification, transfer and distribution of media content for the purpose of mass consumption; therefore it is a very active and, when managed properly, a very effective sector. In this paper the three management methods Mbl, MbO, and MbV are evaluated and the most adequate management method for the media company A* is determined using Fuzzy Analytic Hierarchy Process (FAHP). This study shows the power of FAHP in capturing experts' knowledge.
1. Introduction To be effective and increase its organizational competitiveness, a media company should be able to maximize the performance of its most important assets, its employees. And to do so, the company should guarantee the satisfaction and the engagement of the employees. One of the most important factors which influence the engagement of the employees is the method of management of the company. In this paper three management methods (Management by Instructions, Management by Objectives, and Management by Values) will be evaluated using the extent method of Chang.
This research has been financially supported by Galatasaray University Research Fund. Company A, which's name will not be given for reasons of confidentiality, is an important Turkish media company.
389
2. Management Methods There are three methods of management selected for evaluation: Management by Instructions (Mbl), Management by Objectives (MbO), and Management by Values (MbV). The Mbl is the traditional model described by F W. Taylor [1] at the beginning of the 20th century, and it is based on the hierarchically arranged control of the employees. In Mbl, the communication is fast, information is specialized, and there is recourse to specialists. MbO, conceptualized by Peter Drucker [2], is a management method that clearly indicates to employees the results expected from them, it facilitates planning since the managers define their objectives and fix the expiries. MbV makes it possible to direct an organization by using its intangible values the most effectively possible. MbV can absorb the organizational complexity caused by the need of adaptation to changes, and can ease the achievement of a strategic vision in a company. 3. Evaluation Criteria The seven evaluation criteria that will be used are taken from the work of Albayrak and Erensal [3]. These criteria can be gathered in three principal criteria: conditional criteria, managerial criteria and individual criteria. 3.1.
Conditional Criteria
One of the two conditional criteria is the Physical Conditions. The complexity of work and individual coping behavior must be taken into consideration in setting up the workplace. And the other conditional criterion is the Corporate Conditions. This criterion examines how the work force is enabled to develop and use its full potential, aligned with the objectives of the company. 3.2. Managerial Criteria Leadership is one of the three managerial criteria and it examines the personal leadership of the senior leaders and their involvement in creating and sustaining values, company directions, performance expectations, and a system of leadership that promotes excellence of performance for human resources. Corporate Culture, which shapes up the way how people are used to think, act, make decisions and participate in an organization, must maintain effective Environmental fluctuations may have a negative effect on the number of
391 mechanisms focused on developing the personal and professional potential of each and every member. Therefore it is a managerial criterion. The last managerial criterion is Participation. The participation of the employees is becoming more essential in the companies, especially when a problem or a decision requires multiple skills and knowledge in various fields the participation becomes more desirable. 3.3. Individual Criteria The first one of the two individual criteria is Capability. Knowledge, skills and abilities of the individuals in an organization constitute the capability of the human performance. The skills and competency of individuals in a company are generally improved by educating and training the human after having designed a system to support human capabilities and limitations. The other individual criterion is Attitude. Development of a workforce with positive work attitudes, including loyalty to the organization, pride in work, a focus on common organizational goals and the ability to work with employees from other departments, facilitates team work and flexibility [4]. 4. Choice of Methodology Analytical Hierarchy Process (AHP) is often used in solving multi-criteria analysis problems involving qualitative data. Where there are a limited number of choices but each has a number of attributes AHP can be used to formalize decision-making. In AHP, instead of exact numbers, expressions like "more important than" are used to show the preferences of decision. However this method may fail in reflecting the uncertainty and imprecision of human thinking style and therefore it is often criticized. On the other hand, fuzzy logic offers a more natural way of dealing with these preferences instead of exact values [5]. The fuzzy set theory, introduced by Zadeh [6], deals with uncertainty due to imprecision and vagueness and it enables decision makers to give interval judgments instead of fixed value judgments, which they think is more confident. The evaluation of the three methods of management according to seven criteria and the selection of the best is a multi-criteria decision problem. The Fuzzy Analytical Hierarchy Process (FAHP) is abundantly used to deal with the problems including the multiple criteria evaluation or the selection of alternative. It is adequate to use the FAHP to determine the weights of the criteria according to subjective judgments of each expert. As the AHP cannot reflect the human thinking style in capturing experts' knowledge it is going to be developed a model, as shown in Figure 1, based on FAHP which would determine the total weights for the three different methods of management while
examining how, according to certain criteria, the properties of each method of management affect the human performance in a company. There are various ways to treat FAHP [7]. In this application, the extent FAHP (EFAHP) will be used. Level 1: Goal Level 2: Criteria
Level 3: Sub-criteria
Human Performance
/ Conditional Criteria •Physical Conditions •Corporate Conditions
1 Managerial Criteria -•Leadership
Capacity
-•Participation
Attitude
—•Corporate Culture
T Level 4: Alternatives
Management bv Instructions
\ Individual Criteria
Management bv Objectives
I Management bv Values
Figure 1. Hierarchical Structure
5. Application In this part the three methods of management will be evaluated by a group of five experts, two from service sector, and three others from production sector, and the best management method for Company A will be determined. Some abbreviations used in the application are: EH: Excellence of the Human Performance, C: Conditional criteria, L: Leadership, M: Managerial criteria, Pa: Participation, I: Individual criteria, Cc: Corporate Culture, Ph: Physical Conditions, Ca: Capacity, Co: Corporate Conditions, At: Attitude. 5.1. Application of FAHP with the Extent Method (EFAHP) The group of five experts is required to evaluate the alternatives according to the selected criteria and is required to present a common decision. The procedure to determine the weights of the evaluation criteria by the FAHP using the extent method can be summarized in two stages. In the first stage, the pair-wise comparison matrices between all the criteria and the sub-criteria of the hierarchical system are constructed.
Table 1. The fiizzy comparison matnx for the goal EH EH
C
M
I
C
(1,1,1) (1,3,5) (3,5,5)
(1/5, 1/3, 1)
(1/5, 1/5, 1/3)
(1,1,1)
(1/3,1,3)
(1/3, 1,3)
0,1,1)
M I
After having built the pair-wise comparison matrix for the goal (Table 1) the vectors of value synthetic extent S are calculated: Sc = (1.40,1.53,2.33)® (.0492,.0739,. 1240) = (.069,. 113,.289) SM = (2.33,5,9)
VEH(S,>SU)
.115-.289 : .41 C-l 13 - .289) - (.369 -.115)
.069-1.116 = 1.32 (.369-1.116)-(.113-.069) .115-1.116 = = 1-17 (.517-1.116)-(.369-.115)
VEH(S, ZSC) =
.069-1.116 = 1.63 (.517-1.116)-(.113-.069)
.213-.289 16 (.113-.289)-(.517-.213)~' .213-1.116 = .86 fEH(Su>S,) " (.369-1.116)-(.517-.213)
VEH(SC^SI) =
Finally, the values d' and then the weight vector Ware obtained: c/'(Q=16 cT(M)=.86 => W ' E H =(.16, .86, 1.17)T rf'(I)=1.17 By standardizing W ' EH the weight vector of Table 1 in respect to the decision criteria C, M, and I is obtained: W EH = (.072, .392, .535)T In the second stage, the alternatives Mbl, MbO, and MbV are compared separately according to each criterion. After obtaining the W vector for each comparison matrix, the final evaluation matrix given in Table 2 is obtained.
Table 2. Final scores of the alternatives
C 072
Mbl MbO MbV
PH .5 .333 .333 .333
Co .5 .333 .333 .333
L .415 .222 .389 .389
EH M .392 Pa .311 .023 .313 .664
535 CC .274 .072 .392 .535
Ca .5 .274 .311 .415
At .5 .023 .313 .664
W total .150 .334 .514
6. Conclusion The goal of this work was to find the best management method for Company A to maximize the performance of its employees. According to the results of the application, to achieve this goal it is necessary to implement in the company a system of management which is a mixture of the three methods presented: Mbl, MbO, and MbV. This mixture can be considered as an ideal method of management to reach the excellence of human performance in Company A. References 1. F.W. Taylor, The Principles of Scientific Management. Harper & Row, London (1947, original edition, 1911). 2. P. Drucker, The Principles of Management. NY HarperCollins Publishers, New York (1954). 3. E. Albayrak and Y.C. Erensal. "Using analytic hierarchy process (AHP) to improve human performance: An application of multiple criteria decision making problem". Journal of Intelligent Manufacturing, 15, 491-503 (2004). 4. V.L. Huber and K.A. Brown, "Human resource issues in cellular manufacturing: Associotechnical analysis". Journal of Operations Research,\\, 138-159(1991). 5. T.J. Ross, Fuzzy Logic with Engineering Applications. (Internationale Edition). McGraw-Hill, NY (1995). 6. L.A. Zadeh, "Fuzzy Sets". Information and Control, 8 (3), 338-353 (1965). 7. C.-L. Hwang and S.-J. Chen, in collaboration with F.P. Hwang. Fuzzy Attribute Decision Making: Methods and Applications. Springer-Verlag, Berlin, (1992).
PRIORITIZATION OF RELATIONAL CAPITAL MEASUREMENT INDICATORS USING FUZZY AHP AHMET BESKESE Department of Industrial Engineering, Bahcesehir University, 34538, Istanbul, Turkey
Bahcesehir,
F. TUNC BOZBURA Department of Industrial Engineering, Bahcesehir University, 34538, Istanbul, Turkey
Bahcesehir,
Relational capital (RC) is a sub-dimension of the intellectual capital which is the sum of all assets that arrange and manage the firm's relations with the environment. It contains the relations with outside stakeholders (te. customers, shareholders, suppliers and rivals, the state, governmental institutions and society). Although the most important component of RC is customer relations, it is not the only one to be taken into consideration. Measuring the RC is related to how the environment perceives the firm. To control and manage this perception, the companies must measure it first. This study aims at defining a methodology to improve the quality of prioritization of RC measurement indicators under uncertain conditions. To do so, a methodology based on the extent fuzzy analytic hierarchy process (AHP) is applied. Within the model, main attributes, their sub-attributes and measurement indicators are defined. To define the priority of each indicator, preferences of experts are gathered using a pair-wise comparison based questionnaire.
1. Introduction Today, IC is widely recognized as the critical source of true and sustainable competitive advantage [1]. Knowledge is the basis of IC and is therefore at the heart of organizational capabilities. Successfully utilizing that knowledge contributes to the progress of society [2], Intellectual capital (IC) is the pursuit of effective use of knowledge (thefinishedproduct) as opposed to information (the raw material) [3]. See Fig. 1 [4] for an illustrative definition of IC. 395
396
INTELLECTUAL CAPITAL
T
HUMAN CAPITAL
ir
ORGANIZATIONAL CAPITAL
ir
RELATIONAL CAPITAL
Individual-level knowledge
Mission-vision
Customers
Competence
Strategical values
Customer's loyalty
Leadership ability
Working systems
Market
Risk-taking and problem
Culture
Shareholders
Solving capabilities
Management system
Suppliers
Education
Use of knowledge
Official institutions
Experience
Databases
Society
Fig. 1. Components of Intellectual Capital
Knowledge of environmental relationship defines the relational capital in the intellectual capital. The relational capital contains the relations with customers, shareholders, suppliers and rivals, the market, the state, governmental institutions and society. Although the most important component of RC is customer relations, it is not the only one to be taken into consideration. Measuring the RC is related to how the environment perceives the firm. The relational capital is the reflection of the firm. It is a knowledge database which has brands, customer loyalty scales, and the image in society, suppliers and customer feedback systems. Mc Kenna [5] states that, there are three steps to establish relations with the environment: a) To understand the market, b) To move with it, c) To establish relations. In the value chain, there is the obligation that the firms should establish relations with all the sections from the customer to the supplier. Many researches show that, being market-focused has an effect on the profit rate of the company and on the increase of the market share [6]. The relational capital, defines the relations of the elements that are in the value chain with the firm. It is obvious that, the essential criteria of the relational capital are related to customer and market. Except these criteria, the stockholders that are important elements of the firm environment, suppliers and society should be defined in the context of relational capital. In a research held in Canada industry, features like the growth rate, sales rate to permanent customers, customer loyalty, customer satisfaction, customer complaint rate, and market share are defined as the criteria of RC [7]. In another
397 research regarding the intellectual capital of companies in Sweden, features like the rate of re-purchasing, and market capital index are defined [8]. 2. Methodology of the study The problem has a subjective and intangible nature where the Analytic Hierarchy Process (AHP) is usually considered the most appropriate method. In this paper, Fuzzy AHP is preferred in the prioritization of relational capital indicators since this method uses a hierarchical structure among goal, attributes and alternatives. Usage of pair-wise comparisons is another asset of this method that lets the generation of more precise information about the preferences of decision makers. AHP method which is developed by Saaty [9] is uses pair-wise comparisons of the elements of each hierarchy by means of a nominal scale. Then, comparisons are quantified to establish a comparison matrix, after which the eigenvector of the matrix is derived, signifying the comparative weights among various elements of a certain hierarchy. Finally, the eigenvalue is used to assess the strength of the consistency ratio of the comparative matrix and determine whether to accept the information. There are several fuzzy AHP methods explained in the literature. In this paper, we prefer Chang's extent analysis method [10, 11] since the steps of this approach are relatively easier than the other fuzzy AHP approaches and similar to the conventional AHP. 3. A Hierarchical Model for Prioritization of RC Measurement Indicators The goal of this study is prioritization of relational capital indicators. According to [12], RC refers to the organization's establishment, maintenance, and development of public relations matters, including the degree of customer, supplier, and strategic partner satisfaction, as well as the merger of value and customer loyalty. The flexibility of the external organizational links, ethics in relations and efficiency of relation-channels have been decided as main attributes of the system. Flexibility of the external organizational links (FE) are characterized by four sub-attributes: Knowledge flow from customer (KF), knowledge flow from suppliers (KS), relations between the company and its shareholders (RCS) and relations between the company and society (RS). Ethics in relations (ER) is characterized by two sub-attributes: Social responsibility (SR) and business ethics (BE). Efficiency of relation-channels (EC) is
398 characterized by two sub-attributes: Easy and quick access to knowledge (EK) and reliability of relation-channels (RCh). In this research, RC measurement indicators can be categorized in three main groups as the indicators related to the customers, to the market and to the other elements of environment [13]. These three main groups can be summarized with nine indicators: IND1: Customer satisfaction; IND2: Environmental consciousness; IND3: Emphasizing customer request; IND4: Customer loyalty; IND5: Preference in competition, IND6: Being the sponsor for the social activities; IND7: Market and customers to be understood by employee; IND8: Efficient relations with shareholders; IND9: Participating social activities that are not sponsored. Fig. 2 shows the hierarchical structure of criteria. A group of experts consisting of academics and professionals are asked to make pair-wise comparisons for main and sub-attributes, and indicators. A questionnaire is provided to get the evaluations. The overall results could be obtained by taking the geometric mean of individual evaluations. However, since the group of experts came up with a consensus by the help of the Delphi Method in this case, a single evaluation could be obtained to represent the group's opinion.
4. Conclusion In this paper, the authors proposed a Fuzzy AHP method to prioritization of relational capital measurement indicators under uncertain conditions. The model proposed in this study consists of three main attributes, eight sub-attributes, and nine indicators. The model is verbalized in a questionnaire form including pairwise comparisons. After analyzing the results, it is decided that indicator 4 (ie. Customer Loyalty) is the most important indicator for relational capital measurement. The sequence of the next two indicators according to their importance weights is as follows: IND1: Customer satisfaction, and IND2: Environmental consciousness. For further research, other fuzzy multi-criteria evaluation methods like fuzzy TOPSIS or fuzzy outranking methods can be used and the obtained results can be compared with the ones found in this paper.
Selection of the most efficient indicators
RCh
IND1
Fig. 2. Hierarchical structure of criteria
IND2
IND9
400
References 1.
B. Marr, G. Schiuma and A. Neely, International Journal of Business Performance Management 4, 279 (2002).
2.
A. Seetharaman, K.L.T. Low and A. S. Saravanan, Journal of Intellectual Capital 5, 522 (2004).
3.
N. Bontis, Management Decision 36,63(1998).
4.
F. T. Bozbura and A. Beskese, in Proceedings of the IIth International Fuzzy Systems Association World Congress (IFSA 2005), 1756 (2005).
5.
R. McKenna, The Regis Touch. Addison-Wesley, Massachusetts, USA (1986).
6.
J. C. Narver, and S. F. Slater, Journal of Marketing 54, 20 (1990).
7.
M. Miller, B. DuPont, V. Fera, R. Jeffrey, B. Mahon, B. Payer, and A. Starr, International Symposium: Measuring and Reporting Intellectual Capital, Amsterdam, Holland, June 9-10, (1999).
8.
U. Johanson, M. Martensson and M. Skoog, International Symposium: Measuring and Reporting Intellectual Capital, Amsterdam, Holland, June 910, (1999).
9.
T. L. Saaty, The analytic process: Planning, priority setting, resources allocation, McGraw-Hill, London, (1980).
10. D-Y. Chang, European Journal of Operational Research 95, 649 (1996). 11. D-Y. Chang, Optimization Techniques and Applications, Vol. 1, World Scientific, Singapore, 352 (1992). 12. P. Y. Chu, Y. L. Lin, H. Hsiung and T. Y. Liu, Technological Forecasting & Social Change, Forthcoming. 13. F. T. Bozbura, The Learning Organization: An International Journal 11, 357 (2004).
MULTICRITERIA MAP OVERLAY IN GEOSPATIAL INFORMATION SYSTEM VIA INTUITIONISTIC FUZZY AHP METHOD TOLU SILAVI MOHAMMAD REZA MALEK MAHMOUD REZA DELAVAR Center of Excellence in Geomatics Eng. and Disaster Management, Dept. of Surveying and Geomatics Eng., Engineering Faculty, University of Tehran, Tehran, Iran. tsilavi@ut. ac. ir, malek@ncc. neda. net. ir, mdelavar@ut. ac. ir
Decision making within the real-world inevitably includes the consideration of evidence based on several criteria, rather than a preferred single criterion. Solving a multi-criteria decision problem offers the decision maker a recommendation, in terms of the best decision alternatives. Within the framework of this article we are attempt to reach a new method in making the comparison matrices in AHP approach to consider some aspects of uncertainties in process of multi criteria decision making. To do this, rules and logic of intuitionistic fuzzy is applied. We provide numerical illustrations of using four major criteria, namely population, age of construction, type of construction and mean number of floors in a census block of an urban region. The result is earthquake vulnerability map which presented with two raster layers, where the degree of membership and the degree of non-membership are the values of each layer so the regions with high membership degree in first map and low membership degree in second map can be determined as high vulnerable regions with more reliability degree.
1. Introduction Geospatial information system (GIS) is a decision support system involving the integration of spatially referenced data in a problem solving environment [4]. One of the basic classes of operations for spatial analysis is attribute operations which are operations on one or more attributes of multiple entities that overlap in space [3]. An overlay procedure generates a new layer (output layer) as a function of two or more input layers. Specifically, the attribute value assigned to every location (or set of locations) on the output layer is a function of the independent values associated with that location in the input layers. Overlay operations may involve any combination of points, lines, areas or pixels [10]. The ultimate aim of GIS is to provide support for making spatial decision so the 401
402 GIS capabilities for supporting spatial decisions can be analyzed in that context of the decision-making process. Simon suggests that any decision making process can be structured into three major phases: intelligence (is there a problem or an opportunity for change?), design (what are the alternatives?) and choice. One of the stages of design is criteria weighting that at this stage, the decision maker's preferences with respect to the evaluation criteria are incorporated into the decision model. One of the most famous and usable approach for doing this stage is the analytical hierarchy process (AHP). The AHP elicits a corresponding priority vector interpreting the preferred information from the decision-maker, based on the pair wise comparison values of a set of objects. Levary and Wan [8] clearly state the need to consider uncertainty within AHP: "Since in most cases it is unrealistic to expect that the decision maker will have either complete information regarding all aspects of the decision making problem or full understanding of the problem, a degree of uncertainty will be associated with some or all of the pair-wise comparisons" [8]. One of the most important source of uncertainty involved in this area, is contradictory of experts' opinion. That means, determining the preferences are imperfect. Using of intutionistic fuzzy (IF) method in such a case is reasonable because on the one hand it supports linguistic variables related to doubt and hesitancy and on the other hand, it could manage given contradictory reasons [2, 11]. Therefore, a multi-criteria decision making approach based on intuitionistic fuzzy AHP has been implemented in order to solve a map overlay problem in GIS to find earthquake vulnerability of an urban region with some simulated data. In this area, there are some spatial criteria which have effect on urban earthquake vulnerability and can be shown as a raster map layers [1]. In chapter 2 of this paper principles of AHP and fuzzy AHP have been stated, chapter 3 considers the intuitionistic fuzzy concepts as a tool for modeling some aspects of uncertainty in AHP, such as contradictory of decision maker's opinions. Chapter 4 is concerned on implementing the fuzzy AHP and presented intuitionistic fuzzy approach in chapter 3; in order to weighting the factors determine earthquake vulnerability. 2. Basic Principle of AHP and Fuzzy AHP The AHP, originally developed by Saaty [14, 15], is a widely applied multicriteria decision making tool which utilizes the concept of pair-wise comparisons to arrive at a scoring and rank ordering of the alternatives under consideration. The decision maker provides a subjective cardinal judgment
about the intensity of his preference for each alternative over the other alternatives under each of a number of criteria or properties [19]. In AHP, three important components, namely the aim, the criteria and the alternatives, are represented hierarchically. There are many approaches for extracting weights from this hierarchy, and all of them use a pair-wise comparison matrix filled in by the decision maker. As mentioned above, there are aspects of uncertainty in expressing preferences during pair-wise comparison. This paves the way for the incorporation of fuzzy logic into the AHP [6]. All the rules of traditional AHP can be used for fuzzy AHP, but the pair-wise comparisons must be expressed as fuzzy numbers, which are in most cases triangular. In order to calculate the weights from such a pair-wise comparison matrix, operations on triangular fuzzy numbers are necessary. The basic fuzzy arithmetic operations on triangular fuzzy numbers are as follows. Let $A=(l_a, m_a, u_a)$ and $B=(l_b, m_b, u_b)$ be two triangular fuzzy numbers; then
• $A+B=(l_a+l_b,\ m_a+m_b,\ u_a+u_b)$
• $A-B=(l_a-l_b,\ m_a-m_b,\ u_a-u_b)$
• $AB=(l_a l_b,\ m_a m_b,\ u_a u_b)$
• $A/B=(l_a/u_b,\ m_a/m_b,\ u_a/l_b)$
After obtaining the fuzzy performances, the ultimate aim is to obtain the final results in crisp form. Therefore, the fuzzy performance matrices are transformed into interval performance matrices using the α-cut concept: an α-cut yields an interval of values from a fuzzy number. For example, α=0.5 yields the interval $a_{0.5}=[0.3, 0.7]$ for a triangular fuzzy number whose start and end points are 0.1 and 0.9. The crisp performance matrix is then obtained by applying the optimism index λ over the interval performance set, as shown in equation (1), resulting in a crisp performance $C_\lambda$:
$$C_\lambda = \lambda\, p_{r\alpha} + (1-\lambda)\, p_{l\alpha} \qquad (1)$$
where $C_\lambda$ is the crisp performance of a fuzzy performance $p$ after extracting $[p_{l\alpha}, p_{r\alpha}]$ as one of its α-cuts. The optimism index λ is a number between 0 and 1; in most cases the values 0, 0.5 and 1 are used [12].

3. Uncertainty modeling in AHP using intuitionistic fuzzy logic
As mentioned above, since pair-wise comparison values are judgments obtained from an appropriate semantic scale, in practice the decision maker usually gives some or all pair-to-pair comparison values with an uncertainty
degree rather than precise ratings. Among several higher-order fuzzy sets, the intuitionistic fuzzy sets introduced by Atanassov [2] have been found to be well suited to dealing with vagueness. Traditional fuzzy logic has two important deficiencies. First, to apply fuzzy logic we need to assign a crisp membership function to every property and every value. Second, fuzzy logic does not distinguish between the situation in which there is no knowledge about a certain statement and the situation in which the evidence for and against the statement is equally strong. For this reason it is not recommended for problems with missing data or where grades of membership are hard to define [11]. According to the above arguments, intuitionistic fuzzy sets can be expected to simulate human decision-making processes and any activities requiring human expertise and knowledge, which are inevitably imprecise or not totally reliable [9]. Therefore, we attempt to extend the AHP, as a multi-criteria decision-making approach, so that hesitancy in the decision maker's opinions is taken into account. The starting point of this method is to assume a finite set of criteria or alternatives X = {x_1, ..., x_n}. The set R is defined as the set of all possible ordered pairs of members of X. A membership degree and a non-membership degree are assigned to each of these pairs by a decision maker; for a pair in which the first element is maximally preferred to the second, the membership and non-membership degrees are 1 and 0, respectively. At the next stage, two sets of membership and non-membership degrees, similar to fuzzy binary preference relations [7], are produced for the members of R. Then, for each criterion, all the degrees expressing its preference (respectively non-preference) over the other criteria are summed and the result is divided by n-1. So for every criterion there are two weights, indicating the preference and non-preference of that criterion relative to the others. Applying these weights in an overlay analysis finally yields two output maps, which indicate the preference index and the non-preference index of each region with respect to a specific class.

4. Implementing fuzzy and intuitionistic fuzzy AHP for earthquake vulnerability mapping
Natural disasters are extreme events within the Earth's system (lithosphere, hydrosphere, biosphere and atmosphere), resulting in death or injury to humans and in damage to or loss of goods such as buildings, communication systems, agricultural land, forests and the natural environment. Mitigation of natural disasters
can be successful only when detailed knowledge is obtained about the expected frequency, character and magnitude of hazardous events in an area. Many types of information needed in natural disaster management have an important spatial component, such as maps [16]. The risk associated with each hazard consists of two components: hazard and vulnerability [18]. Accordingly, in order to produce a risk map, preparing a reliable vulnerability map is essential. Since most of the criteria that affect earthquake vulnerability have spatial characteristics, the assessment of earthquake vulnerability requires the spatial distribution of these parameters [1]. In GIS, each criterion can be described by a map layer which can be represented directly or indirectly in raster format [3]. Consequently, the criteria integration procedure can be carried out by overlay analysis on weighted layers (see e.g. [5, 17]). Table 1 shows a pair-wise comparison matrix for fuzzy AHP, used to weight the four most popular parameters that affect earthquake vulnerability. These parameters are the mean population of each block, the age of the buildings in each block, the type of construction and the mean number of floors [1]; the labels A, B, C and D are assigned to these parameters, respectively. The preferences were determined by a decision maker based on the rating scale of traditional AHP introduced in Table 2; in other words, they are his/her personal opinions about the preference of each criterion over the others.
Table 1. Pair-wise comparison matrix for fuzzy AHP (earthquake vulnerability)

        A                   B                   C              D
A       (1, 1, 1)           (3, 4, 5)           (7, 8, 9)      (2, 3, 4)
B       (1/9, 1/8, 1/7)     (1, 1, 1)           (1, 2, 3)      (1/6, 1/5, 1/4)
C       (1/4, 1/3, 1/2)     (1/3, 1/2, 1)       (1, 1, 1)      (2, 3, 4)
D       (1/5, 1/4, 1/3)     (1/4, 1/3, 1/2)     (4, 5, 6)      (1, 1, 1)
Table 2. The AHP pair-wise comparison continuous rating scale [13]. The scale runs from 9 (extremely more important) through 5 (strongly), 3 (moderately) and 1 (equally important) down to 1/3 (moderately), 1/5 (strongly) and 1/7 to 1/9 (extremely less important); the numbers between these anchor points can be used too.
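Before turning to the intuitionistic case, the short sketch below shows how a triangular judgment such as those in Table 1 is reduced to a crisp value with an α-cut and the optimism index λ of equation (1); the function names and the example call are illustrative assumptions rather than the authors' implementation.

```python
def alpha_cut(tfn, alpha):
    """Interval [p_l_alpha, p_r_alpha] of a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    return (l + alpha * (m - l), u - alpha * (u - m))

def crisp(tfn, alpha=0.6, lam=0.5):
    """Crisp performance C_lambda = lam * p_r_alpha + (1 - lam) * p_l_alpha."""
    p_l, p_r = alpha_cut(tfn, alpha)
    return lam * p_r + (1 - lam) * p_l

# Row A of Table 1, compared against criteria A, B, C and D.
row_A = [(1, 1, 1), (3, 4, 5), (7, 8, 9), (2, 3, 4)]
crisp_row_A = [crisp(t) for t in row_A]   # e.g. (3, 4, 5) -> 4.0 with alpha = 0.6, lam = 0.5
print(crisp_row_A)
```

A full fuzzy AHP weighting would repeat this for every row of Table 1 and normalise the resulting crisp matrix into a priority vector, which is, in principle, how the fuzzy AHP weights reported later in Table 3 are obtained.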
In the case of intuitionistic fuzzy AHP, for the parameters A, B, C and D the set R can be produced as follows:
R = {(A, B), (A, C), (A, D), (B, A), (B, C), (B, D), (C, A), (C, B), (C, D), (D, A), (D, B), (D, C)}
This set is produced under the assumption that no criterion has a preference over itself, so the pairs (A, A), (B, B), (C, C) and (D, D) are not included. The sets PRE and NPR, shown below, consist of the membership and non-membership degrees of each pair of R, which are the personal opinions of a decision maker.
PRE = {0.80, 0.95, 0.45, 0.17, 0.22, 0.09, 0.02, 0.70, 0.35, 0.45, 0.84, 0.55}
NPR = {0.17, 0.02, 0.45, 0.80, 0.70, 0.84, 0.95, 0.22, 0.55, 0.45, 0.09, 0.35}
The mean of all the preference degrees and of all the non-preference degrees of each criterion must then be calculated separately. Table 3 shows the results of this process together with the results of fuzzy AHP with α = 0.6 and λ = 0.5.

Table 3. The weights resulting from intuitionistic fuzzy AHP and from fuzzy AHP

            Intuitionistic fuzzy AHP                                Fuzzy AHP
Criteria    Preferences           Weights    Non-preferences       Weights
A           {0.80, 0.95, 0.45}    0.72       {0.17, 0.02, 0.45}    0.528
B           {0.17, 0.22, 0.09}    0.16       {0.80, 0.70, 0.84}    0.106
C           {0.02, 0.70, 0.35}    0.36       {0.22, 0.95, 0.55}    0.147
D           {0.45, 0.84, 0.55}    0.61       {0.45, 0.09, 0.35}    0.217
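The intuitionistic fuzzy weights in Table 3 follow directly from averaging, for each criterion, the entries of PRE and NPR in which it appears as the first element of the pair. A minimal sketch, assuming the ordering of PRE and NPR matches the listing of R above, reproduces the preference weights and also recomputes the corresponding non-preference weights:

```python
criteria = ["A", "B", "C", "D"]
# Ordered pairs of R, matching the listing in the text.
R = [(a, b) for a in criteria for b in criteria if a != b]

PRE = [0.80, 0.95, 0.45, 0.17, 0.22, 0.09, 0.02, 0.70, 0.35, 0.45, 0.84, 0.55]
NPR = [0.17, 0.02, 0.45, 0.80, 0.70, 0.84, 0.95, 0.22, 0.55, 0.45, 0.09, 0.35]

def weights(degrees):
    """Average, for each criterion, the degrees of the pairs in which it comes first."""
    n = len(criteria)
    return {c: round(sum(d for (a, _), d in zip(R, degrees) if a == c) / (n - 1), 2)
            for c in criteria}

print(weights(PRE))  # {'A': 0.73, 'B': 0.16, 'C': 0.36, 'D': 0.61} (Table 3 rounds A to 0.72)
print(weights(NPR))  # {'A': 0.21, 'B': 0.78, 'C': 0.57, 'D': 0.3}, recomputed for comparison
```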
The main difference between fuzzy AHP and the intuitionistic fuzzy concepts used here lies in the preference numbers employed. In fuzzy AHP the numbers follow the rating scale of traditional AHP, described in Table 2, whereas in the intuitionistic fuzzy approach the preferences are expressed as fuzzy binary preference degrees. Figures 1 and 2 present the vulnerability maps resulting from weighted overlay analysis on the four raster map layers; one district of the city of Tehran was considered as the case study. Figure 1 shows the result of applying the preference weights, and Figure 2 the result of applying the non-preference weights. Maps like these help decision makers to make more reliable decisions as quickly as possible. In these two raster maps, regions with a high membership degree in Figure 1 and a low membership degree in Figure 2 can be identified as highly vulnerable regions with high consistency (see the regions inside the circles in Figures 1 and 2).
5. Conclusions and further work
The intuitionistic fuzzy AHP method is an effective way to merge different experts' viewpoints and to deal with the interoperability of different systems, combining different data sets. Our results show that it enables the disaster manager to make more reliable decisions with uncertain data and uncertain ideas about the preferences among the parameters affecting earthquake risk and vulnerability. The AHP, a popular approach in multi-criteria decision making, was selected, and a procedure for combining intuitionistic fuzzy concepts with it was developed. The results of this method were compared with those of fuzzy AHP, although these techniques are inherently different because the preference numbers used in them differ. Combining the resulting IF risk map with an intuitionistic knowledge-base system is our future work.
Figure 1. The left map is the earthquake vulnerability map resulting from the preference weights in intuitionistic fuzzy AHP, based on Table 3; the right map is the earthquake vulnerability map resulting from the non-preference weights in intuitionistic fuzzy AHP, based on Table 3.
References
1. R. Aghataher, M. R. Delavar and N. Kamalian: Weighing of contributing factors in vulnerability of cities against earthquakes, Proceedings of Map Asia, Jakarta, Indonesia (2005).
2. K. Atanassov: Intuitionistic fuzzy sets, Fuzzy Sets and Systems, 20: 87-96 (1986).
3. P. A. Burrough and R. A. McDonnell: Principles of Geographical Information Systems, Oxford University Press (1998).
4. D. J. Cowen: GIS versus CAD versus DBMS: what are the differences?, Photogrammetric Engineering and Remote Sensing, 54: 1551-4 (1988).
5. M. N. Demers: GIS Modeling in Raster, John Wiley & Sons, Inc. (2002).
6. H. Deng: Multicriteria analysis with fuzzy pairwise comparison, International Journal of Approximate Reasoning, 21: 215-231 (1999).
7. F. Herrera, L. Martinez and P. J. Sanchez: Managing non-homogeneous information in group decision making, European Journal of Operational Research, 166: 115-132 (2005).
8. R. R. Levary and K. Wan: A simulation approach for handling uncertainty in the analytical hierarchy process, European Journal of Operational Research, 106(1): 116-122 (1998).
9. D. F. Li: Multi-attribute decision making models and methods using intuitionistic fuzzy sets, Journal of Computer and System Sciences, 70: 73-85 (2005).
10. J. Malczewski: GIS and Multi-criteria Decision Analysis, John Wiley & Sons (1999).
11. M. R. Malek and F. Twaroch: An introduction to intuitionistic fuzzy spatial region, GeoInfo Series 28a, Proc. of the ISSDQ'04, Vienna (2004).
12. T. N. Prakash: Land Suitability Analysis for Agricultural Crops: A Fuzzy Multicriteria Decision Making Approach, Master Thesis, International Institute for Geo-information Science and Earth Observation, Enschede, The Netherlands (2003).
13. K. Rashed and J. Weeks: Assessing vulnerability to earthquake hazards through spatial multi-criteria analysis of urban areas, International Journal of Geographic Information Science, 17(6): 547-576 (2003).
14. T. L. Saaty: A scaling method for priorities in hierarchical structures, Journal of Mathematical Psychology, 15: 37-57 (1977).
15. T. L. Saaty and L. G. Vargas: Uncertainty and rank order in the analytical hierarchy process, European Journal of Operational Research, 32: 107-117 (1987).
16. A. Skidmore: Environmental Modeling with GIS and Remote Sensing, Taylor & Francis (2002).
17. C. D. Tomlin: Geographic Information System and Cartographic Modeling, Prentice Hall, Englewood Cliffs, NJ (1990).
18. UN: Mitigating Natural Disasters: Phenomena, Effects, and Options: A Manual for Policy Makers and Planners, UNDRO (United Nations Disaster Relief Organization), New York (1991).
19. R. C. Van den Honert: Stochastic group preference modelling in the multiplicative AHP: A model of group consensus, European Journal of Operational Research, 110: 99-111 (1998).
A CONSENSUS MODEL FOR GROUP DECISION MAKING IN HETEROGENEOUS CONTEXTS
LUIS MARTINEZ, FRANCISCO MATA
Dept. of Computer Science, University of Jaen, 23071 Jaen, Spain. Email: martin,fmata@ujaen.es
ENRIQUE HERRERA-VIEDMA
Dept. of Computer Science and A.I., University of Granada, 18071 Granada, Spain. Email: vieda@decsai.ugr.es
The consensus process in Group Decision Making (GDM) problems helps to achieve solutions that are shared by the different experts involved in such problems. Because different experts take part in the decision process, it is common that they need to express their information in different domains 5,2. In this contribution we focus on GDM problems defined in heterogeneous contexts with numerical, linguistic and interval-valued information. Our aim is to define a consensus model that includes an advice generator to assist the experts in the consensus reaching process of GDM problems with heterogeneous preference relations. This model provides two important improvements: (i) the ability to cope with group decision-making problems with heterogeneous preference relations, and (ii) the replacement of the moderator, traditionally present in the consensus reaching process, by an advice generator, so that the whole group decision-making process can easily be automated.
1. Introduction
Before obtaining a final solution, two processes are carried out in GDM problems 3,4,6,8: the Consensus Process and the Selection Process (see Fig. 1). The first refers to how to obtain the maximum agreement between the set of experts on the solution set of alternatives; normally this process is guided by a human figure called the moderator 4,8. The second obtains the solution set of alternatives.

Figure 1. Resolution process of a group decision-making problem (the experts' preferences pass through the consensus process, guided by the moderator, before the selection process produces the solution set of alternatives).

It has been shown in the literature that in GDM problems it can be necessary or suitable for the experts to express their knowledge in different expression domains, such as numeric, linguistic and/or interval ones, and different Selection Processes 2,5 have been proposed to solve them, but no specific consensus processes have been defined for this type of problem. Consequently, in this contribution we focus on the Consensus Process in heterogeneous GDM problems. Consensus is defined as a state of mutual agreement among members of a group where all opinions have been heard and addressed to the satisfaction of the group 10. The consensus reaching process is defined as a dynamic and iterative process composed of several rounds, in which the experts express and discuss their opinions. Traditionally this process is coordinated by a human moderator, who computes the agreement among experts in each round using different consensus measures 9,7. If the agreement is not acceptable, the moderator recommends that the experts change their opinions that are furthest from the group opinion, in an effort to bring their preferences closer in the next consensus round 1,11. The moderator is usually a controversial figure, because experts may complain about his lack of objectivity; additionally, in heterogeneous contexts it is difficult for him to understand all the different domains and scales in a proper way. Therefore, the aim of this contribution is to present a consensus model for GDM problems such that:
• The experts can express their preferences by means of linguistic, numerical or interval-valued preference relations.
• The moderator's tasks are carried out by an automatic advice generator.
The rest of the paper is set out as follows. The scheme of a heterogeneous GDM problem is described in Section 2. The intelligent consensus model is presented in Section 3. Finally, in Section 4 we draw our conclusions.
2. A Heterogeneous GDM Problem
A group decision-making (GDM) problem may be defined as a decision situation in which there are a finite set of alternatives, $X = \{x_1, x_2, \ldots, x_n\}$ ($n \geq 2$), and a group of experts, $E = \{e_1, e_2, \ldots, e_m\}$ ($m \geq 2$); each expert $e_i$ provides his/her preferences on X by means of a preference relation, $\mu_{P_{e_i}}: X \times X \rightarrow D_i$, where $D_i$ is the expression domain used by the expert $e_i$ to provide his/her preferences. The ideal situation in a GDM problem is that all the experts have precise knowledge about the alternatives and provide their opinions on a precise numerical scale. However, in some cases experts may belong to distinct research areas and have different levels of knowledge about the alternatives. A consequence of this is that preferences can be expressed by means of numbers, interval values or linguistic terms, so $D_i \in \{N, I, L\}$. In this contribution we deal with heterogeneous GDM problems, i.e., GDM problems where each expert $e_i$ may express his/her opinions on the set of alternatives using a different expression domain $D_i \in \{N, I, L\}$, by means of a preference relation $P_{e_i} = (p_i^{lk})$, where $p_i^{lk} \in D_i$ represents the preference of alternative $x_l$ over alternative $x_k$ for that expert:
$$P_{e_i} = \begin{pmatrix} p_i^{11} & \cdots & p_i^{1n} \\ \vdots & \ddots & \vdots \\ p_i^{n1} & \cdots & p_i^{nn} \end{pmatrix}$$
This type of context implies the necessity of adequate tools to manage and model heterogeneous information 5.

3. A Consensus Model for Heterogeneous GDM Problems
In this section a consensus model is presented for GDM problems defined in heterogeneous contexts; it automates the moderator's functions (see Fig. 2) and is developed in four phases (a high-level sketch of these phases follows the list):
(1) Making the information uniform: it unifies all the different preferences into a single domain.
(2) Computing consensus degrees: these values measure the agreement amongst all the experts.
(3) Checking the agreement: these values are used to learn how close the collective and the individual experts' preferences are.
(4) Generating advices: an automatic advice generator guides the experts in order to improve the consensus, recommending which opinions should change.
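As an orientation, the following sketch strings the four phases together in the order just described; the helper callables, the consensus threshold gamma and the maximum number of rounds are placeholders for the components detailed in Sections 3.1 to 3.5, not code from the paper.

```python
def consensus_process(relations, make_uniform, consensus_degrees,
                      generate_advices, collect_revisions,
                      gamma=0.85, max_cycles=10):
    """Illustrative driver for the four phases; gamma and max_cycles are assumed values."""
    for _ in range(max_cycles):
        uniform = [make_uniform(p) for p in relations]      # phase 1: unify domains
        cp, ca, cr = consensus_degrees(uniform)             # phase 2: consensus degrees
        if cr >= gamma:                                     # phase 3: check agreement
            break
        advices = generate_advices(uniform, cp, ca)         # phase 4: generate advices
        relations = collect_revisions(advices, relations)   # experts revise and resubmit
    return relations
```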
Figure 2. Resolution process of a heterogeneous group decision-making problem (the moderator of Figure 1 is replaced by the automatic consensus process, after which the selection process produces the solution set of alternatives from the experts' heterogeneous preferences).
The above model will be described in detail in the following subsections.

3.1. Making the Information Uniform
We must keep in mind that we are dealing with heterogeneous contexts composed of numerical, interval-valued and linguistic information, so we need to unify the heterogeneous information into a common utility space in order to operate on it easily. To do so we propose the use of the process proposed in 5, which transforms the heterogeneous input values into fuzzy sets on a linguistic term set $S_T = \{s_0, \ldots, s_g\}$. Each numerical, interval-valued and linguistic evaluation is transformed into a fuzzy set in $S_T$, $F(S_T)$:
$$\tau_{D S_T} : D \rightarrow F(S_T), \qquad \tau_{D S_T}(p_i^{lk}) = \{(c_h, \alpha_h^{lk}) \mid h = 0, \ldots, g\},$$
where at least one $\alpha_h^{lk} > 0$.
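As an illustration of what this unification can look like for a numerical assessment, the sketch below matches a value in [0, 1] against triangular membership functions of a term set with g+1 labels; the triangular shape and the uniform partition are assumptions of this sketch, not a prescription from the paper.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with support [a, c] and peak b."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def to_fuzzy_set(value, g=4):
    """Transform a numerical value in [0, 1] into a fuzzy set on S_T = {s_0, ..., s_g}."""
    result = []
    for h in range(g + 1):
        peak = h / g
        alpha = triangular(value, peak - 1.0 / g, peak, peak + 1.0 / g)
        result.append((f"s{h}", round(alpha, 2)))
    return result

print(to_fuzzy_set(0.6))  # [('s0', 0.0), ('s1', 0.0), ('s2', 0.6), ('s3', 0.4), ('s4', 0.0)]
```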
After this unification process, and assuming that each fuzzy set is represented by means of its membership degrees $(\alpha_{i0}^{lk}, \ldots, \alpha_{ig}^{lk})$, the preference relation of each expert, $P_{e_i}$, can be rewritten with fuzzy sets as its elements:
$$P_{e_i} = \begin{pmatrix} p_i^{11} = (\alpha_{i0}^{11}, \ldots, \alpha_{ig}^{11}) & \cdots & p_i^{1n} = (\alpha_{i0}^{1n}, \ldots, \alpha_{ig}^{1n}) \\ \vdots & \ddots & \vdots \\ p_i^{n1} = (\alpha_{i0}^{n1}, \ldots, \alpha_{ig}^{n1}) & \cdots & p_i^{nn} = (\alpha_{i0}^{nn}, \ldots, \alpha_{ig}^{nn}) \end{pmatrix}$$

3.2. Computation of Consensus Degrees
The consensus degree measures the agreement among all the experts. To compute these degrees it is necessary to build a consensus matrix, obtained by aggregating the distances among the experts' preferences, comparing them pairwise. The distance between two experts, $e_i$ and $e_j$, is computed using distance matrices, $DM_{ij} = (d_{ij}^{lk})$. The values $d_{ij}^{lk}$ express the distance between two preferences $p_i^{lk}$, $p_j^{lk}$ and are calculated as
$$d_{ij}^{lk} = d(p_i^{lk}, p_j^{lk}) = 1 - \left| cv_i^{lk} - cv_j^{lk} \right| \qquad (1)$$
where $cv_i^{lk}$ is the central value of the fuzzy set that represents the preference $p_i^{lk}$, calculated as
$$cv_i^{lk} = \frac{\sum_{h=0}^{g} index(s_h)\, \alpha_{ih}^{lk}}{\sum_{h=0}^{g} \alpha_{ih}^{lk}}, \quad \text{with } index(s_h) = h. \qquad (2)$$
The computation of the consensus degrees is carried out as follows:
(1) Compute the central values $cv_i^{lk}$ for each $p_i^{lk}$: $\forall i = 1, \ldots, m;\ l, k = 1, \ldots, n \wedge l \neq k$. (3)
(2) Compute the distance matrix $DM_{ij} = (d_{ij}^{lk})$ for each pair of experts: $d_{ij}^{lk} = d(p_i^{lk}, p_j^{lk})$. (4)
(3) A consensus matrix, $CM = (cm^{lk})$, is obtained by aggregating all the distance matrices at the level of pairs of alternatives:
$$cm^{lk} = \phi(d_{ij}^{lk});\quad i, j = 1, \ldots, m \ \wedge\ l, k = 1, \ldots, n \ \wedge\ i < j,$$
where $\phi$ is an aggregation operator. This matrix CM is used to compute the consensus degrees.
(4) Computation of the consensus degrees, carried out at three levels:
(a) Consensus on pairs of alternatives, $cp^{lk}$: it measures the agreement on the pair of alternatives $(x_l, x_k)$ amongst all the experts:
$$cp^{lk} = cm^{lk}, \quad \forall l, k = 1, \ldots, n \ \wedge\ l \neq k.$$
The closer $cp^{lk}$ is to 1, the greater the agreement.
(b) Consensus on alternatives, $ca^l$: it measures the agreement on an alternative $x_l$ amongst all the experts:
$$ca^l = \frac{\sum_{k=1,\ k\neq l}^{n} cm^{lk}}{n-1}.$$
(c) Consensus on the relation, $cr$: it measures the global consensus degree amongst the experts' opinions:
$$cr = \frac{\sum_{l=1}^{n} ca^l}{n}. \qquad (6)$$
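A compact sketch of steps (1) to (4), for preferences already unified into membership vectors, could look as follows; the arithmetic mean used for the aggregation operator and the division by g that keeps the distance of equation (1) inside [0, 1] are assumptions of this sketch, not prescriptions of the paper.

```python
from itertools import combinations

def central_value(fuzzy_set):
    """cv = sum(h * alpha_h) / sum(alpha_h) for a membership vector (alpha_0, ..., alpha_g)."""
    return sum(h * a for h, a in enumerate(fuzzy_set)) / sum(fuzzy_set)

def distance(p_i, p_j):
    """d = 1 - |cv_i - cv_j| / g, normalised by the granularity g to stay in [0, 1]."""
    g = len(p_i) - 1
    return 1 - abs(central_value(p_i) - central_value(p_j)) / g

def consensus_degrees(relations):
    """relations[i][l][k] is the unified fuzzy preference of expert i on the pair (x_l, x_k)."""
    m, n = len(relations), len(relations[0])
    pairs = list(combinations(range(m), 2))
    cm = [[sum(distance(relations[i][l][k], relations[j][l][k]) for i, j in pairs) / len(pairs)
           if l != k else 1.0
           for k in range(n)] for l in range(n)]        # phi taken as the arithmetic mean
    cp = cm                                             # consensus on pairs of alternatives
    ca = [sum(cm[l][k] for k in range(n) if k != l) / (n - 1) for l in range(n)]
    cr = sum(ca) / n                                    # consensus on the relation
    return cp, ca, cr
```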
3.3. Checking the Agreement
The consensus model controls the agreement in each discussion round. Before starting the model, a consensus threshold $\gamma \in [0, 1]$ is fixed, which will depend on the particular problem being dealt with. When the consensus measure $cr$ reaches $\gamma$, the consensus process ends and the selection process is applied to obtain the solution. Additionally, a parameter Maxcycles controls the maximum number of discussion rounds.
3.4. Generating Advices
When the agreement is not good enough, $cr < \gamma$, the experts should modify their preferences to increase the agreement. To do so, this model computes which experts are furthest from the collective opinion (proximity measures) and generates advices for them, recommending which preferences to change and how. Both processes are presented in detail below.
3.4.1. Computation of Proximity Measures
Proximity measures evaluate the agreement between the individual experts' opinions and the group opinion. Thus, firstly a collective preference relation, $P_{e_c} = (p_c^{lk})$, is calculated by aggregating the individual preference relations $\{P_{e_i} = (p_i^{lk});\ i = 1, \ldots, m\}$:
$$p_c^{lk} = \psi(p_1^{lk}, \ldots, p_m^{lk}),$$
with $\psi$ an aggregation operator. We use equation (1) to measure the agreement between each individual expert's preferences, $P_{e_i}$, and the collective preferences, $P_{e_c}$. Therefore, the measurement of proximity is carried out in two steps:
(1) A proximity matrix, $PM_i = (pm_i^{lk})$, is obtained for each expert $e_i$, with $pm_i^{lk} = d(p_i^{lk}, p_c^{lk})$. These matrices will be used to compute the proximity measures.
(2) Computation of proximity measures at different levels:
(a) Proximity on pairs of alternatives, $pp_i^{lk}$: it measures the proximity between the preferences of the expert $e_i$ and of the group on each pair of alternatives:
$$pp_i^{lk} = pm_i^{lk}, \quad \forall l, k = 1, \ldots, n \ \wedge\ l \neq k.$$
(b) Proximity on alternatives, $pa_i^l$: it measures the proximity between the preferences of the expert $e_i$ and of the group on each alternative $x_l$.
(c) Expert's proximity, $pe_i$: it measures the global proximity between the preferences of each expert $e_i$ and the group:
$$pe_i = \frac{\sum_{l=1}^{n} pa_i^l}{n}. \qquad (8)$$
If the above values are close to 1 then they make a positive contribution towards a high consensus, while if they are close to 0 then they make a negative contribution to consensus.
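Reusing the same distance, the proximity measures can be sketched as follows; here the collective relation is obtained with an arithmetic mean and the preferences are represented by central values already scaled to [0, 1], both of which are assumptions of this sketch.

```python
def collective_relation(relations):
    """psi taken as the arithmetic mean of the experts' scaled central values, position by position."""
    m, n = len(relations), len(relations[0])
    return [[sum(rel[l][k] for rel in relations) / m for k in range(n)] for l in range(n)]

def proximity_measures(relations, collective):
    """relations[i][l][k] and collective[l][k] are central values scaled to [0, 1]."""
    n = len(collective)
    pp, pa, pe = [], [], []
    for expert in relations:
        pm = [[1 - abs(expert[l][k] - collective[l][k]) if l != k else 1.0
               for k in range(n)] for l in range(n)]
        pa_i = [sum(pm[l][k] for k in range(n) if k != l) / (n - 1) for l in range(n)]
        pp.append(pm)
        pa.append(pa_i)
        pe.append(sum(pa_i) / n)        # expert's global proximity, equation (8)
    return pp, pa, pe
```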
3.5. Advice Generator
Finally, the consensus model generates advices automatically in order to increase the agreement, indicating who should change his/her opinions and how. This generation is carried out as follows:
• Identify the experts furthest from the agreement (e.g. those with the lowest proximity values $pe_i$).
• Identify which alternatives must be changed: those alternatives whose consensus degree $ca^l < \gamma$.
• Identify the pairs of alternatives that must be changed. Once the expert $e_i$ and the alternatives $x_l$ to change have been identified, all the pairs $p_i^{lk}$ ($k = 1, \ldots, n$) such that $pp_i^{lk} < \theta$ must be changed. The parameter $\theta$ is a proximity threshold that helps to choose which alternatives are furthest from the collective opinion.
• Changing Direction Rules (CDR): finally, the advice generator computes whether the values of the pairs of alternatives to change should increase or decrease. Taking into account that the $p_i^{lk}$ are fuzzy sets, the advice generator defines two direction parameters, $ml$ (main) and $sl$ (secondary). These parameters are used both for the experts ($eml$, $esl$) and for the collective opinion ($cml$, $csl$). Each parameter consists of the value and position of one of the two highest membership values of the expert's preference ($eml_{pos}, eml_{val}, esl_{pos}, esl_{val}$) and of the collective preference ($cml_{pos}, cml_{val}, csl_{pos}, csl_{val}$). These parameters are used by the following direction rules:
DR.1. IF $eml_{pos} > cml_{pos}$ THEN $e_i$ should decrease the value of $p_i^{lk}$.
DR.2. IF $eml_{pos} < cml_{pos}$ THEN $e_i$ should increase the value of $p_i^{lk}$.
DR.3. IF $eml_{pos} = cml_{pos}$ THEN apply DR.1 and DR.2 but with $eml_{val}$ and $cml_{val}$.
DR.4. IF $eml_{pos} = cml_{pos}$ AND $eml_{val} = cml_{val}$ THEN apply DR.1 and DR.2 but with $sl$.
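Interpreting these rules operationally, a small sketch of the changing-direction logic could be written as follows; the tuple-based representation of ml/sl and the returned strings are assumptions of this sketch, not the authors' code.

```python
def direction_parameters(fuzzy_set):
    """Positions and values of the two highest membership degrees: (ml_pos, ml_val, sl_pos, sl_val)."""
    ranked = sorted(enumerate(fuzzy_set), key=lambda pv: pv[1], reverse=True)
    (ml_pos, ml_val), (sl_pos, sl_val) = ranked[0], ranked[1]
    return ml_pos, ml_val, sl_pos, sl_val

def change_direction(expert_pref, collective_pref):
    """Apply DR.1 to DR.4 to decide whether the expert should increase or decrease a preference."""
    eml_pos, eml_val, esl_pos, esl_val = direction_parameters(expert_pref)
    cml_pos, cml_val, csl_pos, csl_val = direction_parameters(collective_pref)
    for e_pos, e_val, c_pos, c_val in ((eml_pos, eml_val, cml_pos, cml_val),
                                       (esl_pos, esl_val, csl_pos, csl_val)):
        if e_pos > c_pos or (e_pos == c_pos and e_val > c_val):   # DR.1 / DR.3
            return "decrease"
        if e_pos < c_pos or (e_pos == c_pos and e_val < c_val):   # DR.2 / DR.3
            return "increase"
    return "keep"                                                  # DR.4 exhausted: no change

print(change_direction([0.0, 0.2, 0.8, 0.0], [0.0, 0.7, 0.3, 0.0]))  # -> "decrease"
```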
4. C o n c l u d i n g R e m a r k s A consensus model to manage the consensus process of heterogeneous GDM problems has been presented. There are two main features of this model: (i) it is able to manage consensus processes in problems where experts may have different levels of background or knowledge to solve the problem, and (ii) it is able to generate advices on the necessary changes in the experts' opinions in order to reach consensus, which makes the figure of the moderator, traditionally present in the consensus reaching process, unnecessary. References 1. N. Bryson. Group decision making and the analytic hierarchy process: Exploring the consensus-relevant information content. Computers and Operational Research, 23:27-35, 1996. 2. M. Delgado, F. Herrera, E. Herrera-Viedma, and L. Martmez. Combinig numerical and linguistic information in group decision making. Information Sciences, 107:177-194, 1998. 3. J. Fodor and M. Roubens. Fuzzy preference modelling and multicriteria decision support. Kluwer Academic Publishers, Dordrecht, 1994. 4. F. Herrera, E. Herrera-Viedma, and J.L. Verdegay. A model of consensus in group decision making under linguistic assessments. Fuzzy Sets and Systems, 79:73-87, 1996. 5. F. Herrera, L. Martinez, and P.J. Sanchez. Managing non-homogeneous information in group decision making. European Journal of Operational Research, 166(1):115-132, 2005. 6. E. Herrera-Viedma, F. Herrera, and F. Chiclana. A consensus model for multiperson decision making with different preference structures. IEEE Transactions on Systems, Man and Cybernetics-Part A, 32:394-402, 2002. 7. E. Herrera-Viedma, F. Mata, L. Martinez, F. Chiclana, and L.G. Perez. Measurements of consensus in multi-granular linguistic group decision making. Lecture Notes in Artificial Intelligence, Vol. 3131, 3131:194-204, 2004. 8. J. Kacprzyk, H. Nurmi, and M. Fedrizzi. Consensus under Fuzziness. Kluwer Academic Publishers, 1997. 9. L.I.Kuncheva. Five measures of consensus in gdm using fuzzy sets. In Proc. IFSA 91, pages 141-144, 1991. 10. S. Saint and J.R. Lawson. Rules for Reaching Consensus. A Modern Approach to Decision Making. Jossey-Bass, San Francisco, 1994. 11. S.Zadrozny. Consensus under Fuzziness, chapter An Approach to the Consensus Reaching Support in Fuzzy Environment, pages 83-109. Kluwer Academic Publishers, 1997.
A LINGUISTIC 360-DEGREE PERFORMANCE APPRAISAL EVALUATION MODEL
R. DE ANDRES
Dep. de Fundamentos del Análisis Economico e H. I. E., Universidad de Valladolid, Avda. Valle de Esgueva 6, 47011 Valladolid, Spain. E-mail: [email protected]
J. L. GARCIA-LAPRESTA
Dep. de Economia Aplicada (Matemáticas), Universidad de Valladolid, Avda. Valle de Esgueva 6, 47011 Valladolid, Spain. E-mail: [email protected]
L. MARTINEZ
Dep. de Informática, Universidad de Jaen, Campus Las Lagunillas s/n, 23071 Jaen, Spain. E-mail: [email protected]
Performance appraisal is a process used by some firms in order to evaluate the efficiency and productivity of their employees when planning their promotion policy. Initially this process was carried out just by the executive staff, but recently it has evolved into an evaluation process based on the opinions of different reviewers: supervisors, collaborators, clients and the employee himself (the 360-degree method). In such an evaluation process the reviewers evaluate a number of indicators related to the employee's performance appraisal. These indicators are usually subjective and qualitative in nature, which implies vagueness and uncertainty in their assessment. However, most performance appraisal models force reviewers to provide their assessments about the indicators in a unique, precise quantitative domain. We consider that this obligation leads to a lack of precision in the final results, so in this paper we propose a linguistic evaluation framework to model qualitative information and manage its uncertainty. Additionally, because the different sets of reviewers taking part in the evaluation process have different knowledge about the evaluated employee, it seems suitable to offer a flexible framework in which different reviewers can express their assessments in different linguistic domains according to their knowledge. The final aim is to compute a global evaluation for each employee that can be used by the management team to make their decisions regarding their incentive and promotion policy.
1. I n t r o d u c t i o n One of the main challenges of companies and organizations is the improvement of productivity and efficiency. Performance appraisal is essential for the effective management and evaluation of corporations. Recently more and more companies are trying to increase their productivity through the human performance measurement. Performance appraisal is used for the evaluation of employees estimating their contributions to the goals of the organization, behavior and results. This evaluation process has been accomplished from different points of view can be found in 2, 4, 6, 7, 10, 11, 15, 17 and 18. In classical performance appraisal methods just supervisors evaluated employees. However, corporations are adopting new methods that use information from different people (reviewers) connected with each evaluated worker. In fact, the 360° appraisal or integral evaluation is a methodology for evaluating worker's performance that includes the opinions of supervisors, collaborators, clients and himself (see 9 and 16). Then, each reviewer from the different reviewers collectives (supervisors, collaborators, clients, employee) evaluates indicators used for measuring the performance appraisal of the evaluated worker. Usually these indicators have a qualitative nature and involves uncertainty. However most of evaluation process force the reviewers to manifest their assessments in a unique quantitative precise scale (see 3). Finally the method generates a global evaluation value according to all the indicators and all the reviewers aggregating their assessments. The use of a precise scale to express qualitative information can produce a lack of precision in the assessments provided by the reviewers due to the difficulty of expressing uncertain knowledge in a precise way. In the literature the use of the Fuzzy Linguistic Approach 1 9 to model and manage the qualitative and uncertain information has provided successful results 1,8
Although, it could be logical that in the initial performance appraisal methods there was a unique expression domain for all the reviewers (supervisors) that belonged to the same collective the addition of new reviewers collectives (collaborators, clients, etc) implies that different collectives can have totally different knowledge about the evaluated worker. So each collective or even more each reviewer could express his assessments in different expression domains 1 2 , 1 3 . Therefore the aim of this paper is provide a performance appraisal
419 method that take into account the above problems. Hence the aim of this contribution is to develop a linguistic evaluation method in which different linguistic expression domains can be used by the reviewers to express their assessments. Subsequently, in order to aggregate all the opinions, it is necessary to unify them in a common domain. In this way, the proposed method will conduct each linguistic label provided by reviewers as a fuzzy set in the common domain to compute collective assessments that will allow to the management team to make the final decision. Thus, the problem falls, in a natural way, into the collective decision making context. The paper is organized as follows. Section 2 is devoted to introduce the notation and the structure of the arisen problem. In Section 3 we introduce the multi-granular linguistic evaluation model. Finally, some concluding remarks are included in Section 4.
2. P r e l i m i n a r i e s In this section is introduced a scheme for a 360° evaluation methodology for performance appraisal evaluation and afterwards is showed a classical evaluation method for it. The aim of this problem is to evaluate the employees taking into account the opinions of different collectives related to them. We now present the main features and terminology we consider for the arisen problem. • A set of all employees X = {x\,... following collectives:
,xn}
to be evaluated by the
— A set of supervisors (executive staff): A = {ai,..., ar}. - A set of collaborators (fellows): B = {bi,..., bs}. - A set of clients (customers): C = { c i , . . . , ct}. — X (the opinion of each employee about himself can be taken into account). • Different criteria: Y\,..., Yp. • The assessments of a« £ A, bj £ B and Cj £ C on the employee Xj according to the criterion V*: o'fc, 6*fc and cj*, respectively. Moreover, ar? is the assessment of Xj on himself with respect to Yk- Therefore, there are (r + s + t + l ) p assessments for each employee provided by the different collectives. In the classical approach to performance appraisal, the assessments are usually numerical 7 . However, in this paper we will consider multi-granular
linguistic assessments. However, at this stage we do not specify neither the assessments nature nor the domain expressions. To obtain a global evaluation value of each employee, carried out an evaluation method with the following steps: (1) Computing reviewers collective criteria values, V^_{XJ): for each reviewers collective are aggregated their assessments about a given criterion Yk, by means of an aggregation operator, u*, that can be different for each reviewers collective: vkA(xj) = < « , • • • > < )
VUXJ)
= «c( c "' • • •' cf)
vkB(Xj)
=
ukB{bf,...,bf)
v
j (xi) = Ak
k
(2) Computing global criteria values, v (xj): the previous collective assessments, V^_(XJ), are aggregated by means of an aggregation operator uk obtaining a global value for each criterion Y^: vk(xj) = uk(vkA(xj),vkB(xj),vl^(xj),vk(xj)). (3) Computing a final value, V(XJ): it is obtained aggregating the global criteria values related to the employee Xj, by means of an aggregation operator u: V(XJ)
= U{V1(XJ),
...
,VP(XJ)).
These final outcomes, V(XJ), are used for ranking the employees in order to establish the promotion policy. 3. A multi-granular linguistic performance appraisal m o d e l In Section 2 we have seen that usually a performance appraisal problem is defined in a numerical scale in spite of the most of evaluated indicators are qualitative that are difficult to express in a precise way, the use of linguistic values facilitates the modelling and managing of these qualitative indicators. The Fuzzy Linguistic Approach 1 9 provides a systematic way to represent linguistic variables. Due to the fact that in performance appraisal take part in the evaluation process different reviewers that belong to different collectives, it seems logical that they have different information about the indicators so they can express their values in different linguistic expression domains 1 2 , 1 3 . Therefore a suitable way to model the performance appraisal problem to offer a greater expression flexibility is by means of a multi-granular linguistic framework. In this section we present the scheme of the problem in such a framework and an evaluation model to manage this type of information.
3.1. The scheme The schema we present now is similar to that given in Section 2, but in this case the reviewers express their assessments by means of linguistic labels. Additionally we assume that each collective can use different linguistic term sets 12,13 to assess each criterion Yk, fc = 1,... ,p: • aljk e CkA for each i e { 1 , . . . , r} and each j € { 1 , . . . , n). • bf £ CkB for each i € {l,...,s}
and each j € { 1 , . . . , n}.
l
• c f € CQ for each i g { 1 , . . . ,£} and each j 6 { 1 , . . . ,n}. • x>k eCkx for each j e {1,. ,.,n). We note that any appropriate linguistic term set £* is characterized by its cardinality or granularity, |£^|. The granularity represents the level of discrimination among different degrees of uncertainty. Additionally, each £* is ranked by a linear order5. For a further review of the Fuzzy Linguistic Approach and Multigranular linguistic contexts see 12, 13, 19. Since there are p criteria and 4 collectives, we have at most 4p different sets of linguistic labels. From them, we consider the global set of appeared linguistic labels:
c = c\ u • • • u cpA u clB u • • • u CB u clc u • • • u cpc u cxxU£ux.• 3.2. The method Here we present our method to carry out the performance appraisal in a multi-granular linguistic framework (see Fig. 1). The main difference with the evaluation method presented in Section 2 is that here the information to evaluate is expressed in different domains. Therefore it must be unified into a unique linguistic domain before its aggregation. MULTI -GRANULAR LINGUISTIC PERFORMANCE APPRAISAL Supervisor*
—
Collaborators " ClienU Employoc
EVALUATION METHOD Information
—
UNIFYING INFORMATICS
AGGREGATION PROCESSES
• Employee Evaluatioi
—•
Figure 1.
A Multi-Granular Linguistic Performance Appraisal Model
The different phases of the evaluation method are presented in further detail in the following subsections.
3.2.1. Make the information uniform To operate with linguistic terms assessed in different linguistic term sets, first of all we have to conduct the multi-granular linguistic information provided by the different collectives into a unique expression domain, called Basic Linguistic Term Set (BLTS), £ = {1\,... ,Jg}, with g > m a x { | £ i | , . . . , \CA\, \CB\,..., \CB\,\Clc\,...,
\&cl I ^ U • • • > 1^1}-
Once it has been chosen the BLTS, the multi-granular linguistic information must be conducted in it. To do so, we propose to transform this information into fuzzy sets in C by means of the following function14 that computes the matching between the reviewers labels and the labels of the BLTS: T : C —» F(Z) r(s) = {(ii,ai),...,(lg,ag)} at = maxy mm {fii(y),/j.-[((y)}, i =
l,...,g
where •?•"(£) is the set of fuzzy sets on C, m and fij are the membership functions of the linguistic labels / 6 C and ~U € C, respectively. The function r is used for transforming individual assessments into fuzzy sets in the BLTS F(C). Once we have converted all the individual assessments into fuzzy sets on C, the evaluation is easy to carry out. 3.2.2. Evaluation method This aggregation process is similar to that presented in Section 2 but in this case the information aggregated will be fuzzy sets. So, the aggregation operators must aggregate this type of information14'12. (1) Computing reviewers collective criteria values: for each reviewers collective are aggregated their assessments about a given criterion, Yk: vkA^)-ukA{T(af),
,r{af)),
where u\ : (F{C))r —» T{C)
vkB(Xj) = a%(T{b}k),
Mb'")),
where u% : (F(Z))S —» T{C)
vkc(xj) =
,T(cf)),
where ukc : (JF(Z))' —» F(£)
ukc(r(c)k),
vk{Xj) = r{xik).
(2) Computing global criteria values: the previous collective assessments are aggregated by means of an aggregation operator obtaining a global value for each criterion Y^:
vk(Xj)
uk(vkA{xj),vkB{xj),v£:(xj),v*{xj)),
=
where uk : (T(C))4 —+ F{E). (3) Computing a final value: it is obtained aggregating the global criteria values related to the employee xy. V(XJ)
where u : (Jr(C))P
—»
= u(w1(i;,),...,iip(a;j)),
F(E).
These final outcomes, V(XJ), are used for ranking the employees either to establish the promotion policy by means of a fuzzy ranking 1 2 .
4. C o n c l u d i n g remarks The performance appraisal process is more and more important in nowadays companies. In this contribution has been presented a linguistic method to carry out this process that offers a greater flexibility to the reviewers in order to express their values. In the future we shall evolve this process using a linguistic computational model based on linguistic hierarchies 13 to carry out the processes of computing with words in multi-granular contexts without loss of information and to obtain linguistic outcomes.
Acknowledgments The contribution has been partially supported by the Projects MTM200508982-C04-02/03, E R D P and VA040A05.
References 1. B. Arfi, Fuzzy decision making in politics: A linguistic fuzzy-set approach (LFSA). Political Analysis 13, pp. 23-56 (2005). 2. C. G. Banks and L. Roberson, Performance appraisers as test developers, Academy of Management Review 10, pp. 128-142 (1985). 3. J. N. Baron and D. M. Kreps, Strategic Human Resources. Frameworks for General Managers. Wiley k. Sons, New York (1999).
424 4. H. J. Bernardin, J. S. Kane, S. Ross, J. D. Spina and D. L. Johnson, Performance appraisal design, development, and implementation, In: G. R. Ferris, S. D. Rosen and D. T. Barnum (eds.), Handbook of Human Resources Management, Blackwell, Cambridge, pp. 462-493 (1995). 5. P. P. Bonissone and K. S. Decker, Selecting uncertainty calculi and granularity: An experiment in trading-off precision and complexity. In: L. H. Kanal and J. F. Lemmer (eds.), Uncertainty in Artificial Intelligence, North-Holland, pp. 217-247 (1986). 6. R. D. Bretz, G. T. Milkovich and W. Read, The current state of performance appraisal research and practice: Concerns, directions and implications, Journal of Management 18, pp. 321-352 (1992). 7. R. L. Cardy and G. H. Dobbins, Performance Appraisal: Alternative Perspectives, South-Western, Cincinati (1994). 8. C-H Cheng and Y. Lin, Evaluating the best main battle tank using fuzzy decision theory with linguistic criteria evaluation, European Journal of Operational Research 142, pp. 174-186, 2002. 9. M. Edwards and E. Ewen, Automating 360 degree feedback, HR Focus 70, p. 3 (1996). 10. G. R. Ferris and T. A. Judge, Personnel/human resources management: A political influence perspective, Journal of Management 17, pp. 1-42 (1991). 11. C. Fletcher, Performance appraisal and management: The developing research agenda, Journal of Occupational and Organization Psychology 74, pp. 473-487 (2001). 12. F. Herrera, E. Herrera-Viedma and L.Martinez, A fusion approach for managing multi-granularity linguistic term sets in decision making, Fuzzy Sets and Systems 114, pp. 43-58 (2000). 13. F. Herrera and L. Martinez, A model based on linguistic 2-tuples for dealing with multigranularity hierarchical linguistic contexts in multiexpert decisionmaking, IEEE Transactions on Systems, Man and Cybernetics. Part B: Cybernetics 31, pp. 227-234 (2001). 14. F. Herrera, L. Martinez and P.J. Sanchez, Managing non-homogeneous information in group decision making, European Journal of Operational Research 166, pp. 115-132 (2005). 15. J. L. Kerr, Diversification strategies and managerial rewards: An empirical study, Academy of Management Journal 28, pp. 155-179 (1985). 16. S. Marshall, Complete turnaround 360-degree evaluations gaining favour with workers management, Arizona Republic, Dl (1999). 17. J. B. Miner, Development and application of the rated ranking technique in performance appraisal, Journal of Occupational Psychology 6, pp. 291-305 (1988). 18. K. R. Murphy and J. N. Cleveland, Performance Appraisal: An Organizational Perspective, Allyn & Bacon, Boston (1991). 19. L. A. Zadeh, The concept of a linguistic variable and its applications to approximate reasoning, Information Sciences, Part I, II, HI, 8, pp. 199-249; 8, pp. 301-357; 9, pp. 43-80 (1975).
A N INTERACTIVE S U P P O R T SYSTEM TO AID EXPERTS TO EXPRESS CONSISTENT PREFERENCES
S. ALONSO, E. HERRERA-VIEDMA, F. HERRERA AND F.J. CABRERIZO Department of Computer Science and Artificial Intelligence University of Granada, 18071, Granada, Spain E-mail: {salonso, viedma, herrera}@decsai.ugr.es, [email protected] F. CHICLANA Centre for Computational Intelligence, School of Computing De Montfort University, Leicester LEI 9BH, UK. E-mail: [email protected] In Group Decision Making, the expression of preferences is often a very difficult task for the experts, specially in decision problems with a high number of alternatives. The problem is increased when they are asked to give their preferences in the form of preference relations: although preference relations have a very high level of expressivity and they present good properties that allow to operate with them easily, the amount of preference values that the experts are required to give increases exponentially. This usually leads to situations where the expert is not capable of properly express all his/her preferences in a consistent way (that is, without contradiction), so finally the information provided can easily be either inconsistent or incomplete (when the expert prefers not to give some particular preference values). In this paper we develop a transitivity based support system to aid experts to express their preferences (in the form of preference relations) in a more consistent way. The system works interactively with the expert making recommendations for the preference values that the expert have not yet expressed. Those recommendations are computed trying to maintain the consistency level of the expert as high as possible.
1. I n t r o d u c t i o n One of the key issues when solving Group Decision Making (GDM) problems is to obtain the preferences of the different experts in order to lately combine them and find which solution Xj among the feasible set of alternatives X = {x\, ...,xn} is the best. There exist several different representation formats in which experts can express their preferences but, among others, Fuzzy Preference Relations (FPR) 5 ' 6 , 8 have been widely used because they are a very expressive format and also they present good properties that allow to operate with them easily 6 ' 8 .
425
426 Preference relations may also present some disadvantages. As it is required to express a preference degree among all possible pairs of different alternatives, the amount of information that the experts have to provide increases exponentially. Clearly, when the cardinality of the problem is high then we may find situations where the experts do not provide good (consistent and complete) preference relations. In this cases, an expert might choose not to provide all the preference values that he is required to, or the expert might provide his/her preferences in an inconsistent way, i.e., his/her preferences might be contradictory. In a previous paper * a procedure to compute the missing values of an incomplete F P R taking into account the expert consistency level has been developed. Nevertheless, that procedure could not deal with the initial contradiction that the expert could have introduced in his/her preferences, and what could be worse, the expert might not accept the estimated values (even if they increase the overall consistency level). Thus, when designing a computer driven model to deal with GDM problems where the information is given in the form of FPR, software tools to aid the experts to express their preferences avoiding the mentioned problems should be implemented. As experts might not be familiar with preference relations, the aiding tools should be easy enough to use and they should follow the general principles of interface design 4 . In this paper we present an interactive support system to aid experts to express their preferences using fuzzy preference relations. The system will give recommendations to the expert while he/she is providing the preference values in order to maintain a high level of consistency in the preferences, as well as trying to avoid missing information. Also, the system will provide measures of the current level of consistency and completeness that the expert has achieved, which can be used to avoid situations of self contradiction. The system has been programmed using Java technologies, which allows its integration in web-based applications which are increasingly being used in GDM and Decision Support environments 3 ' 1 0 . The rest of the paper is set as follows: In Section 2 we present our preliminaries. In Section 3 we describe in detail our support system. Finally in Section 4 we point out our conclusions and future improvements.
2. Preliminaries In this section we present the preliminaries concepts needed for the rest of the paper: the notion of Incomplete Linguistic Preference Relation, the
427 Additive Transitivity Property and how this transitivity property can be used to estimate missing values in a fuzzy preference relation.
2.1. Incomplete
Fuzzy Preference
Relations
One of the most frequently used formats to represent preferences are Fuzzy Preference Relations 5 , 6 ' s . They present a very high level of expressivity and good properties that allow to operate with them easily 6 , s . Definition 1: A fuzzy preference relation P on a set of alternatives X is a fuzzy set on the product set X x X, i.e., it is characterized by a membership function p,p: X x X —» [0,1]. When cardinality of X is small, the preference relation may be conveniently represented by the n x n matrix P = (j>ik), being pik = fj,p(xi, Xk) (Ve, k € { 1 , . . . , n}) interpreted as the preference degree or intensity of the alternative Xi over xy. p^ = 1/2 indicates indifference between Xi and Xk (xi ~ Xk),p%k = 1 indicates that x; is absolutely preferred to x^, and pik > 1/2 indicates that x^ is preferred to Xk (xi y xk)- Based on this interpretation we have that pa = 1/2 \/i G { 1 , . . . , n} (xi ~ Xi). Usual models to solve GDM problems assume that experts are always able to provide all the preferences required, that is, to provide all pu- values. This situation is not always possible to achieve. Experts could have some difficulties in giving all their preferences due to lack of knowledge about part of the problem, or simply because they may not be able to quantify some of their degree of preference. In order to model such situations, we define the concept of an incomplete fuzzy preference relation 7. Definition 2 A function / : X — • Y is partial when not every element in the set X necessarily maps onto an element in the set Y. When every element from the set X maps onto one element of the set Y then we have a total function. Definition 3 An incomplete fuzzy preference relation P on a set of alternatives X is a fuzzy set on the product set X x X that is characterized by a partial membership function.
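Definitions 1 to 3 can be mirrored directly in code; the dictionary-of-known-values representation below is an assumption of this sketch, not the paper's data structure.

```python
class IncompleteFPR:
    """Fuzzy preference relation over n alternatives with possibly missing values."""

    def __init__(self, n):
        self.n = n
        self.values = {(i, i): 0.5 for i in range(n)}   # p_ii = 0.5 (indifference)

    def set(self, i, k, value):
        assert 0.0 <= value <= 1.0
        self.values[(i, k)] = value

    def get(self, i, k):
        return self.values.get((i, k))                  # None for a missing preference

    def missing_pairs(self):
        return [(i, k) for i in range(self.n) for k in range(self.n)
                if i != k and (i, k) not in self.values]

p = IncompleteFPR(3)
p.set(0, 1, 0.7)          # x_0 is preferred to x_1
print(p.missing_pairs())  # the pairs the expert has not assessed yet
```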
2.2.
Additive
Transitivity
Property
For GDM problems where the preferences are given as fuzzy preference relations, some properties about the preferences expressed by the experts are usually assumed desirable to avoid contradictions in their opinions, that is, to avoid inconsistent opinions. One of them is the additive transitiv-
iij/property 6 ' 9 : [pa - 0.5) + (Pjk - 0.5) = (Pik - 0.5) Vt,i, k e { 1 , . . • , n } 2 . 3 . Estimating
Missing
Values
Using
Additive
(1)
Transitivity
Expression (1) can be used to calculate an estimated value of a preference degree using other preference degrees in a fuzzy preference relation. Indeed, the preference value pu, {i ^ k) can be estimated using an intermediate alternative Xj in three different ways: • From p^ = pij + pjk - 0.5 we obtain the estimate (cPikY1 = Pij + Pjk - 0.5
(2)
• From pjk = Pji +Pik — 0.5 we obtain the estimate {cpikY2 = Pjk ~ Pji + 0.5
(3)
• From p^ = pik + pkj — 0.5 we obtain the estimate {cpik)j3 = Pij - Pkj + 0.5
(4)
As we have already said, and expert can choose to not provide complete preference relations, thus, the above equations may not be possible to be applied for every alternative Xi,xk,Xj. If expert e/, provides an incomplete fuzzy preference relation Ph, the following sets are defined 7 : A = {(i,j) I i,j e h
MV
h
EV
{l,...,n}Ai^j}
= | (i,j) € A | p^j is h
=A\
MV
H% =
lj^i,k\(i,j),(j,k)eEVh]
unknown} H% = \j # i,k I O'.O.O'.fc) e EV". H?k3 =
\j?i,k\(i,j),(k,j)eEVh'
MVh is the set of pairs of alternatives whose preference degrees are not given by expert e^, EVh is the set of pairs of alternatives whose preference degrees are given by the expert e/,| H^k, H^k2, H^k are the sets of intermediate alternative Xj(j ^ i,k) that can be used to estimate the preference value p^k (i ^ k) using equations (2), (3), (4) respectively. The final estimated value of a particular preference degree p\k ((i, k) £ EVh) can be calculated only when #(i// l fc 1 + H$ + H^) ± 0: h
_ ZjeHyicptkV1
+ Sj6fl
EjeH»>(cP?k)J3
In the case of being ( # # # + # # ^ 2 + #H?kz) = 0 then the preference value P'ik ((*> k) e EVh) cannot be estimated using the rest of known values.
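A brief sketch of this estimation procedure, under the same additive-transitivity reading of equations (2) to (5), might look as follows; it treats any pair not present in the dictionary as missing and is an illustration rather than the authors' code.

```python
def estimate(prefs, i, k, n):
    """Estimate p_ik of an incomplete FPR from known values via additive transitivity."""
    estimates = []
    for j in range(n):
        if j in (i, k):
            continue
        if (i, j) in prefs and (j, k) in prefs:                      # (cp_ik)^j1
            estimates.append(prefs[(i, j)] + prefs[(j, k)] - 0.5)
        if (j, k) in prefs and (j, i) in prefs:                      # (cp_ik)^j2
            estimates.append(prefs[(j, k)] - prefs[(j, i)] + 0.5)
        if (i, j) in prefs and (k, j) in prefs:                      # (cp_ik)^j3
            estimates.append(prefs[(i, j)] - prefs[(k, j)] + 0.5)
    return sum(estimates) / len(estimates) if estimates else None    # cp_ik, eq. (5)

known = {(0, 1): 0.7, (1, 2): 0.6}       # p_01 = 0.7, p_12 = 0.6
print(estimate(known, 0, 2, 3))          # 0.7 + 0.6 - 0.5 = 0.8
```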
3 . Interactive S u p p o r t S y s t e m t o A i d E x p e r t s t o E x p r e s s C o n s i s t e n t Preferences In this section we describe in detail our interactive support system to aid experts t o express their fuzzy preference relations in a consistent way. Firstly we will enumerate all the design goals and requirements that we have taken into account and secondly we will describe the actual implementation of every requirement in the system.
3 . 1 . Design
Goals and
Requirements
Our design goals and requirements could be split in two different parts: Interface Requirements, and Logical Goals. Interface R e q u i r e m e n t s : These requirements deal with the visual representation of the information and the different controls in the system. We want our system to comply the so called "Eight Golden Rules"4 for interface design: • • • • • • • •
GR GR GR GR GR GR GR GR
1. 2. 3. 4. 5. 6. 7. 8.
Strive for consistency. Enable frequent users to use shortcuts. Offer informative feedback. Design dialogues to yield closure. Offer simple error handling. Permit easy reversal of actions (undo action). Support internal focus of control (user is in charge). Reduce short-term memory load of the user.
Logical Goals: • Goal 1. Offer recommendations to the expert to guide him toward a highly consistent and complete fuzzy preference relation. • Goal 2. Recommendations must be given interactively. • Goal 3. Recommendations must be simple to understand and to apply. • Goal 4. The user must be able to refuse recommendations. • Goal 5. The system must provide indicators of the consistency and completeness level achieved in every step. • Goal 6. The system should be easy to adapt to other types of preference relations. • Goal 7. The system should be easy to incorporate to Web-based GDM models and decision support systems 3 , 1 0
3.2. Actual
Implementation
We will now detail how we have dealt with every requirement and goal that we have presented in the previous section. To do so we will make use of a snapshot of the system (figure 1) where we will point out every implementation solution. Implementation of the Interface Requirements: LhoDiIng a Small Car
a
gfe €& c=* e& * » . °:r ^
r *t
J
CAT
I
a
ttoA
..ra
'".
£3* ^ •-L'" ( iimpltniif i * I evil
Figure 1.
Snapshot of the Support System
GR 1. The interface has been homogenised in order to present a easy to understand view of the process which is being carried. We have introduced 3 main areas: In area number (1) we present the fuzzy preference relation that the expert is introducing, as well as a brief description of every alternative. Area number (2) contains several global controls to activate/deactivate certain functions, as well as to finish the input process. Area number (3) contains different measures that show the overall progress (see below). G R 2. Shortcuts have been added to the most frequent options and the input text areas for the preference values have been ordered to access to them easily using the keyboard. G R 3. Our systems provides recommendations (4) and consistency and completeness measures (5) (see below). All controls have tooltips. G R 4. With every change that the user makes to his/her preferences the system provides new recommendations and measures. G R 5. Incorrect inputs are prompt with error messages. G R 6. We have introduced undo and redo buttons (6). G R 7. The user can choose at every moment which preference value
431 wants to give or update, as well as enabling/disabling options. • G R 8. All information is presented in a single screen. Logical Goals: • Goal 1. To offer recommendations, the system computes all the missing values that could be estimated by using equation 5 and it presents them in area (1). As the values are computed taking into account the additive transitivity property, the recommendations should tend to increment the overall consistency level. They are presented in a different color (gray) (4) to be easily distinguishable from the proper expert values (7). • Goal 2. When the expert introduces or updates a preference value all possible recommendations are recomputed and presented. • Goal 3. Recommendations are given in the same manner as the user inputs his/her preferences. There is also a button that enables the user to accept or validate a given recommendation (8). • Goal 4. A user can choose any value for a particular preference degree ignoring all the recommendations. • Goal 5. In previous works 2 we provided some measures of the consistency and completeness of fuzzy preference relations (5). The consistency measure for a particular F P R Ph (called clh) is based on the error that can be computed between the p^k values that the expert e^ provides and the cp\k values that can be estimated using expression 5. The completeness measure (Ch) is obtained as a ratio between the number of values given by the expert (#EVh) and the total number of values that the expert should give to have a complete FPR. In our system we also combine these two measures into a global consistency/completeness measure that informs the expert with his/her current degree of consistency and completeness: CCh = clh • Ch
(6)
• Goal 6. As the system is programmed following the principles of Object Oriented Programming, to adapt it to new kinds of preference relations is an easy task. • Goal 7. As the system is Java based, it is easy to incorporate it into a web-based environment. 4. Conclusions and Future I m p r o v e m e n t s In this paper we have presented an interactive support system which aids experts to provide consistent preferences and to help them to avoid incom-
plete information situations in GDM environments where the opinions must are provided as fuzzy preference relations. The system works providing easy recommendations while the expert gives his/her preference values, always trying to maximize the consistency of the expert's opinions. In the future we will extend the system to allow the use of different preference relations (linguistic, interval-valued and multiplicative preference relations, for example) and we will integrate it into a complete consensus reaching process to enrich the preference acquisition step in the process. Acknowledgments This work has been supported by the Research Project TIC2003-07977, and the EPSRC research project "EP/C542215/1". References 1. Alonso, S., Chiclana, F., Herrera, F., Herrera-Viedma, E.: A consistency based procedure to estimate missing pairwise preference values. International Journal of Intelligent Systems, in press. 2. Alonso, S., Herrera-Viedma, E., Chiclana, F., Herrera, F., Managing Incomplete Information in Consensus Processes. Proc. of Simposio sobre Logica Fuzzy & Soft Computing (LFSC2005), Granada (Spain) (2005) 175-182. 3. Bhargava, H.K., Power, D.J., Sun, D.: Progress in Web-based decision support technologies. Decision Support Systems, In Press. 4. Chen, Z.: Interacting with Software Components. Decision Support Systems 14 (1995) 349-357. 5. Chiclana, F., Herrera, F., Herrera-Viedma, E.: Integrating three representation models in fuzzy multipurpose decision making based on fuzzy preference relations. Fuzzy Sets and Systems 97 (1998) 33-48. 6. Herrera-Viedma, E., Herrera, F., Chiclana, F., Luque, M.: Some issues on consistency of fuzzy preference relations. European Journal of Operational Research 154 (2004) 98-109. 7. Herrera-Viedma, E., Chiclana, F., Herrera, F., Alonso, S.: A group decision making model with incomplete fuzzy preference relations based on additive consistency. Technical Report #SCI2S-2004-U- University of Granada (2004). http://sci2s. ugr. es/publications/ficheros/TechnicalReportSCI2S2004-ll.pdf 8. Kacprzyk, J.: Group decision making with a fuzzy linguistic majority. Fuzzy Sets and Systems 18 (1986) 105-118. 9. Tanino, T.: Fuzzy preference orderings in group decision making. Fuzzy Sets and Systems 12 (1984) 117-131. 10. Zhang S., Goddard, S.: A software architecture and framework for Webbased distributed Decision Support Systems. Decision Support Systems, In Press.
A M O D E L OF D E C I S I O N - M A K I N G W I T H LINGUISTIC INFORMATION B A S E D O N LATTICE-VALUED LOGIC
J U N MA, SHUWEI CHEN, YANG XU Department E-mail:
of Mathematics, Southwest Jiaotong University Chengdu 610031, China [email protected], [email protected], [email protected]
In this paper, a model for decision-making with linguistic information is discussed based on uncertainty reasoning in the framework of lattice-valued logic through an example. In this model, decision-making process is treated as an uncertainty reasoning problem, in which decision-maker's background knowledge about the problem at hand and consultancy experts' assessments on alternatives are regarded as the antecedents of the uncertainty reasoning, the final decision is taken as the conclusion of the uncertainty reasoning, respectively.
1. I n t r o d u c t i o n Linguistic decision-making study is an important issue in linguistic information processing. A great number of works have been presented and applied in various fields, such as multi-source information fusion and aggregation [2,3,9,11]. As far as the primary processing strategies are concerned, most existing approaches are computational means and always have a common presupposition that the final decision is hidden in the evaluated alternatives, which can be picked out by computation. However, in our opinion, there is a great deal of common ground between this process strategy and uncertainty reasoning: 1) the final result is predefined and no unexpected conclusion need to be evaluated; 2) some transcendent knowledge is taken as starting point of processing, which is accumulated and solidified from human's experience; 3) operating is established in a certain interpretation and application environment, which links the transcendent knowledge and the final result. So, it is rational to study decision-making in terms of uncertainty reasoning in the framework of the classical logic or some non-classical logics. In this paper, we shall discuss a linguistic model for a kind of multicriteria decision-making problem based on lattice-valued logic. Concretely,
433
the rest of paper is organized as follows: in Section 2, we give some notations and assumptions firstly, and then illustrate the model through an example; in Section 3, we discuss the merits and disadvantage of the model. 2. M a i n R e s u l t s 2 . 1 . Model
Description
Firstly, we shall suppose that Q is a real decision-making problem, L is a lattice implication algebra (LIA) [5], E = { e i , e 2 , . . . , e/t} is the set of consultancy experts, A = {a\,a2,... ,o„} is the set of alternatives, T = {*ii *2; • - •} is the set of all linguistic terms, F = {/i, /2, • • •, fm} is the set of evaluation factors; G is the decision goal, W = {ui\,W2, • • •, wm] is the set of generalized weights for evaluation factors, W C T, and fi = {u>i,u>2,... ,Wfc} is the set of generalized weights for consultancy experts, OCT. Secondly, we give following assumptions. A s s u m p t i o n 2.1. For each e G E, it is associated with three
mappings:
(1) &^ : T<e) —> Int(L), where T^ CT is the set of linguistic terms the expert e prefers to use, and Int(L) is the set of all intervals of a lattice implication algebra L [4J- This mapping means intuitionally the expert e 's understanding of each linguistic term t £ T^. (2) W^ : W —> T^, which means the expert e's opinion on the importance of each factors. (3) s*V> : AxF —> 0>{TM), which means the expert e 's assessment to all alternatives on all evaluation factors. For convenience, we shall denote (/(a),TJ e J ) as the assessment of the expert e on a in terms of the factor f, T^f G &>{T^). A s s u m p t i o n 2.2. Each f G F is linked to a fact ((\fx)f(x) =» G(x),w), where w G W is the assigned weight by the decision maker to the factor f. A s s u m p t i o n 2.3. Each a £ A is associated with a set of synthesized assessments 5 ( e i ) ( a ) , S ( e 2 ) ( a )> . . . , S ( e f c ) (a), and S(a), where S^(a) is the synthesized assessment of the expert e, to a, and S(a) is the synthesized assessment of all consultancy experts. These synthesized assessments have the following forms: m
S^(a):
(hGvWMa)),
i = l,2,...,fc
435
and S(a):
(SM{a)A---AS(-ek)(a)^G{a),t{a)),
where t(a) 6 T is a set of linguistic evaluation factor fj, j = 1 , 2 , . . . , m. A s s u m p t i o n 2.4. The decision to T*- ' C T, such that each u t e T^ which means the effect decision. Hence, we can describe
term(s),
(1)
Gij(o) is the decision by
maker has defined a mapping from CI G fl is attached to a linguistic term of the consultancy expert e on the final the effect of the consultancy expert e by
((^X)S^{X)=>G{X),LJ).
So, the decision process is aiming at getting 5^ 6 l '(a) A • • • A 5^e*^(a) => G(a) and t(a) from the facts ((Va:)S<e<>(:r) => G{x),Vi), {{\fx)fj(x) => G{x),Wj), and ( / j f a p ) , ^ ^ ) , where i = 1, 2, . . . , k, j = 1, 2, . . . , m, p= 1, 2, . . . , n. 2.2. Example
for
Illustration
Due to the limitation of paper length, we shall illustrate the model through the following example, which is taken from [7]. E x a m p l e 2.1. Consider the evaluation of university faculty for tenure and promotion. The evaluation factors used at some universities are / i : teaching,
fa
: research,
fa
: service.
Five alternatives ai, a?, 03, 04, and 05 are to be evaluated using the linguistic terms: TE = {^o = extremely poor,
t\ = very poor,
£2 = poor,
£3 = slightly poor,
i 4 = fair,
£5 = slightly good,
te = good,
t-j = very good,
is = extremely good},
by four consultancy experts e\, e-z, e$, and e 4 . Step 1. (selecting LIA) The selected LIA L — {ZQ < z\ < • • • < z$} is shown in Figure 1. Here, we shall not distinguish the difference between the consultancy experts' preference and the evaluation factors' importance, and take the LIA as the common universe of discourse for convenience. In fact, we can select different LIAs for them.
436
Int(t) Figure 1.
Linguistic terms for evaluation.
Step 2. (interpreting linguistic terms) Each expert e £ E gives the definition for each linguistic term who prefers to use. Without loss of generality, suppose all consultancy experts' select the same linguistic terms and give the same definition for each, which are also marked in Figure 1.
weights for evaluation factors
Figure 2.
weights for consultancy experts
Weights for evaluation factors and consultancy experts.
Step 3. (interpreting linguistic weights) Each weight w £ W (u> £ £1) is a linguistic term, which reflects the effect of the corresponding factor (consultancy expert) on the final decision. Suppose the weights for evaluation factors and for consultancy experts are shown in Figure 2, where Si is "very important", S2 is "important", u\ is "very important", and u?, is "slightly important". The weights for each evaluation factor and for each consultancy expert are listed in Table 1.
Table 1. perts.
Weights for evaluation factors and consultancy ex-
expert weight
ei
e2
e3
U2
Ul
U2
e4 ui
factor
/i
h
h
weight
S2
si
S2
Step ^. (evaluating alternatives) Suppose the consultancy experts assessments are shown in Table 2 [7]. Table 2.
Assessments of all experts.
Expert ei's assessments: fi ax 12
03
14
15
fl
*4,*5
t7i*8
*7,t 8 *6,*7.*8
t3,t4,*5 *5|t6
t7,t8 *5,t6,*7 t6,*7
"3
14
15
*6,t7
*6,t7,*8
t6,*7,*8
*6,t7
t7,*8 *6,*7 t4,t5,t6it7
t7,ta
f2 *5,*6 *6,t7 h *5,*6>*7 t6,*7i<8 Expert e2's assessments: fi oi «2 fl h,t6 t4,45,*6 J2 t7i*8 /3 *4,t5 *6,*7,*8 Expert e3's assessments: fi ox 12 f\ *6,t7 *4,t5,*6 /2 /3
t7,*8 *5,*6
*6)*7
Expert e4's assessments: fi "l «2
/i fl /3
h,te t6 1 *7i*8 *S)t6i*7
*7.<8 *6,*7
13
04
15
*7,*8 *6,*7 *5>t6
*6,*7 t4,<5 *7,'8
*5,*6,t7 t4,t5,t6,t7
13
04
15
*6,t7,*8 *6>*7 *7,*8
t4,t5,*6 *6.*7 *5,*6
*5>*6,t7 *4,*5
Step 5. (rearranging assessments) We shall select the smallest interval of the selected LIA L to cover all linguistic terms for each expert's assessments. For example, the expert ej's assessment on the alternative a\ in terms of the evaluation factor f\ is covered by the interval [25,ZQ}. The rearranged assessments are shown in Table 3. Step 6. (constructing deduction) Taking expert e's assessments for alternative a and background knowledge for evaluation factors as antecedents, we can construct deduction of S^ia) from (/}'(a),&/W{a,fi)) and (iyx)f\e'(x) =4> G\e'(x), u>i), i— 1 , . . . , m. The construction has two stages: (1) { ( / | e ) ( « ) y W ( « , / i ) ) } ^ ! e ) ( 4 (2) {G<e)(a),i = l,...,m}r-S( e )(a).
i = l,...,m.
Table 3.
Rearranged assessments of each expert.
Rearranged expert e\ 's assessments. a$ 0-2 "3 h a\ h [25, 26] [23, 25) [23,,24] [25, 26] h [23,.25) [24, 26) 1*5,,26) 1*2, 24) h 1*3, 2 6 | [24, 26) [24,, 26J 1*3, 25] Rearranged expert e2's assessments. 04 a\ as 12 fi
as
h h h
[25, 26] [24, 26J [23, 26]
[24, 26] [25, 26] [23, 24]
[23, 25]
1*3, 26) [24, 26)
[24, 26] [23, 25] [24, 26]
[24, 26] [23, 25) [24, 26]
"5
[25, 26]
1*3, 26] [*4, ,2 6 j
Rearranged expert e3's assessments. (24 a\ 12 a.3 fi
as
h h h
[24, 26] [25, 2 6 ] [23, 2 5 ]
[23, 25] 1*3,,*6] [*3, 2 6 ]
fi
a\
<*2
h h h
[23, 25] [24, 26) [23, 26 J
[25, 26) [24, 26) [24, 26)
[25,,26] [24, z6] [23, 251 [24,,26] [*3, 24] [23, 25) [23,,25] [25, 26] [24, 26) Rearranged expert e4's assessments.
az [24, 26] [24, 2 6 )
N, 2 ) 6
04
«5
[23, 25] [24, 26)
[23, 25] [23, 26) [23, 24]
1*3, 25]
By properties of lattice-valued logic based on LI A, the construction is trivial. Readers can refer to [4,6] for more details. Step 7. (computing t(a)) According to the constructed deduction sequences of S^{a), the synthesized assessments for each alternative are listed in Table 4. Table 4.
Synthesized assessments for each alternatives.
ei
ai
02
«3
04
05
ei
[23,25] [20,20] [24,24] [23,24]
[23,24] [22,24] [22,24] [23,25]
[20,20] [22,25] [23,24) [23,25]
[23,24] [22,25] [23,24] [23,24]
[22,25] [23,25] [22,24) [22,23]
e2 e3 e4
Step 8. (aggregating consultancy experts assessments) To aggregate the synthesized assessments of all consultancy experts, we shall construct another deduction of S^ei)(a) A • • • A 5 ( e t ) ( a ) => G(a) and t(a) from the facts ((yx)S^6i^(x) =>• G(x), u>i), i = 1, 2, ..., k. Because the construction is trivial, we only give the aggregated results in Table 5, where the numbers Vij are occurrence of the corresponding linguistic terms Sj in the synthesized assessments for a{.
Table 5. Aggregated synthesized assessments for all alternatives. Vij
so
«1
«2
«3
S4
S5
S6
S7
S8
en
1 1 1 0 2
0 1 0 0 2
1 3 2 3 3
3 4 3 4 4
3 4 3 4 4
2 2 2 2 2
0 1 2 1 1
0 0 0 0 0
0 0 0 0 0
12
a.3 a.4
as
Step 9. (selecting appropriate decision) To make the final decision, the decision maker has many methods. Here, we number each linguistic term's index as its score, i.e. [SJ] — j (j — 0 , 2 , . . . , 8), and compute the average score for each alternative: s(a,i) = 5^i = o( v u case the average score for each alternative is s(oi) = 3.30,
s(a2) = 3.19,
s(a3) = 3.62,
x
lsj])/Ylj=ovij-
S° in this
s(a 4 ) = 3.57,
s(a5) = 2.89.
Hence 0,3 is the best alternative. The conclusion is the same to that in [7]. If we take the sum of scores as the selecting criteria, then 02 is the best, which is the same to [1]. From this example, it can be concluded that different selecting criteria will lead to different results. 3. Conclusion In the present work, we discussed a model for processing linguistic information in a kind of decision-making problem based on lattice-valued logic. In this model, linguistic information needn't be placed symmetrically in linear order, although the example is illustrated based on a linear placed linguistic term sets. Moreover, each experts, as in fact a source of information, can use his/her own operating logic to present his/her opinions. All these strategies are easy to be applied to multi-source linguistic information processing. Notice t h a t the reliability of the conclusion in an uncertainty reasoning will decrease rapidly with the increasing of the length of inference sequence, we should construct as short as possible inference sequences in order to keep the reliability above a satisfying level. Hence, the choice of the underlying logic system plays an crucial role in the model. More work is needed. Acknowledgement We sincerely appreciate the anonymous referees for their valuable comments and suggestions. The work is supported by the National Natural
440 Sciences Foundation of China with granted number 60474022, and ChinaFlanders Bilateral Scientific Cooperation Joint Project with granted number 011S1105. References 1. N. Bryson and A. Mobolurin, An action learning evaluation procedure for multiple criteria decision making problem, European J. Operational Research, 1995, 96: 379-386 2. F. Herrera, E. Herrera-Viedma, and L. Martinez, A Fusion approach for managing multi-granularity linguistic term set in decision making, Fuzzy Sets and System, 2000, 114: 43-58. 3. F. Herrera and E. Herrera-Viedma, Linguistic decision analysis: steps for solving decision problems under linguistic informaiton, Fuzzy Sets and Systems, 2000, 115: 67-82 4. J. Ma, Studies on Lattice-Valued Logic and Its Applications, Post-Doc Research Report, Southwest Jiaotong University, Chengdu, 2005 5. Y. Xu, Lattice implication algebra, J. Southwest Jiaotong University, 1993, 28(1): 20-27 6. Y. Xu, D. Ruan, K. Qin, and J. Liu, Lattice-Valued Logic - An Alternative Approach to Treat Fuzziness and Incomparability, Germany: Springer, 2003 7. Z. Xu, Uncertain linguistic aggregation operators based approach to multiple attribute group decision making under uncertain linguistic environment, Information Science, 2004, 168: 171-184 8. R.R. Yager, Inference in a multiple-valued logic system, Internal. J. ManMachine Stud., 1985, 23: 27-34 9. R.R. Yager, On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Systems, Man, and Cybernetics, 1988, 18: 183-190 10. R.R. Yager, An Approach to Ordinal Decision Making, Internal. J. Approximate Reasoning, 1995, 12: 237-261. 11. X. Zeng, Y. Ding, and L. Koehl, A 2-Tuple Fuzzy Linguistic Model for Sensory Fabric Hand Evaluation, in: D. Ruan and X. Y. Zeng, eds., Intelligent Sensory Evaluation-Methodologies and Applications, Germany: SpringerVerlag, 2003.
INFORMATION INTEGRATION BASED TEAM SITUATION ASSESSMENT IN AN UNCERTAIN ENVIRONMENT JIE LU, GUANGQUAN ZHANG Faculty of Information Technology University of technology Sydney, POBox 123, Broadway, NSW 2007, Australia Email
{jielu,zhangg}@it.uts.edu.au
Abstract: Understanding a situation requires integrating many pieces of information which can be obtained by a group of data collectors from multiple data sources. Uncertainty is involved in situation assessment. How to integrate multi-source multi-member uncertain information to derive situation awareness is an important issue in supporting decision making for crisis problems. The study focuses on how uncertain situation information is presented, integrated and finally how situation awareness information is derived. A multisources team information integration approach is developed in the study to support a team's assessment for a situation in an uncertain environment. A numerical example is then shown for illustrating the proposed approach. Keywords: Information integration, situation assessment, fuzzy numbers, team
1. Introduction Decision making for a crisis problem often depends on the awareness of decision makers for a situation. Situation awareness (SA) is defined by Endsley [3] "as the perception of elements in the environment, the comprehension of their meaning in terms of task goals, and the projection of their status in the near future." The process of achieving a SA is called situation assessment or situation analysis. Situation assessment is based on acquired situation information that can be implicit or explicit. Awareness information is derived as the results of situation assessment. SA has been largely studied as an important element in diverse military and pilot systems using observation, experiments and empirical methods. Recently SA has been recognized as an element to support emergency management and crisis problem finding. This trend requires the development of general situation assessment models and approaches. Although the elements of SA vary widely between domains, the nature of SA, as the output of a kind of information systems, and the mechanisms used for achieving SA can be described generically. The enhancement of SA information processing approaches and techniques has become a major design goal for situation assessment systems [6]. However, comparing with the results conducted in aviation environments by experiment methods, the research in general situation information processing has received less exploration. This study will combine the main features of both descriptive and prescriptive assessment approaches to show a general process for team situation 441
442 assessment under some uncertain factors in deriving situation awareness for supporting decision making. SA is aware of a situation, can be an objective or an environment, based on the perception of what is happening or has the potential to happen [9]. If a department or a command centre is to make the right decisions for a crisis problem, it must be able to assess the situation, the threats and the opportunities faced, and create awareness for the situation. Information about a situation often comes from multiple data sources. Awareness or decisions based on information from a single source may be sub-optimal or incorrect. It is often the convergence of evidence from various sources that provides an accurate and reliable result [2]. Also, information about a situation is often collected by multiple observers or data collectors as a team. A collector observing a situation and collecting date from the same source but in different time slots may report different results. Because the collector may have difference personal views for a situation in different time slots and a situation may change in different time slots. Therefore, team SA is proposed to concern how to integrate awareness of individuals in such a complex situation [1]. The data fusion community is more commonly to refer such a situation as situation assessment [4]. Uncertainty is involved in both situation assessment elements and situation assessment process. First, information about a situation is hard to be captured precisely and completely. Situation observers or collectors may don't know exactly what information will relate to a possible problem and may put their personal views into the information they obtained. Accepting as a given that the situation information is incomplete and uncertain, the perception of SA can therefore only be obtained through processing the uncertain or incomplete information. Second, when multiple data collectors are involved in a situation assessment, they can be given different relevant degrees or weights based on their experiences and knowledge for the situation. Experienced collectors will have higher weights, while inexperienced collectors, with lower weights. The third, when multiple information sources are used for assessing a situation, these sources may have different believe degrees. Data collected from some sources may be more reliable and truthful than of others. Therefore, an appropriate aggregation of information from different sources and different collectors with a consideration of the effects of uncertain factors contributes crucially to the correct assessment for situations. Many aggregation or fusion approaches have been reported in literature. This research aims to develop a team situation assessment approach to integrate uncertain team assessment information for a situation. Following the introduction, Section 2 proposes multisources, multi-member information integration issue. An information integration approach for team situation awareness is presented in Section 3. A numerical
443 example is shown in Section 4 for illustrating the proposed approach. Conclusions and further study are discussed in Section 5. 2. Multiple data sources and team information integration Understanding a situation requires mentally integrating many pieces of information, including both that information exists for a relevant situation, how it is interrelated to the situational context, the information sources' situation, and the information collection team members' background. Information integration is an increasingly important element of SA systems. The process of information integration uses overlapping information to detect, identify and track relevant objects in an environment which may be reflected in multiple sources. A team's situation assessment is based on individuals' assessment result which aggregates each individual's assessment for multiple sources. Performing assessment individually on each data source will minimize the potentially harmful impact of background noise derived from various sources. As this study will incorporate uncertainty of information collected into an integration process, the information integration process can be completed under four levels which can be implemented by four steps: 1) each individual collector's information integration for his/her multiple observations to each single data source (or multiple sources but with the same believe degree); 2) each individual collector's information integration for his assessment to all data sources; 3) individuals who have the same relevant degree (weight) is aggregated; and 4) all individual collectors' situation assessment information is weighted integrated to conduct a synthesis assessment of the team. This approach first focuses on determining individual assessment from single data source to multiple sources for a situation. It then considers the synthesis information assessment result of the team. Workload, time, stress, inexperience at assessing a problem all affect the assessment results. Some team members in some situations may not either know what information relate to the problem or may not provide all relevant information to the team. Also the team members may obtain different information from the same source at the same time, or have different understanding for an object. As a result, they may have different judgments and awareness for a situation. Therefore, the synthesis assessment results of a team must consider both the information sources' believe degree and members' relevance degree (weight). This approach also considers 'time-scale' issue to analyze individual information gathering. In actual work situations, such as an emergency coordination centre, a team often works exclusively on one or few time-scale, and distributes goals and information to higher or lower levels in the hierarchical web, who are then able to consider other time-scales. The team situation forming process is a consequence of the information filtering, coordination and aggregation. This proposed approach identifies team situation
444 assessment factors, including the believe degree of data sources, the time-scale of data collections, and the weight of collectors. It will apply target recognition approach [5] to conduct team assessment's coordination with consideration of uncertain factors. As team members may share many similar characteristics with persons functioning as individuals in dynamic environments, this approach adopts similar approaches for group uncertain issues, such as weights, as of a lone collector when the group of members has similar backgrounds. This approach uses these concepts to be developed for coordinating team information and deriving a team's situation assessment. This approach uses fuzzy numbers to deal with uncertain information obtained from multiple data sources, believe degrees of these sources and relevance degrees of individuals. To consider the believe degrees of sources and weights of members is to minimize the influence of incorrect understanding for a situation. The proposed situation assessment approach can therefore help teams achieve better decision-making processes in a crisis problem solving. 3. Multi-Sources Team Uncertain Information Integration Approach Let S = {S\,S2, ...,Sm},m > 2, be a given finite set of data source; C = {CUC2, ..., C„} be a given finite set of data collectors. We suppose that these data sources can be divided into s groups by their believe degrees. Such as, there are m{ sources with a high reliability, m2 sources with a rather high reliability, and ms sources with a low reliability, m=m\+..+ms. Similarly, these collectors is devised in to t types such as there are ti\ strong collectors, m rather strong collectors, and wt weak collectors, etc. n=n\+..+ns. Collector C, (/' = 1,2, ..., ri) obtains information ay from source 5} (/'= 1, 2, ...,m),ay(i= 1, 2, ...,n;j= 1, 2, ..., rri) is a fuzzy number and can take values from five fuzzy subsets (linguistic terms): SN {surely have not), MN {may have not), NS {not sure), MH {may have), and SH {surely have). It was recognized that subsets with normal-type membership functions ^:[0,1]->[0,1] The centers of membership functions z of the associating numerical values to linguistic labels of the fuzzy subsets in an intuitive way: 0: SN, 0.25: MN, 0.5: NS, 0.75: MH, 1: SH. The individual collectors' assessment values (linguistic) are effectively combined to produce a synthesis values. The approach is described by following four steps: Step 1: integration of each individual's information of multiple observations for single source (or multiple but with same believe degree) The approach firstly integrates the assessment information of each collector obtained from one or multiple data sources with same believe degree. The multiple values of each collector present the collector's observation for a object in different
445 time slots, and for multiple objectives which have equivalent believe degree. These values are combined according to the believe degree (reliability) of the respective data sources. These data sources can be classified into high reliability sources, medium reliability sources and low reliability sources for example. The step aims to obtain individuals' assessment for a situation from one or few equivalent data sources the average operator is therefore used as follows. m
\ a.,
m
m
i a.,
'-t a.
.
m
* a
C , 1 = E ^ ; C 1 , = S - ^ ; . . , Q _ 1 = 2 : ^ ^ and Cls = £ ^ - ,
/ = 1, 2, . , „,w
here m\ = mi — lt, } . m, = m, /,- is the number of empty value for ith source. Cy (i = 1,2,..., n,j = 1,2,..., s) is the ith collector's average assessment value for they'th group of data source. Step 2: integration of each individual's information for multiple sources We suppose Rh R2, .... Rs represent the believe degrees of these group of data sources using fuzzy numbers with normal-type membership functions. Obviously, from data high reliability sources will be integrated with a higher priority to the integrated result for each collector. This step aims at producing a value for each collector by combining all integrated values obtained in Step 1. By the step, each collector has obtained a value as his/her assessment. C^t^fCy,
i = l,2,-,«.
H
Step 3: integration of equivalent individuals' information To integrate all collectors' assessment for an objective/environment, the difference between team members should be considered. Each group of members who have same relevant degree (weight) will be integrated as follows. J\ a„ i a„ vi a. J^, a„ a n dC C^Z—'Cm-lL — '-'C..,-!,^ » , = Z —> where £| = I nt - n. Step 4: integration of all weighted individuals' information to conduct a synthesis assessment for the situation We suppose wh wj,..., w, represent the weights of collector groups from strong to weak respectively. To achieve a value as the team situation assessment result, the step combines the results obtained in Step 3 with weights. Of course, higher priority is given to most strong collectors. i=l
The synthesis value obtained is as the assessment result of the team for a situation. Obviously, it is a fuzzy set. We then provide a crisp value through defuzzyfication
446 (centre of mass) as a team assessment result by using the following formula [7,8]: i
\xftR(x)dx \HR{x)dx o
4. An illustrate Example There are three data collectors to assess an environment from three data sources through multiple times observations. The nine collectors have three members with weights 'high', three 'middle', and three 'low'. The three sources have believed degree 'high', 'middle' and 'low' respectively. Basically, each collector visits source 1, 2 and 3 for three, two and three times respectively. Therefore, each collector has an assessment result: (SI, S2, S3) from source 1 with a high believe degree, (S4, S5) medium believe degree and (S6, S7, S8) low believe degree. As possible ata missing, some values could be empty. 0
x<0
SN = -4x + \
0<x
i.e.,
SN = u
X
,le[0,l]
0 0
1/4 < x x<0
4x - 4JT + 2
0 < x < 1/4 , , ..e„ 1/4 < x < 1/2
0
1/2 < x
MN--
0 NS--
MN=
4
4
X_ _X_ J_
u X
4
2
x < 1/4
4x-\
1/4 < x < 1/2 ' , , - 4* + 3 \/2<x< 3/4 3/4<x 0 x<\/2 0 4x - 2
MH = {
-4x + 4 0
\/2<x< '
3/4<x
3/4 ' ,
i.e.,
i.e.,
NS=
SH = \4x-3
— I _i. 2
u X ,»e|0,i] 4
MH =
4'
4
4
X I X , +1 4 2 4
<J X — + —,
Ae[o,ij
\<x
x<3/4 0
o,-A+I
3/4<x<\, \<x
i.e.,
A
SH = u X WW \_4
A. 4
Now we use the proposed approach to calculate the integrated situation assessment result for the team. Step 1: integration of each individual's information for single source C.„ =-NS '•"3
X 1 + -MH + -NS= <j X 3 3 -"(".'I —+ - , 4 3
X
5 +— 4 6
C,U=-SN '• W
+ -MN=
2
2
u X
C „ =-NS + -MN + -SH= u /I 1 ' 3 3 3 ^o,i| 4
4 + 8.
8'
Ae[0,l]
i. 1 -1 1 4 8
C1H=-NS • 2
+ -MN= u /t • + 2 *«[o.i] L 4 8'
CiL=-DY ' 2
C, H = DX =
+ ~DN= u J " i . + l , - i + i l 2 A«|o,i] [8 8 8 8 J
C,U=UN=
A 3
A* -', u A—+ -U[o.i] [ 4
+1 -,
4
4
2
2
-KM] [ 4
i6[o,i] [ 4 C.6 H=RY= '"
4
8
46[0,IJ [ 4
8
]•
2
'
- 3]
-uio.i] [ 4
4
4
c7U=-w+-Rr=
4
-111 4 2J
J
4 4J
C6M=D>'= w A | - + 2,I],
u 4 - +2 . - - + 1
J'
2
J40.il L4 4
A«to.ij [ 4 1
1
X\— + —,
io,-A+J
2 4
4
,U 3
u
J] L 4 < 1 ™, ' , , » , ,[•* 1 • 2 2 A6[o,i] [ 8 8 C,„=RY = u i i . + I , - i . + i l
c4,=-!-«r + 2Dr= u J i - + - , - - + i ,,t
6
C,L=DN= u n \_4 4 • ;u(0,
4J
Jx 5 A + -DY = u J A —+ - , o.i) L4 8 8
C.H=-RY
3'
+ -NS= u ^ 1 ++ 2 _ i+ 2 2 -t=lo,i] 4 8 ' 4 8
C2M=-MH ' 2
+
L -A + 1
c7H =iDK + -!-Dy+-oy = Dr= uu x\± + lA
4J
7,«
u 4-+-.--+-
3
3
4
J
Ciu=-RN "•" 2
+ -DY + -UN= V i - + - , - i - + i i 3 3 *«[».u L4 12 12 12 Jx I /I + -UN = u 2 -wo,J) L4 8 4
' * +1 Cw8 , = i i w + i DK= u XJ* — +—, —, 2 2 iMOj) [ 4 2 8 8J
C,„=-DN '"3
+ -UN + -DN = u / — + — . - - + 3 3 w Ll2 12 4 12
C,„=*r=
c,,=/(y= u 4-+-,--+)]
'
2
2
CSH=-DN '" 2
Ae[0,i] [ 4
8
4
C1L=-DY • 3
[o.i] L4
3
8
+ ^-UN = u / i . + I . - i . + i 2 Aeio.i] [ 8 8 4 2 7
u i | - +2 , - - + l
•ie|o.i] [ 4
2
4
w
^o.i] [4
2
4
J
Step 2: integration of each individual's information for multiple sources 1
2 '•"
3 '•"
6 ''
KMiT_24
9
72
„ 1„ 1„ 1„ J5A 11 C, = - C j „ + - C . „ + - C , , = u M—+—, i 2 3jt 3 jjr 6 u ^o.l]^24 24 _ 1„ 1„ !,, SX 7 2 3 6 a<*o,ij ^4 24 r -lr
*'r
-wjf'1*43
J r
7,« 3 7.K
2
6
7J. ^ 0 | ] |^4
?2.
3J
2
!
"
3
2l
*
2,i
6
3A 19l _ 1„ 1_ 1_ +—L C =-C +-C ,+-C ,= 24 24J 4 2 4K'•" 4t3 **4 6 w A 19] „ 1_ 1_ 1_ 4 24J 2 3 6 n
68 +
?2
1
7 2
r-'r
j.
«
2
.V
J r
>.« j I.M
6
i « i ] [48
-..jf"* .,t
4
48
,[5.1 11 u X]— + — , i.+ 1 /.fo.il [24 24 6 6 J i 13 1 23] •uio.i] [4 24 6 i<)0|]
|^ )6
3 ]6.
IU 4g
4.»1 4 g
j.
C,=-C,j,+-C,u+-C,,u A [ - + — , - - + —1 ' 2 " 3 " ( '-1 ^o.i] L6 24 4 24j
Step 3: integration of equivalent individuals' information j
1 3 2
3
1^, 3
,r3U 67 _ 8 M 52
J MX ^ 31 _7£ + 62' M
3 4 3 4 3 A«(o,i) [ 7 2 72 36 72 1„ 1„ 1^ J29A 155 83i 325
C„ + , + w =-C,+-C. +-C. = u A 3 3 ' 3 i
Step 4: integration of all weighted equivalent individuals' information /? = - Q + - C M + 2 c „ = u A 2
3
6
^l».'i
95/t
929
506/t
2005
432
6x432'
6x432
6x432
24 _
448 After defuzzyfication, a crisp synthesis value obtained for the team's assessment: x'R =0.5783. This describes the assessment result of the team for a situation. It could be the possibility of a risk which has the potential to happen. 5. Conclusions and further study This project develops a multi-sources team uncertain information integration approach by applying fuzzy set technique handing uncertain factors involved. It aims to minimize the influence of incorrect assessment results for a situation, and therefore support achieving better decision-making in crisis problems. Acknowledgements The work presented in this paper was supported by Australian Research Council (ARC) under discovery grants DP0559213. References [1] H. Artman, Team situation assessment and information distribution, Ergonomics, Vol. 43, No. 8, 1111-1128,2000. [2] S.Chang and S. Greenberg, Application of fuzzy-integration-based multipleinformation aggregation in automatic speech recognition, The IEEE Conference on Fuzzy Integration Processing, Beijing, 2003. [3] M. Endsley, Toward a theory of situation awareness in dynamic systems, Human Factors, Vol. 37, No.l 32-64, 1995. [4] M. Endsley and D. Garland, Situation awareness analysis and measurement, Lawrence Erlbaum, Mahway, New Jersey, 2000. [5] M M. Kokar and J.Wang, An example of using ontologies and symbolic information in automatic target recognition. SPIE Conference on Sensor Fusion: Architectures, Algorithms, and Applications VI, April, 40-50, 2002. [6] J. McCarley, C. Wickens, J. Goh and W. Horrey, A computational model of attention/situation awareness, The 46lh annual meeting of the human factors and ergonomics society, Human Factors and Ergonomics Society, 2002. [7] S. Murakami, S. Maeda and S. Imamura, Fuzzy decision analysis on the development of centralized regional energy control systems, The IFSA Symposium on Fuzzy Information, Knowledge Representation and Decision Analysis, Pergamon Press, New York, 363-368,1983. [8] R. Yager, and D. Filev, On the issue of denazification and selection based on a fuzzy set, Fuzzy Sets and Systems, Vol. 55, No. 3, 255-272, 1980. [9] W. Zhang and R. Hill, A template-based and pattern-driven approach to situation awareness and assessment in virtual humans, The fourth international conference on Autonomous agents, Spain, ACM Press New York, USA, 116123,2000.
SCHEDULING A FLOWSHOP PROBLEM WITH FUZZY PROCESSING TIMES USING ANT COLONY OPTIMIZATION SEZGlN KILIC Department
of Industrial Engineering, Air Force Academy, Hava Harp Okulu, Istanbul, 34149, Turkey, [email protected]
Yesilyurt
CENGIZ KAHRAMAN Department of Industrial Engineering, Istanbul Technical University, Islt. Fak., Macka Istanbul, 34367,Turkey, [email protected] Most of the work about flowshop problems assumes that the problem data are known exactly at the advance or the common approach to the treatment of the uncertainties in the problem is use of probabilistic models. However, the evaluation and optimization of probabilistic model is computationally expensive and the application of the probabilistic model is rational only when the descriptions of the uncertain parameters are available from the historical data. In this paper we deal with a permutation flowshop problem with fuzzy processing times. First we explain how to compute start and finish time of each operation on related machines for a given sequence of jobs using fuzzy arithmetic. Next we used a fuzzy ranking method in order to select the best schedule with minimum fuzzy makespan. We proposed an ant colony optimization algorithm for generating and finding good (near optimal) schedules.
1.
Introduction
Flowshop problems are made up of n similar jobs which have the same order of processing on m machines. The objective is to find a sequence of jobs which minimizes some measure of production cost such as makespan or mean flow time. In this paper we are interested in the permutation flowshop scheduling (PFS) problem similar to most of the research on flowshops and real-world production management practices, where the same job order is chosen on every machine. Hence a schedule is uniquely represented by a permutation of jobs. The problem is NP-hard, only some special cases can be solved efficiently [1], Although the PFS problem has often been investigated, very little of this research is concerned with the uncertainty characterized by the impression in problem variables. In most of the work about PFS it is assumed that the problem data are known precisely at the advance or the prevalent approach to the treatment of the uncertainties in the PFS problem is use of probabilistic models. 449
450 However, the evaluation and optimization of probabilistic models is computationally expensive and the use of probabilistic models is realistic only when description of the uncertain parameters is available from the historical data. In this paper we propose a schedule generation algorithm for the cases where the problem data are not known precisely and probabilistic models are not suitable because of its computational expense or absence of historical data. It is assumed that the planner is able to approximate the imprecise data by using fuzzy sets. The first application of fuzzy set theory on a flowshop problem as means of analyzing performance characteristics was by McCahon and Lee [2]. Ishibuchi et al. [3] examined flowshop problems with fuzzy due dates. Balasubramanian and Grossmann [4] applied a fuzzy approach to the treatment of processing time uncertainty. We proposed an ant colony algorithm for fuzzy permutation flowshop scheduling (FPFS) problem. As discussed above there is a little search on FPFS problem and there have not been any work on FPFS using an ant colony optimization (ACO) approach. 2. Formulation of the fuzzy permutation flowshop problem Using fuzzy numbers to represent the uncertainty in processing times is very plausible for real world applications. If a decision maker estimates the processing time (fa) of the job j on machine k as an interval rather than a crisp value then the interval can be represented as a fuzzy number. The use of interval \tjk — At ,ki, tJk + Atjk2 J is more appropriate than the crisp fa value. The fuzzy processing time, fk, can be represented by a triangular fuzzy number (TFN); TJk=(tjk-&Jtt,tJk,tJk+AtJk2)-
The addition and the maximum operators are required for calculating the fuzzy completion time. The addition operator does not distort the shape of triangular fuzzy numbers. In contrast, max {A,B} is not always a triangular fuzzy number while both A and B are triangular fuzzy numbers. Sakawa and Kubota [5] proposed an approximation for max {A,B} which keeps the triangularity as follows; max{^,5}= (a\a2 ,c?)v (b\b\tf)
* (a1 vb\a2 vb\a' vfe3)
(1)
When fuzzy data are incorporated in to the scheduling problem the measure of the makespan of alternative schedules are also fuzzy numbers. There exists a large body of literature that deals with the comparison of fuzzy numbers. More recently, Fortemps and Roubens (1996) have proposed the "area compensation (AC)"method for comparing fuzzy numbers based on compensation of the areas determined by the membership functions. They have shown that AC is a robust ranking technique which has compensation, linearity and additivity properties,
451 compared to other ranking techniques, yields results consistent with the human intuition. The AC of a fuzzy number is defined by; i
AC(X) = 0.5 \{xLa + xRa)da
(2)
0
If we are interested in minimizing the makespan, a schedule with fuzzy makespan Xj will be preferred over a schedule with fuzzy makespan X2 if AC(X!)
be the fuzzy completion time of job n(j);
c ... be the fuzzy completion time of job n(j) on machine k; We can find the fuzzy start and completion times for each job on each machine for a permutation n as follows;
c , ( A t = m a x f c y . ^ . c , ( A t _,) + TK{j)k y/ > 1, V* > 1 C
n(\),k -
C
+ £
n(\),k-\
-K(m ~ c«-(y-D,i C
+ L
«•(!),*
\/k > 1
*o),i
v/ > l
~ln(\\\
n(\\\
(3) (4) (5)
(6)
Once all jobs have been scheduled, makespan (M) can be obtained as follows: C
=c
M = maxC„,„ j=l,-.n '»)
(7) (8)
3. Proposed algorithm 3.1. Ant colony optimization approach Ant Colony Optimization (ACO) is a population based, cooperative search metaphor inspired by the foraging behavior of real ants. Real ants leave on the ground a deposit called pheromone as they move about, and they use pheromone trail to communicate with each other for finding shortest path between food resource and their nest. Ants tend to follow the way in which there is more pheromone trail. In ACO algorithms, artificial ants with the above described
452 characteristics and some specific additional features collectively search for good quality solutions to the optimization problem [7], The seminal work on ACO is Ant System (AS) [8] that was first proposed for solving the Traveling Salesman Problem (TSP). In Ant System, the ants are simple agents that are used to construct solutions, guided by the pheromone trail and heuristic information based on intercity distances. Since the work on AS, several extensions of the basic algorithm have been proposed with different names. The main difference between AS and these extensions are the ways the pheromone update is performed and, some additional details in the management of pheromone trails. Common structure of these ACO algorithms can be illustrated as follows [7]; Step 1. Initialize the pheromone trails and parameters. Step 2. While (termination condition is not met) do the following: Construct a solution, Improve the solution by local search, Update the pheromone trail or trail intensities. Step 3. Return the best solution found. 3.2. Description of the proposed ACO algorithm We proposed an algorithm based on MAX-MIN Ant System (MMAS) [9]. We made modifications as described below in order to schedule a FPFS problem. 3.2.1. Initializing the pheromone trails and parameters One of the main differences of MMAS from other ACO algorithms is that it limits the possible range of pheromone trail values to the interval \Tnin , Tmax J in order to avoid search stagnation. r and T max = 1 / Zgb' min = r max / a > w h e r e ZgTis t n e makespan of the best solution found up to now and a is a parameter. r max and Trrin are updated each time when a new ZJ£ is found. Tlp denotes the quantity of trail substance for job i on plh position. It gives the degree of desirability for an ant to choose job i for position p for the schedule being generated by itself. Initial values for Tip are set to r max , we used an initial solution randomly generated. 3.2.2. Construction of a solution by an artificial ant In ACO algorithms solutions are constructed by artificial ants and each of them is capable of generating a complete solution itself. In order to schedule the FPFS problem, each ant starts with a null sequence and makes use of trail intensities for selecting a job for the first position, followed by the choice of an
453 unscheduled job for the second position, and so on. Each selection is made with the application of a probabilistic choice rule, called random proportional rule, to decide which job to select for the next position. In particular, the probability with which the ant will choose job i for the/>,A position of its schedule is;
Pip=Jj£-,XieS
(9)
9 is the set of unscheduled jobs for the ant. After a complete sequence of jobs is generated by the ant, the completion time of each job on each machine and the makespan of the schedule are found by the equations (3)-(8). Approximation in Eq. (1) is used for maximum operation. 3.2.3. Improving the solution by Local Search We tried to improve the solution generated by the ant by searching its neighborhood. The set of possible moves is defined by a neighborhood of the current sequence. The insertion-move operates on a sequence of jobs and removes a job placed at p,h position and inserts it in (p-l)th or (p+l)th position of the sequence while relative order of other jobs are conserved. 3.2.4. Updating trail intensities After a complete sequence is constructed by each ant (an iteration) and possibly improved by the insertion-move, the trails are updated. In MMAS only one of the ants is allowed to add pheromone. The ant which generated the best tour in the current iteration or the ant which generated the best tour since the start time of the algorithm is chosen for trail updating. Let zf£! denotes the best makespan found by the ants in current iteration. The ant which generated the best schedule in the current iteration or the ant which have generated the best-so-far schedule updates the trails as follows,
^new __ J H'"ip
{P-Tip
| ybest ' rybest [
if job i is placed in position p in the best sequence otherwise
where p denotes the persistence of W
trail (0
T, = T"* will be used in Eq. (9) for selecting jobs for positions.
454 4. Computational experiments There exist many test problems for crisp case of flowshop problem and results of the different models can be compared easily. However the case is not similar and easy for fuzzy case. Same job sequence can give different results because of the approximation techniques used in fuzzy arithmetic or the ranking of the two schedules may change because of the selected ranking technique. We used the test problems in [4]. They proposed a mixed integer linear programming model and also applied Reactive Tabu Search (RTS) algorithm for the FPFS problem. They used seven and 21 point approximations for fuzzy arithmetic operations and AC for fuzzy ranking. They described 5 problems all with 4 machines and job numbers varying between 5 and 20. TFN's are used to represent the fuzzy processing time of jobs on machines. The first problem with five jobs is handled in order to inspect the process more in depth and to find parameter values for the proposed MMAS without a local search process. Parameter values are used as follows; number of ants=10, p=0.90, a=5 and 30% of the ants make global trail updating. Figure 1 displays the search process of the proposed algorithm for 100 iterations.
40
50 Iteration
Figure 1. Search process of proposed MMAS for the first problem
Best solution is found on the third iteration with the sequence [ 5 2 3 1 4]. As seen on Figure 1, diversity of search lessens as die iteration number increases because of the intensification of trails on specific positions for specific jobs. Figure 2 displays the trail quantities for jobs on each position of the sequence after 25 iterations, as seen on the figure there is not a clear intensification of jobs for specific positions.
455
0.005
Figure 2. Trail intensities after 25 iterations
On the other hand, as seen on Figure 3 after 75 iterations there is a clear intensification of trails for the best solution found up to now. The process is expected to stagnate close around the best solution and seemingly there is not need to wait for a new best solution. 0.045 0.04 0.035 0.03
I f 0.025 lT
0.02 0.015
nd i in
\ \
position
Figure 3 Trail intensities after 75 iterations
Table 1 gives the computational results for the other test problems. Each problem is solved with the proposed algorithm five times for 100 iterations. Best solutions and the averages of the found at iterations and CPU times are represented in Table 1 beside with the results in [4]. It should be noticed that, for the same sequence of jobs, different AC values may be computed because of the different approximation methods for maximum operation is used in proposed MMAS and RTS.
456 Table 1 MMAS and RTS performance for test problems
Number of Jobs 8 12 15 20
Best Solution (AC) 189.5 285.75 317.75 451
MMAS Found at iteration (average) 10.2 49 74.6 27
RTS CPU(s) for one iteration 0.091 0.146 0.207 0.276
Best Solution (AC) 187.46 285.75 317.08 451
Found at iteration 43 50 38 555
CPU(s) for one iteration 0.016 0.057 0.112 0.271
5. Conclusion The flowshop problem is NP-hard in crisp case and the complexity of the problem seriously increases when it is fuzzified. So, there is a demand for methods which can approach to the optimum solution in a reasonable time period. Even if we tried to investigate the performance of simple artificial ants on the solution space we were able to access good solutions in reasonable time periods. The performance of the proposed model can be increased (especially in means of solution time) by adding extra capabilities to the artificial ants like look ahead information or local search techniques. References 1. A.H.G. Rinnooy Kan, Machine scheduling problems: Classification, complexity and computations. The Hagues: Martinus Nijhoff, (1976). 2. C.S. McCahon, E.S. Lee, Fuzzy job sequencing for a flow shop, European Journal of Operational Research, 62 (1992). 3. H. Ishibuchi, N. Yamamoto, T. Murata, H. Tanaka, Genetic algorithms and neighborhood search algorithms for fuzzy flowshop scheduling problems, Fuzzy Sets and Systems, 67 (1996) 4. J. Balasubramanian, I.E. Grossmann, Scheduling optimization under uncertainty - an alternative approach, Computers and Chemical Engineering, 27 (2003). 5. M. Sakawa, R. Kubota, Fuzzy programming for multiobjective job shop scheduling with fuzzy processing time and fuzzy duedate through genetic algorithms, European Journal of Operational Research, 120 (2000). 6. P. Fortemps, M. Roubens, Ranking and denazification methods based on area compensation, Fuzzy Sets and Systems,S2 (1996). 7. M. Dorigo, T. Stutzle, Ant Colony Optimization, The MIT Press (2004). 8. M. Dorigo, V. Maniezzo, A. Colorni, The Ant System: Optimization by a colony of cooperating agents, IEEE Tansactions on System, Man and Cybernetics, 26 (1996). 9. T. Stutzle, H.H. Hoos, MAX-MIN Ant System, Future Generation Computer Systems, 16 (2000).
TIME DEPENDENT VEHICLE ROUTING PROBLEM WITH FUZZY TRAVELING TIMES UNDER DIFFERENT TRAFFIC CONDITIONS TUFAN DEMIREL* Department of Industrial Engineering, Yildiz Technical University, Yildiz-Istanbul, 34349, Turkey NIHAN CETIN DEMIREL Department of Industrial Engineering, Yildiz Technical University, Yildiz-Istanbul, 34349, Turkey Time dependent vehicle routing problem is a vehicle routing problem in which travel costs along the network are dependent upon the time of day during which travel is to be carried out. Most of the models for vehicle routing reported in the literature assume constant and deterministic travel times. This paper describes a route construction method for time dependent vehicle routing problem with fuzzy traveling times according to different traffic conditions.
1. Introduction Distribution and transportation networks become an important part of our daily life gradually. Vehicle Routing Problems (VRP) are concerned with the delivery of some commodities from one or more depots to a number of geographically scattered customers with known demand. The goal is to find routes for the vehicles, each starting from a given depot to which they must return, such that every customer is visited exactly once. Usually there is also an objective that needs to be optimized, e.g. minimizing the travel cost or the number of vehicles needed. Time dependent vehicle routing problem (TDVRP) is a VRP in which travel costs along the network are dependent upon the time of day during which travel is to be carried out. This problem has constrained time for delivery. The objective function of TDVRP becomes a composite function: • Maximize total number of customers served. • Minimize total number of customers unserved (if allowed).
* Corresponding author Tel.: +90-212-2597070 (2547) E-mail address: [email protected] (T. Demirel)
457
458 • • • • •
Minimize total backorder costs (if allowed). Minimize total lateness duration (if allowed). Minimize total number of vehicle used. Minimize total distance travelled. Minimize total costs.
The difference of ours study is acceptance of travelling times as fuzzy numbers. We allowed the travelling times as fuzzy numbers because in real life the travelling times between to nodes is not constant. Our aim for this paper is to improve nearest neighbour based heuristic algorithm for a problem with time constraint and fuzzy travelling time. The rest of this paper is organized as follows. Section 2 presents a brief literature review dedicated to different vehicle routing problems. Section 3 describes time dependent vehicle routing problem with fuzzy traveling times. Section 4 explains nearest neighbour based a heuristic method for our defined problem. And also in this section algorithm is used and results are given for a problem with different traffic conditions. Finally, section 5 concludes and proposes future avenues of research. 2. Literature Review Many studies on vehicle routing problem have been published. Liu and Shen [2] presented a route construction method for the vehicle routing problem with multiple vehicle types and time window constraints. They extended several insertion-based savings heuristics. Tan et al. [3] investigated and developed various advanced artificial intelligent (AI) techniques including simulated annealing (SA), and genetic algorithm (GA) to effectively solve the Vehicle Routing Problem with Time Windows to near optimal solutions. Moin [4] explanted two hybrid genetic algorithms developed to solve vehicle routing problems with time windows. Czech and Czarnas [5] presented a parallel simulated annealing algorithm to solve the vehicle routing problem with time windows. Hapke and Wesolek [6] suggested a mathematical model taking into account real constraints and goals of a concrete Polish transportation company. In their model both flexibility and uncertainty were handled by means of fuzzy sets. Lau et al. [7] explained a variant of the vehicle routing problem with time windows where a limited number of vehicle is given. Ichoua et al.[8] presented a model based on time-dependent travel speeds which satisfies the "first-in-firstout" property. An experimental evaluation of the proposed model was performed in a static and a dynamic setting, using a parallel tabu search heuristic. Donati et al. [9] described a time dependent model of the vehicle
459 routing problem, with delivery time windows, TDVRP, and the optimization algorithms used, based on the Ant Colony System and local search procedures. 3. Problem Description The VRP can be represented with an incomplete directed graph G(V, A), where A is a set of oriented arcs connecting pairs of nodes, and Fis die set of nodes of which one represent a depot, and the rest the customers. In this paper we consider the Time Dependent Vehicle Routing Problem (TDVRP) having the following features: 1. A single commodity to be distributed from a single depot to customers with known demand. 2. Each customer must be visited by exactly one vehicle. 3. Each vehicle has the same capacity. 4. Total service time of day has a time constraint. 5. Total service time of day is divided into time intervals which depend on traffic conditions of city. 6. Travelling, waiting, and unloading times include an uncertainty 7. The objective is to minimize the total number of vehicle used. As shown in Figure 1 traveling time in a day is changing according to a continuous function [10]. If we appropriate this time constant and deterministic, some issues will be appeared. As a sample; time constraint will be exceed, can not visit some customer or visit lately. These results will be occurred because of unforecast of the traveling times. Clearly, in real life according to different traffic intensity, the traveling times will be variable. Acceptance of this time as fuzzy numbers will approximate to a right conclusion. A
Traveling Time
Time of a day Figure 1. Travel time variation as a continuous function
In our studies we prevent all customer demands in time dependent by choosing the needed vehicle and routing. Because of the traveling times uncertainty we accept these times as a triangular fuzzy numbers (TFNs). A triangular fuzzy
460 number can be defined by a triplet (a, b, c). The parameters a, b, and c, respectively, denote the smallest possible value, the most promising value, and the largest possible value that describe a fuzzy event. The membership function is defined as: 0, l(x)--
x
(x - a)/(b -a),
a<x
(c - x)/(c -b),
b<x
(1)
x>c
0,
4. Nearest Neighbour Based Heuristic Algorithm Our designed heuristic algorithm is similar to nearest neighbour. In our studies we take into consideration, the time of nodes in order to distance between the nodes. Time between nodes are fuzzy triangular numbers. For choosing the minimize time intervals v/e used the Chui and Park's [1] fuzzy ranking method. The weighted method compares traveling times by assigning relative weights. The evaluation of the traveling time in the form of a TFN (a, b, c) is determined by assigning relative weight [1]. a+b+c
+ wb
(2)
where w=0.2 represent the relative weight as the magnitude. 4.1. A heuristic algorithm The basis logic of the algorithm is choosing the min. time interval node between nodes. During selection of the node, vehicle capacity and time constraint have been taken into consideration. The notation used in this paper is defined as follow: S
: set of unvisited nodes.
k
: numbers of vehicle used.
Dj
: demand of jth node (customer).
UC
: update capacity for kth vehicle.
Ty
: fuzzy traveling time \Ty = (cty, by, ctj )) from i toy.
461
Tip
: minimize fuzzy traveling timefromi to p for kth vehicle.
CT0p : cumulative fuzzy traveling timefrom0 (depot) top for kth vehicle. In our studies the scheme of the heuristic algorithm is designed as follow: k =\ Repeat f=0 Repeat j e SI S = {/ —> Y/', j is unvisited node and i & y J
k if (c/C* >o)and(uC
>min(Dj)] J*S
(cf0k < Total Time) then\peS\T*=
min
jf„}l
endif if (cfQkp © fip )< TotalTime
then Update
I let S = S {p} letUCk = UCk k
letCT
p
-D.
CTOp r
\let i = p
endif Until UCk <min(Dj) or^CTkp ®fip)> Total Time] jzS
letk = k + \ Until S = <j> A time interval and coefficient of traffic intensity have been taken into consideration in determining fuzzy traveling time as shown Eq.(3). Where ^ is a coefficient depending on traffic intensity for ith time interval and a time interval is definite time length depending on time of a day.
462
T- = v
A\(ly>mij>uij)
,Time interval 1
A2(lij,miJ,Uy)
.Timeinterval
^3Kij'mij>uij)
, Time interval n
2 (3)
4.2. A numerical example The case study will be seen in Figure 2. In this figure there are 6 customer nodes and single depot. The customer demands are given in Figure 2. In this case study the vehicles are unlimited and each vehicle has same capacities as 10 units. All nodes at traveling time intervals will be seen as fuzzy triangular numbers in Table 1.
0
©
3 units
6 units (£) 10 units
depot
©
©
5 units
6 units f O 4 units
Figure 2. The demand for each customer. Table 1. Fuzzy traveling times for all nodes. 0 0
1
2
3
4
5
6
(30,40,55)
(35,55,65)
(20,30,45)
(30,35,45)
(50,60,75)
(40,45,55)
(40,50,70)
(50,55,70)
(40,60,75)
(30,50,60)
(40,50,60)
(30,40,60)
(25,35,45)
(10,30,40)
(50,60,80)
(15,25,40)
(30,50,65)
(45,50,65)
(20,30,50)
(30,50,55)
1
(30,40,55)
2
(35,55,65)
(40,50,70)
3
(20,30,45)
(50,55,70)
(30,40,60)
4
(30,35,45)
(40,60,75)
(25,35,45)
(15,25,40)
5
(50,60,75)
(30,50,60)
(10,30,40)
(30,50,65)
(20,30,50)
6
(40,45,55)
(40,50,60)
(50,60,80)
(45,50,65)
(30,50,55)
(20.40,50) (20,40,50)
Obtained result using developed heuristic algorithm is showed in both Figure 3 and Table 2. On the result we determined 4 vehicle and found the route for each. And also for each vehicle's service times occurred including the last customer.
463 The service times from the last customer to the depot will not take into the consideration.
Figure 3. A solution for case study. Table 2, Detail results for case study. Vehicle Route 1 2 3 4
Depot-*3-»4->Depot Depot-* l-»5->Depot Depot->6-»Depot Depot->2->Depot
Used Capacity 9 9 6 10
Total Traveling Time (35,55,85) (60,90,115) (40,45,55) (35,55,65)
Time Constraint < 150 < 150 < 150 < 150
5. Conclusion The vehicle routing problem is a very important problem in distribution and logistic systems. Satisfying the customer demands exactly and just in time will be increasing the customer service quality. Because of the uncertain traveling times we used the fuzzy triangular numbers. Development of heuristic algorithm has come to a conclusion for time dependent vehicle routing problem with fuzzy traveling times with an empirical sample. For the future models, inserting the traveling times according to different traffic conditions will be allowed more real conclusion. References 1. 2.
C.Y. Chiu and C.S. Park, Fuzzy cash flow analysis using present worth criterion, The Engineering Economist, 39,2, 113-138, (1994). F.H. Liu and S.Y.Shen, A Method for Vehicle Routing Problem with Multiple Vehicle Types and Time Windows, Proc.Natl.Sci.Counc, 23, 4, 526-536 (1999).
3.
K.C. Tan, L.H. Lee, Q.L. Zhu and K. Ou, Heuristic methods for vehicle routing problem with time windows, Artificial Intelligence in Engineering 15,281-295,(2001). 4. N. H. Moin, Hybrid Genetic Algorithms for Vehicle Routing Problems with Time Windows, International Journal of the Computer, the Internet and Management 10, (2002). 5. Z.J. Czech and P.Czarnas, Parallel simulated annealing for the vehicle routing problem with time windows, In Proceedings of 10th Euromicto Workshop on Parallel Distributed and Network-Based Processing, Spain, 376-383, (2002). 6. M. Hapke and P. Wesolek, Handling Imprecision and Flexible Constraints in Vehicle Routing Problems:Fuzzy Approach, Report RA-005/2003, Politechnica Poznanska, (2003). 7. H. C. Lau, M. Sim and K. M. Teo, Vehicle routing problem with time windows and a limited number of vehicles, European Journal of Operational Research 148 559-569, (2003). 8. S. Ichoua, M. Gendreau and J. Y. Potvin, Vehicle dispatching with timedependent travel times, European Journal of Operational Research 144, 379-396, (2003). 9. A. V. Donati, R. Montemanni, N. Casagrande, A. E. Rizzoli and L. M. Gambardella, Time Dependent Vehicle Routing Problem with a Multi Ant Colony System, Technical Report IDSIA-17-03, (2003). 10. A. Haghani and S. Jung, A dynamic vehicle routing problem with timedependent travel times, Computers&Operations Research, 32, 2959-2986, (2005).
A PROGRAMMING MODEL FOR VEHICLE SCHEDULE PROBLEM WITH ACCIDENT CHUANHUA ZENG 1 ' 2 'College of Auto-mobile and Transportation Engineering, Xihua University, ChengDu City PR. China 610039, Phone: +86-028-89829140, E-mail: zchfirstm63.net YANGXU 1
Intelligent Control Development Center, Southwest Jiaotong University, Chengdu P.R. China 610031 WEICHENG XIE
}
College of Electrical & Information Engineering, Xihua University, ChengDu City P.R. China 610039 A good transport schedule plan may be discomfited when a transportation accident happens. In order to get an adjusted transport schedule, we put forward a model for the vehicle routing problem with stochastic in contingency, and discuss the algorithm by combining stochastic simulation with genetic algorithm. We verify this algorithm by applying it to a specific case, and eventually reach the conclusion that it is better than the ordinary way.
1. Introduction Logistics distribution plays a pivotal role in logistics system. It involves assigning goods, assembling goods and delivering goods to customers in time. In order to obtain the highest serving level with the lowest cost in this work, various VRP(Vehicle Routing problem) and VSP(Vehicle Schedule problem) problems are studied and many optimization algorithms are put forward, such as the accuracy Algorithm including Brach and Bound Approach121, Cutting Planes Approach, Network Flow Approach and Dynamic Programming Approach'51; the Heuristics Algorithm including Constructive Algorithm, Two-phase Algorithm151, etc. All the Algorithms above are studied for the optimization of the future vehicle schedule problem, while there is scanty literature dealing with the vehicle schedule in contingency. In effect, accidents in transportation often 465
466 affect the plan. For instance,(l) the vehicle team can't set up if driver is absent; (2) the vehicle can't undertake work if it's under repair; (3) the vehicle can't reach destination on time if it breaks down midway or gets stuck in traffic jam; (4) suppose there are some extra tasks prop up, need to be done as soon as possible; (5) and also, the transportation time may be affected by the vehicle's capacities, performances of workers and road conditions. On the whole, all these factors involved may pose big problem to the schedule. Since the transportation task may be interrupted by contingency, we've got to adjust the transport schedule plan under these circumstances by applying an optimization algorithm based on SVRP (Stochastic Vehicle Route Problem). 2. Problem Describing Suppose the transport schedule plan has been set, it needs to be adjusted by applying an optimization algorithm based on SVRP (Stochastic Vehicle Route Problem). Here SVRP means that VRP with some stochastic restrictions. We put forward a model for the vehicle routing problem with stochastic in contingency'11, and discuss here the algorithm by combining stochastic simulation with genetic algorithm. The supposed conditions are showed as follows: (1) The usual transportation schedule plan exists; (2) The plan is being performed while a vehicle goes wrong; (3) Each vehicle's loading capacity is limited; (4) Each vehicle serves various customers while carrying out its tasks along a certain route, and the vehicles in good conditions are expected to accomplish the assigned tasks; (5) Each customer's freight could be transported by one vehicle only; (6) Each customer's freight should be delivered in a certain time window; (7) The transportation time between two customers is stochastic. 3. Model 3.1. Some Notations Some notations in model are described as follows: n : The total number of customers served by vehicles in good condition; L : The total number of customers served by vehicles in contingency; n: Final total customers' number, here the customers served by vehicles in contingency are classified into two types : one is load-customers and another is unload-customers, and n = n + 1L ; m : The vehicles' number after contingency, m = m'+l,herem' is the number of vehicles in good conditions, and " 1 " means a new vehicle;
Qk: Capacity of vehicle k; o,: Weight of customer's freight, 1 = 1,2,..., n; Dy : Distance between customer Jand customer j . There are three vectors x0, y0 and t0: x 0 =(x ,x ,...,x ) is an integer vector, \<x'
0 0 ,
•
°
,0
I
J>
>J
> >
'
Yo =(yo.yo.-".yo I ) is an integer vector, y{,,£ = l,2,...,m is corresponding to the superscript of xo(i = l,2,...,n'), and .y,
= z
n+(21)
l •
The other two decision vectors x and y: x = (xi,x 2 ,...,x n ) is an integer vector, l<x,<«and xi*Xj,\,i = \,2,...,n . y = (y 1 ,y 2 ,...,y m ) is an integer vector, yk (k = l,...,m) is corresponding to the superscript ofx;(i = l,2,...,n,...,n + 21), andyx
/, Jx
yk-i*JK
(x,y,t) = fx '•"
+ TX
'
(x,y,t)vax
•/x«-i+y-iv
, y
'
+ SX * « - I + /-I
*,k-)+J-ixn-i+J
2<=i<=yk-yk]
Let g(x, y) be the total distance covered by all vehicles, then we get that m
g(x,y) =
^gk(x,y), k=\
+£ here gk(x,y)= Z A V . + I W
+D
*yko>yk >yk-\
(i).
tyk =yk-4 We know that v £
' (>'i-l+1.>'i-J+2,...,>'it),
yk
yi-\
£ « - * - J^qxj
For each customer, we get that PT{fi(x,y,te[ai,bi]}>/3„i = l,2,...,n.
j=yk-i+i
(2)
3.2.
Model
We get the final model: min g(x,y) s.t. Pr{// (*, y,0 e K .*<]}* ft, i = 1,2,...,« 1 < *, < «,;' = 1,2,..., n x,- * Xj,;' * y,;',y = 1,2,...,« Q
>*0
i—>*o ' — l y»-i+l' ^t-i+l
if z) e{Xyt_l+i,Xyt_l+1,...,xyk), ^i e{Xytj+uXyk]+u...,xyi:},
>"*'
then and z's is behind z,
yk
yi-\
y=r*-i+i
J=yk-t+l
and i =
yk-i+lyk-]+2,...,yk
Xj, yt e Z, i = 1,2,..., n, j -1,2,..., m -1. {*o'~l+1'*o*",+2•••••*<>*} certain order.
is
corresponding with {*yt_,+i,*r*-,+i >->*>'*} by
4. To Construct the Chromosome We are considering solving this problem by combining stochastic simulation with genetic algorithm1'1, and how to construct the chromosome is described as follows. We take the chromosome as an operation plan, while genes x, y have the same meaning as decision vector, here we omit t because t is certain(s , here t k is the time when a new decision needs to be make after accident happens). 1 2
m
To initialize the chromosome randomly. According to yn =(y ,y ,—,y ) , we divide x0=(x ,x ,...,x ) into m sets * • .,,*• .,,...,* • ,K = 1,2,...,/W and "
o
o
o
^t-i+i
yk-\+i
yk
add an empty set 0 into it to denote the set of customers served by new vehicle, here the 0 is marked as k +1 th set. To each remainder / of accident vehicle, we generate a random position k between 1 and k + \ to express that customer / will be served by vehicle k , then add zhz] into k'th set. To each set, we assembly it randomly and get new genes x = (xi,x2,—,x„), y = (yi,y2>—>ym,+\)> here yk(k = \,2,...m +1) is corresponding to the sub mark of the first x, in each set. We verify the chromosome, and if it is feasible we will accept it, otherwise we go through the procedure again till we get a feasible one.
469 To cross the chromosomes, suppose two chromosomes (VUV2) will be employed to make cross, here F, = (*,-, v,,/,),»'= 1,2 • Firstly, we verify whether they are feasible to be made cross: to each Vt , divide it into m +1 sets according to p O ' ] , ^ y„' + |), then compare them one after another, if all sets of each Vi are equal respectively, then select the best part of chromosomes to make cross, and get a new chromosome; continue this work by combining the sets of (VUV2) in line with 2x2,3x3,...,w'x/n', if all sets of each Vt are equal respectively, then select the best part of chromosomes to make cross, and get a new chromosome. To mutate the chromosomes, suppose there are two random numbers /,, l2 for gene x , to exchange z; and z\ of customer /, with customer l2 and verify it, if it is feasible, we will accept it, otherwise we go through the whole procedure again till we get a feasible one. 5. An example Here we taking the problem in [1] as an example to verify the model and algorithm, and the initial best plan is showed as follows: Vehicle 1: 0 -> 10 -> 19 -> 17 -> 18 -> 0; Vehicle 2: 0 - > 1 3 - > 8 - > 2 - > 1 6 - > 0;Vehicle 3: 0 - > l - > 1 4 - > 6 - > 7 - > 4 - > 0 ; Vehicle 4:0->3->9-> 15-> l l - > 5 - > 12->20->0; Here "0" means distribution center, and the other numbers mean the customers. The start time of four vehicles are 8:32, 8:05, 8:14 and 8:32 respectively. At 10:00 vehicle 1 has an accident while it is on the way to customer 18 (we name this place as A), so the task for customer 18 is not finished and its demand is 200. Meanwhile, vehicle 1 has finished its all tasks and has been in distribution center; vehicle 3 is unloading the goods for customer 6, and its start time will be at /0=10:13, while it has its own tasks such as customer 7 and customer 4 yet to be finished, also vehicle 3 has capability of 640 left; vehicle 4 is unloading the goods for customer 9, and its start time will be at t0=10:02, while it has its own tasks such as customer 15, customer 11, customer 5, customer 12, and customer 20 yet to be finished, also vehicle 3 has capability of 600 left. The matrix of time between customers, the time widows, demands for customers and the matrix of distances can be found in [1]. The time of unloading goods for customer 4, 5, 7, 11, 12, 15, 18 and A are 10, 13, 20, 18, 20, 20, 14, 10 respectively. We expect that the service level for customers in its time window is more than 90%, so we get that Pr{/} (x,y, t) e [a,-,bt],i = 1,2,...,«} > 0.90 .
470 We construct the model in the way put forward above, then solve this model by combining stochastic simulation with genetic algorithm111, and finally we get the transportation schedule plan showed as follows. Vehicle 3: 6 -» 7 -> 4->0; Vehicle 4: 9 -> A -> 18 -> 15 -> 11 -> 5 -> 12 -• 20 -> 0, The total distance is 415. If we send a new vehicle from distribution center to transport the goods at A, the transportation schedule plan would be: Vehicle 3: 6 -> 7 -> 4 -> 0; Vehicle 4: 9 -> 15 -> 11 -> 5 -> 12 -> 20 -> 0; New vehicle: 0 -» A -> 18 —> 0; Here the total distance would be 515. So the first plan is much better than the second plan. Conclusion To VSP under the accident, firstly we construct a stochastic programming model. Secondly we solve this model by combining stochastic simulation with genetic algorithm put forward in [1], and give a way of how to construct, cross and mutate the chromosomes particularly. At last, we apply the algorithm to a specific case, get a new transportation schedule plan, and verify that the new plan has more advantages than the old one. Acknowledgements This paper is supported by the National Natural Science Foundation of P.R. China (Grant no. 60474022). References 1. Liu Baoding, Zhao Ruiqing, Wang Gang. Uncertain Programming With Applications. Beijing: Publishing House of Qinghua university, 2003.8. 2. Guo yaohuang, Qian Songdi etc. Operational research. Beijing: Publishing House of Qinghua university, 1990:266 ~ 267. 3. Liu B and Lai K K. Stochastic programming models for vehicle routing problems. Asia Information-Science-Life, 2002,l(l):13-28. 4. Laporte G, Nobert Y. Exact Algorithm for the vehicle route problem[M]. Amsterdam: North-Holland publishing, 1987. 47-84. 5. Christofides N. A new Exact Algorithm for the vehicle route problem based on q-path and k shortest path relaxationsfR], London: Imperial College, 1993. 6. Fisher M L. Vehicle routing with time windows: Two optimization algorithms[J]. Operation research, 1997,45(3): 488-492.
A WEB DATA EXTRACTION MODEL BASED ON XML AND ITS IMPROVEMENT WEICHENG XIE College of Electrical & Information Engineering, Xihua University, Cheng Du, Sichuan Province, China CHUANHUA ZENG Intelligent Control Development Center, Southwest Jiaotong Chengdu P.R. China
University,
A web data extraction model based on HTML or XML Web pages is provided. Firstly, read the Web document from the web server with STOCK, and check format of the Web document, transform the existing HTML web page into XML or XHTML (a subset of XML); secondly, an "operation" on a Web page can generate series of XML documents, integrating these documents will lead to data storing; thirdly, the absolute path in Xpath and the anchors can extract interest data with tools of XML data format; finally, retrieve the data and construct XML output, display the inquiry result on the browser. The result show the implementing Web data extract with the model is effect, but its limitations and defects is existed, an improved semantic web data extraction model is provided.
1. Introduction There are so many data on the web that how to make full use of them has become a hot subject in the field of database technology research. How to obtain information from the web is becoming a hot talk, and various data mining models have been put forward to solve this problem. Web data extraction is a key process of web data mining. Web data extraction is the process to obtain information, including texts and multimedia, from the web, to meet the clients' needs. Therefore, web data extraction is critical to web data mining, and it is necessary to examine the way to do so. Data Mining is to discover rule-governed information from plenty of data, improving the quality of data use [1]. KDW (Knowledge Discovery in Web) covers three different data mining tasks: Content-based data mining; Structure-based data mining and Record-based data mining. While traditional databases are highly structured, web data are typically half-structured and overwhelmingly written by HTML. So web data extraction can't be executed fully automatically so far. But XML is semantic-based and 471
472 can be tackled well by the program, so it is highly likely to extract data automatically from XML data. 2. XML and Web Data Extraction Technology XML is especially for the web application service [2], XML comes up with solutions, which HTML fails to: Internet's lower-speed connection despite its high-speed development; and Difficulty in getting the desired information from desultory data on the web [3]. XML can provide structural and semantic information, and make computers and servers process information in different forms in time. So the new web space based on XML is web data oriented, compatible well with existing web applications, and easier to share and exchange web information. XML makes it possible to map the document descriptions with relation database properties and to inquire information and extract models accurately. 3. Web Data Extraction Model and Extracting Procedures 3.1. Web Data Extraction Model In our development experiments, we create a web data extraction model, the model we propose functions this way: firstly, transform the HTML page into XML format; then, inquire the XML document; finally, display the inquiry result on the browser, and game over. Figure 1 demonstrates the web data extraction model base on XML, which covers the following tasks: 3.2. Web Page A ccess and Data Extraction In the process of data extraction, two kinds of web pages will rise: the page with desired data and the page with hyperlinks pointing to the desired data. By carefully analyzing the navigation rules of the website, we can describe the data manually or with some helpful tools. The key of data extraction is transforming the existing web page into XML or XHTML, and using XML data formatting tools to search for related data. Currently many HTML pages on most websites are format incomplete. Browsers like Internet Explorer can tolerate the ill format. Therefore, the first step is to transform the ill-defined HTML web page into well-defined XML document, following data extraction. Some tools can help to format the HTML page in an organized way, among them is Tidy, which can filter the errors in the HTML page, and is charge free. We can call the XMLHelper.tidyHTML() method to realize our purpose, with a URL as its parameter, and call XMLHelperger.outputXMLToFile() method to make an XML format document.
Web Pages (HTML.XML)
RDF API Transforming the HTML pages into XML format
Data extraction executer
XML DOC storing
Data extraction executing
X
Showing extraction result on browser
Showing extraction result on browser via Internet Figure 1. A web data extraction model faced on HTML or XML pages.
Figure 2. An improved model of Web data extraction based on XML
3.3. Structure Integration and Data Storing An "operation" on an HTML page can generate series of XML documents. Integrating these documents will lead to data storing. The storing technology of XML documents has been widely researched. Besides some general storing systems, some exclusive storing system is introduced one after another. There are three ways to store the XML data: in file systems, in databases (including relational databases and object-oriented databases), and in exclusive systems. Each way has its advantages and disadvantages, and a proper choice of storing is determined by the specific situation. 3.4. Data Mapping-XML Document Inquiry Data extraction is characterized with inquiring and manipulating the data sets out of the web pages, following the integration of the data sets. The XML inquiry language can manipulate the content, so we adopt UnQL and StruQL, designed by AT & T Laboratory, which can fulfill such tasks as inquiring, constructing, transforming and integrating. XML-QL (XQL) integrates inquiry language technology and XML grammar [4]. Declaring the path expression and pattern, providing the Where clause to point out the inquiry condition and XML data module, the final form is still XML. For example: >in www.xhu.edu.cn/library/newbook.xml
Construct
475 of an XML document. After the previous process, the tags in the document are nested correctly. Using JAVA's event-driven method, such as Document_start() (on starting receiving the document);Document_end() (on finishing receiving the document);Element_start() (on starting an XML tag);Element_end() (on finishing an XML tag);Characters() (on retrieving an XML character) and Comment () (on giving comments). By calling these methods properly, a document can be properly traversed. Characters returns the content of an XML document, compares it with the desired content, judges whether it is the desired data, If yes, such methods as Elementstart and Element_end can be called to get the current path, which is the Xpath that is the data reference. Another method is to extract data from the anchors. Due to HTML pages' constant change, absolute path method will trigger an error. Changes are mainly involved in the location of the information, which is often included in such tags as ,
and . Therefore, we should construct location information independent of absolute path, which involves seeking the anchor including the extraction information. Generally, anchors, based on the information content, have nothing to do with HTML paths. For example, if we want to extract the information about books recently-published, as soon as finding the word "book_name", we have make an anchor independent of the path, codes are as follows: <xsl:template match="td[contains(.,'Last Tade')]">
476 A new file will come into being if it is the first time to extract data. If there is another file, we can use the Merge function to merge the two files, and we can check the correctness of data extraction. 5. Defects of The Model and an Improved Model The model is designed for HTML document or XML document. But data on the web are not limited in HTML format or XML document; there are other forms, such as databases, logs, and files. How to extract data from these sources is a great challenge. Adopting XML grammar, RDF can easily realize automatic search without manual tagging interference, improving the rate of search coverage and veracity. Logically, the data extraction system base on XML and RDF has three layers: Information layer; Middle layer (RDF and XML) and Application layer. Considering its logical structure, the previous model can be modified and an improved model is Figure 2. 6. Conclusion A web data extraction model based on XML and its framework implementation are introduced. Based on the discussion about its limitations and defects, an improved semantic web data extraction model is provided by the authors. Data mining is composed of repetitive data extraction processes. We should consider the specialty of data mining, extract data from the web time and again, merge the products, and construct a practical data mining system. References 1. Jiawei Han, Micheline Kamber, DATA MINING Concepts and Techniques. 1st ed., Springfield: High Education Press and Morgan Kaufmann Press, pp.6-12 (2001). 2. MyllymakiJussi.Effective, Web Data Extraction with Standard XML Technologies. International Journal of Computer and Telecommunication Networking In: 10th intl. World Wide Web Conf. Hong Kong, (2001). 3. V. Baran, M. Colonna, M. Di Toro and A. B. Larionov, Intelligent Web Mining System Based on MLDB. Computer Engineering, vol.30, no.5, pp93-94, 101 (2003). 4. Chamberiin D D.Robie J, Florescu D. Quilt, An XML Query Language for Heterogeneous Data Sources. In: Proc. Of the Third Intl. Workshop on the Web and Database, Dallas, Texas, U.S.A., May 2000,pp53-62 (2001). 5. Zhang Chenghong, Gu Xiaohong, Bai Yanhong, The Progress of Web Data Extraction Technology. Computer Science, vol.31, no.2, pp 129-131, 151, (2004).
EVALUATION OF E-SERVICE PROVIDERS USING A FUZZY MULTI-ATTRIBUTE GROUP DECISION- MAKING METHOD CENGIZ KAHRAMAN Istanbul Technical University, Department of Industrial Engineering, 34367 Macka Istanbul, Turkey GULgiN BUYUKOZKAN Galatasaray University, Department of Industrial Engineering, 34357 Ortakoy Istanbul Turkey Since most of the companies prefer outsourcing for e-activities, the selection of e-service provider becomes a crucial issue for those companies. E-service evaluation is a complex problem in which many qualitative attributes must be considered. These kinds of attributes make the evaluation process hard and vague. Cost-benefit analyses applied to various areas are usually based on the data under certainty or risk. In case of uncertain, vague, and/or linguistic data, the fuzzy set theory can be used to handle the analysis. This paper presents the evaluation and selection process for e-service provider alternatives. Many main and sub-attributes are considered in the evaluation. The e-service providers are evaluated and prioritized using a fuzzy multi-attribute group decision-making method. This method first defines group consistency and inconsistency indices based on preferences to alternatives given by decision makers and construct a linear programming decision model based on the distance of each alternative to a fuzzy positive ideal solution which is unknown. Then the fuzzy positive ideal solution and the weights of attributes are estimated using the new decision model based on the group consistency and inconsistency indices. Finally, the distance of each alternative to the fuzzy positive ideal solution is calculated to determine the ranking order of all alternatives.
1. Introduction As it is open seven days a weeks, 24 hours a day, in fact, according to industry observers, Web-based customer service that is also known as "E-Service", is one of the biggest business opportunities on the Net. Internet's influence in creating e-services has been revolutionary for providers and their customers. Unfortunately, there has been a wide gap between inspiring applications of the Internet that help increase service customization while maintaining or even improving delivery efficiency. The firms are not able to determine the scale of the e-service they will provide [1] and select the best Internet service provider, which meets their needs for e-service applications. For firms concerned with e477
service, one central issue is to determine the decision criteria for proper selection of an e-service provider. The lack of accurate decision criteria reveals benefit/cost imbalance such as high investment and inadequate return. This article aims at developing a model for evaluation of different e-service providers for a particular company. Multiple attribute decision-making (MADM) problems find a best compromise solution from all feasible alternatives assessed on multiple attributes, both quantitative and qualitative. Suppose the decision makers have to choose one of or rank n alternatives, A A ••• A , based on m attributes, C ,C •••Cm. An alternative set is denoted by A = \A A •••,Anj and an attribute set is denoted by c = jc , C , • • •, C }• Let Xy be the score of alternative A, (i = 1,2,•••,«) on attribute Cj(j = 1,2, • • •,m), and suppose CO is the relative weight of attribute Cj, where a > o (j = 1,2, —,m) and g a = j . A weight vector H j is denoted by a> = (a>, ,co2,--,co J • A MADM problem can then be expressed as the following decision matrix: c, A D =
(x)
=
4
X
u
c
ca X
n
'"
** *n -
X
\m
*~
(1)
Crisp data are often inadequate or insufficient to model real-life decision problems. Human judgments are vague or fuzzy in nature and as such it may not be appropriate to represent them by accurate numerical values. A more realistic approach is to use linguistic variables to model human judgments. In this paper we evaluate e-services using Li and Yang's approach [2]. In their approach, linguistic variables are used to capture fuzziness in decision information and group decision-making processes by means of a fuzzy decision matrix. They propose a new vertex method to calculate the distance between triangular fuzzy scores. Group consistency and inconsistency indices are defined on the basis of preferences between alternatives given by decision makers. Each alternative is assessed on the basis of its distance to a fuzzy positive ideal solution (FPIS), which is unknown. The fuzzy positive ideal solution and the weights of attributes are then estimated using a new linear programming model based upon the group consistency and inconsistency indices defined. Finally, the distance of each alternative to FPIS is calculated to determine the ranking order of all alternatives. The lower value of the distance for an alternative indicates that the alternative is closer to FPIS.
The paper is organized as follows. In next section, the basic definitions and notations of fuzzy numbers and linguistic variables are defined as well as the fuzzy distance formula and the normalization method. Section 3 defines fuzzy group decision-making model. Fuzzy multi-attribute e-service provider selection is explained in Section 4 and illustrated with a numerical example in Section 5. The paper is concluded in Section 6. 2. Related Terms 2.1. Triangular Fuzzy numbers Let ? = {l,m,u) be a triangular fuzzy number (TFN). Its membership function /*.(*) is given by I \
x- -l l<x<m m -/' \m-i u— u-x x u- -m m<x
(2)
2.2. Linguistic variables A linguistic variable is a variable whose values are linguistic terms. The concept of linguistic variable is very useful in situations where decision problems are too complex or too ill defined to be described properly using conventional quantitative expressions. 2.3. Distance between two TFNs Let S^ and S2 be two TFNs. The vertex method is used to calculate the distance between them as follows:
4tth$t-hJ+fa-mJ
+ (»,-uj]
(3)
2.4. Normalization method Suppose there exist n possible alternatives A ,A ,---,A from which P decision makers
Pp{p = 1,2, •••,/•)
have
to
choose
on
the
basis
of
m
attributes^ ,c , — ,c • Suppose the rating of alternative A. (/ = 1,2, •••,«) on attribute
c .{j = 1,2, -,m) gi v e n by decision maker
p ( p = 1,2,•••,/•) is
x;
=(a'J,b*,c*).
A multi attribute group decision-making problem can be
expressed in matrix format as follows: C, C, - C
DF-{x,±
P=
\X-,P
A x (4) Now let minja'|< « ; = («;, 6;, c; )}, i = 1,2, - , « ; / > = 1,2, • • •, P = max{i;\a', ex; ={a'„b;,c;)}, i = 1,2,-,n;p
= 1,2,-,P
(5) (6)
= min^'|6; ex; =(a;,b;,c;)},
; = 1,2,-,K;/? = 1 , 2 , - , P
(7)
= mvJp;\b;ex;=(a>,b>,c>%
i = \,2,-,n;p
= \,2,-,P
(8)
= 1,2,-, P
(9)
, = m i n ^ c ; e x,' = {a'„b'„c',% i = \,2,-,n;p
C- = min{c;|c; ex;= {a'„b;,c',)}, i = l,2,-,/i;/> = 1,2,-,P
(10)
Then the following normalization formulas are used: Al
forjeC
(11)
-Al
forjeC 1
(12)
and
Now, the fuzzy decision matrices D" {p- \,2,---,P)
are transformed into the
normalized fuzzy decision matrix Rp as follows:
c, 4
'=(r')
c2
• ••
r»
c
= 4 K'
K:
• - r;m P= • •• K:
4, r;
?:
• ••
rtl'
\2,-,P
(13)
r±
3. Fuzzy Group Decision-Making Model Li linear programming based multiple VA and ana Yang iang [3] \p\ propose propose the me following ionowing un -"••''• • - *• ' ' 'on-makine model: attribute fuzzy group decision-making model:
481
Max £ £ 4
(14)
s.t. J l-i
3tr
r'1 (i,/).n''
2X Ek-<)
•J y=i
ft>. =e,j
j
y=i
= l,2,---,m
£
(15)
^ ' W - and v.„(y = 1,2, •••,/«) can be obtained by solving Eq. (14). Then a)L,dm,and a'm (j = 1,2,• • •,m)are computed using Eq. (15) 4. Fuzzy Multi-attribute e-Service Provider Selection Fuzzy sets were introduced by Zadeh in 1965 [3] to represent/manipulate data and information possessing non-statistical uncertainties. Fuzzy logic provides an inference methodology that enables approximate human reasoning capabilities to be applied to knowledge-based systems. The theory of fuzzy logic provides a mathematical strength to capture the uncertainties associated with human cognitive processes, such as thinking and reasoning. Fuzzy multi-criteria decision-making has been widely used to deal with decision making problems involving multiple criteria evaluation/selection of alternatives [4, 5, 6]. These studies show the advantages in handling unquantifiable/qualitative criteria, and obtained quite reliable results. E-service provider selection is a complex problem in which many qualitative attributes must be considered. These kinds of attributes make the evaluation process hard and vague. Since most of the companies prefer
outsourcing for e-activities, the selection of e-service provider becomes a crucial issue for those companies. In this paper, taking into consideration the literature on e-services [7, 8, 9], the following cost-benefit attributes are preferred in the evaluation of the best e-service provider. • Cost Attributes: 1. Setup costs (SC): These costs include installation costs of the required system such as operating system, database system, e-payment systems etc., 2. Hosting Costs (HC): Periodic cost of hiring the service and/or system, 3. Design & Programming Costs (DPC): Costs for designing the Web, programming procedures and preparing necessary scripts, 4. Backup & Maintenance Costs (BMC): The cost of backing up the important data on the system and maintenance and upgrading costs to provide better service, 5. Advertising & Promotion Costs (APM), 6. Personnel (Staff) Cost (PC): The cost of the staff that will primarily work for this system. • Benefit Attributes: 1. Service Guarantee, References & Familiarity (SGRF), 2. Stability, reliability, and uptime (SRU): The e-service should be available online 24 h a day. That is, the operation system of e-services should not present failures at anytime, 3. Ease of control and use (ECU): The system provided should have control panel easy to use, 4. Security & Privacy (SP): Because e-commerce is operated on an open network, encryption technologies must be developed to deter hacker attacks. The system has maximum security for credit card applications. It is very important that stored data should also be kept private, 5. Speed (S), 6. Physical Location (PL), 7. Customer Service Level (CSL): E-service provider should have good customer relations, respond to the matters quickly. 5. A Numerical Application Suppose a firm plans to select a e-service provider among four candidates A\, A2, A-i and A4. There are four experts Pp (p=l, 2, 3, 4) who agree to take into consideration the following six attributes in evaluating the e-service providers: Setup costs (SC), Hosting Costs (HC), Backup & Maintenance Costs (BMC), Security & Privacy (SP), Speed (S), and Customer Service Level (CSL). Assume that the experts provide their preferences between alternatives as a 1 ={(2,l),(l,3),(2,4),(3,4)}, « 2 ={(2)l),(l,3),(l,4),(3,4)}, n 3 = {(2,1),(2,3)>(3,l),(3,4)}) and
n ={(l,3), (2,4), (3,4), (3,2)}. The corresponding relations between linguistic variables and positive triangular fuzzy numbers are given in Table 1. The data and ratings of all alternatives on every attribute are given by the four experts Ph P2, P3, and P4 as in Tables 2 respectively.
Table 1. Linguistic variables and their representation with fuzzy numbers Triangular fuzzy numbers (0.8,0.9,1.0) (0.7, 0.8, 0.9) (0.6, 0.7, 0.8) (0.3, 0.5, 0.7) (0.2, 0.3, 0.4) (0.1,0.2,0.3) (0,0.1,0.2)
Linguistic variables Excellent (E) Very good (VG) Good (G) Fair (F) Poor (P) Very poor (VP) Too poor (TP)
Table 2. Evaluations of experts Attributes
e-service providers
A, A2 A3 A,
HC
SC E2
E3
E4
El
E2
E3
E4
El
E2
E3
E4
33 38 78 64
43 32 78 62
41 39 76 74
42 45 68 55
44 33 45 52
40 33 41 59
45 33 35 52
37 36 37 42
30 34 39 30
25 36 29 30
32 34 35 30
45 28 45 34
El
E2
E3
E4
El
E2
E3
E4
El
E2
CSL E3
E4
30 21 21 24
32 24 18 22
32 21 20 24
28 34 22 25
F VG G VG
G P F VG
G G P E
G VG P VG
G P G P
G G VG F
P G VG F
G P VG G
S
SP
A, A2 A, A4
BMC
El
Now, using Eqs. (11), (12), and (13), we obtain the normalized decision matrices and then the transpose of these for each expert. Accepting e = 0.01 and A = 1.0 and substituting the normalized decision matrices and Q' [p = 1,2,3,4) into Eqs. (14) and (15), the linear programming problem is solved. The final ranking for evaluating and selecting among e-service providers is obtained as A, >- A, y A, > A,. 6. Conclusions In this paper the model developed by Li and Yang [2] has been used for evaluating and selecting among e-service providers. The selection among eservice providers is a complex problem in which many qualitative attributes must be considered. These kinds of attributes make the evaluation process hard and vague. The judgments from experts are always in vague rather than in crisp numbers. It is suitable and flexible to express the judgments of experts in fuzzy number instead of in crisp number. For further research, the results of this study
may be compared with other fuzzy multi-attribute methods like fuzzy outranking methods, fuzzy utility theory and fuzzy AHP. Acknowledgments The authors acknowledge the financial support of the Galatasaray University Research Fund. References 1. K.B. Kenneth, H. Roger and V.R. Aleda, E-services: operating strategy—a case study and a method for analyzing operational benefits, Journal of Operations Management, 20, 175-188 (2002). 2. D-F. Li and J-B. Yang, Fuzzy linear programming technique for multi attribute group decision making in fuzzy environments, Information Sciences, 158, 263-275 (2004). 3. L. Zadeh, Fuzzy sets, Information Control, 8, 338-353, (1965). 4. C. Kahraman, D. Ruan and I. Dogan, Fuzzy group decision-making for facility location selection, International Journal of Production Economics, 87(2), 171-184(2004). 5. G. Buyukozkan, C. Kahraman and D. Ruan, A fuzzy multi-criteria decision approach for software development strategy selection, International Journal of General Systems, 33 (2-3), 259-280 (2004). 6. O. Kulak and C. Kahraman, Multi-attribute comparison of advanced manufacturing systems using fuzzy vs. crisp axiomatic design approach, International Journal ofProduction Economics, 95 (3), 415-424 (2005). 7. P.A. Dabholkar, D.I. Thorpe and J.O. Rentz, A measure of service quality for retail stores: scale development and validation, Journal of the Academy of Marketing Science, 24 (1), 3-16 (1996). 8. C. Liu and K.P. Arnett, Exploring the factors associated with Web site success in the context of electronic commerce, Information and Management, 38 (1), 23-33 (2000). 9. M. Wolfinbarger and M.C. Gilly, eTailQ: Dimensionalizing, Measuring and Predicting eTail Quality, Journal of Retailing, 79, 183-198 (2002).
A CASE BASED RESEARCH ON THE DIRECTIVE FUNCTION OF WEBSITE INTELLIGENCE TO HUMAN FLOW ZILU, ZHUOPENG DENG, YANG WANG Faculty ofResource and Environment Science, Hebei Normal University, Shijiazhuang, 050016, P. R.China Email: luzi@mail. hebtu. edu. en Abstract: More and more websites have started to apply intelligent techniques to conduct effective and flexible operations and provide high-quality and personalized online services. One of the explicit manifestations of website intelligence is that web-based information flow directs and guides a related human flow. This paper chooses Sino-Australian as a case to explore how information flow directs a study-abroad human flow through providing intelligent e-services. An evaluation system with the case is conducted to prove the rationality of the directing function and process. Keywords: website intelligence; e-services; study-abroad; directive functions; human flow; China
1. Introduction Online services (e-services) have been growing rapidly, keeping pace with the web. More and more websites are applying intelligent techniques to provide high-quality online services. These intelligent e-services have conducted an impact on the user decision-making for their activities in the real life. Therefore, some important website information may promote the movement of human flow in a real space. Taking websites as a front-office, some education agents/companies apply intelligent approaches and system to provide Chinese students full-stop services to study abroad from information search to visa application online. The rest of the paper is organized as follows. Literature review and background in related areas are shown in Section 2. The directing process and mechanism of information flow to related human flow are deeply revealed in Section 3. Section 4 focuses on an empirical analysis of how information flow directing human flow by taking www.ozstudynet.com as a case. An evaluation result is displayed and discussed to the directing effect of information flow to human flow. Conclusions are drawn in Section 5. 485
2. Literature Review and Background E-service intelligence (ESI) is an integration of intelligent technologies and e-services [1]. E-service intelligence has been right now identified as a new direction and next stage of e-services. Intelligent online services can provide with much higher quality information, personalized recommendations and more integrated seamless link services. Brazier and Cookson [2] discuss how real-time intelligent technique is applied to the business world. In the context of providing solutions for real-time communications services, their paper puts forward the following design patterns: service node, intelligent network and service delivery platform. Valerie et al [3] points out that enabling the intelligent e-services means that consumers will be provided with universal and immediate access to available content and services, together with the ways of effectively exploiting them. Intelligent online techniques and approaches have been received more and more attention. In the meantime, some researchers have shown their interests in how invisible information flow directing visible material flow. Graham and Marvin [4] identify four effects of information technology to a city, which include cooperation effect, substitution effect, derivation effect and enhancement effect. Moss [5] analyses the information and spatial distribution according to the Internet infrastructure of American. Adams and Ghose [6] have a qualitative analysis to the real effect of Internet in promoting migrations from India to American from the view of new ICTs. Some Chinese researchers have particularly conducted research in this area with China's cases. For example, Yao et al [7] give an explanation and empirical analysis to the four effects mentioned by Graham and Marvin. Zhen et al [8] develops related theories between information flow and transportation. Lu and Zhang [9] analyses the positive effect of implementation of intelligent Traffic Guiding System (TGS). Lu and Sun [10] put forward their perspectives to the nature of flow space. However, review results show that the research on the directive functions of invisible information flow to visible material flow is limited in simple qualitative analysis and lack of deep analysis to its mechanism generation. With the full-scale exchanges and cooperations of politics, economy, trade, culture, science and technology between China and Australia in last decade, education exchange and cooperation have been developed and extended gradually. As shown in Table 1, the number of Chinese students in Australia is rapidly increasing. Besides the various attractions from Australian education systems and policy impetus from Chinese government, one of the important reasons is the extensive establishment of study-abroad websites. Table 1 The increased human flow of Sino-Australia student abroad during 1997-2002 China to Australia
1997 1700
1998 3100
1999 4100
2000 8909
2001 13452
2002 14215
Data source: Chinese Ministry of Education, Australian statistic bureau and federal immigrant agency
The study-abroad websites of Sino-Australian is a kind of virtual platform constructed in a network space for Chinese students study abroad. This platform contributes to the construction of a "study-abroad bridge" between China and Australia. With the websites, a virtual community for study abroad is formed, in which various information exchanges can be freely conducted between website users and owners or among users. This approach is efficient in ultimate result, it thus stands for the future development trends. Therefore, websites can produce an effect to human flow through online information flow provided. 3. Directing Process of Web Intelligence to Human Flow 3.1 The directing process of web intelligence to human flow Stage 1
Stage 2
The destination confirmation of study abroad
The selection and interview of agent for study abroad
4
1
Stage 3
Stage 4
The application and transaction for study abroad
The completion and derivation of study abroad
•
*
t
Online information exchange community
Figure 1 Intelligent services provided by study-abroad websites at different stages
The process of information flow directing human flow under website intelligence includes four stages as shown in Figure 1. We will discuss it in detail below. 3.1.1 Stage 1: The destination confirmation of study abroad Previous research [11] indicates that to enhance the website usability is a key issue. The enhancement of website intelligence can help handling this issue. In this stage, the most significant application of intelligent technology to websites is intelligent information retrieval function. (l)Web Intelligence enhances the proportion of useful information. Some advanced intelligent techniques, such as cooperative filtering, multi-keywords search, are employed to enhance the understanding of "natural language" of users. (2)Website intelligence enhances the pertinence of users' search. In www.ozstudynet.com, it provides an intelligent engine for searching information of study abroad destination. Users can choose countries, cities, even universities and majors through using it. Users therefore can have a quick and full understanding to everything they may concern the process of study abroad. (3)Website intelligence also enhances the utility and pertinence of website links. Many previous links in www.ozstudynet.com are constructed in the absence of analysing their users.
3.1.2 Stage 2: The selection and interview of an agent for study abroad
At this stage website intelligence is mainly embodied in the services provided, and the improvement of service quality depends mainly on that of website intelligence, including: (1) An intelligent online feedback function: the website periodically and automatically sends information related to the study-abroad destination by e-mail, according to the different demands of users. (2) An intelligent online consultation function: users can obtain further detailed information about study abroad through intelligent means such as e-mail or online chat tools (QQ, MSN); the www.ozstudynet.com online consultation function is available to its users at any time. (3) An intelligent online evaluation function: prospective students can send their specific circumstances to the website via e-forms, and the intelligent evaluation system and experienced staff return accurate and authoritative results within a short time and offer suggestions suited to different users.
3.1.3 Stage 3: The application and transaction of study abroad
In this stage the website intelligence is reflected in three aspects. (1) An intelligent application function for study abroad: users can log in to the website to submit their set of application forms, and the website's service staff help accomplish the necessary procedures; meanwhile, users can log in at any time to check the current state of their application, and any problems that arise can be solved online without delay. (2) An intelligent visa transaction function, typified by the e-visa application: all visa application materials can be transmitted and transacted in the form of e-files between the Overseas Procedure Centre (OPC) of the Australian government and its study-abroad agents. This makes the visa procedure simpler and ensures objectivity and impartiality to the largest degree; Table 2 lists the differences between the traditional and the intelligent visa transaction. (3) A follow-up service for directing human flow: the website makes use of its service network and continues providing intelligent services to users who plan to study in Australia, so that the directing effect on human flow is ultimately realized.
Table 2 Differences between traditional and intelligent visa transactions
Transaction procedure of the traditional visa: applicants or agents post the whole set of application materials to the OPC; after the OPC receives the materials and fees, the file number and other forms are sent back to the agents (a wait of more than one month); the applicant then waits about another three months and, if the pre-evaluation is passed, its results are posted to the agents; the agents post the results to the school together with the tuition fee, and the school returns a confirmation and the enrolment notice; the agents send the confirmation to the OPC again; the OPC posts the letter of visa approval to the Australian general consulate in Shanghai; the agents post the applicant's passport to Shanghai to transact the visa, and the general consulate in Shanghai posts the visa back to the agents. Notes: the whole process takes about four months, and the related materials are posted at least six times.
Transaction procedure of the e-visa: the application materials are scanned by agents authorized for e-visa transaction, the required passwords are entered, and the materials are sent to the OPC as e-files; the e-visa is a smart system in which every application is automatically assigned a TRN (Transaction Reference Number) that applicants can use to check their application state, while the body-check form can be printed for an examination at an appointed hospital; the OPC sends the results and the letter of visa approval to the agents as e-files as well; the agents post the applicant's passport to Shanghai to transact the visa and, when finished, the general consulate in Shanghai posts the visa to the agents. Notes: most of the process is done online, saving the time otherwise spent on pre-evaluation and material posting.
3.1.4 Stage 4: The completion and derivation of study abroad
The fourth stage is the completion and derivative stage. Its significance is that, through continuous online exchanges, new information flow and human flow may be derived, which further enlarges the circulative size of the directive functions. (1) A custom-made service function: one important aspect of website intelligence here is support for personalization; the online services also include study accompaniment and bridging-course transfer, providing personalized student-abroad services. (2) An online exchange community function: a virtual space in which website owners and online members exchange their thoughts, emotions and experiences, and in which members can discuss topics of interest according to their own characters, interests and tastes. The website www.ozstudynet.com provides such a free exchange community, which includes the four forums shown in Table 3. It should be noted that intelligent interactive online exchange in the virtual community is not limited to the fourth stage; as a means of service and personalization, it functions throughout the whole process.
Table 3 The topics of the exchange forums on www.ozstudynet.com
Special area for study abroad: study abroad in Australia; school transfer.
Ally of study abroad to Australia: IELTS ally; major ally; school ally; homemate forum; meeting friends from the same city; middle school ally; scholar ally.
Living information: accommodation; living in Australia; return tickets; second-hand market; migration to Australia; employment information; matchmaking; recreation.
Website affairs administration: latest activities; website affairs administration.
3.2 Mechanisms
The directing mechanism of intelligent information flow to human flow is shown in Figure 2. Intelligent information affects the human brain directly and has an indirect impact on human decision-making; in turn, e-services are provided to direct and guide the related human flow. On the condition that human cognition is fixed, the availability, precision and abundance of information and intelligent e-services directly determine human behaviour. We examine this process and mechanism in the next section through an empirical analysis.
4. The Empirical Analysis: Taking www.ozstudynet.com as a Case
4.1 Research objective, method and result
The website www.ozstudynet.com was established in May 2000 in Melbourne, Australia. It has developed its own intelligent e-study system and, based on a new operation model, provides intelligent e-services that cover all steps of a study-abroad application. Therefore, this study selects it as an example for exploring how a website with e-service intelligence directs the human flow of studying in Australia. Based on a continuous investigation of www.ozstudynet.com for three
weeks in December 2005, the evaluation system was created as shown in Table 4. The selection of the factors is based on the literature review results and on data availability, and synthesizing these factors in an orderly way formed the evaluation system. The evaluation system is divided into four levels: three factors are considered in level 1, eight in level 2, 17 in level 3 and 33 in level 4. The reason for dividing the factors into four levels is to facilitate scoring and to obtain a deep and systematic view of website intelligence. A Likert-style scale was employed for the quantitative analysis, and the score of each factor was divided into five levels (2, 4, 6, 8 and 10) according to the data we observed and collected. First, the scores of the factors on level 4 were obtained. Second, the scores of the level 3 indexes were obtained from their corresponding level 4 factors. The level 2 and level 1 factors were then calculated in turn.
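As a rough illustration of this bottom-up scoring, the sketch below rolls leaf-level Likert scores up to their parent factors; the nested dictionary fragment and the use of a simple arithmetic mean as the aggregation rule are assumptions made for illustration, since the paper does not state its exact aggregation formula.

    # Hypothetical fragment of the four-level index tree; leaves hold Likert scores (2-10).
    index_tree = {
        "level of website content": {
            "intelligent information retrieval function": {
                "information serviceability": {"rate of useful information": 6,
                                               "number of useful information": 6},
                "personalized retrieval": {"destination retrieval": 10,
                                           "college retrieval": 8,
                                           "major retrieval": 8},
            },
        },
    }

    def aggregate(node):
        """Average child scores recursively (assumed aggregation rule)."""
        if isinstance(node, (int, float)):
            return float(node)
        return sum(aggregate(child) for child in node.values()) / len(node)

    for factor, children in index_tree.items():
        print(factor, round(aggregate(children), 1))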
Figure 2 The directing mechanism of information flow to related human flow (the figure links the different human flows to Australia with the web-based information flow and transaction process, covering application, notarization, body check, fee paying and visa application, together with services such as instruction, tickets, accommodation, study instruction and school transfer)
Table 4 The evaluation index system of website intelligence (the table arranges the indexes in four levels. Level 1: level of website content (7.9), level of website service (8.4), level of personalization (6.2). Level 2 factors include the intelligent information retrieval function, the intelligent online feedback function, the intelligent online consultation function, the intelligent evaluation function, the intelligent application function, the intelligent visa application function, other custom-made service functions and the online exchange function. Levels 3 and 4 decompose these into scored indexes such as information serviceability, personalized retrieval, related links, rate and number of useful information, destination/college/major retrieval, number of links, multi-means consultation (e-mail, QQ, MSN), response time, consultation satisfaction, non-profit operation, charging and extra fees, number of professionals, related working experience, professional evaluation, dynamic query, time-saving, procedure simplification, success rate, e-mail interval and content, junk-mail screening, accompany-visa and transfer services, plane-ticket purchase, airport pick-up, accommodation, number of members, online number at peak time, number of topics and coverage degree, each scored on the five-point scale.)
4.2 Evaluation result analysis
(1) The score of the level-1 factor "website content" is 7.9, that of "website service" is 8.4 and that of "personalization" is 6.2. According to the scoring criterion, "website service" receives the best evaluation, with all four of its level-2 factors scoring above 8; "website content" receives a good evaluation, with one of its level-2 factors scoring above 7; "personalization" receives the weakest evaluation, with only one level-2 factor reaching a score of 7. (2) On the basis of the website's intelligent e-services, the directive function of information flow on the related human flow becomes more effective and orderly. (3) The research also offers some valuable references for website owners seeking to improve their sites. First, the results suggest that multi-keyword, collaborative filtering and association rule techniques should be used together in future information retrieval. Second, the personalization of the website is still weak, and more emphasis should be placed on its real application effect. (4) Website intelligence is a relatively new concept, and its directive function is not equally effective for all human groups; the effective exertion of its intelligence must target specific sub-groups. (5) The significance of intelligent online exchange lies not only in forming a favourable cycle between web-based information flow and the related human flow, but also in enlarging the circulative size of its directive functions. Therefore, intelligent online exchange can play an important role in the whole process of directing the human flow of studying in Australia.
5. Conclusions
Intelligent online services in the study abroad industry can break down regional barriers, enhance real-time responsiveness and create high efficiency. Web-based information flow is affecting the related human flow; in particular, with the development of the intelligent e-visa, online services for study abroad have become very effective and efficient. Study-abroad websites offer an approach that lets applicants enter a virtual world, and this virtual world can direct and guide the movement of student flow from China to Australia. The research has also found that: (1) as the front office of study-abroad companies, study-abroad websites meet users' demands well through intelligent e-services; (2) the Sino-Australian study-abroad website directs and guides the human flow effectively; (3) the development of Sino-Australian study-abroad websites has shifted from information provision and services to one-stop intelligent e-services.
Acknowledgement This research is supported by the National Science Foundation of China (40571042).
GENETIC ALGORITHM FOR INTERVAL OPTIMIZATION AND ITS APPLICATION IN THE WEB-ADVERTISING INCOME CONTROL
QIN LIAO, XIWEN LI
School of Mathematical Sciences, South China University of Technology, Guangzhou, Guangdong 510640, P. R. China
Abstract: The control problem of web-advertising income is discussed in this paper. Firstly, the evaluation index system of web advertising and the neural-network evaluation model are built. Secondly, in order to effectively control the factors influencing web-advertising income, a genetic algorithm for interval optimization is proposed to find the corresponding intervals of those factors according to an anticipated income lying in a certain interval. Finally, an appropriate allocation of the advertising cost is made by using a clustering algorithm to classify the keywords. Numerical experiments are given and the results show that the novel method is effective.
Keywords: Interval optimization, Genetic algorithm, Web advertisement, Advertising income
1. Overview
With the rapid development of the Internet, the evaluation of web-advertising effect, income and influencing factors attracts more and more concern. By building the evaluation indexes and the evaluation model of the web advertisement, a genetic algorithm for interval optimization is proposed to study the control of the intervals of the web-advertising influencing factors. Furthermore, by using a clustering algorithm to classify the keywords, the keywords yielding the maximum advertising income are found and an appropriate allocation of the web-advertising cost is made.
2. Evaluation indexes and evaluation model of the web advertisement
CTR (Click Through Rate), CPC (Cost Per Click), and CVR (Conversion Rate) are chosen to be the evaluation indexes of the web-advertising effect. ROI (Return On Investment) is chosen to be the evaluation index of the web-advertising income. CTR is the number of clicks divided by the number of impressions, i.e. the number of times the advertisements have been displayed. CPC is how much the
advertisers should be charged for a click on their ads. CVR is the number of conversions, divided by the number of ad clicks; a conversion occurs when someone clicks on an ad and performs a desired behavior on the website the ad points to. ROI is calculated as the revenue from sales, minus the advertising costs, all divided by the costs. Supposing the ROI is evaluated by the three evaluation indexes, the B-P model is built by choosing CTR, CVR and CPC as the 3 input neurons and ROI as the output neuron. The B-P model is shown as follows:
ROI = F(x_1, x_2, x_3) = f\Big(\sum_{i=1}^{3} v_i \Big(\sum_{j=1}^{3} w_{ij} x_j - \theta_i\Big) - r\Big)    (1)
where f(x) = 1 / (1 + e^{-x}), x_1 = CPC, x_2 = CTR, x_3 = CVR; the v_i and w_{ij} are connection weights, and theta_i and r are thresholds. By training the network on the samples, the total error between the model output and the practical sample results is brought within the required tolerance, and the weights and thresholds of Eq. (1) are obtained; the evaluation model is thereby built. Using the evaluation model, the ROI can be forecast by inputting the values of the three indexes. For a given target ROI, however, it is too burdensome to try many combinations of the index values in order to reach it. Thus, the genetic algorithm for interval optimization is proposed to find the corresponding intervals of those factors according to the anticipated income in a given interval.
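For concreteness, a minimal sketch of evaluating the trained model of Eq. (1) is given below; the parameter values passed in are placeholders, since the trained weights and thresholds are not reported in the paper.

    import numpy as np

    def evaluate_roi(cpc, ctr, cvr, W, theta, v, r):
        """Forward pass of the B-P evaluation model, following Eq. (1) as written.

        W is the 3x3 matrix of weights w_ij, theta the hidden thresholds,
        v the output-layer weights and r the output threshold (all placeholders here).
        """
        f = lambda z: 1.0 / (1.0 + np.exp(-z))   # f = 1 / (1 + e^{-x})
        x = np.array([cpc, ctr, cvr])
        hidden = W @ x - theta                   # inner sums of Eq. (1)
        return float(f(v @ hidden - r))

    # Example call with placeholder parameters:
    # evaluate_roi(2.0, 0.03, 0.5, np.zeros((3, 3)), np.zeros(3), np.zeros(3), 0.0)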
3. Genetic algorithm for interval optimization and the web-advertising income control model
A traditional genetic algorithm searches for the best individual X = {x_1, x_2, ..., x_n}. Randomly generate N individuals X^(i) = {x_1^(i), x_2^(i), ..., x_n^(i)}; the population S = {X^(1), X^(2), ..., X^(N)} then comprises N individuals. The outputs of the individuals are Y = {Y^(1), Y^(2), ..., Y^(N)}, with Y^(i) in [Y_min, Y_max], i = 1, 2, ..., N. The midpoint is Y_mid = (Y_max + Y_min)/2, and the fitness function, Eq. (2), measures how close an individual's output is to Y_mid.
By using the genetic algorithm the first time, when the fitness reaches its maximum value, the corresponding individual X = {x_1, x_2, ..., x_n} is obtained. This means finding the best combination X = {x_1, x_2, ..., x_n} of the independent variables that makes the output close to the midpoint Y_mid of the practical output. Based on this best individual, random intervals (x_i - e_i, x_i + e_i), e_i > 0, are generated for each independent variable (gene) x_i, where each e_i is a positive number chosen randomly. M genes x_i^(k), k = 1, 2, ..., M, i = 1, 2, ..., n, are then generated randomly from the n intervals, so that M individuals X^(k) = {x_1^(k), ..., x_n^(k)}, k = 1, 2, ..., M, are obtained, where x_i^(k) lies in (x_i - e_i, x_i + e_i). The fitness function is again defined as Eq. (2), and the genetic algorithm is carried out a second time with the M individuals. After several iterations of the defined crossover and mutation operators, all the individuals that satisfy |Y^(k) - Y_mid| < (Y_max - Y_min)/2 are chosen. Suppose P individuals are chosen and the ith genes of the P individuals are ranked from small to large as x_i^1, x_i^2, ..., x_i^P; then the corresponding interval of the ith gene is [x_i^1, x_i^P], i = 1, 2, ..., n. In order to determine the possibility that the output Y calculated from a combination of random values chosen from the n intervals [x_i^1, x_i^P] is bounded in [Y_min, Y_max], we randomly test H times, choosing each x_i from its interval [x_i^1, x_i^P] to constitute individuals X^(k) = {x_1^(k), x_2^(k), ..., x_n^(k)}, k = 1, 2, ..., H, and then count how many of the outputs Y = {Y^(1), Y^(2), ..., Y^(H)} are bounded in [Y_min, Y_max], thereby calculating the frequency with which the outputs of the H individuals fall in the interval of Y. This frequency represents the approximate possibility that an output calculated from a combination of x_i bounded in [x_i^1, x_i^P] and the other genes lies in the appointed interval. Overall, the interval [x_i^1, x_i^P] of each independent variable x_i is obtained, which makes the corresponding output Y lie in [Y_min, Y_max], and the possibility of this relationship is also obtained. Suppose CTR and CVR are bounded in [0, 1], CPC_1 depends on the lowest cost demanded by the website, and CPC_2 is determined by the maximum cost the advertisers are willing to pay.
Based on Eq. (1), ROI = F(CPC, CTR, CVR). In order to control the factors CTR, CVR and CPC influencing the ROI, an interval [ROI_1, ROI_2] is defined for the ROI and ROI_mid = (ROI_1 + ROI_2)/2. The individual constituted by CPC, CTR and CVR is defined as X^(k) = (CPC^(k), CTR^(k), CVR^(k)) and the fitness function is defined as:

G(X^{(k)}) = \frac{1}{1 + |ROI^{(k)} - ROI_{mid}| / a}, \quad a > 0    (3)
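A direct transcription of the fitness function in Eq. (3) might look as follows; roi_of stands for the trained evaluation model of Eq. (1) and the default value of a is an assumption, since the paper only requires a > 0.

    def fitness(individual, roi_of, roi_mid, a=1.0):
        """Fitness of an individual X = (CPC, CTR, CVR) according to Eq. (3)."""
        cpc, ctr, cvr = individual
        roi = roi_of(cpc, ctr, cvr)            # evaluation model of Eq. (1)
        return 1.0 / (1.0 + abs(roi - roi_mid) / a)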
The best combination X = (CPC, CTR, CVR) of the independent variables is obtained, which makes the output of Eq. (1) reach, or come close to, the midpoint ROI_mid of the practical output. By using the genetic algorithm for interval optimization, the corresponding optimal intervals of CTR, CVR and CPC are determined, which make ROI lie in [ROI_1, ROI_2]: CPC in [CPC_1, CPC_2], CVR in [CVR_1, CVR_2], CTR in [CTR_1, CTR_2]. According to these three intervals the ROI can be controlled effectively.
4. Test of the web-advertising income control model
Table 1 gives the web-advertising sample data of a campaign over 7 weeks. Using the sample data in Table 1, the B-P model for the ROI can be built; the parameters of the model are shown in Table 2. When the ROI is required to lie between 90% and 100%, combining the evaluation model and the genetic algorithm for interval optimization yields three control intervals for the advertising evaluation indexes, as Table 3 shows. According to Table 3, in order to keep the ROI within [90%, 100%], CTR, CVR and CPC should be bounded respectively in [0.030, 0.049], [0.63, 0.99] and [1.61, 2.41]. 100 samples bounded in these three intervals were randomly chosen, and the ROI of all 100 samples was found to lie in [90%, 100%]. This means that when the values of the three indexes lie in their corresponding intervals, the probability that the outputs calculated from such combinations fall in the appointed interval is 100%.
Table 1 Web-advertising sample data of a campaign in 7 weeks
Week  CTR    CVR    CPC   ROI
1     0.029  0.718  2.39  0.79
2     0.027  0.396  2.08  0.16
3     0.033  0.472  1.78  0.61
4     0.031  0.651  2.03  0.87
5     0.029  0.546  2.1   0.59
6     0.030  0.525  1.98  0.62
7     0.035  0.311  2.17  0.12
Table 2 The parameters of the B-P model
Input neurons: 3; Hidden neurons: 3; Output neurons: 1; Learning times: 7566; Total error: 0.0009; Smoothness factor: 0.15; Adjusting factor: 1
Table 3 The result of the genetic algorithm for interval optimization
Control interval:  CTR [0.030, 0.049]   CVR [0.63, 0.99]   CPC [1.61, 2.41]   ROI [0.9, 1]
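Taken together, the two GA stages of Section 3 and the final random check reported above could be sketched as below; the population sizes, the perturbation widths e_i, the simplified search operators and the variable bounds are illustrative assumptions, and roi_of again denotes the trained evaluation model of Eq. (1).

    import random

    def interval_optimize(roi_of, roi_low, roi_high, bounds, n_seed=50, m=200, h=100):
        """Sketch of the interval-optimization procedure (operators simplified)."""
        roi_mid = (roi_low + roi_high) / 2.0
        fitness = lambda x: 1.0 / (1.0 + abs(roi_of(*x) - roi_mid))
        rand_point = lambda: [random.uniform(lo, hi) for lo, hi in bounds]

        # Stage 1: find the best single individual (random search stands in for the GA here).
        best = max((rand_point() for _ in range(n_seed)), key=fitness)

        # Stage 2: perturb the best individual, keep candidates whose ROI stays in range,
        # and take per-variable minima/maxima as the control intervals.
        eps = [(hi - lo) * 0.2 for lo, hi in bounds]      # assumed perturbation widths e_i
        kept = []
        for _ in range(m):
            cand = [min(max(x + random.uniform(-e, e), lo), hi)
                    for x, e, (lo, hi) in zip(best, eps, bounds)]
            if roi_low <= roi_of(*cand) <= roi_high:
                kept.append(cand)
        if not kept:
            return None, 0.0
        intervals = [(min(c[i] for c in kept), max(c[i] for c in kept))
                     for i in range(len(bounds))]

        # Stage 3: the random check of Table 3 -- estimate how often points drawn
        # from the intervals keep the ROI inside [roi_low, roi_high].
        hits = sum(roi_low <= roi_of(*[random.uniform(lo, hi) for lo, hi in intervals]) <= roi_high
                   for _ in range(h))
        return intervals, hits / h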
5. Effective allocation of web-advertising cost
In order to raise the advertising income, besides applying the method proposed above to control the values of the factors influencing the advertising ROI, the quality of the influencing factors also has to be controlled. Hence, the relation between the keywords of the website and the evaluation indexes of the advertising income has to be studied in order to control the advertising ROI. To raise the CTR and CVR, the advertiser should make the keywords of the campaign more pertinent. If a keyword is too general, the advertiser runs the risk of getting clicks to his site that are not really relevant, and of having a lower conversion rate. The advertiser can increase a keyword's profitability by adjusting its CPC: for keywords that show a profit, increase the CPC to increase exposure and generate more traffic; for keywords that are not profitable, decrease the CPC to lower the costs. Clustering is therefore applied to the keywords to find out which ones are profitable and which are not. Two indexes, CTR and CVR, are used to determine the quality of the keywords.
Table 4 Sample data of the advertising keywords
Keyword                              CTR   CVR
business intelligence data mining    29%   42.4%
data mining                          45%   61.5%
data mining analysis                 3%    0.0%
data mining solutions                6%    0.0%
data mining technologies             5%    2.8%
data mining technology               87%   21.3%
intelligent data mining              3%    3.9%
multimedia data mining               5%    0.0%
The 8 samples in Table 4 are clustered by the hierarchical (system) clustering method, with the between-cluster distance defined as the minimum Euclidean distance (single linkage). If the keywords are clustered into two groups, the result is represented as U1 and U2: U1 = {business intelligence data mining, data mining, data mining technology}; U2 = {data mining analysis, data mining solutions, data mining technologies, intelligent data mining, multimedia data mining}. U1 contains the keywords that contribute a lot to the CTR and CVR and thus have higher quality, while U2 contains the keywords of lower quality. Therefore, the CPC of the first cluster can be increased and the CPC of the second cluster should be decreased in order to allocate the costs of the web advertisement, which helps increase the advertisers' web-advertising income.
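A sketch of this clustering step is given below, using SciPy's single-linkage hierarchical clustering on the (CTR, CVR) pairs of Table 4; whether the percentages are rescaled or standardised before clustering is not stated in the paper, so the grouping produced by this sketch may differ from the U1/U2 partition reported above.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    keywords = ["business intelligence data mining", "data mining", "data mining analysis",
                "data mining solutions", "data mining technologies", "data mining technology",
                "intelligent data mining", "multimedia data mining"]
    # (CTR, CVR) pairs from Table 4, expressed as fractions.
    X = np.array([[0.29, 0.424], [0.45, 0.615], [0.03, 0.0], [0.06, 0.0],
                  [0.05, 0.028], [0.87, 0.213], [0.03, 0.039], [0.05, 0.0]])

    Z = linkage(X, method="single", metric="euclidean")   # minimum-distance (single linkage)
    labels = fcluster(Z, t=2, criterion="maxclust")       # cut the tree into two clusters

    for cluster_id in sorted(set(labels)):
        members = [kw for kw, lab in zip(keywords, labels) if lab == cluster_id]
        print(cluster_id, members)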
6. Conclusion
The genetic algorithm for interval optimization can not only be applied to controlling the intervals of the factors influencing web-advertising income, but can also be validated on controlling multiple factors in general multi-objective problems. By combining the evaluation indexes of web-advertising income with the neural-network evaluation model, the ROI is not only forecast by changing the values of CPC, CTR and CVR, but the optimal control is also processed in reverse according to the anticipated advertising income. Within the optimal control intervals, the advertising income of any combination of CPC, CTR and CVR is bounded in the anticipated income interval. Clustering the advertising keywords of the website assists the optimal allocation of the advertising cost and effectively controls both the influencing factors and the cost of the web advertisement. The results of the application indicate that the models are feasible and can be extended.
DESIGN AND IMPLEMENTATION OF AN E-COMMERCE ONLINE GAME FOR EDUCATION AND TRAINING
PEILI ZHANG 1, MEIQI FANG 1, YI ZENG 1, JIA YU 2
1. Economy & Science Lab, Renmin University of China, 1000872, P.R. China
2. School of Electrical Engineering and Telecommunications, University of New South Wales (UNSW), Sydney, NSW 2052, Australia
In this paper, we describe our design and implementation of a flexible, efficient way of E-Commerce education that utilizes an online game mode. We are building a multi-player E-Commerce simulating online game, ECGAME, which anyone around the world can log in to via the Internet. The platform provides participants with a virtual integrated business world to experience and helps them learn business rules more effectively. It also motivates participants to learn E-Commerce and sparks competitiveness.
1. Introduction
As the online game business grows, we realize that massive multiplayer online games can greatly inspire people's interest, and we find that they offer a new way to help students experience the real business world. Therefore, during this year, we are building a multi-player E-Commerce online game, ECGAME. There are four basic roles in the game: consumers, manufacturers, retailers and transporters. The participants can run their own companies with various strategies and compete with the other companies in the virtual world. In ECGAME, each player can act in a different role: consumer, manufacturer, retailer or transporter. The tasks of each role are different. We calculate the scores based on their performance, for example, the money they earn. The game platform helps students, business planners and e-commerce product designers train their skills, and it is also an inexpensive, safe way to evaluate the potential consequences of business strategies and market models. In addition, we cooperate with an art school to decorate our system with attractive scenes in order to draw participants' interest (Fig. 1).
Fig. 1. Consumer's Home in ECGAME
2. Roles Assignment
2.1. Consumer
Fig. 2. Consumer (activities: buy products, earn money, work, auction, study, buy stock)
Description: Participants who act as consumers have various attributes. They need to consume products to enhance their abilities, such as eating food to gain strength. There are several ways to earn money: they can auction products with other consumers, work for companies, buy stock, and study in the school to get better jobs.
2.2. Manufacturer
Fig. 3. Manufacturer (activities: research & development, buy materials, manufacture products, employment, bargain with retailers, contact transporters, accomplish the contract)
Description: Participants who act as manufacturers need to research and develop new products first. They can manufacture products as long as they buy enough materials and employ enough workers. They also bargain with retailers to sell their products. After signing the contracts, they need to contact transporters to transfer their products to the retailers.
2.3. Retailer
Fig. 4. Retailer (activities: investigation, bargain with manufacturers, buy the goods, manage storage, sell goods to consumers)
Description: Participants who act as retailers should investigate consumers' interests at the beginning. Then, they bargain with manufacturers to buy the goods they want, based on the investigation reports. Finally, they need to manage the warehouse and sell the products to consumers.
2.4. Transporter
Fig. 5. Transporter (activities: develop new lanes, receive orders, buy vehicles, employment, transfer goods)
Description: Participants who act as transporters should develop new lanes first. Then they need to buy vehicles and employ workers; each vehicle has a different capacity. At the same time, they receive orders from manufacturers to transfer goods to retailers, and they decide the optimal routes for transferring goods so as to save time.
2.5. Auto Roles
The four basic roles (consumers, manufacturers, retailers and transporters) alone are far from enough to simulate the business market. Hence, we add several auto roles, such as Supplier, Bank, College, Supermarket, Job Center, Stock Center, Custom Center, Insurance Center, Auction Center, and Arbitration Center. Each auto role has its own rules to assist participants in accomplishing their tasks. The primary events in ECGAME are listed in Figure 6.
Fig. 6. The events among basic roles and auto roles (events such as bargaining between participants and the auto roles)
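As a rough sketch of how the basic roles, auto roles and the events between them might be represented in an implementation, consider the following; the class and field names, and the use of earned money as the score, are purely illustrative assumptions, since the paper does not disclose ECGAME's actual data model.

    from dataclasses import dataclass

    @dataclass
    class Player:
        name: str
        role: str                  # "consumer", "manufacturer", "retailer" or "transporter"
        money: float = 0.0

        def score(self) -> float:
            # Scores are based on performance, e.g. money earned (assumed rule).
            return self.money

    @dataclass
    class Event:
        kind: str                  # e.g. "bargain", "work", "transfer", "auction"
        source: Player
        target: str                # another player or an auto role such as "Bank"
        amount: float = 0.0

    AUTO_ROLES = ["Supplier", "Bank", "College", "Supermarket", "Job Center",
                  "Stock Center", "Custom Center", "Insurance Center",
                  "Auction Center", "Arbitration Center"]

    alice = Player("alice", "consumer")
    work_event = Event("work", alice, "Job Center", amount=50.0)
    alice.money += work_event.amount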
3. Game Scenarios
3.1. Overview
ECGAME offers an integrated market simulation platform for participants to experience the real business world. As shown in Figure 7, there are three general game scenarios in our system.
Fig. 7. Overview of game scenarios in ECGAME (Stage 1: initializing and accumulating; Stage 2: running a new company; Stage 3: entering the stock market)
Stage 1: Initializing and Accumulating. In ECGAME, participants act as consumers with little money and low abilities at the beginning. They participate in various activities to earn money and enhance their abilities. Stage 2: Running a new company. After their skills and personal savings reach the level scenario II requires, participants can enter scenario II to run a new company. They can act as manufacturer, retailer or transporter; the tasks of each role are different, so that participants experience distinct business modes. In this scenario, each participant can manage his company with his own strategies and trade with other participants to earn money. Stage 3: Coming into the stock market. When participants run their companies well enough, they can apply to act as super enterprises. A super enterprise can enter the stock market to accumulate more money from the public. In ECGAME, we establish a simplified stock market for participants to experience.
3.2. Enterprise resource planning (ERP)
We add ERP (Enterprise Resource Planning) components into our online game platform for business management, catering to the complex task requirements in ECGAME. Participants who act as enterprisers (manufacturers, retailers or transporters) can easily use the ERP technology to manage their virtual companies while playing the game. In our ERP software, there are five crucial modules: Finance Management, Human Resource Management, Manufacturing Management, Strategy Decision and Materials Management. Participants can learn how to plan enterprise resources efficiently using this platform.
3.3. Enterprise Alliance
Although the game provides participants with opportunities to compete with one another, experienced participants tend to cooperate with potential partners. We find that participants' behavioral patterns keep changing and that the relationships among people are dynamic. When they benefit from integrating enterprises' resources, they become partners. ECGAME provides a platform for those participants to form alliances. The members of an Enterprise Alliance can share information with each other and come to agreements with discounts. It is even more challenging that participants need to change their strategies and find partners in order to survive in the game.
3.4. How to win the game
In ECGAME, there are many activities and targets for participants to achieve, involving various decision factors. In order to win, there are four primary factors that participants should take into account:
1. Money: How to use the money to invest in the company?
2. Time: When to manufacture? When to transfer?
3. Human Resource: How to schedule the workers for manufacturing?
4. Information: What are consumers' needs now? How about the opponents?
Therefore, when participants make decisions on their business strategies, they must consider all the factors above; ignoring any factor will lead to failure. On the whole, only when a student plans everything carefully can he or she get a high score.
4. Conclusion
In conclusion, ECGAME is an original and efficient approach to learning and experiencing E-Commerce. It is an excellent review of many business strategies, management methods and marketing principles, and it provides an approachable way to gain hands-on management experience. Currently, we have accomplished a prototype of ECGAME with most functions, and we have described the main innovations of our system in this paper. For an online game, efficiency is a significant factor in attracting participants, who become impatient if the game runs slowly; thus, evaluating the efficiency of our system is a crucial task for the next step. Furthermore, we will add more components to the game for participants to experience other recent technologies, such as Supply Chain Management (SCM), Client Resource Management (CRM) and Supply Resource Management (SRM).
SELECTION MODEL OF SEMANTIC WEB SERVICES
X. WANG, Y. ZHAO, AND W. A. HALANG
Faculty of Electrical and Computer Engineering, FernUniversitat, 58084 Hagen, Germany
E-mail: {xia.wang, yi.zhao, Wolfgang.halang}@fernuni-hagen.de
Widely supported by industry and research developments, Semantic Web Services appear to be the next generation of the service-oriented paradigm. Their discovery constitutes a great challenge. Although much work has been done on selecting semantic services, an exact selection model, able to describe services and service selection in a powerful way, is still not defined. Moreover, little consideration has been given to quantifying the selection of services based on both the capabilities and the quality of service (QoS). Here, a QoS-based selection model of semantic services is proposed, and a service matching algorithm working under this model is presented. Two real-life examples are also discussed to validate its implementation.
1. Introduction
With the development of the Semantic Web 11, the future web could provide not only static information understood by human beings, but also semantic services which implement series of tasks and can be accomplished automatically by machines. More and more semantic web services will be developed and published in business fields such as travel agencies, on-line trade, e-commerce, or manufacturing. Thus, the automatic discovery or selection of desired services is becoming critical. The achievements in the area of Semantic Web Services (SWS) so far can be summarised as follows. Currently, OWL-S 12 is one important web ontology language for services, providing strong semantic support for web services. There is also some work on service matching algorithms. Sycara et al. 6, for instance, proposed a dynamic matchmaking algorithm based on the agent capability description language LARKS. Similar work 1,7 adopts these ideas to different degrees. Most current work, however, deals with the selection problem only with respect to the similarity of non-functionality and functionality; the quality attributes of services 10 are considered to a lesser extent. Moreover, the similarity of services is computed by matching the whole description of
services published in OWL-S or WSDL documents, which may render the selection process erroneous 5, as the documents contain much redundant information. Therefore, as a remedy, this paper proposes a concise and practical QoS-based service model inherently supporting selection. In the context of this model, two real-life examples of implementing the matching process are presented, which consider both the qualities and the capabilities of services.
2. Model of Semantic Web Services
To describe a service, its non-functionality and functionality should be defined. Based on 2,12, a service is stated as a tuple s = (NF, F, Q, C), where NF is a set defining its non-functionalities, F is a set describing the functionalities, Q defines the quality of the service, and C is the overall cost of its invocation. The details of this service model, presented in OWL Description Logic (DL) 3, are as follows:
- NF, an array of String, is denoted as NF = {serviceName, serviceCategory, textDescription}. According to the OWL-S specification, they are: (1) serviceName, a String giving the name of a service; (2) serviceCategory, a String array giving the categories of a service, classified according to application fields, kinds of services, or some industry taxonomy, e.g., NACIS a or UN-SPSC b; (3) textDescription, a short human-readable text briefly describing the service.
- F is a set of service functionalities defined as F = {f_1, f_2, ..., f_i}, i in N, with the ith function f_i = (op_i, Sigma_Ii, Sigma_Oi, Cons_i, P_i, E_i) described by:
(1) op_i: a String naming the operation i;
(2) Sigma_Ii: an array of String consisting of the input parameters;
(3) Sigma_Oi: an array of String consisting of the output parameters;
(4) Cons_i: a set of assertions expressed in DL syntax as constraints;
(5) P_i: a set of assertions in DL syntax as pre-conditions;
(6) E_i: a set of assertions in DL syntax as effects;
(7) the relation delta: (op_i x Sigma_Ii x Cons_i x P_i) -> (Sigma_Oi x E_i), being the logical implication of service execution.
a North American Cartographic Information Society (NACIS), www.nacis.org; b United Nations Standard Products and Services Code (UNSPSC), www.unspsc.org
The relation delta follows the logical statement "if (op_i x Sigma_Ii x Cons_i x P_i) is true, then (Sigma_Oi x E_i) is true". That means, if the inputs, constraints and pre-conditions required by operation i are fulfilled, its relevant outputs and effects are yielded.
- Q denotes the quality attributes of a service. In 8, all possible quality requirements of services were identified. For the sake of efficiency of service selection, and from the point of view of clients, our model categorises QoS metrics into two parts only, a necessary set (Qn) and an optional one (Qo). We explicitly define Q in our model as Q = Qn = {q_1, q_2, ..., q_j}, j in N, which is formalised as a set of attribute assertions expressed in DL syntax, see Fig. 1.
- C is the overall cost of invoking a service, generally an assertion formulated in DL syntax.
Fig. 1 takes as examples two real flight-booking systems c, which have been re-described using the above service model.
Figure 1. Two real-world examples of the service model: (a) Advertisement_1 and (b) Advertisement_2, two real flight-booking offers (London to Orlando, 14-17 July 2005, one adult, return trip) re-described in the model, with ticket costs of $645.70 and $2280.35 respectively, together with constraints, pre-conditions, effects and QoS assertions on response time, execution time, reliability, exception handling, accuracy and security.
c Advertisement_1 at www.squareroutetravel.com/Flights/cheap-flight-ticket.htm and Advertisement_2 at www.travelbag.co.uk/flights/index.html
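To make the structure of the tuple s = (NF, F, Q, C) more concrete, a possible in-memory representation is sketched below; the Python types and the plain-string encoding of the DL assertions are simplifications, and the partial instance of Advertisement_1 only transcribes a few of the details shown in Figure 1.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Functionality:          # one f_i = (op_i, Sigma_Ii, Sigma_Oi, Cons_i, P_i, E_i)
        op: str
        inputs: List[str]
        outputs: List[str]
        constraints: List[str]    # DL assertions kept as plain strings in this sketch
        preconditions: List[str]
        effects: List[str]

    @dataclass
    class Service:                # s = (NF, F, Q, C)
        name: str
        categories: List[str]
        description: str
        functionalities: List[Functionality]
        qos: List[str]            # the necessary QoS assertions Q = Qn
        cost: str                 # overall invocation cost assertion

    # Partial, simplified transcription of Advertisement_1 from Figure 1:
    advertisement1 = Service(
        name="CheapFlightSearch",
        categories=["Flight"],
        description="Search for a cheap flight ticket, UK citizens",
        functionalities=[Functionality(
            op="CheapFlightSearch",
            inputs=["LondonUK", "OrlandoUSA", "ReturnTrip"],
            outputs=["TicketDepart", "TicketReturn"],
            constraints=["cost.Ticket = 645.70"],
            preconditions=["valid.PaymentCard"],
            effects=["money.InsuranceCard - cost.Ticket"],
        )],
        qos=["qualityResponseTime <= 8", "Reliability >= 8"],
        cost="CostService",
    )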
3. Selection Algorithm
Usually, there is a service requester (sR) and a number of service providers (sA). A selection algorithm matches the published services potentially meeting the requirements of the service requester. Denoted as sR = (NF_R, F_R, Q_R, C_R) and sA = (NF_A, F_A, Q_A, C_A), the service requirement and the service advertisement form the inputs to the matching engine. Pairs of sA and sR are then interpreted and processed conforming to the matching algorithm, and the output of the matching engine is a sorted set of candidate services returned to the requester as the result. The matching process can flexibly be organised with various kinds of filters. Here, we sequentially combine three filters to illustrate this.
I. NF - This filter is based on service name, service category and service description to select the available services and obtain a collection of relevant candidates, named resultSet_n. The process is to run a program implementing the pseudo-code in Fig. 2 twice, using serviceName and serviceCategory as input parameters, respectively. The distance of two different concepts is computed according to 6. The outputs are the similarities of serviceName and serviceCategory, each with a value between 0 and 1.
matchConcept(ConceptR, ConceptA){
  double simConcept;
  simConcept = distance(ConceptR, ConceptA);
  return simConcept; }  // simConcept in [0, 1]
Figure 2. Matching serviceName and serviceCategory
matchDescription(DescriptionR, DescriptionA){
  double simDes;
  for each term t_j in DescriptionR and DescriptionA {
    DesR[t_j] = number of occurrences of term t_j in DescriptionR;
    DesA[t_j] = number of occurrences of term t_j in DescriptionA; }
  simDes = (DesR . DesA) / (|DesR| * |DesA|);
  return simDes; }  // simDes in [0, 1]
Figure 3. Matching serviceDescription
To measure the similarity of short text information, like serviceDescription, the cosine coefficient is a simple and effective approach. In Fig. 3, serviceDescription is the input, and the similarity of descriptions is the output, which ranges between 0 and 1.
II. op, Sigma_I, Sigma_O - This filter works on the functionalities of the service and obtains a result set resultSet_f. First, the operation op is compared using matchConcept, as defined in Fig. 2: matchConcept(OperationR, OperationA); // returns simOP in [0, 1]. Second, if the service requester's input information subsumes the parameters required by the service advertisement, request and advertisement are considered similar, and the value of simPara is set to 1. Otherwise, the requester's input parameters are regarded as insufficient to fulfil the implementation of the advertised service, and simPara is set to 0.
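A runnable counterpart of the matchDescription pseudo-code in Fig. 3, using the cosine coefficient on term-count vectors, might look as follows; the whitespace tokenisation is an assumption, since the paper does not specify how descriptions are split into terms.

    import math
    from collections import Counter

    def match_description(description_r: str, description_a: str) -> float:
        """Cosine similarity of two short service descriptions, in [0, 1]."""
        des_r = Counter(description_r.lower().split())
        des_a = Counter(description_a.lower().split())
        dot = sum(des_r[t] * des_a[t] for t in des_r)
        norm_r = math.sqrt(sum(c * c for c in des_r.values()))
        norm_a = math.sqrt(sum(c * c for c in des_a.values()))
        return dot / (norm_r * norm_a) if norm_r and norm_a else 0.0

    # e.g. match_description("search for a cheap flight ticket",
    #                        "travel expertise, unrivalled knowledge, excellent value")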
The function matchOutput, however, does the opposite. If the service requester's output cannot be subsumed in the service advertisement, then the results of the advertisement's execution meet the service requester's requirement only partly; in this case the match fails, noted as 0.
III. Cons, P, E, Q and C - This filter processes resultSet_f to refine the selection and similarly obtains resultSet. In this step, the matchmaking is processed on sets of expressions in DL format. Here, one cannot simply consider subsumption, because the similarity degree is more important than the subsumption relationship. Considering cost.Ticket in Fig. 1, the requester R is assumed to state that the ticket must cost less than $800; Advertisement_1 asks for $645.70 and A_2 for $2280.35. In this case, an approach should be used to scale the close-degree rather than the values' difference. This is addressed by the algorithm as: closeExpCost(R, A_1) = (800 - 645.70)/800 = 0.192875; closeExpCost(R, A_2) = (800 - 2280.35)/800 = -1.8504375. Obviously, the bigger the numerical value, the better the result; Advertisement_1 is a better choice than Advertisement_2. If extended to a set of expressions (in this context, Cons, P, E, Q, and C), multi-attribute utility theory 2,5 should be applied. That is, after computing the close-degree based on fuzzy computing 4, the expressions are assigned utility scores 2 between 1 and 5 based on the attribute definitions. In the above example, the closeness values give utilityScore(R, A_1) = 5 and utilityScore(R, A_2) = 1. A set of expressions E = {E_1, E_2, ..., E_n} is assumed, whose related utility scores are u = {u_1, u_2, ..., u_n}, n in N. If a weight v_i between 0 and 1 is assigned to each attribute, then
simExpression = \frac{v_1 \cdot u_1 + v_2 \cdot u_2 + \ldots + v_n \cdot u_n}{v_1 + v_2 + \ldots + v_n}

Cons, P, E, Q, and C are all sets of expressions, whose corresponding similarities can be calculated as simCons, simP, simE, simQoS and simC. Similarly, to quantify the similarity between service requirements and advertisements, a weight between 0 and 1 is assigned to each attribute involved in the matchmaking. Then,
sumSimilarity = \Big(\sum_{i=1}^{11} w_i\Big)^{-1} \cdot (w_1 \cdot simName + w_2 \cdot simCategory + w_3 \cdot simDes + w_4 \cdot simOP + w_5 \cdot simInput + w_6 \cdot simOutput + w_7 \cdot simCons + w_8 \cdot simP + w_9 \cdot simE + w_{10} \cdot simQoS + w_{11} \cdot simC)
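The weighted aggregation above could be computed as in the sketch below; the equal default weights are an assumption, since the paper leaves the choice of the eleven weights w_1, ..., w_11 to the matchmaker.

    SIM_KEYS = ["simName", "simCategory", "simDes", "simOP", "simInput", "simOutput",
                "simCons", "simP", "simE", "simQoS", "simC"]

    def sum_similarity(sims, weights=None):
        """Weighted average of the eleven similarity scores (each in [0, 1])."""
        weights = weights or {k: 1.0 for k in SIM_KEYS}   # equal weights by default
        total_w = sum(weights[k] for k in SIM_KEYS)
        return sum(weights[k] * sims[k] for k in SIM_KEYS) / total_w

    # e.g. sum_similarity({k: 0.8 for k in SIM_KEYS})  ->  0.8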
4. Conclusion
Semantic Web Services appear to be the next generation of the service-oriented paradigm, and the discovery of semantic services constitutes a great challenge. The problems currently encountered lead to the conclusion that an effective semantic service model for selection is missing, one that provides powerful expressiveness with respect to the capabilities and qualities of services, and that a quantified matching algorithm is needed as well. For these problems, this paper proposed a selection model which effectively implements QoS-based matchmaking of semantic web services; the matching process was discussed with two real examples. Future work is to optimise the method of assessing each quality attribute of a service, and to consider how to combine various QoS attributes.
References
1. C. Zhou, L. Chia and B. Lee, DAML-QoS Ontology for Web Services, Proc. ICWS04, pp. 472-479 (2004).
2. D. Menasce, Composing Web Services: A QoS View, IEEE Internet Computing, 8(6) (2004).
3. D. Nardi, F. Baader, D. Calvanese, D.L. McGuinness and P.F. Patel-Schneider, The Description Logic Handbook, Cambridge (2003).
4. E. Herrera-Viedma et al., Evaluating the Informative Quality of Web Sites by Fuzzy Computing with Words, Proc. Atlantic Web Intelligence Conference, LNAI 2663 (2003).
5. E.M. Maximilien and M.P. Singh, A Framework and Ontology for Dynamic Web Services Selection, IEEE Internet Computing, 8(5), 84-93 (2004).
6. K. Sycara, S. Widoff, M. Klusch and J. Lu, LARKS: Dynamic Matchmaking Among Heterogeneous Software Agents in Cyberspace, Autonomous Agents and Multi-Agent Systems, 5(2), 172-203 (2002).
7. M. Paolucci, T. Kawamura, T. Payne and K. Sycara, Semantic Matching of Web Services Capabilities, Proc. ISWC02, pp. 333-347 (2002).
8. QoS for Web Services: Requirements and Possible Approaches, W3C Working Group Note (2003).
9. R. Akkiraju et al., Web Service Semantics - WSDL-S, A joint UGA-IBM Technical Note, Version 1.0 (2005).
10. S. Ran, A Model for Web Services Discovery With QoS, ACM SIGecom (2003).
11. T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web, Scientific American, 284(5), 34-43 (2001).
12. The OWL Services Coalition, OWL-S Semantic Markup for Web Services, W3C Member Submission (2004).
13. Web Services Description Language (WSDL) Version 2.0, Part 0: Primer, W3C Working Draft (2004).
A TRUST ASSERTION MAKER TOOL
PAOLO CERAVOLO, ERNESTO DAMIANI, MARCO VIVIANI
University of Milan, via Bramante 65, 26013 Crema (CR), Italy
E-mail: {ceravolo, damiani, viviani}@dti.unimi.it
ALESSIO CURCIO, MICOL PINELLI
University of Milan, via Bramante 65, 26013 Crema (CR), Italy
E-mail: {acurcio, mpinelli}@crema.unimi.it
In this paper we outline the architecture of a distributed Trust Layer that can be superimposed on metadata generators, like our Trust Assertion Maker. TAM is a tool that allows producing metadata from multiple sources in an authority-based environment, where users' roles are certified and associated with a trust value. These metadata can complement metadata automatically generated by document classifiers. Our ongoing experimentation is aimed at validating the role of a Trust Layer as a technique for automatically screening high-quality metadata in a set of assertions coming from sources with different levels of trustworthiness.
1. Introduction
Nowadays, communication technologies have created a space of flows that substitutes for the space of places: geographical proximity is giving way to relational proximity. These phenomena greatly impact the architecture of a community of practice, giving it a distributed extension of its internal processes and dynamics. CoPs exist within businesses and across business units and company boundaries; even though they are informally constituted, these self-organizing systems share the capability of creating and using organizational knowledge through informal learning and mutual engagement 5, putting users at the center of the space of interactions. Moreover, in current business contexts that require multidisciplinary approaches and competencies, this stresses the relevance of users' roles, reputation, and trust. For these reasons, generic knowledge management techniques in CoPs have
to be evolved towards a source-oriented evaluation of the acquired knowledge. The knowledge extracted during the analysis of the information flow produced by the community must be filtered by the relevance of the node producing it. Moreover, the composition of nodes can evolve, and the knowledge is continuously under evolution pressure. Typically, knowledge management techniques use metadata in order to specify content, quality, type, creation, and spatial information of a data item. A number of specialized formats for the creation of metadata exist; a typical example is the Resource Description Framework (RDF), but metadata can be stored in any format such as free text, Extensible Markup Language (XML), or database entries. All of these formats must rely on a vocabulary that can have a different degree of formality. If this vocabulary is compliant with a set of logical axioms it is called an ontology. There are a number of well-known advantages in using information extracted from data instead of the data themselves. On the one hand, because of their small size compared to the data they describe, metadata are more easily shareable than data. Thanks to metadata sharing, information about data becomes readily available to anyone seeking it; thus, metadata make data discovery easier and reduce data duplication. On the other hand, metadata can be created by a number of sources (the data owner, other users, automatic tools) and may or may not be digitally signed by their author. The present paper briefly outlines our current research work (for a more detailed description, see 2) on how to validate such assertions by means of a Trust Layer, including a Trust Manager able to collect votes from the different nodes and to compute variations to the trust values on metadata. We then focus our discussion on the description of the Trust Assertion Maker (TAM), a tool especially designed to allow the manual production of trust assertions in distributed environments. Trust assertions are metadata assertions associated with a trust value; the trust level of an assertion is based on user roles. The adoption of a manual tool allows a detailed level of description and represents an important complement to the assertions produced by automatic classifiers, which simply associate resources with domain concepts. This paper is organized as follows: in Section 2 we outline the architecture of our Trust Layer; in Section 3.1 we focus our attention on TAM; finally, in Section 3.2 we define in detail the types of assertions supported by TAM.
2. The Trust Layer architecture
Before describing our proposed Trust Layer, let us make some short remarks on related work. Current approaches distinguish between two main types of trust management systems 1, namely Centralized Reputation Systems and Distributed Reputation Systems. In centralized reputation systems, trust information is collected from members of the community in the form of ratings on resources; the central authority collects all the ratings and derives a score for each resource. In a distributed reputation system there is no central location for submitting ratings and obtaining resources' reputation scores; instead, there are distributed stores where ratings can be submitted. In our approach trust is attached to metadata in the form of assertions rather than to generic resources. While trust values are expressed by clients, our Trust Layer includes a centralized Metadata Publication Center that acts as an index, collecting and displaying metadata assertions, possibly in different formats and coming from different sources. It is possible to assign different trust values to assertions depending on their origin: assertions manually provided by a domain expert are more reliable than automatically generated ones. Metadata in the Publication Center are indexed, and clients interact with them by navigating them and providing implicitly (with their behavior) or explicitly (by means of an explicit vote) an evaluation of the metadata's trustworthiness. This trust-related information is provided by the Publication Center to the Trust Manager in the form of new assertions. Trust assertions, which we call Trust Metadata, are built using the well-known technique of reification. This choice allows our system to interact with heterogeneous sources of metadata: our Trust Metadata do not depend on the format of the original assertions. Also, all software modules in our architecture can evolve separately; taken together, they compose a complete Trust Layer, whose components communicate by means of web services interfaces. This makes it possible to test the whole system even though single modules can evolve at different speeds. Summarizing our architecture, the Trust Manager is composed of two functional modules: the Trust Evaluator, which examines metadata and evaluates their reliability, and the Trust Aggregator, which aggregates all the inputs coming from the trust evaluators by means of a suitable aggregation function. This system allows the integration of large amounts of assertions produced by different sources. Trust aggregation algorithms provide a self-running mechanism allowing high-quality assertions to emerge from the whole set of produced assertions. Fig. 1 describes the architecture of our Trust Layer. More details
on the Trust Manager can be found in 3.
Figure 1. The Trust Layer Architecture.
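As an illustration of the Trust Aggregator's role, the sketch below combines the votes collected on a single metadata assertion into an updated trust value through a weighted average, where each voter's weight is the trust attached to its role; this particular aggregation function is only a placeholder, since the paper merely requires "a suitable aggregation function" and defers the details to reference 3.

    def aggregate_trust(votes):
        """Combine (vote, role_trust) pairs on one assertion into a single trust value.

        votes: iterable of (vote, role_trust), both in [0, 1].
        A role-trust-weighted average is used here purely for illustration.
        """
        weighted = [(v * w, w) for v, w in votes]
        total_weight = sum(w for _, w in weighted)
        if total_weight == 0:
            return 0.0
        return sum(vw for vw, _ in weighted) / total_weight

    # e.g. aggregate_trust([(1.0, 0.9), (0.4, 0.3)])  ->  0.85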
3. Trust Assertion Maker
3.1. Architecture and requirements
The functional requirements of TAM are to manage the role-based creation of metadata assertions. This means that each user must be associated with a role. Roles are obtained from an authority tasked with certifying users' expertise. For these reasons TAM is organized according to a client-server architecture: the client provides the structure for editing assertions, while the server manages user access and maintains the metadata base, synchronizing updates. A user's expertise is represented by associating with a role the portions of the ontology describing a domain in which his expertise can be considered reliable. Portions of ontologies can be computed simply by listing the concepts for which the expertise of a role is maximal. These concepts are associated with a trust value, and the trust values of the other concepts of the ontology are derived by progressively decreasing the trust value according to the distance from the closest concept among those in the list. In a knowledge management system the role of a manual metadata editor such
as TAM is to complement automatic classifiers. Tools implementing automatic or semi-automatic algorithms for indexing resources are essential in order to support a satisfactory number of resources, but these tools are able to provide only simple assertions connecting a resource to a concept. For this reason they can be used only for producing a first base of metadata. In order to enrich the metadata base, users must be provided with easy instruments for the production of semantically complex assertions. As shown in Fig. 2, TAM guides the user with step-by-step dialogs, allowing an assertion to be produced without any expertise in the format used by the system for storing and sharing metadata assertions. According to Stojanovic, Staab, and Studer 4, it is possible to distinguish two types of ontologies: a domain ontology, or content ontology, that provides the vocabulary valid for a particular domain, and a structure ontology, or context ontology, that provides general concepts, cross-functional to specific domains, describing the type/structure of the resource. Our tool supports both types of ontologies. We call Direct the assertions made on structural concepts, because we assert directly on the resources, and Indirect the assertions made on the content of a resource. The tool also allows simple or complex assertions to be produced. In summary we have four types of metadata assertions, as shown in Table 1; in Section 3.2 we describe these assertions in more detail.
Table 1. The four types of metadata assertions
          Simple   Complex
Direct    DS       DC
Indirect  IS       IC

3.2. Editing assertions
Metadata produced by TAM are editable by means of a user-friendly interface. The interface proposed in TAM leads the user to the creation of an assertion using step-by-step dialogs. In the first step the user has to select the role he wants to hold, the resource he wants to relate to the metadata assertion, and the ontology he wants to use for creating the metadata. Fig. 2 shows a screenshot of the TAM interface. In the second step the user has to choose the type of assertion he wants to create. As previously mentioned, four types of assertions are supported by TAM.
Figure 2. The TAM interface.
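The four assertion types detailed below (IS, IC, DS, DC) could be represented as in the following sketch; the class and field names and the plain subject-predicate-object encoding are illustrative assumptions, not TAM's actual storage format.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Assertion:
        resource: str                     # the resource being described
        role: str                         # role held by the author of the assertion
        direct: bool                      # True: structure ontology; False: domain ontology
        subject: str                      # a concept (simple) or the triple subject (complex)
        predicate: Optional[str] = None   # present only for complex assertions
        obj: Optional[str] = None
        trust: float = 0.0                # trust value derived from the author's role

    # IS: "Document_X speaks about Enterprise"
    is_example = Assertion("Document_X", "domain expert", direct=False, subject="Enterprise")
    # IC: "Document_X speaks about an enterprise that invests in technology"
    ic_example = Assertion("Document_X", "domain expert", direct=False,
                           subject="Enterprise", predicate="invests in", obj="Technology")
    # DS: "Document_X is an image"
    ds_example = Assertion("Document_X", "domain expert", direct=True, subject="Image")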
IS: Indirect Simple assertion. An Indirect Simple assertion associates single concepts of the domain ontology with a resource. To obtain this, it is necessary to select the resource and then to choose, from the contextual menu, the option related to the indirect simple assertion. It is then necessary to identify the concept of the ontology to be related to the resource. For example, an indirect simple assertion can be the following: "Document_X speaks about Enterprise", where Enterprise is a concept belonging to the domain ontology.
IC: Indirect Complex assertion. An Indirect Complex assertion associates with a document not only single concepts of the ontology, but whole logical assertions, structured according to the subject-predicate-object model. To obtain this, it is necessary to select the resource and then to choose, from the contextual menu, the option related to the indirect complex assertion. It is then necessary to identify the subject of the assertion by selecting a concept from the ontology. After that, the predicate must be specified, choosing it among all the possible predicates compatible with the subject (considering also the inheritance among the concepts). Finally, the object of the assertion must be selected. The following statement illustrates an example: "Document_X speaks about an enterprise that invests in technology", where Enterprise
517 and t e c h n o l o g y are concepts belonging to the domain ontology. DS: Direct Simple assertion. Direct Simple assertion is different from the previous ones, because it is built relying on a structure ontology. This assertion specifies the type of the current resource the user is indexing. This is an example of a simple no-semantic assertion: "Document-X is an image". IC: Direct Complex assertion. Direct Complex assertion is well explained by the following example: "Document_X is an image that has a white background". The structure is similar to the indirect complex assertion; the only difference is in the typology of utilized ontology, that is a structural ontology, instead of a domain ontology. 4. Conclusions In this paper, we have presented an approach for developing a Trust Layer service, aimed at improving the quality of automatically generated metadata. Such a system must be supported by a tool for manual creation of complex metadata assertions. A specific solution to this requirement is illustrated in the paper, introducing TAM: a tool allowing to produce different type of metadata assertions in a distributed environment.
Acknowledgments This work was partly funded by the Italian Ministry of Research Fund for Basic Research (FIRB) under projects RBAU01CLNB_001 "Knowledge Management for the Web Infrastructure" (KIWI), and RBNE01JRK8_003 "Metodologie Agili per la Produzione del Software" (MAPS).
References 1. J. Audun, I. Roslan, C.A. Boyd, Survey of Trust and Reputation Systems for Online Service Provision. Decision Support Systems (to appear) (2005) 2. P. Ceravolo, E. Damiani, M. Viviani, Adding a Trust Layer to Semantic Web Metadata To appear in Soft Computing for Information Retrieval on the Web, F. Crestani, E. Herrera-Viedma, G. Pasi, eds., Elsevier. 3. P. Ceravolo, E. Damiani, M. Viviani, Adding a Peer-to-Peer Trust Layer to Metadata Generators, Lecture Notes in Computer Science, vol 3762, pages 809-815, (2005).
518 4. L. Stojanovic, S. Staab, R. Studer, E-Learning based on the Semantic Web. Proceedings of the World Conference on the WWW and Internet, Orlando, Florida, USA, (2001). 5. E. Wenger, Communities of Practice: The Key to Knowledge Strategy, Knowledge Directions, 48-64, (1999).
W E B ACCESS LOG MINING WITH SOFT SEQUENTIAL PATTERNS
C. F I O T , A. L A U R E N T A N D M. T E I S S E I R E LIRMM - CNRS - UMII, Montpellier, France E-mail: {fiot, laurent, teisseire} @lirmm.fr
Mining the time-stamped numerical data contained in web access logs is interesting for numerous applications (e.g. customer targeting, automatic updating of commercial websites or web server dimensioning). In this context, the algorithms for sequential patterns mining do not allow processing numerical information frequently. In previous works we defined fuzzy sequential patterns to cope with the numerical representation problem. In this paper, we apply these algorithms to web mining and assess them through experiments showing the relevancy of this work in the context of web access log mining.
1. Introduction The quantity of data from the World Wide Web is growing dramatically: requested URLs, number of requests or session duration, etc. are gathered automatically by web servers and stored in access log files. Analysing these data can provide useful information for performance enhancement or customer targeting. In this context, many works have been proposed to mine usage patterns and user profiles [1, 2, 3]. Particularly, [4] provides knowledge from database of visited page sequences. However this method, based on sequential pattern mining, cannot be used to mine numerical data contained in these log files, such as number of requests for the same page, transfer rates, number of downloaded kilobytes or duration of sessions. Few works have been carried out to process such numerical data and most of them are restricted to association rules [5, 6]. Sequential patterns are more adapted to time-stamped data. In order to cope with this problem, we propose here to apply an efficient fuzzy approach for sequential pattern mining to mine time-stamped numerical data from web access logs. This approach, defined in our previous works [7], is based on the definition of fuzzy intervals. Obtained patterns are of the type "60% of users visiting a lot the Disneyland website and a few Eiffel Tower pages visit later a lot 519
520 of traveling websites". These patterns are characterized by their support, which is the percentage of users who have followed this rule. Three approaches are proposed to mine such rules: SPEEDYFUZZY, MINIFUZZY and TOTALLYFUZZY, differing by the support computation. The end-user is allowed to choose between the speed of result extraction and the accuracy of the obtained frequent patterns. Implementation of these solutions is based on a method, which extends the PSP algorithm proposed in [4]. Experiments were carried out on synthetic datasets and on real-world data. They highlight the feasibility and robustness of a fuzzy approach. This paper gives an overview of our algorithms focusing on the processing of web log data. Section 2 introduces sequential patterns and fuzzy sequential patterns. Section 3 presents the experiments on web access logs. Section 4 concludes on the perspectives associated to this work.
2. From Crisp to Fuzzy Sequential Patterns In this section, we briefly describe the basic concepts of sequential patterns and fuzzy sequential patterns. Let T be a set of object records where each record consists of three information elements: an object-id, a timestamp and a set of items. An itemset, (iii-z . • -ik), is a non-empty, unordered set of items taken from / = {ii,i2, •••,*m}- A sequence s is a non-empty ordered list of itemsets, denoted by < siS2-..sp >. The support of a sequence s is the percentage of objects having s in their records. To decide whether a sequence is frequent or not, a minimum support value (minSupp) is specified by the user. A sequence s is said to be frequent if support(s) > minSupp. The problem of sequential pattern mining is to find all maximal frequent sequences [8]. In this context, fuzzy sequential patterns were defined to handle quantitative attributes. [9] and [10] proposed methods to mine fuzzy sequential patterns without providing experimental evaluations. We thus here consider the complete approach from [7], defining three fuzzification levels through the algorithms, SPEEDYFUZZY, MINIFUZZY and TOTALLYFUZZY. We consider the fuzzy extension of item, itemset and sequence. The quantity universe of each attribute is discretized into fuzzy subsets (see next paragraph). A fuzzy item is the association of one item and one fuzzy set. For instance, [/php/tutor.htm, lot] is a fuzzy item where lot is a fuzzy set defined by a membership function on the access quantity universe of the item /php/tutor.htm. A fuzzy itemset is a set of fuzzy items, e.g. ([/php/tutor.htm,lot][/php/functions.php, little]). Note that the fuzzy itemset {{/php/tutor.htm, lot]{/php/tutor.htm, little])
521 is not a valid fuzzy itemset, since no item can be repeated within an itemset. A fuzzy sequence is a sequence of fuzzy itemsets, e.g. < ([/php/faq.htm,lot})(\/php/eg.php,lot]) >. Table 1.
Access grouped by IPs, temporaly ordered (empty cells for unaccessed pages)
Cust./IP CI 82.228.151.02
86.197.153.12
82.226.199.47 82.226.199.48
| Date dl d2 d3 d4 d5 dl d2 d3 d4 dl d2 d3 d4
|
/php/tutor.htm 2 1
|| 1 j
/php/eg.htm
||
/php/functions.php
3
/php/faq.php
1 1 1 2
4
2
2 1
4 1
||
|
3 1 5
1
dl
2 2
1
First, the quantitative database is converted into a membership degree database. These partitions are automatically built by dividing the universe of quantities into intervals. Each interval groups the same proportion of users. It is then fuzzified in order to enhance generalization. From these membership functions we get the membership degrees for each record and for each fuzzy set. An example is given for CI in Table 2. Table 2. D. dl d2 d3 d4 d5
Membership degrees for customer 1
/php/tutor.htm U. | L. || ,, j , 1 :• J".
Items /php/eg.htm L. || m. li 5
V 7S
/php/functions.php li.
L.
/php/faq.php ||
m
|
T.
i j
0.5
0.3 0.5 ••
fM
.
0.5 II.fi 1
1
The support of a fuzzy itemset (X, A) is the proportion of objects supporting it. We propose three definitions depending on the fuzzification level: (1) SPEEDYFUZZY counts objects recording, for each item of the itemset, a membership degree not null at least once; (2) MINIFUZZY is based on a thresholded count, incrementing the number of objects supporting the fuzzy itemset when each of its item has a membership degree greater than a specified threshold in the data sequence; (3) TOTALLYFUZZY carries out a thresholded S-count. The support is computed as a weighted sum of the membership degrees greater than a relevancy threshold u>. The support of a fuzzy sequence is computed as the ratio of the number of objects supporting this fuzzy sequence compared to the total number of objects in
522 database. This support degree is computed algorithms detailed in [7]. Let us consider the membership database for customer 1 (Table 2) and the support of < ([/php/tutor.htm, lot])([/php/f unctions.php, lot]) >. With SPEEDYFUZZY, we consider the items underlined into account. With MINIFUZZY (w=0.49), we take the items boldfaced into account. With TOTALLYFUZZY (o;=0.49), customer 1 supports the sequence, the best occurrence of the sequence is kept, twice underlined. Table 3. SPEEDYFUZZY MINIFUZZY
TOTALLYFUZZY
Sequential patterns extracted with
minsupp=55%
<([/php/tutor.htm, little])([/php/functions.php, lot])> <([/php/tutor.htm, lot]}([/php/functions.php, lot])> <([/php/tutor.htm, little])([/php/functions.php, lot])> <([/php/functions.php, lot])> <([/php/tutor.htm, little])([/php/functions.php, lot])> <([/php/functions.php, Iot])>
75% 75% 75% 75% 69% 56%
Table 3 shows the sequential patterns respectively extracted by SPEEDYand TOTALLYFUZZY. Note that the frequent items are the same for all counting methods. The difference is in the number and length of the sequences. For a same minSupp, the number and length of the mined patterns are indeed greater with MINIFUZZY or SPEEDYFUZZY than with TOTALLYFUZZY (due to the thresholded E-count). This reduction in the number of patterns can be used for a database containing very high amount of frequent patterns to find the most relevant ones, the user will thus be provided with a selection of patterns, and not only have to assess a selection of patterns and not a really large quantity of them. So SPEEDYFUZZY could be used to identify user profiles, whereas TOTALLYFUZZY for mining detailed downloading rate. FUZZY, MINIFUZZY
3. E x p e r i m e n t s In this section, we present performances of the algorithms SPEEDYFUZZY, MINIFUZZY and TOTALLYFUZZY compared to PSP [4] and results of web access log mining with TOTALLYFUZZY. We show that soft sequential pattern mining brings more relevant information than crisp methods. Access logs from a laboratory website were prepared and mined to find frequently visited pages - such as in crisp sequential pattern mining - but also repeatedly visited pages. The access logs were pre-processed and the dataset recorded the number of accesses to one page, the same half-day by one user. For example, record "1500 5067 10 6" means that "visitor 21500" on half-day
523 5067 visited 6 times the URL coded by 10. This dataset contained 27,209 web pages visited by 79,756 different IPs over 16 days (32 half-days). As explained previously, we fuzzify the number of access per visitors on each page 3 fuzzy sets using a tool based on the DiscretizeFilter module of Weka [11]. Note that all data modeling choices have impact on what can be extracted from the data. Next experiments should thus be carried out using a fuzzification more adapted to the dataset, based on the approach of [12]. First, we compared performances of the fuzzy algorithms to those of PSP [4]. Figure 1(a), it can be noted that SPEEDYFUZZY is almost as fast as PSP despite the fact that it scans three times more items. Figure 1(b) shows that MINIFUZZY and TOTALLYFUZZY extract less frequent sequences than SPEEDYFUZZY and PSP. MINIFUZZY and ToTALLYFuzzYonly keep the items which have a degree greater than w and so which are considered as relevant by the user. The number of frequent sequences is then necessarily reduced compared to SPEEDYFUZZY or PSP. Figure 1(c) shows the extraction time according to the number of data sequences in database for minSupp = 0.2. These preliminary experiments show that results on fuzzy logs are consistent with the one on crisp values. The same frequent URLs can indeed be found using crisp or soft sequential pattern mining. The advantages of our method concern the additional knowledge supplied by quantities. Indeed, while the crisp algorithm PSP found URL 139 was frequently accessed, TOTALLYFUZZY found that it was frequently accessed between 2 and 5 times during the same period.
Runtime according to the minsup value
(a)
t of frequent sequences according to minSupp
(b)
Runtime according to the sequence number
(c)
Figure 1. (a)Runtime according to rninSupp for 79756 sequences; (b)Number of frequent sequences according to minSupp for 79756 sequences; (c)Runtime according to the number of sequences in the datasets, for minSupp=0.2%
524 4. C o n c l u s i o n a n d p e r s p e c t i v e s Historical analysis of web logs and more especially sequential pattern extraction from web databases is highly interesting for customer targeting, server dimensioning or transfer optimization. In this paper we propose to mine web logs using fuzzy sequential patterns handling three fuzzification levels thanks to three algorithms. This choice allows the extraction of frequent sequences by making a trade-off between relevancy and performance. Experiments on web access logs have highlighted the relevance of our proposal. This work builds many perspectives, for instance, to mine web-purchases for e-marketing.
References 1. M. Spiliopoulou and L. C. Faulstich. WUM: A tool for Web utilization analysis. Lecture Notes in Computer Science, 1590, 1999. 2. O.-R. Zaiane, M. Xin and J. Han. Discovering web access patterns and trends by applying olap and data mining technology on web logs. In Proc. of the Advances in Digital Libraries Conf., pages 19-30, 1998. 3. E. Damiani, B. Oliboni, E. Quintarelli and L. Tanca. Modeling users' navigation history. In WS. on Int. Tech. for Web Personalisation, 2001. 4. F. Masseglia, P. Poncelet and R. Cicchetti. An efficient algorithm for web usage mining. Networking and Information Systems Journal, 2(5-6), 1999. 5. C. M. Kuok, A. W.-C. Fu and M. H. Wong. Mining Fuzzy Association Rules in Databases. SIGMOD Record, 27(1), pages 41-46, 1998. 6. R. Srikant and R. Agrawal. Mining Quantitative Association Rules in Large Relational Tables. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 1-12, 1996. 7. C. Fiot, A. Laurent, M. Teisseire and B. Laurent. Why Fuzzy Sequential Patterns can Help Data Summarization: An Application to the INPI Trademark Database. In Proc. of the 2006 Fuzz-IEEE Conference, to appear. 8. R. Agrawal and R. Srikant. Mining Sequential Patterns. In Proc. of the 11th Int. Conf. on Data Engineering, pages 3-14, 1995. 9. T.P. Hong, K.Y. Lin and S.L. Wang. Mining Fuzzy Sequential Patterns from Multiple-Items Transactions. In Proc. of the Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., pages 1317-1321, 2001. 10. Y.-C. Hu, R.-S. Chen, G.-H. Tzeng and J.-H. Shieh. A Fuzzy Data Mining Algorithm for Finding Sequential Patterns. Int. J. of Uncertainty Fuzziness Knowledge-Based Systems, 11(2), pages 173-193, 2003. 11. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools with Java Implementations, 2000. 12. A. Gyenesei and J. Teuhola. Multidimensional fuzzy partitioning of attribute ranges for mining quantitative data: Research articles. Int. J. Intell. Syst., 19(11), pages 1111-1126, 2004.
AN IMPROVED ECC DIGITAL SIGNATURE ALGORITHM AND APPLICATION IN E-COMMERCE XU XIAO-PING Guangdong
Polytechnic
Normal
University
Guangzhou
Electron-information
Department,
510633
Abstract: It is well known that Elliptic Curve Cryptograph (ECC) features the highest bit security. The author analyses the advantages of ECC by comparison, explores the mathematics base on elliptic curve and the complexity of discrete logarithm, and makes improvements on existing Elliptic Curve Digital Signature Algorithm (ECDSA) to accelerate the arithmetic speed and shorten time for data transmission. Better effects have been achieved by applying ECC into E-commerce to realize digital signature algorithm. Keyword: Elliptic Curve Cryptograph, Digital Signature, E-commerce
1 Introduction The popularity of the Internet and the sharing of the information have now reached a new level, which have made information security protruded day by day. Key technology is critical in solving network information security. The study on key has been increasing in an explosive way over the past 20 years. RSA, a popular technology, has matured and has higher requirements on bit length. The increase on key module has slowed down its speed in encryption and decryption. Its realization by hardware also becomes more and more unbearable. Therefore the application of RAS has become a heavy burden, especially its application in E-commerce, a substantive way of safe trading. ECC adopts small module and still reaches the same level of security as RAS. Thus the study on ECC has become hot for domestic and overseas scholars. The author has made some improvements on existing ECDSA. Its arithmetic speed has been accelerated and its data transmission has been shortened. The author also studies the application of its improved algorithm in E-commerce and makes E-commerce faster and safer. 2 ECC Systems and its Comparison Since the birth of public key cryptography, people have presented various public key cryptography methods. The following three types of systems have been thought to be safe and effective: Big Integer Factorization System (The representative one is RSA), ECC and Discrete Logarithm System (The representative one is DSA.) 2.1 RSA RSA was firstly put forward in 1978 by three professors from Massachusetts Institute of Technology: Rivest, shamir and Adleman. This is a 525
526 unilateral trapdoor function, an exponential function based on factorization. An exponential function is a unilateral trapdoor function based on seeking discrete logarithm. It is a matured and widely used key system nowadays. The security of RSA is based on the difficulty of factorizing big integer in number theory. Therefore if the integer is bigger, the factorizing will be more difficult. It will be more difficult to decrypt the key. The level of encryption will be much higher. The advantages of RSA methods are: it is simple in theory and easy to use. But with the advancement and perfection of big integer factorization, the raising of computer operation speed and the advancement of computer network, the integer for safeguarding RSA encryption and decryption is required to be bigger and bigger. To ensure RSA safety, the length of its key is expanding. For example, today it is thought RSA needs 1024-bit length to protect its safety'11. Therefore it brings a heavy burden to use RSA and its application range is limited. 2.2 DSA DSA has been presented by National Institute of Standards and Technology of U.S. in August 1991. It is a digital signature standard based on discrete logarithm which is used for Digital Signature Standard (DSS). This is a DSS of public key. DSA uses a public key to verify data integrity and consistency. DSA only provides digital signature. It does not provide data encryption'21. 2.3 ECC In 1985, Neil Koblitz and Victor Miller respectively put forward Elliptic Curve Cryptograph (ECC). The security of ECC not only relies on the factorization of discrete logarithm at elliptic curve, but also on the selection of the curve and system of curve. At present, 200 bits ECC already has very high security. The mathematic theory of ECC is that'31: at a binary finite field F, an elliptic curve is developed based on certain rule where addition and multiplication are defined. Suppose that two points P and Q are known at elliptic curve: Q=xP Seek x? The seeking of X is the well-known discrete logarithm on elliptic curve. At one side, it maps multiplication and exponent arithmetic on field of real numbers into addition on elliptic curve. It is faster and easier than other public key systems whether ECC is realized by hardware or software. The cost is also very low. At the other side, seeking X at elliptic curve is involved in both integer factorization and discrete logarithm, which is undoubtedly more difficult. It is quite natural that this system has higher security.
527 Table 1: Comparison among Various Public Key Systems in key length Protected key (bit)
80
E C C (bit) n
161
D S A (bit) R S A (bit) n
112 224
128 256
192 384
256 512
q
160
224
256
384
512
P
1024
2048
3072
7680
15360
1024
2048
3072
7680
15360
The security of encryption is reflected by its ability to resist attacking. Compared to other public key systems. ECC has absolute advantages in resist attacking. For example, 160-bit ECC has the same level of security as 1024-bit RSA or DSA. 210-bit ECC has the same level of security as 204-bit RSA or DSA.
Decryption Time (MIPS Years) Figure 1: Comparison among ECC, RSA and DSA in Resisting Attacking
ECC has the following technical advantages: 1 .Smaller computational complexity and it is quick in operation. Though RSA can raise key processing speed by selecting smaller key (can be as small as 3) to speed up processing of encryption and signature verification and make RSA compatible in encryption and signature verification speed, ECC is much more faster in processing key (decryption and signature verification) than RSA or DSA. Therefore the comprehensive speed of ECC is much faster than that of RSA or DSA. 2.Smaller storage space Key size and system parameter of ECC is much smaller than RSA or DSA. This means that its storage space is much smaller. It is very significant in applying encryption on IC card.
528 3. Lower bandwidth requirement. When long message is encrypted or decrypted, these three key systems have the same requirements on bandwidth. But ECC has much lower bandwidth requirements when short message is processed. While key encryption is mainly used for processing short message, for example, for digital signature or conversation key transmission in symmetrical system. The low requirements on bandwidth make ECC have very wide prospect in application at wireless network. These features from ECC will make it take the place of RSA and become a popular public key decryption algorithm. For example, SET framer has decided to make ECC a default public key algorithm for next SET protocol. 3 ECC Digital Signature Algorithms and its Improvement Digital signature is used to guarantee data integrity in transmission and provide identity verification and non-repudiation of sender. Using public key algorithm is the main technology to realize digital signature. The flow chart of digital signature and verification • Plaint Text Hashed Value Plaint lext
^
Hash Algorithm
Signature Algorithm
uigitai signature
Private Key | Figure 2: Digital Signat ure Flow chart Plaint Text Plaint Text
^
Hash Algorithm
Hashed Value ..
Verification Algorithm ^
Digital Signature
Valid or not ^
iL
Public Key
Figure 3: Signature Verification Process
By existing elliptic curve digital signature algorithm, we have made some improvements in ECC to make its operation burden decreased and its speed accelerated. Inversion is the main operation burden in existing ECC encryption or signature process. For example, in verifying signature, the algorithm of w= s"1 (mod n), which uses extended Euclid algorithm and needs to complete 0.8431og2(n)+1.47 divisions in average, is very slow'41. We developed a totally
529 new signature equation, which does not need inversion in its algorithmic process. The signature and verifying equations are as following: Signature Equation: k=s+mrx Verifying Equation: r=sg+mry Operation: sg+mry=(k-mrx)g+mrxg=kg=r From above process, we can understand that it is same to generation of elliptic curve and distribution of private keys in existing IEEE PI363. The difference lies in its signature generation and verification process. The improved algorithm reduces the operation burden and speeds up the operation, which has been verified by experimental results. Under the premium that the same security will be provided, this solution has lower requirements on system resources and is much more suitable for using at environment that is limited by algorithm or storage capability, such as intelligent card etc'51. 4 Application of Improved ECC in E-commerce E-commerce is the inevitable result from social and technology development. The network transaction needs a lot of information, such as information on product manufacturing, supply, product demand, competition etc. Compared to traditional transaction, E-commerce has higher requirements on information security'61. The core and key of E-commerce security is the security of e-transaction. At present, the popular safe on-line payment protocols used in e-transaction are SSL and SET. SET (Safe Electronic Transaction) is one of the protocols to guarantee safe transaction on Internet. Because of its strict design and high security, it has been widely used. In SET protocol, various encryption algorithms, including symmetrical and unsymmetrical encryption, have been used to realize safety. In text, we will apply improved ecliptic curve digital signature algorithm into SET. 4.1 SHA Application SI.Add the text for abstracting; S2.Reckon abstract S3.Send your information and abstract to other (business or bank) S4.The other side (business or bank) initializes in the same way. Added text, reckon abstract and compare if abstracts are same. 4.2 Signing Process SI.Cardholder uses SHA to generate abstract H(OI), H(PI) from (OlorPI) . S2.Card holder decides elliptic curve parameters F= (P, a, b, g, n, h) or (m, f(x), a, b, g> n, h) . S3.Cardholder sends decided hash function (SHA) and parameters of elliptic parameters to business and bank.
530 S4.Cardholder chooses private key X based on decided limited field G (P) and elliptic curve. Public key Y will be got from the public point g: y=xg. The signer makes y in public. S5.Cardholder chooses random parameter K, l ^ K ^ n - 1 . S6.Reckon r=kg. If r=0, go to step S5. S7.Reckon s=mrx-k. ( m is text abstract). S8.Use (s, r) to sign m. Send (s, r) with m to verifier. 4.3 Verifying Process S9.The business reckon r'=sg+mry. SlO.If r'=r, then signature is valid. Otherwise the signature will be turned down. 4.4 Code The selection of elliptic curve adopts IEEE P2363 standard. The selection of parameters171 is as following: // Various fields: p(t) = tA163 + tA7 + tA6 + tA3 + 1 inline void use_NIST_B_163 () {F2X pt=Pentanomial (163, 7, 6, 3, 0); setModulus (pt);} // Degree 163 Binary Field from fips 186-2 // (fake random curve E) : yA2 + xy = xA3 + xA2 + b, // b = 2 0a601907 b8c953ca 1481ebl0 512f7874 4a3205fd // (Order of base point) : // r = 5846006549323611672814742442876390689256843201587 // (Base point G) : // Gx = 3 f0ebal62 86a2d57e a0991168 d4994637 e8343e36 // Gy = 0 d51fbc6c 71a0094f a2cdd545 bllc5c0c 797324A // remaining factor h = 2 #define NIST_B_163 EC_Domain_Parameters (163, 3, 7, 6, 3, Curve ("1", "20a601907b8c953cal481ebl0512f78744a3205fd"), decto_Bigint ("5846006549323611672814742442876390689256843201587"), Point ("3f0ebal6286a2d57ea0991168d4994637e8343e36", "0d51 fbc6c71 a0094fa2cdd545b 11 c5c0c797324f 1"), decto_BigInt ("2")); void ecdsa_ex () { // Degree 163 Binary Field from fips 186-2 use_NIST_B_163 (); EC_Domain_Parameters dp = NIST_B_163; ECPrivKey sk (dp);// (generate key pair from elliptic curve parameters) std::string M ("correct message"); ECDSAsigl (sk,OS2IP(SHAl(M)));// (generate signature) // (DER encode) DERder_str(sigl);
531 HexEncoder hex_str (der_str); std::cout« "DER Encoding: " « hex_str « std::endl; ECDSA sig2; try {// (analyze and check the error from DER) sig2 = der_str.toECDSA (); // (decode) } catch (borzoiException e) {// (Print error message and exit) e.debug_print (); return;} ECPubKey pk (sk); std::cout« "Checking signature against M: " « M.c_str ( ) « "\n->"; std::cout« "SHA1(M):" « OS2IP(SHAl(M)) « std::endl; if (sig2.verify(pk, OS2IP(SHAl(M))» // (verify and sign) std::cout«"valid signature\n"; else std::cout« "invalid signature\n"; M = "in" + M; // (falsify data) s t d . x o u t « "Checking signature against M: " « M.c_str ( ) « "\n->"; if (sig2.verify(pk, OS2IP(SHAl(M)))) // (verify signature) std::cout« "valid signature\n"; else std::cout« "invalid signature\n";} The improved algorithm has higher performance now and it is good for implementing key, but not good to crack key. The improved algorithm has raised key practicability as well as guaranteed key security, simplified operation complexity and accelerated operation speed. These have been verified by experiment.
i^iiirK , 'P* , ' Hi 7^ J *^Kag.5iri!,~* zva,," -- •*•'-!™ .•• •
- Pi xf
deyfil: 3a3id53081a20btt72a8648ce3d82B130819602Biai3Blaa28288a30f>092a8648ce3dBia2U 3033O0?e2@18302010C02ei8?:i82e04i50000OB8000@0800000008008fl80BB008B0B008000104i!;B 2B«6B1907b8c953cal48iebl0512f 78V44a328SFdB42MM«3F0eljal6286a2dS7eaB9¥l.:1.68d4V9463 7i;8343e360Bd51fbc6r.7:ta0094f«2c:dd545hllc5c0i:797324ri82:lS04B8080Baa08800808800292f e77e70ci2a4234c33020102032eB8042bB48:l9bc9ba«e09S34M045224S'»92dF5dM>834b86SFfBiB e26418028aa7S9ae3376ai92c69a8014e34FS8c Aei-d?.-- 3881dS3081«.206872*8648<;B3da201388i96028:1013Bl»82028B»306092a8648ce3dai82M 3033009028103828184020J.07302ea41S0BB0800000fl0B8008fl0000000BBB8888000008BB0i041S0 28a6019a?b8c953cal481eM0S'12F?8?44a328£Fd042b0403F0ebal628Ga2dS7ea899U68d499463 7c8343e360Brt5lFhc6r.7la0B94fa2cdd51KhllcHc0i;797324Fl.02i504BBBB8fl0000B00000a00292F e7?e78cl2a4234c330201B2U32sB0842hB40:l.9bc9bd«e09!i34blM45224S492dF5db6834bS65FF01W e2«41.8B28aa7S9ae3376Bl92c69aSB14e34fSI»r: OK Pi'ess anij key to continue^
Figure 4: the experiment and its results.
532 5 Conclusions With continuous popularity of Internet, E-commerce has become social development trend. As a network payment protocol, SET is in continuous perfection. After ECC becomes a hot issue, there are a lot of studies on its application. The author has made some improvements on existing elliptic curve digital signature algorithm and has verified the theory and practical applications on improved algorithm. The improved algorithm has simplified operation complexity, accelerated operation and guaranteed key security. It is suitable for application in SET. Reference [1] Li Ke-hong, Wang Da-ling and Dong Xiao-mei. Applied Password Study and Computer Data Security. Northeast University Publishing House. Shengyang. 1997. [2] Written by Bruce Schneier; Translated by Wu Shi-zhong, Zhu Shi-xiong and Zhang Wen-zheng. Applied Password Study —Protocol, Algorithm and C Source Program. Machine Press. Beijing. 2002. [3] National Institute for Standards and Technology, "Digital signature standard', FIPS Publication 186, 1993 [4] Qing Si-han. Password Study and Computer Network Safety. Qinghua University Publishing House. Beijing. 2001. [5] Lu Sheng, Miao Quan-xing, Bian Zheng-zhong, Luo Rong-tong. Design and Realization of Elliptic Curve Intelligent Card Algorithm. 2003(9):25-2 [6] Han Bao-ming, Du Peng and Liu Hua. Safety and Payment of E-commerce. People's Post and Telecommuniations Publishing House. Beijing. 2001. [7] G Agnew, R. Mullin and S. Vanstone, "An implementation of elliptic curve cryptosystems over F2155", IEEE Journal on Selected Areas in Communications, 11 (1993), 804-813. Note: This topic is a natural science study project for universities from Education Department of Guangdong Province. The project number is 203061. Author: Xu Xiao-ping, Associate Professor of Computer Applications. Mainly involved in the study of Internet Environment and Applications. Address: Room C903, New Teachers Village, South China Normal University, Guangzhou P C : 510631 Email: cathy.xu@ 163.com
AN IMMUNE SYMMETRICAL NETWORK-BASED SERVICE MODEL IN PEER-TO-PEER NETWORK ENVHIONMENT XIANGFENG ZHANG1, LIHONG REN1, AND YONGSHENG DING1'2, * I) College of Information Sciences and Technology 2) Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education Donghua University, Shanghai 201620, P. R. China * Email: [email protected] Requirements for future Internet are implemented possibly in peer-to-peer network environments where all network nodes are equal to each other. Inspired by the similar features between immune systems and future Internet, based on the immune symmetrical network theory, a network service model in a peer-to-peer network environment is designed on our bio-network platform. To validate the feasibility of the model, we do experiments with different network service distributions and user requests towards each node. The results of the hops per request with time show that the users can acquire network services flexibly and efficiently.
1. Introduction The development of Internet technologies enables more and more man-made devices to access Internet and act as its components, which shows us a bright prospect of Internet. Obviously, future Internet must be capable of extensibility, survivability, mobility, and adaptability to network environments. It is necessary to optimize Internet architecture and its applications to address the challenges of the above key features. On the other hand, biological immune systems are adaptive systems and learning behaviors take place through evolutionary mechanisms similar to biological evolutions. They are distributed systems without central control. They can survive local failures and external attacks and maintain balance because of emergent behaviors of the interactions of many local elements, like immune cells. Inspired by the similar features between the immune systems and future Internet, a bio-network platform architecture has been designed in our former work. Bio-entities in the bio-network architecture are regarded as autonomous immune cells and they possess the characteristics of immune cells, such as interaction, no central control, diversity, mobility, and evolution. 533
534 Requirements for future Internet are implemented possibly in peer-to-peer (P2P) network environments where all network nodes are equal to each other and provide network application and services for other nodes in a distributed mode. Hoffmann [1] has proposed a symmetrical network theory for the immune system, which is a tractable first approximation. Principal lymphocytes that are involved in the response to a particular antigen are classified into just two specificity classes or sets. The interaction between these two sets maintains the balance of the immune system. So we develop a network service model in a P2P network environment based on the immune symmetrical network theory. This paper aims at the design a network service model in a P2P network environment. To this end, it is organized as follows: Section 2 will introduce some basic theories about immune symmetrical system and P2P network. Section 3 will present briefly our bio-network platform architecture. Section 4 will emphasize on the creation of the network service model. In order to validate its feasibility, some experiments are made on the bio-network platform with different network service distributions and user requests towards each node. The results show that the users can acquire network services flexibly and efficiently. Section 5 concludes the paper by discussing advantages of the service model. 2. Background 2.1. Immune Symmetrical Network The natural immune system is a very complex system with several functional components. It plays an important role in coping with dynamically changing environment through constructing self-non-self recognition networks among different species of antibodies. According to the immunologists, the components such as cells, molecules and organs in the immune system can prevent a body from being damaged by pathogens, known as antigens. The basic components of the immune system are lymphocytes that have two major types, B lymphocytes (B cells) and T lymphocytes (T cells). These two types play different roles in the immune response, but they act together and control or affect each other. The immune system is an adaptive system and learning behaviors take place through evolutionary mechanisms similar to biological evolution. It is a distributed system without central control. It can survive local failures and external attacks and maintains balance because of emergent behaviors of many local elements, like immune cells. Hoffmann [1] has proposed a symmetrical network theory for the immune system. The principal lymphocytes are classified into just two specificity classes. The first class is the antigen-binding set, denoted T+ andB+ for T
535 cells and B cells respectively. The second set is minus or anti-idiotypic set, T and B -. There are three types of interaction between the plus and minus sets as follows: stimulation, inhibition, and killing. Stimulation can occur when two lymphocytes encounter each other. The receptors of one lymphocyte ('+') can cross-link the receptors of a second lymphocyte ('-'), the converse is also true, so stimulation is assumed to be symmetrical in both directions between these two sets. Specific T cells factors could inhibit receptors. Finally, antibody molecules are assumed to kill in a symmetrical mode. According to interactions among B cells and T cells, we can receive a set of four stable steady states for the system of T+, B +, T - and B - cells. The steady states are initial, suppressive, responsive, and anti-responsive. 2.2. Peer-to-Peer Network Although the traditional client-server model first establishes Internet's backbone, more and more clients enter Internet and loads on the servers are steadily rising, resulting in long access times and server breakdowns. The user requests and communications among users are completed through application server. While P2P systems offer an alternative to such traditional systems for a number of application domains. In P2P systems, every node (peer) of the system acts as both client and server and provides part of the overall resources/information available from the system. In a pure P2P system, no central coordination or central database exists and no peer has a global view of the system. Participating peers are autonomous and self-organize the system's structure, i.e., global behavior emerges from local interactions. P2P technologies have many applications, such as file sharing and exchanging, distributed computing, collaborative system, peer-to-peer computing, and enterprise application. Like most other P2P applications, Gnutella builds, at the application level, a virtual network with its own routing mechanisms. Reference [2] extracts the topology of Gnutella's application level network. A content location solution is also proposed in which peers loosely organize themselves into an interest-based structure on top of the existing Gnutella network and this method makes Gnutella a more competitive solution [3]. Yang [4] further evaluates a non-forwarding peer-to-peer search. 3. A Bio-Network Platform Architecture In our previous work [5,6], we regarded services and applications on Internet as a number of interacting entities and applied some key principles and mechanisms of the immune systems to build network computation models. Based on these models, we have built a new bio-network platform architecture
536 (see Figure 1.) and its simulation platform which has the capability of service emergence, evolution etc. The simulation platform can be used to simulate some complex services and applications for Internet or distributed network.
Bio-network Low-level Functional Modules
Figure 1. The bio-network platform architecture.
In the bio-network platform architecture, a basic and important entity is called bio-entity. A bio-entity is an autonomous mobile agent and analogous to a cell or an antibody in the immune system. The bio-entity consists of its attribute, behaviors, services information, and communication. Interactions among a group of bio-entities can form an emergent service or application, called a society-entity. Apart from the bio-entities, the layered infrastructure consists of bio-entities' survivable environment, bio-network core services, and bio-network low-level functional modules established in a network node. Obviously, the bio-network platform hosts in a network node. The bio-network architecture has the advantages of extensibility, survivability, mobility, and adaptability to the changes of different users and network environments over the current Internet. Bio-entity survivable environment. The Bio-entity survivable environment is a runtime environment for deploying and executing bio-entities and protects the node from attacking with some security policies. It exists on different platforms so that the bio-entities could migrate in the heterogeneous network and access resources of different systems. Bio-network core services. The bio-network core service layer provides a set of general-purpose runtime services that are frequently used by bio-entities. They include event processing service and some basic services such as lifecycle
537 service, directory service, naming service, community sensing service, bio-entities migration service, evolution state management service, interaction control service, credit-driven control service, security authentication service, and application service. All these services alleviate bio-entities from low-level operations and also allow bio-entities to be lightweight by separating them from routine work. Bio-network low-level functional modules. In the bio-network low-level functional modules, there are six main modules: local resource management, bio-entity registration, bio-entity state control, local security, message transport, and class loader. The ideal model would place a bio-network platform on every device as a network node. The modules are just a bridge to maintain access to local resources. 4. A Service Model in Peer-to-Peer Network Environment 4.1. The Creation ofP2P Network Environment We implement the P2P network model on the designed bio-network simulation platform according to the symmetrical network theory of immune systems. The network can provide services and applications to users. Because of complexity and heterogeneity of future Internet, the network structure is very changeable, from just few nodes to incalculable nodes. Users can communicate directly, share resources and collaborate with each other. Users or services, represented by bio-entities in the network nodes, can be regarded as antibodies or anti-antibodies in the immune system. The bio-entities in different nodes interact equally. They have three actions between two neighbor nodes and keep four steady states. There are several bio-entities on the nodes to provide network services. The inter-connecting nodes are called symmetrical nodes. As an example, Node 1(N1) and Node 2 (N2) are regarded as two sets and bio-entities in the bio-network are regarded as antibodies in the symmetrical immune network, thus interactions of bio-entities in these two nodes exit stimulation, inhibition, and killing. The two nodes have four stable states: initial state, suppressive state, responsive state, and anti-responsive state, as shown in Figure 2.These states are shown in details as follows. (1) Initial state. When user requests to nodes are few, the bio-entities in the node can provide enough services to the users so that the users need not send requests to other nodes. At the same time, bio-entities have not enough credits to provide services to the other nodes.
538 (2) Suppressive state. With the increment of user requests, bio-entities in the network nodes require more and more credits to reward their services and they will evolve to produce next generations. At the same time, bio-entities in the node cannot provide enough services to the request users towards the node. For instance, the bio-entities on Nl have not enough credits for some users; while the bio-entities on N2 suppress the users of Nl to access their services because they have to provide services to their own users. The two nodes suppress each other to use their own resources and services, so the bio-entities in the nodes keep suppressive state. Nl
oo o o o
(1) Initial state (2) Suppressive state (3) Responsive state
(4) Anti-responsive state
Fig. 2.
Interaction of two end nodes in P2P network.
(3) Responsive state. When the requests towards Nl are increasing while the requests towards N2 are decreasing, a lot of unused resources or bio-entities exist on N2, bio-entities on Nl stimulus bio-entities on N2 and the latter provides services directly or migrates to Nl to provide services. When the services are provided, the bio-entities return their nodes and establish relationship with Nl. The state of the two nodes is in responsive state. With the decrement of bio-entities on N2, the services to N2 reduce and the credit level of Nl is higher than that of N2 so that Nl and N2 cannot keep the relationship, they will look for new nodes, and return the initial state. But the relationship between these nodes still exits and the bio-entities on the nodes interact with each other to provide services. If a bio-entity, denoting user requests, needs network services on other nodes, it sends a request to all of its known neighbors. The neighbors then check to see whether they can reply to the request by matching it to the service type. If they find a match, then they will reply; otherwise, they will forward the query to their own neighbors and increase the message's hop count. For instance, there are three bio-entities A, B, and C hosting on Nl, N2, and N3 respectively. Bio-entity A sends requests to other bio-entities of its neighbors. If B can provide service, it replies to A. At the same time, B acquires credits from A and establishes a relationship with B. If B cannot provide services to A, it sends the
539 request to C, and C will migrate to Nl to provide service for A. Then C returns its node and establishes a relationship with A. At the same time, bio-entities A and C announce their relationships to odier bio-entities on Nl. Bio-entities on Nl may access services on N3 directly and save time because of the decrease of the hops next time. (4) Anti-responsive state. The state is opposite to responsive state. The bio-entities on Nl provide services to requests from N2 and receive the credits. 4.2. Simulation Experiments In the simulation platform, we implement the P2P network model with two thousand nodes and simulate the service access. Before starting the experiments, we model the following parameters: (1) service information distribution; (2) request distribution encapsulates information on users' behavior. First, we set simulation parameters and set up simulation environments. We assume two request distributions: an unbiased distribution, with all requests having the same probability of being submitted, and a biased distribution, with a small number of requests are frequently asked. Suppose ten kinds of different network services distribute randomly on different nodes. In these experiments, we assume static resource distributions and no failures. The resource distributions are balanced and unbalanced. 100
H
m
|
40
O - -"— -13' -
Unbalanced & biased Unbalanced & unbiased Balanced & biased Balanced & unbiased
I 20
10
20 30 40 Simulator Time (generations)
50
Figure 3. Average number of hops per request with the change of simulation time.
Average number of hops per request with time is shown in the Figure 3. We can see that biased user request can access more easily services than biased uses request under unbalanced or balanced resource distribution. As to a type of user request, bio-entities can migrate much more efficiently resource nodes and achieve much more easily resource services under the unbalanced resource distribution.
540 5. Conclusions In this paper, a network service model in P2P network environment is designed which based on the immune symmetrical network theory. Bio-entities on two nodes can interact through stimulation, inhibition, and killing to provide some services. From the simulation experiments, the nodes are designed in four stable states as those in the immune symmetrical network. Different request distribution affects the average number of hops. The interaction among bio-entities maintains the balance of bio-network and makes the resources utilized reasonably and optimizes architectures of bio-network. The characteristics of biological immune systems can satisfy service evolution, adaptability, and security of future Internet, so it is necessary to study other bio-network computational models to improve the exiting network architecture and to make future network services become much more intelligent and individual. Acknowledgments This work was supported in part by the Key Project of the National Nature Science Foundation of China (No. 60534020), the National Nature Science Foundation of China (No. 60474037 and 60004006), and Program for New Century Excellent Talents in University (No. NCET-04-415). References 1. G. W. Hoffmann, A neural network model based on the analogy with the immune system, J. Theoretical Biology, 122, 33-67(1986). 2. M. Ripeanu, I. Foster, and A. Iamnitchi, Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design, IEEE Internet Computing Journal. 6(1),(2002). 3. K. Sripanidkulchai, B. Maggs, and H. Zhang, Efficient content location using Interest-based locality in peer-to-peer systems, Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies, IEEE INFCOM2003, 30 March- 3 April, San Francisco, California, USA(2003). 4. B. Yang, P. Vinograd, H. G. Molina, Evaluating GUESS and non-Forwarding peer-to-peer search, The 24th IEEE Internet Conference on Distributed Computing Systems, Hachioji, Tokyo, Japan, 24-26, March 2004 (2004). 5. Y. S. Ding and L. H.Ren, Design of a bio-network architecture based on immune emergent computation, Control and Decision (in Chinese). 18(2), 185-189(2003). 6. L. Gao, Y. S. Ding, and L. H. Ren , A Novel Ecological Network-Based Computation Platform as Grid Middleware System, Int. J. Intelligent Systems. 19(10), 859-884 (2004).
M A C H I N E L E A R N I N G A N D S O F T - C O M P U T I N G IN BIOINFORMATICS - A SHORT J O U R N E Y F.-M. SCHLEIF a AND T. VILLMANN University of Leipzig, Department of Mathematics and Computer Science, Institute of Computer Science, Leipzig, Germany T. ELSSNER AND J. DECKER AND M. KOSTRZEWA Bruker Daltonik GmbH, Fahrenheitstrafie 4, D-28359 Bremen,
Germany
Bioinformatics is a promising and innovative research field which gains hope to develop new approaches for e.g. medical diagnostics. Appropriate methods for pre processing as well as high level analysis are needed. In this overview we highlight some aspects in the light of machine learning and neural networks. Thereby, we indicate crucial problems like the curse of dimensionality and give possibilities for overcoming. In an examplary application we shortly demonstrate a possible scenario in the field of mass spectrometry analysis in cancer detection. Despite of a high number of techniques dedicated to bioinformatic research as well as many successful applications, we are in the beginning of a process to massively integrate the aspects and experiences in the different core subjects such as biology, medicine, computer science, chemistry, physics, and mathematics.
1. Introduction Bioinformatics is a challenging field in research on the border between computer science and biology. It brings together scientists from mathematics, physics, chemistry, medicine and biology. The applications of bioinformatic research ranging from food industry, medical analytic, drug development in pharmacy to agriculture. From computer science point of view a broad variety of methods contribute to these applications, including artificial neural networks, statistical and Bayes methods, tree-based systems, image processing, statistical pattern recognition, visualization models to name just a few 1 " 8 . In the following we will shortly outline basic directions of methods in softcomputing for bioinformatics and link the contributions from this special session therein. a
Corresponding author: Frank-Michael Schleif: Bruker Daltonik GmbH, Permoserstr. 15, D-04318 Leipzig, Germany, Tel/(Fax): +49 341 24 31-408(404), [email protected]
541
542 2. Data analysis - clustering and classification Data analysis in the field of bioinformatics takes place after maybe advanced data processing 9 . Main issues of data mining and data analysis are knowledge extraction, modelling of biological processes, generating of classification and decision systems featuring the biological knowledge, which can be used for classification, regression and prediction 10 . Thereby, the data frequently are noisy, high-dimensional/complex and, in particular in medicine, sparse. These aspects cause several difficulties. Especially, many of the traditional statistical methods can not be applied in medical or bioinformatic applications due to these restrictions u . Methods of soft-computing offer alternative ways to handle these difficulties. Clustering as an unsupervised paradigm plays a central role. It can have different purposes depending on the context: data compression, identification of typical patterns, efficient data description. Standard method is the commonly used agglomerative hierarchical clustering, which can be realized in several ways. A more robust class of methods are prototype based vector quantizer as k-means or its fuzzy counterpart fuzzy-k-means 12 , 13 . A robust type of vector quantizer is the family of neural maps 14 . Thereby, the self-organizing map (SOM, 15 ) and the neural gas algorithm (NG, 16 ) are prominent examples, which are successfully used in bioinformatic clustering tasks l r . These unsupervised algorithms allow information optimum compression and clustering of data together with robust noise tolerant behavior 18
In close relation to clustering and compression is complexity reduction of data. Standard methods are principal component analysis (PCA), Fourierand Wavelet analysis for feature extraction 5 or feature extraction by optimization of mutual information 19 or other objective functions. The growing variant of the above mentioned SOM can be taken as non-linear PCA 20 . Another paradigm is classification 21 . In machine learning context, it belongs to supervised learning methods. After training the classification model by pre-classified examples, the model should decide for unknown data the (probably) respective class. Classical statistics uses Bayes inference, which usually require knowledge or assumptions about the mathematical type of distribution of the data 22 . Alternatively, decision trees are a commonly applied method in biological or clinical classification systems 23 . Again, several objectives are possible. Usually some kind of information measures is used (Gini-index or Renyi-index) 24 . In bioinformatics, decision trees are applied for phylogenetic trees, for example 25 . This can be combined with neural networks approaches 26 . A popular prototype based classification method is the family of kernel methods, and, in particular, the support vector machines (SVMs) 27 . Based on the usage of separation properties of huge-dimensional spaces, classification is obtained by mapping of data into such high-dimensional
543 spaces and subsequent separation margin maximization. SVMs are successfully applied to complex data in bioinformatics as gene expression data in micro-array analysis 2 8 , 3 . A combination of SVMs together with decision trees for classification of plant spectra is provided in 29 . Learning vector quantization as another prototype based classification method allows an intuitive interpretation of the class dependent prototypebased classification decision as well as of the adaptation process (learning), which tries to approximate the Bayes borders of classification 14 . Several methods have been established to improve the standard algorithm LVQ2.1 concerning aspects like overlapping classes (Generalized LVQ - GLVQ) or insensitivity to initial conditions by neighborhood cooperativeness using the framework of NG (Supervised NG) 3 0 , 3 1 . Recent approaches also include probabilistic (fuzzy) classification extensions 32 , which have been successfully applied in proteomics and image segmentation of barley grains 3 3 , 3 4 . A challenging issue is the use of non-standard metric for classification 3 5 . Instead of the widely used Euclidean metric, other, more task specific metric should be applied. Examples are the Tanimoto-distance 2 1 widely applied in taxonomy, LIK-kernels or correlation measures in splice site recognition and microarray-analysis 3 6 , 3 7 . Further, parametrized distance measure can be applied. They allow an optimal adaptation to the given classification or clustering tasks. Adaptive metric were applied to cluster gene expression data 38 and cancer spectral data 39 . 3. D a t a visualization Data visualization of high-dimensional data may offer new insights in data structures and, therefore, is of increasing interest for biological experts, too. Standard tools for distance preserving mapping are curvilinear mapping or multidimensional scaling (MDS) 4 0 . Other methods like (linear) principal component analysis (PCA) obtain low-dimensional models by dimensionality reduction, which easily can be visualized 21 . A non-linear PCA can be realized by the growing self-organizing map (GSOM) 20 . It was demonstrated that this method is suitable to visualize the internal data structure of large-dimensional data by faithful data representation covering both statistical and structural (topological) properties, due to the fact, that most high-dimensional data have a low-dimensional internal representation caused by high inner correlations 41 . Applications in bioinformatics comprise visualization of micro-array analysis, gene expression data 42 , and proteomics 33 . Another type of dimensionality reduction scheme is based on the search for independent components based on blind source separation 43 . Thereby, usually a linear mixing of the unknown sources is assumed. The method determines the inverse mixing matrix such that the original low-dimensional sources can be reconstructed and thereafter may be visualized.
Another complicated task is the visualization of decision processes or of structured non-metric data. The former can be taken as the visualization of decision trees which, hence, leads to the more general problem of visualizing trees. This also includes the above-mentioned phylogenetic trees and, therefore, the visualization of phylogenetic structures and dependencies 44,45.
4. Exemplary clinical proteomics application in the light of bioinformatics
We present in the following an exemplary application: the classification of mass-spectrometry (MS) data for cancer prediction. It comprises several of the above addressed problems, which are typical in bioinformatics: 1.) The data are high-dimensional. 2.) Only a small sample set is available. 3.) The data need to be preprocessed carefully. 4.) A visualization is highly recommended to detect internal relations and to extract them for biomarker search. 5.) The ability to generalize is demanded for prediction. The exemplary data set is the LEUK data set generated by 46. It was obtained by matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry analysis of blood plasma of patients suffering from acute lymphatic leukemia. Additionally, a group of control volunteers was under consideration. A mass range between 1 and 10 kDa was used. The spectra were first processed using the standardized workflow as given in 47. The particular sample preparation is described in 48. After preprocessing, the spectra are obtained as 145-dimensional vectors, i.e. the feature vectors to be classified are high-dimensional. The data set consists of 30 cancer samples and 30 control samples, reflecting the problem of small sample sets, see Fig. 1. The data were mapped onto their first two non-linear principal components spanned by a two-dimensional SOM 14. A re-mapping of the SOM lattice into the data subspace spanned by the first two linear principal components is depicted in Fig. 2. A more appropriate probabilistic model of the SOM is obtained by incorporation of fuzzy classification learning (FL-SOM, 33), such that the responsibilities of the lattice nodes give probability values for cancer prediction, Fig. 3.
Acknowledgement
The processing of the proteomic data was supported by the Bruker Daltonik GmbH using the CLINPROT(TM) system and Sächsische Aufbaubank grant 7495/1187.
References
1. U. Seiffert, B. Hammer, S. Kaski, and Th. Villmann. Neural networks and machine learning in bioinformatics - theory and applications. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2006), pages 521-532, Brussels, Belgium, 2006. d-side publications.
Figure 1. Mass spectrum samples from LEUK. The above spectrum is a sample from the cancer class, the spectrum below is taken from the healthy control group.
2. M. Amos. Theoretical and Experimental DNA Computation. Natural Computing. Springer, Berlin, 2005.
3. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Mach. Learn., 46:389-422, 2002.
4. L. Li, H. Tang, Z. Wu, J. Gong, M. Gruidl, J. Zou, M. Tockman, and R.A. Clark. Data mining techniques for cancer detection using serum proteomic profiling. Artificial Intelligence in Medicine, 32:71-83, 2004.
5. Pietro Lio. Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics, 19(1):2-9, 2003.
6. M. Kussmann, M. Affolter, and L.B. Fay. Proteomics in nutrition and health. Comb. Chem. High Throughput Screen., 8(8):679-696, December 2005.
7. Jeffrey S. Morris, Philip J. Brown, Richard C. Herrick, Keith A. Baggerly, and Kevin R. Coombes. Bayesian analysis of mass spectrometry proteomics data using wavelet based functional mixed models. UT MD Anderson Cancer Center Department of Biostatistics and Applied Mathematics Working Paper Series. Bepress, 2006. http://www.bepress.com/mdandersonbiostat/paper22.
8. N. Benoudjit, D. Francois, M. Meurens, and M. Verleysen. Spectrophotometric variable selection by mutual information. Chemometrics and Intelligent Laboratory Systems, 74:243-251, 2004.
9. M. Strickert, T. Czauderna, S. Peterek, A. Matros, H.-P. Mock, and U. Seiffert. Full-Length HPLC signal clustering and biomarker identification in tomato plants. In Proc. of FLINS 2006, 2006.
Figure 2. Re-mapping of the 3 x 2 SOM lattice into the data subspace spanned by the first two linear principal components of the data set. Different colors and shapes refer to different responsibilities according to volunteers and patients.
Figure 3. Probabilistic fuzzy labels for the 3 x 2 (FL-)SOM. The prototypes represent the responsibilities for the healthy and the patient class, respectively. The first bar visualizes the probability for cancer whereas the second one refers to healthy group membership.
10. W. Timm, S. Boecker, T. Twellmann, and T. W. Nattkemper. Peak intensity prediction for PMF mass spectra using support vector regression. In Proc. of FLINS 2006, 2006.
11. T. Villmann, G. Blaser, A. Korner, and C. Albani. Relevanzlernen und statistische Diskriminanzverfahren zur ICD-10 Klassifizierung von SCL90-Patienten-Profilen bei Therapiebeginn. In G. Plottner, editor, Aktuelle Entwicklungen in der Psychotherapieforschung, pages 99-118. Leipziger Universitatsverlag, Leipzig, Germany, 2004.
12. Y. Linde, A. Buzo, and R.M. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, 28:84-95, 1980.
13. J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York, 1981.
14. T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, Heidelberg, 1995. (2nd Ext. Ed. 1997).
15. Helge Ritter, Thomas Martinetz, and Klaus Schulten. Neural Computation and Self-Organizing Maps: An Introduction. Addison-Wesley, Reading, MA, 1992.
16. Thomas M. Martinetz, Stanislav G. Berkovich, and Klaus J. Schulten. 'Neural-gas' network for vector quantization and its application to time-series prediction. IEEE Trans. on Neural Networks, 4(4):558-569, 1993.
17. Udo Seiffert, Lakhmi C. Jain, and Patrick Schweizer. Bioinformatics using Computational Intelligence Paradigms. Springer-Verlag, 2004.
18. T. Villmann and J.-C. Claussen. Magnification control in self-organizing maps and neural gas. Neural Computation, 18(2):446-469, February 2006.
19. C. Krier, D. Francois, V. Wertz, and M. Verleysen. Feature scoring by mutual information for classification of mass spectra. In Proc. of FLINS 2006, 2006.
20. Th. Villmann and H.-U. Bauer. Applications of the growing self-organizing map. Neurocomputing, 21(1-3):91-100, 1998.
21. R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
22. Marina Vannucci, Naijun Sha, and Philip J. Brown. Nir and mass spectra classification: Bayesian methods for wavelet-based feature selection. Chemometrics and Int. Lab. Systems, 77:139-148, 2005.
23. M.K. Markey, G.D. Tourassi, and C.E. Floyd Jr. Decision tree classification of proteins identified by mass spectrometry of blood serum samples from people with and without lung cancer. Proteomics, 3:1678-1679, 2003.
24. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
25. J. Dopazo and J.M. Carazo. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. Journal of Molecular Evolution, 44(2):226-233, 1997.
26. E.V. Samsonova, T. Back, M.W. Beukers, A.P. Ijzerman, and J. Kok. Combining and comparing cluster methods in a receptor database. In Proceedings of the 5th International Conference on Intelligent Data Analysis (IDA), volume 2810 of Lecture Notes in Computer Science. Springer, 2003.
27. Vladimir N. Vapnik. The nature of statistical learning theory. Springer New York, Inc., New York, NY, USA, 1995.
28. M.P.S. Brown, W.N. Grundy, D. Lin, N. Christianini, C.W. Sugnet, T.S. Furey, M. Ares Jr., and D. Haussler. Knowledge-based analysis of microarray gene expression data using support vector machines. PNAS, 97(1):262-267, 2000.
29. P.M. Granitto, F. Biasioli, C. Furlanello, and F. Gasperi. Rf-rfe on ptr-ms fingerprinting of agroindustrial products. In Proc. of FLINS 2006, 2006.
30. A. Sato and K. Yamada. Generalized learning vector quantization. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. Proceedings of the 1995 Conference, pages 423-9. MIT Press, Cambridge, MA, USA, 1996.
31. B. Hammer, M. Strickert, and Th. Villmann. Supervised neural gas with
general similarity measure. Neural Processing Letters, 21(1):21-44, 2005.
32. T. Villmann, B. Hammer, F.-M. Schleif, and T. Geweniger. Fuzzy labeled neural gas for fuzzy classification. In M. Cottrell, editor, Proc. of Workshop on Self-Organizing Maps (WSOM) 2005, pages 283-290, 2005.
33. F.-M. Schleif, T. Elssner, M. Kostrzewa, T. Villmann, and B. Hammer. Analysis and visualization of proteomic data by fuzzy labeled self-organizing maps. In Proc. of CBMS, in press. IEEE Computer Society Press, Los Alamitos, 2006.
34. C. Brüß, F. Bollenbeck, F.-M. Schleif, W. Weschke, Th. Villmann, and U. Seiffert. Fuzzy image segmentation with fuzzy labeled neural gas. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2006), pages 563-568, Brussels, Belgium, 2006. d-side publications.
35. B. Hammer and Th. Villmann. Classification using non-standard metrics. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2005), pages 303-316, Brussels, Belgium, 2005. d-side publications.
36. Barbara Hammer, Marc Strickert, and Thomas Villmann. Prototype based recognition of splice sites. In U. Seiffert, L.A. Jain, and P. Schweitzer, editors, Bioinformatics using Computational Intelligence Paradigms, pages 25-56. Springer-Verlag, 2005.
37. M. Strickert, U. Seiffert, N. Sreenivasulu, W. Weschke, T. Villmann, and B. Hammer. Generalized relevance LVQ (GRLVQ) with correlation measures for gene expression analysis. Neurocomputing, 69(6-7):651-659, March 2006. ISSN: 0925-2312.
38. Samuel Kaski. SOM-based exploratory analysis of gene expression data. In Nigel Allinson, Hujun Yin, Lesley Allinson, and Jon Slack, editors, Advances in Self-Organizing Maps, pages 124-131. Springer, London, 2001.
39. V. Cheng, C.-H. Li, J.T. Kwok, and C.-K. Li. Dissimilarity learning for nominal data. Pattern Recognition, 37(7):1471-1477, 2004.
40. M. Strickert, N. Sreenivasulu, and U. Seiffert. Sanger-driven MDSLocalize - a comparative study for genomic data. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2006), pages 265-270, Brussels, Belgium, 2006. d-side publications.
41. Th. Villmann, E. Merenyi, and B. Hammer. Neural maps in remote sensing image analysis. Neural Networks, 16(3-4):389-403, 2003.
42. S. Kaski, J. Nikkila, M. Oja, J. Venna, P. Toronen, and E. Castren. Trustworthiness and metrics in visualizing similarity of gene expression. BMC Bioinformatics, 4:48, 2003.
43. A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. J. Wiley & Sons, 2001.
44. F. Schreiber. Visual comparison of metabolic pathways. Journal of Visual Languages and Computing, 14(4):327-340, 2003.
45. A. Drummond and K. Strimmer. PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics Applications Note, 17(7):662-663, 2001.
46. IKP Stuttgart, MHH Hannover, and Bruker Daltonik Leipzig. Internal results on leukaemia, 2004.
47. B.L. Adam et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Research, 62(13):3609-3614, July 2002.
48. E. Schaffeler, U. Zanger, and M. Schwab. Magnetic bead based human plasma profiling discriminate acute lymphatic leukaemia from non-diseased samples. In 52nd ASMS Conf. 2004, page TPV 420, 2004.
FULL-LENGTH HPLC SIGNAL CLUSTERING AND BIOMARKER IDENTIFICATION IN TOMATO PLANTS
M. STRICKERT 1, T. CZAUDERNA 1, S. PETEREK 2, A. MATROS 2, H.-P. MOCK 2, AND U. SEIFFERT 1
1 - Pattern Recognition Group, 2 - Applied Biochemistry Group, Leibniz Institute of Crop Plant Research, Corrensstr. 3, D-06466 Gatersleben, Germany
E-mail: stricker@ipk-gatersleben.de
High resolution HPLC data of a tomato germplasm collection are studied: Analysis of the molecular constituents of tomato peels from 55 experiments is conducted with focus on the visualization of the plant interrelationships, and on biomarker extraction for the identification of new and highly abundant substances at a wavelength of 280nm. 3000-dimensional chromatogram vectors are processed by state-of-the-art and novel methods for baseline correction, data alignment, biomarker retrieval, and data clustering. These processing methods are applied to the tomato data set and the results are presented in a comparative manner, thereby focusing on interesting clusters and retention times of nutritionally valuable tomato lines.
1. Introduction
High-performance liquid chromatography (HPLC) enables the detection of molecular compounds in probe material. Here, substances of tomato peel from 36 accessions from the in-house germplasm collection, providing 55 partly redundant data sets, are studied in order to identify those lines with highly abundant phenolic compounds. The available HPLC multi-channel device records absorption rates in a wavelength range between 200nm and 600nm using a diode array detector with 1.2nm resolution. Specific interest lies in the components displayed at about 280nm for a 50 min sampling at 1Hz. Thus, absorption values for 3000 retention time points are considered, corresponding to the substance composition in the peel of each tomato fruit. The available 55 high-dimensional chromatogram vectors are further analyzed in a processing pipeline of baseline correction, vector alignment, and clustering. Additionally, a new feature selection method is introduced for the identification of distinctive retention times. The extracted time points serve as biomarkers of specific differences within the set of chromatograms. Instead of utilizing the standard software tools accompanying the HPLC device, the analytic steps are realized by alternative state-of-the-art and novel methods here. Thereby, entire chromatograms, not a selection of manually determined peaks, are processed. This maintains all information and it is a particular advantage if the chemical compounds are a priori unknown.
2. Chromatogram data processing pipeline
The goal, identification of possibly interesting tomato lines and relevant retention times, is addressed after raw data export from the delivered chromatogram software, such as Empower or MetaboliteTools. The data processing steps proposed here are summarized in Fig. 1:
0. Raw Data → 1. Baseline Correction (QuantileSpline) → 2. Alignment (DTW, PTW, COW) → 3. Biomarker Extraction (FUSE, optional) → 4. Clustering (PCA, HiT-MDS)
Figure 1. Pipeline for chromatogram data processing.
1. Baseline correction is an essential step to remove low-frequency non-stationarities occurring during the measuring procedure. Here, correction is obtained by a moving window approach. Within fixed time frames the q-quantiles (q < 20%), regarded as noise-induced baseline thresholds, are calculated and interpolated by a spline (QuantileSpline); this background spline is subtracted from the original signal in order to obtain the corrected data. This approach implements a high-pass filter and is, for example, part of the MATLAB Bioinformatics Toolbox.
2. Chromatogram alignment is essential to make the measured signals intercomparable. During the HPLC recording, usually slight delays or accelerations are observed because of specific processes in the probe columns. Three approaches to chromatogram alignment have been studied: (restricted) dynamic time warping (DTW), parametric time warping (PTW), and correlation optimized warping (COW). Dynamic time warping (DTW) 1 yields the optimum alignment in terms of the shortest way, corresponding to the smallest sum of distances, in the component-wise source vs. target distance matrix. Thereby, the source signal becomes stretched by constant replica of adjacent components. This undesired effect of constant induction is attenuated by upper limit constraints, but it cannot be completely avoided. The constant sections are unnatural for subsequent chromatogram peak integrations and, moreover, the stretched signal has to be mapped back to the original time scale. For these reasons, DTW has been excluded from the following considerations. Parametric time warping (PTW) 2 is an alternative alignment method without unnatural constant stretching. It provides very fast source-to-target time scale mappings by fitting the parameters of a quadratic transfer function via an iterative least squares approach. Different from that, correlation optimized warping (COW) 3 computes local timescale adaptations by interval-based correlation maximization. Within a fixed range, all possible shifts are tested around either
discretely sampled or manually selected retention times to find the best match. A crucial alignment choice for both PTW and COW is the alignment target: all chromatograms need a common reference signal by means of which they become intercomparable. Possible choices of such a prototypic reference signal are the average (mean) chromatogram values or the medians at each time point. The alignment quality is very important, because the subsequent steps, biomarker identification and clustering, are based solely upon the preprocessed data.
3. Biomarker extraction is an optional step after alignment for the identification of retention times that are characteristic of the data set. Unsupervised feature selection (FUSE) is proposed which, in an exhaustive manner, iteratively isolates those retention times i that maximally dis-correlate the original chromatogram distance matrix D and the time-reduced distance matrix D_S:
S(k) = S(k-1) ∪ { arg min_{i ∈ (1...T)\S(k-1)} r²(D, D_{S(k-1)∪i}) },   k = 1...T-1
S(k) is the growing set of index pointers to retention times which have been isolated until iteration number k; by definition S(0) := {}, and by construction |S(k)| = k. D_{S(k-1)∪i} is the distance matrix calculated using the chromatogram vectors, thereby skipping the time indices given in the set S(k-1) ∪ i. The total number of retention times is T = 3000 in the present study; in larger applications early stopping can be considered if the remaining squared correlation r² drops below a critical near-zero threshold or when reaching a plateau. FUSE is analogous to a sensitivity analysis of system input response. The reference inter-relationships of the preprocessed source chromatograms are calculated once as distance matrix D and compared with the simplified models reflected by D_S. Thereby, matrix entries in D and D_S are not necessarily Euclidean distances. They may denote any reasonable similarity measure between input chromatograms and their time-reduced counterparts. FUSE is thus more than a mere variance analysis of T independent time slots or a difficult-to-interpret PCA-loading factor analysis. For example, the variance of components might be high just due to noise without contributing fruitfully to the reconstruction of the original data relationships. To summarize, the indices S(k) from FUSE denote in descending manner the relevance of data components for faithful relationship reconstruction. Particularly, these biomarkers represent intuitive components responsible for systematic differences between the source vectors.
4. Visual clustering is the last step of the data processing pipeline to view the inter-relationships of aligned chromatograms and their biomarker representations. This helps to find groups of tomatoes with similar chemical
compounds and it helps to identify outliers with specific substance combinations. Principal component analysis (PCA) is the standard visual clustering method for inspecting data proximities by projecting chromatogram vectors onto the two axes of maximum variance. PCA, however, is a linear approach with the implicit assumption of a Euclidean input space. Moreover, as mentioned before, the axes of maximum variance are not necessarily the axes of maximum interest. The general multi-dimensional scaling (MDS) methodology aims at a distance-preserving reconstruction of the source data in a low-dimensional space. High-throughput multidimensional scaling (HiT-MDS) is a highly optimized realization of MDS, specifically designed for dealing with very high-dimensional data 4. Analogous to the biomarker detection approach, the optimization goal of HiT-MDS is to find positions of points in a low-dimensional, interpretable Euclidean space in such a way that the matrix of their mutual distances is maximally correlated with the original data distance matrix. HiT-MDS iteratively improves the locations of randomly initialized points by update rules derived from an efficient dis-correlation stress minimization criterion.
3. Results
Data screening of the 280nm-chromatograms is obtained by pseudo-gelplots of the raw data. The top panel of Fig. 2 displays in their recording order the 55 unprocessed experiments. Three groups are visually identified, experiments 1-19, 20-41, and 42-55. The first group 1-19, separated by the solid horizontal line, is one entire set of continuous HPLC recordings. Experiments 20-55, also continuously taken, were conducted one month later for testing and validation purposes. Runs 20-41 exhibit systematic recording artefacts between 250s and 500s, but a recalibration phase in the late stage of record 41 leads to a recovery of the measuring process for the subsequent runs, starting with the dashed horizontal line.
1. Baseline correction is obtained with the QuantileSpline method. The time frame size is fixed to 50s, providing a number of 3000/50 = 60 base points for the interpolating spline. A quantile value of q = 15% is chosen. The setting of the quantile value and the window width is determined by visual validation of the background curve and the source data under the constraints of average peak width and avoidance of too many negative baseline-corrected target points. A data example with its corresponding baseline is given in Fig. 3 for the focus of interest between 500s and 2750s: the baseline smoothly follows the original signal without interfering with the time scale of the retention dynamic. Thus, the summation of peak areas of baseline-corrected chromatograms provides more comparable standards.
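As an illustration of the QuantileSpline idea, the following minimal sketch (an assumption-laden re-implementation, not the authors' code) computes the 15% quantile in fixed 50 s windows, interpolates the resulting base points with a cubic spline, and subtracts this background from the chromatogram.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def quantile_spline_baseline(signal, window=50, q=0.15):
    """Estimate a smooth baseline from windowed quantiles and return the
    baseline-corrected signal together with the baseline itself."""
    t = np.arange(len(signal))
    centers, base_points = [], []
    for start in range(0, len(signal), window):
        chunk = signal[start:start + window]
        centers.append(start + len(chunk) / 2.0)      # window midpoint
        base_points.append(np.quantile(chunk, q))     # noise-induced baseline threshold
    baseline = CubicSpline(centers, base_points)(t)
    return signal - baseline, baseline

# usage on a synthetic chromatogram of 3000 points sampled at 1 Hz
chromatogram = np.abs(np.random.randn(3000)) + np.linspace(0, 2, 3000)
corrected, baseline = quantile_spline_baseline(chromatogram, window=50, q=0.15)
```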
Figure 2. Pseudo-gelplots of tomato chromatograms at 280nm. Dark bands denote high substance concentrations. Top panel: raw data; as indicated by horizontal lines, the 55 experiments fall into three temporal categories, records 1-19, 20-41, and 42-55. Two bottom panels: plots for aligned chromatograms from PTW and COW; they are already baseline corrected.
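Before turning to the alignment results, a minimal sketch of the parametric time warping step described in Section 2 is given below. It is an illustrative least-squares fit of a quadratic time transfer function against an assumed mean reference chromatogram, not the implementation used in the study.

```python
import numpy as np
from scipy.optimize import least_squares

def ptw_align(signal, reference):
    """Warp `signal` onto `reference` with a quadratic time transfer
    w(t) = a0 + a1*t + a2*t**2, fitted by least squares."""
    t = np.arange(len(signal), dtype=float)

    def warped(params):
        a0, a1, a2 = params
        w = a0 + a1 * t + a2 * t ** 2
        # resample the source signal at the warped time points
        return np.interp(w, t, signal, left=0.0, right=0.0)

    fit = least_squares(lambda p: warped(p) - reference, x0=[0.0, 1.0, 0.0])
    return warped(fit.x), fit.x

# usage: align every chromatogram to the mean reference signal
# chromatograms: array of shape (n_runs, 3000)
# reference = chromatograms.mean(axis=0)
# aligned = np.array([ptw_align(c, reference)[0] for c in chromatograms])
```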
2. Chromatogram alignment is studied for PTW and COW. When all 55 experiments are processed at once, PTW produces alignments of substantially worse quality than COW. The reason for this is the three temporal groups of chromatograms identified visually: they constitute a data set too heterogeneous for the quadratic warping model. Once the three groups are independently processed, much better alignments are obtained for PTW and COW. Further considerations focus on the first group of runs 1-19, because they are supposed to contain the most trustworthy data. Alignments are shown in the two bottom panels of Fig. 2 for PTW and COW. As can be seen, the bands of the aligned chromatograms are significantly straightened in comparison to the raw data in the upper panel above the
Figure 3. Chromatogram baseline determination as spline interpolation of 15%-quantile of 50s-time-windows.
solid line. For these results, biological knowledge has been used in COW to roughly specify 39 intervals of retention times with potentially interesting substances - for these manually set focuses the correlation is maximized, whereby the search is restricted to empirically determined maximum shifts of 20s. Conveniently, PTW does not require further parameter choices. However, for both PTW and COW the alignment reference must be given. Although the differences are small, the mean chromatogram generally performs better than the median signal. This has been confirmed for all three groups. Comparing the alignment quality of PTW and COW it turns out that, for the favored mean-alignment, PTW shows higher squared deviations from the reference and lower correlation values than COW in all three cases. To conclude, COW takes more calculation time, but it yields the best overall alignment quality. Apart from one obvious COW-misalignment for experiment 17 at 1600s, this result is visually confirmed by close inspection of the two bottom panels in Fig. 2.
Figure 4. Biomarker extraction. Dots are feature ranks 1-3000 scaled to [0;1]. The upper line connects the top 20% most discriminative retention times. Vertical lines refer to the equivalent view on the corresponding correlation loss of r² ∈ [0;1] during feature exclusion.
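A compact sketch of the FUSE selection loop defined in Section 2 is shown below. It is a straightforward, unoptimized re-implementation under the Pearson-distance assumption used in this study (D_ij = 1 - r(x_i, x_j)), with early stopping after a fixed number of isolated retention times; variable names are illustrative.

```python
import numpy as np

def pearson_distance_matrix(X):
    # D_ij = 1 - r(x_i, x_j) between the rows (chromatograms) of X
    return 1.0 - np.corrcoef(X)

def squared_corr(D1, D2):
    # squared Pearson correlation between the upper triangles of two distance matrices
    iu = np.triu_indices_from(D1, k=1)
    return np.corrcoef(D1[iu], D2[iu])[0, 1] ** 2

def fuse(X, n_select=20):
    """Return the n_select most relevant retention-time indices, most relevant first."""
    n_runs, T = X.shape
    D = pearson_distance_matrix(X)
    skipped = []                                   # S(k): indices isolated so far
    for _ in range(n_select):                      # early stopping instead of k = 1..T-1
        remaining = [i for i in range(T) if i not in skipped]
        losses = []
        for i in remaining:
            keep = [j for j in range(T) if j not in skipped and j != i]
            losses.append(squared_corr(D, pearson_distance_matrix(X[:, keep])))
        # isolate the index whose removal destroys the correlation the most
        skipped.append(remaining[int(np.argmin(losses))])
    return skipped

# usage on the aligned chromatogram matrix (rows: runs 1-19, columns: 3000 retention times)
# biomarkers = fuse(aligned_chromatograms, n_select=600)
```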
3. Biomarker extraction is an optional step after alignment. The goal is to reduce the overall data complexity to only relevant retention time intervals. By focusing on the most discriminative features of the data set according to the FUSE protocol, the noise influence of unimportant components is discarded. For the chromatograms x_1,...,x_19 of runs 1-19, FUSE has been calculated for the Pearson similarity with matrix entries D_ij = 1 - r(x_i, x_j), and analogously for the time-dropped chromatogram vectors. This yields Fig. 4, zooming on an interesting time interval. Vertical lines correspond to the correlation loss r²(D, D_{S(k-1)∪i}) during successive time point dropping. The dashed horizontal line separates the 600, i.e. 20%, top ranked, most discriminative retention times from the less important features (dots). As expected, these feature ranks and the correlation loss are highly correlated. The top-rated retention times are of high biochemical interest for the characterization of the tomato plant compounds.
4. Visual clustering is the last step of the data processing pipeline to inspect relationships between the tomato lines. Since peak areas are proportional to compound concentrations, integral differences, i.e. Manhattan L1-distances, of chromatograms provide particularly meaningful comparisons. These distances of peak-aligned data are used for 2D-embeddings by HiT-MDS of the 3000-dimensional chromatograms and of the 600-dimensional biomarkers obtained by FUSE. Figure 5 displays the visualization results. Within each plot, two visual clusters can be identified, as well as the clear outliers 6, 16, 18, and 19. The elliptic cluster corresponds to tomato lines with specifically high compound concentrations at 780s and 960s, the circular cluster comprises tomatoes with more subtle differences, but with a tendency to higher rutin concentrations. In a chromatogram overlay, outlier 16 matches the circular group pretty well, except for an exceptionally high rutin level, which is one of the desired compounds in the study. Runs 6, 18, and 19 have a more complex structure and show up as outliers. The high similarity of both panels in Fig. 5 must be pointed out: although the biomarker embedding uses only 20% of the original chromatogram length, the results are essentially the same. Since FUSE and HiT-MDS are founded on the same principles, i.e. the consideration of correlation between the original data and the dimension-reduced models, such a result is expected.
4. Conclusions and future directions
HPLC signals of probes from tomato peels have been successfully analyzed by means of a data processing pipeline with the stages QuantileSpline baseline correction, chromatogram alignment to the chromatograms' mean signal, optional FUSE biomarker identification, and HiT-MDS-based
Figure 5. Chromatogram and biomarker embedding by HiT-MDS and L1-distance. Left panel: HiT-MDS full chromatogram clustering; right panel: HiT-MDS chromatogram subspace clustering.
chromatogram and biomarker embedding. The obtained results help significantly to focus on experiment outliers and groups of experiments as well as on relevant retention time intervals. The processing pipeline points out an important direction to semi-automatic chromatogram processing, especially interesting for handling massive data sets. Future research will focus on improving the alignment procedure by reducing free parameters in COW and by enabling reiteration, i.e. by realignment of already aligned data to the recomputed mean signal. FUSE biomarker detection shows promising results, but is yet only at its starting point. Subsequent experiments are necessary to fully assess the power of this method. Finally, the visual arrangement after embedding is not only determined by the signal alignment quality, but also essentially by the underlying similarity measure for signal comparison. Therefore, meaningful measures must be reconsidered, which is a crucial topic already for most alignment procedures. Taking these issues into account, the presented pipeline can be canonically extended for dealing with mass spectra, which is subject to future investigations.
References
1. V. Pravdova, B. Walczak, and D.L. Massart, "A comparison of two algorithms for warping of analytical signals", Analytica Chimica Acta 456, Issue 1, pp. 77-92, 2002.
2. Paul H.C. Eilers, "Parametric time warping", Analytical Chemistry 76, Issue 2, pp. 404-411, 2004.
3. N.V. Nielsen, J.M. Carstensen, J. Smedsgaard, "Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping", J. of Chromatography A 805, pp. 17-35, 1998.
4. M. Strickert, S. Teichmann, N. Sreenivasulu, and U. Seiffert, "High-Throughput Multi-Dimensional Scaling (HiT-MDS) for cDNA-Array Expression Data", in W. Duch et al. (Eds.): Artificial Neural Networks: Biological Inspirations, LNCS 3696, pp. 625-634, Springer, 2005.
FEATURE SCORING BY MUTUAL INFORMATION FOR CLASSIFICATION OF MASS SPECTRA
C. KRIER 1, D. FRANCOIS 2, V. WERTZ 2, M. VERLEYSEN 1
1 Universite catholique de Louvain, Machine Learning Group DICE, Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium, {krier, verleysen}@dice.ucl.ac.be
2 CESAME Research Center, Av. G. Lemaitre 4, B-1348 Louvain-la-Neuve, Belgium, {francois, wertz}@inma.ucl.ac.be
Selecting relevant features in mass spectra analysis is important both for classification and for the search for causality. In this paper, it is shown how using mutual information can help to answer both objectives in a model-free, nonlinear way. A combination of ranking and forward selection makes it possible to select several feature groups that may lead to similar classification performances, but that may lead to different results when evaluated from an interpretability perspective.
1. Introduction
Mass spectrometry allows identifying chemicals in a substance by their mass and charge. It produces spectra that plot the quantity of chemicals in the substance as a function of their mass to charge ratio (m/z). Typically, several thousand m/z values are considered. Such spectra are said to be high-dimensional. For illustration purposes, the detection of cancer will be considered. The interesting question for researchers is of course which chemicals are involved in the process and which biomolecules are affected by the disease process. Two objectives are thus sought together: classification performances should be high, and the method should identify which chemicals are affected. Focusing only on features (m/z) that allow building an efficient classification model is not sufficient; indeed, several sets of features could lead to similar classification performances, while one set could be of much greater interest for causality interpretability than the other ones. Another way of identifying relevant features is to examine the statistical dependency between each of them (taken individually) and the class label. While the statistical dependency concept does not make any assumption on the model that is further used for classification, it will discard features that are only relevant in a group, and not individually.
In this paper, we suggest to overcome these limitations by using the mutual information measure between features and the class label. Mutual information is a nonparametric, model-free method for scoring a set of features. It can be used to spot all features relevant to the classification, and to identify groups of features that allow building a valid classification model. It is applied to the detection of ovarian cancer through spectra of human serum. The process allows identifying feature sets that can be later assessed from a clinical perspective. The paper is organized as follows: Section 2 reviews the existing literature, Section 3 introduces the concept of the mutual information, Section 4 proposes some experiments with the Ovarian Cancer dataset and Section 5 concludes.
2. Previous work
Several mass spectrometry classification algorithms have been proposed in the literature [1, 2, 3]; yet only a few studies focus on feature selection. In a comparative study, Liu [4] considers the Chi-squared test and the t-test, making the assumption that the class populations are normally distributed. He furthermore uses an entropy-based method, considering the mutual information between pairs of features, and between each feature and the class label. Those methods will however eliminate features that are relevant only in conjunction with each other. Petricoin uses a genetic algorithm [5] prior to probabilistic classification. This allows finding a (sub-)optimal feature subset, but fails at scoring each feature individually. The method may thus find a set of features adequate for classification, but not all sets that could be of interest. Furthermore, the procedure is model-dependent and prone to convergence issues. Lilien [6] proposes the use of Linear Discriminant Analysis and Back Projection to score the features. The LDA results in a discriminant vector that is then normalized according to the initial feature variances. The score associated to each feature is the corresponding element in the normalized discriminant vector. The classification model is thus constructed on all features; therefore, a lot of poorly scored (by the LDA criterion) features may have the same weight in the classification process as a single high-scoring feature. Here again, two features that are relevant only when paired will not be identified as such. In the following section, we will see that the mutual information allows scoring groups of features, independently from a subsequent classification model, and without making any assumptions about the class sample distributions.
3. The mutual information
The mutual information (MI) between two random variables or random vectors measures the "amount of information", i.e. the "loss of uncertainty", that one can bring to the knowledge of the other, and vice versa.
3.1. Definition of the mutual information
The concept of uncertainty of a random variable is expressed by its entropy [7]. Although the notion of entropy has first been developed for discrete variables, it can be extended to continuous variables rather easily. The entropy H(Y) of a random variable Y with probability density function (pdf) μ_Y is defined by
H(Y) = -∫ μ_Y(y) log μ_Y(y) dy.   (1)
The entropy of a random variable or vector Y when the value of some other random variable X is known is the conditional entropy:
H(Y|X) = -∫ μ_X(x) ∫ μ_Y(y|X=x) log μ_Y(y|X=x) dy dx.   (2)
The mutual information is the difference between the entropy of a variable and the conditional entropy, I(X, Y) = H(Y) - H(Y|X). The mutual information can be expressed as the Kullback-Leibler divergence between the joint distribution μ_{X,Y} of the variables and the product of the marginal distributions μ_X and μ_Y:
I(X, Y) = ∫∫ μ_{X,Y}(x, y) log [ μ_{X,Y}(x, y) / (μ_X(x) μ_Y(y)) ] dx dy.   (3)
When X and Y are independent, the mutual information is zero; the higher the dependency between the variables X and Y, the higher their mutual information. Contrary to the correlation, the mutual information measures any relationship between variables, and not only linear relations. In the above equations, X and Y can be random vectors instead of random variables. If Y is a binary class label, definition (3) holds. Its extension to the multi-class problem is not obvious though, as an adequate class labeling has to be provided.
3.2. Estimation
Equations (1) to (3) are not applicable as such, as the pdfs are not known in practice. The estimation of the mutual information given finite samples is thus a problem of density estimation. Density estimation can be achieved in several ways [8], for instance with histograms, kernels, B-splines, or Nearest
Neighbours. The latter has the major advantage of being reasonably efficient for the estimation of a multivariate density (when a random vector is involved), while the other ones suffer more dramatically from the 'curse of dimensionality' (the required number of samples needed for the estimation grows exponentially with the dimension of the random vector). There exists an extensive literature on density-based entropy estimators [9, 10]; recently, they have been extended to the mutual information by Kraskov et al. [11]. The latter estimator is used in the experimental part of this paper.
3.3. Using the mutual information for feature scoring/selection
A high mutual information between a feature X and the class label Y thus means that feature X is relevant, regardless of the classification algorithm. However, the mutual information can be used in several ways to select (sets of) features. First, the mutual information scores can be estimated between each feature individually (m/z) and the class label. The highest scores correspond to features that are most relevant in discriminating between the two classes. In contrast, the features with a mutual information near zero are statistically independent from the class label. The drawback of this method is that features that are relevant together but useless individually cannot be accurately spotted. Secondly, the mutual information can be used to search for the optimal feature subset (which may or may not be the subset of optimal features) in a forward manner: the feature with the highest mutual information with the class label is chosen first. Then, pairs of features containing the already selected one and any remaining one are built. The mutual information between each of these pairs and the class label is measured; the second chosen feature is the one contained in the pair with the highest mutual information score. The procedure is then iterated until the adequate number of features has been reached. Although this procedure, which is greedy in the sense that the choice of a feature is never questioned afterwards, can lead to a sub-optimal feature subset, it performs most often efficiently, and definitely better than the previous option. While the second procedure is good at identifying the most relevant subset, it will probably not select all features that could be relevant for the problem, as redundancy between features is avoided. Both procedures have advantages; therefore, in order to identify all features relevant both individually and in conjunction with others, they are merged into a single one, inspired from [12] (see the sketch after the list):
1. N features are selected by individual mutual information.
2. M features are selected by the forward procedure.
3. All 2^(N+M) possible feature subsets are constructed and their mutual information with the class label is estimated.
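The following minimal sketch illustrates the combined procedure. It is not the authors' code: the Kraskov estimator is replaced here by a simpler k-NN (Kozachenko-Leonenko) entropy estimator combined as I(X;Y) = H(X) - Σ_c p(c) H(X|Y=c), the data are random placeholders, and function names are made up; additive constants of the entropy estimate do not affect the ranking of subsets.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(X, k=3):
    """Kozachenko-Leonenko k-NN entropy estimate (in nats) for samples X of shape (n, d)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    eps = cKDTree(X).query(X, k=k + 1)[0][:, k]            # distance to the k-th neighbour
    eps = np.maximum(eps, 1e-12)                           # guard against duplicate points
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(eps))

def mi_subset(X, y, k=3):
    """I(X; Y) for continuous features X and a discrete class label y."""
    h = knn_entropy(X, k)
    for c in np.unique(y):
        h -= np.mean(y == c) * knn_entropy(X[y == c], k)
    return h

def select_feature_groups(X, y, n_rank=4, m_forward=3, k=3):
    T = X.shape[1]
    # step 1: rank features by individual mutual information with the class label
    individual = np.array([mi_subset(X[:, [j]], y, k) for j in range(T)])
    ranked = list(np.argsort(individual)[::-1])
    # step 2: greedy forward selection of m_forward features
    forward = []
    for _ in range(m_forward):
        candidates = [j for j in range(T) if j not in forward]
        forward.append(max(candidates, key=lambda j: mi_subset(X[:, forward + [j]], y, k)))
    # pool: forward picks plus the n_rank best individually ranked features not already chosen
    pool = forward + [j for j in ranked if j not in forward][:n_rank]
    # step 3: score every non-empty subset of the pool by its MI with the class label
    scored = [(mi_subset(X[:, list(s)], y, k), s)
              for r in range(1, len(pool) + 1) for s in combinations(pool, r)]
    return sorted(scored, reverse=True)

# usage on random placeholder data (200 spectra, 50 m/z features, binary label)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 3] + 0.5 * X[:, 7] + 0.3 * rng.normal(size=200) > 0).astype(int)
best_score, best_subset = select_feature_groups(X, y)[0]
```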
The subset with the highest mutual information can be chosen for classification purposes; however, all other subsets associated to a high value of the mutual information with the class label can be considered as relevant for the problem too. In this way, several subsets of features can be identified, hopefully allowing spotting all features relevant to the problem, either individually or in conjunction with others. The subsets can thus be ranked and further application-dependent investigations performed. The values of N and M should be chosen as high as possible, while keeping the 2^(N+M) MI estimations tractable. Despite the fact that the complexity of the method is proportional to the square of the number of features in the worst case, the average number of computations is linear with the number of features. In practice, the computation of all MI values does not exceed a few tens of minutes on a standard computer if, as an example, N+M is limited to 7. Furthermore, the whole variable selection process can be performed off-line, as it does not need to be repeated to classify a new sample.
4. Example
The method is illustrated on an ovarian cancer dataset from the Clinical Proteomics Program of the U.S. National Cancer Institute [13]. The spectra result from SELDI-TOF experiments. The healthy samples come from women showing risks of cancer from a clinical perspective, while the positive cancer samples come from women with various tumor types and severity (see [5] for details). To get a tractable number of feature subsets to assess, three features were chosen by the forward selection method; then, four other ones were chosen among the highest scored features not already selected. To assess the relevance of the selected features, a linear classification is performed on a test set, as in [6].
4.1. Results
Figure 1 shows the mutual information score for each m/z value. The vertical lines indicate which features were chosen by the forward strategy and not by the ranking procedure (in this case, the m/z ratios 2.7921 and 24.2851). Few features have really high mutual information scores. Note that negative values are obviously the result of the estimates bias (without consequence on the ranking) and variance (that gives an idea of the estimator accuracy). The final set of selected features is given in Table 1, with the corresponding m/z values. Feature 1679 has the highest mutual information with the class label. The features selected by the ranking method are obviously highly correlated; nevertheless, we will see that they are not totally equivalent for classification.
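To assess a candidate subset in the same spirit as the linear classification used here, one could, as a rough sketch with hypothetical variable names and an arbitrary train/test split, train a simple linear model on the selected m/z columns only:

```python
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def subset_accuracy(X, y, subset, test_size=0.3, seed=0):
    """Train a linear classifier on the given feature subset and report test accuracy."""
    Xs = X[:, list(subset)]
    X_tr, X_te, y_tr, y_te = train_test_split(Xs, y, test_size=test_size,
                                              random_state=seed, stratify=y)
    clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# e.g. compare the best-MI subset against a single highly ranked feature
# print(subset_accuracy(X, y, best_subset), subset_accuracy(X, y, (3,)))
```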
Figure 1. Mutual information for each m/z feature.
Table 1. The seven selected features, along with their corresponding m/z values. An O in regard of the name of a method indicates that the feature was selected by the method.
Feature   181      530       1678       1679       1680       1681       1682
m/z       2.7927   24.2851   244.6604   244.9525   245.2447   245.5370   245.8296
Forward   O        O                    O
Ranking                      O          O          O          O          O
The mutual information of each of the 128 possible feature subsets is given in Figure 2, along with the performance of a linear classifier built using that subset. The feature groups are ordered by increasing mutual information. Table 2 presents some of those feature groups. Figure 2 confirms that the classification performances are highly correlated with the mutual information (0.9041).
Figure 2. The mutual information and classification performances for a linear classifier built on every possible subset of the selected features.
4.2. Discussion
Table 2. Some values from Figure 2.
Group ID   Feature group                              Mutual information   % Correct classification
126        181                                         0.3694               74.86
127        530                                        -0.0349               58.00
118        1678                                        0.3694               75.86
113        530; 1678                                   0.5644               90.00
38         181; 530; 1678                              0.6571              100.00
1          181; 1678; 1679; 1680; 1682                 0.7026               98.43
33         181; 530; 1678; 1679; 1680; 1682; 1681      0.6585               98.43
59         1678; 1679; 1680; 1681; 1682                0.6347               95.31
34         181; 530; 1680                              0.6583               98.43
35         1678; 1679                                  0.6581               95.23
From the analysis of Table 2, it appears that:
• The group of features achieving the best classification is not the group of most individually relevant features, nor is it the group identified by the forward procedure. Group 38 is the best group and contains only one of the highest-ranked features. Furthermore, the group discovered by the forward procedure achieves less good classification performances; this is because the choice of the first feature was never questioned. Using a forward or ranking procedure alone does not lead to the optimal feature subset.
• Some individually less relevant features help building more accurate classifiers than if using individually relevant features only. Features 530 (Group 127 - low MI) and 1678 together reach 90% of correct classification (Group 113), while feature 1678 alone (Group 118) classifies only 76% of the samples correctly. It can thus be assumed that feature 530 is involved in the process. Only ranking features may prevent from spotting relevant ones.
• There are groups of different features that achieve very similar results. For example, Groups 34 and 35 share no variable, although their classification performances (95 and 98%) are rather close. Simply relying on the best feature subset according to the mutual information or to some model-based algorithm does not allow recovering all features involved in the process.
• The performances in classification reached by the method are similar to the results obtained by Lilien with LDA and Back Projection [6].
• In this problem, it appears that only features with low m/z ratio are relevant.
5. Conclusion
This paper shows that using the mutual information between features and the class label in mass spectra analysis helps choosing relevant feature sets. The method, based on the combination of feature ranking and forward selection, and using mutual information on sets of features rather than individually, makes it possible to rank feature subsets. Then, an application-driven procedure can be used to
assess the (clinical in this example) relevance of the feature sets, starting from the highest-ranked ones by the proposed procedure.
Acknowledgments
M. Verleysen is Research Director of the Belgian FNRS. C. Krier and D. Francois are funded by a Belgian FRIA grant. Parts of this research result from the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office. The scientific responsibility rests with its authors.
References
1. B.L. Adam et al., Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men, Cancer Research, 62, 3609 (2002).
2. G. Ball et al., An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers. Bioinformatics, 18, 395 (2002).
3. H. Zhou et al., Quantitative proteome analysis by solid-phase isotope tagging and mass spectrometry, Nature Biotechnology, 20, 512 (2002).
4. H. Liu, J. Li and L. Wong, A Comparative Study on Feature Selection and Classification Methods Using Gene Expression Profiles and Proteomic Patterns, Genome Informatics 13, 51 (2002).
5. E. Petricoin III et al., Use of proteomic patterns in serum to identify ovarian cancer, The Lancet, 359, 572 (2002).
6. R. H. Lilien, H. Farin, B. R. Donald, Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum, Journal of Computational Biology 10(6), 925 (2003).
7. C.E. Shannon, W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL, 1949.
8. D.W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley, New York, 1992.
9. R.L. Dobrushin, A simplified method of experimental evaluation of the entropy of stationary sequence, Theory Prob. Appl. 3(4), 462 (1958).
10. O. Vasicek, Test for normality based on sample entropy, J. Royal Statist. Soc. B38, 54 (1976).
11. A. Kraskov, H. Stogbauer, P. Grassberger, Estimating mutual information, Phys. Rev. E69:066138 (2004).
12. F. Rossi, A. Lendasse, D. Francois, V. Wertz, M. Verleysen, Mutual information for the selection of relevant variables in spectrometric nonlinear modeling, Chemometrics & Intelligent Lab. Systems, 80, 215-226 (2006).
13. http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp
PEAK INTENSITY PREDICTION FOR PMF MASS SPECTRA USING SUPPORT VECTOR REGRESSION
W. TIMM 1,2,3, S. BÖCKER 2, T. TWELLMANN 3, T. W. NATTKEMPER 3
1. International NRW Graduate School in Bioinformatics and Genome Research
2. Junior Research Group Informatics for Mass Spectrometry, Genome Informatics Group, Faculty of Technology
3. Applied Neuroinformatics Group, Faculty of Technology
Bielefeld University, Postfach 100131, 33501 Bielefeld, Germany
E-mail: wtimm
1. Introduction
Mass spectrometry has become the method of choice to analyze the proteome of a cell. One widely-used approach is based on separating proteins via two-dimensional electrophoresis, then digesting each protein using an endopeptidase such as trypsin, and finally analyzing the peptide mixture by MALDI-TOF mass spectrometry. Proteins are identified by comparison of the resulting protein mass fingerprints (PMFs) with those in a database of known proteins. With the increasing amount of data produced in this area, automated approaches for reliable protein identification are highly desirable. Shadforth et al. 1 give an overview of currently available techniques. The most established programs for this purpose are ProFound 2
and MASCOT 3. These query tools require human interaction to evaluate the identification results, which significantly slows down the process of analyzing whole proteomes. In PMF analysis, cleaved peptides stay intact during MS analysis and are only detected if they are ionized. During such an experiment, not only peak masses, but also their intensities (peak height, area under curve) can be assessed. However, current analysis tools for PMF data usually only take a list of peak masses as input. If peptides are ionized all alike and the digestion procedure works perfectly, peaks of equal intensity can be found at all peptide masses, taking into account multiplicities of peptides in a protein. In reality, this is not the case: due to the variety of ionization processes and the unequal specificity of trypsin for different amino acids, peak intensities can vary significantly. Models have been proposed for these processes 4,5, but currently these models do not allow to precisely predict peak intensities. Elias et al. 6 show that the use of peak intensities enhances the reliability of protein identification for tandem mass spectra. It is reasonable to assume the same to be true for PMF spectra. As noted above, software for PMF interpretation usually ignores intensity information. Modeling of intensities in tandem mass spectra has been dealt with by multiple authors 7,8,9,10. Such models mainly consider the fragmentation probabilities of molecules. For MALDI PMF spectra, where molecules stay intact in the analysis procedure and the intensities highly depend on the ionization probability of the molecules, Gay et al. 11 applied a number of different regression and classification methods. They found the M5' decision tree algorithm to perform best among the regression methods tested and derived a few rules regarding the influence of the occurrence frequencies of some amino acids on the peak intensities.
2. Goal
The goal of this work is to examine approaches for predicting the intensities of peaks from measured protein MALDI-TOF mass spectra using a numerical representation of the peptide sequences of theoretically cleaved proteins. In a first step towards a solution of this problem, our question is whether this string representation enables us to find a correlation between the peptide sequence and its peak intensity. In the long run, the prediction of intensities should enhance the reliability of protein identification results by calculating more realistic theoretical spectra.
For the experiments as presented in this work, we assumed that the peak intensities are reproducible under the same experimental conditions. This assumption is supported by the work of Coombes et al. 12, who found peak intensities of multiple experiments under the same conditions with proteins from the same sample to be highly correlated. Another assumption we make is that the intensities only depend on the corresponding peptide's sequence. This means that interactions between peptides inside the device are neglected. Also, the effect of incomplete cleavage is not considered.
3. Materials and methods
3.1. Data
The experimental spectra were taken from biological studies on Corynebacterium glutamicum which involved 2D-PAGE experiments with subsequent tryptic digestion and measurement in a Bruker Ultraflex device. The whole data set consists of 369 raw spectra, of which 20% (66), belonging to 43 different proteins, were selected. These were identified by MASCOT 3 with the highest distance to the score of the second-best match. The minimal score of the selected spectra is 66 and the minimal distance 37. In the further process the MASCOT identification is taken to be true. Based on this identification, theoretical tryptic digestion was performed to determine the theoretical peptides. The masses of these peptides lead to theoretical peak lists that are then used to retrieve the corresponding peaks from the preprocessed raw spectra. The preprocessing involves denoising, baseline correction, peak detection, filtering of peaks, and deconvolution (de-isotoping). If a peak is found within 1.2 Da of the mass of a theoretical peptide, it is declared a match (in MALDI experiments, usually only singly charged ions are observed; therefore, when "Da" or "mass" is written in this paper, "Da/z" or "mass per charge" is meant and would be more accurate, but the difference is not important for our application). For each match, the amino acid sequence and the intensity before and after the deconvolution step are taken. The extracted peaks form an initial set. All sequences from this set are embedded into a vectorial feature space to allow for processing by a machine learning algorithm. The transformation and algorithms used are described in the Techniques section (3.2) below. Three data sets were compiled from the initial set. The set T_{>800} includes all peaks with a mass above 800 Da, a match accuracy (distance between theoretical and real peak's mass) of at most 1.0 Da, and the intensities from the deconvoluted spectra as target values. The set T_{0+} consists
of the same peaks as the set T_{>800}, but additionally includes peaks with masses below 800 Da because of the low signal-to-noise ratio in that range. The T_{nd} set consists of the same peaks as the set T_{>800}, but the intensities before deconvolution were used as target values instead. All peak intensities are scaled according to I_s = I_orig / ((1/N) Σ_i (I_i - B_i)/N_i). Here, I_s is the scaled intensity, I_orig the unscaled one, N the number of data points in the raw spectrum, I_i the raw intensity value at index i, B_i the baseline value, and N_i the noise determined in the denoising step. In the remainder, intensities refers to the scaled intensities. Most peptide sequences occur in the data set multiple times but with different intensities. Therefore, the target values for the regression are calculated as the α-trimmed mean (i.e. the mean of the central 50% of an ordered list) of all intensities per distinct sequence. The skewness of the distribution of intensities suggests normalizing them. The results are training sets with different target values:
T_raw: Target values are those of the original data set.
T_ln: The natural logarithm of the intensities is calculated before the α-trimmed mean is applied to these values.
T_rank: All peaks from T_ln are sorted by target value, then bins are created so each bin holds twenty consecutive peaks. The bin index is assigned as the target value for the corresponding peak.
T_bins: Intensities are sorted into five bins, such that the lowest bin takes all peaks from T_ln with target values < 2, the highest bin those > 5, and all bins in between span an equal range. Again, the target values are assigned according to the corresponding bin index.
T_bb: This is a balanced variant of T_bins. Here, peaks were randomly selected from T_bins so each bin holds the same number of peaks.
3.2. Prediction of peak intensities by machine learning
The feature vectors consist of the relative frequencies of mono- and trimers over the amino acid alphabet, without positional information. No relative frequencies of dimers were used. Instead, the first and the last dimer of a sequence were encoded, scaled by the length of the sequence, because these positions might be the most important locations for ionization. In earlier experiments on a DNA data set the described setup performed best and can be considered a trade-off between dimensionality and amount of information. This encoding yields very sparse 8820-dimensional feature vectors.
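A rough sketch of this encoding (20 monomer frequencies, 8000 trimer frequencies, and two length-scaled boundary dimers, i.e. 8820 dimensions); the helper below is our own illustration, not the authors' code:

```python
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"
MONO = {a: i for i, a in enumerate(AA)}
TRI = {"".join(t): i for i, t in enumerate(product(AA, repeat=3))}
DI = {"".join(d): i for i, d in enumerate(product(AA, repeat=2))}

def encode(seq):
    """Sparse vector: 20 monomer + 8000 trimer frequencies + 2*400 boundary dimers = 8820 dims."""
    n = len(seq)
    vec = [0.0] * (20 + 8000 + 800)
    for a in seq:                                   # monomer frequencies
        vec[MONO[a]] += 1.0 / n
    for i in range(n - 2):                          # trimer frequencies
        vec[20 + TRI[seq[i:i + 3]]] += 1.0 / (n - 2)
    vec[8020 + DI[seq[:2]]] = 1.0 / n               # first dimer, scaled by length
    vec[8420 + DI[seq[-2:]]] = 1.0 / n              # last dimer, scaled by length
    return vec
```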
For the regression task at hand we applied a support vector machine for regression, the ν-SVR 13,14,15, with both a Gaussian kernel and a linear kernel. To find the optimal values for the SVR regularization parameter C, the Gaussian kernel's bandwidth γ, and ν, grid searches were performed in the three-dimensional parameter space log2(C) ∈ [−5, 15], log2(γ) ∈ [−15, 7], and ν ∈ [0.1, 0.9] for each training set separately. For the evaluation, a ten-fold cross-validation was performed using the optimal SVR parameters found in the grid search. Because of the low amount of data available, no separate validation set was used for this. The accuracy of the predicted values was assessed by measuring the root mean square error (RMSE) and Spearman's rank correlation coefficient (cor).
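Such a grid search and cross-validation could be reproduced, for example, with scikit-learn's NuSVR; the sketch below assumes X and y hold the encoded feature vectors and target intensities (the original work used LIBSVM¹⁵ directly):

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.model_selection import GridSearchCV, cross_val_predict
from scipy.stats import spearmanr

param_grid = {
    "C":     [2.0 ** e for e in range(-5, 16)],   # log2(C) in [-5, 15]
    "gamma": [2.0 ** e for e in range(-15, 8)],   # log2(gamma) in [-15, 7]
    "nu":    np.arange(0.1, 1.0, 0.1),            # nu in [0.1, 0.9]
}
search = GridSearchCV(NuSVR(kernel="rbf"), param_grid, cv=10,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)

# Ten-fold cross-validated predictions with the optimal parameters.
pred = cross_val_predict(search.best_estimator_, X, y, cv=10)
rmse = np.sqrt(np.mean((pred - y) ** 2))
cor, _ = spearmanr(pred, y)
print(search.best_params_, rmse, cor)
```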
4. Results and Discussion

Table 1 shows the prediction performance of the ν-SVR on all data sets as well as some statistical properties. A positive correlation between the predicted and measured intensities (target values) can be observed. The highest correlations are obtained for almost all T_0+ sets. The T_nd sets are predicted worst. The highest correlation overall (0.55) was achieved for T_0+, ln. All other logarithmic sets of T_0+ have a correlation close to that value. In general, logarithmizing improves the prediction. The ranking/binning as well as the balancing of the data does not improve the correlation, except for T_nd, rank, which nevertheless has low correlation. The choice of the kernel function does not have any clearly visible influence on the correlation. Figure 1 shows plots of the predicted values against the target values of different training sets. High densities are shown as dark, low densities as light areas. There is an almost diagonal structure, but low values cannot be predicted well, as indicated by a wide spread. In addition, high values are predicted too low in most cases, regardless of the normalization used. The T_rank and the T_bb sets are balanced, so this result cannot be due to a low number of training examples in these ranges but must be caused by the data and the chosen representation. Pearson's correlation coefficient (which assumes a normal distribution) was also calculated (data not shown). The values found are similar (max. 0.01 difference) to Spearman's rank correlation coefficients, except for the non-logarithmic modes, for which it was noticeably lower. A leave-one-out cross-validation was done for the T_raw and the T_ln sets with the Gaussian kernel to take into account the low number of training
Figure 1. Plots of measured (target) vs. predicted intensities as estimated 2D density using a Gaussian kernel. SVR parameters, RMSE, and correlation are according to Table 1. Left: T_0+, ln; middle: T_0+, rank; right: T_>800, bb. An almost diagonal structure is visible, but a wide spread for low target values can be observed. High target values are predicted too low.
examples available. This improved Pearson's correlation values by 0.03 to 0.14, but only for the T_>800 and T_nd data sets (more so for the non-logarithmic ones), and did not change the correlation for T_0+, raw and T_0+, ln. This suggests that the number of examples available for T_>800 and T_nd is by far not sufficient. It can be assumed that an increase in the number of data points of T_0+, bb will result in a correlation even higher than that of T_0+, ln.

5. Conclusion

The presented results show a positive correlation between the predicted values and the scaled and logarithmized intensities, although only little data was available, motivating more detailed experiments. The use of a larger data set with known protein identities and more sophisticated feature vectors would be promising.

6. Acknowledgments

Thanks to Martina Mahne and Joern Kalinowski for providing spectra from their studies, and to Andreas Wilke, who manually selected spectra and did the MASCOT identification. W. Timm is currently supported by the International NRW Graduate School in Bioinformatics and Genome Research.
Table 1. Results of the ν-SVR's intensity prediction.

Data set       N    μ      median  σ       | Gaussian kernel: ν  log2(C)  log2(γ)  RMSE    cor   | Linear kernel: ν  log2(C)  RMSE    cor
T_>800, raw    353  74.58  28.7    105.84  | 0.2  13    3    103.77  0.32  | 0.4  11  105.19  0.37
T_>800, ln     353   3.37   3.3      1.43  | 0.5  13  -11      1.28  0.45  | 0.6   3    1.29  0.45
T_>800, rank   353   9.33   9        5.10  | 0.5   7    3      4.57  0.45  | 0.5   5    4.48  0.49
T_>800, bins   353   1.87   2        1.34  | 0.8  11   -9      1.18  0.47  | 0.9   3    1.17  0.49
T_>800, bb     265   2.00   2        1.42  | 0.8   3   -1      1.23  0.48  | 0.5   3    1.24  0.47
T_0+, raw      448  61.11  19.8     98.22  | 0.6   9    3     94.46  0.49  | 0.3   9   97.00  0.40
T_0+, ln       448   2.96   2.9      1.59  | 0.7   3    1      1.32  0.55  | 0.6   3    1.33  0.54
T_0+, rank     448  11.69  12        6.45  | 0.5   5    1      5.40  0.54  | 0.6   5    5.46  0.53
T_0+, bins     448   1.56   1        1.37  | 0.8   3    1      1.17  0.53  | 0.7   3    1.22  0.48
T_0+, bb       265   2.00   2        1.42  | 0.8   9    1      1.19  0.54  | 0.6   3    1.21  0.53
T_nd, raw      353  34.34  14.6     47.28  | 0.8  13    3     45.78  0.33  | 0.3   9   44.70  0.41
T_nd, ln       353   2.81   2.6      1.13  | 0.3  15    3      1.07  0.35  | 0.4   3    1.01  0.45
T_nd, rank     353   9.33   9        5.11  | 0.5  15  -11      4.56  0.47  | 0.5   5    4.74  0.42
T_nd, bins     353   1.29   1        1.15  | 0.5   3   -1      1.05  0.39  | 0.7   3    1.06  0.38
T_nd, bb       252   1.55   1.5      1.21  | 0.7   9   -7      1.12  0.38  | 0.7   3    1.10  0.41

Note: N: number of items in the data set; μ: mean intensity value; σ: standard deviation; ν: trade-off parameter of the ν-SVR; C: regularization parameter of the ν-SVR; γ: width of the Gaussian kernel function of the ν-SVR; RMSE: root mean square error; cor: Spearman's rank correlation coefficient as calculated by the R statistical environment¹⁶.
References
1. I. Shadforth, D. Crowther, C. Bessant, Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines. Proteomics 5(16), pp. 4082 (2005).
2. W. Zhang, B. T. Chait, ProFound: An Expert System for Protein Identification Using Mass Spectrometric Peptide Mapping Information. Anal. Chem. 72, pp. 2482 (2000).
3. D. N. Perkins, D. J. C. Pappin, D. M. Creasy and J. S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20(18), pp. 3551 (1999).
4. R. Zenobi and R. Knochenmuss, Ion formation in MALDI mass spectrometry. Mass Spec. Reviews 17, pp. 337 (1998).
5. M. Karas, M. Glücksmann and J. Schäfer, Ionization in matrix-assisted laser desorption/ionization: singly charged molecular ions are the lucky survivors. J. Mass Spectrom. 35, pp. 1 (2000).
6. J. E. Elias, F. D. Gibbons, O. D. King, F. P. Roth and S. P. Gygi, Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nature Biotechnology 22(2), pp. 214 (2004).
7. R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang and P. Radivojac, A machine learning approach to predicting peptide fragmentation spectra. Pac. Symposium on Bioinformatics (2006).
8. E. A. Kapp, F. Schütz, G. E. Reid, J. S. Eddes, R. L. Moritz, R. A. J. O'Hair, T. P. Speed and R. J. Simpson, Mining a Tandem Mass Spectrometry Database To Determine the Trends and Global Factors Influencing Peptide Fragmentation. Anal. Chem. 75, pp. 6251 (2003).
9. F. Schütz, E. A. Kapp, R. S. Simpson and T. P. Speed, Deriving statistical models for predicting peptide tandem MS product ion intensities. Biochem. Soc. Trans. 31, pp. 1479 (2003).
10. A. Bonner and H. Liu, Predicting Protein Levels from Tandem Mass Spectrometry Data. NIPS'04 Workshop on New Problems and Methods in Computational Biology (2004).
11. S. Gay, P.-A. Binz, D. F. Hochstrasser and R. D. Appel, Peptide mass fingerprinting peak intensity prediction: Extracting knowledge from spectra. Proteomics 2, pp. 1374 (2002).
12. K. R. Coombes, H. A. Fritsche Jr., C. Clarke, J.-N. Chen, K. A. Baggerly, J. S. Morris, L.-C. Xiao, M.-C. Hung and H. M. Kuerer, Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization. Clin. Chem. 49:10, pp. 1615 (2003).
13. B. Schölkopf, P. Bartlett, A. Smola and R. Williamson, Shrinking the Tube: A New Support Vector Regression Algorithm. Neural Computation 12, pp. 1207 (2000).
14. B. Schölkopf, P. Bartlett, A. Smola and R. Williamson. In M. S. Kearns, S. A. Solla and D. A. Cohn (Eds.), Advances in Neural Information Processing Systems 11, Cambridge, MA: MIT Press (1999).
15. C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm (2001).
16. R Development Core Team, R: A Language and Environment for Statistical Computing, http://www.R-project.org, ISBN 3-900051-07-0, Vienna, Austria (2005).
LEARNING COMPREHENSIBLE CLASSIFICATION RULES FROM GENE EXPRESSION DATA USING GENETIC PROGRAMMING AND BIOLOGICAL ONTOLOGIES

BEN GOERTZEL, LUCIO DE SOUZA COELHO, CASSIO PENNACHIN, IZABELA FREIRE GOERTZEL, MURILO SARAIVA DE QUEIROZ, FRANCISCO PROSDOCIMI, FRANCISCO PEREIRA LOBO
Biomind LLC, 1405 Bernerd Place, Rockville MD 20851, USA

We consider the problem of how to use automated techniques to learn simple and compact classification rules from microarray gene expression data. Our approach employs the traditional "genetic programming" (GP) algorithm as a supervised categorization technique, but rather than applying GP to gene expression vectors directly, it applies GP to "enhanced feature vectors" obtained by preprocessing the gene expression data using the Gene Ontology and PIR ontologies. On the two datasets considered, this "GP + enhanced feature vectors" combination succeeds in producing compact and simple classification models with near-optimal classification accuracy. For the sake of comparison, we also give results from the combination of support vector machine classification and enhanced feature vectors on the same datasets.
1. Introduction

The analysis of microarray data is still somewhat problematic [10,13,14,15]. One popular approach is to apply supervised categorization technology [1,3], a strategy with two purposes: to find classification models that can be used to develop diagnostics, and to find models that can be used as guidelines for ongoing experimental and theoretical work. Unfortunately, regarding the latter purpose, it is common for highly accurate machine learning algorithms to generate models that cannot be easily understood. Techniques exist that alleviate the understandability problem [4,9,18] by providing lists of "important genes or GOs" related to the categorization process, but this sort of approach ignores the issue of how genes relate to each other in the operation of a classifier. The approach we describe here utilizes the genetic programming algorithm [11] in an unusual way: by giving it additional input features beyond the gene expression values, which are derived from the latter using knowledge resources such as the Gene Ontology (GO) [6] and the Protein Information Resource (PIR) [17]. This "enhanced feature vectors" approach often produces classification models that are simple and compact and have clear biological
meaning. Here we report results obtained by applying this approach to two different microarray datasets, using both GP and support vector machines.

2. Feature Vector Enhancement

The basic idea of the enhanced feature vectors approach is to produce feature vectors containing additional entries besides the usual (normalized, transformed) gene expression values. The simplest way to produce additional features is to use the GO, but the same approach applies with the PIR or other resources (generically referred to as Background Information, or BI). For each entity whose gene expression profile is under study, we may create a single feature value corresponding to each GO category. In the simplest approach, these GO-derived feature values may be computed by averaging. Let G be the set of genes whose expression values have been measured (i.e., the original feature set), and GO the set of gene ontology categories. If we have a GO category C ∈ GO and an entity E, then the value of the feature corresponding to C in the feature vector of E may be defined as the average expression in E of all the genes g_j ∈ G annotated to belong to C. More formally, assume we have an entity (e.g. an organism or a tissue sample, evaluated at a single time point) E_i, and let gene_exp_ij denote the (perhaps normalized and transformed) expression level of gene g_j in entity E_i. Let GO_k denote the k-th GO category under consideration. Let G_k denote the set of genes contained in GO_k. We may consider a number w_jk in [0, 1] associated with each element g_j in G_k, which is the confidence with which it is known that g_j ∈ G_k. As a default this may be set close to 1 (e.g. 0.99), but in cases where GO category membership is determined via automated learning, the confidence may be significantly lower. In the results reported here we set all w_jk constant, but we have done other work, to be reported elsewhere, in which the w_jk vary significantly because some gene-GO assignments are made via machine learning. Given all these preliminaries, we now define the amalgamated expression value of the GO category GO_k as

GO_exp_ik = Σ_{j: g_j ∈ G_k} w_jk · gene_exp_ij    (1)
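A minimal sketch of this amalgamation, assuming gene_exp maps gene identifiers to expression values for one entity and go_genes maps each GO category to (gene, w_jk) pairs; the names are ours:

```python
def go_features(gene_exp, go_genes):
    """Amalgamated expression per GO category (equation (1)):
    GO_exp_ik = sum over genes g_j in G_k of w_jk * gene_exp_ij."""
    return {cat: sum(w * gene_exp[g] for g, w in members if g in gene_exp)
            for cat, members in go_genes.items()}

def enhanced_vector(gene_exp, go_genes, gene_order, go_order):
    """Extended feature vector of length |gene_set| + |GO_cat_set| (equation (2))."""
    go = go_features(gene_exp, go_genes)
    return [gene_exp[g] for g in gene_order] + [go.get(c, 0.0) for c in go_order]
```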
Using this approach, one may obtain, for each entity being categorized, an extended feature vector of length

|gene_set| + |GO_cat_set|    (2)
where |gene_set| is the number of genes measured by the microarrays in use, and |GO_cat_set| is the number of gene ontology categories being utilized. Alternatively, one may choose to utilize only the GO-based feature vector entries, thus obtaining a feature vector of length |GO_cat_set|.

3. Experimental Results

3.1. Biological Test Data

In order to test the enhanced feature vectors methodology, we have experimented with two publicly available datasets:

1. Lung cancer [7]. This dataset has expression data on samples of lung malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA), the goal being to classify between the two tumor types. The training data has 32 samples, equally split between the two classes. The test set has 134 ADCA samples and 15 MPM samples. This dataset has already been submitted to supervised categorization analyses using ensembles of decision trees, which obtained test set accuracies of 93.29% [15].

2. Aging Human Brain [12]. This dataset contains samples of three categories of subjects divided by age: "young" (less than 42 years old); "middle age" (between 42 and 72 years); "old" (over 72 years). In our experiments, we used a subset of this original dataset containing only the categories "young" (9 samples) and "old" (11 samples).
3.2. Methods

First of all, the datasets underwent a normalization based on log-transform and Z-score, in such a way that, after normalization, all features had mean zero and variance 1. Then, the GP and SVM classification methods available in the categorization framework of the Biomind ArrayGenius software were used on all datasets. In particular, we used the metatasking capability of ArrayGenius: the experiments for each dataset were actually composed of 500 GP tasks and 500 SVM tasks running with random parameters; the results presented here are the best ones found via this process. All parameters not mentioned below were left at their ArrayGenius default values. All combinations of use of direct and derived features (see the section above on feature vector enhancement) were allowed. In SVM tests, only the kernel parameter was varied among all available alternatives. In GP tests, the fitness function was varied across all available alternatives. For each test, a random number N between 10 and 1000 was chosen, and the test used only the top N features with values most differentiated among the categories in the problem. In the case of the Lung Cancer dataset, separate training and test sets were used for statistical validation, in order to allow comparison with previously published results. For the Aging Brain dataset, we used 10x10 cross-validation, since it had no prior division into test and training datasets.

3.3. Results

Table 1. Summary of the accuracy we obtained in our tests using enhanced feature vectors. We compare these results with previous reports on the same dataset when those are available. Accuracy values are for the test sets only.
Dataset       Method   Accuracy with Enhancement   Accuracy without Enhancement   Accuracy in Literature
Lung Cancer   SVM      100.0%                      100.0%                         93.3%
Lung Cancer   GP       97.0%                       91.3%
Aging Brain   SVM      100.0%                      95.0%                          —
Aging Brain   GP       95.0%                       70.0%
Tables describing our results on these datasets in more detail are available in the Supplementary Information, online at http://www.biomind.com/cibb06.html, where results on other datasets are also noted. Tests using only derived features showed comparatively poor performance. For the lung cancer dataset, the use of derived-only features achieved only 89.9% for both GP and SVM. For the ageing brain dataset, the use of derived-only features achieved only 80.0% accuracy using GP and 90.4% using SVM. Notably, SVM achieved 100% classification accuracy on both of them, beating the best result in the literature on the lung cancer dataset, namely 93.3% accuracy, which had been achieved through supervised classification technology. The ageing brain dataset does not have any published benchmark against which to compare our results. GP achieved 97% and 95% accuracies on the lung cancer and ageing brain datasets, respectively. These are the numbers for GP with enhanced features, which beat the results for GP without enhanced features in both cases.
The models learned using genetic programming together with enhanced feature vectors are compact and informative. Below, we show the best model found for the aging brain dataset using genetic programming, in algebraic form.

1. Aging Brain: (((GO:0015671 − FAM0010221) * GO:0001565) * (0.849917 + SF000628)) / (FAM0040135 / GO:0001775)
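To make the use of such a model concrete, the learned expression can be evaluated directly on a sample's feature values; the feature values below are invented placeholders, and the helper is only an illustration:

```python
def aging_brain_score(f):
    """Evaluate the GP-learned aging brain model on one sample's (enhanced) feature values."""
    return (((f["GO:0015671"] - f["FAM0010221"]) * f["GO:0001565"])
            * (0.849917 + f["SF000628"])) / (f["FAM0040135"] / f["GO:0001775"])

# Hypothetical example; real values would come from the enhanced feature vector.
sample = {"GO:0015671": 0.4, "FAM0010221": -0.1, "GO:0001565": 1.2,
          "SF000628": 0.3, "FAM0040135": 0.8, "GO:0001775": 0.5}
print(aging_brain_score(sample))
```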
At this point one can ask some questions to further validate those results: 1) are the BI features selected coherent with the traditional features, i.e., do they represent the genes that are differentially expressed, and 2) do they assist with understanding the biological phenomena underlying the data?

3.3.1. Aging Brain data

To validate the coherent utilization of BI features versus normal ones (i.e., to check whether the BI features are somehow correlated with the traditional ones) we performed a curated analysis of the 20 features most utilized by the models in our GP ensembles, compared to the genes found to be differentially expressed in the original work [12]. In this work the greatest changes observed in expression occurred in genes related to synaptic function, neuronal plasticity, signal transduction, lipid metabolism, vesicular transport, protein metabolism, Ca2+ homeostasis, microtubule cytoskeleton, amino acid modification, hormones and immune response. The vast majority of BI features most used by our ensembles are clearly correlated with the gene functions differentially expressed in the original work, notably long-term neuron survival (features GO:0048169 and GO:0008582), lipid metabolism (GO:0004063 and GO:0004064), and immune response (GO:0004915, GO:0045917, GO:0048143 and GO:0019981). It is interesting to notice that BI features correlated with common diseases of the aging process, notably Parkinson's, were also found (GO:0048154 and GO:0048155).

3.3.2. Lung Cancer data

In this dataset there is no clear indication of specific differentiated categories in the original work [7], so we analyzed the feature utilization directly, comparing the normal and BI features generated by our algorithms. The presence in the "important feature lists" of several features related to the cytoskeleton (like
NM_001614, NM_005775 and GO:0005519), and to fibroblast growth (NM_002010, NM_000604 and SF000628) is a good sign. Several characteristics of cancer are clearly represented by BI features: locomotion, both of tumor cells and of immune ones [2], and replication, represented by fibroblast growth.

References
1. A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer and Z. Yakhini, J. Comput. Biol. 7(3-4), 559-83 (2000).
2. A. Besson, R. K. Assoian and J. M. Roberts, Nat. Rev. Cancer 4, 948-55 (2004).
3. M. Brown, W. Grundy, D. Lin, N. Cristianini and C. Sugnet, Proc. Natl. Acad. Sci. USA 97, 262-267 (2000).
4. J. Cho, D. Lee, J. Park and I. Lee, FEBS Letters 571, 93-98 (2004).
5. N. Cristianini and J. Shawe-Taylor, Support Vector Machines, Cambridge University Press, 2000.
6. Gene Ontology Consortium, Nat. Genet. 25, 25-29 (2000).
7. G. Gordon, R. Jensen, L. Hsiao, S. Gullans, J. Bluemnstock, S. Ramaswamy, W. Richard, D. Sugarbaker and R. Bueno, Cancer Res. 62, 4963-4967 (2002).
8. I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Machine Learning 46, 389-422 (2002).
9. T. Hvidsten, A. Laegreid and J. Komorowski, Bioinformatics 19, 1116-1123 (2003).
10. R. Kothapalli, S. J. Yoder, S. Mane and T. P. Loughran, BMC Bioinformatics 3(1), 22 (2002).
11. J. Koza, Genetic Programming, MIT Press, 1992.
12. T. Lu, Y. Pan, S. Y. Kao, C. Li, I. Kohane, J. Chan and B. A. Yankner, Nature 24, 883-891 (2004).
13. J. Lyons-Weiler, Applied Bioinformatics 2(4), 193-195 (2003).
14. D. Singh, P. Febbo, K. Ross, D. Jackson, J. Manola, C. Ladd, P. Tamayo, A. Renshaw, A. D'Amico, J. Richie, E. Lander, M. Loda, P. Kantoff, T. Golub and W. Sellers, Cancer Cell 1, 203-209 (2002).
15. A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardins, S. Levy, Bioinformatics 21(5), 631-643 (2005).
16. A. Tan and D. Gilbert, Appl. Bioinformatics 2, S75-S83 (2003).
17. J. Wang, T. Bø, I. Jonassen, O. Myklebost and E. Hovig, BMC Bioinformatics 4 (2003).
18. C. Wu, L. Yeh, H. Huang, L. Arminski, J. Castro-Alvear, Y. Chen, Z. Hu, R. Ledley, P. Kourtesis, B. Suzek, C. Vinayaka, J. Zhang and W. Barker, Nucleic Acids Res. 31, 345-347 (2003).
PROTEIN SECONDARY STRUCTURE PREDICTION: HOW TO IMPROVE ACCURACY BY INTEGRATION

LUIGI PALOPOLI
DEIS, Universita della Calabria, Italy. E-mail: [email protected]

SIMONA E. ROMBO
DIMET, Universita "Mediterranea" di Reggio Calabria, Italy. E-mail: [email protected]

GIORGIO TERRACINA
Dip. di Matematica, Universita della Calabria, Italy. E-mail: [email protected]

GIUSEPPE TRADIGO
ICAR-CNR, Rende, Italy. E-mail: [email protected]

PIERANGELO VELTRI*
Universita "Magna Graecia" di Catanzaro, Italy. E-mail: [email protected], ph: +39 0961 3694149, fax: +39 0961 3694112

Keywords: Proteomics, Protein Structure Prediction, Data Integration
In this paper a technique to improve protein secondary structure prediction is proposed. The approach is based on the idea of combining the results of a set of prediction tools, choosing the most correct parts of each prediction. The correctness of the resulting prediction is measured referring to accuracy parameters used in several editions of CASP. Experimental evaluations validating the proposed approach are also reported.
* contact author
1. Introduction

Biological functions of proteins depend on the spatial disposition of the amino acids composing them. Even if new protein amino acid sequences are continuously discovered, identifying their spatial disposition requires considerable effort. Experimental, or exact, methods such as X-ray crystallography or solution nuclear magnetic resonance (NMR) are very expensive and time-consuming. Thus, computer-based automatic tools have been designed to predict protein structures, and such methods have received great attention in the last few years³,⁵,⁴. Recently, many tools have been proposed and are available on-line⁶,¹³, achieving good prediction accuracies. Nevertheless, quality is still not comparable with that obtained by exact methods, and research on prediction quality improvements is considered an important research topic². Moreover, most of the existing prediction tools have high accuracy only on specific groups of proteins. Thus, a challenging problem is to devise prediction methods capable of achieving high levels of accuracy independently of the input proteins they are applied to. Recently, to improve the quality of prediction and to reduce the input dependency, methods based on a joint use of available prediction tools have been proposed⁶,⁷,⁸. In this paper we focus on secondary structure prediction, presenting a novel approach based on the integration of prediction results obtained by several existing prediction tools. The idea is to select and integrate the best predictions in order to obtain higher accuracy than using a single prediction tool. Such an idea is similar to what has been done for tertiary structure prediction⁸, but focusing on secondary structures. The following example shows the basic idea of the proposed approach. Given a protein p, composed of k amino acids, its secondary structure can be represented as a string of length k over the alphabet of three symbols Σ = {E, H, L}, meaning that the corresponding amino acid stands respectively on an α-helix, a β-strand or a non-regular conformation. Let T_1, ..., T_n be the predictions for a protein p obtained by using n different prediction tools. The idea is to combine (integrate) T_1, ..., T_n to obtain a new prediction. Figure 1(a) schematically shows the foregoing when n = 5. Each prediction is represented by a bar filled using three different textures, one for each possible secondary structure configuration (black for α-helix (H), striped for β-strand (E) and white for non-regular shapes (L)). The bottom part of the figure reports the real (i.e., obtained by exact methods) secondary structure of p, using the same notation. Figure 1(b) shows that, combining three out of the five predictions (namely, T_1, T_3 and T_5), the
Figure 1. (a) "Composition" of predictions; (b) "composition" of predictions discarding the worst predictions.
result is closer to the real structure than the one reported in Figure 1(a). Note, by the way, that predictions T_2 and T_4 are less accurate than T_1, T_3 and T_5 when compared with the real structure. The main contribution of the paper consists in the definition of a method for the integration of different predictions; this is carried out by applying an appropriate criterion to locate and combine the "best" parts of the various predictions.

2. Parameters Definition

To measure the accuracy of a prediction, some parameters have been defined in the literature¹²,¹¹. Given a prediction tool and the amino acid sequence of a protein p, the three-state prediction accuracy Q3 represents the percentage of secondary structure configurations (i.e., states) correctly predicted by the prediction tool. The per-segment accuracy SOV measures the percentage of segments of secondary structure correctly predicted, where a segment is a contiguous set of amino acids. Q3 and SOV can be evaluated once the real (observed) protein secondary structure is available. Using such parameters, we define new ones in order to evaluate the accuracy of a prediction tool w.r.t. a set of proteins. In particular, given a secondary structure prediction tool T_i and a set P of m proteins whose observed secondary structure is known, we define the average per-segment accuracy coefficient SOV_(i) as follows:

SOV_(i) = ( Σ_{j=1..m} SOV_(i,j) / m ) × 100    (1)

where SOV_(i,j) indicates the value of SOV corresponding to the prediction of T_i for the j-th protein in P. SOV indicates the ability of a prediction tool
to correctly predict entire sections of secondary structures. Such information is necessary to evaluate how much the "opinion" of such a prediction tool is to be considered accurate whenever a situation of disagreement among the prediction tools occurs. Given a set T of n prediction tools and a protein p with k amino acids, we define a consensus parameter to measure the agreement among the prediction tools in T while predicting the secondary structure of p. In particular, let T_i ∈ T be a prediction tool and k_j the j-th amino acid in p; we define the consensus percentage C_(i,j) as follows:
C_(i,j) = ( N_c(i,j) / n ) × 100    (2)

where N_c(i,j) is the number of prediction tools in T that have predicted the same result as T_i for the j-th amino acid of p. The consensus percentage indicates how much a prediction tool agrees with the remaining n − 1 ones in predicting a single amino acid state. Similarly, given a prediction tool T_i and a segment s_j in the predicted structure for p, in order to evaluate the consensus of the prediction tool T_i with the remaining n − 1 prediction tools in T w.r.t. the segment s_j, we define the superposition mutual coefficient for segments, SOV_mutual, as:

SOV_mutual(i) = ( Σ_{l=1..n, l≠i} SOV^{T_i T_l} ) / (n − 1)    (3)
where SOV^{T_i T_l} is analogous to SOV, with the difference that SOV is evaluated by considering a predicted and an observed secondary structure, whereas SOV^{T_i T_l} measures the segment overlap existing between two predicted structures.

3. Integrating Prediction Results

The integration approach exploits a set T of n prediction tools. The inputs are the amino acid sequence (primary structure) of a protein p whose secondary structure is unknown, and a set P of m proteins whose structures are known (observed), and such that they are related to the protein p. More precisely, the proteins in P are required to be homologous to p in their structures and in their biological functions. We notice that several tools and databases that classify proteins in families, referring to their biological functions, mutations or protein structures, are available on-line⁹,¹⁰.
3.1. Assigning a Vote to Each Prediction Tool
Let p be a protein with k amino acids and let T be a set of n prediction tools that have expressed n distinct predictions for the secondary structure of p. The proposed approach combines the n predictions in order to obtain a predicted secondary structure for p with a better accuracy than each single prediction. It consists in selecting, for each amino acid of p, the most probable state (helix, strand or irregular state) from the n predictions. To integrate predictions we define a voting matrix M (n × k) where M[i, j] represents a vote for the prediction of the i-th prediction tool w.r.t. the j-th amino acid of p. M is defined by using the parameters defined in Section 2, evaluated by running the prediction tools in T on a set P of proteins whose secondary structure is known and such that the proteins in P are homologous to p. The voting matrix M is then defined as follows:

M[i, j] = SOV_(i) + C_(i,j) + SOV_mutual(i)    (4)

In particular, SOV_(i) can be considered as a reliability score for the i-th prediction tool in predicting the secondary structures of the proteins in P. C_(i,j) represents a punctual agreement between the i-th prediction tool and the other ones in predicting the j-th amino acid of p, whereas SOV_mutual(i) represents a structural agreement index comparing the i-th prediction with the remaining ones. The following section reports the integration algorithm exploiting the voting matrix.
3.2. The Integration Algorithm

Let p be a protein, P a set of known proteins homologous to p and T a set of n prediction tools. Suppose the n secondary structure predictions for p given by the prediction tools in T are known, and let M be the voting matrix computed as in equation (4) w.r.t. p, P and T.

procedure Integrate(p, T, M, s)
begin
  for each amino acid j of p do
  begin
    if all the prediction tools in T agree on the amino acid j then
      s[j] = the j-th symbol of the prediction of one prediction tool in T;
    else begin
      select i such that M[i, j] is the maximum of the column j of M;
      s[j] = the j-th symbol of the prediction of the i-th prediction tool;
    end
  end
end procedure
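A runnable Python sketch of the voting matrix of equation (4) and of the Integrate procedure, where predictions is a list of n strings over {E, H, L} of length k, and sov_i and sov_mutual_i are the precomputed SOV_(i) and SOV_mutual(i) values (all names are our own):

```python
def consensus(predictions, i, j):
    """C_(i,j): percentage of tools predicting the same state as tool i at position j (eq. (2))."""
    n = len(predictions)
    same = sum(1 for p in predictions if p[j] == predictions[i][j])
    return 100.0 * same / n

def voting_matrix(predictions, sov_i, sov_mutual_i):
    """M[i][j] = SOV_(i) + C_(i,j) + SOV_mutual(i)  (equation (4))."""
    n, k = len(predictions), len(predictions[0])
    return [[sov_i[i] + consensus(predictions, i, j) + sov_mutual_i[i]
             for j in range(k)] for i in range(n)]

def integrate(predictions, M):
    """Keep the unanimous state at each position, otherwise the state of the best-voted tool."""
    n, k = len(predictions), len(predictions[0])
    s = []
    for j in range(k):
        states = {p[j] for p in predictions}
        if len(states) == 1:
            s.append(predictions[0][j])
        else:
            best = max(range(n), key=lambda i: M[i][j])
            s.append(predictions[best][j])
    return "".join(s)
```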
The integration algorithm obtains a secondary structure prediction s for p as follows. For each amino acid j in p, the prediction of the i-th tool is chosen, where i is obtained by determining the maximum value M[i, j] in the column j. Finally, the prediction s is the sequence obtained by concatenating the predictions chosen for each amino acid in p.

Table 1. Comparison between accuracy measures of the prediction tool scoring the maximum values of SOV (resp. Q3) and the integration tool. T^S (resp., T^Q) indicates that the tool T scored the best SOV (resp., Q3) for that protein.
Protein  Tools                                                 | Max SOV tool    | Max Q3 tool     | Integration tool
                                                               | SOV     Q3      | SOV     Q3      | SOV     Q3
1dlw     porter, psipred^S, rosetta^Q                          | 83.62   86.21   | 83.21   91.38   | 85.20   88.79
1mwb     prof, jufo, rosetta^SQ, yaspin                        | 91.87   96.00   | 91.87   96.00   | 99.05   95.12
1idr     prof, porter, jufo, psipred^Q, yaspin                 | 98.77   91.91   | 96.95   94.12   | 99.51   94.12
1i78     psipred^S, rosetta^Q, sam                             | 68.80   78.45   | 67.10   81.48   | 77.89   82.83
1k24     rosetta^SQ, porter                                    | 67.11   78.66   | 67.11   78.66   | 67.11   78.66
1ilz     psapredict, prof, porter, psipred, rosetta^Q, yaspin  | 84.20   79.64   | 71.89   82.91   | 86.71   80.00
1ivs     prof, porter, psipred^SQ, rosetta, sam, yaspin        | 72.94   78.77   | 72.94   78.77   | 75.64   80.74
1lrz     porter, psipred^S, rosetta, sam^Q                     | 82.28   80.75   | 81.02   82.63   | 84.50   82.63
1eiy     prof, porter, psipred, rosetta, sam^SQ, yaspin        | 62.48   73.43   | 62.48   73.43   | 63.69   71.14
1set     prof^S, porter, sam^Q                                 | 79.57   76.96   | 75.78   78.38   | 82.43   78.15
1faf     psipred, jufo, yaspin, rosetta                        | 85.42   83.54   | 82.44   87.34   | 87.54   93.67
1bq0     porter, jufo^Q, yaspin, rosetta, sam                  | 56.98   71.84   | 52.55   79.61   | 60.18   74.76
1xbl     porter, jufo^SQ, rosetta                              | 76.85   89.72   | 76.85   89.72   | 71.11   86.92
1hdj     porter^SQ, yaspin, rosetta                            | 92.21   90.91   | 92.21   90.91   | 92.95   92.21
1fpo     porter^S, jufo, prof, yaspin, sam^Q                   | 92.73   91.80   | 91.71   92.98   | 93.31   92.40
1mm4     prof, porter^SQ                                       | 81.43   82.35   | 81.43   82.35   | 81.43   82.35
1p4t     porter, yaspin^SQ, sam                                | 81.02   79.35   | 81.02   79.35   | 82.07   80.65
1qj8     porter, psipred^S, sam^Q                              | 76.08   76.35   | 67.15   77.70   | 79.12   77.70
1g90     prof, psipred^Q, sam                                  | 79.98   82.95   | 77.67   83.52   | 86.31   86.36
1bxw     prof, psipred^SQ, sam                                 | 89.68   87.79   | 89.68   87.79   | 97.66   87.79
1qjp     prof, psipred^SQ, sam                                 | 88.08   85.96   | 88.08   85.96   | 95.51   86.55
4. Experiments

To validate our approach we tested the algorithm of Section 3.2 on proteins whose secondary structures are published in the Protein Data Bank (PDB)¹, using a set of 9 available prediction tools, namely porter, psipred, psapredict, jufo, prof, rosetta, sam, and yaspin. Results of the experimental evaluations are reported in Table 1. Here, each
row corresponds to a test on a protein p (assumed unknown as far as the test was concerned) whose PDB identifier is reported in the first column. For each p the table shows: (i) the prediction tools considered in the integration process; in this column we use the notation T^S (resp., T^Q) to highlight the tool T scoring the best SOV (resp., Q3) on p among the considered ones; (ii) values of SOV and Q3 for T^S; (iii) values of SOV and Q3 for T^Q; (iv) values of SOV and Q3 for our integration tool. Integration improves the SOV parameter in 85.7% of cases, and the Q3 parameter in 66.6% of cases. Note that, usually, the prediction tool scoring the maximum SOV value does not obtain the maximum Q3, and vice versa (see, e.g., rows 1, 3 and 4). On the contrary, the integration tool scores the best accuracy for both SOV and Q3 in many cases and shows accuracy values that are very close to the maximum ones in the other cases. As a consequence, it is possible to conclude that our integration tool tends to improve the overall accuracy of the prediction, considering both measures. The selection of the prediction tools used for the integration process is currently semi-automatic. We are working to define an appropriate technique for the automatic selection of the best set of prediction tools, to provide a fully automated tool for the prediction of protein secondary structures. The integration procedure (described in Section 3.2) and the voting matrix evaluation are fully implemented in Java. Automatic querying of available prediction tools and results normalization are currently under development. All these modules are part of a more complex architecture for the automatic prediction of secondary structures whose prototype will soon be available. Finally we plan to face a further challenge, that is, the use of the presented tool in combination with tertiary structure prediction tools to further improve overall accuracy.

5. Related Work

In⁶ an interactive protein secondary structure prediction Internet server is presented. The server allows a single sequence or a multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences. The main difference w.r.t. our approach is that they aim at individuating the best results among the available predictions by exploiting a consensus technique, whereas our system integrates only the subset of available predictions that allows improving the prediction accuracy. In⁷ a method based on the cooperative exploitation of different tertiary structure prediction tools is proposed. The tool is based on the selection
of models predicted by a number of independent fold recognition servers, by confidence assignment. In⁸ an approach for tertiary structure prediction is proposed. This approach considers the characterization of the performance of a team of prediction tools jointly applied to a prediction problem, choosing the best team for a prediction problem and integrating the prediction results of the tools in the team in order to obtain a unique prediction. Differently from our approach, the methods in⁷,⁸ face the problem of protein tertiary structure prediction, thus the technique to combine predictions, the voting matrices and the reference measures of precision are completely different from our own.

References
1. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov and P.E. Bourne, The Protein Data Bank. Nucleic Acids Research, 28:235-242 (2000).
2. Critical assessment of techniques for protein structure prediction. http://predictioncenter.llnl.gov/casp6/Casp6.html.
3. D. M. Webster, "Protein Structure Prediction: Methods and Protocols (Methods in Molecular Biology)", Humana Press, 15 August (2000).
4. C. Guerra, S. Istrail, "Mathematical Methods for Protein Structure Analysis and Design: Advanced Lectures (Lecture Notes in Bioinformatics)", Springer, 13 August (2003).
5. A. Tramontano, "Protein Structure Prediction: Concepts and Applications", John Wiley & Sons (2006).
6. J. A. Cuff, M. E. Clamp, A. S. Siddiqui, M. Finlay and G. J. Barton, "Jpred: a consensus secondary structure prediction server", Bioinformatics 14, 892-893 (1998).
7. D. Fischer, "3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor", Proteins 51:3, 434-41 (2003).
8. L. Palopoli and G. Terracina, "CooPS: a system for the cooperative prediction of protein structures", J. of Bioinf. and Comp. Biol. 14, 14-16 (2004).
9. A. G. Murzin, S. E. Brenner, T. Hubbard and C. Chothia, "SCOP: a structural classification of proteins database for the investigation of sequences and structures", Journal of Molecular Biology 475, 536-540 (1995).
10. S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller and D. J. Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Research 25, 3389-3402 (1997).
11. C. Venclovas, A. Zemla, K. Fidelis and J. Moult, "Some measures of comparative performance in the three CASPs", Proteins: Structure, Function, and Genetics 34, 220-223 (1999).
12. B. Rost, C. Sander and R. Schneider, "Redefining the goals of protein secondary structure prediction", Journal of Molecular Biology 235, 13-26 (1994).
13. G. Pollastri and A. McLysaght, "Porter: a new, accurate server for protein secondary structure prediction", Bioinformatics, in press (Advance Access) (2004).
THE STABILIZATION EFFECT OF THE TRIPLEX VACCINE*
F. PAPPALARDO, S. MOTTA, E. MASTRIANI, M. PENNISI
Dept. of Mathematics & Computer Science and Faculty of Pharmacy, University of Catania, V.le A. Doria 6, I-95125 Catania, Italy
E-mail: {francesco,motta,mastriani}@dmi.unict.it; [email protected]

P.-L. LOLLINI
Sezione di Cancerologia, Dipartimento di Patologia Sperimentale, and Centro Interdipartimentale di Ricerche sul Cancro "Giorgio Prodi", University of Bologna, Viale Filopanti 22, I-40126 Bologna, Italy
E-mail: [email protected]
Cancer immunoprevention vaccines are based, like all vaccines, on drugs which give the immune system the information necessary to recognize tumor cells as harmful. In endogenous tumors, the vaccine cannot eliminate all tumor cells, but maintains them at a non-dangerous level. In this paper we show that the vaccine's administration acts as a stabilizing perturbation for the immune system. The results suggest that it is possible to model such an effect using an ODE-based technique with an "external input".

* This work was supported in part by the IMMUNOGRID project, under EC contract FP6-2004-IST-4, No. 028069. F.P. and S.M. acknowledge partial support from a University of Catania research grant and from MIUR (PRIN 2004: Problemi matematici delle teorie cinetiche). This work was done while F.P. was a research fellow of the Faculty of Pharmacy of the University of Catania. P.-L.L. acknowledges financial support from the University of Bologna, the Department of Experimental Pathology ("Pallotti" fund) and MIUR.
1. Introduction

The Immune System (IS) is a complex adaptive system of cells and molecules, distributed in the vertebrate body, that provides a basic
defense against pathogenic organisms. A vaccine is a drug which provides the immune system with a first encounter with a pathogen, using inactivated ones. In such a way the system produces memory cells that will be able to destroy the active pathogen at the next encounter. A special kind of pathogen are cancer cells. Being self cells, they are usually not recognized by the immune system as dangerous. Tumors are caused by a combination of exogenous and endogenous factors. Elimination of exogenous factors (industrial carcinogens, tobacco, and so on) is in principle quite easy. This is achieved by governmental decrees and laws or by changes in lifestyle. Other tumors are mainly due to endogenous or unknown factors; for example, the major risk factors for breast cancer are related to hormonal (estrogenic) stimulation of the mammary gland during fertile life. Cancer immunoprevention is based on the use of immunological approaches to prevent solid tumors, rather than to cure cancer. This is most important in endogenously originated tumors, in which cancer cells are continuously formed from corrupted normal cells. Cancer immunoprevention vaccines are based (like all vaccines) on drugs which give the immune system the information necessary to recognize tumor cells as harmful. For this reason, cancer vaccines need to be administered to the host for his entire life. The vaccine cannot eliminate all tumor cells, but stabilizes them at a non-dangerous level. A comprehensive discussion on vaccines, the immune system and models can be found in Lollini et al.³. In this paper, we treat immunoprevention vaccines looking in particular at the stabilizing effect of the vaccine's administration. The plan of the paper is the following. In Section 2 we recall the model which reproduces the effect of an immunoprevention vaccine for mammary carcinoma and present the analysis of the model results in an appropriate states' space describing the cancer - immune system competition. In Section 3 we draw conclusions and plans for future work.
2. Analysis of the Triplex stabilizing effect

A typical, and unfortunately very widespread, endogenous cancer is the mammary carcinoma. Research on immunoprevention cancer vaccines for this carcinoma started in the mid-'90s. Various attempts were made by Lollini et al.² and others to prevent mammary carcinoma in HER-2/neu transgenic mice using immunological maneuvers. A complete prevention of mammary
carcinogenesis with the Triplex vaccine was obtained when vaccination cycles started at 6 weeks of age and continued for the entire duration of the experiment, at least one year (chronic vaccination)². The question whether the Chronic protocol is the minimal vaccination protocol yielding complete protection from tumor onset, or whether a lower number of vaccination cycles would provide a similar degree of protection, is still open. Finding an answer to this question via a biological solution would be too expensive in time and money, as it would require an enormous number of experiments, each lasting at least one year. For this reason we developed an accurate model of the immune system responses to vaccination. A detailed description and applications of this model to vaccine schedules can be found in Pappalardo et al.⁵,⁶ and in Motta et al.⁴. From the point of view of the description of the entities, the model is inspired by Boltzmann equations. The simulator is based on lattice Boltzmann automata, and what one observes in the figures of the quoted papers are the moments of the distribution functions of the various entities with respect to time. However, the fight of the immune system against harmful cancer cells is a competition which recalls the well-known Lotka-Volterra equations. As a matter of fact, there is a population of prey (the cancer cells) with infinite food resources (the host blood) and different populations of predators (effector cells) which try to recognize and eliminate them. At variance with classical predator-prey models, the predator survival is not determined by the prey: the predators exist at a certain level as the normal state of the host, using the same food resources, the blood. As tumor cells are self cells, they are hardly recognized by the immune system, so that without any vaccine treatment the tumor takes over, destroys the immune system efficiency and kills the host. When the vaccine is administered, effector cells recognize tumor cells, their number rises in order to eliminate the tumor cells, and then goes back to the normal level. However, in endogenously originated tumors new cancer cells will continuously be formed, and a new vaccine administration is then needed to stabilize the cancer - immune system competition. Thus the system including cancer cells is unstable and will be stabilized by an external action (the vaccine). The situation is similar, but reversed, to mechanical systems where external forces induce instability (like the wind effect on bridges). The state of the system describing the immune system - cancer competition can be summarized in three fundamental parameters: the number of cancer cells, the number of cytotoxic cells (representing the cellular response) and the number of antibodies (representing the humoral response).
These parameters can be represented in a 3-D states' space. Curves in this space will then represent the evolution of the system; time is the curves' parameter. To show the effect of the vaccine in stabilizing the system, we first look at the ground state, i.e. the behavior of the immune cells when no cancer cells are present. Then we analyze three cases previously studied in Motta et al.⁴.
Figure 1. Ground states for the normal immune system.
The ground state behavior is shown in Figure 1, where we plotted the number of B and cytotoxic T cells along the normal host's life. Figure 1 shows that, due to the stochastic nature of cell birth and death, the ground state is a region and not a single point. We then analyze the three cases quoted above. First we analyze the untreated case. In this case there is no action of the immune system, and the number of cancer cells grows with no control until solid tumor formation (Figure 2a). Figure 2b shows that both the humoral and the cellular immune responses are absent.
Figure 2. 3-D states' space for the untreated case.
Then we consider the Early schedule, which consists of three vaccination cycles starting at week 6. One vaccination cycle consisted of four intraperitoneal administrations of non-replicating (mitomycin-treated) vaccine cells over two weeks, followed by two weeks of rest¹. The effect of the schedule is to reduce the initial growth of tumor cells (Figure 3a). This effect is shown in Figure 3b as a large loop in which the cancer cell growth is reduced by the immune response, represented by the increase of cytotoxic T cells and antibodies. After this initial phase (~160 days), the cancer cells grow with no constraints and the straight line is similar to the plot of the untreated case.
Figure 3.
3-D states' space for Early treatment
Finally we consider the Chronic schedule. This schedule consists of repeated Early cycle administrations for the entire life of the host. Looking at Figure 4a one can see that, after an initial burst, the cancer cells are kept under a safe threshold and solid tumor formation is inhibited by the Triplex vaccine. Figure 4c shows (for the initial phase, ~160 days) the same behavior as the Early schedule. After this initial phase, the system (immune system - cancer) is stabilized (Figure 4b) and the equilibrium region is better shown in Figure 4d.
3. Conclusions

We presented an analysis of the effect of a cancer immunoprevention vaccine modeled by computer simulations. This analysis shows that the effect of the vaccine is to stabilize the immune system - cancer competition around values which are safe for the host. This result suggests that it is possible to model this effect using an ODE-based technique including "external inputs". Work in this direction is in progress and results will be published in due course.
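Purely as an illustration of the direction indicated here, and not of the authors' model, a predator-prey-like system with a periodic external input standing for vaccine administrations can be integrated with SciPy; all rates and the input shape below are invented placeholders:

```python
import numpy as np
from scipy.integrate import solve_ivp

def vaccine_input(t, period=28.0, dose=5.0, width=2.0):
    """Periodic external stimulation standing for vaccination cycles (placeholder shape)."""
    return dose if (t % period) < width else 0.0

def competition(t, y, a=0.1, b=0.02, c=0.05, d=0.03):
    cancer, effector = y
    u = vaccine_input(t)
    d_cancer = a * cancer - b * cancer * effector            # prey: unconstrained growth minus kill term
    d_effector = -d * effector + c * cancer * effector + u   # predator: decay, stimulation, external input
    return [d_cancer, d_effector]

sol = solve_ivp(competition, (0.0, 400.0), [1.0, 1.0], max_step=0.5)
print(sol.y[:, -1])
```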
Figure 4. 3-D states' space for Chronic treatment (axes: CC×10⁴, TC×10², Ab×10⁰).
References
1. C. De Giovanni, G. Nicoletti, L. Landuzzi, A. Astolfi, S. Croci, A. Comes, S. Ferrini, R. Meazza, M. Iezzi, E. Di Carlo, P. Musiani, F. Cavallo, P. Nanni, P.-L. Lollini, Immunoprevention of HER-2/neu transgenic mammary carcinoma through an interleukin 12-engineered allogeneic cell vaccine, Cancer Res. 64(11), 4001 (2004).
2. P.-L. Lollini, G. Nicoletti, L. Landuzzi, C. De Giovanni, P. Nanni, New target antigens for cancer immunoprevention, Curr. Cancer Drug Targets, 5(3), 221 (2005).
3. P.-L. Lollini, S. Motta, F. Pappalardo, Modeling models in tumor immunology, Mathematical Models and Methods in Applied Sciences, to appear, 2006.
4. S. Motta, P.-L. Lollini, F. Castiglione, F. Pappalardo, Modelling Vaccination Schedules for a Cancer Immunoprevention Vaccine, Immunome Research, 1(5), doi:10.1186/1745-7580-1-5 (2005).
5. F. Pappalardo, P.-L. Lollini, F. Castiglione, S. Motta, Modelling and Simulation of Cancer Immunoprevention Vaccine, Bioinformatics, 21(12), 2891 (2005).
6. F. Pappalardo, E. Mastriani, P.-L. Lollini, S. Motta, Genetic Algorithm against Cancer, Proceedings of the Second International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2005), Lecture Notes in Computer Science, 3849, 223 (2006).
LEARNING CLASSIFIERS FOR HIGH-DIMENSIONAL MICRO-ARRAY DATA

ANDREA BOSIN
Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy

NICOLETTA DESSI
Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy

BARBARA PES
Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy
In this paper, we address the challenging task of learning accurate classifiers from microarray datasets involving a large number of features but only a small number of samples. We present a greedy step-by-step procedure (SSFS) that can be used to reduce the dimensionality of the feature space. We apply the Minimum Description Length principle to the training data for weighting each feature and then select an "optimal" feature subset by a greedy approach tuned to a specific classifier. The Acute Lymphoblastic Leukemia dataset is used to evaluate the effectiveness of the SSFS procedure in conjunction with different state-of-the-art classification algorithms.
1. Introduction

The advent of DNA micro-array technology [1] has made it possible to record broad patterns of gene expression simultaneously in a single experiment, and several data sets have become publicly available on the Internet. These data sets present multiple challenges, including a large number of gene expression values per experiment (several thousands of genes, usually referred to as features) and a relatively small number of samples (a few dozen patients). Micro-array data can be analyzed from many different viewpoints. Recent works [2][3][4] apply supervised learning for cancer classification: in this case available training samples, coming from patients whose pathological condition (e.g. cancer or normal) is known, are used for building classifiers suitable for medical diagnosis. In addition, the identification of discriminatory genes is of
fundamental and practical interest since medical diagnostic tests may benefit from the examination of a small subset of relevant genes. This paper goes further in this direction and proposes a greedy step-by-step feature selection heuristic (SSFS). Specifically, the Minimum Description Length criterion [5] is applied for weighting each feature according to its correlation with the target class. The resulting weights serve as input to an iterative procedure that evaluates different feature subsets by measuring the predictive accuracy of a classifier built on them. The smallest subset leading to the best accuracy is selected as "optimal". An experimental study is carried out to evaluate the proposed heuristic in conjunction with different classification algorithms. Specifically, we investigate the effectiveness of Naive Bayes [6], Adaptive Bayesian Network [7], Support Vector Machines [8] and k-Nearest Neighbor [9] in predicting many different classes or targets (i.e. diagnoses). The paper also shows how the knowledge of a domain expert makes it possible to replace one multi-target classifier with a set of binary classifiers, one for each target, with a substantial improvement in accuracy and performance. The Acute Lymphoblastic Leukemia (ALL) dataset [10] has been used as a testbed for the experiments presented here. The paper is organized as follows. Section 2 illustrates our learning strategy. The experiments and the related results are described in Section 3. Finally, Section 4 presents a brief discussion and concluding remarks.

2. Learning strategy

A known problem in classification, and in machine learning in general, is to reduce the dimensionality of the feature space to overcome the risk of "overfitting" that arises when the number of training patterns (i.e. instances) is small and the number of features is comparatively large. In such a situation, we can easily learn a classifier that correctly describes the training data but performs poorly on an independent set of test data. Given a particular classifier, the selection of the best subset of features by exhaustive search and evaluation of all the possible subsets is impractical for a large dimensional input space, but it can be used in combination with another method that first reduces the number of features to a manageable size. In this paper, we evaluate a wrapper technique [11][12] based on the MDL principle [5] that removes many of the original input features and retains a minimal subset of features that yields the best classification performance.
595 The MDL principle states that the best theory to infer from training data is the one that minimizes the length (i.e. the complexity) of the theory itself and the length of the data encoded with respect to it. MDL provides a criterion to judge the quality of a classification model and, as a particular case, it can be employed to address the problem of feature selection, by considering each feature as a simple predictive model of the target class [13][7][4], In order to select an "optimal" set of features, we use a greedy approach. First we rank and weight each feature according to its description length, that reflects the strength of its correlation with the target class, but any other ranking method can be used as well. In particular, we assume that the most informative features are those with the largest weights. Even if it is good in ranking features, MDL alone cannot be used as a feature selection criterion. To explicitly build a good feature subset and to evaluate its effectiveness, we adopt the iterative procedure, named Step-by-Step Feature Selection (SSFS), whose steps are detailed in Figure 1. 1
1. Rank all features according to their description length (MDL).
2. Select the set of the N top-ranked features (e.g. start with N = 20).
3. Build a classifier from a training set D, filtered according to the selected features.
4. Test classifier accuracy on a test set T, filtered according to the selected features.
5. Extend the set by adding the next k top-ranked features (e.g. k = 10) and put N = N + k.
6. Repeat steps 2 to 5 and stop if the accuracy has not increased over the last J iterations (or when all the original attributes are appended to the subset).
Figure 1. Steps of the greedy SSFS procedure.
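As an illustration of this procedure, the following is a minimal Python sketch of the greedy loop in Figure 1, assuming the MDL-based ranking has already been computed and is passed in as `ranked_idx`; the scikit-learn Naive Bayes classifier, the starting size `n_start`, the step `k` and the `patience` parameter (playing the role of J) are placeholders for illustration, not the implementation used in the paper.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def ssfs(X_train, y_train, X_test, y_test, ranked_idx,
         n_start=20, k=10, patience=3):
    """Greedy step-by-step feature selection over a pre-computed ranking."""
    best_acc, best_subset, stalled = -1.0, None, 0
    n = n_start
    while n <= len(ranked_idx):
        subset = ranked_idx[:n]                       # top-N ranked features
        clf = GaussianNB().fit(X_train[:, subset], y_train)
        acc = accuracy_score(y_test, clf.predict(X_test[:, subset]))
        if acc > best_acc:                            # keep the smallest best subset
            best_acc, best_subset, stalled = acc, subset, 0
        else:
            stalled += 1
        if stalled >= patience:                       # no gain over the last J rounds
            break
        n += k                                        # add the next k top-ranked features
    return best_subset, best_acc
```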
We have applied different classification methods in conjunction with the same feature selection heuristic (by performing a set of independent experiments), in order to obtain evidence of the intrinsic effectiveness of the heuristic itself. In particular, we compare a Bayesian classification approach [6], Support Vector Machines (SVM) [8] and k-Nearest Neighbor (k-NN) [9]. These techniques may be useful to identify expression patterns from microarray data, as witnessed by recent literature [3][4][14]. Specifically, in the context of Bayesian classifiers, we compare the performances of the Naive Bayes (NB) [6] and the Adaptive Bayesian Network (ABN) [7].
3. Experiments
Our experimental study is carried out on the Acute Lymphoblastic Leukemia (ALL) dataset [10]. ALL is a heterogeneous disease consisting of various leukemia sub-types that remarkably differ in their response to chemotherapy [15]. The ALL dataset contains all known ALL sub-types (including T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdip > 50) and consists of 327 samples, each one described by the expression level of 12558 genes. This dataset includes 215 training samples and 112 testing samples. In a first phase, we evaluate the SSFS procedure with NB, ABN, SVM and k-NN classifiers, by addressing the multi-class problem in one shot. That is, we look for multi-target models that are capable of distinguishing between all six ALL sub-types. For each classifier, we measure the errors on the test dataset, i.e. the number of patterns that are misclassified (corresponding to a diagnostic error). The results are shown in Figure 2 for an increasing number of features selected by MDL.
Figure 2. Misclassifications of multi-target classifiers for an increasing number of features selected by MDL.
As we can see, the number of features needed to achieve a stabilization in accuracy is very high, and the resulting classifiers have a relatively high level of misclassifications (5-10%). This confirms the difficulty of multi-target classification, in agreement with recent literature [16]. To circumvent this problem, a domain-specific heuristic can be useful to decompose a multi-target classification problem into a structured set of binary classification problems, one for each target.
Specifically, we adopt a divide-and-conquer methodology based on clinical knowledge, experience and observation [15]. Indeed, when approaching the ALL diagnosis, doctors first look for evidence of the most unambiguous sub-type (i.e. T-ALL) against all the others (referred to as OTHERS1). If there is no T-ALL evidence, they look for the next most unambiguous sub-type (i.e. E2A-PBX1) against the remaining (OTHERS2). Then the process steps through TEL-AML1 vs. OTHERS3, BCR-ABL vs. OTHERS4, MLL vs. OTHERS5 and finally Hyperdip > 50 vs. OTHERS (which groups all samples not belonging to any of the previous sub-types). This approach can be reproduced in a machine learning process that builds and evaluates six binary classifiers, each of which is responsible for only one ALL sub-type, i.e. is capable of distinguishing a single sub-type from all the others. When constructing such a binary classifier, all the samples belonging to a sub-type different from the one considered have to be reassigned, in both training and test datasets, to a generic "OTHER" sub-type. In such a way, we only have two targets: the sub-type considered and "OTHER". On this basis, two distinct experiments have been performed. In the first one (referred to as E1), we learn each binary classifier in turn, according to the order specified above, leaving out from the training and test datasets the patterns belonging to the sub-types already considered. Table 1 summarizes the cardinality of the training and test sets for each binary classifier; a sketch of this cascade is given after the table.

Table 1. Experiment E1: number of training and test patterns for each binary classifier.

ALL sub-type                  Training set    Test set
T-ALL vs. OTHERS1             28 vs. 187      15 vs. 97
E2A-PBX1 vs. OTHERS2          18 vs. 169      9 vs. 88
TEL-AML1 vs. OTHERS3          52 vs. 117      27 vs. 61
BCR-ABL vs. OTHERS4           9 vs. 108       6 vs. 55
MLL vs. OTHERS5               14 vs. 94       6 vs. 49
Hyperdip > 50 vs. OTHERS      42 vs. 52       22 vs. 27
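The following short Python sketch illustrates the E1-style cascade described above: the six binary classifiers are learned in the stated order, each one against a generic "OTHER" class, and the patterns of sub-types already handled are removed before the next classifier is built. The `train_binary` callback and the label strings are assumptions used only for illustration, not the authors' code.

```python
SUBTYPE_ORDER = ["T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdip>50"]

def build_binary_cascade(samples, labels, train_binary):
    """samples: list of feature vectors; labels: ALL sub-type of each sample."""
    classifiers = {}
    remaining = list(zip(samples, labels))
    for subtype in SUBTYPE_ORDER:
        # Relabel: the current sub-type vs. a generic "OTHER" class
        X = [x for x, _ in remaining]
        y = [subtype if l == subtype else "OTHER" for _, l in remaining]
        classifiers[subtype] = train_binary(X, y)
        # E1 setting: drop the patterns of the sub-type just handled
        remaining = [(x, l) for x, l in remaining if l != subtype]
    return classifiers
```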
We learn each binary classifier by applying the SSFS procedure to the corresponding training and test datasets. This results in six different "optimal" subsets of 20 features. Each subset contains the attributes, i.e. the genes, that characterize a specific ALL sub-type. Figure 3 shows the errors of the NB, ABN, SVM and k-NN binary classifiers, in conjunction with MDL ranking.
Figure 3. Misclassifications of binary classifiers in experiment E1.
On the contrary, in the second experiment (referred to as E2) we learn each binary classifier without leaving out the patterns belonging to the sub-types already considered, i.e. we retain the complete training and test sets (consisting of 215 and 112 patterns) for learning and evaluating each classifier. In this way, we can verify whether the use of smaller and smaller training datasets in experiment E1 (Table 1) influences the accuracy of the classifiers. As in experiment E1, the SSFS procedure results in six "optimal" subsets of 20 features. We observe that the six feature subsets selected in experiment E2 differ from those selected in experiment E1, even if we adopted the same ranking criterion (MDL). For each sub-type, Table 2 shows the number of different features in the corresponding subsets (out of 20).

Table 2. Number of features that are not common to both experiments E1 and E2.

T-ALL   E2A-PBX1   TEL-AML1   BCR-ABL   MLL   Hyperdip > 50
0       3          4          11        9     13
The ABN classifiers trained using these two different feature sets (E1 and E2) result in the errors shown in Table 3. The performance is slightly better in E2 and the majority of errors occurs for different ALL sub-types.

Table 3. Misclassifications of ABN binary classifiers in experiments E1 and E2.

      T-ALL   E2A-PBX1   TEL-AML1   BCR-ABL   MLL   Hyperdip > 50   overall
E1
E2
4. Discussion and concluding remarks
In the multi-target problem (Figure 2), the behavior of all classifiers is similar: the accuracy has some initial oscillation and a large number of features (between 300 and 700) is necessary to reach convergence. On the contrary, binary classifiers achieve maximum accuracy with very few features: only 20 features are enough in all cases, meaning that we can identify every ALL sub-type by the corresponding set of 20 genes. Moreover, the number of misclassified samples is lower for binary models (Figure 3). Interestingly enough, the pre-processing of the training data needed to learn the binary models can influence the results. The comparison of different ABN binary classifiers gives some insight on this point. In both experiments E1 and E2 there are no errors in classifying T-ALL, E2A-PBX1 and TEL-AML1 (Table 3), and the subsets of features are almost the same (Table 2). We can reasonably expect that also from a medical point of view the subsets of features selected by SSFS correspond to genes characterizing these ALL sub-types. Accuracy is also good for MLL, in both the E1 and E2 settings, while it is variable for BCR-ABL and Hyperdip > 50 (Table 3). In these last cases, the subsets of features show significant differences (Table 2). Leaving out or not leaving out part of the samples from the training datasets has important consequences on feature selection: this can be due both to the small number of training samples (e.g. for BCR-ABL) and to a not very sharp genetic characterization of these sub-types. In this case feature subsets correspond to genes that are not necessarily relevant from a medical point of view and further investigation is needed. When compared to recent studies on the same dataset [14], our results show some improvement in terms of predictive accuracy. This confirms that MDL can be useful in selecting relevant features, as we have suggested in a previous study on a different dataset [4]. As a last point, Table 4 reports the computational effort, measured in minutes of CPU time on a 1.8 GHz Intel Pentium 4 processor, for the basic learning tasks, i.e. feature ranking, model training and model test.

Table 4. CPU times (Intel Pentium 4, 1.8 GHz).

                    MDL ranking   Model training   Model test
20 attributes       -             < 1 min.         < 1 min.
400 attributes      -             < 10 min.        2 min.
12558 attributes    100 min.      -                -
References 1. G. Hardimann., Microarray methods and applications: Nuts & bolts. DNA Press (2003). 2. T.R. Golub et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286 (1999). 3. I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46 (1-3): 389 - 422 (2002). 4. A. Bosin, N. Dessi, D. Liberati and B. Pes, Learning Bayesian Classifiers from Gene-Expression MicroArray Data, Proceedings of WILF 2005, LNAI, vol. 3849, Springer-Verlag (2005). 5. A. Barron, J. Rissanen and B. Yu, The minimum description length principle in coding and modelling, IEEE Transactions on Information Theory, 44: 2743-2760 (1998). 6. N. Friedman, D. Geiger and M. Goldszmidt, Bayesian Network Classifiers, Machine Learning, 29: 131-161 (1997). 7. J.S. Yarmus, ABN: A Fast, Greedy Bayesian Network Classifier (2003). http://otn.oracle.com/products/bi/pdf/adaptive bayes net.pdf. 8. V. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA (1998). 9. T.M. Cover and P.E. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, 13:21-27 (1967). 10. http://www.stjuderesearch.org/data/ALL 1 /. 11. A. Blum and P. Langley, Selection of relevant features and examples in machine learning, Artificial Intelligence, 97:245-271 (1997). 12. R. Kohavi and G. John, Wrappers for feature subset selection, Artificial Intelligence, 97:273-324 (1997). 13. I. Kononenko, On biases in estimating multi-valued attributes, IJCAI95, 1034-1040(1995). 14. H. Liu, J. Li and L. Wong, A Comparative Study on Feature Selection and Classification Methods Using Gene Expression Profiles and Proteomic Patterns, Genome informatics 13: 51-60 (2002). 15. E. J. Yeoh et al., Classification, sub-type discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer Cell, 1:133-143 (2002). 16. S. Mukherjee, Classifying Microarray Data Using Support Vector Machines, Understanding And Using Microarray Analysis Techniques: A Practical Guide. Kluwer Academic Publishers, Boston, MA (2003).
PREDICTION OF RESIDUE EXPOSURE AND CONTACT NUMBER FOR SIMPLIFIED HP LATTICE MODEL PROTEINS USING LEARNING CLASSIFIER SYSTEMS
MICHAEL STOUT, JAUME BACARDIT, JONATHAN D. HIRST, JACEK BLAZEWICZ AND NATALIO KRASNOGOR* (* corresponding author)
Automated Scheduling, Optimisation and Planning Research Group, School of Computer Science and IT, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK. Email: {jqb,mqs,nxk}@cs.nott.ac.uk
School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK. Email: [email protected]
Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznan, Poland. Email: [email protected]
The performance of a Learning Classifier System (LCS) applied to the classification of simplified hydrophobic/polar (HP) lattice model proteins was compared to other machine learning (ML) algorithms. The GAssist LCS classified functional HP model proteins on the 3D diamond lattice as folding or non-folding at 88.3% accuracy, significantly outperforming three out of the four other methods. GAssist correctly classified HP model protein instances on the basis of Contact Number (CN) and Residue Exposure (RE) on both 2D square and 3D cubic lattices at a level of between 27.8% and 80.9%. Again, the LCS performed at a level comparable to the other ML technologies in this task, significantly outperforming them in 24 out of 180 cases and being outperformed just six times. The benefits of using LCS for this problem domain are discussed and examples of the LCS-generated rules are described.
1. Introduction
Prediction of structural properties of proteins such as residue exposure (RE) and coordination number (CN) based solely on protein sequence has recently received renewed attention. In other studies, simplified protein
models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. This paper compares CN and RE prediction for simplified HP model proteins using machine learning technologies, in particular Learning Classifier Systems (LCS). LCS apply Evolutionary Computation to Machine Learning problems. Four questions were examined: 1) Is it possible to predict, from sequence alone, which proteins will and will not fold? 2) Is it possible to predict which residues have above or below average CN and RE? 3) Is it possible to predict the detailed CN and RE states? and 4) Are LCS suitable tools for these tasks?
2. Background
2.1. Protein Structure Prediction
The prediction of the 3D structures of proteins is a fundamental and difficult problem in computational biology. Popular approachs include predicting specific attributes of proteins, such as secondary structure, solvent accessibility or coordination number. The contact/coordination number (CN) problem is defined as the prediction, for a given residue, of the number of residues from the same protein that are in contact with it. Two residues are said to be in contact when the distance between the two is below a certain threshold. This problem is closely related to contact map (CM) prediction. While protein structure prediction remains unsolved, researchers have resorted to simplified protein models to try to gain understanding of both the process of folding and the algorithms needed to predict i t 1 . Approaches have included fuzzy sets, cellular automata, L-systems and memetic algorithms (for references see 2 ) . One common simplification is to focus only on the residues (C-alpha or C-beta atoms) rather than all the atoms in the protein. A further simplification is to reduce the number of residue types to less than twenty by using residue sequence representations based, for instance, on physical properties such as hydrophobicity, as in the so called hydrophobic/polar (HP) models. Another simplification is to reduce the number of spatial degrees of freedom by restricting the atom or residue locations to those of a lattice 1 ' 3 ' 4 . Lattices of various geometries have been explored, e.g., two-dimensional triangular and square geometries or threedimensional diamond and face centered cubic. Idealized models have been used, among other things, to study the nature of the energy landscape, the uniqueness of the native state or associated degenerate sequences, the origin of the two-state thermodynamic behavior of globular proteins (i.e.
first folding into secondary structures and later into a three-dimensional shape), the existence of cooperative folding (i.e. an energy gap between the native conformation and the closest non-native one) and structure-function relations (for further references see [5,2]).
2.2. HP Models
In the HP model (and its variants) the 20 amino acids are reduced to two classes: non-polar or hydrophobic (H) amino acids and polar (P) or hydrophilic amino acids. An n amino acid protein is represented by a sequence s ∈ {H, P}+ with |s| = n. The sequence s is to be mapped to a lattice, where each residue in s occupies a different lattice cell and the mapping is required to be self-avoiding. The energy potential in the HP model reflects the fact that hydrophobic amino acids have a propensity to form a hydrophobic core. In the standard HP model, contacts that are HP and PP are assigned an energy of 0 and an HH contact is assigned an energy of -1, whilst in the functional model protein (FMP), HP and PP contacts receive a value of 1 and HH a value of -1. For an FMP sequence to be viable it must fold into a unique native state (unlike Dill's model [6], where the same sequence could have a variety of minimum energy states), and the native structure is required to have a binding pocket, i.e. at least one hole in the conformation [5]. Moreover, there must exist an energy gap between the minimum energy conformation and the next excited state. In this paper, rather than applying optimisation methods [7,8] to minimise the energy of the structures, we concentrate on classification of models. We employ a class of machine learning techniques called Learning Classifier Systems, and in particular we use the GAssist system [9], which is based on a binary representation of rules [10] (see section 3 for more details).
3. Methodology
Three datasets were employed (Table 1): a 3D HP diamond lattice dataset used for the Fold/Non-fold experiments (3DFNF), a 3D HP cubic lattice dataset used for the CN and RE experiments (3DCNRE) and a 2D square lattice dataset used for the CN and RE experiments (2DCNRE). Datasets are available on-line at http://www.cs.nott.ac.uk/~nxk/hppdb.html. The experimental design was as follows: 1) For all residues, calculate CN and RE. CN is typically defined as the number of non-contiguous residues within a given radius (r = 1.0 lattice unit) of each residue. RE was defined as the distance of each residue from the center of mass of the protein. 2) Create instance sets by moving a window of fixed length over the sequence-attribute
Table 1. Details of the data sets used in these experiments.

                          3DFNF             3DCNRE            2DCNRE
Lattice Dimensions        3D                3D                2D
Lattice Type              Diamond           Cubic             Square
Coordination Number       4                 6                 4
Model Type                FMP               HP                FMP
Number of Sequences       4196352           15                4428
Number of Structures      893               15                4428
Maximum Sequence Length   23                48                20
Minimum Sequence Length   23                27                20
Total Residues            96516096          640               92988
Total Hydrophobic         48258049          316               42638
Total Polar               48258047          309               45922
Source                    Taken from [11]   Taken from [12]   Taken from [13]
vectors, assigning a class to each instance: the value of that attribute for central residue in the window. 3) Split the instance sets into Training and Test sets. 4) Apply machine learning tools to predict the classes in Test Sets. 5) Extract classification accuracies for each algorithm. 6) For the non-deterministic algorithms (GAssist) iterate 10 times with different random number seeds. 7) Calculate the mean prediction accuracy. 8) Perform student t-tests on the mean prediction accuracies to determine which algorithms significantly outperformed the others (using a confidence interval of 95 and Bonferroni correction 14 for multiple pair-wise comparisons). Windows were generated for one, two and three residues at each side of a central residue. For each attribute and for each window size, three class assignment levels (Two State, Three State and Five State) were explored. For two state assignment residues were assigned the class 1 (high) or 2 (low) according to whether their attribute value was below or above the average for that attribute value in that particular the protein. For three states the class assignments were 1 (low), 2 (intermediate) or 3 (high) for the lower, middle or upper third of the range respectively. In five state assignments the classes were 1, 2, 3, 4 or 5 for the first, second through fifth portion of the range respectively. Composed of a rule learning algorithm and a rule inference engine, LCSs have the ability to balance multiple, potentially conflicting, constraints (e.g. formation of local structures vs global structures) and can produce high quality predictions. Moreover, LCS can produce human understandable explanations of the rules they have used to make their classifications, unlike, for example, neural network based systems. GAssist 9 is a Pittsburgh learning classifier system descended from GABIL 10 . The system applies a near-standard Genetic Algorithm (GA) that evolves individuals that represent complete problem solutions. Each individual consists of a variable length rule set. We used the rule-based knowledge representation of the
GABIL [10] system (see section 5 for an example of a generated rule set). The experimental parameters used for the GAssist experiments were the default values [9], except that for the larger dataset (2DCNRE) 25 strata were used rather than the two strata used by default. One thousand iterations of the LCS were used. GAssist was compared against Naive Bayes, C4.5, IBk (k=3) and JRip, all of them taken from the WEKA machine learning package.
4. Experimental Results
4.1. Results of Fold/Non-Fold Classification Experiments
Table 2 summarises the results of the Fold/Non-fold classification experiments on the 3D diamond lattice structure dataset. For each algorithm the overall average and deviation of test accuracy is shown. GAssist was the best method on this dataset, significantly outperforming three of the four other tested methods.

Table 2. Averaged classification accuracies (%) for 3D HP Fold/Non-Fold experiments. A • means that GAssist significantly outperformed the algorithm to the left.

Algorithm      Total
Naive Bayes    74.8±3.1 •
GAssist        88.3±1.7
IBk            81.8±2.7 •
JRip           86.9±3.1 •
C4.5           87.9±2.5
4.2. Results of CN and RE Classification Experiments
Table 3 summarises the results of the classification experiments for CN and RE on the 3DCNRE and the 2DCNRE datasets. For each algorithm the overall average and deviation of test accuracy is shown. GAssist performed at a similar or better level than the other tested machine learning methods. It significantly outperformed other methods 24 times and it was outperformed in just six of the tested datasets.
5. Discussion
The performance of the GAssist LCS was equal to or better than the other tested methods, especially on the fold/non-fold dataset. It was outperformed significantly very few times. From a general point of view we can say that CN is easier to classify than RE, and that the 2D lattice data are also more difficult to classify than the 3D data. On the 3D lattice, CN can be classified around 80%, 67% and 52% for two, three and five states, and
606 Table 3. Averaged Classification Accuracies (%) for 2D and 3D HP CN and RE Experiments. A • means that GAssist significantly outperformed the Algorithm to the left, a o means that the Algorithm on the left outperformed GAssist Exper. States Alg.\Win. Size Naive Bayes GAssist 2
IBk
JRip
C4.5 Naive Bayes GAssist CN
3
IBk JRip C4.5 Naive Bayes GAssist
5
IBk
JRip C4.5 Naive Bayes GAssist 2
RE
3
IBk
JRip C4.5 Naive Bayes GAssist
IBk JRip C4.5 Naive Bayes GAssist
5
3 79.7±5.8 79.9±6.0 80.1±6.0 80.1±6.0 80.2±fi.O 67.1±5.6
67.1±6.0» 66.1±6.3 60.7±5.2 67.5±5.6 51.6±4.4 51.4±4.5 51.3±4.6 45.5±3.7» 51.7±4.5 77.8±5.5 77.9±5.5 78.2±5.3 78.1±5.3 77.8±5.4 63.0±5.7 62.0±5.5 61.1±4.9 59.7±3.0 61.6±5.2 37.3±6.6
37.6±5.9
IBk
37.0±5.7
JRip
34.5:1:2.9
C4.5
38.2±6.8
3D Data 5 79.9±5.2 80.2±5.4 79.0±5.4 80.1±5.8 79.9±5.7 67.2±4.6
7 80.2±4.5 79.6±4.7 78.0±5.1 79.9±5.0 79.8±4.6 67.3±4.9 67.7±4.6 67.3±5.0 66.7±5.3 64.9±5.7 64.8±5.2 64.5±4.9 67.7±4.7 65.8±5.1 S2.2±4.4 51.8±5.8 51.3±4.4 52.9±5.3 49.6±4.6 48.8±5.8 46.9±4.3» 49.0±6.0 50.7±4.2 52.3±5.1 78.6±4.4 79.7±4.4 78.1±4.8 78.2±4.2 76.7±S.l 76.2±4.3 77.8±4.8 78.3±4.6 T T i i i l J 77.9±4.1 63.3±5.2 62.5±5.5 61.7±5.5 62.1±4.7 61.0±5.0 61.8±5.2 5 9 . 0 ± 3 . 3 . 61.4±3.9 61.7±5.3 64.1±4.1 38.6±6.1 37.6±6.1 36.2±5.9 39.2±g.3 36.7±5.9 38.5±6.1 33.6±3.8« 3<S.2±S.S 36.8±6.3 38.9±4.9
2D D a t a 5 63.9±0.4 64.1±0.4 64.1±0.4 63.8±0.4
7 62.6±0.4« 64.9±0.3 65.1±0.4 64.7±0.4
61.2±0.3 64M0.4
es.i±o.4
3 61.2±0.3 61.2±0.3 61.2±0.3 61.2±0.3 70.9±0.2
70.9±0.2 6 8 . 5 ± 0 . 2 .
70.S±0.4 7i.0±o.4 7l.0±0.4 70.9±0.2 70.9±0.2 70.9±0.2 58.1±0.2 S8.1±0.2 58.1±0.2 58.1±0.2 58.1±0.2 56.9±0.5 56.9±0.4
^.§±0.4'
56.9±0.4 56.9±0.4 43.3±0.3 43.3±0.3 43.3±0.3 43.3±0.3 43.3±0.3 27.8±0.2 27.8±0.3 27.8±0.3 2EJ.3±0.0. 27.8±0.3
71.1±0.3 70.5±U.3« 7l.l±0.3 56.8±0.2« 58.7±0.3 58.7±0.3 57.6±0.3» 58.6±0.3 60.0±0.4» 60.4±0.5 6O.B±0.5 60.2±0.5 60.5±U.4 45.4±0.3» 46.5±0.3 46.5±0.3 45.6±0.3» 46.5±0.3 27.8±0.3»
3o.8±0.5 31.1±0.5 28.4±0.3» 31.2±0.5o
71.0±0.2 70.5±0.3» 71.0±0.2 56.4±0.3» 58.8±0.3 58.9±0.3 57.6±0.3. 58.8±0.2 58.7±0.5» 61.4±0.5
61.9±0.6o
61.1±0.5 61.7±0.6 44.2±0.3» 47.2±0.6 47.8±0.So 46.5±0.4. 47.8±0.4o 28.1±0.4» 32.0±0.6 33.1±0.4o 28.0±0.3« 33.0±0.4o
RE can be classified around 78%, 62% and 38%. For the 2D lattice data, CN can be classified around 65%, 71% and 59%, and RE can be classified around 62%, 47% and 33% for two, three and five states. The fold/non fold domain can be classified with an 88% accuracy. Beside its performance, GAssist has another advantage, which is the generation of compact and interpretable solutions. GAssist generated on average rule sets consisting of 52.8, 9.6 and 3.5 rules for the 3DFNF, 2DCNRE and 3DCNRE datasets, respectively. As an example, we show a rule set from an individual generating 87.3% accuracy for two state prediction with a window size of seven (three residues either side of the residue being predicted) for the CN domain using 3D lattice. An X symbol is used to represent positions at the end of the chains, that is beyond the central residue being studied, H means high CN, L means low CN. The rule set only had three rules, and at most three of the seven input attributes were expressed. The rules are interpreted in order, therefore all examples not matched by the first or second rules are assigned class L.
(1) If Position(i-1) ∈ {p}, Position(i) ∈ {h}, Position(i+1) ∉ {h} then class is H
(2) If Position(i-2) ∈ {X}, Position(i) ∈ {h} then class is H
(3) Default class is L
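To make the interpretation concrete, the following small Python sketch (not the authors' GAssist code) applies an ordered rule list like the one above: rules are tried in sequence, the first match assigns the class, and unmatched windows fall through to the default class. The exclusion test in the third condition of rule 1 is read here from a partially garbled source and should be taken as an assumption.

```python
def classify_window(window):
    """window: dict mapping relative positions (-3..3) to 'h', 'p' or 'X' (chain end)."""
    # Rule 1: polar before, hydrophobic centre, non-hydrophobic after -> high CN
    if window[-1] == 'p' and window[0] == 'h' and window[1] != 'h':
        return 'H'
    # Rule 2: chain end two positions before and hydrophobic centre -> high CN
    if window[-2] == 'X' and window[0] == 'h':
        return 'H'
    # Rule 3: default class -> low CN
    return 'L'

# Example: a hydrophobic residue near the start of a chain
print(classify_window({-3: 'X', -2: 'X', -1: 'p', 0: 'h', 1: 'p', 2: 'h', 3: 'p'}))
```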
Moving from highly abstract (2 class) to more informative predictions (5 class) more input data (larger windows) are required in order to facilitate learning. The 3D structures on the cubic lattice have less than 50 residues, as a result the training data has an unnaturally high proportion of exposed/low-CN residues (including hydrophobic residues which are more usually found buried). Analysis (not shown) of the distribution of residues by class showed that for the 2D square lattice structures this bias in the input data distributions is less pronounced. We have extended these studies to real proteins (papers submitted) and HP representations of real proteins 2 . In the future we will investigate computation and prediction of other structural properties such secondary structures and disulfide bridges. 6. Conclusions These studies have shown that: a) it was possible to discriminate at around 80% accuracy, from sequence alone, which proteins will and will not fold b) It was also possible to predict which residues have above or below average CN and RE c) it is possible to predict the detailed CN and RE states of residues and d) The GAssist LCS performs at a level comparable to other ML algorithms on these problems. Of the WEKA algorithms studied, those based on orthogonal representations perform slightly better than those which are not. Minimalist lattice structure models focus on the essential details of protein structure prediction. Moving from highly abstract predictions (above/below mean for a given attribute) to more detailed structural predictions (eg. five state CN), accuracy can be increased by incorporating more local residue pattern information in the inputs (increased window size). However, in real proteins, only some contacts (secondary structure contacts) arise from local residue sequence patterns that may be recognizable in short fragments/windows. Other contacts arise from long-range global features of proteins and these may not be evident in short local sequence patterns. Future studies will extend these investigations with classifications based on other structural attributes and studies of real protein datasets. 7. Acknowledgments We acknowledge the support provided by the UK Engineering and Physical Sciences Research Council (EPSRC) under grants GR/T07534/01 and
608 GR/62052/01 and the Biotechnology and Biological Sciences Research Council (BBSRC) under grant BB/C511764/1. References 1. Yue, K., Fiebig, K.M., Thomas, P.D., Sun, C.H., Shakhnovich, E.I., Dill, K.A.: A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92 (1995) 325-329 2. Stout, M., Bacardit, J., Hirst, J.D., Krasnogor, N., Blazewicz, J.: From hp lattice models to real proteins: coordination number prediction using learning classifier systems. In: 4th European Workshop on Evolutionary Computation and Machine Learning in Bioinformatics 2006 (to appear). (2006) 3. Blazewicz, J., Dill, K., Lukasiak, P., Milostan, M.: A tabu search strategy for finding low energy structures of proteins in hp-model. (Computational Methods in Science and Technology) 4. Blazewicz, J., Lukasiak, P., Milostan, M.: Application of tabu search strategy for finding low energy structure of protein. Artificial Intelligence in Medicine 35 (2005) 135-145 5. Hirst, J.D.: The evolutionary landscape of functional model proteins. Protein Engineering 12 (1999) 721-726 6. Dill, K., Bromberg, S., Yue, K., Fiebig, K., Yee, D., Thomas, P., Chan, H.: Principles of protein folding: A perspective from simple exact models. Prot. Sci. 4 (1995) 561 7. Krasnogor, N., Blackburne, B., Burke, E., Hirst, J.: Multimeme algorithms for protein structure prediction. In: Proceedings of the Parallel Problem Solving from Nature VII. Lecture Notes in Computer Science. Volume 2439. (2002) 769-778 8. Hart, W., Istrail, S.: Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal. Journal of Computational Biology 3 (1996) 53-96 9. Bacardit, J.: Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Catalonia, Spain (2004) 10. DeJong, K., Spears, W., Gordon, D.: Using genetic algorithms for concept learning. Machine Learning 13 (1993) 161-188 11. Blackburne, B.P., Hirst, J.D.: Three dimensional functional model proteins: Structure, function and evolution. Journal of Chemical Physics 119 (2003) 3453-3460 12. Hart, W.: (www.cs.sandia.gov/tech_reports/compbio/tortilla-hpbenchmarks.html) Tortilla HP Benchmarks. 13. Blackburne, B.P., Hirst, J.D.: Evolution of functional model proteins. Journal of Chemical Physics 115 (2001) 1935-1942 14. Miller, R.G.: Simultaneous Statistical Inference. Springer Verlag, New York (1981) Heidelberger, Berlin.
A STUDY ON THE EFFECT OF USING PHYSICO-CHEMICAL FEATURES IN PROTEIN SECONDARY STRUCTURE PREDICTION
G. L. JAYAVARDHANA RAMA, M. PALANISWAMI
Dept of Electrical and Electronics Engineering, The University of Melbourne, Parkville, Victoria - 3010, Australia. [email protected] and [email protected]
DANIEL LAI
Dept of Electrical and Computer Systems Engineering, Monash University, Clayton, Victoria - 3168, Australia. daniel.lai@eng.monash.edu.au
MICHAEL W. PARKER
St. Vincent's Institute of Medical Research, 9 Princes Street, Fitzroy, Victoria - 3065, Australia. mparker@svi.edu.au
Protein structure prediction is a powerful tool in today's drug design industry as well as in the molecular modeling stage of x-ray crystallography research. This paper proposes a redefined encoding scheme based on the combination of Chou-Fasman parameters, physico-chemical parameters and the position specific scoring matrix for protein secondary structure prediction. A new method of calculating the reliability index, based on the number of votes and the SVM decision value, is also proposed and has been shown to assist the design of better filters. The proposed features are then tested on the RS126 and CB513 datasets and shown to give better cross-validation results compared to existing techniques.
1. Introduction
The dependence on experimental methods of protein structure prediction may not yield protein structures fast enough to keep up with the requirements of today's drug design industry. With the availability of abundant
proteomic data, it has been shown that it is possible to predict the structure through machine learning techniques. As the prediction of tertiary structure from protein sequence is a very difficult task, the problem is usually sub-divided into secondary structure prediction and super secondary structure prediction leading to tertiary structure. This paper concentrates on secondary structure prediction using position specific scoring matrix and physico-chemical properties as features. Secondary structure prediction is based on prediction of the 1-D structure from the sequence of aminoacid residues in the target protein 22 . Several methods 23 have been proposed to find the secondary structure including PHD 21 , PROF-King 17, PSIPred 12, JPred 4 and SAMT99-Sec 14. Recently, significant work has been done on secondary structure prediction using Support Vector Machines. Hua and Sun 24 used SVMs and profiles of the multiple alignments and reported Q3 score as 73.5% on the CB513 dataset 4 . In 2003 Ward et. al. n reported 77% with P SI-BLAST profiles on a small set of proteins. In the same year Kim and Park 15 reported an accuracy of 76.6% on the CB513 dataset using PSI-BLAST Position Specific Scoring Matrix (PSSM). Nguyen and Rajapakse 18 reported a highest accuracy of 72.8% on RS126 dataset 2 using a two stage SVM. Guo et. al 7 used dual layered SVM with profiles and reported a highest accuracy of 75.2% on CB513 dataset. In this paper we make use of the Chou-Fasman parameters 19, physicochemical properties including Kyte-Dolittle Hydrophobicity 8 , Grantham Polarity 20 and Rigidity of Proline and compare it with existing techniques which mainly use only position specific scoring matrix (PSSM) obtained from PSI-BLAST. We investigate the performance when the PSSMs are used with physico-chemical properties as features. We propose a new method to calculate the Reliability Index (RI) based on the number of votes each class receives in combination with the decision value of the SVM classifier. We suggest an improvement to the tertiary classifier proposed by Hu et. al. 6 by calculating the posterior probability 10 of SVM decision value.
2. Methods
Non-homologous CB513 [4] and RS126 [2] datasets were used for the experiments, as these are the most commonly used datasets in the literature. The secondary structure definitions used in our experiments were based on the DSSP [27] algorithm. The 8-to-3 state reduction method used was H to H, E to E and all others to C, where H stands for a Helix, E for a β Strand and
C for Coil. We used six parameters derived from physico-chemical properties and the probability of occurrence of amino acids in each state: the Chou-Fasman conformational parameters [19] (3 parameters), the Kyte-Doolittle hydrophobicity scale [8], the Grantham polarity [20] and the presence of Proline in the vicinity (1 parameter each) were used as the features in this set. Kyte-Doolittle hydrophobicity values and Grantham polarity values were taken from the ProtScale website (http://au.expasy.org/tools/protscale.html). The last parameter in the set is used to represent the information of rigidity Ri due to Proline residues. If a Proline residue is present at a particular position, Ri is given by 1, otherwise 0. We call this D2C-PC. In the rest of the paper, the term physico-chemical refers to the six features from D2C-PC. A second set containing the position specific scoring matrices (PSSMs) generated by PSI-BLAST [25] using the non-redundant (NR) database was used. pfilt [5] was used to filter the low complexity regions, coiled-coil regions and transmembrane helices before subjecting the sequences to PSI-BLAST. After getting the PSSM, a window of length w was considered around every residue and this is used as a feature for the classifier. The PSSM has 20 * L elements, where L is the length of the protein chain. We used the following function to scale the profile values from the range (-7,7) to the range (0,1) [15]:

g(x) = \begin{cases} 0.0 & x < -5 \\ 0.5 + 0.1x & -5 \le x \le 5 \\ 1.0 & x > 5 \end{cases} \qquad (1)
where x is the value of the PSSM matrix. All the values within the window of length w were considered [12]. The final feature length for each residue of this set is w*20. We call this D2C-PSSM. The third set comprises the combination of D2C-PC and D2C-PSSM as the feature vector and we call this D2C-PCPSSM.
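As a minimal sketch of how such D2C-PSSM features can be assembled, assuming the PSSM is available as an (L, 20) NumPy array, eq. (1) is applied element-wise and a window of w residues is flattened into one feature vector per position. The zero-padding at the chain termini is an assumption made here for illustration and is not specified above.

```python
import numpy as np

def scale_pssm(x):
    """Eq. (1): map raw PSI-BLAST PSSM scores to the range (0, 1)."""
    return np.clip(0.5 + 0.1 * np.asarray(x, dtype=float), 0.0, 1.0)

def window_features(pssm, w):
    """Return one w*20 feature vector per residue (window centred on the residue)."""
    scaled = scale_pssm(pssm)
    half = w // 2
    padded = np.vstack([np.zeros((half, 20)), scaled, np.zeros((half, 20))])
    return np.stack([padded[i:i + w].ravel() for i in range(len(scaled))])
```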
Support Vector Machines (SVM)
The Support Vector Machines (SVM) developed by Vapnik [26] have been shown to be a powerful supervised learning tool for binary classification problems. The data to be classified are formally written as

D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}, \quad x_i \in \mathbb{R}^m, \; y_i \in \{-1, 1\} \qquad (2)
(x,x;)+ b\
(3)
3. Classification Prediction of secondary structure by the proposed technique reduces to a three class (H, E, C) pattern recognition problem. The idea is to construct six classifiers which include three one vs one classifiers {H/E, E/C, C/H) and three one vs rest classifiers (H/H,E/E,C/C). To adjust the free parameters of binary SVMs, we selected a small set of proteins containing about 20 chains and performed cross validation with different sets of parameters. Based on these experiments, we selected Radial Basis Function (RBF) kernel with C = 2.5 and a = 4 for all six classifiers. We designed three one vs one classifiers H/E, E/C and C/H and three one vs rest classifiers H/H, E/E, C/C. An ensemble of classifiers used by Hua and Sun 24 and SVMJFLepresent method proposed by Hu et. al. 6 are combined using our voting procedure. The ensemble of classifiers used is as shown in Figure 1. The classifier which gives absolute maximum value amongst H/E, E/C, C/H classifiers is used for decision making. However, the output of the SVM classifier is uncalibrated and it is not wise to use it directly for comparison. We can convert the output of SVM to a posterior probability 10,13 . We use Piatt's method to moderate the uncalibrated output to posterior probabilities Px which would range between 0 and 1. Finally the classifier is chosen according to eq. 4. Vi is incremented based on b
http://www-personal.monash.edu.au/~dlai/
613
V, =» Vole Class i
F i g u r e 1.
Classification Voting Scheme.
classification of the chosen classifier arg
max
\PX — 0.51
(4)
ie{HE,EC,CH}
Reliability
Index and Post
Processing
The reliability index (RI) we propose is based on the highest vote the class gets as well as the posterior probability of the one vs rest classifier. The vote Vi any class can get is in the range (0,4). As discussed earlier, the posterior probability of the winning class P is in the range (0,1). We define RI by eq. 5. RI(k) = (0.5 *
VH/A)
+ (0.5 * Pki)
(5)
where k is the residue number and i represents the winning classifier. 4. Evaluation Methods We use standard Q% accuracy, SOV * and Matthew's correlation coefficients for comparing the proposed technique with the existing results in literature. The procedure described by Rost and Sander 2 was used for calculation of Q3 accuracies and Matthew's correlation coefficients. Q$ was calculated as follows: 3
E Mi Q3 = ^ ^ 5 — X 100 where, Aij = N u m b e r of residues p r e d i c t e d t o b e s t r u c t u r e j a n d observed in t y p e i b = T o t a l n u m b e r of residues in d a t a b a s e
Matthew's
Correlation y-»
where,
pt = A»
Coefficient
Cj
was
calculated
,,.•,
using
Pj.n,;— UjOj
V(Pi +"i 1(Pi +°i )("• +"M"i +°i ) 3 3 3 m; = £ ) £ Ajk for « f « , f t 7 0i = X) A?» jjLik^i
jyti
3 i = 12
u
j^i
(7) A
ij
We also use the Segment Overlap (SOV) Score proposed by Zemla et. al. * (SOV99) which is denned in eq. 8 SOV = 100 x
N
itiB.cy k)
(8)
len{si
™™^*>
\
where S(i) is the set of all overlapping pairs of segments ( s j , s 2 ) i n conformation state i, len(si) is the number of residues in segment s i , minov(si, s 2 ) is the length of the actual overlap and maxov(si,S2) is the total extent of the segment. 5. R e s u l t s a n d D i s c u s s i o n We tested our method on the RS126 and CB513 datasets. Ten fold cross validation was performed for the RS126 dataset and Seven fold cross validation was performed on the CB513 dataset for all experiments. Three sets of features (D2C-PC, D2C-PSSM and D2C-PCPSSM) were used to evaluate RS126 dataset and D2C-PCPSSM feature for CB513 dataset. Results for datasets RS126 and CB513 are tabulated in the following tables respectively. Method Kim and Park 1 5 Nguyen and Rajapakse PHD 2 1 JPred 4 f D2C-PC D2C-PSSM D2C-PCPSSM Method PHD " j PSIPred 1 2 t t JNet 9 f Hua and Sun 2 4 t Kim and Park 15 Guo et. al. 7 D2C-PCPSSM
Q3 70.8 76.5 76.4 73.5 76.6 75.2 77.9
18
Q3 76.1 72.8 72.5 74.8 61.47 74.6 76.9
SOV 79.6 66
QH
QE
77.2 66.1
63.9 57.8
Qc 81.5 81.9
74.5 60.3 70.18 75.2
60.5 70.29 74.56
59.7 65.03 68.2
61.2 79.28 79.1
CH
CE
CC
-
-
-
0.68 0.71 0.67
0.60 0.61 0.65
0.56 0.61 0.6
SOV 73.5
QH
QE
72
66
Qc 72
-
-
-
-
74.2 76.2 80.1 80 76.17
78.4 75 78.1 80.4 77.6
63.9 60 65.6 71.5 69.8
80.6 79 81.6 72.8 81.1
C: Matthew's correlation coefficients fSOV94 3 ^Results not for CB513 dataset
Overall we have looked at several aspects of protein secondary structure prediction including the use of physico-chemical properties as features, fast trainable support vector machines, reliable tertiary classifier and calculation of reliability index. From the cross validation experiments it is clear that the use of physico-chemical parameters will improve the performance of secondary structure prediction. As a fair comparison we have experimented with PSSM alone as feature set as well as PSSM along with physico-chemical properties. We found that the improvement in accuracy was about 3% (on RS126 dataset) demonstrating the role played by the physico-chemical properties.
References 1. Zemla A, venclovas C, Fidelis K, and Rost B. A modified definition of sov, a segment based measure for protein secondary structure prediction assessment. Proteins: Structure, Functions and Genetics, 34:220-223, 1999. 2. Rost B and Sander C. Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology, 247:584-599, 1993. 3. Rost B, Sander C, and Schneider R. Redefining the goal of protein secondary structure prediction. Journal of Molecular Biology, 235:584-599, 1994. 4. J A Cuff and G J Barton. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins, 34:508-519, 1999. 5. Jones DT, Taylor WR, and Thornton JM. A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33:30383049, 1994. 6. Hu HJ, Pan Y, Harrison R, and Tai PC. Improved protein secondary structure prediction using support vector machine with a new encoding scheme and an advanced tertiary classifier. IEEE Transaction on Nanobioscience, 3(4):265:271, 2004. 7. Guo J, Chen H, Sun Z, and Lin Y. A novel method for protein secondary structure prediction using dual layer svm and profiles. Proteins: Structure, Function and Bioinformatics, 54:738-743, 2004. 8. Kyte J and Doolittle RF. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology, 157:105-132, 1982. 9. Cuff JA and Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. PROTEINS: Structure, Function and Genetics, 40:502-511, 2000. 10. Piatt JC. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, MIT Press, pages 61-74, 2001. 11. Ward JJ, McGuffin LJ, Buxton BF, and Jonese DT. Secondary structure prediction with support vector machiness. Bioinformatics, 19(13):1650-1655, 2004.
12. DT Jones. Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 292:195-202, 1999. 13. Kwok JTY. Moderating the outputs of support vector machine classifiers. IEEE Transaction on Neural Networks, 10(5):1018-1031, 1999. 14. K Karplus, C Barrett, and R Hughey. Hidden markov models for detecting remote protein homologies. Bioinformatics, 14:846-856, 1998. 15. Hyunsoo Kim and Haesun Park. Protein secondary structure prediction based on an improved support vector machines approach. Protein Engineering, 16(8):553-560, 2003. 16. D Lai, N Mani, and M Palaniswami. Effect of constraints on sub-problem selection for solving Support Vector Machines using space decomposition. In The 6th International Conference on Optimization: Techniques and Applications (ICOTA6) accepted, Ballarat,Australia, 2004. 17. Ouali M and King RD. Cascaded multiple classifiers for secondary structure prediction. Protein Science, 9:1162-1176, 2000. 18. Nguyen MN and Rajapakse JC. Multi-class support vector machines for protein secondary structure prediction. Genome Informatics, 14:218-227, 2003. 19. Chou PY and Fasman GD. Conformational parameters for amino acids in helical, b-sheet, and random coil regions calculated for proteins. Biochemistry, 13(2):211-222, 1974. 20. Grantham R. Amino acid difference formula to help explain protein evolution. Science, 185:862-864, 1974. 21. B Rost. Phd: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266:525-539, 1996. 22. Burkhard Rost. Protein Structure Prediction in ID, 2D and 3D, volume 3. 1998. 23. Burkhard Rost. Review: Protein secondary structure prediction continues to rise. Journal of Structural Biology, 134:204-218, 2001. 24. Hua S and Sun Z. A novel method ofprotetin secondary structure prediction with high segment overlap measure: Support vector machine approach. Journal of Molecular Biology, 308:397-407, 2001. 25. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. Gapped blast and psi-blasat: a new generation of protein database search programs. Nucleic Acid Research, 27(17):3389-3402, 1997. 26. V. N. Vapnik. The nature of statistical learning theory. Statistics for engineering and information science. Springer, New York, 2nd edition, 2000. 27. Kabsch W and Sander C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22:2577-2637, 1983.
Acknowledgements: The authors would like to thank Prof. David Jones for kindly providing the pfilt program. We are also grateful to NCBI for access to the PSI-BLAST program and the Barton Group for making CB513 and RS126 datasets available on the web.
GENE EXPRESSION DATA ANALYSIS IN THE MEMBERSHIP EMBEDDING SPACE: A CONSTRUCTIVE APPROACH*
M. FILIPPONE, F. MASULLI AND S. ROVETTA
Department of Computer and Information Sciences, University of Genova and CNISM, Via Dodecaneso 35, I-16146 Genova, Italy
{filippone, masulli, rovetta}@disi.unige.it
Exploratory analysis of genomic data sets using unsupervised clustering techniques is often affected by problems due to the small cardinality and high dimensionality of the data set. A way to alleviate those problems lies in performing clustering in an embedding space where each data point is represented by a vector of its memberships to fuzzy sets characterized by a set of probes selected from the data set. This approach has been demonstrated to lead to significant improvements with respect the application of clustering algorithms in the original space and in the distance embedding space. In this paper we propose a constructive technique based on Simulated Annealing able to select sets of probes of small cardinality and supporting high quality clustering solutions.
1. Introduction Clustering methods provide an useful tool to explore genomic data sets, but often the crude application of classical clustering algorithms leads to poor results. Actually, many clustering approaches suffer from being applied in high-dimensional spaces, as clustering algorithms often seek for areas where data is especially dense. However, sometimes the cardinality of the data sets available is even less than the number of variables. This means that the data span only a subspace within the data space. In these conditions, it is not easy to define the concept of volumetric density. Moreover, when space dimensionality is high or even moderate (as low as 10-15), the distance of a point to its farthest neighbor and to its nearest neighbor tend to become equal3'1. Therefore the evaluation of distances, •Work funded by the MIUR grant code 2004062740
and the concept of "nearest neighbor" itself, become less and less meaningful with growing dimension. Defining clusters on the basis of distance requires that distances can be estimated. For instance, one of the most common methods, c-means (CM) clustering, is based on iteratively computing distances and cluster averages. Increasing the data space dimensionality may introduce a large number of suboptimal solutions (local minima), and the nearest-neighbor criterion which is the basis of the method may even become useless. This problem is not avoided even when CM is modified in the direction of incorporating fuzzy concepts, e.g. as for the FCM (Fuzzy c-Means) algorithm 5 , 2 . If the cardinality of the data set is small compared to the input space dimensionality, then the matrix of mutual distances or other pairwise pattern evaluation methods such as kernels 13 may be used to represent data sets in a more compact way. P§kalska and Duin 1 2 have developed a set of methods based on representing each pattern according to a set of similarity measurements with respect to other patterns in the data set. In this framework the data set is embedded in a lower dimensional space called embedding space, in which, in the presence of large-dimensional data sets, a notable complexity reduction is achieved. Following this approach, the data matrix is replaced by a pairwise dissimilarity matrix D. Let X = {2:1, £ 2 . . . . ,xn} be a data set of cardinality n. We start by computing the dissimilarity matrix D:
d_{ik} = d(x_i, x_k) \quad \forall i, k \qquad (1)
according to an assigned dissimilarity measure d(x, y) between points x and y (e.g., using Euclidean distance). Applications of projection into dissimilarity embedding spaces to clustering are reported in 7 ' 1 0 . As pointed out in 1 2 , the dissimilarity measure should be a metric, since metrics preserve the reverse of the compactness hypothesis: "objects that are similar in their representation are also similar in reality and belong, thereby, to the same class". Often non-metric distances are used as well. In the following, we will adopt the Euclidean distance as the dissimilarity measure. In case of a data set with dimensionality N there is the upper bound of N + 1 probes (or support data)12 that we can use in order to build the dissimilarity matrix. In the case of genomic data this upper bound is often un-realistic, since the cardinality is much lower than the dimensionality. However, for data having some structure, it is not necessary to reach this
upper bound for good representation. We only require that the dimension of the embedding space is large enough to preserve the reverse of the compactness hypothesis. On the other hand, if the embedding dimension n is lower than N + l, some points could have an ambiguous representation and, moreover, clustering could be affected by the high metrical contribution of farthest points. In order to avoid those problems, in6 we proposed a different kind of embedding based on the space of memberships to fuzzy sets centered on the probes, that we will call Membership Embedding Space (MES) . Following this approach, a point in the embedding space will be represented by a vector containing only few non-null components (depending on the width of the membership function), in correspondence of the closer probes in the original feature space. In our experiments, the memberships of fuzzy sets centered on the probes were modeled using the following normalized function:
v* =
~r
K
(2)
where i = 1 , . . . , n and k = 1 , . . . , s. Note that the parameter p regulates the spread of the membership function and it is related to the average distance between the data points. For large values of (3 the memberships tends more rapidly to zero than for little (3. In the MES each data point Xi is represented as a row of v^. In this paper, we propose a constructive method to obtain the set of probes leading to optimal clustering in the MES using Simulated Annealing. 2. Simulated Annealing for Probe Selection The proposed method for probe selection makes use of the Simulated Annealing (SA) technique9 that is a global search method technique derived by Statistical Mechanics. SA is based on the work by Metropolis et al. 11 aimed to simulate the behavior and small fluctuations of a system of atoms starting from an initial configuration, by the generation of a sequence of iterations. In the Metropolis algorithm each iteration is composed by a random perturbation of the actual configuration and the computation of the corresponding energy variation {AE). If AE < 0 the transition is unconditionally accepted, otherwise the transition is accepted with probability given by the Boltzmann distribution:
P(\Delta E) = \exp\left( -\frac{\Delta E}{KT} \right) \qquad (3)
where K is the Boltzmann constant and T the temperature. In SA this approach is generalized to the solution of general optimization problems9 by using an ad hoc selected cost function (generalized energy), instead of the physical energy. SA works as a probabilistic hill-climbing procedure searching for the global optimum of the cost function. The temperature T takes the role of a control parameter of the search area (while K is usually set to 1), and is gradually lowered until no further improvements of the cost function are noticed. SA can work in very high-dimensional searches, given enough computational resources.
(1) Initialize parameters (see list in Tab. 1);
(2) Initialize the binary mask g at random;
(3) Perform clustering and evaluate the generalized system energy E;
(4) do
(5) Initialize l = 0 (number of iterations), h = 0 (number of successes);
    (a) do
    (b) Increment the number of iterations l;
    (c) Perturb mask g;
    (d) Perform clustering and evaluate the generalized system energy E;
    (e) Generate a random number rnd in the interval [0,1];
    (f) if rnd < P(ΔE) then
        i. Accept the new g mask;
        ii. Increment the number of successes h;
    (g) endif
    (h) loop until h < h_min and l < l_max
(6) update T = αT;
(7) loop until h > 0;
(8) end.

Figure 1. Simulated Annealing Probe Selection (SA-PS) Algorithm.
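A hedged Python sketch of the SA-PS loop in Figure 1 follows. The clustering step and its quality measure are abstracted into a `cluster_quality` callback, the perturbation switches a single bit off and a single bit on per move (a simplification of the w/v-bit move described in the paper), and all names and defaults are illustrative rather than the authors' implementation.

```python
import math
import random

def sa_ps(n, cluster_quality, s0=3, lam=1e-2, T0=1.0, alpha=0.9,
          l_max=2000, h_min=200):
    """Select probes by simulated annealing over a binary mask of length n."""
    g = [0] * n
    for i in random.sample(range(n), s0):          # start with s0 random probes
        g[i] = 1

    def energy(mask):
        s = sum(mask)                              # number of selected probes
        return cluster_quality(mask) + lam * s     # E = epsilon + lambda * s

    E, T = energy(g), T0
    while True:
        l = h = 0
        while h < h_min and l < l_max:
            l += 1
            cand = list(g)
            ones = [i for i, b in enumerate(cand) if b]
            zeros = [i for i, b in enumerate(cand) if not b]
            if ones:
                cand[random.choice(ones)] = 0      # switch one selected bit off
            if zeros:
                cand[random.choice(zeros)] = 1     # switch one unselected bit on
            E_new = energy(cand)
            dE = E_new - E
            if dE < 0 or random.random() < math.exp(-dE / T):
                g, E, h = cand, E_new, h + 1       # accept the move
        if h == 0:                                 # no accepted moves: stop
            break
        T *= alpha                                 # cool the temperature
    return g, E
```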
In Fig. 1 the proposed Simulated Annealing Probe Selection (SA-PS) algorithm is shown. In our approach the state of the system is represented by a binary mask g = (pi, g2,..., gn), where each bit gt (with i = l,...,n) corresponds to the selection (ffj = 1) / deselection (gi = 0) of a probe. The initialization of the vector mask g (Step 2) is done by generating so integer
numbers with uniform distribution in the interval [1,n] and setting the corresponding bits of g to 1 and the remaining ones to 0. At each step only s probes are selected from the original set of n patterns. A perturbation or move is done in the following way: (1) choose randomly w ∈ [w_min, w_max] and v ∈ [v_min, v_max]; (2) w bits of g set to 1 are switched to 0; (3) v bits of g set to 0 are switched to 1. The values w_min, w_max, v_min, v_max can be used to reduce or to increase the variability of each perturbation. Once a set of probes is selected, it is possible to represent each pattern in the Membership Embedding Space (MES) and to perform clustering. In the experiments reported in the remainder of this paper, we performed clustering using the FCM algorithm [2], but many other clustering algorithms can be employed. The generalized energy E is computed as a linear combination of an assigned clustering quality measure ε and the number of selected probes s:

E = \varepsilon + \lambda s \qquad (4)
The clustering quality measure e can be a function of either the cost function associated to the clustering algorithm, a clustering validation index, or, in the case of labeled data sets, the Representation Error (RE). RE is the count of data points in each cluster disagreeing with the majority label in that cluster, summed over all clusters and expressed as a percentage. Note that the introduction of the number of selected probes s in the computation of E penalizes situations in which the number of selected probes is high. This choice of E leads to the minimization of the cardinality of the set of probes able to achieve a good clustering quality measure. The balance between these two terms is controlled by A (regularization coefficient). 3. Experimental setup The method was tested on the publicly available Leukemia data by Golub et al. 8 . The Leukemia problem consists in characterizing two forms of acute leukemia, Acute Lymphoblastic Leukemia (ALL) and Acute Myeloid Leukemia (AML). The original work proposed both a supervised classification task ("class prediction") and an unsupervised characterization task ("class discovery"). Here we obviously focus on the latter, but we exploit the diagnostic information on the type of leukemia to assess the goodness of the clustering obtained.
Table 1.
Choice of parameters.
Meaning
Symbol
Value
Number of random perturbations of g used to estimate the initial value of T
P
10000
Number of probes to be initially selected
so a
3
Cooling parameter Membership width parameter
0
10~ 6
Maximum number of iteration at each T
Jmax
2000
Minimum number of success for each T
flmin
200
Regularization coefficient
A
0.9
io-2
Minimum number of bits to be switched
^ m i n i min
Maximum number of bits to be switched
t^maxi ^max
1,1 s, 5
Number of clusters
C
3
FCM fuzziness parameter
m
2
FCM trials
r
10
u
The training data set contains 38 samples for which the expression level of 7129 genes has been measured with the DNA microarray technique (the interesting human genes are 6817, and the other are controls required by the technique). These expression levels have been scaled by a factor of 100. Of these samples, 27 are cases of ALL and 11 are cases of AML. Moreover, it is known that the ALL class is in reality composed of two different diseases, since they are originated from different cell lineages (either T-lineage or Blineage). In the data set, ALL cases are the first 27 objects and AML cases are the last 11. Therefore, in the presented results, the object identifier can also indicate the class (ALL if id < 27, AML if larger). In 6 we presented an extended experimentation using the FCM algorithm 2 and comparing the following approaches: (1) FCM on the original data set (RD); (2) FCM in the Distance Embedding Space (DES) with different probe/data ratios; (3) FCM in the Membership Embedding Space (MES) with different probe/data ratios. For each experiment we made 1000 independent trials, each of them using a different random initialization of the membership in the FCM algorithm. In all trials probes were extracted at random (using an uniform pdf) from the data set without replacement, the number of clusters was set to 3, and the fuzziness parameter m of FCM was set to 2. The last approach (3), projecting the data set into the membership embedding space, lead to better results. Moreover, increasing the parameter f3 from 1 0 - 8 to 10~ 6 we obtained for increasing probe/data ratios (from .8 to .4) a shift of the optimal error ratio.
0
5
1 10
1 15
(a) Figure 2. rithm.
1 20
1 25
1 30
r 35
1 0
i 5
i 10
i 15
i 20
i 25
r 30
35
(b)
RE (a) and number of probes selected (b) during a run of the SA-PS algo-
Starting from those previous results, we ran the SA-PS algorithm in the MES with the assumptions shown in Tab. 1. The value of the parameter (3 used in the experiments (/? = 10~ 6 ) was about the reciprocal of the mean distance between patterns. As a clustering quality measure we used the Representation Error (RE) evaluated as the best value obtained on r = 10 independent trials of FCM. Each independent run of the SA-PS algorithm finds a different small subset of probes leading to a clustering Representation Error equal 0. In Fig. 2, the Representation Error and the number of selected bits of g are plotted versus the iteration number during a run of the SA-PS algorithm, where each iteration corresponds to a different value of temperature T. In this case, at iterations 31, 33, 34 and 35 we obtained 4 different sets of 3 probes giving clustering RE equal 0. 4. Conclusions Exploratory analysis of genomic data sets using unsupervised clustering techniques, are often affected by problems due to the small cardinality and high dimensionality of the data set. A way to alleviate those problems lies in performing clustering in an embedding space where each data point is represented by a vector of its memberships to fuzzy sets centered on a set of probes selected from the data set. In previous work, this approach has been demonstrated to lead to significant improvements with respect the
624 application of clustering algorithms in the original space and in the distance embedding space. In this paper we have presented a constructive technique based on Simulated Annealing able to select sets of probes for clustering in the embedding space of fuzzy memberships. The application of the proposed probe selection algorithm combined with FCM to the Leukemia data by Golub et al 8 leads to high quality clustering solutions. References 1. C.C. Aggarwal and P.S. Yu, Redefining clustering for high-dimensional applications, IEEE Transactions on Knowledge and Data Engineering 14 210-225 (2002). 2. J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms. Plenum, New York (1981). 3. K. Beyer, J. Goldstein, R. Ramakrishnan and U. Shaft, When is nearest neighbor meaningful? In: 7th International Conference on Database Theory Proceedings (ICDT'99), Springer-Verlag 217-235 (1999). 4. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York (1973). 5. J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics 3 32-57 (1974). 6. M. Filippone, F. Masulli and S. Rovetta, Clustering Genomic Data in the Membership Embedding Space. In: CI-BIO Workshop on Computational Intelligence Approaches for the Analysis of Bioinformatics Data, MontrealCanada, IEEE, Piscataway, NJ, USA (2005), http://ci-bio.disi.unige.it/CIBIO-booklet/CI-BIO.html. 7. A. Fred, J. Leitao, A new cluster isolation criterion based on dissimilarity increments. IEEE Trans, on PAMI, 25(8) 944-958 (2003). 8. T. Golub, et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 531-537 (1999). 9. S. Kirkpatrick, C D . Gelatt and M.P. Vecchi, Optimization by simulated annealing. Science, 220 661-680 (1983). 10. F. Masulli and S. Rovetta, A New Approach to Hierarchical Clustering for the Analysis of Genomic Data. In: Proc. I.J.C. on Neural Networks, MontrealCanada, IEEE, Piscataway, NJ, USA (2005). 11. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, Equation of state calculations for fast computing machines. Journal of Chemical Physics, 21 1087-1092 (1953). 12. E. P§kalska, P. Paclik and R.P.W. Duin, A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research 2 175-211 (2001). 13. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge University Press (2004).
BICA A N D R A N D O M SUBSPACE ENSEMBLES FOR D N A MICROARRAY-BASED DIAGNOSIS
B. APOLLONI AND G. VALENTINI Dipartimento di Scienze dell'Informazione, Universita degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy {apolloni, valentini}Qdsi.unimi.it A. BREGA Dipartimento di Matematica "F. Enriquez", Universita degli Studi di Milano Via Saldini 50, 20133 Milano, Italy andrea. [email protected] We compare two ensemble methods to classify DNA microarray data. The methods use different strategies to face the course of dimensionality plaguing these data. One of them projects data along random coordinates, the other compresses them into independent boolean variables. Both result in random feature extraction procedures, feeding SVMs as base learners for a majority voting ensemble classifier. The classification capabilities are comparable, degrading on instances that are acknowledged anomalous in the literature.
1. I n t r o d u c t i o n The traditional taxonomy of malignancies, based on their morphological, histopathological, and clinical characteristics, may be sometimes ineffective for a correct diagnosis and prognosis of tumors 1. Indeed a more refined diagnosis may be achieved exploiting the genome-wide bio-molecular characteristics of tumors, using high throughput bio-technologies based on large scale hybridization techniques (e.g. DNA microarray) 5 . One of the main drawbacks that characterizes DNA microarray data is represented by their very high dimensionality and low cardinality (problem known as curse of dimensionality). Hence several works pointed out the importance of feature selection methods to reduce the dimensionality of the input space 7 . An alternative approach is represented by data compression techniques that can reduce the dimensionality of the data, while approxi625
626 mately preserving their information content. As for their processing, several authors recently proposed to apply ensemble methods for improving the performance of state-of-the-art classification algorithms in the context of gene expression data analysis 4 . In this paper we compare two ensemble methods based on datacompression techniques for DNA-microarray-based diagnosis. The first one exploits random projections to lower dimensional subspaces 8 , while the second performs data compression through a Boolean Independent Component Analysis (BICA) algorithm 13 . While the first method has just been applied to gene expression data analysis 3 , BICA has never been previously applied to DNA microarray data analysis. In the next two sections we introduce the methods, and in Sect. 4 we experimentally analyze the effectiveness of the two approaches, applying them to DNA microarray-bases diagnosis of tumors.
2. R S E : R a n d o m S u b s p a c e E n s e m b l e The dimensionality reduction in the context of supervised analysis of data is usually pursued through feature selection methods. Many methods can be applied, such as filter ones, wrapper methods, information theory based techniques and "embedded" methods (see e.g. 6 for a recent review). We recently experimented a different approach 3 based on random subspace ensemble methods 8 . For a fixed n, n features (genes) are randomly selected, according to the uniform distribution. Then the d a t a of the original cf-dimensional training set is projected to the selected n-dimensional subspace. The resulting data set is used to train a suitable base learner and this process is repeated v times giving raise to an ensemble of v learning machines trained on different randomly selected subsets of features. The resulting set of classifiers are then combined by using majority voting. This method, that can be implemented easily in a parallel way, avoids some computational difficulty of feature selection (that is an NP-hard problem). Anyway feature selection methods can explicitly select sets of relevant features, while this information cannot be directly obtained through RS ensembles. On the other hand, with different random projections of the data we can improve diversity between base learners 9 , while the overall accuracy of the ensemble can be enhanced through aggregation techniques. As a consequence the performance of a given classification algorithm may be enhanced. A high-level pseudo-code of the method is summarized in Fig. 1. In particular, S u b s p a c e _ p r o j e c t i o n procedure selects a n-subset
R a n d o m Subspace Ensemble Algorithm Input: - A data set V = { (XJ , tj) 11 < j < m], x 3 € X C Md, tj € C = { 1 , . . . , A:} - a learning algorithm C - subspace dimension n < d - number of the base learners m Output: - Final hypothesis hTan : X —> C computed by the ensemble, begin for i = 1 to v begin Di = Subspace_projection(X>, n) hi = C(Di) end ftron(x) = argmax t g ccard({i|/jj(x) = t}) end. Figure 1.
High-level pseudo-code of the RSE method
A = { a i , . . . , a „ } from { 1 , 2 , . . . ,d}, and returns as output the new data set Di = {(PA(XJ), tj)\l <j< m}, where PA(xu ...,xd) = (xai ,...,xaJ. The new data set Di is then given as input to a learning algorithm C which outputs a classifier hi. All the classifiers obtained are finally aggregated through majority voting, where cardQ measures the cardinality of a set. 3. B I C A network A suitable way of taking decisions based on data is to split the decision process in two steps. The first is devoted to preprocessing data in a feasible way such that they can be interpreted in the second one. As for the former, it mirrors real vectors into boolean ones, that should reflect relevant features of the original data patterns. Stressing the fact that independence is a property of the representation of the data that we use, we search for this property precisely on a concise Boolean representation of them suitable for their correctly partition into positive and negative inputs of our decision rule. Accordingly, we call the mirroring method Boolean Independent Component Analysis, BICA for short. 3 . 1 . The
architecture
We split the mirroring of the original data into the target Boolean vector in two parts: a true mirroring of the patterns and a projection of a compressed representation of them (obtained as an aside result of the first part) into
the space of Boolean assignments. The whole process is done by a neural network with an architecture shown in Fig. 2 sharing the same input and hidden layer with the two output segments A and B computing the Boolean assignments and a copy of the input, respectively. Part A: Propositional Variable Vector v = (v\,vi,... ,vn)
Part B: Mirroring of Pattern Vector
Pattern Vector x = (x\ , #2, • • •, xj) Figure 2. Layout of the neural network mapping features to symbols.
3.2.
The learning
algorithm
We train this network with a backpropagation algorithm 10 as follows. Error backpropagation in part B . As customary with this functionality 11 , we structured our network as a three-layer network with the same number of units in both input and output layers and a smaller number of units in the hidden layer. Therefore the hidden layer constitutes a bottleneck which collects in the state of its nodes a compressed representation of the input. This part of the network is trained according to a quadratic error function and usual formulas 12 . Error backpropagation in part A . Things are different for the units of part A of the output. In this case we require that the network minimizes the following error:
Es = In (f[ z-*"*(l - z. lfc )- (1 -*" fc) J
(1)
where zsj is the output of the unit j upon presentation of s-th pattern. This function, which we call the edge pulling function, has the shape of an entropy measure that finds its minima in the vertices of the neural network output space (see Fig. 3).
Es
Figure 3. Graph of the function Es with n = 2.
The error which is backpropagated from the units of part A is: S
a,k = / a c t ( n e t s , f c ) a s , k
(2)
where netSjj is a weighted sum of the inputs to j - t h unit on s-th pattern, / a c t is the sigmoid function, and dEs ( a*,* = --a =ln
za
\
(3)
In addition,we insert a syntactic feedback into eq. 3 through an extra term which has the form of a 'directed noise' 6Sik added to the initial value of Q when we are not satisfied with the 'correctness' of the result. Namely, when the Hamming distance between vectors corresponding to patterns belonging to different classes falls below a given threshold, we assume patterns with the minority label incorrect. Then, denoting rStk the specific punishment to the neuron k for an incorrect pattern s, 0s>k reads: 9a,k = {l-2T{zB,k))re>k
(4)
where T is a threshold function. The first term in the brackets specifies the sign of 6s,k so that the contribution to the network parameters is in the opposite direction from the one the unit is moving in. Finally, using a tuning parameter TTA to balance parts B and A, as%k reads: <*s,k = TA (flg.fc+ In f
J*
•))
(5)
The joint goal of minimizing Es and maintaining patterns well separated in two categories brings the Boolean assignments to figure as samples of independent random variables, thus we may say that these variables are expectedly independent. More precisely, the following lemma has been proved in 13 :
L e m m a 3 . 1 . With reference to the neural network and training algorithm described above, if the neural network outputs are correct and all close to the vertices of the Boolean hypercube then their values stretched to the vertices constitute assignments to expectedly independent Boolean variables. We repeat v times also this process getting different maps, as a consequence of the random initialization of the network parameters, and different base learners trained on the encoded training sets. Finally, we compute for each sample of the training set the frequency with which base learners answer 1, and we gather frequencies corresponding to either positive or negative samples. In the lose assumption that frequencies in each group follow a Gaussian distribution we locate a threshold at the cross of their p.d.f.s 14 , i.e.
A-<7++£+<7(7_ + < 7 +
where /t_ and CT_ are the sample estimate of parameter fi- and CT_ of the negative distribution; idem for the positive distribution. With this threshold we classify test set records giving label 1 to those whose 1 frequency according to trained base learners overcome the threshold.
4. C o l o n T u m o r Classification 4 . 1 . Experimental
setup
In order to compare the two approaches, we applied the two ensemble methods to the classification of DNA microarray data relative to colon tumor samples 2 , composed of 2000 genes and 62 samples, 40 tumoral tissue samples and 22 normal samples. We evaluated the generalization performances of the two ensembles using multiple hold-out techniques: we randomly split the data in two equally-sized training and test sets, repeating this process 50 times. Then the average error on the test set has been computed. In both ensembles we used 60 Support Vector Machines (SVMs) as base learners. With RS ensembles we applied different projections into random subspace with dimension from 16 to 1024, and used linear SVMs, tuning their regularization parameter. With BICA network we mapped from R 2 0 0 0 to {0,1} space, and we used a second order kernel SVMs, as a result of a model selection procedure.
631 4.2. Results
and
discussion
Comparing the results obtained with the ensemble methods with those obtained with single SVMs, we can register a significant enhancement achieved with the ensemble approach w.r.t. the single SVMs (Tab. 1). On the other Table 1. Classification accuracy in single SVMs, BICA and RSE ensembles. single SVM
RSE
BICAe
test accuracy
0.67
0.828
0.792
a
0.07
0.05
0.09
train accuracy
0.80
1.000
0.98
a
0.07
0.00
0.02
hand, there is no a substantial difference between the performances of the two ensemble approaches, with only a slight improvement obtained with RSE ensembles. In order to understand if the errors of BICA and RSE ensembles are approximately distributed on the same examples, we also analyzed the frequencies of their errors in function of the pattern examined across the 50 test sets used in the multiple hold-out experiments. Interestingly enough, the two ensemble methods show their largest errors on the same examples (apart a few discrepancies). The largest errors are concentrated on samples 45, 49, 51, 55 and 56 for both the ensemble methods. As explained in 2 , most normal samples are enriched in muscle cells, while tumor samples are enriched in epithelial cells. The above samples consistently misclassified by both ensemble methods present an "inverted" tissue composition: normal samples are rich in epithelial cells, tumor samples are rich in muscle cells. This fact shows that the separation between normal and tumoral samples is also made on the basis of tissue composition, as observed in 7 . The best results with RSE have been obtained through random projections into 64-dimensional subspaces. BICA requires only 20 bits. As a matter of fact both encodings do not represent a real strong compression of DNA data, since we need 60 different maps to obtain a satisfactory classification. We note however that the 63% of the database is well classified using a single SVM and 93% using only 3 SVMs. Moreover, only 14 variables are used by the mentioned single SVM involving in own turn only 151 features uniformly distributed within the topology of the bench of the 2000 features supplied by the micro-array.
These results suggest that BICA technique could be in perspective applied to discover genes relevant for tumor discrimination that may be validated by the RSE ensembles. Acknowledgments We would like to thank the anonymous reviewers for their comments and suggestions. References 1. A. Alizadeh et al. Towards a novel classification of human malignancies based on gene expression. J. Pathol, 195:41-52, 2001. 2. U. Alon, et al. Broad patterns of gene expressions revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS, 96:6745-6750, 1999. 3. A. Bertoni, R. Folgieri, and G. Valentini. Bio-molecular cancer prediction with random subspace ensembles of support vector machines. Neurocomputing, 630:535-539, 2005. 4. S. Dudoit, J. Fridlyand, and T. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. JASA, 97(457):7787, 2002. 5. M. Eisen and P. Brown. DNA arrays for analysis of gene expression. Methods Enzymol, 303:179-205, 1999. 6. I. Guyon and A. Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3:1157-1182, 2003. 7. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46(l/3):389-422, 2002. 8. T.K. Ho. The random subspace method for constructing decision forests. IEEE Transactions on PAMI, 20(8):832-844, 1998. 9. L.I. Kuncheva and C.J. Whitaker Measures of diversity in classifier ensembles. Machine Learning, 51:181-207, 2003. 10. D.E. Rumelhart, G.E. Hinton and R.J. Williams. Learning Internal Representations by Error Propagation. MIT Press, Cambridge, 318-362, 1987. 11. J. Pollack. Recursive distributed representation. Artificial Intelligence, 46:77105, 1990. 12. C M . Bishop. Neural networks for pattern recognition. Clarendon Press, Oxford, 1995. 13. B. Apolloni, A. Esposito, D. Malchiodi, C. Orovas, G. Palmas, J.J Taylor. A general framework for learning rules from data. IEEE Trans, on Neural Networks, 11:6, 2004. 14. R.O. Duda,P.E. Hart. Pattern classification and scene analysis. John Wiley & Sons, New York, 1973
PREDICTION OF SCLEROTINIA SCLEROTIORUM flJB) DE BAEY DISEASE ON WINTER RAPESEED {B. NAPUS) BASED ON GREY GM (1,1) MODEL* GUIPING LIAO Institute of Agricultural Information Technology, Hunan Agricultural Changsha, Hunan, 410128, China
University,
FEN XIAO College of Bio-Safety Science and Technology, Hunan Agricultural Changsha, Hunan, 410128, China
University,
Abstract: A novel Grey forecasting model for predicting S. sclerotiorum (Lib) de Bary disease on winter rapeseed (B. napus) is built based on Grey GM (1,1) model. The residual error test and the posterror test methods were used for calibration of the model. Different from other conventional forecasting methods, the GM (l,l)-based Grey calamity prediction forecasts a prediction Ar to infer the probable year of Sclerotinia disease outbreaks according to the origin, and then uses the result to recommend spraying a field or not in order to avoid unnecessary fungicide application. Based on practical experiments in Hunan province, the threshold (^) of the disease rate at the time when winter rapeseed begins to flower is defined as 5, and^ < 5 is called a down-calamity. The Grey forecasting model was tested at the 7 stations in 2004 and 2005 and predicted the probable year of Sclerotinia disease outbreaks and the need for fungicide application with thefirst-classgrade and high accuracy.
1. Introduction Sclerotinia sclerotiorum (Lib) de Bary causes diseases and significant yield losses on oil crops. The grain yield losses can be 100%[1]. In China, the yield reduction can reach 50%[2]. Applying fungicides, crop rotation and selecting disease resistant cultivars are currently the major methods of controlling this disease. However, fungicide treatments are often ineffective, because they are either not well-timed or unnecessary, and fungicidal chemicals are expensive and not all environmentally safe. Therefore, growers desire a new and convenient approach to predict Sclerotinia disease on winter rapeseed and to tell them whether or not to apply fungicides spray. ' This work is supported by Provincial Natural Science Foundation of Hunan (BK.04X7-04TT3021) and the Scientific Research Foundation of Department of Education of Hunan Province (04C290).
633
Different approaches have been applied to forecast S. sclerotiorum disease on rapeseed, such as testing petals for Sclerotinia infestation on agar plates [3], serological test [4], a heat unit system, and the checklists method [5]. Recently, decision support system (DSS) was used to forecast plant disease [6]. However, these systems require that many factors be taken into account, which is inconvenient for practical applications to growers. The Grey forecasting model is the core of the theory. GM (1,1) has been successfully applied for solving time series data in finance, physical control, management, engineering and economics with insufficient data [7,8,9]. Lin and Wang [10] proved that a Grey-forecasting model has much higher prediction validity than the Markov chain model. In this paper, using the Grey Systematic Theory we develop a Grey GM (1,1) model for predicting Sclerotinia disease on winter rapeseed (B. napus) based on the data observed in a time series from 1995 to 2005. 2. Grey GM (1,1) modeling principle Assume an original series of a given datax(0)is defined as
xm=(Xr,xi°\...,x^)
(i)
Where xm is the number of data observed at time t, t = 1, 2, . . ., n. A new data sequence xm can be generated by one-order accumulated generating operation (AGO) from the original^(0) sequence, as follows: xm=(x?\xV,...,x™)
(2)
Where *<» = ,<«, and jf> = £_ f 4°>, / = 1, 2 n. From the accumulated sequence xm we can form GM (1,1), which corresponds to the following whitened first-order differential equation: dxm/dt + cdl)=b
(3)
Where a represents the developed coefficient, b represents the grey controlled variable, which are both unknown variables. Therefore, the solution can be written as *<;> = (x<°> -b/a)e-' +b/ayt>l
(4)
Taking the inverse accumulated generating operation (IAGO) to Jc^,, and then the predictive value of original sequence will be obtained as: *%=i%S?\Vt}>l
(5)
The variables a and b can be solved by the ordinary least-square method as: = (BTB)"'BTy
(6)
Where matrix B is -(*
1
-(* 3 (,) +^")/2
1
B=
*<»> (7)
. V -(*I' ) +^.)/2
-<°>
Finally, using the residual error test method and the posterror test method can test the error between the forecasted value and the actual value. The forecasting errors were defined as ^=*<°>-Jc<°\/ = l,2,..,»i
(8)
And the forecasting percent errors were defined as (9) The standard deviation of the original time series (Si) and the forecasting errors (S2) are as below:
*.=^L>' m - 3f(0, >7»
(10)
do Where x (0) and a are the mean of the original time series and the forecasting errors. The posterror ratio C is derived by dividing S2 by Su i.e. C= S2/S\. The lower the C is, the better the model is. The posterror ratio can indicate the change rate of the forecasting error. The probability of small error is defined as p = prob.{krr -q\<0.6745.$,},f = 2,3,...,«
(12)
That p is another indicator of forecasting accuracy shows the probability that the relative bias of the forecasting error is lower than 0.6745. p is commonly required to be larger than 0.95. The pairs of the forecasting
indicators p and C can characterize four grades of forecasting accuracy, as shown in table 1. Table 1. The grades of forecasting accuracy [11] Forecasting indicators P C
Grade Good >0.95 <0.35
Qualified >0.8 <0.5
Just >0.7 <0.65
Unqualified 0.7 0.65
3. GM (1,1) Modeling 3.1. Data Collection The data (table 2) collected from 1995(SN=1) to 2005(SN=11) are the average from 7 long-term agricultural experimental stations for winter rapeseed production distributed at Changsha, Changde, Yueyang, Hengyang, Huaihua, Nan, and Cili, in Hunan Province. The sampling date was at the beginning of rapeseed flowering at all 7 sites. And the sampling method was counted in 3 fields, with 50 plants sampled at random per field (5 points), for a total of 150 plants sampled at every site. Disease rate (%) is defined as the percentage of the disease plants in all sampling plants. Table 2. The disease rate of S. scleroliorum disease on winter rapeseed in 1995-2005 SN 1 2 3 4 5 DR 5.37 6.40 3.80 5.30 1.38 SN: Serial Number, DR: Disease Rate (%).
6 3.60
7 2.66
8 3.19
9 4.05
10 3.49
11 5.65
3.2. Data Processing 3.2.1. Data Processing Method According to practical experiments in Hunan province, if the disease rate is at or above 5 during investigation, i.e. the beginning flowering stage of winter rapeseed, fungicide spraying is necessary; otherwise disease outbreaks usually ensues. So, we let the threshold ( / ) be 5, a n d / < 5 is called the down-calamity year of the disease. Mapping the down-calamity year of the disease, f • {xm} _»{x(0)}, the downcalamity set can be obtained as:
,.(0)
: x
(0)
(0)
„(0)
( /m>x/(2)
\
(13)
Where ^ ,and* 0 e * « V = l,2,...,6. So, based on table 1, the mapping data series x'm is expressed as *'(0) = {4,5,6,7,8,9}
(14)
Here we only take the data from 1995 to 2003 to construct the Grey model, and keep the data of 2004 and 2005 for test. 3.2.2.
Data smoothing
verification
Before modeling GM (1,1) based on down-calamity set*'10', the set of x,(0) must meet the condition of data smoothing:
*'!7x;>i°: <s
(15)
W h e r e O < £ < U = l,2,..,6. When /=3, 4, 5,6 then e =0.67, 0.467, 0.36, 0.3 respectively. The results obviously reveal that setting t sufficiently large (t£3), the smoothing condition can be meet. So, modeling can be done based on x'(m •
3.3. Model construction Fromx' <0) , one-order AGO sequence ofx'(l) is obtained as follows: x'm = {4,5,15,22,30,39}
(16)
From Eq. (7), matrix B and constant vector YN are accumulated as below:
B
-6.5 -12 •17.5 -26 •34.5
(17)
And then (BTB)"' and BTYN are easily calculated as follows: (B T B)-'
0.0020 0.0395
From Eq. (6), a is obtained as
0.0395 0.9705
B T r„
-752.5 35
(18)
-0.1418 4.2340
(19)
Therefore, the Grey prediction model is acquired by substituting a and b into Eq. (4): JC'JVI = 33.8500e01418' - 29.8500(? > 1)
(20)
3.4. Model testing 3.4.1. Residual error testing of model The actual value, the forecasted value, the residual error, the percent residual error and the average residual error can be obtained from Eq. (20), (14), (8) and (9). Table 3 shows that the average residual error is 0.0106; the maximal residual error is -0.1585, the minimal residual error is 0.0551 and the origin residual error is -0.0979( "•" f>6 is called prediction, •'• t=6 is defined as the origin). These statistics indicate the efficiency of the proposed Grey prediction model. Table 3. Residual error test of the Grey prediction model
/ value 1 2 3 4 5 6
Actual value (Eq. (14)) 4 5 6 7 8 9
Forecasted Value (Eq. (5)) 4 5.1585 5.9449 6.8508 7.8948 9.0979
Residual error (Eq. (8)) 0 -0.1585 0.0551 0.1492 0.1052 -0.0979
Percent residual error (%) (Eq. (9)) 0 -3.0722 0.9261 2.1785 1.3326 -1.0765
Average residual error 0.0106
3.4.2. Posteriori deviation testing of model Posteriori deviation testing is a kind of statistic testing based on probabilities distribution of residual error. From Eq. (10) and (11), the posterror ratio C can be calculated as: C = s2/st =0.0698 When t=2,..., 6, the probability of small error can be obtained from Eq. (12) respectively. Obviously p = \ and all the probabilities of small error are met: q^-q\<
1.1519,/ = 2,3,...,6
639 The values of C and p indicate that the forecasting accuracy of the Grey prediction model (Eq. (20)) is the first-class "Good" according to table 1, and also show that the Grey prediction model has a good generality extrapolation, and so it can be applied to forecast. 3.5. Prediction and verification Let 2003 (i.e.*6 = 9 ) be the origin (a reference point), from the Grey prediction model (Eq. (20)), we can easily computed: x'f = 3 8.9469, X''0 = 49.4313, x''" = 61.5136 We can also obtained according to Eq. (5): *'<0) = 10.4844, i'<0) = 12.0822 Therefore AT" is generated from A T - x'J0) -x'f 0>6). Let? = 7 , thenAF = 10.4844-9 * 1. The result indicates that the first year of the down-calamity is 2004 compared the origin (2003+1). Similarly let/ = 8, thenAr = 12.0822 - 9 * 3. The result shows the second year of the down-calamity will be 2006 (2003+3); by contrast, it reveals that 2005 is a calamity year. Obviously the predicted results agree with the actual data in 2004 and 2005. Also we can go on predicting in the same way. 4. Discussion and conclusion The Grey GM (1,1) model described in the paper resolves the problem and focuses on information insufficiency in analyzing and forecasting the disease on winter rapeseed. Compared to previous models mentioned above (introduction) the Grey predicting model is very easy to use, distribute and program into the computer, and requires no laboratory facilities and special training; moreover it is a highly accurate predictor. The results of testing and verification of the model has demonstrated that the Grey GM (1,1) model for forecasting the Sclerotinia disease is correct and accurate by statistics and practical prediction in 2004 and 2005. In conclusion, the Grey GM (1,1) model developed herein is only one of the novel means available to winter rapeseed producers in Hunan province for accurately forecasting the next calamity year of S. sclerotiorum outbreaks. The results illustrate that the residual error of the forecasting model is lower than 10%. Furthermore, the model has better quality prediction validity and is clearly a viable means of forecasting Sclerotinia disease of winter rapeseed. Meanwhile,
640 the proposed forecasting model requires only few data, compensating for the limitations of earlier studies. Also the method requires no biological samples from the rapeseed field, and hence no processing of such samples. Consequently, it is more convenient for growers to operate, and growers in China can better predict the demand for spraying against Sclerotinia disease by using the new Grey-predicting model. Fungicides can be sprayed only when necessary. Therefore, any adverse impacts of fungicides on the environment can be avoided, permitting the sustainable development of winter rapeseed production. Acknowledgments The authors are grateful to Dr. Pierre Fobert from Plant Biotechnology Institute of National Research Council of Canada for his very helpful comments. We would also like to thank the Provincial Department of Agriculture for sharing data. References 1. Purdy LH, Phytopathology, 69, 875(1979). 2. GUAN C.Y., LI F.Q., LI X., CHEN S.Y., WANG G.H., Liu Z.S., Acta Agronomica Sinica, 29(5), 715(2003). 3. Turkington, T.K. and Morrall, R.A.A., 1993. Phytopathology, 83, 682(1993). 4. Jamaux, I. and Spire, Y)., Plant Pathology, 43, 847(1994). 5. Eva Twengstrijm, Roland Sigvald, Christer Svensson and Jonathan Yuen, 1 Crop Protection, 17, 405(1998). 6. Shtienberg D., Crop Protection, 19, 747(2000). 7. Hsu, C.I., and Wen, Y.H., Transportation Planning and Technology, 22, 87(1998). 8. Lin, C.T. and Yang, S.Y., Technological Forecasting and Social Change, 70, 177(2003). 9. Tsaur, R.C., International Journal of Computer Mathematics, 82, 141(2005). 10. Lin, C.T., and Wang, S.M., International Journal of Information Management Science. 11, 13(2000). 11. Deng, J.L., System & Control Letters, 5, 288(1982). 12. Deng, J.L., Journal of Grey System, 1, 1(1998).
PART 3
Applied Research And Nuclear Applications
This page is intentionally left blank
IDENTIFICATION OF SEISMIC ACTIVITIES THROUGH VISUALIZATION AND SCALE-SPACE FILTERING* CHENGZHIQIN' Institute of Geographic Sciences and Natural Resources Research, CAS Beijing 100101, China YEE LEUNG Department of Geography and Resource Management, Center for Environmental Policy and Resource Management and Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Shatin, Hong Kong JIANGSHE ZHANG Institute for Information and System Sciences, Xi 'an Jiaotong University XVan, Shaanxi 710049, China The identification of seismic active periods and episodes in spatio-temporal data is a complex scale-related clustering problem. Clustering by scale-space filtering is employed to give a quantitative basis for their identification. Visualization methods are employed to facilitate researchers to interactively assess and judge the clustering results by their domain specific experience in order to obtain the optimal segmentation of the seismic active periods and episodes. The real-life applications in strong earthquakes occurred in Northern China confirms the effectiveness of such an integrative approach. 1. Introduction Finding natural clusters in spatio-temporal databases is a major issue in geographical analysis. A typical example in seismology is the identification of seismic active periods and episodes in the temporal domain. In larger spatial context, the temporal sequence of strong earthquakes is not quite stochastic but a certain pattern of clustering with interspersed quiescence and active periods [1,2]. The regional seismic activity can be segmented into the seismic active periods and the seismic active episodes on the finer temporal scale [2]. * This work is supported by National Natural Science Foundation of China (No. 40225004). * Work partially supported by National Natural Science Foundation of China (No. 40501056).
643
644 Quantitative analysis of seismic active periods and episodes has important implications to the understanding and forecasting of long- and medium-term earthquakes. Due to the complexity of earthquakes, the studies of seismic activities often call for the seismologists' expertises with simple statistical indices [3,4]. To make the analysis more rigorous and results easier to evaluate, quantitative methods are often needed in conjunction with domain expertise [5]. Cluster analysis has thus become a common approach to study seismic activities [6], Due to the complexity and scale feature of earthquakes, the black-box-style clustering algorithms could not be generally applied. They also fail to give satisfactory answers to the following questions: 1) how many clusters? 2) how to evaluate the validity of a clustering result? An appropriate solution to these problems requires a powerful analytical method which gives a quantitative description, and a visualization method which assists experts to interactively visualize the quantitative results and judge according to their experience. Most of the clustering methods are sensitive to initialization and incapable of determining the optimal number of clusters. To overcome these difficulties, Leung et al. [7] proposed a clustering algorithm simulating our visual system which considers a dataset as an image with each light point located at a datum position. By modeling the blurring effect of our lateral retinal interconnections based on scale space theory, smaller light blobs merge into larger ones until whole image becomes one light blob at a low enough level of resolution. By identifying each blob with a cluster, the blurring process generates a family of clusterings along the hierarchy. This approach is computationally stable and insensitive to initialization. Since visualization was proposed in 1986 [8], geo-scientists have employed it in a variety of studies p]. The impetus for the use of visualization in geographical analysis is to interactively integrate the qualitative analysis based on geo-knowledge, which is difficult to depict by mathematics, and the quantitative computation which caters for massive geo-datasets and complex geo-processes [9,10], The identification of seismic active periods and episodes is actually the problem to which the interactive use of visualization and clustering by scale-space filtering can be aptly applied. The objective of this paper is to achieve such an integration. In Sec. 2, we describe the visualization of clustering by scale-space filtering and indices for cluster validity check. It is applied to segment seismic active periods and episodes in Sec. 3. The paper is then concluded by a discussion.
2. Methodology Scale-space theory was originated in digital signal processing [11]. It applies multiple-scale filtering (often a Gaussian filter with a scale parameter) to describe a digital signal or image on different scales resulting in a hierarchical structure. The process is similar to the visual process. By simulating our visual system, Leung et al. [7] developed a clustering algorithm based on scale-space filtering. To facilitate discussion, we first review the algorithm. 2.1. Clustering by Scale-Space Filtering Without prior assumptions, it is proven that one can blur the image in a unique and sensible way via the convolution of an image p(x), P(x, s), with the Gaussian kernel, i.e., P(x,o)=p(x)*g{x,o)=\p(x-y)-—re
*'ay
(1)
2jca
where g(x,a) is the Gaussian function; s is the scale parameter; (x^)-plane is the scale space; and P(x,s) is the scale-space image. There is a direct relation with neurophysiological findings in animals and psychophysics in man supporting this theory [12]. Leung et al. [7] deduced a numerical solution for following the blob centers (or maxima which represent the cluster centers) at each given scale s. The numerical solution can be interpreted as an iterative local centroid estimation. For higher computational efficiency, the algorithm uses the centroid of a cluster instead of data points in the cluster and becomes N
*(" + D = ^ Ikje
\*W-PJ\2
3-
±<±d_
(2)
2
°
where p} is cluster center j obtained at scale s,; TV, is the number of clusters at scale s{, kj is the number of data points in cluster j whose center is p/, and s =s,• s M . In practice, the {s,} sequence is constructed by S,-SM ~kSj.\ which guarantees the accuracy and stability of clustering [13]. The constant k is called the Weber fraction. Based on Weber's law in psychophysics, there is a lower bound for k since we cannot sense the difference between two images p{x, s^x)
646 and p(x, s,) when k is less than its Weber fraction. For instance, &=0.029 is enough in one-dimensional applications [7]. Based on Eq. 2, the hierarchical clustering algorithm is as follows: 1. Given a sequence of scales {s,} (/=0,1,2,...) with s o=0. 2. At s0=0, each datum is a cluster and its blob center is itself. Let j=l. 3. Cluster the data at scale sh Find the new blob center at j , for each blob center obtained at scale s,.i by Eq. 2. If two new blob centers are close enough, the old clusters disappear and a new cluster is formed. 4. If there are more than one cluster, let i=i+\ and go to step 2. 5. Stop when there is only one cluster. 2.2. Cluster Validity Checks in Clustering by Scale-Space
Filtering
The followed indices were defined in [7] to evaluate the validity of an obtained cluster and to facilitate the selection of the optimal clustering: • Lifetime. Lifetime of a cluster is the range of the logarithm scales over which the cluster survives. The mean of lifetime of clusters in a clustering is taken as the clustering lifetime. Longer lifetime means a more stable clustering because the real cluster should be perceivable over a wide range of scales. • Isolation of a clustering. This index measures the total isolation degree of clusters in a clustering. • Compactness of a clustering. This index measures the total degree of compactness of clusters in a clustering. Intuitively, a cluster is good if the distances between the data inside the cluster are small and those outside are large. For a good cluster, the isolation and compactness of the cluster are close to one. And the isolation and compactness of a good clustering should be as large as possible. 2.3. Visualization of Clustering by Scale-Space
Filtering
The visualization of clustering by scale-space filtering includes two phases which are inclined to visual representation and interactive analysis respectively. Different visualization techniques can be used because of their task-oriented characteristics [14]. In the first phase, the construction process of the scale space for clustering can be visualized naturally by a top-to-bottom tree-growing animation in 2D/3D views. Animation and interaction facilitate the generation of the original, qualitative cognition about the clustering in the whole scale-space. This phase suits the visual representation of the scale space. After the scale space is constructed, visualization based on the scale space and the indices for cluster validity can assist us to interactively construct, verify and revise at any scale our cognition (or assumption) about the optimal
clustering until the final result is obtained. Based on the quantitative indices, we can use the slider technique to select the scale of interest in free-style. The corresponding clustering result is shown by both the view of the scale space and the map or time sequence graph for interactively obtaining the optimal result. 3. Applications 3.1. Experimental Data In this application, we attempt to segment the periodic seismic activity of strong earthquakes in Northern China (34~42°N, 109~124°E), which forms a comparatively regional unit in whole for seismological analysis, via visualizing the clustering process by scale-space filtering. Considering the completeness of the strong earthquake catalogue, the datasets are chosen as the strong earthquakes (Ms>6.0) of 1290A.D.-2000A.D. which have 71 records [15]. 3.2. Temporal segmentation of Strong Earthquakes The scale space for the time sequence of earthquakes is depicted in Fig. 1. The number of clusters and the indices including Lifetime, Isolation and Compactness of clustering are shown in Fig. 2.
.
--..^LJCUD,
Figure 1. Temporal scale-space for earthquakes Scrutinization of the scale-space graph and indices calls for special attention to the patterns appearing in both the 59th~95th and the 6th scale steps (Fig. 2). In the 59lh~95th scale range, there are three clusters in the clustering with the maximum Lifetime, Isolation and Compactness of spreading across the longest scale range. It B thus the seismic active period recognized by the clustering algorithm (Fig. 3a). It actually corresponds to the Second, Third, and Fourth Seismic Active Periods singled out by the seismologists [16] (Table 1). In the 6* scale step, the number of clusters changes dramatically. After the 6th step, the change in clustering becomes comparatively smooth. This can be explained as that the earthquakes, which are temporally frequent preceding the 6th
648
10
20
30
50
M
60
70
80
90
scale step
a) Number of clusters changed with the scale step M 50 c
i i0
1 10
i ° (3 -10 o -20
a a
•3 -30
0
10
3
30
40
50
60
70
80
90
100
scale step b) Lifetime, Isolation and Compactness of clustering Figure 2. Indices of clustering along the time scale for earthquakes
II
( )K«
l n£<
1M
>X
a) 3 clusters in the 59,h~95,h scale range -t H
t—1 H H M
Ml.
™|j™»
*™«
Ml•
"
*
"
\
I j
1
r 1;
b) 17 clusters at the 6th scale step Figure 3. Ms-time plot of clustering results for earthquakes
i
"
'
step, merged rapidly as clusters when the observation scale increases in this scale range. When the time scale is larger than 6 and 7, however, clusters formed in more apparent relative isolations. Less clusters are formed in a relatively long scale range. The clustering result in the 6th scale step corresponds to the seismic active episodes recognized by the seismologists [16] (Fig. 3b; Table 1). Table 1 tabulates the seismic active periods and episodes recognized by our clustering algorithm versus that of the seismologists. It can be observed that the seismic active periods and episodes obtained by scale-space clustering are consistent with the results identified by the seismologists via their domain specific expertise, with the exception that the episodes of the Fourth Seismic Active Period recognized by our approach is not so consistent. It seems that there is quasi-periodicity of about 10~15 years for active episodes. Table 1. Seismic active periods and episodes obtained by the clustering algorithms and the Seismologists' expertise Active Period II III IV II
III
IV
Clustering result
Ref. [16]
Ref. [4]
1484-1730 1815-
1481-1730 1812-
2(?) 1 2 3 4 5 6 7 8 9 1 2 3 4 5
1484-1487 1497-1506 1522-1538 1548-1569 1578-1597 1614-1642 1658-1683 1695-1708 1720-1730 1815-1820 1829-1835 1855-1862 1880-1898 1909-1923
1481-1487 1501-1506 1520-1539 1548-1569 1580-1599 1614-1642
6
1929-1952
1921-1952
7
1966-1978
1965-1976
Active Episode
K?)
(Afe>6) 1290-1340(6) 1484-1730(31) 1815-(34) 1290-1314(5) 1337(1) 1484-1502(3) 1524-1536(2) 1548-1568(4) 1587-1597(2) 1614-1642(8)
1658-1695
1658-1695 (10)
1720-1730 1812-1820 1827-1835 1846-1863 1880-1893 1909-1918
1720-1730(2) 1815-1830(4) 1861 (1) 1879-1888(3) 1903-1918(4) 1922(1) 1929-1945(6) 1966-1983 (13) 1998- (2)
4. Conclusion Clustering by scale-space filtering can provide a series of clustering in different scales by simulating our visual systems. It is computationally stable and insensitive to initialization. It is also free from solving difficult optimization problems encountered by many clustering algorithms. The proposed algorithm constructs the clustering in scale space and computes the indices for cluster and clustering validity checks. And visualization assists researchers to derive clustering that is optimal in terms of computation and physical interpretation. The seismic active periods and episodes can actually be viewed as time scopes in which strong earthquakes in certain spatial region cluster as subsets on different temporal scales. The identification of seismic periods and episodes is a typical scale-related clustering problem to which clustering by scale-space filtering can be appropriately applied. The visualization of clustering provides a promising way to tightly couple background information and domain expertise with quantitative computation to make our solutions natural and meaningful. Since Gaussian scale-space theory is designed to be totally noncommittal, clustering by scale-space filtering cannot take into account a priori information on structures being worthy of preserving. In clustering by scale-space filtering, clusters tend to be spherical along the changing scale. This shortcoming can be alleviated by incorporating structure information in the visualization process. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.
Y. Kagan and D. Jackson, Geophys. J. Int. 104,117 (1991). Z. Ma and M. Jiang, Earthquake Research in China 3,47 (1987). M. Matthews and P. Reasenberg, Pure Appl. Geophys. 126,2 (1988). F. Gu, C. Zhang and J. Li, Earthquake Research in China 11, 341 (1995). Y. Kagan, Pure Appl. Geophys. 155,233(1999). C. Frohlich and S. Davis, Geophys. J. Int. 100,19 (1990). Y. Leung, J.-S. Zhang and Z.-B. Xu, IEEE T. Pattern Anal. 22,1396 (2000). A. McCormick,T. Defanti and M.Brown, Comput. Graphicsll, 103(1987). A. MacEachren and M.-J. Kraak, Comput. Geosci. 23,335 (1997). M. Gahegan, M. Wachowicz, M. Harrower and T.-M. Rhyne, Cartography and Geographic Information Science 28,29 (2001). T. Linderberg, IEEE T. Pattern Anal. 12,234 (1990). D. Hubel, Eye, Brain, and Vision, Scientific Am. Library, New York (1995). J. Koenderink, Bio. Cybern. 50,363 (1984). C. Qin, C. Zhou and T. Pei, Conf. AsiaGIS'2003, (2003). W. Huang, W. Li and X. Cao, Acta Seismological Sinica 7, 351 (1994). M. Jiang and Z. Ma, Earthquake 6, 5 (1985).
FUZZY APPROXIMATION NETWORK PERTURBATION SYSTEMS AND ITS APPLICATION TO RISK ANALYSIS IN TRANSPORTATION CAPACITY* ZOU KAIQI University key Lab of Information Science and Engineering of Dalian College of Information Engineering, Dalian University Dalian, Liaoning,
116622
University
P.R.China
By embedding the fuzzy neural networks, we construct a fuzzy approximation network perturbation system based on the human knowledge. A survey of the principle of this system is presented including the architectures and hybrid learning rules. In order to enhance the traffic productivity, using Fuzzy Logic Toolbox of MATLAB, wc establish a prediction modeling and apply it to risk analysis in transportation capacity. In terms of the simulation result, the modeling is quite good and decreases risk in transportation capacity.
1. Introduction Vagueness is a distinct characteristic of human thinking. It is not a defect of language, but rather an important source of clarity. Since Zadeh's fuzzy set theory was proposed, fuzzy logic has been successfully applied to control a pilot scale steam engine by Professor E.H. Mamdani in 1974, and a batch of products related to fuzzy technology was explored and applied in Japan. By now, fuzzy set theory and application has been focused on. In sec.2, the structure and algorithm of fuzzy inference system is reviewed. We promote the catastrophe fuzzy neural network system based fuzzy inference system in sec.3. In sec.4, the prediction transportation capacity of this system is explored using the MATLAB language in the field of traffic market to dress the problem that system modeling based on conventional mathematical tools is not well suited for dealing with ill-defined and uncertain systems such as traffic market.
This work is supported by National Nature Science Foundation of China, No. 60573072
651
652 2. Fuzzy Inference System Firstly, we introduce fuzzy if-then rules. Fuzzy if-then rules are the core part of a whole fuzzy inference system. Fuzzy sets and fuzzy operators are the subjects and verbs of fuzzy logic. But in order to say anything useful we need to make complete sentences. Fuzzy if-then rules are the things that make fuzzy if-then rules are the things that make fuzzy logic useful. Fuzzy if-then rules or fuzzy conditional statements are expressions of the form if x is A then y is B, where A and B are lingual values of the linguistic variables x and y in the universes of discourse X and Y, respectively. The if-part of the rule "x is A" is called the antecedent or premise, while the then-part of the rule "y is B" is called the consequent or conclusion. Note that the antecedent is an interpretation that returns a single number between 0 and 1, whereas the consequent is an assignment that assigns the entire fuzzy set B to the output variable y. Depending on the types of fuzzy reasoning that fuzzy if-then rules employed, most fuzzy inference systems can be classified into three types. Here we only discuss the first-order Sugeno fuzzy model, which has rules of the form if x is A then y=kx+r, where A and B are fuzzy sets in the antecedent, while k and r are all constants. Its merits lie in compactness and efficiency. Secondly, a fuzzy inference system is composed of five functional blocks (see Figure 1): Knowledge bases Data bases Fuzzification interface
Rule bases
X
Defuzzification interface
±
Decision-making unit Figure 1. Fuzzy inference system where, a rule base containing a number of fuzzy if-then rules; a database defines the membership functions of the fuzzy sets used in the fuzzy rules; a decision-making unit performs the inference operations on the rules; a fuzzification interface transforms the crisp inputs into degrees of matching with linguistic values; a defuzzfication interface transforms the fuzzy results of the inference into a crisp output.
653 In general, the rule base and the database are jointly referred to as the knowledge base. The steps of fuzzy reasoning (inference operations upon fuzzy if-then rules) performed by first-order Sugeno fuzzy model are: 1. Compare the input variables with the membership functions on the premise part to obtain the membership values of each linguistic label.(This step is often called fuzzification) 2. Through multiplication operator combine membership values on the premise part to get firing strength (weight) of each rule. 3. Generate the qualified consequent with a crisp value on the weight of each rule. 4. Aggregate the qualified consequent to produce a crisp output derived by weighted average defuzzification (This step is called defuzzification) For simplicity, we assume the fuzzy inference system under consideration has two inputs x and y and one output z. There are two rules: Rule 1: if x is Al and y is Bl then fl=plx+qly+rl, Rule 2: if x is A2,andyisB2 then f2=p2x+q2y+r2. The fuzzy reasoning by first-order Sugeno fuzzy model is illustrated in Figure 2
f1 = p1 x + q1 y + r1,   f2 = p2 x + q2 y + r2,
z = (w1 f1 + w2 f2) / (w1 + w2)
Figure 2. Fuzzy reasoning in the first-order Sugeno fuzzy model (firing strengths w1, w2; weighted-average defuzzification)
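To make the two-rule computation above concrete, here is a minimal Python sketch (not the authors' MATLAB code); the Gaussian membership functions, rule parameters and input values are illustrative assumptions.

```python
import math

def gauss(x, c, sigma):
    """Gaussian membership function with center c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_two_rules(x, y):
    # Rule 1: if x is A1 and y is B1 then f1 = p1*x + q1*y + r1
    # Rule 2: if x is A2 and y is B2 then f2 = p2*x + q2*y + r2
    w1 = gauss(x, c=2.0, sigma=1.0) * gauss(y, c=1.0, sigma=1.5)  # firing strength of rule 1
    w2 = gauss(x, c=5.0, sigma=1.0) * gauss(y, c=4.0, sigma=1.5)  # firing strength of rule 2
    f1 = 0.6 * x + 0.2 * y + 1.0   # consequent of rule 1 (p1, q1, r1 assumed)
    f2 = -0.3 * x + 0.8 * y + 2.0  # consequent of rule 2 (p2, q2, r2 assumed)
    # weighted-average defuzzification: z = (w1*f1 + w2*f2) / (w1 + w2)
    return (w1 * f1 + w2 * f2) / (w1 + w2)

print(sugeno_two_rules(3.0, 2.5))
```

Multiplication is used as the AND operator here, matching step 2 of the reasoning procedure above.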
3. Fuzzy Approximation Network Perturbation Systems Based on the Fuzzy Inference System
By embedding the first-order Sugeno fuzzy inference system into the framework of the fuzzy approximation network perturbation system, a multilayer feedforward network in which part or all of the nodes are adaptive (their outputs depend on parameters pertaining to these nodes), we obtain the architecture of the fuzzy inference system based on the first-order Sugeno fuzzy model. Figure 3 shows the equivalent of the fuzzy inference system shown in Figure 2. No weights are associated with the links between the nodes of two adjacent layers. The system employs a hybrid learning rule [1] that combines the gradient method and the least-squares estimate to identify the premise and consequent parameters, respectively.
Figure 3. Fuzzy approximation network perturbation system (five layers; layer 1 holds the membership functions A1, A2, B1, B2)
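The hybrid learning rule mentioned above fixes the premise (membership-function) parameters during the forward pass and estimates the consequent parameters by least squares. The following Python sketch illustrates only that least-squares step for the two-rule system of Figure 2; the normalized firing strengths and training data are hypothetical, and the gradient update of the premise parameters is omitted.

```python
import numpy as np

# Hypothetical training data: inputs (x, y) and target outputs z.
X = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.0], [4.0, 2.5],
              [5.0, 4.0], [2.5, 2.0], [3.5, 3.5], [4.5, 1.0]])
z = np.array([1.9, 2.2, 3.1, 3.6, 4.4, 2.6, 3.4, 3.0])

# Normalized firing strengths, assumed to come from fixed premise
# membership functions (layers 1-3 of the network).
w1_bar = np.array([0.8, 0.7, 0.5, 0.3, 0.2, 0.6, 0.4, 0.25])
w2_bar = 1.0 - w1_bar

# For z = w1_bar*(p1*x + q1*y + r1) + w2_bar*(p2*x + q2*y + r2),
# the consequent parameters enter linearly, so stack the regressors:
A = np.column_stack([
    w1_bar * X[:, 0], w1_bar * X[:, 1], w1_bar,   # p1, q1, r1
    w2_bar * X[:, 0], w2_bar * X[:, 1], w2_bar,   # p2, q2, r2
])
theta, *_ = np.linalg.lstsq(A, z, rcond=None)     # least-squares estimate
p1, q1, r1, p2, q2, r2 = theta
print(p1, q1, r1, p2, q2, r2)
```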
The system is called simplified if its fuzzy if-then rules have the form "if x is big and y is small, then z is d", where d is a crisp value.
4. Application to Risk Analysis in Transportation Capacity
At present, many prediction methods are used in the field of traffic. Because of the uncertainty of the traffic market, however, some prediction results do not stand up against the facts. In the following, we establish a prediction model in MATLAB using the transportation capacity data of the world bulk fleet for 1976-2004 (see Table 1).
Table 1. Transportation capacity of world bulk fleet (Million Dwt)

Year     1976   1977   1978   1979   1980   1981   1982   1983
Tonnage  234.6  268.1  303.7  348.5  399.9  449.0  489.9  514.2

Year     1984   1985   1986   1987   1988   1989   1990   1991
Tonnage  516.6  520.2  520.6  526.8  519.2  505.3  496.7  475.3

Year     1992   1993   1994   1995   1996   1997   1998   1999
Tonnage  468.9  467.0  473.4  486.9  500.1  511.4  517.4  524.4

Year     2000   2001   2002   2003   2004
Tonnage  525.9  537.7  549.6  560.4  563.2
The procedure for establishing the prediction model is as follows: 1. Treat the original data to obtain the yearly growth percentages. 2. Build the input-output prediction model by extracting 24 input-output data pairs of the format [x(i) x(i+1) x(i+2) x(i+3) x(i+4)]. 3. Use the MATLAB language to run the simulation on a PC 586; the simulation result is given in Table 2 below. 4. Analyze the result. Steps 1 and 2 are sketched in the code below.
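A minimal Python sketch of steps 1 and 2 (the authors used MATLAB; this only illustrates the data treatment, with the tonnage series taken from Table 1 and the last element of each window interpreted as the target):

```python
import numpy as np

# Yearly tonnage from Table 1 (1976-2004), Million Dwt.
tonnage = np.array([234.6, 268.1, 303.7, 348.5, 399.9, 449.0, 489.9, 514.2,
                    516.6, 520.2, 520.6, 526.8, 519.2, 505.3, 496.7, 475.3,
                    468.9, 467.0, 473.4, 486.9, 500.1, 511.4, 517.4, 524.4,
                    525.9, 537.7, 549.6, 560.4, 563.2])

# Step 1: growth percentages x(i) = (T(i) - T(i-1)) / T(i-1).
growth = np.diff(tonnage) / tonnage[:-1]          # 28 values (1977-2004)

# Step 2: sliding-window pairs [x(i) ... x(i+3)] -> x(i+4), 24 pairs in total.
inputs = np.array([growth[i:i + 4] for i in range(len(growth) - 4)])
targets = growth[4:]
print(inputs.shape, targets.shape)                # (24, 4) (24,)
```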
Table 2. The simulation result

Year | Original | Treatment | Output | Prediction
1976 | 234.6 | -      | -      | -
1977 | 268.1 | 0.1428 | -      | -
1978 | 303.7 | 0.1328 | -      | -
1979 | 348.5 | 0.1475 | -      | -
1980 | 399.9 | 0.1475 | -      | -
1981 | 449.0 | 0.1228 | 0.1227 | 448.99
1982 | 489.9 | 0.0911 | 0.0911 | 489.9
1983 | 514.2 | 0.0496 | 0.0496 | 514.2
1984 | 516.6 | 0.0047 | 0.0047 | 516.6
1985 | 520.2 | 0.0070 | 0.0070 | 520.2
1986 | 520.6 | 0.0008 | 0.0008 | 520.6
1987 | 526.8 | 0.0119 | 0.0119 | 526.8
1988 | 519.2 | -0.014 | -0.014 | 519.2
1989 | 505.3 | -0.027 | -0.027 | 501.3
1990 | 496.7 | -0.017 | -0.017 | 496.7
1991 | 475.3 | -0.043 | -0.043 | 475.3
1992 | 468.9 | -0.014 | -0.014 | 468.9
1993 | 467.0 | -0.004 | -0.004 | 467.0
1994 | 473.4 | 0.0137 | 0.0137 | 473.4
1995 | 486.9 | 0.0285 | 0.0285 | 486.9
1996 | 500.1 | 0.0271 | 0.0272 | 500.1
1997 | 511.4 | 0.0226 | 0.0225 | 511.4
1998 | 517.4 | 0.0117 | 0.0117 | 517.4
1999 | 524.4 | 0.0135 | 0.0134 | 524.3
2000 | 525.9 | 0.0029 | 0.0030 | 525.9
2001 | 537.7 | 0.0224 | 0.0223 | 537.7
2002 | 549.6 | 0.0221 | 0.0221 | 549.5
2003 | 560.4 | 0.0197 | 0.0197 | 560.3
2004 | 563.2 | 0.0050 | 0.0051 | 563.2
Figure 4 shows that after 146 epochs the RMSE (root mean squared error) was 0.000068. The desired and predicted transportation capacity curves in Figure 5 are essentially the same. The predicted values for the coming five years are given in Table 3, with transportation capacity increasing by an average of 0.0139 per year over that period. Therefore, this system can model the relationship between input and output quickly and precisely, and the prediction for the coming five years fits the trend of the world bulk fleet.
Figure 4. RMSE curve of the prediction model
Figure 5. Expected and predicted transportation capacity curves
Table 3. Prediction result for the coming 5 years (Million Dwt)
Year Tonnage Year Tonnage
2005 575.9 2008 593.9
2006 590.8 2009 603.4
2007 594.4 2010 613.4
5. Conclusions
We have described the neural-network-based fuzzy inference system, briefly introduced its architecture and algorithm, and in Sec. 4 applied it to predict transportation capacity. The prediction model adapts well to the uncertainty of the traffic market. However, the system has some limitations: 1. It supports only first-order Sugeno fuzzy inference systems. 2. It employs only the weighted-average defuzzification strategy. Furthermore, we have so far established prediction models only for the transportation capacity of the traffic market. For freight volume and freight rate, historical data are not easy to collect, and they are disturbed more easily than transportation capacity by all kinds of factors, such as political and economic ones, so predicting them will involve some difficulty. Further research remains a challenge.
References
1. Jang, J. R. 1993. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, No. 3: 665-685.
2. Mamdani, E. H. 1977. Application of Fuzzy Logic to Approximate Reasoning Using Linguistic Synthesis. IEEE Transactions on Computers, vol. 26, No. 12: 1182-1191.
3. Takagi, T. & Sugeno, M. 1985. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst., Man, Cybern., vol. 15: 116-132.
4. Cichocki, A. 1996. New neural network for solving linear programming problems. European Journal of Operational Research, 93(2): 244-256.
5. Draye, J. 1996. Dynamic recurrent neural networks: a dynamical analysis. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 26(5): 692-706.
6. Feng, S. & Xu, L. 1996. A hybrid knowledge-based system for urban development. Expert Systems with Applications, 10(1): 157-163.
7. Gen, M. 1998. Neural network technique for fuzzy multiobjective linear programming. Computers and Industrial Engineering, 35(3-4): 543-546.
8. Jacobsen, H. 1998. Genetic architecture for hybrid intelligent systems. Proceedings of the IEEE International Conference on Fuzzy Systems, New York: 709-714.
9. Wang, P. Z. & Li, H. X. 1997. Fuzzy Information Processing and Fuzzy Computers. Science Press: New York, Beijing.
10. Li, H. X. 2001. Fuzzy Neural Intelligent Systems. CRC Press, FL: New York.
11. Zou, K. Q. 2003. Fuzzy Decision Support Systems and Its Application. Advances in Systems Science and Application, 3(1): 52-54.
12. Zou, K. Q. 2003. Catastrophe fuzzy neural network model for natural fire. IEEE-ICMLC-2003: 552-554.
13. Zou, K. Q. 2003. Fuzzy symmetric group categories in system sciences. IEEE-NAFIPS-2003: 73-76.
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN THE FLOOD FORECAST*
LIHUA FENG
Department of Geography, Zhejiang Normal University, Jinhua 321004, China. E-mail: fenglh@zjnu.cn
JIA LU
Computer Science and Information System, University of Phoenix, FL 33076, USA
The "memory" and "simulation" abilities of artificial neural networks are adopted for flood forecasting, because a neural network can learn and record the complex functional relationship between input and output through training on historical data, without any explicit mathematical model. In this research the authors propose a new flood forecast system with related applications based on neural network methods. It shows good performance and efficiency, and it is expected that this system will become more sensitive and perform even better for flood forecasting.
1. Introduction
In 1998, China suffered serious flood damage in the Changjiang (Yangtze) areas, with losses of more than 2000 billion RMB. Therefore, a precise technique for flood forecasting has to be developed. In traditional forecasting, scientists usually set up mathematical models, graphics and related historical data to analyze and deal with the pattern recognition [1]. Many scientists have been studying the relationship between the dependent variable y and the independent variable x in y = f(x) to find models for flood research [2]. Artificial neural networks (ANN) have been successful in pattern recognition because of their advantages in self-learning, self-organizing, self-adapting and fault tolerance [3]. In this research, we propose a system for flood forecasting based on the principles and methods of artificial neural networks [4].
" This work was supported by Zhejiang Provincial Science and Technology Foundation of China (No. 2006C23066).
2. The Principles and Methodology
An ANN is a complex network consisting of a large number of simple neural cells. It was proposed on the basis of biological research on the human brain and can be used to simulate the activity of brain neurons. An ANN has a topological structure for information processing with parallel distribution; the mapping between input and output responses is obtained through combinations of nonlinear functions. Pattern recognition methodology can be used to analyze the algorithms of neural networks using past experience, neural cells, memory, fuzzy theory, nonlinearity and noisy data, without any mathematical model [5]. The neural network with error back propagation is called a BP network; its learning process consists of forward and backward propagation. In the forward pass, the sample signals go forward through each layer with the sigmoid function F(x) = 1/(1 + e^(-x)); the neural cells of each layer affect only the status of the next layer. If the expected output signals cannot be obtained at the output layer, the weights of each layer must be modified: the error of the output signals is propagated backward along the same path, and after repeated propagation the error falls within the required range. Suppose the network has m layers, and let y_j^m denote the output of node j in layer m, with y_j^0 = x_j, where x_j are the inputs. Let W_ij^m be the connection weight from y_i^(m-1) to y_j^m, and let θ_j^m be the threshold of node j in layer m. The training steps of the BP network are as follows:
① Set the weights and thresholds randomly in (-1, 1).
② Select (x^k, T^k) from the training data and feed the input variables into the input layer (m = 0), so that
y_j^0 = x_j^k  (for all nodes j),    (1)
where k indexes the training samples.
③ The signals propagate forward through the network with the following formula:
y_j^m = F(S_j^m) = F( Σ_i W_ij^m y_i^(m-1) + θ_j^m )    (2)
The outputs y^m of each layer are computed node by node, starting from the first layer, until the calculation is completed; F(s) is the sigmoid function (Equation 2).
④ Compute the error value of each node of the output layer (Equation 3); this error is obtained from the difference between the actual output and the required output:
δ_j^m = F'(S_j^m) (T_j^k - y_j^m)    (3)
⑤ Compute the error value of each node of the previous layers (Equation 4):
δ_j^(m-1) = F'(S_j^(m-1)) Σ_i W_ij^m δ_i^m    (4)
The error is back-propagated in this way layer by layer (m, m-1, ..., 1).
⑥ Update the weights and thresholds of each layer backward (Equations 5 and 6):
W_ij^m(t+1) = W_ij^m(t) + η δ_j^m y_i^(m-1) + α [ W_ij^m(t) - W_ij^m(t-1) ]    (5)
θ_j^m(t+1) = θ_j^m(t) + η δ_j^m + α [ θ_j^m(t) - θ_j^m(t-1) ]    (6)
where t is the iteration index, η is the learning rate (η ∈ (0, 1)), and α is the momentum constant (α ∈ (0, 1)).
⑦ Take the next training sample and repeat steps ② to ⑦ until the error (Equation 7) reaches the required precision:
E_k = (1/2) Σ_j (T_j^k - y_j^m)^2    (7)
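A minimal Python sketch of this training loop for a single hidden layer is given below (not the authors' code): the sigmoid, the delta rule of Equations (3)-(4), and the momentum updates of Equations (5)-(6) are implemented directly, while the network size and data are placeholders.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_bp(X, T, hidden=5, eta=0.85, alpha=0.60, epochs=10000):
    """Train a (n_in, hidden, n_out) BP network with momentum."""
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.uniform(-1, 1, (n_in, hidden));  b1 = rng.uniform(-1, 1, hidden)
    W2 = rng.uniform(-1, 1, (hidden, n_out)); b2 = rng.uniform(-1, 1, n_out)
    dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1)
    dW2 = np.zeros_like(W2); db2 = np.zeros_like(b2)
    for _ in range(epochs):
        for x, t in zip(X, T):
            # forward pass, Eq. (2); F'(s) = y*(1-y) for the sigmoid
            y1 = sigmoid(x @ W1 + b1)
            y2 = sigmoid(y1 @ W2 + b2)
            # output-layer and hidden-layer errors, Eqs. (3)-(4)
            d2 = y2 * (1 - y2) * (t - y2)
            d1 = y1 * (1 - y1) * (W2 @ d2)
            # weight/threshold updates with momentum, Eqs. (5)-(6)
            dW2 = eta * np.outer(y1, d2) + alpha * dW2; W2 += dW2
            db2 = eta * d2 + alpha * db2;               b2 += db2
            dW1 = eta * np.outer(x, d1) + alpha * dW1;  W1 += dW1
            db1 = eta * d1 + alpha * db1;               b1 += db1
    return W1, b1, W2, b2

# Example with toy normalized data (one input, one output):
W1, b1, W2, b2 = train_bp(np.array([[0.2], [0.5], [0.8]]),
                          np.array([[0.3], [0.6], [0.9]]), epochs=2000)
```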
After the training procedure of the neural network, the forecast information can be analyzed with the resulting weights and thresholds.
3. Results and Discussions
We use the peak stages at the upper and lower reaches stations on the Dadu River of China (without tributary rivers) as an example to describe flood forecasting with neural networks. Table 1 lists the peak stages of 16 measured floods at the upper and lower reaches stations [6]. We use the peak stage of the upper reaches station as the input variable and the peak stage of the lower reaches station as the output variable, so the input and output layers each have one node. The hidden layer is given 5 nodes according to the Kolmogorov law, and the topology of the flood-forecast neural network is therefore (1, 5, 1). For the speed of convergence, it is necessary to normalize the original data H_i (Equation 8):
H_i' = (H_i - H_min) / (H_max - H_min)    (8)
where H_max and H_min are the maximum and minimum values of the peak stage. From Equation 8, H_i' lies in the interval [0, 1]. After the original values were normalized and fed to the BP network, training was carried out with learning rate η = 0.85 and momentum constant α = 0.60. Samples 1-12 were used as training samples and samples 13-16 as test samples in order to test the trained network. After 100 thousand training iterations on the training samples, the network error was E = 0.006, below the expected precision, so the network converged. The fitted results are shown in Table 1, where the mean error e is 0.08 m and the maximum error e_max is -0.21 m.
Table 1. Peak stages at the upper and lower reaches stations on the Dadu River and the calculated results (unit: m).
NO. | Upper peak stage | Lower peak stage | Output | Fit value | Error
Training sample
1  | 2280.96 | 1309.75 | 0.5552 | 1309.76 | 0.01
2  | 2283.21 | 1311.05 | 0.9574 | 1311.12 | 0.07
3  | 2277.93 | 1307.88 | 0.0217 | 1307.95 | 0.07
4  | 2278.16 | 1308.04 | 0.0389 | 1308.01 | -0.03
5  | 2280.41 | 1309.44 | 0.4716 | 1309.47 | 0.03
6  | 2280.59 | 1309.39 | 0.4991 | 1309.57 | 0.18
7  | 2281.93 | 1310.30 | 0.7315 | 1310.35 | 0.05
8  | 2279.68 | 1309.05 | 0.3316 | 1309.00 | -0.05
9  | 2283.00 | 1310.93 | 0.9328 | 1311.03 | 0.10
10 | 2280.97 | 1309.94 | 0.5472 | 1309.73 | -0.21
11 | 2283.09 | 1311.26 | 0.9433 | 1311.07 | -0.19
12 | 2281.54 | 1310.09 | 0.6525 | 1310.09 | 0.00
Test sample
13 | 2280.99 | 1309.74 | 0.5599 | 1309.77 | 0.03
14 | 2282.00 | 1310.44 | 0.7491 | 1310.41 | -0.03
15 | 2281.59 | 1310.12 | 0.6644 | 1310.13 | 0.01
16 | 2278.21 | 1308.02 | 0.0436 | 1308.03 | 0.01
Because of this learned functional relationship between input and output, obtained through the "simulation" and "memory" of the trained network, we can use the network to forecast the peak stage at the lower reaches station. The results are shown in Table 1, where the test errors e are 0.03, -0.03, 0.01 and 0.01 m respectively. For flood motion there exists a relationship y = f(x) between the dependent variable y and the independent variable x, but it is practically impossible to describe this relationship with an explicit function, even for the simplest case of the peak stage of a non-tributary river [H_lower = f(H_upper)]. Therefore, we adopt the "memory" and "simulation" of neural networks for flood forecasting, because a neural network can learn and record the complex input-output relationship through training on historical data, without any mathematical model. In fact, neural network technology can be used to analyze both simple and complex situations in flood forecasting as follows:
(1) Forecast of the related peak stage (discharge) based on the parameters of different factors (equivalent to the related-variable graphics). These factors include the simultaneous stage (discharge) of the lower reaches station, the flood difference of the upper reaches station (discharge), the precipitation of the interval area, the lower backwater, and multiple tributary inflows, etc.; they are used only as input variables. Table 2 shows the forecast results for three peak stages at the Gushui and Shigou Stations on the Suijiang River in China. The first, second and third input variables are the peak stage at Gushui Station H_Gushui, the stage H_s at Shigou Station simultaneous with the Gushui peak, and the precipitation P of the interval area; the output variable is the peak stage at Shigou Station H_Shigou [6]. The mean error e is 0.07 m, the maximum error e_max is 0.19 m, and the forecast errors e of the three peak stages are 0.06 m, 0.06 m and -0.09 m respectively, so Table 2 shows satisfactory results.
Table 2. Forecast of the peak stages at the Gushui and Shigou Stations on the Suijiang River (unit: m).

NO. | H_Gushui | H_s | P (mm) | H_Shigou | Output | Value | Error
78 | 30.26 | 12.34 | 44  | 13.36 | 0.2416 | 13.42 | 0.06
79 | 30.22 | 13.15 | 72  | 13.70 | 0.3136 | 13.76 | 0.06
80 | 32.19 | 14.68 | 136 | 15.44 | 0.6597 | 15.35 | -0.09
(2) Forecast of the propagation time of the flood peak based on the parameters of the factors. The input variables are the same as above, but the output variable is the propagation time of the flood peak. (3) Forecast of precipitation-runoff from the factors. The input variables include the precipitation of each single station P_i (i = 1, 2, ..., n), the mean areal precipitation, the antecedent precipitation P_a, the water storage of the drainage basin before the precipitation W_0, the duration of the rainfall T, the rainfall intensity, the evaporation in the interval of the precipitation, and the flood discharge Q_0, etc. The output variable is the runoff. (4) Flood routing through a reservoir. The input variables include the precipitation of each single station within the time interval P_i (i = 1, 2, ..., n), the mean areal precipitation within the time interval, the mean discharge flowing into the reservoir within the time interval, and the previous water stage within the time interval, etc. The output variables are the stage at the front of the reservoir at the end of the time interval and the mean discharge flowing out of the reservoir within the time interval.
Neural network technology can also be used to analyze the relationships of flood peak-volume and stage-discharge. In this research the historical data were analyzed and computed, and the results show that the forecasts are satisfactory when the parameters are selected reasonably.
4. Conclusions
As described above, it is practically impossible to describe the relationship between the independent variable x and the dependent variable y with explicit functions for flood motion, and neural network technology can be used as an alternative method to solve this problem. An artificial neural network completes the information processing of the network through the interaction of its neural cells; the mapping between stimuli and estimated responses of input and output is obtained through combinations of nonlinear functions. It has advantages in self-learning, self-organizing, self-adapting and fault tolerance, and is well suited to application in flood forecasting, as this research has demonstrated. In this research the authors proposed a new flood forecast system and developed new software with related applications based on neural network methods, which showed good performance and efficiency. Nevertheless, there still exist some problems for further study on the flood forecast, and it is expected that this system will become more sensitive and perform even better for flood forecasting with neural networks.
References
1. U. R. Acharya and P. S. Bhat. Classification of heart rate data using artificial neural network and fuzzy equivalence relation. Pattern Recognition, 36(1), 61-68 (2003).
2. M. Amitabha. Application of visual, statistical and artificial neural network methods in the differentiation of water from the exploited aquifers in Kuwait. Hydrogeology Journal, 11(3), 343-356 (2003).
3. L. Konstantin and G. L. Norman. Application of an artificial neural network simulation for top-of-atmosphere radiative flux estimation from CERES. Journal of Atmospheric and Oceanic Technology, 20(12), 1749-1757 (2003).
4. J. C. Zhou, Q. S. Zhou and P. Y. Han. Artificial Neural Networks, the Sixth Generation Computation Accomplishment. Scientific Popularization Publisher, 47-51 (1993).
5. G. M. Brion and S. Lingireddy. Artificial neural network modelling: a summary of successful applications relative to microbial water quality. Water Science and Technology, 47(3), 235-240 (2003).
6. H. L. Li, L. C. Li and J. F. Yan. Hydrological Forecast. China Waterpower Press, 10-13 (1979).
INTEGRATED MANAGEMENT PATTERN OF MARINE SECURITY SYNTHESIS RISK
WANG YUE
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
REN XUE-HUI
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
DING YONG-SHENG
Institute of Marine Science, Shanghai Marine University, Pudong Road 1550, Shanghai, China
YU CHANG-YING
Department of Public Administration, Dongbei University of Finance and Economics, Jianshan Street 217, Dalian, Liaoning, China
The meaning of marine security is interpreted from the point of view of synthesis risk, and a management system is designed by integrating 4S (GIS, GPS, RS and DSS). On this basis, an integrated management pattern of marine security synthesis risk is provided, with the management system as the management tool. Through risk calculation and fuzzy appraisal, the harm and loss brought by a risk can be estimated, and forecasting or prewarning of the risk can also be achieved. Through the response mechanism of emergency management, responding to and reducing the risk influence can be realized. The whole process depends on the system that has been built.
1. Introduction
Security and risk are two aspects of one thing, and it is necessary to strengthen the management and prevention of risk for the sake of security.¹ The sea, an important constituent of the global life system, is not only a strategic development base of biological resources, water resources, mineral resources and sea energy, but also a new space for the sustainable development of humanity.² However, while people obtain rich economic benefits, the sea suffers serious safety threats at the same time. Therefore, it is urgent to manage the sea with scientific methods, reduce the synthesis risk coefficient of marine
security, and realize the sustainable development of the marine economy.
2. Marine Security Synthesis Risk
2.1. Marine Security
Marine security is not limited to the security of resources and environment; it also involves all kinds of social relations and social problems related to marine resources and the environment,³ especially territory security and national sovereignty. Marine security can be divided into territory security, environment security, resource security, marine activity security and so on.
Territory security. The sea is usually taken as an extension of the national territory, and its security directly concerns national sovereignty and territorial integrity. As the front platform of national security, whether or not an incident has occurred, marine security leaves various hidden troubles, including significant war harm, sea disasters and so on.*
* DING De-wen, LV Ji-bin, etc. Think about the National Marine Ecological Environment Security. Report on Marine Economic Geography Forum in 2004. Dalian. 2004, 9.
Resource security. At present, marine resources are strongly affected and destroyed in the course of development. Take petroleum as an example: scientists have roughly estimated that about 500,000 tons of petroleum enter the world's seas by natural processes every year, but the petroleum discharged by human activities such as petroleum transport, ship accidents and marine petroleum mining greatly surpasses this number.
Environment security. Environmental pollution originates mainly from two aspects: life activities and production activities. Pertinent data indicate that the larger fishery pollution and damage accidents in China exceeded 1000 in 2001, with direct economic losses amounting to 350 million Yuan.⁴ Increasingly serious pollution has brought extremely disadvantageous consequences to the marine environment.
Marine activity security. Three kinds of marine activity mainly influence marine security: sea transportation, sea traveling and marine production. In the process of transportation, besides the ships' own fuel polluting the sea, accidents such as petroleum leaks, dangerous-material explosions and random dumping of waste may also create security threats. Sea traveling seriously influences and destroys marine resources and pollutes the marine environment. As the main arena of marine cultivation, people face security risks not only from production but also
from the sea itself.
2.2. Synthesis Risk Management
At present, Chinese marine security synthesis risk management is still in its infancy. The existing risk management pattern has the following characteristics:⁵ 1) Each individual or department often independently adopts certain countermeasures against risk, lacking a systematic and global character. 2) Risk management is basically in a passive position. 3) Risk management is often instantaneous and discontinuous; management is carried out only when the risk arrives. 4) Systematic and scientific theory and methods to guide risk management are lacking.
3. Integrated Management of Marine Security Synthesis Risk
The integrated management pattern of marine security synthesis risk should contain such parts as monitoring, prewarning, appraisal, emergency response, administration and so on; it should also involve people, wealth, material, etc.
3.1. Integrated Management Idea
The integrated management idea of marine security synthesis risk is shown in Fig. 1.
Fig. 1 Management idea (marine security synthesis risk: monitoring of the risk course; risk assessment, risk calculation and risk forecasting; promulgating prewarning alarms; creating and implementing the pre-planned scheme; corresponding departments combining for emergency response via the Internet; tactics of rescue and recovery; continual monitoring of risk and loss)
3.2. Integrated Management Tool
The management and emergency decision-making system for marine security synthesis risk has been constructed by seamlessly integrating 4S: Geographic Information System (GIS), Three-dimensional Visible System (VS), Expert System (ES) and Decision-making Support System (DSS). MapInfo 7.0 is taken as the foundational platform, SQL Server 2000 stores the resource entity data, and secondary development was carried out with VC++ under Windows XP. A C/S + B/S structure is applied as the system running environment, with a server (SUN 3000, 450) as the hardware. The system mainly has the functions of risk monitoring, risk appraisal and prewarning, management decision, and emergency response.
3.3. Systemic Database Design
The systemic database design is shown in Table 1.
Table 1. Design of database

Database | Data table | Line name
User | Usertable | ID, User's Name, Password, Create Date
Data of Study Area | Environmental Foundational Information; Online Information; Archives Information | Atmosphere, Seawater, Biota, Weather, Human Activity, Public Security, Medical Treatment, Traffic, Salvage, Army, Community, Input Record, Output Record, Historic Files, Policy Files
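A minimal sketch of how the user table in Table 1 might be created, using Python's built-in sqlite3 purely for illustration (the actual system stores its data in SQL Server 2000; the column names follow Table 1 and are otherwise assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # stand-in for the SQL Server 2000 database
conn.execute("""
    CREATE TABLE Usertable (
        ID          INTEGER PRIMARY KEY,    -- line names taken from Table 1
        UserName    TEXT NOT NULL,
        Password    TEXT NOT NULL,
        CreateDate  TEXT
    )
""")
conn.execute("INSERT INTO Usertable (UserName, Password, CreateDate) VALUES (?, ?, ?)",
             ("admin", "secret", "2005-09-01"))
for row in conn.execute("SELECT ID, UserName, CreateDate FROM Usertable"):
    print(row)
conn.close()
```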
3.4. Integrated Management Pattern
3.4.1. Risk Monitor
With technology developing unceasingly, people are already able to monitor and forecast security risks. The system can monitor marine disaster risk. Furthermore, the probability that a disaster may occur can be forecasted by the fuzzy judgment model built into the system. The risk computation is:
T(q) = [ t_ij(q) ]  (i = 1, ..., n; j = 1, 2, 3, 4),    W = (ω_1(q), ω_2(q), ω_3(q), ω_4(q))
Here t_ij(q) (i = 1, 2, ..., n; j = 1, 2, 3, 4) expresses the probability that a disaster of rank j occurs in sea area i when the disaster factor P_m appears. Based on this result, the superintendent may forecast the occurrence of disaster risk and arrange preparation at an earlier time.
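As a rough illustration of this fuzzy judgment step, the Python sketch below aggregates a judgment matrix [t_ij(q)] into grade probabilities; the sea-area weights and membership values are hypothetical, and since the paper does not specify the aggregation operator, a weighted average is assumed (a max-min composition is also shown for comparison).

```python
import numpy as np

# Hypothetical judgment matrix t[i][j]: probability that a disaster of rank j
# (j = 1..4) occurs in sea area i (i = 1..n) when the disaster factor appears.
T = np.array([
    [0.10, 0.30, 0.40, 0.20],
    [0.05, 0.25, 0.50, 0.20],
    [0.20, 0.40, 0.30, 0.10],
])

# Hypothetical weights of the n sea areas (assumed, not from the paper).
A = np.array([0.5, 0.3, 0.2])

# Weighted-average aggregation into W = (w1(q), w2(q), w3(q), w4(q)).
W = A @ T
print(W)

# Max-min composition, common in fuzzy comprehensive evaluation:
W_maxmin = np.max(np.minimum(A[:, None], T), axis=0)
print(W_maxmin)
```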
3.4.2. Risk Estimation and Prewarning
In order to alleviate or reduce the safety and economic threats brought by an accident or disaster, the risk harm degree and loss must also be appraised using the systematic management tool:
π_L(l, p) = sup_{d ∈ D} { r^(3)(d, β, l) ∧ π_D(d, p) ∧ π_B(β) }
where π_D(d, p) is the fuzzy risk of the object of the hazard effect, and r^(3)(d, β, l) is the relation between destructiveness and loss degree,
the working efficiency may be improved effectively. However, because the area concerned with risk management is wide and many departments are involved, the pattern needs to be perfected and supplemented when it is applied in practice.
References
1. Ren XH, Wang Y. Design of the administrative decision system of marine environmental disasters' emergency & rescue. Proceedings of the Fifth Annual IIASA-DPRI Forum on Integrated Disaster Risk Management: Innovations in Science and Policy, 2005.9; BNU, NDRCC, DPRI, IIASA.
2. Jia YQ, Li R. Marine tour and marine tour resource class. Ocean Development and Management 2005; 2: 77-81.
3. Zhang XL. A brief research on the causes and mechanism in tour security accidents. Economic Geography 2003; 1: 542-546.
4. Zhao JH. Investigation and analysis of fishery production and ecological environment. Chinese Marine Lives 2004; 2: 28-29.
5. Yang WM, Yang ND, Jiang JJ. Study on integrated risk management pattern and references of Canada. Chinese Science and Technology Forum 2004; 3: 133-137.
6. Huang CHF. Risk assessment of natural disaster theory & practice. Beijing: Science Press, 2005: 122-123.
7. Xie HX, Hu QH. Research and development of environment emergency management. Environment Pollution and Prevention 2004; 26(1): 44-45.
8. Ren XH, Wang Y. Designing the system of precaution and emergency rescue decision for touring safety accidents of coastal city. Progress in Geography 2004; 24(4): 123-128.
RISK ANALYSIS AND MANAGEMENT OF URBAN RAINSTORM WATER LOGGING IN TIANJIN
HAN SUQIN
College of Environmental Science and Engineering, Nankai University, 94 Weijin Road, Hexi District, Tianjin 300071, China; Tianjin Meteorological Institute, Tianjin 300074, China
XIE YIYANG
Tianjin Meteorological Institute, Tianjin 300074, China
LI DAMING
College of Civil Engineering, Tianjin University, Tianjin 300072, China
Abstract: This paper analyzes the urban rainstorm water logging disaster in Tianjin city based on statistics and numerical simulation. Firstly, the basic theory of the urban rainstorm water logging mathematical model is introduced and used to simulate various rain processes according to the features of the rainstorm and the draining rules. Secondly, the water logging disaster distribution and its influence on traffic are preliminarily evaluated. Finally, some management and mitigation measures for the urban rainstorm water logging disaster are discussed.
Key words: urbanization, urban rainstorm water logging, disaster, mathematical model, risk analysis
1. Introduction
On Dec. 11, 1989, the United Nations declared the last ten years of the 20th century the International Decade for Natural Disaster Reduction, and the reduction of natural disasters has attracted worldwide attention. Flood and water logging is the most predominant disaster, whose threats and damages are the most striking. Most countries and regions in the world, such as the US, Canada, Japan and China, have been suffering from flood and water logging hazards. Flood hazards are caused by many factors such as rainstorms, mountain torrents and the bursting of dams, among which the rainstorm is the most frequent factor with the biggest damage [1]. Since the industrial revolution, many countries have stepped up the process of urbanization, which has brought economic prosperity while at the same time raising the risk of flood hazards [2-3]. Tianjin, a mega-city in north China, is a region with a high risk of urban storm water logging. Human activity changes the characteristics of urban hydrology as the local economy develops at high speed. With the reduction in the number of ponds and lakes in the area, their capacity for regulating and storing
rain water has weakened. The increase of concrete area also reduces the infiltration of rain water into the soil, and the resistance coefficient of the flow is lowered when rain water flows along streets. Furthermore, the urban heat island effect increases the frequency and intensity of rainfall in the city. The old design standards can no longer meet the needs of supplying and draining water in the city. Urban rainstorm water logging may have serious impacts on people's life, e.g. traffic jams, flooded houses, etc.; thus risk analysis for urban water logging plays an important role in urban planning and sustainable development [4-5]. This paper analyzes the distribution of the rain grades in Tianjin based on the mathematical model of urban water logging developed by the Tianjin Meteorological Institute and Tianjin University; the accumulated surface water under different rainfall intensities is simulated, and the risks of urban water logging are analyzed and estimated.
2. Risk Analysis of Urban Rainstorm Water Logging
The methods for assessing disasters from rainstorm water logging include probability, investigation and statistics, and mathematical simulation. With the development of computer technology, mathematical models have been used more and more frequently to assess the disasters. In the model of this research, the observed rainfall or radar products are interpolated to each grid cell to obtain the areal precipitation, acting as dynamic raining boundary conditions, so the model can not only reduce the inaccuracy but also forecast storm water logging.
2.1 The Basic Theory of the Mathematical Model
The ground flows and river flows in the city are the main objects to be simulated, so the governing equations are the two-dimensional unsteady flow equations, i.e. the continuity equation and the momentum equations:
∂H/∂t + ∂M/∂x + ∂N/∂y = q    (1)
∂M/∂t + ∂(uM)/∂x + ∂(vM)/∂y + gH ∂Z/∂x + g n² u √(u² + v²) / H^(1/3) = 0    (2)
∂N/∂t + ∂(uN)/∂x + ∂(vN)/∂y + gH ∂Z/∂y + g n² v √(u² + v²) / H^(1/3) = 0    (3)
Here H is the water depth; Z = Z0 + H, where Z0 is the height of the underlying surface; q is the source and sink term, the sum of the effective rain intensity and the draining intensity; M and N are the flow discharges in the x and y directions, respectively; n is the roughness coefficient; and g is the gravitational acceleration. In the model a non-structural irregular grid technique is adopted to divide the whole computation area into cells; a grid cell may be a triangle, a quadrangle or a pentagon. The boundaries around the cells are defined as passages, and their normal direction can be chosen arbitrarily. The continuity equation is discretized according to the finite volume method, while the momentum equations are simplified and then discretized, and a time-alternating method is adopted to calculate water depths and discharges. The waterproof constructions in the city, such as buildings, levees and railroads, are simplified and simulated as continuous or gapped dikes lying on the passages. Discharges through this type of passage are calculated by the broad-crested weir flow equation, that is,
Q_t = m σ_s √(2g) H_w^(3/2)    (4)
where Q_t is the discharge per unit width, m is the discharge coefficient, σ_s is the submergence coefficient, and H_w is the water depth at the weir crest.
2.2 Frequency Distribution of Rain Grade in Tianjin City
In the analysis, disasters from different rain grades receive particular attention, so the paper analyzes the distribution features of the rainfall grades in Tianjin. For convenience, precipitation above heavy rain is divided into seven grades of 25 mm each. The frequency distribution of the rain grades is obtained from the observed rainfall from May 1918 to October 1998 (Table 1), and it shows a good linear relation between the logarithm of the rain frequency and the rain grade (Fig. 1). The empirical expression relating frequency and rain grade is obtained through further analysis:
Ln(N) = 6.623028 - 0.9906966 × D    (5)
where D is the rainfall grade and N is the frequency of that grade. The return period of the rainstorm (T) is the reciprocal of the storm probability (P):
T = 1 / P    (6)
With the above expressions we can calculate the frequency and the return period of each rainfall grade, as sketched below.
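A small Python sketch of this calculation (not from the paper): it evaluates Equation (5) for each grade and converts the result into a return period, here assuming the 81-year record length so that T = 81/N, which reproduces the order of magnitude of the values in Table 1.

```python
import math

RECORD_YEARS = 81           # May 1918 - October 1998, as stated in the text
GRADE_RANGES = ["25-49.9", "50-74.9", "75-99.9", "100-124.9",
                "125-149.9", "150-174.9", "175-199.9"]

for D, label in enumerate(GRADE_RANGES, start=1):
    N = math.exp(6.623028 - 0.9906966 * D)   # Equation (5): fitted frequency
    T = RECORD_YEARS / N                     # return period in years (assumed T = record / N)
    print(f"grade {D} ({label} mm): N = {N:6.2f}, T = {T:7.3f} a")
```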
Table 1. Frequency Distribution of Rain Grade in Tianjin City

Precipitation (mm) | Frequency (N) | Calculated frequency (N) | Return period (a)
25-49.9   | 327 | 279.3 | 0.2477
50-74.9   | 96  | 103.7 | 0.78
75-99.9   | 38  | 38.5  | 2.103
100-124.9 | 10  | 14.3  | 5.664
125-149.9 | 7   | 5.3   | 15.255
150-174.9 | 2   | 1.97  | 41.083
175-199.9 | 0   | 0.732 | 110.642
Table 1 shows that the calculated frequency of each rain grade is similar to the observed one, so the fitted equation can be applied to forecasting extreme storm rainfalls. Grade 2 (50-74.9 mm) appears on average more than once a year, while grade 6 (150-174.9 mm) appears about once every 40 years. Although grade 7 (175-199.9 mm) never appeared in the 81-year record, its occurrence cannot be excluded, and its return period is estimated as about 110 years.
2.3 Evaluating the Schemes of Rainstorm Water Logging
Based on the above analysis, schemes for calculating the maximum water depth in Tianjin are established as follows: (1) Intensive precipitation processes are chosen to simulate the maximum water depths for grades 2, 3, 4, 5 and 6 respectively. The selection principle is that hourly precipitation equal to or greater than 20 mm occurs at least once, because the rainfalls that cause urban water logging disasters are short-duration precipitation. (2) Because grade 7 rainfall never appeared in the 81 years of statistical data, it is assumed that 180 mm of rainfall is distributed evenly over 9 hours when simulating the grade 7 rainfall process. (3) The degrees of water logging in different areas are compared on the assumption that the rainfall is distributed uniformly over the area of Tianjin city; although precipitation is not actually distributed uniformly, rainfalls of similar grade may occur anywhere from the viewpoint of risk analysis. Note: the simulation experiments did not take into consideration the influence of secondary circulation on the precipitation distribution.
2.4 Analysis of the Results
The simulation results reveal that the rainfall intensity results in urban water logging disasters of different degrees in different regions. When the rainfall increases
in intensity, the extent of the disaster expands in most regions and the disaster risk is enhanced as well. The distribution of water logging risk for urban rainfalls in Tianjin is shown in Table 2. The disaster risks are mainly in classes I and II; class III risk does not appear until the rainfall amount reaches 150 mm. The return period for rainfall amounts of 50-124.9 mm is 1-5 years, so these risks occur easily; the corresponding ratio of class II risk usually reaches 19-36%, so domestic life and city transportation suffer seriously, and drainage becomes very important for reducing the disasters.
Table 2. Distribution of Water Logging Risk for Urban Rainfall in Tianjin

Disaster extent | 50-74.9 mm | 75-99.9 mm | 100-124.9 mm | 125-149.9 mm | 150-174.9 mm | 175-199.9 mm
I   | 80.5% | 69.4% | 63.9% | 25%  | 16.7% | 19.4%
II  | 19.5% | 30.6% | 36.1% | 75%  | 80.5% | 75%
III | 0     | 0     | 0     | 0    | 2.8%  | 5.6%
3. Loss Evaluation of Storm Water Logging's Impact on Traffic
Along with the expansion of the city, the task of reducing urban rainstorm water logging disasters becomes more and more important, so it is necessary to carry out risk analysis in order to alleviate the damage of the disasters. Different urban rainstorm water loggings cause different disasters, which can be indicated by the maximum water depth and the subsiding time of the storm water logging. In this paper, 0.3 m is adopted as the depth threshold for affecting traffic and 0.8 m as the depth for complete interruption. The subsiding time can be separated into several grades: 1-3 h, 3-8 h, 8-24 h, and >24 h. Because long-duration storm water logging has seldom appeared in recent years thanks to the improved drainage capability, only the maximum water depths are considered in establishing the classes of disaster extent (Table 3).
Table 3. Classification of Disaster Extent

Class of disaster | Calculated water depth | Influence on transportation
0   | <0.05 m    | No influence
I   | 0.05-0.3 m | Traffic jam (cars can drive, but speed is reduced)
II  | 0.3-0.8 m  | Partly interrupted
III | >0.8 m     | Completely interrupted
With help from the Tianjin Transportation Bureau, we investigated the traffic discharges on the main highways, calculated the accepted disaster index (Eq. 7), and divided the acceptability into 3 grades (Table 4):
index = 1 - auto discharge ÷ maximum discharge    (7)

Table 4. Acceptability of the Main Highways in Tianjin City

Acceptability grade | Accepted index | Car discharges
I   | >0.6    | <4000/hour
II  | 0.3-0.6 | 4000-7000/hour
III | <0.3    | >7000/hour
The risk analysis of water logging on traffic is based on the acceptabilities and the disaster extents. The losses from the disasters are calculated by the following expression:
Losses from disasters = Class of disaster extent × Acceptability grade    (8)
Fig. 1: Assessment of water logging losses to traffic (loss index, 0-10, versus grade of disaster, 0-3, shown for acceptability levels 1-3)
4. Managing the Urban Storm Water Logging Disaster
It is important to strengthen defensive construction to alleviate water logging disasters. This work should begin with city planning and improve the acceptability of the city. The suggestions are as follows: (1) rebuild the urban drainage system; (2) avoid filling lakes for new land and preserve natural lakes as far as possible; (3) build reservoirs to alleviate the burden on the rivers; (4) forbid over-extraction of groundwater and avoid ground subsidence; (5) provide enough underground facilities to ensure the safety of personnel and property; (6) increase green land, which ameliorates the microclimate of the city.
5. Conclusions
The reduction of urban rainstorm water logging is of great importance to sustainable development. This paper analyzes the risk of urban rainstorm water logging facing Tianjin city through probability, investigation and statistics, and numerical simulation. Firstly, the basic theory of the urban rainstorm water logging mathematical model was introduced and used to simulate various rain processes according to the features of the rainstorm and the draining rules. Secondly, the water logging disaster distribution and its effects on traffic were preliminarily evaluated. Finally, some management and mitigation measures for the urban rainstorm water logging disaster were discussed.
References
1. HUANG Ping and ZHAO Jiguo, A study on distributed hydrological model of basin and applied prospect [J]. Hydrology, (5): 5-9 (1997, in Chinese).
2. Qiu Jinwei, Chen Hao, Liu Shukun, Urbanization and urban flood in Shenzhen city [J]. Journal of Natural Disasters, 7(2): 67-73 (1998, in Chinese).
3. M. H. Hsu, S. H. Chen and T. J. Chang, Inundation simulation for urban drainage basin with storm sewer system [J]. Journal of Hydrology, 234(2): 21-37 (2000).
4. QIU Jinwei, LI Na, CHENG Xiaotao, XIA Xiangao, The simulation system for heavy rainfall in Tianjin City. Journal of Hydraulic Engineering, (11): 34-42 (2000, in Chinese).
5. Li Daming, Zhang Hongping, Li Bingfei et al., Basic theory and mathematical modeling of urban rainstorm water logging. Journal of Hydrodynamics, Ser. B, 16(1): 17-27 (2004).
STUDY ON ENVIRONMENTAL RISK INFLUENCE FACTOR OF TONGLIAO*
REN XUE-HUI
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
LI YUAN-HUA
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
TIAN HONG-XIA
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
WANG YUE
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
Tongliao faces such environmental risk problems as soil erosion, land desertification and so on. The factors affecting the region's environmental risk are analyzed quantitatively using the principal components analysis method. The results indicate that the dominant factors causing the environmental risk are industrial gross output, sewage treatment rate, chemical fertilizer application, cultivated area and so on; the environmental risk is the joint result of natural and human factors, with the human factor being the more important; and both natural and human factors influence the environment through the threatening function.
1. Introduction
Tongliao is located at the center of the Horqin sandy land in Inner Mongolia, in the interlaced farming and animal husbandry area of northeast China. Its environmental resource system displays strong sensitivity and inherent instability under natural and artificial perturbation; the ecological equilibrium often suffers destruction, and environmental degeneration is obvious.¹ With industry and agriculture developing rapidly, the irrational use of water and land resources has become even more serious, and drought occurs frequently.
* This work is supported by the National Natural Science Foundation of China, No. 40271051.
This makes the environmental risk problems prominent, such as the dropping of the groundwater level, the destruction of vegetation, and the sanding and salinization of land,² and the sustainable development of the regional environment has been seriously threatened. As Tongliao is a typical area of the northeast farming and animal husbandry interlaced zone, the dominant factors are obtained by analyzing its environmental risk influence factors, which can provide a scientific basis for environmental risk appraisal and management in this region.
2. Characteristics of Environmental Risk
2.1. Shortage Risk of Water Resources
2.1.1. Quantitative Shortage of Water Resources
According to the relevant statistics,³ the total quantity of Tongliao's surface water resources is 264 million cubic meters, of which the available water comprises approximately 30%-50%. The groundwater resource is only 352 million cubic meters, of which the exploitable quantity is only 17-21 hundred million cubic meters. The per capita water resource is 1460 m³, less than 2/3 of the national level, so Tongliao is one of the cities in China that is seriously short of water. Because of this situation, it is extremely sensitive to climatic change: once the precipitation decreases slightly, or the transpiration rate rises slightly, a serious insufficiency of water resources may be caused. Tongliao's aridity index is between 1.25 and 1.5; the yearly average rainfall is between 338 and 446.8 mm, but the yearly transpiration rate is between 1760 and 2000 mm, 3-6 times the precipitation, and the transpiration from February to May is 11.6 times the precipitation of the same period.¹ Since the founding of the nation, the temperature of Tongliao has risen obviously, at a rate of 0.32 °C/10a, while the yearly precipitation has been decreasing at a rate of 4.5 mm/10a.³ As a result of successive droughts the groundwater recharge has declined, while the water demand of daily life and industry increases and groundwater mining expands; therefore the groundwater level has dropped year by year, funnel areas have appeared in some places, and the ecological environment has worsened. Analysis of the dynamic data of 80 groundwater monitoring wells from 1990 to 2002 shows that the over-mined area of shallow groundwater amounts to 2400 square kilometers, accounting for 75% of the total area; within it, the level in the Horqin sandy land has dropped 10-15 m, and the other banners have all dropped 2-3 m.
2.1.2. Qualitative Shortage of Water Resources
Because the water environment is seriously polluted, the available water resources are decreasing quickly. With the expansion of industrial production, the wide use of chemical fertilizer and the leakage of domestic sewage, both the surface water and the groundwater of Tongliao suffer from pollution. For example, in 2003 the discharges of industrial waste water, waste gas and waste residue in Tongliao reached 1.07794 million tons, 254.78 cubic meters and 1,630,700 tons respectively, increases of 53.4%, 9.4% and 6.4% compared with 2001. In 2003 the chemical fertilizer application was 203,800 tons, 4.5 times that of 1990. With the water consumption of the population increasing, the sewage discharge also increases correspondingly; the multi-year mean leakage reaches 5,000,000 m³ per year.² Without doubt, water pollution has enlarged the contradiction of the regional resource shortage.
2.2. Soil and Vegetation Degeneration Risk
Tongliao's soils are made up of wind-sandy soil, marsh soil, meadow soil, saline-alkaline soil and chestnut soil. The wind-sandy soil is thick and easily cultivated, but its organic content is low, its texture is loose and its granular structure is poor, so it is extremely easily eroded by wind; Tongliao is thus one of the areas most seriously eroded by sand. Because of compound wind-water erosion, the eroded area in southern Tongliao reaches 23,483 km², accounting for 79.3% of the total area, and the yearly soil erosion modulus reaches 1000-8000 t per square kilometer.² The total area of Tongliao is 60,000 square kilometers, of which forest accounts for 14.45%, with low coverage and poor protective capability. The grassland accounts for 58.8% of the total area; the area is large but the quality is poor, with low livestock-carrying capacity, and the degenerated lawn area accounts for 45.4% of the total.⁵ According to the data,⁶ the sanded prairie area amounted to 1,558,800 hectares in 1997, accounting for 54.83% of the total prairie area, and by 2000 the total prairie area was 29.34 million acres with a degenerated proportion of about 70%. Along with the ceaseless increase of population, random cultivation, denudation and overgrazing to meet survival needs and the pursuit of economic benefit have sped up land desertification and prairie degradation and strengthened the environmental vulnerability.
3. Dominant Factors of Environmental Risk
3.1 Factor Selection
Through on-the-spot investigation and data analysis, it is found that Tongliao mainly faces two big environmental risks: water resource shortage, and soil and vegetation degeneration. After clarifying the main components of the environmental risk, risk factors that are easily quantified are selected (see Figure 1), following the principles of science, independence, arrangement, dynamics and maneuverability, and the dominant risk factors are then discovered using principal components analysis. Without losing the main, mutually independent information, principal components analysis can make a smaller number of objects reflect the information carried by the original variables, realizing the best synthesis and simplification of a high-dimensional variable system. Environmental risk is the probabilistic consequence of unfortunate events, caused by natural and human activities acting together, that destroy and harm human society and the natural environment through diffusion in environmental media, so it exists widely in human activities. At present, the biggest barrier to implementing environmental risk management is the lack of quantitative risk analysis results, so it is very important to analyze the risk factors quantitatively. A risk factor is a factor that increases the loss frequency or the loss degree; only by correctly appraising the dominant factors that cause environmental risk can risk prevention measures be adopted scientifically.
3.2 Risk Factors of Water Resource Shortage
Using SPSS 11.5 software, the 13 factors related to Tongliao's water resource shortage risk were analyzed by principal components analysis. The dominant factors affecting the water resource shortage risk can be identified through the loads on the newly synthesized principal components. The eigenvalues, contribution rates and accumulated contribution rates, and the principal component load matrix, are shown in Table 1 and Table 2. Analysis of Table 1 shows that the accumulated contribution rate of the first, second and third principal components reaches 90.141%; according to the principle that the accumulated contribution rate should reach 85%-95%, the principal components Z1, Z2 and Z3 can be extracted.
Figure 1. Tongliao environmental risk influence factors and their relationships: Industrial wastewater drainage (X1), Industrial waste solid (X2)(t), Per capita GDP (X3)(yuan), Industrial gross output (X4)(yuan), Sewage treatment rate (X5)(%), Effective irrigation area (X6)(hm²), Population per square kilometer (X7), Precipitation (X8)(mm), Annual average temperature (X9)(°C), Transpiration rate (X10)(mm), Relative humidity (X11)(%), Maximum wind speed (X12)(m/s), Accumulative temperature (X13)(°C), Applied quantity of chemical fertilizer (X14)(t), Cultivated area (X15)(hm²), First industry proportion (X16)(%), Mechanization of agriculture (X17)(kW), Forestation area (X18)(km²), Mowed grass area (X19)(hm²), Natural lawn area (X20)(hm²), Livestock count (X21)
Table 2 indicates that among the factor loads of the first principal component, industrial gross output (X4), transpiration rate (X10), per capita GDP (X3) and sewage treatment rate (X5) have the larger loads in the positive direction, with values of 0.979, 0.975, 0.970 and 0.849 respectively, while the loads in the negative direction are all small. In the second principal component the larger positive loads belong to accumulative temperature (X13) and industrial waste solid (X2), with values 0.852 and 0.796, and the negative loads are all small. In the third principal component the larger positive load belongs to annual average temperature (X9), with value 0.759, and the negative loads are small. By this analysis, the factor with the biggest load in the first principal component is industrial gross output, a human factor, whose contribution rate is 47.776%, while the biggest factors in the second and third principal components are the yearly accumulative temperature and the yearly average temperature, natural factors, whose contribution rates sum to 42.302%. Obviously, the water resource shortage risk is the joint result of natural and human factors, but the human factor occupies the leading position.
Tab. 1 Eigenvalues, contribution rates and cumulative contribution rates

Main component factor | Eigenvalue | Contribution (%) | Cumulative contribution (%)
Z1 | 6.211 | 47.776 | 47.776
Z2 | 3.519 | 27.072 | 74.848
Z3 | 1.988 | 15.293 | 90.141
Z4 | 0.772 | 5.936  | 96.078
Z5 | 0.247 | 1.901  | 97.979
Z6 | 0.159 | 1.221  | 99.200
Z7 | 0.104 | 0.800  | 100.000
* (Z8-Z14 are omitted)

Tab. 2 Principal component factor load matrix

X   | Z1     | Z2     | Z3
X1  | -0.652 | -0.396 | 0.564
X2  | -0.373 | 0.796  | 0.414
X3  | 0.970  | 0.083  | 0.223
X4  | 0.979  | 0.042  | 0.194
X5  | 0.849  | 0.059  | 0.313
X6  | 0.501  | 0.567  | -0.510
X7  | 0.786  | 0.489  | -0.206
X8  | -0.540 | 0.740  | 0.378
X9  | 0.181  | 0.452  | 0.759
X10 | 0.975  | 0.014  | -0.080
X11 | -0.514 | 0.768  | -0.116
X12 | 0.774  | -0.294 | 0.490
X13 | 0.146  | 0.852  | -0.176
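For readers who want to reproduce this kind of analysis, the following Python sketch (the authors used SPSS 11.5) computes eigenvalues, contribution rates and component loadings from a standardized data matrix; the data here are random placeholders standing in for the 13 yearly indicator series.

```python
import numpy as np

# Placeholder data matrix: rows = years, columns = the 13 indicators X1..X13.
rng = np.random.default_rng(1)
data = rng.normal(size=(14, 13))

# Standardize each indicator, then eigendecompose the correlation matrix.
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # sort components by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

contribution = eigvals / eigvals.sum() * 100  # contribution rate (%)
cumulative = np.cumsum(contribution)          # cumulative contribution (%)
loadings = eigvecs * np.sqrt(eigvals)         # factor loads, as in Tab. 2

k = int(np.searchsorted(cumulative, 85) + 1)  # keep components up to roughly 85-95%
print("eigenvalues:", np.round(eigvals[:k], 3))
print("cumulative %:", np.round(cumulative[:k], 3))
print("loadings of first component:", np.round(loadings[:, 0], 3))
```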
3.3 Risk Factors of Soil and Vegetation Degeneration
The 14 factors influencing the soil and vegetation degeneration risk were analyzed in the same way; the eigenvalues, contribution rates and accumulated contribution rates of the principal components and the principal component load matrix are shown in Table 3 and Table 4. Analysis of Table 3 shows that the accumulated contribution rate of the first, second, third and fourth principal components reaches 91.460%, so the four principal components Z1, Z2, Z3 and Z4 are extracted. Analysis of Table 4 shows that the larger positive loads of the first principal component belong to the applied quantity of chemical fertilizer (X14), the transpiration rate (X10) and the mechanization of agriculture (X17), with values 0.945, 0.939 and 0.913 respectively, while the first industry proportion (X16) has the biggest negative load, -0.931. The larger positive loads of the second principal component belong to accumulative temperature (X13), cultivated area (X15) and
relative humidity (XI1) in positive direction, the value is separately 0.873, 0.709and 0.705, and the natural lawn area (X20) is biggest in the negative direction, the value is 0.775. The load of the third principal components is biggest in annual average temperature (X9) in positive direction, the value is 0.735, and the load of other factor values is all smaller in the negative direction. No factor of the fourth principal components load surpasses 0.500. The results of analyzing Table 3 and Table 4 indicate that the biggest value factor of the first principal components load is the chemical fertilization rate, which belongs to the human factor, the contribution rate is 46.299%, but the biggest factor of the second and the third principal components in the load value are separately yearly accumulative temperature and yearly average temperature, all of which belong to the natural factor, the contribution rate sum is 37.903%. All the factors that the fourth principal components factors load are smaller, total contribution rate is only 7.257%, so they do not affect the result. Obviously, the soil and the vegetation degeneration risk is also the result that the natural factor and human factor affected together, and the intensity that human factors acted is higher than the natural factors. Tab.3 Eigenvalue, variance, cumulative Main factor
Component   Eigenvalue   Contribution (%)   Cumulative contribution (%)
Z1          6.482        46.299             46.299
Z2          3.157        22.550             68.850
Z3          2.149        15.353             84.203
Z4          1.016        7.257              91.460
Z5          0.637        4.547              96.007
Z6          0.496        3.573              99.544
Z7          0.064        0.456              100.000
* (Z8-Z14 is omitted)
Tab. 4 Main component factor load matrix
      X8       X9       X10      X11      X12      X13      X14      X15      X16      X17      X18      X19      X20      X21
Z1    -0.498   -0.290   0.939    -0.561   0.853    0.079    0.945    -0.336   -0.931   0.913    0.718    0.647    -0.401   -0.647
Z2    0.558    0.247    0.115    0.707    -0.364   0.873    -0.049   0.709    -0.291   0.357    -0.042   -0.029   -0.775   -0.248
Z3    0.626    0.725    -0.205   0.253    0.291    0.076    0.299    -0.180   0.100    -0.104   0.284    0.550    0.329    0.622
Z4    -0.140   -0.464   0.146    0.273    0.152    0.412    -0.080   -0.200   0.054    0.089    -0.391   0.439    0.248    0.167
4. Conclusion
The quantitative analysis of the environmental risk influence factors of Tongliao indicates that the environmental risk results from natural and human factors acting together, with the human factor being the more important. The contribution rate of the human factor causing the water resource shortage risk reaches 47.776%, while that of the natural factor reaches 42.302%, which is lower; the contribution rate of the human factor causing the soil and vegetation degeneration risk reaches 46.299%, while that of the natural factor reaches only 37.903%, again lower than the human factor. The dominant factors causing the environmental risk include the industrial gross output, the sewage treatment rate, the chemical fertilization rate, the cultivated area and so on. These findings suggest that reducing or alleviating the environmental risk of Tongliao must start from human activity itself, so that the social economy and the environment develop in step, and they may provide a scientific basis for environmental risk management.
References
1. Wang YJ, Ao BC, Zang YH. Discuss genesic foundation and countermeasure of Tongliao deserted land. Inner Mongolia Forestry Investigation and Design (Supplement), 2002; 25: 31-43.
2. Zhang ZF, Wu WQ, Liu DSH. Situation and prevented idea of Tongliao land area soil-water erosion. Soil and Water Conservation in China, 2001; 6: 29-30.
3. Se YWLJ. Thinking on sustainable development of Tongliao City's economy. Journal of Inner Mongolia University for Nationalities (Social Sciences), 2002; 28(2): 67-71.
4. Zheng YSH, Li JG, Qiao JL. Utilized situation and continual developing way of Tongliao. Modern Agriculture, 2004; 5: 35.
5. Zheng SL, Ao TG, Sun CHF. Inner Mongolia forestry investigation and design. Inner Mongolia Forestry Investigation and Design, 2003; 26(4): 31-32.
6. Jia YP. The existing problems and suggestion of ecological environment protection and construction in West Liao River Basin of Tongliao City. Inner Mongolia Environmental Protection, 2004; 16(1): 38-40.
PRACTICAL RESEARCH OF THE FLOOD RISK BASED ON INFORMATION DIFFUSION THEORY*
XINGCAI ZHANG
Institute of Remote Sensing, Zhejiang Normal University, Jinhua 321004, China. E-mail: jhqxzxc@zjnu.cn
LIHUA FENG
Department of Geography, Zhejiang Normal University, Jinhua 321004, China
Because flood data series for small drainage basins are short, the data that can be used for flood risk analysis are insufficient (incomplete information), so the problem is one of risk analysis under small-sample conditions. One method for analysing problems of this kind is to treat the small sample as fuzzy information; the optimized fuzzy information can then be processed using information diffusion theory to obtain a more reliable result for risk analysis. Small samples supply only limited and incomplete information, from which statistical rules cannot be demonstrated clearly. Fortunately, such incomplete information, especially the fuzzy information supplied by a small sample, can be treated with the fuzzy information optimizing technology based on information diffusion theory. In this paper the risk analysis method based on information diffusion theory is used to develop a model for flood risk analysis, and the application of the model is illustrated taking the Jinhuajiang and Qujiang drainage basins of China as examples. The study indicates that the model gives a fairly stable analytical result even under small-sample conditions. The method is easy to apply, its result is easy to understand, and it may play a guiding role in disaster prevention to some extent.
1. Introduction
Flooding is a common natural disaster in China, one that causes serious damage and harm: it is estimated that about 40% of the total economic loss caused by all kinds of disasters is due to floods. Floods occur frequently and affect large areas, and hence constitute a huge threat to human life and property. Flood damage has become even more severe with the rapid economic development of recent years.
This work was supported by Zhejiang Provincial Science and Technology Foundation of China (No. 2005C23070).
Because flood data series for small drainage basins are short, the data that can be used for flood risk analysis are insufficient (incomplete information); this is risk analysis under small-sample conditions [1]. One method for analysing problems of this kind is to treat the small sample as fuzzy information. The optimized fuzzy information can then be processed using information diffusion theory to obtain a more reliable result for risk analysis [2]. Information diffusion theory helps to extract as much of the underlying useful information as possible and thus improves the accuracy of system recognition; the technology can therefore also be called fuzzy information optimized processing technology [3, 4]. In this paper the risk analysis method based on information diffusion theory [5] is employed to assess flood risks.
2. Risk Analysis Method Based on Information Diffusion Theory
Information diffusion is a fuzzy-mathematical process that treats the samples with the set-valued method: a single-valued sample is transformed into a set-valued sample. The simplest model of this kind is the normal diffusion model. Suppose the index universe of flood damage is represented as
$U = \{u_1, u_2, \ldots, u_n\}$   (1)
Then the information carried by a single-valued observation sample $y_j$ can be diffused to each point of the universe $U$ according to
$f_j(u_i) = \frac{1}{h\sqrt{2\pi}} \exp\left[-\frac{(y_j - u_i)^2}{2h^2}\right]$   (2)
Here $h$ is the diffusion coefficient, which is determined from the maximum value $b$, the minimum value $a$ and the number $m$ of samples in the sample set:
$h = \begin{cases} 1.4230\,(b-a)/(m-1), & m < 10 \\ 1.4208\,(b-a)/(m-1), & m \ge 10 \end{cases}$   (3)
If we let
$C_j = \sum_{i=1}^{n} f_j(u_i)$   (4)
then the corresponding membership function of the fuzzy subset can be represented as
$\mu_{y_j}(u_i) = f_j(u_i)\,/\,C_j$   (5)
The function $\mu_{y_j}(u_i)$ is called the normalized information distribution of the sample $y_j$ [6, 7]. A good result for risk analysis can be obtained by processing $\mu_{y_j}(u_i)$. Let
$q(u_i) = \sum_{j=1}^{m} \mu_{y_j}(u_i)$   (6)
The physical meaning of this function is the following: if the observed value of flood damage could only take one of the values $u_1, u_2, \ldots, u_n$, then, by diffusing the information of the observation set $\{y_1, y_2, \ldots, y_m\}$, the number of samples with observed value $u_i$ is estimated to be $q(u_i)$. The value of $q(u_i)$ is generally not a positive integer, but it is certainly a number no less than zero. Furthermore, let
$Q = \sum_{i=1}^{n} q(u_i)$   (7)
In fact, $Q$ is the sum of the sample numbers over all points $u_i$. It is then easy to see that
$p(u_i) = q(u_i)\,/\,Q$   (8)
is the frequency of samples falling at the point $u_i$, and it can be taken as an estimate of the probability. For the flood damage index set $X = \{x_1, x_2, \ldots, x_n\}$, the index universe in equation (1) is normally chosen as $X$, so that an element $u_i$ of the universe $U$ corresponds to $x_i$. The exceedance probability of $u_i$ is then
$P(u \ge u_i) = \sum_{k=i}^{n} p(u_k)$   (9)
The value of $P(u \ge u_i)$ is the value required for the risk assessment.
3. Flood Risk Analysis
The application of the risk analysis method to the evaluation of flood damage is illustrated taking the Jinhuajiang and Qujiang drainage basins in Zhejiang Province of China as examples. The Jinhuajiang River lies in the middle of Zhejiang Province, and its annual maximum peak discharge has a significant influence on industry and agriculture in the region. Table 1 lists the annual maximum peak discharges surveyed at the Jinhua hydrometric station from 1981 to 2004. According to the variation range of the annual maximum peak discharge, the interval [0, 10000] on the one-dimensional real line can be taken as the universe of $x_i$. This continuous universe is transformed into a discrete one by selecting equidistant points; considering the required calculation accuracy, 21 points were selected to form the discrete universe, which can be represented as
$U = \{u_1, u_2, \ldots, u_n\} = \{0, 500, 1000, \ldots, 10000\}$
The risk assessment values $p$ for the annual maximum peak discharge at the Jinhua hydrometric station, obtained with equations (2) to (9), are shown in Table 2.
Table 1. The annual maximum peak discharge Qm (m³/s) in the Jinhua hydrometric station in 1981-2004.
Year   Qm (m³/s)    Year   Qm (m³/s)
1981   1730         1993   3590
1982   1690         1994   3110
1983   1880         1995   2520
1984   1830         1996   2390
1985   1200         1997   3280
1986   1640         1998   2790
1987   2490         1999   3810
1988   2570         2000   4200
1989   3250         2001   1200
1990   1700         2002   3380
1991   1880         2003   1400
1992   3540         2004   2500
Table 2. Risk assessment value p for the annual maximum peak discharge in the Jinhua hydrometric station.
Risk level (m³/s)   p        Risk level (m³/s)   p
500                 1        3000                0.3662
1000                0.9999   3500                0.2530
1500                0.9394   4000                0.0768
2000                0.7384   4500                0.0136
2500                0.5756   5000                0
In the calculation the time unit is one year. The row for 3000 in the table therefore means that the probability that the annual maximum peak discharge at the Jinhua hydrometric station exceeds 3000 m³/s in any given year is p = 0.3662; in other words, an annual maximum peak discharge at this risk level can be expected to be encountered every two or three years (the return period equals 1/p). The Qujiang River lies in the middle west of Zhejiang Province. The Qujiang drainage basin has a climate abundant in light and heat and is therefore rich in various natural resources; however, the region is frequently hit by storms and floods. Here we take the Misai and Yanggang hydrometric stations as the
typical hydrometric stations of the upper and lower reaches of the Qujiang River, respectively, and regard the annual maximum seven-day precipitation as the damage index. The interval [0, 1000] is then taken as the universe of $x_i$. Following the same calculation as above, the risk assessment values of the annual maximum seven-day precipitation in the Qujiang drainage basin are obtained as shown in Table 3.
Table 3. Risk assessment value p for the annual maximum seven-day precipitation in the Qujiang drainage basin.
Risk level (mm)   p (Misai)   p (Yanggang)
50                1           1
100               0.9991      1
150               0.9610      0.8216
200               0.7362      0.2939
250               0.4883      0.0919
300               0.2945      0.0402
350               0.1933      0
400               0.1301      0
450               0.0557      0
500               0.0314      0
550               0.0011      0
600               0           0
The table shows that the storm and flood risk levels differ greatly within the drainage basin. At the risk level of an annual maximum seven-day precipitation larger than 200 mm, the upper reaches of the Qujiang River experience such an event every one or two years, whereas in the lower reaches it may occur every three or four years. At the risk level of more than 300 mm, the event may occur every three to four years in the upper reaches and only every twenty-four or twenty-five years in the lower reaches. At the risk level of more than 400 mm, the event may occur every seven or eight years in the upper reaches and may not occur at all in the lower reaches. Consequently, storms and floods are more frequent in the upper reaches of the Qujiang River and can cause more serious damage there than elsewhere.
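As an illustration of Eqs. (2)-(9), the following minimal Python sketch applies the normal information diffusion method to the Jinhua discharges of Table 1. Variable names are ours, and because the discretization and rounding choices of the original computation are not fully specified here, the printed exceedance probabilities will be close to, but not necessarily identical with, the values in Table 2.

import numpy as np

# Annual maximum peak discharges (m3/s) at the Jinhua station, 1981-2004 (Table 1).
y = np.array([1730, 1690, 1880, 1830, 1200, 1640, 2490, 2570, 3250, 1700, 1880, 3540,
              3590, 3110, 2520, 2390, 3280, 2790, 3810, 4200, 1200, 3380, 1400, 2500],
             dtype=float)

u = np.arange(0.0, 10001.0, 500.0)               # discrete universe U, Eq. (1): 21 points

m, a, b = len(y), y.min(), y.max()
h = (1.4230 if m < 10 else 1.4208) * (b - a) / (m - 1)   # diffusion coefficient, Eq. (3)

# Normal information diffusion, Eq. (2): f_j(u_i) for every sample j and point i.
f = np.exp(-(y[:, None] - u[None, :]) ** 2 / (2 * h ** 2)) / (h * np.sqrt(2 * np.pi))
mu = f / f.sum(axis=1, keepdims=True)            # normalized distribution, Eqs. (4)-(5)
q = mu.sum(axis=0)                               # Eq. (6)
p = q / q.sum()                                  # probability estimate, Eqs. (7)-(8)
risk = p[::-1].cumsum()[::-1]                    # exceedance probability P(u >= u_i), Eq. (9)

for level, r in zip(u, risk):
    print(f"{level:6.0f}  {r:.4f}")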
4. Conclusion
Small samples supply only limited and incomplete information, from which statistical rules cannot be demonstrated clearly. Fortunately, such incomplete information, especially the fuzzy information supplied by a small sample, can be treated with the fuzzy information optimizing technology based on information diffusion theory. In short, information diffusion is a data treatment process that changes a traditional point-valued data sample into a fuzzy set. In this paper the risk analysis method based on information diffusion theory was used to develop a model for flood risk analysis, and the application of the model was illustrated taking the Jinhuajiang and Qujiang drainage basins as examples. The study indicated that the model gives a fairly stable analytical result even under small-sample conditions, and that the information diffusion technology is highly capable of extracting the useful information and therefore improves the accuracy of system recognition. The method is easy to apply and its result is easy to understand; it may play a guiding role in disaster prevention to some extent. Combining this risk analysis method with genetic analysis methods would further improve the level of flood risk analysis and help to achieve the goal of flood control and disaster reduction.
References
1. B. Suzanne. Managing toxic chemicals in Australia: a regional analysis of the risk society. Journal of Risk Research, 7(4), 399-412 (2004).
2. C. F. Huang. Demonstration of benefit of information distribution for probability estimation. Signal Processing, 80(6), 1037-1048 (2000).
3. E. A. Chatman. Diffusion theory: A review and test of a conceptual model in information diffusion. Journal of the American Society for Information Science, 37(6), 377-386 (1986).
4. K. Mossberger and K. Hale. "Polydiffusion" in intergovernmental programs: Information diffusion in the school-to-work network. The American Review of Public Administration, 32(4), 398-422 (2002).
5. C. F. Huang, X. L. Liu and G. X. Zhou et al. Agricultural natural disaster risk assessment method according to the historic disaster data. Journal of Natural Disasters, 7(2), 1-8 (1998).
6. D. A. Novikov and A. G. Chkhartishvili. Information equilibrium: Punctual structures of information distribution. Automation and Remote Control, 64(10), 1609-1619 (2003).
7. H. Steven and L. Mark. Information distribution within firms: evidence from stock option exercises. Journal of Accounting and Economics, 34(1-3), 3-31 (2003).
RISK ANALYSIS FOR AGRICULTURAL DROUGHT BASED ON NEURAL NETWORK OPTIMIZED BY CHAOS ALGORITHM*
LIN QIU
Water Conservancy Department, North China Institute of Water Conservancy and Hydroelectric Power, Zhengzhou City, Henan Province, China
XIAONAN CHEN, CHUNQING DUAN, QIANG HUANG
Institute of Water Resources and Hydroelectric Engineering, Xi'an University of Technology, Xi'an City, Shaanxi Province, China
Based on the idea of fitting a function curve with a neural network, this paper establishes neural network models for estimating agricultural drought quantitatively and for describing its probability distribution. Models built on this idea avoid the inconvenience of establishing explicit mathematical formulas and calculating their parameters, and the method can fit the probability function even when the theoretical distribution of the random variable is unknown. For network training, the gradient descent method is combined with a chaos algorithm, which helps the network find the global minimum quickly. Finally, the paper calculates the probability distribution of the agricultural drought extent in the Qucun irrigation area, Puyang city, Henan province, validating the correctness of the models.
1. Introduction
Crop drought is a phenomenon of serious water deficit in a crop caused by the imbalance between water absorption and consumption, and agricultural drought is the synthetic reflection of the drought extents of all crops1. We have argued that correctly reflecting the agricultural loss arising from drought is the most important point in the assessment of agricultural drought, and we have established a new model for evaluating the drought extent of agriculture based on the Jensen model; that model can not only calculate the drought extent quantitatively but also reflect the agricultural loss accurately2. In addition, we have studied the probability distribution of agricultural drought with a simulation method, supposing that the distribution of precipitation follows the P-III distribution3. However, although many crop water production functions exist at present, none of them fits every area well; moreover, that approach requires the distribution of precipitation to be P-III, and the distribution of agricultural drought itself is not known at all. How can the distribution function of agricultural drought be obtained? This paper solves these problems with a neural network optimized by a chaos algorithm.
* This work is supported by the foundation of creative talents in Henan province, China.
2. Model of Agricultural Drought Based on Neural Network Optimized by Chaos Algorithm
2.1. Traditional Gradient Descent Method
The BP network has been used in many fields, but its training speed is unsatisfactory and the network is inclined to fall into a local minimum during training4. Suppose that the weight matrix of the output layer is $W$, where $w_{ij}$ is the weight joining the $i$-th neuron of the hidden layer and the $j$-th neuron of the output layer, $i = 1, 2, \ldots, h$, $j = 1, 2, \ldots, m$, and that the weight matrix of the input layer is $V$, where $v_{ij}$ is the weight joining the $i$-th neuron of the input layer and the $j$-th neuron of the hidden layer, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, h$. Let $(X, Y)$ be a sample with $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_m)$. The activation function is
$f(net) = 1/(1 + e^{-net})$   (1)
Suppose that the actual output for the sample is $O = (o_1, o_2, \ldots, o_m)$ and that $\bar{O} = (\bar{o}_1, \bar{o}_2, \ldots, \bar{o}_h)$ is the output of the hidden layer. The total error over all samples is
$E = \frac{1}{2} \sum_{s} \sum_{j=1}^{m} \left( y_j^{(s)} - o_j^{(s)} \right)^2$   (2)
According to the gradient descent method, the updates of $w_{ij}$ and $v_{ij}$ are
$\Delta w_{ij} = \alpha\,\delta_j\,\bar{o}_i = \alpha\,(y_j - o_j)(1 - o_j)\,o_j\,\bar{o}_i$   (3)
$\Delta v_{ij} = \alpha\,\delta_j'\,x_i = \alpha \left[ \sum_{k=1}^{m} \delta_k w_{jk} \right] (1 - \bar{o}_j)\,\bar{o}_j\,x_i$   (4)
where $\alpha$ is the learning rate.
2.2. Chaos Optimization Algorithm
The general non-linear program can be described as follows:
$\min f(X)$
$\text{s.t. } g_i(X) \ge 0,\ i = 1, 2, \ldots, m; \quad h_j(X) = 0,\ j = 1, 2, \ldots, l$   (5)
where $f(X)$ is the objective function and $g_i(X)$, $h_j(X)$ are the constraint functions, at least one of which is non-linear. The constraints define the feasible region
$S = \{X \mid g_i(X) \ge 0,\ i = 1, 2, \ldots, m;\ h_j(X) = 0,\ j = 1, 2, \ldots, l\}$
The logistic equation is one of the most classic models in chaos research5. It can be expressed as
$x_{k+1} = \lambda\,x_k\,(1 - x_k), \qquad x_k \in [0, 1]$   (6)
where $\lambda$ is a control parameter with a value between 0 and 4. When $\lambda = 4$ the system enters a chaotic state and, regardless of the initial point, passes through the whole interval [0, 1]. Suppose that the dimension of $X$ in Eq. (5) is $M$, with $X = (x_1, x_2, \ldots, x_M)$ and $x_i \in [d_i, e_i]$. The main steps of solving the non-linear program with the chaos algorithm are as follows.
Step 1: Initialize the system. Let $k = 1$, generate a random vector $X_k = (x_{1k}, x_{2k}, \ldots, x_{Mk})$ with each component in the interval [0, 1], and transform it as
$x_{ik}' = d_i + (e_i - d_i)\,x_{ik}$   (7)
If the transformed vector $X_k' \in S$, let $X^* = X_k'$ and $f^* = f(X^*)$; otherwise, repeat Step 1 until an $X_k'$ satisfying the constraints of Eq. (5) is found.
Step 2: Set the maximal iteration number $N$. Substitute $X_1$ into Eq. (6) to generate the chaos vector sequence $X_k$, $k = 2, 3, \ldots, N$, and map each vector through Eq. (7) to obtain $X_k'$. Check the feasibility of each $X_k'$; if $X_k'$ passes the test and $f(X_k') < f^*$, let $X^* = X_k'$ and $f^* = f(X_k')$.
When the $N$ iterations are finished, $X^*$ is the optimal solution and $f^*$ is the optimal value.
2.3. Weights Optimization with the Chaos Algorithm
When the initial weights are generated, the total error $E$ of Eq. (2) is taken as the objective function ($\min E$) and all weights are constrained to [0, 1]. The network is first trained with the gradient descent method. When the network plunges into a local minimum, the weights are recalculated with the chaos algorithm, but now they are constrained to $[W_{\min}, W_{\max}]$, where $W_{\min}$ and $W_{\max}$ are the minimum and maximum of the current weights.
2.4. Model of Crop Drought Based on Neural Network
Suppose that there are $t$ phases during the growth process of a crop. Then the number of neurons in the input layer is $t$ and that of the output layer is one. Let the number of neurons in the hidden layer be $n$ and let the $s$ samples be
$\{(X_1, y_1), (X_2, y_2), \ldots, (X_s, y_s)\}$
where $X_i = (x_{i1}, x_{i2}, \ldots, x_{it})$, $i = 1, 2, \ldots, s$, and $x_{ik}$ is the relative water stress of the $i$-th sample in the $k$-th phase, $k = 1, 2, \ldots, t$. It is defined as
$x_{ik} = \left[ (W_N)_k - (W_C)_{ik} \right] / (W_N)_k$   (8)
where $(W_N)_k$ is the sum of the evapotranspiration of the crop in the $k$-th phase, $(ET)_k$, and the allowed minimal soil water content in that phase, $(w)_k$ (mm); $(W_C)_{ik}$ is the sum of the water supply of the $i$-th sample in the $k$-th phase and the preliminary soil water content of the $i$-th sample in that phase, $(w_0)_{ik}$ (mm); and $y_i$ is the loss extent of production, that is, the drought extent, calculated as
$y_i = 1 - Y_i / Y_{\max}$   (9)
where $Y_i$ is the actual production of the $i$-th sample (kg/hm²) and $Y_{\max}$ is the potential production of the crop (kg/hm²). The network can now be trained with these samples; running the successfully trained chaos network yields the crop drought extent.
2.5. Model of Agricultural Drought Based on Neural Network
After the crop drought model is established, the agricultural drought model can be built with weights that reflect the economic loss. Suppose that $A_i$ (hm²) is the area of the $i$-th crop, $Pr_i$ (Yuan/kg) is the price of the $i$-th crop and $Y_{\max}^{i}$ is the potential production of the $i$-th crop. The weight of the $i$-th crop, $W_i$, is
$W_i = \left( Y_{\max}^{i} A_i Pr_i \right) \Big/ \left( \sum_{j=1}^{Num} Y_{\max}^{j} A_j Pr_j \right)$   (10)
The extent of agricultural drought, $Dr$, is therefore
$Dr = \sum_{i=1}^{Num} W_i\,(Dr)_i$   (11)
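The chaos optimization of Section 2.2 can be sketched as follows. This is a minimal illustration of Eqs. (5)-(7), not the authors' implementation: the function names, the bound handling and the toy problem at the end are our own assumptions.

import numpy as np

def chaos_search(f, feasible, d, e, n_iter=1000, seed=1):
    """Logistic-map search for min f(X) over the box [d, e], Eqs. (5)-(7).

    f        -- objective function of a vector X
    feasible -- predicate implementing the constraint set S
    d, e     -- lower and upper bounds of each coordinate
    """
    rng = np.random.default_rng(seed)
    d, e = np.asarray(d, float), np.asarray(e, float)
    x = rng.uniform(0.05, 0.95, size=d.shape)     # chaos variables in (0, 1), Step 1
    best_X, best_f = None, np.inf
    for _ in range(n_iter):
        X = d + (e - d) * x                       # carrier transform, Eq. (7)
        if feasible(X) and f(X) < best_f:         # feasibility test and update, Step 2
            best_X, best_f = X.copy(), f(X)
        x = 4.0 * x * (1.0 - x)                   # logistic map with lambda = 4, Eq. (6)
    return best_X, best_f

# Toy usage: minimize a quadratic subject to x1 + x2 >= 1 on [0, 1]^2.
X_opt, f_opt = chaos_search(lambda X: ((X - 0.7) ** 2).sum(),
                            lambda X: X[0] + X[1] >= 1.0,
                            d=[0.0, 0.0], e=[1.0, 1.0])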
3. Probability Distribution of Agricultural Drought Based on Neural Network
Calculate the sequence of agricultural drought extents and arrange it in descending order, $Dr_1 \ge Dr_2 \ge \cdots \ge Dr_n$. The frequency of a given drought extent is
$P_i = i / (n + 1)$   (12)
where $i$ is the sequence number, $n$ is the number of elements in the sequence and $P_i$ is the frequency with which the drought extent is larger than or equal to $Dr_i$. $Dr_i$ is the input-layer sample and $P_i$ the output-layer sample, so the number of training samples is $n$. After successful training, the network outputs the corresponding probability $P$ when a drought extent $Dr$ is input.
4. Example
We study the probability distribution of agricultural drought, without considering irrigation, in the Qucun irrigation area, Puyang city, Henan province. The statistical parameters of precipitation are known and other detailed data about the crops can be found in Ref. 3. The crop models are established with 20 groups of data and their accuracy is checked with another 10 groups; the number of hidden-layer neurons is 10. The accuracy of the crop models is 0.013, 0.011 and 0.017 respectively, and the trained networks provide the weight matrices of the crops. Then 38 values of agricultural drought are calculated from the data of the area and the crop models and taken as the inputs of the distribution network; the ideal outputs are calculated by Eq. (12). With 30 hidden-layer neurons, the training error over the 38 groups of data is 0.019.
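The construction of the training pairs of Section 3 can be sketched as follows. The drought-extent array below is a synthetic placeholder (the 38 computed values are not listed in the paper), and scikit-learn's MLPRegressor stands in for the chaos-optimized BP network, so this is only an illustration of the idea.

import numpy as np
from sklearn.neural_network import MLPRegressor

drought_extent = np.random.default_rng(0).beta(2, 5, size=38)   # placeholder values
dr = np.sort(drought_extent)[::-1]               # Dr_1 >= Dr_2 >= ... >= Dr_n
n = len(dr)
p = np.arange(1, n + 1) / (n + 1)                # empirical frequency, Eq. (12)

# Fit P = g(Dr) with one hidden layer of sigmoid units (a stand-in for the
# chaos-optimized BP network of Section 2).
net = MLPRegressor(hidden_layer_sizes=(30,), activation='logistic',
                   max_iter=5000, random_state=0)
net.fit(dr.reshape(-1, 1), p)

print(net.predict(np.array([[0.3]])))            # estimated P(drought extent >= 0.3)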
Now the distribution of agricultural drought can be calculated with the network. Table 1 shows the probability of each interval of drought extent.
Table 1. Interval distribution of agricultural drought
Drought extent   0.0-0.2   0.2-0.4   0.4-0.6   0.6-0.8   0.8-1.0
Probability      0.056     0.511     0.375     0.045     0.013
The first row gives the intervals of drought extent and the second row the corresponding probabilities. The probability is largest, 0.511, in the interval 0.2-0.4. The irrigation area lies in a semi-arid and semi-humid region, so irrigation is necessary.
5. Conclusion
This paper establishes a model of agricultural drought based on a neural network optimized by a chaos algorithm and fits the probability distribution function with a neural network. The research shows that the neural network model of agricultural drought not only keeps the merit of the Jensen model of reflecting production loss but also avoids the inconvenience of establishing an explicit mathematical expression and calculating its parameters. In addition, the paper integrates the chaos algorithm with the BP network, which allows the network to reach the global minimum without slowing down the training speed.
References
1. Yuanhua Li. The theory and technology of water-saving irrigation. Wuhan: Wuhan University of Hydro-electric Engineering Press, 1999.
2. Lin Qiu, Xiaonan Chen, Chunqing Duan. The quantitative analysis of drought extent for agriculture. Journal of Irrigation and Drainage, 2004(3): 34-37.
3. Lin Qiu, Xiaonan Chen, Chunqing Duan. Study on probability distribution of agricultural drought extent. Journal of Northwest Sci-tech University of Agriculture and Forestry, 2005(1): 105-108.
4. Z. Sen. Probabilistic formulation of spatio-temporal drought pattern. Theoretical and Applied Climatology, 1998, 197-206.
5. Zongli Jiang. The introduction of artificial neural network. Beijing: Advanced Education Press, 2001, 40-46.
A COMPUTER SIMULATION METHOD FOR HARMONY AMONG DEPARTMENTS FOR EMERGENCY MANAGEMENT*
FUPING YANG, CHONGFU HUANG
Institute of Disaster and Public Security, College of Resources Science and Technology, Beijing Normal University, Beijing, 100875, China. Email: [email protected]
A response mechanism for harmony among departments for emergency management is presented for the first time. A computer simulation model for request-response is then built on this response mechanism. Finally, some mathematical methods involved in implementing the simulation model are discussed.
1. Introduction
The development of information technology provides an excellent approach to representing disasters by simulating, in a virtual environment, events that happen in the real world. Research on computer simulation methods and theories for the management of disaster emergency and rescue can be regarded as technological work that is significant for disaster mitigation and risk estimation. It is well known that effective and efficient cooperation between vertical and horizontal departments can greatly improve the effect of rescue and mitigate the loss and risk caused by a disaster. Harmony among departments for emergency management is so critical that many researchers have worked on it, including the revision of institutions, legal safeguards and the improvement of communication technologies1. However, computer simulation has not yet been widely applied to risk assessment in disaster management. In this paper, a method concerning harmony among departments for emergency management is discussed; the purpose of this effort is to establish a foundation for further simulation research on disaster rescue. Section 2 describes the problem under research, Section 3 introduces the basic simulation ideas and model, Section 4 focuses on mathematical methods for simulation modeling and, finally, conclusions are drawn in Section 5.
* Project Supported by National Natural Science Foundation of China, No.40371002, and National Key Scientific and Technological Project, No. h02110.
2. Description of the Problem under Research
The main problem discussed in this paper is cooperation and harmony among departments for emergency management with respect to disasters such as earthquakes and floods. A common organizational form of the departments for emergency management can be induced from the typical frameworks and organizations of disaster management in China, Japan, Russia and America, in which a central department is responsible for commanding the emergency action during a disaster rescue2. This core institution can be called the centre of emergency command. Its main function is to organize and coordinate all parts of the emergency and rescue action so that all resources for disaster rescue can be utilized effectively and efficiently. At the same time, there are many sub-departments under the command centre, and below the sub-departments there are usually other subordinate units, so this kind of organization structure can be considered as a tree. It is illustrated in Figure 1.
Figure 1. A tree figure for the organization of emergency management and rescue (the command centre at the root, sub-departments 1 to n below it, and units 1 to m below the sub-departments).
Assessing the harmonious degree of this tree structure is now a major task. In practice, the harmony of all departments shows itself dynamically in the course of a rescue action, and vertical and horizontal cooperation runs through the whole emergency action. Furthermore, the interactions among these departments usually form a complex network, illustrated in Figure 2.
Figure 2. A network figure for the interaction among departments (the command centre linked to all sub-departments, with further links among the departments themselves).
It is necessary to define sets of estimation indexes for each node of the tree. Yet how to use those well-defined indexes to assess the harmonious degree of the entire emergency organization system is a difficult task.
3. Basic Theoretical Idea
In a rescue action, the interactions among the sections of emergency management are extremely complex and network-like, and it would be very hard to deal with the rescue process directly. A new method is therefore introduced in this section to analyze the process.
3.1 Principle of the Response Mechanism for Harmony
In a harmonious system made up of many units, each piece of request information from any unit can receive at least one reasonable response from the other units of the system. This fact exists universally in our lives. In particular, in a rescue action all departments for emergency management have to remain highly harmonious: for a well-performing emergency and rescue organization, where there is a request, there is a good response. In practice, the response among the elements of a system is mainly influenced by three factors: the attributes of the elements themselves, the internal and external environment, and stochastic factors. The attributes of the elements determine the main properties of the response, while the environmental and stochastic factors influence the response strength and fashion.
3.2 Model Hypotheses
To study harmony among departments for emergency management, some critical theoretical hypotheses are needed:
1. The performance of a system in its response mechanism can approximately represent the degree of harmony of the system.
2. The request-response among the elements of a system is always one-to-one; that is, every request-response happens between two elements. This is reasonable because interactions among departments over a longer period can be considered as a combination of interactions between two departments over shorter periods.
3. The more harmonious all the departments are, the stronger the response shown between two departments. This is common sense in our lives.
4. The more harmonious a certain department is, the more active it is in an action, which can be measured by recording the number of its responses. This is also true in the course of a rescue action.
5. All information exchanged between any two departments should pass through the command centre. This condition should be understood from two views. One is that the command centre is the core of an emergency management organization and almost all request-response information should go through it and receive instructions from it. The other is that information exchanged directly between two
certain units is regarded as passing through the centre and receiving an empty instruction. The former accords with reality to a certain degree, and the latter makes the process of information interaction clearer and simplifies the simulation of this procedure. The hypotheses above form the basis of the following simulation model.
3.3 Simulation Model for Request-response
According to the above response mechanism and hypotheses, the request-response model among departments is illustrated in Figure 3, which is a simplification of Figure 2.
Figure 3. A simulation model for request-response among departments (sub-departments 1 to n exchange request information through the command centre, and every unit is linked to the responser by a single-arrowed line).
In the above simulation model, each unit can randomly send request information to any other unit except the responser, which is only in charge of calculating the response strength and collecting the corresponding data. During a simulation, whenever a unit receives request information it activates the responser, so all units are directly linked to the responser by a straight line with a single arrowhead. In Figure 3 only the top two management layers are drawn, because any two neighbouring layers have the same topological structure. The horizontal interaction among departments is not eliminated; only the route of the interaction information is changed, i.e. horizontal interaction information always passes through the command centre. This change does not affect the function of the whole system at all; on the contrary, it enhances the feasibility of simulating the system.
4. Mathematical Theoretical Methods for Simulation Modeling
A series of estimation indices must be defined for the whole system and all the units. The difficulty is that most well-defined indices are recorded in qualitative words such as good, better, best and so on. How can they be transformed into quantitative indices that a computer can recognize? And how is the response strength calculated? Answering these questions is the aim of the following subsections.
4.1 Flow for Simulation
The flow of the simulation of the model is shown in Figure 4:
Qualitative indices -(I)-> Quantitative indices -(II)-> Simulation indices -(III)-> Response indices
Figure 4. A simple flow figure for simulation
Figure 4 shows that the assessment indexes are transformed or calculated three times in the course of the simulation, and each step is different from the others. The mathematical methods for these steps are introduced in the following subsections.
4.2 Fuzzy Transform for the Estimation Indexes
At step I in Figure 4, the indexes need to be transformed from qualitative variables into quantitative variables so that the computer can recognize and process them. An estimation index of an organization is usually described by several fuzzy words such as good, better, etc., so it is natural to realize the transform of step I with fuzzy set methods. Let $x$ be any qualitative estimation index of some department and let $F(x)$ be a function whose values are real sub-intervals of [0, 1]. The transform of this step can be realized by defining $F(x)$ as
$F(x) = \begin{cases} [0, x_1], & x \in D_1 \\ [x_1, x_2], & x \in D_2 \\ [x_2, x_3], & x \in D_3 \\ [x_3, x_4], & x \in D_4 \\ [x_4, 1], & x \in D_5 \end{cases}$   (1)
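A minimal sketch of this step-I transform follows. The five category labels and the breakpoint values are illustrative assumptions only; the paper leaves D1-D5 and x1-x4 unspecified.

# Illustrative breakpoints 0 < x1 < x2 < x3 < x4 < 1 and category labels D1..D5;
# both are assumptions, not values from the paper.
BREAKPOINTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
CATEGORIES = ["very poor", "poor", "fair", "good", "very good"]

def F(rating):
    """Map a qualitative rating in D1..D5 to its sub-interval of [0, 1], Eq. (1)."""
    k = CATEGORIES.index(rating)
    return BREAKPOINTS[k], BREAKPOINTS[k + 1]

print(F("good"))   # -> (0.6, 0.8)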
4.3 Generation of Stochastic Inputs
At step II, statistical methods and fuzzy information processing come into play. Three situations arise. First, if only one interval is obtained for an index, the input transform is easy: a random number is generated on the corresponding interval. Second, if several intervals are collected for an index but the data are incomplete, the fuzzy information diffusion methods presented by Prof. Chongfu Huang3,4 perform well. Third, if the data for an index are sufficient, probabilistic and stochastic theories can be employed. Consequently, the final random inputs for the simulation can be generated by these different approaches.
4.4 Measurement Function for the Harmonious Degree
At step III the outputs are calculated. A series of harmony indices, such as the strength and time of response, need to be collected; here only the response strength is discussed. Let the response function be $R(F(a), F(e), \varepsilon)$. This function is affected by three variables, $F(a)$, $F(e)$ and $\varepsilon$, representing the department itself, the environment and an uncertain stochastic factor respectively. $R(F(a), F(e), \varepsilon)$ can be expressed as
$R(F(a), F(e), \varepsilon) = a_1 F(a) + a_2 F(e) + \varepsilon$   (2)
where $F(a)$ and $F(e)$ are written as
$F(a) = a_{11} x_{a1} + a_{12} x_{a2} + \cdots + a_{1n} x_{an}$   (3)
$F(e) = a_{21} x_{e1} + a_{22} x_{e2} + \cdots + a_{2m} x_{em}$   (4)
In the above equations $a_i$ is a constant coefficient, $a_{ij}$ is the weight of the corresponding index and $x_{ij}$ is one of the indices of a unit.
5. Conclusion
In the sections above, the principle of the response mechanism for harmony was presented for the first time; the simulation model for request-response was then built on this response mechanism; finally, the study focused on the mathematical methods for implementing the simulation model. The work reported in this paper is preliminary research, and much remains to be done in the future on harmony among organizations, among all kinds of forces for disaster rescue and among all rescue resources.
References
1. Teng Wuxiao. Establishment of the emergency response system of China by referring to earthquake disaster emergency systems of Japan and US. Journal of Disaster Prevention and Mitigation Engineering, 24(3), (2004), 323-329.
2. Cui Qiuwen and Miao Chonggang. Overview of International Earthquake Emergency and Rescue. Meteorology Press, 2004.
3. C. F. Huang. Information diffusion techniques and small sample problem. International Journal of Information Technology and Decision Making, 1(2), (2002), 229-249.
4. C. F. Huang and Y. Shi. Towards Efficient Fuzzy Information Processing Using the Principle of Information Diffusion. Physica-Verlag, Heidelberg, 2002.
AN APPROACH OF MOBILE ROBOT ENVIRONMENT MODELING BASED ON ULTRASONIC SENSORS ARRAY PRINCIPAL COMPONENTS
YONG-QIAN ZHANG, FANG LI, HONG-MING WANG, ZENG-GUANG HOU, MIN TAN
The Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China. E-mail: yongqian.zhang@ia.ac.cn
MADAN M. GUPTA, PETER N. NIKIFORUK
Intelligent Systems Research Laboratory, College of Engineering, University of Saskatchewan, Saskatoon, Saskatchewan S7N 5A9, Canada. E-mail: [email protected]
This paper presents a novel approach to mobile robot environment modeling based on principal component analysis of ultrasonic sensors array data. A principal components space, which has a lower dimensionality than the raw data space, is constructed from the principal components of a large number of ultrasonic sensor data sets. Subsequent ultrasonic data sets from the environment are projected as points in this principal components space. By applying the SVM (support vector machine) method, these projections are classified into typical local structures of the environment so that the robot can discriminate them. Experimental results are provided to show that the proposed method is satisfactory for mobile robot navigation applications.
1. Introduction
The navigation of an intelligent mobile robot is generally defined as controlling its motion to arrive at a known location [1]. Ultrasonic sensing methods using multiple receivers or using the wave shapes of the echoes have been investigated for localization and navigation tasks with known environmental features [2]. Many conventional approaches to sensor-based navigation of mobile robots in unknown, unstructured and complicated environments assume that robots are able to measure precise information [3]. Some researchers have focused on the localization issue, extracting the obstacle-free contour around the robot from a local evidence grid [4]. With those methods an accurate bearing angle measurement can be achieved. However, they are limited to measuring only within the directivity area of the transducer, so
mechanical rotation of the sensor head is required when such sensors are used on a mobile robot, and hence it takes time to measure the environment around the robot. In this paper, we propose an approach to environment modeling based on the principal components of an ultrasonic sensors array. In this approach, a principal components space, which has a lower dimensionality than the raw sonar data space, is constructed from the principal components of a large number of sonar readings. Subsequent ultrasonic data sets from the environment are projected as points in this principal components space. By applying the SVM (support vector machine) method, these projections are classified into typical local structures of the environment so that the robot can discriminate them. The remainder of this article is structured as follows. In the next section, a brief overview of our mobile robot CASIA-I and its sensing unit is given; then the procedure of environment modeling based on principal components analysis and the classification by SVM is presented; finally, some environment modeling experiments are performed and conclusions are drawn.
2. Environment Modeling Based on PCA of Ultrasonic Data
2.1. Mobile Robot CASIA-I and Its Sensor System
We designed and developed a mobile robot, called CASIA-I, as the experimental platform for testing various methods and algorithms [1]. Because of the variety of applications, there is no unique way to select the type and number of sensors for a mobile robot. In CASIA-I, one CCD camera, one electronic compass, sixteen ultrasonic, sixteen infrared and sixteen touch sensors are employed.
2.2. Procedure of Environment Modeling
Figure 1. A segment of the environment of our laboratory.
During navigation, the mobile robot cannot build a global environment model because of unknown, dynamic, unstructured and uncertain factors; it can only obtain a real-time local environment model by means of the sensors mounted on it. Hence, building the local environment model reliably and in real time determines whether the mobile robot is able to move safely, continuously and smoothly. Figure 1 shows the typical environment of our laboratory. Environment modeling is described as classifying the environment into typical models by information fusion and identification algorithms applied to the data of the ultrasonic sensors array. The typical environment models are shown in Figure 2: (a) moving along the corridor, (b) moving towards the end of the corridor, (c) moving backwards from the end of the corridor, (d) moving with the branch way on the left, (e) moving with the branch way on the right, (f) moving towards the branch way, (g) moving backwards from the branch way.
Figure 2. The environment models of the corridor, (a)-(g).
During the procedure, sets of ultrasonic data are employed instead of discrete individual sonar readings, because if the robot uses a set of sonar data acquired from a previous time up to the current one, it can take account of the changes in the ultrasonic data and reduce the uncertainty in discriminating the local structures. In our experiments one step corresponds to approximately 15 cm, and one set of ultrasonic sensor data consists of three single rings of 16 ultrasonic readings taken at three positions along a line; hence each sample of a set of ultrasonic sensor data has 48 dimensions. The environment modeling procedure is summarized as follows:
Step 1: CASIA-I explores the corridor with the sonar system enabled in order to store sets of ultrasonic sensors array range data.
Step 2: Principal components analysis of the sets of ultrasonic sensor data is employed to reduce their dimensionality. As a result, a principal components space is constructed and each set of ultrasonic sensor data is projected as a point in this space.
Step 3: The SVM identification algorithm is applied to classify these projections into the seven typical models shown in Figure 2.
2.3. Principal Components Analysis of Sets of Ultrasonic Sensors Data
As mentioned above, each individual ultrasonic sensor measurement can be considered as an independent dimension. Such a representation is not commonly used directly because of the characteristic errors of ultrasonic waves and the obvious correlation between adjacent measurements. Principal component analysis provides a method to automatically identify the dependence structure behind a multivariate stochastic observation in order to obtain a compact description of it. In this paper, PCA is employed to process the original ultrasonic data so as to enhance the signal-to-noise ratio and improve the anti-jamming capability.
2.4. Support Vector Machines Based Classifier Design
Support vector machines, proposed by Vapnik [6][7], have been widely used for pattern classification and nonlinear regression problems in the last few years. For a classification problem, support vector machines transform the original problem into a higher-dimensional feature space, which makes a difficult problem tractable. Different kernel functions result in different support vector machine algorithms; the kernel functions commonly investigated for pattern recognition are a polynomial of degree q, the Gaussian radial basis function and the sigmoidal neural network kernel. In our experiments the three types of kernel function are employed to classify the projections in the principal components space of the ultrasonic sensor data, and they all give a satisfactory recognition rate.
3. Real Robot Simulations and Experiments
3.1. Experiments on the Construction of the Principal Components
In our experiments we drive the robot in the corridor shown in Figure 1. The robot acquires 80 sets of ultrasonic data for each of the environment models shown in Figure 2, so we obtain a sample matrix of dimension 48 x 80 for each environment and 48 x 560 in total. Of the 80 sample sets for each model, 60 are used for training and the remaining 20 for testing.
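The pipeline of Sections 2.3-2.4 can be sketched as follows. Scikit-learn stands in for the authors' implementation, the data array is a random placeholder with the same shape as the experiment (560 samples of 48 readings, labels (a)-(g) coded 0-6), and the kernel width used below plays the role of the paper's parameter without being numerically identical to it.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Placeholder data: 560 sample sets of 48 sonar readings with labels 0..6.
rng = np.random.default_rng(0)
X = rng.normal(size=(560, 48))
y = rng.integers(0, 7, size=560)

pca = PCA(n_components=5)                 # keep the first five principal components
X_proj = pca.fit_transform(X)

clf = SVC(kernel='rbf', gamma=0.001)      # radial basis kernel, cf. Section 3.2
clf.fit(X_proj[:420], y[:420])            # 60 training sets per model
accuracy = clf.score(X_proj[420:], y[420:])   # 20 testing sets per model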
Figure 3. The contribution rates of the single principal components and their accumulation.
Applying principal component analysis to all 420 sample sets, we find that the largest five principal components together contribute more than 84.85% (see Figure 3). Hence the first five principal components are extracted to span the projection space of the original data. Because the projections of all seven environment shapes on the principal components space cannot be visualized at once, we illustrate 5 and 30 sets of projections for models (a), (b) and (c) on the first three principal components, plotted with the signs '*', 'o' and '+' respectively (see Figure 4).
Figure 4. Projections on the first 3 principal components.
3.2. Experiments of the SVMs Classifier
Table 1. Recognition rates (%) of the SVMs classifier with various kernel functions and parameters, for the seven environment models (a)-(g) on the training and testing samples. The kernels compared are the polynomial kernel (q = 2, 3, 4), the sigmoid kernel ((v, c) = (2, 0.9), (2, 1), (2, 1.1)) and the radial basis kernel (δ = 0.01, 0.006, 0.001). The average recognition rates of the nine classifiers range from 96.79% to 98.57%; the maximum average, 98.57%, is obtained with the radial basis kernel at δ = 0.001, while the best polynomial and sigmoid averages are 97.32% (q = 3) and 97.50% ((v, c) = (2, 1)).
The SVMs classifier is applied to identify the projections of the 60 training sample sets and the 20 testing sample sets respectively. Analyzing the experimental results in Table 1, we find that when the kernel function is a radial basis function with δ = 0.001 the recognition rate over all seven local environment models reaches its maximum of 98.57%, and that when the kernel functions are the polynomial with q = 3 and the sigmoid function with (v, c) = (2, 1) the highest recognition rates obtained are 97.32% and 97.50% respectively. By comparison of the various kernel functions, therefore, the radial basis function is chosen for the final SVMs classifier.
4. Conclusion and Perspective
Ultrasonic sensors are commonly used to measure, perceive and communicate with the environment in an active way. In this paper, a PCA-based ultrasonic sensors array information processing approach to mobile robot environment modeling has been presented. A principal components space is constructed from a large number of ultrasonic sensor data sets, and by applying an SVM classifier the projections are classified into typical local structures of the environment. The experimental results show that the proposed method is satisfactory for mobile robot navigation and localization applications. Because of space limitations, this paper is restricted to explaining the foundation of the approach and presenting examples. On this basis, conventional path planning or reinforcement learning algorithms can further be applied to the real-time navigation task.
Acknowledgments
This research was supported partially by the National Natural Science Foundation of China (Grants 60205004, 50475179 and 60334020) and the Hi-Tech R&D Program (863) of China (Grant 2002AA423160).
References
1. Z. G. Hou, M. Tan, M. M. Gupta, P. N. Nikiforuk and N. Homma. 18th Annual Canadian Conference on Electrical and Computer Engineering, CCECE05 (2005).
2. H. Choset, K. Nagatani and A. Rizze. Proc. of SPIE Conf. on System and Manufacturing, Pittsburgh, USA, pp. 72-83 (1998).
3. L. Kleeman. Proc. of IEEE/RSJ Int. Conf. on IROS, Osaka, Japan, pp. 96-103 (1996).
4. A. Poncela, C. Urdiales, C. Trazegnies and F. Sandoval. Proc. of the 6th International FLINS Conference, pp. 456-462 (2004).
5. K. I. Diamantaras and S. Y. Kung. Principal Component Neural Networks: Theory and Applications. John Wiley & Sons, Inc. (1996).
6. V. N. Vapnik. The Nature of Statistical Learning Theory, 2nd Ed., New York: Springer-Verlag (2000).
7. C. J. C. Burges. Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167 (1998).
SLAM WITH CORNER FEATURES FROM A NOVEL CURVATURE-BASED LOCAL MAP REPRESENTATION
R. VAZQUEZ-MARTIN, P. NUNEZ, J. C. DEL TORO, A. BANDERA AND F. SANDOVAL
Grupo de Ingenieria de Sistemas Integrados, Departamento de Tecnologia Electronica, Universidad de Malaga, Campus de Teatinos, 29071-Malaga (Spain). E-mail: rvmariiniSuma.es
This paper presents a solution to the Simultaneous Localization and Map Building (SLAM) problem for a mobile agent which navigates in an indoor environment and is equipped with a conventional laser range finder. The approach is based on the stochastic paradigm and employs a novel feature-based map representation. Stochastic SLAM is performed by storing the robot pose and landmark locations in a single state vector and estimating it by means of a recursive process; in our case this estimation process is based on an extended Kalman filter (EKF). The main novelty of the described system is the efficient approach for natural feature extraction, which employs the curvature information associated with every planar scan provided by the laser range finder. In this work, corner features are considered. Real experiments carried out with a mobile robot show that the proposed approach acquires corners of the environment in a fast and accurate way; these landmarks permit the robot to be localized and a corner-based map of the environment to be built simultaneously.
1. Introduction
The difficulty of the simultaneous localization and map building (SLAM) problem lies in the fact that an accurate estimate of the robot trajectory is required to obtain a good map, while minimizing the unbounded growth of odometry errors requires associating sensor measurements with a precise map5. To increase the efficiency and robustness of the process, sensor data have to be transformed into a more compact form before attempting to compare them with the ones present in a map or to store them in the map being built. In either case, the chosen map representation heavily determines the precision and reliability of the whole task4. Typical choices for the map representation include occupancy grids, topological maps and feature maps1.
712 In this paper, a feature based approach is employed to solve the SLAM problem. Feature maps are a suitable representation for long-term convergent SLAM in medium-scale environments 1 . It allows the use of multiple models to describe the measurement process for different parts of the environment and it avoids the data smearing effect5. In order to achieve consistent estimation of the robot pose, a basic stochastic SLAM algorithm is used. This algorithm stores robot and landmarks locations in a state vector and updates these estimates using an extended Kalman filter (EKF). This approach suffers from three main disadvantages: high computation and storage costs, fragile data association and inconsistent treatment of non-linearity 1 , 5 , 2 . In this work, we will demonstrate that all these weaknesses can be alleviated if a fast and reliable algorithm to extract landmarks for the large set of noisy and uncertain data is employed. The proposed landmark acquisition algorithm is based on the curvature information associated to every scan provided by the laser range finder. Particularly, in this work we only consider corner features. These corner features will permit to simultaneously localize the robot and build a map of the environment. The rest of the paper is organized as follows: Section 2 describes the proposed EKF-SLAM algorithm, where the curvature-based corner acquisition algorithm is included. Section 3 presents experimental results and, finally, Section 4 summarizes conclusions and future work. 2. D e s c r i p t i o n of t h e P r o p o s e d S y s t e m In the standard EKF-based approach to SLAM, the robot pose and landmark locations at time step k are represented by a stochastic state vector x£ with estimated mean x£ and estimated error covariance P * . The mean vector x j contains the estimated robot pose, x£, and the estimated environment landmarks positions, x*,, all with respect to a base reference W. This concatenation is necessary as consistent SLAM relies on the maintenance of correlations PJj m between the robot and the map 1 . In this work, we use the robot pose at step k=Q as the base reference (W = x ° ) . Thus, the map can be initialized with zero covariance for the robot pose, x° = ( 0 , 0 , 0 ) T , P ° = 0. Previous work has showed that this improves the consistency of the EKF-SLAM algorithm 2 . For convenience, the k notation can be dropped in this Section as the sequence of operations is apparent from its context. Then, the mean x 0 and covariance P a of the state vector can be defined as *vv
x_a = [x_v; x_m],   P_a = [[P_vv, P_vm], [P_vm^T, P_mm]]    (1)
When the robot pose and map landmarks are stored in a single state vector, stochastic SLAM 1,5,2 is performed by estimating the state parameters via a recursive process of prediction and correction. The prediction stage deals with robot motion based on incremental dead reckoning estimates, and increases the uncertainty of the robot pose estimate. Then, new landmarks are acquired from the environment. These landmarks are associated with the previously stored ones. The update stage employs this data association to improve the overall state estimate. Finally, if a landmark is observed for the first time, it is added to the state vector through an initialization process called state augmentation. The next subsections deal with the stages of the described EKF-SLAM algorithm. The proposed landmark acquisition stage will be explained in subsection 2.2.
2.1. Prediction stage
When the robot moves from its pose at step k-1 to its pose at step k, its motion is estimated by odometry. In our case, the system has been tested on a four-wheeled robot, where left and right wheels are mechanically coupled and, thus, encoders only return right and left speeds. Assuming that the robot state is represented by its pose, x_v = (x_v, y_v, φ_v)^T, the prediction stage only changes the robot pose part of the state vector and the P_vv and P_vm submatrices in the state covariance matrix. Map landmarks remain stationary. Therefore, the predicted state is given by
x_v^+ = f(x_a, u) = [x_v + D·c;  y_v + D·s;  φ_v + Δφ],   P_a^+ = ∇f_xa P_a ∇f_xa^T + Q    (2)

where u = (D, Δφ)^T is the incremental odometry estimate, c = cos(φ_v) and s = sin(φ_v).
∇f_xa = [[∇f_xv, 0_vm], [0_vm^T, I_mm]],   ∇f_xv = [[1, 0, -D·s], [0, 1, D·c], [0, 0, 1]]    (3)
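A minimal sketch of this prediction step is given below, assuming the state layout and the Jacobians of Eqs. (2)-(3); the variable names are illustrative and not taken from the original implementation.

```python
import numpy as np

def ekf_slam_predict(x_a, P_a, u, Q):
    """EKF-SLAM prediction sketch following Eqs. (2)-(3).
    x_a: state [x_v, y_v, phi_v, landmarks...], P_a: full covariance,
    u = (D, dphi): odometry increment, Q: motion noise (3x3)."""
    D, dphi = u
    c, s = np.cos(x_a[2]), np.sin(x_a[2])
    # Only the robot pose part of the state is propagated; landmarks stay fixed.
    x_a = x_a.copy()
    x_a[0] += D * c
    x_a[1] += D * s
    x_a[2] += dphi
    # Jacobian of f with respect to the full state (identity on the map part).
    n = len(x_a)
    F = np.eye(n)
    F[0, 2] = -D * s
    F[1, 2] = D * c
    # Additive motion noise only affects the robot block.
    Qa = np.zeros((n, n))
    Qa[:3, :3] = Q
    P_a = F @ P_a @ F.T + Qa
    return x_a, P_a
```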
2.2. Curvature-based landmark acquisition stage
In this work, we characterize each range reading of the laser scan by a curvature index. This index is adaptively filtered according to the distance between possible corners in the whole laser scan. This filtering removes noise while scan features are still detected regardless of their natural scale. For each range reading i = (x_i, y_i) of a laser scan, the proposed method for corner acquisition consists of the following steps:
(1) Calculation of the maximum length of laser scan presenting no discontinuities on the right and left sides of the range reading i: K_f(i) and K_b(i), respectively. K_f(i) is calculated by comparing the Euclidean distance from i to its K_f(i)-th neighbour, d(i, i + K_f(i)), to the real length of the laser scan between both range readings, l(i, i + K_f(i)). Both distances tend to be the same in the absence of corners, even if laser scans are noisy. Otherwise, the Euclidean distance is considerably shorter than the real length. Thus, K_f(i) is the largest value that satisfies

d(i, i + K_f(i)) > l(i, i + K_f(i)) - U_k    (4)
where U_k is a constant value that depends on the noise level tolerated by the detector. K_b(i) is also set according to Eq. (4), but using i - K_b(i) instead of i + K_f(i).
(2) Calculation of the local vectors f_i and b_i associated with each range reading i. These vectors represent the variation along the x and y axes between range readings i and i + K_f(i) and between i and i - K_b(i). They are defined as

f_i = (x_{i+K_f(i)} - x_i, y_{i+K_f(i)} - y_i),   b_i = (x_{i-K_b(i)} - x_i, y_{i-K_b(i)} - y_i)    (5)

(3) Calculation of the angle associated with i. According to previous works 3, the angle K_θ(i) at range reading i can be estimated from the local vectors f_i and b_i.
(4) Detection of corners over |K_θ(i)|. Corners are those range readings which satisfy the following conditions: i) they are local peaks of the curvature function, and ii) their |K_θ(i)| values are over the minimum angle required to be considered a corner instead of a spurious peak due to remaining noise (θ_min).
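The four steps above can be summarised in the following sketch. It assumes that the angle at reading i is taken as the angle between the local vectors f_i and b_i (the exact formula of the original work is not reproduced here); the parameter names U_k and theta_min follow the text, everything else is illustrative.

```python
import numpy as np

def corner_indices(pts, U_k=0.05, theta_min=np.deg2rad(30)):
    """Sketch of the curvature-based corner detector (steps 1-4).
    pts: (N, 2) array of laser readings in Cartesian coordinates."""
    N = len(pts)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # consecutive scan lengths

    def max_window(i, direction):
        # Largest K such that the chord stays close to the scan length (Eq. 4).
        K, scan_len = 0, 0.0
        while 0 <= i + direction * (K + 1) < N:
            j = i + direction * (K + 1)
            scan_len += seg[min(i + direction * K, j)]
            if np.linalg.norm(pts[j] - pts[i]) <= scan_len - U_k:
                break
            K += 1
        return K

    angles = np.zeros(N)
    for i in range(N):
        kf, kb = max_window(i, +1), max_window(i, -1)
        if kf == 0 or kb == 0:
            continue
        f, b = pts[i + kf] - pts[i], pts[i - kb] - pts[i]      # Eq. (5)
        cosang = np.dot(f, b) / (np.linalg.norm(f) * np.linalg.norm(b) + 1e-9)
        angles[i] = np.pi - np.arccos(np.clip(cosang, -1.0, 1.0))
    # Step 4: local peaks of the curvature angle above theta_min.
    return [i for i in range(1, N - 1)
            if angles[i] > theta_min
            and angles[i] >= angles[i - 1] and angles[i] >= angles[i + 1]]
```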
2.3. Data association stage
Once corner features have been acquired, they must be associated with previously stored ones. Correct correspondence of observed landmarks to map
ones is essential for consistent map building because a single failure may invalidate the whole process. In our case, landmarks are distinguishable only by their positions. Therefore, correspondences established by the data association stage are constrained by statistical geometric information. In this work, the normalised innovation squared (NIS) defines the validation gate, i.e. the maximum discrepancy between a measurement z and a predicted observation h(x_j) for target x_j 1. Given an observation innovation v_ij with covariance S_ij, the NIS forms a χ² distribution. The gate is applied as a maximum NIS threshold, γ_n. Then,

v_ij = z - h(x_j),   S_ij = ∇h_xa P^- ∇h_xa^T + R,   NIS = v_ij^T S_ij^{-1} v_ij < γ_n    (7)

The integral of the χ² distribution from 0 to γ_n specifies the probability that, if z is a true observation of target x_j, the association will be accepted. In our experiments, the innovation vector is of dimension 2 and the gate γ_2 is equal to 6.0; if z is truly an observation of landmark x_j, the association will be accepted with 90% probability.
The validation gate defines an ellipsoid in the observation space centred about the predicted observation h(x_j). Then, an acceptable observation must fall within this ellipse. Data association ambiguity occurs if either multiple observations fall within the validation gate of a particular target, or a single observation lies within the gates of multiple targets. The most common ambiguity resolution method is nearest neighbour data association. Given a set of observations, Z, within the validation gate of target x, a normalised distance ND_i can be calculated for each z_i ∈ Z:

ND_i = v_i^T S_i^{-1} v_i + log|S_i|    (8)
Nearest neighbour data association then chooses the observation that minimizes ND_i. This is the simplest data association algorithm and it can only associate a single observation at each step k.
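A small sketch of the gating and nearest-neighbour rules of Eqs. (7)-(8); the function names and the way candidate pairs are passed in are illustrative.

```python
import numpy as np

def nis_gate(v, S, gate=6.0):
    """Validation gate of Eq. (7): accept if the normalised innovation
    squared is below the chi-square threshold."""
    return float(v.T @ np.linalg.inv(S) @ v) < gate

def nearest_neighbour(v_S_pairs):
    """Nearest-neighbour association (Eq. 8): among gated candidates,
    pick the one minimising the normalised distance ND_i."""
    best, best_nd = None, np.inf
    for idx, (v, S) in enumerate(v_S_pairs):
        if not nis_gate(v, S):
            continue
        nd = float(v.T @ np.linalg.inv(S) @ v) + np.log(np.linalg.det(S))
        if nd < best_nd:
            best, best_nd = idx, nd
    return best
```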
2.4. Updating stage
If an observation z is correctly associated with a map landmark estimate (x_i, y_i), then the perceived information is related to the map by

z_i = h_i(x_a) = [Δx·c + Δy·s;  -Δx·s + Δy·c],   Δx = (x_i - x_v),   Δy = (y_i - y_v)    (9)
The Kalman gain K_i can be obtained as

v_i = z - h_i(x_a),   S_i = ∇h_xa P^- ∇h_xa^T + R,   K_i = P^- ∇h_xa^T S_i^{-1}    (10)
where R is the observation covariance and the Jacobian ∇h_xa is given by

∇h_xa = [[-c, -s, -s·Δx + c·Δy, 0 ... c  s ... 0], [s, -c, -c·Δx - s·Δy, 0 ... -s  c ... 0]]    (11)
It can be noted that the Jacobian ∇h_xa only presents non-zero terms aligned with the positions of the robot states and the observed feature states in the augmented state vector. The posterior SLAM estimate is determined from

x_a^+ = x_a^- + K_i v_i,   P^+ = P^- - K_i S_i K_i^T    (12)
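A compact sketch of this update step, written directly from Eqs. (9)-(12); the state layout and variable names are illustrative assumptions.

```python
import numpy as np

def ekf_slam_update(x_a, P_a, z, landmark_idx, R):
    """EKF-SLAM update sketch following Eqs. (9)-(12).
    z: associated corner observation in robot coordinates,
    landmark_idx: index of the landmark's x-coordinate in the state vector."""
    xv, yv, phi = x_a[:3]
    xi, yi = x_a[landmark_idx], x_a[landmark_idx + 1]
    c, s = np.cos(phi), np.sin(phi)
    dx, dy = xi - xv, yi - yv
    # Predicted observation, Eq. (9).
    h = np.array([dx * c + dy * s, -dx * s + dy * c])
    # Sparse Jacobian of Eq. (11): robot block and landmark block only.
    H = np.zeros((2, len(x_a)))
    H[:, :3] = [[-c, -s, -s * dx + c * dy],
                [ s, -c, -c * dx - s * dy]]
    H[:, landmark_idx:landmark_idx + 2] = [[c, s], [-s, c]]
    # Innovation, gain and posterior estimate, Eqs. (10) and (12).
    v = z - h
    S = H @ P_a @ H.T + R
    K = P_a @ H.T @ np.linalg.inv(S)
    return x_a + K @ v, P_a - K @ S @ K.T
```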
2.5. State augmentation stage
As the environment is explored, new landmarks are observed and must be added to the map. To initialise new landmarks, the state vector and covariance matrix are augmented with the values of the new observation and its covariance, R, as measured relative to the observer:
x_aug = [x_a; z],   P_aug = [[P_vv, P_vm, 0], [P_vm^T, P_mm, 0], [0, 0, R]]    (13)

A function g_i is employed to translate z = (x_c, y_c)^T to a global location. This transformation is defined as

g_i(x_v, z) = [x_v + x_c·c - y_c·s;  y_v + x_c·s + y_c·c]    (14)
Then, the augmented state can be initialized by performing a transformation to global coordinates by the function f_i as follows

x_a^+ = f_i(x_aug) = [x_a;  g_i(x_v, z)],   P_a^+ = ∇f_xaug P_aug ∇f_xaug^T    (15)

The Jacobian ∇f_xaug can be derived as

∇f_xaug = [[I_v, 0, 0], [0, I_m, 0], [∇g_xv, 0, ∇g_z]]    (16)

where

∇g_xv = [[1, 0, -x_c·s - y_c·c], [0, 1, x_c·c - y_c·s]],   ∇g_z = [[c, -s], [s, c]]    (17)
The posterior SLAM covariance matrix, P^+, is as follows

P^+ = [[P_vv, P_vm, P_vv ∇g_xv^T], [P_vm^T, P_mm, P_vm^T ∇g_xv^T], [∇g_xv P_vv, ∇g_xv P_vm, ∇g_xv P_vv ∇g_xv^T + ∇g_z R ∇g_z^T]]    (18)
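A short sketch of this augmentation step, assuming the block structure of Eqs. (13)-(18); names are illustrative.

```python
import numpy as np

def ekf_slam_augment(x_a, P_a, z, R):
    """State augmentation sketch following Eqs. (13)-(18): append a newly
    observed corner z = (xc, yc), given in robot coordinates, to the map."""
    xv, yv, phi = x_a[:3]
    xc, yc = z
    c, s = np.cos(phi), np.sin(phi)
    # Global position of the new landmark, Eq. (14).
    g = np.array([xv + xc * c - yc * s, yv + xc * s + yc * c])
    # Jacobians of g with respect to the robot pose and the observation, Eq. (17).
    Gv = np.array([[1.0, 0.0, -xc * s - yc * c],
                   [0.0, 1.0,  xc * c - yc * s]])
    Gz = np.array([[c, -s], [s, c]])
    n = len(x_a)
    x_new = np.concatenate([x_a, g])
    # Posterior covariance with the new landmark block, Eq. (18).
    P_new = np.zeros((n + 2, n + 2))
    P_new[:n, :n] = P_a
    P_new[n:, :n] = Gv @ P_a[:3, :]       # cross-covariance with all existing states
    P_new[:n, n:] = P_new[n:, :n].T
    P_new[n:, n:] = Gv @ P_a[:3, :3] @ Gv.T + Gz @ R @ Gz.T
    return x_new, P_new
```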
Figure 1. a) Estimated trajectory of the robot using the proposed landmark detection method; and b) Estimated trajectory of the robot using a slower landmark acquisition rate.
3. Experimental Results
Figs. 1.a and 1.b show the experimental results obtained by running two different landmark acquisition rates. The figures illustrate the estimated trajectory. The robot pose uncertainty has been drawn over the trajectory. Fig. 1.a has been generated using the proposed landmark acquisition algorithm. The whole EKF-SLAM algorithm runs every 200 ms on the 400 MHz VersaK6 PC104+ embedded on our Pioneer 2AT mobile platform. The landmark acquisition algorithm only takes 25 ms, including the 180° laser data acquisition. It can be noted that the robot pose uncertainty is bounded due to more frequent updating. To obtain the estimated trajectory of Fig. 1.b, the robot has been moved through the same path. However, in this case the landmark acquisition algorithm is slower. Then, fewer landmarks are acquired and the updating rate decreases. It can be appreciated that the robot pose uncertainty is higher and its pose estimation is poorer than in Fig. 1.a.
4. Conclusions and Future Work
Experiments show that EKF-based SLAM problems can be alleviated when a fast and reliable landmark acquisition algorithm is employed. The proposed landmark extraction algorithm reduces the computational cost associated with the whole process. This fact is especially interesting because
it avoids long periods without a suitable update process. Increasing the updating rate reduces the robot pose uncertainty and prevents data association from becoming very fragile 1. On the other hand, it has been shown that in the basic EKF-SLAM framework, linearization errors produce inconsistency problems 1. These problems can be reduced using local maps or robocentric mapping 2. Future work will be focused on the combination of these techniques with the proposed fast landmark extraction approach in order to further improve map consistency. Besides, a batch data association method would improve the updating stage because it provides more information in the innovation 1.
Acknowledgments
This work has been partially supported by the Spanish Ministerio de Educación y Ciencia (MEC), project no. TIN2005-01349.
References
1. T. Bailey. Mobile robot localisation and mapping in extensive outdoor environments, PhD Thesis, Australian Centre for Field Robotics, University of Sydney (2002).
2. J. A. Castellanos, J. Neira and J. D. Tardós. Limits to the consistency of EKF-based SLAM, 5th IFAC Symp. on Intelligent Autonomous Vehicles (IAV'04), Lisbon, Portugal (2004).
3. P. Reche, C. Urdiales, A. Bandera, C. Trazegnies and F. Sandoval. Corner detection by means of contour local vectors, Electronics Letters, 38(14), pp. 699-701 (2002).
4. S. Roumeliotis and G. A. Bekey. SEGMENTS: A layered, dual Kalman filter algorithm for indoor feature extraction, Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 454-461 (2000).
5. J. D. Tardós, J. Neira, P. M. Newman and J. J. Leonard. Robust mapping and localization in indoor environments using sonar data, Int. Journal of Robotics Research, pp. 311-330 (2002).
OBSTACLE AVOIDANCE LEARNING FOR BIOMIMETIC ROBOT FISH
ZHIZHONG SHEN, MIN TAN, ZHIQIANG CAO, SHUO WANG, ZENGGUANG HOU
Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China
Abstract - Biomimetic robot fish can play an important role in underwater object reconnaissance and tracking. In order to fulfill more complex tasks, the capability of obstacle avoidance is indispensable for a robot fish. The design of a biomimetic robot fish with multiple ultrasonic and infrared sensors is presented in this paper. An obstacle avoidance strategy based on reinforcement learning is proposed and state-behavior pairs are obtained. Computer simulation shows the validity of the learning results.
1. Introduction
Bionics has extended into many domains since the 1960s. As one of its main applications, bionics has been closely integrated with robotics. Fish, the earliest vertebrates in nature, have evolved extraordinary swimming abilities. This provides humans with new ideas for combining the advantages of fish with robot technology to develop new styles of underwater thrusters [1]. MIT successfully developed a biomimetic robot fish (RoboTuna) [2] in 1994, which originated the research and development of robot fish. Since then, a lot of experimental platforms and samples of biomimetic robot fish have appeared for different purposes [3][4][5]. Some reviews concerning fish swimming and the analytical methods that have been applied to some of their propulsive mechanisms have appeared [6][7]. It is probable that robot fish will play an important role in many domains thanks to characteristics such as flexible maneuverability, high efficiency and low noise. Nevertheless, many robot fish must currently be controlled by host computers, which limits their applications. Therefore, the ability of autonomous swimming for robot fish becomes an important research topic. In this paper, a kind of biomimetic robot fish with ultrasonic and infrared sensors is designed and an autonomous obstacle avoidance strategy based on reinforcement learning is put forward.
2. Biomimetic Robot Fish with Ultrasonic and Infrared Sensors
Based on the volume of the robot fish and the requirements of the task, three ultrasonic sensors are mounted on the robot fish. The beam angle of each sensor is 30 degrees. These three sensors are located on the right, frontal and left sides of the robot fish head. In addition, in order to make up for the measurement blind area of the ultrasonic sensors, three infrared sensors are added. The infrared sensors are also used as signals to take action in urgent situations. Obstacle avoidance for the robot fish is realized by controlling the turning mode of the tail fin. As shown in Figure 1, three turning modes are defined for the robot fish, described as follows:
• Mode A, turning in the course of swimming. The robot fish swings its tail only to one side during a turn.
• Mode B, swift turning: the robot fish swings its tail to one extreme position, keeps the body in a C posture, and then turns depending on the hydrodynamic force.
• Mode C, stationary turning: the robot fish swings its tail rapidly to one side from a stationary state, and then turns rapidly depending on the hydrodynamic force and inertia. Rapid turning can be obtained with this mode.
Figure 1. Three turning modes for robot fish
The above three turning modes are the basic ones. Different turning control can be realized by combining these modes.
3. Obstacle Avoidance Control Based on Reinforcement Learning
The task of obstacle avoidance for the robot fish is as follows. The robot fish swims in an environment scattered with obstacles. By processing the information provided by the ultrasonic and infrared sensors, the robot fish performs different turning modes to accomplish autonomous swimming. An obstacle avoidance strategy based on reinforcement learning is put forward in the following.
3.1. Reinforcement Learning
Reinforcement learning is a real-time, on-line learning method. It learns behaviors through trial-and-error interactions and does not need a mathematical model of the environment or the task. Hence, only the goal is needed; the robot fish does not have to know in advance how to reach it. Through learning, the robot fish can obtain a set of optimal strategies from its experience of states, actions and rewards. There are many learning algorithms for different problems; Dyna, Sarsa and Q-learning are the common ones. The Sarsa algorithm is adopted in this paper to learn strategies.
3.2. State and Behavior Sets of Robot Fish
The combination of an ultrasonic sensor and an infrared sensor can obtain relatively complete information about one direction. The data provided by the sensors are divided into sections. Here, we label L_m the maximum detection distance of the ultrasonic sensor. The critical distance to perform obstacle avoidance behavior is L_v. L_h denotes the dangerous distance between the robot fish and an obstacle. Assume that the blind-area distance of the ultrasonic sensor is equal to the maximum detection distance L_i of the infrared sensor. The relation of these four distances is L_m > L_v > L_h > L_i. The urgent situation in which the distance between the robot fish and an obstacle is less than L_i is discussed in [8]. The obstacle distribution has three states according to the sensor value L_d: s_0, the robot fish does not detect any obstacle or the detected obstacle is very far away, that is, L_d > L_m or L_d > L_v; s_1, the robot fish is far from the detected obstacle, that is, L_v ≥ L_d > L_h; s_2, the robot fish is near an obstacle, that is, L_h ≥ L_d > L_i. The combination of the three sensors is expressed as s_L s_F s_R, where s_L is the state of the left ultrasonic sensor. Similarly, s_F and s_R are the states of the frontal and right ultrasonic sensors, respectively. The number of possible states for the robot fish is 27. However, the states of the three sensors are not of the same importance. Because the main swimming direction of the robot fish is forward, whether there is an obstacle in front of the head and the distance between the frontal ultrasonic sensor and the obstacle are more important. Moreover, in order to decrease the size of the state set and increase the learning speed, part of the states are combined in this paper. The rule to combine states is given by:
s_L/R = s_1   when   s_L/R = s_1  or  s_L/R = s_2    (1)
There are 12 states after combination: S_1 = s_0 s_0 s_0, S_2 = s_0 s_0 s_1, S_3 = s_0 s_1 s_0, S_4 = s_0 s_1 s_1, S_5 = s_0 s_2 s_0, S_6 = s_0 s_2 s_1, S_7 = s_1 s_0 s_0, S_8 = s_1 s_0 s_1, S_9 = s_1 s_1 s_0, S_10 = s_1 s_1 s_1, S_11 = s_1 s_2 s_0, S_12 = s_1 s_2 s_1. Considering the performance of the robot fish, the obstacle avoidance behaviors are designed as follows: b_1, the robot fish turns right while stationary; b_2, the robot fish turns left while stationary; b_3, the robot fish turns right with velocity V; b_4, the robot fish swims forward; b_5, the robot fish turns left with velocity V; b_6, the robot fish swims randomly. Among these behaviors, b_1 to b_5 are the learning aims for the robot fish; behavior b_6 is not learnt. The robot fish chooses b_6 automatically when no sensors detect obstacles.
3.3. Rewards
Because an ultrasonic sensor cannot accurately determine the orientation of obstacles relative to the robot fish, obstacles within the beam angle of the sensor may generate the same signal. The status of the three ultrasonic sensor signals is therefore integrated. We define ψ as the angle of the obstacle relative to the robot fish body, that is:
ψ = 30°              if s_L = 1 ∩ s_F = 0 ∩ s_R = 0
    15°              if s_L = 1 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 0
    0°               if s_L = 0 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 0
    -15°             if s_L = 0 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 1
    -30°             if s_L = 0 ∩ s_F = 0 ∩ s_R = 1
    -30° ∪ 30°       if s_L = 1 ∩ s_F = 0 ∩ s_R = 1
    -30° ∪ 0° ∪ 30°  if s_L = 1 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 1        (2)
If the distance values of the three sensors are d_L, d_F and d_R, d is defined as the distance between the obstacle and the robot fish, that is,

d = min(d_L, d_F, d_R)    (3)
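A sketch of the state/orientation mapping of Eqs. (1)-(3) is given below; the threshold names follow the text, while the function layout is purely illustrative.

```python
def sensor_state(L_d, L_m, L_v, L_h, side=False):
    """Discretise one ultrasonic reading into s0/s1/s2; for the side sensors,
    Eq. (1) merges s1 and s2 into s1. The urgent case L_d < L_i is handled
    separately, as in [8]."""
    if L_d > L_m or L_d > L_v:
        s = 0
    elif L_d > L_h:
        s = 1
    else:
        s = 2
    return min(s, 1) if side else s

def obstacle_angle_and_distance(d_L, d_F, d_R, L_m, L_v, L_h):
    """Integrated obstacle orientation psi (Eq. 2) and distance d (Eq. 3)."""
    sL = sensor_state(d_L, L_m, L_v, L_h, side=True)
    sF = sensor_state(d_F, L_m, L_v, L_h)
    sR = sensor_state(d_R, L_m, L_v, L_h, side=True)
    table = {(1, False, 0): [30], (1, True, 0): [15], (0, True, 0): [0],
             (0, True, 1): [-15], (0, False, 1): [-30],
             (1, False, 1): [-30, 30], (1, True, 1): [-30, 0, 30]}
    psi = table.get((sL, sF >= 1, sR), [])     # Eq. (2); empty when no obstacle
    return psi, min(d_L, d_F, d_R)             # Eq. (3)
```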
Based on ψ, ξ is defined as the angle between the next swimming direction of the robot fish and the obstacle orientation ψ. In [9], Tucker Balch gives a set of descriptors for rewards. For the obstacle avoidance of the robot fish, immediate reward and delayed reward are combined in this paper. The following instances may produce positive reinforcement:
1. The robot fish swims from a state in which there are obstacles around it to a state in which there are none.
2. The behavior the robot fish performs makes the angle ξ increase.
3. When the robot fish's heading is far from an obstacle, the state-behavior pair that enlarges the exploration scope is encouraged.
The instances that produce negative reinforcement are as follows:
4. The behavior the robot fish performs makes the angle ξ decrease.
5. When the robot fish's heading is near an obstacle, the distance between the obstacle and the robot fish decreases after performing the behavior.
All these rewards are immediate rewards. On condition that reward 1 is satisfied, the former two state-behavior pairs are encouraged; this is the delayed reward.
4. Simulations
The ε-greedy strategy is adopted to choose behaviors and update the Q values. The initial value of ε is 0.95 and its final value is 0.05. The total number of episodes T is 65. The learning rate α is 0.03. The fixed discounting factor γ is 0.9. The maximum step number of every episode is 3100. Through learning, the state-behavior pairs are obtained. From the learning results, the following rules are derived.
Rule 1. The robot fish turns right when there is an obstacle on the left side and turns left when there is an obstacle on the right side. If there are obstacles on both the right and left sides, the robot fish swims forward.
Rule 2. When the robot fish is far from an obstacle, it performs Mode A to avoid it. When the obstacle is in the dangerous area of the robot fish, it performs Mode C to avoid it.
The strategies learned by the Sarsa algorithm are tested through simulation in the environment depicted in Fig. 2. From the trajectory of the robot fish, we can see that the robot fish can avoid obstacles.
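A minimal Sarsa sketch with the hyperparameters reported above (α = 0.03, γ = 0.9, ε decaying from 0.95 to 0.05 over 65 episodes); the simulator interface `env` is a hypothetical placeholder, not the authors' code.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, behaviors, eps):
    """Pick a behavior with the epsilon-greedy rule."""
    if random.random() < eps:
        return random.choice(behaviors)
    return max(behaviors, key=lambda b: Q[(state, b)])

def sarsa(env, behaviors, episodes=65, alpha=0.03, gamma=0.9,
          eps_start=0.95, eps_end=0.05, max_steps=3100):
    """Tabular Sarsa sketch; env exposes reset() and step(state, behavior)."""
    Q = defaultdict(float)
    for ep in range(episodes):
        eps = eps_start + (eps_end - eps_start) * ep / (episodes - 1)
        s = env.reset()
        b = epsilon_greedy(Q, s, behaviors, eps)
        for _ in range(max_steps):
            s2, r, done = env.step(s, b)
            b2 = epsilon_greedy(Q, s2, behaviors, eps)
            # On-policy Sarsa update.
            Q[(s, b)] += alpha * (r + gamma * Q[(s2, b2)] - Q[(s, b)])
            s, b = s2, b2
            if done:
                break
    return Q
```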
Figure 2. A typical swimming trajectory
5. Conclusion
The obstacle avoidance capability is indispensable for a robot fish operating in an unknown environment. With the increase of the category and quantity of sensors, it becomes difficult for a human to determine the swimming rules manually. The design of a kind of biomimetic robot fish with multiple ultrasonic and infrared sensors is presented in this paper. By using reinforcement learning, the robot fish can learn the obstacle avoidance strategies without the need of an accurate mathematical model. The number of states is reduced and the learning speed is enhanced through state combination. The validity of the learning results is tested through computer simulation. In the near future, we will focus on applying the strategies to the designed robot fish.
Acknowledgments
This work is funded by research grants from the 973 Project (No. 2002CB312200), NSFC (No. 50475179) and the 863 Project (No. 2004AA4201104 and No. 2005AA420040).
References
1. Colgate J E, Lynch K M. Mechanics and control of swimming: a review. IEEE Journal of Oceanic Engineering, 29(3): 660-673 (2004).
2. Triantafyllou M, Triantafyllou G S. An efficient swimming machine. Scientific American, 272(3): 64-70 (1995).
3. Ayers J, Davis J, Rudolph A. Neurotechnology for biomimetic robots. Cambridge, MA: MIT Press (2002).
4. CALibot. http://www.me.berkeley.edu/~lwlin/mel02B/project.html.
5. Kato N, Furushima M. Pectoral fin model for maneuver of underwater vehicles. Proc. of the 1996 Symposium on Autonomous Underwater Vehicle Technology (1996).
6. Sfakiotakis M, Lane D M, and Davies J B C, "Review of fish swimming modes for aquatic locomotion," IEEE Journal of Oceanic Engineering, 24(2): 237-252 (1999).
7. Cheng J, Zhuang L, and Tong B, "Analysis of swimming of three-dimensional waving plates," J. Fluid Mech., vol. 232, 341-355 (1991).
8. Sang Haiquan, Wang Shuo, Tan Min, Zhang Zhigang. Autonomous obstacle avoidance of biomimetic robot fish based on infrared sensor. Journal of System Simulation, 17(6): 1400-1404 (2005).
9. Tucker Balch. Behavioral diversity in learning robot systems. Ph.D. Dissertation, Georgia Institute of Technology (1998).
SNAKE-LIKE BEHAVIORS USING MACROEVOLUTIONARY ALGORITHMS AND MODULATION BASED ARCHITECTURES
J. A. BECERRA, F. BELLAS, AND R. J. DURO*
Grupo de Sistemas Autónomos, Universidade da Coruña, Spain
J. DE LOPE
Grupo de Percepción Computacional y Robótica, Universidad Politécnica de Madrid, Spain
In this paper we describe a methodology for automatically obtaining modular artificial neural network based control architectures for snake-like robots. This approach is based on the use of behavior modulation structures that are incrementally evolved using macroevolutionary algorithms. The method is suited for problems that can be solved through a progressive increase of the complexity of the controllers and for which the fitness landscapes are mostly flat with sparse peaks. This is usually the case when robot controllers are evolved using life simulation for evaluating their fitness.
1. Introduction
The main goal in the design of snake-like robots is to imitate the body configuration and, therefore, the motion of biological snakes [1]. Snakes are able to crawl on almost any surface, including slopes, slippery ground or both. With more or less difficulty, the snake will arrive at the goal zone. This kind of robot is mainly indicated when the objective is to reach zones that are difficult to access, or where the workspace and the environment do not allow the operation of conventional, wheeled or legged, robots. For a snake-like robot, a small space slightly larger than its body section is enough for moving forward. A classical application example of this kind of robot is motion in channels or pipes. From the energy consumption point of view, snake-like robots represent very efficient solutions. The total balance of consumed energy is comparable to other animals of similar mass [2]. Probably, the main contributions to snake-like robots are due to Hirose and his group [1] and more recently Chen et al. [3]. Basically they describe the implementation of several prototypes that emulate the behavior and gaits of biological snakes.
* This work was supported by the MEC of Spain through project CIT-370300-2005-24 and Xunta de Galicia through project PGIDIT03TIC16601PR.
In order to define the motion as realistically as possible, Hirose formulated the serpenoid function, which is a sinusoidal function that associates parameters such as amplitude, frequency or phase with the curvature of the spine of the snake. Another approach is to define pattern generators whose outputs are applied consecutively and cyclically to the robot to produce the desired motion. Some of the most representative papers in this area are those by Shan and Koren [4], in which a set of patterns is established for controlling the robot and making it move forward and turn, and Nilsson [5] with his pattern for climbing. The rest of the paper is organized as follows. In the next section we explain the structure and construction of the compound artificial neural network based behavior structure. Then we provide a description of the macroevolutionary algorithm used for the evolution of the controllers. After this we provide some examples of results obtained through the application of this approach and finally we provide some conclusions.
2. Constructing Control Architectures with Modulation
Evolving artificial neural network based control structures is commonplace in behavior based robotics research. The problem has become how to obtain complex multi-ANN structures that allow for the seamless combination of multiple ANNs operating together in an autonomous robot. A very promising approach is to contemplate the construction of the control system as an inter-modulating structure where the very basic behaviors are evolved individually and then, through the introduction of modulating structures, they are modified on line in order to adapt them to slightly changing situations. Thus, in the case of the construction of a control system for snake-like robots, one can obtain controllers for the generation of different types of snaking strategies when moving in a straight line and then use a modulating architecture in order to generate turns or intermediate strategies through their mutual modulation. If this process is carried out carefully, most of the structures thus obtained will be reusable for other tasks. We consider two types of modulating structures: sensor modulators and actuator modulators. In our case, both types of modulators are constructed using ANNs and are obtained through evolution. Formally:
• A module X is an ancestor of a module Y if there is a path from X to Y.
• X is a descendant of Y if there is a path from Y to X.
• X is a direct descendant if there is a path of length 1 from Y to X.
• X will be called a Root node (denoted as R) if it has no ancestors.
• X is an actuator node (A) if its outputs establish values for the actuators.
• X is a selector node (S) if its output selects one of its direct descendants as the branch to follow, short-circuiting the others.
• X is an actuator modulating node (AM) if its outputs modify (multiplying by a value between 0 and 2) the outputs of its descendant nodes of type A. The modulations propagate through the controller hierarchy. If between R and A there is more than one AM that modulates one output of A, the resulting modulating value will be the product of the individual modulations in the path. Assuming that an AM modulates the values of n actuators, its number of outputs must necessarily be n times the number of direct descendants, as the modulation propagated to each descendant is usually different. When more than one node A provides values for the same actuator, the actuator receives the sum of these values. An AM does not necessarily modulate all the actuators over which the set of nodes acts, just any subset of them.
• X is a sensor modulating node (SM) if its outputs modify (multiplying by a value between 0 and 2) the inputs of its descendant nodes. The modulations propagate through the controller hierarchy to the actuator nodes. If between R and Y there is more than one SM that modulates one input of Y, the resulting modulating value will be the product of the individual modulations in the path. Assuming that an SM modulates the values of n sensors, its number of outputs must necessarily be n times the number of direct descendants, as the modulation propagated to each descendant is usually different. An SM does not necessarily modulate all the sensors over which the nodes act, just any subset (a short sketch of how these modulation values combine is given after Fig. 1).
Fig. 1. Example of a controller with all the elements of the architecture.
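The multiplicative propagation of these modulations can be summarised in a small sketch. The node interface (`kind`, `net`, `children`, `gains_for`) is purely illustrative and not the authors' implementation; only the combination rules (gains in [0, 2], products along paths, sums per actuator) come from the text.

```python
def modulated_outputs(node, sensors, act_gain=None, sens_gain=None):
    """Sketch of the modulation rules: actuator modulations multiply along the
    path from the root, sensor modulations multiply the inputs, and actuator
    values coming from several A nodes are summed."""
    act_gain = dict(act_gain or {})
    sens_gain = dict(sens_gain or {})
    inputs = [v * sens_gain.get(i, 1.0) for i, v in enumerate(sensors)]
    if node.kind == 'A':                      # leaf: produce actuator values
        return {a: v * act_gain.get(a, 1.0) for a, v in enumerate(node.net(inputs))}
    totals = {}
    for child in node.children:
        c_act, c_sens = dict(act_gain), dict(sens_gain)
        if node.kind == 'AM':                 # multiply actuator gains on this branch
            for a, m in node.gains_for(child, inputs).items():
                c_act[a] = c_act.get(a, 1.0) * m
        elif node.kind == 'SM':               # multiply sensor gains on this branch
            for s, m in node.gains_for(child, inputs).items():
                c_sens[s] = c_sens.get(s, 1.0) * m
        for a, v in modulated_outputs(child, sensors, c_act, c_sens).items():
            totals[a] = totals.get(a, 0.0) + v    # sum contributions per actuator
    return totals
```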
The use of actuator modulators leads to a continuous range of behaviors for the transitions between those determined by the individual controllers. This is due to the fact that actuator values can now be a linear combination of those produced by every low level module.
Sensor modulators permit changing how a module reacts under a given input pattern, transforming it into a different one. This way, it is very easy to make changes in the reaction to already learnt input patterns. In addition to increasing the architecture's possibilities, modulation results in a very interesting secondary effect: there can be more than one sub-tree being executed simultaneously in the controller. So, the architecture is not really different from a distributed architecture where modules are categorized into different groups, because actuator modulators can be put together in the same level and sensor modulators can be set aside from the hierarchy and attached to the appropriate inputs where necessary. Figure 2 displays an alternative representation for a controller taking this equivalence into account.
Due to the large computational requirements for calculating the fitness of each solution (a possible robot controller that must live its life out in a real or simulated environment), and in order to make computing times bearable, most processes in evolutionary robotics imply relatively small populations. The usual fitness landscapes with these types of life fitness functions imply large areas of mostly flat fitness values and some very sparse peaks where all the action takes place. This makes it very difficult for traditional genetic or evolutionary algorithms, which tend to converge to suboptimal solutions after under-exploring these very large solution spaces. In this work we address this issue through the application of macroevolutionary algorithms, which were proposed in [7]. The authors consider a new temporal scale, the "macroevolutionary" scale, in which the extinctions and diversification of species are modeled. The population is interpreted as a set of species that model an ecological system. The species may become extinct if their survival ratio with respect to the others is lower than a "survival coefficient". This ratio measures the fitness of a species with respect to the fitness of the others. When species become extinct, a diversification operator colonizes the vacancies with species derived from those which survived or with completely new ones.
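A minimal sketch of the extinction/colonization cycle just described. The survival-ratio formula, the parameter names and the perturbation used for colonization are illustrative simplifications, not the exact operators of [7].

```python
import random

def macroevolutionary_step(population, fitness, tau=0.5, exploit_prob=0.7):
    """One generation of a macroevolutionary-style algorithm: species whose
    relative survival ratio falls below the survival coefficient tau become
    extinct, and vacancies are colonized from survivors or new random species.
    `population` is a list of real-valued genotypes; `fitness` maps genotype -> float."""
    scores = [fitness(p) for p in population]
    lo, hi = min(scores), max(scores)
    ratios = [(s - lo) / (hi - lo + 1e-12) for s in scores]   # illustrative survival ratio
    survivors = [p for p, r in zip(population, ratios) if r >= tau]
    if not survivors:                          # always keep at least the best species
        survivors = [population[scores.index(hi)]]
    new_population = list(survivors)
    while len(new_population) < len(population):
        if random.random() < exploit_prob:     # colonize from a survivor (small perturbation)
            parent = random.choice(survivors)
            new_population.append([g + random.gauss(0.0, 0.1) for g in parent])
        else:                                  # completely new random species
            new_population.append([random.uniform(-1.0, 1.0) for _ in population[0]])
    return new_population
```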
Fig. 2. Alternative representation for the architecture (sensor modulators, actuator modulators and internal state modifying modules grouped into separate levels).
3. Experiments and Results
To test the approach, we designed a very simple testing environment where, as in [4], we have a 7-segment snake with inter-segment joints that can turn in angles between -120 and 120 degrees and a set of actuators (solenoids) that allow the snake to establish friction with the surface (similar to thrusting sticks into the ground). These actuators can either be on or off. The snake operates in an environment with a light source it must reach. To obtain an unbiased fitness function for the evolutionary process, during evolution we lay pieces of food on the ground in the direction of the light and every time the snake goes over a piece of food its energy is increased. When it moves, its energy is decreased. This type of fitness function favours reaching the light as fast as possible. All of the controllers evolved are artificial neural networks whose outputs provide the angles for the joints and the state of the solenoids, and whose inputs are the output angles and solenoid states at the previous instant of time. In the case of modulators, the input is the sensor value for the position of the light and the outputs are the modulation values for the angles. We have run multiple evolution processes under different physical and energetic constraints where, initially, the snake had to reach the light following a straight path in order to obtain the basic snaking behavior. After the behavior was obtained, we ran evolutionary processes to produce the modulators that would allow the snake to reach a light even if it had to turn. Figures 3a and 3b display the results of two evolutions of a snaking behavior under different constraints. The populations used in the tests were between 800 and 16000 individuals. Each chromosome was 320 genes long, containing real numbers. The controller obtained by the process was a synaptic delay based network with two 6-node hidden layers, whose 13 inputs were the outputs of the controller at the previous instant of time in a recursive topology. The gait the snake obtained in the first case is very similar to the typical sidewinder
Fig. 3. Best individual in basic straight line sidewinding motion behavior and in a regular snaking behavior (a, b). Turning behavior through the modulation of the previous straight line motion behavior (c). Evolution of fitness when evolving the modulator (d).
motion found in some real snakes. In the second case (figure 3b) the motion is more similar to a traditional snaking behavior. Figure 3c displays the behavior obtained through the modulation of the robot's main behavior controller (shown in 3b) in order to turn towards the objective. This modulating controller is also a synaptic delay based neural network with 2 input nodes, one 6-node hidden layer and 6 output nodes (modulating the angles of the basic snaking behavior). It was evolved for 300 generations using a 72-gene chromosome and 16000 individuals and, as shown in Figure 3c, provides a successful modulated behavior. Figure 3d shows the fitness for the best individual and the average of the population when evolving the modulator where, as we can see, the fitness increases to a stable level in a few generations due to the intrinsic simplicity of the modulators that arise using this architecture.
4. Conclusions
In this paper we propose an approach for constructing multiple module behavior architectures for snake-like robots through a modulation structure using macroevolutionary algorithms. The application of this method implies first the generation of the basic behavior controller in a simplified environment and the consequent generation of the necessary modulators in order to achieve the complex behavior by gradually increasing the complexity of the environment. The results show that this way it is quite simple to generate all types of intermediate behaviors in a very natural manner.
References
1. S. Hirose, Biologically Inspired Robots. Snake-like Locomotors and Manipulators. Oxford University Press (1993).
2. M. Walton and B. C. Jayne, "The Energetic Cost of Limbless Locomotion", Science, 524-527 (Aug. 1990).
3. L. Chen, Y. Wang, S. Ma and B. Li, "Analysis of Traveling Wave Locomotion of Snake Robot", Proc. IEEE Int. Conf. RISSP, 365-369 (2003).
4. Y. Shan and Y. Koren, "Design and Motion Planning of a Mechanical Snake", IEEE Trans. on Syst., Man, and Cyb., 23(4):1091-1100 (1993).
5. M. Nilsson, "Snake Robot Free Climbing", IEEE Control Systems, 21-26 (1998).
6. Duro, R. J., Santos, J., Becerra, J. A. (2000), "Evolving ANN Controllers for Smart Mobile Robots", Future Directions for Intelligent Systems and Information Sciences, Springer-Verlag, pp. 34-64.
7. Marin, J. and Sole, R. V. (1999), "Macroevolutionary Algorithms: A New Optimization Method on Fitness Landscapes", IEEE Transactions on Evolutionary Computation, Vol. 3, No. 4, pp. 272-286.
DECISION TREE AND LIE ALGEBRA METHOD IN THE SINGULARITY ANALYSIS OF PARALLEL MANIPULATORS*
KUANGRONG HAO 1 AND YONGSHENG DING 1,2
1 College of Information Sciences and Technology, 2 Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Donghua University, Shanghai 201620, P. R. China
The singularity analysis of a parallel manipulator is often very complicated. There exist multiple criteria to determine the singular behavior of a parallel manipulator, for example the rank condition of a screw set χ of the parallel mechanism, the second-order criterion of the screw set χ, the transverse criterion, etc. This paper aims to explain, for the first time, a general method to analyze the relationship between the different criteria of singular configurations with the help of the decision tree method. Through the analysis of a large number of parallel manipulators, we find that the second-order criterion is the most important one to determine whether a singular configuration is bifurcated.
1. Introduction
Parallel manipulators, also known as closed-chain manipulators such as parallel robots, are widely used in many domains due to their high load capacity and high precision. At the same time, they present many complicated singular configurations to be analyzed. The singular behavior of such a manipulator depends not only on the variation of the rank of the associated Jacobian matrix, but also on geometric conditions of the second order. In this paper, we study a screw set χ composed of n skew-symmetric fields, which is equivalent to studying the Jacobian matrix. By virtue of dual numbers, the rank of χ is calculated by defining a free maximal list of χ, avoiding the calculation of the sub-determinants of matrices of dimension 6 [1]. The behaviour of singular configurations of parallel manipulators can be determined with the rank condition of a free maximal list of χ, the transverse condition and the second-order property of the screw set χ.
* This work was supported in part by the Key Project of the National Nature Science Foundation of China (No. 60534020), the National Nature Science Foundation of China (No. 60474037), and the Program for New Century Excellent Talents in University (No. NCET-04-415).
The decision tree method is frequently used in knowledge processing for robot control [2]. It can also be used to aid engineering design [3, 4]. This paper presents, for the first time, an application of the ID3 algorithm to knowledge acquisition for the recognition of the singularity of mechanisms. This method can be used for robot real-time control and singular configuration identification, avoiding the complicated analysis of screw sets.
2. Rank Problem of a Screw Set and Generated Lie Sub Algebras
2.1. System Definitions
• e: the affine Euclidean space of 3 dimensions. E: the vector space attached to e.
• D: the displacement group; e: its unit element. D: the Lie algebra of dimension 6 of D.
• [.,.]: the Lie bracket, defined by
∀X, Y ∈ D, [X, Y](m) = ω_X ∧ Y(m) - ω_Y ∧ X(m), ∀m ∈ e.
• exp: D → D, the exponential map.
• The module structure on D: the set of dual numbers ẑ = z + εz°, with z, z° ∈ R and ε² = 0, is the dual number ring Δ. A skew-symmetric vector field X of D can be expressed in a basis of D over Δ. ẑ = (ẑ^1, ẑ^2, ẑ^3) is called the coordinates of X in this basis. Furthermore, z = (z^1, z^2, z^3) is the real part, and z° = (z^10, z^20, z^30) is the dual part.
2.2. Rank of a screw set
The most classical method to calculate the rank is to evaluate determinants of matrices. The notion of rank for screw systems is described as linear dependence over R. Let χ = (x_1, ..., x_n) be a set of skew-symmetric vector fields which is a subset of the Lie algebra D. We find a list composed of some linearly independent elements of χ over R, the number of which is equal to the rank of χ. Such a list L_R = {x_b1, x_b2, ..., x_brR} is then a basis of the space generated by χ over R. χ contains a free maximal list over Δ; its number of elements is r_Δ, and this free maximal list is denoted by L_Δ = {x_a1, x_a2, ..., x_arΔ}. L_Δ is a free maximal list of the set χ. In order to analyze the rank, we must build a basis B_Δ of D over Δ from a free maximal list L_Δ of χ. We can choose B_Δ composed of r_Δ elements of χ and of 3 - r_Δ elements which are not in χ. So B_Δ is not the same for different χ, and we distinguish the particular cases as follows:
1. Maximal case: r_Δ = 3, B_Δ = L_Δ = {x_1, x_2, x_3};
2. Medium case: r_Δ = 2, B_Δ = {y_1, y_2, y_3}, where y_1 = x_1, y_2 = x_2 and y_3 ∉ χ;
3. Minimum case: r_Δ = 1, B_Δ = {y_1, y_2, y_3}, where y_1 = x_1 and y_2, y_3 ∉ χ.
2.3. Second order condition of parallel manipulator
Consider a parallel manipulator consisting of rigid bodies, with n links connected by n joints. To every joint, a Lie subgroup of D is associated, and its Lie algebra is a subalgebra of D. q_i denotes the parameter of joint i. The closure equation of a parallel manipulator is f(q) = e, with f(q) = exp(q_1 ξ_1) ∘ ... ∘ exp(q_n ξ_n), q = (q_1, ..., q_n) ∈ R^n. The subset M = f^{-1}(e) of R^n is
the kinematically admissible space of the manipulator. x_k(q) = Ad(exp(q_1 ξ_1) ∘ exp(q_2 ξ_2) ∘ ... ∘ exp(q_{k-1} ξ_{k-1})) ξ_k for k = 1, ..., n, where ξ_k is the generator of the corresponding one-parameter subgroup and Ad denotes the adjoint map. r = rank(χ) = rank(x_1(q), x_2(q), ..., x_n(q)). By the classical result of differential geometry, the DOF is equal to n - r in the regular cases, i.e., when r is locally constant. Let F_q = (x_1(q), ..., x_n(q)) be the sub vector space of dimension r of D in which x_a(q) is a basis of F_q, with a ∈ I_1 = {1, ..., r}. Then x_i(q) = Σ_a c_i^a x_a with i ∈ I_2 = {1, ..., n - r}, where the c_i^a are functions of q. With F_q ⊕ G_q = S_q, v_p(q) is a basis of G_q with p ∈ I_3 = {1, ..., s - r}, where dim(S_q) = s. In fact, χ = (x_1(q), ..., x_n(q)) is a map from R^n into L(R^n, S_q), and a second-order form H_q is associated with it; H_q ∈ F_q is a necessary condition for the related configuration to be regular.
2.4. Transverse condition
The set E^r of u with rank(u) = r is a submanifold of co-dimension equal to (n - r)(s - r) of the vector space L(R^n, S_q), with 0 < r < min(n, s), and E^r = {u ∈ L(R^n, S_q) | rank(u) = r}.
For q ∈ R^n, χ(q) ∈ E^r is called transverse to E^r at q if the image of T_q χ is transverse to the subspace T_{χ(q)} E^r. Let q be a point of R^n such that χ(q) ∈ E^r. We suppose that the subspaces χ'(q)·R^n and T_{χ(q)} E^r are transverse; then Σ^r(f) = χ^{-1}(E^r) is a submanifold of R^n at q and we have T_q(χ^{-1}(E^r)) = χ'(q)^{-1}(T_{χ(q)} E^r). Writing u = χ'(q)·h + v, with u ∈ L(R^n, S_q), h ∈ R^n and v ∈ T_{χ(q)} E^r, the transverse condition is obtained by verifying whether the subspaces χ'(q)·R^n and T_{χ(q)} E^r are transverse.
Proposition 1. A is defined as the matrix of dimension (s - r)(n - r) × n by
A = (A_{(p,i),l}) = (G_{li}^p),   ∀p ∈ I_3, ∀i ∈ I_2, ∀l ∈ (1, ..., n).
χ is transverse to E^r if and only if rank(A) = (n - r)(s - r), where G_{li}^p is defined by [x_l, x_i] = Σ_a C_{li}^a x_a + Σ_p G_{li}^p v_p.
Proposition 2. If M ⊂ Σ^r(f), then M is a submanifold of Σ^r(f) and of R^n, with r = rank(x_1, ..., x_n) = dim(F_q). For all q ∈ M, we have T_qM = Ker(f') ∩ T_q Σ^r(f), DOF = dim(M) = dim(Ker(f') ∩ T_q Σ^r(f)), and T_qM = {h ∈ R^n | H_q·h ∈ F_q}.
If Propositions 1 and 2 are satisfied, the mechanism has a DOF at the point q.
3. An application of the ID3 algorithm
3.1. Fundamental theory of the ID3 algorithm
The ID3 method works as follows. Suppose T = PE ∪ NE, where PE is the set of positive examples (e.g., the transverse condition is verified), and NE is the set of negative examples (e.g., the transverse condition is not verified), pe = |PE| and ne = |NE|. An example will be determined to belong to PE with probability pe/(pe + ne) and to NE with probability ne/(pe + ne). By employing the information-theoretic heuristic, a decision tree is considered as a source of a message, PE or NE, with the expected information needed to generate this message given by

I(pe, ne) = - pe/(pe + ne) · log2(pe/(pe + ne)) - ne/(pe + ne) · log2(ne/(pe + ne))   when pe ≠ 0 and ne ≠ 0,
I(pe, ne) = 0   otherwise.
If an attribute At with the value domain {A_1, ..., A_N} is used for the root of the decision tree, it will partition T into {T_1, ..., T_N}, where T_i contains those examples in T that have value A_i of At. Let T_i contain p_i examples of PE and n_i of NE. The expected information required for the sub-tree for T_i is I(p_i, n_i). The expected information required for the tree with At as the root, EI(At), is then obtained as a weighted average:

EI(At) = Σ_i (p_i + n_i)/(pe + ne) · I(p_i, n_i),

where the weight for the i-th branch is the proportion of the examples in T that belong to T_i. ID3 examines all candidate attributes, chooses At to minimize EI(At), constructs the tree, and then uses the same process recursively to construct decision trees for the residual subsets {T_1, ..., T_N}.
3.2. Generated decision tree for singularity identification
In the following, an implementation of the ID3 algorithm is given. Table 1 gives examples of singularity identification of parallel manipulators: ex_1: 4-link chain; ex_2: Bricard chain; ex_3: Bennett chain; ex_4: Saltcellar [5];
ex_5: Star parallel robot [6]; ex_6: UPU parallel robot [7]; ex_7: R-cube1 parallel robot [8]; ex_8: R-cube2 parallel robot [8]; ex_9: Delta parallel robot [9]; ex_10: Tsai's parallel robot [7]. Based on the above definitions, four candidate attributes are used for collecting cases: the maximal rank of the screw list χ (Mrank(χ) = r_max), the rank of the free maximal list L_Δ (rank(L_Δ) = r_Δ), whether the transverse condition is satisfied (T.C.S.), and whether the second order H_q of the screw list χ is in the space F_q generated by χ (H_q ∈ F_q). The Bennett chain and the Bricard chain are well-known mechanisms that work only in their singular configurations, and the obtained results also prove that they possess 1 DOF in their work space. The Star robot, the UPU manipulator, Tsai's manipulator, the R-Cube manipulator and the Delta robot have 3 DOF. The Star robot and the UPU manipulator are bifurcated at their home position, which means that these robots do not possess any DOF at this position. Tsai's manipulator is an improved version of the UPU; its home position is no longer singular after the change of the axis orientation of the revolute joints [7]. The R-Cube mechanism is a non-symmetric manipulator, so it is interesting to find that the result is different when a different chain is selected. Calculations:
T = {ex_1, ex_2, ex_3, ex_4, ex_5, ex_6, ex_7, ex_8, ex_9, ex_10}, pe = 4, ne = 6.
Then I(4, 6) = -0.4 log2(0.4) - 0.6 log2(0.6) = 0.9710.
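This calculation can be reproduced with a few lines of code; the function name is illustrative.

```python
from math import log2

def I(pe, ne):
    """Expected information of a PE/NE message source (Section 3.1)."""
    if pe == 0 or ne == 0:
        return 0.0
    total = pe + ne
    p, n = pe / total, ne / total
    return -p * log2(p) - n * log2(n)

print(round(I(4, 6), 4))   # 0.971, matching I(4, 6) = 0.9710
```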
Table 1: Knowledge cases obtained about the singularity of closed mechanisms

Example   rank(χ) maximal   r_Δ   T.C.S.   H_q ∈ F_q   DOF
ex_1      No                1     No       No          No
ex_2      No                3     Yes      Yes         Yes
ex_3      No                2     No       Yes         Yes
ex_4      Yes               2     No       No          No
ex_5      No                2     No       No          No
ex_6      No                2     No       No          No
ex_7      Yes               3     Yes      Yes         Yes
ex_8      No                2     No       No          No
ex_9      No                2     No       No          No
ex_10     Yes               3     Yes      Yes         Yes
Figure 1. Decision tree of the DOF of parallel manipulators (decision nodes: H_q ∈ F_q; rank(L_Δ) is maximal; rank(χ) is maximal; transverse condition satisfied).
By using the expression of EI(At), the information measure needed for generating a decision tree with each attribute as the root is computed; the attribute H_q ∈ F_q is chosen because it minimizes the entropy. Similarly, we then choose rank(L_Δ) to separate the subsets T_1 = {ex_2, ex_3, ex_7, ex_10} and T_2 = {ex_1, ex_4, ex_5, ex_6, ex_8, ex_9}. The complete decision tree is shown in Fig. 1 and Table 2.
Table 2. Calculations

Attribute    EI(At)                                     Result
H_q ∈ F_q    0.4·I(4,0) + 0.6·I(0,6)                    0
rank(L_Δ)    0.3·I(3,0) + 0.6·I(1,5) + 0.1·I(0,1)       0.3901
T.C.S.       0.3·I(3,0) + 0.7·I(1,6)                    0.4142
Mrank(χ)     0.3·I(2,1) + 0.7·I(2,5)                    0.8797
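The entries of Table 2 can be checked with a short script; the attribute encoding below simply transcribes the rows of Table 1, and the output matches the table up to rounding.

```python
from math import log2
from collections import defaultdict

def I(p, n):
    if p == 0 or n == 0:
        return 0.0
    t = p + n
    return -(p / t) * log2(p / t) - (n / t) * log2(n / t)

# Rows of Table 1: (rank(chi) maximal, r_Delta, T.C.S., Hq in Fq, DOF)
cases = [("No", 1, "No", "No", "No"),   ("No", 3, "Yes", "Yes", "Yes"),
         ("No", 2, "No", "Yes", "Yes"), ("Yes", 2, "No", "No", "No"),
         ("No", 2, "No", "No", "No"),   ("No", 2, "No", "No", "No"),
         ("Yes", 3, "Yes", "Yes", "Yes"), ("No", 2, "No", "No", "No"),
         ("No", 2, "No", "No", "No"),   ("Yes", 3, "Yes", "Yes", "Yes")]

def EI(attr_index):
    """Weighted average entropy when splitting on one attribute."""
    branches = defaultdict(lambda: [0, 0])
    for row in cases:
        branches[row[attr_index]][0 if row[-1] == "Yes" else 1] += 1
    total = len(cases)
    return sum((p + n) / total * I(p, n) for p, n in branches.values())

for name, idx in [("Hq in Fq", 3), ("rank(L_Delta)", 1), ("T.C.S.", 2), ("Mrank(chi)", 0)]:
    print(name, round(EI(idx), 4))   # prints ~0.0, 0.39, 0.4142, 0.8797 (cf. Table 2)
```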
From the result, it can be seen that the attribute H_q ∈ F_q is the most important one to decide the behaviour of singular points; theoretically, it is only a necessary condition for the manipulator to have a DOF, not a sufficient one. However, as far as we know, we have not found an example satisfying H_q ∈ F_q that does not possess a DOF, although this has not been theoretically demonstrated yet. At the same time, the rank of the screw list χ, the rank of the free maximal list L_Δ and the transverse condition are also important requirements in the identification of the singularity of closed mechanisms.
4. Conclusions
In this paper, the application of the integration of the differential geometry method and the ID3 algorithm to knowledge acquisition for the identification of the singularity of parallel manipulators has been presented. The main advantages of such a method lie in two aspects. First, it is not necessary to assign properties to attributes for the knowledge case collection; the result generated by the ID3 algorithm gives a very clear hierarchy. Second, this method can be adopted to analyze mechanism designs where large amounts of knowledge and experience are scattered.
References
1. K. R. Hao, Mech. Mach. Theory, 33, 1063 (1998).
2. D. Wilking and T. Rofer, Realtime, 8th International Workshop in RoboCup, Lecture Notes in Artificial Intelligence, Springer (2004).
3. X. Shao, G. Z. Zhang, P. Li and Y. Chen, J. of Material Processing Technology, 117, 66 (2001).
4. D. McSherry, Knowledge Based Systems, 12, 269 (1999).
5. K. R. Hao, These de doctorat, ENPC, France (1995).
6. J. M. Herve, European Patent EP0494565A1, July 15 (1992).
7. G. Liu, Y. Lou and Z. Li, IEEE Transactions on Robotics and Automation, 19, 579 (2003).
8. W. M. Li, F. Gao and J. J. Zhang, Mech. Mach. Theory, 40, 467 (2005).
9. R. Clavel, United States Patent 4976582, Dec. 11 (1990).
COMBINING ADABOOST WITH A HILL-CLIMBING EVOLUTIONARY FEATURE SEARCH FOR EFFICIENT TRAINING OF PERFORMANT VISUAL OBJECT DETECTORS
Y. ABRAMSON
Transportation Research Institute, Technion - Israel Institute of Technology, Haifa 32000, Israel
E-mail: [email protected]
F. MOUTARDE, B. STANCIULESCU AND B. STEUX
Ecole des Mines de Paris, Robotics Laboratory, F-75272 Paris Cedex 06, France
E-mail: {Fabien.Moutarde, Bruno.Steux, Bogdan.Stanciulescu}@ensmp.fr
This paper presents an efficient method for the automatic training of performant visual object detectors, and its successful application to the training of a back-view car detector. Our method for training detectors is adaBoost applied to a very general family of visual features (called "control-point" features), with a specific feature-selection weak-learner: evo-HC, a hybrid of hill-climbing and evolutionary search. Very good results are obtained for the car-detection application: a 95% positive car detection rate with less than one false positive per image frame, computed on an independent validation video. It is also shown that our original hybrid evo-HC weak-learner makes it possible to obtain detection performances that are unreachable in reasonable training time with a crude random search. Finally, our method seems to be potentially efficient for training detectors of very different kinds of objects, as it was previously shown to provide state-of-the-art performance for pedestrian-detection tasks.
1. Introduction
The seminal work of Viola and Jones [1][2] introduced a new and powerful framework for the training of visual object detectors. It is based on the adaBoost algorithm, where each "weak classifier" assembled in the final strong classifier uses a single and simple image feature. In most works inspired by [1], these features are localized filters similar to Haar wavelet basis functions. At each adaBoost step, one of them is selected whose weighted error on the
training set is as low as possible. Following this work, several authors tried different approaches for further improvement of the algorithm, either for speeding up the training and/or improving the performance of the final detector. Two main directions have been explored: extending or completely changing the set of features, and/or trying to replace the exhaustive search for feature selection by a more efficient process. For instance, McCane and Novins [3] proposed an alternative non-exhaustive search based on a simple "local search" heuristic, and found that they could obtain nearly as good a classifier as in [2], but with a much faster training. Bartlett et al. [4] also used a custom heuristic based on an initial random selection of a small subset (5%) of all possible features, from which the best one is further refined by some kind of local search (over a set of features obtained from the best initial one by applying various shifting, scaling and reflecting operations). Treptow and Zell [5] proposed to extend the feature family proposed by Viola and Jones to a more generalized set of similar features, and to use a specific evolutionary search as the "weak learner". They found that adaBoost training with their evolutionary search over their larger feature set produced better detectors than exhaustive search applied to the initial limited feature set. Simultaneously, we proposed and tested in [6] and [7] a radically different set of image features: the so-called "control-points" features (see 2.1), and designed a custom "evolutionary" weak-learner for selecting, at each boosting step, an individual feature from the huge control-points feature space. In this paper, we detail our specific weak-learner heuristic, show that it makes it possible to reach detection performances unattainable by a crude random search, and present the very good results of our method on a different application: car detection in images from an on-vehicle camera (instead of face or pedestrian detection).
features
The very general family of features we use are called "control points" features. This family was first designed and proposed by us in [6], where we already described it. Therefore we hereafter only briefly recall what these features are. Each feature is defined by two sets of "control points", {p\.. .p,} and {rix. ..rij}, where i,j < K (we use K = 6 as in [6]), all placed either within the W x H detection window, or on a half-resolution or quarter
739
Figure 1. Three examples of control point features: the left one is "full resolution" with 2 "positive" points and 3 "negative" points located in the 36 x 36 pixels detection window; the middle one is " half-resolution" with 3 positive and 2 negative points testing the 18 X 18 pixels down-sampled detection window; the last one on the right is "quarterresolution" with 5 positive and 2 negative points chosen in the 9 x 9 pixels of the twice down-sampled detection window. The upper row shows the control points by themselves, and the lower row illustrates the application of the same 3 features to a given sub-window extracted from one movie image.
resolution version of the same image. The feature examines the pixel values in {p_1, ..., p_i} and {n_1, ..., n_j} in the relevant image (full, half or quarter resolution), and answers "yes" if and only if for every control point p ∈ {p_1, ..., p_i} and every control point n ∈ {n_1, ..., n_j}, val(p) - val(n) > θ is true, where θ is some feature-dependent minimal margin. Note that this last condition is a generalization of the original control-points features defined in [6] (where θ = 0). For the experiments presented here, the detection window size W × H is 36 × 36. We refer the reader to [6] for more details, justification and advantages of this family of features.
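The decision rule above can be transcribed directly as a short sketch; the image representation (one 2-D pixel array per resolution) is an assumption, not the authors' data structure.

```python
def control_point_feature(images, positives, negatives, theta=0, resolution="full"):
    """Evaluate a control-point feature: answer True iff every positive
    control point is brighter than every negative one by more than theta.
    `images` maps "full"/"half"/"quarter" to 2-D pixel arrays of the
    (possibly down-sampled) detection window; points are (x, y) tuples."""
    img = images[resolution]
    return all(img[py][px] - img[ny][nx] > theta
               for (px, py) in positives
               for (nx, ny) in negatives)
```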
Datasets
We used data recorded by ourselves with an on-vehicle 320x240 high-end CCD color webcam (LogiTech QuickCam Pro 4000). The training and validation sets were built partly manually (essentially for the positive examples), and partly using the "active learning" approach, with the semiautomated selective sampling implemented by the "SEVILLE" software (see [7] for more details). For creating the positive examples, the subimage was carefully positioned and sized in order to have the back of the car centered, with margins of about 15% around it, as can be checked on positive examples shown on top of figure 2.
740
Figure 2. Some examples from our training set: various positive car examples on the upper row, and several negative examples on the lower row (note that we define non-car as non-"correctly centered rear of car"). Image examples are all square but of various sizes; they are all warped before training to the same 36 x 36 detection window, and its 2 sub-sampled versions.
Our final total dataset includes 3291 labeled square images of various sizes, among which 1224 positive examples. This dataset was split in two: 2/3 of randomly chosen examples constituting the 2204 examples (including 791 positive) of our training set, the remaining 1/3 just being used as a validation set to monitor for overtraining. For final testing, in order to ensure an unbiased measure of accuracy, we used an independant recording together with a manually built "ground truth" information. The latter specifies, for each successive image in the movie, the exact position and size of actual "rear of car" positive detections that a perfect detector should output. 2.3. AdaBoost
2.3. AdaBoost training
For detector training, we use the adaBoost algorithm [8], as in [2] and most subsequent papers. AdaBoost requires a "weak learner", i.e. an algorithm which will select and provide, for each adaBoost step, a "good" feature (i.e. one with a "low-enough" weighted error measured on the training set). The weak learner used by Viola and Jones is just an exhaustive search of all ≈ 180,000 possible features in their set of features. But in our work, exhaustive search is definitely not possible, because we use the "control points" features described in 2.1, and the total number of possible different features in this family is absolutely huge (there are more than 10^35 of them for our 36 x 36 detection window size). One of the goals of the present work is precisely to present in detail our weak-learner heuristic specifically designed for efficient exploration of this huge space.
2.4. Evo-HC hybrid search weak-learner
The general scheme of our custom-designed hybrid search is the following:
• we start with a first generation of 70 random simple features (i.e. with only 2 positive and 2 negative points);
• at each generation: (1) we select the 30 best features of the previous generation (those with the lowest weighted error on the training set); (2) we apply to those 30 best features a hill-climbing consisting in applying to each of them a maximum of 5 successive specifically-designed "mutations", each mutation being cancelled if it did not improve the feature; (3) we complete the population with 40 new simple random features;
• we stop the algorithm when there has been no improvement during 40 consecutive generations, and choose the best feature of the last generation as the "good" selected feature for this adaBoost round.
We call this custom exploration search evo-HC, where "evo" stands for evolutionary, and "HC" for Hill-Climbing. Indeed, our search algorithm is largely driven by the "random mutation hill-climbing" step (2), where the best features of the previous generation are further refined by several successive only-improving "mutations". At the same time it is somewhat hybridized with an evolutionary search, since for each new generation the 40 worst features (among 70) are discarded and replaced by new random simple features. It should be noted however that only selection (and no crossover) occurs during our evolution. But, as found some time ago by Mitchell et al. in [9], Random Mutation Hill-Climbing (RMHC) by itself outperforms Genetic Algorithms in some difficult optimization problems. The "mutation" operator is specifically designed for the control-points features. It is a random choice between one of the 6 following alterations:
• randomly move one of the existing control-points within a maximal radius of 5 pixels around its initial position;
• add a new random (positive or negative) control-point;
• remove one of the existing control-points;
• change the working channel (red, blue or green) of the feature;
• change the feature "margin" value θ (see section 2.1) by ±2;
• change the working resolution (full, half or quarter) of the feature.
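The evo-HC loop just described can be summarized in code. The sketch below is a schematic re-implementation under stated assumptions: random_simple_feature, mutate and weighted_error are assumed callables (respectively producing a 2+2-point feature, applying one of the six alterations above, and evaluating a feature against the current boosting weights); they are not the authors' code.

```python
# Schematic evo-HC weak learner following the population sizes given in the text
# (70 features, 30 survivors, 5 mutations, stop after 40 generations without gain).
def evo_hc_weak_learner(random_simple_feature, mutate, weighted_error,
                        pop_size=70, keep_best=30, n_mutations=5, patience=40):
    population = [random_simple_feature() for _ in range(pop_size)]
    best, best_err = None, float("inf")
    stall = 0
    while stall < patience:
        # (1) keep the features with the lowest weighted error
        population.sort(key=weighted_error)
        survivors = population[:keep_best]
        # (2) random-mutation hill-climbing on the survivors
        for idx, feat in enumerate(survivors):
            err = weighted_error(feat)
            for _ in range(n_mutations):
                candidate = mutate(feat)
                cand_err = weighted_error(candidate)
                if cand_err < err:          # keep only improving mutations
                    feat, err = candidate, cand_err
            survivors[idx] = feat
        # (3) refill the population with new random simple features
        population = survivors + [random_simple_feature()
                                  for _ in range(pop_size - keep_best)]
        gen_best = min(population, key=weighted_error)
        gen_err = weighted_error(gen_best)
        if gen_err < best_err:
            best, best_err, stall = gen_best, gen_err, 0
        else:
            stall += 1
    return best
```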
Figure 3. Monitoring of adaBoost training using our evo-HC search: on the left, a typical training curve (error on the training and validation sets versus boosting step, i.e. number of features assembled); on the right, evolution of the ROC curve (positive detection rate versus false detection rate) during training, with curves shown for 20-, 50-, 200- and 500-feature detectors, generally moving left and up along boosting iterations.
3. Results
As is customary for the assessment of detectors (cf. for instance [2]), most of our evaluations and comparisons are based on the "Receiver Operating Characteristic" (ROC) curves of the detectors. These ROC curves are graphical plots of sensitivity vs. specificity (here, positive detection rate vs. false detection rate) as the discrimination threshold is varied (see e.g. [10]).
3.1. Detection performance with our evo-HC weak-learner
Our training method works very well for the car detection problem, as illustrated by the very good ROC curve obtained for the 500-features detector (see upper curve on the right of figure 3). The positive detection rate for this classifier is ≈ 95% for a false detection rate of 1:40,000. The latter rate approximately corresponds to one false detection per image frame in the camera stream, as in our setup 33,227 sub-windows of various locations and sizes are tested in each image. The typical computation time for adaBoost training on our 2204 car/non-car examples set using our evo-HC search weak-learner is ≈ 28 minutes (on a Pentium IV 3 GHz) for each 100 boosting cycles (and thus for each 100 features added). The total training time for a 500-features detector is therefore ≈ 2.5 hours.
3.2. Comparing evo-HC to "random search" weak-learners
In order to assess the exploration power of our evo-HC search weak-learner, we conducted a systematic comparison with an alternative trivial weak-learner: a simple random search. The latter is extremely simple: for each adaBoost step, we simply try successively maxTrials randomly generated features. Note that, contrary to what we do in our evo-HC, the feature randomization procedure naturally covers all possible features.
Figure 4. Comparison of the detection performance attainable with our evo-HC search to that of detectors obtained using random search: left, the ROC curves of the latter are always below, when comparing detectors with the same or a smaller number of features; right, slow improvement of random-search detectors with increasing maxTrials, reaching a plateau below the performance obtained with our evo-HC search.
The only parameter of the random search weak-learner is maxTrials, the number of randomly generated features tried at each adaBoost step. It is clear from the ROC curves of figure 4 that the detection performance of the best detector using our evo-HC search weak-learner is definitely higher than that of any of the detectors we obtained using the random search weak-learner. This is true not only for the most complex detector (i.e. with 500 features), but also for any given maximal number of features allowed for the detector. Also, even though some detectors obtained with random search with a very high value of maxTrials reach "lower but acceptable" detection performance, it should be noted that the corresponding training time was already much higher (nearly 10 times) than the training time with our evo-HC weak-learner.
4. Conclusions and discussion
In this paper, we have shown that adaBoost training with control-points features and a specially-designed weak-learner (evo-HC, a hybrid of Hill-Climbing and evolutionary search) can produce a very good "car detector" reaching a 95% positive detection rate for a 1:40,000 false detection rate (i.e. less than 1 false alarm per video frame). Moreover, we have conducted a series of tests to compare these good results to what can be obtained using an alternative, very simple "random search" weak-learner. It was shown that it is apparently not possible (even when increasing the number of tested features to values implying unreasonable training times) to get as good a final classifier as the one that was obtained using our evo-HC search weak-learner. It should also be noted that the training time of a 500-features detector on our ≈ 2000-image training set is only ≈ 2.5 hours on a typical desktop.
A deeper study and comparison of our weak-learner with other sophisticated search algorithms should now be conducted. Nevertheless, the evidence presented here already indicates that our evo-HC search is indeed a quite efficient weak-learner, with good exploratory power, allowing efficient selection of several hundreds of discriminative features among the ≈ 10^35 possible "control-points" features. Finally, as we had previously successfully applied our method to training an acceptable face detector [6] and a state-of-the-art pedestrian detector [7], it seems that our adaBoost training with the evo-HC weak-learner exploring the "control-points" feature space can be an efficient and general method for training visual detectors of any kind of objects.
References
1. P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR'01), Kauai, Hawaii, USA (2001).
2. P. Viola and M. Jones, Robust real-time object detection. International Journal of Computer Vision 57(2) (2004).
3. B. McCane and K. Novins, On training cascade face detectors. In Image and Vision Computing, pages 239-244, New Zealand (2003).
4. M.S. Bartlett, G. Littlewort, I. Fasel and J.R. Movellan, Real time face detection and facial expression recognition: Development and application to human-computer interaction. In CVPR Workshop on Computer Vision and Pattern Recognition for Human-Computer Interaction, Vancouver (2003).
5. A. Treptow and A. Zell, Combining adaBoost learning and evolutionary search to select features for real-time object detection. In Proc. of IEEE Congress on Evolutionary Computation (CEC 2004), Portland, Oregon, vol. 2, pp. 2107-2113, IEEE Press (2004).
6. Y. Abramson, B. Steux and H. Ghorayeb, YEF real-time object detection. In Proc. of Intl. Workshop on Automatic Learning and Real-Time (ALART'05), Siegen, Germany (2005).
7. Y. Abramson and Y. Freund, SEmi-automatic VIsuaL LEarning (SEVILLE): a tutorial on active learning for visual object recognition. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR'05), San Diego (2005).
8. Y. Freund and R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: EuroCOLT'95, pages 23-37, Springer-Verlag (1995).
9. M. Mitchell, J.H. Holland and S. Forrest, When will a genetic algorithm outperform hill-climbing? In Advances in Neural Information Processing Systems, vol. 6, pages 51-58, Morgan Kaufmann Publishers Inc. (1994).
10. J.A. Hanley, Receiver operating characteristic (ROC) methodology: The state of the art. Critical Reviews in Diagnostic Imaging, 29:307-335 (1989).
INTELLIGENT SYSTEM SUPPORTING NON-DESTRUCTIVE EVALUATION OF SCC USING EDDY CURRENT TEST
SHIGERU KANEMOTO
The University of Aizu, Tsuruga, Ikki-machi, Aizuwakamatsu City, 965-8580, Japan
WEIYING CHENG, ICHIRO KOMURA, MITSUHARU SHIWA
NDE Center, Japan Power Engineering and Inspection Corporation, 14-1, Benten-cho, Tsurumi-ku, Yokohama City, 230-0044, Japan
SHIGEAKI TSUNOYAMA
The University of Aizu, Tsuruga, Ikki-machi, Aizuwakamatsu City, 965-8580, Japan
Two premier issues in eddy current testing (ECT), namely the automatic identification of cracks from their noisy backgrounds and the sizing of cracks with complex morphologies, are dealt with in this paper. A signal processing method based on statistical pattern recognition is established to realize automatic crack identification, and a new method for estimating crack depth is proposed by utilizing a new concept, the depth sizing index, which is analytically constructed from raw measurement signals based on system optimization theory.
1. Introduction
Nondestructive inspection is one of the main measures to ensure the safe and reliable operation of nuclear power plants. Among the nondestructive evaluation methods, eddy current testing (ECT) is widely used for the inspection of surface-breaking cracking [1-3]. Along with the enhancement of ECT sensors and the development of numerical analysis methods, ECT has achieved significant success in the detection and sizing of clearly separated, crack-like artificial defects, such as EDM (Electro Discharge Machining) notches and fatigue cracks [4,5]. A variety of stress corrosion cracks (SCC), which are induced by the combined influence of a tensile stress, a corrosive environment and a material property, have been found at critical reactor components in recent years. Because of the morphological complexity of SCCs, such as a narrow width, bridges between crack surfaces or complex branch shapes, the detection and sizing of SCCs in their early stages remain difficult. New detection and sizing methodologies, such as the development of new sensors and the establishment of new
signal processing and crack sizing methods, are needed in order to realize automatic, accurate detection and sizing. In this study, two premier issues in eddy current testing, namely the automatic identification of cracks from their noisy backgrounds and the sizing of cracks with complex morphologies, are dealt with. A statistical pattern classification technique is applied for the identification of cracks. A new concept of a depth sizing index, which is sensitive only to crack depth and insensitive to other parameters, is constructed based on system optimization theory. These two algorithms help human experts to identify and size cracks from ECT signals.
2. Automated Crack Detection
Figure 1 shows ECT signals which measure a 10 mm long, 2 mm deep and 0.3 mm wide EDM notch fabricated near the welding line of a SUS304 pipe. The pancake-type ECT probe with 400 kHz frequency is scanned on the surface of the test piece with 0.5 mm pitch in the X-Y direction. Then, the real and imaginary components of the measured signals are displayed as so-called C-scan images. Also, in Fig. 1, Lissajous plots are shown for the x-direction (horizontal) and y-direction (vertical) scan data which are indicated by the crosshair cursor of the C-scan figures. Due to the existence of the welding line, it is hard to distinguish the crack parts from the background signals. Human experts usually examine the Lissajous plot by manipulating the ECT probe and distinguish the crack signals by referring to the ECT signal of a calibration notch with known location and size.
Figure 1. Typical ECT signal plots (C-scan (left) and Lissajous (right) figures)
In order to automate this procedure, we introduce the following new multi-dimensional signals, which are constructed by embedding the signals at neighboring sampling points:
z(i,j) = [z_re(i+n, j+m), z_im(i+n, j+m)]  (n = -N, ..., +N; m = -M, ..., +M)  (1)
where (i,j) indicates a target position and n and m are neighboring positions along the x- and y-directions. This embedding procedure corresponds to a human
expert's probe manipulation behavior, and is also reminiscent of Takens' embedding theorem in time-series analysis. After embedding the ECT signals in the multi-dimensional state space, we can apply various kinds of classification algorithms. Here, we utilize Fisher's discriminant theory. In this theory, the defect and noise classes are defined respectively by
1) class of defect signals: ω_S : {z(i,j) | (i,j) ∈ defect region},
2) class of noise signals: ω_N : {z(i,j) | (i,j) outside the defect region},
where the subscripts S and N stand for defect signal and noise respectively. The optimized linear discriminant function g to distinguish these two classes is
g(z(i,j)) = w^T z(i,j) + W_0,  (2)
where w is a vector representing the axis of the projection from the multi-dimensional space onto a one-dimensional space, and W_0 is an optional scalar value which fixes the origin of the discriminant function along the projection axis. According to Fisher's theory, w can be calculated by
w = (Σ_S + Σ_N)^(-1) (m_S - m_N),  (3)
where m_S and m_N are the means of the 2 x (2N+1) x (2M+1)-dimensional vectors of the two classes, and Σ_S and Σ_N are the covariance matrices of the two classes respectively. The constant W_0 is optional but necessary for determining the origin of the discriminant function. If W_0 is calculated by
W_0 = -w^T (σ_N m_S + σ_S m_N) / (σ_S + σ_N),  (4)
then the two classes can be classified by the sign of the discriminant function g, that is,
g(z(i,j)) > 0 : z(i,j) ∈ ω_S,
g(z(i,j)) < 0 : z(i,j) ∈ ω_N,  (5)
where σ_S^2 and σ_N^2 are the variances of the two classes, which can be calculated by σ_S^2 = w^T Σ_S w and
σ_N^2 = w^T Σ_N w.
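A compact sketch of this discriminant computation is given below, purely as an illustration; the array shapes, function names and the expression used for W_0 follow the description above, and none of it is the authors' implementation.

```python
import numpy as np

def fisher_discriminant(defect_vectors, noise_vectors):
    """Compute w and W_0 for the two-class linear discriminant described above.

    Each argument is an array of shape (n_samples, n_features), where a feature
    vector is the embedded ECT signal z(i, j) of Eq. (1).
    """
    m_s, m_n = defect_vectors.mean(axis=0), noise_vectors.mean(axis=0)
    cov_s = np.cov(defect_vectors, rowvar=False)
    cov_n = np.cov(noise_vectors, rowvar=False)
    w = np.linalg.solve(cov_s + cov_n, m_s - m_n)                     # Eq. (3)
    sigma_s = np.sqrt(w @ cov_s @ w)                                  # sigma_S
    sigma_n = np.sqrt(w @ cov_n @ w)                                  # sigma_N
    w0 = -w @ (sigma_n * m_s + sigma_s * m_n) / (sigma_s + sigma_n)   # Eq. (4)
    return w, w0

def discriminant(z, w, w0):
    """g(z) > 0 assigns z to the defect class, g(z) < 0 to the noise class."""
    return z @ w + w0
```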
The value of the discriminant function g can be presented as a C-scan image. Then, the acceptance or rejection of a defect identification can be decided automatically from the value of g. The ECT signals shown in Fig. 1 are utilized and the supervised defect is identified according to Eqs. (1)-(4). The supervised defect is defined in a rectangular region near the crosshair cursor point in Fig. 1. Signals inside and outside of this region are assigned to the defect class and the noise class respectively. A 22-dimensional vector space (N=0, M=5) is constructed. Then, the C-scan image of the discriminant function g is calculated and shown in Fig. 2(a). Only the positive values, that is, g > 0, are plotted in Fig. 2(b). Although there is very strong welding noise, as indicated in Fig. 1, the notch is clearly extracted and indicated in Fig. 2(b). False alarm probability and miss alarm probability are measures of the defect identification ability of the method. Figure 3(a) shows the probability of identification obtained by utilizing the above 22-dimensional vector. The horizontal axis labels the discriminant function value g(z). The identification probability of a defect is denoted by a circle (o) and that of noise by a cross (x). The classification boundary is indicated by the vertical line. Normalized histograms are constructed based on the mean and variance of the distribution of the two classes. The solid and broken lines represent the normal distributions of the two classes respectively. The results using a 2-dimensional vector space (N=0, M=0) are shown in Fig. 3(b). Also, the summary of the false and miss alarm probabilities calculated from the normal distributions is presented in Table 1. This comparison clearly demonstrates that the defect identification capability is enhanced by using the multi-dimensional state-space parameters.
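The false and miss alarm probabilities of Table 1 can be derived from the fitted normal distributions of the two classes; one possible computation is sketched below (thresholding at g = 0, as in Eq. (5)), purely as an illustration.

```python
from statistics import NormalDist

def alarm_probabilities(g_defect, g_noise, threshold=0.0):
    """Estimate false/miss alarm probabilities from normal fits of g-values."""
    defect = NormalDist.from_samples(g_defect)   # discriminant values, defect class
    noise = NormalDist.from_samples(g_noise)     # discriminant values, noise class
    false_alarm = 1.0 - noise.cdf(threshold)     # noise classified as defect
    miss_alarm = defect.cdf(threshold)           # defect classified as noise
    return false_alarm, miss_alarm
```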
Figure 2. C-scan images of (a) discriminant function values and (b) the extracted flaw signal parts indicated by g(z) > 0.
Figure 3. Comparison of discrimination performance in different state spaces: (a) discrimination results in the 22-dim. space (N=0, M=5); (b) discrimination results in the 2-dim. space (N=0, M=0).
Table 1. Summary of false and miss alarm probabilities.
                           22-dim. vector   2-dim. vector
False alarm probability    3.09E-04         0.0139
Miss alarm probability     3.09E-04         0.0807
3. Depth Sizing Index
The depth sizing of SCC is one of the major concerns of ECT. However, the complexity of SCC, especially partial contacts between SCC crack surfaces, yields a bypass of the eddy current and a bias in the depth estimation. For this reason, a variety of methods have been proposed [6,7]. One of the approaches assumes that the conductivity in the SCC region is constant, so that the ECT inversion is reduced to computing the constant crack conductivity and the crack's geometrical profile [8,9]. Correspondingly, a modified two-layer SCC model has also been proposed, in which the SCC region is divided into two layers with different constant conductivities in the depth direction [10]. Nevertheless, both the crack conductivity and the crack geometrical profile have to be taken into consideration in these two approaches. However, from the point of view of mechanical strength assessment, there is not much interest in the crack conductivity distribution; the crack profile, especially the crack depth, is the main concern. If we can find a characteristic parameter that is sensitive to the crack geometrical profile but not to the conductivity distribution, we can assess the depth of a crack without knowing its conductivity. In accordance with this idea, a depth sizing index, which has the lowest sensitivity to crack conductivity and the highest sensitivity to crack depth, is designed based on system optimization theory, and the depth of SCC is estimated according to this depth sizing index. First, we define the depth sizing index S by a linear combination of the ECT signals, S = c^T z. Here, c is an unknown parameter vector which is optimized later, and z = [z_fk(x_j)]^T denotes the measured signals as a function of frequency f_k and probe position x_j. The measured ECT signal z depends on the crack depth d and the crack conductivity σ. Hence, if we can find a parameter c which satisfies the conditions
∂S(c, z(d,σ))/∂d = max,  ∂S(c, z(d,σ))/∂σ = min,  (6)
then the depth sizing index S depends on just the crack depth. This optimum coefficient vector c_opt is concretely derived from
c_opt = arg max { w_1 · max(∂S(c, z(d,σ))/∂d) − w_2 · max(∂S(c, z(d,σ))/∂σ) },  (7)
where w_1 and w_2 are weighting parameters. In order to solve Eq. (7), we approximate z as a non-linear function of the depth d and the conductivity σ as follows:
z_i = χ_1·d + χ_2·σ + ...  (8)
In our depth sizing index search, the ECT signals of a 22 mm long crack with conductivities of 0, 0.5, 1, 1.5, and 2 percent of σ_0, and depth values of 1 to 7 mm, are calculated in advance by FEM analysis. Only the peak of the signal, that is, the signal sampled at the crack length center, is utilized. Figure 4 shows the normalized amplitude and phase of the peaks of all the cases. Subscripts L and H indicate the low frequency (50 kHz) and high frequency (400 kHz) test cases in this study, respectively.
Figure 4. Amplitude and Phase of the high and low frequency signals with respect to different cases.
As an example of a conventional heuristic depth sizing index, the difference between the amplitudes of the 400 kHz and 50 kHz signals, that is, S = z_H^A − z_L^A (where the superscript A denotes the amplitude component), is examined first. Figure 5(a) shows that the dependency of this S on conductivity is quite small when the crack depth ranges from 1 to 3 mm, whilst with the increase of crack depth, ∂S/∂σ increases. In order to estimate the crack depth, we have to prepare a so-called master curve which defines the relation between S and the depth estimation value d_pred. Here, we utilize the following second-order polynomial curve fitted from the FEM analysis data of 1-7 mm deep, 0% conductivity cracks:
d_pred = χ_1·S^2 + χ_2·S + χ_3.  (9)
The depth estimation results using this master curve are shown in Fig. 6(a) using the 1-7 mm depth and 0-2% conductivity crack data. Here, 2% white noise is added to the original FEM analysis data. The numbers on the horizontal and vertical axes denote the true crack depth and the estimated depth, respectively. The estimated depth of a 1 mm deep crack ranges from 1 to 1.3 mm; however, the estimated depth of a 7 mm deep crack varies from 2.5 to 8.0 mm. Obviously it is impossible to estimate the crack depth using this heuristic depth sizing index for cracks having different conductivities. Then, optimization is carried out using Eqs. (7) and (8). Here both the amplitude and the phase of the high and low frequency signals are utilized, that is,
z = {z_H^A, z_L^A, z_H^P, z_L^P},  (10)
where the superscript P denotes the phase component.
The coefficient vector c and the index S obtained from the optimization are as follows:
S = 0.68·z_H^A − 0.69·z_L^A − 0.12·z_H^P + 0.24·z_L^P.  (11)
Figure 5(b) shows the dependence of the optimized depth sizing index on d and σ. Here, we can see that ∂S/∂σ is optimized to its minimum. The depth estimated by Eqs. (11) and (9) is shown in Fig. 6(b). Here, the same data as in Fig. 6(a) are used in the depth estimation. The circles indicate the estimation using the 0-conductivity ECT data, and the crosses indicate the estimation using the other, non-zero conductivity ECT data. The average depth and the standard deviation of the estimates are listed in Table 2. The estimation error is less than 1 mm and is quite acceptable. Experimental verification of this algorithm will be shown in our forthcoming paper [11].
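As an illustration of how such a master curve can be built and applied, the sketch below fits the second-order polynomial of Eq. (9) to calibration pairs (S, d) and then predicts depths for new index values; the sample numbers are invented for the example and are not the paper's data.

```python
import numpy as np

def fit_master_curve(index_values, true_depths):
    """Fit d_pred = chi1*S^2 + chi2*S + chi3 (Eq. (9)) to calibration data."""
    return np.polyfit(index_values, true_depths, deg=2)

def predict_depth(coefficients, index_values):
    """Evaluate the master curve for new depth sizing index values."""
    return np.polyval(coefficients, index_values)

# Hypothetical calibration data: index S from 0%-conductivity FEM cases of 1-7 mm depth.
S_calibration = np.array([0.10, 0.22, 0.35, 0.47, 0.58, 0.68, 0.77])
d_calibration = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
coeffs = fit_master_curve(S_calibration, d_calibration)
print(predict_depth(coeffs, np.array([0.30, 0.60])))
```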
Figure 5. Contours of the fitting plane, fitted using the two depth sizing indices: (a) S = z_H^A − z_L^A; (b) S = 0.68·z_H^A − 0.69·z_L^A − 0.12·z_H^P + 0.24·z_L^P.
Figure 6. Estimation of depth using second-order polynomial fitting of the two depth sizing indices: (a) S = z_H^A − z_L^A; (b) S = 0.68·z_H^A − 0.69·z_L^A − 0.12·z_H^P + 0.24·z_L^P.
Table 2. Comparison of the estimated and true depth.
true (mm)      1     2     3     4     5     6     7
average (mm)   1.3   1.5   3     4.5   5.4   5.9   6.2
stdev (mm)     0.01  0.04  0.16  0.22  0.13  0.14  0.34
4. Summary
Two kinds of intelligent signal processing algorithms for the automatic detection and depth sizing of SCC are presented in this paper. The preliminary results of the present methods suggest a large potential to improve present inspection procedures, and also to clarify the criteria of expert judgment for standardizing non-destructive inspection procedures. Of course, since a lot of tacit knowledge remains in expert judgment, we should pursue it in the future. Also, the present methods should be verified with actual SCC data. Since the present paper just shows the possibility of the most basic signal processing methodologies, a number of other state-of-the-art algorithms, especially non-linear signal processing and classification algorithms, should be studied in the future to establish more intelligent and high-performance NDT support systems.
References
1. A. Dogandzic, N. Eua-anant and B. Zhang, Review of Progress in Quantitative Nondestructive Evaluation, Vol.24, 704 (2005).
2. N. Zavaljevski, S. Bakhtiari, A. Miron, D.S. Kupperman, Review of Progress in Quantitative Nondestructive Evaluation, Vol.24, 728 (2005).
3. B. Shin, P. Ramuhalli, L. Udpa and S. Udpa, Review of Progress in Quantitative Nondestructive Evaluation, Vol.23, 597 (2004).
4. T. Takagi, M. Uesaka and K. Miya, Electromagnetic Nondestructive Evaluation, IOS Press, 9 (1997).
5. W. Cheng and K. Miya, International Journal of Applied Electromagnetics and Mechanics, 495 (2001/2002).
6. J.A. Bieber, C.C. Tai and J.C. Moulder, Review of Quantitative Nondestructive Evaluation, 17 (1998).
7. N. Yusa, W. Cheng, T. Uchimoto and K. Miya, NDT & E International, Vol.35(1), 9 (2002).
8. M. Kurokawa, T. Kamimura and S. Fukui, Proceedings of the 13th International Conference on NDE in Nuclear and Pressure Vessel Industries, Kyoto (1995).
9. Z. Chen, K. Miya and M. Kurokawa, Electromagnetic Nondestructive Evaluation (III), Studies in Applied Electromagnetics and Mechanics, vol.15, 233 (1999).
10. Z. Chen, K. Aoto and K. Miya, IEEE Mag, Vol.36(4), 1018 (2000).
11. W. Cheng, S. Kanemoto, I. Komura and M. Shiwa, to be published in NDT & E International (2006).
THE CONTINUOUS-SENTENTIAL KSSL RECOGNITION AND REPRESENTATION SYSTEM USING DATA GLOVE AND MOTION TRACKING BASED ON THE POST WEARABLE PC*
JUNG-HYUN KIM
School of Information and Communication Engineering, Sungkyunkwan University, 300 Chunchun-dong, Jangan-gu, Suwon, KyungKi-do, 440-746, Korea
KWANG-SEOK HONG
Sungkyunkwan University, 300 Chunchun-dong, Jangan-gu, Suwon, KyungKi-do, 440-746, Korea
Sign language is a method of communication that uses facial expressions and gestures. However, natural learning and interpretation of sign language are very difficult, and it takes a hearing person a long time to represent and translate it fluently. Consequently, we design and implement a real-time sentential Korean Standard Sign Language (hereinafter, KSSL) recognition system using fuzzy logic and wireless haptic devices based on the post wearable PC. Experimental results show an average recognition rate of 93.9% for dynamic and continuous KSSL.
1. Introduction
In recent years, there have been many innovations and evolutions in the areas of human-computer interaction and sign language recognition as a part of natural language understanding. However, traditional studies on sign language recognition cannot secure the mobility of a sign language recognition system needed for an efficient dialog between a deaf person and a hearing person, because they aim at general-purpose control of optional hand signals and sign language recognition based on desktop computers and wired communication technology. Furthermore, correct measurement of the KSSL gesture data is impossible and
This work is supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment)-(IITA-2005- CI090-0501-0019)
complex computation algorithms are needed, because traditional sign language recognition systems use an image capture or video processing system for the acquisition of sign language. Consequently, in this paper, we design and implement a KSSL recognition system using fuzzy logic and wireless haptic devices based on the post wearable PC (WPS: Wearable Personal Station) for clear communication between the hearing-impaired and hearing people.
2. Regulation of the KSSL Gestures and the KSSL input module
Since the KSSL is very complicated and consists of a considerable number of gestures, motions and so on, it is impossible to recognize all dialog components that are represented by the hearing-impaired. Therefore, we regard this paper as a fundamental study toward perfect dialog and communication between the hearing-impaired and hearing people, and selected 25 basic KSSL gestures connected with a travel information scenario according to the "Korean Standard Sign Language Tutor" (hereinafter "KSSLT") [1]. The 23 hand gestures necessary for travel information KSSL are classified by hand shape, pitch and roll degree. That is, we constructed 44 sentence recognition models according to the associability and presentation of hand gestures and basic KSSL gestures. For the KSSL input module, we adopted a Bluetooth module for the wireless sensor network, 5DT's wireless data (sensor) gloves and the Fastrak®, which are popular input devices in the haptic application field. The wireless data gloves are basic gesture recognition equipment that can acquire and capture various haptic information (e.g. the bending degree and direction of the hand or fingers) using fiber-optic flex sensors. The Fastrak® is an electromagnetic motion tracking system, a 3D digitizer and a quad receiver motion tracker, and it provides dynamic, real-time measurements of six degrees of freedom [2], [3]. The architecture and composition of the KSSL input module are shown in Figure 1.
Figure 1. The architecture of the sign language input module (5DT data glove system, Fastrak® motion tracking system, HMD (Head Mounted Display), Oracle RDBMS).
3. The Training and Recognition Models using RDBMS
The process of implementing the training and recognition models is to analyze the input data from the data glove system and to segment the data into a valid gesture record set and a status transition record set. Therefore, the RDBMS is used to classify the sign language gesture data (hand gestures and motions) input from the sign language gesture input module into valid and invalid gesture record sets (that is, significant and status transition record sets) and to efficiently analyze the valid record set. The rule used to segment the valid gesture record set and the invalid record set (changing gesture set) is shown in Figure 2 and given below.
• If the difference between the preceding average (of the preceding 3 and 1 values) and the current row value is over 5, the current value is regarded as a transition sign language gesture record.
• If the value from either the 5DT data glove system or the Fastrak® exceeds the threshold of 5, the current data value is also regarded as a changing sign language gesture record.
Figure 2. The rule of segmentation and record set
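A minimal sketch of the segmentation rule above is given below; the record layout (one numeric value per sensor sample), the window size and the helper names are assumptions made for illustration, not the authors' implementation.

```python
def is_transition_record(history, current, threshold=5):
    """Flag a record as a status-transition (invalid) record.

    'history' holds the preceding samples of one sensor channel; a record is a
    transition record when the current value deviates from the average of the
    preceding samples by more than the threshold.
    """
    if not history:
        return False
    preceding_average = sum(history) / len(history)
    return abs(current - preceding_average) > threshold

def segment_records(samples, threshold=5, window=3):
    """Split a stream of sensor values into valid and transition record sets."""
    valid, transition = [], []
    for i, value in enumerate(samples):
        history = samples[max(0, i - window):i]
        if is_transition_record(history, value, threshold):
            transition.append(value)
        else:
            valid.append(value)
    return valid, transition

# Hypothetical glove channel: a steady posture, a fast change, then a new posture.
print(segment_records([50, 51, 50, 52, 70, 88, 90, 89, 91]))
```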
4. Fuzzy Max-Min Composition for Sign Language Recognition
For the design of a fuzzy membership function, many types of curves can be used, but triangular or trapezoidal membership functions are the most common because they are easier to represent in embedded controllers [4]:
μ_A(x) = 0 for x ≤ a; (x − a)/(b − a) for a ≤ x ≤ b; 1 for b ≤ x ≤ c; (d − x)/(d − c) for c ≤ x ≤ d; 0 for x ≥ d.  (1)
Figure 3. Trapezoidal fuzzy number set A = (a, b, c, d).
So, we applied trapezoidal membership functions for the representation of fuzzy number sets; this shape originates from the fact that there are several points whose membership degree is maximum. To define and describe trapezoidal membership functions, we define a trapezoidal fuzzy number set A as A = (a, b, c, d), and the membership function of this fuzzy number set is interpreted as in Equation (1) and Figure 3. Examples of the proposed fuzzy membership functions are shown in Figure 4.
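For illustration, the trapezoidal membership function of Eq. (1) can be written as a small function; the parameter values in the usage line are invented and only stand in for an actual KSSL membership function.

```python
def trapezoid_membership(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy number set A = (a, b, c, d)."""
    if x <= a or x >= d:
        return 0.0
    if a < x < b:
        return (x - a) / (b - a)
    if b <= x <= c:
        return 1.0
    return (d - x) / (d - c)   # c < x < d

# Hypothetical parameters for a "pitch is roughly level" membership function.
print([round(trapezoid_membership(v, -20, -5, 5, 20), 2) for v in (-25, -10, 0, 10, 25)])
```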
Figure 4. The fuzzy membership functions for KSSL recognition.
Also, we utilized the fuzzy max-min composition to extend the crisp relation concept to fuzzy propositions and to reason approximate conclusions by the composition arithmetic of fuzzy relations. Two fuzzy relations R and S are defined on the sets A, B and C (we prescribed the accuracy of the hand gestures, the basic KSSL gestures and the target KSSL recognition models as the sets of events that occur in KSSL recognition with the sets A, B and C). The composition S ∘ R of the two relations R and S is expressed by the relation from A to C, and this composition is defined in Equation (2) [4], [5]. For (x, y) ∈ A×B and (y, z) ∈ B×C,
μ_S∘R(x, z) = Max_y { Min( μ_R(x, y), μ_S(y, z) ) }.  (2)
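A direct way to compute Eq. (2) for relations stored as matrices is sketched below; the two small matrices are made-up examples, not the paper's data.

```python
import numpy as np

def max_min_composition(R, S):
    """Fuzzy max-min composition of relations R (A x B) and S (B x C), Eq. (2)."""
    n_a, n_b = R.shape
    n_b2, n_c = S.shape
    assert n_b == n_b2, "inner dimensions must match"
    T = np.zeros((n_a, n_c))
    for i in range(n_a):
        for k in range(n_c):
            T[i, k] = np.max(np.minimum(R[i, :], S[:, k]))
    return T

# Hypothetical membership matrices: hand-gesture accuracy vs. basic KSSL gestures (R)
# and basic KSSL gestures vs. sentence recognition models (S).
R = np.array([[0.8, 0.3], [0.4, 0.9]])
S = np.array([[0.7, 0.2, 0.5], [0.1, 0.6, 0.9]])
print(max_min_composition(R, S))
```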
5. Experiments and Results
The proposed fuzzy sign language gesture recognition system's file size is 251 Kbytes and it can process 200 samples per second on the WPS. The overall process of the embedded KSSL recognition system using fuzzy logic consists of three major steps. In the first step, while the user inputs the prescribed KSSL into the WPS using the data gloves and the motion tracker based on the Bluetooth module, the KSSL input module captures the user's KSSL. In the second step, the KSSL recognition system transforms the characteristics of the data into parameters for the fuzzy recognition module. In the last step, it calculates and produces a fuzzy value for the user's dynamic KSSL through a fuzzy reasoning and composition process. The recognition model built with the RDBMS is used as the independent variable of the fuzzy reasoning, and we give a weight to each parameter. The produced fuzzy value is used as a judgment of the user's action, and the KSSL recognition system decides the user's dynamic KSSL according to the degree of the fuzzy value. The flow chart of the embedded KSSL recognition system using fuzzy logic is shown in Figure 5.
Figure 5. The flow chart of the fuzzy sign language recognition system.
"I go to airport": 95.6%
Figure 6. The average recognition rate of the KSSL recognition system over the 44 KSSL recognition models (e.g. "I go to airport": 95.6%).
Each of the 15 test subjects repeated every KSSL sentence and word 20 times. As the experimental results show, the KSSL recognition system achieves a 93.9% average recognition rate over the 44 sentence recognition models, as illustrated in Figure 6. The root causes of the recognition errors are various: imprecision of the prescribed actions, the user's inexperience with the actions, and changes in the experimental environment such as the physical transformation of the fiber-optic flex sensors of the 5DT data glove system.
6. Conclusions
The post-wearable PC is a subset of ubiquitous computing, that is, the embedding of computers in the world around us. In this paper, we implemented a hand gesture recognition system that analyzes the user's intention more efficiently and more accurately, and that can recognize and represent 44 significant, dynamic and continuous KSSL expressions of users with flexibility in real time, using the fuzzy max-min recognition algorithm and the RDBMS module on the WPS. Also, because the sentential sign language recognition system has a very superior ability of communication and representation, we can expect it to help hearing people understand the life and culture of deaf people (and aphasiacs) and to connect them with modern society through the sign language recognition system developed in this study.
Acknowledgement
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2005-CI090-0501-0019).
References
1. Seung-Guk Kim, Korean Standard Sign Language Dictionary, Osung publishing company, 107-112 (1995).
2. J.-H. Kim, et al., Hand Gesture Recognition System using Fuzzy Algorithm and RDBMS for Post PC. FSKD 2005, Lecture Notes in Artificial Intelligence, Vol. 3614, Springer-Verlag, Berlin Heidelberg New York, 170-175 (2005).
3. 5DT Data Glove 5 Manual and FASTRAK® Data Sheet, http://www.5dt.com
4. W. B. Vasantha Kandasamy, Smarandache Fuzzy Algebra. American Research Press, Seattle (2003).
5. C. H. Chen, Fuzzy Logic and Neural Network Handbook. 1st edn. McGraw-Hill, New York (1992).
ON THE INTUITIONISTIC DEFUZZIFICATION OF DIGITAL IMAGES FOR CONTRAST ENHANCEMENT
I. K. VLACHOS AND G. D. SERGIADIS
Aristotle
University of Thessaloniki, Faculty of Technology, Department of Electrical & Computer Engineering, Telecommunications Division, University Campus, GR-54124, Thessaloniki, GREECE. E-mail: [email protected], [email protected]
This paper addresses the issue of the intuitionistic defuzzification of digital images. Based on a recently introduced framework for intuitionistic fuzzy image processing, the validity conditions and properties of different intuitionistic defuzzification schemes are studied under the scope of performing contrast enhancement.
1. Introduction Since Zadeh introduced the concept of fuzzy sets (FSs) 1 , various notions of higher-order FSs have been proposed based on different intuitive points of departure. Among them, Atanassov's intuitionistic fuzzy sets (A-IFSs) 2 provide a flexible framework to handle the inherent ambiguity present in digital images, springing out of imperfect or/and imprecise information. In this paper we extend the task of de-constructing the A-IFS representing an image in the intuitionistic fuzzy domain, recently introduced in 3 , in order to obtain the image in the gray-level domain. The study involves a thorough investigation of different intuitionistic defuzzification schemes and their properties, as well as their validity conditions. Finally, application of the proposed intuitionistic defuzzification schemes to low-contrasted digital images is carried out, in order to perform contrast enhancement.
2. Elements of Intuitionistic Fuzzy Sets Theory We briefly describe the basic notions of A-IFSs that are going to be used throughout the paper.
Definition 2.1. An FS A defined on a universe X may be given as1
A = {(x, μ_A(x)) | x ∈ X},  (1)
where μ_A : X → [0,1] is the membership function of A. The membership value μ_A(x) describes the degree of belongingness of x ∈ X in A. Definition 2.2. An A-IFS A defined on a universe X is given by2 A = {(x, μ_A(x), ν_A(x)) | x ∈ X},
(2)
where »A:X-*
[0,1]
and
vA : X -> [0,1],
(3)
with the condition 0 ^ fiA{x) + vA(x) ^ 1,
(4)
for all x € X. The numbers /x>i(:r) and vA{x) denote the degree of membership and the degree of non-membership of x to ^4 respectively. For an A-IFS A in X we call the intuitionistic index of an element x E X in A the following expression 7>Vi(z) = l-nA(x)
-fA{x).
(5)
We can consider irA(x) as the hesitancy degree of x to A 2 . From (5) it is evident that 0 ^ irA(x) ^ 1, for all x € X. 3. Intuitionistic Fuzzy Image Processing Framework Intuitionistic fuzzy image processing involves in general a sequence of operations carried out using the concepts of A-IFSs theory, for performing image processing tasks. Fig. 1 illustrates an overview of the aforementioned framework. In the first stage the image is transferred into the fuzzy domain and then into the intuitionistic fuzzy domain by a suitable selection of membership and non-membership functions. After the modification of the intuitionistic fuzzy components according to the desired image operation, the inverse procedure is carried out to obtain the image in the gray-level domain. In this work we are focusing on the stage of decomposing the A-IFS corresponding to the image to the appropriate FS; that is the intuitionistic defuzzification.
761
Input image
Figure 1.
Overview of the intuitionistic fuzzy image processing framework.
4. Intuitionistic Fuzzification In 3 , a method for constructing the A-IFS corresponding to an image has been proposed, based on the optimization of its intuitionistic fuzzy entropy. We briefly describe the aforementioned approach in this section. Let us consider an image A of size M x N pixels having L gray levels g ranging between 0 and L — 1. The image can be considered as an array of fuzzy singletons 4 ' 5 ' 6 . Each element of the array denotes the membership value of the gray level g^ corresponding to the (i, j)-th pixel, with respect to a pre-defined image property. For the task of contrast enhancement we consider the property "brightness" of gray levels. Therefore, the image can be represented using the following FS A = {{9ij,VA{9ij))\9ij G { 0 , . . . , L - 1}} .
(6)
In order to transfer the image from the fuzzy to the intuitionistic fuzzy domain, we apply the method of maximum intuitionistic fuzzy entropy principle^, which involves the selection of an appropriate combination of membership and non-membership functions to describe the image in terms of elements of A-IFSs theory. The optimal image representation in the intuitionistic fuzzy domain is derived by varying a free parameter A that controls the shape of the membership and non-membership functions and considering the maximization of the intuitionistic fuzzy entropy as defined in7. This optimization criterion can be formulated as \opt = arg max {E(A; A)} ,
(7)
where E(A; A) is the intuitionistic fuzzy entropy of the image given by F M [
. , '
= ;
_ L v \
l-max{l-(l-MA(g))\(l-^(g))MA+D}
MJV^/^l-min{l-(l-^(5))\(l-^(g))MA+i)}' (8)
762 where A ^ 0 and h^ is the histogram of the fuzzified image. The membership function of gray levels, denoted as HA{g) is calculated according to / \
9 ~ 9min
PAti) = ~
—
9max
/n\
.
(9)
9min
where gmin and flWx are the minimum and maximum gray levels of the image respectively. It should be mentioned that alternative methods for generating the FS corresponding to image A can be applied. After obtaining the optimal parameter Xopt, the image is represented as the following A-IFS Aopt = {(9,»Aopt(9),VAopM)\9 € {0,... ,L - 1}} , (10) where ^ opt () = l - ( l - M < ? ) ) A ° p t
(11)
">W,(s) = (1 - ^ ( 3 ) ) A o p t ( A o p t + 1 ) -
(12)
and
In this paper we will demonstrate that the construction of the A-IFS corresponding to an image using the aforementioned method, results in enhancing the overall contrast of the image under processing. 5. Intuitionistic Defuzzification: Embedding Hesitancy In order for the image to be transferred back to the gray-level domain, the process of intuitionistic defuzzification should be first carried out. This task involves the de-construction of the A-IFS representing the image into its corresponding FS and is performed using the following operator. Definition 5.1. If A € J&y{X), where Da(A) = {{xi,nA(xi)
then Da : J?&y{X)
+ aTrA{xi),uA(xi)
+ (1 - a)irA(xi))\xi
-»
&y(X),
e X} , (13)
with a e [0,1]. It should be mentioned that the family of all FSs associated with A by the operator Da, denoted by {Da(A)}ae<0<1,, constitutes a totally ordered family of FSs. We will call Da the Atanassov's operator. Different values of a generate different FSs and therefore different representations of the processed image in the fuzzy domain are possible. Thus, a criterion must be employed in order to select the optimal parameter aopt.
763 5.1. Maximum
Index of Fuzziness
Defuzzification
3
In , the maximization of the index of fuzziness criterion was employed. In image processing it is sometimes useful to increase the grayness ambiguity, since images with high fuzziness are expected to be more suitable in terms of human brightness perception8. For an FS A defined on a universe X, the index of fuzziness, using the product operator to implement the intersection, is given by 1 |X| -K^) = 7 M 7 i £ M * 0 ( l - ^ f o ) ) .
(W)
where \X\ = Cardinal(X). Applying Atanassov's operator to the A-IFS Aopt describing the image in the intuitionistic fuzzy domain, we obtain the representation of the image in the fuzzy domain as the FS Da(Aopt), with a corresponding index of fuzziness given by 7 (Da{Aopt))
1 L~X = — - Y, hA(9)Vna(Aopt){g)
(l - HDa(Aopt){g)) •
(15)
In order to find the optimal parameter aopt that maximizes the index of fuzziness, we set d-y (Da(Aopt)) /da = 0, which yields that ,
_ E g Jo hA{g)nAoi>t(g) (l 2
Eg=o
2fiAovt(g))
hA{g)7TAopt(g)
Moreover, since d2j{Da(Aopt))
1
^
2
(17)
g=0
it is evident that the extremum of (15) is a global maximum. However, (16) does not guarantee that the optimal parameter a'opt of Atanassov's operator will be in the [0,1] interval. Therefore, the optimal parameter aopt is obtained as 0, aopt
= <
ifa'opt<0,
a'opt,
if0
1,
ifa'opt>l-
(18)
Finally, the image in the gray-level domain is obtained as g' = {L-\)^D
(Aopt){g),
(19)
764 where tJ-Daopt(Aopt)(g) = aopt + (1 - <xopt)HAopM ~ aoPtVABpt{g)
(20)
and g', g are the new and initial intensity levels respectively. 5.2. Validity
Condition
A critical constraint in most cases of contrast enhancement is the monotonicity of the gray-level transformation function employed, which is required to be increasing, in order to preserve the ordering of the intensity levels. In the case of (20), generally we obtain that dfJ.pa(A)(g) _ ,, dg ~{1
^dVAJg) dg
„dvA{g) dg '
a)
(21)
Since a G [0,1] and /XA() and VA{9) are increasing and decreasing functions of g respectively, it follows immediately that ^Da(A){g) is also an increasing function of the gray levels, for any a G [0,1]. The aforementioned analysis, constraints our selection of appropriate ways to embed hesitancy and is the driving force when considering a to vary with g. 5.3. Generalized
Intuitionistic
Defuzzification
9
Burillo and Bustince proposed a generalized form of Atanassov's operator, so that for each i j ^ I w e take a value of the parameter a according to that point. By extension we will denote this operator by Dax and we will call it Atanassov's point operator. Definition 5.2. If A € J&S?{X), where D
aXi{A)
then Dax. : J&Sf{X)
= {(xi>^A(xi) + aXiTTA(xi),i/A(xi)
->
&Sf{X),
+ {1 - aXi)irA{xi))\xi
G X} , (22)
witha X i e [0,1]. Consequently, (21) can be generalized as dnDag(A){g)
dg
dy.A{g) = (1
-
a(fl))
, sdvA{g)
-ir~
a{g)
, /n +{l
..
,
^g~ ^
A{9)
^da{g)
~
l/A{9))
^g~-
(23) Obviously, if a = const for all g € { 0 , . . . , L - 1}, then (23) reduces to (21). From (23) it is evident that if a is an increasing function of the gray levels
765
(a)
(b)
(c)
(d)
Figure 2. (a) Gray-scale test image and (b) contrast-enhanced image obtained using histogram equalization. Images processed using the proposed intuitionistic fuzzy image processing framework using (c) maximization of index of fuzziness criterion and (d) cumulative histogram approach.
(a)
(b)
(c)
(d)
Figure 3. (a) Gray-scale test image and (b) contrast-enhanced image obtained using histogram equalization. Images processed using the proposed intuitionistic fuzzy image processing framework using (c) maximization of index of fuzziness criterion and (d) cumulative histogram approach.
and additionally a(g) € [0,1] for all g € { 0 , . . . , L - 1}, then fJ,Da(A)(g) is also an increasing function of g, since it holds that da(g)/dg > 0. Using increasing functions for the parameter a of Atanassov's point operator in the intuitionistic defuzzification stage, has the effect of keeping darker levels dark, while at the same time brightening higher intensity levels. 6. Experimental Results For experimental evaluation, we considered gray scale images of size 256 x 256 pixels with 8 bits-per-pixel gray-tone resolution that have undergone extreme contrast degradation. Based on the analysis of Sec. 5, we propose a method for calculating a using the cumulative histogram of the image, given by athA (fl) = jm £?=o M*)> f o r all # e { 0 , . . . , L - 1}. The cumulative nature of the function, ensures that a will be an increasing function of the
766 gray levels. Fig. 2(a) depicts an image with low contrast, while images of Figs. 2(c) and 2(d) illustrate the results of the proposed intuitionistic fuzzy image processing framework for contrast enhancement. In Fig. 2(b) the histogram equalization technique is demonstrated, while in Figs. 2(c) and 2(d) the results of the methods based on the maximization of index of fuzziness and the cumulative histogram are shown respectively. Another example is illustrated in Fig. 3. From the images of Figs. 2(d) and 3(d), one can observe t h a t the proposed approach using the generalized intuitionistic defuzzification scheme delivers better results, since it also considers image statistics during hesitancy embedding. 7. D i s c u s s i o n a n d C o n c l u s i o n s In this paper we studied the intuitionistic defuzzification of digital images in the framework of A-IFSs theory. Different intuitionistic defuzzification approaches were studied and their properties were investigated. Finally, application of the proposed approach to contrast enhancement demonstrated the potential of the described intuitionistic fuzzy image processing framework and the various intuitionistic defuzzification schemes. References 1. L. A. Zadeh, Fuzzy sets, Inf. Control 8, 338-353 (1965). 2. K. T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets Syst. 20, 87-96 (1986). 3. I. K. Vlachos and G. D. Sergiadis, Intuitionistic fuzzy image processing, in: M. Nachtegael, D. V. der Weken, E. E. Kerre, and W. Philips (Eds.), Soft Computing in Image Processing: Recent Advances, Studies in Fuzziness and Soft Computing, Springer-Verlag (2006) (to appear). 4. S. K. Pal and R. A. King, Image enhancement using fuzzy set, Electron. Lett. 16, 376-378 (1980). 5. S. K. Pal and R. A. King, Image enhancement using smoothing with fuzzy sets, IEEE Trans. Syst. Man Cybern. 11, 495-501 (1981). 6. S. K. Pal and R. A. King, A note on the quantitative measure of image enhancement through fuzziness, IEEE Trans. Pattern Anal. Mach. Intell. 4, 204-208 (1982). 7. E. Szmidt and J. Kacprzyk, Entropy for intuitionistic fuzzy sets, Fuzzy Sets Syst. 118, 467-477 (2001). 8. H. R. Tizhoosh, Fuzzy image enhancement: An overview, in: E. E. Kerre and M. Nachtegael (Eds.), Fuzzy Techniques in Image Processing, Studies in Fuzziness and Soft Computing, vol. 52, 137-171, Springer-Verlag (2000). 9. P. Burillo and H. Bustince, Construction theorems for intuitionistic fuzzy sets, Fuzzy Sets Syst. 84, 271-281 (1996).
A HEURISTIC APPROACH TO INTUITIONISTIC FUZZIFICATION OF COLOR IMAGES
I. K. VLACHOS AND G. D. SERGIADIS
Aristotle
University of Thessaloniki Faculty of Technology Department of Electrical & Computer Engineering Telecommunications Division University Campus, GR-54124, Thessaloniki, GREECE E-mail: [email protected], [email protected]
In this work we present a heuristic approach to the intuitionistic fuzzification of color images; that is constructing the intuitionistic fuzzy set corresponding to a color digital image. Exploiting the physical properties and drawbacks of the imaging and acquisition chain, we model the uncertainties present in digital color images, using the concept of fuzzy histogram using fuzzy numbers.
1. Introduction Atanassov's intuitionistic fuzzy sets (A-IFSs) 1,2 ' 3 ' 4 constitute a generalization of the concept of fuzzy sets (FSs) proposed by Zadeh5, by considering the imprecise or/and imperfect nature of information. A-IFSs are described using two characteristic functions, namely the membership and nonmembership, denoting the degree of belongingness and non-belongingness of an element of the universe to the A-IFS respectively. In this paper we present a novel approach for constructing the A-IFS corresponding to a color image, based on an extension of the method for gray-scale images recently presented in 6,7 . The task of intuitionistic fuzzification is the first and, presumably, the most important in the intuitionistic fuzzy image processing framework, since it involves the definition of suitable membership and non-membership functions to describe the image in the intuitionistic fuzzy domain. To model the hesitancy present in digital images we consider all those sources of indeterminacy characterizing every physical system, and especially digital image acquisition systems, such as the quantization noise and the dynamic range suppression. Finally, application of the introduced scheme to real-world color images, demonstrates 767
768 its efficiency in modelling the hesitancy associated with image pixels. 2. Intuitionistic Fuzzy Sets Theory Definition 2.1. An FS A defined on a universe X may be given as 5 A = {{x,lxA{x))\xeX},
(1)
where n^ : X —> [0,1] is the membership function of A. The membership value fiA(x) describes the degree of belongingness of x £ X in A. Definition 2.2. An A-IFS A defined on a universe X is given by 1,2,3,4 A = {(x,iiA(x),i/A(x))\x€X},
(2)
where fiA:X^[0,l]
and
vA: X-*
[0,1],
(3)
with the condition 0 ^ HA(x) + vA(x) ^ 1,
(4)
for all x £ X. The numbers \iA {x) and vA (x) denote the degree of membership and the degree of non-membership of a; to A respectively. For an A-IFS A in X we call the intuitionistic index of an element x € X in A the following expression 7rA(x) = 1 - nA(x) - vA{x).
(5) 123
We can consider nA(x) as the hesitancy degree of a; to A ' ' '*. is evident that 0 < TTA(X) ^ 1, for all x £ X.
From (5) it
3. On the Intuitionistic Fuzzification of Images 3.1. Sources of Hesitancy
in Real-World
Images
Hesitancy in real-world images springs out of various sources, which in their majority are due to the inherent weaknesses of the imaging mechanisms. Limitations introduced by the acquisition chain affect our certainty regarding the actual "grayness" or "edginess" of a specific pixel. As sources of hesitancy we can identify, as also illustrated in Fig. 1, the imaging sensor that introduces a suppression of the dynamic range and also the A/D converter, which is mainly responsible for the quantization noise present in any digital image processing system. In the following section a model for estimating the hesitancy associated with pixels of gray-scale images is described6'7.
769
Dynamic range suppression
§§f Quantization noise
Figure 1. Image acquisition model. 3.2. A Fuzzy Histogram
Approach
to Hesitancy
Modelling
A fuzzy number g : R —• [0,1] is an FS of the real line that is normal and convex. Symmetrical fuzzy numbers of the form Hg(x) = max I 0,1 —
(6) P where p is a positive real parameter, are conceptually suitable to represent the notion of gray level "approximately g". Using the concept of fuzzy numbers, to represent the gray levels, the notion of histogram can be extended into a fuzzy setting 8,9 . The fuzzy histogram of an image A of size M x N pixels having L gray levels g ranging between 0 and L — 1 is defined as
hfA(g) ^WWJh^igWe
{o,...,M
-l},
j e {o,...,N
-i}}\\,
(7)
where || • || stands for the cardinality of an FS. However, fuzzy histogram itself fails to be a probability density function. Therefore, a normalized fuzzy histogram is defined as hfA(9)
hfA(9)
EfJo 1 ^)'
(8)
A limitation of "hard" first-order statistics when applied to real-world images, is that there exists a number of gray levels with zero or almost zero frequency of occurrence, while gray levels in their vicinity possess high frequencies8'9. Therefore, it seems proper that the hesitancy originating out of quantization noise to be proportional to the normalized absolute difference between the crisp and fuzzy normalized histograms; i.e. nA{g)
a
m9)-hA{g)\ maxg {\hcA(g) - hfA(g)\j
(9)
770 where hcA is the normalized crisp histogram of the image. From (5) it follows immediately that for any gray level g, the maximum hesitancy value n™ax(g) that can be assigned is *TX(9)
=
(10)
1-HA(9),
which describes the fact that the maximum hesitancy of a gray level decreases as the corresponding membership value increases. Considering that TrA(g) oc (1 - nA{g)),
(11)
ensures that the constraint imposed by (10) is satisfied. The aforementioned mathematical constraint turns out to describe a very interesting property of physical systems. Due to the intrinsic noise in the acquisition and imaging chain, lower gray levels are more affected by noise than higher ones. Therefore, hesitancy associated with lower gray levels should by definition be larger than the one corresponding to higher levels. Last, but not least, the hesitancy associated with a gray level should also decrease as the dynamic range of the image decreases, since the more suppressed the dynamic range of the image, the less certain we are regarding the actual grayness of the intensity levels. The normalized dynamic range is given by
$$\hat{\Delta}r = \frac{g_{\max} - g_{\min}}{L - 1}, \qquad (12)$$
where g_min and g_max are the minimum and maximum gray levels of the image, respectively. Consequently, by considering the aforementioned sources of indeterminacy and their appropriate modelling, the hesitancy margin associated with the gray level g of a digital image is approximated by
$$\pi_A(g) = \left(1 - \mu_A(g)\right)\,\frac{\left|h^c_A(g) - h^f_A(g)\right|}{\max_g\left\{\left|h^c_A(g) - h^f_A(g)\right|\right\}}\,\left(1 - k\hat{\Delta}r\right), \qquad (13)$$
with k ∈ (0,1). Parameter k controls the overall influence of the dynamic range on the hesitancy of the gray level g. One can easily verify that the hesitancy degree computed by (13) satisfies the constraint 0 ≤ π_A(g) ≤ 1. Finally, the membership value of a gray level g, denoting the degree of brightness, is considered to be its normalized intensity level; that is
$$\mu_A(g) = \frac{g}{L - 1}, \qquad (14)$$
where g ∈ {0, ..., L − 1}. It should be mentioned that any other method for calculating μ_A can also be used.
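The per-gray-level hesitancy of Eqs. (9)-(14) can be computed directly from the two histograms. The sketch below is a minimal illustration, reusing the histogram helpers above; the function name and the default value of k are assumptions, not part of the original paper.

```python
import numpy as np

def gray_level_hesitancy(img, levels=256, rho=5.0, k=0.9):
    """Membership mu_A(g) per Eq. (14) and hesitancy pi_A(g) per Eq. (13)."""
    hc = crisp_histogram(img, levels)            # normalized crisp histogram
    hf = fuzzy_histogram(img, levels, rho)       # normalized fuzzy histogram
    diff = np.abs(hc - hf)
    diff = diff / diff.max()                     # Eq. (9): normalized difference
    g = np.arange(levels, dtype=np.float64)
    mu = g / (levels - 1)                        # Eq. (14): brightness membership
    dr = (int(img.max()) - int(img.min())) / (levels - 1)  # Eq. (12)
    pi = (1.0 - mu) * diff * (1.0 - k * dr)      # Eq. (13)
    return mu, pi
```

Since π_A(g) never exceeds 1 − μ_A(g), the pair (μ_A(g), π_A(g)) respects the A-IFS constraint of (4).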
The A-IFS with membership function μ_A and hesitancy index π_A turns out to be a proper A-IFS; i.e. it satisfies the fundamental constraint of (4).

4. Intuitionistic Fuzzification of Color Images

Color images can be represented in various ways using different color spaces. A color space is the means for representing the colors and the relationships among them. There exists a number of different color spaces, each one using a different model depending on the imaging system used and the current application. One of the most commonly used color spaces is the RGB model, which is widely applied in digital image processing applications. The RGB color space utilizes an additive model in which the primary colors red, green, and blue are combined in different amounts to produce different colors. Therefore, in a color image the color of a pixel x is defined as the triplet x = (x_R, x_G, x_B), where each element x_R, x_G, and x_B corresponds to the intensity of the R, G, and B channel respectively. Let us now consider a color image A in the RGB color space. A straightforward extension of the aforementioned scheme to color images involves its application to each color component of the image and then the combination, in an intuitive way, of the intermediate results, in order to obtain the overall hesitancy associated with a specific pixel of the image. Decomposing the color image into its color components allows us to apply the intuitionistic fuzzification framework for gray-scale images in a straightforward manner. Consequently, by using (13) in each of the R, G, and B channels we obtain the corresponding hesitancy functions π^R_A, π^G_A, and π^B_A. Each one of the hesitancy functions describes the hesitancy margin associated with a specific pixel of the image in a specific channel of the RGB color space. However, it is useful to assign a single hesitancy value to an image pixel, in order to have an overall estimation of its corresponding hesitancy margin. Even though we have considered that each channel of the RGB color space is independent from the other two, the total hesitancy of a specific pixel should also carry the information of the relative brightness of a pixel for each of the channels with respect to the overall color of that pixel. Therefore, we define for each pixel of the image the following weight associated with each component of the RGB color space. In the case of the R channel we have
$$w_R(x) = \frac{x_R}{x_R + x_G + x_B}. \qquad (15)$$
In a similar manner we compute the weights of the pixels corresponding to the G and B channels. Moreover, for color images obtained under poor illumination conditions, the color information is lost, either for the whole image or for parts of it, and the resultant image can be considered as a gray-tone one. In order to take this situation into account, we also consider the hesitancy corresponding to the gray-level version of the color image, whose gray levels x̄ are computed as
$$\bar{x} = \frac{1}{3}\left(x_R + x_G + x_B\right). \qquad (16)$$
Finally, the total hesitancy associated with a pixel of a color image is obtained as
$$\pi^{total}_A(x) = \lambda\left(w_R(x)\pi^R_A(x) + w_G(x)\pi^G_A(x) + w_B(x)\pi^B_A(x)\right) + (1-\lambda)\,\pi_{\bar A}(\bar x), \qquad (17)$$
where λ ∈ [0,1) and Ā denotes the gray-level version of image A obtained using (16). Parameter λ controls the influence of the two hesitancy components, corresponding to the color and the gray-scale images, on the overall hesitancy of the specific pixel. The value of λ depends on the illumination of the image under processing. For low-illuminated images smaller values of λ result in a more accurate estimation of pixel hesitancy. Finally, the total membership function μ^{total}_A is constructed in a similar manner. The analysis of Sect. 3 ensures that the set A^{total} will be a proper A-IFS.
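A possible implementation of the channel weights (15), the gray-level version (16) and the total hesitancy (17) is sketched below. It reuses the gray_level_hesitancy helper sketched earlier; the default λ and the per-pixel mapping of the per-level hesitancies are assumptions made for illustration.

```python
import numpy as np

def total_hesitancy(rgb, lam=0.5, levels=256, rho=5.0, k=0.9):
    """Combine per-channel hesitancies into pi_total per Eqs. (15)-(17)."""
    r, g, b = (rgb[..., c].astype(np.float64) for c in range(3))
    s = r + g + b + 1e-12                        # avoid division by zero on black pixels
    w_r, w_g, w_b = r / s, g / s, b / s          # Eq. (15) and its G, B analogues
    gray = ((r + g + b) / 3.0).astype(np.uint8)  # Eq. (16)

    def channel_pi(channel):
        ch = channel.astype(np.uint8)
        _, pi = gray_level_hesitancy(ch, levels, rho, k)
        return pi[ch]                            # map per-level hesitancy onto pixels

    pi_r, pi_g, pi_b = channel_pi(r), channel_pi(g), channel_pi(b)
    _, pi_gray_levels = gray_level_hesitancy(gray, levels, rho, k)
    pi_gray = pi_gray_levels[gray]
    return lam * (w_r * pi_r + w_g * pi_g + w_b * pi_b) + (1.0 - lam) * pi_gray  # Eq. (17)
```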
4.1. Hesitancy Image Processing (HIP)

Intuitionistic fuzzification of color images allows for the handling of the image source defects and also for the compensation of errors according to the output devices. Fig. 2 illustrates sources of hesitancy associated with an image, but this time originating due to the physical properties and drawbacks of the output mechanisms. Having obtained, as described in Sect. 4, the hesitancy associated with the different channels of the image, as well as the total hesitancy corresponding to the image pixels, allows us to treat the hesitancy component in such a way as to overcome errors propagating through the input and output chain.

5. Experimental Results

In order to demonstrate the applicability of the proposed intuitionistic fuzzification scheme for color images, we tested the presented approach with various synthetic and real-world images. We considered color images, using
Figure 2. Hesitancy image processing (HIP) framework (output-side sources of hesitancy: dynamic range suppression, quantization noise).
the RGB color space representation, of size 256 × 256 pixels with 8-bits-per-pixel color resolution for each channel. Figs. 3(a) and 3(c) depict two of the color images considered, while Figs. 3(b) and 3(d) show the derived normalized total hesitancy maps, with dark pixels corresponding to low hesitancy values and white to high ones. Finally, Fig. 4 illustrates the hesitancy functions for each of the R, G, and B channels of the image of Fig. 3(c). It should be mentioned that in both examples we have considered the same value of λ, while the parameters for calculating the fuzzy histograms were set to ρ = 5 and k = 0.9.
6. Discussion and Conclusions

In this paper a method for the intuitionistic fuzzification of color images was presented, which is the first stage of the intuitionistic fuzzy image processing framework. A fuzzy histogram-based approach was used in order to model the sources of indeterminacy present in digital images. Finally, the notion of the hesitancy image processing (HIP) framework was briefly presented, which exploits the potential of the proposed method for the selective processing of color images.

References
1. K. T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets Syst. 20, 87-96 (1986).
Figure 3. (a) Color test image with its corresponding (b) normalized total hesitancy map. (c) Another color image and (d) normalized total hesitancy map.
Figure 4. Hesitancy functions for each of the (a) R, (b) G, and (c) B channels for the color image of Fig. 3(c).

2. K. T. Atanassov, More on intuitionistic fuzzy sets, Fuzzy Sets Syst. 33, 37-45 (1989).
3. K. T. Atanassov and G. Gargov, Interval valued intuitionistic fuzzy sets, Fuzzy Sets Syst. 31, 343-349 (1989).
4. K. T. Atanassov, Intuitionistic Fuzzy Sets: Theory and Applications, Studies in Fuzziness and Soft Computing, Physica-Verlag, Heidelberg (1999).
5. L. A. Zadeh, Fuzzy sets, Inf. Control 8, 338-353 (1965).
6. I. K. Vlachos and G. D. Sergiadis, Intuitionistic fuzzy image processing, in: M. Nachtegael, D. V. der Weken, E. E. Kerre, and W. Philips (Eds.), Soft Computing in Image Processing: Recent Advances, Studies in Fuzziness and Soft Computing, Springer-Verlag (2006).
7. I. K. Vlachos and G. D. Sergiadis, Towards intuitionistic fuzzy image processing, in: Proc. International Conference on Computational Intelligence for Modelling, Control and Automation, Vienna, Austria (2005).
8. C. V. Jawahar and A. K. Ray, Fuzzy statistics of digital images, IEEE Signal Process. Lett. 3, 225-227 (1996).
9. C. V. Jawahar and A. K. Ray, Incorporation of gray-level imprecision in representation and processing of digital images, Pattern Recognit. Lett. 17, 541-546 (1996).
INTUITIONISTIC FUZZY FEATURE EXTRACTION FOR QUERY IMAGE RETRIEVAL FROM COLOUR IMAGES

K. SURESH BABU
Department of Electronics and Communications Engineering, Birla Institute of Technology, Ranchi, Jharkhand 835215, India

R. SUKESH KUMAR
Department of Electronics and Communications Engineering, Birla Institute of Technology, Ranchi, Jharkhand 835215, India

In this paper we propose a query image retrieval technique using fuzzified features extracted from co-occurrence matrices. Among the fourteen features proposed by Haralick, three features, Angular Second Moment (ASM), Entropy and Contrast, are computed from crisp and fuzzy co-occurrence matrices. These features are also computed by the proposed new formulae based on fuzzy and intuitionistic fuzzy concepts. The fuzzy features are compared with the crisp features by sensitivity analysis and the superiority of the former over the latter is established. The above features are used for query image retrieval on monochrome as well as color images. It is established that in a noisy environment, the proposed fuzzy features successfully retrieve the query, where the crisp features either fail to retrieve the query or result in multiple faulty retrievals.
1. Introduction:

Various query image retrieval techniques have been proposed earlier, where textural features are used for comparing the query image with the original image. However, when the original image is corrupted with noise, these techniques fail to successfully retrieve the query in most cases. Since fuzzy techniques inherently consider the imprecision present in the pixel intensity values, retrieval using fuzzy features with a similarity measure is expected to yield a better success rate. In this paper, new formulae are proposed for textural features such as Angular Second Moment (ASM), Entropy and Contrast based on fuzzy and intuitionistic fuzzy concepts, and the fuzzy features thus extracted are successfully employed to retrieve a query image from a noisy original image. Gray level co-occurrence matrices have been widely used for texture analysis, since the features [1] derived from such matrices have been observed to possess good discrimination capability among textural images. These features also find application in image retrieval algorithms. The pixels always
carry a reasonable amount of inherent fuzziness due to the imprecision of pixel intensities, and hence fuzzy statistics have been observed to behave better in representing the spatial gray distribution of a digital image [2]-[4]. Also, reasoning based on fuzzy logic is found to be closer to the human thinking mechanism and hence is more appropriate for enhancing machine vision capabilities. Because of the capability of fuzzy logic to represent the imprecision inherent in pixels, it is better suited to image processing, especially in a noisy environment. Given a query, its features are compared with the features of different regions of a given image, to find whether there is a match or not. This allows the query to be retrieved from the image. Retrieval using a single feature may result in multiple retrievals. By combining several features together, with proper thresholds for the different features, an exact retrieval can be achieved.

1.1. Prior work:

The basic technique of image retrieval is to compare some prominent features of the query with the same features of the image segments using some kind of similarity measure. Many researchers have proposed different techniques for query image retrieval. A 3-D object-based image retrieval system, where human image-understanding cues are used to represent the object structure and its detection, is investigated in [5]. Six Haralick features computed from co-occurrence matrices, with normalized correlation adopted as a similarity function, are used for query image retrieval in [6]. The study in [7] proposes a technique where color histograms of the patch windows of an image are compared with the query color histogram using a similarity measure. Integration of a color histogram and a texture histogram is extracted for the query, and then the multi-histogram used to measure similarity is inspected to retrieve the query in [8]. Textural features extracted from the colour HSV model are used with human visual perception as the similarity measure in [9]. Recent research shows that application of fuzzy logic in image retrieval yields better results. Extraction of features using fuzzy logic for image retrieval is investigated in [10].

1.2. Present Work:

In the computation of co-occurrence matrices, the repetitive property of textural images in intensity, distance and direction is calculated. Due to the imprecision in the pixel intensity, all the above parameters can be fuzzy in nature. Various computations of fuzzy co-occurrence matrices, by fuzzifying the different parameters like intensity (I), distance (δ) and direction (θ), individually and
a combination of distance and direction, are carried out using three fuzzy membership functions, viz. Triangular, Trapezoidal and Gaussian. The textural features extracted from fuzzy co-occurrence matrices are found to be less sensitive to impreciseness. Generally, in the computation of fuzzy co-occurrence matrices, it is observed that fuzzification of the combination of distance (δ) and direction (θ) using the Trapezoidal membership function yields the best result for color images. Those fuzzy co-occurrence matrices are used for further computations. Further, as explained earlier, formulae based on fuzzy and intuitionistic fuzzy concepts have been proposed for the three features. The features extracted thus are termed fuzzy features. These features are also extracted from crisp and fuzzy co-occurrence matrices using these formulae. The fuzzy features are compared with features extracted from crisp and fuzzy co-occurrence matrices using the earlier existing formulae [1], using a measure called sensitivity [4]. The comparison results well establish the superiority of fuzzy features in handling the impreciseness embedded in digital images. To further establish the principle, the query image is retrieved from images corrupted with noise using the features extracted with four techniques, viz. crisp formulae applied on crisp co-occurrence matrices (crisp-crisp), crisp formulae applied on fuzzy co-occurrence matrices (fuzzy-crisp), proposed formulae applied on crisp co-occurrence matrices (crisp-fuzzy) and proposed formulae applied on fuzzy co-occurrence matrices (fuzzy-fuzzy). The application of a single feature among the three features, namely ASM, Entropy and Contrast, results in multiple retrievals. Applying all three features in combination results in a single exact query image retrieval. All four techniques yield successful retrievals when applied to uncorrupted original images. Progressively corrupting the image by adding Gaussian noise reveals an interesting observation: when the noise content reaches a higher level, the retrieval attempted using the crisp-crisp and fuzzy-crisp techniques either fails to retrieve the query image or results in multiple retrievals, while the fuzzy features can still successfully retrieve the exact query image.

2. Crisp and fuzzy features

The co-occurrence matrix represents the statistics of pairs of geometrically related image points. Co-occurrences, the second order statistics of a digital image, represent the joint probability or frequency of occurrence of pixels with gray values m and n (m, n ∈ L) separated by a distance δ at a specified direction θ, and are expressed in the form of a two dimensional matrix C = [C_mn]_{L×L} = f(I, δ, θ).
In general, the co-occurrence matrix of an image I = [I_ij] quantized into L gray values is an L × L matrix C = [C_mn] with (m, n)-th element
$$C_{mn} = \left|\left\{\left((i,j),(k,l)\right) : I_{ij}=m,\ I_{kl}=n,\ d\left((i,j),(k,l)\right)=\delta,\ \angle\left((i,j),(k,l)\right)=\theta\right\}\right|, \qquad (1)$$
where d(x, y) is the distance between points x and y and ∠(x, y) represents the angle made by the line joining x and y with the horizontal. The fuzzy co-occurrence matrix, the second order statistics of a digital image, is an array of real numbers R = [r_mn], where r_mn represents the frequency of occurrence of the gray value "around m" separated from another pixel with gray value "around n" by a distance δ at a specified direction θ, that is, R = f(I, δ, θ). Here the computation can be performed by fuzzifying the parameter I (henceforth mentioned as pixel intensity), or δ (henceforth mentioned as distance) or θ (henceforth mentioned as direction). The computations can also be performed by fuzzifying two or three parameters combined. In the general case, the fuzzy extension of Eq. (1) leads to
$$R_{mn} = \left|\left\{\mu\left((i,j),(k,l)\right) : I(i,j)=m,\ I(k,l)=n,\ d\left((i,j),(k,l)\right)=\delta,\ \angle\left((i,j),(k,l)\right)=\theta;\ \forall\left((i,j),(k,l)\right)\in G\right\}\right|, \qquad (2)$$
where μ can be one among μ1, μ2, μ3 or μ4 as explained below:
μ1 = min{μ_m(I_ij), μ_n(I_kl)} for fuzzification of pixel intensity;
μ2 = μ_δ(d((i,j),(k,l))) for fuzzification of distance δ;
μ3 = μ_θ(∠((i,j),(k,l))) for fuzzification of direction θ;
μ4 = μ2 · μ3 for fuzzification of δ and θ.

2.1. Intuitionistic Fuzzy Sets

Let a set E be fixed. An intuitionistic L-fuzzy set (ILFS) A* in E is an object having the form
$$A^* = \left\{\left\langle x, \mu_A(x), \nu_A(x)\right\rangle \mid x \in E\right\},$$
where the functions μ_A: E → L and ν_A: E → L define the degree of membership and the degree of non-membership of the element x ∈ E to A ⊆ E (for simplicity below we shall write A instead of A*), respectively. Properties of ILFS:
$$(\forall x \in E)\ \left(\mu_A(x) \le N(\nu_A(x))\right), \text{ where } N: L \to L,$$
$$(\forall x \in E)\ \left(0 \le \mu_A(x) + \nu_A(x) \le 1\right),$$
$$\min_{x \in E}\left(1 - \mu_A(x)\right) = 1 - \max_{x \in E}\mu_A(x).$$
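Before moving to the fuzzified features, the sketch below makes Eqs. (1)-(2) concrete: it computes a crisp gray-level co-occurrence matrix and a fuzzy variant in which each pixel pair contributes the minimum of its triangular intensity memberships (μ1 above). The offset parameterization, function names and membership spread are illustrative assumptions.

```python
import numpy as np

def crisp_cooccurrence(img, levels, dx, dy):
    """Crisp GLCM, Eq. (1): count pairs (m, n) at pixel offset (dx, dy)."""
    C = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    for i in range(max(0, -dy), min(h, h - dy)):
        for j in range(max(0, -dx), min(w, w - dx)):
            C[img[i, j], img[i + dy, j + dx]] += 1.0
    return C

def fuzzy_cooccurrence(img, levels, dx, dy, spread=2.0):
    """Fuzzy GLCM, Eq. (2) with mu1 = min of triangular intensity memberships."""
    R = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    g = np.arange(levels, dtype=np.float64)
    for i in range(max(0, -dy), min(h, h - dy)):
        for j in range(max(0, -dx), min(w, w - dx)):
            mu_p = np.maximum(0.0, 1.0 - np.abs(g - img[i, j]) / spread)
            mu_q = np.maximum(0.0, 1.0 - np.abs(g - img[i + dy, j + dx]) / spread)
            R += np.minimum.outer(mu_p, mu_q)   # "around m" vs "around n" contribution
    return R
```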
2.2. Fuzzy feature extraction

Applying fuzzy rules, the formulae for ASM, Entropy and Contrast given by Haralick et al. [1] can be modified using the Triangular membership function.

2.2.1. Extension to Second order Fuzzy ASM
$$temp(i,j) = \left\{(\mu_m(k))_{\min}\, P(i,j)\right\}^2 + \left\{1 - (\mu_m(k))_{\max}\, P(i,j)\right\}^2,$$
where k = [x][y], m = [r][s], x = [0,1,...,L-1], y = [0,1,...,L-1], r = [0,1,...,L-1], s = [0,1,...,L-1], and
$$f_{ASM}(I) = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1}\left\{temp(i,j)\right\}. \qquad (3)$$
2.2.2. Extension to Second order Fuzzy Entropy
$$f_E(I) = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1}\Big[\left\{(\mu_m(k))_{\min}\, P(i,j)\right\}\log\left\{(\mu_m(k))_{\min}\, P(i,j)\right\} + \left\{1-(\mu_m(k))_{\max}\, P(i,j)\right\}\log\left\{1-(\mu_m(k))_{\max}\, P(i,j)\right\}\Big]. \qquad (4)$$
2.2.3. Extension to Second order Fuzzy Contrast
$$temp(i,j) = \left\{(\mu_m(k))_{\min}\, P(i,j)\right\} + \left\{1-(\mu_m(k))_{\max}\, P(i,j)\right\},$$
$$f_{CN}(I) = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1}\left(\sum_{t=0}^{L-1} t \cdot temp(i,j)\right), \qquad (5)$$
where t = |i − j| ∈ {0, 1, ..., L − 1}.
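For reference, the classical Haralick definitions of the three features [1], which serve as the baseline that the fuzzified Eqs. (3)-(5) extend by membership-weighted terms, can be applied to either a crisp or a fuzzy normalized co-occurrence matrix as in the short sketch below.

```python
import numpy as np

def haralick_features(P):
    """Classical Haralick ASM, Entropy and Contrast from a co-occurrence matrix P."""
    P = P / P.sum()                                  # normalize to joint probabilities
    eps = 1e-12
    asm = np.sum(P ** 2)                             # angular second moment
    entropy = -np.sum(P * np.log(P + eps))           # entropy
    i, j = np.indices(P.shape)
    contrast = np.sum(((i - j) ** 2) * P)            # contrast
    return asm, entropy, contrast
```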
3. Experiments and Results

3.1. Computation of sensitivity

The performance of the co-occurrence features computed by each method is compared by measuring the sensitivity in each case. Because the co-occurrence based features are extremely useful, it would be preferable if these features were
less sensitive to imprecision. For verification, the features are computed on ideal as well as imprecise textures. Let f be a particular feature computed from the ideal texture and f′ be the same feature computed from the corresponding imprecise texture. Ideally, the difference between f and f′ must be minimal if the features are to be insensitive to imprecision. To establish that fuzzy statistics offer a better method of computation of features, a measurement parameter called sensitivity is computed as follows:
$$S = \frac{f - f'}{1 + f \cdot f'}. \qquad (6)$$

3.2. Query Image Retrieval:

To retrieve the query from the image, the image is divided into query-sized windows in all possible ways. The features ASM, Entropy and Contrast extracted from the query are compared with the same features extracted from each window, with sensitivity as the similarity measure. Finally, the window which gives a sensitivity less than the given thresholds for all three features is treated as a retrieval. By progressively adding Gaussian noise to the original image, the retrieval is done as per the above algorithm with crisp and fuzzy features extracted from both crisp co-occurrence and fuzzy co-occurrence matrices.
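A minimal sketch of the sensitivity measure (6) and of the window-matching retrieval loop described above follows; it reuses the co-occurrence and Haralick helpers sketched earlier, and the threshold values and sliding-window step are assumptions for illustration.

```python
import numpy as np

def sensitivity(f_ideal, f_noisy):
    """Sensitivity of a feature to imprecision, Eq. (6)."""
    return (f_ideal - f_noisy) / (1.0 + f_ideal * f_noisy)

def retrieve_query(image, query, thresholds=(0.05, 0.05, 0.05), step=4):
    """Keep windows whose ASM, Entropy and Contrast sensitivities are all below threshold."""
    qh, qw = query.shape
    fq = haralick_features(crisp_cooccurrence(query, 256, 1, 0))
    matches = []
    for i in range(0, image.shape[0] - qh + 1, step):
        for j in range(0, image.shape[1] - qw + 1, step):
            window = image[i:i + qh, j:j + qw]
            fw = haralick_features(crisp_cooccurrence(window, 256, 1, 0))
            s = [abs(sensitivity(a, b)) for a, b in zip(fq, fw)]
            if all(v < t for v, t in zip(s, thresholds)):
                matches.append((i, j))
    return matches
```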
4. Analysis of Results

The sensitivity analysis is carried out on many color textural images. As an example, the analysis of one color image is given. The precise image, Fig. 1.1, has been made imprecise (Fig. 1.2) by adding 50% uniform noise. Sensitivities of all three features ASM, Entropy and Contrast computed using crisp-crisp, fuzzy-crisp, crisp-fuzzy and fuzzy-fuzzy, plotted as bar charts against the respective computation techniques, are shown in Figs. 1.3, 1.4 and 1.5. The best result is given by the computation technique which gives the least sensitivity. The effect of applying the fuzzy features (application of Eqs. 3, 4 and 5) either on crisp co-occurrence matrices or fuzzy co-occurrence matrices proved to give better sensitivity results than applying the crisp set of equations. This is true for colour textural images. The results generally prove the superiority of fuzzy feature extraction techniques over the conventional feature extraction techniques. The extent of the improvement can be observed from the order of the results, i.e. the sensitivities computed by fuzzy feature extraction have decreased to a very large extent. Analysis of the query image retrieval results is shown in Figs. 2.1 to 2.8. Fig. 2.1 is an apple image and Fig. 2.2 is the query given. By adding Gaussian noise progressively to the original image, it is observed that crisp-crisp could retrieve till 6% noise (Fig. 2.3), fuzzy-crisp could retrieve till 11% noise (Fig. 2.4), crisp-fuzzy could retrieve till 15% noise (Fig. 2.5) and fuzzy-fuzzy could retrieve till 18% noise (Fig. 2.6). Figs. 2.7 and 2.8 show the failure (multiple faulty retrievals) of crisp-crisp and fuzzy-crisp at 15% noise, respectively.
Fig. 1.1 Precise image. Fig. 1.2 Imprecise image. Fig. 1.3 ASM. Fig. 1.4 Entropy. Fig. 1.5 Contrast. (Y-axis: scaled sensitivity.)
Fig. 2.1 Original image. Fig. 2.2 Query image. Fig. 2.3 Crisp-crisp at 6%. Fig. 2.4 Fuzzy-crisp at 11%. Fig. 2.5 Crisp-fuzzy at 15%. Fig. 2.6 Fuzzy-fuzzy at 18%. Fig. 2.7 Crisp-crisp at 15%. Fig. 2.8 Fuzzy-crisp at 15%.
5. Conclusion

Uncertainty pervades every aspect of image retrieval because of ill-posed and noisy images or queries. Fuzzy concepts are better suited for feature extraction in such cases. Sensitivity analysis shows that fuzzy feature
extraction gives lower sensitivity compared to crisp feature extraction, which implies that fuzzy features are less sensitive to impreciseness. The results obtained further confirm that when fuzzy features and crisp features are used for query image retrieval, the fuzzy features retrieve the query successfully in a noisy environment where the crisp features fail to retrieve it.

Acknowledgments

The authors would like to acknowledge Mr. Vineet Kumar Singh, who has kindly sponsored the registration fees.

References
1. R.M. Haralick, K. Shanmugam and I. Dinstein, Textural Features for Image Classification, IEEE Trans. Systems Man Cybernetics, Vol. 3, pp. 610-621, 1973.
2. C.V. Jawahar and A.K. Ray, Fuzzy statistics of digital images, IEEE Signal Processing Lett. 3:225-227, 1996.
3. C.V. Jawahar and A.K. Ray, Incorporation of gray level imprecision in representation and processing of digital images, Pattern Recognition Lett. 17:541-546, 1996.
4. C.V. Jawahar and A.K. Ray, Techniques and applications of fuzzy statistics in Digital Image Analysis, Fuzzy Theory Systems: Techniques and Applications, Vol. 2, 1999, pp. 759-771.
5. Linhui Jia and Zhi-Qiang Liu, A technique for 3-D object representation and detection in knowledge-based and content driven image retrieval, IEEE International Conference on Intelligent Processing Systems, pp. 979-983, 1997.
6. Kyuheon Kim, Seyoon Jeong, Byung Tae Chun, Jae Yeon Lee and Younglae Bae, Efficient video images retrieval by using local co-occurrence matrix texture features and normalised correlation, IEEE TENCON, pp. 934-937, 1999.
7. P. Lewis, D. Dupplaw, and K. Martinez, Content-Based Multimedia Information Handling, Cultivate Interactive, issue 6, 11 February 2002.
8. Jinshan Tang and Scott Acton, An image retrieval algorithm using multiple queries to images, IEEE, pp. 193-196, 2003.
9. Ying Dai, Intention-based image retrieval with or without query image, IEEE Proceedings of the 10th International Multimedia Modelling Conference, 2004.
10. Raghu Krishnapuram, Swarup Medasani, Sung-Hwan Jung, Young-Sik Choi and Rajesh Balasubramaniam, Content-based image retrieval based on a fuzzy approach, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 10, October 2004, pp. 1185-1199.
CLASSIFICATION WITH INTUITIONISTIC FUZZY REGION IN GEOSPATIAL INFORMATION SYSTEM

M. REZA MALEK 1,2
JALAL KARAMI 1
SHAMSOLMOLOOK ALIABADY 2

1- Dept. of GIS, Faculty of Geodesy & Geomatics Eng., K.N. Toosi Univ. of Technology, Tehran, Iran
2- Research Institute of National Cartographic Center, Tehran, Iran
Abstract Although fuzzy logic methods are of great interest in many GIS applications, the traditional fuzzy logic has two important deficiencies. First, to apply the fuzzy logic, we need to assign, to every property and for every value, a crisp membership function. Second, fuzzy logic does not distinguish between the situation in which there is no knowledge about a certain statement and a situation that the belief to the statement in favor and against is the same. Due to this fact, it is not recommended for problems with missing data and where grades of membership are hard to define. In this paper, a simple fuzzy region and fundamental concepts for uncertainty modeling of spatial relationships are analyzed from the view point of intuitionistic fuzzy (IF) logic. We demonstrate how it can provide model for fuzzy region; i.e., regions with indeterminate boundaries. As a proof of our idea, the paper will discuss the process of creating thematic maps using remote sensing satellite imagery.
1. Introduction

A rational assessment of the usability of spatial data for the many tasks requires a proper understanding of the role uncertainty plays. As Geographic Information System (GIS) analyses are often a basis for decision making, ignoring the uncertainty may lead to wrong decisions and also undermine any trust. Describing spatial objects in the current mapping of the real world as precisely determined phenomena is an insufficient abstraction process, since the feature of spatial indeterminacy or spatial vagueness is often inherent to many geometric and geographic data [4]. Sources of uncertainty are the inexact or incomplete definition of objects, and the inability to observe precise and complete relevant data (see [4]). The description of objects (static or dynamic) is not only uncertain in the above mentioned sense, but also contradictory in different contexts [8].
784 Over the past few years, there has been considerable work done on modeling of fuzzy spatial objects (see e.g. [7]). Models based on fuzzy set theory, allow a much more finegrained modeling of vague spatial objects. Although fuzzy logic methods are of great interest in many GIS applications, the traditional fuzzy logic has two important deficiencies. First, to apply the fuzzy logic, we need to assign, to every property and for every value, a crisp membership function. Second, fuzzy logic does not distinguish between the situation in which there is no knowledge about a certain statement and a situation that the belief to the statement in favor and against is the same. Due to this fact, it is not recommended for problems with missing data and where grades of membership are hard to define [11]. Using not enough or irrelevant data like aged satellite images are one example for the aforementioned problem. Another example is the definition of objects. They can be very different for the same object. Some experts may have different views of an object e.g., 'forest'. This problem is emerging whenever one has to deal with interoperability of different systems, combining different data sets. In this paper, a simple fuzzy region and fundamental concepts for uncertainty modeling of spatial relationships are analyzed from the view point of intuitionistic fuzzy (IF) logic. We demonstrate how it can provide model for fuzzy region; i.e., regions with indeterminate boundaries. As a proof of our idea, the paper will discuss the process of creating thematic maps using remote sensing satellite imagery. The remainder of this paper is structured as follows: Section 2 reviews the related works. Section 3 introduces necessary concepts. This section proposes novel concepts and their properties to define a simple intuitionistic fuzzy spatial region (IFSR). The contribution of the intuitionistic fuzzy logic to satellite image classification is discussed in the section 4. The concluding section 5 gives a summary and conclusion. 2. Related Work During recent years, fuzzy logic has been much investigated in different geoinformation application domain. Zhan [16] has proposed using fuzzy logic for urban land use classification. Wang [14] has proposed a fuzzy supervised classification method for classify remote sensing images. The intersection model is extended to vague regions by three main approaches: the work of Clementini and Di Felice [5] on regions with "broad boundary", the work of Zhan [15] who developed a method for approximately analyzing binary topological relations between geographic regions with indeterminate boundaries based on fuzzy sets, and Tang and Kainz [13] that provided a 3*3, a 4*4, and a 5*5 intersection matrix based on different
topological parts of two fuzzy regions. A good review of vague objects can be found in [7]. The notion of intuitionistic fuzzy sets (IFS) was introduced by Atanassov [1, 2, 3] as a generalization of fuzzy sets. Later the concept of intuitionistic fuzzy topology was introduced by Coker [6]. Malek in his works [9, 10] provided a theoretical framework for both dominant ontologies used in GIS, namely point-set topology and the Region Connection Calculus.

3. Intuitionistic Fuzzy Region

First we shall present the fundamental concepts and definitions given by Atanassov.

Definition 3.1 [3]: Let X be a nonempty fixed set. An intuitionistic fuzzy set (IFS) A in X is an object having the following form
$$A := \left\{\left\langle x, \mu_A(x), \nu_A(x)\right\rangle \mid x \in X\right\},$$
where the functions μ_A: X → [0,1] and ν_A: X → [0,1] define the degree of membership and the degree of non-membership of the element x ∈ X, respectively. For every x ∈ X, μ_A and ν_A satisfy 0 ≤ μ_A(x) + ν_A(x) ≤ 1.
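A direct way to represent Definition 3.1 in code is a small record holding the two degrees and the induced hesitation; the sketch below is purely illustrative and not part of the paper's formalism.

```python
from dataclasses import dataclass

@dataclass
class IFSElement:
    mu: float   # degree of membership
    nu: float   # degree of non-membership

    def __post_init__(self):
        if not (0.0 <= self.mu + self.nu <= 1.0):
            raise ValueError("IFS constraint 0 <= mu + nu <= 1 violated")

    @property
    def hesitation(self) -> float:
        return 1.0 - self.mu - self.nu

# e.g. a pixel judged 0.6 'forest', 0.1 'not forest', leaving 0.3 hesitation
p = IFSElement(mu=0.6, nu=0.1)
```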
In contrast to traditional fuzzy sets, the sum of μ_A and ν_A does not necessarily have to be 1. This is particularly useful when the system may lack complete information. Definition 3.2 [6]: We define 0∼ and 1∼ as follows:
$$0_\sim := \left\{\left\langle x, 0, 1\right\rangle \mid x \in X\right\}, \qquad 1_\sim := \left\{\left\langle x, 1, 0\right\rangle \mid x \in X\right\}.$$
Consequently, an intuitionistic fuzzy topology (IFT for short) on a nonempty set X is a family Tof IFSs in X satisfying the following axioms:
(T1) 0∼, 1∼ ∈ T;
(T2) G_i ∩ G_j ∈ T for any G_i, G_j ∈ T;
(T3) ∪ G_i ∈ T for an arbitrary family {G_i | i ∈ I} ⊆ T.
The pair (IFS(X), T) is called an intuitionistic fuzzy topological space (IFTS) and any IFS in T is known as an intuitionistic fuzzy open set (IFOS for short) in X. The complement A^c of an IFOS A is called an intuitionistic fuzzy closed set (IFCS) in X.
Definition 3.3 [6]: Let (X, T) be an IF topological space and A = ⟨X, μ_A, ν_A⟩ be an IFS in X. Then the fuzzy closure and fuzzy interior are defined by:
$$A^- = \bigcap\left\{K : K \text{ is an IFCS in } X \text{ and } A \subseteq K\right\},$$
$$A^\circ = \bigcup\left\{G : G \text{ is an IFOS in } X \text{ and } G \subseteq A\right\}.$$
Corollary 3.1T61:
A°nB° A° oc
A
=(AnB)°,A-uB-
=(AuBY
^A^A~ =AC~,A~C =AC0.
Now, we add some further definitions and propositions. Definition 3.4: We define a IF boundary A=<X,JUA,VA >by:
(IFB)
of
an
IFS
8A = A~ nAc'. The following theorem shows the intersection methods no longer guarantees a unique solution. Corollary 3.2 dAnA" = 0_ iff A" is crisp (i.e., A" = 0_ or A" = 1_ ). Proof. =>) If A" = {< x,0 < HA„ < 1, 0 < VA„ < 1 >| x € X}, then
A~ = {< x,0 <juA. < 1, 0 < vA. < 1 >| x e X) and Aoc = {< x,0 < vA„ < 1, 0 < jur <\>\xeX}.
Then,
e ^ n ^ ° =A'nAc-nA° =A~nA°c nAc~ = {x,0< min(^.,//,.,v,_)< 1,0<max(y r , v,_ , ^ „ ) < 11x e l } . Therefore, ifdA nA°
=0„, then A" = 0 , o r ^ ° = 1_.
<=) If A° is crisp, then A" =0„orA° =l_.lf A" = L then A°c = 0_ and .4" = 1_, then 8AnA° = 0 . If A" = 0 then it immediately results
that S/f n A" = 0_ . Definition 3.5: Let A =< X,//^, V^ > be an IFS in (X,T). Suppose that the family of IFOS's contained in A are indexed by the family < X,/JG ,VG >: i G / } and the family of IFOS's containing A are indexed by the family < X, JUK , VK >: j e J} . Then two interiors, closures, and boundaries are defined as following:
4] :=< x, max(//G ), min(l - juGj) > Al :=< x, max(l - vc ),min(yG ) > 4~ :=< x, rmn(jiKj ), max(l - pKj) > AQ :=< x, min(l - vK ), m a x ^ ) > dA
[] := AU n AU Proposition 3.1:
8A
o '-=Al
nA
l~-
(c)A;m={[lO}A°andAm)={[],0}AProof. We shall only prove (c), and the others are obvious. []A° = < X , m a x ( / / G ) , l - m a x ( / / G . ) > . Based on
1 - max(//G ) = min(l - //G )
'
,
knowing
that
then
[]y4° =< X,max(// G ),min(l - // G ) >= ^ . In a similar way the others can be proved. Definition 3.6: Let A =< X,fJ.A,VA >be an IFS in (X,T). We define exterior of A as follows:
AE
=XnAc.
Definition 3.7: An IFOS A is called regular open iff A = A⁻°, and an IFCS A is called regular closed iff A = A°⁻. Now, we shall obtain a formal model for a simple spatial fuzzy region based on the IF C5-connectedness defined in [6]. A simple geographic object is assumed to be a one-component set which does not have holes, cuts, or punctures. Therefore, the next definition gives us the explanation of a simple IF region. Definition 3.8: An IFS A is called a simple fuzzy region in a C5-connected IFTS if: (1) A⁻, A⁻_[] and A⁻_⟨⟩ are regular closed, (2) A°, A°_[] and A°_⟨⟩ are regular open, and (3) ∂A, ∂A_[] and ∂A_⟨⟩ are C5-connected.

4. Satellite Image Classification

Land use and land cover are important examples of the use of fuzzy logic in GIS, because most of what is obtained from the classification procedure concerns not precisely defined and indeterminate spatial objects such as urban and rural, or forest and grassland. In satellite images some pixels might remain unclassified due to clouds or absorption of the sensor's signal. Missing data causes problems in the
classification of pixels. Another major source of uncertainty in satellite images is age. Frequently there is a need to combine aged images with new data. The procedure will be faced with cases in which arguments indicate that part of, e.g., a grassland area did not change, while there might be some arguments against this. In such situations using intuitionistic fuzzy logic would be reasonable. The suitability of using intuitionistic fuzzy logic is shown in the remaining sections.

4.1. Materials and Study Area

The location of the present study is situated in the north of Iran (south of the Caspian Sea). For this study a Landsat-7 ETM+ 2002 image (figure 2) was used. The supervised crisp and fuzzy classification procedures were performed using FUZCLASS, one of the soft classifiers available in the IDRISI for Windows image processing software. Training data for each class can possess both pure and fuzzy signatures. The fuzzy classification allows different levels of pixel membership (0.00-1.00). The crisp classification was done using the maximum likelihood method. The information classes were "forest" and "road".

4.2. Data Analysis and Discussion

As can be seen from the image (figure 1), the image is partly cloudy and shadowed. The two aforementioned phenomena cause the image to be partly unclassified. The results of the fuzzy classification are illustrated in figures 2 and 3. Based on the classification uncertainty calculated by IDRISI, it can be easily seen that the shadowed or cloudy area has maximum uncertainty (figure 4). The result of the intuitionistic classification for the forest class is illustrated in figure 5. The details of the intuitionistic computation are left for another article. Now, the output images coming from traditional and intuitionistic fuzzy classification can be compared. The intuitionistic method gives an output result for the uncertain area based on introducing the following simple rule: "The area surrounded by forest is more probable to be forest". The membership degree for those areas is calculated as a function of distance.

5. Conclusion

In contrast to traditional fuzzy logic, IF logic is well equipped to deal with missing data. By employing IF logic in spatial data models, we can express a hesitation concerning the object of interest, because it distinguishes between the situation in which there is no knowledge about a certain statement and a situation in which the belief in the statement in favor and against is the same. Using the IF method is reasonable because on the one hand it supports linguistic variables related to doubt and hesitancy and, on the other hand, it can manage given
contradictory reasons. This article has gone a step forward in developing methods that can be used to define fuzzy and vague spatial regions. Classification procedure for satellite images is strongly influenced by the presence of cloud and shadow. For example, cloudy area is lighter, so it cannot be properly classified. In the case of poor relevant data, IF method can be used. The IF decision rules for determining class membership degree are the framework in which the doubt and hesitancy of expert knowledge has been embedded.
Fig. 1. The original cloudy and shadowed image (cloudy area indicated).
Fig. 4. The classification uncertainty. Fig. 5. The IF membership of the forest class.
790 References 1. Atanassov, K.T.: "Intuitionistic Fuzzy Sets", Fuzzy Sets and Systems, 20: pp. 87-96, 1986. 2. Atanassov, K.T.: "More on Intuitionistic Fuzzy Sets. Fuzzy sets and Systems, 33:pp. 37-45, 1989. 3. Atanassov, K.T.: "Intuitionistic Fuzzy Logic: Theory and Application", Studies in Fuzziness and Soft Computing., Heidelberg, Physica-Verlag, 1999. 4. Burrough, P.A. and Frank, A.U. (eds.): "Geographic Objects with Indeterminate Boundaries", GISDATA Series, ed. I. Masser and Salge, F. Vol. II., Taylor & Francis: London, 1996. 5. Clementini, E. and Di Felice, P.: "An Algebraic Model for Spatial Objects with Indeterminate Boundaries", In: [4], pp. 155-169, 1996. 6. Coker, D.: "An introduction to intuitionistic fuzzy topological space", Fuzzy sets and Systems, 88: pp. 81-89, 1997. 7. Frank, A.U., Grum, E. (compilers): ISSDQ '04, Vienna, Dept. for Geoinformation and Cartography, Vienna University of Technology, 2004. 8. Kokla, M. and Kavouras, M.: "Fusion of Top-level and Geographic Domain Omtologies based on Context Formation and Complementarity", International Journal of Geographical Information Science, 15(7): pp. 679687,2001. 9. Malek, M.R.:" Spatial Object Modeling in Intuitionistic Fuzzy Topological Spaces", Lecture Notes in Artificial Intelligence, 3066, pp. 427-434, 2004. 10. Malek, M.R. and Twaroch, F. :"An Introduction to Fuzzy Spatial Region", In: Frank, A.U., Grum, E. (compilers): ISSDQ '04, Vienna, Dept. for Geoinformation and Cartography, Vienna University of Technology, 2004. 11. Roy, A.J.: "A Comparison of Rough Sets, Fuzzy sets and Non-monotonic Logic", University of Keele: Staffordshire, 1999. 12. Stell, J.G. and Worboys, M.F.: "The Algebraic Structure of Sets of Regions", In: Proceding of Spatial Information Theory (COSIT '97), Laurel Highlands, PA: Springer, 1997. 13. Tang, X. and Kainz, W.: "Analysis of Topological relations between Fuzzy Regions in a General Fuzzy Topological space", In: Proceeding of Symposium on Geospatial Theory, Processing and Applications, Ottawa, 2002. 14. Wang, F.: "Fuzzy supervised classification of remote sensing images", IEEE Transactions on Geoscience and Remote Sensing, 28, pp. 194-201, 1990. 15. Zhan, F.B.: "Approximate analysis of binary topological relations between geographic regions with indeterminate boundaries", Soft Computing, 2: p. 28-34, 1998. 16. Zhan, Q. and Molenaar M. and Gorte, B.: "Urban land use classes with fuzzy membership and classification based on integration of remote sensing and GIS", Proceeding of ISPRS, 2000.
ON-LINE TRAINING EVALUATION IN VIRTUAL REALITY SIMULATORS USING FUZZY BAYES RULE

RONEI MARCOS DE MORAES
Department of Statistics, Federal University of Paraiba, Joao Pessoa, PB, Brazil, [email protected]

LILIANE DOS SANTOS MACHADO
Department of Informatics, Federal University of Paraiba, Joao Pessoa, PB, Brazil, [email protected]

Simulators based on Virtual Reality (VR) provide significant benefits over other methods of training, mainly in critical procedures. The assessment of training performed in this kind of system is necessary to know the training quality and to provide some feedback about the user performance. Because VR simulators are real-time systems, on-line evaluation tools attached to them must have a low-complexity algorithm so as not to compromise the performance of the simulators. This work presents a new approach to on-line evaluation using an assessment tool based on the Fuzzy Bayes Rule for modeling and classification of a simulation into pre-defined classes of training. This method allows the use of continuous variables without loss of information. Results of its application are provided and compared with another evaluation system based on the classical Bayes rule.
1. Introduction

Virtual Reality (VR) systems for training provide significant benefits over other methods of training, mainly in critical medical procedures. In some cases, those procedures are done without visualization for the physician, and the only information he has is the touch sensation provided by a robotic device with force feedback. These devices can measure forces and torque applied during the user interaction [1] and these data can be used for an assessment or evaluation [3,10]. This is especially interesting in medical applications to simulate some invasive procedures. The first methodologies for training evaluation were proposed just a few years ago. They can be divided into off-line and on-line methods. In medicine, some models for off-line [6,10,11] or on-line [3,5,7,8,9] evaluation of training have been proposed. The evaluation methodologies for training through VR simulators are still more recent. Because VR simulators are real-time systems, an evaluation tool must continuously monitor all user interactions and compare his
performance with pre-defined expert's classes of performance. For didactic reasons, the use of on-line evaluation tools is more interesting because the user can remember his mistakes and learn how to correct them more easily. The main problems related to on-line training evaluation methodologies applied to VR systems are computational complexity and accuracy. An on-line evaluation tool must have low complexity so as not to compromise VR simulation performance, but it must have high accuracy so as not to compromise the user evaluation. For this case, an evaluation tool based on the Fuzzy Bayes Rule reaches those requirements and can obtain better results than the application of the Classical Bayes Rule [7].

2. Assessment in VR Simulators

The assessment of simulations in VR-based systems for training is necessary to know the training quality and to provide some feedback about the user performance. User actions, such as spatial movements, can be collected from mouse, keyboard and any other tracking device. Applied forces, angles, position and torque can be collected from haptic devices [1]. So, VR systems can use one or more variables, such as the ones mentioned above, to assess the simulation performed by the user. Recently, models and methods for off-line [6,10,11] and on-line [3,5,7,8,9] assessment of training have been proposed. In this paper, we propose the use of fuzzy statistical classification based on the Fuzzy Bayes Rule for an on-line training evaluation tool for virtual reality simulators. The system uses a vector of information with data collected from user interactions with the virtual reality simulator. These data are compared by an evaluation system with M pre-defined classes of performance. To test it, we are using a bone marrow harvest simulator [4]. The proposed evaluation tool supervises the user movements and assesses the training according to M possible classes of performance.

3. Evaluation Tool Based on Fuzzy Bayes Rule

This section presents the method for training evaluation, based on the Fuzzy Bayes Rule. For the reader's better understanding, we first present a short review of the Classical Bayes Rule. After that, we present Fuzzy Sets and the Fuzzy Bayes Rule.

3.1. Classical Bayes Rule

Formally, let the classes of performance be in the space of decision Ω = {1, ..., M}, where M is the total number of classes of performance. Let w_i, i ∈ Ω, be the class
of performance for a user. We can determine the most probable class of a vector of training data X by conditional probabilities [2]:
$$P(w_i \mid X) = \frac{P(w_i \cap X)}{P(X)}, \quad i \in \Omega. \qquad (1)$$
The probability given by (1) is the likelihood that, for a data vector X, the correct class is w_i. The classification rule is performed according to
$$X \in w_i \ \text{ if } \ P(w_i \mid X) > P(w_j \mid X) \ \text{ for all } i \neq j,\ i, j \in \Omega. \qquad (2)$$
However, all the probabilities given by (1) are unknown. So, if we have sufficient information available for each class of performance, we can estimate the probabilities denoted by P(X | w_i). Using the Bayes Theorem:
$$P(w_i \mid X) = \frac{P(X \mid w_i)\,P(w_i)}{P(X)}, \quad \text{where} \quad P(X) = \sum_{i=1}^{M} P(X \mid w_i)\,P(w_i). \qquad (3)$$
As P(X) is the same for all classes w_i, it is not relevant for data classification. In Bayesian theory, P(w_i) is called the a priori probability for w_i and P(w_i | X) is the a posteriori probability for w_i given X. Then, the classification rule given by (2) is modified:
$$X \in w_i \ \text{ if } \ P(X \mid w_i)\,P(w_i) > P(X \mid w_j)\,P(w_j) \ \text{ for all } i \neq j,\ i, j \in \Omega. \qquad (4)$$
Equation (4) is known as the Bayesian decision rule of classification. However, it can be convenient to use [4]:
$$g_i(X) = \ln\left[P(X \mid w_i)\,P(w_i)\right] = \ln\left[P(X \mid w_i)\right] + \ln\left[P(w_i)\right], \quad i \in \Omega, \qquad (5)$$
where g_i(X) is known as the discriminant function. We can use (5) to modify the formulation given by the Bayesian decision rule in equation (4):
$$X \in w_i \ \text{ if } \ g_i(X) > g_j(X) \ \text{ for all } i \neq j,\ i, j \in \Omega. \qquad (6)$$
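Assuming Gaussian class-conditional densities, as the text suggests, the discriminant rule (5)-(6) can be sketched as follows; the diagonal-covariance simplification and the function names are assumptions made for brevity.

```python
import numpy as np

def gaussian_discriminant(x, mean, var, prior):
    """g_i(X) = ln P(X|w_i) + ln P(w_i) for a diagonal Gaussian class model, Eq. (5)."""
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return log_lik + np.log(prior)

def classify(x, means, variances, priors):
    """Bayesian decision rule, Eq. (6): pick the class with the largest discriminant."""
    scores = [gaussian_discriminant(x, m, v, p)
              for m, v, p in zip(means, variances, priors)]
    return int(np.argmax(scores))
```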
It is important to note that if the statistical distribution of the training data can be assumed to be multivariate Gaussian, the use of (6) has interesting computational properties [2]. If the training data cannot be assumed to follow that distribution, equation (6) can still provide a significant reduction of the computational cost of implementation.

3.2. Fuzzy Sets

In classical set theory a set A of a universe X can be defined by a membership function μ_A(x), with μ_A: X → {0,1}, where 1 means that x is included in A and 0 means that x is not included in A:
$$\mu_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases} \qquad (7)$$
A fuzzy set can be seen as a representation, in classical set theory, of something of which we only have an imperfect knowledge. In this case, the membership function is not given by only one value 0 or 1, but by a value in the interval [0,1]. The probability of a fuzzy event is defined by [14]: let (R^n, Φ, P) be a probability space where Φ is a σ-algebra in R^n and P is a probability measure over R^n. Then a fuzzy event in R^n is a set A in R^n, with membership function μ_A(x), where μ_A: R^n → [0,1] is Borel-measurable. The probability of a fuzzy event A is defined by the Lebesgue-Stieltjes integral:
$$P(A) = \int_{R^n} \mu_A(x)\,dP = E(\mu_A). \qquad (8)$$
In other words, the probability of a fuzzy event A with membership function μ_A is the expected value of the membership function μ_A.

3.3. Fuzzy Bayes Rule

Again, let the classes of performance for a user be w_i, i = 1, ..., M, where M is the total number of classes of performance. However, now we assume that the w_i are fuzzy sets over the space of decision Ω. Let μ_{w_i}(X) be the fuzzy membership function for each class w_i given by a fuzzy information source (for example, a rule composition system of the expert system, or a histogram of the sample data), according to a vector of data X. In our case, we assume that the fuzzy information source is a histogram of the sample data. By use of fuzzy probabilities and the fuzzy Bayes rule [14] in the classical Bayes rule [12], we have the fuzzy probability of the class w_i, given the vector of data X:
$$P(w_i \mid X) = \frac{\mu_{w_i}(X)\,P(w_i)\,P(X \mid w_i)}{\sum_{j=1}^{M} \mu_{w_j}(X)\,P(w_j)\,P(X \mid w_j)}, \quad \text{with} \quad \sum_{i=1}^{M} \mu_{w_i}(X) = 1. \qquad (9)$$
However, as the denominator is independent of i, the Fuzzy Bayes classification rule is to assign the vector of training data X from the user to the class of performance w_i if:
$$\mu_{w_i}(X)\,P(w_i)\,P(X \mid w_i) = \max_j\left\{\mu_{w_j}(X)\,P(w_j)\,P(X \mid w_j)\right\}. \qquad (10)$$
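The fuzzy Bayes rule (9)-(10) only adds the class membership μ_{w_i}(X) as an extra factor on top of the classical rule. A minimal sketch, reusing the Gaussian discriminant above and assuming that the memberships come from pre-computed histograms of expert data, is:

```python
import numpy as np

def fuzzy_bayes_classify(x, means, variances, priors, memberships):
    """Assign X to the class maximizing mu_wi(X) * P(w_i) * P(X|w_i), Eq. (10).

    memberships: callable returning the fuzzy membership of x to each class
    (e.g. read from sample-data histograms; an assumption for this sketch).
    """
    mu = np.asarray(memberships(x), dtype=np.float64)   # mu_wi(X), summing to 1
    log_scores = [np.log(mu[i] + 1e-12) + gaussian_discriminant(x, m, v, p)
                  for i, (m, v, p) in enumerate(zip(means, variances, priors))]
    return int(np.argmax(log_scores))
```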
4. The Evaluation Tool The evaluation tool proposed should supervise the user's movements and other parameters associated to them. The system must collect information about positions in the space, forces, torque, resistance, speeds, accelerations, temperatures, visualization position and/or visualization angle, sounds, smells
795 and etc. The VR simulator and the evaluation tool are independent systems, however they act simultaneously. The user's interactions with the simulator are monitored and the information is sent to the evaluation that analyzes the data and it emits a report on the user's performance at the end of the training. The VR system used for the tests is a bone marrow harvest simulator [4]. In a first movement on the real procedure, the trainee must feel the skin of the human pelvic area to find the best place to insert the needle used for the harvest. After, he must feel the tissue layers (epidermis, dermis, subcutaneous, periosteum and compact bone) trespassed by the needle and stop at the correct position to do the bone marrow extraction. In our VR simulator the trainee interacts with a robotic arm and his movements are monitored in the system by variables as acceleration, applied force and spatial position. For reasons of general performance of the VR simulator, were chosen to be monitored the following variables: spatial position, velocities, forces and time on each layer. Previously, the system was calibrated by an expert, according to M classes of performance defined by him. The number of classes of performance was defined as M=3: 1) correct procedures, 2) acceptable procedures, 3) badly executed procedures. So, the classes of performance for a trainee could be: "you are well qualified", "you need some training yet", "you need more training". The information of variability about these procedures is acquired using fuzzy membership functions and Gaussian probability models. In our case, we assume that the font of fuzzy information for construction of the fuzzy membership function for w( classes is the histogram of the sample data. The user makes his training in the VR simulator and the Evaluation Tool based on Fuzzy Bayes Rule collects the data from his manipulation. All probabilities of that data for each class of performance are calculated by (9) and at the end the user is assigned to a Wj class of performance by (10). So, when a trainee uses the system, his performance is compared with each expert's class of performance and the Evaluation Tool based on Fuzzy Bayes Rule assigns him the better class, according to the trainee's performance. At the end of training, the evaluation system reports the classification to the trainee. The calibration of the Evaluation Tool based on Fuzzy Bayes Rule was performed off-line, before any evaluation of training. For that, an expert executed the procedure twenty times for each class of performance. After, for a controlled and impartial analysis, several users used the system and 150 training procedures were monitored. The data collected from these trainings were manually labeled according to the expert specifications. These same cases were labeled using the Evaluation Tool based on Fuzzy Bayes Rule and it generated the classification matrix showed in Table 1. The diagonal of that matrix shows
the correct classification. In the other cells, we can observe the misclassifications.

Table 1. Classification matrix for the Evaluation Tool based on Fuzzy Bayes Rule.

Class of performance      Class of performance according to the
according to experts      Evaluation Tool based on Fuzzy Bayes Rule
                          1      2      3
1                         44     6      0
2                         0      50     0
3                         1      3      46
The Kappa coefficient was used to perform the comparison of the classification agreement, as recommended by the pattern recognition literature [13]. From the classification matrix obtained, the Kappa coefficient for all samples was K = 90.00% with variance 9.3082 × 10⁻⁴ %. In only 10 cases did the evaluation tool make mistakes. It is important to note that for the class "acceptable procedures", all classifications were correct. That performance is very acceptable and it shows the good adaptation of the Evaluation Tool based on Fuzzy Bayes Rule to the solution of this evaluation problem. Another important result is the computational performance of the evaluation tool: with a Pentium IV PC compatible, 2GB of RAM and 80GB of hard disk, the average CPU time consumed by the evaluation was 0.0310 seconds. Then, we can affirm that the Evaluation Tool based on Fuzzy Bayes Rule has low computational complexity. It allows the inclusion of other variables in the evaluation tool without degradation of the performance of the virtual reality simulation.

5. Comparison with an Evaluation Tool based on Classical Bayes Rule

A comparison was performed between the Evaluation Tool based on Fuzzy Bayes Rule and the Evaluation Tool based on Classical Bayes Rule, proposed by Moraes and Machado [7]. The Evaluation Tool based on Classical Bayes Rule was configured and calibrated by the expert for the same three classes used before. The same sixty samples of training (twenty of each class of performance) were used for calibration of the two evaluation systems. In the same way, the data of the same 150 procedures from users' training were used for a controlled and impartial comparison between the two evaluation systems. The classification matrix obtained for the Evaluation Tool based on Classical Bayes Rule is presented in Table 2. The Kappa coefficient was K = 81.00% with variance 0.0016%. In 19 cases, the evaluation tool made
mistakes, and at least one classification was made incorrectly in every class. That performance is good and shows that an Evaluation Tool based on Classical Bayes Rule is a competitive approach to the solution of evaluation problems.

Table 2. Classification matrix for the Evaluation Tool based on Classical Bayes Rule.

Class of performance      Class of performance according to the
according to experts      Evaluation Tool based on Classical Bayes Rule
                          1      2      3
1                         46     4      0
2                         1      49     0
3                         6      8      36
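The Kappa coefficients reported for Tables 1 and 2 can be reproduced from the confusion matrices; a minimal sketch of Cohen's Kappa computation is given below.

```python
import numpy as np

def cohen_kappa(confusion):
    """Cohen's Kappa agreement coefficient from a confusion matrix."""
    C = np.asarray(confusion, dtype=np.float64)
    n = C.sum()
    po = np.trace(C) / n                                # observed agreement
    pe = np.sum(C.sum(axis=1) * C.sum(axis=0)) / n**2   # chance agreement
    return (po - pe) / (1.0 - pe)

print(cohen_kappa([[44, 6, 0], [0, 50, 0], [1, 3, 46]]))  # Table 1 -> 0.90
print(cohen_kappa([[46, 4, 0], [1, 49, 0], [6, 8, 36]]))  # Table 2 -> 0.81
```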
When the Evaluation Tool based on Classical Bayes Rule performed the classification, few mistakes were observed. However, it is possible to see from Tables 1 and 2 and from the Kappa coefficients that the performance of the classification based on the Classical Bayes Rule is lower than that of the one based on the Fuzzy Bayes Rule. In statistical terms, the difference in performance between those evaluation methods is significant. Regarding the computational performance of the Evaluation Tool, the one based on the Classical Bayes Rule was faster than the one based on the Fuzzy Bayes Rule. The average CPU time consumed for evaluation of training based on the Classical Bayes Rule was 0.0160 seconds on a Pentium IV PC compatible.

6. Conclusions and Future Works

In this paper we presented a new approach to on-line training evaluation in virtual reality simulators. This approach uses an Evaluation Tool based on the Fuzzy Bayes Rule and addresses the main requirements of evaluation procedures: low complexity and high accuracy. Systems based on this approach can be applied in virtual reality simulators for several areas and can be used to classify a trainee into classes of learning, giving him a status about his performance. A bone marrow harvest simulator based on virtual reality was used as the basis for the performance tests. The performance obtained by an evaluation tool based on the Fuzzy Bayes Rule was compared with an evaluation tool based on the Classical Bayes Rule. From the obtained data, it is possible to conclude that the evaluation tool based on the Fuzzy Bayes Rule presents significantly better results when compared with an evaluation tool based on the Classical Bayes Rule for the same case. However, the second one has better computational performance in terms of CPU time.
Acknowledgments This work is partially supported by the process 506480/2004-6 of the Brazilian National Council for Scientific and Technological Development and by the process 01-04-1054-000 of the Brazilian Research and Projects Financing. References 1. G. Burdea and P. Coiffet, VR Technology, 2nd ed., Wiley (2003). 2. R. Johnson and D. Wichern, Applied Multivariate Statistical Analysis. Prentice Hall, 4th ed., 1998. 3. L. Machado et al., Fuzzy Rule-Based Evaluation for a Haptic and Stereo Simulator for Bone Marrow Harvest for Transplant. 5th PUG Proc. (2000). 4. L. Machado et al. A VR Simulator for Bone Marrow Harvest for Pediatric Transplant. Studies in Health Tech. and Informatics, 81, 293-297 (2001). 5. L. Machado and R. Moraes, Online Training Evaluation in VR Simulators Using Evolving Fuzzy Neural Networks, 6th FUNS Proc, 314-317 (2004) 6. P. McBeth et al., Quantitative Methodology of Evaluating Surgeon Performance in Laparoscopic Surgery. Studies in Health Tech. and Informatics, 85, 280-286 (2002). 7. R. Moraes and L. Machado, Maximum Likelihood for On-line Evaluation of Training Based on VR. GCETE'2005 Proc, 299-302 (2005). 8. R. Moraes and L. Machado. Fuzzy GMM for On-line Training Evaluation in VR Simulators. Annals of FIP'2003, 2, 733-740 (2003). 9. R. Moraes and L. Machado, GMM and Relaxation Labeling for On-line Evaluation of Training in VR Simulators. GCETE'2005 Proc. (2003). 10. J. Rosen et al., Hidden Markov Models of Minimally Invasive Surgery. Studies in Health Tech. and Informatics, 70, 279-285 (2000). 11. J. Rosen et al., Objective Laparoscopic Skills Assessments of Surgical Residents Using HMM Based on Haptic Information and Tool/Tissue Interactions. Studies in Health Tech. and Informatics, 81,417-423 (2001). 12. T. Terano et al., Fuzzy systems theory and it's applications (1987). 13. B. Tso and P. Mather, Classif Methods For Remotely Sensed Data (2001). 14. L. Zadeh, Probability Measures of Fuzzy Events. Journal of Mathematical Analisys and Applications, 10, 421-427 (1968).
ASSESSMENT OF GYNECOLOGICAL PROCEDURES IN A SIMULATOR BASED ON VIRTUAL REALITY

LILIANE DOS SANTOS MACHADO
Department of Informatics, Federal University of Paraiba, Joao Pessoa, PB, Brazil

MILANE CAROLINE DE OLIVEIRA VALDEK
Center of Biological and Health Sciences, Federal University of Campina Grande, Campina Grande, PB, Brazil

RONEI MARCOS DE MORAES
Department of Statistics, Federal University of Paraiba, Joao Pessoa, PB, Brazil

Gynecological cancer is one of the most common causes of cancer-related deaths in women. The training of new professionals depends on the observation of cases in real patients. To improve this training, a simulator based on virtual reality was developed that presents pathologies that can cause gynecological cancer. This paper presents an evaluation tool developed for this simulator to assess the student's knowledge of the situation presented and to classify his training. This on-line evaluation tool uses two fuzzy rule-based expert systems to monitor the two stages of the virtual training.
1. Introduction
Virtual Reality (VR) applied to medicine is probably one of the most promising areas in VR development. Basically, the research is related to surgery planning, assistance (augmented reality) and training. Medical training systems include the simulation of procedures in realistic environments. These simulations can be used for the practice of procedures, the evaluation of students' abilities and professional certification processes [1]. The first VR-based simulators for medical training were developed in the 90's for the interactive visualization of procedures. Since then, systems have approached the simulation of subcutaneous tumor identification, prostate examination, ocular surgery, bone marrow harvest and laparoscopic training [2]. For procedures dependent on touch, systems for internal organ examination can be found for training prostate detection [1].
Recently, models for off-line or on-line evaluation of training performed in VR systems have been proposed [5,6,7]. During the simulation, those evaluation systems supervise the user's movements and other parameters associated with them. In this paper, we extend the concept presented by [1] to a computational assessment that automatically allows us to understand which information led the user to the diagnosis and also to evaluate his final decision.

2. Basic Concepts
2.1. Fuzzy Set Theory
In (classical) set theory, each subset A of a universe X can be expressed by means of a membership function μ_A: X → {0,1}, where, for a given a ∈ X, μ_A(a) = 1 and μ_A(a) = 0 respectively express the presence and absence of a in A. A fuzzy set or fuzzy subset is used to model an ill-known quantity. A fuzzy set A on X is characterized by its membership function μ_A: X → [0,1]. The intersection and union of two fuzzy sets are performed through the use of t-norm and t-conorm operators respectively, which are commutative, associative and monotonic mappings from [0,1]×[0,1] to [0,1]. Moreover, a t-norm T (resp. t-conorm ⊥) has 1 (resp. 0) as neutral element (e.g.: T = min, ⊥ = max).

2.2. Expert Systems
Expert systems use the knowledge of an expert in a given specific domain to answer non-trivial questions about that domain. For example, an expert system for medical diagnosis uses knowledge about the characteristics of the symptoms present in a patient to recognize a disease. This knowledge also includes the "how to do" methods used by the human expert. Usually, the knowledge in an expert system is represented by rules of the form:

IF <condition> THEN <conclusion>

A simple rule for medical diagnosis could then be:

IF Temperature > 39 °C THEN the patient has fever

Most rule-based expert systems allow for the use of connectives AND or OR in the premise of a rule, and of the connective AND in the conclusion. In several cases, we do not have precise information to express the knowledge about conditions or conclusions in the rules. In those cases, it can be interesting to use a fuzzy rule-based expert system. An example of a simple fuzzy rule could then be:

IF Temperature is High THEN the patient has Fever
801 where "High" and "Fever" can be characterized by fuzzy sets. It is important to note that a crisp set can be interpreted as a fuzzy set with a particular membership function. The connectives AND and OR are implemented by a tnorm and a t-conorm, respectively. The implication operator THEN is implemented by t-norm min. This particular configuration of operator characterizes a fuzzy inference engine called Mamdani type. 2.3. The Gynecological Exam The gynecological exam is one of the most important exams to female health and allows to detect pathologies that can evolve to cervix cancer. The gynecological examination is performed in two steps. In the first stage is used an instrument named speculum to allow the visualization of the vagina walls and of the cervix to check color and surface of these structures. In the second stage there isn't visual information. The doctor uses a lubricated glove to touch the vagina walls and identify any wounds or lumps. When touch the cervix, the doctor will feel its elasticity and will check to any irregularity. In general, this kind of exam presents some difficulties. One example can be the patient's discomfort when this exam is performed and a medicine student is present. This occurs because the only way of training is by the observation and experimentation. Other difficulty is related to the students' absence of opportunity to enter in contact with all possible cases, what result in an incomplete training. 3. Simulator for Gynecological Exam Procedure The Simulator for Gynecological Exam (SITEG) allows the training a gynecological exam and simulates different phases of pathologies. At this time, the SITEG can simulate normal, HPV or Herpes and inflamed cases presented at random to the user. The two stages of a real exam were divided to compose a visual and a touch exam. In the visual exam, the user must observe the vagina walls and cervix and notice their coloring. After the visual exam, the user will have only the external view of the vagina and must perform a touch examination perceive the texture and detect if there is wounds or lumps. The haptic device is also used in the first stage to simulate a small lantern that can be positioned by the user to provide a better internal visualization of the vagina (Figure 1). Based on his experience, a doctor described the haptic properties of the vagina walls and cervix in normal, Herpes or HPV and inflamed cases (Table 1). For each pathology, he used a calibration system to touch spheres visually identical and points out the one that best described the real case. The chosen property was successively refined until the doctor could identify the one that best
described the pathology. The same happened to describe the vagina walls and cervix color of each pathology. All the properties were exported to the SITEG.
Figure 1. (left) Visual exam of vagina walls and cervix in the SITEG; the light gray object is the representation of the speculum. (right) External view of the unidigital exam in the SITEG.

Table 1. Visual and haptic properties description.

Property          | Normal                            | Herpes / HPV     | Inflamed
Color             | rosy                              | white with warts | red
Texture           | similar to buccal mucous membrane | irregular        | similar to buccal mucous membrane
Viscosity         | smooth                            | with bubbles     | smooth
Cervix elasticity | similar to an orthopedic rubber   | very soft        | hard/tense
4. Assessment of Gynecological Procedures in the SITEG
An evaluation system must supervise the user's movements and other parameters associated with them. However, the case of exam simulators is not similar to those presented in the literature for surgical procedures. It requires an approach based on the knowledge the user employed to build his diagnosis and on whether this diagnosis is right or wrong. [1] partially used this concept in a prostate exam simulator, and we extend it to a computational evaluation which makes it possible to know the relevant information that contributed to the final diagnosis and to assess that diagnosis. The evaluation system must therefore capture relevant information about the exam during its several stages and interact with the user to learn the reasons for his final diagnosis. That diagnosis must be evaluated according to the case presented by the simulation system, whose pathologies are randomly presented to the user. The assessment in the simulator for the gynecological exam follows the real format of the exam that the physician will execute in his professional life. A well-trained professional must recognize normal and abnormal cases. The evaluation system for SITEG is based on two fuzzy rule-based expert systems, each one built according to the knowledge obtained from an expert, as
presented in Figure 2. The first corresponds to the visual stage of the exam and the second corresponds to the touch stage. In the visual stage, the user must identify the cervix coloring according to a diagnosis of normality, Herpes/HPV or inflamed. Each coloring has its own fuzzy membership function: normal (rosy), Herpes or HPV (white) and inflamed (red).

Figure 2. Diagram of the Evaluation System: in each of the two stages (visual and tactile) the VR simulator produces observations and a diagnosis, which feed fuzzy rule-based expert systems 1 and 2; the intermediate results are stored in a database and combined into the final evaluation returned to the user.
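To make the Mamdani-type evaluation of Section 2.2 concrete for this setting, the sketch below shows how coloring membership functions and a single fuzzy rule could be evaluated in Python. It is only a hedged illustration: the membership function shapes, the normalized "redness" scale, the parameter values and the function names are our own assumptions, not the SITEG implementation.

# Hedged sketch (not the SITEG code): triangular membership functions for the
# cervix coloring and a Mamdani-style rule using min for AND/THEN and max for
# OR, as described in Section 2.2. All numbers are illustrative assumptions.

def tri(x, a, b, c):
    # Triangular membership function with support (a, c) and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical normalized "redness" of the observed coloring in [0, 1].
redness = 0.8
mu_rosy = tri(redness, 0.0, 0.25, 0.55)    # normal
mu_white = tri(redness, -0.3, 0.0, 0.30)   # Herpes/HPV (white with warts)
mu_red = tri(redness, 0.45, 1.0, 1.30)     # inflamed

# Hypothetical membership of "texture is irregular" from the touch stage.
mu_irregular = 0.4

# Rule: IF coloring is red AND texture is irregular THEN the case is inflamed.
inflamed_strength = min(mu_red, mu_irregular)   # AND and THEN: t-norm min
# Several rules concluding "inflamed" would be combined with the t-conorm max.
print(mu_rosy, mu_white, mu_red, inflamed_strength)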
At the end of the first stage the user must provide to the system his opinion about the coloring and his diagnosis. The evaluation system compares it to the information related to the case presented to the user and stores it in a database that will be used by the second fuzzy rule-based expert system in the second stage. In the second stage, the user performs the touch exam of the vagina walls and cervix and again emits an opinion according to the possible diagnoses: normal, Herpes/HPV or inflamed. This information will be stored and used by the second fuzzy rule-based expert system. At the end of both stages, the evaluation system will request from the user a final diagnosis that should be one among: D = {normal, inflamed, herpes or HPV, doubt}. In this case, the "doubt" option is pertinent because it is the combination of information gathered during a real exam that allows the doctor to define a final diagnosis. Contradictions can occur between what the doctor concludes about the patient's condition in the first and in the second stage of the exam. Internally, the evaluation system executes in real time the rules of each stage according to the case presented to the user. The evaluation system is also capable of deciding whether the training was successful by combining the two stages and the final diagnosis with the case presented by the simulator. This way, the evaluation system can classify the user into classes of performance. In this work, we use five classes of performance, according to: a) user is qualified to execute a real
procedure; b) user is almost qualified to execute a real procedure, performance is good, but it can be better; c) user needs training to be qualified, performance is regular; d) user needs more training to be qualified, performance is bad; e) user is a beginner, performance is very poor. It is important to mention that in cases of a wrong diagnosis, the evaluation system is able to detect where the user made a mistake by analyzing the degrees of pertinence of his information. All this information, including the user's class of performance, is provided to the user in the evaluation report (Figure 2). Due to all those features, this assessment methodology can be used in the continuous evaluation of training.

5. Conclusions and Future Works
In this paper we presented a new approach to the assessment of a gynecological exam procedure performed in a simulator based on virtual reality. Because that exam must be performed in two steps, this approach uses two fuzzy rule-based expert systems to assess each step in real time. It is also capable of assessing the success of the training. If the user's diagnosis is wrong, the system is able to identify the user's failures and alert him about them. As future work, we intend to add other pathologies to the simulator. We also intend to make a statistical comparison of performance between groups of users that use the simulator and groups that do not.

Acknowledgments
This work is partially supported by the process 506480/2004-6 of the Brazilian National Council for Scientific and Technological Development and by the process 01-04-1054-000 of the Brazilian Research and Projects Financing.

References
1. G. Burdea et al., IEEE Trans. on Biomedical Eng. 46(10), 1253 (1999).
2. L. Machado, PhD Thesis, USP (2003).
3. A. Crossan et al., Proc. Eurohaptics 2001, 17 (2001).
4. S. Baillie et al., Studies in Health Tech. and Informatics, 111, 33 (2005).
5. J. Rosen et al., Studies in Health Tech. and Informatics, 81, 417 (2001).
6. R. Moraes and L. Machado, LNCS, 3773, 778 (2005).
7. L. Machado and R. Moraes, Proc. FUNS, 314 (2004).
SCREAMING RACERS: COMPETITIVE AUTONOMOUS DRIVERS FOR RACING GAMES*

FRANCISCO GALLEGO, FARAON LLORENS, ROSANA SATORRE
Departamento de Ciencia de la Computación e Inteligencia Artificial, University of Alicante, Apdo. Correos 99, 03080, Alicante, Spain
Online videogames are not just entertainment products but can also be used as environments for Multi-agent Systems (MAS) where agents can reach multiple sources of knowledge from which to learn. In particular, videogames are especially well suited for learning Human-Level Artificial Intelligence, because they strengthen human-agent interaction. In this paper we present Screaming Racers, a simple car-racing online videogame designed to experiment with MAS which learn to drive racing cars along competition tracks. We also present our car-driving learning results with several Neuroevolution techniques, including Neuroevolution of Augmenting Topologies (NEAT).
1. Introduction
Videogames have evolved exponentially in the last twenty-five years. They have changed from simple bricks made of single-colour flashing pixels into extremely detailed 3D models with thousands of polygons, and from simple left-to-right computer-controlled characters into complicated BOTs¹ with adaptive personalities, capable of making their own decisions and even able to devise strategies. Present computer games bring complete virtual worlds to life with complex physical and social rules. This creates a fantastic framework within which to conduct research in various fields such as Multi-agent Systems (MAS), goal-directed behaviour, knowledge representation and reusability, machine learning, temporal reasoning, and many others [1]. Moreover, due to the fact that in videogames it is not necessary to develop complex and computationally expensive mechanisms to acquire, filter and understand the knowledge provided by the sensors, we can focus our work on building Human-Level Artificial Intelligence [2]. Therefore, online videogames are unexplored mines of knowledge to dig into and explore with our intelligent agents.

* This work is supported by the Spanish Generalitat Valenciana, project number GV05/165.
¹ The term BOT comes from robot, and refers to a piece of software which acts as a virtual robot.

In this paper we present some ideas
on how to make the most of an online videogame to face the problem of driving a race car, considered from the point of view of Human-Level Artificial Intelligence.

2. Screaming Racers
Screaming Racers (SR) is a car-racing simulation online videogame where cars are controlled by intelligent agents, which try to learn and improve their skills as drivers. The aim of the game is to create the best artificial group of drivers, which will become our personally managed motor-racing team. In order to accomplish this task, we have to train our agents using the tools provided by the game. Once we have created and trained our personal motor-racing team, we can test its effectiveness and efficiency by participating in tournaments against other teams. In order to train our agents, SR offers a set of possibilities we can explore. It lets us create our own custom-defined set of tracks, car types and complex training plans. Its AI system is designed to let us include AI algorithms as plug-ins. Finally, it also lets us select a different AI algorithm for every agent being trained, and tweak the parameters for performance improvement. By playing around with all of these possibilities we can construct specially-designed training plans with the intention of maximizing the learning rate of our driving agents. Once our agents are trained, we can save their brains to files, which lets us continue training them later, or they can be grouped together to create a team and so challenge other teams of agents or human players.

2.1. Internal design
SR has been designed as a multi-agent online videogame where players use a client application to connect to a server which runs the simulations of the races. This multi-agent design of the agent population provides several advantages:
• Agents can migrate from client to server and vice versa whenever needed, thus minimizing the problems related to communication latency.
• Agents can sense the environment using a set of defined sensors, which decouples them from the internal representation of the environment.
• Agents can communicate with each other, letting teams share knowledge in order to benefit from a collaborative approach.
In our client/server architecture (Figure 1), the main components of SR are inside the Server Kernel. These components are responsible for creating and maintaining the virtual environment and the multi-agent system, as well as communicating with clients. The description of the role of the most important
components is as follows:
Figure 1. Architecture of Screaming Racers
Physics engine: applies the laws of physics to the virtual environment. It allows the virtual physics laws to be tuned for convenience.
AI System: manages all the AI algorithms and agents of the virtual environment. It lets agents use different AI algorithms (Figure 2).
Simulator Agent: this agent is responsible for creating and maintaining the virtual environment needed for every race or training session. It manages the center of simulation where the action occurs, so it has to deal with the track, cars, time control, race statistics, environmental objects, etc.
Driver Agent: each instance of this class of agents is the implementation of the brain of one of the drivers participating in the simulation. These agents receive the information about the environment from their sensors (via the Simulator Agent) and have to respond with the desired actuator changes, i.e. accelerating by "x" m/s².

Figure 2. Class diagram of the AI System: an AI System that can select among the registered algorithms and manage the agent population, and a common algorithm interface with operations to create agents, take decisions for an agent, process the population and draw an agent, implemented by the different plugged-in algorithms.
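To make the plug-in design of the AI System more concrete, the following sketch outlines one possible shape of the algorithm interface suggested by Figure 2. The method names follow the operations shown in the class diagram and should be read as assumptions about the design, not as the actual Screaming Racers API; Python is used only for illustration.

# Hedged sketch of a pluggable AI-algorithm interface for the AI System.
# Method names are assumptions based on the class diagram in Figure 2.
from abc import ABC, abstractmethod

class AIAlgorithm(ABC):
    """Base class that every pluggable learning algorithm would implement."""

    @abstractmethod
    def create_agent(self):
        """Build and return a new driver agent (its 'brain')."""

    @abstractmethod
    def take_decision(self, agent, sensors):
        """Map the agent's sensor readings to actuator values."""

    @abstractmethod
    def process(self, population, fitness_scores):
        """Update the whole population after an evaluation episode."""

    def draw(self, agent):
        """Optionally visualize the agent (no-op by default)."""

class AISystem:
    """Keeps the registered algorithms and lets a trainer pick one per agent."""

    def __init__(self):
        self._algorithms = {}

    def register(self, name, algorithm):
        self._algorithms[name] = algorithm

    def select_algorithm(self, name):
        return self._algorithms[name]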
2.2. The environment of Screaming Racers
Here we describe how the environment in which our agents have to test their abilities has been designed. The environment of an agent i participating in SR is defined by the tuple (L, C, c_i): a totally ordered set L of vectors describing track sectors (1), a set C of cars taking part in the race, and the car c_i driven by agent i. Each car c_j has a set of parameters describing its state (2) (location, velocity and angle).

L = {l_j : |L| = n and i < j implies that sector l_i precedes sector l_j},  i, j, n ∈ N    (1)
c_j = (location_j, velocity_j, angle_j)    (2)

Each agent i obtains a value s_ij ∈ [0,1] from every sensor j, representing the distance to an obstacle (the sensor model is depicted in Figure 3). For instance, a value of s_12 = 1 means that sensor number 2 of agent 1 is detecting no obstacles (black lines), whereas a value of s_12 = 0.1 means that the same sensor is detecting an obstacle very close (white lines). Therefore, supposing we have p sensors, we have a vector s_i^t (3) for our agent i at time t, which is the input to its artificial brain.

Figure 3. Sensor model selected for our experiments

Once the agent has processed the information of its environment, it produces a vector o_i^{t+1} (4) with the values which represent the decisions of the agent for the next time step. In this case, this vector is composed of 2 real values, a_i^{t+1} and α_i^{t+1}, which represent the pressure the agent puts on the accelerator and the degree of turn applied to the steering wheel.

s_i^t = (s_ij^t),  i = 0..m, j = 0..p,  m, p ∈ N, t ∈ R    (3)
o_i^{t+1} = (a_i^{t+1}, α_i^{t+1}),  i = 0..m,  a_i^{t+1}, α_i^{t+1} ∈ [0,1],  m ∈ N, t ∈ R    (4)
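Equations (3) and (4) suggest a very small interface for a driver agent's brain: p sensor readings in [0, 1] go in, and an accelerator pressure and a steering value in [0, 1] come out. The sketch below is a hedged illustration of that mapping; the single-layer network used here is an arbitrary stand-in and not one of the Neuroevolution algorithms evaluated in Section 3.

# Hedged sketch: a driver 'brain' mapping p sensor readings in [0, 1] (Eq. 3)
# to an accelerator pressure and a steering value in [0, 1] (Eq. 4).
# The single-layer network below is an illustrative stand-in only.
import math
import random

class DriverBrain:
    def __init__(self, p, seed=0):
        rnd = random.Random(seed)
        # One weight vector per output (accelerator, steering), plus a bias.
        self.weights = [[rnd.uniform(-1.0, 1.0) for _ in range(p + 1)]
                        for _ in range(2)]

    def act(self, sensors):
        outputs = []
        for w in self.weights:
            z = w[-1] + sum(wi * si for wi, si in zip(w, sensors))
            outputs.append(1.0 / (1.0 + math.exp(-z)))  # squash into [0, 1]
        accelerator, steering = outputs
        return accelerator, steering

brain = DriverBrain(p=6)
print(brain.act([1.0, 0.9, 0.1, 0.8, 1.0, 0.7]))  # sensor 3 reads an obstacle close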
2.3. Feedback for agents
Taking into account the decisions taken by an agent i at a given time step t_k, this agent will be given a fitness score r_{t_k} ∈ R which will reward or punish its driving effectiveness, as a feedback result of its decisions. This score is calculated as r_{t_k} = Σ_l b_l, defining each b_l as follows:
• Reward for driving faster: b_0 = k_0 |v_i|
• Reward for driving aligned with the track: b_1 = k_1 · (alignment of the car with the track axis)
• Reward for arriving at the end of the current track sector: b_2 = k_2 if agent i reaches the end of the current track sector, and 0 in the other case    (5)
• Punishment for over-steering: b_3 = -k_3 |α_i^{t_k}|
• Punishment for colliding with an obstacle: b_4 = -k_4 · e(c_i), where e(c_i) = 1 if c_i collides and 0 in the other case    (6)
The k_l coefficients which appear in these definitions are those which can be tweaked by the player of SR in order to govern the agent's learning aims. Therefore, when the training session is over (t_end), the final score of each agent is calculated as the integral of r_t over the session; this integral is calculated assuming linear growth between every two consecutive t_k.

3. Experimentation
In order to experiment with SR and create agents capable of learning to drive, we have developed and tested several Neuroevolution algorithms [3, 4, 5, 6, 7]. We chose Neuroevolution as a starting point for two main reasons:
• Neuroevolution is a generic learning algorithm. It starts with no knowledge of the problem to solve and learns in a non-supervised fashion. It is not necessary to change even a single line of code to use Neuroevolution to learn in different environments.
• A relevant breakthrough has been made in this field recently with the creation of the Neuroevolution of Augmenting Topologies (NEAT) algorithm [3].
To develop our experiments, we have selected a set of different tracks and cars and carried out the experiments using a set of 220 different training plans, with varying parameters for every Neuroevolution algorithm. The overall statistic, the fitness points obtained by the agents, measures the quality of their driving. To put this in context, we have obtained fitness points for various human drivers: an average human driver obtains a fitness of around 1750 points, while an expert human driver achieves up to 2317. In our experiments with different learning algorithms, GENITOR and BME obtained a fitness mean of approximately 400 points, Schiffman Encoding got 650, NEAT without speciation arrived at 1450 and full NEAT reached 1800 points. Results obtained from the learning experience of the agents are depicted in Figure 4, which shows their driving skills. As the car moves forward it leaves a
tail which lets us infer the path taken and the approximate speed.

4. Conclusions and further work
We have discussed interesting characteristics which online games offer to AI research. We think that the complex environments which online games create nowadays, and the ones they will create in the future, are suitable both for experimenting with AI algorithms in order to improve them and for AI research in general. We have also described our recently developed game prototype, Screaming Racers, which has allowed us to generically train agents using different Neuroevolution approaches and to compare their results. The final results of the experiments show how NEAT can be useful in bringing driving agents to life. The best individuals obtained are even able to outperform human players.
Figure 4. 100-generations-trained agents (arrows added for clarifying the path followed by cars)
In future research, we will add more complex physics, more complex environments and more realistic cars. We also plan to create virtual cars which emulate radio control racing cars in order to also use SR as a research environment where we can conduct racing experiments with emulated cars, and finally try to put our trained agents in the real world.

References
1. van Lent, M.; Laird, J. E.; Buckman, J.; Hartford, J.; Houchard, S.; Steinkraus, K.; and Tedrake, R. (1999): Intelligent Agents in Computer Games. Proceedings of the National Conference on Artificial Intelligence, Orlando, FL, pp. 929-930.
2. Laird, John E.; van Lent, Michael (2000): Interactive Computer Games: Human-level AI's Killer Application. National Conference on Artificial Intelligence (AAAI), Austin, Texas.
3. Stanley, K. O.; Miikkulainen, R. (2002): Evolving neural networks through augmenting topologies. Evolutionary Computation 10, pages 99-127.
4. Whitley, D. (1989): The Genitor Algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. Proceedings of the Third International Conference on Genetic Algorithms and their Applications. Massachusetts Institute of Technology, 116-121.
5. G. F. Miller, P. M. Todd, and S. U. Hedge (1989): Designing Neural Networks Using Genetic Algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 379-384. Morgan Kaufmann.
6. Moriarty, D. E. (1997): Symbiotic Evolution of Neural Networks in Sequential Decision Tasks. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin. Technical Report UT-AI97-257.
7. Moriarty, D. E. and Miikkulainen, R. (1996): Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22:11-32.
URBAN SIGNAL CONTROL USING INTELLIGENT AGENTS

MOHAMMAD AMIN ALIPOUR
Department of Computer Engineering, University of Kashan, Isfahan 87317-51167, Iran

SAIEED JALILI
Department of Computer Engineering, Tarbiat Modarres University, Tehran, Iran
Urban traffic jams are a daily problem in large cities, and urban traffic control systems aim to solve it. The major difficulty in urban traffic control is the great number of variables needed to represent the traffic state, such as flow, speed and density. Another problematic characteristic of urban traffic is the lack of precise relations between these variables. These problems have led researchers to develop several traffic models to describe traffic behavior and to build controls on top of them. In this paper, we present a new model of traffic flow and introduce a control for this model that helps agents to control traffic autonomously.
1. Introduction
Urban road networks serve a significant part of traffic demand. For example, in Germany 30% of some 600 billion car kilometers per annum are traveled within metropolitan road networks. Because of the high demand, many urban road facilities are frequently oversaturated and congested. Through congestion the capacity of the road infrastructure is in fact reduced, and particularly during rush hours, when the maximum capacity is most urgently needed, the performance deteriorates considerably [1]. Three approaches may be considered to solve this problem: (1) reduction of the urban population by making citizens migrate to other, less populated areas, which eventually decreases the demand for urban traffic, (2) development of infrastructures and (3) better use of the existing infrastructures. The first approach merely delegates the problem to other areas, so we discard it. The second proposal is hard to apply in some high-density areas like downtowns. The most favorable option is the third approach. One prescription for the third approach is better control of urban traffic flows in order to achieve an efficient use of urban transportation. Traffic control management is generally subdivided into two different classes [2]: (1) direct control measures using traffic lights and variable message signs and (2) indirect control measures like recommendations for the drivers by
means of VDS (variable direction signs and text panels), warning messages (via broadcast, RDS/TMC or handy-based services), pre-trip information (e.g. via Internet) and individual driver information systems. In the subsequent sections we advocate a new traffic light control method for an urban network; it is described in Section 2, and in subsections 2.1 and 2.2 the underlying formulations and the control algorithm are presented. The proposed method was simulated and the results are presented in Section 3.

2. Traffic Control via Resource Scarcity Measurement
In fact, traffic control is the problem of fairly allocating scarce resources (i.e. streets) among users (i.e. drivers), which is what economics tries to do. But what is the meaning of "fair" in an economy? In the economic view, every good in an economy has a real price (showing the scarcity of the good) and each customer of the good has a valuation for it (the essentiality of its use). The good must be allocated to the user with the highest valuation. In economics, and more specifically in market science, several types of mechanisms (e.g. several auction types) and theoretical foundations have been provided for this allocation. Many other allocation problems in engineering environments have used the idea of building a market in order to achieve optimal resource allocation [3,4]. But can urban traffic make use of market initiatives?

2.1. Resource Scarcity Measure
All efforts in economics revolve around two questions: 'how scarce is a resource?' and 'where is this scarcity going to go?', i.e. the trend of its scarcity. Although no market mechanisms have been proposed for urban traffic, if a comparable criterion for resource scarcity can be presented, control can operate based on it. Let us call this criterion 'price'. The price of a resource should express the state of scarcity of the resource. In urban traffic, two parameters show the dynamism of the scarcity of a resource, i.e., a street. These parameters are the 'load of street_j (ls)' and the 'normalized mean speed of street_j (nss)', in which
ls_j = (current number of cars inside street_j) / (capacity of street_j)
nss_j = (average velocity of all cars in street_j) / (desired maximum velocity in street_j)

We developed function (1) for the price of street_j:

price_j = 1000 · log((ls_j + ε) / (nss_j + ε))    (1)

in which 0 < ε ≪ 1.
Eq. (1) forces the price to rise sharply as we approach high ls and low nss values. This gives a clear indication of the "dangerous zone" to avoid, while at the same time encouraging a relatively high utilization. Fig. 1 helps to inspect the correctness of Eq. (1). All three cases in Fig. 1 have equivalent loads but differ in location. In Fig. 1a the bulk of cars is at the beginning of the street. Obviously, this bulk is a bottleneck for cars arriving at the street, and an increase of velocity will remove the bottleneck. In this case the price given by Eq. (1) will decrease if the velocity of the bulk is increasing; otherwise, it will remain unchanged. In Fig. 1b the bulk of cars is neither a bottleneck for incoming cars nor prone to delay at the next intersection. If the bulk velocity is decreasing the price will increase, and if it is increasing the price will decrease. In Fig. 1c the bulk of cars is at the end of the street and will wait until permission to use the intersection is granted. If the bulk is stopped, the price will increase sharply. The proposed Eq. (1) thus seems to be a good criterion for the scarcity of the resource and for its trend.
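A direct implementation of the price function of Eq. (1) is straightforward; the sketch below is only illustrative, since the value of ε and the base of the logarithm are not specified in the text and are therefore assumptions here.

# Hedged sketch of the street price of Eq. (1); epsilon and the logarithm base
# are assumptions (the text only requires 0 < epsilon << 1).
import math

EPSILON = 1e-3

def street_price(num_cars, capacity, mean_speed, max_speed):
    ls = num_cars / capacity        # load of the street
    nss = mean_speed / max_speed    # normalized mean speed of the street
    return 1000.0 * math.log10((ls + EPSILON) / (nss + EPSILON))

# A nearly full, nearly stopped street is far "scarcer" than a fluid one.
print(street_price(45, 50, 1.0, 14.0))   # congested: high price
print(street_price(10, 50, 12.0, 14.0))  # fluid: low (negative) price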
2.2. Greedy Control with Scarcity Measure (GCSM)
Having the price function, the question is how to use this scarcity measure in order to control the traffic. In urban traffic, the control is applied at the intersection level. Traffic control decisions make changes in the state of the environment (the traffic flows connected to the intersection). The changes can be divided into:
> Bettering changes
> Worsening changes
The state change of the environment is defined in Eq. (2) and the changes are categorized by relation (3):

StateChange = Min{|Δprice_open streets|} - Max{ Max{|Δprice_open streets|}, Max{|Δprice_downstream streets|} }    (2)

if StateChange < 0  →  Bettering change
if StateChange > 0  →  Worsening change    (3)
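The sketch below illustrates how an intersection agent could apply the bettering/worsening classification in a greedy way. The aggregation mirrors the reconstruction of Eq. (2) above and should be read as an approximation; the green-time bounds follow the restrictions described later in this subsection, and all names and numeric values are assumptions rather than the authors' implementation.

# Hedged sketch of a greedy GCSM-style decision at an intersection.
# The aggregation mirrors the reconstruction of Eq. (2); names, thresholds
# and green-time bounds are illustrative assumptions.

MIN_GREEN, MAX_GREEN = 10, 60   # seconds

def state_change(delta_prices_open, delta_prices_downstream):
    # Negative result -> bettering change; positive -> worsening change (Eq. 3).
    best_relief = min(abs(d) for d in delta_prices_open)
    worst_rise = max(max(abs(d) for d in delta_prices_open),
                     max(abs(d) for d in delta_prices_downstream))
    return best_relief - worst_rise

def extend_green(current_green, delta_open, delta_downstream):
    # Greedy rule: extend the current green phase only if doing so is not a
    # worsening change, while respecting the [min, max] green-time bounds.
    if current_green >= MAX_GREEN:
        return False
    if current_green < MIN_GREEN:
        return True
    return state_change(delta_open, delta_downstream) < 0

print(extend_green(25, [-120.0, -40.0], [15.0, 30.0]))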
Fig. 1. Resource scarcity behavior: the bulk of cars at the beginning (a), in the middle (b) and at the end (c) of the street; each panel marks the head and tail of the street and indicates whether the bulk bothers the next incoming cars and whether it is prone to delay at the end of the street.
The intersection control should also obey the environment control restrictions, which are as follows:
1- The street lights change to green in a pre-defined order.
2- The green light length of a street must be in the interval [min green time, max green time].
The GCSM control algorithm prohibits the control agents from making worsening changes. Here a question may be raised: "How will the changes be measured when the price is at its maximum value (i.e. ls = 1 and nss = 0)?" In this case, whenever a street reaches the maximum price, its last biggest change a few minutes earlier will be considered as a worsening change.

3. Results
GCSM has been compared with pre-timed intersection control (PIC), in which the phases always have the same green time length. The controls have been applied in an urban traffic micro-simulator based on [5]. Fig. 2a and Fig. 3a show the simulated urban network topologies and, respectively, the diagrams in Fig. 2b and Fig. 3b show the average delay per car for both control strategies, GCSM and PIC. In
PIC two timings have been considered, t = 50 sec. and t = 120 sec. Inspecting the results shows that GCSM is superior to PIC with both timings.
Fig. 2. (a) Evaluated topology; (b) evaluation results: average delay per car (sec) for PIC (t=50), PIC (t=120) and GCSM.

Fig. 3. (a) Evaluated topology; (b) evaluation results: average delay per car for PIC (t=50), PIC (t=120) and GCSM.
4. Conclusion
In this paper, a control algorithm for urban traffic was presented. Our GCSM algorithm is similar to control methods based on feedback; the feedback in the context of GCSM is a quantized value of resource scarcity. GCSM knows that its actions make changes in its environment, hence it classifies changes into bettering changes and worsening changes. It assumes that prohibiting worsening changes is itself a bettering change, and it always tries to make bettering changes. The algorithm used in GCSM is therefore somewhat greedy. Like other greedy algorithms, it is time efficient, but it may be unable to reach the optimal control if it gets trapped in local price fluctuations. Another feature of GCSM is the reduction of the decision-making parameters to one parameter (the price). In computer networks a variation of this method has been used [6], but it has not been formally proved in which environments control can be achieved by quantizing resource scarcity.
References
1. B. Friedrich, Adaptive Signal Control - An Overview. Proc. of the 9th Meeting of the Euro Working Group Transportation, Italy (2002).
2. H. Kirschfink, M. Boero, J. Hernandez, Intelligent Traffic Management Models. 7th World Congress on Intelligent Transport Systems, Turin (2000).
3. D. Helbing, Modeling supply networks and business cycles as unstable transport phenomena. New Journal of Physics (2003).
4. J. Feigenbaum, C. H. Papadimitriou, S. Shenker, Sharing the Cost of Multicast Transmissions. Journal of Computer and System Sciences (2000).
5. P. Hidas, Modeling lane changing and merging in microscopic traffic simulation. Transportation Research Part C (2002).
6. L. Yamamoto and G. Leduc, Resource Trading Agents for Adaptive Active Network Applications. Network and Information Systems Journal (2000).
CONSIDERATIONS ON UNCERTAIN SPATIO-TEMPORAL REASONING IN SMART HOME SYSTEMS
JUN LIU, JUAN C. A U G U S T O AND HUI WANG School of Computing and Mathematics University of Ulster at Jordanstown Newtownabbey BT37 OQB, Co. Antrim, UK E-mail: [j.liu, jc.augusto, h.wang]@ulster.ac.uk
Smart Homes are the subject of intensive research exploring the different ways in which they can be used to provide home-based preventive and assistive technology to patients and other vulnerable sectors of the population, such as the elderly and frail. Current studies of Smart Homes focus on the technological side (e.g., sensors and networks), but little effort has been devoted to what we consider a key aspect of this kind of system, namely its capability to intelligently monitor situations of interest and to advise or act in the best interest of the home occupants. This paper investigates the importance of spatio-temporal reasoning and uncertainty reasoning in the design of Smart Homes. Accordingly, a framework is outlined in which a methodology referred to as Rule-base Inference Methodology using the Evidential Reasoning (RIMER) is applied in conjunction with a Smart Home framework that considers the spatio-temporal aspects of monitoring human activities.
1. Introduction
Smart Homes aim to help people at risk in their living place by preventing hazards and by assisting them as much as possible when they need services from the health system [1]. Although technology has made significant advances in developing sensors and networks that allow the monitoring of different environments, progress on how to take full advantage of these technologies has been slow. Spatio-temporal reasoning and uncertainty handling underlie the working of a desirable Smart Home system, and their confluence gives rise to the concept of uncertain spatio-temporal reasoning. This paper shows how this confluence can improve the ways in which Smart Homes can be applied to improve health care. Then a framework is provided showing how a methodology referred to as Rule-base Inference Methodology using the Evidential Reasoning (RIMER) can be combined with an active database framework [2] considering the spatio-temporal aspects of monitoring human activities. Although the scenarios considered in this article are mainly related to increasing the independent living of the elderly, the concepts and methodologies developed here can be applied in many ways to other aspects of health-related issues. Due to space limitations, the framework is only outlined in general terms; details can be found in another paper [3].

2. An Investigation into the Rule-Based Design of Smart Home Systems
2.1. Smart Homes
A Smart Home [1] can be briefly described as a house that is supplemented with technology in order to increase the range of services provided to its inhabitants by reacting in an intelligent way. The technology can be diverse but will generally have two main components: a set of sensors and a networking layer linking those sensors with some computing facilities. Typical sensors are carbon monoxide (or heat) sensors, motion sensors used for burglary alarms and sensors detecting if a window or a door has been opened. All these sensors send out signals and some of them can also receive signals so that, for example, the cooker can be turned off automatically. An obvious way to turn off a device would be with a timer, but this is usually a very rigid mechanism. A more useful and flexible use of the device demands the intelligent analysis of several factors in order to decide whether turning off the cooker is meaningful in a given context.
2.2. ECA rules and uncertainty
Dynamic systems like Smart Homes can be modelled by considering the occurrence of meaningful events and the contexts in which those events occur. Then, based on the detection of situations of interest defined by events occurring in particular contexts, decisions can be taken. Active databases can be used to store information gathered from a Smart Home. A characteristic feature of Active Databases is their use of so-called Event-Condition-Action (ECA) rules as a way to react to the incoming information. ECA rules have a syntax of the following format:

ON <Event> IF <Condition> DO <Action>
This means that whenever an occurrence of the event described in the ON clause is detected, if the condition described in the IF clause (usually imposing constraints on different aspects of the events described in the ON clause) is true, the action described in the DO clause is obeyed by the system. When the ON clause is satisfied the rule is said to be 'triggered', and if, additionally, the IF clause is satisfied then the rule is 'fired'. Here we focus on their use with respect to Smart Homes: specifically, the monitoring of activities carried out by patients, the diagnosis of situations of interest from a health and safety perspective, and the recommendation of actions to follow on behalf of the caring environment for the patients living in those homes. For example, we can monitor the events that are triggered inside a house with the aim of enhancing safety by reacting as quickly as possible to hazards. Below we provide an informal description of an ECA rule as a first approximation of the idea. This rule is triggered when the smoke alarm in the kitchen is activated, and under the condition that the person is known to be at home (this conclusion is the consequence of another rule or set of rules which can have as a resulting action 'set status variable patient_at_home=true'), the action recommended is to initiate a fire hazard procedure that explicitly takes into account the assumption that there are people at risk and will, for example, involve contacting a nearby hospital to request an ambulance.

ON kitchen smoke alarm triggered FOLLOWEDBY
   ((does not go out in a 'short' time) OR
    (does not call security to state it is not dangerous))
IF patient is known to be at home
DO call fire brigade AND initiate procedure to rescue patient

Event descriptions can be provided in very different shapes, depending on the type of events which are being detected. Actions are very much dependent on the application, and proposals for their languages are far less prescriptive. Several languages have been proposed for ECA rule definitions and ours is along the lines of that proposed by Augusto et al. [2]. In our framework we are passing from an ECA-like database system enriched with temporal reasoning capabilities to an 'IF-THEN'-like knowledge-based system (KBS). The diagnosis mechanisms achieved with the ECA-based approach can be connected with reasoning systems under uncertainty, which bring different advantages to the diagnostic system. Events and conditions of the ECA rules are subsumed in the 'If' part of
rules in the KBS and actions of the ECA rules are passed to the 'Then' part of rules in the KBS, e.g., the ECA rule given above will look like:

IF kitchen smoke alarm triggered and
   ((does not go out in a 'short' time) OR
    (does not call security to state it is not dangerous)) and
   patient is known to be at home
THEN call fire brigade AND initiate procedure to rescue patient

The mechanisms that were previously located in the ON clause of the ECA rules will now be in the IF part of the rules of the KB. Events or conditions involved in the IF part of rules may not be of the same type; for example, they could be quantitative or qualitative in nature. It is possible that some events or conditions can be measured numerically (e.g., age and medicine intakes) and others can only be described subjectively (e.g., how often the house occupant prepares food). Smart Home systems, which rely on data gathered by sensors, have to deal with the storage, retrieval, and processing of ambiguous and uncertain data. Although ECA rules allow us to reason more neatly in terms of the relation between the occurrence of events, their context of occurrence and the actions that should follow as a response to them, there is a practical need to complement them with the representation of other important aspects of knowledge. We aim at complementing the ECA-based framework with the possibility to incorporate uncertainty into the rule execution of a rule-based system derived from the original ECA-based system. Sources of uncertainty in ECA rules include:
a) Uncertain events. Sometimes, an occurrence of the event described in the ON clause can only be detected with some uncertainty. An uncertain event can be "It is most likely that the patient has fallen asleep" or "The patient is in the kitchen with 80% certainty".
b) Uncertain conditions. Uncertain conditions might include uncertain queries like "a sensor can be considered activated (with 'high' confidence)".
c) Uncertain relationships between the event/condition and the actions. In the design and implementation of rule-based systems, uncertainty may be caused by a weak implication, which may occur when an expert is unable to establish a precise correlation between the event/condition and the action except by using degrees of belief or credibility. One such situation may lead
to the specification of a rule expressing that if some events are detected in a context suggesting an elderly patient is active, and they are followed by other events suggesting a sudden suspension of activities, then there is a significant chance that the patient may be in a compromised situation (e.g., has fallen or fainted). In later sections we will structure these kinds of situations in a way similar to the following sketch of a rule:

IF at_kitchen_on with 'high' confidence
   Followed_by tdRK_on with 'medium' confidence
   Followed_by no_movement_detected for 10 units of time
THEN assume with 80% confidence that the patient has fainted

In addition, different kinds of uncertainty may coexist in real Smart Home systems, e.g., fuzzy information may coexist with uncertain information, leading to the inference of knowledge without certainty but only with degrees of belief or credibility regarding a hypothesis.

2.3. Time dependent rules
Monitoring activities in a Smart Home is a time dependent activity in the sense that being able to represent and reason about the order in which activities develop and their duration is essential for a correct diagnosis of the situations. Instantaneous events are associated with points in time. This in turn has an effect on how rules are triggered. Actions can also have a time attached. The time of the action is always the time when the rule advising a particular course of action is fired. We represent in our IF-THEN rules the logical operator "ANDlater", which differs from the classical AND in that a ANDlater b is true if the time of detecting a precedes that of detecting b. The operator "ANDsim" is such that a ANDsim b is true if the times of detecting a and b are the same. RIMER offers support to represent several important notions of uncertainty and incompleteness, but there are no specific primitives for temporal reasoning. Hence we extend the RIMER framework with a temporal dimension which will combine the best of both approaches (RIMER and spatio-temporal reasoning [3]).
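A minimal sketch of how the temporal connectives ANDlater and ANDsim could be evaluated over timestamped event detections is given below; the pair-based event representation is an assumption made only for illustration and is not the paper's formalism.

# Hedged sketch: evaluating ANDlater / ANDsim over timestamped detections.
# A detection is represented here as (truth_value, detection_time); this
# representation is an illustrative assumption.

def and_later(a, b):
    # a ANDlater b: both hold and the detection of a precedes that of b.
    (va, ta), (vb, tb) = a, b
    return va and vb and ta < tb

def and_sim(a, b):
    # a ANDsim b: both hold and are detected at the same time.
    (va, ta), (vb, tb) = a, b
    return va and vb and ta == tb

at_kitchen_on = (True, 100)   # seconds since the start of monitoring
no_movement = (True, 160)
print(and_later(at_kitchen_on, no_movement))  # True
print(and_sim(at_kitchen_on, no_movement))    # False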
2.4. Conflict resolution - Aggregation scheme
Depending on incoming sensor-related events, the system evaluates all the ECA-rules to identify which Event parts match the actual situation. These selected rules may conflict with each other if the Event parts of more than
one rule are matched simultaneously. How to resolve the conflict is a crucial issue in a rule-based inference formalism, especially when uncertainty is involved. Within the RIMER framework [5], rule aggregation using an evidential reasoning approach resolves the conflict and yields the aggregated conclusion. In addition, the input for an antecedent attribute may not be available or may be only partially known. In the inference process, such incompleteness should be considered because it is related to the strength of a conclusion.

3. RIMER as a system to design Smart Homes
In the above sections, we have presented a general Smart Home environment and explained how diagnosis in such cases is based on spatio-temporal considerations. Here, using the RIMER framework, the rule-based Smart Home system for supporting decision making is outlined. The general architecture of the system is illustrated in Figure 1.
Figure 1. General architecture: the sensors and the experts (e.g., caring personnel such as nurses) provide the available data and the facts; the belief rule base feeds the RIMER engine, which performs time-related rule matching, activation weight determination and rule combination by ER, and produces the assessments and the corresponding actions.
The rule base is defined and generated by the experts based on the activity database. Depending on the incoming events captured by the sensors and the experts (e.g., the caring personnel in a smart home housing elderly people), the rules-matching component (including the time-related ordering
matching and the activation weight determination) then searches through combinations of facts to find those combinations that satisfy the antecedents of the rules and selects the rules that should be fired. The time-related ordering matching is responsible for using the time ordering strategy to decide which of the rules, out of all those that apply, have the highest priority and should be fired first, and which rules cannot be used. The activation weight determination is used to calculate the matching degree of the facts to the IF part of the rules. The selected rules may conflict with each other if the IF parts of more than one rule are matched simultaneously. Then the rule combination scheme based on the Evidential Reasoning (ER) algorithm [4] is applied to get the final aggregated assessment, which resolves the rule conflicts. The database will be updated based on the new assessment and fed back into the rule base and the new situation. The concept of a belief rule base and its associated inference methodology were proposed in [5] as a formalism based on the ER approach [4]. In a belief rule base, each possible consequent of a rule is associated with a belief degree. Take for example the following informal description of a belief rule for the specification of a smart home:

Rk: IF at_kitchen_on with 'high' confidence
    ANDlater tdRK_on with 'low' confidence
    ANDlater no_movement_detected with 'high' confidence
    THEN the estimation of the confidence that the patient has fainted is
         {(H, 0); (M, 0.4); (L, 0.6); (N, 0)}
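The following is a hedged sketch of how such a belief rule could be encoded and how an incompletely known antecedent could leave part of the output belief unassigned; it is a deliberately simplified illustration, not the ER algorithm used by RIMER, and the linguistic terms and belief degrees appearing in the rule are explained next.

# Hedged sketch: encoding a belief rule and propagating incompleteness.
# This is a simplified illustration, not the RIMER/ER combination algorithm.

rule_Rk = {
    "antecedents": ["at_kitchen_on is high",
                    "tdRK_on is low",
                    "no_movement_detected is high"],
    # Belief distribution over {H, M, L, N} for "the patient has fainted".
    "consequent_belief": {"H": 0.0, "M": 0.4, "L": 0.6, "N": 0.0},
}

def activate(rule, antecedent_matching_degrees):
    # Scale the consequent beliefs by a simple activation weight (here the
    # product of the matching degrees); belief that is not assigned to any
    # grade reappears as 'unknown' in the output.
    weight = 1.0
    for degree in antecedent_matching_degrees:
        weight *= degree
    out = {grade: weight * b for grade, b in rule["consequent_belief"].items()}
    out["unknown"] = 1.0 - sum(out.values())
    return out

# The third input is only partially known (total belief 0.7, cf. Section 3).
print(activate(rule_Rk, [1.0, 1.0, 0.7]))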
The linguistic terms {high (H), medium (M), low (L), none (N)} are used as the referential values for the attributes "at_kitchen_on" and "tdRK_on", and also for "the patient has fainted". Here "no_movement_detected" is the consequent of another IF-THEN rule in the rule base, which reaches that conclusion as a consequence of analyzing other sensors, e.g. in the reception area, for some amount of time, e.g. 10 minutes. And {(H, 0); (M, 0.4); (L, 0.6); (N, 0)} is a belief distribution representation of the patient's health status (e.g. fainted), indicating that we are 40% sure that the level of confidence that the patient has fainted is medium, and 60% sure that it is low. Space constraints do not allow us to give a full account of the steps. Instead we just provide an outcome based on the following assumption: let us assume that some of the main events in the antecedent of our IF-THEN rule are not fully known. For example, we know "at_kitchen_on" with high confidence and "tdRK_on" with high confidence, but we only have partial evidence
that after some time units the person is not moving, i.e., we are not 100% sure. We can then assume the belief distribution is {(H, 0.7); (M, 0); (L, 0); (N, 0)}. This could be because of a sensor fault, a human being's inability to provide precise judgments, or information not being transmitted properly over the network from the Smart Home to the computing centre. That means the information is incomplete. If we apply our methodology, then the conclusion from the system will be: (high, 0.59); (medium, 0.13); (low, 0.01); (nothing, 0); (unknown, 0.27), where "unknown" in the above result means that the output is also incomplete due to the incomplete input. Hence, both complete and incomplete inference can be accommodated in a unified manner within the proposed RIMER-based Smart Home framework.

4. Conclusions
This article shows the importance of the combination of spatio-temporal and uncertainty reasoning for designing Smart Homes based on the belief rule-based system RIMER. The combination led us to extend it by adding uncertainty handling capabilities to spatio-temporal ECA rules. For a more detailed technical explanation, please refer to another paper [3]. Much remains to be done, but this is a first attempt to bring to the attention of future developers the importance of these concepts and the need to provide systems which are built on solid theoretical foundations.

References
1. S. Giroux, H. Pigot (Eds.), Proc. of Int. Conf. on Smart Homes and Health Telematics, IOS Press, 2005.
2. J. Augusto, C. Nugent, The use of temporal reasoning and management of complex events in smart homes, in: R. L. de Mantaras, L. Saitta (Eds.), Proc. of European Conf. on AI (ECAI 2004), IOS Press (Amsterdam, The Netherlands), 2004, pp. 778-782, August 22-27.
3. J. Augusto, J. Liu, H. Wang, and J. B. Yang, Management of uncertainty and spatio-temporal aspects for monitoring and diagnosis in a Smart Home, Technical Report, University of Ulster, 2005 (see http://www.infj.ulst.ac.uk/ jcaug/uandstr4sh.pdf).
4. J. Yang, D. Xu, On the evidential reasoning algorithm for multiple attribute decision analysis under uncertainty, IEEE Trans. on Sys., Man, and Cyb. (Part A: Systems and Humans), 32 (3) (2002) 289-304.
5. J. Yang, J. Liu, J. Wang, H. Sii, H. Wang, A generic rule-base inference methodology using the evidential reasoning approach - RIMER, IEEE Trans. on Sys., Man, and Cyb. (Part A: Systems and Humans), 36 (2) (2006) 266-285.
NEURO-FUZZY MODELING FOR FAULT DIAGNOSIS IN ROTATING MACHINERY

ENRICO ZIO
Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, 20133 Milano, Italy

GIULIO GOLA
Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, 20133 Milano, Italy, giulio.gola@polimi.it

Malfunctions in machinery are often sources of reduced productivity and increased maintenance costs in various industrial applications. For this reason, machine condition monitoring has been developed to recognize incipient fault states. In this paper, the fault diagnostic problem is tackled within a neuro-fuzzy approach to pattern classification. Besides the primary purpose of a high rate of correct classification, the proposed neuro-fuzzy approach also aims at obtaining a transparent classification model. To this aim, appropriate coverage and distinguishability constraints on the fuzzy input partitioning interface are used to achieve the physical interpretability of the membership functions and of the associated inference rules. The approach is applied to a case of motor bearing fault classification.
1. Introduction
Monitoring of rotating machine systems can be highly effective in minimizing maintenance downtime by providing advance warning and lead time to prepare the appropriate corrective actions upon an adequate fault diagnosis. This paper addresses this problem by means of a Neuro-Fuzzy modeling technique which combines Neural Networks (NNs) [1] and Fuzzy Logic (FL) [2] to exploit the advantages of both, namely the simple learning procedures and computational power of the former and the high-level, human-like thinking and reasoning of the latter. This kind of neuro-fuzzy method [3-7] has been proven to be a powerful framework for tackling practical classification problems. The models developed within this framework are both accurate and transparent, i.e. based on a low number of physically readable rules in the Fuzzy Knowledge Base (FKB). This is achieved by imposing some semantic properties on the Membership Functions (MFs) of the fuzzy input partitioning interface [8].
The paper is organized as follows. Section 2 illustrates the MFs' properties introduced to achieve a transparent model [8]. Section 3 presents the basic steps of the neuro-fuzzy algorithm [9]. Section 4 reports the results obtained from the application of the proposed modeling technique to the classification of motor bearing faults [10]. Some conclusions are drawn in the last Section of the paper.

2. Optimal input partitioning interface for a transparent model
The intelligibility of a neuro-fuzzy model can be enforced by means of semantic constraints which guarantee the physical transparency of the input space partitioning, i.e. of the input variables' MFs, so as to obtain an "optimal" input interface [8]. The following semantic properties of the MFs define an "optimal" input partitioning interface [8]:
1. Moderate number of MFs. The number of MFs must be kept low: a high number of MFs generally increases the precision of the model, but it also causes a loss of system intelligibility.
2. Normality. The partitioning MFs should be normal [2]: this requirement is motivated by the fact that a MF represents a linguistic term of specified semantic meaning; for each MF in the fuzzy partition of the Universe Of Discourse (UOD) U_X of the input variable X, at least one of the numerical values x ∈ U_X should exhibit full semantic matching with that MF.
3. Coverage. The MFs should cover the entire U_X in order to provide a linguistic representation of the whole range of possible values x of X. Generally, a strong coverage is imposed by setting a minimum coverage level ε > 0, so that for each value x of X at least one of the n_X MFs which make up the fuzzy partition of U_X is such that μ_l(x) > ε.
4. Distinguishability. This property relates to the need for physical interpretability. Each linguistic term must be associated with a clear semantic meaning, and therefore the corresponding MF must not excessively overlap with the others.
While the number of normal MFs can be directly implemented in the model, the coverage and distinguishability of the UOD must be enforced by some semantic constraints within the model construction phase itself.

3. The neuro-fuzzy modeling approach to fault classification
The neuro-fuzzy approach proposed is sketched in Figure 1.
[Figure 1 here: block diagram with TRAINING SET, FORWARD COMPUTING, PARAMETERS OPTIMIZATION, "NEW RULE CREATED?" decision and POSSIBILISTIC CLASSIFICATION MODEL blocks.]
Figure 1. Sketch of the neuro-fuzzy algorithm
3.1. The initial fuzzy knowledge base
In this work, an empirical procedure for the construction of the initial FKB is implemented on the basis of the available training data set [9,10]. The UOD of each of the n_i input variables is a priori subdivided into n_X = 3 linguistic terms X_l, l = 1, 2, 3, bearing the semantic meaning of "small", "medium" and "big". These linguistic terms are quantitatively expressed in terms of a corresponding number of Fuzzy Sets (FSs) with quasi-Gaussian, bell-shaped MFs μ_X(x) [11]. The centers of the MFs associated to each FS are positioned at the lower extreme, at the center and at the higher extreme of the normalized UOD of the input variable, for the "small", "medium" and "big" linguistic terms, respectively, whereas the spreads are taken equal for all MFs and set so as to provide a minimal coverage ε of the UOD (Fig. 2).
Figure 2. Initial normalized UOD partitioning of an input variable with minimum coverage ε = 0.1
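As a side illustration of how such a partition can be set up, the following minimal sketch (not from the paper; the Gaussian analytical form, the normalized [0,1] UOD and all names are assumptions) builds the three quasi-Gaussian MFs centered at the extremes and at the center of the UOD and picks the common spread that yields a prescribed minimum coverage ε:

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Quasi-Gaussian membership function (assumed analytical form)."""
    return np.exp(-(x - center) ** 2 / (2.0 * sigma ** 2))

def initial_partition(eps=0.1):
    """Build the a-priori 'small'/'medium'/'big' partition of a normalized UOD.

    The common spread is chosen so that at the worst-covered points (midway
    between adjacent centers) the best-matching MF still reaches the level eps.
    """
    centers = {"small": 0.0, "medium": 0.5, "big": 1.0}
    half_gap = 0.25  # half the distance between adjacent centers
    sigma = half_gap / np.sqrt(2.0 * np.log(1.0 / eps))
    return centers, sigma

if __name__ == "__main__":
    centers, sigma = initial_partition(eps=0.1)
    xs = np.linspace(0.0, 1.0, 101)
    coverage = np.max([gaussian_mf(xs, c, sigma) for c in centers.values()], axis=0)
    print("spread sigma = %.4f" % sigma)
    print("minimum coverage over the UOD = %.3f" % coverage.min())  # ~0.1
```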
This a priori fuzzy partitioning interface generates 3^n_i fuzzy relations among the n_i input variables which can in principle be taken as the antecedents of an equivalent number of corresponding fuzzy rules in the FKB. To select the fuzzy rules which will constitute the initial FKB, the "firing strength" with which each of the 3^n_i fuzzy relations is activated by each crisp training input pattern is firstly computed. Considering a generic rule r fed by a generic input pattern x = (x_1, x_2, ..., x_{n_i}), the rule strength s_r(x) is taken as the minimum of the rule antecedent membership values μ_{X_q^l}(x_q), q = 1, ..., n_i, where X_q^l denotes the l-th FS, l = 1, 2, 3 (small, medium, big), characterizing the q-th antecedent variable X_q of the r-th fuzzy rule and x_q is the value of the q-th component of the input pattern x considered. With respect to the problem of classifying patterns into C classes, the cumulative strength of the r-th fuzzy rule for the j-th class, S_rj, r = 1, ..., 3^n_i, j = 1, ..., C, is then computed by adding the firing strengths s_r(x_k) of the n_j training patterns of class j:

S_rj = Σ_{k=1}^{n_j} s_r(x_k)    (1)
Finally, the fuzzy relation with the highest S_rj is retained as the antecedent part of the only fuzzy rule to be kept in the FKB for the j-th class. By doing so, only C rules, one for each class, are kept in the FKB, instead of 3^n_i, to summarize, in first approximation, the input space relational characteristics with respect to the different classes. The corresponding consequent part of the rule is achieved by a possibilistic approach which provides the degree of membership of a pattern to each class [12]. In this respect, the consequent part of rule r is a vector O_r = {o_rj}, j = 1, ..., C, with o_rj ∈ [0,1], given by:

o_rj = Σ_{k=1, x_k ∈ class j}^{n_j} s_r(x_k) / Σ_{k=1}^{n} s_r(x_k)    (2)

where n is the total number of training patterns of any class. Summarizing, the form of the r-th fuzzy rule is:

If X_1 is X_1^r and X_2 is X_2^r ... and X_{n_i} is X_{n_i}^r, then O_1 is o_r1 and O_2 is o_r2 and ... O_C is o_rC,
where X_q, q = 1, 2, ..., n_i, and O_j, j = 1, 2, ..., C, are the input and output variables, respectively.
3.2. The forward algorithm for computing the model output
Given an input vector x of n_i measured quantities, its possibilistic membership grade to the j-th class can be inferred as follows from all the p(h) rules in the FKB at the h-th step of the iterative training procedure [12]:

o_j^{p(h)}(x) = Σ_{r=1}^{p(h)} o_rj s_r(x) / Σ_{r=1}^{p(h)} s_r(x)    (3)
where o_rj is the consequent of the r-th rule for the j-th class given by Eq. (2) and s_r(x) ∈ [0,1] is the firing strength of the r-th rule.
3.3. The rule creation module
At the beginning of the training phase, the model initial FKB contains p(0) = C rules, one for each class. During the training, if the incoming pattern x is such as to call for a new rule to be added to the current FKB [5,6,13], a new MF is introduced in U_{X_q} to function as the antecedent part of the new rule for the input variable X_q only if the maximum membership of the component x_q of x to the FSs X_q^l, l = 1, ..., n_q, of the current partition of U_{X_q} is lower than a predefined threshold. If this condition is not satisfied, the antecedent of the new rule for the input variable X_q is given by the FS in U_{X_q} bearing the highest MF value at x_q. This criterion for rule creation bears the advantage of being directly connected with the membership values and of creating a new MF only when necessary, thus controlling the number of MFs and keeping the model transparent.
3.4. The constrained optimization for parameter tuning
The parameter tuning phase is the last step of an iteration of the training procedure. The centers and spreads of the input bell-shaped MFs and the output membership grades of the rules currently in the FKB are tuned by a backpropagation algorithm [14] which exploits the penalty function method [8].
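Before detailing the error function used in this tuning phase, the following minimal sketch pulls together the initial FKB construction of Eqs. (1)-(2) and the possibilistic forward inference of Eq. (3), as reconstructed above; the Gaussian membership functions, the toy data and all names are illustrative assumptions rather than the authors' implementation:

```python
import itertools
import numpy as np

def rule_strength(x, antecedents, mfs):
    """s_r(x): minimum of the antecedent membership values."""
    return min(mfs[q][label](x[q]) for q, label in enumerate(antecedents))

def build_initial_fkb(X, y, mfs, n_classes):
    """One rule per class: antecedent with highest cumulative strength (Eq. (1)),
    possibilistic consequents as in Eq. (2)."""
    labels = list(mfs[0].keys())
    candidates = list(itertools.product(labels, repeat=len(mfs)))  # 3**n_i relations
    fkb = []
    for j in range(n_classes):
        S = {r: sum(rule_strength(x, r, mfs) for x, c in zip(X, y) if c == j)
             for r in candidates}
        best = max(S, key=S.get)
        s_all = np.array([rule_strength(x, best, mfs) for x in X])
        o = [s_all[np.array(y) == c].sum() / s_all.sum() for c in range(n_classes)]
        fkb.append((best, o))
    return fkb

def possibilistic_output(x, fkb, mfs):
    """Eq. (3): strength-weighted average of the rule consequents."""
    s = np.array([rule_strength(x, ants, mfs) for ants, _ in fkb])
    O = np.array([o for _, o in fkb])
    return s @ O / s.sum()

if __name__ == "__main__":
    # toy 2-feature, 2-class data and an assumed Gaussian partition of each UOD
    g = lambda c, sig: (lambda v: float(np.exp(-(v - c) ** 2 / (2 * sig ** 2))))
    mfs = [{"small": g(0.0, 0.12), "medium": g(0.5, 0.12), "big": g(1.0, 0.12)}] * 2
    X = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9), (0.9, 0.85)]
    y = [0, 0, 1, 1]
    fkb = build_initial_fkb(X, y, mfs, n_classes=2)
    print(possibilistic_output((0.15, 0.1), fkb, mfs))
```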
The global error function to be optimized, i.e. minimized, is defined as the sum of the Root Mean Square Error between the model possibilistic output and the true class membership [9] (equal to 1 for the true class and 0 for the others), for the model accuracy objective, and two weighted error contributions related to the coverage and distinguishability constraints, for the model transparency objective [8,9]. The parameter tuning is first applied to optimize the initial FKB. Then, during the iterative training phase, it is activated as soon as a new rule is created, thus allowing the model to adjust its parameters with respect to the new FKB [9].
3.5. The pruning of the FKB
For further improvement of the model transparency, a pruning procedure [9] is applied to the FKB resulting at the end of the training iterations. In particular, those rules which are not sufficiently fired, with respect to a predefined firing threshold, by any of the training patterns are eliminated, and the antecedents of an input variable which do not appear in any of the remaining rules are deleted. Finally, the UOD of each input variable is checked for too similar MFs, which are then reduced to one by elimination [9]. The training ends when the processing of the whole set of training patterns does not initiate a new rule, the model having reached the optimal configuration.
4. Classification of motor bearing faults: a case study
In this work, the problem of motor bearing fault classification presented in [10] has been considered. From [10], n = 64 data patterns are available, each one characterized by two features: the bearing vibration frequency (X_D) and time amplitude (X_T), obtained as explained in [10]. The model, fed by the normalized two-feature input pattern (X_D, X_T) (n_i = 2), computes the possibilistic membership grades (Eq. 3) to the three classes (C = 3), "none", "some" and "severe", related to the magnitude of the bearing defects. Finally, in order to obtain a crisp classification, the model assigns the input pattern to the class for which it bears the highest possibilistic output. The neuro-fuzzy classification model has been trained as explained previously by firstly generating the initial FKB (Fig. 2) with ε set equal to 0.1 and then updating the rules and optimally tuning its parameters. Table 1 shows the resulting fuzzy knowledge base, characterized by a low number of rules which are easily interpretable when associated to the consistent coverage and high distinguishability of the antecedent partitioning FSs (Fig. 3).
Table 1. FKB built by the neuro-fuzzy model

Rule r   Input vibration features        Bearing defect possibilistic membership o_j
         X_D        X_T                  None     Some     Severe
1        low        low                  0.860    0.204    0
2        low        med-low              0.283    0.616    0.100
3        low        med-high             0        0.400    0.660
4        medium     low                  0.374    0.519    0.100
5        medium     med-low              0        0.485    0.560
6        medium     med-high             0        0        1
7        medium     high                 0        0        1
8        med-high   low                  0        0.411    0.600
9        med-high   med-high             0        0.012    1
10       med-high   high                 0        0        1
Figure 3. MFs built by the neuro-fuzzy model for Input 1 (vibration frequency, X_D) and Input 2 (time amplitude, X_T).

The final classification accuracy (Table 2) is slightly higher than that achieved in [10], with the overall diagnosis accuracy, calculated over the entire training set, increased from 90.63% to 93.75%.

Table 2. Classification performance using the initial FKB and the final model

Class j   Number of      Correctly classified                 Misclassified
          patterns n_j   Initial FKB  Final model  [10]       Initial FKB  Final model  [10]
None      4              4            4            4          0            0            0
Some      12             5            8            12         7            4            0
Severe    48             34           48           42         14           0            6
Total     64             43           60           58         21           4            6
Finally, from the analysis of the misclassifications, it turns out that 4 patterns of class 2 ("some") are wrongly assigned to class 3 ("severe"), with the output possibilistic membership values to classes 2 and 3 being very close, the
latter only slightly larger than the former. This manifests an uncertainty of the model in assigning these 4 patterns to a class and provides a physical insight in the classification problem in terms of overlapping decision boundaries for classes 2 and 3 in the feature space: patterns falling in this region have comparable possibilistic membership to both classes.
5. Conclusions
A neuro-fuzzy approach to pattern classification has been propounded for tackling fault diagnostic tasks. In order to obtain a transparent and interpretable model, semantic constraints are introduced into the fuzzy input partitioning interface and enforced during the model parameter optimization. The application of the approach to the diagnosis of motor bearing faults has returned satisfactory results concerning both the accuracy and the interpretability of the model. Furthermore, the possibilistic output has offered a useful tool to understand the uncertainties of the classification problem.
References
1. B. Muller and J. Reinhardt, Neural Networks - An introduction, (1991).
2. L.A. Zadeh, Information and Control, 8, 338 (1965).
3. J.S. Jang, IEEE Trans. Sys. Man, Cybern., 23(3), 665 (1993).
4. C.F. Juang, H.W. Nein and C.T. Lin, Fuzzy Theory Systems: Techniques and Applications, 3, 1265 (1999).
5. C.T. Lin and C.S.G. Lee, IEEE Trans. Comput., 40(12), 1320 (1991).
6. M. Marseguerra, E. Zio and P. Avogadri, Progress in Nuclear Energy, 44(3), 237 (2004).
7. D. Nauck and R. Kruse, Fuzzy Sets and Systems, 89, 277 (1997).
8. J.V. De Oliveira, IEEE Trans. on Systems, Man, Cybernetics - Part A: Systems and Humans, 29(1), 128 (1999).
9. E. Zio and G. Gola, Annals of Nuclear Energy, 33, 415 (2006).
10. G. Goddu, B. Li, M.-Y. Chow and J.C. Hung, IEEE Trans. on Systems, Man, Cybern., 1961 (1998).
11. L.X. Wang and J.M. Mendel, IEEE Trans. Neural Networks, 3(5), 807 (1992).
12. G. Castellano, A.M. Fanelli and C. Mencar, IEEE Conference on Systems, Man, Cybern., (2003).
13. C.T. Lin and C.S.G. Lee, IEEE Trans. Fuzzy Systems, 2(1), 46 (1994).
14. D.E. Rumelhart and J.L. McClelland, MIT Press, 1 (1986).
FLC DESIGN FOR ELECTRIC POWER STEERING AUTOMATION*
J. E. NARANJO, C. GONZALEZ, R. GARCIA, T. DE PEDRO
Instituto de Automática Industrial, Ctra. Campo Real Km. 0,200, La Poveda, Arganda del Rey, Madrid 28500
Phone +34 918711900, Fax +34 918717050
{jnaranjo, gonzalez, ricardo, tere}@iai.csic.es
The automatic control of the steering wheel for autonomous vehicles is presently one of the most interesting challenges in the intelligent transportation systems field. A few years ago, researchers had to adapt motors or hydraulic systems in order to automatically manage the trajectory of a vehicle but, owing to the technological development of the automotive industry, a new set of tools built into mass-produced cars now makes computer control feasible. This is the case of electronic fuel injection, the sequential automatic gearbox and the Electric Power Steering (EPS). In this paper we present the development of an autonomous vehicle's EPS control, based on a two-layer fuzzy controller. The necessary computer and electronic equipment has been installed in a Citroen C3 Pluriel mass-produced testbed vehicle and a set of experiments has been carried out to demonstrate the feasibility of the presented controllers in real situations.
1. Introduction
The aim of Intelligent Transportation Systems (ITS) is to apply computer science, electronics and communication techniques to develop a new generation of components that improve safety, enhance mobility and reduce pollution in the transportation field. The range of this concept is broad and affects every transportation mode in the short, medium and long term. One of these long-term tasks is the development of intelligent vehicles, whose final objective is to substitute the human driver with an artificial one that will improve the safety and comfort of the passengers. We center this work on intelligent vehicles for road transport, the area to which the discipline of autonomous vehicles belongs. The autonomous road vehicle research field can be divided into three areas: control and automation, sensors and communications. The first area refers to the automatic control of the three fundamental actuators of a vehicle: the steering
* This work is supported by MICYT ISAAC project CICYT DPI2002-04064-C05-02 and MFOM COPOS.
wheel, the throttle and the brake pedal. The second area deals with the sensing equipment necessary to allow the control system to correctly perceive the environment, and the third one concerns the data interchange among the control systems, sensors and every element involved in the autonomous driving task. The Autopia Program focuses mainly on autonomous driving using fuzzy logic controllers. The steering wheel of two Citroen Berlingo vans has been robotized, using electrical motors engaged to the steering bar through a gear system [1]. There are other examples of automating the steering wheel of road vehicles. At the University of Parma, the ARGO vehicle has been automated and its steering wheel is automatically controlled using a DC motor attached with a pulley [2]. In the University of Sevilla "Heavy Vehicles Automatic Guidance" project, directed by Anibal Ollero, the steering of a truck is moved using a DC motor, a clutch and a pulley [3]. The new generation of vehicles incorporates a different system of steering assistance: the Electric Power Steering (EPS). The advantage of this kind of power steering for automation purposes is that no external actuator has to be added. The aim of this paper is to present the fuzzy logic-based EPS control system developed for automatic driving that has been installed in a Citroen C3 Pluriel vehicle, whose steering wheel is electrically powered.
2. Onboard Equipment
Electric power steering consists of a torque sensor and motor actuator couple. The sensor is attached to the steering column and measures the torque applied by the driver when he moves the steering wheel. This torque signal is transmitted to a control/power card that sends an amplified proportional power signal to the DC motor, which is engaged to the steering rack bar. The first step for achieving automatic steering control is to manage the wheels from a computer that we have installed in the vehicle and which is powered by the vehicle battery. The solution for this automation is to completely bypass the sensor and control/power card equipment and send a power signal directly to the motor. An onboard computer running a fuzzy logic-based control system generates an analog signal that mimics the control card, and an additional power card takes this analog signal as input and outputs a power signal that supplies the assistance motor. Two sensorial sources are needed to manage the vehicle steering: a carrier phase differential GPS receiver, which generates to-the-centimeter accurate
positioning, and a CAN bus interface, which is connected to the internal CAN network of the C3 and can read control data such as, for example, the steering turning angle, essential for closing the control loop.
3. Steering Control System
The objective of the steering controller is to calculate the angle that the steering must be turned to track the desired trajectory correctly. To do this, a computational trajectory representation has to be defined that can be used as a reference for tracking in order to perform map matching [4].
3.1. Fuzzy steering controller
A two-layer fuzzy controller has been defined for steering control (Figure 1). The high-level layer calculates the target position of the steering wheel to fit the vehicle to the desired route. The low-level layer generates the optimum torque that must be exerted by the EPS assist motor to move the steering wheel.
Figure 1. Two layer fuzzy controller schema.
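To make the cascade of Figure 1 concrete, here is a minimal sketch of the data flow only: the two fuzzy layers are replaced by crude saturated-proportional stand-ins (all names, sign conventions and gains are hypothetical), since the point is simply that the high-level layer produces a steering-wheel target and the low-level layer turns it into a normalized torque command for the EPS motor:

```python
from dataclasses import dataclass

@dataclass
class SteeringState:
    lateral_error: float   # signed distance to the reference segment
    angular_error: float   # signed heading error
    wheel_angle: float     # current steering-wheel position (normalized)
    wheel_speed: float     # steering-wheel angular speed (normalized)

def flc_position(lateral_error, angular_error):
    """High-level layer stand-in: returns a target steering-wheel position."""
    # placeholder for the fuzzy rules R1.1-R1.4 described below
    return max(-1.0, min(1.0, 0.6 * lateral_error + 0.4 * angular_error))

def flc_torque(target, state):
    """Low-level layer stand-in: returns a normalized torque command in [-1, 1]."""
    pos_error = target - state.wheel_angle
    raw = 1.5 * pos_error - 0.3 * state.wheel_speed   # crude damping term
    return max(-1.0, min(1.0, raw))

def control_step(state):
    """One pass of the two-layer cascade: position target, then EPS torque."""
    target = flc_position(state.lateral_error, state.angular_error)
    torque = flc_torque(target, state)
    return target, torque   # torque would be converted to the analog motor signal

if __name__ == "__main__":
    s = SteeringState(lateral_error=0.8, angular_error=0.1,
                      wheel_angle=0.0, wheel_speed=0.0)
    print(control_step(s))
```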
3.1.1. FLC I: Steering Position controller
Its mission is to calculate the target turning angle through which the steering wheel must be moved to track the reference trajectory segment. Two variables are used as input for the fuzzy steering control system, namely the lateral and angular errors. Having calculated the crisp values of these variables, they are fuzzified for use in the fuzzy controller. To do this, we define two fuzzy variables, also named angular error and lateral error, each of which has two linguistic labels, left and right, that indicate where the vehicle is located with respect to the reference segment. The output of the system is the target turning angle through which the steering wheel must be moved to correct the trajectory deviation indicated by the input variables. There is only one fuzzy output variable, named Steering, with two linguistic labels, called left and right, whose membership functions are singletons.
The qualitative actions for the human driver (rules) are the same in every driving situation, and only the quantitative part varies, which is defined in the fuzzy control by the fuzzification of the variables:
R1.1: IF Lateral_Error Left THEN Steering Right
R1.2: IF Lateral_Error Right THEN Steering Left
R1.3: IF Angular_Error Left THEN Steering Right
R1.4: IF Angular_Error Right THEN Steering Left
These rules may look very simple, but this is where the power of fuzzy computing lies. Note that driving is an easy task for humans, which everybody can do without long computations or mathematical equation expansions. When executing the inference, we use the minimum as t-norm (AND) and the maximum as t-conorm (OR).
3.1.2. FLC II: Steering Torque controller
The aim of the low-level fuzzy controller is to manage the application of torque to the steering wheel to move it to the target position commanded by the high-level fuzzy controller. A fuzzy controller has to be used because of the characteristics of the actuation components. There are multiple and, in any event, unknown factors affecting EPS control, for example, variable pressure on the rack depending on speed or tire sliding. An artificial intelligence-inspired controller, in this case based on fuzzy logic, means that none of these elements have to be taken into account as it just mimics human behavior. In this case, three input variables are needed to control the torque applied to the steering wheel. The first is the angular position error of the steering wheel, that is, the difference between the target position generated by the high-level fuzzy controller and the real position. The second input variable is the real position of the steering wheel, and the last one is the angular speed at which the steering wheel is turning. When these variables are fuzzified for use in the fuzzy controller, they are transformed into fuzzy variables called Pos_Error for the angular position error, Pos_Abs for the real steering wheel position and Ang_Speed for the angular speed, respectively. The output of the fuzzy controller indicates the voltage that must be sent to the motor power card, which applies a proportional amperage to the motor to move the steering wheel with the optimum torque to correctly achieve its target position. Two linguistic labels have been defined, Positive (right) and Negative (left), whose membership functions have been defined as singletons. In this way, this output is normalized from -1 to 1, where the negative values represent a torque to the left and the positive values a torque to the right.
In this case, we have defined six rules for controlling the applied torque:
R2.1: IF Pos_Error Pos_Large THEN Torque Positive
R2.2: IF Pos_Error Neg_Large THEN Torque Negative
R2.3: IF Pos_Abs Negative AND Pos_Error Neg_Small THEN Torque Negative
R2.4: IF Pos_Abs Positive AND Pos_Error Pos_Small THEN Torque Positive
R2.5: IF Ang_Speed MORE THAN Null THEN Torque Positive
R2.6: IF Ang_Speed LESS THAN Null THEN Torque Negative
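A minimal sketch of how such a rule base can be evaluated is given below for the simpler position controller (FLC I): each rule fires with the membership degree of its antecedent, rules sharing a consequent are aggregated with the max t-conorm, and the crisp output is the firing-strength-weighted average of the output singletons. The membership-function shapes and singleton positions are assumptions (the paper only states the labels and that the outputs are singletons); the torque controller would follow the same pattern with its three inputs and the six rules above.

```python
import math

def mu_left(e, scale=1.0):
    """Assumed 'left' membership: smooth step rising for positive (leftward) error."""
    return 1.0 / (1.0 + math.exp(-4.0 * e / scale))

def mu_right(e, scale=1.0):
    return 1.0 - mu_left(e, scale)

# Output singletons for the Steering variable (hypothetical positions)
SINGLETON = {"left": -1.0, "right": +1.0}

def flc1_steering(lateral_error, angular_error):
    """Rules R1.1-R1.4 with min as t-norm, max as t-conorm and
    weighted-average defuzzification over the output singletons."""
    left_lat, right_lat = mu_left(lateral_error), mu_right(lateral_error)
    left_ang, right_ang = mu_left(angular_error), mu_right(angular_error)
    # each rule has a single antecedent, so its firing degree is that membership;
    # rules sharing a consequent are aggregated with max (t-conorm)
    fire = {
        "right": max(left_lat, left_ang),    # R1.1, R1.3
        "left":  max(right_lat, right_ang),  # R1.2, R1.4
    }
    num = sum(fire[lbl] * SINGLETON[lbl] for lbl in fire)
    den = sum(fire.values())
    return num / den if den else 0.0

if __name__ == "__main__":
    # vehicle drifted to the left of the segment -> positive output = steer right
    print(flc1_steering(lateral_error=0.7, angular_error=0.2))
```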
4. Experiments
Having installed the described controller in the instrumented testbed car, we ran some automatic steering control experiments. Figure 2 includes a trace of controller behavior while taking a bend to the left, to show how the system works. The top graph shows the input variable values for taking the bend. The next graph plots the output of the steering position fuzzy controller. The third graph contains the values of the input variables of the torque fuzzy controller: the steering wheel position error, the steering wheel real position and the steering wheel angular speed. Finally, the bottom graph shows the output torque, normalized from -1 to 1, to be applied to the EPS motor controlling the steering wheel.
Figure 2. Detail of the control input and output variables for the first turning to the left of the automatic tracking experiment.
At the beginning of the experiment, the car is driving centered along a reference segment of the route. As the car approaches the curve, the lateral error increases towards the positive part of the graph (left) and the angular error increases to the right. These input values are fuzzified, and the rule inference of the high-
level controller is executed, generating a left-turning command that is illustrated by the steering output variable. To execute this turn, effective power must be applied to the EPS motor. The low-level fuzzy controller, whose input variables are the position error, the real position and the steering angular speed, calculates this power. The output of the low-level controller (torque) shows that the maximum effort is applied at the beginning of the turn, when a peak is needed to initiate the steering movement (4-5.5 sec). Once the movement is under way, the torque decreases rapidly. However, as the steering rack moves away from the center, the effort it takes to move the rack is bigger, and the output torque has to increase. Finally, the controller maintains the steering position and moves the steering wheel back to the center when the turn has finished. Although the target position commanded by the high-level controller is never exactly achieved, the behavior of the vehicle is correct. The reason for this is that both controllers are closely related and fitted, the output torque being adapted by the second controller to generate the correct behavior.
5. Conclusions
In this paper, we have presented a two-layer fuzzy controller architecture for automatic electric power steering control, on which we have run the automatic driving experiments discussed in the last section. These results showed that electromechanical systems, like an EPS, can be managed in a human-like way using artificial intelligence techniques, in this case fuzzy logic. This method allows the user to mimic human behavior by extracting knowledge from experts, in this case drivers. An additional advantage of fuzzy logic is that complex nonlinear vehicle models do not need to be developed.
References
1. R. Garcia et al., "Frontal and Lateral Control for Unmanned Vehicles in Urban Tracks", IEEE Intelligent Vehicles Symposium, France, 2002.
2. A. Broggi, M. Bertozzi, A. Fascioli, et al., "The ARGO Autonomous Vehicle's Vision and Control Systems", International Journal of Intelligent Control and Systems, Vol. 3, No. 4, pp. 409-441, 1999.
3. A. Rodriguez-Castano, G. Heredia and A. Ollero, "Fuzzy path tracking and position estimation of autonomous vehicles using differential GPS", Mathware and Soft Computing 7 (2000), pp. 257-264.
4. J.E. Naranjo, et al., "Fine Tuning for Autonomous Vehicle Steering Fuzzy Control", FLINS 2004, Blankenberge, Belgium, 2004, pp. 450-455.
STUDYING ON ACCELERATION SENSOR'S FAULT-TOLERANCE TECHNOLOGY OF TILTING TRAINS*
JIANHUI LIN¹, YUMING ZHANG², YAN GAO², TIANRUI LI³
¹ National Traction Power Lab., Southwest Jiaotong University, China
² Electronic Engineering Department, Chengdu Electromechanical College, China
³ Department of Mathematics, Southwest Jiaotong University, China
The acceleration sensor's fault-tolerance principles and methods for tilting trains are discussed based on the practical running conditions of the first tilting train in China. The 2/3(G) vote redundant fault-tolerance strategy is feasible for the acceleration sensor's fault-tolerance technology. All the algorithms are tested under simulated conditions using data from line tests. A program based on them has been developed; it performs well in the DSP system and has passed preliminary testing on the line.
1. Introduction
Fault-tolerance is an important technology to enhance the reliability of a system. The main approaches to fault-tolerance may be divided into two parts: hardware redundancy and analytic redundancy. Hardware redundancy refers to using redundant components in the key positions of a system to perform identical tasks. It detects and diagnoses system faults by voting. Analytic redundancy refers to using functional relations among systems or components to constitute functional redundancy. It detects and diagnoses system faults through variations of the system status, output and parameters with the least hardware redundancy [1,2]. A tilting train is a train with a tilting mechanism that enables increased speed on regular railroad tracks. Many of the problems with motion sickness are related to the fact that a traditional servo system cannot respond instantaneously to the change in trajectory forces, and even slight discrepancies, whilst not being
* This work is supported by the Fund of Education Ministry of China (NO.705044 and IRT0452).
noticeably perceivable, cause nausea due to their unnatural nature [3]. Therefore, it is essential to develop the signal processing technology of tilting trains to increase passenger comfort. In this paper, we mainly discuss the acceleration sensor's fault-tolerance technology of tilting trains based on the practical running conditions of the first tilting train in China. We employ hardware redundancy and parameter estimating methods to realize the acceleration sensor's fault-tolerance since hardware redundancy is easier and more reliable than analytic redundancy in real applications [4-6].
2. Mathematical model of the reliability of hardware parallel redundant systems in tilting trains
A parallel redundant system composed of several identical sensors is helpful to improve the reliability of measuring parameters, since the system will perform well as long as there remains one good sensor. Generally, sensors cannot be replaced or maintained on line while tilting trains are running. Such a system is called an un-maintainable system, whose reliability can be calculated according to its corresponding mathematical model. Assume the acceleration sensor's life-span has a negative exponential distribution, the mean time to failure (MTTF) is T and the failure rate is λ = 1/T; then its reliability is R(t) = e^(−t/T), t > 0. The reliability of a parallel system including n units is

R_s(t) = 1 − (1 − e^(−t/T))^n,  t > 0    (1)
This shows that parallel redundant systems attain high reliability when t < 0.2T. The system's reliability increases nonlinearly with the increasing number of redundant components; when n reaches a certain value, the reliability increases only slowly. Since the weight, volume, cost and power loss of the system rise in proportion with the increasing number of redundant components, it is important to minimize the number of redundant components while still satisfying the reliability requirement.
Example 1: Assume t = 0.1T and require the reliability of the system to be no less than 99.9%. Then, according to equation (1), the smallest number of units must satisfy:

1 − (1 − e^(−0.1))^n ≥ 0.999    (2)
From equation (2), it is easy to show that three sensors are enough to keep the system running with a reliability of 99.9%.
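The numbers of Example 1 are easy to reproduce; the sketch below (illustrative, not from the paper) evaluates Eq. (1) and searches for the smallest n meeting the 99.9% target:

```python
import math

def parallel_reliability(t_over_T, n):
    """Eq. (1): R_s(t) = 1 - (1 - e^(-t/T))^n for n identical sensors in parallel."""
    r = math.exp(-t_over_T)
    return 1.0 - (1.0 - r) ** n

def smallest_n(t_over_T=0.1, target=0.999):
    n = 1
    while parallel_reliability(t_over_T, n) < target:
        n += 1
    return n

if __name__ == "__main__":
    print(smallest_n())                    # 3, as in Example 1
    print(parallel_reliability(0.1, 3))    # ~0.9991
```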
Hardware redundancy detects and diagnoses system faults by voting. A voting redundant system with k out of n, denoted as k/n(G), requires at least k units out of n to work normally. If k = 1, it is a parallel system; k is greater than n/2 for a majority vote, and n is often odd. Since a system with three sensors can provide very high reliability, as shown in Example 1, we select a voting redundant system with two out of three sensors, 2/3(G), in our experiments. Assume that a sensor has only two states: normal and invalid. Then the reliability of the 2/3(G) system is

R_a(t) = Σ_{i=2}^{3} C(3,i) R^i (1 − R)^(3−i) = 3R² − 2R³    (3)
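A one-line check of Eq. (3) against the single-sensor reliability, under the same exponential life-span assumption (names hypothetical):

```python
import math

def sensor_reliability(t_over_T):
    return math.exp(-t_over_T)

def vote_2_of_3_reliability(R):
    """Eq. (3): at least 2 of 3 identical sensors working, R_a = 3R^2 - 2R^3."""
    return 3.0 * R ** 2 - 2.0 * R ** 3

if __name__ == "__main__":
    R = sensor_reliability(0.1)
    print(R, vote_2_of_3_reliability(R))   # the voted system beats a single sensor
```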
3. Fault-tolerance methods based on hardware redundancy
In fact, parallel redundancy itself does not guarantee the realization of the reliability; in other words, hardware redundancy does not automatically implement hardware fault-tolerance. A set of fault decision-making logics is required to accomplish the hardware fault-tolerance. Therefore, it is necessary to develop methods to judge and locate a faulty sensor and to remove the data from the faulty sensor in time.
3.1. The decision-making logic of the 2/3(G) vote redundant system
Majority vote is the simplest and most common decision-making logic of the multi-unit parallel redundant system. Through majority vote, we can decide whether a unit fails in the system. Moreover, if a unit fails, it can automatically find the faulty unit and take corresponding actions. Therefore, multi-unit parallel redundant systems based on hardware in fact become k/n(G) voting redundant ones; here a 2/3(G) voting redundant system is utilized. Majority vote implements diagnosis by building parity equations and calculating residua. Assume the outputs of the three sensors are y_1, y_2 and y_3 respectively; then the parity equations are

e_k = |y_i − y_j|,  i, j, k = 1, 2, 3,  i ≠ j ≠ k    (4)
A reasonable residual threshold e_T is given according to the measuring error among sensors. When all the sensors work normally, we have e_k < e_T (k = 1, 2, 3). If one sensor fails, the residua involving it change so drastically that e_k goes beyond e_T. From the parity equations (4), sensor faults are judged and located merely according to the differences between the sensor outputs, without considering the factors leading to the faults. So any fault can be detected by this method.
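The voting logic just described can be sketched as follows (an illustrative reading of Eq. (4), not the authors' code): the faulty sensor is the one that appears in both residuals exceeding the threshold while the remaining pair still agrees, and the fused output falls back to the agreeing pair (or the median when all three agree):

```python
def vote_2_of_3(y, e_T):
    """2/3(G) vote: locate a faulty sensor from pairwise residuals |y_i - y_j| (Eq. (4))."""
    e01, e02, e12 = abs(y[0] - y[1]), abs(y[0] - y[2]), abs(y[1] - y[2])
    over = {(0, 1): e01 > e_T, (0, 2): e02 > e_T, (1, 2): e12 > e_T}
    if not any(over.values()):
        return sorted(y)[1], None                 # all consistent: take the median
    for k in range(3):
        others = [i for i in range(3) if i != k]
        # sensor k is suspect if every residual involving k is over the threshold
        # while the remaining pair still agrees
        if all(over[tuple(sorted((k, i)))] for i in others) and not over[tuple(others)]:
            return sum(y[i] for i in others) / 2.0, k
    return sorted(y)[1], None                     # ambiguous pattern: no decision

if __name__ == "__main__":
    print(vote_2_of_3([1.02, 0.98, 15.0], e_T=0.9))   # -> (1.0, 2): third sensor flagged
```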
3.2. Decision-making logic in the parallel redundant system
If a sensor fails with a permanent fault in the 2/3(G) vote redundant system, it should be shielded in order to keep it from participating in the vote. Otherwise, faulty units would make up the majority if another sensor also failed, and eventually a wrong conclusion would be drawn. Once an invalid sensor is shielded, there are only two sensors left in the system. As a result, equation (4) includes only one residual equation. If one more sensor fails, the residual threshold method can only judge whether there is a fault and cannot decide which sensor has failed; it is therefore unknown whether the received signal is true. In this case, the parallel hardware is changed into a logical series connection, which leads to a lower reliability. However, the system can still work if the second faulty unit is detected from the signal characteristics. Consequently, the parallel redundancy can be implemented based on the 2/3(G) vote redundancy.
3.3. Determination of the residual threshold
The question now is how to set an appropriate residual threshold for the majority vote, since this value has a direct influence on the performance of detecting and diagnosing sensor faults [4-6]. A larger threshold raises the missed-detection rate, while a smaller one increases the false-diagnosis rate. The threshold can be chosen according to the characteristics of the residua themselves. In our experiments, we randomly select two acceleration signals from data recorded on a certain track line and calculate the distribution of the difference between the two signals. Under the same speed and railway status, the probability of the residuum exceeding the limit is

P(e > e_T) = α    (5)
where e = |Δy|. Usually let α = λ, where λ is the failure rate of a sensor. The residual threshold e_T can then be calculated from equation (5). However, it is unreasonable to set only one threshold for different train speeds and lines. Based on the analysis of large amounts of data from the test line, we find that if the amplitude of the signals is large, their difference becomes larger; however, the difference no longer decreases after the signal amplitude falls below a certain value, which shows that the difference is related not only to the difference of the sensors' sensitivities but also to random factors. Therefore, a mathematical model for the difference of two acceleration signals could be the following equation:

Δy = Δk·x + Δe    (6)
where x is the input of the sensors, Δk is the difference of the two sensors' sensitivities, and Δe is the part of the difference that is unrelated to the inputs. If the fixed difference between the two signals is eliminated, Δe is a random variable with zero mean. Assume the residual threshold e_T is

e_T = k_T·|y| + e_0    (8)
where k_T is a dynamic threshold coefficient, e_0 is the minimal threshold, and y is the middle (median) value among y_1, y_2 and y_3. We can obtain the distribution of the two signals' residuum based on testing data from a practical track line when |y| = 0. Therefore, the probability of a small signal's residuum exceeding the limit can be obtained from the following equation:

P(|Δy| > e_0) = α    (9)
Equation (9) is also employed to compute the probability of false fault diagnosis for small signals. e_0 can be calculated by the same method as that used for the fixed residual threshold. The dynamic threshold coefficient k_T can hardly be calculated by analytic or statistical methods. A feasible approach is to first find a piece of signal with maximal variance in the testing data, then calculate its residual threshold and regard it as an upper limit. Therefore, we can let

k_T = (e_m − e_0) / y_m    (10)
where y_m stands for the maximal value of the signal and e_m is the maximum threshold.
4. Simulation example
A simulation example of the fault-tolerance algorithms is based on experimental data from a section of a track line. Because no fault happened during the experiments, in order to verify the algorithms, faults are assumed to appear at samples 3000, 4000 and 5000, respectively, corresponding to the failure of the first, second and third sensors. The faults are represented by outputs of 0, 15 and 1.5, simulating an open-circuit fault, an overflow fault and a stuck fault, respectively. The two parameters of the residual threshold in this algorithm, e_0 and k_T, are 0.9 and 0.03, respectively. The lower limit of the variance is 0.005 and the number of consecutive votes is C = 3. Here the direct method is utilized to calculate the variance.
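As an aside, the dynamic threshold of Eq. (8) with the parameter values just quoted (e_0 = 0.9, k_T = 0.03) can be sketched as follows; the sample values are hypothetical:

```python
def dynamic_threshold(y, k_T=0.03, e_0=0.9):
    """Eq. (8): e_T = k_T * |y_mid| + e_0, with y_mid the median of the three outputs.

    The parameter values are the ones quoted for the simulation example; k_T could
    also be derived from Eq. (10) as (e_m - e_0) / y_m.
    """
    y_mid = sorted(y)[1]
    return k_T * abs(y_mid) + e_0

if __name__ == "__main__":
    samples = [1.1, 1.0, 14.8]               # hypothetical accelerations at one instant
    e_T = dynamic_threshold(samples)
    print(e_T)                               # 0.9 + 0.03*1.1 = 0.933
    print([abs(samples[i] - samples[j]) > e_T
           for i, j in ((0, 1), (0, 2), (1, 2))])   # residuals vs the threshold
```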
The width of the data window is 32 and the sampling frequency of the signals is 256 Hz. The simulation results show that the first sensor is detected as failed at the 3009th sample; another fault is detected at the 4002nd sample, which comes from the second or the third sensor. Finally, the fault of the second sensor is successfully detected at the 4033rd sample and the fault of the third sensor is located at the 5031st sample. In other words, each fault is detected 0.0078, 0.1289 and 0.1211 seconds, respectively, after it occurs. This basically meets the demand of real-time fault diagnosis.
5. Conclusions
In this paper, the acceleration sensor's fault-tolerance principles and methods are discussed based on the practical running conditions of the first tilting train in China. We conclude that the 2/3(G) vote redundant fault-tolerance strategy is feasible for the acceleration sensor's fault-tolerance technology. All the algorithms are simulated based on data from a practical track line. A program has been developed in standard C for the selected algorithms, which can be transplanted to a DSP system easily. This program performs well in the DSP system especially designed for tilting trains and has passed the preliminary testing on the line.
References
1. R.M. Goodall: Active Railway Suspensions: Implementation Status and Technological Trends, Vehicle System Dynamics, 28 (1997), 87-117.
2. A.C. Zolotas, R.M. Goodall and G.D. Halikias, New control strategies for tilting trains, IAVSD 2001, Copenhagen, Denmark, Sept. 2001.
3. http://en.wikipedia.org/wiki/Tilting_train.
4. R.M. Goodall, A.C. Zolotas, J. Evans: Assessment of the Performance of Tilt System Controllers, RAILTEX 2000, Birmingham, UK, Nov. 2000.
5. A.C. Zolotas, G.D. Halikias, R.M. Goodall: A Comparison of Tilt Control Approaches for High Speed Railway Vehicles, Proceedings of ICSE 2000, Coventry, UK, 2 (2000), 632-636.
6. J.T. Pearson, R.M. Goodall, I. Pratt: Control System Studies of an Active Anti-Roll Bar Tilt System for Railway Vehicles, Proceedings, Institution of Mechanical Engineers, 212 F1 (1998), 43-60.
7. T. Hill, P. Lewicki: STATISTICS Methods and Applications. StatSoft, Tulsa, OK, 2006.
A RISK-RISK ANALYSIS BASED ABSTRACTION APPROACH TO SOCIETAL PROBLEM-SOLVING IN NUCLEAR SYSTEMS
SUMAN RAO
NIS Sparta Limited, Dhirubhai Ambani Knowledge City, G Block, 2nd Floor, Wing 3, Koparkhairane, Navi Mumbai, Maharashtra 400 709, India
[email protected]
Societal problems in nuclear systems are typically complex and characterized by scientific uncertainties. This paper describes an abstraction approach to solving such problems by using Risk-Risk Analysis (RRA). A Fuzzy Analytic Hierarchy Process (FAHP) framework is used to combine RRA elements with the normative elements of the Precautionary Principle (PP) and inform regulators on the relative ranking of possible regulatory actions/safety alternatives. To illustrate the utility of this framework, the deep geological repository experience (1991-2006) of the French Nuclear Safety Authority, Autorité de sûreté nucléaire (ASN), is abstracted and solved with hypothetical fuzzy rankings. The results demonstrate that such a framework could indeed help demonstrate objectivity and transparency in regulatory decision-making to multiple stakeholders.
1. Introduction
1.1. Radioactive Waste Management in France [1], [2]
One of the key societal nuclear problems characterized by scientific uncertainties is the management of high-level and long-lived intermediate-level radioactive wastes (referred to as radwastes in this paper). Nuclear safety authorities like the ASN of France are required by society to demonstrate their objectivity and transparency in radwaste management decision-making, due to the long-term Health, Safety and Environment implications involved for the current and future generations. In France, a draft radioactive waste management regulation has currently been tabled in the French National Assembly that formally declares a deep geological repository as the reference solution for managing radwastes in the long term. Similar to all key nuclear regulatory decisions, this decision is also a subject of much public debate and stakeholder concern.
2. Stakeholder Concerns And Abstraction Using RRA
Table 1 outlines a sample of societal stakeholder concerns. These need to be incorporated objectively in the regulatory decision-making framework. The
conceptual and computational complexity involved in objectively incorporating such concerns requires a suitable form of abstraction in problem solving. In this paper, the societal problem-situations are abstracted using a RRA approach. This approach is based on the premise that reducing risk in one area can often lead to increases in risk in other areas, or at other times [3], and hence any action should ideally be taken keeping in view the net risk-tradeoffs. This is at the core of the 'against-arguments' expressed by stakeholders (see Table 1).
Table 1. Stakeholder Concerns and Abstraction Using RRA
S.No
1
2
3
Issue
Technical feasibility [4] of the storage site
Economic incentives and devpt. at local site [5], [6]
Sustainable Devpt. (Underground Water Resources) [7]
Against arguments
Risk-Risk perceptions identified
Derived R-R Elements that need to be further analyzed for tradeoff decisions.
IRSN: underground storage technique 'appears feasible'
Campaign group 'Get out nuclear' calls technical feasibility 'worse than dubious' and 'a crime against future generations'
R-R 1: Risk of regulatory measures not being commensur ate with desirable levels of protection to society.
Internal Hazards (IH), External Hazards (EH), Cultural Hazards (CH)
General CouncilMeuse: 'an unexpected opportunity for area to develop its local economy' subject to the reservation that disposal, if any, should be reversible CNE: Aquifer layers are 'little permeable' and part of the water is 'very salty'. (Earlier CNE gave a different impression).
Economic incentives provided for hosting not clear-is it a compensation for 'hosting' the lab site or 'bribery'. Presence of a "nuclear waste dump" may be dangerous for the local industry like agriculture, Champagne etc. CNE observations are inconclusive as regards the threat of underground water contamination. Is it not necessary to protect all Aquifers-present & future?
R-R 2:Risk of Triple Bottom Line (TBL) benefits not commensur ate with costs.
TBL elements Cost Benefit Analysis (CBA), Environmental Sustainability (ES), Social Responsibility (SR)
R-R 3: Risk of inequity.
Current Generation People (CGP), Future Generation People (FGP), Current Generation Environment (CGE), Future Generation Environment (FGE)
For arguments
3. FAHP Problem Formulation
The goal is to aid ASN in choosing the 'safest' option for radwaste management from the following available alternatives: a1 - maintain the status quo with existing radwaste management methods, a2 - construct a deep geological radwaste repository in France, a3 - participate in an international deep geological repository.
3.1. Integrating Precautionary Elements in the problem formulation
Formally spelt out beginning from the Maastricht Treaty [8], a key precept that guides regulatory decision-making under scientific uncertainty in Europe is the PP. Fig. 1 displays a 'normative' concept net prepared from the communication of the Commission of the European Communities (EC) on the PP [9], from which normative, precautionary elements have been extracted.
Fig. 1 Normative Concept net of the PP based on yr.2000 EC Communication [8]
3.2. Integrating R-R Elements in the problem formulation Table 2. Linking R-R elements with the normative PP elements in the Hierarchy.
S.No.
L2 Elements (based on linguistic marker 'should' and semantics as per the EC PP communication)
Correspondi ng Risk-Risk element from Table 1
1
Proportionality
R-Rl
2
Non-Discrimination/ Consistency
R-R 3
3
Examining Benefits and Costs (Triple Bottom Line)
R-R 2
Associated L3 elements
Internal Hazards (IH), External Hazards (EH), Cultural Hazards (CH) Current Generation People (CGP), Future Generation People (FGP), Current Generation Environment (CGE), Future Generation Environment (FGE) Cost Benefit Analysis (CBA), Environmental Sustainability (ES), Social Responsibility (SR)
848
Analytic Hierarchy Process view of the French Deep Geo Disposal Repository Problem Selecting The Safest* Long Trim M»n>igrineiil Option for High le\el mid Long IJved Bndvmsle in France
NiMi-dte'Ciiiituiiilioii
Proportionality
..aMcoiw^teuej:.
Z,, F
*safety iu the face of scientific uncertainty as characterized by the Precautionary Principle Fig. 2 FAHP Decision-making Hierarchy Table 3. Sample Questions to obtain Fuzzy Ranking from Regulatory DecisionMakers
Ql
Q2
Q3
How important is the criteria of proportionality as compared to the NonDiscriminatory Nature and Consistency? How important is the impact of unforeseen Internal Hazards on safety levels as compared to unforeseen Cultural Hazards? How beneficial is a2 over a3 in terms of economic cosl-benefits?
Ranking Preferences (7/2,4,9/2)- Absolute, (5/2,3,7/2)- Very Strong/Important/beneficial (3/2,2,5/2) - Fairly Strong /Important/beneficial, (2/3,1,3/2)- Weak /weak importance/weak benefits, (1,1,1 )-Equal/Equal Importance/equally beneficial.
It is important that this ranking is not arbitrary and reflects the norms and consensus of the decision-making organization (ASN). 4. Solution To The FAHP Problem With Hypothetical Fuzzy Rankings Table 4. The Summary Fuzzy Evaluation Matrix of the French Radwaste management alternatives. Alt. Priority wt
P
NDC
TBL
Wts.
0.33
0.33
0.33
al
0.41
0.24
0.16
0.25
a2
0.34
0.53
0.69
0.51
a3
0.24
0.23
0.15
0.20
849 The solution to the FAHP problem has been arrived at using Chang's extent analysis as illustrated by Kahraman et al. [10] using hypothetical fuzzy preference rankings. From Table 4 the alternative that ranks highest in terms of safety as per the normative precautionary approach is a2-i.e. A deep geological repository option for the radwastes. 4.1. Advantages Of Using The Above Method 1.
2. 3.
It increases objectivity and transparency in decision-making by enabling objective comparison of all feasible alternatives against the same decisionmaking criteria. It uses a 'normative' Precautionary approach, which is Europe's stated regulatory precept of dealing with scientific uncertainties. Through the RRA abstraction, the method objectively integrates societal concerns into regulatory decision-making criteria thereby lending credence to multi-stakeholder involvement in regulatory decision-making.
5. Further Directions And Improvement Further directions involve using more rigorous methods like Force Field Analysis for analyzing societal concerns, developing suitable clustering methods for classification of risk-risk issues at L2 level, incorporating the French national understanding of the Precautionary Principle into concept net (which is currently based at EC level), developing objective guidelines for fuzzy preference ranking in the decision-making organization ASN, etc. Also the above FAHP model could be extended to form a scenario-building model, which could then seamlessly integrate the impact of other developments that arise in the strategic context of nuclear safety regulation for radwastes. E.g. the success of transmutation research. 6.
Conclusion
This paper described how a RRA based abstraction approach could be used in solving societal problems in nuclear systems that are typically characterized by scientific uncertainties. The approach involves combining the normative elements of the Precautionary Principle with the Risk-Risk elements in a Fuzzy AHP decision-making hierarchy. Such an approach demonstrates the objectivity and transparency in regulatory decision-making by ranking the 'safest' alternative in comparison to competing alternatives based on net risk-risk tradeoffs. To demonstrate the use of this approach, the French (ASN) experience
850 in choosing the safest long-term alternative in radwaste management (deep geo repository) was modeled using hypothetical fuzzy preferences. 7. Acknowledgments I am grateful to M. Ludo VEUCHELEN (SCK.CEN), Prof (Dr.) Da RUAN (SCK.CEN), Prof. (Dr.) Etienne Kerre (Ghent Univ.) and Ms. Marianne LAVERGNE (ASN) for their help in making this paper happen. I owe my gratitude to my husband for his help in checking computations as occurring in this paper. I devote this paper to the precious affection bestowed on me by my Father and my elder Brother. References 1.
Fact Sheets on the regulatory control of radioactive waste management in NEA member countries, The Control Of Safety Of Radioactive Waste Management In France, NEA, OECD, Apr 2005 2. Web Translations of the draft French Bill PROJET DE LOI de programme relatifa la gestion des matieres et des dechets radioactifs, March 2006 3. R.M.Males, Beyond Expected Value: Making decisions under Risk and uncertainty A report submitted to: U.S. Army Corps of Engineers, 2002, 74 4. Staff writers, Terradaily News about Planet Earth, Article French Nuclear Watchdog Gives Thumbs-Up To Deep Waste Burial, Jan 2006 http://www.terradailv.com/reports/French Nuclear Watchdog GivesThum bs Up ToDeep Waste Burial.html 5. COW AM Secretariat, Bure case Study, Paris, Feb 2005 http://www.cowam.com/article.php37id article=32. 6. A. Ferron, Confrontations Europe, Summary of the Entretiens Conference, 'Debate on the Management of Nuclear Waste, Projects Against Fear', Nov 2004 http://www.confrontations.org/publications/lettres/69/artferron_en.php 7. M.B.Davis Nuclear France: materials and sites, 2004, http://www.francenuc.org/en sites/lorr bure e.htm 8. J.L. da Cruz Vilaca, The Precautionary Principle in EC law, European Public Law, Vol 10 Issue 2, Kluwer Law International, Jose Luis, 2004 369 9. Summary of Communication from the Commission on the Precautionary Principle, Communication of the European Communities, Brussels, 2000 10. C.Kahraman, U.Cebeci, D.Ruan, Multi-Attribute Comparison Of Catering Service Companies Using Fuzzy AHP: The Case Of Turkey, International Journal of Production Economics, Elsevier, 2003,174-177
A FUZZY LOGIC METHODOLOGY FOR OPEN SOURCE INFORMATION SYNTHESIS IN A NON-PROLIFERATION FRAMEWORK7 ISABELLA MASCHIO Department ofNuclear Engineering, Politecnico di Milano, via Ponzio 34/3, 20133 Milano, Italy Abstract The international nuclear proliferation control regime is facing the necessity to handle increasing amounts of information in order to assess States' compliance with their safeguards undertakings. An original methodology is proposed that supports the analyst in the State evaluation process by providing a synthesis of open source information. The methodology builds upon IAEA nuclear proliferation indicators and develops within the framework of fuzzy sets. In order to aggregate the indicators' values, both the reliability of the information sources and the relevance of the indicators are taken into account. Two distinct logical systems based on fuzzy sets are addressed to identify appropriate aggregation tools: possibility theory and fuzzy logic.
1.
Introduction
In the last decade, the International Atomic Energy Agency experienced a substantial reinforcement of its role: from verifying the non-diversion of safeguarded nuclear material to giving reasonable assurance about the absence of undeclared nuclear material. As a consequence, under the Additional Protocol, States provide a wider range of information about materials, equipment, know how and sites, directly or indirectly related to the nuclear fuel cycle. By confronting the information supplied in States declarations with inspections' results and other information sources, including open source, analysts assess both the correctness and completeness of the declarations. A decision support system has been developed to provide a synthesis of open source information. Based on the IAEA's Physical Model [5. ], it is constituted by an acquisition and analysis module, where information is drawn from open sources and values are assigned to proliferation indicators, and an aggregation and output module, where the information is synthesized. In order to adapt to the specific nature of open source information, indicators are assigned linguistic values. As a consequence, fuzzy sets [8. ] are adopted to compute with them and the universe of discourse, the interval of values each proliferation indicator can assume, is subdivided into five fuzzy sets: DN, definitively not [present], RN, rather not [present], UN, uncertain, RY,
1
Excerpt from PhD thesis [7. ]
851
852 rather yes (present), DY, definitively yes (present). Moreover, Gaussian-type -(x-c)2 membership functions (ux = e ) are adopted.. For the purpose of the present work, two characteristics of information are further considered: relevance and reliability. As far as relevance is concerned, the Physical Model gives a detailed description of all known elements for the acquisition of sensitive nuclear material or technology. Elements have been assigned a strength considering the importance they assume in the acquisition path. Hence, each indicator can be of strong, medium or weak strength: e.g. a centrifuge is a strong indicator of the presence of the UF6 enrichment process by gas centrifugation, a molecular pump is a medium indicator and titanium alloys is a weak indicator. Clearly, relevance is a technical characteristic of the indicator. Differently, reliability of information is related to the source providing it: a list of referenced sources has been established by experts and each of them has been ranked according to its reliability level: high, medium or low [7. ]. Finally, in order to compensate somehow the arbitrariness introduced in setting up the methodology, three principles have been outlined that ensure a) the transparency of methodology, b) the multiplicity of sources (indicators can be assigned multiple values from multiple sources, not necessarily coherent with each other) and c) the traceability of information. Hereinafter, the aggregation methodology is outlined (2.), followed by an example (3.) and conclusions (4.). 2. The Aggregation Methodology Shortly, the scope of the present methodology is to combine a number of linguistic indicators' values into one synthesis numerical value. The aggregation methodology proceeds in two phases: phase 1: to synthesize the information about the presence of each single indicator; taking into account the reliability of the sources; phase 2: to synthesize the information about the present of a single technology, taking into account the relevance of the indicators. 2.1.
Phase 1: aggregation according to reliability
A narrow examination of the first aggregation phase allows identifying two substeps: step 1: for each indicator, to gather the values provided by information sources of equivalent reliability level, thus obtaining three components: a high-reliability component, a medium-reliability one and a lowreliability one;
853 step 2: to combine those three components, giving priority to that of higher reliability. In the end, one single value is obtained for each indicator. Possibility Theory, as derived from Zadeh's Theory of Approximate Reasoning [9. ], is an appropriate framework to cope with this issue. Further to the definition of fuzzy sets [11. ], Zadeh introduced fuzzy relations and fuzzy restrictions [10. ] so that a possibility distribution (nK) is defined as the set of all the possible values assumed by a variable. Hence a possibility distribution can assume the same analytical formulation as a membership function (ux). In the present case, the membership functions associated with the linguistic values attributed to indicators are considered as possibility distributions (7ix = ux). Possibility distributions are the starting point of Dubois and Prade Possibility Theory [1. ] [2. ] [3. ]. As it has been stated in early works, the semantic associated with the Possibility Theory intrinsically differs from that of probability theory or belief theory, and specific tools are needed to operate in this framework. Among them, combination operators have been studied in depth [4. ]. Hereafter, the selection of combination operators based on the characteristics of the information to be aggregated is 4. briefly illustrated. 2.1.1.
Step 1: conjunctive operator
The aim of the first aggregation step is to combine values assigned by sources with equivalent reliability level, i.e. interchangeable sources: it is assumed that only the information common to both sources is valuable. Actually, the most appropriate operators for this task are conjunctive operators that perform the intersection between two sets of values. Simple conjunctive operators are min and prod. In addition, operators fulfilling the idempotence property (like min) produce a reinforcement effect on the result whenever input values are repeated. However, reinforcement may be dropped when no information is available about the independence of the sources. When the positions of the sources are distant, the result of a conjunctive combination is sub-normalised. A normalisation operation can be performed in order to smooth the amplitude of the discrepancy between the sources, which is measured through the "conflict index" [2. ]. These few considerations drive to the selection of the normalised min operator: n
"
W
=
min(n,n 2 ) urn rr \
where II i and n 2 are the distribution function associated to two information sources and h (n^ n 2 ) is the conflict index:
854 /2(n,,n 2 ) = max(min(n,,n 2 )) As a result of this first aggregation step each indicator is associated with three components: one of high, one of medium and one of low reliability level. 2.1.2. Step 2: disjunctive operator The aim of the second step is to produce one single value for each indicator, by combining the three components obtained in step 1. According to the principle of multiplicity of sources, none of them is disregarded, but higher priority is given to more reliable ones. Disjunctive operators perform the union between two sets of values. Adopting a disjunctive operator is a conservative attitude, aiming at taking into account all the information available. A typical disjunctive operator is max. Yet, since higher reliability sources are to be privileged, a prioritised combination rule [2. ] is assumed, that takes into account the whole of the information provided by the most reliable source, whereas the less reliable source is considered only as far as it is in agreement with the first one. A prioritised disjunctive operator, here max, is selected: ^PR-MAX
= max
l n,,min
where ri| is the most reliable distribution. The first phase of the aggregation process is concluded when a single value is obtained for each indicator. 2.2. Second phase: aggregation according to strength In the second phase, data are first homogeneously combined by strength, then the components obtained are aggregated. Therefore the second phase is subdivided in two steps: step 3: for each technology, all the strong indicators are aggregated in one unique strong component, medium indicators in a medium component and weak indicators in a weak component; step 4: strong, medium and weak components are aggregated considering their respective strength. The final result of the second phase is a value for the synthesis indicator of the presence of a given technology. 2.2.1. Step 3: conjunctive operator The third step is similar to the first one in that it aims at combining equivalent pieces of information, here in terms of strength. Therefore a conjunctive operator is selected. However, and differently from step 1, it is assumed that the
855 larger the number of indicators involved, the more precise the result obtained: hence a reinforcement effect is sought. The prod operator (conjunctive, non idempotent) fulfils these criteria. Moreover, normalisation is performed, as in step 1. The normalised prod operator is chosen: n
1Vra
_ j?roJ(rii,n 2 )
"
Aai„n2)
2.2.2. Step 4: fuzzy inference system. At step 4, the aggregation of the previously obtained components is performed. A knowledge-driven instrument is necessary in order to interpret how the simultaneous presence of components of difference relevance combine to draw out a value for the synthesis indicator of the presence of a given technology. Within the fuzzy logic theory, fuzzy inference systems have been developed as rule-based systems where the logical knowledge about the process is enclosed in a set of rules [6. ]. Here, the three components that have to be combined represent the input to the fuzzy inference system; the synthesis indicator of the presence of a given technology is the output. The set of rules specifies how the simultaneous presence, to various extent, of the three input components results in the final assessment of the presence of a given technology. It establishes how uncertainty and relevance of information combine in order to produce an assessment of the presence of a given technology. The rule-base is obtained by allowing the three components (S: strong, M: medium, W: weak) assuming the five linguistic values in the universe of discourse: DN, RN, UN, RY, DY. The 125 rules are shown in table 1, table 2, table 3, table 4 and table 5. The result of the fourth aggregation step is the synthesis indicator's value (synth), a fuzzy set that is converted into a crisp value, by means of the centre of mass defuzzyfication method [6. ]. table 1- Rules for W=DN
table 2 - Rules for W = RN
M\S
DN
RN
UN
RY
DY
DN
DN
RN
RN
UN
DY
RN
DN
RN
RN
UN
DY
UN
RN
RN
UN
RY
DY
RY
UN
UN
RY
RY
DY
DY
UN
RY
RY
RY
DY
M\S
DN
RN
UN
RY
DY
DN RN UN RY DY
DN DN RN UN UN
RN RN RN UN RY
RN RN UN RY RY
UN UN RY RY RY
DY DY DY DY DY
856 table 3 - Rules for W=UN DN RN MS UN DN RN
DN
RN
RN
DN
RN
RN
RY
DY
UN
DY
UN
DY
table 5-Rules for W = DY DN RN UN
ms DN RN
RY
DY
DN
RN
UN
RY
DY
RN
RN
UN
RY
DY DY
UN
RN
RN
UN
RY
DY
UN
RN
UN
UN
RY
RY
UN
UN
RY
RY
DY
RY
UN
UN
RY
RY
DY
DY
UN
RY
RY
RY
DY
DY
UN
RY
RY
RY
DY
table 4 - Rules for W = RY MS DN RN UN
RY
DY
DN
DN
RN
UN
RY
DY
RN
RN
RN
UN
RY
DY
UN
RN
UN
UN
RY
DY
RY
DY
RY
DY
RY
UN
UN
RY
DY
UN
RY
RY
3. Example 3.1. Input data It has been supposed that a given technology (e.g. UF6 enrichment by gas centrifugation) is described through nine indicators: three strong (i314, i315, i316, e.g.: frequency changer, rotor tube, magnetic suspension bearings), three medium (i200, i201, i202, e.g.: molecular pump, motor stator, centrifuge housing) and three weak (il50, il51, il52, e.g.: high strength aluminium, titanium alloys, pressure measurement instruments). It was also assumed that the information sources supplying information about these indicators are: three high-reliability sources (SI, S2, S3), two medium-reliability (S4, S5) and three low-reliability (S6, S7, S8). Fictitious values are given in table 6. Table 6 - Input values MEDIUM reliability
HIGH reliability SI
II II It
1314
1
UN
1315 1316
S4
RY
UN
UN
RN
RY
DY
RY
|
DY DN
LOW reliabhty
1
S7
S5
S6
DN
RN
UN
RN
RY
UN
DY
DN
UN
RY
DY
DN
|
S8 DY
DN ,
RN
UN
RY
1151 i152
S3
UN
i202 1150
I
DY
i200 1201
S2
DY
UN
1
DY
DY
UN
RY
DY
i
DN
UN
RN
UN
UN
)
UN
DN
I
RY
DY
;
RY
i
UN DY
1
857 3.1.1. Step 1 The normalised min operator is applied to the three high reliability values for the i314 indicator (UN, RY, UN); the result is the high reliability component i314H ( figure 1). The same operation is performed for the two medium reliability values and for the three low reliability values, resulting in the i314M and i314L components. 3.1.2. Step 2. Yet, the three components of the i314 indicator obtained are combined through a prioritised max operator. The result is given in figure 2. Due to low agreement between high reliability sources (i314H) and medium reliability sources (i314M), die latter contribution to the result is sensibly reduced. Moreover the low reliability component (1314L) is totally absorbed by the high reliability one (i314H). 3.1.3. Step 3. At the third step, the single values obtained for each strong indicator (i314, i315, i316) are combined into a strong component (iS) by means of the normalised prod operator (figure 3); analogously for the medium indicators (i200, i201, i202) which are combined in iM and for the weak indicators (il50, il51, il52) in iW. 3.1.4. Step 4. At the fourth and last aggregation step, the input variables to the inference system are the strong (iS), the medium (iM) and the weak components (iW); the output is the synthesis indicator (synth) represented in figure 4. After defuzzyfication (centre of mass), the crisp value obtained for the synthesis indicator of the presence of UF6 enrichment by gas centrifugation is: 0.4472. 4. Conclusions The present work outlines an original methodology for open source information synthesis based on the IAEA Physical Model indicators' and on fuzzy logic. A decision support system has been developed based on this methodology [7. ] as a contribution to enhance the effectiveness of the information analysis process, and, to some extent, the efficiency of the State evaluation process which is today at the centre of safeguards systems. By providing a synthesis of conspicuous amounts of information, the decision support system reduces analytical work at
858 the lower level, while experts' evaluation capacity at a higher level is not brought into question.
figure 1 - First aggregation step SrdagareaBlienlwiS
figure 2 - Second aggregation step
flgure 3 . Third aggregation step J
' i;ytgtfon sB»p: lyrtlieiia Indicator
figure 4 - Fourth aggregation step
References 1. D. Dubois, H. Prade, J. Approx. Reasoning, 2(1), 65-87 (1988). 2. D. Dubois, H. Prade, Control Eng. Practice, 2 (5), 881-823 (1994). 3. D. Dubois, H. Prade, Data Fusion in Robotics and Machine Intelligence, M. A. Abidi, R. C. Gonzalez (eds.), Academic Press Inc. (1992). 4. D. Dubois, H. Prade, Fuzzy sets and Systems, 142, 143-161 (2004). 5. IAEA, Physical Model, STR-314,1 to 8 (1999) and 9 (2000) (restricted). 6. M. Marseguerra, Notes on Introductory Fuzzy Reasoning, Politecnico di Milano (2003). 7. I. Maschio, "A Fuzzy Logic Decision Support System for Open Source Information Analysis in a Non-Proliferation Framework", PhD thesis, Politecnico di Milano, Italy (2005). 8. L. A. Zadeh, IEEE transactions on Fuzzy Systems, 4 (2), 103-111 (1996). 9. L. A. Zadeh, Machine Intelligence, vol. 9, J. Hayes, D. Michie, L. Mikulich (eds), Wiley and sons( 1979). 10. L. A. Zadeh, Fuzzy sets and Systems, 100, 9-34 (1999). 11. L. A. Zadeh, Information and Control, 8, 338-353 (1965)
A FINANCIAL-OPTION METHODOLOGY FOR DETERMINING A FUZZY DISCOUNT RATE IN RADIOACTIVE WASTE MANAGEMENT P. L. KUNSCH ONDRAF/NIRAS,
14 Avenue des Arts BE-1210 Brussels, Belgium
p.kimsch&mirond.be
ABSTRACT. With the present probabilistic analysis the author proposes a procedure based on financial-option theory for determining a triangular fuzzy number representing the discount rate for use in the economic calculus of radioactive waste management. The adequate present funding to be set aside for the future is equal to the net present value (NPV) of the future costs, including technical-scenario uncertainties. For taking into account the financial uncertainties, the NPV is identified with the strike price of a European put option calculated with the Black-Scholes formula, and the asset value in the managed fund is identified with the current price.
1. Scope of the analysis The choice of a discount rate for comparing costs distributed over long periods of time is a well-known but difficult task, discussed by many authors. Some review of the existing literature on this topic can be found in Lind [1], Kunsch [2], Sumaila et al. [3]. In this paper the author addresses the discount-rate choice problem from the funding perspective of radioactive-waste management. In this particular setting the open literature is to our knowledge limited, if not inexistent. The proposed original approach has been used by ONDRAF/NIRAS [4], the Belgian radioactive waste management organization (WMO) for determining the funding level of its activity. In section 2 the basic assumptions are discussed for applying the proposed methodology. Section 3 describes the principles of the probabilistic approach based on option theory which is used. Section 4 details how to define the uncertain discount rate as a triangular fuzzy number. Section 5 gives short conclusions. 2. Basic assumptions of the model In this paper a general nuclear-waste-management project is considered to be properly funded at the inception time, i.e., just before the project starts at time 859
860 t=0. It is assumed that the overall budget of the project, including uncertainty margins, is completely known. A fuzzy approach to describe technical uncertainties can be found in [5] and references therein. Therefore the fuzzy evaluations in this paper will not at all deal with these aspects, which have to be taken into account in addition. The present paper exclusively deals with the evaluation problem of the discount rate (D.R.) for calculating the Net Present Value (NPV) at t=0 of the total cost distributed over a given extended time horizon T. Although it may seem unrealistic to consider funding over time periods of one century or more, this is indeed the case ONDRAF/NIRAS is facing with the so-called Long-Term Fund: the latter is supposed to finance the whole future cost streams of the high-level-waste repository, inclusively some institutional control for several decades after the site closure. The initial funding S0 is to be set aside at t=0 in a fund, managed according to a specific strategic asset allocation (AA). The stochastic rates of return fj. are expressed in constant currency units at t=0; possible AA's are as follows: A low-risk portfolio with 100% bonds. The couple (rate of return //=2,30%; standard deviation (7=3,90%) is estimated from historical data series; A medium-risk portfolio with 50% equities/50% bonds. The couple (rate of return jU =5%; standard deviation <7 =10%) is estimated from historical data series. Etc. Further, the assumption is made of Brownian motion for the asset evolution, so that the resulting distribution of the asset price ST at t=T is a lognormal distribution. T is the portfolio liquidation date. Several T-values are considered for the funding: T= 5, 15, 25, 50, 100 years are typical for radioactive-waste management. More formally, In S(T) is normally distributed (see [6] and [7]):
\nST ~0> l n S 0 + ( / / - c r 2 / 2 ) 7 \ c r V r
(1)
where S0, ST represents the value of the value of assets in t=0 and t=T respectively; (&(m, s) represents the cumulated normal distribution of mean value m, and standard deviation s. Calling a the discount rate (D.R.), and X(T) the liability at the time horizon t=T, the funding S0 at initial time t=0 set aside for financing X(T) will have to satisfy the equality: assets = liabilities, thus:
Sa = X(T)eal
(2)
Note that the D.R.=a will always be smaller than the real mean rate of return fj, of the portfolio, even for an infinite time horizon T, because of the return volatility measured by the standard deviation G. The available assets in t=T are represented by the lognormal distribution defined in (1). Adequate funding situations in T will impose:
861
ST>X(T)
(3)
Under the given assumptions, it is easy to calculate the probability that the asset value, which is represented by the stochastic variable ST, will be inferior to the supposedly deterministic liability value X(T), covered by the funding, i.e., to evaluate the risk of insufficient funding ('underfunding') at time T. This is illustrated in Fig. 1. Given the compounded return and standard-deviation data of each specific portfolio, and using the initial asset value S0, calculated from the known values of X, T, and the D.R., it is now very easy to calculate the "Confidence Level" ('CL'), i.e., the actual probability for X being sufficiently on the left-side of the probability distribution shown in Fig. 1. This probability is represented by the surface underneath the distribution to the right of X; the surface to the left of X thus represents the risk of not achieving the liability target at time T.
probability distribution o o o o o o o o o ro -£*• 05 Co
Lognormal distribution
X
ji )
0,5
i I
/ ~ \
:
v
COG 1
1,5
1 • i
i
\ 2
2,5
asset value Figure 1. The log-normal asset distribution in arbitrary units; the cost value (liabilities) X must be smaller than the average fund value (assets). The confidence level (CL) for having sufficient funding is given by the surface below the distribution to the right of X. As explained in section 3, the surface to the left is equal to the probability of exercising the put option. The value of the option can be estimated by the distance between X and the projected center of gravity (COG) of the described surface on the asset axis.
A convenient way of performing the risk calculation is to use the formalism of option theory explained in the following section. 3. The put-option theory and the Black-Scholes formula A put option as explained in [6], [7] is an asset protection instrument: it covers the risk that the price ST, i.e., the asset value at time T, falls below the liability value X(T), called the strike price in financial-option theory. The value of the
862 put option is equal to the expected value E[.] of the difference between those two prices, resp. values, as follows: Put option price=P=E[max(0,X-ST)]
(4)
The price of the put option P is given by the celebrated Black-Scholes formula, we write here for the put pricing (rather than for the call pricing, as usually done):
P = XerJ [1 - N(d2)] - S[\ - N(dt)] di = d2 +
7= In
ayff
(5) =-
Xe~rT
CT-N/T
2
where: N(.) is the standardised normal cumulative distribution of mean value 0 and standard deviation 1 r is the risk-free rate of return. In our model, r= fJ., the expected portfolio return; N(d{)represents the so-called 'delta', i.e. the derivative of the callprice with respect to the price of the assets. 1 - N(d2) represents the probability of exercising the put option, i.e. the probability that S < X . In our interpretation, we have the probability that the assets will be inferior to the liabilities X(T) at time t=T. This probability gives the confidence level for having sufficient assets for the funding. S = 5 0 is the value of the assets placed in the fund at t=0. The strike price X=X(T) represents the future value of the total liability of the fund at T, called in option theory the exercise time. Remember that in this paper we consider only financial uncertainties. We assume that all technical uncertainties have been taken into account when calculating the X(T)-value using the methodology described in [5]. With this assumption the liability value is deterministic for each time horizon T, but the asset value is a stochastic variable. Financial-option theory it is only used for obtaining statistical properties: of course there is no need here to buy options on the specialised market. No additional assumptions beyond those explained in section 2 are necessary for using the Black-Scholes formula. The put-option price P is shown to be equal to the average expected asset value represented by the surface left to the strike price X in the distribution of Fig. 1. As discussed before, this average value represents at the same time the
863 underfunding risk at time T. The price is equal to the distance between X and the center-of-gravity (COG) of the left wing of the probability distribution at time T: it provides a measure of this risk. An important result for the present discussion is that the BS-formula gives at the same time the confidence level CL=N(d2), which measures the probability of not exercising the option because there is no underfunding; the probability of exercising the option for protection is thus 1-CL, equal to the surface of the normalised probability distribution to the left of X. The maximum expected amount by which underfunding occurs at time T corresponds to a value farthest left of the log-normal asset distribution, which can be satisfactorily approximated as follows: Maximum expected underfunding ~ 3* P
(6)
The option price P is given, as discussed above by the distance of the COG from X in the left wing of the asset distribution at time T (Fig. 1): the latter can be approximated by a rectangular triangle: from the property of the COG it directly comes that the vertex farthest left of this triangle is located at three times the distance of the COG from X. By means of the Black-Scholes formula, acceptable values for the discount rate can be identified, which fulfill some minimum requirements of the WMO, in order to keep low the funding risk at any time T. The first minimum requirement is that the confidence level for sufficient funding would be at least 50%, because otherwise, as can easily be shown, the average asset value at time T would come below the liabilities X(T), which is clearly unacceptable:
CL(T,D.R)> 50%
(7)
CL should ideally be increasing with increasing T as the relative volatility of the portfolio, and thus the funding risk, both decrease with T:
dfCL(T,D.R.)] -± 11 > o dT
(8)
It can be decided in addition that the maximum expected underfunding given by Eq. (6) represents a additional Initial Funding Premium (IFP) to be included into the initial funding, herewith reducing the underfunding risk to a low value: IFP = 3*P
(9)
To be acceptable, it seems logical to impose that the funding premium would not be larger than a maximum percentage of X, e.g., 20%.
864 IFP<20%X
(10)
An additional, but less stringent, condition could be imposed on the IFP, which should ideally decrease with T:
driFP(T,D.R.)l -i
^ < 0
(11)
4. Defining a triangular fuzzy number for the discount rate A discount-rate range can be established on the basis of the requirements given in eqs. (7) to (11), in a way which is either just acceptable (maximum value), or reasonably conservative (minimum value). Also a centrally located value with average quality regarding those requirements can be obtained. In this way, the D.R. can be represented by a triangular Fuzzy Number (FN) [D.R.(min), D.R.(central), D.R.(max)]. As an example the analysis results are presented in the case of the 100% bond portfolio: The maximum value D.R.(max)=2% is adopted has being just acceptable: the CL increases from 55% to 72%, which is a rather poor but just acceptable performance; note that the subsidiary eq. (11) is not fulfilled. The central value with reasonable properties is obtained for this portfolio as being D.R.(central)=1.7%. The CL increases from 62% to 91%; eq. (11) is fulfilled, except for T<15. The choice of a minimum D.R. value depends on how strict one will be with the CL: a minimum CL=70% results in D.R.(min)=1.3%. The triangular FN [1.3%; 1.7%; 2%] is used to compute the funding using VENSIM© [9], a simulation code based on System Dynamics: the approach consists in integrating the discounted costs over the time-period from year 2005 up to year 2100. Note that, because of this integration, the final result is only available at this termination date. All operations performed during the simulations are arithmetic ones, so that the fuzzy funding is also represented by a triangular-like FN. To better visualise this FN, many funding traces are generated with VENSIM© according to the methodology developed in [8], Consider some Ct -cut of the triangular FN representing the D.R., and its application to the interval la =\cc,2 — a] centered on 1. A random number V e Ia distributed according to a triangular probability law replicating the membership grades within the or -cut is immediately obtained as follows:
865 Firstly, a random variable interval[a m i n ,6 m a x ], such that: a = # / • / ) u min /2 '
max
=1_or
g
is
uniformly
sampled
/ /2
in
the
(12)
Secondly, the random variable V e /^ is calculated as being: (13)
v = 2-V2(l-^)
500,000
400,000
300,000
200,000 2095
2096
2098 Time (Year)
2099
2100
Figure 2. The visualisation of the fuzzy D.R. by means of random sampling within the CC -cut values (OC =0). The central darkest trace corresponds to the central D.R. =1.7%
866 5. Conclusions In this paper an option methodology has been presented for calculating a fuzzy discount rate to be used for funding radioactive-waste management. The BlackScholes formula has been used because it is convenient for evaluating the risk of insufficient funding. It is assumed that the liabilities are deterministic, because they are supposed to include sufficient contingency factors like described in [5], while the asset value in the fund is stochastic, because its volatility depends on the adopted asset allocation (AA). It results from computational experiences with different AA's that it is advisable to use a D.R. values significantly smaller than the average portfolio return. Even with a low D.R. the risk of underfunding is not eliminated. It is recommended to include in the funding a premium, the value of which is equal to three-times the put-option price. References 1. Lind, R.C. (Ed.) Discounting for Time and Risk in Energy policy, John Hopkins University press for resources for the Future, Baltimore (1982). 2. Kunsch, P.L. Environment and Multigenerational Social Choices", in, M. Paruccini (Ed.), Eurocourses Environmental Management, Vol. 3, Kluwer Acad. Publishers, 199-211 (1994). 3. Sumaila, U.R., and Walters, C. Intergenerational discounting: a new intuitive approach, Ecological Economics, 52, 135-142 (2005). 4. ONDRAF/NIRAS (2004) Website www.nirond.be 5. Fiordaliso, A., and Kunsch, P. A Decision Supported System Based on the Combination of Fuzzy Expert Estimates to Assess the Financial Risks in High-Level Radioactive Waste Projects, Progress in Nuclear Energy, 46, N° 3-4, pp. 374-387 (2005). 6. Beninga, S. Financial Modeling, 2nd edition, MIT Press, Cambridge, MA (2000). 7. Hull, J.C. Options, Futures & Other Derivatives, Prentice-Hall International, London (2000). 8. Kunsch, P., and Springael, J. Application of Fuzzy Set Theory and Logic to System Dynamics Modelling. Short title: Fuzzy System Dynamics Modelling. VUB/STOO work document, Brussels (2002), to be published. 9. VENSIM Professional32 Version 5.4.a ©Copyright 1998-2003, The Ventana Simulation Environment, Ventana Systems, Inc.
APPLICATION OF INTELLIGENT DECISION SYSTEM TO NUCLEAR WASTE DEPOSITORY OPTION ANALYSIS DONG-LING XU, JIAN-BO YANG Manchester Business School The University of Manchester Manchester Ml 5 6PB Ling.Xu@Manchester. ac. uk
BENNY CARLE, FRANK HARDEMAN, DA RUAN Belgian Nuclear Research Centre (SCK'CEN) Boeretang 200, 2400 Mol, Belgium This paper describes how the Evidential Reasoning approach for multi-criteria decision analysis, with the support of its software implementation, Intelligent Decision System (IDS), can be used to analyse whether low level radioactive waste should be stored at the surface or buried deep underground in the territory of the community of Mol in Belgium. Following an outline of the problem and the assessment criteria, the process of using IDS for data collection, information aggregation and outcome presentation is described in detail. The outcomes of the analysis are examined using the various sensitivity analysis functions of IDS. It is demonstrated that the analysis using IDS can generate informative outcomes to support the decision process.
1. Introduction Mols Overleg Nucleair Afval vzw (MONA) is a partnership between the municipality of Mol and ONDRAF/NTRAS, the Belgian Agency for Radioactive Waste and Enriched Fissile Materials, set up to examine whether disposal of lowand medium-level short-lived waste is feasible in Mol, a municipality located in the Belgian province of Antwerp. MONA consists of four working groups: Sitting and Design (S&D), Environment and Health (E&H), Safety Assessment (SA), and Local Development (LD) [6], By the request of MONA, multi-criteria decision analysis is carried out in the Belgian Nuclear Research Centre (SCK'CEN) to evaluate the options for a repository of low level radioactive waste. The objective of the analysis is to support a number of representatives from the various working groups within MONA in their selection between the two acceptable options for the repository on the territory of Mol. The first option is a surface repository according to a concept further developed by the working groups of MONA, and the other option is a deep 867
868 repository in the clay layers underneath the nuclear site of Mol [2]. To simplify the description, the two options are code named Surface and Deep respectively. The role assigned to and accomplished by the researchers at SCK'CEN was to assist the MONA members to analyse the two options in an as objective manner as possible, according to the ethical charter of SCK>CEN, without any efforts to influence the participants. This study should facilitate MONA to select between both options, or in the case this appears to be difficult, at least get a well-structured overview of all factors (criteria) of importance to the judgment, and get insight into the degree to which the various criteria contribute to the selection [3]. This paper describes how the data collection and analysis process can be supported using the Intelligent Decision System1 (IDS) software. IDS is a general-purpose multi-criteria decision analysis (MCDA) tool, developed in Manchester UK on the basis of the Evidential Reasoning approach [9]. The ER approach uses concepts from several disciplines, including value theory [4] in decision sciences, theory of evidence [8] in artificial intelligence, and statistical analysis. It models multiple-criteria decision analysis problems using belief decision matrices, in which traditional decision matrixes are a special case. Belief decision matrices allow different formats of data, whether subjective or objective, random or deterministic, and with or without uncertainties including missing data, to be used in the decision analysis. A weighted evidence combination algorithm [9], a nonlinear statistical process, is used for aggregating the information in the belief decision matrix, so that more insightful information can be generated to support the decision making process. The outcomes of the ER algorithm include not only average performance scores of alternatives on each criterion, but also the distributions of their performance variations which reveal the strengths and weaknesses of each alternative. When there are uncertainties in the input data, ER algorithm can calculate the lower and upper bound of the score ranges of alternatives. Such score ranges show the global sensitivity [7] of rankings of different options to uncertainties. Theoretically, the ER algorithm for information aggregation requires only the satisfaction of value independence condition [4], which is easier to check and satisfy than the stringent additive independence condition as defined by Keeney and Raiffa [4] required by the multiple value function theory (MAVT) [1,4,5]. Therefore, to the ER approach the number of the attributes is less a concern than to the weighted sum approaches based on MAVT. During the past few years, the ER approach supported by the IDS software has been used in a number of areas2. In this paper the decision problem and process are first outlined. Then the process of how to use IDS for data collection and outcome analysis is described in detail. Global and local sensitivity analysis is illustrated by
2
A free test version of IDS is available from the website: www.e-ids.co.uk. Publications about the applications can be found in www.personal.mbs.ac.uk/jyang/joumals.htm.
869 examples. The features of the IDS for supporting such processes are summarised in the conclusion remarks. 2. Outline of the Decision Problem To compare and analyse Deep and Surface options, a group of volunteers were recruited from the various working groups of MONA. A number of meetings were organised and several forms were created to allow participants to express their individual opinions. During some of the meetings, representatives from ONDRAF/NIRAS or colleagues of SCK-CEN were present, mainly to observe the process, but they did not influence the process. These processes were led by the SCK'CEN staff with the support from MONA [2, 3]. During the meetings, a tree of relevant criteria (as partly shown in Fig. 1(a)), the evaluation scales for assessing each option on each criterion, and the relative importance of all criteria in the format of criterion weights were determined. Then, a form was prepared and sent to each participant. The form provided clear description (definition) of each criterion and its scale for scoring. Participants filled in the form individually and anonymously by scoring both options on all criteria and subcriteria. The evaluation scales or grades for assessing the options on the criteria are three different 5-point scales. For most of the criteria, the values of the five points are linearly distributed and they take the values 0, 25, 50, 75, and 100 respectively. For criterion 4.2 "Effect of inaccurate knowledge of inventory," a logarithmic scale (0, 35, 65, 85, and 100) is used because for this criterion it is regarded that a small decrease of the effects at the lower end is considered to be more significant and more appreciated than the same decrease at the higher end. For three other criteria, it is believed that an exponential scale (0, 15, 35, 65, and 100) is more appropriate. 3. Using IDS to Support the Process IDS provides support to multi-criteria decision analysis (MCDA) in problem structuring, data collection, infonnation and group opinion aggregation, outcome presentation and sensitivity analysis. They are discussed below.
870 ^ti E*» E * Jfiw*
frying
.iDl2l| lr*U analysis 6ff»rt SmtfW/ M n w ^ec *.JLJJJ"
• D B S H ; * 3**9:11 Alternative Nam< U Group Surface ZJGroup Deep
: i'l Lib
4.2 Effect of inaccurate knowledge of inventory
K - • 1.2 Safety of Population a; • 1.3 External risks S 1.4 Accessibility K v 1.5 Controllability & 1.6Retrievability
«i
t
>i
Define Evaluation Grades and Assiqn Utilities tf NecsBsaryfor
-,,|v 1. Safety , w, . 1.1 Safety of the workforce
ULiiy[0 1]
Help
1
Dsfinfl
I
OK
|
wads i 3 I025 C oJp 3 * ••
331.7 Collective memory m <. 2. Environment f] . . 3 . Health a .: 4. Technical feasibility .. 5. Economy v 6. Social acceDtabilltv
ij _
C-a la 1»-
l°= J J _ |075 - | [
j'aae'i I"
|01
j j Cancel
Fig. 1 IDS Main Window with Tree of Assessment Criteria and Assessment Scale Definition Window
33333233333
.jniisl
E3BB3B3235&BB1I
"3
Attribute 4 A Flexibiliiy
A.A Flexibility
Name:
Attribute Type:Quantitative or Qualitative OK f
Quantitative <* Qualitative Number of Grades:
Cancel
(^^crtoT^H—> Define Grades Now? les
Mo I
*]
Describe the following Attribute..
"3
Flexibility (32%). defined by MONA as such (author's translation): 'the expression "flexible" means that during the development and the realization of a technical solution the possibility remains to step back easily from decisions taken before or to postpone certain decisions during a certain amount of time. Flexibility refers to decisions related to policy or management, but also to technical decisions
"3
help
Oi!
Copy
Undo
OK
Advanced. »j
Canqgl
Fig. 2 Criterion Definition Windows
3.1. Problem Structuring — Model Implementation Using IDS, the construction of a skeleton criterion tree is straightforward as shown in Fig. 1(a), where there are a tree view window for displaying the criterion tree and a list view window to display the alternatives: Deep and Surface. After the tree is structured, the definition of each criterion, including its explanation, whether it is a quantitative or qualitative, and the number of grades (or points) used for assessing it, can be specified. The explanation recorded (Fig. 2) can be retrieved later to aid the assessment processes. The different five point scales and their corresponding standards (if available) for assessing the criteria can be specified through the interfaces such as the one shown in Fig. 1 (b). When participants assess the options, the standards are displayed (see Fig. 4) as scoring guidelines. This helps to reduce subjectivity and inconsistency.
871 In IDS, weights can be assigned to attribute through either pairwise comparison or interactive graph such as the one shown in Fig. 3, where the bars can be drag and dropped to the desired heights to represent the weight of the criteria. This is useful in both individual and group decision situations where users can see their views on criterion importance graphically. 3.2. Information Collection The assessment model implemented using IDS as described above can be distributed to individual participants to assess each option and record their scores and opinions. The assessment of each option normally involves collecting and recording evidence, comparing evidence with available standards, scoring and assigning belief degrees to the scores. A belief degree represents the strength to which the score is believed or evidenced to be appropriate for describing the option on the criterion. IDS supports the assessment process both technically and cognitively. Fig. 4 shows the interface for assessing an option on the criterion "1.2.1 Exposure to radiation." Technically, an individual participant needs only to tick appropriate grades and a belief degree will automatically appear beside each answer (Fig. 4). By default, IDS equally divides 100% belief degrees and assigns them to the ticked answers automatically. However, the participant may modify them and IDS will validate that the sum of the belief degrees is not greater than 100%. 3.3. Group Decision Support Individual participants may record their assessments of each option independently and anonymously using IDS. When an assessment is completed, individual files can be either examined separately or imported to a single file. After imported, the assessments made by individuals can be compared graphically or a collective assessment can be generated for each option by combining all the individual assessments made for the option. Fig. 4 shows an assessment generated by the software by combining the 16 participants' assessments. The belief degrees (0.3125, 0.5 and 0.1875) assigned to the three grades (0, 15 and 35) reflect the fact that there were 5, 8 and 3 participants scored the risk of Surface on "Exposure to radiation" criterion to be 0, 15 and 35 respectively. Such combination is carried out automatically for every criterion in the bottom (leaf) level of the criteria tree. 3.4. Information Aggregation The aggregation of the assessment information is through the ER algorithm [9] which is built into IDS. In IDS, the aggregation process is automatic and is updated in the background whenever assessment information is entered or modified.
872 Therefore, the users of IDS are able to see the aggregated outcomes at any stage of the assessment, even before the assessment is complete on some criteria. Relative Weights of Attributes
1 2 Safety of Popula 1 4 Accessibility 1 6 Retnevability 1 1 Safety of the wo 1 3 External risks 1 5 Controllability 1 7 Collective memor
Attributes Fig. 3 Interactive windows for Assigning Weights to Criteria
EEESEaEEJ
ill
mzw.
Father Attribute 1.2 Safety of Population Name. Current Attribute 1.2,1 BqjosuretD radiation Name.
zl
Grade Definition: Assessment standards for the selected grade (by mouse click on one of the grades below) appear hear^J if available. . ' <,
zJ Belief Degree
Hoffi to Assess
Grade Name:
\3
Attribute Definition Provide Evidence Provide Comments
Fig. 4 An Interface for Inputting Assessment Information 0.6000
Rmklmg ®f Alternatives on Overall Risks
O o CO
Group Surface Group Deep Fig. 5 Aggregated Assessment Scores with Surface Option Preferred
873 3.5. View Assessment Results IDS can provide a variety of graphics to support data presentation and decision communication. For example, Fig. 5 displays the aggregated overall risk scores showing that Surface is slightly more risky. Fig. 6 shows the distribution of the group opinions. IDS also provides a quick search function to graphically reveal the high and low risk areas, and the criteria with largest and smallest discriminating power. Dmrttotit&tf Assessment on Qmmti risks 100.00%; 90.00% \
70.00% i
ffl
50.00% < 43.00%;
(3
3000%20.00%10.00%0.00%
.-.r
|
'
.:.-.]
- . •*•«
|
iM2>a
jD 60.00%;
"5
|
V:*I*J
-
1
1
I-.,
" 1 \TLt -i.-'"1; 25
0
75
IX
30
Evaluation grades
Fig. 6 Aggregated Assessment Distribution Showing Composition of Low and High Risk Areas Avwags scores ofei&matlves on Owmit Risks
Rfinxtoy QlfJtemvuves m Overall ifste . Score Range Due to , Global Uncertainty in "1 Safety"
100%
2000
4000
6000
8000
Weight of 5. Economy
Fig. 7 Sensitivity of Preferences to Weights of the Six Main Criteria
3.6. Sensitivity Analysis Sensitivity analysis examines the robustness of outcomes under uncertainties, such as the value ranges in scores and criteria weights due to group opinion diversity. IDS provides of variety of graphs to aid the analysis. For example, Fig. 7 (a) shows that for criteria "5 Economy," no matter how its weight changes, Surface is always slightly more preferred than Deep. Fig. 7 (b) shows the combined effect of many simultaneous unknowns assuming that for Deep, the scores on "1. Safety," including all of its sub-criteria, are unknown. For the other main criteria, the scores are the same as the combined group scores as discussed in Section 3.3. The effects of the uncertainties are shown in the grey area.
874 4. Concluding Remarks This paper described how the multi-criteria decision software IDS can support the analysis of waste depository options, in both individual and group situations. IDS generates a range of outcomes graphically, including the sensitivity of outcomes to various uncertainties both locally and globally. It can search for and display the strong and weak areas of each option or the areas where the two options are most different or similar. Such visualised information can help the decision makers to compare the options side by side from different aspects and help to understand what are the risks involved in each alternative course of actions. Acknowledgements This work is the result of a request by MONA vzw to the Belgian Nuclear Research Centre formalized in contract CO-90 04 1176.00. This work is partly supported by the UK Engineering and Physical Science Research Council (EPSRC) under the grant number Grant No: GR/S85498/01 and GR/S85504/01. References 1. B.G. Buchanan, E.H. Shortliffe, Rule-Based Expert Systems, Addison-Wesley: Reading, MA, 1984. 2. B. Carle and F. Hardeman, Multi-criteria analysis as a support for MONA vzw at the choice between a surface or a deep depository for low radioactive nuclear wastes in municipality of Mol, Scientific Report of the Belgian Nuclear Research Centre sck'cen-blg-994, October 2004. 3. B. Carle and F. Hardeman, A multi-criteria analysis process to clarify preferences in the choice for a low radioactive waste repository in Belgium. 14th SRA-Europe Annual meeting , Como, Italy. Book of Abstracts, 75-76. 4. R.L. Keeney, H. Raiffa, Decisions with Multiple Objectives, Cambridge University Press, 1976. 5. R. Lopez de Mantaras, Approximate Reasoning Models, Ellis Horwood Limited: Chichester, England, 1990. 6. B. Meus and H. Ceulemans, MONA, Public participation in the siting of a LLW repository in Mol, Belgium, Proceedings oflCEM '03: The 9th International Conference on Radioactive Waste Management and Environmental Remediation, September 2 1 - 2 5 , 2003, Examination School, Oxford, England. 7. Saltelli, S. Tarantola and K. Chan (1999). "A quantitative, model independent method for global sensitivity analysis of model output". Technometrics, 41 (1), 39-56. 8. G.A. Shafer, Mathematical Theory of Evidence, Princeton University Press, 1976. 9. J.B. Yang, D.L. Xu, On the evidential reasoning algorithm for multiple attribute decision analysis under uncertainty, IEEE Transactions on Systems, Man and Cybernetics Part A: Systems and Humans 32 (2002), 289-304.
MODEL OF FUZZY EXPERT SYSTEM FOR THE CALCULATION OF PERFORMANCE AND SAFETY INDICATOR OF NUCLEAR POWER PLANTS KELLING CABRAL SOUTO 1 Programa de Engenharia Nuclear, Universidade Federal do Rio de Janeiro COPPE/UFRJ, Caixa Postal 68509 Rio de Janeiro, RJ, 21945-970, Brasil ROBERTO SCHIRRU COPPE/UFRJ, Laboratorio de Monitoracao de Processos, Caixa Postal 68509 Rio de Janeiro, RJ, 21945-970, Brasil This work develops the basis for a expert system, which is able to infer about a generic structure of indicators, in such a way that it monitor, measures and evaluates questions related to project, operational safety and human performance according to politics, objectives and goals for the nuclear power plant Angra 2. This structure organized in graphs and in a context of objects orientation, represents the knowledge of the expert system and is mapped out in a fuzzy concept. The inference engine is of a backward chaining type associated to a process of search in depths, in such a way that it establishes the representative status of the plant, making possible to analyze and manage of its mission and situation.
1. Introduction Monitoring a nuclear power plant in order to promote a secure operation is the objective of all involved in this industry However, it's considered a critical process, once the number of variables to be continually observed is big as well as the difficulty in assuring performance with safety above anything else. The operational experience, developed for the past 30 years, has lead the industry to understand that safety is assurance of production and to identify a set of characteristics a nuclear power plant must have in order to operate safely. The present challenge is then, to measure these characteristics. Such a challenge made the so called performance indicators as indices to monitor, measure e evaluate these characteristics come to pass, making it possible to manage the
'Supported by FAPERJ (Fundacao Carlos Chagas de Amparo a Pesquisa do Estado do Rio de Janeiro).
875
876 plant, analyze the business activities, as well as lead to decisions in strategic planning. This way, several works related to performance indicators have been published. The International Agency of Atomic Energy (AIEA) presents a general guide of indicators [1] which suggests actions, conditions e procedures in order to reach the safety requests, as well as assure a high performance rate. This guide gives a generic structure for identification and organization of the performance indicators directly related to the requested safety characteristics, serving as an instrument to compare the indicators to the goals and objectives identified to evaluate positive and negative performance aspects. However, the concept in this guide goes beyond simple treatment of the indicator's manipulation. The AIEA states that the numerical value of any indicator may not be meaningful if taken solely into consideration, but may be enhanced when taken together with other indicators. It permits an analysis of macro problems of the plant and consequently the establishment of its representative status. The Institute of Nuclear Plants Operations (INPO) [2] also presents works related to performance indicators, and identifies a valuable set of indices. In this context, the making of an intelligent system in real time was taken into consideration, which would be able to infer a generic structure of indicators, build with a specialist and according to the concept suggested by AIEA and INPO, in order to establish the representative status, in colors, of the nuclear power plant Angra 2 making an analyses of the management of the mission and its situation possible. This system was supported by expert systems theory [3] and fuzzy logic [4], for they permit the elaboration of systems able to reason, from the usage of knowledge about the specific domain of a problem with an effective presentation of reality in relation to the human reasoning behavior. It refers to a expert system that works with the basic compounds of knowledge and inference engine and which uses fuzzy information, once it permits the treatment of uncertainty and inexactness using a set of membership functions and fuzzy rules, to reason about the data. About the fuzzy rules leading directly to the establishment of the color each indicator, they are produced by the system through four proposed models: conservative, proportional, deterministic and specialist. The logical structure of the expert system, called inference engine is of backward chaining type [3] associated to a process of search in depths [3], structured in order to make inferences about a set of fuzzy indicators.
877 With the aim of test the functions of the developed system, an environment of tests was implemented trying to show the viability of development of a expert system together with a well structured and mapped out in fuzzy form set of indicators, as a tool for the monitoring, management and analysis of several parameter of the plant. Fulfilling the needs of the nuclear area concerning managing models that permit the strategic planning of the business. 2. Model of Fuzzy Expert System 2.1. Generic Structure of Indicators Concerning the performance indicators, a generic structure of these indices was built with the help of three information sources: the guide established by the AIEA, the suggested indicators from INPO and the knowledge of the Angra 2 specialists. The development of de indicators structure started when the main concept was taken, the nuclear power plant safety performance. AIEA states that in order to obtain a high level of safety operational performance it is necessary to interact design, safety operation and human performance. This concept is presented as a main indicator and worked in a highest level in the hierarchical structure, as it is shown in Figure 1. In the next level, safety characteristics were established and mapped out by the following key indicators: Plant operates safely; Plant operates in low risk; and Plant operates with a positive safety attitude. Three other levels of indicators (Figure 1) are suggested by the guideline, expanding the structure up to a level possible to be measured through the general indicators (concepts that express the key indicators of the above level), strategic indicators (detail the general indices and link to the indicators in the same level) and the specific indicators (represent the last structural level and are the measurements of the plant).
Safety Operational Performance
4... Keys Indicators
j
:!::;::::;;:;:;:::;:::t:;:::::::::::::;:::!:;:::::::::::; General Indicators
•
:!;::::i:i:::!::::::i:iir::i:::: Strategics Indicators
;
Specifics Indicators Figure 1. Generic structure of indicators recommended by AIEA.
The indicators structure in this work, up to the strategic level was constructed exactly equal to the one suggested by the Agency, whereas the specific level presents the particular indices of Angra 2 power plant, done with the help of specialists and INPO suggestions. 2.2.
Knowledge Representation
The generic indicator structure that is part of the system knowledge was organized in a graph, what naturally leads to the concept of object classes and consequently to the establishment of knowledge presentation by the use of general class rules, based, therefore, on the Object Orientation theory [3], in which each indicator (node) is treated as an object orientation paradigm, getting its characteristics and methods Generically, two kinds of classes were created, in which the indicators are set according to their functions. These classes are: sensor nodes (represent the specific indicators responsible to the acquisition of the plant data) and connector nodes (represent the other indicators of the generic structure). They are father nodes, which relate their sons by means of an and type arc. Besides the general class rules, fuzzy rules are used, from the fuzzy logic technique. These rules are directly addressed to the establishment of each indicator color (status), given by the system for four proposed models. 1. Conservative model: gives the fuzzy rules (If-Then), in which the preceding structure (If) is the color's combination of the sons indicators and the following one (Then) is the result of the most dangerous color among the colors in the preceding one.
2.
Proportional model: the preceding of the rule is obtained as described in the conservative model and the following one is the result of a conservative proportion among the previous colors. 3. Deterministic model: the preceding is obtained as in the previous models and the consequent part is given from the application of a simple numerical procedure (weighted mean). 4. Specialist model: the preceding is obtained the same way it is in the previous models, whereas the following one is the result of the indication of a specialist. The use of the fuzzy logic in this work, concerning the knowledge presentation, is due to the possibility of establishing performance zones (fuzzy sets) for each of the indicators values, considering the inherent uncertainties of the procedure. It also permits to associate color to these performance zones what makes easier to have a graphic view of each plant indicator status and to relate them. The used color were: green, white, yellow and red, as it is recommended by AIEA, which represents a scale of performance that ranges form good to unsatisfactory, respectively. 2.3. Inference Engine The inference engine developed for the system is of a backward chaining type associated to a search in depth, what makes it possible to get a chaining process backwards and in depth, beginning with evidence and getting to a conclusion. Its procedure is sketched as it follows: Let it be the graph presented in Figure 2 an indicator structure form which is intended to obtain the indicator A color.
toot B
•Jul.
ka
Figure 2. Example - indicator structure.
The inference engine starts the process by the A indicator trying to establish its color. However, it is necessary to establish the colors of the sons B and C. As they are specific indicators and belong to the sensor node class, they get
acquisition data right from the plant, being, therefore, the determination of its colors done only by data reading and their fuzzification (fuzzy logic procedure that transforms a numeric input into a membership of the fuzzy set) of the same in the fuzzy sets of each color indices. Once the color in B and C nodes is established, it is necessary to give the fuzzy rule(s) to indicator A, according to its kind of model. This way, let A present the conservative model, so the preceding of the given rule will be the result of the combination of the B-C colors and the following will be the most dangerous color among the color in the preceding ones. With the B and C color, the A fuzzy rule(s) and using the general class rules (If color node B and color node C Then gives color node A) it is possible to determine A color. It is only necessary to start the fuzzy rules, and then defuzzificate (fuzzy logic procedure that transforms the output fuzzy set into a numeric value - MoM method [4]), the result of these rules into a numeric value and, at last, use this output value as the input for node A. This input will be fuzzificated and will give the color of this indicator The inference engine was programmed in CLOS (Common Lisp Object System) [5] which working process is given mainly through the basic definition (GOAL) and two main methods (EVALUATE and PROCESSING-MODEL). Further details of the structure and functioning of the inference engine and the format of the fuzzy rules and general class rules, are described in the reference [3]. 3. Fuzzy Expert System Implementation The basic definition developed was called GOAL function implementation can be seen below:
and it
(defun goal (track) (unless (null track) (progn (setf nodo (car(last track))) (nome nodo) (avalia nodo) (cond ((equal (estado-cor-pertinencia nodo) '(Null)) (setf track (append track (busca-nodo-pelo-nome (busca-profundidade-esquerda nodo))))) ((not(equal (estado-cor-pertinencia nodo) '(Null))) (progn (unless (null (cdr track)) (atribui-estado-filho (first (cdr (reverse track))))) (setf track (reverse (cdr (reverse track)))) ))) (goal track) ) )
881 This function manages the system inference process about the indicators structure, establishing the color of each node running the rules, in a backwards chaining process. The GOAL function gets to its operation through the EVALUATING method that runs the rules, when called by each node, determining the color of each indicator. During this process, EVALUATING starts another method, called PROCESSING-MODEL, that permits the production of essential fuzzy rules in the production of the node color. The running of the fuzzification and defuzzification procedures, characteristics of the fuzzy logic theory, and used in the establishment of the indicators status are part of the EVALUATING method, as it is described in item 2.3. Right after the indicator structure if defined and the fuzzy expert system is implemented, an environment for test is developed, with the aim of verify and valid the functioning, concerning viability and approach, as computational performance. The test are now running and the methodology use to evaluate the presented results of the system for the simulations taken are being based on the comparison of given results to the desired ones, as well as their analyses by specialists. As a matter of illustration, Figure 3 shows the output presented by the test environment of the fuzzy expert system for the example given in item 2.3. The values 5 and 6 were used as inputs of B and C indicators, respectively, and the conservative model was used for given the fuzzy rules in node A. It is important to notice the presence of a graph presentation with the node reference A color (status) and its indication of tendency. iiJiiiJimjui.i.iJBriaai j f" N o d o R a l z -
EstadoNodoRalz-
gwm
f Referenda r^"*
Fttridp Nutto Rafz ~
AitAlise Estado Nods Rate
1
"VT-" A - AMARELO '- -.OB-AMARELO ' '.• DC-VERDE
Figure 3. Result of the fuzzy expert system for the example given in item 2.3.
882 Using the structure of indicators constructed to Angra 2, the status of the nuclear power plant was investigated in January/2005, where it was stopped with its energy production and May/2005, when reconquested its normal process. The results showed by model of fuzzy expert system, were: red and yellow, respectively, as soon as the whished, since in January the nuclear power plant was prejudiced its main activity and in May it was with reflexes of the stopping period. Further details of the constructed model and of the tests realized are described in the reference [3]. 4. Conclusions Consistence and performance presented by the intelligent system in the initial test, permit to come to the following conclusions: It is viable to develop a expert system that uses fuzzy knowledge to manage measure and evaluate a nuclear power plant from a well defined structure of performance and safety indicators. It also presents as an efficient help managerial tool to the strategic planning of a nuclear plant. The combination of the expert systems technique with the success of fuzzy logic creates a progress in technology for this kind of system, for it permits better presentation of reality in relation to the reasoning human behavior Modeling the object orientation rules makes it possible to have an efficient implementation strategy for the project of intelligent systems. Acknowledgments I'd like to express my gratitude to the specialists of Angra 2 nuclear power plant for all the help in elaboration of the indicators structure and validation in the results presented by the expert fuzzy system developed. References 1. IAEA-TECDOC, "Operational Safety performance Indicators for Nuclear Power Plants", TECDOC-1141, (2000). 2. INPO, "Indicators of Changing Performance", INPO 01-005 (2001). 3. K. C. Souto, "Sistema Especialista com Logica Nebulosa para o Calculo em Tempo Real de Indicadores de Desempenho e Seguranca na Monitoracao de Usinas Nucleares", Tese de D.Sc, COPPE/UFRJ, RJ (2005). 4. K. M. Passino e S. Yurkovich, "Fuzzy Control", pp. 21-103 (1998). 5. Manual Allegro CL (Versao 5.01 for Windows) - Allegro CL/Dynamic Object Oriented Programming System, Franz Inc.
ARTIFICIAL INTELLIGENCE APPLIED TO SIMULATION OF RADIATION DAMAGE IN FERRITIC ALLOYS* ROBERTO P. DOMINGOS(1), GENNARO M. CERCHIARA(2), FLYURA DJURABEKOVA(1>, LORENZO MALERBA(1). (1)
Reactor Materials Research Dept, SCK-CEN .Boeretang 200 Mol, 2400, Belgium,
(2)
Dipartimento di Ingegneria Meccanica, Nucleare e della Produzione (DIMNP), Universita degli Studi di Pisa, Via Diotisalvi n 2 - CAP 56100, Pisa - Italia.
Molecular dynamics (MD) is the only computational tool that, without any approximation, can be applied to study atomic level phenomena involving radiationproduced point-defects and their evolution driven by mutual interaction. Kinetic Monte Carlo (KMC) is suitable to extend the timeframe of an atomic level simulation to simulate an irradiation process, but all mechanisms involved must be known in advance, e.g. from MD studies. The present work is an effort to integrate artificial intelligence (AI) techniques, essentially an artificial neural network (ANN) and a fuzzy logic system (FLS), in order to build an adaptive model capable of improving both the physical reliability and the computational efficiency of a KMC model fully parameterized on MD results. 1. Problem Statement The integrity of reactor pressure vessels (RPV) of current nuclear power plants can be compromised due to changes in the mechanical properties of steels caused by the accumulation of radiation induced defects (essentially vacancies - i.e. empty lattice sites - and interstitial atoms - i.e. atoms out of their lattice site). To improve the understanding of the creation and evolution of these defects and their impact on the mechanical properties of RPV steels, computational atomic scale models are built [1]. Molecular dynamics (MD) [2] is a statistical mechanics numerical technique capable of providing a precise description of how radiation-induced defects behave in a certain material at the atomic level. However, the MD timespan is limited to the scale of tens of nanoseconds, so that slow diffusion driven phenomena cannot be studied with this tool. Kinetic Monte Carlo (KMC) techniques [3], parameterised on MD results, can be used instead to extend the timeframe of the simulation. In particular, vacancy diffusion driven phenomena can be studied using atomistic KMC (AKMC) models [4], where atoms of any chemical species are distributed on a rigid lattice and the physical process determining the This work is supported by the EC-funded FP6 Integrated Project PERFECT (Prediction of Irradiation Damage Effect in Reactor Components). 883
884 evolution of the system is the exchange of position between an atom and a vacancy (diffusion jump). In this work the physical system taken as a reference is Fe-Cu, due to its relevance in connection with the problem of RPV steel degradation [5]. The basic formula on which the AKMC method is based is the Arrhenius expression for the frequency of vacancy jumps, r=v-exp(-£,0/ftfl7), where kB is Boltzmann's constant, T the absolute temperature, v an attempt frequency, here considered constant (the adopted value is 6.010 12 s"', close to the Debye frequency for Fe), and Ea is the energy barrier to be overcome for the exchange of a vacancy with a neighbouring atom to occur, which is an a prori unknown function of the local atomic configuration. Up to now, heuristic formulae, which explicitly correlate the migration energy to the initial and final energy state, before and after the jump [4], have been used to determine it. This procedure, however, in addition to being approximated, requires multiple total energy calculations at each Monte Carlo step and totally neglects the effect of atomic relaxations. Given a good interatomic potential, these barriers can be exactly calculated by MD, allowing also for atomic relaxation, but the necessary computing time makes it impossible to think of launching at each AKMC step a multiple, full-MD barrier calculation. The objective of the present work is to show that it is possible to use artificial intelligence (AI) to substitute, after appropriate training, full MD calculations in AKMC simulations. In this way the AKMC simulation will be at the same time computationally more efficient and physically more accurate. The prerequisite to any method to solve our problem is a numerical way of defining the local atomic configuration, as a function of which the energy barrier should be provided. This has been done by associating to each atomic site around the exchanged vacancy (V) an integer number (0 for a V, 1 for an Fe atom, 2 for a Cu atom). If only the first nearest neighbors (Inn) of the V and the jumping Atom (JA) are included, we are working in the Inn approximation and the string will contain (being in a body centred cubic, bcc, lattice) 14 positions (see Figure 1). Given a coding system for the local atomic configuration, the simplest solution to the problem is to pre-tabulate the energy barries corresponding to all possible configurations. However, it is reasonable to produce these tables only when the number of configurations (possible table rows) needed to approach the physical problem is limited, i.e. so long as the size of the tables remains manageable. Note that in Inn approximation the total number of configurations is 3 M : although this number can be reduced by about a factor six allowing for symmetries, it gives rise to a fairly large table. In addition, the Inn approximation may turn out to be insufficient from the physical point of view.
0 AT
1 A
2 Bl
3 B2
4 B3
5 CI
6 C2
7 C3
8 Dl
9 D2
10 D3
11 El
12 E2
Figure 1 - Coding for configurations in Inn approximation
13 E3
14 F
885 An alternative method should be to find patterns in the dependence of the energy barriers on the configuration and use a model instead of tables. From the mathematical point of view these barriers are a discrete function of 14 integer variables. No classical regression approach has any hope of being able to interpolate such a function and even less to provide an extrapolable model. It is in this framework that AI has been considered to provide a solution to the problem. 2. Model design and application The problem of substituting an AI system to the tables of energy barriers versus configurations, including the case where these tables cannot be produced for all possible configurations (beyond Inn approximation), passes through two steps: (i) classification of the configurations into groups providing energy barriers in a certain range and (ii) actual regression within those groups. This was made necessary in particular by the fact that the sequence of the configurations according to which they have been tabulated is arbitrary and does not have any reason to be considered physically meaningful. The AI system first considered to solve the problem consisted of a hybrid model, composed of an artificial neural network (ANN) [6] - together with a genetic programming regressed model [7], The reason for trying to apply symbolic regression comes from the fact that it offers the unique benefit of giving the possibility to human beings to get insight and interpretability of model results. This hybrid model was supposed to learn first how to classify atomic configurations and subsequently how to obtain, by means of evolutionary computation, an adequate regression model to describe the migration energy using a non-linear equation inside each class. Yet, the performance of the regressed equation turned out to be insufficient, despite our effort to improve the genetic programming model. On the other hand, the use of ANN for the classification and eventually for interpolation provided a reasonable performance. It was hence decided to build the AI system architecture by substituting the regressed model by a specialized ANN, extensively applied to interpolation tasks (Generalized Regressed Model), as shown in Figure 2. classification
regression
model
Figure 2 - AI System Architecture The data used for training and validating the AI architecture in the present work was taken from two tables, produced by MD using a recent interatomic potential for the Fe-
886 Cu system [8]. These tables contain, when combined, all the configurations that a single vacancy may encounter during Cu precipitation due to thermal annealing. Of all data, 15% was used for training, 20% for testing and the rest for full validation. For the classification task the used architecture was a multilayer perceptron and the applied learning process was back propagation. The performance of this was considered acceptable and is presented in Figure 3 for a specific subset of data. In this figure all data are production data, i.e. data corresponding to cases never seen before by the system. It can be appreciated that the ANN succeeds in correctly classifying a given configuration inside the corresponding energy range. In the worst case it classifies a configuration in a class immediately above or immediately below the correct one, but this behavior does not compromise the performance, since a class immediately nearby the correct class represents a close energy range, which has little effect in the interpolation process. Inside each energy range a generalized regressed neural network was trained to correctly map input configurations into migration energies. The data set for this ANN was selected in the same fashion as for back propagation. In Figure 4 a graph for the performance of the interpolation task in a low energy range (production data) is shown for both tables. The maximum absolute error in the whole energy range never exceed 0.08 eV, with a mean absolute error below 0.03 eV. It must be emphasized that this performance was achieved for configurations never seen before by the network. To have an idea of how the obtained error bar may affect the value of the parameters defining diffusion processes, a series of simulations of Cu-V pair diffusion in the 250-550 K temperature range were performed, using the AKMC code with both the migration barriers predicted by the ANN in the Inn approximation and those tabulated (denoted as C05). The result is shown in Figure 4(c) and demonstrates that there is excellent agreement between both simulation approaches. The advancement of Cu precipitation was also seen to be equally described by tables and ANN.
Class calculated o
1
225
449
Class predicted
673 897 1121 1345 1569
Figure 3 - Classification of configurations according to energy range 3. Fuzzy Model While developing the ANN, a fuzzy logic system (FLS) coupled to the ANN has been developed to predict the expected error for a given configuration. The FLS is essential to assess the uncertainty inherent to the use of the ANN and to evaluate the "risk" associated with its use instead of the tabulation, so as to be able to build an integrated system, capable of feedback. This will become necessary to work beyond the Inn approximation. Since in those cases the full table containing data for all possible configurations cannot be produced, each time the ANN is called to provide the required energy barrier the FLS will evaluate the expected error; if this is not acceptable, the code
887 will use tabulated data instead (if available) or will launch an ad hoc MD simulation whose result will be stored. The additional tabulated data thereby created will then be used for a further round of training of the ANN, until the convergence is found acceptable, with the backup of tabulations for critical configurations, if needed. The ANN-FLS for error prediction has been developed for all Inn approximation cases. The error on the predicted energy is first calculated for the tabulated set using the ANN results compared to the tabulated data, as shown in Figure 4(b). The first step for its realization is the determination of the important parameters in terms of type and number: these are the factors that govern the problem which are to be described through the membership functions [9], defined in the continuous interval [0,1] and chosen to be triangular in the present case. For our application the FLS provides one output (the error) and is built from four input data, namely, with reference to Figure 1: type of atom in A position, type of jumping atom, position of first copper atom in the string, energy barrier value. The first and second input give four possible combinations; the third input is a natural number between 1 and 13; the fourth input is built calculating the ranges of values in which the energy changes meaningfully, as shown in Figure 5 (s = small; p = positive; n = negative; a = average so vs = very small). The other essential element of the FLS is the set of linguistic rules (e.g. Table 2), which are built by counting the frequencies of event occurrences [10,13] obtained from the analysis of the error trend initially produced as a function of the above mentioned parameters, which completes the "expert opinion" used in the focusing of the main parameters[l 1,12].
1 41 81 121 161 201 241 281 321 361401 441 481 521 561 601 641 681 721 Error /En*rgy J.5E-02 2.0E-02 • 1.5E-03 i
<•:«*«"
•i,0E-03 -•
•. • £4v&^|yKSfef4^*'" : *"'"
•".•".'" .''""'* ''v ''•<•¥'' '-*'• * X " ' < ' * ^ W X 5 ' ' >'•"'• •
.-l,GF.-02 -
'
'
'
• '*•
. . . . - •
(b)
••••
.-. ,
-t,.<E-«2 * ,-MI-O:
0,20
1
1
1
.
.
>
.
1
.
.
0,2S
0,30
'3,3!
0,40
6,45
0,J0
0,55
0,60
0,61
0,70
.
8,7} {,vj
888
Diffusion coefficient • -H
(.05 niip -OOJ'ieV S N 1 mi.e
^
0 u.< i.-V
-
(C) •<;V-
V
Q
SD
•15
•10 1
l/k B T ( e V ) Figure 4 - (a) Predictions of the AI system versus reference; (b) absolute error versus energy (in red the error trend function); (c) Comparison of results for the diffusion coefficient of a Cu-V pair obtained by AKMC simulation using the barriers predicted by the ANN and the tabulated ones. Table 2 - Example of fuzzy rules event tree. A-atom
J-atom
•"u-position
Low
Iron
Iron
Average
Energy ws vs s as a ab b vb wb
Error ws vvs vvs vvs ws vs vsp vsn sn
vvs vs s as a
vvs vvs ws vvs vs
ay
fA'.-lBlt'B'ilrt'Sr
Figure 5 - Fuzzy input "energy-level" and its partitions.
Figure 6 - Error vs. energy level and Cu position (with J = Cu and A = Fe).
889 Finally, by elaborating the data with graphical inference methods [13] we obtain the error trend as a function of the main parameters, as shown in Figure 6. 4. Summary, conclusions and outlook In order to simulate Cu precipitation in Fe using an AKMC approach a correct evaluation of the energy barriers for the vacancy exchange with an atom as functions of the local atomic configuration is needed. This can be done by MD, but the associated computing time prevents this solution from being systematically used on-the-fiy during the AKMC simulation. Tables containing all possible cases can be built, but they become quickly too large to be managed. An AI system, whose architecture is based on an ANN, has then been developed, which demonstrated to be able, after being trained on a subset of data (15% of a complete tabulation) to correctly classify atomic configurations according to the corresponding energy barrier and to find the correlation between atomic configurations in the lattice, so as to predict the value of the energy barrier itself for configurations never seen before, with a mean absolute error of 0.03 eV and a maximum of 0.08 ev. A FLS for the prediction of the error, to be used for risk assessment and for the construction of a system with feedback, has also been developed; the error is considered in this scheme as a multifunction output dependent on energy range, type of atoms and position in the array. Both the ANN and the FLS modules are now being integrated into the AKMC code to form a single package and the subsequent objective is to improve the system to work beyond Inn approximation, with the smallest possible training set. This work demonstrates that adequately trained AI systems can be of help to find patterns in magnitudes strongly dependent on the local atomic arrangement, which in principle would require in each case a dedicated MD study and for which it is difficult find a generalisable physical law based on first principles. It is thus believed that similar schemes can be used also for other problems of radiation damage and, more generally, materials science. References [ 1 ] See e.g. https://www.fp6perfect.net/perfect [2] F. Ercolessi "A molecular dynamics primer", available at http://www.fisica.uniud.it/~ercolessi/md/rnd/ [3] K.A. Fichthorn and W.H. Weinberg, J. Chem. Phys. 95(2) (1991) 1090. [4] C. Domain, C.S. Becquart and J.-C. van Duysen, in: "Microstructural Processes in Irradiated Materials", Editors: S.J. Zinkle, G.F. Lucas, R.C. Ewing and J.S. Williams, Mater. Res. Soc. Symp. Proc. 540 (1999) 643. [5] G. R. Odette, Scripta Met. 11 (1983) 1183. [6] S. Haykin "Neural Networks: A Comprehensive Foundation", New York: MacMillian(1994). [7] J.R. Koza "Genetic Programming - On the Programming of Computers by Means of Natural Selection", Cambridge (1992). [8] R.C. Pasianot and L. Malerba, submitted to Phys. Rev. B; see also L. Malerba and R.C. Pasianot, SCK-CEN External Report ER-6, February 2006. [9] Zadeh L.A "Fuzzy sets as a basis for a theory of possibility", F.S.S., 1978.
890 [10] D.Dubois, H.Prade "Unfair Coins and Necessity Measures: Towards a Possibilistic interpretation of Histograms"- England Publishing Company, F.S.S. 10 (April 1983). [11] Glenn Shafer "A mathematical theory of evidence"- Princeton University Press 1976. [12] Didier Dubois and Henri Prade "Possibility Theory, an Approach to Computerized Processing of Uncertainty"- Plenum Press, New York - 1988. [13] T.J. Ross "Fuzzy logic with engineering applications" - McGraw-Hill, New York, 1995. [14] S. Ferson et al. "Constructing Probability Boxes and Dempster - Shafer Structures" Sandia National Laboratories - SAND2002-4015 Jan 2003.
PARTICLE SWARM OPTIMIZATION APPLIED TO THE COMBINATORIAL PROBLEM IN ORDER TO SOLVE THE NUCLEAR REACTOR FUEL RELOADING PROBLEM ANDERSON ALVARENGA DE MOURA MENESES Programa de Engenharia Nuclear (COPPE/UFRJ), Universidade Federal do Rio de Janeiro, Caixa Postal 68509, 21945-970, Rio de Janeiro, RJ, Brasil, email: ameneses@con. ufrj. br ROBERTO SCHIRRU Programa de Engenharia Nuclear (COPPE/UFRJ), Universidade Federal do Rio de Janeiro, Caixa Postal 68509, 21945-970, Rio de Janeiro, RJ, Brasil, email: schirru@lmp. ufrj. br
This work focuses on the use of the Artificial Intelligence metaheuristic technique Particle Swarm Optimization (PSO) to optimize a nuclear reactor fuel reloading. This is a combinatorial problem, in which the goal is to find the best feasible solution, minimizing a specific objective function. However, in the first moment it is possible to compare the fuel reloading problem with the Traveling Salesman Problem (TSP), since both of them are combinatorial and similar in terms of complexity, with one advantage: the evaluation of the TSP objective function is more simple. Thus, the proposed method has been applied to two TSPs: Oliver 30 and Rykel 48. In 1995, KENNEDY and EBERHART presented the PSO technique to optimize non-linear continuous functions. Recently some PSO models for discrete search spaces have been developed for combinatorial optimization, although all of them have different formulation from the one presented in this work. Here we use the PSO theory associated with to the Random Keys (RK) model, used in some optimizations with Genetic Algorithms, as a way to transform the combinatorial problem into a continuous space search. The Particle Swarm Optimization with Random Keys (PSORK) results from this association, which combines PSO and RK. The adaptations and changings in the PSO aim to allow the appliance of the PSO at the nuclear fuel reloading problem. This work shows the PSORK applied to the TSP and the obtained results as well.
1. Introduction One of the problems that stimulate new techniques research in Nuclear Engineering is the nuclear reactor fuel reloading optimization problem. The reloading operation substitutes for fresh nuclear fuel assemblies part of the 891
892 burned nuclear fuel assemblies, which is taken off the reactor core. Some specific criteria are followed to place fresh fuel assemblies, as well as to reorganize the old ones which remain in the core, in order to optimize the fuel burn up. Thus, it is a combinatorial problem, which objective function to be optimized depends on many factors. The multiobjective characteristic, the great number of feasible solutions and the non-linearity of the problem difficult its optimization. The reloading objective function is evaluated with specific codes of Reactor Physics and they take a considerable amount of computational cost. However, we can use the Traveling Salesman Problem (TSP) for a previous analysis of the new technique, since both a 30~50 nodes (cities) TSP and the Reloading (with 1/8 core simmetry) are equivalent combinatorial problems with no repetitions allowed on its feasible solutions and similar in complexity, with one advantage: the evaluation of the objective function of the TSP is more simple. The TSP is a NP-Hard combinatorial problem [1, 2] whose feasible solutions are discrete sets, corresponding to a permutation of the sequence of visited cities (nodes). The objective is to find the shortest length itinerary, passing throughout all the cities only once and turning back to the first one. The PSO technique was first presented at 1995 by KENNEDY and EBERHART [3,4] and the algorithm follows a collaborative search model, taking into account the social aspects of intelligence. According to this model, many applications to optimization problems of continuous search spaces have been developed [5]. There are some contributions for discrete search spaces as those of KENNEDY and EBERHART [6]; WANG et al. [7], showing results for a 14 nodes symmetrical TSP; CLERC [8], who adapted PSO to solve a 17 nodes TSP; YIN [9], who uses PSO to obtain optimal approximations of digital curves; and SALMAN et al. [10], who applied PSO to the Task Assignment Problem (TAP), obtaining discrete sets as feasible solutions by truncating real numbers. We must say that although PSO is an efficient algorithm to optimize continuous functions, there is no satisfatory model to its discrete version. Here we have associated the PSO philosophy to the Random Keys (RK) model [11] in order to optimize two TSPs: Oliver 30 and Rykel 48. This article has the following structure: section 2 shows TSP formal definition; section 3 explains the PSORK (Particle Swarm Optimization with Random Keys); the obtained results are in section 4 and conclusions and future work proposal are done in section 5.
893 2. TSP Formal Definition TSP is a reference problem in the Computational Complexity Theory [1, 2], considered of difficult solution although having a simple formulation: given a number n S 3 of cities (or nodes) and the distance between them, the goal is to determine the shorter total distance path, visiting each city once and turning back to the first one visited. There are different kinds of TSP. The symmetrical one is that in which given two different cities i and j , the distance dy to go from one to another is the same in the inverse path, it means dy = dp V i, j = 1, ..., n, and it has [(n-l)!]/2 feasible solutions. For an asymmetrical one, dy * djj V i * j and there are (n-1)! different paths. BURIOL [12] presents its mathematical formulation as a minimization problem. 3. PSORK (Particle Swarm Optimization with Random Keys) 3.1. Particle Swarm Optimization (PSO) The concept of Particle Swarm was proposed in 1995 by KENNEDY e EBERHART to optimize non-linear continuous functions [3]. PSO is a search process based on the social learning metaphor. Each individual guarantee success from its own experience, as from the experience of the group, balancing exploration and exploitation. Each particle of a swarm has a position and a velocity. The positions are Xy = (XJI, XQ, ... , Xjn) with i = 1, 2, ..., P, where P is the number of particles of the swarm and n is the number of dimensions of the search space. The position of each particle represents a feasible solution to the problem. Thus, the objective function f(xyt+1) is evaluated at each iteration t+1 and the best individual and global positions are updated. The global best position is gbestj = (gbest1; gbest2, ... , gbestj- The best position obtained by each particle i, we have pbesty = (pbestn, pbestj2, ... , pbestin). The velocities are vy = (vn, vi2, ... , V;,,). The velocity and the position of each particle are updated according to the equations (6) and (7). Vijt+1 = wvij' + cir,' (pbesty - xy') + c2r2' (gbestj - Xj/) and
1
x ^ x y ' + vij* .
(1) (2)
At the right side of eq. (1), the first term represents the influence of the own particle motion, where w is the inertia weight; the second term represents the individual cognition, which depends on the particle's previous behavior; and the third term represents the social aspect of intelligence, based on a comparison between the particle's position and the best result obtained by the swarm. Eq. (7)
894 describes how the positions are simply updated. Both ci and c2 are acceleration constants; ri e r2 are uniform distributed random numbers. The positions and velocities are inicialized randomly at implementation. 3.2. Random Keys (RK) Random Keys model was proposed by BEAN [11], It encodes and decodes a solution with random numbers. These numbers obtained randomly in a (0,1) uniform probabilistic distribution are keys for sorting other numbers, in order to form feasible solutions into a problem. Here we use the Single Machine Scheduling Problem (SMSP) approach [11]. If the key sequence SA = (0,39; 0,12; 0,54; 0,98; 0,41) had been obtained randomly, the resulting decoding would be VA = (2, 1, 5, 3, 4), since 0,12 is the minimum and it is in the 2nd. position in SA; 0,39 corresponds to the first position and so on. Hence the decoding gives discrete sets that can solve the problem. It means that a search on a real continuous space is generating feasible solutions to a combinatorial problem. 3.3. PSORK (Particle Swarm Optimization with Random Keys) SALMAN et al. [10] apllied PSO to the optimization of the Task Assignment Problem (TAP) truncating the components of the positions in order to obtain feasible solution in the discrete optimization. In the TAP case, each feasible solution has the tasks executed by the processors, and repetition is allowed. For example for 5 processors, the list (3, 2, 1,3, 1) representing tasks for each one of them would be a possible solution. The velocities Vy would be obtained by eq. (6) and so the positions xy by the eq. (7), but it would only be a feasible solution if its components had been truncated. Then it would be evaluated by the objective function. For the TSP, since the repetition of cities invalidate its solutions we use RK to generate feasible solutions [13, 14], as figure 1 shows. Position XJI1 —>
4.3
2.3
1.2
2.6
4.2
Velocity VH' -
-0.2
+0.4
+0.2
-0.7
+0.2
4.1
2.7
1.4
1.9
4.4
1
2
3
4
5
New position xu**
(a)
i
(b)
' 1
1
Auxiliar vector uu -
3
4
2
1
5
(c)
Figure 1. (a) Position and velocities for a 5 cities TSP. (b) New positions as random keys, (c) Auxiliary vector indicating the visited cities order, which will be evalueted by the fitness.
895 In short, the main adaptation is the interpretation of the position. It does not represent the order of the cities itself, but the set of keys that allows the decoding of the information acquired along the iterations. Thus, the positions do not need to be rounded or truncated as in the PSO model for TAP [10] for example. The positions vector informations are decoded by the RK and it gives a feasible solution to be evaluated by the objective function. Thus, we obtain a discrete search space (with feasible solutions to the TSP or the reloading problem) from a key-search continuous space, where PSO reaches good results. This method was implemented in order to optimize the TSPs Oliver 30 and Rykel 48, as it will be seen in the next section. 3.3.1. PSORK applied to Oliver 30 The eqs. (6) and (7) are used but pbesty e gbestj are obtained from the auxiliar vectors Uy having the best evaluations (uy* is a feasible solution containing integers from 1 to 30, indicating the order of the visited cities, given by RK); Xy' contains the keys to be decoded by RK. 3.3.2. PSORK applied to Rykel 48 For populations of 80~140 particles, we have implemented some strategies in order to provide diversity for the swarm [13, 14]. These strategies were not necessary with 500 and 1,000 particles and in such cases we have found more regularity in the results. 4. Experimental Results 4.1. Oliver 30 (500 particles) Here we have the best result for swarm with 500 particles (table 1) with the constants w = 0.06; C] = c2 = 0.1. Such constants are not usually found in literature, however, they gave the best results in the various tests. Table 1. Best results with 500 particles for Oliver 30. Iteration
Best Fitness
20 40 60 80
531.59 425.82 423.95 423.73*
Swarm's Average Fitness 988.81 604.92 515.14 476.91
100
423.73*
464.96
* Global minimum [15].
896 The average results for 10 experiments (with 10 differents seeds), but with the same constants w, Ci and c2 are shown in table 2. Table 2. Average results for 10 experiments with 500 particles.
20
564.41
Average of the Avera &e Fitnesses 884.90
40
453.60
614.53
60
443.33
539.60
80
441.63
512.05
100
441.57
502.22
Iteration
Average of the B e s t F i, n e s S es
Swarms
4.2. Rykel 48 (1000 particles) Tables 3 shows the experiment with the best result (constants w = 0.7 and C] = c2 = 1.8). Rykel 48's minimum is 14,422 [16]. Table 3. Best results with 1000 particles for Rykel 48. Iteration
Best Fitness
2,000
15,672
Swarm's Average Fitness 26,035
4,000
15,504
24,792
6,000
15,504
24,347
8,000
15,504
23,886
10,000
15,504
23,497
Table 4 shows the average results for 10 experiments (with 10 differents seeds), but with the same constants w, ci and c2 of the previous table. Table 4. Average results for 10 experiments with 1000 particles. Average of the Iteration 2,000 4,000 6,000 8,000 10,000
Best Fitnesses
17,886 17,235 16,895 16,834 16,794
Average of the Swarms Average Fitnesses 27,253 27,309 27,372 27,307 27,169
897 5. Conclusions and Future Work The applications show that the tecnique reaches reasonable results when applied to TSPs such as Oliver 30 and Rykel 48. The results 423.74 (global minimum) for Oliver 30 and 15,504 for Rykel 48 are satisfatory compared to the ones obtained by Genetic Algorithms (which reaches 423.74 for Oliver 30 and 16,535 for Rykel 48) and Population Based Incremental Learning (PBIL, which reaches 423.74 for Oliver 30 and 15,430 for Rykel 48) [15]. In addition, we can also notice that PSO tends to reach near global optimum results for TSP within relatively few iterations at several experiments, demonstrating its robustness. Figure 2 shows how the average fitnesses decrease within the first 1,000 iterations by using this model (table 4 data). 50000 40000 8
30000
~ b-
20000
v c
10000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
Iterations -Average of the Best Fitnesses —•—Average of the Swarm's Average Fitnesses
Figure 2. Graphic for Average results for the optimization of the TSP Rykel 48 (relative to the table 4 data).
Further research on neighboorhood topology, parameters and parallel computing must be done. It is needed a sensibility study in relation to the constants used and to improve the technique in order to apply it to the real-world problem of a nuclear reactor fuel reloading. References 1.
2. 3.
Lawler, E. L., Lenstra, J. K., Kan, A. H. G. R., and Shmoys, D. B. (Org.). The Traveling Salesman Problem: a guided tour of combinatorial optimization. 4a. Ed. John Wiley & Sons, Wiltshire, Great-Bretain (1985). Papadimitriou, C. H., and Steiglitz, K. Combinatorial Optimization. Prentice-Hall, Inc., New Jersey, USA (1982). Kennedy, J., and Eberhart, R. "A New Optimizer Using Particles Swarm Theory", Proceedings of Sixth International Symposium on Micro Machine
898
4. 5.
6.
7.
8.
9.
10.
11. 12.
13.
14.
15.
16.
and Human Science, Nagoya, Japan. IEEE Service Center, Piscataway, NJ, pp. 39-43(1995). Kennedy, J. and Eberhart, R. C. Swarm intelligence. San Diego, EUA: Academic Press (2001). Parsopoulos, K. E., and Vrahatis, M. N. "Recent aproaches to global optimization problems through Particle Swarm Optimization", Natural Computing 1 (pp. 235-306), Netherlands, Kluwer Academic Publishers (2002). Kennedy, J., and Eberhart R. C. "A discrete binary version of the particle swarm algorithm", Conference on Systems, Man and Cybernetics, pp.41044109 (1997). Wang, K.-P., Huang, L., Zhou, C.-G.., and Pang, W. "Particle Swarm Optimization for Traveling Salesman Problem", Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi'an, 2-5 November 2003, pp. 1583-85 (2003). Clerc, Maurice. Discrete Particle Swarm Optimization, Illustrated by the Traveling Salesman Problem. France Telecom Recherche & D6veloppement (2004). Yin, Peng-Yeng. "A discrete particle swarm algorithm for optimal polygonal approximation of digital curves". Journal of Visual Communication and Image Representation, pp. 241-260 (2004). Salman, A., Ahmad, I., and Al-Madani, S. "Particle Swarm Optimization for Task Assignment Problem", Microprocessors and Microsystems, Vol. 26, Issue 8, pp. 363-371 (2002). Bean, J. C. "Genetics Algorithms and Random Keys for Sequencing and Optimization". ORSA Journal of Computing, Vol. 6, No. 2 (1994). Buriol, L. S., "Algoritmo Memetico para o Problema do Caixeiro Viajante Assimetrico como parte de Framework para Algoritmos Evolutivos", M.Sc. Dissertation, DENSIS/FEE/UNICAMP, Campinas, SP (2000). Meneses, A. A. M., and Schirru, R. "Particle Swarm Optimization Aplicado ao Problema Combinatorio com Vistas a Solucao do Problema de Recarga em um Reator Nuclear". Proceedings of the International Nuclear Atlantic Conference - INAC (2005). Meneses, A. A. M. "Otimizacao por Enxame de Particulas Aplicado ao Problema Combinatorio da Recarga de um Reator Nuclear", M. Sc. Dissertation, Rio de Janeiro, COPPE/UFRJ (2005). Machado, M. D. "Um Novo Algoritmo Evolucionario com Aprendizagem LVQ para Otimizacao de Problemas Combinatories como a Recarga de Reatores Nucleares", M.Sc. Dissertation, COPPE/UFRJ, Rio de Janeiro, RJ (1999). TSPLib. http://elib.zib.de/pub/Packages/mp-testdata/tsp/tsplib/tsplib.html
USE OF GENETIC ALGORITHM TO OPTIMIZE SIMILAR PRESSURIZER EXPERIMENTS DAVID A. BOTELHO Institute de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil PAULO A. B. DE SAMPAIO Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil CELSO M. F. LAPA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil CLAUDIO M. N. A. PEREIRA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil and Universidade Federal do Rio de Janeiro - PEN/COPPE/UFRJ, Centro de Tecnologia, Ilha do Fundao, bloco G, sola 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil MARIA DE LOURDES MOREIRA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil
ANTONIO CARLOS DE O. BARROSO Instituto de Pesquisas Energeticas e Nucleares - IPEN/CNEN, Avenida Professor Lineu Prestes 2242 Cidade Universitdria, Sao Paulo, Brazil
A genetic algorithm (GA) is used to search the parameters of a scaled experiment similar to a reactor pressurizer. The dimensionless similarity numbers are multipliers in the nondimensional conservation equations of the pressurizer. A "fitness function" of the similarity numbers evaluates the quality of the parameters of the scaled experiment in relation to the full-size pressurizer. Once the parameters are defined, the operation of experiment is verified comparing the non-dimensional pressure of a typical transient in the pressurizer and experiment.
899
1. Introduction The main function of a nuclear reactor pressurizer (a vessel containing liquid water and steam volumes, connected to the primary cooling system) is to absorb and control volume changes of the cooling fluid due to operation transients [1]. Pressurizers can be tested in small-scale similar experiments with reduced pressure. Similar systems are those represented by the same dimensionless equations, which contain the same similarity numbers, that represent important thermal-hydraulic processes. The similarity numbers are used to optimize the parameters of the scaled experiment. Reference [2] presents a derivation of the similarity numbers of a pressurizer and the definition of the symbols used here. 2. The Dimensionless Equations of a Two-Region Pressurizer With the choice of some "reference parameters" (with subscript "0"), the procedure to obtain the dimensionless conservation equations of a pressurizer generates a certain dimensionless "similarity numbers" (with superscript "0"). The respective equations of mass conservation in the vapor and liquid volumes, in dimensionless form, are:
±-,{m'>W'F1-W'RO - {NW^S'^rT:) at
^W=Ko dt
(l)
hf„
+ {NKc)h;vcS'v¥
T:)
-W'F,+msurge
(2)
hfg
The respective equations of energy conservation in the vapor and liquid volumes, in dimensionless form, are:
at
-W'm [{NWh0RO}+(NWh°RO\(p'-\)] (3)
- hwhlc \ + {Nmlc \ (p' - I)]^ESEZZL) h
Jk
7(mX)= W'm
off
[ K ) + (AW^K//-!)]
- ^.[(iV^Xw^.Kp'-o]
v*
( 4) dt'
+ (^)(l-/)+(^,°)f(l-/K where (NQ°h)=
Q°h/m^
2.1. The Local Phenomena and Corresponding Similarity Numbers The main local phenomena in a pressurizer are the wall condensation of steam and the rainout (condensation) of liquid drops in the steam volume, which ultimately fall into the liquid volume. And flashing (evaporation) in the liquid volume producing bubbles of vapor that rises into the vapor volume. The respective similarity numbers for these processes are:
N0={NW:c)={NQ°h)(NQl)
(5)
rr
(6)
un xl/4 r
^2=Ov05r)=-
4//° it0 r°
X
A°k°T0/L0^
(7)
\ ( A
^ = {^K.)=A/7
^
oL"o
/>/
(8)
<x
2.2. 77»e Pressure and Control Similarity Numbers The pressure number and the integral and proportional controller numbers that appear in the dimensionless energy conservation equations in the pressurizer are: - 0 0
(9)
^5=K0)-^fe)
(10)
N6={NK°p)=Kp{NQ°h)
(11)
2.3. The Enthalpy Transport Similarity Numbers A first order expansion of enthalpy about pressure is used to transform the enthalpy transport by local phenomena into the pressure variable. The respective rainout, flashing, and wall condensation enthalpy transport numbers are:
(12) h
J*
(13)
h
*{dP Jo
(14)
dh\ (15)
dp
P (16)
(17) h
Jk
dhf dp Ys
N„=(NWh°wc\ = {NW°cW
Nu =
{Nm°wc\={NW°cW
(18) /o
fdh\
dp
(19)
Jo
3. Genetic Algorithm Optimization Genetic algorithms (GA) [3] are optimization methods inspired in the evolution theory, in which artificial chromosomes composed of binary numbers (genes) encode solution candidates. In this work, the chromosome encodes the list of search variable. In the GA, initially a population of chromosomes is randomly generated. Then, guided by a fitness function (the objective function of the optimization problem), the evolution takes place by simulated natural selection, crossovers and mutations. By such operations, the solution candidates (chromosomes) are improved from generation to generation. One of the first applications of GA to design a thermal-hydraulic experiment can be seen in [4],
In this work, the pressure of a scaled experiment is fixed, but exist complex dependencies of the similarity numbers on the size and operation parameters. The search variables are the radius of the hemisphere of the pressurizer, the surge mass flow rate, and the heater thermal power. The Eq. (20) below defines the fitness function as the overall difference between all corresponding similarity numbers of the scaled experiment and the original pressurizer. The similarity numbers with superscript, "P", are for the full-scale pressurizer, and those with superscript, "m", are of the scaled experiment.
FIT= V (l-iV, m /<J/l5
(20)
3.1. Results of the Optimization The parameters of the PI controller were calculated from the exact match of the PI controller similarity numbers, Eqs. (10) and (11). The size data that resulted from the GA optimizations for various pressures are presented in Table 1. Table 1 - Size Data of the Pressurizer and Scaled Experiments Pressure (MPa)
Hemispheric radius(m)
Surge mass flow rate Heating power (Maximum), (kg/s)
(Maximum), (kW)
15.5
3.1115
54.84
1000.0
10.0
2.6434
51.1719
558.4416
5.0
1.6512
30.0
150.0
2.5
1.2394
20.0
60.0
The similarity numbers that reflect the quality or degree of the similarity that is possible to attain at the various pressures are presented in Tables 2 through Table 4. As expected, it can be seen in Table 2 through Table 4, that the discrepancy in the similarity numbers increase when the pressure decreases. It is shown by the relative deviation in the wall condensation numbers, that the local phenomenon most difficult to attain similarity in a much-reduced pressure is wall condensation. The deviations in the similarity numbers cause the distortion in the solution of the pressurizer equations for the very small-scaled experiments.
Table 2 - Pressure, Heating, and Local Phenomena Similarity Numbers
Pressure
)
(NP° V
(MPa)
pres)
N<£c
lyrr
NW°FL
RO
FIT
15.5
0.13038
17.388
8.5569e5
35.805
10.0
0.12581
27.031
5.8344e5
40.921
0.36950
5.0
0.11636
46.147
2.0586e5
68.354
0.81692
2.5
0.10699
66.517
1.2737e5
104.67
1.2915
Table 3 - Enthalpy Transport Similarity Numbers by Rainout and Flashing
{NWh°RO\
{NWhH
{NWh0Fl\
(NWh%\
15.5
1.6866
0.63489
2.6866
0.47436
10.0
1.0685
0.32678
2.0685
0.13717
5.0
0.70408
0.18991
1.7041
0.024904
2.5
0.52280
0.13346
1.5228
0.006555
Pressure (MPa)
Table 4 - Enthalpy Transport Similarity Numbers by Wall Condensation
Pressure
(NWh°wc\
(™4)
{NWh°wc\
[NWh°wc; {NWh°wc\
15.5
0.32811
0.55338
0.20831
0.88148
0.15564
10
0.51007
0.54501
0.16668
1.10551
0.06995
5
0.87080
0.61312
0.16537
1.4839
0.021687
2.5
1.2552
0.65621
0.16751
1.9114
0.0008228
(MPa)
Figure 1 illustrates the degree of agreement that can be obtained for the nondimensional pressure curve of the out-surge transient of the similar experiment with reduced pressure. The experiment with 2.5 MPa has a certain degree of distortion in the non-dimensional pressure between 0.05MPa and 0.25MPa due to the large difference in the similarity numbers (mainly wall condensation). But it still permits experimental measurement for code validation.
• Full Scale (15.5 MPa) • Reduced Scaled (2.5 MPa)
0,920 0,00
I
I
I
I
I
I
I
I
I
0,10
0,20
0,30
0,40
0,50
0,60
0,70
0,80
0,90
1,00
NONDIMENSIONAL TIME
Figure 1 -Dimensionless Pressure for Pressurizer and Experiment (2.5 MPa) 4.
Conclusions
This GA optimization is a much valuable tool to obtain the parameters of an experimental pressurizer. The improvement of the similarity numbers as the pressure of the experiment increases, demonstrates the usefulness of this GA optimization to design scaled experiments of reactor pressurizers. References 1. Barroso, A. C. O. and Batista Fo., B. D., "Refining the Design of the IRIS Pressurizer," Proceeding of the 5th International Conference on Nuclear Option in Countries with Small and Medium Electricity Grids, Dubrovnik, Croatia (2004). 2. Botelho, D. A., De Sampaio, P .A. B., Lapa, C. M., Pereira, C. M. N. A., Moreira, M. de L., and Barroso, A. C. O., Optimization procedure to design pressurizer experiments, International Nuclear Atlantic Conference (INAC), Santos, SP, Brazil, (2005). 3. Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989. 4. Lapa, C. M. F., De Sampaio, P. A. B., and Pereira, C. M. N. A., "A new approach to designing reduced scale thermal-hydraulic experiment," Nuclear Engineering and Design, Vol. 229, No. 2/3, pp. 205-212 (2004).
PARTICLE SWARM OPTIMIZATION APPLIED TO THE NUCLEAR CORE RELOAD PROBLEM MARCEL WAINTRAUB Institute) de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil RAFAEL P. BAPTISTA Universidade Federal do Rio de Janeiro - PEN/COPPE/UFRJ, Centro de Tecnologia, llha do Fundao, block G, room 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil ROBERTO SCHIRRU Universidade Federal do Rio de Janeiro -PEN/COPPE/UFRJ, Centro de Tecnologia, Ilha do Fundao, block G, room 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil
CLAUDIO M. N. A. PEREIRA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil and Universidade Federal do Rio de Janeiro - PEN/COPPE/UFRJ, Centro de Tecnologia, Ilha do Fundao, block G, room 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil
This work proposes the use of the Particle Swarm Optimization (PSO) algorithm as an alternative tool for solving the nuclear core reload problem. The key of this work is the use of PSO for solving such kind of combinatorial problem. As the PSO (original version) is only skilled for numerical optimization, an indirect (floating-point) encoding of solution candidates is obtained by applying the random-keys approach. Here, the method is introduced and its application to a practical example is described. Computational experiments demonstrate that, for the sample case, PSO performed better than a Genetic Algorithm (GA). Obtained results are shown and discussed in this paper.
1. Introduction The nuclear core reload problem (NCRP) is a complex combinatorial optimization problem, which consists in finding the fuel loading pattern that maximizes the burnup period of a nuclear power plant (NPP). As its main motivation is the reduction of costs, it has been one of the most outstanding 907
908 optimization problem related to NPP operations, and many approaches have been proposed. One of the most popular one is the FORMOSA [1], which uses Simulated Annealing (AS) as optimization tool. Other successful approaches use Genetic Algorithms (GA) ([2], [3] and [4]), Ant Colony Systems (ACS) [5] and Population-Based Incremental Learning (PBIL) [6]. In this work, it is proposed the use of Particle Swarm Optimization (PSO) [7] as an alternative method for the NCRP. PSO is a population-based optimization metaheuristic, which has demonstrating to be efficient in finding (near-) global optimum solutions in other complex multimodal problems ([8], [9]). The main challenge in applying PSO to the NCRP is, however, its combinatorial nature. Therefore, the use the PSO in its standard form requires an indirect (floating-point) encoding of solution candidates. Here, such indirect encoding is obtained by the use of the random-keys (RK) approach [10]. Previous research [11] demonstrated that PSO with RK can be applied to the Travelling Salesman Problem (TSP), and the results are comparable (maybe slightly better in the chosen experiments) with those obtained by a GA. In the next section the PSO algorithm is introduced. The optimization problem is described in section 3, while method application with results and concluding remarks are seen in sections 4 and 5 respectively.
2. Particle Swarm Optimization Overview Particle Swarm Optimization (PSO) [7] is a recently developed optimization algorithm inspired by the behavior of biological swarms and aspects of social adaptation. In some sense it can be seen as a kind of Evolutionary Computation (EC) technique, but while traditional EC models has its strength in competition (Darwinian competition), PSO algorithm choose collaboration as its strategy to evolve. In PSO, a swarm of structures comprising solution candidates, called "particles", is simulated. They "fly" in an n-dimensional space (the search space of the optimization problem), looking for optima or near-optima region. The position of a particle represents a solution candidate itself, while the search space topology is given by the problem's objective function. Each particle has also a velocity attribute, which have information about direction and changing rate of its position, and Hie. performance (or fitness) attribute obtained by the evaluation of the objective function on the particle's position.
The particle's position and velocity changes are guided by its own experience (historical information of good and bad regions through which the particle has passed), as well as by the observation of their well-succeeded neighbors. Let X,(0 = {xl,i(0,...,xl^t)} and Vi(t) = {vll(t),...,vi,„(t)} be, respectively, the position (the solution candidate vector itself) and the velocity (its changing rate) of particle / in time t, in an n-dimensional search space. Consider also, pBestt(t) = {pBestiX(Q,...,pBestin(t)}, the best position already found by particle / until time / and gBestj(t) = {gBestjl(t),—, gBestin(t)} the best position already found by a neighbor until t. The PSO updating rules for velocity and position are given by: v,.„a+l) = w . v ; i M ( 0 + c 1 7 - 1 . ( p 5 e ^ ( 0 - ^ ( 0 ) + c 2 r 2 . ^ 5 e ^ ( 0 - x , > ! ( 0 )
(1)
*,,„ (f + 1) = *,-,„ ( 0 + v,,,, (*+ 1)
(2)
where rt and r2 are random numbers between 0 and 1. Coefficients c, and c2 are given acceleration constants (often called cognitive and social acceleration, respectively) towards pBest and gBest respectively and w is the inertia weight. In PSO algorithm, the swarm is randomly initialized (positions and velocities). Then, while the stopping criterion (in this case a maximum number of iteration) is not reached, a loop containing the following steps takes place: i) particles are evaluated according to the problems objective function, and fitness values are assigned to each particle; ii) pBest and gBest values are updated; iii) particles moves according to the updating equations for velocity and positions (equations 1 and 2). 3. The Optimization Problem The Brazilian Angra-1 PWR core has been used as the sample case for this investigation. Using an eighth-core symmetry, the number of fuel assemblies to be shuffled falls from 121 (the whole core) to 20 (central element is fixed), as shown in Figure 1. In order to test the PSO approach, a simplified problem has been proposed, considering: i) no constraint about the allowed positions for the assemblies, ii) no burnable poisons, iii) no rotation in the fuel assemblies. The goal is to find the best loading pattern for the 20 fuel assemblies that maximize the Boron concentration at end-of-cycle (which implies in cycle length
maximization). As a constraint, the radial peaking-factor must be inferior to 1.435. 1 2
3
4
5
6
7
8
0
in
11
12
13
14
16
17
19
•
18
20
15
Figure 1 - Angra-1 PWR eighth-core with core positions enumerated.
The objective function,/ to be maximized is, then, given by Equation 3.
/ =
\B, \B-k.F.AT'
Fxy < 1.435 otherwise
(3)
where B is the Boron concentration, FXY m e radial peaking-factor and k a penalization multiplier.
4. Method Application and Results As the standard PSO is skilled to continuous numerical optimization (real search spaces), to solve the NCRP by PSO implies in one of the following two options: i) modifying the original PSO, in order to adapt it to combinatorial optimization, or ii) transform the combinatorial search space into a continuous numerical one. In this work, it was chosen the second approach.
911 4.1. The Random Keys approach To transform the combinatorial search space into a continuous numerical one, it is proposed the use of the Random Keys approach [10], originally applied to a GA for solving the Travelling Salesman Problem (TSP). The method consists in encoding solution candidates into real-coded vectors in which every element may range between 0 and 1. Such vector is called the "random keys" (RK). In case of the NCRP, the dimension of the vector is the number of fuel assemblies to be shuffled (20 in the proposed sample case). The decoding of the RK into a fuel loading pattern is by sorting the fuel assemblies by the RK vector. As illustrated in the example of Figure 2. Random Keys (not sorted) 0.12 0.23 0.01 0.63 0.29 0.99 0.54 0.81 0.44 0.20 0.11 0.79 0.05 0.33 0.77 0.67 0.16 0.49 0.91 0.66 Core position (fixed) 1
2
1
2
3
4
5
6
7
8
9
8
9
10 11 12
13 14 15 16 17 18 19 20
10 11 12
13 14 15 16 17 18 19 20
Fuel assemblies (before sorting) 3
4
5
6
7
Decoding process
\7
Random Keys (sorted)
0.01 0.05 0.11 0.12 0.16 0.20 0.23 0.29 0.33 0.4< 0.49 0.54 0.63 0.66 0.67 0.77 0.79 0.81 0.91 0.99 Core position (fixed) 1
2
3
4
5
6
7
8
9
10 11 12
13 14 15 16 17 18 19 20 Fuel assemblies (after sorting)
3
13 11
1
17 10 2
5
14
9
18
7
4
20 16 15 12
8
19
Figure 2 - Decoding the random keys vector into a fuel loading pattern (an example)
6
4.2. Experiments and Results The proposed approach has been tested along several experiments. The number of particles was fixed in 50. The coefficients e, and c2 were set to 2.0 and FFhas been decreased from 0.8 to 0.2 along the PSO evolution. In order to verify consistence, different random seeds have been used. The reactor physics calculations have been made by the RECNOD [12] code, which runs on the same computational platform used by the PSO. Due to several limitations of the code, the constraint FXY< 1.435 (FXY is not provided by RECNOD) is substituted by PMAX<139, where P^A-is the maximum normalized assembly power. Value 1.39 in RECNOD means F^yc 1.435 in the code used in practice. Results were compared to those obtained by a GA with population size equal to 50 and different sets of parameters (mutation rate between 0.001 and 0.01; crossover rate between 0.6 and 0.8) and random seeds. Table 1 shows results (fitness values) obtained along 9 experiments. Remark that, for all experiments FXY values have found by both PSO and GA satisfies the constraint (FXY<1-435), hence, fitness values shown in Table 1 are exactly the Boron concentrations. Table 1 - Results and comparisons
Experiment 1 2 3 4 5 6 7 8 9 Average
GA 1219.00 1244.00 1297.00 1211.00 1150.00 1021.00 1299.00 1331.00 1214.00 1220.67
PSO 1310.00 1402.00 1404.00 1226.00 1329.00 1388.00 1271.00 1231.00 1335.00 1321.78
Note that PSO has been much more efficient and consistent than the GA in the proposed problem and experiments.
5. Conclusions In this work, it has been demonstrating the feasibility of using PSO with RK as an alternative method for solving the NCRP. Moreover, it has outperformed the GA (standard algorithm). The use of RK has allowed the use of a standard PSO, to which parameterization and convergence studies can be found in literature. Current investigations point to the application of niching or neighborhood restrictions as a promising improvement. However, such improvements would imply in a great computational overhead. Hence, nowadays, current research is on parallel PSO approaches, comprising niching and neighborhood restrictions model. Besides, investigating the application of PSO in a more realistic problem (without the simplifications made here) is an important technological contribution. References 1. Kropackzek, D.J., Turinsky, P.J., July. In-core Nuclear Fuel Management Optimization for Pressurized Water Reactors Utilizing Simulated Annealing, Nuclear Technology, 95, n.9, pp.9-31 (1991). 2. Poon, P.W., Parks, G.T. Optimizing PWR Reload Core Design, Parallel Solving from Nature, 2, pp371-380 (1992). 3. DeChaine, M.D., Feltus, M.A., Nuclear Fuel Management Optimization Using Genetic Algorithms, Nuclear Technology, 3, pp.109-114 (1995). 4. Chapot, J.L.C , Da Silva, F.C, Schirru, R. A New Approach to the Use of Genetic Algorithms to Solve the Pressurized Water Reactor's Fuel Management Optimization Problem, Annals of Nuclear Energy, 26, pp.641655 (1999). 5. Machado, L. And Schirru. R., The Ant Algorithm Applied to the Nuclear Reload Problem, Annals ofNuclear Energy, 29, pp. 1455-1470 (2002). 6. De Lima, A.M.M. and Machado, M.D., Modelo de Ilhas para a Implementacao Paralela do Algoritmo Evolucionario de Otimizacao PBIL no Problema da Recarga de Reatores Nucleares PWR, International Nuclear Atlantic Conference (INAC), Brazil, (2002). 7. Kennedy, J., Eberhart, R.C., Particle Swarm Optimization, Proceedings of IEEE International Conference on Neural Networks, 4, pp.1942-1948, Australia (1995) 8. Domingos, R. P. Schirru, R. and Pereira, C. M. N. A. Particle Swarm Optimization in Reactor Core Designs, Nuclear Science and Engineering, 152, pp.1-7 (2006).
914 9.
Siqueira, N.N., Pereira, C.M.N.A., Lapa, C.M.F. The Particle Swarm Optimization Algorithm Applied to Nuclear Systems Surveillance Test Planning, INAC, 2005 10. Bean J. C. Genetic Algorithms and Random Keys for Sequencing and Optimization, ORSA Journal on Computing, 6, 2, pp 154-160, 1994. 11. Menezes, A. A. M. Otimizacao por Enxame de Particulas Aplicado ao Problema Combinatorio da Recarga de um Reator Nuclear, M.Sc. Thesis, COPPE/UFRJ (2005). 12. Chapot, J.L. C. . Otimizacao Automatica de Recargas de Reatores de Agua Pressurizada utilizando Algoritmos Geneticos, D.Sc. Thesis, COPPE/UFRJ (2000).
PARALLEL EVOLUTIONARY METHODS APPLIED TO A PWR CORE RELOAD PATTERN OPTIMIZATION
ROBERTO SCHIRRU
PEN/COPPE, Universidade Federal do Rio de Janeiro, Caixa Postal 68509, Rio de Janeiro, Rio de Janeiro 21945-970, Brasil
ALAN M. M. DE LIMA, MARCELO D. MACHADO
PEN/COPPE, Universidade Federal do Rio de Janeiro, Caixa Postal 68509, Rio de Janeiro, Rio de Janeiro 21945-970, Brasil
The nuclear reactor core reload pattern optimization problem consists in finding a pattern of partially burned-up and fresh fuels that optimizes the next operation cycle. This problem has been traditionally solved using an expert's knowledge, but recently artificial intelligence techniques have also been applied successfully. This problem is NP-hard, meaning that its complexity grows exponentially with the number of fuel assemblies in the core. Besides that, the problem is non-linear and its search space is highly discontinuous and multimodal. The aim of this work is to apply parallel computational systems based on the Population-Based Incremental Learning (PBIL) algorithm and on the Ant Colony System (ACS) to the core reload pattern optimization problem and compare the results to those obtained by the genetic algorithm and by a pattern obtained by an expert. The case study is the optimization of cycle 7 of the Angra 1 Pressurized Water Reactor.
1. Introduction
At the end of a period of time called operation cycle, it is no longer possible to sustain the nuclear plant's nominal power. Consequently, the reactor is turned off and all fuel assemblies in the core are unloaded and put in a spent fuel pool. The most burned-up assemblies are replaced with fresh ones. These fresh assemblies and the usable spent ones form the set that will be used in the nuclear reactor core reload optimization problem. The aim of the PWR core reload optimization problem [1] is to find a pattern of fresh and partially burned fuel assemblies that optimizes the performance of the reactor over the next operating cycle, while ensuring that various operational and safety constraints are always satisfied. This problem is NP-hard, which means that its difficulty grows exponentially with the number of fuel assemblies in the reactor core. For a 121 fuel-assembly reactor, there are approximately 10^273 patterns,
a number which falls, using a 1/8 symmetry and some positioning rules, to a value that is still extremely high to solve by enumeration [2]. Generally, in order to solve this problem a cyclic reloading scheme is used, in which only a part of the fuel assemblies is reloaded in each cycle (generally 1/3). Among the reloading strategies, the most traditional are the out-in and low-leakage ones.
2. PBIL Algorithm
The PBIL algorithm [3] is a method that combines the mechanism of the Genetic Algorithm (GA) [4] with simple competitive learning, creating an important tool for the optimization of numerical functions and combinatorial optimization problems. PBIL works with a set of solutions of the problem, called population, encoded as bit strings. The most commonly used form is binary, i.e. strings of 0s and 1s. The aim is to create a probability vector, containing real values in each position, that when used in a decoding procedure generates a solution for the function to be optimized. In order to obtain high diversity for the population at the beginning of the search procedure, each position of the probability vector is initialized to 0.5. This makes the probability of getting the value "0" or "1" in each position of the bit string the same, generating a random initial population. Similarly to competitive network training, the values of the probability vector are gradually changed from the initial value 0.5 to values near 0.0 or 1.0, in order to represent the best values found in the population at each generation. During the search procedure, at each generation, the values of the probability vector are updated using the following rule:
P(i) = P(i) × (1 − Ta) + X(i) × Ta        (1)

where:
• Ta: learning rate
• P(i): value of the probability vector in the i-th position
• X(i): value of the best solution vector of the population in the i-th position
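A minimal sketch of this update, and of how the probability vector is used to sample a binary population, is given below (NumPy assumed; the learning-rate value is an illustrative choice, not the one used by the authors).

import numpy as np

def pbil_update(p, x_best, ta=0.1):
    """PBIL probability-vector update of Eq. (1): P(i) = P(i)*(1 - Ta) + X(i)*Ta,
    pulling each position toward the best solution found in the generation."""
    return p * (1.0 - ta) + x_best * ta

def sample_population(p, pop_size, rng=np.random.default_rng()):
    """Sample a binary population: bit i of each individual is 1 with probability P(i)."""
    return (rng.random((pop_size, p.size)) < p).astype(int)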
3. Ant Colony System
In the early 90s, a new algorithm was developed specifically for combinatorial optimization problems. It was inspired by the observation of ant colonies, and was named Ant System [5]. Ant System was successfully applied to complex combinatorial problems such as the Traveling Salesman Problem (TSP) [5] and the Quadratic Assignment Problem (QAP) [6], and it has been applied to many problems in network-telecommunication optimization, vehicle routing, allocation tasks and many other combinatorial optimization problems. ACS is mathematically described by Eqs. (2) and (3) below.
s = arg max_{z ∈ Jk(r)} { [FE(r,z,p)]^α × [HE(r,z,p)]^β }   if q ≤ q0
s = Roulette                                                 if q > q0        (2)

Roulette_k(r,s,p) = [FE(r,s,p)]^α × [HE(r,s,p)]^β / Σ_{z ∈ Jk(r)} [FE(r,z,p)]^α × [HE(r,z,p)]^β   if s ∈ Jk(r)
Roulette_k(r,s,p) = 0                                                                              if s ∉ Jk(r)        (3)

where:
• r - ant's current position
• s - ant's next position
• p - ant's position in the core
• FE - pheromone matrix
• HE - heuristic matrix
• q - random parameter between 0 and 1
• α and β - parameters that determine the relative importance of FE and HE
• Jk(r) - list of unused fuel assemblies
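A minimal sketch of this state-transition rule follows. Here FE and HE are taken as 3-D arrays indexed by (current assembly, candidate assembly, core position); this indexing, and the values of α, β and q0, are illustrative assumptions rather than the authors' settings.

import numpy as np

def acs_next_assembly(r, p, FE, HE, unused, alpha=1.0, beta=2.0, q0=0.9,
                      rng=np.random.default_rng()):
    """Choose the next fuel assembly s for an ant at assembly r and core position p,
    following Eqs. (2)-(3): greedy choice with probability q0, otherwise
    roulette-wheel selection over the unused assemblies Jk(r)."""
    scores = np.array([FE[r, s, p] ** alpha * HE[r, s, p] ** beta for s in unused])
    if rng.random() <= q0:                       # exploitation branch of Eq. (2)
        return unused[int(np.argmax(scores))]
    probs = scores / scores.sum()                # roulette probabilities of Eq. (3)
    return rng.choice(unused, p=probs)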
4. Parallel Model
PBIL and ACS were parallelized using the Island Model [7] [8], which was originally developed for the Genetic Algorithm. This model consists of connected islands that search for the best solution and periodically exchange information in a procedure called migration, as shown in Figure 1.
Figure 1. Topology for 5 islands.
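One way such a migration step can be implemented is sketched below for a ring of islands; the migration size, frequency and topology are illustrative assumptions, not necessarily the settings used by the authors.

def migrate(islands, k=1):
    """Ring migration for an island model: each island is a list of
    (fitness, genome) pairs; copies of its k best individuals are sent to the
    next island in the ring, where they replace the k worst."""
    n = len(islands)
    best = [sorted(isl, key=lambda ind: ind[0], reverse=True)[:k] for isl in islands]
    for i in range(n):
        islands[i].sort(key=lambda ind: ind[0])      # worst individuals first
        islands[i][:k] = list(best[(i - 1) % n])     # receive from the previous island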
5. Case Study The parallel versions of PBIL and ACS were tested using as case study the cycle 7 of the Angra 1 nuclear power plant. This plant is a 626 MW PWR designed by Westinghouse and operated by ELETRONUCLEAR. It is located in the southeast of Brazil. The reactor core is divided in four symmetric quadrants by two main axes. There are also two secondary axes, called diagonals, that together with the main axes divide the reactor in eight symmetric parts. Considering a 1/8 core symmetry and excluding the central fuel assembly, there are 20 fuel assemblies, 10 belonging to the symmetry axes (quartets), and 10 out of the axes (octets). The optimization is performed in this set and the pattern is repeated in their symmetric counterparts. In our formulation, there are two objectives:
• Maximization of the cycle length;
• Minimization of the Maximum Average Relative Power (MARP).
It must be taken into account that the assemblies that did not belong to the symmetry axes cannot be put on the symmetry axes. The reactor core neutronic calculations were performed using the RECNOD [11] computational program.
6. Genotype and Fitness Description
For PBIL, each fuel assembly is represented by a set of bits in the genotype. As in the TSP, the fuel assemblies cannot be repeated. The decoding mechanism is described by Machado [12]. In the case of ACS, there is no genotype, and each loading pattern is represented by an ant [10]. Chapot [11] defined 1.395 as the upper allowable limit for the MARP. The fitness function was developed in such a way that, if this constraint is satisfied, its value is equal to the reciprocal of the boron concentration; otherwise, it is equal to the MARP.
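Read as a quantity to be minimized, this fitness can be sketched as follows; the minimization reading and the function name are ours.

def fitness(marp, boron_ppm, marp_limit=1.395):
    """Fitness to be minimized: if the MARP constraint is met, return the
    reciprocal of the boron concentration (higher boron, i.e. a longer cycle,
    gives a lower fitness); otherwise return the MARP itself as a penalty."""
    return 1.0 / boron_ppm if marp <= marp_limit else marp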
7. Results
In order to evaluate the performance of the PBIL and ACS parallel models, comparative experiments with their serial versions, with the GA and with the expert's optimization were performed. A population of 50 individuals for 10,000 iterations was used for all algorithms. From Table 1, it can be noticed that:
• The parallel models outperformed their serial counterparts in all the experiments.
• Increasing the number of islands improves the results.
• PBIL and ACS outperformed the GA and the loading pattern generated by the expert.
It is worth mentioning that, in the case of Angra 1, a gain of one Effective Full Power Day (EFPD) is worth a profit of US$ 600,000. Each 4 ppm of boron concentration in the core is equivalent to 1 EFPD. Table 1 shows that PBIL obtained 206 ppm more than the GA and 338 ppm more than the expert, or 51 and 84 EFPDs, respectively. ACS, in turn,
obtained a gain of 398 ppm (99 EFPDs) and 530 ppm (132 EFPDs) in relation to the GA and to the expert, respectively.

Table 1. Comparison among GA, PBIL and ACS for serial, 3, 5 and 7 islands (MARP / Boron Conc.).

Algorithm   Serial          3 islands       5 islands       7 islands
Expert      1.430 / 894     —               —               —
GA          1.390 / 1026    —               —               —
PBIL        1.385 / 1083    1.382 / 1106    1.380 / 1152    1.389 / 1232
ACS         1.379 / 1263    1.384 / 1297    1.382 / 1368    1.388 / 1424
References
1. Poon, P.W., Parks, G.T., Optimizing PWR Reload Core Designs, In: Parallel Problem Solving from Nature 2, Elsevier Science Publishers B.V., pp. 371-380, 1992.
2. Galperin, A., Exploration of the Search Space of the In-Core Fuel Management Problem by Knowledge-Based Techniques, Nuclear Science and Engineering, Volume 119, pp. 144-152 (1995).
3. Baluja, S., Caruana, R., Removing the Genetics from the Standard Genetic Algorithm, Technical Report CMU-CS-95-141, May 1995.
4. Goldberg, D.E., 1989, Genetic Algorithms in Search, Optimization & Machine Learning, Reading, Addison-Wesley.
5. Dorigo, M., Gambardella, L.M., Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem, IEEE Transactions on Evolutionary Computation, Volume 1, n. 1, pp. 53-66 (1997).
6. Gambardella, L.M., Taillard, E.D., Dorigo, M., Ant Colonies for the Quadratic Assignment Problem, Journal of the Operational Research Society, Volume 50, pp. 167-176 (1999).
7. Cantú-Paz, E., Topologies, Migration Rates, and Multi-population Parallel Genetic Algorithms. In: IlliGAL Report No. 99007, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, January (1999).
8. Cantú-Paz, E., A Survey of Parallel Genetic Algorithms, Calculateurs Paralleles, v. 10, n. 2, 1998.
9. Lima, A.M.M., Modelo de Ilhas para a Implementacao Paralela do Algoritmo Evolucionario de Otimizacao PBIL, Tese de M.Sc., COPPE/UFRJ, Rio de Janeiro, Brasil, Marco de 2000.
10. Lima, A.M.M., Recarga de Reatores Nucleares Utilizando Redes Conectivas de Colonias Artificiais, Tese de D.Sc., COPPE/UFRJ, Rio de Janeiro, Brasil, Junho de 2005.
11. Chapot, J.L.C., Otimizacao Automatica de Recargas de Reatores a Agua Pressurizada Utilizando Algoritmos Geneticos, Tese de D.Sc., Programa de Engenharia Nuclear, COPPE/UFRJ, Rio de Janeiro, Brasil, Junho (2000).
12. Machado, M.D., Um novo algoritmo evolucionario com aprendizado LVQ para a otimizacao de problemas combinatorios como a recarga de reatores nucleares, Tese de M.Sc., COPPE/UFRJ, Rio de Janeiro, Brasil, Abril de 1999.
ROBUST DISTANCE MEASURES FOR ON-LINE MONITORING: WHY USE EUCLIDEAN?
DUSTIN R. GARVEY AND J. WESLEY HINES
The University of Tennessee, Knoxville, TN 37771
Traditionally, the calibration of safety critical nuclear instrumentation has been performed at each refueling cycle. However, many nuclear plants have moved toward condition-directed rather than time-directed calibration. This condition-directed calibration is accomplished through the use of on-line monitoring which commonly uses an autoassociative predictive modeling architecture to assess instrument channel performance. An autoassociative architecture predicts a group of correct sensor values when supplied with a group of sensor values that is corrupted with process and instrument noise, and could also contain faults such as sensor drift or complete failure. This paper introduces two robust distance measures for use in nonparametric, similarity based models, specifically the L1-norm and the new robust Euclidean distance function. In this paper, representative autoassociative kernel regression (AAKR) models are developed for sensor calibration monitoring and tested with data from an operating nuclear power plant using the standard Euclidean (L2-norm), L1-norm, and robust Euclidean distance functions. It is shown that the alternative robust distance functions have performance advantages for the common task of sensor drift detection. In particular, it is shown that the L1-norm produces small accuracy and robustness improvements, while the robust Euclidean distance function produces significant robustness improvements at the expense of accuracy.
1. Introduction
In the U.S. nuclear power industry, millions of dollars are spent annually on the calibration of instrument chains that are performing within the required specifications. For the past twenty years, several nuclear utilities, along with the Electric Power Research Institute (EPRI), have investigated methods to monitor the calibration of safety critical process instruments. In 2000, the U.S. Nuclear Regulatory Commission (NRC) issued a safety evaluation report (SER) [1] on an EPRI-submitted Topical Report (TR) 104965, "On-Line Monitoring of Instrument Channel Performance" [2]. This SER concluded that the generic concept of on-line monitoring (OLM) for tracking instrument performance as discussed in the topical report is acceptable. However, they also listed 14 requirements that must be addressed by plant specific license amendments if the TS-required calibration frequency of safety-related instrumentation is to be relaxed. Since the applicability of an OLM system is directly related to the ability of an empirical model to correctly predict sensor values when supplied with faulty data, methods must be developed to ensure that robust empirical models can be developed. In order to satisfy this requirement, two robust distance functions have been developed for use in nonparametric, similarity based models, such as
kernel regression [3] and the multivariate state estimation technique (MSET) [4].
2. Nonparametric Modeling
An empirical model's architecture may be either defined by a set of parameters and functional relationships (parametric) or a set of data and algorithmic estimation procedures (nonparametric). In a parametric model, training data is used to fit the model to the data according to a pre-defined mathematical structure. For example, consider the following polynomial model:

y = b0 + b1·x1 + b2·x2 + b3·x1·x2 + b4·x1² + b5·x2²        (2-1)
In order to completely define this model for a given set of training observations, the polynomial coefficients bi are optimized to minimize some objective function, usually the sum of the squared error (SSE). Once the optimal polynomial coefficients have been estimated, the model is completely specified by Equation 2-1 and the estimated coefficients. Therefore, a parametric model may be roughly defined as a model that may be completely specified by a set of parameters and a functional relationship for applying these parameters to new data in order to estimate the response. A non-parametric model, by contrast, stores historical data exemplars in memory and processes them when a new query is made. For instance, rather than modeling a whole input space with a parametric model such as a neural network or linear regression, local non-parametric techniques may be used to construct a local model in the immediate region of the query. These models are constructed "on the fly", not beforehand. When the query is made, the algorithm locates historical exemplars in its vicinity and performs a weighted regression with the nearby observations. The observations are weighted with respect to their proximity to the query point. In order to construct a robust local model, one must define a distance function to measure what is considered to be local to the query, implement locally weighted regression, and in some cases consider additional regularization techniques. For this work, we will examine the predictive performance of an autoassociative kernel regression (AAKR) empirical model [5, 6]. Since descriptions of AAKR do not readily appear in the open literature, the following derivation is based upon multivariate, inferential kernel regression as derived by Wand and Jones [7]. The mathematical framework of this modeling technique is composed of three basic steps. First, the distance between a query vector and each of the historical exemplar (memory) vectors is computed using the conventional Euclidean distance or L2-norm:
u_j = sqrt( Σ_{i=1}^{n} (x^i − m_j^i)² )        (2-2)
where u_j is the distance between the query vector (x) and the j-th memory vector, n is the number of variables in the data set, x^i is the i-th variable of the query vector, and m_j^i is the i-th variable of the j-th memory vector. Second, these distances are used to determine weights by evaluating the standard Gaussian kernel, expressed by:
w_j = K(u_j, h) = (1 / sqrt(2πh²)) · exp(−u_j² / h²)        (2-3)
where h is the kernel's bandwidth. Finally, these weights are combined with the memory vectors to make predictions according to:
x̂ = ( Σ_{j=1}^{nm} w_j · m_j ) / ( Σ_{j=1}^{nm} w_j )        (2-4)
Here, w_j are the weights, m_j are the memory vectors, nm is the number of memory vectors, and x̂ is the prediction for the query vector. Since the monitoring system's objective is to detect and quantify sensor drift, the model should be made as immune as possible to sensor drift. In order to improve the robustness of the AAKR modeling routine, distance functions other than the standard Euclidean distance may be used. Before discussing the alternative distance functions, the parameters used to measure model performance must be discussed. The performance of autoassociative OLM systems is measured in terms of accuracy, robustness, and spillover. Accuracy measures the ability of the model to correctly and accurately predict sensor values and is normally presented as the mean squared error (MSE) between the prediction and the correct sensor value. Robustness measures the ability of the model to make correct sensor predictions when the respective sensor value is incorrect due to some sort of fault. Spillover measures the effect a faulty sensor input has on the other sensor predictions in the model. An ideal system would be accurate and would not have sensor predictions affected by degraded inputs. These metrics are explained in detail by Hines and Usynin [8].
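The three steps of Eqs. (2-2)–(2-4) can be sketched compactly as follows (NumPy assumed; the kernel's normalization constant cancels in the weighted average and is kept only for completeness).

import numpy as np

def aakr_predict(x, memory, h=0.5):
    """AAKR prediction for a single query vector x: Euclidean distances to the
    memory vectors (Eq. 2-2), Gaussian kernel weights with bandwidth h (Eq. 2-3),
    and a weighted average of the memory vectors as the corrected estimate (Eq. 2-4)."""
    d = np.sqrt(((memory - x) ** 2).sum(axis=1))                  # Eq. (2-2)
    w = np.exp(-d ** 2 / h ** 2) / np.sqrt(2 * np.pi * h ** 2)    # Eq. (2-3)
    return (w[:, None] * memory).sum(axis=0) / w.sum()            # Eq. (2-4)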
3. Robust Distance Measures
Now that the performance metrics have been defined, the distance measures will be presented. The most basic form of the AAKR modeling technique makes use of the Euclidean distance or L2-norm, already given in Eq. (2-2):

u_j = sqrt( Σ_{i=1}^{n} (x^i − m_j^i)² )        (3-1)
Since this distance function squares the individual differences, the effects of a faulty input may be amplified, resulting in parameter predictions which are more affected by input variations and therefore less robust. In order to improve robustness, we desire distance measures which are not affected by errant sensor readings, and two robust distance functions have been investigated. The first robust distance function is the L1-norm, which is defined by the following equation.
u_j = Σ_{i=1}^{n} | x^i − m_j^i |        (3-2)
Notice that rather than square the individual differences, the L1-norm uses the absolute value. This alteration will be shown to provide a modest improvement in robustness, but the distance will still be affected by faulty input. The next robust distance function attempts to remove faulty input from the distance calculation and therefore should provide the largest improvement to model robustness. The final robust distance function is named robust Euclidean distance, and is defined by the following equation:
u_j = sqrt( Σ_{i=1}^{n} (x^i − m_j^i)² − max_{i=1,...,n} (x^i − m_j^i)² )        (3-3)

Here, max_{i=1,...,n} (x^i − m_j^i)² is the maximum squared difference of the query
vector from the j-th memory vector. Simply speaking, one "bad performer" is assumed to exist and its influence is removed from the calculation. To more clearly illustrate Equation 3-3, consider the following example vectors:

x1 = [0.9501 0.2311 0.6068 0.4860]
m1 = [0.8913 1.7621 0.4565 0.0185]
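A small sketch of this calculation (NumPy assumed; the function name is ours) reproduces the numbers worked out next.

import numpy as np

def robust_euclidean(x, m):
    """Robust Euclidean distance of Eq. 3-3: sum the squared differences after
    discarding the single largest one (the assumed 'bad performer')."""
    sq = (x - m) ** 2
    return np.sqrt(sq.sum() - sq.max())

x1 = np.array([0.9501, 0.2311, 0.6068, 0.4860])
m1 = np.array([0.8913, 1.7621, 0.4565, 0.0185])
print(robust_euclidean(x1, m1))   # approximately 0.4946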
The squared differences are found to be:

(x1 − m1)² = [0.0035 2.3438 0.0226 0.2185]

Notice that the largest squared difference is 2.3438. Therefore, the robust Euclidean distance is the square root of the sum of the squared differences minus the largest squared difference:

u_j = sqrt(2.5884 − 2.3438) = 0.4946

In conclusion, the robust Euclidean distance is the Euclidean distance with the largest distance or worst performer removed.
4. Results
In this section, data collected from an operating nuclear power plant steam system is used to compare and evaluate the robust distance metrics. The model (variable grouping) chosen was developed during the EPRI OLM Implementation Project and is currently being used to monitor steam system sensor calibration at an operating plant; thus, it is an appropriate model to evaluate. The steam system model contains 5 plant sensors, primarily from one loop, which include 2 turbine pressure sensors and 3 steam pressure sensors. The quoted sensor units are as follows: 1) turbine pressure in pounds per square inch absolute (PSIA) and 2) steam pressure in pounds per square inch gauge (PSIG). The training data for each of the sensor types is presented in Figure 1.
Figure 1: Training data for (a) turbine pressure and (b) steam pressure
The data presented in Figure 1 were selected from data collected every two minutes over a two-month period. Overall, the training and test data span approximately 2 weeks of data, observing every 5th sample or every 10 seconds.
927 The training data was chosen to be 1,600 observations from steady state plant operation. The test data were chosen to be a successive set of 400 observations sampled from steady state plant operation. The training data were used to develop the empirical models and the test data were used to evaluate the performance of the empirical models. For completeness, the AAKR model was developed with 800 memory vectors and a bandwidth of 0.5. The resulting accuracy, robustness, and spillover performance metrics are listed in Table 1 and presented in Figure 2.
Table 1: Accuracy, robustness, and spillover performance for robust distance functions

                                    Turbine Pressure      Steam Pressure              Average
Metric             Distance         #1        #2          #1      #2      #3
Accuracy (×100)    Euclidean        0.23      0.60        0.44    0.21    0.29        0.35
                   L1-norm          0.08      0.20        0.28    0.07    0.02        0.17
                   Robust Euclidean 0.59      2.80        0.89    0.42    0.36        1.10
Robustness         Euclidean        0.56      0.63        0.29    0.33    0.37        0.44
                   L1-norm          0.64      0.73        0.21    0.25    0.24        0.41
                   Robust Euclidean 0.20      0.23        0.23    0.18    0.13        0.19
Spillover          Euclidean        0.11      0.11        0.18    0.18    0.16        0.15
                   L1-norm          0.11      0.12        0.12    0.15    0.12        0.13
                   Robust Euclidean 0.09      0.12        0.06    0.08    0.09        0.09
The plots show a decrease in the robustness and spillover metrics for the robust distance functions. In other words, the models that use the robust distance functions are less affected by faulty input and are considered to be more robust. This increased robustness is not without consequence, though, as all of the variable accuracy metrics (MSE) for the robust Euclidean distance function are larger than those of the model with the L2-norm. Even though there may be an increase in the accuracy metric (the predictive error of the model) relative to the normal L2-norm, the decreases in the robustness and spillover metrics obtained with the L1-norm and robust Euclidean distance more than validate their effectiveness in detecting sensor drift.
Figure 2: Illustration of the accuracy, robustness, and spillover performance for robust distance functions
5. Conclusions
This paper has introduced two robust distance measures for use in nonparametric, similarity based models, specifically the L1-norm and the new robust Euclidean distance function. In this paper, representative autoassociative kernel regression (AAKR) models were developed and tested with data from an operating nuclear power plant using the standard Euclidean (L2-norm), L1-norm, and robust Euclidean distance functions. It was shown that the alternative robust distance functions have performance advantages for the common task of sensor drift detection. In particular, it was shown that the L1-norm produces small accuracy and robustness improvements, while the robust Euclidean distance function produces significant robustness improvements at the expense of accuracy.
6. Acknowledgments
We would like to acknowledge the U.S. Nuclear Regulatory Commission funding through a Cooperative Agreement between the NRC and The Ohio State University on "Research on Instrumentation and Control Reliability, Thermal-Hydraulics, and Waste-Management": Grant/Agreement No. NRC-RES-04-076. The information and conclusions presented herein are those of the authors and do not necessarily represent the views or positions of the NRC. Neither the U.S.
Government nor any agency thereof, nor any employee, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party's use of this information.
References 1. NRC Project no. 669 (2000), "Safety evaluation by the office of nuclear reactor regulation: Application of on-line performance monitoring to extend calibration intervals of instrument channel calibrations required by the technical specifications - EPRI Topical Report (TR) 104965 OnLine Monitoring of Instrument Channel Performance" U.S. Nuclear Regulatory Commission: Washington, D.C., July, 2000. 2. EPRI TR-104965 (2000), "On-Line Monitoring of Instrument Channel Performance," EPRI, Palo Alto, CA: September 2000. 3. Fan, J. and I. Gijbels (1996), Local Polynomial Modeling and Its Applications, Chapman & Hall/CRC, New York, NY: 1996. 4. Singer, R.M., K.C. Gross, J.P. Herzog, R.W. King, and S.W. Wegerich (1996), "Model-Based Nuclear Power Plant Monitoring and Fault Detection: Theoretical Foundations," Proc. 9th Intl. Conf. on Intelligent Systems Applications to Power Systems, Seoul, Korea, 1996. 5. Nadaraya E.A.(1964), "On estimating regression", Theory of Probability and its Applications, vol. 10, pp. 186-190, 1964. 6. Watson, G. S. (1964), "Smooth Regression Analysis", The Indian Journal of Statistics, Series A, Vol. 26, pp. 359-372: 1964. 7. Wand, M.P., and M.C. Jones (1995), Kernel Smoothing, Monographs on Statistics and Applied Probability, Chapman & Hall, London: 1995. 8. Hines, J. Wesley and Alexander Usynin (2004), "On-Line Monitoring Robustness Measures and Comparisons", International Atomic Energy Agency Technical Meeting on "Increasing instrument calibration interval through on-line calibration technology", OECD Halden Reactor Project, Halden, Norway, 27th-29th September 2004.
MULTIPLE OBJECTIVE EVOLUTIONARY OPTIMISATION FOR ROBUST DESIGN
DANIEL E. SALAZAR A.
Division de Computacion Evolutiva (CEANI), Instituto de Sistemas Inteligentes y Aplicaciones Numericas en Ingenieria (IUSIANI), Universidad de Las Palmas de Gran Canaria, Canary Islands, [email protected]
CLAUDIO M. ROCCO S.
Universidad Central de Venezuela, Facultad de Ingenieria, Caracas, Venezuela, [email protected]
ENRICO ZIO
Dipartimento di Ingegneria Nucleare, Politecnico di Milano, Milano, Italy, [email protected]
This paper proposes the use of Multiple Objective Evolutionary Algorithms (MOEA) for robust system design. A numerical example relative to the evaluation of the reliability of a Residual Heat Removal system of a Nuclear Power Plant illustrates the approach.
1. Introduction
The robustness of a design is defined as the maximum deviation from its specifications that can be tolerated with the product still meeting all the requirements [1]. The final goal is to limit the design parameters' uncertainty so as to maintain the required system performance within specified bounds. As an example, consider the reliability of a system Rs = f(R1, R2, ..., Rn), where Ri is the reliability of component i: the designer is interested in knowing the maximum allowed reliability deviation for each component consistent with a predefined system reliability requirement, e.g., 0.990 < Rs < 0.999. Given the unknown feasible zone F specifying the required performance bounds of a generic design (dotted line in Figs. 1a and 1b for a case of a design depending on two variables Ra and Rb), the objective of robust design can be achieved by identifying an approximate description by using simply shaped sets like boxes contained in the allowed domain [2]. More specifically, the objective of optimal robust design can be satisfied by identifying the Maximum Volume
Inner Box (MIB) whose points all satisfy the system performance requirements, up to the boundary (solid line in Figs. 1a and 1b).
a) Maximum volume Inner Box (fixed center)
b) Maximum volume Inner Box (variable center)
Fig. 1. Feasible Solution Set (dotted line) and approximate description MIB (solid line) for the case of two-dimensional design space [3]
In [3] the authors propose an indirect approach to identify the MIB by formulating the problem as a single objective optimisation of the MIB volume. The approach uses an Evolutionary Strategy to optimise the hypervolume whereas Interval Arithmetic (IA) is used to guarantee the feasibility of the design. Indeed, IA allows obtaining the exact range of a function with variables defined in a range with only one "interval" evaluation. In practice, there are situations, in which the decision-maker (DM) needs to optimise two or more objectives, e.g. the reliability and the cost of a given system design. Adopting a single objective (SO) formulation in which the multiple objectives are combined into a weighted sum, the DM must a priori select the arbitrary weights in reflection of his/her preferences on the different objectives, or repetitively solve the optimisation problem with different weight values to obtain a group of alternatives from which to choose a posteriori the final solution. On the contrary, a multiple objective (MO) approach allows determining directly the Pareto set of alternatives from which the DM can choose a posteriori the preferred one. In this paper, the latter MO approach is embedded within an evolutionary algorithm (EA) scheme aimed at determining the MIB in the design variables space. Two different optimization problems are considered, depending on whether or not the centre of the approximating box is a priori specified. The remainder of the article is organised as follows: Section 2 introduces the multiple objective formulation for robust design as well as a short description of the heuristic tool here employed and the new proposed approach to solve the "centre unspecified" case. Section 3 brings the results for the different types of robust design relative to a reliability design of a Residual Heat Removal system of a Nuclear Plant. Finally, Section 4 presents the conclusions.
2. Robust Design Multiple Objective Formulation
A MO optimisation problem (MOP) consists of optimising a vector F(x) of objective functions f_i(x), i = 1, 2, ..., k, possibly under specified equality (h(x)) and inequality (g(x)) constraints:

Opt [ F(x) = (f_1(x), f_2(x), ..., f_k(x)) ]
s.t.: g_j(x) ≤ 0, j = 1, 2, ..., q;  h_j(x) = 0, j = 1, 2, ..., r  (q + r = m)

where x = (x_1, x_2, ..., x_n) ∈ X is the vector of decision variables, and X is the feasible domain. In the case of robust design, the decision variables are the design variables and the optimization problem consists in the determination of the MIB. Let B be a box of feasible solutions x defined as B: {x, c ∈ R^n | x_i ∈ [x_i^l, x_i^u], c_i = (x_i^l + x_i^u)/2}, where c is the centre. Two cases are considered [3]:
1) Centre specified. The idea is to identify a symmetrical MIB using a known point c as symmetry centre.
2) Centre unspecified. In this case, the centre of coordinates c is unknown and it is considered as an additional decision variable to be determined.
In both cases, the goal is to produce a symmetrical MIB around c. The robust design cases 1) and 2) can be formulated as MOP by including the original design objective function in the box search and transforming the constraints into objectives. For example, transforming the maximal cost Csmax (i.e. the maximal cost associated to solutions belonging to the inner box) of a design into an objective yields:
1. Centre Specified:
max Π_{i=1}^{n} | x_i^B − c_i |
min Csmax        (1)

2. Centre Unspecified:

max Π_{i=1}^{n} | x_i^B − c_i |
min Csmax        (2)
The first objective function measures the MIB hyper-volume, in which x^B represents a vertex of the optimal MIB. From it, the range of each variable is easily determined. Note that in the second case the centre c is also a design variable to be determined within the intersection of the feasible domain F and the box B. The MOP approach is general and can be used for any type and number of objectives, depending on the problem under study and the DM criteria.
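A minimal sketch of how the two objectives of formulations (1)-(2) can be evaluated for a candidate symmetric box is given below; the cost routine is a placeholder (e.g. an interval-arithmetic bound, as in [3]) and the function names are ours.

import numpy as np

def mib_objectives(c, x_low, cost_max):
    """Evaluate a candidate box with centre c and lower vertex x_low (the upper
    vertex follows from symmetry as 2c - x_low). Returns the product of
    half-widths, Π |x_i^B - c_i| (to be maximized), and the maximal cost over
    the box (to be minimized), computed by the user-supplied routine cost_max."""
    c, x_low = np.asarray(c, dtype=float), np.asarray(x_low, dtype=float)
    size = np.prod(np.abs(x_low - c))              # hyper-volume measure of Eqs. (1)-(2)
    return size, cost_max(x_low, 2.0 * c - x_low)  # cost over [x^l, x^u]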
In this research, the solution to the robust design MOP is tackled directly using Multiple Objective Evolutionary Algorithms (MOEA). This family of Evolutionary Algorithms is designed to deal with MOP as well as handling constraints. The approach does not guarantee the determination of the exact Pareto frontier, nor does any other heuristic approach for that matter. Nevertheless, a number of comparisons performed in Evolutionary Multicriteria Optimisation on benchmark problems have shown that results obtained using different instances of MOEA are very close to the exact solution (e.g. [4]).
2.1. Multiple Objective Evolutionary Algorithms
In the Evolutionary Multicriteria Optimisation field, the term Multiple Objective Evolutionary Algorithms (MOEA) refers to a group of evolutionary algorithms tailored to deal with MOP. This group of algorithms conjugates the basic concepts of dominance with the general characteristics of evolutionary algorithms. Therefore, MOEA are able to deal with non-continuous, non-convex and/or non-linear spaces, as well as problems whose objective functions are not explicitly known (e.g. the output of Monte Carlo simulation codes). In this work, the Non-dominated Sorting Genetic Algorithm (NSGA-II) is used [5]. It is a very efficient MOEA, which incorporates an elitist archive and a rule for adaptation of the population chromosomes that takes into account both the rank and the distance of each solution with respect to its neighbours in the population. NSGA-II is implemented following the pseudo-code presented in Alg. 1 below. Such implementation allows integer, real and mixed chromosomes. For the particular problem studied here, only real variables were used. The recombination mechanism is one-point crossover. For real variables, the crossover is performed as a linear combination whereas the mutation operation is performed as Gaussian mutation of type Rnew = Rold + N(0, σ²) (for more details see [6]). In the most general case of unspecified centre, the decision variables of the n-dimensional problem are the centroid c_i and the lower vertex x_i^l, where x_i^l ∈ [c_i^min, c_i^max], 0 ≤ c_i^min, c_i^max ≤ 1, i = 1, 2, ..., n. For each c_i, the following constraints stand: the lower vertex must verify c_i^min ≤ x_i^l ≤ c_i, whereas the upper vertex, which can be calculated as x_i^u = 2c_i − x_i^l due to the symmetry condition imposed, is restricted to c_i ≤ x_i^u ≤ c_i^max. A simple MOEA approach can be used directly to analyse the centre-specified case. On the contrary, the centre-unspecified case requires more attention. As a matter of fact, there exists a dependency between the bounds of
x_i^l and the value of c_i. At least two strategies can be employed to solve the problem with MOEA. The first one consists in a double-loop configuration, where the external loop controls the value of c_i and the nested loop finds the best vertex for each prefixed c_i. The process is repeated for different values of c_i, until a representative number of different centroids have been visited.

ALGORITHM 1. Pseudo-code for NSGA-II
Input:
• M (population size)
• N (archive size)
• tmax (max. number of generations)
Begin:
• Randomly initialize the non-dominated solutions archive PA^0, and set the population P^0 = ∅, t = 0.
• While t < tmax:
  - P^t = P^t + PA^t
  - Assign adaptation to P^t
  - PA^(t+1) = {N best individuals from P^t}
  - MP (mating pool) = {M individuals randomly selected from PA^(t+1) using a binary tournament}
  - P^(t+1) = {M new individuals generated by applying recombination operators on MP}
  - t = t + 1
• End loop
Output:
• Non-dominated solutions from PA^t
The other alternative is to transform the search space by means of a percentage representation that codifies each individual as a group of n pairs {c_i, %X_i^max}, where %X_i^max represents a percentage of the maximal distance between c_i and its limits [6,7]. Therefore, the mathematical relation used to determine the value of x_i is:
%X_i^max = | c_i − x_i^l | / min{ | c_i − c_i^min |, | c_i − c_i^max | }        (3)
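As an illustration of how this representation is decoded back into box vertices, a small sketch is given below (NumPy assumed; the bounds 0 and 1 and the function name are illustrative choices for a reliability-type variable).

import numpy as np

def decode_percentage(c, pct, c_min=0.0, c_max=1.0):
    """Decode the percentage representation of Eq. (3): each pair (c_i, %X_i^max)
    is mapped to a symmetric interval [x_i^l, x_i^u] around c_i whose half-width
    is the given percentage of the distance from c_i to its nearest bound,
    so the decoded box is feasible by construction."""
    c = np.asarray(c, dtype=float)
    half = np.asarray(pct, dtype=float) * np.minimum(c - c_min, c_max - c)
    return c - half, c + half          # lower and upper vertices (x^l, x^u)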
Since now the vertex and centroids relationship is relative, only feasible individuals can be produced when the recombination operators are applied [6]. 3. Computational Example The Residual Heat Removal system (RHR) is a low pressure system (400 psi) directly connected to the primary system which is at higher pressure (1200 psi). The RHR constitutes an essential part of the low-pressure core flooding system
which is part of the Emergency Core Cooling System (ECCS) of a nuclear reactor [8]. In Figure 2, a schematic of the system is shown. The reliability of the system, Rs, is modeled by 16 third-order cut sets originated by the combination of 8 basic events. The problem is to obtain the maximum allowed ranges of variation of the reliability of each basic component R_i, i = 1, 2, ..., 8, such that the system reliability remains bounded as 0.99 < Rs < 1. The component reliability values are subject to the constraint 0.80 < R_i < 1 and the centre of each variability interval is constrained to c_i = 0.90, i = 1, 2, ..., 8. Normally, the design problem seeks to constrain a cost function, e.g. Cs = Σ_{i=1}^{8} K_i · R_i^{α_i}. In this example, K = {100, 100, 100, 150, 100, 100, 100, 150} and α_i = 0.6 ∀i. The MO formulation is obtained considering the cost as an objective function.
Figure 2: Schematic of the RHR system for a BWR [12]
The centre-unspecified case is first illustrated with reference to the double loop approach [7,6]. Six sets of solutions were generated with different predefined centroids (Fig. 3), after a time-consuming search requiring 12550 evaluations of the objective functions for each centroid value. For each set it is assumed that all centroids are equal. Figure 4 shows the results obtained using the "percentage representation" (PR), where each c_i is not fixed in advance. In this figure we also included the non-dominated front for the set of solutions obtained earlier. Note that the Pareto front is enlarged using the PR, thus providing a bigger MIB. The obtained results show that the percentage representation formulation reduces the computational burden and facilitates the solution of the centre-unspecified case in a very efficient way, which leads in our example to the possibility of finding more robust and less expensive design alternatives. Note that better results are due not only to the evolutionary approach adopted, but as
previously mentioned, to the fact that the PR formulation transforms the search space, allowing a better exploration of it. The DM can now select any solution of the Pareto-optimal frontier, analyse its characteristics (position of the centroids, vertexes, and associated costs) and choose the preferred one.
Fig. 3. Trade-off between MIB and Csmax for selected centroids (c_i = 0.88, 0.89, 0.90, 0.91, 0.92, 0.93)
Fig. 4. Non-dominated front using the percentage representation
4. Conclusions This paper analyses a MO formulation to obtain robust system designs. The MO formulation extends the possibilities of the robust design approach providing the
DM with a wider horizon of non-dominated alternatives. The Multiple Objective Evolutionary Algorithm approach employed to solve the MOP formulation provides an excellent way to approximate the Pareto frontier, for both the centre-specified and the centre-unspecified cases. The approach based on the percentage representation remarkably improves the efficiency of the MO centre-unspecified solution technique.
References 1. Hendrix EMT, Mecking CJ, Hendriks ThHB (1996) Finding Robust Solutions for Product Design Problems. EJOR 92: 28-36 2. Milanese M, Norton J, J. Piet-Lahanier J (Eds.) (1998) Bounding Approaches to System Identification. Plenum Press, New York, USA 3. Rocco C, Moreno JA, Carrasquero N (2003) Robust Design using a HybridCellular-Evolutionary and Interval-Arithmetic Approach: A Reliability Application. Reliab Engnng Sys Safety 79(2): 149-159 4. Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans Evol Comput 3(4): 257-271 5. Deb K, Pratap A, Agarwal S, Meyarivan T (2001) A Fast and Elitist MultiObjective Genetic Algorithm: NSGA-II. KanGAL Report No. 200001. Kanpur Genetic Algorithms Laboratory (KanGAL). IIT India 6. Salazar D, Rocco C (IN PRESS) Solving Advanced Multi-Objective Robust Designs by means of Multiple Objective Evolutionary Algorithms (MOEA): A Reliability Application. Reliab Engnng Sys Safety 7. Martorell S, Carlos S, Villanueva JF, Sanchez AI, Galvan B, Salazar D, Cepin M (IN PRESS): Special Issue: Use of Multiple Objective Evolutionary Algorithms in Optimizing Surveillance Requirements. Reliab Engnng Sys Safety. 8. Marseguerra M, Zio E, Bosi F (2002) Direct Monte Carlo Availability Assessment Of A Nuclear Safety System With Time-Dependent Failure Characteristics, Proceedings of MMR 2002, Third International Conference on Mathematical Methods in Reliability, Trondheim, Norway.
FEATURE SELECTION FOR TRANSIENTS CLASSIFICATION BY A NICHED PARETO GENETIC ALGORITHM
E. ZIO, P. BARALDI AND N. PEDRONI
Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, Milan, 20133, Italy
Multi-objective genetic algorithms can be effective means for choosing the process features relevant for transient diagnosis. The technique allows identifying a family of equivalently optimal feature subsets, in the Pareto sense. However, difficulties in the convergence of the standard Pareto-based multi-objective genetic algorithm search in large feature spaces may arise in terms of representativeness of the identified Pareto front whose elements may turn out to be unevenly distributed in the objective functions space. To overcome this problem, a modified Niched Pareto Genetic Algorithm is embraced in this work. The performance of the feature subsets examined during the search is evaluated in terms of two optimization objectives: the classification accuracy of a Fuzzy K-Nearest Neighbors classifier and the number of features in the subsets. During the genetic search, the algorithm applies a controlled "niching pressure" to spread out the population in the search space so that convergence is shared on different niches of the Pareto front. The method is tested on a diagnostic problem regarding the classification of simulated transients in the feedwater system of a Boiling Water Reactor.
1. Introduction
In this paper, the search for an optimal feature subset upon which to perform diagnostics of nuclear transients is carried out via a multi-objective genetic algorithm (MOGA) within a wrapper approach [1]. The objective functions used for evaluating and comparing the feature subsets during the search are the recognition rate achieved by a Fuzzy K-Nearest Neighbors classifier [2] and the number of features forming the subsets. Correspondingly, the goal of the MOGA search is to converge on a family of feature subsets representative of the true nondominated solutions which form the Pareto front in the two-dimensional objective functions space [3, 4]. In this respect, a standard Pareto-based MOGA [4] may encounter difficulties in maintaining genetic diversity during a search in a high-dimensional feature space [3], so that the solutions found at convergence may not evenly represent the Pareto front [5]. To overcome this problem, in the present paper a modified Niched Pareto MOGA [5] is adopted to exploit its capability of evolving the population
towards alternative, equivalent solutions of feature subsets which give a well distributed, representative description of the Pareto front of nondominated solutions. This is achieved by applying a "niching pressure" in the parents selection step of the algorithm, such that those individuals with less crowded neighborhoods are preferentially selected as parents, and thus allowed to create more offspring in the following generations: this results in a population more evenly distributed in the objective functions space [5]. The proposed search scheme is compared with a standard Pareto-based MOGA in a task of classification of simulated transients in the feedwater system of a Boiling Water Reactor [6]. Haar wavelet decomposition [7] is applied to the transient signals for capturing their dynamic behaviour: this actually triples the number of features to be initially considered, significantly increasing the complexity of the problem. The paper is organized as follows. In Section 2, the way GA search can be applied to the feature selection task for classification is illustrated. In Section 3, the modified niched Pareto-based MOGA is introduced. In Section 4, the nuclear case study is presented. Finally, some conclusions are drawn in the last Section.
2. Genetic algorithms for feature selection
Given n features, the problem of selecting a subset of m relevant ones can be formulated as an optimization problem. In this view, given a set A of n-dimensional input patterns, a GA can be devised to find an optimal binary transformation vector Vf, of dimension n, which maximizes/minimizes a set of optimization criteria, i.e. the objective functions. Let m be the number of 1's in Vf and n - m the number of 0's. Then, a modified set of patterns B = Vf(A) is obtained in an m-dimensional space (m < n). Figure 1 shows the structure of a multi-objective GA feature selector that uses the final classification accuracy (to be maximized) and the dimension m of the transformed patterns (to be minimized) as optimization criteria. The GA creates a population of competing transformation vectors V_l, l = 1, 2, ..., which are evaluated as follows [8]:
i. The vector V_l is applied to each pattern of set A, giving a modified pattern which is then sent in input to the classifier.
ii. The set B of modified patterns thereby obtained is divided into a training set, used to train the classifier, and a testing set, used to evaluate the classification accuracy on new patterns.
iii. The classification accuracy obtained and the number of selected features, m, are used by the GA as a measure of the goodness of the transformation vector V_l used to obtain the set of transformed patterns.
iv. On the basis of this feedback, the GA conducts its search for a vector or a set of vectors which give rise to the best compromise between classification accuracy and parsimony in the selection of features.
The organization of the chromosome is quite straightforward [8]: each bit of the chromosome is associated with a parameter and interpreted such that if the i-th bit equals 1, then the i-th parameter is included as a feature in the pattern for classification, and excluded if the bit is 0. Concerning the fitness function of classification accuracy, each subset of features encoded in a chromosome is evaluated on a set of testing data using a fast-running nearest neighbor classifier. More specifically, in the applications which follow, the total number of pre-labelled patterns available is randomly subdivided into training and test sets consisting of 75% and 25% of the data, respectively. The Fuzzy K-Nearest Neighbor algorithm (FKNN) [3], with K = 5, has been applied to classify the test data on the basis of the location of the labelled training data. The random subdivision of the available patterns into training and test sets is repeated 10 times (10 cross-validations): the mean recognition rate, i.e. the average fraction of correct classifications over the 10 cross-validation tests, is calculated and sent back to the GA as the fitness value of classification accuracy of the transformation chromosome used to produce the transformed set of patterns B.
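A minimal sketch of this wrapper-style fitness evaluation is given below. It assumes scikit-learn and uses a crisp K-nearest-neighbour classifier in place of the Fuzzy KNN used in the paper; the 75%/25% splits, K = 5 and 10 repetitions follow the text, while the function and variable names are ours.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def chromosome_fitness(bits, X, y, n_splits=10, k=5):
    """Evaluate a binary transformation vector: keep the features whose bit is 1,
    then estimate the mean recognition rate over repeated random 75%/25% splits.
    Returns (mean recognition rate, number of selected features)."""
    mask = np.asarray(bits, dtype=bool)
    if not mask.any():
        return 0.0, 0                              # empty subset: no diagnostic power
    rates = []
    for seed in range(n_splits):
        Xtr, Xte, ytr, yte = train_test_split(X[:, mask], y, test_size=0.25,
                                              random_state=seed)
        rates.append(KNeighborsClassifier(n_neighbors=k).fit(Xtr, ytr).score(Xte, yte))
    return float(np.mean(rates)), int(mask.sum())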
Figure 1. GA-based feature selection using classification accuracy and number of selected features as optimization criteria. Each binary chromosome from the GA population is used to transform the original patterns x, which are then passed to a classifier. The objective function values of the chromosome are the classification accuracy attained on the transformed patterns and their dimension m.
3. The Niched Pareto Genetic Algorithm (NPGA) with random sampling tournament selection
The Niched Pareto-based random sampling tournament selection procedure is adopted for selecting from the population the individuals that are going to reproduce [5]. The procedure is based on the random sampling of two groups of individuals from the entire population. The first one is named dominance tournament group and contains n_t chromosomes which are the candidates for selection as parents, whereas the second one, named dominance tournament sampling group and made of n_s chromosomes, is used for comparison of the individuals of the first group with respect to dominance. Each of the n_t individuals in the dominance tournament group is tested for domination against all the n_s individuals in the dominance tournament sampling set. Three different situations may occur:
i. only one of the individuals in the dominance tournament group is non-dominated by all the individuals in the dominance tournament sampling set. In this case, the non-dominated individual is selected for reproduction;
ii. all individuals in the dominance tournament group are dominated by individuals in the dominance tournament sampling set;
iii. at least two of the individuals in the dominance tournament group are non-dominated.
In cases ii) and iii) the individual which best seems to maintain diversity is selected for reproduction by using the equivalence class sharing method [8]. This method is based on the selection of the individual with the smallest niche count (see definition below) among all the individuals of the tournament group in case ii) and all the non-dominated individuals in case iii). The niche count m_i of the i-th individual in the tournament group, i = 1, 2, ..., n_t, is an estimate of how crowded its neighbourhood (niche) is. It is calculated over all the n_p individuals in the current population:
m_i = Σ_{j=1}^{n_p} s(d_ij)        (1)
where d_ij is the distance, either in the genotype or phenotype spaces (in the latter case, with respect to either the decision variables or the fitness functions), between the i-th candidate for selection and the j-th individual in the population, and s(d_ij) is the sharing function. This is a function decreasing with d_ij and such that s(0) = 1 and s(d_ij) = 0 for d_ij > σ_s, where σ_s is the niche radius, i.e. the
942 distance threshold below which two individuals are considered similar enough to affect the niche count. Typically, a triangular sharing function is used such 1foats(d,j)= \-d,jlcTs for dlj
using a large population size (np = 200) and a high probability of mutation (pm = 0.008). In a single run, the Pareto-based MOGA identifies a range of nondominated solutions with different classification performance (FKNN mean recognition rate)/complexity (number of features) trade-offs (Figure 2). The feature selector shows difficulties in exploring the solution space in the region with number of features m < 10: individuals with m = 4, 5, 6, 7 present unsatisfactory recognition rates and individuals with m = 0, 1, 2, 3 are not even found. The Niched Pareto-based approach of Section 3 is then investigated to improve the uniformity of coverage of the Pareto front by the optimal feature subsets at convergence. The following set of parameters has turned out by crude search to give the best results in terms of both classification accuracies and coverage and distribution of the individuals on the Pareto front: {n_t = 4, n_s = 20, σ_s = 0.1}. The population size (np) and the probability of mutation (pm) have been set equal to the values of the previous case. Figure 2 shows the comparison between the Pareto sets obtained by the Niched Pareto and the standard Pareto-based MOGA. The niching "pressure" applied by the equivalence class sharing method succeeds in spreading the population out along the Pareto optimal front: indeed, the NPGA Pareto solutions cover from m = 0 to m = 22 with only individuals with m = 8, 9, 15 not present; on the contrary, the standard Pareto-based MOGA front extends from m = 4 to m = 25, with elements with m = 15, 18, 19, 22-24 missing. Moreover, all the elements of the NPGA set have larger recognition rates than those of the set found by the standard Pareto-based MOGA, with particularly significant differences for m = 4-7. These results prove that the Pareto domination tournament and equivalence class sharing are efficient in preserving good individuals and maintaining genetic diversity in the population throughout.
Figure 2. Comparison between the Pareto optimal fronts obtained by the Niched Pareto (o) and the standard Pareto-based MOGA (*)
5. Conclusions A Niched Pareto-optimal tournament selection genetic algorithm search has been embraced for the selection of the features relevant to a nuclear transient classification task. Two objectives, i.e. the maximization of the classification accuracy in terms of mean recognition rate and the minimization of the number of features forming the subsets, have been used to drive the algorithm towards the identification of a representative and evenly distributed set of alternative, equivalent feature sets offering different trade-offs in terms of diagnostic power and complexity. The Niched Pareto GA has been compared to a standard MOGA, with respect to its ability of maintaining genetic diversity by means of niches during the search. The results obtained on the case study of selecting features relevant for diagnosing transients in the feedwater system of a Boiling Water Reactor, prove that the Niched Pareto approach is more effective than the standard Pareto-based MOGA in the selection of features in a high-dimensional space. The number of features contained at convergence in the NPGA Pareto solutions has turned out to range from m = 0 to m = 22 and the corresponding recognition rates from 0.0667 to 0.9549, whereas for the standard Pareto-based MOGA m goes from 4 to 25 with recognition rates from 0.7506 to 0.9329. Thus, the NPGA is superior in producing a diverse set of solutions with differing performance versus complexity trade-off characteristics.
Acknowledgements
The authors wish to thank Drs. Paolo Fantoni and Davide Roverso of the IFE, Halden Reactor Project, for providing the transient simulation data.
References
1. R. Kohavi, G. John, Artificial Intelligence 97, 273 (1997).
2. J. M. Keller, M. R. Gray, J. A. Givens, IEEE Trans. Syst., Man, Cybern. SMC-15, 4, 580 (1985).
3. C. Emmanouilidis, A. Hunter, J. MacIntyre, C. Cox, Proc. of ICANN'99, 9th International Conference on Artificial Neural Networks, 2 (1999).
4. D.E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley Publ. Co., 1989.
5. J. Horn, N. Nafpliotis, and D.E. Goldberg, Proc. of the IEEE Conference on Evolutionary Computation, ICEC '94, 1, 82 (1994).
6. E. Puska, S. Normann, Enlarged Halden programme group meeting, 2, 2002.
7. Strang, G. & Nguyen, T., Wavelets and Filter Banks (1996).
8. M. L. Raymer, W. F. Punch, E. D. Goodman, L. A. Khun, A. K. Jain, IEEE Transactions on Evolutionary Computation 4, No. 2 (2000).
9. D. Roverso, Proceedings of System Diagnosis and Prognosis: Security and Condition Monitoring Issues III, AeroSense2003, Aerospace and Defense Sensing and Control Technologies Symposium (2003).
10. E. Zio, P. Baraldi, D. Roverso, Annals of Nuclear Energy 32, 1649 (2005).
OPTIMIZED DIRECT FUZZY MODEL REFERENCE ADAPTIVE CONTROL APPLIED TO NUCLEAR REACTOR DYNAMICS FRANCESCO CADINI AND ENRICO ZIO Department of Nuclear Engineering, Polytechnic of Milan Via Ponzio 34/3 Milano, 20133, Italy In this work, the stable Direct Fuzzy Model Reference Adaptive Control is integrated with a Genetic Algorithm search of the optimal values of the critical controller parameters. The optimized controller is applied to a model of nuclear reactor dynamics.
1. Introduction The complexity and nonlinearity of nuclear systems render difficult the application of advanced control approaches relying on analytical representations of the underlying plant dynamics. For this reason, control methods are being studied which do not require an analytical model of the system dynamics [1] [2]. Nevertheless, the range of applicability of such controllers has thus far remained limited to simple cases, mainly because they demand advanced mathematics which is not easily accepted in practical applications [3], In order to reduce the gap between theory and practice, computational intelligence techniques, such as fuzzy logic, genetic algorithms and neural networks, are being introduced for establishing efficient, empirical input/output mappings of the system dynamics for use in the optimal control of complex systems [3] [4] [5] [6] [7]. In this paper, we address the problem of effectively controlling a plant resorting to the Direct Fuzzy Model Reference Adaptive Control (DFMRAC) method [3], which exploits a Takagi-Sugeno fuzzy description of the plant dynamics to extend to nonlinear systems the applicability of classic Model Reference Adaptive Control (MRAC) [8] [9]. An issue to be resolved in practice is the determination of the values of the parameters governing the fuzzy controller, rendered particularly difficult by the fact that the control approach does not rely on an explicit representation of the plant dynamics. 946
To tackle this problem, a Genetic Algorithm (GA) search procedure [10] [11] is here adopted for determining the 'best' parameter values that minimize a properly defined objective function which measures the goodness of the control. The paper is organized as follows. In Sec. 2, the structure of the DFMRAC controller is briefly introduced. In Sec. 3, the application of the proposed approach is offered with respect to the control of a nuclear reactor whose dynamics is described by a simplified, nonlinear model of literature [12]. Finally, some conclusions are drawn in Sec. 4.
2. The DFMRAC Controller for dynamic trajectory tracking
The tracking control of a system (called 'plant', in control terminology) amounts to determining a closed-loop function ('control action') for driving the plant output as close as possible to a desired reference trajectory, with guaranteed stability. Often, in practice, the system to be controlled is too complex and its underlying processes are poorly understood, so that a reliable mathematical model of its dynamics cannot be formulated. Recently, fuzzy control has gained popularity as a model-free approach which often outperforms other conventional approaches, such as PID controllers [4] [6]. A fuzzy controller relies on a Knowledge Base (KB) made up of fuzzy rules which capture the experience of human operators and/or are constructed on the basis of measured data representative of the plant controlled operation. The Direct Fuzzy Model Reference Adaptive Control method here adopted belongs to the direct adaptive control family, based on a fuzzy logic representation of the plant dynamics and on the Lyapunov redesigned scheme for guaranteed stability [8] [9]. In this method, the antecedents of the inference logic rules are taken from the fuzzy partition of the ranges of the relevant input variables measured by the plant sensors whereas the consequents are the crisp values of two parameters (gains) in the control laws. In the following, we provide a brief summary of the concepts underlying the stable DFMRAC control method introduced in [3].
2.1. Overview of the DFMRAC controller
The general control scheme is shown in Figure 1. The aim of the tracking control strategy is to drive the evolution of the output y_p of the Plant, whose dynamic model is unknown, as close as possible to the output y_m of the known, properly chosen Reference Model. The Reference Model is fed with the actual, desired reference signal y_ref and its output y_m represents a more realistically trackable Plant trajectory. The unknown dynamics of the Plant is represented by means of
a Takagi-Sugeno (T-S) fuzzy logic system [13]. Adaptive Laws are introduced to compute and update the control gains f_i's and q_i's (with i = 1, ..., k and k = number of fuzzy rules) based on the tracking error e, the Plant output, the reference signal and the control action u at the previous time step. In the Controller, the control action is computed as a linear combination of the reference signal and the Plant output, weighted by the control gains f_i's and q_i's, multiplied by the strengths β_i's of the rules activated in the fuzzy logic system, as it will be shown in the next Section.
Figure 1: General DFMRAC control scheme (blocks: Reference Model, GA (off-line), Adaptive Laws, Fuzzy System, Controller, Plant)
2.2. The DFMRAC algorithm
In what follows, a brief description of the different blocks of Figure 1 is given. The interested reader may refer to [3] for further details.
2.2.1. The Plant and the Fuzzy Logic System
In the DFMRAC framework, it is assumed that the relevant Plant dynamics can be described as a nonlinear first order model, whereas the higher order components are not modeled, but do not cause instability thanks to the introduction of appropriate, robust adaptive laws (Sec. 2.2.4 below). A Takagi-Sugeno (T-S) Fuzzy Logic System is employed for modeling the non-linear Plant behavior. If a first order representation of the Plant is assumed and the nonlinearity depends on a vector φ of n measurable quantities φ = (z_1, ..., z_n), then the process can be described by k 'if-then' rules (fuzzy rules) of the kind

    if z_1 is A_{i_a} and z_2 is B_{i_b} and ... and z_n is N_{i_n}, then  ẏ_p = -a_i y_p + b_i u

    i_a = 1, ..., m_a;  i_b = 1, ..., m_b;  ... ;  i_n = 1, ..., m_n;  i = 1, ..., k = m_a × m_b × ... × m_n
where A_{i_a}, B_{i_b}, ..., N_{i_n} are the fuzzy sets which describe the antecedents z_1, ..., z_n in the i-th rule, among those which partition the corresponding Universes of Discourse (UODs), and a_i and b_i are unknown Plant parameters. For a given input vector φ, the output of the T-S system is modeled as

    ẏ_p = [ Σ_{i=1}^{k} β^i(φ) (-a_i y_p + b_i u) ] / [ Σ_{i=1}^{k} β^i(φ) ]                (1)

where β^i(φ) is the strength of the i-th rule in correspondence of φ and it is obtained resorting to a T-norm which, in this case, is the simple algebraic product of the membership functions. Upon normalization of the strengths of the rules and inclusion of the parasitic dynamics (i.e., linear higher-order perturbations of the Plant), assumed to be stable, and external disturbances (i.e., exogenous signals interfering with the Plant functioning), the above equation reads
    ẏ_p = -(β^T a) y_p + (β^T b) u - Δ_y(p) y_p + Δ_u(p) u + d                (3)
where a and b are the vectors of unknown parameters appearing in the fuzzy rule implication (for a list of assumptions on the plant parameters refer to [3]), β is the vector collecting the normalized strengths of the rules defined above and it is used to determine the fuzzified control gains (see Sec. 2.2.4), p is the differential operator d/dt and Δ_y(p), Δ_u(p) are the time-domain linear operators corresponding to the parasitic dynamics. The model is a first order, nonlinear system, since the coefficients β vary dynamically. It can be shown [3] that satisfactory results can be achieved even when a higher order plant is treated as a first order plant.
2.2.2. The Controller
A direct control strategy is adopted which implies the fuzzy estimation of the controller parameters. In this view, the control law is a simple extension of the one already devised for the classic MRAC control scheme
    u = (β^T f) y_ref - (β^T q) y_p                (4)
where f and q are vectors of the fuzzified control gains corresponding to each rule. In the MRAC control scheme, where the Plant is assumed to be a first order linear system, there are only two scalar control gains f and q. They are updated resorting to ad hoc adaptive laws which guarantee stability by means of the Lyapunov redesign scheme [8] [9]. In the DFMRAC algorithm, the Plant is assumed to be a combination of first order linear systems (in the T-S fuzzy model) driven by the strengths of the rules activated by the Fact, i.e. a vector φ of signals measured from the Plant. The first order linear system related to the generic i-th rule gives rise to two control gains, f_i and q_i (i = 1, ..., k), and two corresponding adaptive laws. The control action u in Eq. 4 has the same expression as in the MRAC scheme, but the scalar control gains are replaced by a weighted combination of the 2×k new control gains updated by 2×k adaptive laws.
2.2.3. The Reference Model
In classic MRAC theory, the design of the Reference Model requires the condition of perfect model matching, i.e. the relative degree (pole excess) should equal that of the Plant. Thus, in the DFMRAC case, where the Plant is modeled as a first order Takagi-Sugeno fuzzy system, a good choice is a first order Linear Time Invariant (LTI) system, with stability, controllability and minimum phase guaranteed. It is advisable to choose the reference model neither quicker than the parasitic dynamics, due to robustness issues, nor too slow, for obvious performance reasons. In the case study considered in this work, the Reference Model is the same as in [3], its transfer function in the Laplace domain being:

    G_m(s) = 3 / (s + 3)                (5)
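For illustration only, the quantities entering Eqs. (1), (4) and (5) can be evaluated numerically as in the following Python sketch. The triangular fuzzy partition of a single antecedent, the zero initial gains and the time step are assumptions of the sketch; the unit steady-state gain of the reference model in Eq. (5) is likewise an assumption made here for tracking purposes, not a value taken from [3].

import numpy as np

def triangular_mf(z, a, b, c):
    # triangular membership function with support [a, c] and peak at b
    if z <= a or z >= c:
        return 0.0
    return (z - a) / (b - a) if z <= b else (c - z) / (c - b)

# illustrative fuzzy partition of one antecedent (e.g. normalized flux on [0.5, 3])
peaks = np.linspace(0.5, 3.0, 11)
width = peaks[1] - peaks[0]

def rule_strengths(z):
    # normalized strengths beta_i of the k rules for the measured antecedent z
    beta = np.array([triangular_mf(z, p - width, p, p + width) for p in peaks])
    return beta / beta.sum()

def control_action(beta, f, q, y_ref, y_p):
    # Eq. (4): u = (beta^T f) y_ref - (beta^T q) y_p
    return beta @ f * y_ref - beta @ q * y_p

def reference_model_step(y_m, y_ref, dt):
    # forward-Euler step of G_m(s) = 3/(s+3), i.e. dy_m/dt = -3 y_m + 3 y_ref
    return y_m + dt * (-3.0 * y_m + 3.0 * y_ref)

# one illustrative evaluation at a single time step
k = len(peaks)
f = np.zeros(k); q = np.zeros(k)        # control gains, updated on-line by the adaptive laws
beta = rule_strengths(1.0)
u = control_action(beta, f, q, y_ref=1.0, y_p=1.0)
y_m_next = reference_model_step(y_m=1.0, y_ref=1.0, dt=1.0)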
2.2.4. The Adaptive Laws In analogy with the classic MRAC method [3] [8] [9], the Lyapunov redesigned scheme for guaranteeing the global stability of the controlled system leads to the following expressions for updating the control gains:
    ḟ_i = -γ_fi ε y_ref β_i b_sign - γ_fi |ε m| ν_0 f_i ,      i = 1, 2, ..., k
    q̇_i =  γ_qi ε y_p  β_i b_sign - γ_qi |ε m| ν_0 q_i ,      i = 1, 2, ..., k                (6)
The parameters involved in the definition of the adaptive laws above are:
• the sign of the elements of b in (3), b_sign. The elements of b have the same signs (refer to [3] for the model assumptions);
• the positive scalar adaptive gains γ_fi and γ_qi;
• the design parameter ν_0, which determines the intensity of the leakage term;
• the parameter m, which can be found solving the system

    m² = 1 + n_s² ,   n_s² = m_s ,   ṁ_s = -δ_0 m_s + u² + y_p² ,   m_s(0) = 0                (7)
with δ_0 being another design parameter;
• ε, which is related to the tracking error through the differential equation
    ε = e - G_m(p)(ε n_s²)                (8)
where n_s² has been defined above and G_m(p) is the Reference Model operator in the time domain corresponding to G_m(s). Of the above parameters, the positive scalar adaptive gains γ_f and γ_q (here assumed to be the same for the i = 1, 2, ..., k fuzzy rules), the leakage term ν_0 and the design parameter δ_0 are particularly critical for the success of the control strategy and the task of determining their values is a non trivial one. In this work, we addressed this problem resorting to a genetic algorithm (GA) optimization search [10] [11]. The cost function (objective function or fitness in GA terminology) to be minimized is typical for tracking a reference trajectory with a constrained control action:

    J_2 = (1/N) Σ_{l=1}^{N} [ e²(l) + R u²(l) ]                (10)

    e(l) = y_p(l) - y_m(l)                (11)
where y_p(l) is the Plant observable output delivered by the system at the l-th of N time steps, y_m(l) is the output of the Reference Model and u(l) is the control action, also assumed to be measurable. The coefficient R functions as a weight and needs to be calibrated.
3. Power Level Control in a Nuclear Reactor
The DFMRAC method illustrated in Sec. 2 has been applied to the control of the power level (neutron flux) of a nuclear reactor. A simplified model of the Plant, taken from literature [12], has been adopted for describing the neutron flux
evolution, based on a one-group, point kinetics equation with nonlinear reactivity feedback and Xenon and Iodine balance equations. The model accounts only for the neutronics of the reactor. The control action u(t) is the reactivity. The reference trajectory y_ref considered for tracking is a combination of two quasi-steps representing a decrease to 75% of the nominal power followed by an increase restoring the initial power level. The mission time considered is 500 minutes. In the fuzzy rules a single antecedent describes the neutron flux (output of the Plant) at the current time step, normalized with respect to its steady state value. The corresponding UOD, [0.5, 3], has been divided into 11 half-overlapping Fuzzy Sets defined by triangular Membership Functions. A GA optimization procedure has been carried out to determine the values of the four controller parameters introduced in Sec. 2, so as to minimize the cost function of Eq. 10 with R = 0, i.e. only the mean quadratic tracking error is considered. The time step has been chosen equal to 1 s, taking into account the limitations on the typical sampling frequencies of operation of the sensors and control devices in such systems. Table 1 contains the parameter ranges considered in the search and the optimal values found. The tracking capabilities of the optimized controller are highly satisfactory (Figure 2), the mean quadratic error being J_2 = 3.336 × 10⁻⁵. The optimal values of the two control gains γ_f and γ_q are close to the border of the search range. Yet, an extension of the search space for these variables has not led to any improvement.

Table 1. Genetic algorithm data.
Parameter    Search range      Optimal value
ν_0          (10⁻³, 10²)       4.941 × 10⁰
δ_0          (10⁻³, 10²)       2.627 × 10⁻¹
γ_f          (10⁻⁵, 10⁻²)      9.966 × 10⁻³
γ_q          (10⁻⁵, 10⁻²)      9.966 × 10⁻³
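As a purely illustrative sketch of how the GA fitness of Eqs. (10)-(11) can be evaluated for a candidate set of controller parameters, the following Python fragment assumes that a user-supplied closed-loop simulator returns the recorded trajectories; the simulator itself and the dictionary keys used for the parameters are assumptions of the sketch.

import numpy as np

def tracking_cost(y_p, y_m, u, R=0.0):
    # Eq. (10): J2 = (1/N) * sum(e^2 + R*u^2), with e = y_p - y_m (Eq. 11)
    e = np.asarray(y_p) - np.asarray(y_m)
    u = np.asarray(u)
    return float(np.mean(e**2 + R * u**2))

def ga_fitness(controller_params, simulate_closed_loop, R=0.0):
    # fitness of one chromosome: run the closed loop with the candidate
    # parameters (nu0, delta0, gamma_f, gamma_q) and return J2
    y_p, y_m, u = simulate_closed_loop(controller_params)
    return tracking_cost(y_p, y_m, u, R)

# dummy simulator standing in for the reactor + DFMRAC loop
dummy = lambda params: (np.ones(500), np.ones(500), np.zeros(500))
print(ga_fitness({"nu0": 4.941, "delta0": 0.2627, "gamma_f": 9.966e-3, "gamma_q": 9.966e-3}, dummy))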
4. Conclusions When designing a controller for a real plant, the major concern is that an accurate mathematical model of the plant behavior is often not available, or, when it is, it may be characterized by high nonlinearities and complications difficult to capture accurately. For this reason, resorting to model-free controllers is becoming more and more attractive in many practical situations. In this paper, a Genetic Algorithm was employed to optimize the critical parameters of a recently proposed stable, adaptive, model-free fuzzy controller
953 (DFMRAC). The optimization has been carried out with respect to an objective function which in general can include the tracking error and the cost of the control action, thus being capable of taking into account the typical constraints which appear in practice in the controller operation.
Figure 2: Output of the Plant (normalized neutron flux) vs. the Reference signal, with optimized parameters, R = 0 and 1 antecedent. Only the portions in correspondence of the two quasi-step transients (around 40-65 min and 430-455 min) are shown.
References
1. K.S. Tsakalis and P.A. Ioannou, Automatica, 23, 459 (1987).
2. M. Krstic and P.V. Kokotovic, Systems Control Letters, 26, 17 (1995).
3. S. Blazic, I. Skrjanc and D. Matko, Int. Journal of Syst. Sci., 33, 995 (2002).
4. L.-X. Wang, IEEE Transactions on Fuzzy Systems, 1, 146 (1993).
5. M. Marseguerra and E. Zio, Annals of Nuclear Energy, 30, 953 (2003).
6. H. Habibiyan, S. Satayeshi and H. Arab-Alibeik, Annals of Nuclear Energy, 30, 1765 (2004).
7. M. Marseguerra, E. Zio and F. Cadini, Ann. of Nuc. Energy, 32, 712 (2005).
8. R.V. Monopoli, IEEE Transactions on Automatic Control, 474 (1974).
9. K.S. Narendra, Y.H. Lin and L.S. Valavani, IEEE Transactions on Automatic Control, AC-25, 440 (1980).
10. J.H. Holland, The University of Michigan, Ann Arbor (1975).
11. D. Goldberg, Addison-Wesley Publishing Company (1989).
12. J. Chernick, Nuclear Science and Engineering, 8, 233 (1960).
13. T. Takagi and M. Sugeno, IEEE Trans. Syst., Man, Cybern., 15, 116 (1985).
A FUZZY-LOGIC-BASED METHODOLOGY FOR SIGNAL TREND IDENTIFICATION ENRICO ZIO Department of Nuclear Engineering, Polytechnic of Milan Via Ponzio 34/3, 20133 Milan, Italy enrico.zio@polimi.it
IRINA CRENGUTA POPESCU Department of Nuclear Engineering, Polytechnic of Milan Via Ponzio 34/3, 20133 Milan, Italy
The present work addresses the problem of on-line signal trend identification within a fuzzy logic-based methodology previously proposed in the literature. A modification is investigated which entails the use of singletons instead of triangular fuzzy numbers for the characterization of the truth values of the six parameters describing the dynamic trend of the evolving process. Further, calibration of the model parameters is performed by a genetic algorithm procedure.
1. Introduction The present work investigates the practical use of a modified version of the fuzzy logic-based methodology for on-line signal trend identification proposed in [1]. The procedure for the application of the original methodology is as follows: six parameters are computed from the current and historical values of the measured signal; the trend information carried by each one of the six parameters is evaluated fuzzily by mapping its values into a properly constructed truth decision curve (tdc); the truth values are then again fuzzyfied into triangular fuzzy numbers (TFN); a final fuzzy number (FFN) is computed by means of the max operator to integrate the partial information carried by the individual parameters; the FFN is compared to pre-constructed fuzzy prototypes of trend to make the final decision on the signal dynamic behavior. In view of a simplification of the above procedure for its practical application, in the present work singletons are used, instead of TFNs. The corresponding max operator used for the integration of the information of the six parameters becomes trivial and the decision making task for the final trend 954
identification is performed by simply comparing the obtained singleton values with pre-established numerical thresholds indicating steady state or transient behaviors. Furthermore, to improve early detection and transients identification accuracy, the truth decision curves are pre-calibrated via a single-objective genetic algorithm optimization. The optimized method here proposed is tested on a case study concerning a distillation column. The paper is organized as follows. Section 2 synthesizes the fuzzy approach of [1] and its propounded modification. In Section 3 the modified method is tested on artificial signals similar to those in [1]. The calibration of the tdcs by a single-objective genetic algorithm is presented in Section 4. In Section 5, the method is applied for the identification of signal trends developing from the anomalous functioning of a distillation column. Finally, some conclusions are proposed in the last Section.
2. The fuzzy logic-based method of trend identification [1]
Let us consider a monitored signal x(t), sampled at equal time intervals of length Δt, and its associated measurement noise, generally normally distributed with a mean value equal to 0 and a standard deviation σ. The measured signal can then be assumed normally distributed with different means during steady-state and transient conditions [1]. In this view, the objective of a trend identification method becomes that of distinguishing between two Gaussian distributions. To this aim, six parameters are computed from the current and historical values of the signal to capture evidence on its trend [1]: the probability density function (p_x(t)); the cumulative probability density function (c_x(t)); the probability density function of the average of the signal (p_x̄(t)); the average exponentially weighted derivative (d_x^e(t)); the relative deviation of the signal average from its steady state (d_x̄(t)); the sample derivative (d_x(t)). To account for the increasing or decreasing signal trend, to x(t) is associated a parameter, sign(t), equal to -1 if x(t) is below the steady state mean value x_st and to 1 otherwise. The information on the trend carried by the value of each of the six parameters is evaluated in fuzzy terms via a mapping into a properly constructed tdc whose values represent the degrees of truth of the evidence carried by the parameter with respect to the presence or absence of a dynamic trend. The tdcs are functions of the noise amplitude and truncated at a truth value corresponding to a pre-assigned level of importance of the parameter with respect to the trend identification (Fig. 1) [1].
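For illustration only, the Python sketch below shows how two of the six parameters (the sample derivative and the relative deviation of the running average) and the sign(t) indicator can be extracted from a signal window and mapped into a truth value. The window length, the logistic-like stand-in for a truth decision curve and all numerical values are assumptions of the sketch; the actual parameter definitions and tdc shapes are those of [1].

import numpy as np

def trend_evidence(x, x_steady_mean, window=20):
    # illustrative evidence extraction from the last `window` samples of signal x
    w = np.asarray(x[-window:], dtype=float)
    d_x = w[-1] - w[-2]                                            # sample derivative (per time step)
    d_bar = (w.mean() - x_steady_mean) / max(abs(x_steady_mean), 1e-12)  # relative deviation of the average
    sign_t = -1 if w[-1] < x_steady_mean else 1                    # decreasing / increasing indicator
    return d_x, d_bar, sign_t

def stand_in_tdc(value, noise_sigma, importance=0.9):
    # placeholder truth decision curve: maps |value| / sigma to a truth in [0, importance]
    z = abs(value) / max(noise_sigma, 1e-12)
    return importance * (1.0 - np.exp(-z))

# toy usage: steady state followed by a slow ramp
x = list(np.random.normal(10.0, 0.01, 50)) + list(10.0 + 0.05 * np.arange(20))
d_x, d_bar, sign_t = trend_evidence(x, x_steady_mean=10.0)
truth = sign_t * max(stand_in_tdc(d_x, 0.01), stand_in_tdc(d_bar, 0.01))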
Figure 1. The truth decision curves of the six parameters for artificial signals similar to those used in [1] and different amplitudes of noise: 0.1% (dashed line), 1% (dotted line) and 3% (continuous line).
The tdc of each parameter depends on one or two coefficients, here identified after a number of tests (Fig. 1). The truth values of the parameters are fuzzified into triangular fuzzy numbers [1]. The membership functions are set to 1 for the computed truth values and to 0 for the truth values equal to 0 and 1 (sign(t) positive) or -1 (else). The max operator [2] is then adopted for the integration of the information carried by each of the six parameters into a FFN [1]. A practical simplification here propounded amounts to skipping the fuzzification of the truth values, thus considering them directly as singletons. In this case, the use of the max operator for integrating the information carried by the truth values of the six parameters becomes trivial. To make the final decision on the trend underlying the evolving signal, in [1] the distances between the FFN and properly defined prototype fuzzy numbers are computed in terms of the Dissemblance Index (DI) [1,3]. Two symmetric pairs of triangular prototypes are considered (Fig. 2a): prototypes FN0+ and FN0- correspond to steady state and FN1+, FN1- represent increasing and decreasing transients, respectively. The calculations become trivial when using the truth values (singletons) directly. In this case, the decision making task for the trend identification is performed by simply comparing the obtained max singleton truth value with pre-established thresholds indicating steady state, T0 = 0, or transient behaviors, T1+ = 1 for an increasing trend or T1- = -1 for a decreasing trend (Fig. 2b).
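A minimal sketch of the simplified singleton-based decision step follows; the numerical threshold separating steady state from transient behavior is an assumption of the sketch (the text only fixes the prototype values T0 = 0 and T1± = ±1), and the full confid(t)-based strategy of [1] is not reproduced.

def classify_trend(truth_values, steady_threshold=0.3):
    # truth_values: six signed truths in [-1, 1]
    # returns +1 (increasing), -1 (decreasing) or 0 (steady state)
    dominant = max(truth_values, key=abs)       # trivial 'max' integration of the six singletons
    if abs(dominant) < steady_threshold:        # closer to the steady-state prototype T0 = 0
        return 0
    return 1 if dominant > 0 else -1            # closer to T1+ = 1 or T1- = -1

# toy usage
print(classify_trend([0.05, -0.1, 0.02, 0.6, 0.4, 0.1]))   # -> 1 (increasing trend)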
05
(a) (b) Figure 2.(a) Prototype triangular fuzzy numbers (solid line) and final fuzzy number (dashed line); (b) Singletons prototype numbers (solid lines) and max truth value (dashed line)
In [1], the decision on the undergoing trend is taken with respect to a parameter confid{t) which is a function of DI. Confid(t) quantifies the trust in the information on the signal dynamic trend: a high value means a high confidence that the signal represents a steady state. In analogy with the original method [1], the parameter confid(t) is computed using directly the truth values. A parameter decrease^) is also computed to register the trend of confid[t). Two strategies (termed nonfuzzy and fuzzy, respectively) are then used in [1] for making the decision about the signal trend on the basis of the information carried by the parameters confidit) and decreased). 3. Verification of the method The modified procedure with singletons instead of fuzzy sets has been verified on four artificial signals similar to those analyzed in [1]: decreasing, steady state, slow increasing and step signal. The signals are considered affected by a normally distributed noise with standard deviation equal to 0.1%, 1% or 3% of the mean value. The results are represented in Fig. 3 for the case of 0.1% noise and fuzzy decision strategy. For the representation, a 0 is assigned to steady states, a -1 to decreasing trends and a +1 to increasing trends. Table 1 reports the percentages of correct identification. As expected, a lower performance in trend identification is achieved for the slow increasing signal due to the fact that the noise hides the small increasing trend of the considered signal. The overall performance of the simplified method is equivalent to that of the more general method proposed in [1],
(^jlf^rf^M1^
Figure 3. The artificial signals and the obtained trend identification results (0.1% noise)
4. The calibration of the truth decision curves
As explained in Section 2, the information carried by the six parameters is fuzzily evaluated through properly constructed tdcs. Each one of the six tdcs contains one or two coefficients whose values must be arbitrarily set also depending on the noise level. This can be done effectively by resorting to a single-objective genetic algorithm [4]: the coefficient values selected during the search are evaluated using as single criterion (fitness) the percentage of correct identification achieved by the fuzzy identification algorithm itself. Since the truth decision curves are functions of noise, the determination of their coefficients must be realized separately for every considered noise level. The optimized tdcs for the artificial signals used for verification (Section 3) are presented in Fig. 4 in comparison with the original curves of [1].
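As an illustrative sketch of such a calibration (not the GA of [4]), the fragment below maximizes a user-supplied evaluation function returning the correct-identification percentage for a candidate set of tdc coefficients; the number of coefficients, their bounds and the GA operators are assumptions of the sketch.

import random

def calibrate_tdc_coefficients(evaluate_percent_correct, n_coeff=8, bounds=(0.0, 5.0),
                               pop_size=30, generations=50, p_mut=0.1):
    # minimal single-objective GA maximizing the fitness of one noise level
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n_coeff)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=evaluate_percent_correct, reverse=True)
        parents = scored[: pop_size // 2]                        # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_coeff)
            child = a[:cut] + b[cut:]                            # one-point crossover
            if random.random() < p_mut:
                child[random.randrange(n_coeff)] = random.uniform(lo, hi)   # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=evaluate_percent_correct)

# toy fitness standing in for "run the trend identifier and count correct labels"
best = calibrate_tdc_coefficients(lambda c: -sum((ci - 1.0) ** 2 for ci in c))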
-1
Figure 4. Optimized (solid lines) and original (dashed lines) truth decision curves
The optimized truth decision curves for the probability density function p_x, the cumulative probability density function c_x and the probability density function for the average value p_x̄ are modified in such a way to associate higher truth values to transients. Indeed, for the transients considered the three parameters bear small values and the detection method is made more sensitive by assigning them higher truth values. On the contrary, to keep the balance for the steady state identification, the tdcs of the other three parameters (d_x^e, d_x̄, d_x) are smoothed
out to assign smaller truth values. The detection results obtained using the optimized tdcs are represented in Fig. 5 for the 0.1% noise case and reported in Table 1 for all cases. Decreasing signal
Steady state signal
^^WV*Y\^Mv-wvVv,l 10
0
120
20 Results
10 20 Slow increasing signal
10
-* 30
^*rfy^v
20 Results
*—
10 20 Step signal 120
20 Results
W*/*'^ 10
A r i n l
^^^
20 Results
Figure 5. Trend identification results obtained using the optimized truth decision curves (0.1% noise) Table 1. Correct identification percentage before and after the tdcs calibration 0.1% noise 1 % noise 3% noise Signal Original Optimized Original Optimized Original
r%i Decreasing Steady state Slow increasing Step
95.00 100.00 70.50 95.00
r%i 98.00 96.00 90.00 98.50
F%1 75.00 89.34 33.00 49.50
r%i 96.00 72.00 82.50 99.00
r%i 47.00 98.67 30.50 23.00
Optimized
r%i 71.00 52.00 51.00 71.50
Early transient detection is clearly improved (for example, the slow increasing signal is correctly classified after only 4 seconds instead of the previous 13 seconds). The higher sensitivity to the transients trend is paid with a minor (safe) worsening in the steady-state recognition.
960 5. Application: identification of transients trends in a distillation column A diagnostic problem is considered to exemplify the application of the proposed approach to trend identification. The system considered is a 12-plates distillation column for separating a flow of methanol and water [5]. The aim is to identify die anomalous functioning regimes due to a variation of three key process parameters, namely the reflux ratio, the heat duty of the reboiler and the feed flow [5]. The time evolution of 270 different signals (temperatures, compositions, liquid and vapor factors for each one of the 12 plates of the distillation column) was simulated in steady state as well as in transient conditions originating in response to three different types of process anomalies: variation of the heat duty (Q), variation of the reflux ratio (R) and variation of the input mass flow (F). Gaussian noise of 3% amplitude was applied to the signals. The results of die optimized fuzzy trend identification are reported in Table 2 for the signals measured in correspondence of the first 6 plates. Table 2. Percentage of correct identification atioilary. rr=transitory, T=temperature, C composition, LF= 2 3 4 Plate St Tr St Tr St Tr St Tr T 0 100 0 100 0 100 0 100 98 6 C 100 29 100 29 97 58 F LF - 98 30 99 40 99 47 15 VF 99 99 99 49 100 57 30 0 100 T 0 0 100 100 100 0 0 C 100 100 0 100 0 100 2 Q LF - 100 92 100 97 100 96 9 VF 100 100 85 100 83 100 78 T 000 100 0 100 0 100 0 100 94 100 92 100 95 C 100 58 99 R LF - 100 2 100 8 100 7 99 99 100 99 VF 100 99 100 99
iquid factor, 5 St Tr 0 100 97 63 99 40 100 70 0 100 97 6 100 93 100 72 0 100 99 95 100 2 99 99
VF=vapor fac 6 St Tr 0 100 95 72 95 53 100 99 0 100 99 3 100 97 100 7 0 100 100 90 100 1 99 96
6. Conclusions In this paper, a modified version of the fuzzy logic-based method for signal trend identification previously proposed in [1] has been presented. In the original procedure, six parameters related to the signal time-history are computed, then fuzzily quantified in truth values by properly constructed truth decision curves and finally summarized into a final fuzzy number. The trend identification is based on the analysis of this final fuzzy number with respect to given fuzzy
961 prototypes representing the signal trend. Because the fuzzification of the truth decision curves is arbitrary, the information integration and the decision strategy may become somewhat cumbersome. In this respect, a simplification may be achieved by using directly the truth values as singletons, as proposed in the present work. The results obtained on artificial data indicate that the simplified method is capable of obtaining the same performance as the original method in detecting transients accurately and identifying their trends reliably, with no misinterpretation of steady-state signals as transients and vice versa. In order to improve the identification of the transients, a calibration of the parameters of the truth decision curves has been proposed. To this aim, a singleobjective genetic algorithm has been exploited. The optimized method turns out to be more sensitive to the presence of a dynamic regime, with the time needed for the correct transient identification being reduced of up to 70%. The simplified and optimized method has been successfully applied on simulated signals of a 12-plates distillation column. The results indicate that the proposed method is capable of early and reliable identification of signal trends in the presence of noisy data. Acknowledgments The authors whish to thank Professors Jose Gozalvez and Juan Carlos Garcia for providing the simulation for the case study of Section 5. References 1. 2. 3. 4. 5.
X. Wang, et al., Nucl. Tech., 135, 67, (2001). L.H. Tsoukalas, R.E. Uhrig, John Wiley and Sons, New York, 101, (1997). G. Bojadziev, M. Bojadziev, WorldSc. Publ. Co. Pte. Ltd., 85, (1995). M. Marseguerra, E. Zio, L. Podofillini, Nucl. En., 30, 1437, (2003). J.M. Gozalvez Zafrilla, J. C. Garcia-Diaz, "Optimization of Distillation Columns using Genetics Algorithms", Milano, (2006).
IDENTIFICATION OF TRANSIENTS IN NUCLEAR SYSTEMS BY A SUPERVISED EVOLUTIONARY POSSIBILISTIC CLUSTERING APPROACH E. ZIO, P. BARALDI AND D. MERCURIO Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, Milan, 20133, Italy In this paper, the task of identifying transients in nuclear systems is tackled by means of a possibilistic fuzzy classifier devised in such a way to recognize the transients belonging to a priori foreseen classes while filtering out unforeseen plant conditions, independently from the operational state of the plant before the transient occurrence. The classifier is constructed through a supervised evolutionary procedure which searches geometric clusters as close as possible to the real physical classes. The proposed approach is applied to the classification of simulated transients in the feedwater system of a boiling water reactor.
1. Introduction Two important issues for the practical implementation of model-based fault diagnostic systems in Nuclear Power Plants (NPPs) regard the possibility of defining and controlling the boundaries of their utilization and their capability to diagnose a fault independently from the plant operational state before its occurrence [1]. In this work, these issues are tackled by means of a novel possibilistic clustering classifier. The possibilistic viewpoint considers the memberships to a given cluster as degrees of compatibility, or 'typicality' measured with respect to the cluster prototypical members [2, 3]. In this view, the memberships of representative (typical) patterns are high, while unrepresentative (atypical) points bear low membership to all clusters. The approach embraced in this work exploits i) a possibilistic clustering algorithm for classifying the transients or labeling them as "unknown" if the associated feature values are located far away, in the feature space, from those characteristics of the training data; ii) a supervised evolutionary procedure for optimizing a different Mahalanobis metric for each of the possibilistic clusters by exploiting a priori known information regarding the true classes which a set of available labeled patterns belong to [4], [5]. 962
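For illustration, the typicality-style membership used in possibilistic clustering [2] can be written in the form sketched below, here combined with a cluster-specific Mahalanobis metric M as in the method of Section 2; the fuzzifier m = 2, the reference distance eta and the toy data are assumptions of this sketch, not values used in the case study.

import numpy as np

def possibilistic_membership(x, center, M, eta, m=2.0):
    # typicality of pattern x to a cluster with prototype `center`,
    # positive definite metric M and reference distance eta
    diff = np.asarray(x) - np.asarray(center)
    d2 = float(diff @ M @ diff)                     # squared Mahalanobis distance
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

# toy usage: identity metric, i.e. Euclidean distance
x = np.array([1.0, 0.5]); v = np.array([0.0, 0.0]); M = np.eye(2)
print(possibilistic_membership(x, v, M, eta=1.0))

Note that, unlike probabilistic fuzzy memberships, these values are not constrained to sum to one over the clusters, which is what allows atypical patterns to receive low membership to all classes.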
963 The proposed possibilistic clustering scheme is verified with respect to a problem regarding the early identification of a predefined set of faults in a Boiling Water Reactor (BWR). The corresponding transients have been simulated by the HAMBO simulator of the Forsmark 3 BWR plant in Sweden [6]. 2. The supervised evolutionary possibilistic clustering algorithm for classification In this Section, a possibilistic clustering algorithm is developed to perform the diagnostic identification of transients. The traditional, unsupervised possibilistic algorithm based on a Euclidean metric to measure compatibility leads to spherical clusters that rarely are adequate to represent the data partition in practice. A significant improvement in classification performance is achieved by considering a different Mahalanobis metric for each cluster, thus obtaining different ellipsoidal shapes and orientations of the clusters that more adequately fit the a priori known data partition [4,5]. The information on the membership of the N available patterns xk,k=\,..., N, to the c a priori known classes, can be used to supervise the algorithm for finding the optimal Mahalanobis metrics such as to achieve geometric clusters as close as possible to the a priori known physical classes. Correspondingly, the possibilistic clustering algorithm is said to be constructed though an iterative procedure of 'training' based on a set of available patterns, pre-labeled with their possibilistic memberships to the a priori classes. The training procedure for the optimization of the metrics is carried out via an evolutionary procedure, presented in the literature within supervised fuzzy clustering schemes [4] and further extended to diagnostic applications [5]. Here, the procedure is employed within the possibilistic clustering scheme [2]. The target of the supervised optimization is the minimization of the distance D(r',r) between the a priori known physical class partition r ' =(ri,r' 2 ,...,r^.) and the obtained geometric cluster partition
Γ ≡ (Γ_1, Γ_2, ..., Γ_c):

    D(Γ', Γ) = Σ_{i=1}^{c} ||Γ'_i - Γ_i|| / (N·c) = Σ_{i=1}^{c} Σ_{k=1}^{N} |μ'_ik - μ_ik| / (N·c)                (1)
964 corresponding geometric cluster in the feature space, N is the number of available patterns and c the number of classes. The overall iterative training scheme can be summarized as follows: 1. At the first iteration ( r = 1), initialize the metrics of all the c clusters to the Euclidean metrics, i.e. Ml{l) = i , i-l,2,...,c, where / is the identity 2.
matrix. At the generic iteration step r, run the possibilistic clustering algorithm [2] to partition the N training data into c clusters of memberships T (r) = | r ] ( r ) , . . . , r c ( r ) j , based on the current metrics Az~,(r) and on the
3.
"supervising" initial partition T' which sets the initial memberships of the N patterns to c clusters equal to die true memberships to the a priori known classes. Compute the distance D(T' ,r(r)) between the a priori known physical classes and the geometric possibilistic clusters by eq.(l). At the first iteration (r = 1) initialize the best distance D.+. to D ( r ' , r ( l ) ) , £>_,_+. to D(T'l,Ti (1)) and the best metrics M,+ toM,.(l) and go to step 5.
4.
If F(r) is close to T', i.e. D(r',T(r))
is smaller than a predefined
threshold e, or if the number of iterations r is greater than the predefined maximum allowed number of iterations T.max., stop: T(r) is the optimal cluster partition T*; otherwise, if D(r',r(r))
is less than D.+. upgrade D.+-
to £>(r',r(r)), M; to M,(r) and D? =£>(r;,r,(r)) . 5.
Increment r by 1. Update each matrix Mt
by exploiting its unique
decomposition into Cholesky factors [5], M* =\&.\
G+ , where G* is a
lower triangular matrix with positive entries on the main diagonal. More precisely, at iterationr, the entries g'lih (r) of the Cholesky factor G ( (r) are updated as follows: gU W = gi;,,2 +Kk(°'3+>
Xl-i-
8U(r) = m a x ( l ( T X / 2 + A ^ ( ° ' 3 + ) )
if L
'-=1^
-(2) -(3>
where 8* = aD*, a is a parameter that controls the size of the random step of modification of die Cholesky factor entries g'*t ,N\,
denotes a
965 Gaussian noise with mean 0 and standard deviation 5, and eq.(3) ensures that all entries in the main diagonal of the matrices Gi (r) are positive numbers and so Mt (r) are definite positive distance matrices. Notice that the elements of the j-th Mahalanobis matrix are updated proportionally to the distance D* between the i-th a priori known class and the J-th cluster found. In this way, only the matrices of those clusters which are not satisfactory for the classification purpose are modified. 6. Return to step 2. At convergence, the supervised evolutionary possibilistic clustering algorithm provides the c optimal metrics M' with respect to the classification task, the possibilistic cluster centers v* and the possibilistic membership values jujk of the patterns xk, k=l, ...,N, to the clusters i=l,2, ,..,c. When fed with a new pattern X, the classification algorithm provides the values of the membership functions [i'(x), i = 1 , 2 , . . . , c , to the possibilistic clusters. These values give the degree of compatibility or "typicality" of 3c to the c clusters. In practice, three situations may arise: i) x does not belong to any cluster with enough membership, i.e. all the membership values /u'(x) are below a given threshold ef (degree of ignorance): this means that x is an atypical pattern with respect to the training patterns; ii) at least two membership values are above the threshold sc (degree of confidence): x is thus ambiguous. In this case, the ambiguity must be regarded as "equal evidence", i.e. the pattern is typical of more than one class and thus cannot be assigned to a class with enough confidence. This situation occurs if x is at the boundary between two classes, iii) x belongs only to a cluster with a membership value greater than the threshold ec: in this case, it is assigned to the corresponding class. 3. Application of the possibilistic classifier to the identification of nuclear transients In this Section, the possibilistic classifier described in the previous Section is applied to the early identification of a predefined set of faults in the feedwater system of a Boiling Water Reactor (BWR) (see [7] for a detailed description of the faults). The corresponding transients have been simulated by the HAMBO simulator of the Forsmark 3 BWR plant in Sweden [6]. Here, the diagnosis considers three power operation levels, i.e. 50%, 80% and 108% of full power. Transient data were made available for each of the fault
966 types, with varying degrees of leakage and valve closures. All transients start after 60 seconds of steady state operation. tau« class 1
+T
ooooooooooococoeeeee >
OOOCCOO
•ftmoooooofcrawowfr
Figure 1: Time profiles of the pattern assignment to the different classes. Memberships to: (+) class Fl, (o) class F2, (*) class F3, (.) class F4, (x) class F5 and (0) class F7. Upper solid line, ec = 0.7; lower dashed line, e./ .= 0.2
Among the 363 measured signals, only 5 signals, i.e. Temperature of drain 4 before valve VB3, Water level of tank TDl, feedwater temperature after preheater EA2, feedwater temperature after preheater EB2, Position level of control valve for preheater EA1, have been chosen for the transient classification using the feature selection algorithm proposed in [8]. 3.1. Case study 1: filtering out unknown transients In this analysis, the patterns used for building the classification system have been taken from the six faults Fl, F2, F3, F4, F5 and F7 that regard line 1 of the feedwater system [7]. For each type of fault, the simulated transients with the plant at 80 % of full power have been considered, taking patterns every 6 seconds from t = 80s to t = 200s. After the training of the possibilistic classifier, its performance has been tested using patterns taken every second from t = 0s to t = 300s from both the training transients and from an unknown transient caused by F13. Figure 1 shows the obtained transient classification as time progresses. Considering a degree of confidence £<..= 0.7 and a degree of ignorance e./= 0.2, the results are quite satisfactory, even though at the beginning of the transient the possibilistic classifier assigns the steady state patterns to the class of fault F2 albeit with low membership. This is explained by the fact that for transients of
967 class F2 there are no significant effects on the selected input signals so that understandably the steady state may be confused with a fault of class 2. Also note that, the possibilistic classifier is able to assign to the right class the foreseen transients also at times well beyond the temporal domain of training of 200s, due to the increased significance of the signals as the transients continue evolving away from their initial steady state. Finally, the algorithm is very efficient in filtering out the patterns of the unknown fault F13 as atypical, by assigning them membership values to all classes less than £./.= 0.2 (Figure 1, bottom). 3.2. Case study 2: classification of transients at different power levels In this Section, the capability of the classifier to identify faults that initiate from different plant operational conditions is investigated. In this respect, the possibilistic classifier is trained using patterns taken from classes Fl, F2, F3, F4, F5 and F7 at 50 % and 108 % power whereas in the test phase also patterns taken at 80 % power are considered. From each of the 12 training transients considered (6 transients for each of the 2 power levels), patterns taken every 6 seconds from 80s to 200s have been used, for a total of 252 patterns. The performance of die classifier has been tested using the training transients (belonging to the foreseen classes of faults at 50 % and 108 %) and the new transients (belonging to the same classes of faults but at 80 % power), taking a pattern every second from 0s to 300s. The behavior of the memberships with respect to all the classes at 80% power is shown in Figure 2. Considering a degree of confidence e^. = 0.7 and a degree of ignorance s.f. = 0.2, the performance of the classification at the new power level 80 % is very satisfactory for the first five classes of faults whereas the transients caused by the fault F7 at 80 % is filtered as atypical at all times. This happens even for the training patterns of class F7 taken at 50 % of full power. To further investigate this situation, a sensitivity analysis has been performed, based on the technique reported in [5]. Figure 3 shows the disposition of the patterns in the subspace formed by two of the three signals identified as most important (signal 320 and signal 195).
968
Figure 2: Time profiles of the pattern assignment to the different classes for power level 80%. Memberships to: (+) class F1, (o) class F2, (*) class F3, (.) class F4, (x) class F5 and (◊) class F7. Upper solid line, ε_c = 0.7; lower dashed line, ε_f = 0.2
0
' .
V
feature 195
0
• ' -5
t
1
A
J'A
*•>
-•
.
+ cluster 7 center
I
• patterns classes 1,2, 3, 4 and 5 D patterns class 7 (50%) 0 patterns class 7 (80%) * patterns class 7 (108%)
feature 320
Figure 3: Patterns and cluster centers in the subspace formed by features 320 and 195 In this case, the evolutionary algorithm cannot find an optimal metric M'7 that results in a cluster that contains exclusively all the patterns of class F7, without containing those patterns of the other classes which are close to the patterns of class F7 at 50% power level. The reason for this is that two patterns having nearly the same distance from the center of cluster 7 (+), for example pattern A of class F7 and pattern B of class F5 in Figure 3, have nearly the same membership values to class F7, measuring their compatibility with it (eq. A2). In this situation, it is impossible to have possibilistic clusters with sharp borders and the target of the evolutionary algorithm of minimizing the distance
D(Γ'_7, Γ_7), between the true known physical class memberships Γ'_7 and the possibilistic cluster memberships Γ_7, is better satisfied by a small cluster centred on the patterns of class F7 at 108% than by a big cluster that contains all patterns of class F7 but also patterns of other classes.
4. Conclusions
In the present paper a supervised, evolutionary possibilistic clustering algorithm is proposed for building a diagnostic system for the classification of transients in nuclear systems. Within the possibilistic clustering scheme, a supervised evolutionary algorithm finds a Mahalanobis metric for each cluster which is optimal with respect to the classification of an available set of labeled patterns. The approach distinguishes itself from the existing ones because it allows identifying the boundaries of application of the classification model so that 'unbeknownst' transients are filtered out. Also it is capable of correctly classifying faults that occur with the plant in operating conditions different from those of the patterns used for the construction of the classifier model. The approach has been successfully verified with respect to the classification of simulated nuclear transients in the feedwater system of a BWR plant.
Acknowledgements
The authors wish to thank Drs. Paolo Fantoni and Davide Roverso of the IFE, Halden Reactor Project for providing the transient simulation data.
References
1. J. Reifman, Nuclear Technology 119, 76-97 (1997).
2. R. Krishnapuram and J.M. Keller, IEEE Trans. on Fuzzy Systems 1, 98-110 (1993).
3. P. Fantoni, International Journal of General Systems 9, 305-320 (2000).
4. B. Yuan, G. Klir and J. Swan-Stone, Proc. Fourth IEEE International Conference on Fuzzy Systems, 2221-2226 (1995).
5. E. Zio and P. Baraldi, Annals of Nuclear Energy 32, 1068-1080 (2005).
6. E. Puska and S. Normann, Enlarged Halden Programme Group Meeting 2 (2002).
7. D. Roverso, Proceedings of PNPIC and HMIT (2004).
8. E. Zio, P. Baraldi and N. Pedroni, Submitted to IEEE Transactions on Nuclear Science (2005).
SIGNAL GROUPING ALGORITHM FOR AN IMPROVED ON-LINE CALIBRATION MONITORING SYSTEM MARIO HOFFMANN OECD Halden Reactor Project, P.O. Box 173, NO-1751 Halden, Norway
On-Line Monitoring evaluates instrument channel performance by assessing its consistency with other process indications. In nuclear power plants the elimination or reduction of unnecessary field calibrations can reduce associated labor costs, reduce personnel radiation exposure, and reduce the potential for calibration errors. A signal validation system based on fuzzy-neural networks, called PEANO [1], has been developed to assist in on-line calibration monitoring. Currently the system can be applied to a limited set of channels. In this paper we will explore different grouping algorithms to make the system scalable and applicable in large applications.
1. Introduction On-line calibration monitoring provides information about the performance and calibration state of the monitored channel while a process, e.g. a nuclear power plant, is in operation. The instrument channel performance is evaluated by assessing its consistency with other plant indications. PEANO [1] is a system for on-line calibration monitoring and signal validation, which makes use of empirical modeling techniques, e.g. fuzzy-neural networks. The system utilizes auto-associative neural networks and fuzzy logic to calculate estimates of the monitored signals. One of the limiting factors of the PEANO system in real plant application so far is the number of signals that can be handled within a single application. Due to computational limitations as they exist today, the practical limit of the number of channels that can be handled with a single model lies around 60-65 signals. Considering the number of sensors installed in an operational nuclear power plant, an application able to deal with a maximum of around 60-65 signals will not be sufficient. In these types of application numbers of 300-1000 signals or even more are common. This means that one would have to create multiple models and run several instances of PEANO to handle such an application. This brings along a considerable strain on model handling, configuration and computational requirements. By and large it is desirable that an on-line 970
971 calibration monitoring system is capable to deal with the large number of signals, common in a real plant application. One obvious approach to address this problem is to have the system divide a large number of signals into smaller groups. After the grouping is performed, a model is developed for each of the sets of signals. The final results of these separate models are then compiled together, as is shown in Figure 1, and handled further as one set. It is important to recognize that this grouping is completely independent from the clustering technique, which will still be applied. For each of the identified groups a complete PEANO model will be developed. This means that the clustering technique is applied within each of the groups and an auto-associative neural network (AANN) will be developed for each cluster. Model Group 1
Model Group 2
Model Group...
Figure 1 Overall PEANO model with grouping
2. Signal grouping When dividing a large group of signals into smaller sets, one has to take several aspects into consideration. The modelling techniques applied in PEANO, i.e. specifically the auto-associative neural network (AANN), require that a sufficient level of correlation exists between the signals that are used in the model or sub-models. This criterion should be the basis for any type of grouping routine that will be applied. If the basic requirement is not fulfilled, one will not be able to develop a proper model for the group. However, it has to be noted that we are not looking for a solution where a few of the groups have a very high cross-correlation between the signals and one or two groups are composed of badly cross-correlated signals. The PEANO modelling techniques will not be suited for the groups with the low correlated signals. A grouping solution where
972 the average cross-correlation within the groups is of a sufficient level is much more desirable in that case. Other requirements that have to be considered for a grouping function are: • Any single sensor can be included in more than one group • Not all groups need to have the same number of sensors • The grouping algorithm should produce groups within an acceptable minimum and maximum size • Manual addition of sensors to groups must possible, based on physical process dependencies and expert knowledge When one decides to apply a grouping algorithm as described here, where a single signal can be included in more than one group, one also has to consider how to combine the results. Each sub-model that contains the sensor will generate a prediction for that sensor and there will be some variation in these values. In addition, each of the predictions will have different accuracy bands and there will be a reliability measure associated with each sub-model. Furthermore, the uncertainty for the prediction will be unique with each submodel. To achieve the most reliable prediction for a particular sensor that is validated by more than one model, all these aspects have to be taken into account when merging the prediction results produced by the different sub-models for that one sensor. 3. Grouping Algorithms 3.1. Genetic Algorithm This grouping routine that has been developed and implemented in the NNPLS model is based on genetic algorithm theory [2]. With this technique the problem is described in terms of a so-called chromosome and by applying genetic concepts such as a crossover or mutation operator, the chromosome is mutated. The operations are performed randomly, so when the same operation is applied twice on the same chromosome the offspring that result will not be similar. In Figure 2 it can be seen how the grouping problem is described in terms of a chromosome and in what way a mutation operation works on the chromosome to create the next generation.
31111 3 2 2 1 1 1 3 - 1 2 1 31212
t Signal 1 belongs to group 3 Figure 2 Signal grouping in terms of a chromosome
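For illustration only, the chromosome encoding of Figure 2 and a simple mutation operator can be sketched as follows in Python; the group count, the mutation probability and the fitness ingredient based on the average within-group cross-correlation are assumptions of this sketch, not the implemented NNPLS routine.

import itertools, random, statistics

def random_chromosome(n_signals, n_groups):
    # one gene per signal; the gene value is the group the signal is assigned to
    return [random.randrange(1, n_groups + 1) for _ in range(n_signals)]

def mutate(chromosome, n_groups, p_mut=0.05):
    # reassign each signal to a random group with probability p_mut
    return [random.randrange(1, n_groups + 1) if random.random() < p_mut else g
            for g in chromosome]

def mean_group_correlation(chromosome, corr):
    # fitness ingredient: average cross-correlation between signals sharing a group
    vals = [abs(corr[i][j])
            for i, j in itertools.combinations(range(len(chromosome)), 2)
            if chromosome[i] == chromosome[j]]
    return statistics.mean(vals) if vals else 0.0

# toy usage with a simple 10x10 correlation matrix
corr = [[1.0 if i == j else 0.5 for j in range(10)] for i in range(10)]
c = random_chromosome(10, 3)
print(mean_group_correlation(mutate(c, 3), corr))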
973 A grouping routine based on a genetic algorithm has a number of drawbacks. Most importantly one is not sure that the final solution is the absolute best result. In principal a genetic algorithm performs a search in a finite, but very large, space of possible solutions. Since a search of the complete space would take far too much time, the search is stopped after a predefined number of iterations. The best result that was found within the searched sub-space is presented as the final solution. The iterative search is computationally intensive and the required time increases as the search space expands. This means that the time needed for the algorithm will rapidly become longer to perform the grouping for an application with a larger number of sensors. This is a common drawback with empirical modelling techniques and should be minimized and preferably prevented as much as possible. The way the genetic algorithm has been implemented the user specifies the desired group size. The solution will consist of groups that in principal will all contain the same number of sensors. When the total number of signals cannot be divided evenly over the number of group, the remaining sensors will be divided over groups, which will have the desired group size plus one. As was identified earlier, splitting the total list of sensors in evenly sized groups will most likely not produce an optimal solution. However, if one would extend the genetic algorithm to include unevenly sized groups, the search space would be increased dramatically. This means that the iterative search would have to be expanded accordingly and the routine would have to run for a long time to make sure that a reasonable solution has been found. 3.2. Symmetric Reverse Cuthill-McKee ordering An alternative approach to the grouping problem has been developed together with the University of Tennessee in Knoxville [3]. This algorithm is also based on the cross-correlation matrix of the signals and makes use of the symmetric reverse Cuthill-McKee ordering [4]. This is a permutation that tends to have its nonzero elements of a matrix closest to the diagonal. This means that the signals with highest correlations are reordered around the diagonal. An example of a typical result from this grouping algorithm is shown in Figure 3. The user specifies a cut-off value for the minimum cross-correlation value that is permitted within a group. The algorithm first determines the crosscorrelation matrix for the signals and setting all values that are smaller than the specified cut-off value to zero. Next the Cuthill-McKee reordering is applied on
the resulting matrix. After the appearing blocks have been solidified, groups of signals can be identified along the diagonal.
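One possible realization of the described reordering, using the reverse Cuthill-McKee routine available in SciPy, is sketched below for illustration; the cut-off value and the test data are assumptions, and the final block (group) extraction along the diagonal is left out.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def rcm_grouping(signals, cutoff=0.7):
    # signals: (n_samples, n_signals) array; returns the RCM permutation of the
    # signal indices and the thresholded correlation matrix reordered accordingly
    corr = np.abs(np.corrcoef(signals, rowvar=False))
    corr[corr < cutoff] = 0.0                          # drop weak cross-correlations
    perm = reverse_cuthill_mckee(csr_matrix(corr), symmetric_mode=True)
    reordered = corr[np.ix_(perm, perm)]
    return perm, reordered

# toy usage: 200 samples of 12 signals
data = np.random.rand(200, 12)
perm, block_matrix = rcm_grouping(data, cutoff=0.3)
print(perm)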
50
60
Figure 3 Grouping result after a Cuthill-McKee re-ordering
The described algorithm works very fast compared to the genetic algorithm approach, since it is not based on an iterative search of a large and complex space but on a deterministic permutation of the cross-correlation matrix. The fact that the algorithm is deterministic is also an advantage, since one can be sure that the best solution achievable with the algorithm is found every time it is applied. The proposed groups are not necessarily of the same size, and the user is not required to specify a desired group size, since the algorithm determines the sizes itself. Groups also tend to overlap, allowing a sensor to be assigned to multiple groups. The initial algorithm did not limit the minimum or maximum number of signals in each group. As was noted earlier, it is essential for the PEANO approach that a group has a certain minimum number of signals for the modelling technique to be applicable. A maximum group size is desired to ensure that not all signals are assigned to one group, which would mean that the total modelling problem is not divided into multiple sub-models. The initial experience with the Cuthill-McKee based grouping algorithm is that the cut-off value for the cross-correlation is a very sensitive parameter. When the cut-off value is too large, the proposed solution will contain some very small groups; groups with only one sensor quickly appear as the cut-off value is increased. When the cut-off value is chosen too small, the resulting groups tend to cover all the sensors available for modelling, as can be seen in the grouping result shown in Figure 4. These results undermine the purpose of the grouping routine and are not desired.
Figure 4. Grouping result with a small cut-off cross-correlation value (7 groups covering 66 variables).
Overall, it seems difficult to find a cut-off value that produces a satisfying result with respect to maximum and minimum group sizes. Furthermore, the algorithm tends towards solutions in which a few groups have high correlations between their sensors while one or two groups are left with poorly correlated sensors. For example, merging the signals in the lower right corner of Figure 3 into one group will result in a sub-model with very poorly correlated signals. As we know, this will not result in good model performance and should be avoided.

4. Conclusions

It is known that On-Line Calibration Monitoring can improve the safety and efficiency of industrial processes, e.g. nuclear power plants, and the PEANO system has been successfully applied in many small-scale feasibility studies to perform this task. From recent applications it has become clear that the system will have to be made scalable to handle large-scale applications and satisfy the industry's needs. It is clear that a robust and autonomous signal grouping algorithm is needed to achieve this scalability. Neither of the two grouping solutions presented here allows one signal to be assigned to multiple groups. It could be left to the user to add specific signals to multiple groups, but an automated approach is desired. These two routines and several other grouping techniques are under investigation and will be developed further to implement a grouping routine that satisfies all the specified requirements. A solution still needs to be found for combining the prediction results for a sensor that has been assigned to and validated by multiple sub-models. These issues do need to be addressed, since it is very desirable to be able to include certain sensors in multiple sub-models.

References
1. P. Fantoni, M. Hoffmann, R. Shankar and E. Davis, On-Line Monitoring of Instrument Channel Performance in Nuclear Power Plant Using PEANO, ICONE-10 (2002).
2. J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI (1975).
3. J.W. Hines and A. Usynin, Autoassociative Model Input Variable Selection for Process Monitoring, International Symposium on the Future I&C for Nuclear Power Plant, November 1-4, 2005, Tongyeong, Gyungnam, Korea.
4. E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, in Proc. 24th Nat. Conf. ACM, pages 157-172 (1969).
INTELLIGENT TRANSIENT NORMALIZATION FOR IMPROVED EMPIRICAL DIAGNOSIS

DAVIDE ROVERSO
OECD Halden Reactor Project, PO Box 173, NO-1751 Halden, Norway

A fundamental requirement on data used for the training of an empirical diagnostic model is that the data be sufficiently similar to, and consistent with, what will be observed during on-line monitoring. In other words, the training data has to "cover" the data space within which the monitored process operates. The coverage requirement can quickly become a problem if the process to be monitored has a wide range of operating regimes, leading to large variations in the manifestation of the faults of interest in the observable measurement transients. In this paper we propose a novel technique aimed at reducing the variability of fault manifestations through a process of "intelligent normalization" of transients. The approach proposed here is to use a neural network to perform this mapping: the network uses the information contained in all monitored measurements to compute the appropriate mapping into the corresponding normalized state. The paper includes the application of the proposed method to a nuclear power plant transient classification case study.
1. Fault Detection and Diagnosis by Transient Classification

A large number of industrial processes are characterized by long periods of steady-state operation with occasional shorter periods of a more dynamic nature, corresponding either to normal events, such as minor disturbances, planned interruptions or transitions to different operating states, or to abnormal events, such as major disturbances, equipment failures, instrumentation failures, etc. The second class of events represents a challenge to the smooth, safe, and economical operation of the monitored plant and its equipment. The prompt detection and recognition of such an event is essential for the most effective and informed response to the challenge. The task of dynamic event recognition can be approached from different perspectives, such as symptom-based deductive inference (i.e. expert systems), residual generation, state classification (i.e. static pattern recognition), or transient classification (i.e. dynamic pattern recognition). The latter is the approach that we adopted and implemented in the neural network based Aladdin system [1, 2, 3] developed at the OECD Halden Reactor Project. The basic assumption behind this approach is that an event or fault will generate, in time, unique changes in monitored plant measurements. The discrimination of such changes can then in principle lead back to the originating event via an inverse mapping process.

2. The Coverage Problem of Empirical Modeling Techniques

Neural network based approaches, such as the one adopted in Aladdin, fall into the category of empirical (i.e. data-driven) modelling techniques. These techniques have the key advantage of not requiring extensive and detailed knowledge either of the process being monitored or of the underlying physical principles involved; the data is in itself sufficient for successful modelling. Then again, accurate and representative data is also a necessary prerequisite for successful empirical modelling. What this means in practice is that the data used for the training of an empirical model has to be sufficiently similar to, and consistent with, the data that will be observed during on-line monitoring. In other words, the training data has to "cover" the data space within which the monitored process operates. This is because empirical models cannot in general extrapolate to unknown situations in a reliable way. The coverage requirement can quickly become a problem if the process to be monitored has a wide range of operating regimes, leading to large variations in the manifestation of the faults of interest in the observable measurements. In the specific case of transient classification, where a time series of process measurements is analysed to detect and diagnose faults and anomalies, it is clear that a fault manifestation in the observed measurements can vary significantly depending on the specific condition in which the fault occurs. This variability has two main consequences:
1. The number of fault transient examples that will have to be acquired (or simulated) to achieve sufficient coverage will quickly grow to potentially unmanageable proportions.
2. The specific empirical modeling technique used might run into problems while trying to learn to classify as the same fault a set of widely different transient manifestations.
Both of these problems have been experienced in different applications of the Aladdin system mentioned above. One example, which we will also use in the case study reported in Section 4, involved the recognition of a number of faults and anomalies in a boiling water reactor (BWR) simulator [4]. The simulator used is an experimental simulator of the Swedish Forsmark-3 BWR nuclear power plant.
The initial set of faults considered, as reported in [4], involved a number of malfunctions and leakages in the high-pressure preheating section of the feedwater system, all simulated at full power operation. This task proved manageable with Aladdin, but the extension of the case study to a full range of power operation levels (from 50% upwards) proved much more difficult. This was mainly due to the large influence that reactor power has on most monitored measurements, leading to significantly different fault manifestations. In this paper we propose a novel technique aimed at reducing the variability of fault manifestations through a process of "intelligent normalization". Section 3 will describe an intelligent normalization technique based on neural networks, while Section 4 will show its application to the problem of power normalization in the case of nuclear power plant transient classification.

3. Intelligent Transient Normalization with Neural Networks

Transient normalization can be defined as the process of transforming or mapping instances or examples of a transient class into a common (normalized) transient prototype. An example of this is given in Figure 1, which shows a case in which three different instances of the same fault are manifested in one observed measurement in three different ways due to different initial conditions*. The task of transient normalization is to transform these so as to make them as similar as possible to the given prototype. Obviously, this is in general not possible by observing a single measurement, but when observing simultaneous changes in a number of measurements it should be possible to infer the current process state and compute an appropriate transformation. The approach that we propose in this paper is to use a neural network to perform this mapping. The neural network uses the information contained in all monitored measurements to compute the appropriate mapping into the corresponding normalized state. The proposed neural network model has N inputs and N outputs, where N is the dimensionality of the transient data. The network is then trained using as input data all the available transient examples, while the output (target) data consists of corresponding repetitions of a chosen prototype transient. An example taken from the data shown in Figure 1 is given in Figure 2.
* Please note that in all figures the abscissa indicates time in seconds while the ordinate is in the physical units of the particular measurement being shown. These physical units are not shown for simplicity since they have no direct relevance to the discussion.
Figure 1. Transient Normalization (three transient examples of the same fault).
Figure 2. Training Data for one Input of the Neural Network Transient Normalizer (input training data and output training data).
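To make the training setup concrete, the following sketch builds an N-input, N-output regressor and trains it to map every time step of each example transient onto the corresponding time step of the chosen prototype. The array layout and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the implementation used for the results reported here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_normalizer(examples, prototype, hidden=(30,)):
    """examples: list of (T, N) transients of one fault class; prototype: the chosen (T, N) transient."""
    X = np.vstack(examples)                     # every time step of every example is one input pattern
    Y = np.vstack([prototype] * len(examples))  # target: the prototype repeated once per example
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    net.fit(X, Y)                               # learn the N-to-N mapping into the normalized state
    return net

# A new transient is then normalized row by row:
# normalized = net.predict(new_transient)      # (T, N) -> (T, N)
```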
In the preparation of the training data it is of course essential that the input examples be correctly matched with the output prototypes: both the onset of the faults and their size and character need to be the same for the inputs and for the target outputs. The following section shows an application of this approach to the normalization of the set of nuclear power plant transients already mentioned in Section 2.
4. Power Normalization for Nuclear Power Plant Transient Classification

We discuss here the implementation of a neural network based transient normalization function for the case of 13 fault classes simulated in an experimental simulator of the Forsmark-3 BWR nuclear power plant in Sweden. For each of the 13 fault classes, 15 transients were simulated, corresponding to 5 fault variations, each simulated at 3 different power levels, namely 50%, 80%, and 108% power. In all simulations 12 selected measurements were logged and constituted the basis for the diagnosis. The 5 fault variations were the following:
1. ss - step small (e.g. a small abrupt leakage)
2. sm - step medium (e.g. a medium-size abrupt leakage)
3. sb - step big (e.g. a larger abrupt leakage)
4. rst - ramp short time (e.g. a gradual leakage developing relatively quickly)
5. rlt - ramp long time (e.g. a gradual leakage developing relatively slowly)
The size of the fault in the rlt and rst cases corresponded to the ss case, i.e. it was a small fault. For the development of the neural network transient normalizer, the 13 sb fault examples at 108% power were chosen as transient prototypes, while the sb fault examples at all three power levels were used for training. In the following we present a selection of the results obtained.
Figure 3. Training Results of Input 1 of the three sb Cases of Fault 1 (input training data vs. output training data and output results).
First we show in Figure 3 the training result on the first input of the three sb cases of Fault 1, and in Figure 4 the training results on the third input of the three sb cases of Fault 8. In all these graphs the clearly visible sequence of three transients corresponds to the three power levels 50%, 80%, and 108%. In all figures the training and target output data are shown as bold lines, while the neural network output is shown as a thin line.
Figure 4. Training Results of Input 3 of the three sb Cases of Fault 8 (input training data vs. output training data and output results).
We can easily observe in both cases how the neural network model has almost perfectly learned to map transients at different power levels to the prototype transient at 108% power. Similar results were also obtained for the other measurements and the other fault classes. Tests were then performed on transient cases not used during training, namely all the ss, sm, rlt, and rst transients. A selection of the obtained results is presented in the following. First we show in Figure 5 the test results of the first output of the three rst cases of Fault 1.
Figure 5. Test Results of Input 1 of the three rst Cases of Fault 1 (input test data vs. output test data and output results).
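A simple way to quantify what such test figures show is to compare the spread of a held-out transient across the three power levels before and after normalization. The spread measure in the sketch below is only an illustration, not a metric used in this study; net is the model fitted in the earlier sketch.

```python
import numpy as np

def spread_across_power_levels(versions):
    """versions: list of (T, N) arrays of the same fault recorded at different power levels."""
    stacked = np.stack(versions)                    # shape (n_levels, T, N)
    return float(np.mean(np.std(stacked, axis=0)))  # average point-wise spread between the levels

def variability_reduction(net, versions):
    before = spread_across_power_levels(versions)
    after = spread_across_power_levels([net.predict(v) for v in versions])
    return before, after  # e.g. for the three rst cases of Fault 1 at 50%, 80% and 108% power
```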
The results are still more than satisfactory, greatly reducing the variability of the original transients. Next we show in Figure 6 the test results of the twelfth output of the three rst cases of Fault 13.
Figure 6. Test Results of Input 12 of the three rst Cases of Fault 13 (input test data vs. output test data and test results).
Here we can see how even very small changes are well mapped by the trained neural network (one should notice that the apparently large normalization errors appearing in the right-hand graph are due to the scaling of the y axis), a fact also demonstrated by Figure 7, where a similar case is shown.
Figure 7. Test Results of Input 9 of the three rst Cases of Fault 6 (input test data vs. output test data and output results).
Comparable results to the ones shown here were obtained for most of the tested transients.

5. Final Remarks and Future Work

From this set of tests it appears that the proposed technique could be a promising approach for addressing the coverage and modeling problems of empirical techniques when applied to fault detection and diagnosis tasks. Additional work is needed to better evaluate the proposed method. One obvious extension of the tests conducted so far would be validation on transient data obtained at power levels different from the ones used for training (e.g. at 60% or 90%). Furthermore, one should compare the actual training performance of an empirical diagnostic system like Aladdin on the normalized data versus the performance on the original data. The utilization of ordinary transient data, i.e. data not generated by faults, to train the neural network normalizer could also be considered. If successful, this would have the very desirable consequence of further reducing the amount of fault examples necessary for covering the operational regimes of the monitored process. The availability of an ideal normalizer would mean that a single example (i.e. a prototype) for each fault class would in principle be sufficient to train an adequate classifier, because any other instance of the same fault would be correctly mapped to the learned prototype by the normalizer. If a particular type of fault, when occurring in different operating regimes, were to lead to very different manifestations in the observed transients, then the proposed approach cannot be expected to provide sufficient normalization and alternatives would have to be sought. One obvious possibility would be to split the specific fault type into a sufficient number of fault subtypes, depending on the number of fundamentally different manifestations observed. Our experimentation so far has not shown clear examples of this happening; however, practical experience with more applications of the proposed technique will guide further developments in this direction.

References
1. D. Roverso, "Plant Diagnostics by Transient Classification: The ALADDIN Approach", International Journal of Intelligent Systems, Vol. 17, No. 8, pp. 767-790, Wiley Periodicals Inc., 2002.
2. D. Roverso, "On-Line Early Fault Detection & Diagnosis with the Aladdin Transient Classifier", in Proceedings of NPIC&HMIT-2004, the 4th American Nuclear Society International Topical Meeting on Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies, September 19-22, 2004, Columbus, Ohio.
3. D. Roverso, "Dynamic Empirical Modelling Techniques for Equipment and Process Diagnostics in Nuclear Power Plants", IAEA Technical Meeting on On-Line Condition Monitoring of Equipment and Processes in Nuclear Power Plants Using Advanced Diagnostic Systems, 27-30 June 2005, Knoxville, TN, USA.
4. D. Roverso, "Fault diagnosis with the Aladdin transient classifier", in Proceedings of System Diagnosis and Prognosis: Security and Condition Monitoring Issues III, AeroSense 2003, Aerospace and Defense Sensing and Control Technologies Symposium, Orlando, FL, 21-25 April 2003.
USER INTERFACE FOR VALIDATION OF POWER CONTROL ALGORITHMS IN A TRIGA REACTOR

JORGE S. BENITEZ-READ*, CESAR L. RAMIREZ-CHAVEZ
Instituto Nacional de Investigaciones Nucleares (ININ), Carretera Mexico-Toluca S/N, La Marquesa, Ocoyoacac, Mexico, C.P. 52750, jsbr@nuclear.inin.mx
and Instituto Tecnologico de Toluca and UAEM, Mexico

DA RUAN
Belgian Nuclear Research Centre (SCK-CEN), Boeretang 200, 2400 Mol, Belgium
Phone: +32-14-332272, Fax: +32-14-321529, E-mail: [email protected]

The development of a user interface whose main purpose is the testing of new power control algorithms in a TRIGA Mark III reactor is presented. The interface is fully compatible with the current computing environment of the reactor's operating digital console. The interface, developed in Visual Basic, has been conceived as an aid in the testing and validation of new and existing algorithms for the ascent, regulation and decrease of the reactor power. The interface calls a DLL file that contains the control program, makes the plant and controller parameters available to the user, and displays some of the key variables of the closed-loop system. The system also displays the condition of the reactor with respect to the nuclear safety constraints imposed by the Mexican Nuclear Regulatory Commission (CNSNS). One of the algorithms under test is based on a control scheme that uses a variable state feedback gain and prediction of the state gain to guarantee compliance with the safety constraints.
1. Introduction

Different power control algorithms have been developed for the TRIGA Mark III research nuclear reactor of Mexico. One of them is the exact input-output linearizing control [1], which presents good reference-signal tracking characteristics. Another controller, based on a Mamdani-type fuzzy inference machine [2], efficiently deals with the uncertainties and the slow variations of the plant parameters, mainly due to its intrinsic interpolation among operating regions. These controllers satisfactorily bring the reactor power from 50 W (source level region) to 1 MW (full power). However, the first controller was not designed to avoid the automatic reactor shutdown (scram) caused by small reactor period values, which occur when the power increase rate exceeds a safety limit. On the other hand, the second controller and its sub-products have not been exhaustively tested for initial and final power levels other than 50 W and 1 MW, respectively. Also, these controllers do not consider any kind of regulation during power descent, a situation that can lead to unexpected oscillations. One control scheme that considers minimization of reactor period scrams, variable initial and final power levels, and controlled descent of power is the variable state feedback gain controller [3], which combines different techniques: state feedback to attain a given set point, a numerical method used to speed up the power increase, prediction of the best state gain that guarantees compliance with the reactor period constraint, and a stage based on fuzzy logic that minimizes oscillations. The performance of these control schemes has been simulated using Matlab. The next step is the testing and validation of these controllers in real time. To this end, it was proposed to design a computing system, similar to the reactor digital console, together with a simulator of the reactor dynamics, to perform such testing. Two problems initially arise: (a) the Matlab .m files are not directly compatible with the operating software used in the reactor console, which was developed in Visual Basic; and (b) the software of the reactor console is not sufficiently modular; for instance, the current control algorithm (PID) [4] is part of the main program and cannot be replaced by new algorithms as modules of the system. The proposed interface windows are designed to look similar to the windows that the reactor operators are used to seeing. Thus, the control buttons and visualization of parameters are similar to those presented by the reactor console. Likewise, the system will have the capability to manage nuclear instrumentation to test its response in real time. The interface, developed in Visual Basic, has been conceived as an aid in the testing and validation of new and existing algorithms for the ascent, regulation and decrease of the reactor power.

2. Current Reactor Digital Console

The operation of the reactor digital console is briefly described here. The console software was ergonomically designed: the user, through a friendly computational environment [4], can easily get information about the operating condition of the reactor. The control rods are managed by the Rod Control System (RCS), which is shown in Fig. 1. The RCS can be operated manually by means of a keypad located on the control panel or automatically from the reactor digital console. The console also contains electronic modules through which control commands are sent to the actuators that move the control rods, and through which it receives data from the sensors to determine the operating condition of the reactor.
Figure 1. Rod Control System (RCS)
When the reactor is operated in automatic mode, a PID controller regulates the reactor power.
Figure 2. Block diagram of the control system of the TRIGA Mark III Reactor (digital console with PID controller, speed regulator and jump limiter; rod control system and rod drivers; detection system; reactor core).
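Purely as an illustration of the automatic mode described above, a discrete PID loop acting on the error between the demanded power percentage and the measured linear-channel power could be sketched as follows; the gains, sampling period and sign convention are placeholder assumptions, not the console's actual tuning.

```python
class DiscretePID:
    """Illustrative discrete PID; kp, ki, kd and dt are placeholder values, not the console settings."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.05, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, demand_percent, measured_percent):
        error = demand_percent - measured_percent      # error between set point and measured power
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # The output would drive the up/down motion of the regulation and fine control rods.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```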
The controller output governs the up and down motion of the regulation and fine control rods to attain and maintain an acceptable error between the percentage of demand (set point) and the instantaneous power measured through the linear channel. A block diagram of the subsystems of the digital console system and their relation with the reactor is shown in Fig. 2.

3. Variable State Feedback Control Algorithm

This control scheme (Fig. 3) has been simulated under different initial and final power levels, and different values of the reactor period for the scram condition, to guarantee a safe ascent of power. The period scram condition states that the power at time t_i + Δt should not exceed the level n(t_i)·exp(Δt/T), that is, n(t_i + Δt) ≤ n(t_i)·exp(Δt/T), or equivalently T ≤ Δt / ln(n(t_i + Δt)/n(t_i)). For instance, let T = 3 s; if Δt / ln(n(t_i + Δt)/n(t_i)) falls below the 3-second limit, the power is increasing too fast and the result is an automatic shutdown of the reactor (scram).
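The period condition can be checked directly in code; the 3-second threshold comes from the example above, while the function itself is only an illustrative sketch.

```python
import math

def period_scram(n_prev, n_now, dt, t_min=3.0):
    """Instantaneous period for a power step from n_prev to n_now over dt seconds, plus the scram flag."""
    if n_now <= n_prev:            # power not increasing: the period condition cannot be violated
        return float('inf'), False
    period = dt / math.log(n_now / n_prev)
    return period, period < t_min  # scram if the instantaneous period falls below the t_min limit
```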
Figure 3. Block diagram of the variable state feedback controller (state feedback gains, predictor, and Takagi-Sugeno fuzzy stage).
The design procedure started with the analytical definition, based on the mathematical model of the reactor, of the steady-state scaling factor between the external reactivity ρext(t) and the reactor power n(t). Then, using state-space theory, a state feedback control law was proposed. The resulting closed-loop control system, although it eliminates the overshoot, renders an excessively slow neutron power response. The solution to this problem was to incorporate a stage that predicts the required feedback gain k3. The predictor, derived from the theory of first-order numerical integration, produces very good results during the first stage of the ascent of power. However, the neutron power presents an abnormal behavior (including irregular oscillations) when it approaches the desired reference level. To cope with this problem, a Takagi-Sugeno fuzzy stage was added to regulate the predictor action. The results obtained include a fast response and independence from the wide variety of potential operating conditions, something not easy and even impossible to obtain with other procedures. The point kinetic equations of the reactor were used as the mathematical reactor model in all stages of the controller design. These equations describe the dynamic behavior of small reactors where the spatial variations of neutron flux are also small [5].

4. Development of the User Interface

The algorithm coded in Matlab was converted into a shared DLL using the Matlab compiler tool, which generates C source code; the Visual C++ compiler then converts the C code into the DLL. One of the modifications consisted in carrying out the storage of the state gain k3 and the solution of the plant model outside the DLL. The methodology used to convert the algorithm to a shared DLL follows these steps:
a) The control algorithm is defined as a Matlab function (userfunction); the source and header files are generated with the Matlab compiler tool using the instruction mcc -W lib:userfunctionlib -L C -t -T link:lib userfunction.m, which creates shared C libraries that perform specialized computations. These libraries can be called from a program developed in another programming language. The options of this command are: (i) -W main to create a wrapper file for a Matlab-independent application; (ii) -L C to generate C code as the object language; (iii) -t to convert .m code into C code; and (iv) -T link:lib to create a C library as output.
b) The source (.c) and header (.h) files generated in a) are stored in the working directory where the DLL is to be created. The libraries used by the Matlab function are also copied into the working directory, for instance libmx.lib, libmmfile.lib, libmatl.lib, libmwsglm.lib, and sgl.lib. These libraries can vary, depending on the tasks performed by the Matlab function.
c) Visual C++ generates the Win32 DLL; within the DLL project (in the Project menu) the files and libraries in the working directory are added.
d) A new source file (userfunctionWrap) is added to the project, created as a .cpp file. This file is then renamed userfunctionWrap.c, since the libraries generated by the Matlab compiler are .c files. It defines the pointers to the input and output variables of the DLL; the number of pointers is determined by the input and output arguments of the original Matlab function. This file also calls some of the C source files generated by the Matlab compiler.
e) A DEF file is created for the DLL in the DLL project using the Text file option.
f) To create the DLL of the original Matlab function, the following paths are added to the Visual C++ compiler: c:\MATLAB6p5\extern\include and c:\MATLAB6p5\extern\lib\win32\Microsoft\msvc60.
Once the control algorithm has been converted to a DLL, the interface was developed in Visual Basic [8], where the DLL is declared as a function with its corresponding inputs. Thus, the reactor console can call the DLL controller to regulate the neutron power. Fig. 4 shows the user interface of the reactor digital console. The main features of this interface are: (a) numeric display of the following parameters: demanded power, current power, reactor period, and external reactivity; (b) graphical display of the current power; (c) selection of the operating mode and of the start-up and shutdown of the reactor; and (d) selection of the controller to be used in the automatic mode.
Figure 4. User interface of the reactor digital console.
In order to test the controller in a computational environment as close as possible to the real operating conditions of the reactor, a simulator of the reactor dynamics was also developed in Visual Basic. The interaction between the user interface and the reactor simulator is through serial communication. Fig. 5 shows the main window of the reactor simulator. Notice that the control is being carried out exclusively with the regulation rod.
Figure 5. Simulator of the TRIGA Mark III reactor.
The ActiveX control MSComm [7] was used to develop the communication module. The serial communication rate is 56000 bps; at this speed the complete transmission and processing of the control variables are guaranteed. Fig. 6 shows a block diagram of the interface/simulator system.
Figure 6. Block diagram of the interface/simulator system (digital console system on PC1 with the VB user interface and the control algorithm DLL; reactor simulator on PC2 with the reactivity-to-rod-position conversion algorithm and the point kinetic equations; the two PCs connected through serial communication modules).
The variables transmitted between the modules are the external reactivity (control signal), the neutron power (controlled variable), and the reactor period. After each module was independently tested, they were interconnected via RS232.
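The implementation itself uses the ActiveX MSComm control in Visual Basic; purely to illustrate the message exchange between the two PCs, an equivalent loop in Python with the pyserial library might look as follows, where the port name and the comma-separated framing are assumptions made only for this sketch.

```python
import serial  # pyserial

# Placeholder port name and framing; the real system uses MSComm in Visual Basic over RS-232.
link = serial.Serial(port='COM1', baudrate=56000, timeout=1)

def exchange(external_reactivity):
    """Send the control signal; receive the neutron power and the reactor period from the simulator."""
    link.write(f"{external_reactivity:.6f}\n".encode())
    reply = link.readline().decode().strip()       # e.g. "250000.0,12.5"
    power, period = (float(x) for x in reply.split(','))
    return power, period
```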
5. Results and Conclusions

The execution of the control algorithm in C++ takes a few milliseconds, thus permitting its application in real time. The communications protocol works at 56000 bps, allowing data acquisition, control signal computation and transmission as in the real console. A proportional control has been tested with satisfactory results. The testing of the real-time performance of the variable state feedback controller is still under development. Further work is related to the training of the reactor's personnel and the addition of electronic instrumentation to the interface system.

Acknowledgments

This work has been partially funded by the Mexican National Council for Science and Technology under grant 33797.

References
1. Perez Carvajal, Victor, "Control linealizador entrada-salida de un reactor nuclear", Tesis Ingenieria Electronica, I.T. Toluca, 1994.
2. Benitez-Read, J.S.; Velez-Diaz, D., "Controlling neutron power of a TRIGA Mark III research nuclear reactor with fuzzy adaptation of the set of output membership functions", Ch. 5, Fuzzy Systems and Soft Computing in Nuclear Engineering, pp. 83-114, Physica-Verlag, 2000.
3. Perez Cruz, J.H.; Benitez Read, J.S., "Controlador con Retroalimentacion Adaptable de Estado para la Regulacion de la Potencia de un Reactor Triga MARK III", Electro 2004, Vol. XXVI, pp. 49-54.
4. Rivero Gutierrez, T.; Gonzalez Marroquin, J.L.; Sainz Mejia, E., "Manual de Operacion, Consola de Control Digital del Reactor TRIGA Mark III", ININ, mayo de 1997.
5. Hetrick, D.L., "Dynamics of Nuclear Reactors", The University of Chicago Press, 1971.
6. Perez Martinez, C., "Simulador de la cinetica puntual de un reactor nuclear TRIGA Mark III con control difuso de potencia en un ambiente visual", Tesis Ing. Computacion, UAEM-UAP Valle de Chalco, Edo. de Mexico, 26-MAR-2004.
7. Gurewich, N.; Gurewich, O., "Aprendiendo Visual Basic 5", Prentice Hall, 1998.
8. Ying, B., "Applications Interface Programming Using Multiple Languages, A Windows Programmer's Guide", Prentice Hall, 2003.
9. Laney, J., "Writing Test Modules Using Standard Interfaces and Languages", 0-7803-5868-6/00/2000, IEEE Press Series, 2000.
FLINS, originally an acronym for Fuzzy Logic and Intelligent Technologies in Nuclear Science, is now extended to Applied Artificial Intelligence for Applied Research. The contributions to the seventh in the series of FLINS conferences contained in this volume cover state-of-the-art research and development in applied artificial intelligence for applied research in general and for power/nuclear engineering in particular.
www.worldscientific.com